I am struggling to come up with a regular expression to parse some logs that are very unstructured, but every entry that needs to be parsed begins with a date.
An example is 2015-9-20 05:20:22, followed by lots of log data, and then the next date, which starts the next entry. So I would basically need to parse everything from the starting date until the next date.
2015-9-20 05:20:22 lots of log data
2015-9-20 05:21:22 lots of new log data
Is it possible to parse this using a regular expression?
So I would basically need to parse everything from the starting date until the next date.
If you want to match lines beginning with one date, or beginning with the following day (startDate + 1 day), you can use the dates in your pattern as literal text.
Using the dates in your example:
^(?:2015-9-20|2015-9-21) .*
Code:
// Instantiate a Date object
Date startDate = new GregorianCalendar(2015, 8, 20).getTime();
// Calculate end date (+1 day)
Calendar endDate = Calendar.getInstance();
endDate.setTime(startDate);
endDate.add(Calendar.DATE, 1); // Add 1 day
// Format dates the same way the logs do
SimpleDateFormat ft = new SimpleDateFormat("y-M-d");
// Create regex
String datesRegex = "^(?:" + ft.format(startDate) + "|" + ft.format(endDate.getTime()) + ") .*";
If you want to get all lines from one date to another, and not only those starting with a given date, you should match with the Pattern.DOTALL flag (plus Pattern.MULTILINE, so that ^ matches at the start of each line):
^2015-9-20 .*?(?=^2015-9-21 |\z)
Code:
// Create regex
String datesRegex = "^" + ft.format(startDate) + " .*?(?=^" + ft.format(endDate.getTime()) + " |\\z)";
// Compile
Pattern.compile(datesRegex, Pattern.MULTILINE | Pattern.DOTALL);
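If the goal is simply to cut the log into entries, each running from one timestamp to the next, the same lookahead idea can be applied with a generic timestamp pattern instead of literal dates. What follows is a minimal sketch, assuming timestamps shaped like the ones in the question; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogEntrySplitter {
    // Assumed timestamp shape, taken from the sample lines: 2015-9-20 05:20:22
    private static final String TS = "\\d{4}-\\d{1,2}-\\d{1,2} \\d{2}:\\d{2}:\\d{2}";

    // An entry starts at a line-initial timestamp and lazily runs up to the
    // next line-initial timestamp or the end of the input.
    private static final Pattern ENTRY = Pattern.compile(
            "^" + TS + " .*?(?=^" + TS + " |\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    static List<String> split(String log) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            entries.add(m.group().trim()); // trim drops the trailing line break
        }
        return entries;
    }

    public static void main(String[] args) {
        String log = "2015-9-20 05:20:22 lots of log data\n"
                + "  continuation line\n"
                + "2015-9-20 05:21:22 lots of new log data\n";
        for (String entry : split(log)) {
            System.out.println("[" + entry + "]");
        }
    }
}
```

Because the lookahead requires the timestamp at the start of a line, a date that merely appears in the middle of a message does not end the entry.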
Assuming you're reading the file line-by-line, this should work for you:
^\d{4}-\d{1,2}-\d{2} \d{2}:\d{2}:\d{2} (.*)$
Code example:
String line = "2015-9-20 05:20:22 log data" + System.lineSeparator();
String pattern = "^\\d{4}-\\d{1,2}-\\d{2} \\d{2}:\\d{2}:\\d{2} (.*)$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Value after timestamp is: " + m.group(1));
} else {
System.out.println("NO MATCH");
}
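When the file is read line by line, an entry that spans several lines can be rebuilt by treating every line that does not start with a timestamp as a continuation of the previous entry. The following is a rough sketch under that assumption. Names such as LogLineGrouper are illustrative, and the pattern accepts one- or two-digit days, which is slightly looser than the expression above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LogLineGrouper {
    private static final Pattern STARTS_WITH_TIMESTAMP =
            Pattern.compile("^\\d{4}-\\d{1,2}-\\d{1,2} \\d{2}:\\d{2}:\\d{2} ");

    // Groups raw lines into entries; each entry begins at a timestamped line.
    static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (STARTS_WITH_TIMESTAMP.matcher(line).find()) {
                if (current != null) {
                    entries.add(current.toString()); // previous entry is complete
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);   // continuation line
            }
            // lines before the first timestamp are ignored
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2015-9-20 05:20:22 lots of log data",
                "more data for the same entry",
                "2015-9-20 05:21:22 lots of new log data");
        group(lines).forEach(entry -> System.out.println("[" + entry + "]"));
    }
}
```

The lines could come from Files.readAllLines or from a BufferedReader loop; only the grouping logic is shown here.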
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
public static void main( String args[] ){
String s1 = "2015-9-20 05:20:22 lots of log data";
String s2 = "2015-9-20 05:21:22 lots of new log data";
String pattern = "(\\d{4})-(0?\\d|1[0-2])-([012]\\d|3[01]) ([01]?\\d|2[0-3]):([0-5]?\\d):([0-5]?\\d)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(s1); //same for s2
if(m.find())
System.out.println("True");
else
System.out.println("False");
}
}
Output: True
I want to remove elements from a supplied date format string, for example converting the format "dd/MM/yyyy" to "MM/yyyy" by removing any non-M/y element.
What I'm trying to do is create a localised month/year format based on the existing day/month/year format provided for the Locale.
I've done this using regular expressions, but the solution seems longer than I'd expect.
An example is below:
public static void main(final String[] args) {
System.out.println(filterDateFormat("dd/MM/yyyy HH:mm:ss", 'M', 'y'));
System.out.println(filterDateFormat("MM/yyyy/dd", 'M', 'y'));
System.out.println(filterDateFormat("yyyy-MMM-dd", 'M', 'y'));
}
/**
* Keeps only {@code charsToRetain} from {@code format}, removing any redundant
* separators.
*/
private static String filterDateFormat(final String format, final char...charsToRetain) {
// Match e.g. "ddd-"
final Pattern pattern = Pattern.compile("[" + new String(charsToRetain) + "]+\\p{Punct}?");
final Matcher matcher = pattern.matcher(format);
final StringBuilder builder = new StringBuilder();
while (matcher.find()) {
// Append each match
builder.append(matcher.group());
}
// If the last match is "mmm-", remove the trailing punctuation symbol
return builder.toString().replaceFirst("\\p{Punct}$", "");
}
Let's try a solution for the following date format strings:
String[] formatStrings = { "dd/MM/yyyy HH:mm:ss",
"MM/yyyy/dd",
"yyyy-MMM-dd",
"MM/yy - yy/dd",
"yyabbadabbadooMM" };
The following will analyze strings for a match, then print the first group of the match.
Pattern p = Pattern.compile(REGEX);
for(String formatStr : formatStrings) {
Matcher m = p.matcher(formatStr);
if(m.matches()) {
System.out.println(m.group(1));
}
else {
System.out.println("Didn't match!");
}
}
Now, there are two separate regular expressions I've tried. First:
final String REGEX = "(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
Didn't match!
Didn't match!
Second:
final String REGEX = "(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
MM/yy - yy
Didn't match!
Now, let's see what the first regex actually matches to:
(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)   First regex =
(?:[^My]*)                                Any amount of non-Ms and non-ys (non-capturing)
          ([My]+                          followed by one or more Ms and ys
                [^\\w]*                   optionally separated by non-word characters
                                          (implying they are also not Ms or ys)
                       [My]+)             followed by one or more Ms and ys
                             (?:[^My]*)   finished by any number of non-Ms and non-ys
                                          (non-capturing)
What this means is that at least 2 M/ys are required to match the regex, although you should be careful that something like MM-dd or yy-DD will match as well, because they have two M-or-y regions 1 character long. You can avoid getting into trouble here by just keeping a sanity check on your date format string, such as:
if(formatStr.contains("y") && formatStr.contains("M") && m.matches())
{
String yMString = m.group(1);
... // other logic
}
As for the second regex, here's what it means:
(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*)   Second regex =
(?:[^My]*)                                     Any amount of non-Ms and non-ys
                                               (non-capturing)
          (                      )             followed by
           (?:[My]+       )+[My]+              at least two text segments consisting of
                                               one or more Ms or ys, where each segment is
                   [^\\w]*                     optionally separated by non-word characters
                                  (?:[^My]*)   finished by any number of non-Ms and non-ys
                                               (non-capturing)
This regex will match a slightly broader series of strings, but it still requires that any separations between Ms and ys be non-words ([^a-zA-Z_0-9]). Additionally, keep in mind that this regex will still match "yy", "MM", or similar strings like "yyy", "yyyy"..., so it would be useful to have a sanity check as described for the previous regular expression.
Additionally, here's a quick example of how one might use the above to manipulate a single date format string:
LocalDateTime date = LocalDateTime.now();
String dateFormatString = "dd/MM/yyyy H:m:s";
System.out.println("Old Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
Pattern p = Pattern.compile("(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)");
Matcher m = p.matcher(dateFormatString);
if(dateFormatString.contains("y") && dateFormatString.contains("M") && m.matches())
{
dateFormatString = m.group(1);
System.out.println("New Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
}
else
{
throw new IllegalArgumentException("Couldn't shorten date format string!");
}
Output:
Old Format: "dd/MM/yyyy H:m:s" = 14/08/2019 16:55:45
New Format: "MM/yyyy" = 08/2019
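The match and the sanity check can also be wrapped in one small helper that returns nothing when the format cannot be shortened, rather than throwing. This is only a sketch: the class name MonthYearPattern and the method shorten are invented here, and it reuses the first regex unchanged, so it shares its limitations.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthYearPattern {
    // First regex from above: a run of M/y letters, an optional
    // non-word separator, then another run of M/y letters.
    private static final Pattern MONTH_YEAR =
            Pattern.compile("(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)");

    static Optional<String> shorten(String format) {
        Matcher m = MONTH_YEAR.matcher(format);
        // Sanity check: both a month letter and a year letter must be present
        if (format.contains("y") && format.contains("M") && m.matches()) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 8, 14);
        for (String format : new String[] {"dd/MM/yyyy", "yyyy-MM-dd", "MM-dd"}) {
            String shown = shorten(format)
                    .map(f -> f + " = " + date.format(DateTimeFormatter.ofPattern(f)))
                    .orElse("no month/year part found");
            System.out.println(format + " -> " + shown);
        }
    }
}
```

Returning an Optional leaves it to the caller to decide whether an unshortenable format is an error or just something to skip.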
I'll try to answer based on my understanding of the question: how do I remove, from a list/table/array of Strings, the elements that do not exactly follow the pattern 'dd/MM'?
So I'm looking for a function that looks like
public List<String> removeUnWantedDateFormat(List<String> input)
From what I know about date formats, there are only four possibilities you would want (hoping I don't miss any): "MM/yyyy", "MMM/yyyy", "MM/yy" and "MMM/yy". Since we know what we are looking for, a simple function will do.
public List<String> removeUnWantedDateFormat(List<String> input) {
    String s1 = "MM/yyyy";
    String s2 = "MMM/yyyy";
    String s3 = "MM/yy";
    String s4 = "MMM/yy";
    // removeIf avoids the ConcurrentModificationException that calling
    // input.remove() inside a for-each loop over the same list would throw
    input.removeIf(format -> !s1.equals(format) && !s2.equals(format)
            && !s3.equals(format) && !s4.equals(format));
    return input;
}
Better not to use regex if you can avoid it; it can cost a lot of resources. A great improvement would be to use an enum of the date formats you accept: that way you have better control over them and can even replace them.
Hope this will help, cheers
Edit: after seeing the comment, I think it would be better to use contains instead of equals, which should work like a charm, and, instead of removing the entry, to replace it with the expected format string.
So it would look more like:
public List<String> removeUnWantedDateFormat(List<String> input) {
    // Longer formats first, because "MMM/yyyy" also contains "MM/yyyy" and "MM/yy"
    List<String> comparaisons = new ArrayList<>();
    comparaisons.add("MMM/yyyy");
    comparaisons.add("MMM/yy");
    comparaisons.add("MM/yyyy");
    comparaisons.add("MM/yy");
    for (int i = 0; i < input.size(); i++) {
        for (String comparaison : comparaisons) {
            if (input.get(i).contains(comparaison)) {
                // Assigning to a loop variable would not change the list, so use set()
                input.set(i, comparaison);
                break;
            }
        }
    }
    return input;
}
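The enum suggested above could look roughly like this. It is only a sketch: the enum name, its constants and the find and reduceAll helpers are invented for illustration, and the set of accepted patterns is simply the four listed in this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

enum MonthYearFormat {
    // Longer patterns first: "MMM/yyyy" also contains "MM/yyyy" and "MM/yy"
    MMM_YYYY("MMM/yyyy"),
    MMM_YY("MMM/yy"),
    MM_YYYY("MM/yyyy"),
    MM_YY("MM/yy");

    private final String pattern;

    MonthYearFormat(String pattern) {
        this.pattern = pattern;
    }

    String pattern() {
        return pattern;
    }

    // Returns the first accepted format contained in the given pattern, if any
    static Optional<MonthYearFormat> find(String format) {
        for (MonthYearFormat candidate : values()) {
            if (format.contains(candidate.pattern)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Keeps only inputs with a recognised month/year part, reduced to that part
    static List<String> reduceAll(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String format : input) {
            find(format).ifPresent(f -> result.add(f.pattern()));
        }
        return result;
    }
}
```

For instance, MonthYearFormat.reduceAll applied to "dd/MM/yyyy" and "dd-MM-yyyy" keeps "MM/yyyy" for the first and drops the second, because its separator is not one of the accepted ones. Building a new list also sidesteps modifying the input while iterating over it.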
I have searched everywhere for this but couldn't get a specific solution, and the documentation also didn't cover this. So I want to extract the start date and end date from this string "1-Mar-2019 to 31-Mar-2019". The problem is I'm not able to extract both the date strings.
I found the closest solution here but couldn't post a comment asking how to extract values individually due to low reputation: https://stackoverflow.com/a/8116229/10735227
I'm using a regex pattern to look for the occurrences and to extract both occurrences to 2 strings first.
Here's what I tried:
Pattern p = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");
Matcher m = p.matcher(str);
while(m.find())
{
startdt = m.group(1);
enddt = m.group(1); //I think this is wrong, don't know how to fix it
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
Output is:
startdt: 31-Mar-2019 enddt: 31-Mar-2019
Additionally, I need to use DateFormatter to convert the string to a date (adding a leading 0 before a single-digit day if required).
You can capture both dates simply by calling the find method twice. If the string contains only one date, this captures only the first one:
String str = "1-Mar-2019 to 31-Mar-2019";
String startdt = null, enddt = null;
Pattern p = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");
Matcher m = p.matcher(str);
if(m.find()) {
startdt = m.group(1);
if(m.find()) {
enddt = m.group(1);
}
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
Note that this could be used with a while(m.find()) and a List<String> to extract every date you can find.
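The while-loop variant mentioned in the note might be sketched as follows. The class name DateCollector and the method findAll are placeholders; the pattern is the one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateCollector {
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}-[a-zA-Z]{3}-\\d{4}");

    // Collects every substring that looks like d-MMM-yyyy, in order of appearance
    static List<String> findAll(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1-Mar-2019 to 31-Mar-2019"));
        // prints [1-Mar-2019, 31-Mar-2019]
    }
}
```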
If your text may be messy, and you really need to use a regex to extract the date range, you may use
String str = "Text here 1-Mar-2019 to 31-Mar-2019 and tex there";
String startdt = "";
String enddt = "";
String date_rx = "\\d{1,2}-[a-zA-Z]{3}-\\d{4}";
Pattern p = Pattern.compile("(" + date_rx + ")\\s*to\\s*(" + date_rx + ")");
Matcher m = p.matcher(str);
if(m.find())
{
startdt = m.group(1);
enddt = m.group(2);
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
// => startdt: 1-Mar-2019 enddt: 31-Mar-2019
Also, consider this enhancement: match the date as whole word to avoid partial matches in longer strings:
Pattern.compile("\\b(" + date_rx + ")\\s*to\\s*(" + date_rx + ")\\b")
If the range can be expressed with "-" or "to", you may replace to with (?:to|-), or even (?:to|\\p{Pd}), where \p{Pd} matches any hyphen/dash.
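Since the question also asks about turning the extracted strings into dates, here is one possible sketch that combines the range pattern with java.time. It assumes English month abbreviations (Locale.ENGLISH); the class and method names are illustrative, and a string that matches the regex but is not a real date (say 1-Xyz-2019) makes LocalDate.parse throw a DateTimeParseException.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateRangeParser {
    private static final String DATE_RX = "\\d{1,2}-[a-zA-Z]{3}-\\d{4}";
    private static final Pattern RANGE = Pattern.compile(
            "\\b(" + DATE_RX + ")\\s*(?:to|\\p{Pd})\\s*(" + DATE_RX + ")\\b");

    // "d" accepts one or two digits; case-insensitive so "MAR" parses like "Mar"
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("d-MMM-uuuu")
            .toFormatter(Locale.ENGLISH);

    // "dd" pads single-digit days with a leading zero on output
    private static final DateTimeFormatter PADDED =
            DateTimeFormatter.ofPattern("dd-MMM-uuuu", Locale.ENGLISH);

    // Returns [start, end], or an empty list when no range is found
    static List<LocalDate> parse(String text) {
        List<LocalDate> range = new ArrayList<>();
        Matcher m = RANGE.matcher(text);
        if (m.find()) {
            range.add(LocalDate.parse(m.group(1), INPUT));
            range.add(LocalDate.parse(m.group(2), INPUT));
        }
        return range;
    }

    static String padded(LocalDate date) {
        return date.format(PADDED);
    }

    public static void main(String[] args) {
        for (LocalDate d : parse("Text here 1-Mar-2019 to 31-Mar-2019 and text there")) {
            System.out.println(padded(d));
        }
    }
}
```

Once the values are LocalDate objects, the leading zero is purely a formatting concern, handled by the "dd" in the output pattern.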
You can simply use String::split
String range = "1-Mar-2019 to 31-Mar-2019";
String dts [] = range.split(" ");
System.out.println(dts[0]);
System.out.println(dts[2]);
I want to convert a date to words. For example: 12/12/2012 --> twelve twelve two thousand twelve. I have already made a number-to-word converter, but now I have a problem printing the result.
Here is my code:
String patternString = "\\d{2}/\\d{2}/\\d{4}"; // date regex
Pattern pattern = Pattern.compile(patternString); // pattern compiling
Matcher matcher = pattern.matcher(nom); // matching with pattern with input text from user
if (matcher.find()) {
String get_data = matcher.group();
if(get_data.contains("/")){ // check either has "/" slash or not
String parts[] = get_data.split("[/]"); // split process
String get_day = parts[0]; // day will store in first array
String get_month = parts[1]; // month will store in second array
String get_year = parts[2]; // year will store in third array
String s = NumberConvert.convert(Integer.parseInt(get_day))
+ NumberConvert.convert(Integer.parseInt(get_month))
+ NumberConvert.convert(Integer.parseInt(get_year));
String replace = matcher.replaceAll(s); // replace number to words
System.out.println(replace);
}
} else {...}
Input text from user:
12/12/2012 +++ 23/11/2010
But the result contains only the first date in words, and the next date is also replaced with the value of the first one.
twelve twelve two thousand twelve +++ twelve twelve two thousand twelve
Please suggest a solution.
An immediate solution to your problem would be to use Matcher.replaceFirst(), instead of Matcher.replaceAll(), since you only want the first date pattern to be replaced with your written version of the date.
String replace = matcher.replaceFirst(s);
If you would like to be able to process each numeric date one at a time, you can do so in a left-to-right fashion using this code:
String patternString = "\\d{2}/\\d{2}/\\d{4}";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(nom);
String output = "";
while (matcher.find()) {
String get_data = matcher.group();
String parts[] = get_data.split("/");
String get_day = parts[0];
String get_month = parts[1];
String get_year = parts[2];
String s = NumberConvert.convert(Integer.parseInt(get_day)) +
NumberConvert.convert(Integer.parseInt(get_month)) +
NumberConvert.convert(Integer.parseInt(get_year));
if (output.equals("")) {
output = s;
}
else {
output += " +++ " + s;
}
String replace = matcher.replaceFirst("");
matcher = pattern.matcher(replace);
}
After each iteration, the above code resets the Matcher using a string from which the previous date matched has been removed. This lets you "eat" one date at a time, from left to right, building the human readable date output as you go along.
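An alternative that keeps the surrounding text intact is Matcher.appendReplacement together with appendTail, which replaces each match with its own value in a single pass. The sketch below is written under assumptions: the toWords parameter stands in for the asker's NumberConvert.convert, which is not shown, and the words are joined with spaces.

```java
import java.util.function.IntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateToWords {
    private static final Pattern DATE =
            Pattern.compile("(\\d{2})/(\\d{2})/(\\d{4})");

    // toWords is a stand-in for a number-to-words converter (hypothetical here)
    static String replaceDates(String text, IntFunction<String> toWords) {
        Matcher m = DATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String words = toWords.apply(Integer.parseInt(m.group(1))) + " "
                    + toWords.apply(Integer.parseInt(m.group(2))) + " "
                    + toWords.apply(Integer.parseInt(m.group(3)));
            // quoteReplacement stops '$' or '\' in the words from being
            // interpreted as group references
            m.appendReplacement(out, Matcher.quoteReplacement(words));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // A dummy converter that only wraps the number, to show the flow
        System.out.println(replaceDates("12/12/2012 +++ 23/11/2010", n -> "<" + n + ">"));
        // prints <12> <12> <2012> +++ <23> <11> <2010>
    }
}
```

With this approach the " +++ " between the dates does not need to be hard-coded, because everything outside the matches is copied through unchanged.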
I tried to get these dates: 22-APR-16 11.00.00.000000 and 22-APR-16 10.30.00.000000.
My code is below, but it can't find them. What can I do?
String pattern = "(Başlangıç Tarihi:\\s+)([0-9/:]+\\s+[0-9:]+)(.*)\\s+(Bitiş Tarihi:\\s+)([0-9/:]+\\s+[0-9:]+)(.*)";
Pattern r = Pattern.compile(pattern);
String text = "Başlangıç Tarihi: 22-APR-16 11.00.00.000000 AM Bitiş Tarihi: 22-APR-16 10.30.00.000000 PM";
Matcher m = r.matcher(text);
if(m.find())
{
String startDate = m.group(2);
String endDate = m.group(5);
System.out.println("Start Date : " + startDate);
System.out.println("End Date : " + endDate);
}
KISS
String pattern = "(Başlangıç Tarihi:\\s+)(\\d+-[A-Za-z]+-\\d+\\s[\\d.]+)(.*)\\s+(Bitiş Tarihi:\\s+)(\\d+-[A-Za-z]+-\\d+\\s[\\d.]+)";
Moreover, you can just use
(\\d+-[A-Za-z]+-\\d+\\s[\\d.]+)
and find all the matches using a loop, storing them in an array or ArrayList. Every even-indexed element will be a start date and every odd-indexed element an end date.
I'm making a date extractor using regex in Java. The problem is that the date is 20-05-2014 and my program is extracting 0-5-14. In short, how can I also get the first character of each part, the one I'm checking before matching the second character?
int count = 0;
String data = "HellowRoldsThisis20-05-2014. farhan_rock#gmail.comHellowRoldsThisis.farhan#gmail.com";
String regexOfDate = "((?<=[0])[1-9]{2})|((?<=[12])[0-9])|((?<=[3])[01])\\.\\-\\_((?<=[0])[1-9])|((?<=[1])[0-2])\\.\\-\\_((?<=[2])[0-9]{4})"; // THE PROBLEM
String[] extractedDate = new String[1000];
Pattern patternDate = Pattern.compile(regexOfDate);
Matcher matcherDate = patternDate.matcher(data);
while(matcherDate.find()){
System.out.println("Date "+count+"Start: "+matcherDate.start());
System.out.println("Date "+count+"End : "+matcherDate.end());
extractedDate[count] = data.substring(matcherDate.start(), matcherDate.end());
System.out.println("Date Extracted: "+extractedDate[count]);
}
You can try the regular expression:
// (0[1-9]|[12][0-9]|[3][01])[._-](0[1-9]|1[0-2])[._-](2[0-9]{3})
"(0[1-9]|[12][0-9]|[3][01])[._-](0[1-9]|1[0-2])[._-](2[0-9]{3})"
A single regex to match valid dates is awful.
I'd do:
String regexOfDate = "(?<!\\d)\\d{2}[-_.]\\d{2}[-_.]\\d{4}(?!\\d)";
to extract the potential date, then test if it is valid.
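The "extract first, validate afterwards" idea could be sketched like this, using java.time for the validity test. The capture groups are added to the pattern above so that the separators can be normalised; the class name DateExtractor is made up, and the dd-MM-yyyy field order is an assumption based on the sample date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractor {
    // Same shape as the pattern above, with groups around day, month and year
    private static final Pattern CANDIDATE = Pattern.compile(
            "(?<!\\d)(\\d{2})[-_.](\\d{2})[-_.](\\d{4})(?!\\d)");

    // STRICT rejects impossible dates such as 31-02-2014; "uuuu" is used
    // because "yyyy" would additionally need an era in strict mode
    private static final DateTimeFormatter STRICT = DateTimeFormatter
            .ofPattern("dd-MM-uuuu")
            .withResolverStyle(ResolverStyle.STRICT);

    static List<LocalDate> extract(String data) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(data);
        while (m.find()) {
            // Normalise the separators so one formatter covers '-', '_' and '.'
            String normalized = m.group(1) + "-" + m.group(2) + "-" + m.group(3);
            try {
                dates.add(LocalDate.parse(normalized, STRICT));
            } catch (DateTimeParseException e) {
                // Looks like a date but is not a real calendar date: skip it
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(extract("HellowRoldsThisis20-05-2014. and 31.02.2014 too"));
    }
}
```

This keeps the regex short and leaves rules such as month lengths and leap years to the date library.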