Regular Expressions in Java: Matching a date value surrounded by other data

I have a lot of files I am retrieving data from, and I have hit a wall with date values surrounded by other data. I am using Java, and the regular expression I am using works for the variable string_i_currently_match; however, I need it to match example_string_i_need_to_match.
String example_string_i_need_to_match = "data 10/12/2010, data, data";
String string_i_currently_match = "10/12/2010,";
Pattern pattern = Pattern.compile(
"^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\\d\\d(?:,)$"
);
Matcher matcher = pattern.matcher(example_string_i_need_to_match);
boolean found = false;
while (matcher.find()) {
System.out.printf("I found the text \"%s\" starting at " +
"index %d and ending at index %d.\n",
matcher.group(), matcher.start(), matcher.end());
found = true;
}
if(!found){
System.out.println("No match found.");
}
Perhaps it's because I'm exhausted, but I can't get it to match. Any help, even pointers, would be greatly appreciated.
Edit: To clarify, I do not want to match "data, data"; I just want to get the index of the date itself.

The ^ sign matches the start of the string and $ matches the end, so your pattern only matches when the whole string is a date followed by a comma. Removing them allows the pattern to match dates anywhere within the string.
Like this:
"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\\d\\d(?:,)"

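Since the edit asks for the index of the date itself, here is a minimal sketch of the unanchored pattern used with find(), start() and end(). One assumption of mine: the trailing (?:,) is turned into a lookahead (?=,), so a comma must still follow the date but is not part of the reported span. DateIndexSketch and findDate are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateIndexSketch {
    // The answer's pattern without ^ and $. Assumption: the trailing (?:,) is
    // replaced by the lookahead (?=,), so a comma must follow the date but is
    // not included in the reported span.
    static final Pattern DATE = Pattern.compile(
            "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\\d\\d(?=,)");

    /** Returns {start, end} of the first date found, or null if there is none. */
    static int[] findDate(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        String s = "data 10/12/2010, data, data";
        int[] span = findDate(s);
        if (span != null) {
            // prints: 5 15 10/12/2010
            System.out.println(span[0] + " " + span[1] + " " + s.substring(span[0], span[1]));
        }
    }
}
```

If the comma is not actually required after every date, dropping the (?=,) part entirely is the simpler choice.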
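This will match your date:
[\d]{2}/[\d]{2}/[\d]{4}
(in a Java string literal, each \d must be written as \\d). In what you posted, you made at least one error: because of the ^ and $ anchors, your pattern only matches when the date (plus its comma) is the entire string.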

String subjectString = "data 10/12/2010, data, data";
String resultString = null;
try {
    Pattern regex = Pattern.compile("\\b[0-9]{2}/[0-9]{2}/[0-9]{4}\\b");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        resultString = regexMatcher.group(); // "10/12/2010"
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
Unless I am overlooking something, this should match your date.
See it working here: http://ideone.com/HETGU
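A small sketch of why the \b word boundaries in that pattern matter: without them, the same digits-and-slashes pattern would also find 10/12/2010 inside a longer run such as 110/12/20105. The class name, method name and sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundarySketch {
    // The answer's pattern: 2 digits / 2 digits / 4 digits, with a \b word
    // boundary on both ends so it cannot start or end inside a longer number.
    static final Pattern DATE = Pattern.compile("\\b[0-9]{2}/[0-9]{2}/[0-9]{4}\\b");

    static String firstDate(String subject) {
        Matcher m = DATE.matcher(subject);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDate("data 10/12/2010, data, data")); // 10/12/2010
        System.out.println(firstDate("id 110/12/20105"));             // null
    }
}
```

Note that this pattern only checks the shape of the date, so something like 99/99/9999 is accepted as well.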

Related

How to create a regex that accepts specific characters?

I have this regex:
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
I need a regex that accepts a minimum length of 8 and only letters (uppercase and lowercase), digits, and these characters:
!#$%&'*+-/=?^_`{|}~"(),:;<>#[]
It works when I tested it here.
This is how I used it in Java on Android:
public static final String regex = "^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";
This is the error that I received:
java.util.regex.PatternSyntaxException: Missing closing bracket in character class near index 49
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$

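In Java, an unescaped [ inside a character class starts a nested class, so in your pattern the outer class is never closed. Escape both brackets (and move - to the end of the class). If you just want to test whether a given input string matches your pattern, you can use String#matches directly, e.g.: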
String regex = "[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>\\[\\]-]{8,}";
String input = "Jon#Skeet#123";
if (input.matches(regex)) {
System.out.println("Found a match");
}
else {
System.out.println("No match");
}
If you wanted to parse a larger input text and identify such matching words, then you would want to use a formal Pattern and Matcher. But based on your question, I don't see the need for this.
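A minimal sketch comparing the two patterns from this thread: the question's version fails to compile because of the unescaped [, while the escaped version compiles and works with String#matches. Because matches() must consume the whole input, the ^ and $ anchors are not needed. CharClassSketch and compiles are illustrative names only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassSketch {
    // The question's pattern: the unescaped '[' near the end opens a nested
    // class, so the outer class is never closed.
    static final String BROKEN = "^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";
    // The answer's pattern: both brackets escaped and '-' moved to the end.
    static final String FIXED = "[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>\\[\\]-]{8,}";

    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN));             // false
        System.out.println(compiles(FIXED));              // true
        // matches() must consume the whole input, so no ^...$ anchors are needed.
        System.out.println("[Jon]-Skeet".matches(FIXED)); // true
        System.out.println("short".matches(FIXED));       // false: fewer than 8 characters
    }
}
```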

You can use the Pattern and Matcher classes; this may help you.
Follow this tutorial: https://www.mkyong.com/regular-expressions/how-to-validate-password-with-regular-expression/
Here is one example:
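// Assumes an Android class (Log is android.util.Log) that imports
// java.util.regex.Pattern and java.util.regex.Matcher.
// isValidPassword is an illustrative method name.
public static boolean isValidPassword(String password_string) {
    try {
        final String PASSWORD_PATTERN = "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})";
        Pattern pattern = Pattern.compile(PASSWORD_PATTERN);
        Matcher matcher = pattern.matcher(password_string);
        if (matcher.matches()) {
            Log.e("TAG", "TRUE");
            return true;
        } else {
            Log.e("TAG", "FALSE");
            return false;
        }
    } catch (RuntimeException e) {
        return false;
    }
}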
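For a plain-Java (non-Android) version that is easy to try out, here is a sketch of the same password check. I am assuming the special-character class is [@#$%], as in the linked mkyong tutorial. Each (?=...) lookahead only checks that one kind of character occurs somewhere in the input, and .{6,20} then limits the overall length. PasswordSketch and isValid are hypothetical names.

```java
import java.util.regex.Pattern;

public class PasswordSketch {
    // Each lookahead asserts one requirement anywhere in the string:
    // a digit, a lowercase letter, an uppercase letter, one of @ # $ %.
    static final Pattern PASSWORD = Pattern.compile(
            "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})");

    static boolean isValid(String password) {
        // matches() requires the whole input to fit .{6,20}
        return PASSWORD.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abC1#x")); // true
        System.out.println(isValid("abc1#x")); // false: no uppercase letter
        System.out.println(isValid("aB1#"));   // false: shorter than 6 characters
    }
}
```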

Can't split a line in Java

I am facing a problem: I don't know how to split this line correctly. I only need RandomAdresas0, 100, 2018 and 1.
String line = Files.readAllLines(Paths.get(failas2)).get(userInp);
System.out.println(line);
arr = line.split("[\\s\\-\\.\\'\\?\\,\\_\\#]+");
Content in line:
[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1}]

You can try this code (basically extracting a string between two delimiters):
String ss = "[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1}]";
Pattern pattern = Pattern.compile("=(.*?)[,}]");
Matcher matcher = pattern.matcher(ss);
while (matcher.find()) {
System.out.println(matcher.group(1).replace("'", ""));
}
This outputs:
RandomAdresas0
100
2018
1

Remove all the characters before '{', including '{'.
Remove all the characters after '}', including '}'.
You can do both by using the indexOf and substring methods.
You will then be left with only the following:
pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1
After that, see this thread: Parse a string with key=value pair in a map?
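The indexOf/substring steps above, followed by splitting into key=value pairs, could look roughly like this. It assumes values never contain a comma or an equals sign, which holds for the sample line but may not for real data. KeyValueSketch and parse are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueSketch {
    // Assumes values never contain ',' or '=', which holds for the sample line.
    static Map<String, String> parse(String line) {
        // Keep only the text between '{' and '}', both excluded.
        String inner = line.substring(line.indexOf('{') + 1, line.indexOf('}'));
        Map<String, String> values = new LinkedHashMap<>(); // keeps field order
        for (String pair : inner.split(",")) {
            // Split each "key=value" on the first '=' and drop the single quotes.
            String[] kv = pair.trim().split("=", 2);
            values.put(kv[0], kv[1].replace("'", ""));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,"
                + "pastatoPastatymoData=2018, pastatoButuKiekis=1}]";
        // prints: RandomAdresas0 100 2018 1
        System.out.println(String.join(" ", parse(line).values()));
    }
}
```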

Here is a solution using a regular expression and the Pattern & Matcher classes. The values you are after can be retrieved using the group() method and you get all values by looping as long as find() returns true.
String data = "[Pastatas{pastatoAdresas='RandomAdresas0',pastatoAukstuSkaicius=100,pastatoPastatymoData=2018, pastatoButuKiekis=1}]";
Pattern pattern = Pattern.compile("=([^, }]*)");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
System.out.printf("[%d:%d] %s%n", matcher.start(), matcher.end(), matcher.group(1));
}
The matched value is in group 1; group 0 is the whole match of the regex. Note that the first value keeps its single quotes ('RandomAdresas0').

Java regex does not match as expected

I started with regex in Java recently, and I can't wrap my head around this problem.
Pattern p = Pattern.compile("[^A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
System.out.println("Matched.");
} else {
System.out.println("Did not match.");
}
Result: Did not match. (Unexpected result)
I get the output "Did not match." This is strange to me because, according to https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html,
X+ matches X "one or more times", and that is what I'm using.
I thought my code in words would go something like this:
"Check if there is one or more characters in the string "GETs" which does not belong in A to Z."
So I'm expecting the following result:
"Yes, there is one character that does not belong to A-Z in "GETs", the regex was a match."
However, this is not the case, and I'm confused as to why.
I tried the following:
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
System.out.println("Matched.");
} else {
System.out.println("Did not match.");
}
Result: Did not match. (Expected result)
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GET");
if (matcher.matches()) {
System.out.println("Matched.");
} else {
System.out.println("Did not match.");
}
Result: Matched. (Expected result)
Please, explain why my first example did not work.

Matcher.matches returns true only if the ENTIRE region matches the pattern.
For the output you are looking for, use Matcher.find instead.
Explanation of each case:
Pattern p = Pattern.compile("[^A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
This fails because the ENTIRE region 'GETs' isn't made up of non-uppercase characters (only the 's' is).
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
This fails because the ENTIRE region 'GETs' isn't uppercase
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GET");
if (matcher.matches()) {
The ENTIRE region 'GET' is uppercase, the pattern matches.
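To see the difference between the Matcher methods on the question's input, here is a short sketch. lookingAt() is included for comparison because it sits between the two: the match must start at index 0 but need not reach the end. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFindSketch {
    static final Pattern NON_UPPER = Pattern.compile("[^A-Z]+");

    public static void main(String[] args) {
        // matches(): the whole of "GETs" would have to be non-uppercase.
        System.out.println(NON_UPPER.matcher("GETs").matches());   // false

        // lookingAt(): the match must start at index 0, but 'G' is uppercase.
        System.out.println(NON_UPPER.matcher("GETs").lookingAt()); // false

        // find(): any substring may match; here that is the lowercase "s".
        Matcher m = NON_UPPER.matcher("GETs");
        if (m.find()) {
            System.out.println(m.group() + " at index " + m.start()); // s at index 3
        }
    }
}
```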

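Your very first regex matches one or more characters that are not in the uppercase range A-Z. Such a substring exists (the lowercase "s" in GETs), but matches() also requires the uppercase "GET" to be part of the match, so it fails; find() would report the "s".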

If you want a regex to match regardless of case (UPPERCASE or lowercase), you can use the inline (?i) flag:
String test = "yes";
String test2= "YEs";
test.matches("(?i).*\\byes\\b.*");
test2.matches("(?i).*\\byes\\b.*");
Both calls return true.
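The same case-insensitive check can also be written with the Pattern.CASE_INSENSITIVE flag and find(), which avoids the .* on both sides of the pattern. This is a sketch with made-up names, not the only way to do it.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveSketch {
    // Equivalent to the inline (?i) flag; find() searches anywhere in the
    // input, so no leading or trailing .* is needed.
    static final Pattern YES = Pattern.compile("\\byes\\b", Pattern.CASE_INSENSITIVE);

    static boolean containsYes(String s) {
        return YES.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsYes("yes"));  // true
        System.out.println(containsYes("YEs"));  // true
        System.out.println(containsYes("eyes")); // false: no word boundary before "yes"
    }
}
```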

Regex for finding mp4 in string

I want to get all .mp4 URLs from this string using a regex.
I also want to know how to get only the last .mp4 URL using a regex.
Thanks
contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8},
Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd},
Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4},
Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4},
Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]";

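Regex:
https?://\S+?\.mp4
(written as "https?://\\S+?\\.mp4" in a Java string literal)
Literal http
Followed by an optional 's': s?
Remove the question mark if they will all use HTTPS.
Followed by a literal ://
Followed by as few non-whitespace characters as possible: \S+?
Followed by an mp4 extension (literal dot): \.mp4
Using \S+? rather than .*? keeps a match from starting at an earlier .m3u8 or .mpd URL and running on to the next .mp4 when all the data is on one line.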
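For both parts of the question (all .mp4 URLs, and only the last one), a simple approach is to collect every find() result and take the last element, rather than encoding "last" in the regex itself. This sketch uses https?://\S+?\.mp4 as its pattern and a shortened, made-up sample string in the same shape as the question's data; Mp4UrlSketch and allMp4 are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Mp4UrlSketch {
    // \S+? cannot cross whitespace, so a match cannot begin in an earlier
    // .m3u8/.mpd URL and run on to a later ".mp4".
    static final Pattern MP4 = Pattern.compile("https?://\\S+?\\.mp4");

    static List<String> allMp4(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = MP4.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample in the same shape as the question's data.
        String text = "url=https://example.com/a.m3u8}, "
                + "Variant{contentType=video/mp4, url=https://example.com/320.mp4}, "
                + "Variant{contentType=video/mp4, url=https://example.com/1280.mp4}]";
        List<String> urls = allMp4(text);
        System.out.println(urls); // all .mp4 URLs, in order of appearance
        // The last .mp4 URL is simply the last element found.
        if (!urls.isEmpty()) {
            System.out.println(urls.get(urls.size() - 1)); // https://example.com/1280.mp4
        }
    }
}
```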

Two approaches:
If you're sure the URLs will always begin with https:// and will not contain mp4 after the complete URL is finished, then you can use
pattern = "https://.*mp4";
String[] arr = {
"contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}",
"Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}",
"Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}",
"Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}",
"Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]"
};
String pattern = "https://.*mp4";
Pattern r = Pattern.compile(pattern);
for (String line : arr) {
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println(m.group(0));
} else {
System.out.println("NO MATCH");
}
}
If not, to support all types of URLs, change your pattern to a general URL regex with a small modification (mp4 appended at the end):
String pattern =
"(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www.)" +
"(\\w+:\\w+#)?(([-\\w]+\\.)+(com|org|net|gov" +
"|mil|biz|info|mobi|name|aero|jobs|museum" +
"|travel|[a-z]{2}))(:[\\d]{1,5})?" +
"(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
"((\\?([-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
"(&(?:[-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
"(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b"+"mp4";
Output:
NO MATCH
NO MATCH
https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4

IndexOutOfBoundsException when using Matcher.find()

This Java program throws an IndexOutOfBoundsException when it calls group(1). If I replace 1 with 0, the whole line is printed. What do I have to do?
Pattern pattern = Pattern.compile("<abhi> abhinesh </abhi>");
Matcher matcher = pattern.matcher("<abhi> abhinesh </abhi>");
if (matcher.find())
System.out.println(matcher.group(1));
else
System.out.println("Not found");

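Group indices start at 0, and group 0 is always the whole match. Your pattern has no capturing groups, so group(1) does not exist; use matcher.group(0) instead.
Edit: To match the text between the tags, use this regex: <abhi>(.*)<\\/abhi>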

This post may shed more light on your question.
Confused about Matcher Group.
In short, your pattern defines no capturing groups, so the only group you can reference is group 0, the full matching string.
If you add capturing groups to parse the XML, as below, you'll notice that group 0 holds the full string, group 1 the opening tag name, group 2 the value, and group 3 the closing tag name.
Pattern pattern = Pattern.compile("<([a-z]+)>([a-z ]+)</([a-z]+)>");
Matcher matcher = pattern.matcher("<abhi> abhinesh </abhi>");
if (matcher.find()){
System.out.println(matcher.group(0));//<abhi> abhinesh </abhi>
System.out.println(matcher.group(1));//abhi
System.out.println(matcher.group(2));// abhinesh
System.out.println(matcher.group(3));//abhi
}else{
System.out.println("Not found");
}

Try this regex:
<abhi>(.*)<\\/abhi>
The text you're after will be stored in the first capture group.
Example:
String regex = "<abhi>(.*)<\\/abhi>";
String input = "<abhi>foo</abhi>";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
if (m.find()) {
System.out.println(m.group(1));
}
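A sketch tying these answers together: groupCount() reports how many capture groups a pattern defines (not counting group 0), so checking it avoids the IndexOutOfBoundsException from the question. It also shows that the reluctant (.*?) stops at the first closing tag when several tag pairs are on one line, whereas the greedy (.*) runs to the last one. TagGroupSketch and firstGroup are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroupSketch {
    // Returns capture group 1 of the first match, or null if the pattern
    // has no group 1 or nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // groupCount() excludes group 0; calling group(1) when it is 0 throws
        // IndexOutOfBoundsException, which is the error in the question.
        if (m.groupCount() < 1 || !m.find()) {
            return null;
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String two = "<abhi>foo</abhi><abhi>bar</abhi>";
        System.out.println(firstGroup("<abhi> abhinesh </abhi>", "<abhi> abhinesh </abhi>")); // null
        System.out.println(firstGroup("<abhi>(.*)</abhi>", two));  // foo</abhi><abhi>bar
        System.out.println(firstGroup("<abhi>(.*?)</abhi>", two)); // foo
    }
}
```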
