How to get regex matched group values

I have the following lines of code:
String time = "14:35:59.99";
String timeRegex = "(([01][0-9])|(2[0-3])):([0-5][0-9]):([0-5][0-9])(.([0-9]{1,3}))?";
String hours, minutes, seconds, milliSeconds;
Pattern pattern = Pattern.compile(timeRegex);
Matcher matcher = pattern.matcher(time);
if (matcher.matches()) {
hours = matcher.replaceAll("$1");
minutes = matcher.replaceAll("$4");
seconds = matcher.replaceAll("$5");
milliSeconds = matcher.replaceAll("$7");
}
I am getting hours, minutes, seconds, and milliSeconds using the matcher.replaceAll method and back references to the regex groups. Is there a better way to get the values of the regex groups? I tried
hours = matcher.group(1);
but it throws the following exception:
java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:477)
at com.abnamro.cil.test.TimeRegex.main(TimeRegex.java:70)
Am I missing something here?

It works fine if you avoid calling matcher.replaceAll. Calling replaceAll resets the matcher, so it forgets any previous match.
String time = "14:35:59.99";
String timeRegex = "([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(?:\\.([0-9]{1,3}))?";
Pattern pattern = Pattern.compile(timeRegex);
Matcher matcher = pattern.matcher(time);
if (matcher.matches()) {
String hours = matcher.group(1);
String minutes = matcher.group(2);
String seconds = matcher.group(3);
String milliSeconds = matcher.group(4); // null if the optional fraction is absent
System.out.println(hours + ", " + minutes + ", " + seconds + ", " + milliSeconds);
}
Notice that I've also made a couple of improvements to your regular expression:
I've used non-capturing groups (?: ... ) for the groups that you aren't interested in capturing.
I've changed . which matches any character to \\. which matches only a dot.
See it working online: ideone
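To make the reset behaviour concrete, here is a minimal sketch. The class and method names are made up for illustration; only Pattern, Matcher, and their documented methods come from the JDK. It shows group(1) working right after matches() and failing once replaceAll has been called on the same matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllResetsMatcher {
    // Same time pattern as in the answer above.
    private static final Pattern TIME = Pattern.compile(
            "([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(?:\\.([0-9]{1,3}))?");

    // Returns true if group(1) throws IllegalStateException after replaceAll.
    static boolean groupFailsAfterReplaceAll(String time) {
        Matcher matcher = TIME.matcher(time);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a valid time: " + time);
        }
        matcher.group(1);         // fine: the matcher still holds the match
        matcher.replaceAll("$1"); // resets the matcher and scans the input again
        try {
            matcher.group(1);     // the match state from matches() is gone
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(groupFailsAfterReplaceAll("14:35:59.99"));
    }
}
```

If both the replacement and the groups are needed, one option is to read the groups first, or to create a second matcher for the replacement.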

It also works if you call matcher.find() successfully right before calling the group function, with no replaceAll call in between.

Related

Extract multiple dates (dd-MMM-yyyy format) from a string in java

I have searched everywhere for this but couldn't get a specific solution, and the documentation also didn't cover this. So I want to extract the start date and end date from this string "1-Mar-2019 to 31-Mar-2019". The problem is I'm not able to extract both the date strings.
I found the closest solution here but couldn't post a comment asking how to extract values individually due to low reputation: https://stackoverflow.com/a/8116229/10735227
I'm using a regex pattern to look for the occurrences and to extract both occurrences to 2 strings first.
Here's what I tried:
Pattern p = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");
Matcher m = p.matcher(str);
while(m.find())
{
startdt = m.group(1);
enddt = m.group(1); //I think this is wrong, don't know how to fix it
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
Output is:
startdt: 31-Mar-2019 enddt: 31-Mar-2019
Additionally, I need to use a date formatter to convert the string to a date (adding a leading 0 before a single-digit day if required).

You can capture both dates simply by calling the find method twice. If the string contains only one date, only the first call succeeds and enddt stays null:
String str = "1-Mar-2019 to 31-Mar-2019";
String startdt = null, enddt = null;
Pattern p = Pattern.compile("(\\d{1,2}-[a-zA-Z]{3}-\\d{4})");
Matcher m = p.matcher(str);
if(m.find()) {
startdt = m.group(1);
if(m.find()) {
enddt = m.group(1);
}
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
Note that this could also be written with while(m.find()) and a List<String> to extract every date you can find.
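A rough sketch of the list-based variant follows. The class and method names (DateExtractor, extractDates, padDay) are invented for this example. The padDay helper is one possible way to handle the zero-padding asked about in the question; it assumes English month abbreviations written exactly like "Mar", since parsing is case-sensitive by default.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    private static final Pattern DATE = Pattern.compile("\\d{1,2}-[a-zA-Z]{3}-\\d{4}");
    // "d" accepts one or two day digits when parsing; "dd" always prints two.
    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("d-MMM-yyyy", Locale.ENGLISH);
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH);

    // Collects every date-like substring, in order of appearance.
    static List<String> extractDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    // Parses a date such as 1-Mar-2019 and prints it with a two-digit day.
    static String padDay(String date) {
        return LocalDate.parse(date, IN).format(OUT);
    }

    public static void main(String[] args) {
        List<String> dates = extractDates("1-Mar-2019 to 31-Mar-2019");
        System.out.println(dates);
        System.out.println(padDay(dates.get(0)));
    }
}
```

The first list element is then the start date and the second the end date, provided the input really contains two dates.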

If your text may be messy, and you really need to use a regex to extract the date range, you may use
String str = "Text here 1-Mar-2019 to 31-Mar-2019 and tex there";
String startdt = "";
String enddt = "";
String date_rx = "\\d{1,2}-[a-zA-Z]{3}-\\d{4}";
Pattern p = Pattern.compile("(" + date_rx + ")\\s*to\\s*(" + date_rx + ")");
Matcher m = p.matcher(str);
if(m.find())
{
startdt = m.group(1);
enddt = m.group(2);
}
System.out.println("startdt: "+startdt+" enddt: "+enddt);
// => startdt: 1-Mar-2019 enddt: 31-Mar-2019
Also, consider this enhancement: match the date as whole word to avoid partial matches in longer strings:
Pattern.compile("\\b(" + date_rx + ")\\s*to\\s*(" + date_rx + ")\\b")
If the range can be expressed with - or to you may replace to with (?:to|-), or even (?:to|\\p{Pd}) where \p{Pd} matches any hyphen/dash.
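Putting the word boundaries and the (?:to|\\p{Pd}) separator together might look like the sketch below. DateRange and extractRange are hypothetical names chosen for this example, and returning null on no match is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRange {
    private static final String DATE_RX = "\\d{1,2}-[a-zA-Z]{3}-\\d{4}";
    // Two dates separated by "to" or by any dash punctuation (\p{Pd}).
    private static final Pattern RANGE = Pattern.compile(
            "\\b(" + DATE_RX + ")\\s*(?:to|\\p{Pd})\\s*(" + DATE_RX + ")\\b");

    // Returns {start, end}, or null if no range is found.
    static String[] extractRange(String text) {
        Matcher m = RANGE.matcher(text);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        String[] range = extractRange("Text here 1-Mar-2019 - 31-Mar-2019 and text there");
        if (range != null) {
            System.out.println("startdt: " + range[0] + " enddt: " + range[1]);
        }
    }
}
```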

You can simply use String::split
String range = "1-Mar-2019 to 31-Mar-2019";
String dts [] = range.split(" ");
System.out.println(dts[0]);
System.out.println(dts[2]);

Replace characters in a String, in a specific location

I have the following string;
String s = "Hellow world,how are you?\"The other day, where where you?\"";
And I want to replace the , but only the one that is inside the quotation mark \"The other day, where where you?\".
Is it possible with regex?

String s = "Hellow world,how are you?\"The other day, where where you?\"";
Pattern pattern = Pattern.compile("\"(.*?)\"");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
s = s.substring(0, matcher.start()) + matcher.group().replace(',','X') +
s.substring(matcher.end(), s.length());
}
If there are more than two quotes, this splits the text into quoted and unquoted parts and only processes the quoted parts. However, if there is an odd number of quotes (an unmatched quote), the last quote is ignored.
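The same idea can also be sketched with Matcher.appendReplacement and appendTail, which avoids rebuilding the string with substring while the matcher is running. This is an alternative formulation, not the code from the answer above; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedCommaReplacer {
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Replaces every comma inside a double-quoted section with the given character.
    static String replaceCommasInQuotes(String s, char replacement) {
        Matcher matcher = QUOTED.matcher(s);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String fixed = matcher.group().replace(',', replacement);
            // quoteReplacement keeps '$' and '\' in the text from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        matcher.appendTail(out); // copy whatever follows the last quoted section
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Hellow world,how are you?\"The other day, where where you?\"";
        System.out.println(replaceCommasInQuotes(s, 'X'));
    }
}
```

Unlike the substring version, this variant also works when the replacement has a different length than the text it replaces.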

If you are sure the comma to replace is always the last "," in the string, you can do this:
String s = "Hellow world,how are you?\"The other day, where where you?\"";
int index = s.lastIndexOf(",");
if( index >= 0 )
s = new StringBuilder(s).replace(index , index + 1,"X").toString();
System.out.println(s);
Hope it helps.

How to split a long string in Java?

How to edit this string and split it into two?
String asd = {RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef};
I want to make two strings.
String reponame;
String RepoID;
reponame should be CodeCommitTest
repoID should be 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Can someone help me get it? Thanks

Here is Java code using a regular expression in case you can't use a JSON parsing library (which is what you probably should be using):
String pattern = "^\\{RepositoryName:\\s(.*?),RepositoryId:\\s(.*?)\\}$";
String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
String reponame = "";
String repoID = "";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(asd);
if (m.find()) {
reponame = m.group(1);
repoID = m.group(2);
System.out.println("Found reponame: " + reponame + " with repoID: " + repoID);
} else {
System.out.println("NO MATCH");
}
This code has been tested in IntelliJ and runs without error.
Output:
Found reponame: CodeCommitTest with repoID: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef

Assuming there aren't quote marks in the input, and that the repository name and ID consist of letters, numbers, and dashes, then this should work to get the repository name:
Pattern repoNamePattern = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
Matcher matcher = repoNamePattern.matcher(asd);
if (matcher.find()) {
reponame = matcher.group(1);
}
and you can do something similar to get the ID. The above code just looks for RepositoryName:, possibly followed by spaces, followed by one or more letters, digits, or hyphen characters; then the group(1) method extracts the name, since it's the first (and only) group enclosed in () in the pattern.
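For completeness, a sketch of the "something similar" for the ID, under the same assumption that both values consist only of letters, digits, and dashes. The RepoFields class and its helper methods are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepoFields {
    private static final Pattern NAME =
            Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
    private static final Pattern ID =
            Pattern.compile("RepositoryId: *([A-Za-z0-9\\-]+)");

    // Returns the first capturing group of the first match, or null if there is none.
    private static String firstGroup(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    static String repoName(String input) {
        return firstGroup(NAME, input);
    }

    static String repoId(String input) {
        return firstGroup(ID, input);
    }

    public static void main(String[] args) {
        String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
        System.out.println(repoName(asd));
        System.out.println(repoId(asd));
    }
}
```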

Certain strings that should be found by a working Regex are missed, and I need help identifying why

I have a set of strings, which I cycle through, checking those against the following set of regex, to try and separate the first small section from the rest of the string. The regex works in almost all cases, but unfortunately I have no idea why it fails occasionally. I’ve been using Pattern Matcher to print out the string, if the pattern is found.
Two example working strings:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials; inflorescence …
Two example failed strings:
100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …
26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials with or without stolons or rhizomes; sheaths overlapping or some …
Regexes used so far:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusTwo = Pattern.compile("(?<=(^\\d+" + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusThree = Pattern.compile("(?<=(\\d+\\. " + genusNames[l] + "))");
Pattern endOfGenusFour = Pattern.compile("(?<=(\\d+" + genusNames[l] + "))");
Pattern endOfGenusFive = Pattern.compile("(?<=(\\. " + genusNames[l] + "))");
The first of these is the one that's producing the most reliable results so far.
Example Code
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Matcher endOfGenusFinder = endOfGenus.matcher(descriptionPartBits[b]);
if (endOfGenusFinder.find()) {
System.out.print(descriptionPartBits[b] + ":- ");
System.out.print(genusNames[l] + "\n");
String[] genusNameBits = descriptionPartBits[b].split("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
}
Desired Output. This is what is produced by strings that work. Strings that don't work simply don't appear in the output:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials:- Sorghum
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials:- Miscanthus

From regex tutorial:
Lookahead and lookbehind, collectively called "lookaround", are
zero-length assertions just like the start and end of line, and start
and end of word anchors explained earlier in this tutorial.
Lookahead and lookbehind only return true or false.
So I changed your code example:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. ZEA L))(.+)$");
// Matcher matcher = endOfGenus.matcher("98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …");
Matcher matcher = endOfGenus.matcher("100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …");
while (matcher.find()) {
String group1 = matcher.group(1);
String group2 = matcher.group(2);
System.out.println("group1=" + group1);
System.out.println("group2=" + group2);
}
Group 1 is matched by (^\\d+\\. ZEA L). Group 2 is matched by (.+).
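Since the lookbehind itself consumes nothing, one possible alternative is to drop it and capture both halves with ordinary groups. The sketch below is not taken from the answer and does not diagnose why particular input strings failed; GenusSplitter and splitAfterGenus are hypothetical names, and Pattern.quote is used on the assumption that genus names should be matched literally.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GenusSplitter {
    // Splits "<number>. <GENUS><rest>" into {"<number>. <GENUS>", "<rest>"},
    // or returns null if the description does not start that way.
    static String[] splitAfterGenus(String description, String genusName) {
        String genus = Pattern.quote(genusName.toUpperCase(Locale.ROOT));
        Pattern p = Pattern.compile("^(\\d+\\. " + genus + ")(.*)$");
        Matcher m = p.matcher(description);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        String[] parts = splitAfterGenus("100. ZEA L. - Maize Annuals", "Zea");
        if (parts != null) {
            System.out.println("group1=" + parts[0]);
            System.out.println("group2=" + parts[1]);
        }
    }
}
```

If a string that looks correct still returns null here, printing its individual characters may reveal non-standard whitespace or other invisible differences from the expected prefix.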

Regex for floor in address

I have this regex:
String regexPattern = "[0-9A-Za-z]+(st|nd|rd|th)" + " " + "floor";
I want to test it against:
String lineString = "8th floor, Prince's Building, 12 Chater Road";
so I do:
boolean isMatching = lineString.matches(regexPattern);
and it returns false. Why?
I thought it had something to do with whitespaces in Java, so I removed the whitespace in the regexPattern variable so it reads
regexPattern = "[0-9A-Za-z]+(st|nd|rd|th)floor";
and matched it with a string without white space:
String lineString = "8thfloor,Prince'sBuilding,12ChaterRoad";
it still returns false. Why? Any help very much appreciated.

String.matches() only returns true if the entire string matches the pattern.
Try adding .* to the beginning and end of your regex.
Example:
String regex = ".*[0-9A-Za-z]+(st|nd|rd|th)" + " " + "floor.*";
This is not the best approach, however...
Here's a better alternative:
String input = "8th floor, Prince's Building, 12 Chater Road";
String regex = "[0-9A-Za-z]+(st|nd|rd|th)" + " " + "floor";
Pattern p = Pattern.compile(regex);
boolean isMatch = p.matcher(input).find();
If you want to extract the floor number, do this:
String input = "8th floor, Prince's Building, 12 Chater Road";
String regex = "([0-9A-Za-z]+)(st|nd|rd|th)" + " " + "floor"; // quantifier inside the group so group(1) holds the whole number
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
if (m.find()) {
String num = m.group(1);
String suffix = m.group(2);
System.out.println("Welcome to the " + num + suffix + " floor!");
// prints 'Welcome to the 8th floor!'
}
Check out the Pattern API for a boatload of info about Java regular expressions.

Edited, per comments ...
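The [0-9A-Za-z]+ part also matches letters, so it first greedily consumes the th suffix and only gives it back through backtracking.
If the floor number is always numeric, try [0-9]+ instead.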
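A small sketch combining the points from both answers, assuming the floor number is purely numeric: String.matches() requires the whole string to match, while Matcher.find() searches for the pattern anywhere in the input. FloorMatcher and floorOrdinal are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FloorMatcher {
    private static final String FLOOR_RX = "([0-9]+)(st|nd|rd|th) floor";
    private static final Pattern FLOOR = Pattern.compile(FLOOR_RX);

    // Returns the ordinal (for example "8th") of the first floor mention, or null.
    static String floorOrdinal(String address) {
        Matcher m = FLOOR.matcher(address);
        return m.find() ? m.group(1) + m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "8th floor, Prince's Building, 12 Chater Road";
        // false: matches() only succeeds if the pattern covers the entire string
        System.out.println(line.matches(FLOOR_RX));
        // find() looks for the pattern anywhere in the string
        System.out.println(floorOrdinal(line));
    }
}
```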
