IllegalStateException with Pattern/Matcher - java

I'm using Matcher to capture groups using a regular expression in Java and it keeps throwing an IllegalStateException even though I know that the expression matches.
This is my code:
String safeName = Pattern.compile("(\\.\\w+)$").matcher("google.ca").group();
I'm expecting safeName to be .ca as captured with the capturing group in the regular expression but instead I get:
IllegalStateException: No match found
I also tried with .group(0) and .group(1) but the same error occurs.
According to the documentation for group() and group(int group):
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
What am I doing wrong?

Matcher is a helper class that iterates over the input and searches for substrings matching the regex. The input may contain many matching substrings, so calling group() on its own cannot tell the Matcher which match you mean. To solve this, Matcher lets you iterate over all matching substrings and use the ones you are interested in.
So before you can call group(), you need to let the Matcher search your string by calling find(). To check whether the regex matches the entire String, use matches() instead of find().
Generally, to find all matching substrings we use
Pattern p = Pattern.compile("yourPattern");
Matcher m = p.matcher("yourData");
while(m.find()){
String match = m.group();
//here we can do something with match...
}
Since you are assuming that the text you want to find occurs only once in your string (at its end), you don't need a loop; a simple if (or the conditional operator) should solve your problem.
Matcher m = Pattern.compile("(\\.\\w+)$").matcher("google.ca");
String safeName = m.find() ? m.group() : null;
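As a minimal sketch of the point above, the following wraps the same pattern in a small helper and also shows that calling group() on a fresh Matcher throws. The class name SuffixDemo and its method names are made up for this illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixDemo {
    // Same pattern as in the question: a dot plus word characters at the end of the input.
    private static final Pattern SUFFIX = Pattern.compile("(\\.\\w+)$");

    // Returns the trailing ".xyz" part of the input, or null if there is none.
    static String suffixOf(String input) {
        Matcher m = SUFFIX.matcher(input);
        // find() has to succeed before group() may be called.
        return m.find() ? m.group(1) : null;
    }

    // Returns true if group() on a Matcher that has not searched yet throws IllegalStateException.
    static boolean groupBeforeFindThrows(String input) {
        try {
            SUFFIX.matcher(input).group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(suffixOf("google.ca"));             // .ca
        System.out.println(suffixOf("localhost"));             // null
        System.out.println(groupBeforeFindThrows("google.ca")); // true
    }
}
```

Returning null for "no match" mirrors the one-liner above; an Optional<String> would work just as well if you prefer to avoid null.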

Related

Java: Regular Expression not matching?

I am trying to extract a special sequence out of a String using the following Regular Expression:
[(].*[)]
My Pattern should only match if the String contains () with text between them.
Somehow, if I create a new Pattern using Pattern#compile(myString) and then match the String using Matcher matcher = myPattern.matcher(); it doesn't find anything, even though I tried it on regexr.com and it worked there.
My Pattern is a static final Pattern object in another class (I directly used Pattern#compile(myString)).
Example String to match:
save (xxx,yyy)
The likely problem here is your quantifier.
Since you're combining the greedy * with ., which matches any character, your match will not be delimited correctly, because . also matches the closing ).
Try using reluctant [(].*?[)].
See quantifiers in docs.
You can also escape the parentheses instead of using custom character classes, like so: \\( and \\), but that has nothing to do with your issue.
Also note (thanks esprittn)
The * quantifier will match 0+ characters, so if you want to restrict your matches to non-empty parentheses, use .+? instead - that'll guarantee at least one character inside your parentheses.
Hope the code below helps: it extracts the data between '(' and ')', including the parentheses themselves.
String pattern = "\\(.*\\)";
String line = "save(xx,yy)";
Pattern TokenPattern = Pattern.compile(pattern);
Matcher m = TokenPattern.matcher(line);
while (m.find()) {
int start = m.start(0);
int end = m.end(0);
System.out.println(line.substring(start, end));
}
To remove the brackets, change 'start' to 'start+1' and 'end' to 'end-1', which shifts the bounds of the substring being taken.
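To make the greedy versus reluctant difference concrete, here is a small sketch that runs both variants over an input with two parenthesized parts. The class ParenDemo, the helper findAll, and the sample input are assumptions made for this illustration; they do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenDemo {
    // Collects the requested group (0 = whole match) of every match found in the input.
    static List<String> findAll(String regex, String input, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "save (a,b) load (c,d)";
        // Greedy: .* runs to the last ')' in the input, so there is one long match.
        System.out.println(findAll("\\(.*\\)", line, 0));    // [(a,b) load (c,d)]
        // Reluctant: .+? stops at the first ')' it can, so each pair is matched separately.
        System.out.println(findAll("\\(.+?\\)", line, 0));   // [(a,b), (c,d)]
        // A capturing group returns the content without the parentheses.
        System.out.println(findAll("\\((.+?)\\)", line, 1)); // [a,b, c,d]
    }
}
```

Using a capturing group is an alternative to adjusting start and end by hand when only the inner text is needed.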

How can I correct my regular expression in Java?

I ran into a problem when trying to extract a segment from a string using Java. The original string looks like test/data/20/0000893220-97-000850.txt, and I want to extract the segment after the third /.
My code looks like this:
String m_str = "test/data/20/0000893220-97-000850.txt";
Pattern reg = Pattern.compile("[.*?].txt");
Matcher matcher = reg.matcher(m_str);
System.out.println(matcher.group(0));
The expected result is 0000893220-97-000850, but obviously, I failed. How can I correct this?
[^\/]+$
https://regex101.com/r/tS4nS2/2
This will extract the last segment of a slash-separated string, i.e. everything after the final slash. That works well if the last segment is what you want, as opposed to specifically the third one.
To find and extract the match, you don't need a capturing group (hence, no ()). However, you need to tell the matcher to search for the pattern with find(), since matches() will attempt to match the entire string. Here is the relevant bit:
matcher.find(); //finds any occurrence of the pattern in the string
System.out.println(matcher.group()); //returns the entire occurrence
Note the lack of index inside the call .group().
On a separate note, in Java you don't necessarily need a Pattern; extracting a segment can be done with plain String methods:
String matched = m_str.split("/")[2];
This would capture the third segment, while
String[] matches = m_str.split("/");
String matched = matches[matches.length-1];
would give you the last part.
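The question's expected result omits the .txt extension, which neither approach above strips. A possible sketch, assuming the file always ends in .txt, uses a capturing group for the part before the extension. The class SegmentDemo and its method names are hypothetical and only serve this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentDemo {
    // One or more non-slash characters, captured, followed by a literal ".txt" at the end.
    private static final Pattern BASE_NAME = Pattern.compile("([^/]+)\\.txt$");

    // Returns the last path segment without ".txt", or null if the path does not end that way.
    static String baseName(String path) {
        Matcher m = BASE_NAME.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    // Plain split-based variant: returns the last segment, extension included.
    static String lastSegment(String path) {
        String[] parts = path.split("/");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String path = "test/data/20/0000893220-97-000850.txt";
        System.out.println(baseName(path));    // 0000893220-97-000850
        System.out.println(lastSegment(path)); // 0000893220-97-000850.txt
    }
}
```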

Simple java regex match and replace

So I have myString which contains the string
"border-bottom: solid 1px #ccc;width:8px;background:#bc0b43;float:left;height:12px"
I want to use regex to check that it contains "width:8px" (\bwidth\s*:\s*(\d+)px)
If true, add the width value (i.e. 8 for above example) to my myList.
Attempt:
if (myString.contains("\\bwidth\\s*:\\s*(\\d+)px")) {
myList.add(valueofwidth) //unsure how to select the width value using regex
}
Any help?
EDIT: So I've looked into contains method and found that it doesn't allow regex. matches will allow regex but it looks for a complete match instead.
You need to use Matcher#find() method for that.
From the documentation: -
Attempts to find the next subsequence of the input sequence that
matches the pattern.
And then you can get the captured group out of it: -
Matcher matcher = Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px").matcher(myString);
if (matcher.find()) {
myList.add(matcher.group(1));
}
You have to use a Matcher and matcher.find():
Pattern pattern = Pattern.compile("(\\bwidth\\s*:\\s*(?<width>\\d+)px)");
Matcher matcher = pattern.matcher(myString);
while (matcher.find()) {
myList.add(matcher.group("width"));
}
Your main problem is that contains() doesn't accept a regex, it accepts a literal String.
matches() on the other hand does accept a regex parameter, but must match the entire string to return true.
Next, once you have your match, you can use replaceAll() to extract your target content:
if (myString.matches(".*\\bwidth\\s*:\\s*\\d+px.*")) {
myList.add(myString.replaceAll(".*\\bwidth\\s*:\\s*(\\d+)px.*", "$1"));
}
This replaces the entire input String with the contents of group #1, which your original regex captures.
Note that I removed the redundant brackets from your original matching regex, but left them in for the replace to capture the target content.
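For comparison, here is a self-contained sketch of the find()-based approach that collects every width value as a number. The class WidthDemo and the method widths are invented for this example, and converting the captured digits with Integer.parseInt is an assumption about what the list should hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WidthDemo {
    // Group 1 captures the digits between "width:" and "px".
    private static final Pattern WIDTH = Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px");

    // Returns every width value found in the style string, in order of appearance.
    static List<Integer> widths(String style) {
        List<Integer> out = new ArrayList<>();
        Matcher m = WIDTH.matcher(style);
        while (m.find()) {
            out.add(Integer.parseInt(m.group(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        String style = "border-bottom: solid 1px #ccc;width:8px;background:#bc0b43;float:left;height:12px";
        System.out.println(widths(style)); // [8]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which the matches()/replaceAll() variant does implicitly.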

Finding substring in RegEx Java

Hello I have a question about RegEx. I am currently trying to find a way to grab a substring of any letter followed by any two numbers such as: d09.
I came up with the RegEx ^[a-z]{1}[0-9]{2}$ and ran it on the string
sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0
However, it never finds r30, the code below shows my approach in Java.
Pattern pattern = Pattern.compile("^[a-z]{1}[0-9]{2}$");
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0");
if(matcher.matches())
System.out.println(matcher.group(1));
it never prints out anything because matcher never finds the substring (when I run it through the debugger), what am I doing wrong?
There are three errors:
Your expression contains anchors. ^ matches only at the start of the string, and $ only matches at the end. So your regular expression will match "r30" but not "foo_r30_bar". You are searching for a substring so you should remove the anchors.
The call to matches() should be find(), because matches() only succeeds when the whole input matches.
You don't have a group 1 because you have no parentheses in your regular expression. Use group() instead of group(1).
Try this:
Pattern pattern = Pattern.compile("[a-z][0-9]{2}");
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0");
if(matcher.find()) {
System.out.println(matcher.group());
}
Matcher Documentation
A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:
The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.
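The three operations quoted above can be compared side by side. This is only a sketch: the class ThreeOps and the helper check are hypothetical names, and the unanchored pattern [a-z][0-9]{2} is the one suggested in the answers.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ThreeOps {
    // Returns { matches(), lookingAt(), find() } for the given regex and input,
    // each evaluated on a fresh Matcher.
    static boolean[] check(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // whole input must match
            p.matcher(input).lookingAt(), // a match must start at index 0
            p.matcher(input).find()       // a match may occur anywhere
        };
    }

    public static void main(String[] args) {
        String regex = "[a-z][0-9]{2}";
        System.out.println(Arrays.toString(check(regex, "r30")));      // [true, true, true]
        System.out.println(Arrays.toString(check(regex, "r30.reed"))); // [false, true, true]
        System.out.println(Arrays.toString(
            check(regex, "sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0"))); // [false, false, true]
    }
}
```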
It doesn't match because ^ and $ mark the start and the end of the string. If you want the match to be found anywhere in the string, remove them and you will succeed.
Your regex is anchored, as such it will never match unless the whole input matches your regex. Use [a-z][0-9]{2}.
Don't use .matches() but .find(): .matches() is shamefully misnamed and tries to match the whole input.
How about "[a-z][0-9][0-9]"? That should find all of the substrings that you are looking for.
^[a-z]{1}[0-9]{2}$
sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0
As far as I can read this pattern, it requires the string to start with one lowercase letter, followed by exactly two digits, and then end. That means any string it matches is always exactly 3 word characters long.
Maybe I can help more if you give me more information about your string.
EDIT
If you are sure about the number of dots, then
change this line
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0");
to
Matcher matcher = pattern.matcher("sedfdhajkldsfakdsakvsdfasdfr30.reed.op.1xp0".split("\\.")[0]);
Note:
With my solution you should omit the leading ^ from the pattern.
See the String.split documentation for how to split strings.

Find string in between two strings using regular expression

I am using a regular expression to find the string between two other strings.
Code:
Pattern pattern = Pattern.compile("EMAIL_BODY_XML_START_NODE"+"(.*)(\\n+)(.*)"+"EMAIL_BODY_XML_END_NODE");
Matcher matcher = pattern.matcher(part);
if (matcher.find()) {
..........
It works fine for plain text, but it breaks when the text contains special characters such as newlines.
You need to compile the pattern such that . matches line terminators as well. To do this you need to use the DOTALL flag.
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
edit: Sorry, it's been a while since I've had this problem. You'll also have to change the middle regex from (.*)(\\n+)(.*) to (.*?). You need the lazy quantifier (*?) if you have multiple EMAIL_BODY_XML_START_NODE elements. Otherwise the regex will match from the start of the first element to the end of the last element rather than producing a separate match for each element. Though I'm guessing this is unlikely to be the case for you.
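A rough sketch of the combined advice (DOTALL plus a lazy quantifier) might look like the following. The class BetweenDemo, the method between, and the <body> markers are placeholders chosen for this example, since the actual values of the start and end constants are not shown in the question. Wrapping the delimiters in Pattern.quote is an extra precaution in case they contain regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenDemo {
    // Returns the text between each start/end pair, including any line breaks inside it.
    static List<String> between(String start, String end, String text) {
        Pattern p = Pattern.compile(
            Pattern.quote(start) + "(.*?)" + Pattern.quote(end),
            Pattern.DOTALL); // lets . match line terminators too
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "<body>line1\nline2</body> x <body>b</body>";
        // Two separate matches: "line1\nline2" and "b".
        System.out.println(between("<body>", "</body>", text).size()); // 2
    }
}
```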
