Regex matches in online tester but not in Java

I'm trying to extract the text BetClic from this string popup_siteinfo(this, '/click/betclic', '373', 'BetClic', '60€');
I wrote a simple regex that works in an online regex tester but doesn't work in Java.
Here's the regex:
'\d+', '(.*?)'
Here's the Java output:
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:485)
at javaapplication1.JavaApplication1.main(JavaApplication1.java:74)
Java Result: 1
and here's my code
Pattern pattern = Pattern.compile("'\\d+', '(.*?)'");
Matcher matcher = pattern.matcher(onMouseOver);
System.out.print(matcher.group(1));
where the onMouseOver string is popup_siteinfo(this, '/click/betclic', '373', 'BetClic', '60€');
I'm not an expert with regex, but I'm quite sure that mine isn't wrong at all!
Suggestions?

You need to call find() before group(...):
Pattern pattern = Pattern.compile("'\\d+', '(.*?)'");
Matcher matcher = pattern.matcher(onMouseOver);
if (matcher.find()) {
    System.out.print(matcher.group(1));
} else {
    System.out.print("no match");
}

You're calling group(1) without first calling a matching operation such as find(), which is what causes the IllegalStateException.
If you only need the captured group in a replacement, an explicit find() isn't necessary: replaceAll() performs the match itself, so you can refer to the group as $1.
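As a rough sketch of both points, the class below extracts the group with find() and, separately, uses $1 in replaceAll() without any explicit find(). The class and method names (GroupDemo, extract, unwrap) are made up for illustration; only the pattern and the sample string come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Same pattern as in the question: a quoted number, then a quoted value.
    private static final Pattern PATTERN = Pattern.compile("'\\d+', '(.*?)'");

    // Returns group 1 of the first match, or null when nothing matches.
    static String extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        // find() must succeed before group(1) may be read.
        return matcher.find() ? matcher.group(1) : null;
    }

    // replaceAll() runs the match itself, so "$1" needs no prior find().
    static String unwrap(String input) {
        return PATTERN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String onMouseOver =
                "popup_siteinfo(this, '/click/betclic', '373', 'BetClic', '60\u20ac');";
        System.out.println(extract(onMouseOver)); // BetClic
        System.out.println(unwrap(onMouseOver));
    }
}
```

Returning null for "no match" is just one convention; an Optional<String> would work equally well.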

Related

Pattern matching using Regex in Java

I have a stream from which I read a string that looks like the following:
event.tag.report tag_id=0xABCD0029605, type=ISOB_80K, antenna=1, frequency=918250, rssi=-471, tx_power=330, time=2017-12-18T19:44:07.198
                        ^^^^^^^^^^^^^
I am trying to use Regex to just get the highlighted part (underlined by ^^^^) for every string that I read. My pattern for the Regex is as follows:
.*\\s(tag_id=)(.{38})(\\,\\s)(.*)$
However, this does not work for tag_ids which are longer than or shorter than 38 digits.
Can someone help me with a string pattern that will help me just get the highlighted area in the string independent of its size?
Looks to me as though you want all hexadecimal characters:
"tag_id=(0x[A-F0-9]+)"
So
Pattern pattern = Pattern.compile("tag_id=(0x[A-F0-9]+)");
Matcher matcher = pattern.matcher("event.tag.report tag_id=0x313532384D3135374333343435393031, type=ISOC");
if (matcher.find())
    System.out.println(matcher.group(1));
returns:
0x313532384D3135374333343435393031
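For completeness, here is a small sketch that applies the same idea to the full sample line from the question. The class name TagIdExtractor and the CASE_INSENSITIVE flag are my own additions: the flag assumes the stream might also send lowercase hex digits, which the question does not say.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagIdExtractor {
    // CASE_INSENSITIVE is an assumption: it also accepts lowercase hex digits.
    private static final Pattern TAG_ID =
            Pattern.compile("tag_id=(0x[0-9A-F]+)", Pattern.CASE_INSENSITIVE);

    // Returns the tag id of any length, or null if the line has none.
    static String tagId(String line) {
        Matcher matcher = TAG_ID.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "event.tag.report tag_id=0xABCD0029605, type=ISOB_80K, antenna=1, "
                + "frequency=918250, rssi=-471, tx_power=330, time=2017-12-18T19:44:07.198";
        System.out.println(tagId(line)); // 0xABCD0029605
    }
}
```

Because the character class stops at the first non-hex character (the comma), the length of the id no longer matters.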

Why does my regular expression match but not capture a group?

I am trying to extract the information from the following string:
//YES: We got a match.
I want to extract the information by defining two groups:
everything between // and :
everything after the :
The pattern matches correctly but I cannot extract the groups.
String example = "//YES: We got a match.";
String COMMENT_PATTERN = "//(\\w+):(.*)";
Pattern pattern = Pattern.compile(COMMENT_PATTERN);
example.matches(COMMENT_PATTERN); // true
Matcher matcher = pattern.matcher(example);
matcher.group(1); // raises an exception
I tried it as well with named groups:
String COMMENT_PATTERN = "//(?<init>\\w+):(?<rest>.*)";
...
matcher.group("init"); // raises an exception
Why can't my patterns extract the specified groups?
You have to call either find() or matches() on the matcher to make it run the matching process before you can extract groups. The call
example.matches(COMMENT_PATTERN);
creates its own internal Matcher, calls matches() on it, and then discards it. It is equivalent to:
Pattern.compile(COMMENT_PATTERN).matcher(example).matches()
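A minimal sketch of the fix, reusing the named-group pattern from the question: matches() is called on the same Matcher that the groups are then read from. The class name CommentParser, the parse method, and the trim() on the second group are illustrative choices, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentParser {
    private static final Pattern COMMENT =
            Pattern.compile("//(?<init>\\w+):(?<rest>.*)");

    // Returns {init, rest}, or null when the text is not such a comment.
    static String[] parse(String text) {
        Matcher matcher = COMMENT.matcher(text);
        // Run the match on THIS matcher; String.matches() would use its own.
        if (!matcher.matches()) {
            return null;
        }
        // trim() drops the space after the colon (an illustrative choice).
        return new String[] { matcher.group("init"), matcher.group("rest").trim() };
    }

    public static void main(String[] args) {
        String[] parts = parse("//YES: We got a match.");
        System.out.println(parts[0]); // YES
        System.out.println(parts[1]); // We got a match.
    }
}
```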

JAVA - Regular Expressions : Unclosed Character Class

The following is throwing an exception:
Pattern.matches(""+input.charAt(i),"\\s");
java.util.regex.PatternSyntaxException:
Unclosed character class near index 0.
I don't understand why. Does the text I am matching against also need to have escaped characters?
You passed the parameters in the wrong order. From the documentation:
Pattern.matches(String regex, CharSequence input)
The way you are using it seems wrong.
You should do
Pattern p = Pattern.compile("[ \\t\\n]");
Matcher m = p.matcher(""+input.charAt(i));
boolean b = m.matches();
According to the reference, there is a predefined character class for whitespace, \s. Your code can be simplified to:
Pattern.matches("\\s", <your_input>);
I guess it should be
Pattern.matches("\\s", String.valueOf(input.charAt(i)));
Better break it up this way....
Pattern pattern = Pattern.compile("\\s");
Matcher matcher = pattern.matcher("Your_Source_String");
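Putting the answers together, a sketch of the per-character check might look like this. With the arguments swapped, each input character is compiled as a regex, so a character such as '[' would presumably produce exactly the "Unclosed character class" error from the question. WhitespaceCheck and isWhitespaceAt are hypothetical names.

```java
import java.util.regex.Pattern;

class WhitespaceCheck {
    // Compile once: the regex is the fixed part, the input varies.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // True when the character at index i is a whitespace character.
    static boolean isWhitespaceAt(String input, int i) {
        // Regex first, text second: same order as Pattern.matches(regex, input).
        return WHITESPACE.matcher(String.valueOf(input.charAt(i))).matches();
    }

    public static void main(String[] args) {
        String input = "a [b]";
        for (int i = 0; i < input.length(); i++) {
            // '[' is now only ever treated as input, never as a pattern.
            System.out.println(i + ": " + isWhitespaceAt(input, i));
        }
    }
}
```

If a regex is not actually required, Character.isWhitespace(input.charAt(i)) is a simpler alternative, although its definition of whitespace differs slightly from \s.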

Pattern syntax error

The following regex works in the find dialog of Eclipse but throws an exception in Java.
I can't find why
(?<=(00|\\+))?[\\d]{1}[\\d]*
The syntax error is at runtime when executing:
Pattern.compile("(?<=(00|\\+))?[\\d]{1}[\\d]*")
In the find I used
(?<=(00|\+))?[\d]{1}[\d]*
I want to match phone numbers with or without the + or 00. But that is not the point because I get a Syntax error at position 13. I don't get the error if I get rid of the second "?"
Pattern.compile("(?<=(00|\\+))[\\d]{1}[\\d]*")
Please note that I sometimes need a number greater than 1 in the quantifier, and in any case the question is about the syntax error.
If your data looks like 00ddddd or +ddddd, where d is a digit, then @Bergi's regex (?<=00|\\+)\\d+ will do the trick. But if your data sometimes lacks the prefix you want to ignore (like ddddd), you should probably use a capturing group instead:
String[] data = {"+123456", "00123456", "123456"};
Pattern p = Pattern.compile("(?:00|\\+)?(\\d+)");
Matcher m = null;
for (String s : data) {
    m = p.matcher(s);
    if (m.find())
        System.out.println(m.group(1));
}
Output:
123456
123456
123456
Here is an example that works for me:
public static void main(String[] args) {
    Pattern pattern = Pattern.compile("(?<=00|\\+)(\\d+)");
    Matcher matcher = pattern.matcher("+1123456");
    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}
You can shorten your regex considerably. A character class is unnecessary when it contains only a single shorthand class: just use \d. {1} is redundant as well. You can also use + to match "one or more" (it's short for {1,}). Finally, the extra grouping inside your lookbehind shouldn't be needed.
And last, why is the lookbehind optional (with ?)? Just leave it out if you don't need it. This might even be the source of your pattern syntax error: a lookaround must not be optional.
Try this:
/(?<=00|\+)\d+/
Java:
"(?<=00|\\+)\\d+"
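To see how the two suggested patterns differ, here is a small comparison sketch. The lookbehind version only matches digits that follow 00 or +, while the optional-prefix version also accepts bare numbers. The class and method names (PhoneDigits, afterPrefix, withOptionalPrefix) are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneDigits {
    // Lookbehind: the digits must be preceded by "00" or "+".
    private static final Pattern AFTER_PREFIX = Pattern.compile("(?<=00|\\+)\\d+");
    // Optional non-capturing prefix; the digits are captured in group 1.
    private static final Pattern OPTIONAL_PREFIX = Pattern.compile("(?:00|\\+)?(\\d+)");

    // Digits after a mandatory prefix, or null if there is no prefix.
    static String afterPrefix(String s) {
        Matcher m = AFTER_PREFIX.matcher(s);
        return m.find() ? m.group() : null;
    }

    // Digits with the prefix stripped when present, or null if no digits.
    static String withOptionalPrefix(String s) {
        Matcher m = OPTIONAL_PREFIX.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"+123456", "00123456", "123456"}) {
            System.out.println(s + " -> " + afterPrefix(s) + " / " + withOptionalPrefix(s));
        }
    }
}
```

For the unprefixed input 123456, afterPrefix returns null, which is why the capturing-group version is the safer choice when the prefix may be missing.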

Negating a Regular Expression for string replacement

I have the following code that can replace the email address in a String in Java:
addressStr.replaceFirst("([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})", "")
So, a string with John Smith <john@smith.com> would become John Smith <>. How do I negate it so that it instead removes everything that doesn't match the email address, leaving just john@smith.com?
I tried to put in the ^ and ?<= at the front but it doesn't work.
Well, it's not the regex you need to change but the calling code. Your regex matches the e-mail address (in a weird way), and the replace() removes it from the string.
So just use
Pattern regex = Pattern.compile("([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})");
Matcher regexMatcher = regex.matcher(addressStr);
if (regexMatcher.find()) {
address = regexMatcher.group();
}
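The same "find instead of replace" idea as a self-contained sketch. The pattern here is a deliberately simplified stand-in of my own, assuming the usual local@domain shape; it is not the regex from the question and is far from RFC-complete. EmailExtractor and firstEmail are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Simplified illustration only: local part, '@', domain, dot, TLD.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns the first address found in the text, or null if there is none.
    static String firstEmail(String text) {
        Matcher matcher = EMAIL.matcher(text);
        // group() with no argument is the whole match.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstEmail("John Smith <john@smith.com>")); // john@smith.com
    }
}
```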
The complete Java regex for catching e-mails would be as follows:
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])"
Take a look at https://www.rfc-editor.org/rfc/rfc2822#section-3.4.1 for more info on this.
It is a bit complicated, but it covers all known valid email formats (yours does not allow addresses like bob+bib@gmail.com, which are valid).
For your problem, as stated multiple times, just use find() (borrowing Tim Pietzcker's code):
Pattern regex = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");
Matcher regexMatcher = regex.matcher(addressStr);
foundMatch = regexMatcher.find();
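You can try:
Matcher mailMatcher = Pattern.compile(regexp).matcher(addressStr);
String mailId = mailMatcher.find() ? mailMatcher.group() : null;
The idea here is to get the matched string rather than trying to replace everything else with blanks. Note that group() only works after a successful find(), and that Pattern.LITERAL must not be used, because it would treat the regex as plain text. You can extract the compiled Pattern into a field if this operation is performed repeatedly.
Just don't replace; use find() or matches() instead.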
