JAVA - Regular Expressions : Unclosed Character Class - java

The following is throwing an exception:
Pattern.matches(""+input.charAt(i),"\\s");
java.util.regex.PatternSyntaxException:
Unclosed character class near index 0.
I don't understand why. Does the text I am matching against also need to have escaped characters?
Screenshot of workspace in case it helps.

You got the parameters in the wrong order (from the documentation)
Pattern.matches(String regex, CharSequence input)

The way you are using it seems wrong.
You should do
Pattern p = Pattern.compile("[ \\t\\n]");
Matcher m = p.matcher(""+input.charAt(i));
boolean b = m.matches();
From the reference

There is a special class for whitespaces. You code can be simplified to:
Pattern.matches("\\s", <your_input>);

I guess it should be
Pattern.matches("\\s",String.valueOf(input.charAt(i));

Better break it up this way....
Pattern pattern = Pattern.compile("\\s");
Matcher matcher = pattern.matcher("Your_Source_String");

Related

JAVA REGEX: Match until the specific character

I have this Java code
String cookies = TextUtils.join(";", LoginActivity.msCookieManager.getCookieStore().getCookies());
Log.d("TheCookies", cookies);
Pattern csrf_pattern = Pattern.compile("csrf_cookie=(.+)(?=;)");
Matcher csrf_matcher = csrf_pattern.matcher(cookies);
while (csrf_matcher.find()) {
json.put("csrf_key", csrf_matcher.group(1));
Log.d("CSRF KEY", csrf_matcher.group(1));
}
The String contains something like this:
SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e
Im trying to get the csrf_cookie data by using this Regular Expression:
csrf_cookie=(.+)(?=;)
I expect a result like this in the code:
csrf_matcher.group(1);
e18d027da2fb95e888ebede711f1bc39
instead I get a:
3492f8670f4b09a6b3c3cbdfcc59e512;ci_session=8d823b309a361587fac5d67ad4706359b40d7bd0
What is the possible work around for this problem?
Here is a one-liner using String#replaceAll:
String input = "SessionID=sessiontest;csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e";
String cookie = input.replaceAll(".*csrf_cookie=([^;]*).*", "$1");
System.out.println(cookie);
e18d027da2fb95e888ebede711f1bc39
Demo
Note: We could have used a formal regex pattern matcher, and in face you may want to do this if you need to do this search/replacement often in your code.
You are getting more data than expected because you are using an greedy '+' (It will match as long as it can)
For example the pattern a+ could match on aaa the following: a, aa, and aaa. Where the later is 'preferred' if the pattern is greedy.
So you are matching
csrf_cookie=e18d027da2fb95e888ebede711f1bc39;ci_session=3f4675b5b56bfd0ba4dae46249de0df7994ee21e;
as long as it ends with a ';'. The first ';' is skipped with .+ and the last ';' is found with the possitive lookahead
To make a patter ungreedy/lazy use +? instead of + (so a+? would match a (three times) on aaa string)
So try with:
csrf_cookie=(.+?);
or just match anything that is not a ';'
csrf_cookie=([^;]*);
that way you don't need to make it lazy.

most elegant way of escaping $ in java

I have a base String "abc def", I am trying to replace my base string with "abc$ def$" using replaceFirst(), which is running into errors as $ is not escaped.
I tried doing it with Pattern and Matcher APIs, as given below,
newValue = "abc$ def$";
if(newValue.contains("$")){
Pattern specialCharacters = Pattern.compile("$");
Matcher newMatcherValue = specialCharacters.matcher(newValue) ;
newValue = newMatcherValue.replaceAll("\\\\$") ;
}
This runs into an error. Is there any elegant way of replacing my second string "abc$ def$" with "abc\\\\$ def\\\\$" so as to use the replacefirst() API successfully?
Look at Pattern.quote() to quote a regex and Matcher.quoteReplacement() to quote a replacement string.
That said, does this do what you want it to?
System.out.println("abc def".replaceAll("([\\w]+)\\b", "$1\\$"));
This prints out abc$ def$
You can use replaceAll just in one step:
String newValueScaped = newValue.replaceAll("\\$", "\\\\$")
$ has a special mining in regex, so you need to scape it. It's used to match the end of the data.

Regex matching in online tester but not in JAVA

I'm trying to extract the text BetClic from this string popup_siteinfo(this, '/click/betclic', '373', 'BetClic', '60€');
I wrote a simple regex that works on Regex Tester but that doesn't work on Java.
Here's the regex
'\d+', '(.*?)'
here's Java output
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:485)
at javaapplication1.JavaApplication1.main(JavaApplication1.java:74)
Java Result: 1
and here's my code
Pattern pattern = Pattern.compile("'\\d+', '(.*?)'");
Matcher matcher = pattern.matcher(onMouseOver);
System.out.print(matcher.group(1));
where the onMouseOver string is popup_siteinfo(this, '/click/betclic', '373', 'BetClic', '60€');
I'm not an expert with regex, but I'm quite sure that mine isn't wrong at all!
Suggestions?
You need to call find() before group(...):
Pattern pattern = Pattern.compile("'\\d+', '(.*?)'");
Matcher matcher = pattern.matcher(onMouseOver);
if(matcher.find()) {
System.out.print(matcher.group(1));
}
else {
System.out.print("no match");
}
You're calling group(1) without having first called a matching operation (such as find()).- which is the cause of IllegalStateException.
And if you have to use that grouped cases for replacement then this isn't needed if you're just using $1 since the replaceAll() is the matching operation.

Pattern syntax error

The following regex works in the find dialog of Eclipse but throws an exception in Java.
I can't find why
(?<=(00|\\+))?[\\d]{1}[\\d]*
The syntax error is at runtime when executing:
Pattern.compile("(?<=(00|\\+))?[\\d]{1}[\\d]*")
In the find I used
(?<=(00|\+))?[\d]{1}[\d]*
I want to match phone numbers with or without the + or 00. But that is not the point because I get a Syntax error at position 13. I don't get the error if I get rid of the second "?"
Pattern.compile("(?<=(00|\\+))[\\d]{1}[\\d]*")
Please consider that instead of 1 sometime I need to use a greater number and anyway the question is about the syntax error
If your data looks like 00ddddd or +ddddd where d is digit you want to get #Bergi's regex (?<=00|\\+)\\d+ will do the trick. But if your data sometimes don't have any part that you want to ignore like ddddd then you probably should use group mechanism like
String[] data={"+123456","00123456","123456"};
Pattern p=Pattern.compile("(?:00|\\+)?(\\d+)");
Matcher m=null;
for (String s:data){
m=p.matcher(s);
if(m.find())
System.out.println(m.group(1));
}
output
123456
123456
123456
Here is an example that works for me:
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=00|\\+)(\\d+)");
Matcher matcher = pattern.matcher("+1123456");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
You might shorten your regex a lot. The character classes are not needed when there is only one class inside - just use \d. And {1} is quite useless as well. Also, you can use + for matching "one or more" (it's short for {1,}). Next the additional grouping in your lookbehind should not be needed.
And last, why is that lookbehind optional (with ?)? Just leave it away if you don't need it. This might even be the source of your pattern syntax error - a lookaround must not be optional.
Try this:
/(?<=00|\+)\d+/
Java:
"(?<=00|\\+)\\d+"

Negating a Regular Expression for string replacement

I have the following code that can replace the email address in a String in Java:
addressStr.replaceFirst("([a-zA-Z0-9_\\-\\.]+)#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})", "")
So, a string with John Smith <john#smith.com> would become John Smith <>. How do I negate it so that it will instead replace all that doesn't match the email address and have the final result as just john#smith.com?
I tried to put in the ^ and ?<= at the front but it doesn't work.
Well, it's not the regex you need to change but the calling code. Your regex matches the e-mail address (in a weird way), and the replace() removes it from the string.
So just use
Pattern regex = Pattern.compile("([a-zA-Z0-9_\\-\\.]+)#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})");
Matcher regexMatcher = regex.matcher(addressStr);
if (regexMatcher.find()) {
address = regexMatcher.group();
}
The complete Java regex for catching e-mails would be as follows:
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])"
Take a look at https://www.rfc-editor.org/rfc/rfc2822#section-3.4.1 for more info on this.
A bit complicated but it is valid for all known and valid emails formats (yours do not allows mails like bob+bib#gmail.com which are valid).
For your problem, as stated multiple times, just find (stealing Tim Pietzcker piece of code):
Pattern regex = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");
Matcher regexMatcher = regex.matcher(addressStr);
foundMatch = regexMatcher.find();
You can try:
String mailId = Pattern.compile(regexp, Pattern.LITERAL).matcher(addressStr).group();
Idea here is to get the matched string rather than trying to replace everything else with blank. You can extract the pattern into a field if this operation is repetitive.
Just don't replace.... use match(es) instead.

Categories

Resources