I have a String input and a String pattern, and either could contain characters that have special meaning in a regex. I would like an exact (literal) replacement to take place, with any special meaning ignored. I won't know at compile time how many such special characters might be present in either the input string or the pattern.
Here is the formal problem statement:
Assume the object input_string is the input, of type String.
Then we have another string, input_pattern, which is also of type String.
Now I want to perform the following:
String result=input_string.replaceFirst(input_pattern,"replacewithsomethingdoesntmatter");
The replacement should take place as an 'exact' match, ignoring any regex special meaning of the characters in the strings. How can I make this happen?
You can use the Pattern.quote() method to escape characters that have a special meaning in regular expressions:
String pattern = "^(.*)$";
String quotedPattern = Pattern.quote(pattern);
System.out.println(quotedPattern);
This will wrap the pattern in quotation markers (\Q and \E), indicating that the wrapped sequence needs to be matched literally.
Alternatively, you can wrap the pattern in quotation markers manually:
String pattern = "^(.*)$";
String quotedPattern = "\\Q" + pattern + "\\E";
System.out.println(quotedPattern);
The first approach is safer, because it also handles expressions that already contain quotation markers: a literal \E inside the pattern would otherwise end the manually quoted region early.
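Applied to the replaceFirst call from the question, this could look like the minimal sketch below. The class and method names are hypothetical. Matcher.quoteReplacement is an addition not mentioned above: it is included on the assumption that the replacement text should also be taken literally, since $ and \ have special meaning in replacement strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplace {
    // Replaces the first literal occurrence of target in input.
    // Pattern.quote neutralises regex metacharacters in the target;
    // Matcher.quoteReplacement neutralises '$' and '\' in the replacement.
    static String replaceFirstLiteral(String input, String target, String replacement) {
        return input.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "(approx.)" would be a capturing group without quoting,
        // and "$USD" would be read as a group reference without quoteReplacement.
        System.out.println(replaceFirstLiteral("price: $5 (approx.)", "(approx.)", "$USD"));
        // prints: price: $5 $USD
    }
}
```

If only literal replacement is needed and every occurrence should be replaced, String.replace(CharSequence, CharSequence) avoids regex entirely.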
I'm trying to understand Pattern.quote using the following code:
String pattern = Pattern.quote("1252343% 8 567 hdfg gf^$545");
System.out.println("Pattern is : "+pattern);
produces the output:
Pattern is : \Q1252343% 8 567 hdfg gf^$545\E
What are \Q and \E here? The documentation description says :
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
But Pattern.quote's return type is String and not a compiled Pattern object.
Why is this method required and what are some usage examples?
\Q means "start of literal text" (i.e. regex "open quote")
\E means "end of literal text" (i.e. regex "close quote")
Calling the Pattern.quote() method wraps the string in \Q...\E, which turns the text into a regex literal. For example, Pattern.quote(".*") matches a dot followed by an asterisk:
System.out.println("foo".matches(".*")); // true
System.out.println("foo".matches(Pattern.quote(".*"))); // false
System.out.println(".*".matches(Pattern.quote(".*"))); // true
The method's purpose is to spare the programmer from having to remember the special terms \Q and \E, and to add a bit of readability to the code - regex is hard enough to read already. Compare:
someString.matches(Pattern.quote(someLiteral));
someString.matches("\\Q" + someLiteral + "\\E");
Referring to the javadoc:
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
The Pattern.quote method quotes part of a regex pattern so that the regex engine interprets it as a string literal.
Say you have some user input in your search program and you want to search for it with a regex. This input may contain unsafe characters, so you can use
Pattern pattern = Pattern.compile(Pattern.quote(userInput));
This method does not quote a Pattern but, as you point out, wraps a String in regex quotes.
\Q and \E, like all other constructs, are documented on the java.util.regex.Pattern Javadoc page. They mean "begin quote" and "end quote", and delimit a region where all the characters have their literal meaning. The way to use the return value of Pattern.quote is to feed it to Pattern.compile, or to any other method that accepts a pattern string, such as String.split.
If you compile the String returned by Pattern.quote, you'll get a Pattern which matches the literal string that you quoted.
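As a small illustration of passing the quoted string to a method other than Pattern.compile, here is a sketch using String.split. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteSplit {
    // Splits input on a delimiter that is taken literally, not as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Unquoted, "." matches every character, so only empty strings remain
        // and split drops them all.
        System.out.println(Arrays.toString("a.b.c".split(".")));          // prints: []
        // Quoted, "." is just a dot.
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));  // prints: [a, b, c]
    }
}
```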
\Q and \E mark the beginning and end of the quoted part of the string.
Regex syntax frequently collides with normal strings. Say I want a regex to search for a certain string that is only known at runtime. How can we be sure that the string doesn't have regex meaning, e.g. ".*.*.*"? We quote it.
This method is used to make the pattern be treated as a sequence of literal characters.
This has the same effect as the Pattern.LITERAL flag.
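The two routes can be compared side by side in the sketch below (the class and method names are hypothetical). One practical difference: the flag applies to the whole pattern, while the string returned by Pattern.quote can be embedded inside a larger regex.

```java
import java.util.regex.Pattern;

public class LiteralFlag {
    // Literal search using Pattern.quote.
    static boolean containsViaQuote(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // Literal search using the Pattern.LITERAL flag.
    static boolean containsViaFlag(String text, String literal) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(text).find();
    }

    public static void main(String[] args) {
        String literal = ".*.*.*";
        System.out.println(containsViaQuote("abc", literal));          // prints: false
        System.out.println(containsViaFlag("abc", literal));           // prints: false
        System.out.println(containsViaQuote("x.*.*.*y", literal));     // prints: true
        System.out.println(containsViaFlag("x.*.*.*y", literal));      // prints: true
    }
}
```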
I am parsing a .txt file line by line, looking for a target token, using a regex engine.
I match each line against:
"(^|.*[\\s])"+token+"([\\s].*|$)"
where token is a string. When:
token="6-7(3-7"
it raises the following exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Unclosed group near index 27
(^|.*[\s])6-7(3-7([\s].*|$)
How can I solve this?
You have special characters in your token.
Have a look at Pattern.quote():
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
This should do the trick for you:
String pattern = "(^|.*[\\s])" + Pattern.quote(token) + "([\\s].*|$)";
No need for doing the string magic yourself! :-)
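Put together with the token from the question, a minimal sketch might look like this. The class and method names are hypothetical; the surrounding regex is the one from the question.

```java
import java.util.regex.Pattern;

public class TokenLineMatcher {
    // True if the line contains the token as a whitespace-delimited word.
    // Only the token is quoted; the surrounding regex keeps its meaning.
    static boolean lineContainsToken(String line, String token) {
        String regex = "(^|.*[\\s])" + Pattern.quote(token) + "([\\s].*|$)";
        return line.matches(regex);
    }

    public static void main(String[] args) {
        String token = "6-7(3-7"; // the unbalanced "(" no longer breaks compilation
        System.out.println(lineContainsToken("score 6-7(3-7 final", token)); // prints: true
        System.out.println(lineContainsToken("score x6-7(3-7", token));      // prints: false
    }
}
```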
You should make sure to escape special characters in any plain-text string you use to make regex patterns. Replace "(" with "\(", and similarly for bare backslashes (before any other steps), periods, and all other special characters, at least all those you expect to see in the input. (If it's arbitrary input from users, assume every character will be included.)
I'm trying to validate an ID whose first two characters are letters and whose next four are digits. The digits may include zeros, e.g. 0333, but can never be all zeros (0000), so something like ID0000 is not allowed. The expression I came up with checks out when I test it online but doesn't seem to work when I enforce it in the program:
\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\b
and here's the code I'm currently using to implement it:
String pattern = "/\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\b/";
Pattern regEx = Pattern.compile(pattern);
String ingID = ingredID.getText().toString();
Matcher m = regEx.matcher(ingID);
if (m.matches()) {
ingredID.setError("Please enter a valid Ingrediant ID");
}
For some reason it doesn't validate correctly: it accepts IDs like ID0000 when it shouldn't. Any thoughts, folks?
Change your regex pattern to "\\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\\b"
Your problem is essentially that Java isn't all that Regex-friendly; you need to deal with the limitations of Java strings in order to create a string that can be used as a Regex pattern. Since \ is the escape character in Regex and the escape character in Java strings (and since there's no such thing as a raw string literal in Java), you must double-escape anything that must be escaped in the Regex in order to create a literal \ character within the Java string, which, when parsed as a Regex pattern, will be correctly treated as the escape character.
So, for instance, the Regex pattern /\b/ (where /, as mentioned in my comment, delimits the pattern itself) would be represented in Java as the string "\\b".
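A minimal standalone sketch of the corrected pattern follows; the class and method names are hypothetical and the Android-specific setError call is left out. Note that the snippet in the question appears to set the error when the input matches; presumably the error should be set when isValidId returns false.

```java
import java.util.regex.Pattern;

public class IngredientIdValidator {
    // Two uppercase letters followed by four digits, but not four zeros.
    // Each regex backslash is doubled inside the Java string literal,
    // and the surrounding "/" delimiters are dropped.
    private static final Pattern ID =
            Pattern.compile("\\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\\b");

    static boolean isValidId(String id) {
        // matches() requires the whole input to match the pattern.
        return ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidId("ID0333")); // prints: true
        System.out.println(isValidId("ID0000")); // prints: false
    }
}
```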
I am trying to work out a formula to match the following pattern:
input string example:
'444'/'443'/'434'/'433'/'344'/'334'/'333'
If any of the patterns above exists in a particular input string, I want to match it as the same pattern.
Also, is it possible to do variable substitution in a regex? That is, treat each of the 3 characters of the string as a variable and just increment/decrement it, so that you don't have to hardcode the number ranges in the pattern string for different patterns?
Is there a good library one can use for this? I was working with the Pattern class in Java.
If you have any link which would be helpful please pass it through :)
Thank you.
Let's first consider this pattern: [34]{3}
The […] is a character class; it matches exactly one of the characters in the set. The {n} is an exact finite repetition.
So, [34]{3} informally means "exactly 3 of either '3' or '4'". Thus, it matches "333", "334", "343", "344", "433", "434", "443", "444", and nothing else.
As a string literal, the pattern is "[34]{3}". If you don't want to hardcode this pattern, then just generate similar-looking strings that follow the template "[…]{n}". Put the characters that you want to match in the …, and substitute n with the number you want.
Here's an example:
String alpha = "aeiou";
int n = 5;
String pattern = String.format("[%s]{%s}", alpha, n);
System.out.println(pattern);
// [aeiou]{5}
We've now seen that the pattern is not hardcoded, but rather programmatically generated depending on the values of the variables alpha and n. The pattern [aeiou]{5} will match 5 consecutive lowercase vowels, e.g. "ooiae", "ioauu", "eeeee", etc.
It's again not clear if you just want to match these kinds of strings, or if they have to appear like '…'/'…'/'…'/'…'/'…'. If the latter is desired, then simply compose the pattern as desired, using repetition and grouping as necessary. You can also just programmatically copy and paste the pattern 5 times if that's simpler. Here's an example:
String p5 = String.format("'%s'/'%<s'/'%<s'/'%<s'/'%<s'", pattern);
System.out.println(p5);
// '[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'
This will now match strings like "'aeooi'/'eeiuu'/'uaooo'/'eeeia'/'eieio'".
Caveat
Do be careful about what goes in alpha. Specifically, -, [, ], &&, ^, etc. are special metacharacters in Java character class definitions. If you restrict alpha to contain only digits/letters, then you will probably not run into any problems, but e.g. [^a] does NOT mean "either '^' or 'a'". It in fact means "anything but 'a'". See java.util.regex.Pattern for exact character class syntax.
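If alpha may contain such characters, one option is to backslash-escape everything that is not a letter or digit before putting it inside the brackets. The sketch below is one way to do that under stated assumptions: the class and method names are hypothetical, chars is non-empty, and it contains no supplementary (surrogate-pair) characters. It relies on the documented rule that a backslash may precede any non-alphabetic character.

```java
import java.util.regex.Pattern;

public class CharClassBuilder {
    // Builds "[...]{n}" where every non-alphanumeric character is escaped,
    // so '^', '-', '[', ']' and '&' are treated as ordinary set members.
    static String charClass(String chars, int n) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\'); // escaping a non-alphabetic character is always allowed
            }
            sb.append(c);
        }
        return sb.append("]{").append(n).append('}').toString();
    }

    public static void main(String[] args) {
        String pattern = charClass("^a", 2);
        System.out.println(pattern);                 // prints: [\^a]{2}
        System.out.println("a^".matches(pattern));   // prints: true
        System.out.println("bb".matches(pattern));   // prints: false
    }
}
```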
You can use the regex:
('\\d{3}'/){6}'\\d{3}'
Pattern.compile takes a String as its parameter. Though that's probably most often supplied in the form of a string literal, if you have variable upper and lower bounds for your pattern, you can use something like StringBuilder to build your string, then pass that result to Pattern.compile.
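A minimal sketch of that idea, applied to the input format from the question, is shown below. The class, method and parameter names are hypothetical. It assumes that digits contains only letters or digits and that count is at least 1.

```java
import java.util.regex.Pattern;

public class SlashListPattern {
    // Builds a pattern for `count` quoted items separated by '/',
    // each item being `length` characters drawn from `digits`.
    static Pattern build(String digits, int length, int count) {
        String item = "'[" + digits + "]{" + length + "}'";
        StringBuilder sb = new StringBuilder(item);
        for (int i = 1; i < count; i++) {
            sb.append('/').append(item);
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = build("34", 3, 7);
        String input = "'444'/'443'/'434'/'433'/'344'/'334'/'333'";
        System.out.println(p.matcher(input).matches()); // prints: true
    }
}
```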