replaceFirst() fails when replacing with "$" - java

I don't understand why the "$" is special.
String str = "bla aa";
String tag = "$";
str = str.replaceFirst("aa", tag);
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 1
If I change it to tag = "\\$", then it works fine. But why does it need to be escaped? Thanks in advance.

Because $ is a special character. In the replacement string it refers to capturing groups, and replaceFirst treats its arguments specially: the first is a regex, and the second uses the Matcher replacement syntax. The documentation explicitly warns you:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceFirst(java.lang.String). Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
Now a bit more about $. In the regex pattern it means "end of line".
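In the replacement string, $g means "the g-th group". So for the regex a([a-z]+)([0-9]+) you have two groups, $1 and $2, and you can refer to them when replacing. A $ that is not followed by a group number, like your tag, is therefore an incomplete group reference, which is what makes replaceFirst throw.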
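A minimal sketch of both fixes, reusing the strings from the question. The class and method names are made up for illustration; Matcher.quoteReplacement is the helper the documentation above points to, and it escapes every $ and \ so the replacement is taken literally.

```java
import java.util.regex.Matcher;

class ReplacementDollarDemo {
    // Replaces the first "aa" with 'tag', treating '$' and '\' in the tag literally.
    static String replaceFirstLiteral(String input, String tag) {
        return input.replaceFirst("aa", Matcher.quoteReplacement(tag));
    }

    public static void main(String[] args) {
        String str = "bla aa";

        // Escaping by hand: the Java literal "\\$" is the two characters \$,
        // which the replacement syntax reads as a literal dollar sign.
        System.out.println(str.replaceFirst("aa", "\\$"));        // bla $

        // Letting the library do the escaping.
        System.out.println(replaceFirstLiteral(str, "$"));        // bla $

        // $1 and $2 are group references: they insert what each group captured.
        System.out.println("xabc123y".replaceFirst("a([a-z]+)([0-9]+)", "$2-$1")); // x123-bcy
    }
}
```

quoteReplacement is usually the safer choice when the replacement comes from a variable, because you cannot know in advance whether it contains $ or \.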

replaceFirst takes a regular expression. According to the Pattern javadoc, $ matches the end of a line.

$ matches the end of a line in a regex, so if you need it as a plain character, you need to escape it. You can find more in the Java Pattern documentation.

Related

java regular expression and replace all occurrences

I want to replace one string in a big string, but I guess my regular expression is not right, because it's not working.
The main string (the SQL part in which something is to be replaced) is:
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
The string to find and replace (based on some condition) is:
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this using the Pattern and Matcher classes:
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and are delimited by slashes. Java is not one of those languages, and the slash is not part of regex syntax itself. So this looks for a literal slash in that SQL, which isn't there; thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters; this matches any one of them, but exactly one). Your string may not have exactly one whitespace character at each of those spots. Your intent, surely, was \\s*, which matches zero or more of them: \\s* means "whitespace is allowed here", while \\s means "there must be exactly one whitespace character here". Make every \\s in your regexp an \\s*.
\\w is exactly one 'word' character (a letter, a digit, or an underscore); you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
replaceFirst itself also does regexps. You don't want to apply this stuff twice. That's not how you replace a found result.
This is what you wanted:
Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here"); // replaceFirst returns a new String
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
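A minimal sketch of that approach, assuming a corrected pattern built by following the points above (no slashes or anchors, \\s* instead of \\s, \\w* instead of \\w, (and|or) instead of [and|or], and the dot escaped). The class name and the "xx...x" wrapping, taken from the question, are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReplaceDemo {
    // Hypothetical corrected pattern, reconstructed from the points above.
    private static final Pattern PAT = Pattern.compile(
            "hemp\\.EMPLOYEE_NAME\\s*=\\s*'\\w*'\\s*(and|or)",
            Pattern.CASE_INSENSITIVE);

    // Wraps the first match in "xx" ... "x". $0 stands for the whole match,
    // so mat.group() never has to be spliced into the replacement string.
    static String wrapFirst(String cond) {
        Matcher mat = PAT.matcher(cond);
        return mat.replaceFirst("xx$0x"); // returns the input unchanged if nothing matches
    }

    public static void main(String[] args) {
        String cond = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
                + "emp.PERMANENT_ADDR LIKE('%98n%')\n"
                + "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        cond = wrapFirst(cond); // assign the result: Strings are immutable
        System.out.println(cond);
    }
}
```

Using $0 instead of concatenating mat.group() also avoids surprises when the matched text itself contains $ or \.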
More generally, did you read any regexp tutorial, do any testing, or read the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match a horizontal whitespace character, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class matching just one of the listed characters; use the alternation (?:and|or) instead. Also escape the dot to match it literally, as an unescaped . matches any character except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'
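To make the character class, dot, and \h points concrete, here is a small sketch that tests a few patterns with Matcher.find(). The helper name found is made up for illustration.

```java
import java.util.regex.Pattern;

class CharClassVsAlternationDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // [and|or] is a character class: exactly ONE of a, n, d, |, o, r.
        System.out.println(found("^[and|or]$", "a"));      // true
        System.out.println(found("^[and|or]$", "and"));    // false
        // (?:and|or) is an alternation: the whole sequence "and" or "or".
        System.out.println(found("^(?:and|or)$", "and"));  // true

        // An unescaped dot matches any character; \\. matches only a dot.
        System.out.println(found("hemp.EMP", "hempXEMP"));   // true
        System.out.println(found("hemp\\.EMP", "hempXEMP")); // false

        // \s also matches a line break; \h only matches horizontal whitespace.
        System.out.println(found("x\\sy", "x\ny"));        // true
        System.out.println(found("x\\hy", "x\ny"));        // false
    }
}
```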

Regular expression in java

I know it's a simple problem, but I'm stuck on it: I want to retrieve all strings written in this form:
$F{ETIQX}
where X is a number. I wrote this regular expression, but I'm getting errors:
if (textField.getText().matches("$F{ETIQ\d}")){
System.out.println("matches!!");
}
Any help will be appreciated.
i want to retrieve all strings
Then you shouldn't be using .matches() in the first place, but a Matcher and .find(). The name .matches() is a misnomer: it succeeds only if the whole input matches the regex (in contradiction with the definition of regex matching, which can occur anywhere in the input).
Also, your regex should be:
"\\$F\\{ETIQ\\d\\}"
(you need to escape backslashes in a Java string)
$, { and } are regex metacharacters; the first is an anchor matching the end of input, the two latter are bounds for a repetition quantifier.
Your code should read:
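import java.util.regex.Matcher;
import java.util.regex.Pattern;
// ...
private static final Pattern PATTERN = Pattern.compile("\\$F\\{ETIQ\\d\\}");
// ...
final Matcher m = PATTERN.matcher(textField.getText());
while (m.find()) {
    // work with m.group(), e.g.:
    System.out.println(m.group());
}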
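A quick way to see the matches() versus find() difference described above is to run both on the same text. The sample input and helper names are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // String.matches: the WHOLE input must match the regex.
    static boolean wholeMatch(String text, String regex) {
        return text.matches(regex);
    }

    // Matcher.find: the regex may match anywhere in the input.
    static boolean foundAnywhere(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "value: $F{ETIQ1}";
        String regex = "\\$F\\{ETIQ\\d\\}";
        System.out.println(wholeMatch(text, regex));    // false
        System.out.println(foundAnywhere(text, regex)); // true
    }
}
```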
\$F\{ETIQ\d\}
Escape the characters that have a special meaning in regex:
$ means end of string
{ means start of a quantifier
} means end of a quantifier
To match these literally, you must escape them.
Here is a demo: http://regex101.com/r/xT4mR6
In a Java string literal, \$ is not a valid escape sequence and causes a compile error, so the backslash itself must be escaped as \\.
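Putting the pieces together, here is a sketch that collects every occurrence into a list. It assumes X may have more than one digit, so it uses \\d+ where the answers above use \\d; the class name and sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EtiqFinder {
    // Assumption: X can be several digits, hence \\d+ rather than \\d.
    private static final Pattern PATTERN = Pattern.compile("\\$F\\{ETIQ\\d+\\}");

    // Collects every $F{ETIQX} occurrence, in the order it appears.
    static List<String> findAll(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("$F{ETIQ1} + $F{ETIQ23} - $F{OTHER}"));
        // prints [$F{ETIQ1}, $F{ETIQ23}]
    }
}
```

If X is always a single digit, switching back to \\d is enough; everything else stays the same.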

What is the use of Pattern.quote method?

I'm trying to understand Pattern.quote using the following code:
String pattern = Pattern.quote("1252343% 8 567 hdfg gf^$545");
System.out.println("Pattern is : "+pattern);
produces the output:
Pattern is : \Q1252343% 8 567 hdfg gf^$545\E
What are \Q and \E here? The documentation description says:
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
But Pattern.quote's return type is String and not a compiled Pattern object.
Why is this method required and what are some usage examples?
\Q means "start of literal text" (i.e. regex "open quote")
\E means "end of literal text" (i.e. regex "close quote")
Calling the Pattern.quote() method wraps the string in \Q...\E, which turns the text into a regex literal. For example, Pattern.quote(".*") would match a dot followed by an asterisk:
System.out.println("foo".matches(".*")); // true
System.out.println("foo".matches(Pattern.quote(".*"))); // false
System.out.println(".*".matches(Pattern.quote(".*"))); // true
The method's purpose is to spare the programmer from having to remember the special sequences \Q and \E, and to add a bit of readability to the code (regex is hard enough to read already). Compare:
someString.matches(Pattern.quote(someLiteral));
someString.matches("\\Q" + someLiteral + "\\E");
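One practical difference: if the literal itself contains \E, a hand-written "\\Q" + literal + "\\E" ends the quoted section early, whereas Pattern.quote handles that case. A small sketch (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Matches 'input' against 'literal' with all regex metacharacters disabled.
    static boolean matchesLiteral(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("a.b"));          // \Qa.b\E
        System.out.println(matchesLiteral("a.b", "a.b"));  // true
        System.out.println(matchesLiteral("axb", "a.b"));  // false

        // A literal that itself contains \E (the four characters x \ E y):
        // Pattern.quote still yields a pattern that matches it exactly.
        String tricky = "x\\Ey";
        System.out.println(matchesLiteral(tricky, tricky)); // true
    }
}
```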
Referring to the javadoc:
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
The Pattern.quote method quotes part of a regex pattern so that the regex engine interprets it as a string literal.
Say you have some user input in your search program, and you want to search for it with a regex. This input may contain characters that are special in regex, so you can use:
Pattern pattern = Pattern.compile(Pattern.quote(userInput));
This method does not quote a Pattern but, as you point out, wraps a String in regex quotes.
\Q and \E, along with all the other constructs, are thoroughly documented on the java.util.regex.Pattern Javadoc page. They mean "begin quote" and "end quote", and they demarcate a region where all the characters have their literal meaning. The way to use the return value of Pattern.quote is to feed it to Pattern.compile, or to any other method that accepts a pattern string, such as String.split.
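String.split is a common place where this matters, because its argument is a regex. A small sketch (the helper name splitOnLiteral is made up for illustration):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteSplitDemo {
    // Splits on 'separator' taken literally, not as a regex.
    static String[] splitOnLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // "." as a regex matches every character, so every piece is empty and
        // split() drops trailing empty strings: the result is an empty array.
        System.out.println(Arrays.toString("a.b.c".split(".")));             // []
        System.out.println(Arrays.toString(splitOnLiteral("a.b.c", ".")));   // [a, b, c]
        System.out.println(Arrays.toString(splitOnLiteral("1|2|3", "|")));   // [1, 2, 3]
    }
}
```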
If you compile the String returned by Pattern.quote, you'll get a Pattern which matches the literal string that you quoted.
\Q and \E mark the beginning and end of the quoted part of the string.
Regex collides frequently with normal strings. Say I want a regex to search for a certain string that is only known at runtime. How can we be sure that the string doesn't have a regex meaning (e.g. ".*.*.*")? We quote it.
This method is used to make the pattern be treated as a sequence of literal characters.
This has the same effect as the Pattern.LITERAL flag.
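A sketch comparing the two, with made-up class and method names. One design difference worth noting: Pattern.LITERAL applies to the whole pattern, while a quoted string can be embedded inside a larger regex, as in the last method.

```java
import java.util.regex.Pattern;

class LiteralFlagDemo {
    static boolean viaQuote(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    static boolean viaLiteralFlag(String literal, String input) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(input).matches();
    }

    // Only possible with quote(): a literal prefix followed by real regex syntax.
    static boolean literalPrefixThenDigits(String prefix, String input) {
        return input.matches(Pattern.quote(prefix) + "\\d+");
    }

    public static void main(String[] args) {
        String literal = "1+1=2?";
        System.out.println(viaQuote(literal, "1+1=2?"));              // true
        System.out.println(viaLiteralFlag(literal, "1+1=2?"));        // true
        System.out.println(viaLiteralFlag(literal, "11=2"));          // false
        System.out.println(literalPrefixThenDigits("v1.", "v1.42"));  // true
        System.out.println(literalPrefixThenDigits("v1.", "v1x42"));  // false
    }
}
```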

What is this Java regex code doing?

I just found this method inside a "Utils"-type class in our codebase. It was written a long time ago by a developer who no longer works for us. What in tarnation is it doing? What is it returning?!? Of course, there's no JavaDocs or comments.
public static String stripChars(String toChar, String ptn){
String stripped = "";
stripped = toChar.replaceAll(ptn, "$1");
return stripped.trim();
}
Thanks in advance!
It's a very short alias, essentially. This:
stripChars(a, b)
Is equivalent to:
a.replaceAll(b, "$1").trim()
It seems to replace everything in "toChar" that matches the regular expression "ptn" with whatever the first capturing group of "ptn" captured.
Regular expressions have a concept of groups. For example, changing "year 2012" to "year 1012", or "year 2006" to "year 1006" (changing the first 20 to 10), can be accomplished by replacing
"year 20([0-9][0-9])" with "year 10$1". That is, match the entire string, and then replace it with "year 10" followed by the first group ($1). The group is the first thing in parentheses.
Anyway, your method replaces everything that matches "ptn" in "toChar" with the first group in the regular expression "ptn". So given
stripChars("year 2012", "year 20([0-9][0-9])"); you would get back only "12", because the entire text matches and is replaced by only the first group.
It then trims any leading or trailing whitespace.
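The behaviour described above can be checked by running the method from the question as-is. The wrapper class and the second sample string are made up for illustration.

```java
class StripCharsDemo {
    // The method from the question, unchanged in behaviour.
    public static String stripChars(String toChar, String ptn) {
        String stripped = toChar.replaceAll(ptn, "$1");
        return stripped.trim();
    }

    public static void main(String[] args) {
        // The whole input matches, so it is replaced by group 1 ("12").
        System.out.println(stripChars("year 2012", "year 20([0-9][0-9])"));      // 12
        // Only the matched part is replaced; the rest is kept, then trimmed.
        System.out.println(stripChars("  in year 2006 ", "year 20([0-9][0-9])")); // in 06
    }
}
```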
The pattern string passed as an argument to the method seems to contain a matching group, and the call to replaceAll replaces each entire match of the pattern with the portion that matched the first group. You should look at the call hierarchy of this method to find the regexes passed to it, along with the strings being worked on.
It's just replacing a string with its own subset of matched characters and then trimming the spaces from both ends.
For example,
if you want a word to be replaced by the digits it contains,
use the regex \D*(\d+)\D*
and your replaceAll call will give these results:
hey123wow->123
what666->666
how888->888
$0 refers to the whole matched string, i.e. hey123wow, what666, how888 in this example.
$1 refers to the group, i.e. (\d+) in this example, which captures 123, 666, 888.
$2 would refer to the second group which does not exist in this example.
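A sketch of the digit-extraction example, using the pattern \D*(\d+)\D* (non-digits, then the digits as group 1, then non-digits). The class and method names are made up for illustration.

```java
class DigitsOfWordDemo {
    // Replaces the whole word with the digits captured by group 1.
    static String keepDigits(String word) {
        return word.replaceAll("\\D*(\\d+)\\D*", "$1");
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hey123wow", "what666", "how888"}) {
            System.out.println(w + " -> " + keepDigits(w));
        }
        // $0 is the whole match, so this replacement leaves the text as it was.
        System.out.println("hey123wow".replaceAll("\\D*(\\d+)\\D*", "$0")); // hey123wow
    }
}
```

A word without digits does not match at all, so replaceAll returns it unchanged.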
toChar.replaceAll(ptn, "$1");
It's replacing all occurrences of ptn in toChar with the captured group $1, and we can't tell what that group is, because it is defined by whatever pattern the caller passes in.
Capture groups are patterns inside parentheses ().
For example, in the regex below:
"(\\d+)(cd)"
$0 denotes the complete match
$1 denotes the first capture group (\\d+)
$2 denotes the second capture group (cd)
String str1 = "xyz12cd";
// This will replace `12cd` with the first capture group `12`
str1 = str1.replaceAll("(\\d+)(cd)", "$1");
System.out.println(str1);
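Extending the same example, the sketch below shows $0 and uses both groups in reverse order. The class and method names are made up for illustration.

```java
class GroupReferenceDemo {
    // $2 is (cd) and $1 is (\\d+): the replacement puts them in reverse order.
    static String swapGroups(String s) {
        return s.replaceAll("(\\d+)(cd)", "$2$1");
    }

    public static void main(String[] args) {
        System.out.println("xyz12cd".replaceAll("(\\d+)(cd)", "$1")); // xyz12
        System.out.println("xyz12cd".replaceAll("(\\d+)(cd)", "$0")); // xyz12cd
        System.out.println(swapGroups("xyz12cd"));                     // xyzcd12
    }
}
```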
To learn more about regular expressions, you can refer to the following links:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/tutorial/essential/regex/
