This question already has answers here:
Java, escaping (using) quotes in a regex
(2 answers)
Closed 8 years ago.
Sorry, I could not find anything that works, hence this question. I have a basic string that could contain feet ('), inch (") or comma (,) characters. All I want to do is identify those and escape them before further processing. I'm not having any luck with regex; as you can tell, I am not good with it yet. Need help. Thanks much!
Someone hinted at it in your comments, but it's not entirely correct: the String#replace(char, char) overload only swaps one character for another, and you want to insert more than one character as the replacement.
Say you have some function foo() that returns some regular expression that isn't escaped properly, with respect to the "\"" char, or the "\'" char:
String regexp = Bar.foo();
regexp = regexp.replaceAll("(\\\"|\\\')", "\\\\$0");
Pattern yourPatternName = Pattern.compile(regexp);
A little explanation: In Java, you need to escape certain special characters, such as n to mean newline ('\n'), or t to mean tab ('\t'). Once escaped, they are no longer the literal characters '\' + 'n', for example. So you need to escape them a second time, so that when the regular expression is compiled, Pattern#compile sees the two characters "\n" rather than the single newline character. To escape the '\n' character you place a new '\' character in front of it, and since we are writing a java.lang.String literal, that backslash has to be escaped one more time.
As for the comma, you don't need to escape that. You only need to escape special characters. For a list of the ones that Pattern recognizes, you can check here:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
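To make the replaceAll call above concrete for the original question, here is a minimal sketch. The class name QuoteEscaper and the method escapeQuotes are made up for illustration, and it assumes that "escaping" means putting a backslash in front of each quote character. The comma is left alone.

```java
class QuoteEscaper {
    // Hypothetical helper, not part of any library.
    // Regex [\"'] is a character class matching a double or a single quote.
    // In the replacement, \\\\ yields one literal backslash and $0 is the matched text.
    static String escapeQuotes(String input) {
        return input.replaceAll("[\"']", "\\\\$0");
    }

    public static void main(String[] args) {
        // prints 6\' 2\", approx
        System.out.println(escapeQuotes("6' 2\", approx"));
    }
}
```

A character class is used here instead of the alternation (\"|\') from the answer. Both match the same two characters.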
This question already has answers here:
Java regular expressions and dollar sign
(5 answers)
Closed 3 years ago.
How to replace all "$$$" present in a String?
I tried
story.replaceAll("$$$","\n")
This displays a warning: Anchor $ in unexpected position and the code fails to work. The code takes the "$" symbol as an anchor for a regular expression. I just need to replace that symbol.
Is there any way to do this?
"$" is a special character for regular expressions.
Try the following:
System.out.println(story.replaceAll("\\$\\$\\$", "\n"));
We are escaping the "$" character with a '\' in the above code.
There are several ways you can do this. Which one you pick depends on what you want to do and how elegant you want the solution to be:
String replacement = "\n"; // The replacement string
// The first way:
story.replaceAll("[$]{3}", replacement);
// Second way:
story.replaceAll("\\${3}", replacement);
// Third way:
story.replaceAll("\\$\\$\\$", replacement);
You can replace any special characters (Regular Expression-wise) by escaping that character with a backslash. Since Java-literals use the backslash as escaping-character too, you need to escape the backslash itself.
story.replaceAll("\\${3}", something);
By putting {3} after the $, you say that it should be found exactly three times. This looks a bit more elegant than "\\$\\$\\$".
something is your replacement, for example "" or "\n", depending on what you want.
This will work:
story.replaceAll("\\$\\$\\$","\n")
You can do this for any special character.
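As a side-by-side sketch of the options, the class below (DollarReplace is a hypothetical name) compares the escaped regex from the answers with two standard-library alternatives that avoid hand-escaping: String#replace(CharSequence, CharSequence), which treats its first argument as literal text, and Pattern.quote, which wraps a string so the regex engine reads it literally. Note that strings are immutable, so the result has to be used or assigned.

```java
import java.util.regex.Pattern;

class DollarReplace {
    // Literal replacement: no regex involved, so "$" needs no escaping.
    static String viaLiteralReplace(String story) {
        return story.replace("$$$", "\n");
    }

    // Pattern.quote turns "$$$" into a regex that matches it literally.
    static String viaQuotedPattern(String story) {
        return story.replaceAll(Pattern.quote("$$$"), "\n");
    }

    // Hand-escaped regex: \$ is a literal dollar sign, {3} means exactly three.
    static String viaEscapedRegex(String story) {
        return story.replaceAll("\\${3}", "\n");
    }

    public static void main(String[] args) {
        String story = "first$$$second";
        System.out.println(viaLiteralReplace(story));
    }
}
```

If no regex features are needed, the plain replace variant is probably the simplest choice.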
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I have a question about removing unwanted character, or in a better sense, keep only certain ones. I have stumbled upon something called String literal and I don't understand how it can help me with achieving my goal. I stumbled upon this somewhere before but don't understand how to use it.
The String literal "[^\p{Alpha}-']" may be used to match any
character that is NOT alphabetic, a dash, or apostrophe; you may find
this useful when using replaceAll()
I understand what replaceAll() does, but other things I don't understand are the little codes like [a-zA-Z] that you can use in it and where to look to find more of them. So I pretty much want to do what the quotes says, and only keep the letters and some punctuation.
What you are describing is called regular expressions, or regex for short. It's a tool implemented in many programming languages (including Java) which lets you handle strings in one line of code where the alternative would be more complicated and annoying.
I suggest this link for a more in depth tutorial.
replaceAll() uses regexes.
There's too much to explain in a single post, but I will explain a little.
Here's a regex: [^A-Za-z.?!]
[] signifies a character class. It will match one of the contained characters (as modified by meta-characters).
^ When this is the first character in a char class, it is a meta-character meaning NOT.
A-Z signifies a range. Anything between those ASCII/Unicode values will be matched.
The ., ?, ! are treated as literals (in other contexts they can become meta-characters).
So the regex, if quoted and passed to replaceAll(), will replace every character that is not alphabetic, ., ?, or !.
The second parameter of replaceAll() also treats some characters specially; for example, $1 does not mean the literal text $1 but refers to the first capture group.
You'll need to learn about more advanced regex things (capture groups) before you use $1.
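Here is a small sketch of the points above. The class and method names (CharFilter and so on) are invented for the example. The second method is one way to write the pattern from the quoted hint as a Java string literal: the backslash in \p{Alpha} is doubled, and the dash is moved to the end of the class so it is clearly a literal dash rather than a range.

```java
class CharFilter {
    // Removes everything that is not an ASCII letter, '.', '?' or '!'.
    static String keepLettersAndPunctuation(String s) {
        return s.replaceAll("[^A-Za-z.?!]", "");
    }

    // Removes everything that is not alphabetic, an apostrophe, or a dash.
    static String keepLettersDashApostrophe(String s) {
        return s.replaceAll("[^\\p{Alpha}'-]", "");
    }

    // $1 and $2 in the replacement refer to the first and second capture group.
    static String swapAroundDash(String s) {
        return s.replaceAll("(\\d+)-(\\d+)", "$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndPunctuation("Hello, World! 123?")); // HelloWorld!?
        System.out.println(keepLettersDashApostrophe("don't-stop 42!"));     // don't-stop
        System.out.println(swapAroundDash("2024-05"));                       // 05/2024
    }
}
```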
I'm currently trying to deal with "leetspeak" in regex. I have a class with a letter, and it will be filled with possible "leet" alternatives in it. However, some of those alternatives are multiple characters long, and I'm having a hard time figuring out how to include those in a class. For example
[kK"|<"]
Now I understand quotation marks don't work like that, but I can't find a way to have this match either k, K, or |< without it matching the | or < individually.
My question is how can I include a string of characters within a class?
Also, I want to make sure it's treated literally, so I will need to include \Q and \E somewhere in the solution.
You could use a class for both k and K then match |< by itself.
"[kK]|\\|<"
If you are wanting to include \Q and \E ...
"[kK]|\\Q|<\\E"
"k|K|\\|<"
The pipe allows you to "or" a multicharacter string and escaping it with a backslash allows you to include a pipe in such a string. You'll need to escape the backslash with another backslash if the string is in quotation marks, so the backslash can be placed as such in the Regex.
Use this regex:
[kK]|\|<
In Java, you need to escape the backslash, so this becomes
[kK]|\\|<
Option 2: escape the leet
As you suggested yourself, using \\Q some leet \\E lets you match anything without worrying that you may need to escape a special regex character.
Explanation
The character class [kK] matches one char that is either a k or a K
OR |
\|< matches |<
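The alternation can be checked with a short sketch. LeetK, isK and normalize are hypothetical names, and only the single letter k is handled here. The pattern uses \Q...\E so the |< part is taken literally.

```java
import java.util.regex.Pattern;

class LeetK {
    // [kK] matches one letter; \Q|<\E matches the two characters |< literally.
    static final Pattern K = Pattern.compile("[kK]|\\Q|<\\E");

    // True only if the whole input is one of k, K, or |<.
    static boolean isK(String s) {
        return K.matcher(s).matches();
    }

    // Replaces every k, K, or |< in the text with a plain k.
    static String normalize(String s) {
        return K.matcher(s).replaceAll("k");
    }

    public static void main(String[] args) {
        System.out.println(isK("|<"));           // true
        System.out.println(isK("|"));            // false
        System.out.println(normalize("|<ic|<")); // kick
    }
}
```

Because |< is matched as a unit outside the character class, a lone | or < is left untouched.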
Just something I don't understand the full meaning behind. I understand that I need to escape any special meaning characters if I want to find them using regex. And I also read somewhere that you need to escape the backslash in Java if it's inside a String literal. My question though is if I "escape" the backslash, doesn't it lose its meaning? So then it wouldn't be able to escape the following plus symbol?
Throws an error (but shouldn't it work since that's how you escape those special characters?):
replaceAll("\+\s", ""));
Works:
replaceAll("\\+\\s", ""));
Hopefully that makes sense. I'm just trying to understand the functionality behind why I need those extra slashes when the regex tutorials I've read don't mention them. And things like "\+" should find the plus symbol.
There are two "escapings" going on here. The first backslash is to escape the second backslash for the Java language, to create an actual backslash character. The backslash character is what escapes the + or the s for interpretation by the regular expression engine. That's why you need two backslashes -- one for Java, one for the regular expression engine. With only one backslash, Java reports \s and \+ as illegal escape characters -- not for regular expressions, but for an actual character in the Java language.
The idea behind the extra backslashes is that the first backslash '\' is the escape for the Java string and the second backslash '\' is the escape for the regex.
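One way to see the two layers is to look at what the string literal actually contains once the Java compiler has processed it. In this sketch (EscapeLayers is a made-up class name) the literal "\\+\\s" holds only four characters, which is exactly the regex \+\s that the tutorials show.

```java
class EscapeLayers {
    // Source code has 6 characters between the quotes; the resulting String has 4: \ + \ s
    static final String REGEX = "\\+\\s";

    // Removes a plus sign that is followed by one whitespace character.
    static String stripPlusSpace(String s) {
        return s.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                  // \+\s
        System.out.println(REGEX.length());         // 4
        System.out.println(stripPlusSpace("a+ b")); // ab
    }
}
```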
I want to replace the question mark (?) in a string with some other words. What is the regular expression for question mark.
For example, I want to replace question mark in "word=?" to something else, say "stackoverflow". Then the result would be "word=stackoverflow".
What is the syntax in java?
string.replaceFirst("\\?", yourWord)
That will replace the first occurrence of a '?' in your string with whatever yourWord is.
If you want to replace every '?' with yourWord then use string.replaceAll("\\?", yourWord).
See the javadocs for more info.
As a general rule, you can take the "magic" out of magic characters such as "?", "*", "." and so forth, by using the escape character, which is a backslash ("\").
The tricky part is that in Java, in a string, the backslash is ALREADY used as an escape, so to construct a Java String whose value is "\?", you have to code it as "\\?" so as to escape the escape character.
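Putting the pieces together, here is a minimal sketch with invented names (QuestionMark, fillFirst, fillAll). One extra assumption beyond the answers: the replacement word might itself contain $ or \, which replaceFirst and replaceAll treat specially, so the sketch passes it through Matcher.quoteReplacement to keep it literal.

```java
import java.util.regex.Matcher;

class QuestionMark {
    // "\\?" is the regex \? which matches a literal question mark.
    static String fillFirst(String template, String word) {
        return template.replaceFirst("\\?", Matcher.quoteReplacement(word));
    }

    static String fillAll(String template, String word) {
        return template.replaceAll("\\?", Matcher.quoteReplacement(word));
    }

    public static void main(String[] args) {
        System.out.println(fillFirst("word=?", "stackoverflow")); // word=stackoverflow
        // Without quoteReplacement, "$5" would be read as a reference to group 5.
        System.out.println(fillFirst("price=?", "$5"));           // price=$5
    }
}
```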