This question already has an answer here:
Java - Regex - Remove comments
(1 answer)
Closed 7 years ago.
I get the error "Invalid escape sequence" on this regex:
(\/\*[^/*]*(?:(?!\/\*|\*\/)[/*][^/*]*)*\*\/)|(\{.*?\})
Are there any other regexes that are more suitable or what can I do to fix this regex?
You need to escape the backslashes one more time. This is a "feature" of Java's string literals: the compiler processes backslash escapes such as '\t' before the regex engine ever sees the text. When it sees, for example, '\/' at the beginning of your regex, it treats it as a string escape sequence and complains because '\/' is not a valid one. To get a single backslash into the regex, you need to write '\\'.
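As a minimal sketch of the doubling rule, the regex from the question could be written as a Java string literal like this. The class name, the `strip` helper, and the sample input are made up for illustration and are not part of the original question.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Every backslash of the regex is doubled, so the Java compiler
    // hands a single backslash on to the regex engine.
    static final Pattern COMMENT_OR_BRACES = Pattern.compile(
        "(\\/\\*[^/*]*(?:(?!\\/\\*|\\*\\/)[/*][^/*]*)*\\*\\/)|(\\{.*?\\})");

    // Hypothetical helper: removes every match of the pattern.
    static String strip(String input) {
        return COMMENT_OR_BRACES.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("a/* c */b{d}e")); // prints "abe"
    }
}
```

This only shows that the doubled form compiles and matches simple input; it does not address the nesting problems mentioned in the answer.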
That being said, this entire approach to handling comments and braces is not going to work generally because it's going to have trouble with a variety of cases like nested blocks in braces. (Just to name one of many.)
(\/\*[^\/*]*(?:(?!\/\*|\*\/)[\/*][^\/*]*)*\*\/)|(\{.*?\})
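This version also escapes the forward slashes inside the character classes. Escaping / matters in languages where slashes mark the start and end of a regex literal; Java has no such delimiters, so there the escape is optional, and each backslash still has to be doubled inside a string literal.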
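Here is a simplified version: (\/\*.*\*\/|\{.*\}). Note that .* is greedy, so each alternative runs from the first opening delimiter to the last closing one on the line.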
This question already has answers here:
Java RegEx meta character (.) and ordinary dot?
(9 answers)
Closed 6 years ago.
I am trying to solve the following task:
Match the pattern abc.def.ghi.jkl, where each variable a,b,c,d,e,f,g,h,i,j,k,l can be any single character except the newline.
For the above task I am matching the input against this regex:
"([^\\n]{3}(.)){3}([^\\n]{3})"
// this is the regex pattern I am using currently
What am I doing wrong? Please help me correct the above regex so that it does not match the incorrect input I have provided in the title. Currently it matches that input somehow: although I have specified {3}, it apparently matches more than 3 characters.
. has a special meaning in regular expression patterns.
If you want to get a "simple dot", you need to quote/escape it (as "\\.").
And that special meaning is (under normal configuration) "any character except line breaks", which exactly matches your other condition, so you can simplify this to
"(...)\\.(...)\\.(...)\\.(...)"
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I have a question about removing unwanted characters, or rather, keeping only certain ones. I have come across something called a String literal, and I don't understand how it can help me achieve my goal. I have seen the following somewhere before but don't understand how to use it.
The String literal "[^\p{Alpha}-']" may be used to match any
character that is NOT alphabetic, a dash, or apostrophe; you may find
this useful when using replaceAll()
I understand what replaceAll() does, but I don't understand the little codes like [a-zA-Z] that you can use in it, or where to look to find more of them. So I pretty much want to do what the quote says, and keep only the letters and some punctuation.
What you are describing is called regular expressions, or regex for short. It's a tool implemented in many programming languages (including Java) that lets you process strings in a single line of code where the hand-written equivalent would be longer and more tedious.
I suggest this link for a more in depth tutorial.
replaceAll() uses regexes.
There's too much to explain in a single post, but I will explain a little.
Here's a regex: [^A-Za-z.?!]
[] signifies a character class. It will match one of the contained characters (as modified by meta-characters).
^ When this is the first character in a char class, it is a meta-character meaning NOT.
A-Z signifies a range. Any character whose ASCII/Unicode value lies between those two endpoints (inclusive) will be matched.
The ., ?, ! are treated as literals (in other contexts they can become meta-characters).
So, the regex, if quoted and put in a replaceAll(), will replace every character that is not a letter, ., ?, or !.
The second parameter of replaceAll() also treats some characters specially. For example, $1 does not mean the literal text $1; it refers to the first capture group.
You'll need to learn about more advanced regex things (capture groups) before you use $1.
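As a rough illustration of both points, here is a sketch with a negated character class and a replacement that uses capture groups. The class name, method names, and sample strings are invented for this example.

```java
public class ReplaceAllDemo {
    // Removes every character that is not a letter, '.', '?' or '!'.
    static String keepLettersAndPunctuation(String s) {
        return s.replaceAll("[^A-Za-z.?!]", "");
    }

    // $1 and $2 in the replacement refer to the first and second
    // capture group (the parenthesised parts of the regex).
    static String swapTwoWords(String s) {
        return s.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndPunctuation("Hello, World! 123?")); // HelloWorld!?
        System.out.println(swapTwoWords("John Smith"));                      // Smith John
    }
}
```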
This question already has answers here:
Java doesn't work with regex \s, says: invalid escape sequence
(3 answers)
Closed 2 years ago.
I have a very long regular expression that seems to be having issues, but only when imported from a text file. I've narrowed it down to the following section (shown here as a literal String):
"(?i)(?<!\\w)\\w{2,3}(?=\\))"
As you can see, near the end, I am trying to escape a closing parenthesis for a lookahead. Now, if this is hard-coded, like:
Pattern myPattern = Pattern.compile("(?i)(?<!\\w)\\w{2,3}(?=\\))");
It works completely as expected. If, however, I read it from a text file, like:
File patternFile = new File("patterns.txt");
List<String> patternText = FileUtils.readLines(patternFile);
String ucText = patternText.get(0).trim();
Pattern myPattern = Pattern.compile(ucText);
Then I get the error message:
Exception in thread "Thread-4" java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 25
(?i)(?<!\\w)\\w{2,3}(?=\\))
^
So, why is this happening? Why is escaping a closing parenthesis legal when hard-coded, but not when reading from a text file?
You're writing a Java string literal. \) is not a legal escape code for Java string literals.
You need to escape every backslash with \\ to create a string with a single backslash for the regex.
only when imported from a text file
Print the string you read from the file to the console.
If it prints (?i)(?<!\w)\w{2,3}(?=\)), it's OK;
if it prints with the backslashes doubled, you have to un-escape them.
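The difference can be sketched without a file at all. The example below assumes the text file contains the pattern exactly as it was typed in the Java source, with doubled backslashes; the class name and the `compiles` helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class FilePatternDemo {
    // Hypothetical helper: true if the text is a valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The compiler turns each \\ of the literal into one backslash.
        String fromLiteral = "(?i)(?<!\\w)\\w{2,3}(?=\\))";
        // A file line is not processed by the compiler, so doubled
        // backslashes stay doubled (written here with four per pair).
        String fromFile = "(?i)(?<!\\\\w)\\\\w{2,3}(?=\\\\))";

        System.out.println(fromLiteral);           // single backslashes
        System.out.println(fromFile);              // doubled backslashes
        System.out.println(compiles(fromLiteral)); // true
        System.out.println(compiles(fromFile));    // false
    }
}
```

In the doubled form, `\\)` is a literal backslash followed by an unescaped `)`, which closes the lookahead early and leaves the final `)` unmatched.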
A good way to un-escape the escape character is to do a global find/replace
(this is 90% of the parsing)
Find "(?x)\\\\ \\\\"
Replace "\\\\"
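In Java, that find/replace might look like the sketch below. The class and method names are assumptions, and the input stands in for a line read from the file with doubled backslashes.

```java
import java.util.regex.Pattern;

public class UnescapeDemo {
    // Collapses every pair of backslashes into a single backslash.
    // With (?x) the space inside the regex is ignored, so the regex
    // matches two literal backslashes; the replacement "\\\\" yields one.
    static String unescapeBackslashes(String s) {
        return s.replaceAll("(?x)\\\\ \\\\", "\\\\");
    }

    public static void main(String[] args) {
        // Stand-in for the file line: (?i)(?<!\\w)\\w{2,3}(?=\\))
        String fromFile = "(?i)(?<!\\\\w)\\\\w{2,3}(?=\\\\))";
        String fixed = unescapeBackslashes(fromFile);
        System.out.println(fixed);
        // The un-escaped text now compiles and finds "abc" in "(abc)".
        System.out.println(Pattern.compile(fixed).matcher("(abc)").find());
    }
}
```

If you control the file, storing the pattern with single backslashes in the first place avoids this step.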
Un-escaping the remaining escape sequences is a case-by-case matter.
It depends on the character after the backslash and on what it should be replaced with,
or whether it should be left alone. This is mostly language specific,
but you can roll your own. For this, the basics are:
Find "(?xs)\\\\ (.)"
Replace (roll your own)
This question already has answers here:
Java, escaping (using) quotes in a regex
(2 answers)
Closed 8 years ago.
Sorry I could not find anything that works and hence I am asking this question. I have a basic string that could have feet("), inch(') or comma(,). All I want to do is identify those and escape them before further processing. Not having any luck with Regex, as you can tell I am not good with it yet. Need help. Thanks much!
Someone hinted at it in your comments, but it's not entirely correct, since a single String#replace call handles only one literal target and you want to handle more than one character.
Say you have some function foo() that returns some regular expression that isn't escaped properly, with respect to the "\"" char, or the "\'" char:
String regexp = Bar.foo();
regexp = regexp.replaceAll("(\\\"|\\\')", "\\\\$0");
Pattern yourPatternName = Pattern.compile(regexp);
A little explanation: in a Java string literal, a backslash starts an escape sequence, such as '\n' for newline or '\t' for tab. Once the compiler has processed such an escape, the string no longer contains the literal characters '\' + 'n', only the single newline character. So you need to escape the backslash itself, so that when the regular expression is compiled, Pattern#compile sees the two characters "\n" rather than the single newline character. To get a literal backslash you therefore place a second '\' in front of it, and since the replacement string of replaceAll() treats a backslash as special too, that backslash has to be escaped one more time.
As for the comma, you don't need to escape that. You only need to escape special characters. For a list of the ones that Pattern recognizes, you can check here:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
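A self-contained variant of the snippet above, as a sketch: it uses a character class instead of the alternation, and the class name, method name, and sample input are made up for illustration.

```java
public class QuoteEscapeDemo {
    // Puts a backslash in front of every double or single quote.
    // In the replacement, "\\\\" produces one literal backslash and
    // $0 stands for the whole match (the quote that was found).
    static String escapeQuotes(String s) {
        return s.replaceAll("[\"']", "\\\\$0");
    }

    public static void main(String[] args) {
        // Input:  5'10", ok
        // Output: 5\'10\", ok
        System.out.println(escapeQuotes("5'10\", ok"));
    }
}
```

The comma passes through unchanged, in line with the note that only special characters need escaping.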
This question already has answers here:
Removing a substring between two characters (java)
(3 answers)
Closed 9 years ago.
I want to remove a string that is between two characters, and also the characters themselves. For example:
I want to find every occurrence of a string between "#?" and ";" and remove it together with those delimiters.
From this
"this #?anystring; is #?anystring2jk; test"
To This
"this is test"
How could I do it in Java?
#computerish your answer executes with errors in Java. The modified version works.
myString.replaceAll("#\\?.*?;", "");
The reason is that the ? must be preceded by two backslashes; otherwise the Java compiler rejects the string with an "illegal escape character" error. You escape the ? character with a backslash. However, the backslash character (\) is itself special in a Java string literal, so you need to escape it as well with another backslash.
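Applied to the string from the question, the corrected call can be sketched as follows. The class and method names are hypothetical, and the second method, which also swallows trailing whitespace, is an addition that was not part of the original answer.

```java
public class RemoveMarkedDemo {
    // The corrected call: \\? in the literal becomes \? in the regex,
    // which matches a literal question mark.
    static String removeMarked(String s) {
        return s.replaceAll("#\\?.*?;", "");
    }

    // Variant (an assumption about the desired result): also removes
    // whitespace after the ';' so no double spaces are left behind.
    static String removeMarkedAndSpace(String s) {
        return s.replaceAll("#\\?.*?;\\s*", "");
    }

    public static void main(String[] args) {
        String input = "this #?anystring; is #?anystring2jk; test";
        System.out.println(removeMarked(input));         // "this  is  test" (two spaces each)
        System.out.println(removeMarkedAndSpace(input)); // "this is test"
    }
}
```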
Use regex:
myString.replaceAll("#\?.*?;", "");
string.replaceAll(start+".*"+end, "")
is the easy starting point. You might have to deal with greediness of the regex operators, however.
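The greediness issue can be sketched like this. The helper names are invented, and wrapping the delimiters in Pattern.quote is an addition to the answer's one-liner so that characters such as ? in the delimiters are taken literally.

```java
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Greedy: .* runs from the first start delimiter to the LAST end delimiter.
    static String removeGreedy(String s, String start, String end) {
        return s.replaceAll(Pattern.quote(start) + ".*" + Pattern.quote(end), "");
    }

    // Reluctant: .*? stops at the FIRST end delimiter after each start.
    static String removeReluctant(String s, String start, String end) {
        return s.replaceAll(Pattern.quote(start) + ".*?" + Pattern.quote(end), "");
    }

    public static void main(String[] args) {
        String input = "this #?anystring; is #?anystring2jk; test";
        System.out.println(removeGreedy(input, "#?", ";"));    // "this  test"
        System.out.println(removeReluctant(input, "#?", ";")); // "this  is  test"
    }
}
```

The greedy version also deletes the word "is" between the two marked spans, which is usually not what is wanted.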