Is using "\\\\" to match '\' with Regex in Java the most Readable Way? - java

I know that the following works, but it is not that readable. Is there any way to make it more readable in the code itself without adding a comment?
//Start her off
String sampleregex = "\\\\";
if (input.matches(sampleregex))
//do something
//do some more

Why not
if (input.contains("\\")) {
}
since you simply appear to be looking for a backward slash ?
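One difference is worth keeping in mind when choosing between the two: String.matches has to match the entire input, so the original check is true only when the input is exactly one backslash, while contains looks anywhere in the string. A minimal sketch (the class name and sample inputs are made up for illustration):

```java
// Hypothetical demo class; the sample inputs are illustrative only.
class BackslashCheck {
    // True only when the whole input is exactly one backslash,
    // because String.matches must match the entire string.
    static boolean isSingleBackslash(String input) {
        return input.matches("\\\\");
    }

    // True when a backslash occurs anywhere in the input; no regex involved.
    static boolean containsBackslash(String input) {
        return input.contains("\\");
    }

    public static void main(String[] args) {
        System.out.println(isSingleBackslash("\\"));    // true
        System.out.println(isSingleBackslash("a\\b"));  // false
        System.out.println(containsBackslash("a\\b"));  // true
    }
}
```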

Assuming you mean "\\\\" instead of "////":
You could quote it with \Q and \E, which removes the regex layer of escaping: "\\Q\\\\E", but that's not that much better. You could also use Pattern.quote("\\") to have it escaped at runtime. But personally, I'd just stick with "\\\\".
(As an aside, you need four of them because \ is the escape character in both the regex engine and Java string literals. You escape once so the regex engine knows you're not trying to escape anything else (so that's \\); then you escape both of those so Java knows you're not escaping something in the string (so that's \\\\).)
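The two layers can be checked directly. This is only a sketch; the class and constant names are invented for the example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing the two escaping layers.
class EscapeLayers {
    // Four backslashes in source become two characters at runtime,
    // which the regex engine reads as one literal backslash.
    static final String REGEX_FOR_BACKSLASH = "\\\\";

    // Pattern.quote builds an equivalent literal pattern at runtime.
    static final String QUOTED_BACKSLASH = Pattern.quote("\\");

    public static void main(String[] args) {
        System.out.println(REGEX_FOR_BACKSLASH.length());         // 2
        System.out.println("\\".matches(REGEX_FOR_BACKSLASH));    // true
        System.out.println("\\".matches(QUOTED_BACKSLASH));       // true
        System.out.println("\\\\".matches(REGEX_FOR_BACKSLASH));  // false: the input holds two backslashes
    }
}
```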

/ is not a regex metacharacter, so the regex string "/" matches a single slash, and "////" matches four in a row.
I imagine you meant to ask about matching a single backslash, rather than a forward slash, in which case, no, you need to put "\\\\" in your regex string literal to match a single backslash. (And I needed to enter eight to make four show up on SO--damn!)

My solution is similar to Soldier.moth's, but with a twist. Create a constants file which contains common regular expressions and keep adding to it. The constants can even be combined, providing a layer of abstraction for building regular expressions, but in the end they still often end up messy.
public static final String SINGLE_BACKSLASH = "\\\\";
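A rough sketch of what combining such constants might look like. Everything apart from SINGLE_BACKSLASH is a hypothetical name chosen for the example:

```java
import java.util.regex.Pattern;

// Hypothetical constants holder; every name other than SINGLE_BACKSLASH is illustrative.
final class RegexConstants {
    private RegexConstants() {}

    static final String SINGLE_BACKSLASH = "\\\\";
    static final String WORD = "\\w+";

    // Composed from the pieces above: word, backslash, word (e.g. dir\file).
    static final Pattern TWO_PART_PATH =
            Pattern.compile(WORD + SINGLE_BACKSLASH + WORD);

    public static void main(String[] args) {
        System.out.println(TWO_PART_PATH.matcher("dir\\file").matches()); // true
        System.out.println(TWO_PART_PATH.matcher("dir/file").matches());  // false
    }
}
```

The named pieces read better at the call site, though the composed expression is still only as clear as its parts.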

The one solution I've thought of is to do
String singleSlash = "\\\\";
if(input.matches(singleSlash))
//...

Using better names for your variables and constants, and composing them step by step is a good way to do without comments, for example:
final String backslash = "\\";
final String regexEscapedBackslash = backslash + backslash;
if (input.matches(regexEscapedBackslash)) {
    // ...
}

Related

Remove everything from a string upto a certain character and optionally a string if it follows too

I am looking to write a regex that removes all characters up to the first &emsp and, if a (new section) follows the &emsp, removes that as well. But the following regex doesn't seem to work. Why? How do I correct this?
String removeEmsp =" “[<centd>[</centd>]§ 431:10A–126 (new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
Pattern removeEmspPattern1 = Pattern.compile("(.*( (\\(new section\\)))?)(.*)", Pattern.MULTILINE);
System.out.println(removeEmspPattern1.matcher(removeEmsp).replaceAll("$2"));
Have you tried String.split? This creates an array of strings from a string, based on a delimiter.
Once you have the string split, just select the elements of the array that you need for the print statement.
Your regex is very long and I do not want to debug it. The tip, however, is that some characters have special meaning in regular expressions: square brackets define character classes, parentheses define groups, and so on. (The ampersand is not one of them; in Java it is only special as && inside a character class.) Such characters must be escaped if you want them to be interpreted as plain characters and not as regex syntax. To escape a special character, write \ in front of it. But \ is also the escape character in Java string literals, so it has to be doubled.
Escaping a character that is not special is harmless, so to replace every ampersand by the letter A you can write str.replaceAll("\\&", "A"), although "&" alone would work too.
Now you have all information you need. Try to start from simpler regex and then expand it to what you need. Good luck.
EDIT
BTW, parsing XML and/or HTML using regular expressions is possible but strongly discouraged. Use a dedicated parser for such formats.
Try this:
String removeEmsp =" “[<centd>[</centd>]§ 431:10A–126 (new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
System.out.println(removeEmsp.replaceFirst("^.*?\\ (\\(new\\ssection\\))?", ""));
System.out.println(removeEmsp.replaceAll("^.*?\\ (\\(new\\ssection\\))?", ""));
Output:
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
It will remove everything up to " " and optionally, the following "(new section)" text if any.
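For a self-contained version of the same idea, here is a sketch that assumes the separator is the em space character (U+2003), written as a Unicode escape so that it stays visible in the source. The class name, method name and sample inputs are hypothetical:

```java
// Hypothetical helper; assumes the separator is the em space, U+2003.
class EmSpaceTrimmer {
    // Lazily match everything up to the first em space, then an
    // optional literal "(new section)" directly after it.
    private static final String UP_TO_EM_SPACE =
            "^.*?\u2003(\\(new section\\))?";

    static String stripLeading(String text) {
        return text.replaceFirst(UP_TO_EM_SPACE, "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeading("Sec. 1\u2003(new section)Title")); // Title
        System.out.println(stripLeading("Sec. 1\u2003Title"));              // Title
        System.out.println(stripLeading("no separator"));                   // no separator
    }
}
```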

String.replaceAll(...) of Java not working for \\ and \

I want to convert the directory path from:
C:\Users\Host\Desktop\picture.jpg
to
C:\\Users\\Host\\Desktop\\picture.jpg
I have tried replaceAll() and the other replace functions, but they do not work.
How can I do this?
I have printed the result, and it gives me the one I wanted, i.e.
C:\Users\Host\Desktop\picture.jpg
but now when I pass this variable to open the file, I get this exception. Why?
java.io.FileNotFoundException: C:\Users\Host\Desktop\picture.jpg
EDIT: Changed from replaceAll to replace - you don't need a regex here, so don't use one. (It was a really poor design decision on the part of the Java API team, IMO.)
My guess (as you haven't provided enough information) is that you're doing something like:
text.replace("\\", "\\\\");
Strings are immutable in Java, so you need to use the return value, e.g.
String newText = oldText.replace("\\", "\\\\");
If that doesn't answer your question, please provide more information.
(I'd also suggest that usually you shouldn't be doing this yourself anyway - if this is to include the information in something like a JSON response, I'd expect the wider library to perform escaping for you.)
Note that the doubling is required as \ is an escape character for Java string (and character) literals. Note that as replace doesn't treat the inputs as regular expression patterns, there's no need to perform further doubling, unlike replaceAll.
EDIT: You're now getting a FileNotFoundException because there isn't a filename with double backslashes in - what made you think there was? If you want it as a valid filename, why are you doubling the backslashes?
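To make both points concrete: the doubling exists only in the source literal, and replace hands back a new string instead of changing the old one. A small sketch, where the class and method names are made up:

```java
// Hypothetical demo class; the path is the one from the question.
class PathEscaping {
    // Each backslash is doubled in source, but the runtime string holds
    // single backslashes, which is the form a file name needs.
    static final String PATH = "C:\\Users\\Host\\Desktop\\picture.jpg";

    // Returns a copy with every backslash doubled; the argument itself
    // is left unchanged because Strings are immutable.
    static String doubleBackslashes(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String doubled = doubleBackslashes(PATH);
        System.out.println(PATH.length());     // 33
        System.out.println(doubled.length());  // 37
    }
}
```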
You have to use :
String t2 = t1.replaceAll("\\\\", "\\\\\\\\");
or (without pattern) :
String t2 = t1.replace("\\", "\\\\");
Each "\" has to be preceded by another "\". But that is also true for the preceding "\", so you have to write four backslashes each time you want one in a regex.
In string literals, \ is used by default as the escape character, so to get "\" in a string you have to write "\\", and for "\\" (i.e. two backslashes) you write "\\\\". This will solve your problem, and the same applies to other symbols such as ".
Three options:
1. Replace double backslashes to one (not what you asked)
You have to escape the backslash by backslashes. Like this:
String newPath = oldPath.replaceAll("\\\\\\\\", "\\\\");
The first parameter needs to be escaped twice: once for the Java compiler and once because it is a regular expression. To get a backslash into a Java string you put a backslash in front of it, which gives \\; this compiles to \. But you have to escape that backslash once again, because the first parameter of the replaceAll method is a regular expression. So you add another backslash, which itself needs to be escaped for the compiler, and you end up with \\\\. These four backslashes represent one backslash in the regex. Since you want to match two backslashes, you use 8 backslashes.
The second parameter of the replaceAll method is not a regular expression, but backslashes in it have to be escaped as well: once for the Java compiler and once for the replacement syntax, giving \\\\. This compiles to two backslashes, which replaceAll interprets as one backslash.
2. Replace single backslash to a pair of backslashes (what you asked)
String newPath = oldPath.replaceAll("\\\\", "\\\\\\\\");
Same logic as above.
3. Use replace() instead of replaceAll().
String newPath = oldPath.replace("\\", "\\\\");
The difference is that the replace() method doesn't use regular expressions, so you don't have to escape every backslash twice for the first parameter.
Hopefully, I explained well...
-- Edit: Fixed error, as pointed out by xehpuk --
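The variants can be put side by side. The sketch below also shows Pattern.quote and Matcher.quoteReplacement, which do the regex-level escaping at runtime; the class and method names are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the ways to double a backslash.
class BackslashDoubling {
    // Regex and replacement both escaped by hand.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Same call, but the library escapes the pattern and the replacement.
    static String viaQuoting(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    // No regex at all: only the Java-literal escaping remains.
    static String viaReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    // The reverse direction: two backslashes become one.
    static String collapse(String s) {
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    public static void main(String[] args) {
        String once = "a\\b"; // a, one backslash, b
        System.out.println(viaReplaceAll(once).equals(viaReplace(once))); // true
        System.out.println(viaQuoting(once).equals(viaReplace(once)));    // true
        System.out.println(collapse(viaReplace(once)).equals(once));      // true
    }
}
```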

Regular Expression Java Error

I can't run this regular expression on Java:
String regex = "/^{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})}$/";
String data = "{m:\"texttexttext\",s:1231,r:23123,t:1}";
Pattern p = Pattern.compile(regex);
Matcher a = p.matcher(data);
The same regex with the same data works fine in an online regex tester (such as http://gskinner.com/RegExr/)!
Two problems:
In Java (unlike Perl etc.), regexes are not wrapped in / characters.
You must escape your { literals:
Try this:
String regex = "^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$";
There are two problems:
The forward slashes aren't part of the pattern itself, and shouldn't be included.
You need to escape the braces at the start and end, as otherwise they'll be treated as repetition quantifiers. This may not be the case in other regular expression implementations, but it's certainly the case in Java - when I tried just removing the slashes, I got an exception in Pattern.compile.
Try this:
String regex="^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$";
(That works with your sample data.)
As an aside, if this is meant to be parsing JSON, I would personally not try to do it with regular expressions - use a real JSON parser instead. It'll be a lot more flexible in the long run.
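For completeness, a sketch of the corrected pattern applied to the sample data, with the four captured fields pulled out. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the corrected pattern from the answers.
class MessageParser {
    private static final Pattern MESSAGE = Pattern.compile(
            "^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$");

    // Returns the four captured fields, or null when the input does not match.
    static String[] parse(String data) {
        Matcher m = MESSAGE.matcher(data);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String data = "{m:\"texttexttext\",s:1231,r:23123,t:1}";
        System.out.println(String.join(",", parse(data))); // texttexttext,1231,23123,1
    }
}
```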
Two things:
Java does not require you to have any kind of begin/end character, so you can drop the / chars.
Also, Java requires you to escape any regex metacharacters if you want to match them. In your case, the brace characters '{' and '}' need to be preceded by a double backslash (one for java escape, one for regex escape):
"^\\{m:\"(.*)\",s:([0-9]{1,15}),r:([0-9]{1,15}),t:([0-9]{1,2})\\}$"

Refactor Regex Pattern - Java

I have the following string to match, aaaa_bb_cc, and have written a regex pattern like
\\w{4}+\\_\\w{2}\\_\\w{2} and it works. Is there a simpler regex that can do the same?
You don't need to escape the underscores:
\w{4}+_\w{2}_\w{2}
And you can collapse the last two parts, if you don't capture them anyway:
\w{4}+(?:_\w{2}){2}
Doesn't get shorter, though.
(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
I sometimes do what I call "meta-regexing" as follows:
String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"
Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".
If x really needs to be replaced with [a-zA-Z0-9], then just do it in the one place (instead of 3 places).
Other examples
Regex for metamap in Java
How do I convert CamelCase into human-readable names in Java?
Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.
Looks like your \w does not need to match underscore, so you can use [a-zA-Z0-9] instead
[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
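The underscore point is easy to verify. A short sketch, with made-up class and constant names, comparing the two patterns on a string of ten underscores:

```java
// Hypothetical demo class comparing \w with an explicit alphanumeric class.
class UnderscoreCheck {
    static final String WITH_WORD_CLASS = "\\w{4}_\\w{2}_\\w{2}";

    // The character class is defined in one place and reused three times.
    static final String ALNUM = "[a-zA-Z0-9]";
    static final String WITH_ALNUM =
            ALNUM + "{4}_" + ALNUM + "{2}_" + ALNUM + "{2}";

    public static void main(String[] args) {
        // Ten underscores, grouped as 4 + 1 + 2 + 1 + 2.
        String underscores = "____" + "_" + "__" + "_" + "__";
        System.out.println("aaaa_bb_cc".matches(WITH_WORD_CLASS)); // true
        System.out.println("aaaa_bb_cc".matches(WITH_ALNUM));      // true
        System.out.println(underscores.matches(WITH_WORD_CLASS));  // true
        System.out.println(underscores.matches(WITH_ALNUM));       // false
    }
}
```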

Escaping a String from getting regex parsed in Java

In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".
String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.
As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by #diastrophism):
Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
Note that if you are using regular expressions to test whether a string is inside a larger string, you should put .* at the start and end of the expression. But this will not work if you are quoting the pattern, since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?
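If a regex really is wanted for a substring test, one way around the .* problem is Matcher.find(), which searches anywhere in the input, so the quoted pattern needs nothing around it. A sketch under that assumption (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical helper; for a plain substring test String.contains is simpler.
class LiteralSearch {
    // Matcher.find() looks for the pattern anywhere in the input, so no
    // surrounding .* is needed and the needle can be quoted as a whole.
    static boolean containsLiteral(String haystack, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(haystack).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("say [hi there", "[hi")); // true
        System.out.println(containsLiteral("say hi there", "[hi"));  // false
    }
}
```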
Try Pattern.quote(String). It will fix up anything that has special meaning in the string.
Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.
Regex uses the backslash character '\' to escape a literal. Given that Java also uses the backslash character, you would need to use a double backslash like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) { /* found */ }
T.contains() (according to javadoc : http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes. contains() delegates to indexOf() only.
So, there are NO regexes used here. Were you thinking of some other String method ?
