I am trying to extract some data with spaces from a string. Here is my code:
if (somestring.contains("LOOP with spaces")) {
i++;
}
Is there a regular expression to extract this? I tried the following, but it did not work:
somestring.contains("LOOP\swith\s spaces"))
Use matches instead if you want to use a regular expression as an argument:
if (somestring.matches(".*LOOP\\swith\\sspaces.*")){
Note that (1) .* means any number of any character and, (2) you need to escape the backslash in Java: \\ is interpreted as \.
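As an alternative, here is a minimal sketch that uses Matcher.find(), which looks for the pattern anywhere in the input, so the surrounding .* is not needed (and, unlike .*, it is not affected by line breaks in the input). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class LoopMatchDemo {
    // "\\s" in the Java literal reaches the regex engine as \s (any whitespace).
    private static final Pattern LOOP = Pattern.compile("LOOP\\swith\\sspaces");

    // Hypothetical helper: true if the phrase occurs anywhere in the input.
    static boolean containsLoop(String somestring) {
        return LOOP.matcher(somestring).find();
    }

    public static void main(String[] args) {
        String sample = "start LOOP with spaces end";
        System.out.println(containsLoop(sample));
        // The matches() variant needs .* on both sides because it must match the whole string.
        System.out.println(sample.matches(".*LOOP\\swith\\sspaces.*"));
    }
}
```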
I would like to match URL strings which can be specified in the following manner.
xxx.yyy.com (For example, the regular expression should match all strings like 4xxx.yyy.com, xxx4.yyy.com, xxx.yyy.com, 4xxx4.yyy.com, 444xxx666.yyy.com, abcxxxdef.yyy.com etc).
I have tried to use
([a-zA-Z0-9]+$)xxx([a-zA-Z0-9]+$).yyy.com
([a-zA-Z0-9]*)xxx([a-zA-Z0-9]*).yyy.com
But they don't work. Please help me write a correct regular expression. Thanks in advance.
Note: I'm trying to do this in Java.
If you want to make sure xxx is present and allow any non-whitespace characters before and after it, use \S* on both sides. To match the whole string, add anchors at the start and end.
Remember to escape the dot to match it literally.
^\S*xxx\S*\.yyy\.com$
^ Start of string
\S*xxx\S* Match xxx between optional non whitespace chars
\.yyy Match .yyy
\.com Match .com
$ End of string
In Java, double each backslash in the string literal:
String regex = "^\\S*xxx\\S*\\.yyy\\.com$";
Or specify the characters on the left and right that you would allow to match in the character class:
^[0-9A-Za-z!##$%^&*()_+]*xxx[0-9A-Za-z!##$%^&*()_+]*\.yyy\.com$
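A small sketch of how the first pattern could be used in Java. The class and method names are hypothetical, and the sample hosts are the ones from the question.

```java
import java.util.regex.Pattern;

final class HostMatchDemo {
    // Same regex as above, with every backslash doubled for the Java literal.
    private static final Pattern HOST = Pattern.compile("^\\S*xxx\\S*\\.yyy\\.com$");

    // Hypothetical helper: true if the whole host name fits the pattern.
    static boolean matchesHost(String host) {
        return HOST.matcher(host).find();
    }

    public static void main(String[] args) {
        String[] samples = {"4xxx.yyy.com", "xxx4.yyy.com", "xxx.yyy.com", "abc.yyy.com"};
        for (String host : samples) {
            System.out.println(host + " -> " + matchesHost(host));
        }
    }
}
```

Note that \S also matches dots, so something like a.xxx.b.yyy.com is accepted as well. If that is not wanted, the character-class variant is the stricter choice.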
I tried splitting like this-
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it the same way you would escape |, which is "\\|". But the difference between | and " is that
| is a metacharacter in the regex engine (it represents the OR operator)
" is a metacharacter in Java string literals (it marks the start/end of the string)
To escape any String metacharacter (like ") you need to place another String metacharacter before it, the one responsible for escaping, which is \ (see note 1 at the end). So to create a String containing " such as this is "quote" you would write it as
String s = "this is \"quote\"";
// ^^ ^^ these represent " literal, not end of string
The same idea applies if we want to create a \ literal (we need to escape it by placing another \ before it). For instance, to create a string representing c:\foo\bar we would write it as
String s = "c:\\foo\\bar";
// ^^ ^^ these will represent \ literal
So as you can see, \ is used to escape metacharacters (to make them simple literals).
This character is used in the Java language for Strings, but it is also used by the regex engine to escape its own metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create a regex which matches the [ character, you need the regex \[, but the String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
// ^^ - Remember what was said earlier?
// To create \ literal in String we need to escape it
So to split on [ we would need to invoke split("\\[") because the regex representing [ is \[, which needs to be written as "\\[" in Java.
Since " is not a special character in regex but is special in a String, we need to escape it only at the string-literal level by writing it as
split("\"");
1) \ is also used to create other characters, such as the line separator \n or the tab \t. It can also be used to create Unicode characters like \uXXXX, where XXXX is the index of the character in the Unicode table in hexadecimal form.
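To make the two levels of escaping concrete, here is a minimal sketch. The class name, method names and sample inputs are invented for illustration.

```java
import java.util.Arrays;

final class SplitEscapeDemo {
    // " is special only in the string literal, so one backslash is enough.
    static String[] splitOnQuote(String s) {
        return s.split("\"");
    }

    // [ is special in regex, so the regex is \[ and the literal is "\\[".
    static String[] splitOnLeftBracket(String s) {
        return s.split("\\[");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnQuote("a\"b\"c")));
        System.out.println(Arrays.toString(splitOnLeftBracket("x[y[z")));
    }
}
```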
You have escaped the \ by putting in \ twice, try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash will be escaped, thus the doublequote won't.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind that String.split() interprets its parameter as a regular expression, and regular expressions have several special characters that have to be escaped.
So if you want to split by a . (which is a special regex character), you need to specify it as String.split("\\."). In the string literal the first backslash escapes the second one, so the regex engine receives \., which matches a literal dot.
For regex characters you could also just use Pattern.quote() to escape your desired delimiter, but this is far outside the scope of the original question.
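A short sketch of the dot case, assuming a made-up input string. It also shows what happens when the dot is left unescaped: every character is treated as a delimiter and split() drops the resulting trailing empty strings, so nothing is left.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class DotSplitDemo {
    // Manual escaping: the regex engine sees \. (a literal dot).
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Pattern.quote builds a regex that matches the delimiter literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String input = "a.b.c";
        System.out.println(Arrays.toString(splitEscaped(input)));
        System.out.println(Arrays.toString(splitQuoted(input)));
        // Unescaped dot: matches every character, so the result is empty.
        System.out.println(input.split(".").length);
    }
}
```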
Try with single backslash \
tableData.split("\"")
Try like this by escaping " with single backslash \ :
tableData.split("\"")
You are not escaping properly. The snippet code will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
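Inside a char literal the double quote needs no backslash ('"'). However, String.split() only accepts a String, so split('"') does not compile; convert the char to a String first:
tableData.split(String.valueOf('"'));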
The line
System.out.println("\\");
prints a single back-slash (\). And
System.out.println("\\\\");
prints double back-slashes (\\). Understood!
But why in the following code:
class ReplaceTest
{
public static void main(String[] args)
{
String s = "hello.world";
s = s.replaceAll("\\.", "\\\\");
System.out.println(s);
}
}
is the output:
hello\world
instead of
hello\\world
After all, the replaceAll() method is replacing a dot (\\.) with (\\\\).
Can someone please explain this?
When replacing with regular expressions, the replacement string can refer to captured groups from the match (in Java, $1 refers to the first group), and a backslash is used to escape literal characters in it.
This, however, means that the backslash is a special character in the replacement, so if you actually want a literal backslash it needs to be escaped.
Which means it needs to be escaped twice when written in a Java string literal. (First for the string parser, then for the replacement parser.)
The javadoc of replaceAll says:
Note that backslashes ( \ ) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
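A minimal sketch of the behaviour the Javadoc describes, using the string from the question. The class and method names are invented for illustration.

```java
import java.util.regex.Matcher;

final class ReplacementEscapeDemo {
    // Literal "\\\\" is the two-character string \\ ; the replacement parser
    // reads that as one escaped backslash.
    static String withOneBackslash(String s) {
        return s.replaceAll("\\.", "\\\\");
    }

    // Two backslashes in the output need four in the replacement string,
    // which is eight in the Java literal.
    static String withTwoBackslashes(String s) {
        return s.replaceAll("\\.", "\\\\\\\\");
    }

    // quoteReplacement escapes the replacement for us.
    static String withTwoBackslashesQuoted(String s) {
        return s.replaceAll("\\.", Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        System.out.println(withOneBackslash("hello.world"));
        System.out.println(withTwoBackslashes("hello.world"));
        System.out.println(withTwoBackslashesQuoted("hello.world"));
    }
}
```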
This is a formatted addendum to my comment:
s = s.replaceAll("\\.", Matcher.quoteReplacement("\\"));
is more readable and meaningful than
s = s.replaceAll("\\.", "\\\\");
If you don't need a regex and just want to search for an exact string, escape the regex control characters in the search string first:
String trickyString = "$Ha!I'm tricky|.|";
String safeToUseInReplaceAllString = Pattern.quote(trickyString);
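A sketch of how the quoted string could then be used, assuming you also want the replacement treated literally. The helper name and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceAllDemo {
    // Hypothetical helper: both the search text and the replacement are taken literally.
    static String replaceLiteral(String input, String literal, String replacement) {
        return input.replaceAll(Pattern.quote(literal), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String trickyString = "$Ha!I'm tricky|.|";
        String input = "before " + trickyString + " after";
        System.out.println(replaceLiteral(input, trickyString, "X"));
        // A replacement containing $ is safe too, thanks to quoteReplacement.
        System.out.println(replaceLiteral(input, trickyString, "$1"));
    }
}
```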
The backslash is an escape character in Java Strings, i.e. it has a predefined meaning in Java. You have to write "\\" to define a single backslash. If you want the regex \w, you must write "\\w" in your Java string. If you want to match a backslash as a literal, you have to type "\\\\", because \ is also an escape character in regular expressions.
I believe in this particular case it would be easier to use replace instead of replaceAll.
Reverend Gonzo has the correct answer when he talks about escaping the character.
Using replaceAll:
s = s.replaceAll("\\.", "\\\\\\\\");
Using replace:
s = s.replace(".", "\\\\");
replace just takes a string to match to, not a regular expression.
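To illustrate the difference, a short sketch with the string from the question. The class and method names are made up; the point is only that replace treats both arguments as plain text, while an unescaped dot in replaceAll matches every character.

```java
final class PlainReplaceDemo {
    // replace: "." is a literal dot and "\\\\" is simply two backslashes.
    static String literalReplace(String s) {
        return s.replace(".", "\\\\");
    }

    // replaceAll with an unescaped dot: every character matches.
    static String regexReplaceUnescapedDot(String s) {
        return s.replaceAll(".", "x");
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("hello.world"));
        System.out.println(regexReplaceUnescapedDot("hello.world"));
    }
}
```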
I don't like this implementation of regex. We should be able to escape characters with a single '\', not '\\'. But anyway, if you want to get THIS out of THIS.Out_Of_That you can do:
String prefix = role.replaceFirst("(\\.).*", "");
So you get prefix = THIS;
String to be split
abc:def:ghi\:klm:nop
String should be split based on ":"
"\" is escape character. So "\:" should not be treated as token.
split(":") gives
[abc]
[def]
[ghi\]
[klm]
[nop]
Required output is array of string
[abc]
[def]
[ghi\:klm]
[nop]
How can the \: be ignored?
Use a look-behind assertion:
split("(?<!\\\\):")
This will only match if there is no preceding \. Using double escaping \\\\ is required as one is required for the string declaration and one for the regular expression.
Note however that this will not allow you to escape backslashes, in the case that you want to allow a token to end with a backslash. To do that you will have to first replace all double backslashes with
string.replaceAll("\\\\\\\\", ESCAPE_BACKSLASH)
(where ESCAPE_BACKSLASH is a string which will not occur in your input) and then, after splitting using the look-behind assertion, replace the ESCAPE_BACKSLASH string with an unescaped backslash with
token.replaceAll(ESCAPE_BACKSLASH, "\\\\")
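Putting both steps together, here is a minimal sketch. The placeholder value ("\0") is an assumption and must not occur in real input. The sketch also uses String.replace rather than replaceAll for the placeholder steps, which is a swapped-in simplification so that no regex or replacement escaping is needed there.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapedColonSplitDemo {
    // Hypothetical placeholder; assumed never to appear in the input.
    private static final String ESCAPE_BACKSLASH = "\0";

    // Split on every colon that is not directly preceded by a backslash.
    static String[] splitSimple(String input) {
        return input.split("(?<!\\\\):");
    }

    // Variant that also lets a token end with an (escaped) backslash.
    static List<String> splitAllowingEscapedBackslash(String input) {
        // Hide every escaped backslash (two backslashes) behind the placeholder.
        String masked = input.replace("\\\\", ESCAPE_BACKSLASH);
        List<String> tokens = new ArrayList<>();
        for (String token : masked.split("(?<!\\\\):")) {
            // Restore the placeholder as a single, unescaped backslash.
            tokens.add(token.replace(ESCAPE_BACKSLASH, "\\"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : splitSimple("abc:def:ghi\\:klm:nop")) {
            System.out.println("[" + token + "]");
        }
        System.out.println(splitAllowingEscapedBackslash("a\\\\:b"));
    }
}
```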
Gumbo was right using a look-behind assertion, but in case your string contains the escaped escape character (e.g. \\) right in front of a comma, the split might break. See this example:
test1\,test1,test2\\,test3\\\,test3\\\\,test4
If you do a simple look-behind split with (?<!\\), as Gumbo suggested, the string gets split into only two parts: test1\,test1 and test2\\,test3\\\,test3\\\\,test4. This is because the look-behind checks just one character back for the escape character. The correct behavior would be to split on every comma that is preceded by an even number (including zero) of escape characters.
To achieve this a slightly more complex (double) look-behind expression is needed:
(?<!(?<![^\\]\\(?:\\{2}){0,10})\\),
Using this more complex regular expression in Java again requires escaping every \ as \\. So this should be a more robust answer to your question:
"any comma separated string".split("(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),");
Note: Java does not support infinite repetitions inside of lookbehinds. Therefore only up to 10 repeating double escape characters are checked by using the expression {0,10}. If needed, you can increase this value by adjusting the latter number.
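As a quick check, here is a sketch that applies the expression to the example string from above. The class and constant names are invented; the pattern is the one given in this answer.

```java
final class EvenEscapeSplitDemo {
    // Splits on commas preceded by an even number (including zero) of backslashes,
    // checking up to 10 escaped-backslash pairs as explained in the note above.
    private static final String UNESCAPED_COMMA =
            "(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),";

    static String[] split(String input) {
        return input.split(UNESCAPED_COMMA);
    }

    public static void main(String[] args) {
        String input = "test1\\,test1,test2\\\\,test3\\\\\\,test3\\\\\\\\,test4";
        for (String token : split(input)) {
            System.out.println(token);
        }
    }
}
```

With this input the tokens should be test1\,test1 then test2\\ then test3\\\,test3\\\\ and finally test4.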
I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match these regexes against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, though, you'd only use a single backslash, because you're using the string escape sequence to insert a newline character into the string. Regex doesn't see \n; it sees the newline character and matches that. The bracket needs two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
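A tiny sketch of the distinction, with made-up class and method names. As a side note, the regex escape \n (written "\\n" in Java) also matches a newline, so both spellings work for that case.

```java
import java.util.regex.Pattern;

final class BracketVsNewlineDemo {
    // Regex escape: the literal "\\]" hands \] to the regex engine.
    static boolean isClosingBracket(String s) {
        return Pattern.matches("\\]", s);
    }

    // String escape: the regex engine receives an actual newline character.
    static boolean isNewlineViaStringEscape(String s) {
        return Pattern.matches("\n", s);
    }

    // Regex escape: the regex engine receives backslash + n and interprets it itself.
    static boolean isNewlineViaRegexEscape(String s) {
        return Pattern.matches("\\n", s);
    }

    public static void main(String[] args) {
        System.out.println(isClosingBracket("]"));
        System.out.println(isNewlineViaStringEscape("\n"));
        System.out.println(isNewlineViaRegexEscape("\n"));
    }
}
```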
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E special characters...anything between \Q and \E is automatically escaped.
\Q[0-9]\E
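A short sketch showing that Pattern.quote and a hand-written \Q...\E pair behave the same for this input. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Matches only if the input is exactly the given text, character for character.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    // Hand-written equivalent for the text [0-9]; the backslashes are doubled for the literal.
    static boolean matchesBracketText(String input) {
        return Pattern.matches("\\Q[0-9]\\E", input);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("[0-9]"));
        System.out.println(matchesLiterally("[0-9]", "[0-9]"));
        System.out.println(matchesLiterally("[0-9]", "5"));
        System.out.println(matchesBracketText("[0-9]"));
    }
}
```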
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
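For illustration, a minimal sketch of using the five regexes as they are. It assumes each rule only has to be found somewhere in the password (find() rather than matches()), which the question does not spell out; the class name, method name and sample passwords are made up.

```java
import java.util.regex.Pattern;

final class PasswordRulesDemo {
    private static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // True if every comma-separated rule is found somewhere in the password.
    static boolean satisfiesAllRules(String password) {
        for (String rule : RULES.split(",")) {
            // Each piece compiles as is; no escaping of the brackets is needed.
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAllRules("Passw0rd!"));
        System.out.println(satisfiesAllRules("password"));
    }
}
```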