Java regular expression value.split("\\."), "the back slash dot" divides by character? - java

From what I understand, the backslash dot (\.) means one character of any character? And because the backslash is an escape character, it has to be written as backslash backslash dot ("\\.").
What does this do to a string? I just saw this in existing code I am working on. From what I understand, it will split the string into individual characters. Why do this instead of String.toCharArray()? So does this split the string into an array of strings, each containing only one char?

My guess is that you are missing that backslash ('\') characters are escape characters in Java String literals. So when you want to use a '\' escape in a regex written as a Java String you need to escape it; e.g.
Pattern.compile("\."); // Java syntax error
// A regex that matches a (any) character
Pattern.compile(".");
// A regex that matches a literal '.' character
Pattern.compile("\\.");
// A regex that matches a literal '\' followed by one character
Pattern.compile("\\\\.");
The String.split(String separatorRegex) method splits a String into substrings separated by substrings matching the regex. So str.split("\\.") will split str into substrings separated by a single literal '.' character.

The regex "." would match any character as you state. However an escaped dot "\." would match literal dot characters. Thus 192.168.1.1 split on "\." would result in {"192", "168", "1", "1"}.
Your wording isn't completely clear, but I think this is what you're asking.
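To make the difference concrete, here is a small sketch that splits the same IP-style string with both patterns. The class and method names are made up for illustration only.

```java
import java.util.Arrays;

class SplitOnDot {
    // Regex \. (Java literal "\\.") splits on literal '.' characters.
    static String[] splitOnLiteralDot(String s) {
        return s.split("\\.");
    }

    // Regex . treats every character as a separator, so only empty
    // strings remain, and split() drops trailing empty strings.
    static String[] splitOnAnyChar(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLiteralDot("192.168.1.1"))); // [192, 168, 1, 1]
        System.out.println(splitOnAnyChar("192.168.1.1").length);              // 0
    }
}
```

So split(".") does not produce one string per character; it produces an empty array, which is why the escaped form is the one you see in real code.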

Related

Regex match double quote which does not follow slash character

I have string like this:
"abcd\" efg\" hi" jklm"
I want to get the substring between the first two " characters that are not part of \"
For example, in the above string, I want to get abcd\" efg\" hi
Currently, I replace \" with another character, then use the regex "([^"]*)" to extract the substring between the first two " characters. Is there any way to use a regex directly without replacing \" with another character?
Use this regex:
[^\\]?"(.*?[^\\])"
Explanation:
[^\\]? match an optional single character which is not backslash
"(.*? match a quote followed by anything (non-greedy)
[^\\])" match a quote preceded by anything other than backslash
This regex will match the least content between an opening quote and the first closing quote that is not preceded by a backslash.
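In Java the regex has to be written as a string literal, which doubles every backslash and escapes every quote. A minimal sketch, assuming the captured text is read from group 1 with Matcher.find() (the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSubstring {
    // The regex [^\\]?"(.*?[^\\])" written as a Java string literal.
    private static final Pattern QUOTED =
            Pattern.compile("[^\\\\]?\"(.*?[^\\\\])\"");

    // Returns the text between the first pair of unescaped quotes, or null.
    static String firstQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The raw text is: "abcd\" efg\" hi" jklm"
        String input = "\"abcd\\\" efg\\\" hi\" jklm\"";
        System.out.println(firstQuoted(input)); // abcd\" efg\" hi
    }
}
```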

Java regex for matching certain parts of JSON substrings

I am trying to write a regex that looks for strings with the following pattern:
Begin with an opening brace { followed by a double-quote "
Then allows for a string of 1+ alphanumeric characters a-zA-Z0-9
Then another double-quote " followed by a colon : and an opening bracket [
Then allows for any string of 0+ alphanumeric characters a-zA-Z0-9
So some strings that would match the regex:
{"hello":[blah
{"hello":[
{"1":[
And some strings that would not match:
{hello:[blah
hello":[
{"2:[
So far, the best I've been able to come up with is:
String regex = "{\"[a-zA-Z0-9]+\":\[[a-zA-Z0-9]*";
if(myString.matches(regex))
// do something
But I know I'm way off base. Can any regex gurus help reel me in? Thanks in advance!
String regex = "{\"[a-zA-Z0-9]+\":\[[a-zA-Z0-9]*";
The problem here is that you need an extra backslash before the square bracket. The regex has to contain \[ in order to match a literal square bracket, which means the string literal needs to contain \\[ so that the backslash survives the Java compiler (a lone \[ is not a valid string escape and will not compile). Similarly, you also need to escape the { in the regex, since it is a metacharacter (it introduces bounded repetition counts) and Java's Pattern rejects an unescaped { here with a PatternSyntaxException.
String regex = "\\{\"[a-zA-Z0-9]+\":\\[[a-zA-Z0-9]*";
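As a quick check, the corrected pattern can be run against the sample strings from the question. This is only a sketch; the class and method names are invented for the example.

```java
class JsonPrefixMatch {
    // Regex: \{"[a-zA-Z0-9]+":\[[a-zA-Z0-9]*
    private static final String REGEX = "\\{\"[a-zA-Z0-9]+\":\\[[a-zA-Z0-9]*";

    // String.matches() requires the whole string to match the regex.
    static boolean matchesPrefix(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(matchesPrefix("{\"hello\":[blah")); // true
        System.out.println(matchesPrefix("{\"1\":["));         // true
        System.out.println(matchesPrefix("{hello:[blah"));     // false
        System.out.println(matchesPrefix("{\"2:["));           // false
    }
}
```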

Check string contains whitespace along with some other char sequence using regex in java

I am using a regular expression to check whether a string contains whitespace.
My regex is: ^\\s+$
For example, if my string is "my name", the match should return true.
But it returns true only if the string consists entirely of spaces, with no other characters.
How can I check whether a string contains a space, tab, or carriage return at the start, in the middle, or at the end?
^(.*\s+.*)+$ seems to work for me. Accepts anything as long as there is at least one space in the string. This will match the entire string.
If you only want to check for the presence of a space, you can just use \s without any begin or end markers in the string. The difference is that this will only match the individual spaces.
Your regex is not correct.
That's a string representing a regular expression. (as tchrist pointed out correctly)
The corresponding pattern that you get when using Pattern.compile() matches only strings containing one or more whitespace characters, starting from the beginning until the end. Thus, the matching string only consists of whitespace characters.
Try this string instead for Pattern.compile():
"\\s+"
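The difference is that without the anchors "^" and "$" other characters may surround the whitespace, so the whitespace character(s) can be anywhere in the string. Check for a match with Matcher.find() rather than matches(), since matches() always requires the whole string to match the pattern.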
Using this pattern-string the whitespace character(s) must be at the beginning:
"^\\s+"
And here the sequence of whitespace characters has to be at the end:
"\\s+$"
Use org.apache.commons.lang3.StringUtils.containsAny(). See http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html.
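If you would rather stay with java.util.regex, the contrast between the anchored and unanchored patterns can be sketched as follows. The helper names are hypothetical, and the sketch assumes the check is done with Matcher.find().

```java
import java.util.regex.Pattern;

class WhitespaceCheck {
    private static final Pattern ANY_WHITESPACE = Pattern.compile("\\s");
    private static final Pattern ONLY_WHITESPACE = Pattern.compile("^\\s+$");

    // True if at least one whitespace character occurs anywhere in s.
    static boolean containsWhitespace(String s) {
        return ANY_WHITESPACE.matcher(s).find();
    }

    // True only if s consists entirely of whitespace (the question's regex).
    static boolean isOnlyWhitespace(String s) {
        return ONLY_WHITESPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsWhitespace("my name")); // true
        System.out.println(containsWhitespace("myname"));  // false
        System.out.println(isOnlyWhitespace("my name"));   // false
    }
}
```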

Splitting a string that has escape sequence using regular expression in Java

String to be split
abc:def:ghi\:klm:nop
The string should be split on ":"
"\" is the escape character, so "\:" should not be treated as a separator.
split(":") gives
[abc]
[def]
[ghi\]
[klm]
[nop]
Required output is array of string
[abc]
[def]
[ghi\:klm]
[nop]
How can the \: be ignored?
Use a look-behind assertion:
split("(?<!\\\\):")
This only matches a colon that is not preceded by a \. The four backslashes \\\\ are needed because the backslash must be escaped once for the Java string literal and once more for the regular expression.
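Applied to the string from the question, the look-behind split looks roughly like this (the class and method names are only for illustration):

```java
import java.util.Arrays;

class EscapedColonSplit {
    // Regex (?<!\\): splits on ':' unless the previous character is '\'.
    static String[] split(String input) {
        return input.split("(?<!\\\\):");
    }

    public static void main(String[] args) {
        // The raw text is: abc:def:ghi\:klm:nop
        String input = "abc:def:ghi\\:klm:nop";
        System.out.println(Arrays.toString(split(input))); // [abc, def, ghi\:klm, nop]
    }
}
```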
Note however that this will not allow you to escape backslashes, in the case that you want to allow a token to end with a backslash. To do that you will have to first replace all double backslashes with
string.replaceAll("\\\\\\\\", ESCAPE_BACKSLASH)
(where ESCAPE_BACKSLASH is a string which will not occur in your input) and then, after splitting using the look-behind assertion, replace the ESCAPE_BACKSLASH string with an unescaped backslash with
token.replaceAll(ESCAPE_BACKSLASH, "\\\\")
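Put together, the placeholder approach might look like the sketch below. It assumes the NUL character never occurs in the input and uses it as ESCAPE_BACKSLASH; it also uses String.replace (literal replacement) instead of replaceAll so that no extra regex escaping is needed. Both choices are assumptions made for this example.

```java
import java.util.Arrays;

class EscapedBackslashSplit {
    // Hypothetical placeholder, assumed never to appear in the input.
    private static final String ESCAPE_BACKSLASH = "\u0000";

    static String[] split(String input) {
        // Hide escaped backslashes (\\) so they cannot shield a following ':'.
        String hidden = input.replace("\\\\", ESCAPE_BACKSLASH);
        // Split on ':' unless it is preceded by a remaining (escaping) '\'.
        String[] tokens = hidden.split("(?<!\\\\):");
        // Restore each placeholder as a single unescaped backslash.
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace(ESCAPE_BACKSLASH, "\\");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The raw text is: abc\\:def\:ghi
        System.out.println(Arrays.toString(split("abc\\\\:def\\:ghi"))); // [abc\, def\:ghi]
    }
}
```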
Gumbo was right using a look-behind assertion, but in case your string contains the escaped escape character (e.g. \\) right in front of a comma, the split might break. See this example:
test1\,test1,test2\\,test3\\\,test3\\\\,test4
If you do a simple look-behind split on (?<!\\), as Gumbo suggested, the string gets split into only two parts: test1\,test1 and test2\\,test3\\\,test3\\\\,test4. This is because the look-behind checks just one character back for the escape character. The correct behavior is to split on commas that are preceded by no escape characters or by an even number of them.
To achieve this a slightly more complex (double) look-behind expression is needed:
(?<!(?<![^\\]\\(?:\\{2}){0,10})\\),
Using this more complex regular expression in Java again requires escaping every \ as \\. So a more robust answer to your question would be:
"any comma separated string".split("(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),");
Note: Java does not support infinite repetitions inside of lookbehinds. Therefore only up to 10 repeating double escape characters are checked by using the expression {0,10}. If needed, you can increase this value by adjusting the latter number.
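Here is a sketch of the double look-behind applied to the example string from above. The class and method names are invented; the pattern is the one given in this answer.

```java
import java.util.Arrays;

class EvenEscapeSplit {
    // Splits on ',' unless it is preceded by an odd number of backslashes
    // (runs of up to 10 escaped pairs are recognised by the look-behind).
    private static final String SEPARATOR =
            "(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),";

    static String[] split(String input) {
        return input.split(SEPARATOR);
    }

    public static void main(String[] args) {
        // The raw text is: test1\,test1,test2\\,test3\\\,test3\\\\,test4
        String input = "test1\\,test1,test2\\\\,test3\\\\\\,test3\\\\\\\\,test4";
        // Prints four tokens: test1\,test1 | test2\\ | test3\\\,test3\\\\ | test4
        System.out.println(Arrays.toString(split(input)));
    }
}
```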

How to escape a square bracket for Pattern compilation?

I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match these regexes against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, for example, you'd only use a single backslash. There you're using the string escape sequence \n to insert a newline character into the string. Regex doesn't see \n; it sees the newline character and matches that. For the bracket you need two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
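The two cases can be compared side by side in a short sketch (the names are illustrative). Note that "\\n" also works for the newline, because the regex engine understands \n as an escape of its own.

```java
import java.util.regex.Pattern;

class BracketEscape {
    // Java literal "\\]" is the regex \] which matches a literal ']'.
    static boolean isClosingBracket(String s) {
        return Pattern.matches("\\]", s);
    }

    // Java literal "\n" hands the regex engine an actual newline character.
    static boolean isNewlineViaStringEscape(String s) {
        return Pattern.matches("\n", s);
    }

    // Java literal "\\n" hands the regex engine the two characters \n,
    // which the regex engine itself interprets as a newline.
    static boolean isNewlineViaRegexEscape(String s) {
        return Pattern.matches("\\n", s);
    }

    public static void main(String[] args) {
        System.out.println(isClosingBracket("]"));          // true
        System.out.println(isNewlineViaStringEscape("\n")); // true
        System.out.println(isNewlineViaRegexEscape("\n"));  // true
    }
}
```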
You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E special characters...anything between \Q and \E is automatically escaped.
\Q[0-9]\E
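Both quoting styles can be tried out with a sketch like this one (the class and method names are made up). Pattern.quote wraps its argument in \Q and \E, so the two approaches behave the same for this input.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Treats the whole of 'literal' as plain text rather than as a regex.
    static boolean matchesLiterally(String literal, String candidate) {
        return Pattern.matches(Pattern.quote(literal), candidate);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("[0-9]"));              // \Q[0-9]\E
        System.out.println(matchesLiterally("[0-9]", "[0-9]"));  // true
        System.out.println(matchesLiterally("[0-9]", "5"));      // false
        System.out.println(Pattern.matches("\\Q[0-9]\\E", "[0-9]")); // true
    }
}
```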
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
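For what it's worth, here is a sketch of how the unescaped rules could be applied to a password. It assumes each rule only has to match somewhere in the password (Matcher.find()), which the question does not state; the class name and sample passwords are invented.

```java
import java.util.regex.Pattern;

class PasswordRules {
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // True if every comma-separated regex finds a match in the password.
    static boolean satisfiesAll(String rules, String password) {
        for (String rule : rules.split(",")) {
            if (!Pattern.compile(rule).matcher(password).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAll(RULES, "Passw0rd!")); // true
        System.out.println(satisfiesAll(RULES, "password"));  // false
    }
}
```

This only works because none of the rules contains a comma of its own; a rule such as .{8,16} would be cut in half by the split.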
