What's the difference between these similar Java regexes?

Are these three related Java regexes just different syntaxes for doing the same thing?
String resultString = subjectString.replaceAll("(?m)^\\d+\\.\\s*", "");
String resultString = subjectString.replace("^[0-9]+\\. *", "");
String resultString = subjectString.replaceAll('\\d+\.\\s+', '');

No, they are different:
(?m)^\\d+\\.\\s* matches
one or more digits at the beginning of a line (note the m modifier in (?m)), followed by
a literal ., followed by
zero or more whitespace characters (equivalent to [ \t\n\x0B\f\r]);
^[0-9]+\\. * matches
one or more digits at the beginning of the string, followed by
a literal ., followed by
zero or more spaces;
\\d+\.\\s+ matches
one or more digits at any position, followed by
a literal ., followed by
one or more whitespace characters.
Besides that, as Adrian Smith has noted, replace does not expect a regular expression but a single char or a CharSequence (String implements that interface).
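As a rough illustration of the differences described above, here is a minimal sketch. The class name, method names and sample input are made up for this example, and the third pattern is rewritten with double quotes and \\. on the assumption that this is what the question intended, since the original form does not compile.

```java
final class NumberedPrefixDemo {
    // (?m) turns on multiline mode, so ^ matches at the start of every line.
    static String stripAtLineStarts(String s) {
        return s.replaceAll("(?m)^\\d+\\.\\s*", "");
    }

    // Without (?m), ^ matches only at the start of the whole input,
    // and " *" removes space characters only (no tabs or newlines).
    static String stripAtInputStart(String s) {
        return s.replaceAll("^[0-9]+\\. *", "");
    }

    // No anchor: matches anywhere, but needs at least one whitespace character.
    static String stripAnywhere(String s) {
        return s.replaceAll("\\d+\\.\\s+", "");
    }

    // String.replace treats its first argument as literal text, not as a regex.
    static String literalReplace(String s) {
        return s.replace("^[0-9]+\\. *", "");
    }

    public static void main(String[] args) {
        String subject = "1. one\n2.\ttwo\nsee 3. three";
        System.out.println(stripAtLineStarts(subject)); // one / two / see 3. three
        System.out.println(stripAtInputStart(subject)); // one / 2.<tab>two / see 3. three
        System.out.println(stripAnywhere(subject));     // one / two / see three
        System.out.println(literalReplace(subject));    // unchanged input
    }
}
```

The slashes in the comments stand for line breaks in the printed output.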

replace doesn't accept a regexp; it accepts a literal string (i.e. will really search for exactly those characters). replaceAll accepts a regexp.
The third one isn't valid Java because single quotes are used. Single quotes delimit a single char literal; double quotes delimit String literals, which can hold multiple characters. Its \. is also not a legal escape sequence inside a Java literal, so it would have to be written \\.

Close. Each replaces a number followed by a period followed by white-space, e.g. "11. ". But each one has a slight difference:
The first requires that the number be at the beginning of a line, and the white-space can be any whitespace character, e.g. a tab.
The second doesn't work as written, as noted, but if it were replaceAll() the white-space could only be the space character.
The third doesn't have to be at the beginning of the line and, like the first one, will replace any white-space characters, although it requires at least one.
The other differences are simply syntax.

Related

How do I check if a string contains a char sequence and a backslash "\"?

I'm trying to get true in the following test. I have a string with a backslash that, for some reason, isn't recognized.
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\.");
System.out.println(test);
I've tried a lot of variants, but only (.*)news(.*) works. But that allows any characters after news; I need it to match only when news is followed by \.
How can I do that?

Group the elements at the end: the regex (.*)news\\(.*), which is written in Java source as "(.*)news\\\\(.*)".

You can use this instead:
Boolean test = s.matches("(.*)news\\\\(.*)");

Try something like:
Boolean test = s.matches(".*news\\\\.*");
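Here .* means any number of characters, followed by news, followed by one literal backslash (written as four backslashes in the Java string: the regex needs \\, and each of those is doubled again in the string literal), and then any number of characters after that (can be zero as well).
With your regex, "(.*)news\\.", what it means is:
(.*) any number of characters
news the literal text news
\\. a literal dot (the Java string \\. reaches the regex engine as \.), not a backslash
which isn't satisfied by the String in your program, "Good news\ everyone!", because it does not end in "news." and matches() must cover the whole string.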

You are testing for an escaped occurrence of a literal dot: ".".
Refactor your pattern as follows (inferring the last part as you need it for a full match):
String s = "Good news\\ everyone!";
System.out.println(s.matches("(.*)news\\\\.*"));
Output
true
Explanation
The back-slash is used to escape characters, and the back-slash itself, in Java Strings.
In Java Pattern representations, you need to double-escape your back-slashes to represent a literal back-slash ("\\\\"), as double back-slashes are already used to represent special constructs (e.g. \\p{Punct}) or to escape characters (e.g. the literal dot \\.).
String.matches will attempt to match the whole String against your pattern, so you need the terminal part of the pattern I've added.
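The two levels of escaping can be checked with a small sketch like the following. The class and method names are hypothetical; the string and patterns are the ones from the question and the answers.

```java
final class BackslashMatchDemo {
    // The Java literal below holds ONE backslash character: Good news\ everyone!
    static final String INPUT = "Good news\\ everyone!";

    static boolean originalPattern() {
        // The regex engine sees: (.*)news\.
        // That is "news" followed by a literal dot, anchored to the whole string.
        return INPUT.matches("(.*)news\\.");
    }

    static boolean fixedPattern() {
        // The regex engine sees: .*news\\.*
        // That is "news", one literal backslash, then any remaining characters.
        return INPUT.matches(".*news\\\\.*");
    }

    public static void main(String[] args) {
        System.out.println(originalPattern()); // false
        System.out.println(fixedPattern());    // true
    }
}
```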

You can try this:
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\\\(.*)");
System.out.println(test);

String.replaceAll() with [\d]* appends replacement String in between characters, why?

I have been trying for hours now to get a regex statement that will match an unknown quantity of consecutive numbers. I believe [0-9]* or [\d]* should be what I want yet when I use Java's String.replaceAll it adds my replacement string in places that shouldn't be matching the regex.
For example:
I have an input string of "This is my99String problem"
If my replacement string is "~"
When I run this
myString.replaceAll("[\\d]*", "~" )
or
myString.replaceAll("[0-9]*", "~" )
my return string is "~T~h~i~s~ ~i~s~ ~m~y~~S~t~r~i~n~g~ ~p~r~o~b~l~e~m~"
As you can see the numbers have been replaced, but why is it also inserting my replacement string in between characters?
I want it to look like "This is my~String problem"
What am I doing wrong, and why is Java matching like this?

\\d* matches 0 or more digits, and so it even matches an empty string. And you have an empty string before every character in your string. So, for each of them, it replaces it with ~, hence the result.
Try using \\d+ instead. And you don't need to wrap \\d in a character class.
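A short sketch of the empty-match effect, using a smaller input so the result is easy to follow by eye. The class and method names are invented for this example.

```java
final class EmptyMatchDemo {
    // \d* can match the empty string, so it "matches" before every
    // non-digit character and once more at the very end of the input.
    static String withStar(String s) {
        return s.replaceAll("\\d*", "~");
    }

    // \d+ needs at least one digit, so only real runs of digits are replaced.
    static String withPlus(String s) {
        return s.replaceAll("\\d+", "~");
    }

    public static void main(String[] args) {
        System.out.println(withStar("a12b")); // ~a~~b~
        System.out.println(withPlus("a12b")); // a~b
        System.out.println(withPlus("This is my99String problem")); // This is my~String problem
    }
}
```

In the first result, the two adjacent ~ come from the run "12" and from the empty match immediately after it, which is the same effect as the "y~~S" in the question's output.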

[\\d]*
matches zero or more (as defined by *). Hence you're getting matches all through your strings. If you use
[\\d]+
that'll match 1 or more digits.
From the doc:
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times

What is the responsibility of (.*) in the Java String?

What is the responsibility of (.*) in the third line, and how does it work?
String Str = new String("Welcome to Tutorialspoint.com");
System.out.print("Return Value :" );
System.out.println(Str.matches("(.*)Tutorials(.*)"));

.matches() tests whether the whole of Str matches the regex provided.
Regex, or regular expressions, are a way of describing patterns in strings and capturing parts of them as groups. In the example provided, the pattern matches any string which contains the word "Tutorials". (.*) simply means "a group of zero or more of any character".
This page is a good regex reference (for very basic syntax and examples).

Your expression matches any string in which the word Tutorials is preceded and followed by any characters. .* means occurrence of any character any number of times, including zero times.
The . represents regular expression meta-character which means any character.
The * is a regular expression quantifier, which means 0 or more occurrences of the expression character it was associated with.

matches takes a regular expression string as its parameter, and (.*) means capture any character zero or more times, greedily.

.* means a group of zero or more of any character

In Regex:
.
Wildcard: Matches any single character except \n
for example pattern a.e matches ave in nave and ate in water
*
Matches the previous element zero or more times
for example pattern \d*\.\d matches .0, 19.9, 219.9

There is no reason to put parentheses around the .*, nor is there a reason to instantiate a String if you've already got a literal String. But worse is the fact that the matches() method is out of place here.
What it does is greedily match any character from the start to the end of the String. Then it backtracks until it finds "Tutorials", after which it will again match any characters (except newlines).
It's better and clearer to use the find method of Matcher. The find method simply finds the first "Tutorials" within the String, and you can remove the "(.*)" parts from the pattern.
As a one liner for convenience:
System.out.printf("Return value : %b%n", Pattern.compile("Tutorials").matcher("Welcome to Tutorialspoint.com").find());
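To make the contrast between matches() and find() concrete, here is a small sketch. The class and method names are hypothetical; only String.matches, Pattern.compile, Matcher.matches, Matcher.find and Matcher.group are real API calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchesVersusFindDemo {
    // matches() must cover the whole input, hence the (.*) padding on both sides.
    static boolean containsViaMatches(String s) {
        return s.matches("(.*)Tutorials(.*)");
    }

    // Without the padding, matches() only accepts the exact string "Tutorials".
    static boolean bareMatches(String s) {
        return s.matches("Tutorials");
    }

    // find() looks for the pattern anywhere in the input, so no padding is needed.
    static boolean containsViaFind(String s) {
        return Pattern.compile("Tutorials").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "Welcome to Tutorialspoint.com";
        System.out.println(containsViaMatches(s)); // true
        System.out.println(bareMatches(s));        // false
        System.out.println(containsViaFind(s));    // true

        // The parentheses only matter if the captured text is needed afterwards.
        Matcher m = Pattern.compile("(.*)Tutorials(.*)").matcher(s);
        if (m.matches()) {
            System.out.println("[" + m.group(1) + "][" + m.group(2) + "]"); // [Welcome to ][point.com]
        }
    }
}
```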

Check string contains whitespace along with some other char sequence using regex in java

I am using a regular expression to check if a string contains white space.
My regex is: ^\\s+$
For example, if my string is my name, then the regex match should return true.
But it returns true only if my string contains nothing but spaces.
How can I check whether a string contains a whitespace, tab or carriage return character at the start, end or anywhere in between?

^(.*\s+.*)+$ seems to work for me. Accepts anything as long as there is at least one space in the string. This will match the entire string.
If you only want to check for the presence of a space, you can just use \s without any begin or end markers in the string. The difference is that this will only match the individual spaces.
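A minimal sketch of the difference between the anchored pattern from the question and an unanchored \s search. The class and method names are made up for illustration, and using Matcher.find() for the presence check is one possible approach, not the only one.

```java
import java.util.regex.Pattern;

final class WhitespaceCheckDemo {
    // Compiled once; \s covers space, tab, newline, carriage return, etc.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // The pattern from the question: true only if the WHOLE string is whitespace.
    static boolean isOnlyWhitespace(String s) {
        return s.matches("^\\s+$");
    }

    // find() succeeds as soon as one whitespace character occurs anywhere.
    static boolean containsWhitespace(String s) {
        return WHITESPACE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isOnlyWhitespace("my name"));   // false
        System.out.println(containsWhitespace("my name")); // true
        System.out.println(containsWhitespace("name\t"));  // true
        System.out.println(containsWhitespace("myname"));  // false
        System.out.println(isOnlyWhitespace(" \t "));      // true
    }
}
```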

Your regex is not correct.
That's a string representing a regular expression. (as tchrist pointed out correctly)
The corresponding pattern that you get when using Pattern.compile() matches only strings containing one or more whitespace characters, starting from the beginning until the end. Thus, the matching string only consists of whitespace characters.
Try this string instead for Pattern.compile():
"\\s+"
The difference is that without the anchors "^" and "$" there may be other characters around the whitespace character. The whitespace character(s) may be everywhere in the string.
Using this pattern-string the whitespace character(s) must be at the beginning:
"^\\s+"
And here the sequence of whitespace characters has to be at the end:
"\\s+$"

Use org.apache.commons.lang.StringUtils.containsAny(). See http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html.

How to escape a square bracket for Pattern compilation?

I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match these regexes against a generated password. The problem is that Pattern.compile does not like square brackets that are not escaped.
Can someone please give me a simple function that takes a string like [0-9] and returns the escaped string \[0-9\]?

For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, for example, you'd only use a single backslash. There you're using the string escape sequence to insert a newline character into the string; regex doesn't see \n, it sees the newline character and matches that. For the bracket you need two backslashes because \] is not a string escape sequence, it's a regex escape sequence.
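The distinction between a string escape and a regex escape can be tried out with a sketch like this one. The class and method names are hypothetical.

```java
final class BracketEscapeDemo {
    // Java source "\\]" is the two-character regex \] which matches a literal ].
    static boolean isClosingBracket(String s) {
        return s.matches("\\]");
    }

    // Java source "\n" puts a real newline character into the pattern;
    // the regex engine never sees a backslash here.
    static boolean isNewlineViaStringEscape(String s) {
        return s.matches("\n");
    }

    // Java source "\\n" hands the regex engine \n, which it also
    // understands as a newline, so both spellings work for this case.
    static boolean isNewlineViaRegexEscape(String s) {
        return s.matches("\\n");
    }

    public static void main(String[] args) {
        System.out.println(isClosingBracket("]"));          // true
        System.out.println(isNewlineViaStringEscape("\n")); // true
        System.out.println(isNewlineViaRegexEscape("\n"));  // true
    }
}
```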

You can use Pattern.quote(String).
From the docs:
public static String quote(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.

You can use the \Q and \E special characters...anything between \Q and \E is automatically escaped.
\Q[0-9]\E
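Both quoting approaches can be compared with the unquoted pattern in a short sketch. The class and method names are invented for this example; Pattern.quote is the real method quoted from the docs above.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Unquoted: [0-9] is a character class that matches one digit.
    static boolean asCharacterClass(String s) {
        return s.matches("[0-9]");
    }

    // Pattern.quote makes every character in its argument literal.
    static boolean asLiteralViaQuote(String s) {
        return s.matches(Pattern.quote("[0-9]"));
    }

    // The same effect written by hand with \Q ... \E.
    static boolean asLiteralViaQE(String s) {
        return s.matches("\\Q[0-9]\\E");
    }

    public static void main(String[] args) {
        System.out.println(asCharacterClass("5"));       // true
        System.out.println(asCharacterClass("[0-9]"));   // false
        System.out.println(asLiteralViaQuote("[0-9]"));  // true
        System.out.println(asLiteralViaQuote("5"));      // false
        System.out.println(asLiteralViaQE("[0-9]"));     // true
    }
}
```

Note that quoting only helps if the goal really is to match the text [0-9] itself rather than a digit.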

Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.
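As a rough sketch of using the split rules unescaped: the class and method names are hypothetical, and checking each rule with find() (the rule must occur somewhere in the password) is an assumption about what the question wants, not something the original states.

```java
import java.util.regex.Pattern;

final class PasswordRuleDemo {
    // The comma-separated rule list from the question, used as-is.
    static final String RULES = ".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]";

    // Returns true if every rule is found somewhere in the candidate.
    // Assumption: no individual rule contains a comma of its own.
    static boolean satisfiesAllRules(String candidate) {
        for (String rule : RULES.split(",")) {
            // Each piece compiles without any escaping of the square brackets.
            if (!Pattern.compile(rule).matcher(candidate).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesAllRules("Passw0rd!")); // true
        System.out.println(satisfiesAllRules("password"));  // false (no digit, symbol or upper case)
        System.out.println(satisfiesAllRules("Ab1!"));      // false (shorter than 8 characters)
    }
}
```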
