String.replaceAll() with [\d]* appends replacement String in between characters, why? - java

I have been trying for hours now to get a regex that will match an unknown quantity of consecutive digits. I believe [0-9]* or [\d]* should be what I want, yet when I use Java's String.replaceAll it adds my replacement string in places that shouldn't match the regex.
For example:
I have an input string of "This is my99String problem"
If my replacement string is "~"
When I run this
myString.replaceAll("[\\d]*", "~" )
or
myString.replaceAll("[0-9]*", "~" )
my return string is "~T~h~i~s~ ~i~s~ ~m~y~~S~t~r~i~n~g~ ~p~r~o~b~l~e~m~"
As you can see, the numbers have been replaced, but why is it also inserting my replacement string in between characters?
I want it to look like "This is my~String problem"
What am I doing wrong, and why is Java matching like this?

\\d* matches zero or more digits, so it also matches the empty string. There is an empty string before every character in your input (and one at the very end), and each of those empty matches is replaced with ~, hence the result.
Use \\d+ instead, which requires at least one digit. You also don't need to wrap \\d in a character class.
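To make the difference concrete, here is a minimal sketch that runs both quantifiers on the string from the question. The class and method names (DigitRunReplace, replaceDigitRuns) are made up for illustration only.

```java
class DigitRunReplace {
    // Replaces each run of one or more digits with the given replacement.
    // Assumes the replacement contains no '$' or '\' (those are special in replaceAll).
    static String replaceDigitRuns(String input, String replacement) {
        return input.replaceAll("\\d+", replacement);
    }

    public static void main(String[] args) {
        String input = "This is my99String problem";

        // \d* can match the empty string, so every position produces a match
        System.out.println(input.replaceAll("\\d*", "~"));
        // prints ~T~h~i~s~ ~i~s~ ~m~y~~S~t~r~i~n~g~ ~p~r~o~b~l~e~m~

        // \d+ needs at least one digit, so only "99" is replaced
        System.out.println(replaceDigitRuns(input, "~"));
        // prints This is my~String problem
    }
}
```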

[\\d]*
matches zero or more digits (as defined by *), so it also matches the empty string. Hence you're getting matches all through your string. If you use
[\\d]+
it will match one or more digits.
From the doc:
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times

Related

How do i check if string contains char sequence and backslash "\"?

I'm trying to get true in the following test. I have a string with a backslash that, for some reason, isn't recognized.
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\.");
System.out.println(test);
I've tried a lot of variants, but only one, (.*)news(.*), works. But that actually allows any characters after news; I need it to match only when news is followed by \.
How can I do that?
Group the elements at the end:(.*)news\\(.*)
You can use this instead :
Boolean test = s.matches("(.*)news\\\\(.*)");
Try something like:
Boolean test = s.matches(".*news\\\\.*");
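Here .* means any number of characters, followed by news, followed by a single literal backslash, and then any number of characters after that (possibly zero). The backslash is written as four backslashes in Java source: the string literal reduces them to two, and the regex engine reads those two as one escaped backslash.
With your regex, what it means is:
(.*) - any number of characters
news - the literal text "news"
\\. - a literal dot (in Java source, \\. is the regex \., an escaped dot, not a backslash followed by any character)
So it only matches strings that end in "news.", which is not the case for the String in your program, "Good news\ everyone!"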
You are testing for an escaped occurrence of a literal dot: ".".
Refactor your pattern as follows (inferring the last part as you need it for a full match):
String s = "Good news\\ everyone!";
System.out.println(s.matches("(.*)news\\\\.*"));
Output
true
Explanation
The backslash is used in Java String literals to escape characters, including the backslash itself.
In a Java regex pattern, a literal backslash therefore has to be escaped twice ("\\\\"): once for the String literal and once for the regex, because a single regex backslash already introduces special constructs (e.g. \\p{Punct}) or escapes a metacharacter (e.g. the literal dot \\.).
String.matches attempts to match the whole String against your pattern, so you need the trailing .* that I've added.
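The points above can be checked with a small sketch. The class and method names (BackslashMatchDemo, newsThenBackslash) are hypothetical; the last line shows String.contains as a regex-free alternative when only a substring test is needed.

```java
class BackslashMatchDemo {
    // Full-string match: anything, then "news" followed by one literal backslash, then anything.
    static boolean newsThenBackslash(String text) {
        // Four backslashes in source -> two in the String -> one literal backslash for the regex
        return text.matches(".*news\\\\.*");
    }

    public static void main(String[] args) {
        String s = "Good news\\ everyone!"; // the String holds a single backslash after "news"

        System.out.println(newsThenBackslash(s));      // true
        System.out.println(s.matches("(.*)news\\."));   // false: this pattern requires the String to end in "news."
        System.out.println(s.contains("news\\"));       // true: plain substring test, no regex involved
    }
}
```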
you can try this :
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\\\(.*)");
System.out.println(test);

How to match a substring following after a string satisfying the specific pattern

Imagine that I have the string 12.34some_text.
How can I match the substring that follows the second character (4 in my case) after the . character? In that particular case the string I want to match is some_text.
For the string 56.78another_text it will be another_text and so on.
All accepted strings have the pattern \d\d\.\d\d\w*
If you wish to match everything from the second character after a specific one (i.e. the dot) you can use a lookbehind, like this:
(?<=[.]\d{2})(\w*)
(?<=[.]\d{2}) is a positive lookbehind that matches a dot [.] followed by two digits \d{2}.
Since you are using Java and every accepted string has the fixed-width prefix \d\d\.\d\d, you can also get some_text from 12.34some_text by using
String s="12.34some_text";
s.substring(5,s.length());
and you can compare the substring!
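As a rough sketch of the lookbehind approach in Java, the pattern can be compiled once and applied with Matcher.find(). The class and method names (SuffixAfterDecimal, suffix) are invented for this example, and returning null when nothing matches is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixAfterDecimal {
    // Zero or more word characters that are preceded by a dot and two digits
    private static final Pattern SUFFIX = Pattern.compile("(?<=[.]\\d{2})\\w*");

    // Returns the text after ".dd", or null if the input contains no such prefix.
    static String suffix(String input) {
        Matcher m = SUFFIX.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(suffix("12.34some_text"));    // some_text
        System.out.println(suffix("56.78another_text")); // another_text
    }
}
```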

How to replace strings using java String.replaceAll() excluding some patterns?

I am using String.replaceAll to replace a forward slash /, optionally followed or preceded by a space, with a comma followed by a space ", " EXCEPT in some patterns (for example n/v and n/d should not be affected)
ALL the following inputs
"nausea/vomiting"
"nausea /vomiting"
"nausea/ vomiting"
"nausea / vomiting"
Should be outputted as
nausea, vomiting
HOWEVER ALL the following inputs
"user have n/v but not other/ complications"
"user have n/d but not other / complications"
Should be outputted as follows
"user have n/v but not other, complications"
"user have n/d but not other, complications"
I have tried
String source = "nausea/vomiting";
String regex = "([^n/v])(\\s*/\\s*)";
String result = source.replaceAll(regex, ", ");
But it cuts the a before / and gives me nause, vomiting
Does anybody know a solution?
Your first capturing group, ([^n/v]), captures any single character that is not the letter n, the letter v, or a slash (/). In this case, it's matching the a at the end of nausea and capturing it to be replaced.
You need to be a bit more clear about what you are and are not replacing here. Do you just want to insert a comma unless the slash belongs to an abbreviation such as n/v or n/d? You can use lookaround assertions to express this:
(?=asdf) is a positive lookahead: placed at the end of a pattern, it requires that asdf follows immediately after the match; (?!asdf) is a negative lookahead and requires that it does not. Either way the assertion is zero-width, so the text it examines is not part of the match and will not be returned or replaced.
Also, do not forget that in Java source you must always double up any backslashes you put in string literals.
[^n/v] is a negated character class, and matches any single character except an n, a / or a v.
You are probably looking for something like a negative lookbehind:
String regex= "(?<!\\bn)(\\s*/\\s*)";
This will match any of your slash and space combinations that are not preceded by a standalone n (an n at the start of a word), and works for all your examples.
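A short sketch that runs the negative-lookbehind pattern over the inputs from the question. The class and method names (SlashToComma, normalize) are hypothetical, and the capturing group is dropped because the replacement does not refer to it.

```java
class SlashToComma {
    // Replaces a slash (with optional surrounding whitespace) by ", "
    // unless the text right before the match is a standalone "n" (as in n/v, n/d).
    static String normalize(String source) {
        return source.replaceAll("(?<!\\bn)\\s*/\\s*", ", ");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "nausea/vomiting",
            "nausea /vomiting",
            "nausea/ vomiting",
            "nausea / vomiting",
            "user have n/v but not other/ complications",
            "user have n/d but not other / complications"
        };
        for (String input : inputs) {
            System.out.println(normalize(input));
        }
    }
}
```

Note that the lookbehind only covers abbreviations starting with a single n and written without a space before the slash; other exceptions would need their own assertions.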

What is the responsibility of (.*) in the Java String?

What is the responsibility of (.*) in the third line and how it works?
String Str = new String("Welcome to Tutorialspoint.com");
System.out.print("Return Value :" );
System.out.println(Str.matches("(.*)Tutorials(.*)"));
.matches() tests whether the whole of Str matches the regex provided.
Regular expressions (regex) are a way of describing patterns in strings and, optionally, capturing parts of a match in groups. In the example provided, the pattern matches any string which contains the word "Tutorials". (.*) simply means "a group of zero or more of any character".
This page is a good regex reference (for very basic syntax and examples).
Your expression matches any string in which the word Tutorials is preceded and followed by any characters (possibly none). .* means occurrence of any character any number of times, including zero times.
The . represents regular expression meta-character which means any character.
The * is a regular expression quantifier, which means 0 or more occurrences of the expression character it was associated with.
matches takes a regular expression string as its parameter, and (.*) means: capture any character, zero or more times, greedily.
.* means a group of zero or more of any character
In Regex:
. (wildcard): matches any single character except \n. For example, the pattern a.e matches ave in nave and ate in water.
* : matches the previous element zero or more times. For example, the pattern \d*\.\d matches .0, 19.9 and 219.9.
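The examples above can be reproduced in Java with a tiny helper that returns the first match of a pattern. This is only a sketch; the class and method names (WildcardAndStar, firstMatch) are made up here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardAndStar {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a.e", "nave"));          // ave
        System.out.println(firstMatch("a.e", "water"));         // ate
        System.out.println(firstMatch("\\d*\\.\\d", "219.9"));  // 219.9
        System.out.println(firstMatch("\\d*\\.\\d", ".0"));     // .0 (zero digits before the dot)
        System.out.println(firstMatch("a.e", "a\ne"));          // null: . does not match \n by default
    }
}
```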
There is no reason to put parentheses around the .*, nor is there a reason to instantiate a String if you've already got a literal String. But worse is the fact that the matches() method is out of place here.
What it does is greedily match every character from the start to the end of the String. It then backtracks until it finds "Tutorials", after which it again matches any characters (except newlines).
It's better and clearer to use Matcher's find method. The find method simply finds the first "Tutorials" within the String, and you can remove the "(.*)" parts from the pattern.
As a one liner for convenience:
System.out.printf("Return value : %b%n", Pattern.compile("Tutorials").matcher("Welcome to Tutorialspoint.com").find());
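For comparison, here is a small sketch that puts matches(), find() and String.contains side by side on the same input. The class and method names (ContainsTutorials, viaFind) are illustrative only; contains is shown as an option when no regex features are needed.

```java
import java.util.regex.Pattern;

class ContainsTutorials {
    // Compiled once and reused; find() looks for the pattern anywhere in the input
    private static final Pattern TUTORIALS = Pattern.compile("Tutorials");

    static boolean viaFind(String text) {
        return TUTORIALS.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Welcome to Tutorialspoint.com";

        System.out.println(text.matches("(.*)Tutorials(.*)")); // true: the pattern covers the whole String
        System.out.println(text.matches("Tutorials"));         // false: matches() must consume the whole String
        System.out.println(viaFind(text));                      // true
        System.out.println(text.contains("Tutorials"));         // true: literal substring test
    }
}
```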

What's the difference between these similar Java regexes?

Are these three related Java regexes just different syntaxes for doing the same thing?
String resultString = subjectString.replaceAll("(?m)^\\d+\\.\\s*", "");
String resultString = subjectString.replace("^[0-9]+\\. *", "");
String resultString = subjectString.replaceAll('\\d+\.\\s+', '');
No, they are different:
(?m)^\\d+\\.\\s* matches
one or more digits at the begin of a line (note m modifier in (?m)), followed by
a literal ., followed by
zero or more whitespace characters (equivalent to [ \t\n\x0B\f\r]);
^[0-9]+\\. * matches
one or more digits at the begin of the string, followed by
a literal ., followed by
zero or more spaces;
\\d+\.\\s+ matches
one or more digits at any position, followed by
a literal ., followed by
one or more whitespace characters.
Besides that, as Adrian Smith has noted, replace does not expect a regular expression but a single char or a CharSequence (String implements that interface).
replace doesn't accept a regexp; it accepts a literal string (i.e. will really search for exactly those characters). replaceAll accepts a regexp.
The third one isn't valid Java because it uses single quotes. Single quotes delimit a char literal, which holds exactly one character; double quotes delimit String literals.
Close: each replaces a number followed by a period followed by white-space, e.g. "11. ". But each one has a slight difference:
The first requires that the number be at the beginning of a line, and the white-space can be any white-space character, e.g. a tab.
The second isn't valid, as noted, but if it were replaceAll() the number would have to be at the beginning of the string and the white-space could only be the space character.
The third doesn't have to be at the beginning of the line and accepts any white-space characters like the first one, but requires at least one.
The other differences are simply syntax.
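The differences can be observed on a small sample. This sketch uses an invented input string and hypothetical names (ReplaceVariants, stripLineNumbers), and writes the third pattern with double quotes so that it compiles.

```java
class ReplaceVariants {
    // First variant: strips "<digits>." plus optional whitespace at the start of every line.
    static String stripLineNumbers(String text) {
        return text.replaceAll("(?m)^\\d+\\.\\s*", "");
    }

    public static void main(String[] args) {
        String subject = "1. first\n2. see 3. below";

        // (?m) makes ^ match at the start of each line
        System.out.println(stripLineNumbers(subject));               // first / see 3. below

        // replace() takes the argument literally, so nothing is found
        System.out.println(subject.replace("^[0-9]+\\. *", ""));     // unchanged

        // Same pattern with replaceAll(): ^ only matches at the start of the input
        System.out.println(subject.replaceAll("^[0-9]+\\. *", ""));  // first / 2. see 3. below

        // No anchor: numbers anywhere are removed, but whitespace must follow
        System.out.println(subject.replaceAll("\\d+\\.\\s+", ""));   // first / see below
    }
}
```

In the comments, " / " stands for the line break in the output.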
