How to replace strings using Java String.replaceAll() excluding some patterns?

I am using String.replaceAll to replace a forward slash /, optionally preceded or followed by spaces, with a comma followed by a space ", " EXCEPT in some patterns (for example, n/v and n/d should not be affected).
ALL the following inputs
"nausea/vomiting"
"nausea /vomiting"
"nausea/ vomiting"
"nausea / vomiting"
Should be outputted as
nausea, vomiting
HOWEVER ALL the following inputs
"user have n/v but not other/ complications"
"user have n/d but not other / complications"
Should be outputted as follows
"user have n/v but not other, complications"
"user have n/d but not other, complications"
I have tried
String source= "nausea/vomiting";
String regex= "([^n/v])(\\s*/\\s*)";
source.replaceAll(regex, ", ");
But it cuts off the a before the / and gives me nause, vomiting
Does anybody know a solution?

Your first capturing group, ([^n/v]), captures any single character that is not the letter n, the letter v, or a slash (/). In this case, it's matching the a at the end of nausea and capturing it to be replaced.
You need to be a bit more clear about what you are and are not replacing here. Do you just want to make sure there's a comma instead when it doesn't end in "vomiting" or "d"? You can use lookaround assertions to indicate this:
(?=asdf) is a lookahead: it consumes no text, but when placed at the end of the pattern it ensures that the match is immediately followed by asdf; (?!asdf) ensures that it is not. Because lookarounds are zero-width, the text they inspect is not part of the match, so it is neither returned nor replaced.
Also, do not forget that in Java source you must always double up any backslashes you put in string literals.
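To make the difference between the two lookaheads concrete, here is a minimal sketch. The class and method names are made up for illustration, and the patterns are deliberately simplistic: they only show that the character inspected by the lookahead is left untouched, they are not a full solution to the question.

```java
class LookaheadDemo {
    // Replaces a slash only when it is NOT immediately followed by "v".
    static String replaceUnlessFollowedByV(String s) {
        return s.replaceAll("/(?!v)", ", ");
    }

    // Replaces a slash only when it IS immediately followed by "v".
    static String replaceIfFollowedByV(String s) {
        return s.replaceAll("/(?=v)", ", ");
    }

    public static void main(String[] args) {
        // The "v" is only inspected, never consumed, so it survives in both cases.
        System.out.println(replaceUnlessFollowedByV("n/v a/b")); // n/v a, b
        System.out.println(replaceIfFollowedByV("n/v a/b"));     // n, v a/b
    }
}
```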

[^n/v] is a negated character class: it matches any single character except n, / or v.
You are probably looking for something like a negative lookbehind:
String regex= "(?<!\\bn)(\\s*/\\s*)";
This will match any of your slash and space combinations that are not preceded by a standalone n (an n that starts at a word boundary), and works for all your examples. You can read more on lookaround here.
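As a quick check of the lookbehind approach, the sketch below runs it against the inputs from the question. The class and method names are hypothetical, and the capturing group around the slash part is dropped because nothing refers to it.

```java
class SlashToComma {
    // Negative lookbehind: do not start a match directly after a standalone "n".
    static final String REGEX = "(?<!\\bn)\\s*/\\s*";

    static String normalize(String s) {
        // Strings are immutable, so the result of replaceAll must be used.
        return s.replaceAll(REGEX, ", ");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "nausea/vomiting",
            "nausea /vomiting",
            "nausea/ vomiting",
            "nausea / vomiting",
            "user have n/v but not other/ complications",
            "user have n/d but not other / complications"
        };
        for (String input : inputs) {
            System.out.println(normalize(input));
        }
    }
}
```

Note that the lookbehind only protects a slash that directly follows the n; an input such as "n /v" (with a space before the slash) would still be rewritten.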

Related

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for 2 different strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 strings separated by commas are present.
I am not sure what the correct pattern should be for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | in which each side contains exactly one mandatory comma: the quoted value followed by a comma and one more item, or one item followed by a comma and the quoted value.
matches() has to match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11 contains 2 commas, so the pattern can match only the first part of the string, not the whole string, and matches() returns false.
See the matches in this regex demo.
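The difference between a whole-string match and a partial match can be reproduced with a small sketch like the following. The class and helper names are invented for illustration; the regex is built exactly as in the question.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Builds the pattern from the question for a given value.
    static String buildRegex(String value) {
        String items = "[NnoneOoff0-9\\-\\+\\/]+";
        return Pattern.quote(value) + ", " + items + "|" + items + ", " + Pattern.quote(value);
    }

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // find() succeeds if the pattern matches any part of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = buildRegex("207e/160");
        System.out.println(wholeMatch(regex, "207e/160, 149/80"));       // true
        System.out.println(wholeMatch(regex, "207e/160, 149/80, 11"));   // false
        System.out.println(partialMatch(regex, "207e/160, 149/80, 11")); // true
    }
}
```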
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
(?: Non-capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of any of the characters listed in the character class
)* Close the non-capture group and optionally repeat it (if there should be at least 1 comma, the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped, and + and / do not have to be escaped either.
You can use regex101 to test your regex. It describes everything that's going on, which helps with debugging, and it has a quick reference section at the bottom right that shows what you can do, with examples.
A few things: you can escape a character with \ to make it a literal, so \" is a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item except the last, the separator (,) must appear; the (?!$) lookahead rejects a trailing comma. Spaces (one, several, or none) can appear before and after the comma, which is specified with " *" (a space followed by *). This group can appear one or more times up to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
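A rough way to try this last pattern from Java is sketched below; the class and method names are placeholders. The pattern is pasted unchanged, since it is already written with the doubled backslashes that a Java string literal needs.

```java
class ChannelList {
    // Item, optional spaces, then either a comma that is not the last thing
    // in the string, or the end of the string.
    static final String REGEX = "^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$";

    static boolean isValid(String channelStr) {
        return channelStr.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("207e/160, 149/80"));     // true
        System.out.println(isValid("207e/160, 149/80, 11")); // true
        System.out.println(isValid("207e/160,"));            // false, trailing comma
    }
}
```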

Regex to find decimal or non decimal number on same or next line in java

I have the following text
My thing 0.02
My thing 100.2
My thing 65
My thing
0.03
My thing
13
My thing
45.67 stuff
I want to extract 'My thing' and the number associated with it, so that I can split them and put them into a map. (I know the keys will overwrite each other in this example; it is just the example I'm using here. 'My thing' will actually be incorporated into its own map, so it isn't an issue.)
Mything=0.02,Mything=100.2,Mything=65,Mything=0.03,Mything=13,Mything=45.67
I tried
Pattern match_pattern = Pattern.compile(start.trim()+"\\n.*?\\d*\\.\\d*\\s",Pattern.DOTALL);
but this doesn't quite do what I want
The pattern for an integer or decimal might be \d+(\.\d+)? so if you want to look for start followed by that number, with optional whitespace in between, you might try the pattern start + "\\s*\\d+(\\.\\d+)?" (line breaks are whitespace as well) and apply the pattern to the multiline text (i.e. don't apply it to individual lines). If there can be anything in between (not just whitespace), you'll want to use .* along with the DOTALL flag instead of \s*.
Breakdown of the expression start + "\\s*\\d+(\\.\\d+)?"
start contains a subexpression which is provided from elsewhere. If you want to make sure it is treated as a literal (i.e. special characters like * etc. are not interpreted), wrap it with \Q and \E, i.e. "\\Q" + start + "\\E", or use Pattern.quote(start).
\s* (or \\s* in a Java string literal) means "zero or more whitespace characters", which also includes line breaks
\d+(\.\d+)? (or \\d+(\\.\\d+)? in a Java string literal) means "one or more digits followed by zero or one group consisting of a dot and one or more digits" - this means the "dot and one or more digits" part is optional but if there is a dot it must be followed by at least one digit.
Additional note: if you want to access the capturing groups e.g. to extract the number you'll want to use a non-capturing group for the optional part and wrap the entire (sub-)expression in a capturing group, e.g. (\d+(?:\.\d+)?). In that case, if you'd use Pattern and Matcher, you could access the number using group(1) - or if you wrap start in a group as well (like "(\\Q" + start + "\\E)\\s*(\\d+(?:\\.\\d+)?)") you'd get the first part as group(1) and the second part as group(2).
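Putting the notes above together, a minimal sketch could look like this. The class name, the method name and the key=value output format are assumptions for illustration (the format mirrors the desired output in the question), and Pattern.quote is used in place of manual \Q...\E wrapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // Returns "key=number" strings for every occurrence of start followed by a number.
    static List<String> extract(String start, String text) {
        // group(1) is the literal start text, group(2) the integer or decimal number.
        Pattern p = Pattern.compile("(" + Pattern.quote(start) + ")\\s*(\\d+(?:\\.\\d+)?)");
        Matcher m = p.matcher(text);
        List<String> pairs = new ArrayList<>();
        while (m.find()) {
            // Spaces are removed from the key only to mirror the "Mything=..." format.
            pairs.add(m.group(1).replace(" ", "") + "=" + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "My thing 0.02\nMy thing 100.2\nMy thing 65\nMy thing\n"
                + "0.03\nMy thing\n13\nMy thing\n 45.67 stuff\n";
        System.out.println(extract("My thing", text));
    }
}
```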
If you simply want to extract the records you could do it like
String s = "My thing 0.02\nMy thing 100.2\nMy thing 65\nMy thing\n"+
"0.03\nMy thing\n13\nMy thing\n 45.67 stuff\n";
Matcher m = Pattern.compile("(My thing)\\s*(\\d+(?:\\.\\d+)?)").matcher(s);
Then loop through the matches and add to the dictionary, or what ever... ;)
while (m.find()) {
// Add to dictionary, group 1 is key, 2 is value
System.out.println("Found: " + m.group(0)+ ":" + m.group(1)+":" + m.group(2));
}
See it here at ideone.

Find java comments (multi and single line) using regex

I found the following regex online at http://regexlib.com/
(\/\*(\s*|.*?)*\*\/)|(\/\/.*)
It seems to work well for the following matches:
// Compute the exam average score for the midterm exam
/**
* The HelloWorld program implements an application that
*/
BUT it also tends to match
http://regexr.com/foo.html?q=bar
at least starting at the //
I'm new to regex and a total infant, but I read that if you put a caret at the beginning it forces the match to start at the beginning of the line, however this doesn't seem to work on RegExr.
I'm using the following:
^(\/\*(\s*|.*?)*\*\/)|(\/\/.*)$
The regex you are looking for must allow the start of a comment (// or /*) to appear anywhere except inside a token that can itself contain those character sequences. If you look at the lexical structure of the Java language, you'll see that the only lexical element that can contain // or /* is the string literal. So, to avoid matching a comment marker inside a string, you have to match each string literal in full; otherwise a literal that starts before your match could contain what looks like your comment.
So the text before your comment must consist of characters that do not start a string literal, interleaved with any number of complete string literals. A string literal is matched by something of this shape:
\"()*\"
where the inside of the parentheses must be filled with something that cannot be a \n, a bare ", or a bare \, but can be an escaped \\ or an escaped \". (A unicode escape \uxxxx that results in a " would also have to be excluded; this answer treats that case as not applicable.) This leads to
\"([^\\\"\n]|\\.)*\"
This can optionally be repeated any number of times, with each literal preceded by a character that is not a " (since a " would start the next literal):
([^\\"](\"([^\\\"\n]|\\.)*\")?)*
The text before the comment is matched by this expression. Then comes the comment itself, which can take either of two forms:
\/\/[^\n]*$
or
/\*([^\*]|\*[^\/])*\*\/
(that is, a slash, an escaped asterisk, and then any number of items that are either something other than a * or a * followed by something that is not a /, until the closing */ sequence is reached)
These can be grouped in an alternative group, as in:
(\/\/[^\n]*\n|\/\*([^\*]|\*[^\/])*\*\/)
Finally, the complete expression is:
^([^\\"](\"([^\\\"\n]|\\.)*\")?)*(\/\/[^\n]*|\/\*([^\*]|\*[^/])*\*\/)
Be aware that the matched comment does not start at the beginning of the match but in the 4th group (counting left parentheses), and that the regexp must match the string from its beginning; see demo
Note
Keep in mind that you are matching not only the comment but also the text before it, so the overall match consists of the preceding text plus the comment. Also, if you try this regexp on text with several comments in sequence, it will match only the last one, because the case of a /* ... /* .... */ sequence is not covered (a comment marker can also be embedded in a comment, but handling that case as well will make you hate regexps forever). The correct way to cope with this problem is to write a lex/flex specification that recognizes the Java tokens, but this is out of scope for this explanation. See a probably valid example here.
You can try this pattern:
(?ms)^[^'"\n]*?(?:(?:"(?:\\.|[^"])*"|'\\?.')[^'"\n]*?)*((?:(?://[^\n]*|/\*.*?\*/)[ \t]*)+)
This captures comments in group 1, but only if the comment is not inside a string. Demo.
Breakdown:
(?ms)              multiline flag makes ^ match at the start of a line;
                   singleline flag makes . match newlines
^                  start of line
[^'"\n]*?          match anything but " or ' or newline
(?:                then, any number of strings:
  (?:
    "              start with a quote...
    (?:            ...followed by any number of...
      \\.          ...a backslash and the escaped character
    |              or
      [^"]         any character other than "
    )*
    "              ...and finally the closing quote
  |                or...
    '\\?.'         a single character in single quotes, possibly escaped
  )
  [^'"\n]*?        and everything up to the next string or newline
)*
(                  finally, capture (any number of) comments:
  (?:
    (?:            either...
      //[^\n]*     a single line comment
    |              or
      /\*.*?\*/    a multiline comment
    )
    [ \t]*         and any subsequent comments if only separated by whitespace
  )+
)
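For reference, this is roughly how the pattern above might be used from Java to collect the comments in group 1. The class name, method name and sample source text are invented for this sketch, and the regex is split across several string literals only to keep the escaping readable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentFinder {
    static final Pattern COMMENTS = Pattern.compile(
        "(?ms)^[^'\"\\n]*?"                                    // text before any literal
        + "(?:(?:\"(?:\\\\.|[^\"])*\"|'\\\\?.')[^'\"\\n]*?)*"  // string/char literals, skipped whole
        + "((?:(?://[^\\n]*|/\\*.*?\\*/)[ \\t]*)+)");          // group 1: the comment(s)

    static List<String> find(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENTS.matcher(source);
        while (m.find()) {
            // trim() drops the optional trailing blanks that group 1 may include
            comments.add(m.group(1).trim());
        }
        return comments;
    }

    public static void main(String[] args) {
        String source = "String url = \"http://example.com\"; // trailing comment\n"
                + "/* block\n   comment */\n"
                + "int x = 1;\n";
        // The "//" inside the string literal is not reported as a comment.
        for (String c : find(source)) {
            System.out.println("[" + c + "]");
        }
    }
}
```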

Escape symbol while spliting string using regex in java

I have a string that I received while parsing an XML document:
"ListOfItems/Item[Name='Model/Id']/Price"
I need to split it by the delimiter "/":
String[] nodes = path.split("/"), but with one condition:
"If a slash is present in the name of an item, as in the example above, I must skip that part and not split on it."
i.e. after splitting I must get the following array of nodes:
ListOfItems, Item[Name='Model/Id'], Price
How can I do it using a regular expression?
Thanks for help!
You can split using this regex:
/(?=(?:(?:[^']*'){2})*[^']*$)
This regex splits only on forward slashes / that are followed by an even number of single quotes, which in other words means that a / inside single quotes is not matched for splitting.
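Applied to the path from the question, the lookahead-based split could be tried like this (class and method names are placeholders for this sketch):

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // A slash qualifies only if an even number of single quotes follows it.
    static final String REGEX = "/(?=(?:(?:[^']*'){2})*[^']*$)";

    static String[] split(String path) {
        return path.split(REGEX);
    }

    public static void main(String[] args) {
        String path = "ListOfItems/Item[Name='Model/Id']/Price";
        System.out.println(Arrays.toString(split(path)));
        // [ListOfItems, Item[Name='Model/Id'], Price]
    }
}
```

The lookahead rescans the rest of the string for every slash, which is fine for short paths but grows quadratically on long inputs.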
Another way is to use the following pattern with the find method and to check whether capture group 1 of the last match is empty. The advantage is that you don't need a lookahead that tests the string all the way to the end for each possible position. The items you need are in capture group 1:
\\G/?((?>[^/']+|'[^']*')*)|$
The \G is an anchor that matches either the start of the string or the position after the previous match. Using it forces all the matches to be contiguous.
(?>[^/']+|'[^']*')* defines the possible content of an item: all that is not a / or a ', or a string between quotes.
Note that the description of a string between quotes can be improved to deal with escaped quotes: '(?>[^'\\]+|\\.)*' (with the s modifier)
The alternation with $ is only there to ensure that the whole string has been parsed to the end. Capture group 1 of the last match must be empty. If it is null, this means that the global search stopped before the end (for example, in the case of unbalanced quotes).
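A sketch of the find-based approach, including the final "empty versus null" check, might look like this. The class and method names are hypothetical, and throwing IllegalArgumentException on an incomplete parse is a design choice of this sketch, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentFinder {
    static final Pattern ITEM = Pattern.compile("\\G/?((?>[^/']+|'[^']*')*)|$");

    static List<String> items(String path) {
        Matcher m = ITEM.matcher(path);
        List<String> items = new ArrayList<>();
        String last = null;
        while (m.find()) {
            last = m.group(1); // null when only the "$" alternative matched
            if (last != null && !last.isEmpty()) {
                items.add(last); // empty segments are skipped in this sketch
            }
        }
        // A null group 1 in the last match means the \G chain broke before the end.
        if (last == null) {
            throw new IllegalArgumentException("could not parse to the end: " + path);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(items("ListOfItems/Item[Name='Model/Id']/Price"));
        // [ListOfItems, Item[Name='Model/Id'], Price]
    }
}
```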

String.replaceAll() with [\d]* appends replacement String inbetween characters, why?

I have been trying for hours now to get a regex that will match an unknown quantity of consecutive digits. I believe [0-9]* or [\d]* should be what I want, yet when I use Java's String.replaceAll it adds my replacement string in places that shouldn't match the regex.
For example:
I have an input string of "This is my99String problem"
If my replacement string is "~"
When I run this
myString.replaceAll("[\\d]*", "~" )
or
myString.replaceAll("[0-9]*", "~" )
my return string is "~T~h~i~s~ ~i~s~ ~m~y~~S~t~r~i~n~g~ ~p~r~o~b~l~e~m~"
As you can see, the numbers have been replaced, but why is it also inserting my replacement string in between characters?
I want it to look like "This is my~String problem"
What am I doing wrong, and why is Java matching like this?
\\d* matches 0 or more digits, so it also matches the empty string. There is an empty string before every character in your input (and at its end), so each of those positions is replaced with ~, hence the result.
Try using \\d+ instead. You also don't need to wrap \\d in a character class.
[\\d]*
matches zero or more digits (as defined by *). Hence you're getting matches all through your string. If you use
[\\d]+
that'll match 1 or more digits.
From the doc:
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
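The effect of the two quantifiers can be compared side by side with a short sketch (the class and method names are made up for illustration):

```java
class DigitReplace {
    // "*" also matches the empty string, so every position gets a replacement.
    static String withStar(String s) {
        return s.replaceAll("\\d*", "~");
    }

    // "+" needs at least one digit, so only real digit runs are replaced.
    static String withPlus(String s) {
        return s.replaceAll("\\d+", "~");
    }

    public static void main(String[] args) {
        String input = "This is my99String problem";
        System.out.println(withStar(input)); // ~T~h~i~s~ ~i~s~ ~m~y~~S~t~r~i~n~g~ ~p~r~o~b~l~e~m~
        System.out.println(withPlus(input)); // This is my~String problem
    }
}
```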
