Java RegEx doesn't replaceAll

I was trying to replace the concatenation symbol '+' with '||' in a given multi-line script, but it seems that the Java regex replaces just one occurrence instead of all of them.
String ss="A+B+C+D";
Matcher mm=Pattern.compile("(?imc)(.+)\\s*\\+\\s*(.+)").matcher(ss);
while(mm.find())
{
System.out.println(mm.group(1));
System.out.println(mm.group(2));
ss=mm.replaceAll("$1 \\|\\| $2");
}
System.out.println(ss); // Output: A+B+C||D, Expected: A||B||C||D

The reason only one element is replaced is that your pattern matches the entire line. In "(?imc)(.+)\\s*\\+\\s*(.+)", the first (.+) is greedy: it consumes everything up to the end of the line and then backtracks just far enough for the rest of the pattern, \\s*\\+\\s*(.+), to match. Group 1 therefore holds everything before the last +, and group 2 everything after it. Since that single match already covers the whole line, replaceAll finds nothing else to match and stops after one replacement.
What you need is a pattern that matches only the +, optionally surrounded by whitespace:
Pattern.compile("(?imc)\\s*\\+\\s*");
This matches every + you want to replace and nothing else on the line.
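To make the backtracking concrete, here is a minimal sketch that prints what the original pattern captures and then applies the narrower pattern. The class and method names (PlusToPipes, toPipes) are made up for illustration, and the inline flags from the question are left out because they make no difference for this input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusToPipes {
    // Matches a single '+' together with any whitespace around it.
    private static final Pattern PLUS = Pattern.compile("\\s*\\+\\s*");

    // Hypothetical helper: replaces every '+' (and surrounding whitespace) with "||".
    static String toPipes(String script) {
        return PLUS.matcher(script).replaceAll("||");
    }

    public static void main(String[] args) {
        String ss = "A+B+C+D";

        // The question's pattern: the greedy first group swallows everything up to the last '+'.
        Matcher greedy = Pattern.compile("(.+)\\s*\\+\\s*(.+)").matcher(ss);
        if (greedy.find()) {
            System.out.println(greedy.group(1)); // A+B+C
            System.out.println(greedy.group(2)); // D
        }

        // The narrower pattern matches each '+' separately.
        System.out.println(toPipes(ss)); // A||B||C||D
    }
}
```

Because \s also matches line breaks, a + at the very end of a line would pull the following newline into the replacement. If that matters for a multi-line script, [ \t]* is a possible substitute for \s*.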

You could just use:
ss = ss.replaceAll("\\+", "||");
as @ernest_k has pointed out. If you really want to continue using a matcher with iteration, then use Matcher#appendReplacement with a StringBuffer:
String ss = "A+B+C+D";
Matcher mm = Pattern.compile("\\+").matcher(ss);
StringBuffer sb = new StringBuffer();
while (mm.find()) {
mm.appendReplacement(sb, "||");
}
mm.appendTail(sb);
System.out.println(sb);

I think we just need a simple string replace:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\\+";
final String string = "A+B+C+D";
final String subst = "||";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
In your original expression, the first capturing group (.+) is greedy: it matches between one and unlimited times, as many times as possible, so it does not work here. If we changed the groups to (.+?), it would partially work, but the groups are still unnecessary.
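As a rough illustration of "partially worked", the sketch below runs the question's pattern with both groups made reluctant and compares it with the plain replacement. The class and method names are hypothetical. Each match still consumes both operands, so the + between two matches ends up inside the next match's first group and survives.

```java
import java.util.regex.Pattern;

class LazyGroupsDemo {
    // Hypothetical helper: the question's pattern with both groups made reluctant.
    static String withLazyGroups(String input) {
        return Pattern.compile("(.+?)\\s*\\+\\s*(.+?)")
                .matcher(input)
                .replaceAll("$1||$2");
    }

    // The plain replacement, which needs no groups at all.
    static String plain(String input) {
        return input.replaceAll("\\+", "||");
    }

    public static void main(String[] args) {
        System.out.println(withLazyGroups("A+B+C+D")); // A||B+C||D
        System.out.println(plain("A+B+C+D"));          // A||B||C||D
    }
}
```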

Related

How to parse string using regex

I'm pretty new to java, trying to find a way to do this better. Potentially using a regex.
String text = test.get(i).toString()
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and just return back machine. Which is what I did in the code above.
But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?

Use a regex lookbehind:
(?<=\bid=)[^],]*
See Regex101.
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
}
This results in
machine

Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:
String id = test.get(i).getId();

Using the replace and split functions doesn't take the structure of the data into account.
If you want to use a regex, you can use a capturing group without any lookarounds, where the enumId value can be any characters except =, a comma and ], and the id value can be any characters except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ]
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
)\] Close group 1 and match ]
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
machine
If there can be more comma separated values, you could also only match id making use of negated character classes [^][]* before and after matching id to stay inside the square bracket boundaries.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
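In Java, escape the square brackets inside the character classes, because java.util.regex treats an unescaped [ inside a class as the start of a nested class:
String regex = "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]";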
Regex demo
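A sketch of how the last pattern might be used from Java when more comma-separated fields are present. The helper name extractId and the extra name=x field are assumptions made for the example. The brackets inside the character classes are written escaped here, since Java reads an unescaped [ inside a class as a nested class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EnumOptionId {
    // Stay inside one [...] pair and capture the value that follows "id=".
    private static final Pattern ID = Pattern.compile(
            "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]");

    // Hypothetical helper: returns the id value, or empty if the text has no id field.
    static Optional<String> extractId(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("EnumOption[enumId=test,id=machine]"));        // Optional[machine]
        System.out.println(extractId("EnumOption[enumId=test,id=machine,name=x]")); // Optional[machine]
        System.out.println(extractId("EnumOption[enumId=test]"));                   // Optional.empty
    }
}
```

The word boundary before id= is what keeps enumId= from being picked up, together with the match being case-sensitive.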

A regex can of course be used, but sometimes is less performant, less readable and more bug-prone.
I would advise you not use any regex that you did not come up with yourself, or at least understand completely.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks

Since you're using the Java regex engine, you need to write the expression the way Java expects it. That means removing the leading and trailing slashes and putting the flags at the beginning of the expression in the form (?flags), for example (?i).
So you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html

In Java we don't surround a regex with /, so instead of "/regex/flags" we just write regex. If you want to add flags, you can use the (?flags) syntax and place it in the regex at the position from which the flags should apply. For instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
You can also compile your regex into a Pattern like this
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again, not enclosed in /) and flags is an integer built from Pattern constants such as Pattern.DOTALL. When you need more than one flag, combine them, as in Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
The next thing that may confuse you is the matches method. Its name misleads many people into assuming that it checks whether the string contains something the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a way to test whether the regex can be found at least once in the string. In that case you can either
add .* at the start and end of your regex to let the regex engine match the characters that are not part of the element you are looking for, although this way matches must scan the entire string, or
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which scans until it finds a match or reaches the end of the string. I prefer this approach because it does not need to scan the entire string; it stops as soon as a match is found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
If your term could contain regex special characters that you want the regex engine to treat as ordinary characters, you need to make sure they are escaped. You can use the Pattern.quote method for this, which adds all the necessary escaping for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
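The points above (find versus matches, quoting the term, and where an inline flag takes effect) can be tried out with a small sketch. The class name WordStart and the helper anyWordStartsWith are invented for this example.

```java
import java.util.regex.Pattern;

class WordStart {
    // Hypothetical helper: true if any word in text starts with term, ignoring case.
    static boolean anyWordStartsWith(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";

        System.out.println(anyWordStartsWith(str, "t")); // true
        System.out.println(anyWordStartsWith(str, "x")); // false

        // matches() must cover the whole string, so the bare pattern fails here.
        System.out.println(str.matches("(?i)\\bt")); // false

        // The quoted '.' is a literal dot, so "a.b" is not found in "aXb".
        System.out.println(anyWordStartsWith("aXb", "a.b")); // false

        // An inline flag applies only from the position where it appears.
        System.out.println("aA".matches("a(?i)a")); // true
        System.out.println("Aa".matches("a(?i)a")); // false
    }
}
```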

String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
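A short sketch of calling find repeatedly to collect every word that starts with the term. The helper name wordsStartingWith is hypothetical, and Pattern.quote is added on the assumption that the term should be taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsStartingWith {
    // Hypothetical helper: returns every word in text that begins with term, ignoring case.
    static List<String> wordsStartingWith(String text, String term) {
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\w*");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group()); // m.start() would give the offset of the word instead
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsStartingWith("This is the difficult one Thats it", "t"));
        // [This, the, Thats]
    }
}
```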

String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("^" + Pattern.quote(term) + ".*", Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

validate empty variable regex

I have a regex script which validates a field variable for some extensions (pdf, doc, jpeg, jpg, and png). But sometimes this field can be empty. I saw in some topics that "^$" could solve my problem. I tried a lot of combinations (because I do not know regex), but it doesn't work. Here is my current code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(field_Fichier1.getFileName());
return matcher.matches();
Thanks for your help

// Mine = doesn't work for empty field
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
// Anubhava = doesn't work for empty field
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))?";
// or
//String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))";
// Bohemian = can't be run = error: "Groovy:illegal string body character after dollar sign;"
String REGEX = "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";

Why do you have \$ in your regex? You can just make your whole regex optional to allow for an empty string match:
String REGEX = "([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png)))?";
The ? at the end makes the whole regex optional, which allows it to match "" as well.

Just add ^$| to the front of your regex:
String REGEX = "^$|([^##]+(\\.(?i)(pdf|doc|docx|jpeg|mp3|jpg|png))\$)";
Note that I haven't checked your existing regex - I'm assuming it works for non-blank input.
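For comparison, a minimal plain-Java sketch of the optional-group idea. The error message quoted in the question suggests the script is Groovy, where a $ inside a double-quoted string starts interpolation; that does not arise in a Java string literal. The names UploadNameCheck and isAcceptable are invented here, and the trailing $ is simply dropped because matches() already requires the whole input to match.

```java
import java.util.regex.Pattern;

class UploadNameCheck {
    // Either nothing at all, or a name without '#' that ends in an allowed extension.
    private static final Pattern ALLOWED = Pattern.compile(
            "(?:[^#]+\\.(?i:pdf|doc|docx|jpeg|mp3|jpg|png))?");

    // Hypothetical helper: true for an empty name or a name with an allowed extension.
    static boolean isAcceptable(String fileName) {
        return ALLOWED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(""));           // true
        System.out.println(isAcceptable("report.pdf")); // true
        System.out.println(isAcceptable("photo.JPG"));  // true
        System.out.println(isAcceptable("notes.docx")); // true
        System.out.println(isAcceptable("setup.exe"));  // false
    }
}
```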

extract a string with regex

I'm a beginner using regex, I have Strings like String1= "DELIVERY 'text1' 'text2'" and string2="DELIVERY 'text1'", I want to extract "text1". I tried this pattern
Pattern p = Pattern.compile("^DELIVERY\\s'(.*)'");
Matcher m2 = p.matcher(string);
if (m2.find()) {
System.out.println(m2.group(1));
}
The result was text1' 'text2 for the first string and text1 for the second.
I tried this too:
Pattern p = Pattern.compile("^DELIVERY\\s'(.*)'\\s'(.*)'");
Matcher m2 = p.matcher(string);
if (m2.find()) {
System.out.println(m2.group(1));
}
It returns a result only for String1.

Your first attempt was almost right. Just replace:
.*
With:
.*?
This makes the operator "non-greedy", so it will "swallow up" as little matched text as possible.

Your regex .* is "greedy": it consumes as much input as possible while still matching, so it will consume everything from the first to the last quote.
Instead use the reluctant version by adding ?, i.e. .*?, to consume as little as possible while still matching, which won't skip over a quote.
Combine this change with some Java kung fu and you can do it all in one line:
String quoted = str.replaceAll(".*DELIVERY\\s'(.*?)'.*", "$1");
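The difference between the greedy and reluctant quantifiers can be checked side by side with a small sketch. The helper firstGroup is hypothetical; a negated character class is included as a third variant that stops at the first quote without relying on reluctance.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstQuoted {
    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s1 = "DELIVERY 'text1' 'text2'";
        String s2 = "DELIVERY 'text1'";

        System.out.println(firstGroup("^DELIVERY\\s'(.*)'", s1));  // text1' 'text2 (greedy)
        System.out.println(firstGroup("^DELIVERY\\s'(.*?)'", s1)); // text1 (reluctant)
        System.out.println(firstGroup("^DELIVERY\\s'(.*?)'", s2)); // text1
        System.out.println(firstGroup("DELIVERY '([^']*)", s1));   // text1 (negated class)
    }
}
```

One thing to keep in mind with the one-line replaceAll form: when the input does not match at all, replaceAll returns the input unchanged, whereas a Matcher lets you detect the missing match explicitly.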

if you only want to have 'text1', try this regex:
"DELIVERY '([^']*)"
or without grouping:
"(?<=DELIVERY ')[^']*"

How to replace any occurrence of a word between quotes

I need to be able to replace all occurrences of the word "and" ONLY when it occurs between single quotes. For example replacing "and" with "XXX" in the string:
This and that 'with you and me and others' and not 'her and him'
Results in:
This and that 'with you XXX me XXX others' and not 'her XXX him'
I have been able to come up with regular expressions which nearly gets every case, but I'm failing with the "and" between the two sets of quoted text.
My code:
String str = "This and that 'with you and me and others' and not 'her and him'";
String patternStr = ".*?\\'.*?(?i:and).*?\\'.*";
Pattern pattern= Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());
while(matcher.matches()) {
System.out.println("in matcher");
str = str.replaceAll("(?:\\')(.*?)(?i:and)(.*?)(?:\\')", "'$1XXX$2'");
matcher = pattern.matcher(str);
}
System.out.println(str);

Try this code:
str = "This and that 'with you and me and others' and not 'her and him'";
Matcher matcher = Pattern.compile("('[^']*?')").matcher(str);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group(1).replaceAll("and", "XXX"));
}
matcher.appendTail(sb);
System.out.println("Output: " + sb);
OUTPUT
Output: This and that 'with you XXX me XXX others' and not 'her XXX him'
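Two details of the code above are worth noting: the inner replaceAll("and", ...) also changes "and" inside longer words such as "sand", and appendReplacement gives $ and \ in the replacement string a special meaning. A possible variant, sketched under the assumption that only the whole word should change, adds word boundaries and Matcher.quoteReplacement. The names QuotedWordReplace and replaceInQuotes are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWordReplace {
    // Hypothetical helper: replaces whole-word occurrences of word, but only inside '...' sections.
    static String replaceInQuotes(String text, String word, String replacement) {
        Pattern quoted = Pattern.compile("'[^']*'");
        Pattern target = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = quoted.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Rewrite just the quoted section, then append it literally.
            String rewritten = target.matcher(m.group())
                    .replaceAll(Matcher.quoteReplacement(replacement));
            m.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "This and that 'with you and me and others' and not 'her and him'";
        System.out.println(replaceInQuotes(str, "and", "XXX"));
        // This and that 'with you XXX me XXX others' and not 'her XXX him'
    }
}
```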

String str = "This and that 'with you and me and others' and not 'her and him'";
Pattern p = Pattern.compile("(\\s+)and(\\s+)(?=[^']*'(?:[^']*+'[^']*+')*+[^']*+$)");
System.out.println(p.matcher(str).replaceAll("$1XXX$2"));
The idea is, each time you find the complete word and, you scan from the current match position to the end of the string, looking for an odd number of single quotes. If the lookahead succeeds, the matched word must be between a pair of quotes.
Of course, this assumes quotes always come in matched pairs, and that quotes can't be escaped. Quotes escaped with backslashes can be dealt with, but it makes the regex much longer.
I'm also assuming the target word never appears at the beginning or end of a quoted sequence, which seems reasonable for the word and. If you want to allow for target words that are not surrounded by whitespace, you could use something like "\\band\\b" instead, but be aware of Java's problems in the area of word characters vs word boundaries.
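A sketch of the word-boundary variant mentioned above, reusing the same lookahead. It assumes plain ASCII text, balanced quotes and no escaped quotes; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LookaheadReplace {
    // The whole word "and", followed by an odd number of single quotes up to the end of the input.
    private static final Pattern AND_IN_QUOTES = Pattern.compile(
            "\\band\\b(?=[^']*'(?:[^']*+'[^']*+')*+[^']*+$)");

    // Hypothetical helper: replaces "and" with "XXX" only inside quoted sections.
    static String replaceQuotedAnd(String text) {
        return AND_IN_QUOTES.matcher(text).replaceAll("XXX");
    }

    public static void main(String[] args) {
        String str = "This and that 'with you and me and others' and not 'her and him'";
        System.out.println(replaceQuotedAnd(str));
        // This and that 'with you XXX me XXX others' and not 'her XXX him'
    }
}
```

Since the surrounding whitespace is no longer captured, the replacement string needs no group references.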
