Replace group 1 of Java regex with out replacing the entire regex

Replace group 1 of Java regex with out replacing the entire regex - java

I have a regex pattern that will have only one group. I need to find texts in the input strings that follows the pattern and replace ONLY the match group 1. For example I have the regex pattern and the string to be applied on as shown below. The replacement string is "<---->"
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");
The expected result is
plan p<--->s <--->der p<--->itia
I tried following approaches
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll(m.group(1),"<--->");
}
System.out.print(result);
This gives result as
p<---> p<--->s <--->der p<--->itia
Another approach
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll("\\w*(lan)\\w+","<--->");
}
System.out.print(result);
Result is
plan <---> <---> <--->
I have gone through this link. Here the part of the string before the match is always constant and is "foo" but in my case it varies. Also I have looked at this and this but I am unable to apply any on the solutions given to my present scenario.
Any help is appreciated

You need to use the following pattern with capturing groups:
(\w*)lan(\w+)
^-1-^ ^-2-^
and replace with $1<--->$2
See the regex demo
The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.
Java demo:
String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia
If you need to be able to replace the Group 1 and keep the rest, you may use the replace callback method emulation with Matcher#appendReplacement:
String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia
See another Java demo
Here, since we process a match by match, we should only replace the Group 1 contents once with replaceFirst, and since we replace the substring as a literal, we should Pattern.quote it.

To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().
That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.
start(1)
↓ end(1)
↓ ↓
\\w*(lan)\\w+
↑ ↑
start() end()
You can then extract the values to keep.
String input = "plan plans lander planitia";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
"<--->" +
input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();
System.out.println(output);
Output
plan p<--->s <--->der p<--->itia
If you don't like that it uses the original string, you can use the matched substring instead.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
String match = m.group();
int start = m.start();
m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
"<--->" +
match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();

While Wiktors explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep it anyways, so we can simply leave it out of the pattern. The check for a word-character after lan can be done using a lookahead, like (?=\w), so we actually only match lan in a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).

I like others solutions. This is slightly optimalised bulletproof version:
public static void main (String [] args) {
int groupPosition = 1;
String replacement = "foo";
Pattern r = Pattern.compile("foo(bar)");
Matcher m = r.matcher("bar1234foobar1234bar");
StringBuffer sb = new StringBuffer();
while (m.find()) {
StringBuffer buf = new StringBuffer(m.group());
buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement);
m.appendReplacement(sb, buf.toString());
}
m.appendTail(sb);
System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}

Related

Extract text with Java regex with reference support

I am having some problem writing a method in Java. It basically extracts text with matching pattern and returns ALL the extractions. It simply works just like java.util.regex.Matcher's find()/matches() then group() :
Matcher matcher = pattern.matcher(fileContent);
StringBuilder sb = new StringBuilder();
while(matcher.matches()) {
sb.append(matcher.group()).append("\n");
}
return sb.toString();
However, I would like the extractions to be formatted with the references(dollar sign,$) and literal-character-escaping (backslash,\) support, just like the replacement in Matcher.replaceAll(replacement)(Doc). For example:
fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
pattern = Pattern.compile("bb.*(.)(abb)");
extractionFormatter = "$1: $0, \\$$2";
The expected output would be:
a: bbcac aabb, $abb
b: bbccc babb, $abb
I hope you understand what I am trying to do. Do you know if there is any existing library/method that can achieve this without having me to reinvent the wheel?

You can use the results method from the Matcher class which returns a stream of MatchResults to first get all matches, get the results as string using MatchResult.group, replace now using the method String.replaceAll using the pattern as regex and your extractionFormatter as replacement and finally join all using new line:
String fileContent = "aaabbcac aabb\n" +
"bcbcbbccc babba";
Pattern pattern = Pattern.compile("bb.*(.)(abb)");
String extractionFormatter = "$1: $0, \\$$2";
String output = pattern.matcher(fileContent)
.results()
.map(MatchResult::group)
.map(s -> s.replaceAll(pattern.pattern(), extractionFormatter))
.collect(Collectors.joining(System.lineSeparator()));
System.out.println(output);

You can use String.replaceAll instead.
The thing to note is that if you want to get the desired output with capture groups, you would have to match (to remove) from the string that should not be there in the replacement.
Using a pattern that would give the desired output:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pattern = "(?m)^.*?(bb\\S*).*(.)(abb).*$";
String extractionFormatter = "$2: $1 $2$3, \\$$3";
System.out.print(fileContent.replaceAll(pattern, extractionFormatter));
Output
a: bbcac aabb, $abb
b: bbccc babb, $abb
See a Java demo.
Or using the Stringbuilder, Matcher and the while loop:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pat = "bb.*(.)(abb)";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(fileContent);
String extractionFormatter = "$1: $0, \\$$2";
StringBuilder sb = new StringBuilder();
while(matcher.find()) {
sb.append(matcher.group().replaceAll(pat, extractionFormatter)).append("\n");
}
System.out.print(sb);
See a Java demo.

Java regex to extract and replace by value

Input String
${abc.xzy}/demo/${ttt.bbb}
test${kkk.mmm}
RESULT
World/demo/Hello
testSystem
The text inside the curly brackets are keys to my properties. I want to replace those properties with run time values.
I can do the following to get the regex match but what should i put in the replace logic to change the ${..} matched with the respective run time value in the input string.
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(s);
while (m.find()) {
// replace logic comes here
}

An alternative may be using a third-party lib such as Apache Commons Text.
They have StringSubstitutor class looks very promising.
Map valuesMap = HashMap();
valuesMap.put("abc.xzy", "World");
valuesMap.put("ttt.bbb", "Hello");
valuesMap.put("kkk.mmm", "System");
String templateString = "${abc.xzy}/demo/${ttt.bbb} test${kkk.mmm}"
StringSubstitutor sub = new StringSubstitutor(valuesMap);
String resolvedString = sub.replace(templateString);
For more info check out Javadoc https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringSubstitutor.html

You may use the following solution:
String s = "${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}";
Map<String, String> map = new HashMap<String, String>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "Hello");
map.put("kkk.mmm", "System");
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
while (m.find()) {
String value = map.get(m.group(1));
m.appendReplacement(result, value != null ? value : m.group());
}
m.appendTail(result);
System.out.println(result.toString());
See the Java demo online, output:
World/demo/Hello
testSystem
The regex is
\$\{([^{}]+)\}
See the regex demo. It matches a ${ string, then captures any 1+ chars other than { and } into Group 1 and then matches }. If Group 1 value is present in the Map as a key, the replacement is the key value, else, the matched text is pasted back where it was in the input string.

Your regex needs to include the dollar. Also making the inner group lazy is sufficient to not include any } in the resulting key String.
String regex = "\\$\\{(.+?)\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while (m.find()) {
String key = m.group(1); // This is your matching group (the one in braces).
String value = someMap.get(key);
s.replaceFirst(regex, value != null ? value : "missingKey");
m = p.matcher(s); // you could alternatively reset the existing Matcher, but just create a new one, for simplicity's sake.
}
You could streamline this, by extracting the cursor position, and doing the replacement yourself, for the string. But either way, you need to reset your matcher, because otherwise it will parse on the old String.

The_Cute_Hedgehog's answer is good, but includes a dependency.
Wiktor Stribiżew's answer is missing a special case.
My answer aim to using java build-in regex and try to improve from Wiktor Stribiżew's answer. (Improve in Java code only, the regex is Ok)
Improvements:
Using StringBuilder is faster than StringBuffer
Initial StringBuilder capable to (int)(s.length()*1.2), avoid relocating memory many times in case of large input template s.
Avoid the case of regex special characters make wrong result by appendReplacement (like "cost: $100"). You can fix this problem in Wiktor Stribiżew's code by escape $ character in the replacement String like this value.replaceAll("\\$", "\\\\\\$")
Here is the improved code:
String s = "khj${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}{kkk.missing}string";
Map<String, String> map = new HashMap<>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "cost: $100");
map.put("kkk.mmm", "System");
StringBuilder result = new StringBuilder((int)(s.length()*1.2));
Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(s);
int nonCaptureIndex = 0;
while (m.find()) {
String value = map.get(m.group(1));
if (value != null) {
int index = m.start();
if (index > nonCaptureIndex) {
result.append(s.substring(nonCaptureIndex, index));
}
result.append(value);
nonCaptureIndex = m.end();
}
}
result.append(s.substring(nonCaptureIndex, s.length()));
System.out.println(result.toString());

First and second tocen regex

How could I get the first and the second text in "" from the string?
I could do it with indexOf but this is really boring ((
For example I have a String for parse like: "aaa":"bbbbb"perhapsSomeOtherText
And I d like to get aaa and bbbbb with the help of Regex pattern - this will help me to use it in switch statement and will greatly simplify my app/

If all that you have is colon delimited string just split it:
String str = ...; // colon delimited
String[] parts = str.split(":");
Note, that split() receives regex and compilies it every time. To improve performance of your code you can use Pattern as following:
private static Pattern pColonSplitter = Pattern.compile(":");
// now somewhere in your code:
String[] parts = pColonSplitter.split(str);
If however you want to use pattern for matching and extraction of string fragments in more complicated cases, do it like following:
Pattert p = Patter.compile("(\\w+):(\\w+):");
Matcher m = p.matcher(str);
if (m.find()) {
String a = m.group(1);
String b = m.group(2);
}
Pay attention on brackets that define captured group.

Something like this?
Pattern pattern = Pattern.compile("\"([^\"]*)\"");
Matcher matcher = pattern.matcher("\"aaa\":\"bbbbb\"perhapsSomeOtherText");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
aaa
bbbbb

String str = "\"aaa\":\"bbbbb\"perhapsSomeOtherText";
Pattern p = Pattern.compile("\"\\w+\""); // word between ""
Matcher m = p.matcher(str);
while(m.find()){
System.out.println(m.group().replace("\"", ""));
}
output:
aaa
bbbbb

there are several ways to do this
Use StringTokenizer or Scanner with UseDelimiter method

Get an array of Strings matching a pattern from a String

I have a long string let's say
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now i want to get all the hashtags in an array. How can i use this expression to get array of all hash tags from string something like
ArrayList hashtags = getArray(pattern, str)

You can write like?
private static List<String> getArray(Pattern tagMatcher, String str) {
Matcher m = tagMatcher.matcher(str);
List<String> l = new ArrayList<String>();
while(m.find()) {
String s = m.group(); //will give you "#computer"
s = s.substring(1); // will give you just "computer"
l.add(s);
}
return l;
}
Also you can use \\w- instead of A-Za-z0-9-_ making the regex [#]+[\\w]+\\b

This link would surely be helpful for achieving what you want.
It says:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
You got the hint now.

Here is one way, using Matcher
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]

You can use :
String val="I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while(matcher.find()){
list.add(matcher.group());
}
(?<=#) Positive Lookbehind - Assert that the character # literally be matched.

you can use the following code for getting the names
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while(m.find()) {
String s = m.group(1);
System.out.println(s);
}
It will print
akka
kumar

replace substring using regex

I have a string which contains many <xxx> values.
I want to retrive the value inside <>, do some manipulation and re-insert the new value into the string.
What I did is
input = This is <abc_d> a sample <ea1_j> input <lmk_02> string
while(input.matches(".*<.+[\S][^<]>.*"))
{
value = input.substring(input.indexOf("<") + 1, input.indexOf(">"));
//calculate manipulatedValue from value
input = input.replaceFirst("<.+>", manipulatedValue);
}
but after the first iteration, value contains abc_d> a sample <ea1_j> input <lmk_02. I believe indexOf(">") will give the first index of ">". Where did I go wrong?

This is a slightly easier way of accomplishing what you are trying to do:
String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
Matcher matcher = Pattern.compile("<([^>]*)>").matcher(input);
StringBuffer sb = new StringBuffer();
while(matcher.find()) {
matcher.appendReplacement(sb, manipulateValue(matcher.group(1)));
}
matcher.appendTail(sb);
System.out.println(sb.toString());

This is a good use case for the appendReplacement and appendTail idiom:
Pattern p = Pattern.compile("<([^>]+)>");
Matcher m = p.matcher(input);
StringBuffer out = new StringBuffer():
while(m.find()) {
String value = m.group(1);
// calculate manipulatedValue
m.appendReplacement(out, Matcher.quoteReplacement(manipulatedValue));
}
m.appendTail(out);

Try using an escape character \\ to the regex.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Replace group 1 of Java regex with out replacing the entire regex - java

Related

Extract text with Java regex with reference support

Java regex to extract and replace by value

First and second tocen regex

Get an array of Strings matching a pattern from a String

replace substring using regex

Categories

Resources