replace substring using regex - java

I have a string which contains many <xxx> values.
I want to retrieve the value inside <>, do some manipulation and re-insert the new value into the string.
What I did is:
String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
while(input.matches(".*<.+[\\S][^<]>.*"))
{
String value = input.substring(input.indexOf("<") + 1, input.indexOf(">"));
//calculate manipulatedValue from value
input = input.replaceFirst("<.+>", manipulatedValue);
}
But after the first iteration, value contains abc_d> a sample <ea1_j> input <lmk_02. I believe indexOf(">") gives the first index of ">". Where did I go wrong?

This is a slightly easier way of accomplishing what you are trying to do:
String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
Matcher matcher = Pattern.compile("<([^>]*)>").matcher(input);
StringBuffer sb = new StringBuffer();
while(matcher.find()) {
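// manipulateValue(String) is your own method; quoteReplacement keeps '$' and '\' in its result literal
matcher.appendReplacement(sb, Matcher.quoteReplacement(manipulateValue(matcher.group(1))));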
}
matcher.appendTail(sb);
System.out.println(sb.toString());

This is a good use case for the appendReplacement and appendTail idiom:
Pattern p = Pattern.compile("<([^>]+)>");
Matcher m = p.matcher(input);
StringBuffer out = new StringBuffer();
while(m.find()) {
String value = m.group(1);
String manipulatedValue = value; // calculate manipulatedValue from value here
m.appendReplacement(out, Matcher.quoteReplacement(manipulatedValue));
}
m.appendTail(out);
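A complete, runnable version of this idiom might look like the following. The manipulate method is a hypothetical stand-in for your own logic (it uppercases the value and appends a $), and the class and method names are illustrative only. The $ is there to show why Matcher.quoteReplacement matters: without it, appendReplacement would read $ as the start of a group reference and throw an IllegalArgumentException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagReplaceDemo {

    // Hypothetical manipulation for this sketch: uppercase the value and append '$'.
    static String manipulate(String value) {
        return value.toUpperCase() + "$";
    }

    // Replaces every <...> tag with manipulate(text inside the tag).
    static String replaceTags(String input) {
        Pattern p = Pattern.compile("<([^>]+)>");
        Matcher m = p.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String manipulatedValue = manipulate(m.group(1));
            // quoteReplacement makes '$' and '\' literal instead of group references
            m.appendReplacement(out, Matcher.quoteReplacement(manipulatedValue));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceTags("This is <abc_d> a sample <ea1_j> input <lmk_02> string"));
        // This is ABC_D$ a sample EA1_J$ input LMK_02$ string
    }
}
```

Because [^>]+ cannot cross a >, each match covers exactly one tag, which is what the original replaceFirst("<.+>", ...) loop was missing.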

Try using an escape character \\ to the regex.
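For reference, the loop in the question goes wrong because .+ is greedy: in replaceFirst("<.+>", ...) it runs from the first < to the last >, so the first call replaces all three tags at once. Below is a minimal sketch that keeps the question's loop and changes only the pattern to "<[^>]+>" (the lazy "<.+?>" also works). The manipulation is a hypothetical toUpperCase(), the class name is illustrative, and the sketch assumes non-empty tags and no stray < or > in the text.

```java
import java.util.regex.Matcher;

public class GreedyTagDemo {

    // The question's loop with only the pattern changed from "<.+>" to "<[^>]+>".
    // Assumes non-empty tags, no stray '<' or '>' in the text, and a manipulated
    // value without '<' or '>' (otherwise indexOf and the loop condition disagree).
    static String fixedLoop(String input) {
        while (input.matches(".*<[^>]+>.*")) {
            String value = input.substring(input.indexOf("<") + 1, input.indexOf(">"));
            String manipulatedValue = value.toUpperCase(); // hypothetical manipulation
            input = input.replaceFirst("<[^>]+>", Matcher.quoteReplacement(manipulatedValue));
        }
        return input;
    }

    public static void main(String[] args) {
        String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
        // Greedy ".+" runs to the LAST '>', so one call swallows all three tags
        System.out.println(input.replaceFirst("<.+>", "X"));
        // -> This is X string
        // A negated class (or the lazy "<.+?>") stops at the FIRST '>'
        System.out.println(input.replaceFirst("<[^>]+>", "X"));
        // -> This is X a sample <ea1_j> input <lmk_02> string
        System.out.println(fixedLoop(input));
        // -> This is ABC_D a sample EA1_J input LMK_02 string
    }
}
```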

Related

Java regex to extract and replace by value

Input String
${abc.xzy}/demo/${ttt.bbb}
test${kkk.mmm}
RESULT
World/demo/Hello
testSystem
The text inside the curly braces is a key to my properties. I want to replace those keys with run-time values.
I can do the following to get the regex matches, but what should I put in the replace logic to change each matched ${..} to the respective run-time value in the input string?
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(s);
while (m.find()) {
// replace logic comes here
}
An alternative is to use a third-party library such as Apache Commons Text.
Its StringSubstitutor class looks very promising.
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.text.StringSubstitutor;

Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("abc.xzy", "World");
valuesMap.put("ttt.bbb", "Hello");
valuesMap.put("kkk.mmm", "System");
String templateString = "${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}";
StringSubstitutor sub = new StringSubstitutor(valuesMap);
String resolvedString = sub.replace(templateString);
For more information, see the Javadoc: https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringSubstitutor.html
You may use the following solution:
String s = "${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}";
Map<String, String> map = new HashMap<String, String>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "Hello");
map.put("kkk.mmm", "System");
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
while (m.find()) {
String value = map.get(m.group(1));
m.appendReplacement(result, value != null ? value : m.group());
}
m.appendTail(result);
System.out.println(result.toString());
Output:
World/demo/Hello
testSystem
The regex is
\$\{([^{}]+)\}
It matches ${, then captures one or more characters other than { and } into Group 1, and then matches }. If the Group 1 value is a key in the map, the replacement is that key's value; otherwise, the matched text is put back where it was in the input string.
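On Java 9 or later, Matcher.replaceAll(Function<MatchResult, String>) runs the same find/appendReplacement/appendTail loop internally. Below is a sketch using the same regex and map; resolve and PlaceholderResolver are illustrative names. The string returned by the function is still interpreted like an appendReplacement argument, so it is wrapped in Matcher.quoteReplacement. This also keeps an unknown placeholder such as ${kkk.missing} from being read as a named-group reference when it is put back.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderResolver {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^{}]+)\\}");

    // Replaces each ${key} with map.get(key) and leaves unknown placeholders as they are.
    // Matcher.replaceAll(Function<MatchResult, String>) needs Java 9 or later.
    static String resolve(String s, Map<String, String> map) {
        return PLACEHOLDER.matcher(s).replaceAll(mr -> {
            String value = map.get(mr.group(1));
            // The returned text is parsed like an appendReplacement() argument,
            // so quote it to keep '$' and '\' literal.
            return Matcher.quoteReplacement(value != null ? value : mr.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("abc.xzy", "World");
        map.put("ttt.bbb", "Hello");
        map.put("kkk.mmm", "System");
        System.out.println(resolve("${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}${kkk.missing}", map));
    }
}
```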
Your regex needs to include the dollar sign. Also, making the inner group lazy is enough to keep any } out of the resulting key string.
String regex = "\\$\\{(.+?)\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while (m.find()) {
String key = m.group(1); // This is your matching group (the one in braces).
String value = someMap.get(key); // someMap: your Map<String, String> of run-time values
// Strings are immutable, so assign the result; quoteReplacement keeps '$' and '\' in the value literal
s = s.replaceFirst(regex, Matcher.quoteReplacement(value != null ? value : "missingKey"));
m = p.matcher(s); // you could alternatively reset the existing Matcher, but just create a new one, for simplicity's sake.
}
You could streamline this by taking the match positions (m.start() and m.end()) and doing the replacement on the string yourself. Either way, you need to reset the matcher, because otherwise it keeps parsing the old string.
The_Cute_Hedgehog's answer is good, but it adds a dependency.
Wiktor Stribiżew's answer misses a special case: replacement values that contain $.
My answer aims to use Java's built-in regex support and improves on Wiktor Stribiżew's answer (the Java code only; the regex is OK).
Improvements:
StringBuilder is faster than StringBuffer, because it is not synchronized.
Initialize the StringBuilder with a capacity of (int)(s.length()*1.2) to avoid reallocating memory many times for a large input template s.
Avoid wrong results caused by special characters in the replacement string passed to appendReplacement (like "cost: $100"). You can fix this problem in Wiktor Stribiżew's code by escaping the $ character in the replacement string, like this: value.replaceAll("\\$", "\\\\\\$"), or with Matcher.quoteReplacement(value), which escapes both $ and \.
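To see that special case concretely, here is a small comparison; the class and method names are illustrative. With the placeholder regex there is exactly one capturing group, so in the unquoted value "cost: $100" the $1 is read as a reference to group 1 and the following 00 is copied as-is.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarPitfallDemo {

    // Same placeholder regex as above: exactly one capturing group.
    private static final Pattern P = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every placeholder with 'value', quoting it first if requested.
    static String substitute(String template, String value, boolean quote) {
        Matcher m = P.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, quote ? Matcher.quoteReplacement(value) : value);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String template = "${ttt.bbb}";
        // Unquoted: "$1" becomes group 1 ("ttt.bbb") and "00" is copied literally
        System.out.println(substitute(template, "cost: $100", false)); // cost: ttt.bbb00
        // Quoted with Matcher.quoteReplacement
        System.out.println(substitute(template, "cost: $100", true));  // cost: $100
        // Manual escape from the list above
        String escaped = "cost: $100".replaceAll("\\$", "\\\\\\$");
        System.out.println(substitute(template, escaped, false));      // cost: $100
    }
}
```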
Here is the improved code:
String s = "khj${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}{kkk.missing}string";
Map<String, String> map = new HashMap<>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "cost: $100");
map.put("kkk.mmm", "System");
StringBuilder result = new StringBuilder((int)(s.length()*1.2));
Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(s);
int nonCaptureIndex = 0;
while (m.find()) {
String value = map.get(m.group(1));
if (value != null) {
int index = m.start();
if (index > nonCaptureIndex) {
result.append(s.substring(nonCaptureIndex, index));
}
result.append(value);
nonCaptureIndex = m.end();
}
}
result.append(s.substring(nonCaptureIndex, s.length()));
System.out.println(result.toString());

Regex lazy solution for java?

I have a string "hooRayNexTcapItaLnextcapitall"
I want to capture the first instance of "next" (NexT - in this case)
My solution:
(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)
With my solution, group1 returns next instead of Next.
How can I correct my regex to capture the first next instead of capturing the last next?
Edit 1:
Let me state my question properly:
If the string contains any combination of upper and lower case letters that spell "NextCapital", reverse the characters of the word "Next". Case should be preserved. If "NextCapital" occurs multiple times, only update the first occurrence.
So I am using a group to capture it, but my group captures the last occurrence of "nextCapital" instead of the first one.
Ex:
Input: hooRayNexTcapItaLnextcapitall
output: hooRayTxeNcapItaLnextcapitall
Edit 2:
Please correct my code.
My java code:
Pattern ptn = Pattern.compile("(.*)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)");
//sb = hooRayNexTcapItaLnextcapitall
Matcher mtc = ptn.matcher(sb);
StringBuilder c = new StringBuilder();
if(mtc.find()){
StringBuilder d = new StringBuilder();
StringBuilder e = new StringBuilder();
d.append(mtc.group(1));
e.append(mtc.group(2));
e.reverse();
d.append(e);
d.append(mtc.group(3));
d.append(mtc.group(4));
sb = d;
}
Your regex actually works if you get group 2. It does not need to be that complicated, though.
Your regex can just be this:
next
If you use Matcher.find and turn on the CASE_INSENSITIVE option, you can find the first substring of the string that matches the pattern. Then use group() to get the actual string:
Matcher matcher = Pattern.compile("next", Pattern.CASE_INSENSITIVE).matcher("hooRayNexTcapItaLnextcapitall");
if (matcher.find()) {
System.out.println(matcher.group());
}
EDIT:
After seeing your requirements, I wrote this code:
String input = "hooRayNexTcapItaLnextcapitall";
Matcher m = Pattern.compile("next(?=capital)", Pattern.CASE_INSENSITIVE).matcher(input);
if (m.find()) {
StringBuilder outputBuilder = new StringBuilder(input);
StringBuilder reverseBuilder = new StringBuilder(input.substring(m.start(), m.end()));
outputBuilder.replace(m.start(), m.end(), reverseBuilder.reverse().toString());
System.out.println(outputBuilder);
}
I used a lookahead to match next only if there is capital after it. After a match is found, I created a string builder with the input, and another string builder with the matched portion of the input. Then, I replaced the matched range with the reverse of the second string builder.
String target = "next";
int index = line.toLowerCase().indexOf(target);
if (index != -1) {
line = line.substring(index, index + target.length());
System.out.println(line);
} else {
System.out.println("Not Found");
}
This would be my first attempt; it makes it easy to change the string you want to locate.
Otherwise, you can use this regex solution to achieve the same effect:
Pattern pattern = Pattern.compile("(?i)next");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(matcher.group());
}
The pattern "(?i)next" finds the substring matching "next" ignoring case.
Edit: this reverses the first occurrence of next (the one followed by capital).
String input = "hooRayNexTcapItaLnextcapitall";
String target = "nextcapital";
int index = input.toLowerCase().indexOf(target);
if (index != -1) {
String first = input.substring(index, index + target.length());
first = new StringBuilder(first.substring(0, 4)).reverse().toString() + first.substring(4, first.length());
input = input.substring(0, index) + first + input.substring(index + target.length(), input.length());
}
Edit again: here is a "fixed" version of your code.
String input = "hooRayNexTcapItaLnextcapitall";
Pattern ptn = Pattern.compile("([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])");
Matcher mtc = ptn.matcher(input);
if(mtc.find()){
StringBuilder d = new StringBuilder(mtc.group(1));
StringBuilder e = new StringBuilder(mtc.group(2));
input = input.replaceFirst(d.toString() + e.toString(), d.reverse().toString() + e.toString());
System.out.println(input);
}
Your regex is grabbing the second potential match for your group due to the default greedy nature of regex. Effectively, the first (.*) is grabbing as much as it can while still satisfying the rest of your regex.
To get what you intend, you can add a question mark to the first group, making it (.*?). This will make it non-greedy, grabbing the smallest string possible while still satisfying the rest of your regex.
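Applying that one-character change to the code from Edit 2 gives something like the sketch below; FirstNextCapital and reverseFirstNext are illustrative names. Because . does not match line breaks by default, it assumes single-line input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNextCapital {

    // The question's pattern with the first group made lazy: (.*?) instead of (.*)
    private static final Pattern PTN = Pattern.compile(
            "(.*?)([nN][eE][xX][tT])([cC][aA][pP][iI][tT][aA][lL])(.*)");

    // Reverses the "next" part of the first case-insensitive "nextcapital".
    // Assumes single-line input: '.' does not match line breaks by default.
    static String reverseFirstNext(String sb) {
        Matcher mtc = PTN.matcher(sb);
        if (!mtc.find()) {
            return sb; // no "nextcapital" in the input
        }
        return mtc.group(1)
                + new StringBuilder(mtc.group(2)).reverse()
                + mtc.group(3)
                + mtc.group(4);
    }

    public static void main(String[] args) {
        System.out.println(reverseFirstNext("hooRayNexTcapItaLnextcapitall"));
        // hooRayTxeNcapItaLnextcapitall
    }
}
```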

JAVA Get text from String

Hi, I get this String from the server:
id_not="autoincrement"; id_obj="-"; id_tr="-"; id_pgo="-"; typ_not=""; tresc="Nie wystawił"; datetime="-"; lon="-"; lat="-";
I need to create a new String, e.g. String word, and assign it the value I get from tresc="Nie wystawił".
As @Jan suggested in a comment, you can use a regex, for example:
String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; tresc=\"Nie wystawił\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
Pattern p = Pattern.compile("tresc(.*?);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
tresc="Nie wystawił";
If you want to get only the value of tresc you can use :
Pattern p = Pattern.compile("tresc=\"(.*?)\";");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Output
Nie wystawił
Something along the lines of
Pattern p = Pattern.compile("tresc=\"([^\"]+)\"");
Matcher m = p.matcher(stringFromServer);
if(m.find()) {
String whatYouWereLookingfor = m.group(1);
}
should do the trick. JSON parsing might be much better in the long run if you need additional values.
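If you do need additional values, one regex-based sketch collects every key="value" pair into a map. It assumes keys are word characters and values never contain a double quote; ServerStringParser and parse are illustrative names. The Polish letter in the sample is written as a Unicode escape so the source-file encoding does not matter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServerStringParser {

    // key="value" pairs; assumes word-character keys and values without '"'
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Collects every key="value" pair, keeping the order they arrive in.
    static Map<String, String> parse(String str) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(str);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; "
                + "tresc=\"Nie wystawi\u0142\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
        String word = parse(str).get("tresc");
        System.out.println(word);
    }
}
```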
Your question is unclear, but I think you get a string from the server and want the value of tresc from it. You can first search for tresc in the string you get, like:
serverString.substring(serverString.indexOf("tresc") + x, serverString.length());
Here, replace x with how many characters past the start of "tresc" you want to skip.
Read up on substring and delimiters.
Since the values are separated by semicolons, another solution could be:
int start = serverstring.indexOf("tresc");
// indexOf returns the index of the first occurrence, or -1 if it is not found
int delimiter = (start == -1) ? -1 : serverstring.indexOf(";", start);
// check and account for the -1 case
if (delimiter != -1) {
String subString = serverstring.substring(start, delimiter);
}
Here start is where tresc begins and delimiter is the first ";" after it, so subString is the tresc part: tresc="Nie wystawił".
You can then use it any way you want.

Replace group 1 of Java regex with out replacing the entire regex

I have a regex pattern that will have only one group. I need to find text in the input strings that follows the pattern and replace ONLY match group 1. For example, I have the regex pattern and the input string shown below. The replacement string is "<--->".
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");
The expected result is
plan p<--->s <--->der p<--->itia
I tried the following approaches:
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll(m.group(1),"<--->");
}
System.out.print(result);
This gives the result
p<---> p<--->s <--->der p<--->itia
Another approach
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll("\\w*(lan)\\w+","<--->");
}
System.out.print(result);
Result is
plan <---> <---> <--->
I have gone through a related question, but there the part of the string before the match is always the constant "foo", whereas in my case it varies. I have also looked at two other questions, but I am unable to apply any of their solutions to my scenario.
Any help is appreciated
You need to use the following pattern with capturing groups:
(\w*)lan(\w+)
^-1-^   ^-2-^
and replace with $1<--->$2
The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.
Java demo:
String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia
If you need to replace Group 1 and keep the rest, you can emulate a replacement callback with Matcher#appendReplacement:
String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia
Here, since we process the input match by match, we replace the Group 1 contents only once with replaceFirst, and since we replace the substring as a literal, we wrap it in Pattern.quote.
To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().
That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.
     start(1)
     ↓  end(1)
     ↓  ↓
\\w*(lan)\\w+
↑            ↑
start()      end()
You can then extract the values to keep.
String input = "plan plans lander planitia";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
"<--->" +
input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();
System.out.println(output);
Output
plan p<--->s <--->der p<--->itia
If you don't like that it uses the original string, you can use the matched substring instead.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
String match = m.group();
int start = m.start();
m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
"<--->" +
match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();
While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep that part anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done with a lookahead, (?=\w), so we only match lan with a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).
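The lookahead version described above can be sketched as follows; LookaheadReplace and replaceLan are illustrative names.

```java
import java.util.regex.Matcher;

public class LookaheadReplace {

    // Replaces "lan" only when a word character follows it. The lookahead (?=\w)
    // checks that character without consuming it, so nothing has to be captured
    // and put back.
    static String replaceLan(String input, String replacement) {
        return input.replaceAll("lan(?=\\w)", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLan("plan plans lander planitia", "<--->"));
        // plan p<--->s <--->der p<--->itia
    }
}
```

One difference: \\w*(lan)\\w+ replaces at most one lan per word (the last one followed by a word character, because \\w* is greedy and \\w+ then consumes the rest of the word), while this replaces every lan followed by a word character. For the sample input the results are the same.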
I like the other solutions. This is a slightly optimized, bulletproof version:
public static void main (String [] args) {
int groupPosition = 1;
String replacement = "foo";
Pattern r = Pattern.compile("foo(bar)");
Matcher m = r.matcher("bar1234foobar1234bar");
StringBuffer sb = new StringBuffer();
while (m.find()) {
StringBuffer buf = new StringBuffer(m.group());
buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement);
m.appendReplacement(sb, Matcher.quoteReplacement(buf.toString())); // quote so '$' or '\' in the text stay literal
}
m.appendTail(sb);
System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}

Increment digit inside string

Hi, I want to increment the integer values inside a string.
For example, the initial string is m1p1b1.
The code below works correctly, but it has one problem:
when the string is m10p10b10, it gives the result m21p21b21, not m11p11b11.
Also, the length of the integers in the string is dynamic, so I can't hard-code anything.
Pattern digitPattern = Pattern.compile("(\\d)");
Matcher matcher = digitPattern.matcher("m1p1b1");
StringBuffer result = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(result, String.valueOf(Integer.parseInt(matcher.group(1)) + 1));
}
matcher.appendTail(result);
System.out.println(result.toString());
Change \\d to \\d+ to match one or more digits:
Pattern digitPattern = Pattern.compile("\\d+");
Matcher matcher = digitPattern.matcher("m10p10b10");
StringBuffer result = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(result, String.valueOf(Integer.parseInt(matcher.group(0)) + 1));
}
matcher.appendTail(result);
System.out.println(result.toString()); // => m11p11b11
Note that you do not have to wrap the whole pattern in a capturing group (...); you can access the whole match with matcher.group(0).
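On Java 9 or later the same idea fits in one Matcher.replaceAll(Function) call. This sketch also swaps Integer.parseInt for java.math.BigInteger, which is not part of the answer above, so that digit runs of any length work (Integer.parseInt throws a NumberFormatException above 2147483647). IncrementNumbers and increment are illustrative names. Leading zeros are not preserved, e.g. m007 becomes m8.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

public class IncrementNumbers {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Adds 1 to every run of digits. BigInteger avoids the overflow that
    // Integer.parseInt hits for long runs. The replacement is digits only,
    // so it contains no '$' or '\' and needs no quoting.
    // Matcher.replaceAll(Function) requires Java 9 or later.
    static String increment(String input) {
        return DIGITS.matcher(input)
                .replaceAll(mr -> new BigInteger(mr.group()).add(BigInteger.ONE).toString());
    }

    public static void main(String[] args) {
        System.out.println(increment("m10p10b10")); // m11p11b11
        System.out.println(increment("m9p99b999")); // m10p100b1000
    }
}
```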
