I have a string that may look like this: "aaaaffdddd", and I want to replace characters that occur 3 or more times in a row with [NUMBER_OF_CHARACTERS][ONE_TIME_THE_CHARACTER]. I am not very confident with regex, but I came up with "([A-z])(\1{2,})" to find exactly those runs. However, Java's String.replaceAll() gives me no way to refer to the number of characters in a group, and if I use Matcher.appendReplacement() with a StringBuffer I lose the rest of my string, even though the result should still include the characters that do not occur 3 or more times.
The example above should encode to "4aff4d".
This is not easy, because the replacement string cannot refer to the number of matched characters. Try this code:
Pattern pat = Pattern.compile("(?i)([A-Z])(?=\\1{2})");
String str = "aaaaffdddd";
Matcher mat = pat.matcher(str);
Map<String, Integer> charMap = new HashMap<>();
while(mat.find()) {
String key = mat.group();
if (!charMap.containsKey(key))
charMap.put(key, 3);
else
charMap.put(key, charMap.get(key)+1);
}
System.out.println("map " + charMap);
for (Entry<String, Integer> e: charMap.entrySet()) {
str = str.replaceAll(e.getKey() + "+", e.getValue() + e.getKey());
}
System.out.println(str);
OUTPUT:
map {d=4, a=4}
4aff4d
You can try this (not tested)
String str = "aaaaffdddd";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("([A-Za-z])(\\1{2,})"); // [A-z] would also match [ \ ] ^ _ `
Matcher m = p.matcher(str);
while (m.find()) {
    m.appendReplacement(sb, "" + (m.group(2).length() + 1) + m.group(1));
}
m.appendTail(sb); // copy the text after the last match, otherwise it is lost
System.out.println(sb);
After using appendReplacement on a StringBuffer I had to call appendTail in order to rebuild the rest of the String. Thanks to Holger for his suggestion!
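The appendReplacement/appendTail combination described above can be wrapped into a small helper. This is only a minimal sketch: the class name RunLengthSketch and the method encode are made up for illustration, and the character class is assumed to be ASCII letters only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the thread.
final class RunLengthSketch {
    // A letter followed by at least two more copies of itself (a run of 3+).
    private static final Pattern RUN = Pattern.compile("([A-Za-z])\\1{2,}");

    static String encode(String input) {
        Matcher m = RUN.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Length of the whole run, then the character once, e.g. "aaaa" -> "4a".
            m.appendReplacement(sb, m.group().length() + m.group(1));
        }
        // Copy whatever follows the last match; without this the tail is lost.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaaffdddd")); // 4aff4d
        System.out.println(encode("aaabaaaa"));   // 3ab4a
    }
}
```

Because each run is counted from its own match, the same letter can appear in several runs of different lengths without the counts getting mixed up.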
Related
Input String
${abc.xzy}/demo/${ttt.bbb}
test${kkk.mmm}
RESULT
World/demo/Hello
testSystem
The text inside the curly brackets is a key into my properties. I want to replace those keys with their run-time values.
I can do the following to get the regex match, but what should I put in the replace logic to change each matched ${..} to its run-time value in the input string?
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(s);
while (m.find()) {
// replace logic comes here
}
An alternative may be to use a third-party library such as Apache Commons Text.
Its StringSubstitutor class looks very promising.
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.text.StringSubstitutor;
Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("abc.xzy", "World");
valuesMap.put("ttt.bbb", "Hello");
valuesMap.put("kkk.mmm", "System");
String templateString = "${abc.xzy}/demo/${ttt.bbb} test${kkk.mmm}";
StringSubstitutor sub = new StringSubstitutor(valuesMap);
String resolvedString = sub.replace(templateString);
For more info check out Javadoc https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringSubstitutor.html
You may use the following solution:
String s = "${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}";
Map<String, String> map = new HashMap<String, String>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "Hello");
map.put("kkk.mmm", "System");
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
while (m.find()) {
String value = map.get(m.group(1));
m.appendReplacement(result, value != null ? value : m.group());
}
m.appendTail(result);
System.out.println(result.toString());
See the Java demo online, output:
World/demo/Hello
testSystem
The regex is
\$\{([^{}]+)\}
It matches the literal ${, then captures one or more characters other than { and } into Group 1, and then matches }. If the Group 1 value is present in the Map as a key, the replacement is the mapped value; otherwise the matched text is put back where it was in the input string.
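One caveat worth illustrating: appendReplacement gives $ and \ a special meaning in the replacement string, so a value such as "cost: $100", or the original ${...} text put back for an unknown key, is read as a group reference and can throw or produce the wrong text. The sketch below passes every replacement through Matcher.quoteReplacement to avoid that. The class name TemplateSketch and the method resolve are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the thread.
final class TemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^{}]+)\\}");

    static String resolve(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown key: put the original ${...} text back unchanged.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement makes '$' and '\' literal in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("abc.xzy", "World");
        values.put("ttt.bbb", "cost: $100");
        System.out.println(resolve("${abc.xzy}/demo/${ttt.bbb} ${kkk.missing}", values));
        // World/demo/cost: $100 ${kkk.missing}
    }
}
```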
Your regex needs to include the dollar. Also making the inner group lazy is sufficient to not include any } in the resulting key String.
String regex = "\\$\\{(.+?)\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while (m.find()) {
String key = m.group(1); // This is your matching group (the one in braces).
String value = someMap.get(key);
s = s.replaceFirst(regex, Matcher.quoteReplacement(value != null ? value : "missingKey")); // Strings are immutable, so the result must be assigned back
m = p.matcher(s); // you could alternatively reset the existing Matcher, but just create a new one, for simplicity's sake.
}
You could streamline this, by extracting the cursor position, and doing the replacement yourself, for the string. But either way, you need to reset your matcher, because otherwise it will parse on the old String.
The_Cute_Hedgehog's answer is good, but includes a dependency.
Wiktor Stribiżew's answer is missing a special case.
My answer aims to use Java's built-in regex support and to improve on Wiktor Stribiżew's answer. (The improvements are in the Java code only; the regex is OK.)
Improvements:
Use StringBuilder, which is faster than StringBuffer.
Initialize the StringBuilder with a capacity of (int)(s.length()*1.2) to avoid reallocating memory many times when the input template s is large.
Avoid wrong results when the replacement contains characters that appendReplacement treats specially (like "cost: $100"). You can fix this problem in Wiktor Stribiżew's code by escaping the $ character in the replacement String, like this: value.replaceAll("\\$", "\\\\\\$")
Here is the improved code:
String s = "khj${abc.xzy}/demo/${ttt.bbb}\ntest${kkk.mmm}{kkk.missing}string";
Map<String, String> map = new HashMap<>();
map.put("abc.xzy", "World");
map.put("ttt.bbb", "cost: $100");
map.put("kkk.mmm", "System");
StringBuilder result = new StringBuilder((int)(s.length()*1.2));
Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(s);
int nonCaptureIndex = 0;
while (m.find()) {
String value = map.get(m.group(1));
if (value != null) {
int index = m.start();
if (index > nonCaptureIndex) {
result.append(s.substring(nonCaptureIndex, index));
}
result.append(value);
nonCaptureIndex = m.end();
}
}
result.append(s.substring(nonCaptureIndex, s.length()));
System.out.println(result.toString());
I have a regex pattern that has only one group. I need to find text in the input string that follows the pattern and replace ONLY match group 1. For example, I have the regex pattern and the string it is applied to as shown below. The replacement string is "<--->".
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");
The expected result is
plan p<--->s <--->der p<--->itia
I tried the following approaches:
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll(m.group(1),"<--->");
}
System.out.print(result);
This gives result as
p<---> p<--->s <--->der p<--->itia
Another approach
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll("\\w*(lan)\\w+","<--->");
}
System.out.print(result);
Result is
plan <---> <---> <--->
I have gone through this link. There, the part of the string before the match is always constant ("foo"), but in my case it varies. I have also looked at this and this, but I am unable to apply any of the solutions given to my present scenario.
Any help is appreciated
You need to use the following pattern with capturing groups:
(\w*)lan(\w+)
^-1-^ ^-2-^
and replace with $1<--->$2
See the regex demo
The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.
Java demo:
String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia
If you need to be able to replace the Group 1 and keep the rest, you may use the replace callback method emulation with Matcher#appendReplacement:
String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia
See another Java demo
Here, since we process a match by match, we should only replace the Group 1 contents once with replaceFirst, and since we replace the substring as a literal, we should Pattern.quote it.
To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().
That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.
     start(1)
     ↓  end(1)
     ↓  ↓
\\w*(lan)\\w+
↑            ↑
start()      end()
You can then extract the values to keep.
String input = "plan plans lander planitia";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
"<--->" +
input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();
System.out.println(output);
Output
plan p<--->s <--->der p<--->itia
If you don't like that it uses the original string, you can use the matched substring instead.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
String match = m.group();
int start = m.start();
m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
"<--->" +
match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();
While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep it anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done using a lookahead, like (?=\w), so we actually only match lan in a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).
I like the other solutions. This is a slightly optimized, bulletproof version:
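The lookahead idea can be sketched in a few lines. The class name LookaheadSketch and the method mask are placeholders chosen for this example, not something from the answers above.

```java
// Hypothetical helper, not taken from the thread.
final class LookaheadSketch {
    static String mask(String input) {
        // Match "lan" only when a word character follows.
        // The lookahead checks the next character without consuming it.
        return input.replaceAll("lan(?=\\w)", "<--->");
    }

    public static void main(String[] args) {
        System.out.println(mask("plan plans lander planitia"));
        // plan p<--->s <--->der p<--->itia
    }
}
```

One difference to keep in mind: this replaces every "lan" that is followed by a word character, so a word containing "lan" several times gets several replacements, whereas \\w*(lan)\\w+ matches each word only once.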
public static void main (String [] args) {
int groupPosition = 1;
String replacement = "foo";
Pattern r = Pattern.compile("foo(bar)");
Matcher m = r.matcher("bar1234foobar1234bar");
StringBuffer sb = new StringBuffer();
while (m.find()) {
StringBuffer buf = new StringBuffer(m.group());
buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement);
m.appendReplacement(sb, buf.toString());
}
m.appendTail(sb);
System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}
I have a long string let's say
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now I want to get all the hashtags in an array. How can I use this expression to get an array of all hashtags from the string, something like
ArrayList hashtags = getArray(pattern, str)
You could write it like this:
private static List<String> getArray(Pattern tagMatcher, String str) {
Matcher m = tagMatcher.matcher(str);
List<String> l = new ArrayList<String>();
while(m.find()) {
String s = m.group(); //will give you "#computer"
s = s.substring(1); // will give you just "computer"
l.add(s);
}
return l;
}
You can also use \\w- instead of A-Za-z0-9-_, making the regex [#]+[\\w-]+\\b
This link would surely be helpful for achieving what you want.
It says:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
You got the hint now.
Here is one way, using Matcher
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]
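On Java 9 or later, the same find() loop can also be written as a stream via Matcher.results(). This is a sketch under that version assumption; the class name HashtagSketch and the method hashtags are illustrative only.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not taken from the thread. Requires Java 9+.
final class HashtagSketch {
    private static final Pattern TAG = Pattern.compile("#+[-\\w]+\\b");

    static List<String> hashtags(String text) {
        return TAG.matcher(text)
                  .results()                 // Stream<MatchResult>, one per match
                  .map(MatchResult::group)   // whole match, including the '#'
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(hashtags("I like this #computer and I want to buy it from #XXXMall."));
        // [#computer, #XXXMall]
    }
}
```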
You can use :
String val="I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while(matcher.find()){
list.add(matcher.group());
}
(?<=#) is a positive lookbehind: it asserts that the match is immediately preceded by a literal #, without including the # in the match.
You can use the following code to get the names:
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while(m.find()) {
String s = m.group(1);
System.out.println(s);
}
It will print
akka
kumar
I have a string which contains many <xxx> values.
I want to retrieve the value inside <>, do some manipulation and re-insert the new value into the string.
What I did is
input = This is <abc_d> a sample <ea1_j> input <lmk_02> string
while(input.matches(".*<.+[\S][^<]>.*"))
{
value = input.substring(input.indexOf("<") + 1, input.indexOf(">"));
//calculate manipulatedValue from value
input = input.replaceFirst("<.+>", manipulatedValue);
}
but after the first iteration, value contains abc_d> a sample <ea1_j> input <lmk_02. I believe indexOf(">") will give the first index of ">". Where did I go wrong?
This is a slightly easier way of accomplishing what you are trying to do:
String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
Matcher matcher = Pattern.compile("<([^>]*)>").matcher(input);
StringBuffer sb = new StringBuffer();
while(matcher.find()) {
matcher.appendReplacement(sb, manipulateValue(matcher.group(1)));
}
matcher.appendTail(sb);
System.out.println(sb.toString());
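As for why the original loop misbehaves: indexOf(">") does return the first index, so the likely culprit is the greedy .+ in replaceFirst("<.+>", ...), which stretches from the first < to the last > on the line. The sketch below contrasts it with a negated character class; the class and method names are invented for the comparison.

```java
// Hypothetical comparison, not taken from the thread.
final class GreedySketch {
    // Greedy: ".+" runs to the LAST '>' it can reach.
    static String greedy(String input) {
        return input.replaceFirst("<.+>", "X");
    }

    // Bounded: "[^>]+" stops at the first '>'.
    static String bounded(String input) {
        return input.replaceFirst("<[^>]+>", "X");
    }

    public static void main(String[] args) {
        String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
        System.out.println(greedy(input));  // This is X string
        System.out.println(bounded(input)); // This is X a sample <ea1_j> input <lmk_02> string
    }
}
```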
This is a good use case for the appendReplacement and appendTail idiom:
Pattern p = Pattern.compile("<([^>]+)>");
Matcher m = p.matcher(input);
StringBuffer out = new StringBuffer();
while(m.find()) {
String value = m.group(1);
// calculate manipulatedValue
m.appendReplacement(out, Matcher.quoteReplacement(manipulatedValue));
}
m.appendTail(out);
Try escaping the special characters in the regex with \\.
I am trying to find environment variables in input and replace them with values.
The pattern of env variable is ${\\.}
Pattern myPattern = Pattern.compile( "(${\\.})" );
String line ="${env1}sojods${env2}${env3}";
How can I replace env1 with 1 and env2 with 2 and env3 with 3, so
that after this I will have a new string 1sojods23?
Strings in Java are immutable, which makes this somewhat tricky if you are talking about an arbitrary number of things you need to find and replace.
Specifically, you need to define your replacements in a Map, use a StringBuilder (before Java 9, the less performant StringBuffer had to be used), and call the appendReplacement() and appendTail() methods of Matcher. The final result will be stored in your StringBuilder (or StringBuffer).
Map<String, String> replacements = new HashMap<String, String>() {{
put("${env1}", "1");
put("${env2}", "2");
put("${env3}", "3");
}};
String line ="${env1}sojods${env2}${env3}";
String rx = "(\\$\\{[^}]+\\})";
StringBuilder sb = new StringBuilder(); //use StringBuffer before Java 9
Pattern p = Pattern.compile(rx);
Matcher m = p.matcher(line);
while (m.find())
{
// Avoids throwing a NullPointerException in the case that you
// Don't have a replacement defined in the map for the match
String repString = replacements.get(m.group(1));
if (repString != null)
m.appendReplacement(sb, repString);
}
m.appendTail(sb);
System.out.println(sb.toString());
Output:
1sojods23
I know this is old; I was looking for an appendReplacement/appendTail example myself when I found it. However, the OP's question doesn't need the complicated multi-line solutions I saw here.
In this exact case, when the string to replace holds itself the value we want to replace with, then this could be done easily with replaceAll:
String line ="${env1}sojods${env2}${env3}";
System.out.println( line.replaceAll("\\$\\{env([0-9]+)\\}", "$1") );
// Output => 1sojods23
When the replacement depends on some condition or logic applied to each match, you can use appendReplacement/appendTail instead.
Hopefully you would find this code useful:
Pattern phone = Pattern.compile("\\$\\{env([0-9]+)\\}");
String line ="${env1}sojods${env2}${env3}";
Matcher action = phone.matcher(line);
StringBuffer sb = new StringBuffer(line.length());
while (action.find()) {
String text = action.group(1);
action.appendReplacement(sb, Matcher.quoteReplacement(text));
}
action.appendTail(sb);
System.out.println(sb.toString());
The output is the expected: 1sojods23.
This gives you 1sojods23:
String s = "${env1}sojods${env2}${env3}";
final Pattern myPattern = Pattern.compile("\\$\\{[^\\}]*\\}");
Matcher m = myPattern.matcher(s);
int i = 0;
while (m.find()) {
s = m.replaceFirst(String.valueOf(++i));
m = myPattern.matcher(s);
}
System.out.println(s);
and this works too:
final String re = "\\$\\{[^\\}]*\\}";
String s = "${env1}sojods${env2}${env3}";
int i = 0;
String t;
while (true) {
t = s.replaceFirst(re, String.valueOf(++i));
if (s.equals(t)) {
break;
} else {
s = t;
}
}
System.out.println(s);
You can use a StringBuffer in combination with the Matcher appendReplacement() method, but if the pattern does not match, there is no point in creating the StringBuffer.
For example, here is a pattern that matches ${...}. Group 1 is the contents between the braces.
static Pattern rxTemplate = Pattern.compile("\\$\\{([^}\\s]+)\\}");
And here is a sample function that uses that pattern.
private static String replaceTemplateString(String text) {
StringBuffer sb = null;
Matcher m = rxTemplate.matcher(text);
while (m.find()) {
String t = m.group(1);
t = t.toUpperCase(); // LOOKUP YOUR REPLACEMENT HERE
if (sb == null) {
sb = new StringBuffer(text.length());
}
m.appendReplacement(sb, t);
}
if (sb == null) {
return text;
} else {
m.appendTail(sb);
return sb.toString();
}
}
Map<String, String> replacements = new HashMap<String, String>() {
{
put("env1", "1");
put("env2", "2");
put("env3", "3");
}
};
String line = "${env1}sojods${env2}${env3}";
String rx = "\\$\\{(.*?)\\}";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(rx);
Matcher m = p.matcher(line);
while (m.find()) {
// Avoids throwing a NullPointerException in the case that you
// Don't have a replacement defined in the map for the match
String repString = replacements.get(m.group(1));
if (repString != null)
m.appendReplacement(sb, repString);
}
m.appendTail(sb);
System.out.println(sb.toString());
In the example above, the map can use the bare variable names as keys (env1, env2, ...), because group 1 captures only the text between the braces.
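With bare names as keys, the lookup can also be sketched with the function-taking overload of Matcher.replaceAll, which is available from Java 9. The class name EnvSketch and the method expand are assumptions made for this example; unknown variables are left untouched here, which is one possible policy among several.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the thread. Requires Java 9+.
final class EnvSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String expand(String line, Map<String, String> env) {
        return VAR.matcher(line).replaceAll(mr -> {
            // Look up the bare name (group 1); keep the original text if unknown.
            String value = env.getOrDefault(mr.group(1), mr.group());
            // The returned string is still parsed for '$' and '\', so quote it.
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("env1", "1");
        env.put("env2", "2");
        env.put("env3", "3");
        System.out.println(expand("${env1}sojods${env2}${env3}", env)); // 1sojods23
    }
}
```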
Use groups: once the pattern matches, ${env1} will be your first group, and you can then replace what is in each group.
Pattern p = Pattern.compile("(\\$\\{[^}]+\\})"); // $, { and } must be escaped in the regex
Matcher m = p.matcher(line);
while (m.find()) {
    for (int j = 0; j <= m.groupCount(); j++) {
        // m.group(j) is the matched text, e.g. ${env1}; do the replacement here
    }
}