I am having trouble writing a method in Java. It extracts all text matching a pattern and returns every extraction. It works just like java.util.regex.Matcher's find()/matches() followed by group():
Matcher matcher = pattern.matcher(fileContent);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
sb.append(matcher.group()).append("\n");
}
return sb.toString();
However, I would like the extractions to be formatted with support for group references (dollar sign, $) and literal-character escaping (backslash, \), just like the replacement string in Matcher.replaceAll(replacement) (Doc). For example:
fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
pattern = Pattern.compile("bb.*(.)(abb)");
extractionFormatter = "$1: $0, \\$$2";
The expected output would be:
a: bbcac aabb, $abb
b: bbccc babb, $abb
I hope it is clear what I am trying to do. Is there an existing library or method that can achieve this, so that I don't have to reinvent the wheel?
You can use the results method of the Matcher class (Java 9+), which returns a stream of MatchResults, to get all matches first. Map each result to a string with MatchResult.group, apply String.replaceAll to it with the pattern as the regex and your extractionFormatter as the replacement, and finally join everything with a newline:
String fileContent = "aaabbcac aabb\n" +
"bcbcbbccc babba";
Pattern pattern = Pattern.compile("bb.*(.)(abb)");
String extractionFormatter = "$1: $0, \\$$2";
String output = pattern.matcher(fileContent)
.results()
.map(MatchResult::group)
.map(s -> s.replaceAll(pattern.pattern(), extractionFormatter))
.collect(Collectors.joining(System.lineSeparator()));
System.out.println(output);
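Re-applying the pattern to each matched substring works for this pattern, but it could behave differently when the pattern relies on context outside the match (anchors, lookbehind). A minimal sketch of an alternative, assuming Java 9+: let Matcher.appendReplacement expand the template for each match into a scratch StringBuilder, then cut off the skipped text that it prepends. The class and method names here (TemplateExtractor, extract) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateExtractor {

    // Expands `template` ($n references, backslash escapes) once per match of `pattern` in `input`.
    public static List<String> extract(Pattern pattern, String input, String template) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        StringBuilder scratch = new StringBuilder();
        int appendPos = 0; // mirrors the matcher's internal append position
        while (m.find()) {
            scratch.setLength(0);
            // Appends input[appendPos, m.start()) followed by the expanded template.
            m.appendReplacement(scratch, template);
            // Drop the skipped text so only the expansion remains.
            out.add(scratch.substring(m.start() - appendPos));
            appendPos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        String fileContent = "aaabbcac aabb\nbcbcbbccc babba";
        Pattern pattern = Pattern.compile("bb.*(.)(abb)");
        // Prints:
        // a: bbcac aabb, $abb
        // b: bbccc babb, $abb
        System.out.println(String.join("\n", extract(pattern, fileContent, "$1: $0, \\$$2")));
    }
}
```

Because the template is expanded by the matcher that found the match, group references resolve against the original input rather than against an isolated substring.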
You can use String.replaceAll instead.
Note that to get the desired output with capture groups, the pattern also has to match the parts of the string that should not appear in the replacement, so that they are removed.
Using a pattern that would give the desired output:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pattern = "(?m)^.*?(bb\\S*).*(.)(abb).*$";
String extractionFormatter = "$2: $1 $2$3, \\$$3";
System.out.print(fileContent.replaceAll(pattern, extractionFormatter));
Output
a: bbcac aabb, $abb
b: bbccc babb, $abb
See a Java demo.
Or using a StringBuilder, a Matcher and a while loop:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pat = "bb.*(.)(abb)";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(fileContent);
String extractionFormatter = "$1: $0, \\$$2";
StringBuilder sb = new StringBuilder();
while(matcher.find()) {
sb.append(matcher.group().replaceAll(pat, extractionFormatter)).append("\n");
}
System.out.print(sb);
See a Java demo.
Related
I have an Outlook message with a body. I need to get strings matching a certain pattern, such as S15345, S15366, etc.
How can I achieve this in Java?
I tried the following:
String array[] = body.split("[S[0-9]]");
In this case a better way is to use Pattern and Matcher with the regex S\d{5}, or, if the number of digits can vary, S\d+ instead:
String body = ...
Pattern pattern = Pattern.compile("S\\d{5}");
Matcher matcher = pattern.matcher(body);
List<String> result = new ArrayList<>();
while (matcher.find()){
result.add(matcher.group());
}
If you are using Java 9+ you can use:
String body = ...
List<String> result = Pattern.compile("S\\d{5}")
.matcher(body)
.results()
.map(MatchResult::group)
.collect(Collectors.toList());
I have a regex pattern that will have only one group. I need to find the text in the input string that matches the pattern and replace ONLY match group 1. For example, I have the regex pattern and the string it is applied to as shown below. The replacement string is "<--->".
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher("plan plans lander planitia");
The expected result is
plan p<--->s <--->der p<--->itia
I tried the following approaches:
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll(m.group(1),"<--->");
}
System.out.print(result);
This gives result as
p<---> p<--->s <--->der p<--->itia
Another approach
String test = "plan plans lander planitia";
Pattern p = Pattern.compile("\\w*(lan)\\w+");
Matcher m = p.matcher(test);
String result = "";
while(m.find()){
result = test.replaceAll("\\w*(lan)\\w+","<--->");
}
System.out.print(result);
Result is
plan <---> <---> <--->
I have gone through this link. There, the part of the string before the match is always constant ("foo"), but in my case it varies. I have also looked at this and this, but I am unable to apply any of the solutions given there to my present scenario.
Any help is appreciated.
You need to use the following pattern with capturing groups:
(\w*)lan(\w+)
^-1-^   ^-2-^
and replace with $1<--->$2
See the regex demo
The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.
Java demo:
String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia
If you need to be able to replace the Group 1 and keep the rest, you may use the replace callback method emulation with Matcher#appendReplacement:
String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia
See another Java demo
Here, since we process the text match by match, we should replace the Group 1 contents only once with replaceFirst, and since we replace the substring as a literal, we should Pattern.quote it.
To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().
That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.
    start(1)
    ↓    end(1)
    ↓    ↓
\\w*(lan)\\w+
↑            ↑
start()      end()
You can then extract the values to keep.
String input = "plan plans lander planitia";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
"<--->" +
input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();
System.out.println(output);
Output
plan p<--->s <--->der p<--->itia
If you don't like that it uses the original string, you can use the matched substring instead.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
String match = m.group();
int start = m.start();
m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
"<--->" +
match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();
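On Java 9+, the same offset arithmetic can be written without an explicit find() loop by passing a function to Matcher.replaceAll. This is a sketch under that assumption; the names GroupOneReplacer and replaceGroupOne are hypothetical, and the code assumes group 1 always participates in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOneReplacer {

    // Replaces only group 1 of every match with `literal`, keeping the rest of each match intact.
    public static String replaceGroupOne(Pattern pattern, String input, String literal) {
        return pattern.matcher(input).replaceAll(mr -> {
            String match = mr.group();
            int offset = mr.start(); // group offsets are relative to the whole input
            String rebuilt = match.substring(0, mr.start(1) - offset)
                    + literal
                    + match.substring(mr.end(1) - offset);
            // The returned string is still treated as a replacement template, so quote it.
            return Matcher.quoteReplacement(rebuilt);
        });
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w*(lan)\\w+");
        // Prints: plan p<--->s <--->der p<--->itia
        System.out.println(replaceGroupOne(p, "plan plans lander planitia", "<--->"));
    }
}
```

Quoting the rebuilt string keeps any $ or \ in the kept text or in the replacement from being read as a group reference or escape.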
While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid using them at all. The \\w* at the start of your pattern seems irrelevant, as you want to keep that part anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done with a lookahead, (?=\w), so we actually match only lan with a pattern like "lan(?=\\w)" and can do a simple replace with "<--->" (or whatever you like).
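A minimal sketch of the lookahead variant (the class and method names are made up for illustration):

```java
public class LookaheadReplace {

    // Replaces every "lan" that is immediately followed by a word character.
    public static String replaceLan(String input) {
        return input.replaceAll("lan(?=\\w)", "<--->");
    }

    public static void main(String[] args) {
        // Prints: plan p<--->s <--->der p<--->itia
        System.out.println(replaceLan("plan plans lander planitia"));
    }
}
```

One difference to be aware of: this replaces every qualifying lan inside a word, whereas the original pattern \\w*(lan)\\w+ captures only one lan per matched word.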
I like the others' solutions. This is a slightly optimised, bulletproof version:
public static void main (String [] args) {
int groupPosition = 1;
String replacement = "foo";
Pattern r = Pattern.compile("foo(bar)");
Matcher m = r.matcher("bar1234foobar1234bar");
StringBuffer sb = new StringBuffer();
while (m.find()) {
StringBuffer buf = new StringBuffer(m.group());
buf.replace(m.start(groupPosition)-m.start(), m.end(groupPosition)-m.start(), replacement);
m.appendReplacement(sb, buf.toString());
}
m.appendTail(sb);
System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
}
How can I get the first and the second quoted text ("...") from a string?
I could do it with indexOf, but that is really tedious.
For example, I have a string to parse like: "aaa":"bbbbb"perhapsSomeOtherText
I'd like to get aaa and bbbbb with the help of a regex pattern. This will let me use them in a switch statement and will greatly simplify my app.
If all you have is a colon-delimited string, just split it:
String str = ...; // colon delimited
String[] parts = str.split(":");
Note that split() takes a regex and may compile it on every call. To improve the performance of your code you can use a precompiled Pattern as follows:
private static Pattern pColonSplitter = Pattern.compile(":");
// now somewhere in your code:
String[] parts = pColonSplitter.split(str);
If, however, you want to use a pattern to match and extract string fragments in more complicated cases, do it as follows:
Pattern p = Pattern.compile("(\\w+):(\\w+):");
Matcher m = p.matcher(str);
if (m.find()) {
String a = m.group(1);
String b = m.group(2);
}
Pay attention to the parentheses, which define the capturing groups.
Something like this?
Pattern pattern = Pattern.compile("\"([^\"]*)\"");
Matcher matcher = pattern.matcher("\"aaa\":\"bbbbb\"perhapsSomeOtherText");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
aaa
bbbbb
String str = "\"aaa\":\"bbbbb\"perhapsSomeOtherText";
Pattern p = Pattern.compile("\"\\w+\""); // word between ""
Matcher m = p.matcher(str);
while(m.find()){
System.out.println(m.group().replace("\"", ""));
}
output:
aaa
bbbbb
There are several ways to do this.
Use StringTokenizer, or Scanner with its useDelimiter method.
I have a string which contains many <xxx> values.
I want to retrieve the value inside <>, do some manipulation and re-insert the new value into the string.
What I did is
input = This is <abc_d> a sample <ea1_j> input <lmk_02> string
while(input.matches(".*<.+[\S][^<]>.*"))
{
value = input.substring(input.indexOf("<") + 1, input.indexOf(">"));
//calculate manipulatedValue from value
input = input.replaceFirst("<.+>", manipulatedValue);
}
but after the first iteration, value contains abc_d> a sample <ea1_j> input <lmk_02. I believe indexOf(">") will give the first index of ">". Where did I go wrong?
This is a slightly easier way of accomplishing what you are trying to do:
String input = "This is <abc_d> a sample <ea1_j> input <lmk_02> string";
Matcher matcher = Pattern.compile("<([^>]*)>").matcher(input);
StringBuffer sb = new StringBuffer();
while(matcher.find()) {
matcher.appendReplacement(sb, manipulateValue(matcher.group(1)));
}
matcher.appendTail(sb);
System.out.println(sb.toString());
This is a good use case for the appendReplacement and appendTail idiom:
Pattern p = Pattern.compile("<([^>]+)>");
Matcher m = p.matcher(input);
StringBuffer out = new StringBuffer();
while(m.find()) {
String value = m.group(1);
// calculate manipulatedValue
m.appendReplacement(out, Matcher.quoteReplacement(manipulatedValue));
}
m.appendTail(out);
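The idiom above as a runnable sketch. The manipulate method is a hypothetical stand-in for the real calculation, chosen here to produce a $ so that the effect of Matcher.quoteReplacement is visible: without quoting, appendReplacement would try to read the $ as a group reference.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderRewriter {

    // Hypothetical manipulation, assumed for illustration: upper-case the value and prefix "$".
    static String manipulate(String value) {
        return "$" + value.toUpperCase(Locale.ROOT);
    }

    // Replaces every <value> placeholder with its manipulated value.
    public static String rewrite(String input) {
        Matcher m = Pattern.compile("<([^>]+)>").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement makes "$" and "\" in the new value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(manipulate(m.group(1))));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: This is $ABC_D a sample $EA1_J input $LMK_02 string
        System.out.println(rewrite("This is <abc_d> a sample <ea1_j> input <lmk_02> string"));
    }
}
```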
Try escaping the special characters in the regex with \\.
If the source string contains the pattern, then replace it with something or remove it. One way to do it is to do something like this
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
while(m.find()){
String subStr = m.group().replaceAll("something",""); // remove the pattern sequence
String strPart1 = sourceString.substring(0,m.start());
String strPart2 = sourceString.substring(m.end());
String resultingStr = strPart1+subStr+strPart2;
p.matcher(...);
}
But I want something like this
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
while(m.find()){
m.group.replaceAll(...);// change the group and it is source string is automatically updated
}
Is this possible?
Thanks
// change the group and it is source string is automatically updated
Strings in Java are immutable, so there is no way whatsoever to change an existing string; what you're asking for is impossible.
To remove or replace a pattern with a string can be achieved with a call like
someString = someString.replaceAll(toReplace, replacement);
To transform the matched substring, as seems to be indicated by your line
m.group().replaceAll("something","");
the best solution is probably to use
A StringBuffer for the result
Matcher.appendReplacement and Matcher.appendTail.
Example:
String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// For example: transform match to upper case
String replacement = m.group().toUpperCase();
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
sourceString = sb.toString();
System.out.println(sourceString); // "lorem IPSUM dolor sit"
Assuming you want to replace all occurrences of a certain pattern, try this:
String source = "aabbaabbaabbaa";
String result = source.replaceAll("aa", "xx"); //results in xxbbxxbbxxbbxx
Removing the pattern would then be:
String result = source.replaceAll("aa", ""); //results in bbbbbb
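Keep in mind that replaceAll treats its first argument as a regex and its second as a replacement template. When the text to replace is a plain literal, String.replace is usually the simpler choice. A small sketch contrasting the two (the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralVsRegex {

    // Literal replacement: no regex or template interpretation at all.
    public static String literalReplace(String source, String target, String replacement) {
        return source.replace(target, replacement);
    }

    // Same effect through replaceAll, with both arguments quoted.
    public static String quotedReplaceAll(String source, String target, String replacement) {
        return source.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String source = "1.5 + 2.5";
        System.out.println(source.replaceAll(".", "x"));       // "." matches any character: xxxxxxxxx
        System.out.println(literalReplace(source, ".", ","));   // 1,5 + 2,5
        System.out.println(quotedReplaceAll(source, ".", ",")); // 1,5 + 2,5
    }
}
```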