Java regular expression and pattern

I have an Outlook message with a body. I need to extract strings that follow a certain pattern, such as S15345, S15366, etc.
How can I achieve this in Java?
I tried the following:
String array[] = body.split("[S[0-9]]");

In this case, a better way is to use a Pattern and Matcher with the regex S\d{5}; if the number can have one or more digits, use S\d+ instead:
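import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String body = ...; // the message body
Pattern pattern = Pattern.compile("S\\d{5}"); // 'S' followed by exactly five digits
Matcher matcher = pattern.matcher(body);
List<String> result = new ArrayList<>();
while (matcher.find()) {
    result.add(matcher.group()); // group() returns the text of the current match
}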
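If you are using Java 9+, you can use Matcher.results() instead:
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String body = ...; // the message body
List<String> result = Pattern.compile("S\\d{5}")
        .matcher(body)
        .results()               // Stream<MatchResult> of all matches
        .map(MatchResult::group)
        .collect(Collectors.toList());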
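To see both regexes side by side, here is a small self-contained sketch. The sample body, the class name ExtractCodes, and the helper findAll are made up for illustration. It also shows why the split() attempt in the question cannot work: [S[0-9]] is a single character class (the union of S and the digits 0-9), so split() treats every S and every digit as a separator and throws the codes away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractCodes {

    // Returns every substring of text matched by regex, in order of appearance
    static List<String> findAll(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical message body, for illustration only
        String body = "Please review S15345, S15366 and S1234 today.";

        // The question's attempt: every 'S' and every digit acts as a separator,
        // so the codes themselves are discarded
        System.out.println(Arrays.toString(body.split("[S[0-9]]")));

        System.out.println(findAll(body, "S\\d{5}")); // [S15345, S15366]
        System.out.println(findAll(body, "S\\d+"));   // [S15345, S15366, S1234]
    }
}
```

Note that S\d{5} also matches the first six characters of a longer token such as S123456. If only standalone five-digit codes should count, adding word boundaries (\bS\d{5}\b) rejects such tokens.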

Related

Extract text with Java regex with reference support

I am having a problem writing a method in Java. It extracts text that matches a pattern and returns ALL the extractions. It works just like java.util.regex.Matcher's find()/matches() followed by group():
Matcher matcher = pattern.matcher(fileContent);
StringBuilder sb = new StringBuilder();
// find(), not matches(): matches() tests the whole input and would loop forever once it succeeds
while (matcher.find()) {
    sb.append(matcher.group()).append("\n");
}
return sb.toString();
However, I would like each extraction to be formatted with support for group references (dollar sign, $) and literal-character escaping (backslash, \), just like the replacement string in Matcher.replaceAll(replacement). For example:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
Pattern pattern = Pattern.compile("bb.*(.)(abb)");
String extractionFormatter = "$1: $0, \\$$2";
The expected output would be:
a: bbcac aabb, $abb
b: bbccc babb, $abb
I hope you understand what I am trying to do. Do you know of any existing library or method that can achieve this, so I don't have to reinvent the wheel?
You can use the results() method of the Matcher class (Java 9+), which returns a stream of MatchResults: map each match to its text with MatchResult.group, format it with String.replaceAll using the pattern as the regex and your extractionFormatter as the replacement, and finally join everything with a line separator:
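import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String fileContent = "aaabbcac aabb\n" +
        "bcbcbbccc babba";
Pattern pattern = Pattern.compile("bb.*(.)(abb)");
String extractionFormatter = "$1: $0, \\$$2";
String output = pattern.matcher(fileContent)
        .results()                 // Stream<MatchResult>, Java 9+
        .map(MatchResult::group)   // text of each match
        .map(s -> s.replaceAll(pattern.pattern(), extractionFormatter)) // re-match and expand $-references
        .collect(Collectors.joining(System.lineSeparator()));
System.out.println(output);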
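One caveat with the approach above: replaceAll re-runs the pattern on the matched substring in isolation, so patterns whose match depends on the surrounding text (anchors, lookarounds, \b) can format differently or not at all. Below is a sketch of an alternative, assuming Java 9+ for Matcher.appendReplacement(StringBuilder, String); on Java 8 the StringBuffer overload works the same way. appendReplacement expands $n, ${name}, and backslash escapes against the original match; because it also copies the text between the previous match and the current one, the helper cuts that gap off. The class name FormattedExtraction and the method extractFormatted are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormattedExtraction {

    // Hypothetical helper: formats every match of pattern in text using Matcher's
    // replacement syntax ($n, ${name}, backslash escapes), without re-matching
    static List<String> extractFormatted(String text, Pattern pattern, String format) {
        Matcher m = pattern.matcher(text);
        List<String> out = new ArrayList<>();
        int appendPos = 0; // mirrors the matcher's append position (end of the previous match)
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            // Appends text[appendPos, m.start()) followed by the expanded format
            m.appendReplacement(sb, format);
            // Keep only the expanded format by dropping the copied gap
            out.add(sb.substring(m.start() - appendPos));
            appendPos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        String fileContent = "aaabbcac aabb\nbcbcbbccc babba\n";
        Pattern pattern = Pattern.compile("bb.*(.)(abb)");
        String extractionFormatter = "$1: $0, \\$$2";
        List<String> lines = extractFormatted(fileContent, pattern, extractionFormatter);
        // a: bbcac aabb, $abb
        // b: bbccc babb, $abb
        System.out.println(String.join(System.lineSeparator(), lines));
    }
}
```

For example, with the pattern (?<=x)a and the text xa ya, the helper formats the single real match, while re-running replaceAll on the isolated substring a finds nothing to replace, since the lookbehind no longer sees the x.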
You can use String.replaceAll instead.
The thing to note is that replaceAll rewrites only the matched text, so to get the desired output with capture groups you have to match (and thereby remove) everything in the line that should not appear in the result.
Using a pattern that would give the desired output:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pattern = "(?m)^.*?(bb\\S*).*(.)(abb).*$";
String extractionFormatter = "$2: $1 $2$3, \\$$3";
System.out.print(fileContent.replaceAll(pattern, extractionFormatter));
Output
a: bbcac aabb, $abb
b: bbccc babb, $abb
Or, using a StringBuilder, a Matcher, and a while loop:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pat = "bb.*(.)(abb)";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(fileContent);
String extractionFormatter = "$1: $0, \\$$2";
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    // re-match the found text on its own and expand the $-references
    sb.append(matcher.group().replaceAll(pat, extractionFormatter)).append("\n");
}
System.out.print(sb);

Need to extract the values that follow a pattern in the string in Java

I have a string as below:
String a = "member;range=12001-*: CN=marimar,OU=Employees,OU=Cisco Users,DC=cisco,DC=com, "
        + "CN=cadautel,OU=Employees,OU=Cisco Users,DC=cisco,DC=com "
        + "CN=rajaki,OU=Employees,OU=Cisco Users,DC=cisco,DC=com";
I need to get the values of the CN attribute, i.e. 'marimar', 'cadautel', and 'rajaki'.
I have to use Java 7 to do that, and hence I cannot use String.split(). Can anybody help me come up with the logic?
Thanks!
String#split isn't the best tool for this job. Use a pattern matcher instead:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String a = "member;range=12001-*: CN=marimar,OU=Employees,OU=Cisco Users,DC=cisco,DC=com, "
        + "CN=cadautel,OU=Employees,OU=Cisco Users,DC=cisco,DC=com "
        + "CN=rajaki,OU=Employees,OU=Cisco Users,DC=cisco,DC=com";
String pattern = "CN=([^,]+)"; // capture everything after CN= up to the next comma
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(a);
while (m.find()) {
    System.out.println("CN attribute: " + m.group(1));
}
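Since the question is limited to Java 7 (no streams, no Matcher.results()), here is a sketch that collects the values into a List using only Java 7 APIs. It assumes the line breaks in the question's string are really spaces (OU=Cisco Users); the class name CnExtractor and the method extractCn are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CnExtractor {

    // Compiled once and reused; captures everything after "CN=" up to the next comma
    private static final Pattern CN = Pattern.compile("CN=([^,]+)");

    // Uses only Java 7 APIs (no streams, no Matcher.results())
    static List<String> extractCn(String text) {
        List<String> values = new ArrayList<>(); // the diamond operator is available since Java 7
        Matcher m = CN.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String a = "member;range=12001-*: CN=marimar,OU=Employees,OU=Cisco Users,DC=cisco,DC=com, "
                + "CN=cadautel,OU=Employees,OU=Cisco Users,DC=cisco,DC=com "
                + "CN=rajaki,OU=Employees,OU=Cisco Users,DC=cisco,DC=com";
        System.out.println(extractCn(a)); // [marimar, cadautel, rajaki]
    }
}
```

Because [^,]+ stops at the first comma, a CN value containing an escaped comma (written \, in a distinguished name) would be cut short; a proper DN parser such as javax.naming.ldap.LdapName, applied to each individual DN, handles that case.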

First and second token regex

How can I get the first and second strings enclosed in "" from a string?
I could do it with indexOf, but that is really tedious.
For example, I have a string to parse like: "aaa":"bbbbb"perhapsSomeOtherText
I'd like to get aaa and bbbbb with a regex pattern; this would let me use them in a switch statement and greatly simplify my app.
If all you have is a colon-delimited string, just split it:
String str = ...; // colon delimited
String[] parts = str.split(":");
Note that split() takes a regex and, in general, compiles it on every call (OpenJDK has a fast path that skips this for a single literal character such as ":"). To avoid recompiling the pattern, you can precompile it once:
import java.util.regex.Pattern;

private static final Pattern pColonSplitter = Pattern.compile(":");
// now somewhere in your code:
String[] parts = pColonSplitter.split(str);
If, however, you want to use a pattern to match and extract string fragments in more complicated cases, do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("(\\w+):(\\w+):");
Matcher m = p.matcher(str);
if (m.find()) {
    String a = m.group(1); // first word
    String b = m.group(2); // second word
}
Pay attention to the parentheses; they define the capturing groups.
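Putting the two suggestions together, here is a sketch for a colon-delimited input. The input aaa:bbbbb:rest, the class name ColonParsing, and the helper firstTwo are assumptions for illustration. It also shows the switch the question asks about; switching on a String requires Java 7 or later.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColonParsing {

    // Compiled once, reused for every split
    private static final Pattern COLON = Pattern.compile(":");
    // Two runs of word characters, each followed by a colon
    private static final Pattern FIRST_TWO = Pattern.compile("(\\w+):(\\w+):");

    // Returns the first two colon-terminated words, or null if the input does not match
    static String[] firstTwo(String str) {
        Matcher m = FIRST_TWO.matcher(str);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        String str = "aaa:bbbbb:rest"; // hypothetical colon-delimited input
        System.out.println(Arrays.toString(COLON.split(str))); // [aaa, bbbbb, rest]

        String[] parts = firstTwo(str);
        if (parts != null) {
            switch (parts[0]) {
                case "aaa":
                    System.out.println("first is aaa, second is " + parts[1]);
                    break;
                default:
                    System.out.println("something else");
            }
        }
    }
}
```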
Something like this?
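import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("\"([^\"]*)\""); // a quote, any non-quote characters (captured), a quote
Matcher matcher = pattern.matcher("\"aaa\":\"bbbbb\"perhapsSomeOtherText");
while (matcher.find()) {
    System.out.println(matcher.group(1));
}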
Output
aaa
bbbbb
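If the quoted values may themselves contain backslash-escaped quotes (an assumption; the question's input has none), [^"]* stops at the first inner quote. Here is a sketch of a variant that also accepts escaped characters inside the quotes; the class name QuotedValues and the method quotedValues are hypothetical, and the escapes are returned as-is rather than unescaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValues {

    // A quote, then any mix of characters other than quote/backslash or
    // backslash-escaped characters (captured), then a closing quote
    private static final Pattern QUOTED = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> quotedValues(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            out.add(m.group(1)); // contents without the surrounding quotes; escapes kept as-is
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(quotedValues("\"aaa\":\"bbbbb\"perhapsSomeOtherText")); // [aaa, bbbbb]
        System.out.println(quotedValues("\"a\\\"b\":\"c\""));                      // [a\"b, c]
    }
}
```

The first two elements of the returned list are the values to switch on.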
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "\"aaa\":\"bbbbb\"perhapsSomeOtherText";
Pattern p = Pattern.compile("\"\\w+\""); // word between ""
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group().replace("\"", ""));
}
output:
aaa
bbbbb
There are several ways to do this:
Use StringTokenizer, or a Scanner with its useDelimiter method.
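Both suggestions, sketched on the question's input. With StringTokenizer, every character of the delimiter string is a separator and runs of separators are skipped; with Scanner.useDelimiter, the delimiter is a regex, here one or more quotes or colons. The class and method names are hypothetical. Either way the first two tokens are aaa and bbbbb, but the approach breaks if a value itself contains a quote or colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenizerDemo {

    // StringTokenizer: each character in "\":" is a separator; consecutive separators are skipped
    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, "\":");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Scanner: the delimiter is a regex, here one or more quotes or colons
    static List<String> withScanner(String s) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(s).useDelimiter("[\":]+")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "\"aaa\":\"bbbbb\"perhapsSomeOtherText";
        System.out.println(withTokenizer(str)); // [aaa, bbbbb, perhapsSomeOtherText]
        System.out.println(withScanner(str));   // [aaa, bbbbb, perhapsSomeOtherText]
    }
}
```

StringTokenizer is a legacy class; its Javadoc recommends String.split or the java.util.regex package for new code.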

How split a string using regex pattern

How can I split a string on tokens like [0] using a regex pattern? The 0 can be any integer.
I used this regex pattern:
private static final String REGEX = "[\\d]";
But the resulting strings still contain [.
Splitting code:
Pattern p=Pattern.compile(REGEX);
String items[] = p.split(lure_value_save[0]);
You have to escape the brackets:
String REGEX = "\\[\\d+\\]";
Java doesn't offer a one-call way to collect all matches (before Java 9's Matcher.results()), so to extract the bracketed numbers this is the way to go:
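import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile(REGEX); // REGEX = "\\[\\d+\\]"
String test = "[0],[1],[2]";
Matcher m = p.matcher(test);
List<String> matches = new ArrayList<String>();
while (m.find()) {
    matches.add(m.group()); // "[0]", "[1]", "[2]"
}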
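If the goal is the numbers themselves, or the text between the [n] markers, a capturing group around \d+ and Pattern.split cover both. Here is a sketch; the class name, helper names, and sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketNumbers {

    // Escaped brackets match a literal [ and ]; the group captures only the digits
    private static final Pattern INDEX = Pattern.compile("\\[(\\d+)\\]");

    // The integers inside the [n] markers, in order
    static List<Integer> indices(String s) {
        List<Integer> out = new ArrayList<>();
        Matcher m = INDEX.matcher(s);
        while (m.find()) {
            out.add(Integer.parseInt(m.group(1)));
        }
        return out;
    }

    // The text between the markers; the [n] markers act as separators
    static String[] textBetween(String s) {
        return INDEX.split(s);
    }

    public static void main(String[] args) {
        System.out.println(indices("[0],[1],[2]"));                           // [0, 1, 2]
        System.out.println(Arrays.toString(textBetween("foo[0]bar[12]baz"))); // [foo, bar, baz]
    }
}
```

Integer.parseInt throws NumberFormatException for values that do not fit in an int, so very long digit runs would need Long.parseLong or BigInteger.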

regular expression for matching a pattern in Pattern class in java

I have a string m="hell,hj;nk,.txt"
I want it to become m="hellhjnk.txt"
I am using:
Pattern p=Pattern.compile("(\"([^\"]*)(\\.)([a-z]{1,4}[\"]))|'([^']+)(\\.)([a-z]{1,4})'");
It works for double quotes and the extension.
How can I make it also remove spaces, commas, and semicolons?
You could just do:
m = m.replaceAll("[,; ]","");
If you want to use the Pattern class, you can do essentially the same thing with a Matcher:
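import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("[;, ]"); // a semicolon, a comma, or a space
String m = "hell,hj;nk,.txt";
Matcher matcher = p.matcher(m);
System.out.println(matcher.replaceAll("")); // hellhjnk.txt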
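A variation, assuming tabs and other whitespace should also be removed: \s in the character class covers any whitespace character rather than only a literal space, and compiling the Pattern once in a static final field avoids the recompilation that String.replaceAll performs on every call. The class name CharRemoval and the method clean are hypothetical.

```java
import java.util.regex.Pattern;

public class CharRemoval {

    // Compiled once; [,;\s] matches a comma, a semicolon, or any whitespace character
    private static final Pattern UNWANTED = Pattern.compile("[,;\\s]");

    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("hell,hj;nk,.txt"));  // hellhjnk.txt
        System.out.println(clean("my file; v2,.txt")); // myfilev2.txt
    }
}
```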
