Java repetitive pattern matching - java

I am trying to get each of the repetitive matches of a simple regular expression in Java:
(\\[[^\\[]*\\])*
which matches any string enclosed in [], as long as it does not contain the [ character. For example, it would match
[a][nice][repetitive][pattern]
The number of such groups is not known in advance, and I cannot find a way of accessing the individual matching groups via a pattern matcher, i.e. I can't get
[a], [nice], [repetitive], [pattern]
(or, even better, the text without the brackets), in 4 different strings.
Using pattern.matcher() I always get the last group.
Surely there must be a simple way of doing this in Java, which I am missing?
Thanks for any help.

while (matcher.find()) {
    System.out.println(matcher.group(1));
}
http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#find%28%29

String string = "[a][nice][repetitive][pattern]";
String regexp = "\\[([^\\[]*)\\]";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
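For context on why the pattern in the question only ever yields the last group: a capturing group placed under a quantifier is overwritten on each repetition, so after the match group(1) holds only the final iteration. A minimal sketch of that behaviour (the class and method names are only illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastGroupDemo {
    // Matches the whole input with the repeated-group pattern from the question
    // and returns whatever group(1) holds afterwards (null if there is no match).
    static String lastIteration(String input) {
        Matcher m = Pattern.compile("(\\[[^\\[]*\\])*").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Only the last repetition of the group survives.
        System.out.println(lastIteration("[a][nice][repetitive][pattern]")); // [pattern]
    }
}
```

This is why the answers here drop the outer quantifier and call find() in a loop instead.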

I would use split
String string = "[a][nice][repetitive][pattern]";
String[] words = string.substring(1, string.length()-1).split("\\]\\[");
System.out.println(Arrays.toString(words));
prints
[a, nice, repetitive, pattern]

Here's my attempt :)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Foo {
    public static void main(String[] args) {
        final String text = "[a][nice][repetitive][pattern]";
        System.out.println(getStrings(text)); // Prints [a, nice, repetitive, pattern]
    }

    private static final Pattern pattern = Pattern.compile("\\[([^\\]]+)]");

    public static List<String> getStrings(final String text) {
        final List<String> strings = new ArrayList<String>();
        final Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            strings.add(matcher.group(1));
        }
        return strings;
    }
}
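On Java 9 or later, the same loop can probably be written with Matcher.results(), which returns a stream of MatchResult objects. This is only a sketch, assuming Java 9+ is available; the class name and constant name are made up for the example:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BracketStream {
    // Same pattern as in the answer above: text between [ and ].
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]+)]");

    static List<String> getStrings(String text) {
        return BRACKETED.matcher(text)
                .results()                 // one MatchResult per find() (Java 9+)
                .map(r -> r.group(1))      // keep only the text inside the brackets
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(getStrings("[a][nice][repetitive][pattern]")); // [a, nice, repetitive, pattern]
    }
}
```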

Related

Regex to capture comma separated groups of text in parentheses [Java]

I have a string that contains one or more (comma-separated) values, surrounded by quotes and enclosed in parentheses. So it can be of the type os IN ('WIN', 'MAC', 'LNU') (for multiple values) or just os IN ('WIN') for a single value.
I need to extract the values in a List.
I have tried this regex, but it captures all the values as a single list element, the whole string 'WIN', 'MAC', instead of the two separate values WIN and MAC:
List<String> matchList = new ArrayList<>();
Pattern regex = Pattern.compile("\\((.+?)\\)");
Matcher regexMatcher = regex.matcher(processedFilterString);
while (regexMatcher.find()) { // find each parenthesized group
    matchList.add(regexMatcher.group(1)); // add the text captured inside the parentheses
}
Result:
Input: os IN ('WIN', 'MAC')
Output:
['WIN', 'MAC']
length: 1
In its current form, the regex matches one or more characters surrounded by parentheses and captures them in a group, which is probably why the result is just one string. How can I adapt it to capture each of the values separately?
Edit - Just adding some more details. The input string can have multiple IN clauses containing other criteria, such as id IN ('xxxxxx') AND os IN ('WIN', 'MAC'). Also, the length of the matched characters is not necessarily the same, so it could be - os IN ('WIN', 'MAC', 'LNUX').
You may try splitting the CSV string from the IN clause:
List<String> matchList = null;
Pattern regex = Pattern.compile("\\((.+?)\\)");
Matcher regexMatcher = regex.matcher(processedFilterString);
if (regexMatcher.find()) {
    String match = regexMatcher.group(1).replaceAll("^'|'$", "");
    String[] terms = match.split("'\\s*,\\s*'");
    matchList = Arrays.stream(terms).collect(Collectors.toList());
}
Note that if your input string could contain multiple IN clauses, then the above would need to be modified to use a while loop.
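A rough sketch of that while-loop variant, using the multi-clause input from the question's edit. The class and method names are hypothetical, and it assumes that no value contains a closing parenthesis:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InClauseValues {
    private static final Pattern PARENS = Pattern.compile("\\((.+?)\\)");

    // Collects the quoted values from every parenthesized list in the filter string.
    static List<String> extract(String filter) {
        List<String> values = new ArrayList<>();
        Matcher m = PARENS.matcher(filter);
        while (m.find()) {
            // Strip the outermost quotes, then split on the quote-comma-quote separators.
            String inner = m.group(1).replaceAll("^'|'$", "");
            values.addAll(Arrays.asList(inner.split("'\\s*,\\s*'")));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("id IN ('xxxxxx') AND os IN ('WIN', 'MAC')")); // [xxxxxx, WIN, MAC]
    }
}
```

Note that this flattens the values of all clauses into one list; keeping them per clause would need a list of lists or a map keyed by the column name.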
Judging from the examples in your question, your regular expression needs to find strings of at least three upper-case letters enclosed in single quotes.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        String s = "os IN ('WIN', 'MAC', 'LNUX')";
        Pattern pattern = Pattern.compile("'([A-Z]{3,})'");
        Matcher matcher = pattern.matcher(s);
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group(1));
        }
        System.out.println(list);
    }
}
Running the above code produces the following output:
[WIN, MAC, LNUX]
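If the values are not guaranteed to be upper-case letters (the question's edit mentions id IN ('xxxxxx')), a looser pattern that captures anything between a pair of single quotes may be a better fit. This is a sketch under the assumption that values never contain an escaped quote; the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValues {
    // Anything that is not a single quote, between two single quotes.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    static List<String> extract(String s) {
        List<String> list = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(s);
        while (matcher.find()) {
            list.add(matcher.group(1));
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(extract("id IN ('xxxxxx') AND os IN ('WIN', 'MAC')")); // [xxxxxx, WIN, MAC]
    }
}
```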

Java get text between the last square brackets in a String with regex

I need to extract the text between the last brackets of a string. This is what it looks like:
String text= "[text1][text2][text3][text4]";
I need to get
String result = "text4";
I have tried with regex but I can't manage to make it work. I would appreciate some help with getting the regex and the substring. Thank you very much.
Use the regex, .+(\[.+\])$ and capture group(1).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "[text1][text2][text3][text4]";
        Matcher matcher = Pattern.compile(".+(\\[.+\\])$").matcher(text);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}
Output:
[text4]
You don't need regex. You can use lastIndexOf
String text= "[text1][text2][text3][text4]";
System.out.println(text.substring(text.lastIndexOf("[")+1, text.lastIndexOf("]")));
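The regex answer above prints [text4] including the brackets, whereas the question asks for text4. If a regex is still preferred, one possible variation is to put the capture group inside the brackets and anchor the match at the end of the input. A sketch, with a hypothetical class and method name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastBracket {
    // A bracketed run with no nested brackets, sitting at the very end of the input.
    private static final Pattern LAST = Pattern.compile("\\[([^\\[\\]]*)\\]$");

    // Returns the text inside the final [...] pair, or null if the input does not end with one.
    static String lastBracketText(String text) {
        Matcher m = LAST.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastBracketText("[text1][text2][text3][text4]")); // text4
    }
}
```

Unlike the lastIndexOf version, this returns null rather than throwing when the input has no trailing bracket pair.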

java regex find match between commas

I am trying to find a match between commas if it contains a specific string.
So far I have ,(.*?myString.?*),
Obviously this finds all the input between the first comma in the entire input and the first comma after the string I want. How do I reference the comma immediately before the string that I want?
Edit: I also want to find the match that occurs after a specific set of characters,
i.e. one that occurs after (fooo). For example,
dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa
should return gfhhhgdtheMatchfhhhfd, not gfdsgdtheMatchfdsgfd
The following regex should do it :
[^,]+theMatch.*?(?=,)
In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegEx {
    public static void main(String[] args) {
        String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
        String r = "[^,]+theMatch.*?(?=,)";
        Pattern p = Pattern.compile(r);
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group()); // gfdsgdtheMatchfdsgfd
        }
    }
}
Edit
To match only after fooo, use this regex and read capture group 1: fooo.*?([^,]+theMatch.*?)(?=,)
You are finding too much because .* will include the comma.
You need the following regular expression: ,([^,]*myinput[^,]*),
[^,]* basically says find all non-comma characters.
I would suggest the following code:
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(0));
            // prints out ",myinput,"
            System.out.println(m.group(1));
            // prints out "myinput"
        }
    }
}
Here is a StackOverflow question that is basically the same with some very good answers associated:
Regex to find internal match between two characters
For more on regular expressions in Java look here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
If you want the position of the comma preceding your input string, use the following code:
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(str.indexOf(m.group(0)));
            // prints out "16"
        }
    }
}
By feeding the match of the regular expression into the String method indexOf(), you can locate the position where the match starts in your string.
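As a possible alternative to indexOf, the Matcher itself reports match positions through start() and start(int group), which avoids searching the string a second time. A small sketch on the same sample data (the class and method names are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPosition {
    private static final Pattern P = Pattern.compile(",([^,]*myinput[^,]*),");

    // Returns {index of the leading comma, index where group 1 begins}, or null if nothing matches.
    static int[] positions(String str) {
        Matcher m = P.matcher(str);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(), m.start(1) };
    }

    public static void main(String[] args) {
        int[] pos = positions("dsfdsdafd,safdsa,myinput,dsafdsa");
        System.out.println(pos[0]); // 16, the comma before the match
        System.out.println(pos[1]); // 17, the first character of "myinput"
    }
}
```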
Edit:
To find the occurrence of a string following another string, simply modify the regex to: fooo.*,([^,]*theMatch[^,]*),
fooo.* will greedily consume all characters between fooo and the start of your match.
Example code:
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
        Pattern p = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(1));
            // prints out: gfhhhgdtheMatchfhhhfd
        }
    }
}
The usual approach is to use a pattern that cannot match your delimiter in place of the dot (.). In this case, you need that only at the front of the pattern; you can use a reluctant quantifier at the back as you already do (though you've misspelled it). For example:
,([^,]*myString.*?),
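To try that pattern on the sample data from the question, something like the following sketch could be used. It assumes the search text is theMatch, wraps it in Pattern.quote so it is treated literally, and uses made-up class and method names. Note that an item at the very start or end of the input, with no comma on one side, would not be found by this pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenCommas {
    // Returns the first comma-delimited item containing the needle, or null if none is found.
    static String firstItemContaining(String input, String needle) {
        Pattern p = Pattern.compile(",([^,]*" + Pattern.quote(needle) + ".*?),");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
        System.out.println(firstItemContaining(str, "theMatch")); // gfdsgdtheMatchfdsgfd
    }
}
```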

Replace a regular expression with another regex

I want to replace the text matched by one regex using another regex in Java. For example:
Requirement:
Input: xyxyxyP
Required Output : xyzxyzxyzP
That is, I want to replace "(for)+\{" with "(for\{)+\{". Is there any way to do this?
I have tried the following code
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class ReplaceDemo2 {
    private static String REGEX = "(xy)+P";
    private static String INPUT = "xyxyxyP";
    private static String REGEXREPLACE = "(xyz)+P";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REGEXREPLACE);
        System.out.println(INPUT);
    }
}
but the output is (xyz)+P.
You can achieve it with a \G based regex:
String s = "xyxyxyP";
String pattern = "(?:(?=xy)|(?!^)\\G)xy(?=(?:xy)*P)";
System.out.println(s.replaceAll(pattern, "$0z"));
In short, the regex matches:
(?:(?=xy)|(?!^)\\G) - either a location followed with xy ((?=xy)) or the location after the previous successful match ((?!^)\\G)
xy - a sequence of literal characters xy but only if followed with...
(?=(?:xy)*P) - zero or more sequences of xy (due to (?:xy)*) followed with a P.
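The question's attempt prints (xyz)+P because the replacement argument of replaceAll is a plain string (apart from $n group references), not a second regex. If the \G pattern feels too dense, a different and arguably more readable route is to match the whole run and build the replacement in code with appendReplacement. This is an alternative sketch, not the technique from the answer above, and the class and method names are invented:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunReplace {
    // One or more "xy" (captured as a whole run) followed by P.
    private static final Pattern RUN = Pattern.compile("((?:xy)+)P");

    static String expand(String input) {
        Matcher m = RUN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Rewrite every "xy" inside the matched run, then put the P back.
            String expanded = m.group(1).replace("xy", "xyz") + "P";
            m.appendReplacement(out, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("xyxyxyP")); // xyzxyzxyzP
    }
}
```

Runs of xy that are not followed by P are left untouched, which matches the intent of the (?=(?:xy)*P) lookahead.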

Extract substring that appears after certain pattern

I need to extract a substring that appears after a certain pattern in the input string. I have been trying various combinations but not getting expected output.
The input string can be in following 2 forms
1. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE
2. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507
I need to write a regex that will be applicable to above 2 variations and extract '149IF1007JMO2507' part that follows 'SNDR REF:'.
Please find below the sample program that I have written.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTester {
    private static final String input = "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE";
    private static Pattern pattern = Pattern.compile(".*SNDR REF:(.*?)(\\s.)*");
    private static Matcher matcher = pattern.matcher(input);

    public static void main (String[] args) {
        if (matcher.matches()) {
            System.out.println(matcher.group(1));
        }
    }
}
Output:149IF1007JMO2507 BISCAYNE BLVD STE
I want output to be '149IF1007JMO2507'
Thank you.
You can use the following idiom to find your sub-string:
String[] examples = {
    "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE",
    "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"
};
// Look-behind for "SNDR REF:", then anything (reluctantly quantified),
// then a lookahead for whitespace or end of input.
Pattern p = Pattern.compile("(?<=SNDR\\sREF:).+?(?=\\s|$)");
// iterating examples
for (String s : examples) {
    Matcher m = p.matcher(s);
    // iterating single matches (one per example here)
    while (m.find()) {
        System.out.printf("Found: %s%n", m.group());
    }
}
Output
Found: 149IF1007JMO2507
Found: 149IF1007JMO2507
Note
I expect you don't know in advance it's going to be "149IF1007JMO2507", hence the contextual matching.
You can use this regexp:
private static Pattern pattern = Pattern.compile(".*SNDR REF:([^\\s]+).*");
This will capture everything after "SNDR REF:" up to the next whitespace character.
You can do it with replaceAll
str = str.replaceAll(".*(REF:(\\S+)).*", "$2");
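One caveat with the replaceAll approach: when the pattern does not match, replaceAll returns the input unchanged, so the caller cannot tell a reference from an unparsed line. A hedged sketch of a helper that makes the no-match case explicit (the class name, method name, and use of Optional are choices made for this example only):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SenderRef {
    // The run of non-whitespace characters that follows "SNDR REF:".
    private static final Pattern REF = Pattern.compile("SNDR REF:(\\S+)");

    static Optional<String> senderRef(String line) {
        Matcher m = REF.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(senderRef("88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE").orElse("none")); // 149IF1007JMO2507
        System.out.println(senderRef("88,TRN:2014091900217161").orElse("none")); // none
    }
}
```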
