Extract substring that appears after certain pattern - java

I need to extract a substring that appears after a certain pattern in the input string. I have tried various combinations but am not getting the expected output.
The input string can take either of the following two forms:
1. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE
2. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507
I need a regex that works for both variations and extracts the '149IF1007JMO2507' part that follows 'SNDR REF:'.
Below is the sample program that I have written.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final String input = "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE";
private static Pattern pattern = Pattern.compile(".*SNDR REF:(.*?)(\\s.)*");
private static Matcher matcher = pattern.matcher(input);
public static void main (String[] args) {
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
}
}
Output:149IF1007JMO2507 BISCAYNE BLVD STE
I want output to be '149IF1007JMO2507'
Thank you.

You can use the following idiom to find your sub-string:
String[] examples = {
"88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE",
"88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"
};
// (?<=SNDR\sREF:)  look-behind for "SNDR REF:"
// .+?              anything, reluctantly quantified
// (?=\s|$)         lookahead for whitespace or end of input
Pattern p = Pattern.compile("(?<=SNDR\\sREF:).+?(?=\\s|$)");
// iterating examples
for (String s: examples) {
Matcher m = p.matcher(s);
// iterating single matches (one per example here)
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}
}
Output
Found: 149IF1007JMO2507
Found: 149IF1007JMO2507
Note
I expect you don't know in advance it's going to be "149IF1007JMO2507", hence the contextual matching.

You can use this regexp:
private static Pattern pattern = Pattern.compile(".*SNDR REF:([^\\s]+).*");
Group 1 will capture everything after "SNDR REF:" up to the next whitespace character (or the end of the input).
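A minimal sketch of how this pattern might be applied to both input forms. The class name SndrRefDemo and the helper extractRef are made up for illustration, and returning null when the marker is absent is an assumption rather than something the question asks for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SndrRefDemo {
    // Group 1 captures the run of non-whitespace characters after "SNDR REF:".
    private static final Pattern PATTERN = Pattern.compile(".*SNDR REF:([^\\s]+).*");

    // Hypothetical helper: returns the reference, or null if the marker is absent.
    static String extractRef(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractRef("88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE"));
        System.out.println(extractRef("88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"));
    }
}
```

Because matches() must consume the whole input, the leading and trailing .* are needed here; with find() they could be dropped.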

You can do it with replaceAll
str = str.replaceAll(".*(REF:(\\S+)).*", "$2");
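A small sketch of the replaceAll approach on both input forms. The class and method names are hypothetical. One thing worth noting: when the pattern does not match, replaceAll returns the input unchanged, so the caller cannot tell "no reference found" from a successful extraction without an extra check.

```java
class ReplaceAllDemo {
    // Hypothetical helper wrapping the replaceAll call shown above.
    static String extractRef(String str) {
        // $2 refers to the inner group: the non-whitespace run after "REF:".
        return str.replaceAll(".*(REF:(\\S+)).*", "$2");
    }

    public static void main(String[] args) {
        System.out.println(extractRef("88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE"));
        System.out.println(extractRef("88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"));
        // No "REF:" in the input, so the string comes back as-is.
        System.out.println(extractRef("no marker here"));
    }
}
```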

Related

Java Regex capture nested matches

I am having trouble with regex here.
Say i have this input:
608094.21.1.2014.TELE.&BIG00Z.1.1.GBP
My regex looks like this
(\d\d\d\d\.\d?\d\.\d?\d)|(\d?\d\.\d?\d\.\d?\d?\d\d)
I want to extract the date 21.1.2014 out of the string, but all i get is
8094.21.1
I think my problem here is, that 21.1.2014 starts within the (wrong) match before. Is there a simple way to make the matcher look for the next match not after the end of the match before but one character after the beginning of the match before?
You could use a regex like this:
\d{1,2}\.\d{1,2}\.\d{4}
Or shorten it and use:
(\d{1,2}\.){2}\d{4}
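A sketch of the first pattern above used from Java, where every backslash has to be doubled inside the string literal. The class name DateFinder and the helper firstDate are illustrative only, as is the choice to return null when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFinder {
    // d.m.yyyy with one- or two-digit day and month, and literal dots.
    private static final Pattern DATE = Pattern.compile("\\d{1,2}\\.\\d{1,2}\\.\\d{4}");

    // Hypothetical helper: first date-like substring, or null if there is none.
    static String firstDate(String input) {
        Matcher m = DATE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDate("608094.21.1.2014.TELE.&BIG00Z.1.1.GBP"));
    }
}
```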
If the date is always surrounded by dot:
\.(\d\d\d\d\.\d?\d\.\d?\d|\d?\d\.\d?\d\.\d?\d?\d\d)\.
I hope this will help you.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String x = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
// Escape the dots so they match a literal '.' rather than any character
String pattern = "[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{4}";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(x);
if (m.find( )) {
System.out.println("Found value: " + m.group() );
}else {
System.out.println("NO MATCH");
}
}
}
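On the question of resuming the search one character after the start of the previous match rather than after its end: Matcher.find(int) resets the matcher and searches from a given index, so a loop can do this by hand. The following is only a sketch; the class and method names are hypothetical, and it uses the pattern from the question unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingFinder {
    // Collects matches, restarting each search one character after the
    // start of the previous match so that overlapping matches are seen.
    static List<String> findOverlapping(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int from = 0;
        while (from <= input.length() && m.find(from)) {
            matches.add(m.group());
            from = m.start() + 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        String regex = "(\\d\\d\\d\\d\\.\\d?\\d\\.\\d?\\d)|(\\d?\\d\\.\\d?\\d\\.\\d?\\d?\\d\\d)";
        System.out.println(findOverlapping(regex, "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP"));
    }
}
```

For this input the results include both 8094.21.1 and 21.1.2014, so the caller still has to decide which candidate is the real date. A tighter pattern, as in the answers above, avoids that extra step.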

java regex find match between commas

I am trying to find a match between commas if it contains a specific string.
So far I have ,(.*?myString.?*),
Obviously this finds all the input between the first comma in the entire input and the first comma after the string I want. How do I reference the comma immediately before the string that I want?
Edit: I also want to find the match that occurs after a specific set of characters,
i.e. one that occurs after (fooo)
dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa
should return gfhhhgdtheMatchfhhhfd, not gfdsgdtheMatchfdsgfd
The following regex should do it:
[^,]+theMatch.*?(?=,)
In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class RegEx {
public static void main(String[] args) {
String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
String r = "[^,]+theMatch.*?(?=,)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group()); // gfdsgdtheMatchfdsgfd
}
}
}
Edit
To match only after fooo, use this regex and read the result from capture group 1:
fooo.*?([^,]+theMatch.*?)(?=,)
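A sketch of that edited regex applied to the sample input from the question. The class name AfterMarkerDemo and the helper findAfterMarker are hypothetical; the wanted text is in capture group 1, since group 0 also contains fooo and everything skipped after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterMarkerDemo {
    private static final Pattern P = Pattern.compile("fooo.*?([^,]+theMatch.*?)(?=,)");

    // Hypothetical helper: the first comma-delimited field containing
    // "theMatch" that occurs after "fooo", or null if there is none.
    static String findAfterMarker(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
        System.out.println(findAfterMarker(s)); // gfhhhgdtheMatchfhhhfd
    }
}
```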
You are finding too much because .* will include the comma.
You need the following regular expression: ,([^,]*myinput[^,]*),
[^,]* basically says find all non-comma characters.
I would suggest the following code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(0));
// prints out ",myinput,"
System.out.println(m.group(1));
// prints out "myinput"
}
}
}
Here is a StackOverflow question that is basically the same with some very good answers associated:
Regex to find internal match between two characters
For more on regular expressions in Java look here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
If you want the position of the comma preceding your input string, use the following code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(str.indexOf(m.group(0)));
// prints out "16"
}
}
}
Passing the matched text to String.indexOf() gives the position where the match starts. Matcher.start() returns that position directly, without a second search of the string.
Edit:
To find the occurrence of a string following another string, simply modify the regex to: fooo.*,([^,]*theMatch[^,]*),
fooo.* will greedily consume all characters between fooo and the start of your match.
Example code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
Pattern p = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(1));
// prints out: gfhhhgdtheMatchfhhhfd
}
}
}
The usual approach is to use a pattern that cannot match your delimiter in place of the dot (.). In this case, you need that only at the front of the pattern; you can use a reluctant quantifier at the back as you already do (though you've misspelled it). For example:
,([^,]*myString.*?),
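A sketch of this pattern in use. The sample input and the names CommaFieldDemo and findField are invented for illustration. Note that the pattern needs a comma on both sides, so the first or last field of the input would not be found this way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaFieldDemo {
    // [^,]* cannot cross a comma, so the match starts at the comma
    // immediately before the field that contains "myString".
    private static final Pattern FIELD = Pattern.compile(",([^,]*myString.*?),");

    // Hypothetical helper: the comma-delimited field containing "myString", or null.
    static String findField(String input) {
        Matcher m = FIELD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findField("dsfdsdafd,safdsa,xxmyStringyy,dsafdsa")); // xxmyStringyy
    }
}
```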

Replace a regular expression with another regex

I want to replace one regex with another regex in Java. For example:
Requirement:
Input: xyxyxyP
Required Output : xyzxyzxyzP
That is, I want to replace "(for)+\{" with "(for\{)+\{". Is there any way to do this?
I have tried the following code
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class ReplaceDemo2 {
private static String REGEX = "(xy)+P";
private static String INPUT = "xyxyxyP";
private static String REGEXREPLACE = "(xyz)+P";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object
Matcher m = p.matcher(INPUT);
INPUT = m.replaceAll(REGEXREPLACE);
System.out.println(INPUT);
}
}
but the output is (xyz)+P.
You can achieve it with a \G based regex:
String s = "xyxyxyP";
String pattern = "(?:(?=xy)|(?!^)\\G)xy(?=(?:xy)*P)";
System.out.println(s.replaceAll(pattern, "$0z"));
In short, the regex matches:
(?:(?=xy)|(?!^)\\G) - either a location followed with xy ((?=xy)) or the location after the previous successful match ((?!^)\\G)
xy - a sequence of literal characters xy but only if followed with...
(?=(?:xy)*P) - zero or more sequences of xy (due to (?:xy)*) followed with a P.
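To make the effect of the trailing lookahead concrete, here is a sketch that runs the same replacement on the sample input and on an input where xy is not followed by P. The class name InsertZDemo and the helper addZ are hypothetical.

```java
class InsertZDemo {
    // Appends "z" to each "xy" that belongs to a run of "xy" ending in "P".
    static String addZ(String s) {
        String pattern = "(?:(?=xy)|(?!^)\\G)xy(?=(?:xy)*P)";
        // $0 is the whole match, i.e. the "xy" that was just found.
        return s.replaceAll(pattern, "$0z");
    }

    public static void main(String[] args) {
        System.out.println(addZ("xyxyxyP")); // xyzxyzxyzP
        System.out.println(addZ("xyQ"));     // xyQ (no P after the run, so unchanged)
    }
}
```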

How would I do this in Java Regex?

I am trying to write a regex that matches every occurrence of a word (say, chicken) that is not in brackets. So
chicken
Would be selected but
[chicken]
Would not. Does anyone know how to do this?
String template = "[chicken]";
String pattern = "\\G(?<!\\[)(\\w+)(?!\\])";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);
while (m.find())
{
System.out.println(m.group());
}
It uses a combination of negative look-behind and negative look-aheads and boundary matchers.
(?<!\\[) //negative look behind
(?!\\]) //negative look ahead
(\\w+) //capture group for the word
\\G //is a boundary matcher for marking the end of the previous match
(please read the following edits for clarification)
EDIT 1:
If one needs to account for situations like:
"chicken [chicken] chicken [chicken]"
We can replace the regex with:
String regex = "(?<!\\[)\\b(\\w+)\\b(?!\\])";
EDIT 2:
If one also needs to account for situations like:
"[chicken"
"chicken]"
As in one still wants the "chicken", then you could use:
String pattern = "(?<!\\[)?\\b(\\w+)\\b(?!\\])|(?<!\\[)\\b(\\w+)\\b(?!\\])?";
Which essentially accounts for the two cases of having only one bracket on either side. It accomplishes this through the | which acts as an or, and by using ? after the look-ahead/behinds, where ? means 0 or 1 of the previous expression.
I guess you want something like:
final Pattern UNBRACKETED_WORD_PAT = Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");
private List<String> findAllUnbracketedWords(final String s) {
final List<String> ret = new ArrayList<String>();
final Matcher m = UNBRACKETED_WORD_PAT.matcher(s);
while (m.find()) {
ret.add(m.group());
}
return Collections.unmodifiableList(ret);
}
Use this:
/(?<![\[\w])\w+(?![\w\]])/
i.e., consecutive word characters with no square bracket or word character before or after.
This needs to check both left and right for both a square bracket and a word character, else for your input of [chicken] it would simply return
hicke
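The pattern above is written in slash-delimited form. A sketch of the same expression as a Java string literal follows; the class name, the helper name, and the sample input are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnbracketedWords {
    // A run of word characters with neither a square bracket nor another
    // word character directly before or after it.
    private static final Pattern WORD = Pattern.compile("(?<![\\[\\w])\\w+(?![\\w\\]])");

    // Hypothetical helper: all words that do not touch a square bracket.
    static List<String> find(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("pig [cow] chicken bull] [grain")); // [pig, chicken]
    }
}
```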
Without look around:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchingTest
{
private static String x = "pig [cow] chicken bull] [grain";
public static void main(String[] args)
{
Pattern p = Pattern.compile("(\\[?)(\\w+)(\\]?)");
Matcher m = p.matcher(x);
while(m.find())
{
String firstBracket = m.group(1);
String word = m.group(2);
String lastBracket = m.group(3);
if ("".equals(firstBracket) && "".equals(lastBracket))
{
System.out.println(word);
}
}
}
}
Output:
pig
chicken
A bit more verbose, sure, but I find it more readable and easier to understand. Certainly simpler than a huge regular expression trying to handle all possible combinations of brackets.
Note that this won't filter out input like [fence tree grass]; it will indicate that tree is a match. You cannot skip tree in that without a parser. Hopefully, this is not a case you need to handle.

Java repetitive pattern matching

I am trying to get each of the repetitive matches of a simple regular expression in Java:
(\\[[^\\[]*\\])*
which matches any string enclosed in [], as long as it does not contain the [ character. For example, it would match
[a][nice][repetitive][pattern]
There is no prior knowledge of how many such groups exist and I cannot find a way of accessing the individual matching groups via a pattern matcher, i.e. can't get
[a], [nice], [repetitive], [pattern]
(or, even better, the text without the brackets), in 4 different strings.
Using pattern.matcher() I always get the last group.
Surely there must be a simple way of doing this in Java, which I am missing?
Thanks for any help.
while (matcher.find()) {
System.out.println(matcher.group(1));
}
http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#find%28%29
String string = "[a][nice][repetitive][pattern]";
String regexp = "\\[([^\\[]*)\\]";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
I would use split
String string = "[a][nice][repetitive][pattern]";
String[] words = string.substring(1, string.length()-1).split("\\]\\[");
System.out.println(Arrays.toString(words));
prints
[a, nice, repetitive, pattern]
Here's my attempt :)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Foo {
public static void main(String[] args) {
final String text = "[a][nice][repetitive][pattern]";
System.out.println(getStrings(text)); // Prints [a, nice, repetitive, pattern]
}
private static final Pattern pattern = Pattern.compile("\\[([^\\]]+)]");
public static List<String> getStrings(final String text) {
final List<String> strings = new ArrayList<String>();
final Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
strings.add(matcher.group(1));
}
return strings;
}
}
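On Java 9 or later, the same loop can also be written with Matcher.results(), which returns a stream of MatchResult objects. This is a sketch of an alternative, not part of the answers above; the class name BracketStream is made up, and the pattern is the one from the previous answer.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BracketStream {
    private static final Pattern PATTERN = Pattern.compile("\\[([^\\]]+)]");

    // Requires Java 9+: results() yields one MatchResult per successful find().
    static List<String> getStrings(String text) {
        return PATTERN.matcher(text)
                .results()
                .map(r -> r.group(1)) // text between the brackets
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(getStrings("[a][nice][repetitive][pattern]")); // [a, nice, repetitive, pattern]
    }
}
```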
