Replace a regular expression with another regex - java

I want to replace some regex with regex in java for e.g.
Requirement:
Input: xyxyxyP
Required Output : xyzxyzxyzP
means I want to replace "(for)+\{" to "(for\{)+\{" . Is there any way to do this?
I have tried the following code
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class ReplaceDemo2 {
private static String REGEX = "(xy)+P";
private static String INPUT = "xyxyxyP";
private static String REGEXREPLACE = "(xyz)+P";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object
Matcher m = p.matcher(INPUT);
INPUT = m.replaceAll(REGEXREPLACE);
System.out.println(INPUT);
}
}
but the output is (xyz)+P .

You can achieve it with a \G based regex:
String s = "xyxyxyP";
String pattern = "(?:(?=xy)|(?!^)\\G)xy(?=(?:xy)*P)";
System.out.println(s.replaceAll(pattern, "$0z"));
See a regex demo and an IDEONE demo.
In short, the regex matches:
(?:(?=xy)|(?!^)\\G) - either a location followed with xy ((?=xy)) or the location after the previous successful match ((?!^)\\G)
xy - a sequence of literal characters xy but only if followed with...
(?=(?:xy)*P) - zero or more sequences of xy (due to (?:xy)*) followed with a P.

Related

Java Regex capture nested matches

I am having trouble with regex here.
Say i have this input:
608094.21.1.2014.TELE.&BIG00Z.1.1.GBP
My regex looks like this
(\d\d\d\d\.\d?\d\.\d?\d)|(\d?\d\.\d?\d\.\d?\d?\d\d)
I want to extract the date 21.1.2014 out of the string, but all i get is
8094.21.1
I think my problem here is, that 21.1.2014 starts within the (wrong) match before. Is there a simple way to make the matcher look for the next match not after the end of the match before but one character after the beginning of the match before?
You could use a regex like this:
\d{1,2}\.\d{1,2}\.\d{4}
Working demo
Or shorten it and use:
(\d{1,2}\.){2}\d{4}
If the date is always surrounded by dot:
\.(\d\d\d\d\.\d?\d\.\d?\d|\d?\d\.\d?\d\.\d?\d?\d\d)\.
I hope this will help you.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String x = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
String pattern = "[0-9]{2}.[0-9]{1}.[0-9]{4}";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(x);
if (m.find( )) {
System.out.println("Found value: " + m.group() );
}else {
System.out.println("NO MATCH");
}
}

java regex find match between commas

I am trying to find a match between commas if it contains a specific string.
so far i have ,(.*?myString.?*),
Obviously this finds all the input between the first comma in the entire input and the first comma after the string i want. How do i reference the comma immediately before the string that i want?
Edit: i also want to find the match that occurs after a specific set of characters
ie. occurs after (fooo)
dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa
returns gfhhhgdtheMatchfhhhfd, not gfdsgdtheMatchfdsgfd
The following regex should do it :
[^,]+theMatch.*?(?=,)
see regex demo / explanation
Java ( demo )
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class RegEx {
public static void main(String[] args) {
String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
String r = "[^,]+theMatch.*?(?=,)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group()); // gfdsgdtheMatchfdsgfd
}
}
}
Edit
use this regex fooo.*?([^,]+theMatch.*?)(?=,) demo
You are finding too much because .* will include the comma.
You need the following regular expression: ,([^,]*myinput[^,]*),
[^,]* basically says find all non-comma characters.
I would suggest the following code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(0));
// prints out ",myinput,"
System.out.println(m.group(1));
// prints out "myinput"
}
}
}
Here is a StackOverflow question that is basically the same with some very good answers associated:
Regex to find internal match between two characters
For more on regular expressions in Java look here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
If you want the position of the comma proceeding your input string use the following code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(str.indexOf(m.group(0)));
// prints out "16"
}
}
}
By feeding the match of the regular expression into the String Method indexOf( you are able to locate the position of the start of your string.
Edit:
To find the occurrence of a string following another string, simply modify the regex to: fooo.*,([^,]*theMatch[^,]*),
fooo.* will greedily consume all characters between fooo and the start of your match.
Example code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
Pattern p = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(1));
// prints out: gfhhhgdtheMatchfhhhfd
}
}
}
The usual approach is to use a pattern that cannot match your delimiter in place of .. In this case, you need that only at the front of the pattern; you can use a reluctant quantifier at the back as you already do (though you've misspelled it). For example:
,([^,]*myString.*?),

Extract substring that appears after certain pattern

I need to extract a substring that appears after a certain pattern in the input string. I have been trying various combinations but not getting expected output.
The input string can be in following 2 forms
1. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE
2. 88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507
I need to write a regex that will be applicable to above 2 variations and extract '149IF1007JMO2507' part that follows 'SNDR REF:'.
Please find below sample program that i have written.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final String input = "88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE";
private static Pattern pattern = Pattern.compile(".*SNDR REF:(.*?)(\\s.)*");
private static Matcher matcher = pattern.matcher(input);
public static void main (String[] args) {
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
}
}
Output:149IF1007JMO2507 BISCAYNE BLVD STE
I want output to be '149IF1007JMO2507'
Thank you.
You can use the following idiom to find your sub-string:
String[] examples = {
"88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507 BISCAYNE BLVD STE",
"88,TRN:2014091900217161 SNDR REF:149IF1007JMO2507"
};
// ┌ look-behind for "SNDR REF:"
// | ┌ anything, reluctantly quantified
// | | ┌ lookahead for
// | | | whitespace or end of input
Pattern p = Pattern.compile("(?<=SNDR\\sREF:).+?(?=\\s|$)");
// iterating examples
for (String s: examples) {
Matcher m = p.matcher(s);
// iterating single matches (one per example here)
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}
}
Output
Found: 149IF1007JMO2507
Found: 149IF1007JMO2507
Note
I expect you don't know in advance it's going to be "149IF1007JMO2507", hence the contextual matching.
You can use this regexp:
private static Pattern pattern = Pattern.compile(".*SNDR REF:([^\\s]+).*");
This will take everything after "SNDR REF
You can do it with replaceAll
str = str.replaceAll(".*(REF:(\\S+)).*", "$2");

regex for letters or numbers in brackets

I am using Java to process text using regular expressions. I am using the following regular expression
^[\([0-9a-zA-Z]+\)\s]+
to match one or more letters or numbers in parentheses one or more times. For instance, I like to match
(aaa) (bb) (11) (AA) (iv)
or
(111) (aaaa) (i) (V)
I tested this regular expression on http://java-regex-tester.appspot.com/ and it is working. But when I use it in my code, the code does not compile. Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("^[\([0-9a-zA-Z]+\)\s]+");
String[] words = pattern.split("(a) (1) (c) (xii) (A) (12) (ii)");
String w = pattern.
for(String s:words){
System.out.println(s);
}
}
}
I tried to use \ instead of \ but the regex gave different results than what I expected (it matches only one group like (aaa) not multiple groups like (aaa) (111) (ii).
Two questions:
How can I fix this regex and be able to match multiple groups?
How can I get the individual matches separately (like (aaa) alone and then (111) and so on). I tried pattern.split but did not work for me.
Firstly, you want to escape any backslashes in the quotation marks with another backslash. The Regex will treat it as a single backslash. (E.g. call a word character \w in quotation marks, etc.)
Secondly, you got to finish the line that reads:
String w = pattern.
That line explains why it doesn't compile.
Here is my final solution to match the individual groups of letters/numbers in brackets that appear at the beginning of a line and ignore the rest
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
static ArrayList<String> listOfEnums;
public static void main(String[] args) {
listOfEnums = new ArrayList<String>();
Pattern pattern = Pattern.compile("^\\([0-9a-zA-Z^]+\\)");
String p = "(a) (1) (c) (xii) (A) (12) (ii) and the good news (1)";
Matcher matcher = pattern.matcher(p);
boolean isMatch = matcher.find();
int index = 0;
//once you find a match, remove it and store it in the arrayList.
while (isMatch) {
String s = matcher.group();
System.out.println(s);
//Store it in an array
listOfEnums.add(s);
//Remove it from the beginning of the string.
p = p.substring(listOfEnums.get(index).length(), p.length()).trim();
matcher = pattern.matcher(p);
isMatch = matcher.find();
index++;
}
}
}
1) Your regex is incorrect. You want to match individual groups of letters / numbers in brackets, and the current regex will match only a single string of one or more such groups. I.e. it will match
(abc) (def) (123)
as a single group rather than three separate groups.
A better regex that would match only up to the closing bracket would be
\([0-9a-zA-Z^\)]+\)
2) Java requires you to escape all backslashes with another backslash
3) The split() method will not do what you want. It will find all matches in your string then throw them away and return an array of what is left over. You want to use matcher() instead
Pattern pattern = Pattern.compile("\\([0-9a-zA-Z^\\)]+\\)");
Matcher matcher = pattern.matcher("(a) (1) (c) (xii) (A) (12) (ii)");
while (matcher.find()) {
System.out.println(matcher.group());
}

Java repetitive pattern matching

I am trying to get each of the repetitive matches of a simple regular expression in Java:
(\\[[^\\[]*\\])*
which matches any string enclosed in [], as long as it does not contain the [ character. For example, it would match
[a][nice][repetitive][pattern]
There is no prior knowledge of how many such groups exist and I cannot find a way of accessing the individual matching groups via a pattern matcher, i.e. can't get
[a], [nice], [repetitive], [pattern]
(or, even better, the text without the brackets), in 4 different strings.
Using pattern.matcher() I always get the last group.
Surely there must be a simple way of doing this in Java, which I am missing?
Thanks for any help.
while (matcher.find()) {
System.out.println(matcher.group(1));
}
http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#find%28%29
String string = "[a][nice][repetitive][pattern]";
String regexp = "\\[([^\\[]*)\\]";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
I would use split
String string = "[a][nice][repetitive][pattern]";
String[] words = string.substring(1, string.length()-1).split("\\]\\[");
System.out.println(Arrays.toString(words));
prints
[a, nice, repetitive, pattern]
Here's my attempt :)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Foo {
public static void main(String[] args) {
final String text = "[a][nice][repetitive][pattern]";
System.out.println(getStrings(text)); // Prints [a, nice, repetitive, pattern]
}
private static final Pattern pattern = Pattern.compile("\\[([^\\]]+)]");
public static List<String> getStrings(final String text) {
final List<String> strings = new ArrayList<String>();
final Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
strings.add(matcher.group(1));
}
return strings;
}
}

Categories

Resources