I'm a beginner with regex. I have strings like string1 = "DELIVERY 'text1' 'text2'" and string2 = "DELIVERY 'text1'", and I want to extract "text1". I tried this pattern:
Pattern p = Pattern.compile("^DELIVERY\\s'(.*)'");
Matcher m2 = p.matcher(string);
if (m2.find()) {
System.out.println(m2.group(1));
}
The result was text1' 'text2 for the first string and text1 for the second.
I tried this too:
Pattern p = Pattern.compile("^DELIVERY\\s'(.*)'\\s'(.*)'");
Matcher m2 = p.matcher(string);
if (m2.find()) {
System.out.println(m2.group(1));
}
It returns a result only for string1.
Your first attempt was almost right. Just replace:
.*
With:
.*?
This makes the operator "non-greedy", so it will "swallow up" as little matched text as possible.
Your regex .* is "greedy": it consumes as much input as possible while still allowing a match, so it consumes everything from the first quote to the last.
Instead, use the reluctant version by adding ?, i.e. .*?, which consumes as little as possible while still matching and so won't skip over a quote.
Combine this change with some Java kung fu and you can do it all in one line:
String quoted = str.replaceAll(".*DELIVERY\\s'(.*?)'.*", "$1");
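To see the difference between the greedy and reluctant quantifiers on the two sample strings, here is a minimal sketch. The class name QuoteExtractor and its method names are made up for illustration; only Pattern and Matcher come from java.util.regex. A negated character class is included as a third variant that cannot cross a quote at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteExtractor {
    // Greedy: .* runs to the last quote in the input.
    private static final Pattern GREEDY = Pattern.compile("^DELIVERY\\s'(.*)'");
    // Reluctant: .*? stops at the first closing quote.
    private static final Pattern RELUCTANT = Pattern.compile("^DELIVERY\\s'(.*?)'");
    // Negated class: [^']* can never consume a quote.
    private static final Pattern NEGATED = Pattern.compile("^DELIVERY\\s'([^']*)'");

    // Returns group 1 of the first match, or null when nothing matches.
    private static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static String greedy(String input) {
        return firstGroup(GREEDY, input);
    }

    public static String reluctant(String input) {
        return firstGroup(RELUCTANT, input);
    }

    public static String negated(String input) {
        return firstGroup(NEGATED, input);
    }

    public static void main(String[] args) {
        String string1 = "DELIVERY 'text1' 'text2'";
        String string2 = "DELIVERY 'text1'";
        System.out.println(greedy(string1));    // prints text1' 'text2
        System.out.println(reluctant(string1)); // prints text1
        System.out.println(negated(string1));   // prints text1
        System.out.println(reluctant(string2)); // prints text1
    }
}
```

Returning null for "no match" is just one choice for the sketch; the replaceAll one-liner above behaves differently in that case and returns the input unchanged.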
If you only want 'text1', try this regex:
"DELIVERY '([^']*)"
or without grouping:
"(?<=DELIVERY ')[^']*"
I was trying to replace the concatenation symbol '+' with '||' in a given multi-line script; however, it seems that Java regex replaces just one occurrence instead of all.
String ss="A+B+C+D";
Matcher mm=Pattern.compile("(?imc)(.+)\\s*\\+\\s*(.+)").matcher(ss);
while(mm.find())
{
System.out.println(mm.group(1));
System.out.println(mm.group(2));
ss=mm.replaceAll("$1 \\|\\| $2");
}
System.out.println(ss); // Output: A+B+C||D, Expected: A||B||C||D
The reason only one element is replaced is that you match the entire line. The regular expression you use, "(?imc)(.+)\\s*\\+\\s*(.+)", first matches everything with (.+) up to the end of the line, then backtracks so that the rest, \\s*\\+..., can match. So group 1 is almost everything: all of the input up to, but not including, the last +. Therefore replaceAll can only match once and terminates after that one replacement.
What you need is a replacement that finds + optionally wrapped in spaces:
Pattern.compile("(?imc)\\s*\\+\\s*");
This should match all you want to match, and does not match the entire line, but only your replacement character.
You could just use:
ss = ss.replaceAll("\\+", "||");
as @ernest_k has pointed out. If you really want to continue using a matcher with iteration, then use Matcher#appendReplacement with a StringBuffer:
String ss = "A+B+C+D";
Matcher mm = Pattern.compile("\\+").matcher(ss);
StringBuffer sb = new StringBuffer();
while (mm.find()) {
mm.appendReplacement(sb, "||");
}
mm.appendTail(sb);
System.out.println(sb);
I think a simple string replace may be all we need:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\\+";
final String string = "A+B+C+D";
final String subst = "||";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);
Your original expression does not work here because the first capturing group, (.+), matches between one and unlimited times, as many times as possible. Changing the groups to (.+?) would have partially worked, but would still be unnecessary.
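Since + is a fixed character here, a regex may not be needed at all. The sketch below, with the made-up class name ConcatRewriter, contrasts String.replace, which treats both arguments as literal text, with a replaceAll variant that also swallows optional whitespace around the +, as suggested in the first answer.

```java
public class ConcatRewriter {
    // Literal replacement: no regex involved, so + needs no escaping.
    public static String literal(String script) {
        return script.replace("+", "||");
    }

    // Regex replacement: also removes whitespace on either side of each +.
    public static String trimmed(String script) {
        return script.replaceAll("\\s*\\+\\s*", "||");
    }

    public static void main(String[] args) {
        System.out.println(literal("A+B+C+D"));    // prints A||B||C||D
        System.out.println(trimmed("A + B +C+D")); // prints A||B||C||D
    }
}
```

Note that \s also matches line breaks, so on a multi-line script the second variant would join lines that end or begin with a +. If that is not wanted, the literal version is the safer choice.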
I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
It is supposed to print
dkoe
but it prints nothing!
Welcome to Java's misnamed .matches() method... It tries to match ALL of the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's String.matches() tries to match the whole string.
That's different from Perl regexes, which try to find a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
I used [a-z]+ instead:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
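The above worked with the pattern within ( and ), although a capturing group by itself does not change what a pattern matches, so the difference most likely came from something else, such as the input or the method being called.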
Your regular expression [a-z] doesn't match dkoe since it only matches strings of length 1. Use something like [a-z]+.
The fix is the + quantifier. With matches(), the anchors and the capturing group are optional, but a pattern like this works too:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
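To make the matches() versus find() distinction from the answers above concrete, here is a small sketch. The class name LowercaseFilter and its methods are illustrative only; the patterns are the ones discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LowercaseFilter {
    private static final Pattern ALL_LOWER = Pattern.compile("[a-z]+");
    private static final Pattern ANY_LOWER = Pattern.compile("[a-z]");

    // matches() succeeds only if the pattern covers the whole string.
    public static boolean isAllLowercase(String s) {
        return ALL_LOWER.matcher(s).matches();
    }

    // find() succeeds if the pattern matches anywhere in the string.
    public static boolean containsLowercase(String s) {
        return ANY_LOWER.matcher(s).find();
    }

    // Keeps only the words made up entirely of lowercase ASCII letters.
    public static List<String> keepAllLowercase(String... words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (isAllLowercase(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keepAllLowercase("{apf", "hum_", "dkoe", "12f")); // prints [dkoe]
        System.out.println(containsLowercase("12f"));                        // prints true
        System.out.println("dkoe".matches("[a-z]"));                         // prints false
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the regex on every call, which String.matches does internally.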
I have this code and I want to find both 1234 and 4321 but currently I can only get 4321. How could I fix this problem?
String a = "frankabc123 1234 frankabc frankabc123 4321 frankabc";
String rgx = "frank.* ([0-9]*) frank.*";
Pattern patternObject = Pattern.compile(rgx);
Matcher matcherObject = patternObject.matcher(a);
while (matcherObject.find()) {
System.out.println(matcherObject.group(1));
}
Your regex is too greedy. Make it non-greedy.
String rgx = "frank.*? ([0-9]+) frank";
Your regular expression is incorrect. The first part, frank.*, matches everything and then backtracks until the rest of the match succeeds. Try this instead:
String rgx = "frank.*? ([0-9]*) frank";
The ? after the quantifier will make it reluctant, matching as few characters as necessary for the rest of the pattern to match. The trailing .* is also causing problems (as nhahtdh pointed out in a comment).
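Here is a short sketch that runs both the original greedy pattern and the reluctant one against the sample string and collects every group 1 it finds. The class name NumberFinder and the helper method are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberFinder {
    // Original pattern from the question: greedy on both sides.
    public static final String GREEDY = "frank.* ([0-9]*) frank.*";
    // Reluctant prefix and no trailing .*, so each match stays short.
    public static final String RELUCTANT = "frank.*? ([0-9]+) frank";

    // Collects group 1 of every match of the given regex in the input.
    public static List<String> capture(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String a = "frankabc123 1234 frankabc frankabc123 4321 frankabc";
        System.out.println(capture(GREEDY, a));    // prints [4321]
        System.out.println(capture(RELUCTANT, a)); // prints [1234, 4321]
    }
}
```

The greedy version yields a single match because its first .* runs to the end of the input and only gives back enough for the last number to fit.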
I am using java to do a regular expression match. I am using rubular to verify the match and ideone to test my code.
I got a regex from this SO solution, and it matches the group as I want it to in rubular, but my implementation in Java is not matching. When it prints value, it prints the value of commaSeparatedString and not matcher.group(1). I want the captured group (the output of println) to be "v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso".
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
//match everything after first comma
String myRegex = ",(.*)";
Pattern pattern = Pattern.compile(myRegex);
Matcher matcher = pattern.matcher(commaSeparatedString);
String value = "";
if (matcher.matches())
value = matcher.group(1);
else
value = commaSeparatedString;
System.out.println(value);
(edit: I left out that commaSeparatedString will not always contain 2 commas. Rather, it will always contain 0 or more commas)
If you don't have to solve it with regex, you can try this:
int size = commaSeparatedString.length();
value = commaSeparatedString.substring(commaSeparatedString.indexOf(",")+1,size);
Namely, the code above returns the substring that starts just after the first comma. If there is no comma, indexOf returns -1, so the whole string is returned.
EDIT:
Sorry, I omitted the simpler version. Thanks to one of the commenters, you can use this single line as well:
value = commaSeparatedString.substring(commaSeparatedString.indexOf(",") + 1);
The regex does not cover the whole string, which matches() requires. It should be:
String myRegex = "[^,]*,(.*)";
You are yet another victim of Java's misguided regex method naming.
.matches() automatically anchors the regex at the beginning and end (which is in total contradiction with the very definition of "regex matching"). The method you are looking for is .find().
However, for such a simple problem, it is better to go with @DelShekasteh's solution.
I would do this like
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
System.out.println(commaSeparatedString.substring(commaSeparatedString.indexOf(",")+1));
Here is another approach with limited split
String[] spl = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso".split(",", 2);
if (spl.length == 2)
System.out.println(spl[1]);
But IMHO Del's answer is best for your case.
I would use replaceFirst
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
System.out.println(commaSeparatedString.replaceFirst(".*?,", ""));
prints
v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
or you could use the shorter but more obscure version, which throws an ArrayIndexOutOfBoundsException when the string contains no comma:
System.out.println(commaSeparatedString.split(",", 2)[1]);
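Since the question says the input may contain zero or more commas, here is a hedged sketch of two of the approaches above with that case handled. The class name AfterFirstComma and the method names are invented for the example; the regex is the one from the question, used with find() instead of matches().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AfterFirstComma {
    private static final Pattern AFTER_COMMA = Pattern.compile(",(.*)");

    // Regex version: find() locates the first comma anywhere in the input.
    // Falls back to the whole string when there is no comma.
    public static String withFind(String s) {
        Matcher m = AFTER_COMMA.matcher(s);
        return m.find() ? m.group(1) : s;
    }

    // Plain string version: indexOf returns -1 without a comma,
    // so substring(0) returns the whole string in that case.
    public static String withIndexOf(String s) {
        return s.substring(s.indexOf(',') + 1);
    }

    public static void main(String[] args) {
        String input = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
        System.out.println(withFind(input));     // prints v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
        System.out.println(withIndexOf(input));  // prints v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
        System.out.println(withFind("Vtest7"));  // prints Vtest7
    }
}
```

Both methods assume single-line input; by default . does not match line terminators, so the regex version would stop at the first line break.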
I have strings with parentheses and also escaped characters. I need to match against these characters and also delete them. In the following code, I use matches() and replaceAll() with the same regex, but the matches() returns false, while the replaceAll() seems to match just fine, because the replaceAll() executes and removes the characters. Can someone explain?
String input = "(aaaa)\\b";
boolean matchResult = input.matches("\\(|\\)|\\\\[a-z]+");
System.out.printf("matchResult=%s\n", matchResult);
String output = input.replaceAll("\\(|\\)|\\\\[a-z]+", "");
System.out.printf("INPUT: %s --> OUTPUT: %s\n", input, output);
Prints out:
matchResult=false
INPUT: (aaaa)\b --> OUTPUT: aaaa
matches matches the whole input, not part of it.
The regular expression \(|\)|\\[a-z]+ doesn't describe the whole string, only parts of it, so in your case matches() returns false.
What matches is doing has already been explained by Binyamin Sharet. I want to extend this a bit.
Java does not have a "findall" method or a "g" modifier, as some other languages do, to get all matches at once.
The Java Matcher class offers two main methods for applying a pattern to a string without replacing anything (there is also lookingAt(), which anchors the match only at the start):
matches(): matches the whole string against the pattern
find(): finds the next match
If you want to get everything that fits your pattern, you need to use find() in a loop, something like this:
Pattern p = Pattern.compile("\\(|\\)|\\\\[a-z]+");
Matcher m = p.matcher(text);
while(m.find()){
System.out.println(m.group(0));
}
Or, if you are only interested in whether your pattern occurs in the string:
if (m.find()) {
System.out.println(m.group());
} else {
System.out.println("not found");
}
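Putting the two behaviours next to each other, here is a minimal self-contained sketch using the regex and input from the question. The class name TokenScanner and its methods are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenScanner {
    // An opening parenthesis, a closing parenthesis,
    // or a backslash followed by lowercase letters.
    private static final Pattern TOKEN = Pattern.compile("\\(|\\)|\\\\[a-z]+");

    // True only if the pattern covers the entire text.
    public static boolean wholeStringMatches(String text) {
        return TOKEN.matcher(text).matches();
    }

    // Collects every substring the pattern matches, in order.
    public static List<String> allMatches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "(aaaa)\\b"; // the eight characters (aaaa)\b
        System.out.println(wholeStringMatches(input)); // prints false
        System.out.println(allMatches(input));         // prints [(, ), \b]
    }
}
```

This is why replaceAll removes three pieces while matches() reports false: replaceAll works like the find() loop, not like matches().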