Why does this pattern matching code not work? - java

I'm trying to do some pattern matching in Java:
Pattern p = Pattern.compile("(\\d+) (\\.+)");
Matcher m = p.matcher("5 soy milk");
String qty = m.group(1);
String name = m.group(2);
I want to end up with one string that contains "5" and one string that contains "soy milk". However, this pattern matching code gives me an IllegalStateException.

You have to call matches() before you attempt to get the groups.
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#matches()
public boolean matches()
Attempts to match the entire region against the pattern.
If the match succeeds then more information can be obtained via the start, end, and group methods.
Try this:
Pattern p = Pattern.compile("(\\d+) (.+)"); // .+ (not \\.+) so that any characters match
Matcher m = p.matcher("5 soy milk");
if (m.matches())
{
String qty = m.group(1);  // "5"
String name = m.group(2); // "soy milk"
}

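This is because you never actually run a match with your Matcher. You should call p.matcher(...).matches() (or .find(), or .lookingAt(), depending on the desired behaviour). Note that .find() is what most other regex engines mean by matching: it searches for the pattern anywhere in the input, whereas .matches() requires the entire input to match and .lookingAt() requires a match starting at the beginning of the input.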
And check the result of .matches(), since in your case it returns false: \.+ ("\\.+" in a Java string) tries to match a literal dot one or more times; you should use .+ (".+" in a Java string) to match "any character, one or more times".
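Putting both fixes together, here is a minimal, self-contained sketch (the class and method names are illustrative, not from the question). It also shows how matches(), lookingAt() and find() differ on the same kind of input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModes {
    // Parses "<digits> <rest>" with the corrected pattern; returns null if there is no match.
    static String[] parse(String input) {
        Matcher m = Pattern.compile("(\\d+) (.+)").matcher(input);
        if (!m.matches()) {
            return null; // calling group() here would throw IllegalStateException
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("5 soy milk");
        System.out.println(parts[0] + " | " + parts[1]); // prints "5 | soy milk"

        // The three methods differ in how much of the input must match.
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(digits.matcher("5 soy milk").matches());   // false: whole input must match
        System.out.println(digits.matcher("5 soy milk").lookingAt()); // true: match must start at index 0
        System.out.println(digits.matcher("soy milk 5").find());      // true: match may be anywhere
    }
}
```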

Related

Simple java pattern matching issue?

I would like to test whether a string contains insert and name, with any intervening characters. If it does, I would like to print the match.
For the below code, only the third Pattern matches, and the entire line is printed. How can I match only insert...name?
String x = "aaa insert into name sdfdf";
Matcher matcher = Pattern.compile("insert.*name").matcher(x);
if (matcher.matches())
System.out.print(matcher.group(0));
matcher = Pattern.compile(".*insert.*name").matcher(x);
if (matcher.matches())
System.out.print(matcher.group(0));
matcher = Pattern.compile(".*insert.*name.*").matcher(x);
if (matcher.matches())
System.out.print(matcher.group(0));
Try using a capturing group, like this: .*(insert.*name).*
Matcher matcher = Pattern.compile(".*(insert.*name).*").matcher(x);
if (matcher.matches()) {
System.out.print(matcher.group(1)); // group 1 is the (insert.*name) capture
}
Or, in your case, you can simply use:
x = x.replaceAll(".*(insert.*name).*", "$1");
Both approaches produce:
insert into name
You just need to use find() instead of matches() in your code:
String x = "aaa insert into name sdfdf";
Matcher matcher = Pattern.compile("insert.*?name").matcher(x);
if (matcher.find())
System.out.print(matcher.group(0));
matches() requires your regex to match the entire input string, whereas find() looks for a match anywhere in the input.
I also suggest using .*? instead of .* in case your input contains multiple insert ... name pairs: the reluctant .*? stops at the first name after each insert instead of running to the last one.
This code sample will output:
insert into name
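To see why the reluctant .*? matters, here is a small sketch with a hypothetical input containing two insert ... name pairs; findAll is an illustrative helper that collects every match reported by find():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects the text of every match that find() reports, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String x = "aaa insert into name sdfdf insert other name zzz";
        // Greedy: a single match that runs to the LAST "name".
        System.out.println(findAll("insert.*name", x));  // [insert into name sdfdf insert other name]
        // Reluctant: each match stops at the first "name" after its "insert".
        System.out.println(findAll("insert.*?name", x)); // [insert into name, insert other name]
    }
}
```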
Just use multiple positive lookaheads:
(?=.*insert)(?=.*name).+
See a demo on regex101.com.
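Note that the lookahead pattern only tests that both words occur somewhere, in either order; the text it matches is the whole line, not just insert ... name. A quick sketch of that behaviour (the class name and sample strings are illustrative):

```java
import java.util.regex.Pattern;

class LookaheadCheck {
    // Each lookahead peeks ahead from the start without consuming input;
    // .+ then consumes the whole line once both lookaheads have succeeded.
    static final Pattern BOTH = Pattern.compile("(?=.*insert)(?=.*name).+");

    static boolean containsBoth(String s) {
        return BOTH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsBoth("aaa insert into name sdfdf")); // true
        System.out.println(containsBoth("name comes before insert"));   // true: order is not enforced
        System.out.println(containsBoth("insert without the other"));   // false
    }
}
```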

Get a substring from string multiple times

I have a String whose length and characters I don't know in advance.
I want to search the string and get every substring enclosed in double quotes ("").
I tried using Pattern.compile, but it always returns an empty string:
Pattern p = Pattern.compile("\".\"");
Matcher m = p.matcher(mystring);
while(m.find()){
System.out.println(m.group().toString());
}
How can I do it?
Use .+? to match all the characters between the quotes:
Pattern p = Pattern.compile("\".+?\"");
The .+ requires at least one character between the quotation marks. The ? makes the quantifier reluctant, so each match stops at the next closing quote and separate quoted strings become separate matches, instead of one greedy match running from the first quote to the last.
Unit test example:
@Test
public void test() {
String test = "speak \"friend\" and \"enter\"";
Pattern p = Pattern.compile("\".+?\"");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group().toString().replace("\"", ""));
}
}
Output:
friend
enter
That is because your regex searches for exactly one character between " and ". If you want to match more characters, rewrite your regex as "\".*?\"" (or "\".+?\"" to require at least one character).
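A variant that avoids the replace() call: put the quoted content in a capturing group and read group(1). This is a sketch; the pattern "\"(.*?)\"" and the helper name are my own choices. Using .*? rather than .+? also handles empty quotes, which .+? would instead glue to the next quoted string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    // Group 1 captures everything between a pair of double quotes;
    // the reluctant .*? stops at the first closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // content without the surrounding quotes
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("speak \"friend\" and \"enter\"")); // [friend, enter]
    }
}
```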

Java Matcher Pattern issue

I am trying to extract everything that comes after the path /share/attachments/docs/. All my strings start with /share/attachments/docs/.
For example: /share/attachments/docs/image2.png
The number of characters after ../docs/ is not fixed!
I tried:
Pattern p = Pattern.compile("^(.*)/share/attachments/docs/(\\d+)$");
Matcher m = p.matcher("/share/attachments/docs/image2.png");
m.find();
String link = m.group(2);
System.out.println("Link #: "+link);
But I get an exception: No match found.
Strangely, if I use this:
Pattern p = Pattern.compile("^(.*)ABC Results for draw no (\\d+)$");
Matcher m = p.matcher("ABC Results for draw no 2888");
then it works!!!
Also, in some very rare cases my string does not start with /share/attachments/docs/, and then I should not parse anything. That is not directly related to the issue, but it would be good to handle.
I am getting Exception that: No match found.
This is because image2.png doesn't match \d+. Use a more appropriate pattern, like .+, assuming that you want to extract image2.png.
Your regular expression will then be ^(.*)/share/attachments/docs/(.+)$
In the case of ABC Results for draw no 2888, the regex ^(.*)ABC Results for draw no (\\d+)$ works because the String ends with a run of consecutive digits, whereas in the first case it ends with image2.png, a mix of letters, digits and a dot, which is why no match was found.
Generally speaking, to avoid an IllegalStateException: No match found, first check the result of find(); if it returns true, the input String matches:
if (m.find()) {
// The String matches with the pattern
String link = m.group(2);
System.out.println("Link #: " + link);
} else {
System.out.println("Input value doesn't match with the pattern");
}
The regular expression \d+ (written \\d+ inside a string literal) matches a run of one or more digits. Your example input has no digit run in the required position (directly after docs/ and running to the end), so it is not matched. The regex metacharacter . matches any character (except line terminators, unless the DOTALL flag is set); that may be what you're really after.
Additionally, when you use Matcher.find() the pattern does not have to match the whole string, so there is no need to include .* to match leading context. Furthermore, find() returns a value that tells you whether a match to the pattern was found. You generally want to use this return value, and in your particular case you can use it to reject those rare non-matching strings.
Maybe this is more what you want:
Pattern p = Pattern.compile("/share/attachments/docs/(.+)$");
Matcher m = p.matcher("/share/attachments/docs/image2.png");
String link;
if (m.find()) {
link = m.group(1);
System.out.println("Link #: " + link);
} else {
link = null;
System.out.println("Draw #: (not found)");
}
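Since the prefix is a fixed string, a regex is not strictly needed. Here is a sketch using startsWith() and substring() (the helper name is illustrative). Unlike find() without a ^ anchor, this only accepts strings that begin with the prefix, and it returns null for the rare non-matching strings as well as when nothing follows the prefix (mirroring the .+ above):

```java
class AttachmentPath {
    private static final String PREFIX = "/share/attachments/docs/";

    // Returns the part after PREFIX, or null if the path does not start with it
    // or nothing follows it.
    static String extractLink(String path) {
        if (path == null || !path.startsWith(PREFIX)) {
            return null;
        }
        String rest = path.substring(PREFIX.length());
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        System.out.println(extractLink("/share/attachments/docs/image2.png")); // image2.png
        System.out.println(extractLink("/other/place/image2.png"));            // null
    }
}
```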

multiple regex matches in a string

I have the following text:
bla [string1] bli [string2]
I would like to match string1 and string2 with a regex in a loop in Java.
How do I do this?
My code so far, which only matches string1 but not string2:
String sRegex="(?<=\\[).*?(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
if (m.find())
{
String sString1 = m.group(0);
String sString2 = m.group(1); // << no match
}
Your regex does not use any capturing groups, so this call will throw an IndexOutOfBoundsException:
m.group(1);
You can just use:
String sRegex="(?<=\\[)[^\\]]*(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
while (m.find()) {
System.out.println( m.group() );
}
Also, the if should be replaced by a while, so that find() is called repeatedly and returns all matches.
Your approach is confused. Either write your regex so that it matches two [....] sequences in the one pattern, or call find multiple times. Your current attempt has a regex that "finds" just one [...] sequence.
Try something like this:
Pattern p = Pattern.compile("\\[([^\\]]+)]");
Matcher m = p.matcher(formula);
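if (m.find()) {
String string1 = m.group(1); // text inside the first [...]
if (m.find()) { // find() resumes after the previous match
String string2 = m.group(1); // text inside the second [...]
}
}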
Or generalize using a loop and an array of String for the extracted strings.
(You don't need any fancy look-behind patterns in this case. And ugly "Hungarian notation" is frowned upon in Java, so get out of the habit of using it.)
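The loop generalization could look roughly like this; a List<String> is used instead of an array because the number of bracketed strings is not known up front (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtractor {
    // \[ opens, group 1 takes one or more non-']' characters, \] closes.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]+)\\]");

    static List<String> extract(String formula) {
        List<String> found = new ArrayList<>();
        Matcher m = BRACKETED.matcher(formula);
        while (m.find()) {         // each call resumes after the previous match
            found.add(m.group(1)); // text inside the brackets
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("bla [string1] bli [string2]")); // [string1, string2]
    }
}
```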

Pattern Matcher Vs String Split, which should I use?

First time posting.
First, I know how to use both Pattern/Matcher and String.split.
My question is: which is better for my example, and why?
Suggestions for better alternatives are also welcome.
Task:
I need to extract an unknown NOUN between two known regexp in an unknown string.
My Solution:
Get the start and end of the noun (from regexps 1 and 2) and use substring() to extract the noun.
String line = "unknownXoooXNOUNXccccccXunknown";
int goal = 12 ;
String regexp1 = "Xo+X";
String regexp2 = "Xc+X";
I need to locate the index position AFTER the first regex.
I need to locate the index position BEFORE the second regex.
A) I can use pattern matcher
Pattern p = Pattern.compile(regexp1);
Matcher m = p.matcher(line);
if (m.find()) {
int afterRegex1 = m.end();
} else {
throw new IllegalArgumentException();
//TODO Exception Management;
}
B) I can use String Split
String[] split = line.split(regexp1, 2);
if (split.length != 2) {
throw new UnsupportedOperationException();
//TODO Exception Management;
}
int afterRegex1 = line.indexOf(split[1]);
Which approach should I use, and why?
I don't know which is more efficient in time and memory.
Both are about equally readable to me.
I'd do it like this:
String line = "unknownXoooXNOUNXccccccXunknown";
String regex = "Xo+X(.*?)Xc+X";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
if (m.find()) {
String noun = m.group(1);
}
The (.*?) is used to make the inner match on the NOUN reluctant. This protects us from a case where our ending pattern appears again in the unknown portion of the string.
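To illustrate, here is a sketch with a hypothetical input in which the ending pattern appears twice; firstGroup is an illustrative helper, and the greedy (.*) is shown only for contrast:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantNoun {
    // Returns group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the Xc+X end marker occurs twice.
        String line = "unknownXoooXNOUNXccXmoreXcccXunknown";
        System.out.println(firstGroup("Xo+X(.*?)Xc+X", line)); // NOUN
        System.out.println(firstGroup("Xo+X(.*)Xc+X", line));  // NOUNXccXmore
    }
}
```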
EDIT
This works because the (.*?) defines a capture group. There's only one such group defined in the pattern, so it gets index 1 (the parameter to m.group(1)). These groups are indexed from left to right starting at 1. If the pattern were defined like this
String regex = "(Xo+X)(.*?)(Xc+X)";
Then there would be three capture groups, such that
m.group(1); // yields "XoooX"
m.group(2); // yields "NOUN"
m.group(3); // yields "XccccccX"
There is a group 0, but that matches the whole pattern, and it's equivalent to this
m.group(); // yields "XoooXNOUNXccccccX"
For more information about what you can do with the Matcher, including ways to get the start and end positions of your pattern within the source string, see the Matcher JavaDocs
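For example, start(1) and end(1) give the boundaries of the captured NOUN directly, so no separate indexOf() or split() is needed. A sketch on the question's input (nounBounds is an illustrative name); the start index 12 agrees with the goal value in the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPositions {
    // Returns {start, end} of group 1 (end is exclusive), or null if there is no match.
    static int[] nounBounds(String line) {
        Matcher m = Pattern.compile("Xo+X(.*?)Xc+X").matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(1), m.end(1) };
    }

    public static void main(String[] args) {
        String line = "unknownXoooXNOUNXccccccXunknown";
        int[] b = nounBounds(line);
        System.out.println(b[0] + ".." + b[1] + " " + line.substring(b[0], b[1])); // 12..16 NOUN
    }
}
```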
You should use String.split() for readability unless you're in a tight loop.
Per split()'s javadoc, split() does the equivalent of Pattern.compile(), which you can optimize away if you're in a tight loop.
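To hoist the compilation out of a tight loop, keep a precompiled Pattern and call its split(); per the javadoc, str.split(regex, n) yields the same result as Pattern.compile(regex).split(str, n). A sketch (the constant and method names are illustrative):

```java
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once, instead of on every String.split() call.
    private static final Pattern START_MARKER = Pattern.compile("Xo+X");

    // Returns everything after the first "Xo+X", or null if the marker is absent.
    static String afterFirstMarker(String line) {
        String[] parts = START_MARKER.split(line, 2); // limit 2: split at the first match only
        return parts.length == 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) { // stands in for a tight loop
            System.out.println(afterFirstMarker("unknownXoooXNOUNXccccccXunknown")); // NOUNXccccccXunknown
        }
    }
}
```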
It looks like you want to extract a single occurrence. For that, simply do:
input.replaceAll(".*Xo+X(.*)Xc+X.*", "$1")
For efficiency, precompile the pattern once and call pattern.matcher(input).replaceAll("$1") instead.
If your input contains line breaks, use Pattern.DOTALL or the (?s) inline flag.
If you want to use split, consider Guava's Splitter. It behaves more sanely and also accepts a Pattern, which is good for speed.
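A sketch of the precompiled replaceAll variant, with and without Pattern.DOTALL, on a hypothetical multi-line input (names are illustrative). Keep in mind that replaceAll returns the input unchanged when nothing matches, so a failed extraction is not signalled:

```java
import java.util.regex.Pattern;

class DotAllExtract {
    // Compiled once; DOTALL lets . (and therefore .*) also match line terminators.
    static final Pattern NOUN_DOTALL = Pattern.compile(".*Xo+X(.*)Xc+X.*", Pattern.DOTALL);
    static final Pattern NOUN_DEFAULT = Pattern.compile(".*Xo+X(.*)Xc+X.*");

    static String extract(Pattern p, String input) {
        // Replaces each match with the captured NOUN ($1 refers to group 1).
        return p.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "first line\nunknownXoooXNOUNXccccccXunknown\nlast line";
        System.out.println(extract(NOUN_DOTALL, input));  // NOUN
        System.out.println(extract(NOUN_DEFAULT, input)); // the first and last lines survive
    }
}
```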
If you really need the locations you can do it like this:
String line = "unknownXoooXNOUNXccccccXunknown";
String regexp1 = "Xo+X";
String regexp2 = "Xc+X";
Matcher m=Pattern.compile(regexp1).matcher(line);
if(m.find())
{
int start=m.end();
if(m.usePattern(Pattern.compile(regexp2)).find())
{
final int end = m.start();
System.out.println("from "+start+" to "+end+" is "+line.substring(start, end));
}
}
But if you just need the word in between, I recommend the way Ian McLaird has shown.
