I have the following text:
bla [string1] bli [string2]
I would like to match string1 and string2 with a regex in a loop in Java.
How can I do this?
My code so far, which matches only string1 but not string2:
String sRegex="(?<=\\[).*?(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
if (m.find())
{
String sString1 = m.group(0);
String sString2 = m.group(1); // << no match
}
Your regex does not contain any capturing groups, so this call will throw an exception:
m.group(1);
You can just use:
String sRegex="(?<=\\[)[^]]*(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
while (m.find()) {
System.out.println( m.group() );
}
Also, if should be replaced by while, so that find() is called repeatedly and all matches are returned.
Your approach is confused. Either write your regex so that it matches two [....] sequences in the one pattern, or call find multiple times. Your current attempt has a regex that "finds" just one [...] sequence.
Try something like this:
Pattern p = Pattern.compile("\\[([^\\]]+)]");
Matcher m = p.matcher(formula);
if (m.find()) {
String string1 = m.group(1); // group 1 is the text between the brackets
if (m.find(m.end())) { // search again, starting after the first match
String string2 = m.group(1);
}
}
Or generalize using a loop and an array of String for the extracted strings.
(You don't need any fancy look-behind patterns in this case. And ugly "Hungarian notation" is frowned upon in Java, so get out of the habit of using it.)
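The loop-based generalization suggested above could look roughly like the following. This is only a sketch: the class name BracketExtractor and the method extractBracketed are made up for illustration, and a List is used instead of an array because the number of matches is not known in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answers.
class BracketExtractor {
    // Compiled once and reused; group 1 is the text between the brackets.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]+)\\]");

    static List<String> extractBracketed(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        // Each find() call resumes where the previous match ended.
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [string1, string2]
        System.out.println(extractBracketed("bla [string1] bli [string2]"));
    }
}
```

If there are no bracketed sections at all, the loop body never runs and the returned list is simply empty.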
Related
I have a String whose length and characters I don't know in advance.
I want to search the string and get every substring found inside "".
I tried to use Pattern.compile, but it always returns an empty string:
Pattern p = Pattern.compile("\".\"");
Matcher m = p.matcher(mystring);
while(m.find()){
System.out.println(m.group().toString());
}
How can I do it?
Use .+? to match the characters between the quotes:
Pattern p = Pattern.compile("\".+?\"");
The .+ specifies that you want one or more characters inside the quotation marks. The ? makes the quantifier reluctant, which means it matches as few characters as possible, so each quoted string is found as a separate match instead of one match running from the first quote to the last.
Unit test example:
@Test
public void test() {
String test = "speak \"friend\" and \"enter\"";
Pattern p = Pattern.compile("\".+?\"");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group().toString().replace("\"", ""));
}
}
Output:
friend
enter
That is because your regex searches for exactly one character between the two quotes. If you want to allow more characters, you should rewrite your regex to "\".*?\""
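As an alternative to stripping the quotes with replace(), a capturing group can return just the text between them. The sketch below assumes a made-up class QuotedStrings and method extractQuoted, and uses .*? so that an empty pair of quotes is also reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class QuotedStrings {
    // Group 1 captures the (possibly empty) text between two double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    static List<String> extractQuoted(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            // group(1) excludes the surrounding quotes, so no replace() is needed.
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [friend, enter]
        System.out.println(extractQuoted("speak \"friend\" and \"enter\""));
    }
}
```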
I'm trying to do some pattern matching in Java:
Pattern p = Pattern.compile("(\\d+) (\\.+)");
Matcher m = p.matcher("5 soy milk");
String qty = m.group(1);
String name = m.group(2);
I want to end up with one string that contains "5" and one string that contains "soy milk". However, this pattern matching code gives me an IllegalStateException.
You have to call matches() before you attempt to get the groups.
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#matches()
public boolean matches()
Attempts to match the entire region against the pattern.
If the match succeeds then more information can be obtained via the start, end, and group methods.
Try this:
Pattern p = Pattern.compile("(\\d+) (.+)"); // .+ matches any characters; \\.+ would match only literal dots
Matcher m = p.matcher("5 soy milk");
if (m.matches())
{
String qty = m.group(1);
String name = m.group(2);
}
This is because you never actually run the Matcher. You have to call p.matcher(...).matches() (or .find(), or .lookingAt(), depending on the desired behaviour: .matches() requires the whole input to match, .lookingAt() only anchors the match at the start, and .find() searches for the pattern anywhere in the input).
Also check the result of .matches(), since in your case it returns false: \.+ ("\\.+" in a Java string) will try to match a literal dot one or more times; you should use .+ (".+" in a Java string) to match "any character, one or more times".
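The difference between the three matching methods can be seen side by side in a small sketch. The class MatchModes and the method describe are hypothetical names chosen for this example; only matches(), lookingAt() and find() come from java.util.regex.Matcher.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class MatchModes {
    // Reports how the same regex fares under each of the three matching methods.
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        boolean whole = p.matcher(input).matches();      // entire input must match
        boolean prefix = p.matcher(input).lookingAt();   // match must start at index 0
        boolean anywhere = p.matcher(input).find();      // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        // prints matches=false lookingAt=true find=true
        System.out.println(describe("\\d+", "5 soy milk"));
        // prints matches=true lookingAt=true find=true
        System.out.println(describe("(\\d+) (.+)", "5 soy milk"));
        // prints matches=false lookingAt=false find=false (there is no literal dot)
        System.out.println(describe("(\\d+) (\\.+)", "5 soy milk"));
    }
}
```

In every case the group methods may only be called after one of these methods has returned true on that same Matcher instance.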
First time posting.
First, I know how to use both Pattern/Matcher and String.split().
My question is: which is best for me to use in my example, and why?
Suggestions for better alternatives are also welcome.
Task:
I need to extract an unknown NOUN between two known regexes in an unknown string.
My Solution:
Get the start and end of the noun (from regexes 1 and 2) and use substring to extract the noun.
String line = "unknownXoooXNOUNXccccccXunknown";
int goal = 12 ;
String regexp1 = "Xo+X";
String regexp2 = "Xc+X";
I need to locate the index position AFTER the first regex.
I need to locate the index position BEFORE the second regex.
A) I can use pattern matcher
Pattern p = Pattern.compile(regexp1);
Matcher m = p.matcher(line);
if (m.find()) {
int afterRegex1 = m.end();
} else {
throw new IllegalArgumentException();
//TODO Exception Management;
}
B) I can use String Split
String[] split = line.split(regexp1, 2);
if (split.length != 2) {
throw new UnsupportedOperationException();
//TODO Exception Management;
}
int afterRegex1 = line.indexOf(split[1]);
Which Approach should I use and why?
I don't know which is more efficient in time and memory.
Both are about equally readable to me.
I'd do it like this:
String line = "unknownXoooXNOUNXccccccXunknown";
String regex = "Xo+X(.*?)Xc+X";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
if (m.find()) {
String noun = m.group(1);
}
The (.*?) is used to make the inner match on the NOUN reluctant. This protects us from a case where our ending pattern appears again in the unknown portion of the string.
EDIT
This works because the (.*?) defines a capture group. There's only one such group defined in the pattern, so it gets index 1 (the parameter to m.group(1)). These groups are indexed from left to right starting at 1. If the pattern were defined like this
String regex = "(Xo+X)(.*?)(Xc+X)";
Then there would be three capture groups, such that
m.group(1); // yields "XoooX"
m.group(2); // yields "NOUN"
m.group(3); // yields "XccccccX"
There is a group 0, but that matches the whole pattern, and it's equivalent to this
m.group(); // yields "XoooXNOUNXccccccX"
For more information about what you can do with the Matcher, including ways to get the start and end positions of your pattern within the source string, see the Matcher JavaDocs.
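If the index positions are what is needed, Matcher.start(int) and Matcher.end(int) report where a given group matched. The following is a minimal sketch with a made-up class NounLocator and method locateNoun; it returns null when nothing is found, which is just one possible way to signal a miss.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class NounLocator {
    private static final Pattern NOUN = Pattern.compile("Xo+X(.*?)Xc+X");

    // Returns {start, end} of group 1 (end is exclusive), or null if there is no match.
    static int[] locateNoun(String line) {
        Matcher m = NOUN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(1), m.end(1) };
    }

    public static void main(String[] args) {
        String line = "unknownXoooXNOUNXccccccXunknown";
        int[] span = locateNoun(line);
        // prints 12..16 -> NOUN
        System.out.println(span[0] + ".." + span[1] + " -> " + line.substring(span[0], span[1]));
    }
}
```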
You should use String.split() for readability unless you're in a tight loop.
Per split()'s javadoc, split() does the equivalent of Pattern.compile(), which you can optimize away if you're in a tight loop.
It looks like you want to extract a single occurrence. For this, simply do
input.replaceAll(".*Xo+X(.*)Xc+X.*", "$1")
For efficiency, compile the Pattern once and call pattern.matcher(input).replaceAll("$1") instead.
If your input contains line breaks, use Pattern.DOTALL or the (?s) flag.
If you want to use split, consider using Guava's Splitter. It behaves more sanely and also accepts a precompiled Pattern, which is good for speed.
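A precompiled variant of the replaceAll approach might look like this. It is a sketch under the assumption that the input can contain line breaks, hence Pattern.DOTALL; the class NounReplace and the method extractNoun are illustrative names.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class NounReplace {
    // DOTALL lets '.' match line terminators too, so multi-line input is handled.
    private static final Pattern WHOLE =
            Pattern.compile(".*Xo+X(.*)Xc+X.*", Pattern.DOTALL);

    static String extractNoun(String input) {
        // The whole input is replaced by the contents of group 1.
        return WHOLE.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints NOUN
        System.out.println(extractNoun("unknown\nXoooXNOUNXccccccX\nunknown"));
    }
}
```

One caveat of this design: when the pattern does not match, replaceAll returns the input unchanged, so the caller cannot tell a miss apart from a result without an extra check.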
If you really need the locations you can do it like this:
String line = "unknownXoooXNOUNXccccccXunknown";
String regexp1 = "Xo+X";
String regexp2 = "Xc+X";
Matcher m=Pattern.compile(regexp1).matcher(line);
if(m.find())
{
int start=m.end();
if(m.usePattern(Pattern.compile(regexp2)).find())
{
final int end = m.start();
System.out.println("from "+start+" to "+end+" is "+line.substring(start, end));
}
}
But if you just need the word in between, I recommend the way Ian McLaird has shown.
I am trying to get an array of strings from a lengthy string. The array should consist of the substrings found between two delimiter strings (??? and ??? in my case). I tried the following code, but it does not give me the expected result:
Pattern pattern = Pattern.compile("\\?\\?\\?(.*?)\\?\\?\\?");
String[] arrayOfKeys = pattern.split("???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj");
for (String key : arrayOfKeys) {
System.out.println(key);
}
My expected result is:
["label.missing", "some.label"]
Use Pattern.matcher() to obtain a Matcher for the input string, then use Matcher.find() to find the pattern you want. Matcher.find() will find substring(s) that matches the Pattern provided.
Pattern pattern = Pattern.compile("\\?{3}(.*?)\\?{3}");
Matcher m = pattern.matcher(inputString);
while (m.find()) {
System.out.println(m.group(1));
}
Pattern.split() will use your pattern as delimiter to split the string (then the delimiter part is discarded), which is obviously not what you want in this case. Your regex is designed to match the text that you want to extract.
I shortened the pattern by using the quantifier {3}, which repeats exactly 3 times, instead of writing \? three times.
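I would put the text you're trying to split into a string input and call input.split() on it. Note that split() returns every piece between the delimiters, so the result also contains an empty leading string and the text outside the ??? pairs; the keys end up at the odd indices (1 and 3 here).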
String input = "???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj";
String[] split = input.split("\\?\\?\\?");
Try it here:
http://ideone.com/VAmCyu
Pattern pattern = Pattern.compile("\\?{3}(.+?)\\?{3}");
Matcher matcher= pattern.matcher("???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj");
List<String> aList = new ArrayList<String>();
while(matcher.find()) {
aList.add(matcher.group(1));
}
for (String key : aList) {
System.out.println(key);
}
In .NET, if I want to match a sequence of characters against a pattern that describes capturing groups that occur any number of times, I could write something as follows:
String input = "a, bc, def, hijk";
String pattern = "(?<x>[^,]*)(,\\s*(?<y>[^,]*))*";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Groups["x"].Value);
//the group "y" occurs 0 or more times per match
foreach (Capture c in m.Groups["y"].Captures)
{
Console.WriteLine(c.Value);
}
This code would print:
a
bc
def
hijk
That seems straightforward, but unfortunately the following Java code doesn't do what the .NET code does. (Which is expected, since java.util.regex doesn't seem to distinguish between groups and captures.)
String input = "a, bc, def, hijk";
Pattern pattern = Pattern.compile("(?<x>[^,]*)(,\\s*(?<y>[^,]*))*");
Matcher m = pattern.matcher(input);
while(m.find())
{
System.out.println(m.group("x"));
System.out.println(m.group("y"));
}
Prints:
a
hijk
null
Can someone please explain how to accomplish the same using Java, without having to re-write the regular expression or use external libraries?
What you want is not possible in Java. When the same group matches several times within a single match, only the last occurrence of that group is kept. For more information, read the "Groups and capturing" section of the Pattern docs. In Java, you instead iterate through the String match by match by calling Matcher.find() repeatedly.
Example with repetition:
String input = "a1b2c3";
Pattern pattern = Pattern.compile("(?<x>.\\d)*");
Matcher matcher = pattern.matcher(input);
while(matcher.find())
{
System.out.println(matcher.group("x"));
}
Prints (null because the * matches the empty string too):
c3
null
Without:
String input = "a1b2c3";
Pattern pattern = Pattern.compile("(?<x>.\\d)");
Matcher matcher = pattern.matcher(input);
while(matcher.find())
{
System.out.println(matcher.group("x"));
}
Prints:
a1
b2
c3
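Applied to the comma-separated input from the question, the same idea of letting find() do the repetition could be sketched as follows. Note that this does rewrite the regular expression into a simpler per-item pattern, so it is a workaround rather than an equivalent of the .NET Captures collection; the class RepeatedCaptures and the method items are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class RepeatedCaptures {
    // One item: starts with a character that is neither a comma nor whitespace,
    // then runs up to (but not including) the next comma.
    private static final Pattern ITEM = Pattern.compile("[^,\\s][^,]*");

    static List<String> items(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        // The repetition lives in the loop rather than in a quantified group.
        while (m.find()) {
            results.add(m.group().trim());
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [a, bc, def, hijk]
        System.out.println(items("a, bc, def, hijk"));
    }
}
```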
You can use the Pattern and Matcher classes in Java; they work slightly differently. For example, the following code:
Pattern p = Pattern.compile("(el).*(wo)");
Matcher m = p.matcher("hello world");
while(m.find()) {
for(int i=1; i<=m.groupCount(); ++i) System.out.println(m.group(i));
}
Will print two strings:
el
wo