I'm trying to parse some text, but for some strange reason, Java regex doesn't work. For example, I've tried:
Pattern p = Pattern.compile("[A-Z][0-9]*,[0-9]*");
Matcher m = p.matcher("H3,4");
and it simply throws a "No match found" exception when I try to get the numbers with m.group(1) and m.group(2). Am I missing something about how Java regex works?
Yes.
You must actually call matches() or find() on the matcher first.
Your regex must actually contain capturing groups.
Example:
Pattern p = Pattern.compile("[A-Z](\\d*),(\\d*)");
Matcher m = p.matcher("H3,4");
if (m.matches()) {
// use m.group(1), m.group(2) here
}
You also need parentheses to specify what is part of each group. I changed the leading part to match anything that is not a digit, zero or more times. Each group is one or more digits, so the quantifier is + rather than *.
Pattern p = Pattern.compile("[^0-9]*([0-9]+),([0-9]+)");
Matcher m = p.matcher("H3,4");
if (m.matches())
{
String g1 = m.group(1);
String g2 = m.group(2);
}
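The two points above (attempt a match first, and use capturing groups) can be sketched as a small helper. The class name PairParser and its methods are made up for illustration and are not from the question; the second method only demonstrates that reading a group before any match attempt throws IllegalStateException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairParser {
    // One uppercase letter, then two comma-separated runs of digits, each captured.
    private static final Pattern PAIR = Pattern.compile("[A-Z](\\d+),(\\d+)");

    // Hypothetical helper: returns the two numbers, or null if the input does not match.
    public static int[] parsePair(String input) {
        Matcher m = PAIR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    // Returns true if reading a group before matches()/find() throws, as in the question.
    public static boolean groupBeforeMatchThrows(String input) {
        Matcher m = PAIR.matcher(input);
        try {
            m.group(1); // no match has been attempted yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] pair = parsePair("H3,4");
        System.out.println(pair[0] + " " + pair[1]);
        System.out.println(groupBeforeMatchThrows("H3,4"));
    }
}
```

Note that Integer.parseInt will still throw NumberFormatException for digit runs too long to fit in an int, so real input may need an extra check.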
Related
I have a String whose length and characters I don't know in advance.
I want to search the string and get every substring found inside "".
I tried to use Pattern.compile but it always returns an empty string
Pattern p = Pattern.compile("\".\"");
Matcher m = p.matcher(mystring);
while(m.find()){
System.out.println(m.group().toString());
}
How can I do it?
Use .+? to match all the characters inside each pair of quotes:
Pattern p = Pattern.compile("\".+?\"");
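The .+ requires at least one character between the quotation marks. The ? makes the quantifier reluctant, so it matches as few characters as possible and stops at the nearest closing quote. Each quoted string is therefore found as a separate match, rather than one match spanning from the first quote to the last.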
Unit test example:
@Test
public void test() {
String test = "speak \"friend\" and \"enter\"";
Pattern p = Pattern.compile("\".+?\"");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group().toString().replace("\"", ""));
}
}
Output:
friend
enter
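As a variation on the test above, a capturing group can return the text without the quotes, so no replace() call is needed afterwards. This is only a sketch; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedText {
    // The reluctant .+? stops at the nearest closing quote; group 1 excludes the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"(.+?)\"");

    // Hypothetical helper: collects the contents of every quoted substring.
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("speak \"friend\" and \"enter\""));
    }
}
```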
That is because your regex matches exactly one character between the two quotes. If you want to match more characters, rewrite your regex as "\".*?\""
I'm using this line of code to extract all text between the two strings "Origin" and "//". I'm trying to exclude all digits, but this doesn't work: it grabs everything, including the digits. Is my regex incorrect?
Pattern p = Pattern.compile(Pattern.quote("ORIGIN") + "(.*?[^0-9])" + Pattern.quote("//"), Pattern.DOTALL);
First of all: you have no need to Pattern.quote() either of ORIGIN or //; what is more, the text in your question suggests Origin, not ORIGIN, so I'll go with that instead.
Try this regex:
private static final Pattern PATTERN
= Pattern.compile("Origin([^0-9/]+)//");
Note: it disallows any slash between Origin and // as well, which may, or may not, be what you want; but since there are no examples in your question, this is as good a solution as I can muster.
What you want is not clear.
1) If you want the text between the markers with any digits stripped out afterwards:
Pattern p = Pattern.compile("ORIGIN(.*)//");
Matcher m = p.matcher(str);
if(m.find())
System.out.println(m.group(1).replaceAll("\\d+", ""));
2) If you want a match only when the text between the markers contains no digits:
Pattern p = Pattern.compile("ORIGIN([^0-9]+)//");
Matcher m = p.matcher(str);
if(m.find())
System.out.println(m.group(1));
3) Something else?
E.g :
String : ORIGINbla54bla//
1) String : blabla
2) No result (Pattern does not match)
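The two options can be compared side by side with a sketch like the following. The class and method names are invented, and the first pattern uses the reluctant .*? with DOTALL from the question rather than the greedy .* shown above; that choice is an assumption about what is wanted when the text contains more than one "//".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginExtract {
    // Option 1 pattern: capture anything up to the first "//".
    private static final Pattern ANY = Pattern.compile("ORIGIN(.*?)//", Pattern.DOTALL);
    // Option 2 pattern: match only if no digit appears between the markers.
    private static final Pattern NO_DIGITS = Pattern.compile("ORIGIN([^0-9]+)//");

    // Option 1: capture everything, then strip the digits from the captured text.
    public static String stripDigits(String s) {
        Matcher m = ANY.matcher(s);
        return m.find() ? m.group(1).replaceAll("\\d+", "") : null;
    }

    // Option 2: returns null when a digit sits between the markers.
    public static String requireNoDigits(String s) {
        Matcher m = NO_DIGITS.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "ORIGINbla54bla//";
        System.out.println(stripDigits(s));
        System.out.println(requireNoDigits(s));
    }
}
```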
Suppose I have a string kk a.b.cjkmkc jjkocc a.b.c.
I want to find the substring a.b.c in the string, but it is not working.
Here is my code
Pattern p = Pattern.compile("a.b.c");
Matcher m = p.matcher(str);
int x = m.find()
The . in Java Pattern is a special character: "Any character (may or may not match line terminators)" (from the java.util.regex.Pattern web page).
Try escaping it:
Pattern p = Pattern.compile("a\\.b\\.c");
Also note:
Matcher.find returns boolean, not int.
In a Java string literal the backslash itself must be escaped, so the regex \. is written "\\."
As others have mentioned, . is a special character in regular expressions. You can let Java quote special characters using Pattern.quote. BTW: what about String.indexOf(String), which is faster? If you really need regular expressions, have a look at this:
String str = "kk a.b.cjkmkc jjkocc a.b.c.";
Pattern p = Pattern.compile(Pattern.quote("a.b.c"));
Matcher m = p.matcher(str);
while (m.find()) {
int x = m.start();
// ...
}
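To see the difference between an unescaped and an escaped dot, here is a small sketch. The input string is made up so that it contains one look-alike (axbxc) and one literal a.b.c; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralDots {
    // Counts how many times the pattern is found in the text.
    public static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String str = "axbxc a.b.c"; // made-up input: one look-alike, one real hit
        System.out.println(count(Pattern.compile("a.b.c"), str));                // unescaped dots
        System.out.println(count(Pattern.compile("a\\.b\\.c"), str));            // escaped dots
        System.out.println(count(Pattern.compile(Pattern.quote("a.b.c")), str)); // quoted literal
        System.out.println(str.indexOf("a.b.c"));                                // no regex at all
    }
}
```

The unescaped pattern also matches axbxc, because each . accepts any character; the escaped and quoted versions match only the literal text.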
In .NET, if I want to match a sequence of characters against a pattern that describes capturing groups that occur any number of times, I could write something as follows:
String input = "a, bc, def, hijk";
String pattern = "(?<x>[^,]*)(,\\s*(?<y>[^,]*))*";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Groups["x"].Value);
//the group "y" occurs 0 or more times per match
foreach (Capture c in m.Groups["y"].Captures)
{
Console.WriteLine(c.Value);
}
This code would print:
a
bc
def
hijk
That seems straightforward, but unfortunately the following Java code doesn't do what the .NET code does. (Which is expected, since java.util.regex doesn't seem to distinguish between groups and captures.)
String input = "a, bc, def, hijk";
Pattern pattern = Pattern.compile("(?<x>[^,]*)(,\\s*(?<y>[^,]*))*");
Matcher m = pattern.matcher(input);
while(m.find())
{
System.out.println(m.group("x"));
System.out.println(m.group("y"));
}
Prints:
a
hijk
null
Can someone please explain how to accomplish the same using Java, without having to re-write the regular expression or use external libraries?
What you want is not possible in Java. When the same group matches several times within one match, only the last occurrence of that group is saved. For more info, read the "Groups and capturing" section of the Pattern docs. In Java, the Matcher is instead used to step through a String one match at a time with find().
Example with repetition:
String input = "a1b2c3";
Pattern pattern = Pattern.compile("(?<x>.\\d)*");
Matcher matcher = pattern.matcher(input);
while(matcher.find())
{
System.out.println(matcher.group("x"));
}
Prints (null because the * matches the empty string too):
c3
null
Without:
String input = "a1b2c3";
Pattern pattern = Pattern.compile("(?<x>.\\d)");
Matcher matcher = pattern.matcher(input);
while(matcher.find())
{
System.out.println(matcher.group("x"));
}
Prints:
a1
b2
c3
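Building on the "Without" variant, one workaround for the comma-separated input from the question is to match a single item per find() call and collect the results. Note that this does rewrite the regular expression, so it does not meet the asker's constraint; the pattern and the names below are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItemCollector {
    // One item: starts at a character that is neither a comma nor whitespace,
    // then runs up to (not including) the next comma.
    private static final Pattern ITEM = Pattern.compile("[^,\\s][^,]*");

    // Hypothetical helper: gathers every item, one find() call per item.
    public static List<String> items(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(items("a, bc, def, hijk"));
    }
}
```

Whitespace before a comma would be kept as part of the item with this pattern, so trim() may be needed for less regular input.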
You can use the Pattern and Matcher classes in Java, though they work slightly differently. For example, the following code:
Pattern p = Pattern.compile("(el).*(wo)");
Matcher m = p.matcher("hello world");
while(m.find()) {
for(int i=1; i<=m.groupCount(); ++i) System.out.println(m.group(i));
}
Will print two strings:
el
wo
UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)
You use m.matches(), which tries to match the whole string; if you use m.find() instead, it searches for a match anywhere inside the string. I also changed your regexp to exclude the TZID= prefix using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+"); //
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}
This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // capturing groups are numbered from 1; group 0 is the whole match
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.
You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches anything at the beginning, up to TZID=. Then TZID= matches literally, and the group captures everything up to the colon. After the group, : matches and .* consumes the rest of the String, so matches() succeeds on the whole input and what you need is in group(1).
You are missing a dot (or a character class) before the asterisk. As written, the * applies only to the D, so your expression matches TZI followed by any number of uppercase Ds and then a colon.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City
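The difference between matching the whole string and searching inside it can be checked with a sketch like this one. The class and method names are hypothetical, and each call creates a fresh Matcher so that one attempt cannot affect the other.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TzidDemo {
    private static final Pattern TZID = Pattern.compile("TZID=([^:]*):");

    // matches() succeeds only if the pattern covers the entire input.
    public static boolean wholeStringMatches(String line) {
        return TZID.matcher(line).matches();
    }

    // find() searches for the pattern anywhere in the input.
    public static String extractZone(String line) {
        Matcher m = TZID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        System.out.println(wholeStringMatches(s));
        System.out.println(extractZone(s));
    }
}
```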
Why not simply use split as:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];