Pattern/Matcher group() to obtain substring in Java? - java

UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)

You are using m.matches(), which tries to match the whole string. If you use m.find() instead, it searches for a match anywhere inside the string. I also adjusted your regex a bit to exclude the TZID= prefix using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}
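To make the difference between matches() and find() concrete, here is a minimal sketch. The class name FindVsMatches and the helper extractZone are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    // Hypothetical helper: returns the text after "TZID=" up to the next ':',
    // or null when the input has no TZID parameter.
    static String extractZone(String s) {
        Matcher m = Pattern.compile("(?<=TZID=)[^:]+").matcher(s);
        return m.find() ? m.group() : null;
    }

    // matches() requires the pattern to cover the entire input.
    static boolean wholeStringMatches(String s) {
        return Pattern.compile("(?<=TZID=)[^:]+").matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        System.out.println(wholeStringMatches(s)); // false
        System.out.println(extractZone(s));        // America/Mexico_City
    }
}
```

The same pattern fails with matches() and succeeds with find(), so the pattern itself was never the only problem.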

This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // capturing groups are numbered from 1; group(0) is the whole match
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.

You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches everything at the beginning up to TZID=, then TZID= matches literally. The capturing group ([^:]+) takes everything up to the next :, then : matches and .* consumes the rest of the string. Because the pattern now covers the whole input, m.matches() succeeds and the part you need is in group(1).

You are missing a dot before the asterisk. Your expression will match any number of uppercase Ds.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City

Why not simply use split as:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];
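One caveat with chained split calls: if the input has no "=" at all, the [1] index throws ArrayIndexOutOfBoundsException. A rough sketch of a guarded variant using indexOf follows; the class ZoneSplit and the method zoneOf are names invented for this example.

```java
class ZoneSplit {
    // Hypothetical helper: returns the text between the first '=' and the
    // next ':' after it, or null when either delimiter is missing.
    static String zoneOf(String s) {
        int eq = s.indexOf('=');
        if (eq < 0) {
            return null;
        }
        int colon = s.indexOf(':', eq + 1);
        if (colon < 0) {
            return null;
        }
        return s.substring(eq + 1, colon);
    }

    public static void main(String[] args) {
        System.out.println(zoneOf("DTSTART;TZID=America/Mexico_City:20121125T153000"));
        System.out.println(zoneOf("DTSTART:20121125T153000")); // null, no '=' present
    }
}
```

Whether returning null or throwing is better depends on how the caller wants to handle lines without a TZID parameter.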

Related

Regex for extracting a string between a word and new line character in java [duplicate]

I'm new to using Regex, I've been going through a rake of tutorials but I haven't found one that applies to what I want to do,
I want to search for something, but return everything following it but not the search string itself
e.g. "Some lame sentence that is awesome"
search for "sentence"
return "that is awesome"
Any help would be much appreciated
This is my regex so far
sentence(.*)
but it returns: sentence that is awesome
Pattern pattern = Pattern.compile("sentence(.*)");
Matcher matcher = pattern.matcher("some lame sentence that is awesome");
boolean found = false;
while (matcher.find())
{
System.out.println("I found the text: " + matcher.group().toString());
found = true;
}
if (!found)
{
System.out.println("I didn't find the text");
}
You can do this with "just the regular expression" as you asked for in a comment:
(?<=sentence).*
(?<=sentence) is a positive lookbehind assertion. This matches at a certain position in the string, namely at a position right after the text sentence without making that text itself part of the match. Consequently, (?<=sentence).* will match any text after sentence.
This is quite a nice feature of regex. However, in Java this will only work for finite-length subexpressions, i. e. (?<=sentence|word|(foo){1,4}) is legal, but (?<=sentence\s*) isn't.
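For reference, this is roughly how the lookbehind version looks in Java. It is only a sketch; LookbehindDemo and textAfter are illustrative names, not anything from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Hypothetical helper: returns everything after the first "sentence",
    // or null when the word does not occur.
    static String textAfter(String text) {
        Matcher m = Pattern.compile("(?<=sentence).*").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The result keeps the leading space; call trim() if that is unwanted.
        System.out.println("[" + textAfter("Some lame sentence that is awesome") + "]");
    }
}
```

Note that group() with no argument is enough here, because the lookbehind keeps "sentence" out of the match itself.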
Your regex "sentence(.*)" is right. To retrieve the contents of the group in parentheses, call group(1) on the matcher:
Pattern p = Pattern.compile( "sentence(.*)" );
Matcher m = p.matcher( "some lame sentence that is awesome" );
if ( m.find() ) {
String s = m.group(1); // " that is awesome"
}
Note the use of m.find() in this case (attempts to find anywhere on the string) and not m.matches() (would fail because of the prefix "some lame"; in this case the regex would need to be ".*sentence(.*)")
If the Matcher is initialized with str, then after a successful find() you can get the part following the match with
str.substring(matcher.end())
Sample Code:
final String str = "Some lame sentence that is awesome";
final Matcher matcher = Pattern.compile("sentence").matcher(str);
if(matcher.find()){
System.out.println(str.substring(matcher.end()).trim());
}
Output:
that is awesome
You need to use the group(int) of your matcher - group(0) is the entire match, and group(1) is the first group you marked. In the example you specify, group(1) is what comes after "sentence".
You just need to put "group(1)" instead of "group()" in the following line and the return will be the one you expected:
System.out.println("I found the text: " + matcher.group(1).toString());

Get a substring from string multiple times

I have a String whose length and characters I don't know in advance.
I want to search the string and get every substring found inside "".
I tried to use Pattern.compile but it always returns an empty string
Pattern p = Pattern.compile("\".\"");
Matcher m = p.matcher(mystring);
while(m.find()){
System.out.println(m.group().toString());
}
How can I do it?
Use the .+? to get all characters inside "" with grouping
Pattern p = Pattern.compile("\".+?\"");
The .+ specifies that you want one or more characters inside the quotation marks. The ? makes the quantifier reluctant, so it matches as few characters as possible and stops at the next closing quote. Each quoted substring is therefore returned as a separate match by find().
Unit test example:
@Test
public void test() {
String test = "speak \"friend\" and \"enter\"";
Pattern p = Pattern.compile("\".+?\"");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group().toString().replace("\"", ""));
}
}
Output:
friend
enter
That is because your regex actually searches for exactly one character between " and ". If you want to allow any number of characters, you should rewrite your regex to "\".*?\""
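If you want the text without the surrounding quotes, a capturing group avoids the replace() call. A small sketch, assuming you want the results in a List; QuotedStrings and quoted are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    // Hypothetical helper: collects the contents of every "..." pair.
    static List<String> quoted(String text) {
        List<String> out = new ArrayList<>();
        // Group 1 is the text between the quotes; .*? also accepts an empty pair.
        Matcher m = Pattern.compile("\"(.*?)\"").matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(quoted("speak \"friend\" and \"enter\"")); // [friend, enter]
    }
}
```

Using .*? here instead of .+? means an empty pair of quotes yields an empty string rather than being skipped; pick whichever fits your data.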

Regular expression not filtering out digits

I'm using this line of code to extract all text between the two strings "Origin" and "//". I'm trying to exclude all digits, but this doesn't work: it grabs everything, including the digits. Is my regex incorrect?
Pattern p = Pattern.compile(Pattern.quote("ORIGIN") + "(.*?[^0-9])" + Pattern.quote("//"), Pattern.DOTALL);
First of all: you have no need to Pattern.quote() either of ORIGIN or //; what is more, the text in your question suggests Origin, not ORIGIN, so I'll go with that instead.
Try this regex:
private static final Pattern PATTERN
= Pattern.compile("Origin([^0-9/]+)//");
Note: it disallows any slash between Origin and // as well, which may, or may not, be what you want; but since there are no examples in your question, this is as good a solution as I can muster.
What you want is not clear.
1) If you want only the text, with any digits stripped out, even when the input contains digits:
Pattern p = Pattern.compile("ORIGIN(.*)//");
Matcher m = p.matcher(str);
if(m.find())
System.out.println(m.group(1).replaceAll("\\d+", ""));
2) If you want a match only when the text contains no digits at all:
Pattern p = Pattern.compile("ORIGIN([^0-9]+)//");
Matcher m = p.matcher(str);
if(m.find())
System.out.println(m.group(1));
3) Something else ???????
E.g :
String : ORIGINbla54bla//
1) String : blabla
2) No result (Pattern does not match)
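As for why the regex in the question keeps the digits: in (.*?[^0-9]) the .*? part is free to match digits, and [^0-9] only constrains the last character before //. The sketch below shows that next to option 1; the class and method names are invented for the example, and the input is the ORIGINbla54bla// sample from above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginDigits {
    // The pattern from the question: only the final captured character
    // is required to be a non-digit.
    static String questionPattern(String s) {
        Matcher m = Pattern.compile("ORIGIN(.*?[^0-9])//", Pattern.DOTALL).matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // Option 1: capture everything, then strip the digits afterwards.
    static String captureThenStrip(String s) {
        Matcher m = Pattern.compile("ORIGIN(.*?)//", Pattern.DOTALL).matcher(s);
        return m.find() ? m.group(1).replaceAll("\\d+", "") : null;
    }

    public static void main(String[] args) {
        String s = "ORIGINbla54bla//";
        System.out.println(questionPattern(s));  // bla54bla
        System.out.println(captureThenStrip(s)); // blabla
    }
}
```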

Extracting some pattern using regex

I'm trying to write a regex pattern that will match a "digit~digit~string~sentence". eg 14~742091~065M998~P E ROUX 214. I've come up with the following so far:
String regex= "\\d+~?\\d+~?\\w+~?"
How do I extract the sentence after the last ~?
Use Capturing Groups:
\d+~?\d+~?\w+~(.*)
group(1) contains the part you want.
Another solution is using String#split:
String[] splitted = myString.split("~");
String res = splitted[splitted.length - 1];
Use capturing groups (), as demonstrated in this pattern: "\\d+~\\d+~\\w+~(.*)". Note that you don't need the ? quantifier after each ~: the separators are always present, so there is no reason to make them optional.
String input = "14~742091~065M998~P E ROUX 214";
Pattern pattern = Pattern.compile("\\d+~\\d+~\\w+~(.*)");
//Pattern pattern = Pattern.compile("(?:\\d+~){2}\\w+~(.*)"); (would also work)
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
Prints:
P E ROUX 214
You should use ( ) to extract the output you want:
.*~(.*)$
This simple regex should work for you.
try the regexp below, the sentence only contains alphanumeric and spaces
^\d+~\d+~\w+~[\w\s]+
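The answers above differ once the sentence itself contains a ~. A quick sketch comparing .*~(.*)$ with a limited split; TildeFields, afterLastTilde and fourthField are made-up names, and the second input is a hypothetical edge case, not data from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TildeFields {
    // Greedy .* backtracks to the LAST '~', so only the final segment is captured.
    static String afterLastTilde(String s) {
        Matcher m = Pattern.compile(".*~(.*)$").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // split with limit 4 stops after three separators, keeping the rest intact.
    static String fourthField(String s) {
        String[] parts = s.split("~", 4);
        return parts.length == 4 ? parts[3] : null;
    }

    public static void main(String[] args) {
        String plain = "14~742091~065M998~P E ROUX 214";
        System.out.println(afterLastTilde(plain)); // P E ROUX 214
        System.out.println(fourthField(plain));    // P E ROUX 214

        String tricky = "1~2~A~x~y"; // hypothetical input with '~' in the sentence
        System.out.println(afterLastTilde(tricky)); // y
        System.out.println(fourthField(tricky));    // x~y
    }
}
```

If the sentence can never contain a ~, all of the approaches give the same result.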

Java regex doesn't find numbers

I'm trying to parse some text, but for some strange reason, Java regex doesn't work. For example, I've tried:
Pattern p = Pattern.compile("[A-Z][0-9]*,[0-9]*");
Matcher m = p.matcher("H3,4");
and it simply gives No match found exception, when I try to get the numbers m.group(1) and m.group(2). Am I missing something about how Java regex works?
Yes.
You must actually call matches() or find() on the matcher first.
Your regex must actually contain capturing groups
Example:
Pattern p = Pattern.compile("[A-Z](\\d*),(\\d*)");
Matcher m = p.matcher("H3,4");
if (m.matches()) {
// use m.group(1), m.group(2) here
}
You also need the parentheses to specify what is part of each group. I changed the leading part to be anything that's not a digit, 0 or more times. What's in each group is 1 or more digits. So, not * but + instead.
Pattern p = Pattern.compile("[^0-9]*([0-9]+),([0-9]+)");
Matcher m = p.matcher("H3,4");
if (m.matches())
{
String g1 = m.group(1);
String g2 = m.group(2);
}
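To see both points in one place, here is a small sketch: calling group() before any match attempt throws IllegalStateException, and after matches() succeeds the groups are available. GroupBeforeMatch and its methods are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeMatch {
    private static final Pattern P = Pattern.compile("[A-Z](\\d*),(\\d*)");

    // Returns true if group(1) throws when no match has been attempted yet.
    static boolean throwsWithoutMatch() {
        Matcher m = P.matcher("H3,4");
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Hypothetical helper: returns the two numbers, or null if the input does not match.
    static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(throwsWithoutMatch()); // true
        String[] parts = parse("H3,4");
        System.out.println(parts[0] + " and " + parts[1]); // 3 and 4
    }
}
```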
