Find a substring in a string using a regular expression - java

Suppose I have the string kk a.b.cjkmkc jjkocc a.b.c.
I want to find the substring a.b.c in that string, but it is not working.
Here is my code:
Pattern p = Pattern.compile("a.b.c");
Matcher m = p.matcher(str);
int x = m.find()

The . in Java Pattern is a special character: "Any character (may or may not match line terminators)" (from the java.util.regex.Pattern web page).
Try escaping it:
Pattern p = Pattern.compile("a\\.b\\.c");
Also note:
Matcher.find returns boolean, not int.
In a Java string literal the backslash must itself be escaped, so the regex \. is written as "\\.".
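To make the effect of escaping visible, here is a minimal sketch. The class name DotEscapeDemo, the helper firstMatchStart and the sample input axbyc are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions, not part of the question.
final class DotEscapeDemo {

    // Returns the start index of the first match of regex in input, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() returns a boolean; the position comes from start().
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String str = "kk a.b.cjkmkc jjkocc a.b.c.";
        System.out.println(firstMatchStart("a\\.b\\.c", str));     // 3
        System.out.println(firstMatchStart("a.b.c", "axbyc"));     // 0, the unescaped dot matches x and y
        System.out.println(firstMatchStart("a\\.b\\.c", "axbyc")); // -1, the escaped dot needs a literal .
    }
}
```

The unescaped pattern still finds a.b.c, but it also accepts strings you probably did not intend to match.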

As others have mentioned, . is a special character in regular expressions. You can let Java quote special characters using Pattern.quote. BTW: what about String.indexOf(String), which is faster? If you really need regular expressions, have a look at this:
String str = "kk a.b.cjkmkc jjkocc a.b.c.";
Pattern p = Pattern.compile(Pattern.quote("a.b.c"));
Matcher m = p.matcher(str);
while (m.find()) {
int x = m.start();
// ...
}
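For comparison, the String.indexOf route mentioned above could look roughly like this. It is only a sketch: the class IndexOfSearch and the method allIndexesOf are hypothetical names, and it reports non-overlapping hits so that it behaves like the find() loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are assumptions made for this example.
final class IndexOfSearch {

    // Collects the start index of every non-overlapping occurrence of needle in haystack.
    static List<Integer> allIndexesOf(String haystack, String needle) {
        List<Integer> starts = new ArrayList<>();
        if (needle.isEmpty()) {
            return starts; // an empty needle would never advance the search position
        }
        int from = haystack.indexOf(needle);
        while (from >= 0) {
            starts.add(from);
            // Resume the search right after the current hit.
            from = haystack.indexOf(needle, from + needle.length());
        }
        return starts;
    }

    public static void main(String[] args) {
        String str = "kk a.b.cjkmkc jjkocc a.b.c.";
        System.out.println(allIndexesOf(str, "a.b.c")); // [3, 21]
    }
}
```

No escaping is needed here because indexOf treats its argument as plain text.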

Related

How to pull double out of string with matcher

I'm trying to parse a double out of a string. I have the code:
Pattern p = Pattern.compile("-?\\d+(\\.\\d+)?");
Matcher m = p.matcher("reciproc(2.00000000000)");
System.out.println(Double.parseDouble(m.group()));
This code throws a java.lang.IllegalStateException. I want the output to be 2.00000000000. I got the regex from Java: Regex for Parsing Positive and Negative Doubles, where it seemed to work for them. I tried a few other regexes as well and they all threw the same error. Am I missing something here?
It's not a problem with your regex but in how you are using the Matcher class. You need to call find() first.
This should work:
Pattern p = Pattern.compile("-?\\d+(\\.\\d+)?");
String text = "reciproc(2.00000000000)";
Matcher m = p.matcher(text);
if(m.find())
{
System.out.println(Double.parseDouble(text.substring(m.start(), m.end())));
}
Alternatively:
Pattern p = Pattern.compile("-?\\d+(\\.\\d+)?");
Matcher m = p.matcher("reciproc(2.00000000000)");
if(m.find())
{
System.out.println(Double.parseDouble(m.group()));
}
For more information, see the docs.
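If you need this in more than one place, the find-then-group sequence can be wrapped in a small helper. This is just a sketch: DoubleExtractor, firstDouble and groupBeforeFindThrows are names invented for the example, and returning an OptionalDouble is one possible design, not something the question asked for.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are assumptions.
final class DoubleExtractor {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    // Returns the first number found in text, or an empty OptionalDouble if there is none.
    static OptionalDouble firstDouble(String text) {
        Matcher m = NUMBER.matcher(text);
        // group() is only valid after a successful find().
        return m.find() ? OptionalDouble.of(Double.parseDouble(m.group())) : OptionalDouble.empty();
    }

    // Reproduces the problem from the question: group() before any match attempt.
    static boolean groupBeforeFindThrows(String text) {
        Matcher m = NUMBER.matcher(text);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDouble("reciproc(2.00000000000)").getAsDouble()); // 2.0
        System.out.println(groupBeforeFindThrows("reciproc(2.00000000000)"));     // true
    }
}
```

Note that printing the parsed double gives 2.0. If the trailing zeros matter, keep the string returned by m.group() instead of parsing it.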
p.matcher("2.000000000000");
Your pattern should match the regex provided in Pattern.compile()
For more information on regex and patterns:
https://docs.oracle.com/javase/tutorial/essential/regex/
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

java Pattern Matching issue

I am having trouble writing a proper regex to match a URL.
String input = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
String regex = "www.*.com"; // To match www.gmail.com URL
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while(m.find()){
}
Here I want to remove the URL www.gmail.com. However, the match runs to the end of the string, so it also takes in the email address, which ends with gmail.com.
Can someone help me get a proper regex that matches only the URL?
.* does a greedy match. You have to add ? after * to make it a reluctant match.
"www\\..*?\\.com"
Your code would be,
String s = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
Pattern p = Pattern.compile("www\\..*?\\.com");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
String regex = "www\\..*?\\.com";
Use a non-greedy repetition of the wildcard '.', and escape the dot wherever it is meant literally.
A negated character class is faster than .*?
Use this regex:
www\.[^.]+\.com
[^.]+ means one or more characters that are not a dot.
In a Java string literal the backslashes must be doubled:
// for instance
Pattern regex = Pattern.compile("www\\.[^.]+\\.com");
// etc
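To compare the three variants discussed above on the input from the question, something like the following could be used. It is a sketch only; UrlQuantifierDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions.
final class UrlQuantifierDemo {

    // Returns the text of the first match of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "AAAhttp://www.gmail.comBBBBabc#gmail.com";

        // Greedy .* runs to the last possible "com".
        System.out.println(firstMatch("www.*.com", input));         // www.gmail.comBBBBabc#gmail.com
        // Reluctant .*? stops at the first ".com".
        System.out.println(firstMatch("www\\..*?\\.com", input));   // www.gmail.com
        // The negated class cannot cross a dot at all.
        System.out.println(firstMatch("www\\.[^.]+\\.com", input)); // www.gmail.com

        // Removing the URL, which is what the question was after.
        System.out.println(input.replaceFirst("www\\.[^.]+\\.com", "")); // AAAhttp://BBBBabc#gmail.com
    }
}
```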

Pattern/Matcher group() to obtain substring in Java?

UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)
You use m.matches(), which tries to match the whole string. If you use m.find() instead, it searches for a match inside the string. I also improved your regex a bit to exclude the TZID= prefix, using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}
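The difference between matches() and find() can be checked side by side with a small sketch like this one. The class MatchesVsFind and its two methods are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions.
final class MatchesVsFind {

    // Returns group 1 only if the regex matches the ENTIRE input, else null.
    static String extractWithMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Returns group 1 of the first match found ANYWHERE in the input, else null.
    static String extractWithFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";

        System.out.println(extractWithMatches("TZID=([^:]*):", s));     // null
        System.out.println(extractWithFind("TZID=([^:]*):", s));        // America/Mexico_City
        // matches() works once the pattern is padded to cover the whole string.
        System.out.println(extractWithMatches(".*TZID=([^:]*):.*", s)); // America/Mexico_City
    }
}
```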
This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // capturing groups are numbered from 1; group(0) is the whole match
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.
You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches anything at the beginning up to TZID=. Then TZID= matches literally, and the group captures everything up to the colon. After the group closes, : matches the colon and .* matches the rest of the string, so what you need is in group(1).
You are missing a dot before the asterisk. Your expression will match any number of uppercase Ds.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City
Why not simply use split as:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];

Java regex doesn't find numbers

I'm trying to parse some text, but for some strange reason, Java regex doesn't work. For example, I've tried:
Pattern p = Pattern.compile("[A-Z][0-9]*,[0-9]*");
Matcher m = p.matcher("H3,4");
and it simply throws a "No match found" exception when I try to get the numbers with m.group(1) and m.group(2). Am I missing something about how Java regex works?
Yes.
You must actually call matches() or find() on the matcher first.
Your regex must actually contain capturing groups
Example:
Pattern p = Pattern.compile("[A-Z](\\d*),(\\d*)");
Matcher m = p.matcher("H3,4");
if (m.matches()) {
// use m.group(1), m.group(2) here
}
You also need the parentheses to specify what is part of each group. I changed the leading part to be anything that's not a digit, 0 or more times. What's in each group is 1 or more digits. So, not * but + instead.
Pattern p = Pattern.compile("[^0-9]*([0-9]+),([0-9]+)");
Matcher m = p.matcher("H3,4");
if (m.matches())
{
String g1 = m.group(1);
String g2 = m.group(2);
}
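Putting both points together (call matches() first, and use capturing groups), a helper that turns the input into two ints might look like this. It is a sketch under assumptions: PairParser and parse are invented names, and returning null for non-matching input is just one way to signal failure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions.
final class PairParser {
    // One uppercase letter, then two groups of one or more digits separated by a comma.
    private static final Pattern PAIR = Pattern.compile("[A-Z](\\d+),(\\d+)");

    // Returns the two numbers as an int[2], or null if input does not have the expected form.
    static int[] parse(String input) {
        Matcher m = PAIR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] pair = parse("H3,4");
        System.out.println(pair[0] + " and " + pair[1]); // 3 and 4
        System.out.println(parse("H,4") == null);        // true
    }
}
```

With \\d* instead of \\d+, the input H,4 would match with an empty group(1), and Integer.parseInt("") would then throw a NumberFormatException.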

new to java regex, how to grab this part of the string

I have a string that looks like:
http://www.example.com/index.do/blah/1_44/asdf/asdf/asdf
http://www.example.com/index.do/blah/1_44_2342/asdf/asdf/asdf
I need to grab the value 44 from the above; of course, '44' could be any number.
The number '44' is always prefixed with a _, and after it there could be another _ or a /.
I have no idea how the Java regex API works, so any guidance would be appreciated!
It's primarily the Pattern and Matcher classes you're interested in.
String url = "http://www.example.com/index.do/blah/1_44/asdf/asdf/asdf";
Pattern p = Pattern.compile("_(\\d+)");
Matcher m = p.matcher(url);
if (m.find()) {
int number = Integer.parseInt(m.group(1));
}
This pattern finds the first _ that is immediately followed by one or more digits and captures those digits in group 1.
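To check the pattern against both URLs from the question, a sketch along these lines may help. UrlNumberExtractor and numberAfterUnderscore are hypothetical names, and OptionalInt is just one way to handle a URL without a match.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions.
final class UrlNumberExtractor {
    private static final Pattern UNDERSCORE_NUMBER = Pattern.compile("_(\\d+)");

    // Returns the digits following the first underscore-plus-digits occurrence, if any.
    static OptionalInt numberAfterUnderscore(String url) {
        Matcher m = UNDERSCORE_NUMBER.matcher(url);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String a = "http://www.example.com/index.do/blah/1_44/asdf/asdf/asdf";
        String b = "http://www.example.com/index.do/blah/1_44_2342/asdf/asdf/asdf";
        System.out.println(numberAfterUnderscore(a).getAsInt()); // 44
        System.out.println(numberAfterUnderscore(b).getAsInt()); // 44
    }
}
```

In the second URL the digits stop at the next _, so only 44 is captured and 2342 is ignored.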
