Regular expression to split a string into 5 parts as below - java

This is my input string and I wanted to break it up into 5 parts according to regex below so that I can print out the 5 groups but I always get a no match found. What am I doing wrong ?
String content="beit Be it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,,<m>Surface Transportation Extension Act of 2012.,<xm>";
Pattern regEx = Pattern.compile("^(.*)(<m>)(.*)(<xm>)(.*)$", Pattern.MULTILINE);
System.out.println(regEx.matcher(content).group(1));
System.out.println(regEx.matcher(content).group(2));
System.out.println(regEx.matcher(content).group(3));
System.out.println(regEx.matcher(content).group(4));
System.out.println(regEx.matcher(content).group(5));

Pattern regEx = Pattern.compile("^(.*)(<m>)(.*)(<xm>)(.*)$", Pattern.MULTILINE);
Matcher matcher = regEx.matcher(content);
if (matcher.find()) { // calling find() is important
// if the regex matches multiple times, use while instead of if
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
System.out.println(matcher.group(5));
} else {
System.out.println("Regex didn't match");
}

Your regex's 5th match doesn't match anything - there is no content after the <xm>. Also, you should really run regEx.matcher() once, and then pull the groups out of the one matcher; as written, it executes the regex 5 times, once to get each group out. Also your RegEx is never executed unless you call find() or matches.

Related

Java Pattern Matcher is not working for regex as expected

1) Pattern pattern = Pattern.compile("34238");
Matcher matcher = pattern.matcher("6003 Honore Ave Suite 101 Sarasota Florida,
34238");
if (matcher.find()) {
System.out.println("ok");
}
2) Pattern pattern = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");
Matcher matcher = pattern.matcher("34238");
if (matcher.find()) {
System.out.println("ok");
}
Output for the above code is: ok
But the following code is not printing anything:
Pattern pattern = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");
Matcher matcher = pattern.matcher("6003 Honore Ave Suite 101 Sarasota Florida, 34238");
if (matcher.find()) {
System.out.println("ok");
}
What is the reason for this not to print ok? I am using the same pattern here also.
Although the pattern is the same, the input strings are different:
In your second example, you are matching a string consisting entirely of a zip code, so you get a match for ^...$ expression
The second example does not start with the zip code, so the ^ anchor prevents your regex from matching.
^ and $ anchors are used when you want your expression to match the entire input line. When you want to match at the beginning, keep ^ and remove $; when you want to match at the end, remove ^ and keep $; when you want to match anywhere inside the string, remove both anchors.
The code is good and working as expected. In the 2) and 3) block in your question you are using the same regex but different input strings.
However, if you just want to check if a string must contain a US zip code, then the problem is that your regex is using anchors, so you are only matching lines that starts and finish with a zip code.
The strings that matches your regex are like 34238 or 34238-1234 and won't match something 12345 something.
If you remove the anchors, then you will match whatever 12345 whatever:
// Pattern pattern = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");
// ^--------- Here -------^
Pattern pattern = Pattern.compile("[0-9]{5}(?:-[0-9]{4})?");
Matcher matcher = pattern.matcher("6003 Honore Ave Suite 101 Sarasota Florida, 34238");
if (matcher.find()) {
System.out.println("ok");
}
Btw, if you just want to check if a string contains a zip code, then you can use String.matches(..), like this:
String str = "6003 Honore Ave Suite 101 Sarasota Florida, 34238";
if (str.matches(".*[0-9]{5}(?:-[0-9]{4})?.*")) {
System.out.println("ok");
}
IDEOne demo

Is it possible to use a quantifier on a non-capturing group ? - regex

I catch a group by regex and I would like to catch
everything but not the group(s).
So group can have several occurences, on different locations, in the String.
My first thought was, to solve it with negativ lookahead but I failed with it. Therefore I tried it with non capturing group and I stuck here too.
(bar) (baz) foo
I want foo.
This is what I have so far:
String input = "(bar) (baz) foo";
String matchesGroup = "((?=\\().*?\\))"; //matches (...)
// as Casimir et Hippolyte commented, I know use
// ((?:(...))+) for the non capturing group
String matchesFoo = "((?:"+ matchesGroup +")+)\\s(.*)";
Pattern pattern = Pattern.compile(matchesFoo);
Matcher matcher = pattern.matcher(input);
while (matcher.find()){
System.out.println(matcher.group());
}
but there is nothing captured at all
actual :
expected : foo
Where is my fault in the regex ?
since you want to match multiple (...) groups, account for the possible trailing space and move the + to quantify one or more of those (I moved the space into the group, and the + quantifying that whole structure)
String matchesFoo = "(?:(?:(?=\\().*?\\))\\s?)+(.*)";
demo here
Why don't you try this:
String testtring = "matches matches foo";
testString = testString.replaceAll("matches", "");
System.out.println(testString);

Android Java regexp pattern

I ping a host. In result a standard output. Below a REGEXP but it do not work correct. Where I did a mistake?
String REGEXP ="time=(\\\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
You only need \\d+ in your regex because
Matcher looks for the pattern (using which it is created) and then tries to find every occurance of the pattern in the string being matched.
Use while(matcher.group(1) in case of multiple occurances.
each () represents a captured group.
You have too many backslashes. Assuming you want to get the number from a string like "time=32ms", then you need:
String REGEXP ="time=(\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
Explanation: The search pattern you are looking for is "\d", meaning a decimal number, the "+" means 1 or more occurrences.
To get the "\" to the matcher, it needs to be escaped, and the escape character is also "\".
The brackets define the matching group that you want to pick out.
With "\\\\d+", the matcher sees this as "\\d+", which would match a backslash followed by one or more "d"s. The first backslash protects the second backslash, and the third protects the fourth.

Java String matches and replaceAll differ in matching parentheses

I have strings with parentheses and also escaped characters. I need to match against these characters and also delete them. In the following code, I use matches() and replaceAll() with the same regex, but the matches() returns false, while the replaceAll() seems to match just fine, because the replaceAll() executes and removes the characters. Can someone explain?
String input = "(aaaa)\\b";
boolean matchResult = input.matches("\\(|\\)|\\\\[a-z]+");
System.out.printf("matchResult=%s\n", matchResult);
String output = input.replaceAll("\\(|\\)|\\\\[a-z]+", "");
System.out.printf("INPUT: %s --> OUTPUT: %s\n", input, output);
Prints out:
matchResult=false
INPUT: (aaaa) --> OUTPUT: aaaa
matches matches the whole input, not part of it.
The regular expression \(|\)|\\[a-z]+ doesn't describe the whole word, but only parts of it, so in your case it fails.
What matches is doing has already been explained by Binyamin Sharet. I want to extend this a bit.
Java does not have a "findall" or a "g" modifier like other languages have it to get all matches at once.
The Java Matcher class knows only two methods to use a pattern against a string (without replacing it)
matches(): matches the whole string against the pattern
find(): returns the next match
If you want to get all things that fits your pattern, you need to use find() in a loop, something like this:
Pattern p = Pattern
.compile("\\(|\\)|\\\\[a-z]+");
Matcher m = p.matcher(text);
while(m.find()){
System.out.println(m.group(0));
}
or if you are only interested if your pattern exists in the string
if (m.find()) {
System.out.println(m.group());
} else {
System.out.println("not found");
}

Author and time matching regex

I would to use a regex in my Java program to recognize some feature of my strings.
I've this type of string:
`-Author- has wrote (-hh-:-mm-)
So, for example, I've a string with:
Cecco has wrote (15:12)
and i've to extract author, hh and mm fields. Obviously I've some restriction to consider:
hh and mm must be numbers
author hasn't any restrictions
I've to consider space between "has wrote" and (
How can I can use regex?
EDIT: I attach my snippet:
String mRegex = "(\\s)+ has wrote \\((\\d\\d):(\\d\\d)\\)";
Pattern mPattern = Pattern.compile(mRegex);
String[] str = {
"Cecco CQ has wrote (14:55)", //OK (matched)
"yesterday you has wrote that I'm crazy", //NO (different text)
"Simon has wrote (yesterday)", // NO (yesterday isn't numbers)
"John has wrote (22:32)", //OK
"James has wrote(22:11)", //NO (missed space between has wrote and ()
"Tommy has wrote (xx:ss)" //NO (xx and ss aren't numbers)
};
for(String s : str) {
Matcher mMatcher = mPattern.matcher(s);
while (mMatcher.find()) {
System.out.println(mMatcher.group());
}
}
homework?
Something like:
(.+) has wrote \((\d\d):(\d\d)\)
Should do the trick
() - mark groups to capture (there are three in the above)
.+ - any chars (you said no restrictions)
\d - any digit
\(\) escape the parens as literals instead of a capturing group
use:
Pattern p = Pattern.compile("(.+) has wrote \\((\\d\\d):(\\d\\d)\\)");
Matcher m = p.matcher("Gareth has wrote (12:00)");
if( m.matches()){
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
To cope with an optional (HH:mm) at the end you need to start to use some dark regex voodoo:
Pattern p = Pattern.compile("(.+) has wrote\\s?(?:\\((\\d\\d):(\\d\\d)\\))?");
Matcher m = p.matcher("Gareth has wrote (12:00)");
if( m.matches()){
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
m = p.matcher("Gareth has wrote");
if( m.matches()){
System.out.println(m.group(1));
// m.group(2) == null since it didn't match anything
}
The new unescaped pattern:
(.+) has wrote\s?(?:\((\d\d):(\d\d)\))?
\s? optionally match a space (there might not be a space at the end if there isn't a (HH:mm) group
(?: ... ) is a none capturing group, i.e. allows use to put ? after it to make is optional
I think #codinghorror has something to say about regex
The easiest way to figure out regular expressions is to use a testing tool before coding.
I use an eclipse plugin from http://www.brosinski.com/regex/
Using this I came up with the following result:
([a-zA-Z]*) has wrote \((\d\d):(\d\d)\)
Cecco has wrote (15:12)
Found 1 match(es):
start=0, end=23
Group(0) = Cecco has wrote (15:12)
Group(1) = Cecco
Group(2) = 15
Group(3) = 12
An excellent turorial on regular expression syntax can be found at http://www.regular-expressions.info/tutorial.html
Well, just in case you didn't know, Matcher has a nice function that can draw out specific groups, or parts of the pattern enclosed by (), Matcher.group(int). Like if I wanted to match for a number between two semicolons like:
:22:
I could use the regex ":(\\d+):" to match one or more digits between two semicolons, and then I can fetch specifically the digits with:
Matcher.group(1)
And then its just a matter of parsing the String into an int. As a note, group numbering starts at 1. group(0) is the whole match, so Matcher.group(0) for the previous example would return :22:
For your case, I think the regex bits you need to consider are
"[A-Za-z]" for alphabet characters (you could probably also safely use "\\w", which matchers alphabet characters, as well as numbers and _).
"\\d" for digits (1,2,3...)
"+" for indicating you want one or more of the previous character or group.

Categories

Resources