Getting overlapping matches with multiple patterns in Java regex - java

I have the same problem as in this link
but with multiple patterns. My regex is like:
Pattern word = Pattern.compile("([\\w]+ [\\d]+)|([\\d]+ suite)|([\\w]+ road)");
If my sample text is,
XYZ Road 123 Suite
My desired output is,
XYZ Road 123
123 suite
But I am getting
XYZ Road 123
only.
Thanks in advance!

You could try the below regex which uses positive lookahead assertion.
(?=(\b\w+ Road \d+\b)|(\b\d+ suite\b))
String s = "XYZ Road 123 Suite";
Matcher m = Pattern.compile("(?i)(?=(\\b\\w+ Road \\d+\\b)|(\\b\\d+ suite))").matcher(s);
while(m.find())
{
if(m.group(1) != null) System.out.println(m.group(1));
if(m.group(2) != null) System.out.println(m.group(2));
}
Output:
XYZ Road 123
123 Suite

(?=(\b[\w]+ [\d]+))|(?=(\b[\d]+ suite))|(?=(\b[\w]+ road))
Try this. See the demo and grab the captures.
https://regex101.com/r/dU7oN5/16
Use a positive lookahead so that the string is not consumed.
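As a rough Java sketch of how the captures could be collected from that three-lookahead pattern (the class and method names here are made up for illustration, and case-insensitive matching is an assumption so that "Road" and "Suite" match):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {
    // Every alternative is a zero-width lookahead, so a match consumes no input
    // and a later match may capture text that overlaps an earlier capture.
    private static final Pattern P = Pattern.compile(
        "(?=(\\b\\w+ \\d+))|(?=(\\b\\d+ suite))|(?=(\\b\\w+ road))",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every non-null capture, in match order.
    static List<String> captures(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    out.add(m.group(g));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [XYZ Road, Road 123, 123 Suite]
        System.out.println(captures("XYZ Road 123 Suite"));
    }
}
```

Note that at any one position only the first alternative that succeeds reports a capture, and that this pattern yields "Road 123" rather than "XYZ Road 123"; the more specific pattern from the first answer is needed for the exact output asked for.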

Something like this, maybe?
Pattern p = Pattern.compile("([\\w ]+ Road) (\\d+) (Suite)");
Matcher m = p.matcher(input);
if (m.find()) {
System.out.println(m.group(1) + " " + m.group(2));
System.out.println(m.group(2) + " " + m.group(3));
}

Related

How to get only First Name Last Name from LDAP CN when format is last name\, first name

CN=Belzile\, Pierre,OU=LaptopUser,OU=Users,DC=Company,DC=local
I need only "Belzile Pierre" to be returned.
I need help with the regex syntax
For the regular expression we use Java syntax https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html.
Expected Result:
Belzile Pierre
You can use this regex to capture the last name in group 1 and the first name in group 2,
CN=([a-zA-Z]+)\\,\s+([a-zA-Z]+)
Java code,
String s = "CN=Belzile\\, Pierre,OU=LaptopUser,OU=Users,DC=Company,DC=local";
Pattern p = Pattern.compile("CN=([a-zA-Z]+)\\\\,\\s+([a-zA-Z]+)");
Matcher m = p.matcher(s);
if(m.find()) {
System.out.println(m.group(1) + " " + m.group(2));
}
Prints your expected output,
Belzile Pierre

Extract values from string using regex groups

I have to extract values from string using regex groups.
Inputs are like this,
-> 1
-> 5.2
-> 1(2)
-> 3(*)
-> 2(3).2
-> 1(*).5
Now I write following code for getting values from these inputs.
String stringToSearch = "2(3).2";
Pattern p = Pattern.compile("(\\d+)(\\.|\\()(\\d+|\\*)\\)(\\.)(\\d+)");
Matcher m = p.matcher(stringToSearch);
if (m.find()) { // group() throws IllegalStateException unless a match was attempted first
System.out.println("1: "+m.group(1)); // O/P: 2
System.out.println("3: "+m.group(3)); // O/P: 3
System.out.println("5: "+m.group(5)); // O/P: 2
}
But my problem is that only the first group is compulsory; the others are optional.
That's why I need a regex that checks all of these patterns and extracts the values.
Use non-capturing groups and make them optional by adding the ? quantifier after each group.
^(\d+)(?:\((\d+|\*)\))?(?:\.(\d+))?$
Java regex would be,
"(?m)^(\\d+)(?:\\((\\d+|\\*)\\))?(?:\\.(\\d+))?$"
Example:
String input = "1\n" +
"5.2\n" +
"1(2)\n" +
"3(*)\n" +
"2(3).2\n" +
"1(*).5";
Matcher m = Pattern.compile("(?m)^(\\d+)(?:\\((\\d+|\\*)\\))?(?:\\.(\\d+))?$").matcher(input);
while(m.find())
{
if (m.group(1) != null)
System.out.println(m.group(1));
if (m.group(2) != null)
System.out.println(m.group(2));
if (m.group(3) != null)
System.out.println(m.group(3));
}
Here is an alternate approach that is simpler to understand.
First, replace every run of non-digit, non-* characters with a colon.
Then split on :
Code:
String repl = input.replaceAll("[^\\d*]+", ":");
String[] tok = repl.split(":");
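To see what the replace-then-split approach yields for each sample input, here is a small sketch (the class name SplitTokens and the helper tokens are hypothetical, not part of the original answer):

```java
import java.util.Arrays;

class SplitTokens {
    // Hypothetical helper wrapping the two steps described above.
    static String[] tokens(String input) {
        // Collapse each run of characters that are neither digits nor '*' into one colon.
        String repl = input.replaceAll("[^\\d*]+", ":");
        // split() drops trailing empty strings, so "1:2:" yields [1, 2].
        return repl.split(":");
    }

    public static void main(String[] args) {
        String[] samples = {"1", "5.2", "1(2)", "3(*)", "2(3).2", "1(*).5"};
        for (String s : samples) {
            System.out.println(s + " -> " + Arrays.toString(tokens(s)));
        }
    }
}
```

One trade-off of this approach: "5.2" and "1(2)" both produce two tokens, so the split alone does not tell you whether the second value came from the parentheses or from after the dot.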

Find multiple string matches using Java regex

I am trying to use regex to find a match for a string between Si and (P) or Si and (I).
Below is what I wrote. Why isn't it working and how do I fix it?
String Channel = "Si0/4(I) Si0/6( Si0/8K Si0/5(P)";
if (Channel.length() > 0) {
String pattern1 = "Si";
String pattern2 = "(P)";
String pattern3 = "(I)";
String P1 = Pattern.quote(pattern1) + "(.*?)[" + Pattern.quote(pattern2) + "|" + Pattern.quote(pattern3) + "]";
Pattern p = Pattern.compile(P1);
Matcher m = p.matcher(Channel);
while(m.find()){
if (m.group(1)!= null)
{
System.out.println(m.group(1));
}
else if (m.group(2)!= null)
{
System.out.println(m.group(2));
}
}
}
Expected output
0/4
0/5
Actual output
0/4
0/6
0/8K Si0/5
Use a lookbehind and a lookahead in your regex. You also need to add a space inside the character class, so that it won't match the 0/8K string.
(?<=Si)[^\\( ]*(?=\\((?:P|I)\\))
String str="Si0/4(I) Si0/6( Si0/8K Si0/5(P)";
String regex="(?<=Si)[^\\( ]*(?=\\([PI]\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher =pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(0));
}
Output:
0/4
0/5
You need to group your regex. It is currently
Si(.*?)[(P)|(I)]
Whereas it should be
Si(.*?)\(I\)|Si(.*?)\(P\)
See demo.
http://regex101.com/r/oO8zI4/8
[] means "any one of these characters", so every character inside the brackets is treated as an alternative, as if they were separated by OR.
If the result you're searching is always: number/number
You can use:
Si(\d+\/\d+)(?:\(P\)|\(I\))
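A minimal Java sketch of that last pattern, assuming the value between Si and the marker is always number/number (the class and method names are invented for this example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChannelMatcher {
    // Group 1 is the number/number part; the trailing (P) or (I) must follow
    // immediately, which rules out "Si0/6(" and "Si0/8K".
    private static final Pattern P =
        Pattern.compile("Si(\\d+/\\d+)(?:\\(P\\)|\\(I\\))");

    // Hypothetical helper: collects group 1 of every match.
    static List<String> extract(String channel) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(channel);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [0/4, 0/5]
        System.out.println(extract("Si0/4(I) Si0/6( Si0/8K Si0/5(P)"));
    }
}
```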

Group Matching Regex fails in Java

Why does this regex pattern fail to match the groups in Java? When I run the same example in a bash shell with echo and sed, it works.
String s = "Match foo and bar and baz";
//Pattern p = Pattern.compile("Match (.*) or (.*) or (.*)"); //was a typo
Pattern p = Pattern.compile("Match (.*) and (.*) and (.*)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
I am expecting to match foo, bar, and baz.
$ echo "Match foo and bar and baz" | sed 's/Match \(.*\) and \(.*\) and \(.*\)/\1, \2, \3/'
foo, bar, baz
It is due to the greedy nature of .*. You can use this regex:
Pattern p = Pattern.compile("Match (\\S+) and (\\S+) and (\\S+)");
This regex uses \\S+, which matches one or more non-whitespace characters.
Full code
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1) + ", " + m.group(2) + ", " + m.group(3));
}
You're trying to match the whole String, so
while (m.find()) {
will only iterate once.
That single find() will capture all the groups. As such, you can print them out as
System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3));
Or use a for loop over the Matcher#groupCount().
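The loop over Matcher#groupCount() could look roughly like this (a sketch only; GroupLoop and groups are names made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupLoop {
    // Hypothetical helper: returns all capturing groups of the first match.
    static List<String> groups(String s) {
        Matcher m = Pattern.compile("Match (.*) and (.*) and (.*)").matcher(s);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            // Group 0 is the whole match, so capturing groups start at index 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz]
        System.out.println(groups("Match foo and bar and baz"));
    }
}
```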
Your regex is correct, but you need to print all of the groups and not only the first, e.g.:
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
It seems like a simple typo (or -> and):
Pattern p = Pattern.compile("Match (.*) and (.*) and (.*)");
UPDATE
To replace:
String s = "Match foo and bar and baz";
String replaced = s.replaceAll("Match (.*) and (.*) and (.*)", "$1, $2, $3");
System.out.println(replaced);

Regular expression for mobile number validation?

I have the following regular expression for mobile numbers:
^(([+]|[0]{2})([\\d]{1,3})([\\s-]{0,1}))?([\\d]{10})$
Valid numbers are:
+123-9854875847
00123 9854875847
+123 9854875847
9878757845
The expression above rejects a bare 9- or 11-digit mobile number. But if the user enters a 9-digit number with +123, or an 11-digit number with +91, it validates, because in this part of the expression, ([\\d]{1,3}), the last two digits are optional.
So, is there any way to stop this part ([\\s-]{0,1}))?([\\d]{10}) from combining with this part ([\\d]{1,3})?
The question is somewhat unclear, but I presume you want to split the number and the country code.
This is quite easy to do by extracting groups. group(i) is the i-th thing in brackets.
I also applied these simplifications: [\\d] = \\d, {0,1} = ?, [+] = \\+, [0]{2} = 00.
Code:
String regex = "^((\\+|00)(\\d{1,3})[\\s-]?)?(\\d{10})$";
String str = "+123-9854875847";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
if (m.matches())
{
System.out.println("Country = " + m.group(3));
System.out.println("Data = " + m.group(4));
}
Output:
Country = 123
Data = 9854875847
Alternative using non-capturing groups (?:), so you can use group(1) and group(2):
String regex = "^(?:(?:\\+|00)(\\d{1,3})[\\s-]?)?(\\d{10})$";
String str = "+123-9854875847";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
if (m.matches())
{
System.out.println("Country = " + m.group(1));
System.out.println("Data = " + m.group(2));
}
As long as the extension is always separated from the rest of the phone number, your regex will work fine. If there is no such separation, there is no way to correctly validate a phone number.
Also keep in mind that both extensions and phone numbers can vary in length from country to country, so there is no regex that will solve all cases. If you can produce a list of allowed extensions, you can work that into the regex and get better matches, but for many groups of arbitrary length of digits you will get many wrong matches.
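The separator point can be checked with a short sketch. It reuses the non-capturing regex from the answer above; the class name, the helper describe, and the two shortened test numbers are assumptions chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountryCodeAmbiguity {
    private static final Pattern P =
        Pattern.compile("^(?:(?:\\+|00)(\\d{1,3})[\\s-]?)?(\\d{10})$");

    // Hypothetical helper: reports how the regex splits the input, if it matches.
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return "country=" + m.group(1) + " number=" + m.group(2);
    }

    public static void main(String[] args) {
        // With a separator, a 9-digit number is rejected.
        System.out.println(describe("+123 985487584"));  // no match
        // Without one, \d{1,3} backtracks to "12" and lends a digit to the number.
        System.out.println(describe("+123985487584"));   // country=12 number=3985487584
        System.out.println(describe("+123-9854875847")); // country=123 number=9854875847
    }
}
```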
I have simplified your regex a bit, so you can see @Dukeling's suggestions in practice. Your regex is on top, mine on the bottom.
^(([+]|[0]{2})([\\d]{1,3})([\\s-]{0,1}))?([\\d]{10})$
^( (\\+|00) \\d{1,3} [\\s-]?)? \\d{10} $
try {
String mobile_number="india number +919979045000\n" +
"india number 9979045000\n" +
"china number +86 591 2123654\n" +
"Brazil number +55 79 2012345\n" +
"it is test all string get mobile number all country"+
"Ezipt +20 10 1234567\n" +
"France +33 123456789\n" +
"Hong Kong +852 1234 5456\n" +
"Mexico +52 55 12345678"+
"thanks";
Pattern p = Pattern.compile("\\(?\\+[0-9]{1,3}\\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{5}( ?-?[0-9]{3})? ?(\\w{1,10}\\s?\\d{1,6})?");
List<String> numbers = new ArrayList<String>();
//mobile_number= mobile_number.replaceAll("\\-", "");
Matcher m = p.matcher("" + mobile_number);
while (m.find()) {
numbers.add(m.group());
}
p = Pattern.compile("\\(?\\+[0-9]{1,3}\\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})? ?(\\w{1,10}\\s?\\d{1,6})?");
m = p.matcher("" + mobile_number);
while (m.find()) {
numbers.add(m.group());
}
p = Pattern.compile("((?:|\\+)([0-9]{5})(?: |\\-)(0\\d|\\([0-9]{5}\\)|[1-9]{0,5}))");
m = p.matcher("" + mobile_number);
while (m.find()) {
numbers.add(m.group());
}
p = Pattern.compile("[0-9]{10}|\\(?\\+[0-9]{1,3}\\)?-?[0-9]{3,5} ?-?[0-9]{4}?");
m = p.matcher("" + mobile_number);
while (m.find()) {
numbers.add(m.group());
}
String numberArray=numbers.toString();
System.out.print(""+numberArray);
// final result
/* [+919979045000, +86 591 2123654, +33 123456789, +52 55 12345678, +919979045000, +86 591 2123654, +55 79 2012345, +20 10 1234567, +33 123456789, +852 1234 5456, +52 55 12345678, +919979045000, 9979045000] */
} catch (Exception e) {
e.printStackTrace();
}
The best way is to take the input in two parts, i.e. country code and mobile number.
In that case you can easily validate both the country code and the mobile number with regex.
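A minimal sketch of the two-part idea, assuming a country code of 1 to 3 digits with an optional + or 00 prefix and a 10-digit mobile number (both patterns and all names are assumptions; adjust them to the formats you actually accept):

```java
import java.util.regex.Pattern;

class TwoPartValidator {
    // Assumed formats, not taken from the question: optional "+" or "00" prefix,
    // then 1 to 3 digits for the country code; exactly 10 digits for the number.
    private static final Pattern COUNTRY = Pattern.compile("(?:\\+|00)?\\d{1,3}");
    private static final Pattern NUMBER = Pattern.compile("\\d{10}");

    // Each field is validated on its own, so digits cannot shift between them.
    static boolean isValid(String countryCode, String number) {
        return COUNTRY.matcher(countryCode).matches()
            && NUMBER.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("+123", "9854875847")); // true
        System.out.println(isValid("+123", "985487584"));  // false: only 9 digits
        System.out.println(isValid("+91", "98548758471")); // false: 11 digits
    }
}
```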
