java parse regex multiple capture groups

Hi, I need to be able to handle both of these inputs:
John, party of 4
william, party of 6 dislikes John, jeff
What I want to capture is:
From string 1: John, 4
From string 2: william, 6, John, jeff
I'm pretty stumped on how to achieve this.
I know that ([^,]+) gives me the first group (just the name before the comma, without including the comma), but I have no clue how to write the rest of the expression.

You may use
(\w+)(?:,\s*party of (\d+)|(?![^,]))
See the regex demo.
Details
(\w+) - Group 1: one or more word chars
(?:,\s*party of (\d+)|(?![^,])) - a non-capturing group matching either
,\s*party of (\d+) - a comma, then 0+ whitespaces, then the literal text "party of" followed by a space, and then Group 2 capturing 1+ digits
| - or
(?![^,]) - a negative lookahead that succeeds only at a location followed by a comma or by the end of the string.
Java demo:
String regex = "(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))";
List<String> strings = Arrays.asList("John, party of 4", "william, party of 6 dislikes John, jeff");
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
    System.out.println("-------- Testing '" + s + "':");
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        System.out.println(matcher.group(1) + ": " + (matcher.group(2) != null ? matcher.group(2) : "N/A"));
    }
}
Output:
-------- Testing 'John, party of 4':
John: 4
-------- Testing 'william, party of 6 dislikes John, jeff':
william: 6
John: N/A
jeff: N/A
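If the goal is the flat list the question asks for rather than name/size pairs, the same pattern can feed a list. The following is only a sketch: the class name PartyParser and the helper extract are made up for illustration, and the party size is simply skipped when Group 2 did not participate in the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartyParser {
    // Same pattern as in the answer above.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))");

    // Hypothetical helper: collects each name, plus the party size when present, in order.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group(1));
            if (matcher.group(2) != null) {
                values.add(matcher.group(2));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("John, party of 4"));                        // [John, 4]
        System.out.println(extract("william, party of 6 dislikes John, jeff")); // [william, 6, John, jeff]
    }
}
```

Note that the names keep their original case, so the second string yields John rather than john.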

Related

Find two substrings using one regex

An external app sends the following line:
U999;U999;$SMS=;client: John Doe; A$ABC12345;, SHA:12345ABCDE
I need to extract 2 values from it: John Doe and 12345ABCDE
Right now I can extract those 2 values separately using these regexes:
(?=client:(.*?);) for John Doe
(?=SHA:(.*?)$) for 12345ABCDE
Is it possible to extract both values using one regex in a Pattern and get them as a list of 2 values?

You could use a pattern matcher with two capture groups:
String input = "U999;U999;$SMS=;client: John Doe; A$ABC12345;, SHA:12345ABCDE";
String pattern = "^.*;\\s*client: ([^;]+);.*;.*\\bSHA:([^;]+).*$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
    System.out.println("client: " + m.group(1));
    System.out.println("SHA: " + m.group(2));
}
This prints:
client: John Doe
SHA: 12345ABCDE
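Since the question asks for the two values as a list, here is a minimal sketch of one way to wrap this up. It uses a shorter, unanchored pattern than the one above, and it assumes the SHA value contains no whitespace; the class name ClientShaExtractor and the method extract are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClientShaExtractor {
    // Group 1: text after "client:" up to the next ';'. Group 2: non-whitespace run after "SHA:".
    private static final Pattern PATTERN =
            Pattern.compile("client:\\s*([^;]+);.*\\bSHA:(\\S+)");

    // Returns [client, sha], or an empty list when the line does not match.
    static List<String> extract(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        String input = "U999;U999;$SMS=;client: John Doe; A$ABC12345;, SHA:12345ABCDE";
        System.out.println(extract(input)); // [John Doe, 12345ABCDE]
    }
}
```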

how to format string line that contains numbers with regex

I want to format telephone numbers that come in any of the following formats:
359878123456
0878123456
00359878123456
They are stored in a file that holds a name and a phone number on each line, in the following format:
DarkoT 00359878123456
Only the numbers should be rewritten in a standard form (the name is left alone), as shown below:
DarkoT +359 87 2 123456
This applies to all three cases.
This is where I am at (my regex):
System.out.println(String.valueOf(inputLine).replaceAll("((\\+|00)359|0)(\\-|\\s)?8[7-9][2-9](\\-|\\s)?\\d{3}(\\s|\\-)?\\d{3}$", "($1)-\\$"));
I am confused with the placement. Please advise.

One solution without regex: you could just split the string and group the digits from the back. I would prefer doing it with a regex, though, so here goes.
I assume the number has this format (if that is not correct, this will not work):
[OPTIONAL] 00 (length: 2)
[OPTIONAL] 111 (Considering always 3 numbers) (length: 3)
If you don't have the former: 0 (changing zones, I understand?) (length: 1)
22 (length: 2)
3 (length: 1)
444444 (length: 6)
And now the regex to capture this:
(?:(?:00)?(\d{3})|0)(\d{2})(\d{1})(\d{6})
You will have, as a result:
Group 1: 3 digits (country code?) or nothing (if it's the '0').
Group 2: 2 digits (zone?)
Group 3: 1 digit (no idea, in my country we don't use this)
Group 4: last 6 digits
Using a replace has some limitations, so I would use a matcher instead. It is as easy as:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(originalNumber);
if (m.find()) {
    // Group 1 is null when the number starts with a single 0, so fall back to the default code.
    String nationalCode = m.group(1) != null ? m.group(1) : DEFAULT_NATIONAL_CODE;
    formattedNumber = "+" + nationalCode + " " + m.group(2) + " " + m.group(3) + " " + m.group(4);
}
If you want more flexibility (for example, country codes with 2 digits, not only 3), let me know and I will change the regex.
NOTE: I didn't test this, just coded off the top of my head, let me know if it fails.
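For anyone who wants to try the idea, here is a self-contained sketch of the matcher approach. Several things in it are assumptions and not part of the answer above: the default national code is taken to be 359 (the code used in the question), the pattern is anchored with ^ and $ so that only a whole number matches, and the names PhoneFormatter and format are invented. A leading + is not handled by this pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFormatter {
    // Assumption: numbers written with a leading 0 belong to country code 359.
    private static final String DEFAULT_NATIONAL_CODE = "359";

    // The answer's regex, anchored so that the whole input must be a phone number.
    private static final Pattern PATTERN =
            Pattern.compile("^(?:(?:00)?(\\d{3})|0)(\\d{2})(\\d)(\\d{6})$");

    // Returns the formatted number, or null when the input does not match.
    static String format(String originalNumber) {
        Matcher m = PATTERN.matcher(originalNumber);
        if (!m.matches()) {
            return null;
        }
        String nationalCode = m.group(1) != null ? m.group(1) : DEFAULT_NATIONAL_CODE;
        return "+" + nationalCode + " " + m.group(2) + " " + m.group(3) + " " + m.group(4);
    }

    public static void main(String[] args) {
        for (String n : new String[] {"359878123456", "0878123456", "00359878123456"}) {
            System.out.println(n + " -> " + format(n)); // each prints "+359 87 8 123456"
        }
    }
}
```

To process a file line such as "DarkoT 00359878123456", the name can be split off first (for example on the last space) and only the number passed to format.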

I think this is a continuation of your previous question, "How to check exact phone number in Java with regex".
Sample.java
import java.util.*;
import java.util.regex.*;
import java.io.*;

public class Sample {
    public static void main(String[] args) {
        try {
            File inFile = new File("phonebook.txt");
            Scanner sc = new Scanner(inFile);
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                // Only lines ending in a valid number (+359, 00359 or 0 prefix, then 8[7-9][2-9] and six digits) are reformatted.
                if (line.matches("^((.)*\\s)?((\\+|00)359|0)8[7-9][2-9]\\d{6}$")) {
                    System.out.println(line.replaceAll("^([^\\s\\+]*\\s?)?((\\+|00)?359|0)[-\\s]?(8[7-9][2-9])[-\\s]?(\\d{3})[-\\s]?(\\d{3})$", "$1 - $2 $4 $5 $6").replaceAll("00359","+359").replaceAll("- 0","+359"));
                }
            }
            sc.close();
        } catch (Exception e) {
            // Report failures (for example, a missing phonebook.txt) instead of silently ignoring them.
            e.printStackTrace();
        }
    }
}
phonebook.txt
Sagar +359883123456
Test 00359883123565
Someone 0883123456
People 1234567890
Test1 +359873123456
Output
C:\Users\acer\Desktop\Java\programs>javac Sample.java
C:\Users\acer\Desktop\Java\programs>java Sample
Sagar - +359 883 123 456
Test - +359 883 123 565
Someone +359 883 123 456
Test1 - +359 873 123 456

Regular expression split string on colon

I have a string:
String l = "name: kumar age: 22 relationship: single ";
It comes from the UI dynamically. I need to split the above string into:
name: kumar
age: 22
relationship: single
My code is:
Pattern ptn = Pattern.compile("([^\\s]+( ?= ?[^\\s]*)?)");
Matcher mt = ptn.matcher(l);
while (mt.find()) {
    String col_dat = mt.group(0);
    if (col_dat != null && col_dat.length() > 0) {
        System.out.println("\t" + col_dat);
    }
}
Any suggestions will be appreciated. Thank you.

You can use this regex:
\S+\s*:\s*\S+
Or this:
\w+\s*:\s*\w+
Demo: https://regex101.com/r/EgXlcD/6
Regex:
\S+ - 1 or more non-whitespace characters
\s* - 0 or more whitespace characters
: - a literal colon
\w+ - 1 or more \w characters, i.e. [A-Za-z0-9_]
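A short sketch of how the second regex might be applied to the string from the question. The class name KeyValueSplitter and the method pairs are illustrative only, and the sketch assumes keys and values are single words, as in the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueSplitter {
    // word, optional whitespace, colon, optional whitespace, word
    private static final Pattern PAIR = Pattern.compile("\\w+\\s*:\\s*\\w+");

    // Returns every "key: value" match found in the input, in order.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String l = "name: kumar age: 22 relationship: single ";
        for (String pair : pairs(l)) {
            System.out.println(pair);
        }
        // Prints:
        // name: kumar
        // age: 22
        // relationship: single
    }
}
```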

Extract a particular number from a string using regex in java

Here is my string
INPUT:
22 TIRES (2 defs)
1 AP(PEAR + ANC)E (CAN anag)
6 CHIC ("SHEIK" hom)
EXPECTED OUTPUT:
22 TIRES
1 APPEARANCE
6 CHIC
ACTUAL OUTPUT :
TIRES
APPEARANCE
CHIC
I tried the code below and got the above output.
String firstnames = a.split(" \\(.*")[0].replace("(", "").replace(")", "").replace(" + ", "");
Any idea how to extract the text along with the leading numbers? I don't want the numbers that come after the parentheses, as in the input "22 TIRES (2 defs)"; I need the output "22 TIRES". Any help would be great!

I would do it a bit differently:
String line = "22 TIRES (2 defs)\n\n1 AP(PEAR + ANC)E (CAN anag)\n\n6 CHIC (\"SHEIK\" hom)";
String pattern = "(\\d+\\s+)(.*)\\(";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    // Group 1 is the leading number; Group 2 is the word, stripped of every non-word character.
    String tmp = m.group(1) + m.group(2).replaceAll("[^\\w]", "");
    System.out.println(tmp);
}
Ideone Demo

I would use a single replaceAll function.
str.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
DEMO
\\s+\\(.* matches one or more whitespace characters, the ( character that follows them, and all the remaining characters on the line. This is how the (CAN anag) part in your example gets matched.
\\s*\\+\\s* matches + along with any preceding and following whitespace.
[()] matches an opening or closing parenthesis.
Finally, all the matched characters are replaced by an empty string.
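As a quick check, the replaceAll call can be applied to each input line separately. This is a sketch only; the class name ClueCleaner and the method clean are not from the answer.

```java
class ClueCleaner {
    // Removes the trailing "(...)" annotation, any " + " joiner and leftover parentheses.
    static String clean(String line) {
        return line.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
    }

    public static void main(String[] args) {
        String[] lines = {
            "22 TIRES (2 defs)",
            "1 AP(PEAR + ANC)E (CAN anag)",
            "6 CHIC (\"SHEIK\" hom)"
        };
        for (String line : lines) {
            System.out.println(clean(line));
        }
        // Prints:
        // 22 TIRES
        // 1 APPEARANCE
        // 6 CHIC
    }
}
```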

Java Split String by colon on both side

Can you suggest an approach for splitting a String like this:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
I tried to parse the string with this regular expression:
:[A-za-z]|\\d:
but it is not working. Please suggest a regular expression that splits the string with 20, 31C, 31D, etc. as keys and 150318, 150425 IN BANGLADESH, etc. as values.
Using string.split(":") would not serve my purpose.
If a string looks like:
:20: MY VALUES : ARE HERE
then it will be split into 3 strings: key 20 will be associated with "MY VALUES", and "ARE HERE" will not be associated with key 20.

You may use matching instead of splitting, since you need to match a specific colon in the string.
A regex that captures the text between the first and second colon into one group, and everything after the second colon into another, looks like this:
^:([^:]*):(.*)$
See demo. The ^ asserts the beginning of the string, ([^:]*) matches and captures into Group 1 zero or more characters other than :, and (.*) matches and captures into Group 2 the rest of the string. $ asserts the position at the end of a single-line string (since . matches any symbol but a newline unless the Pattern.DOTALL modifier is used).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE
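If the whole message arrives as one multi-line string, the same idea can be extended with Pattern.MULTILINE so that ^ and $ apply to every line. The sketch below makes a few assumptions that go beyond the answer: the key class is narrowed to [^:\r\n]* so a key cannot span lines, values are trimmed, a LinkedHashMap is used to keep the original order, and the names TagParser and parse are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // One ":key:value" entry per line; MULTILINE makes ^ and $ match at line boundaries.
    private static final Pattern LINE =
            Pattern.compile("^:([^:\\r\\n]*):(.*)$", Pattern.MULTILINE);

    // Maps each key to the (trimmed) rest of its line, keeping the input order.
    static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(message);
        while (m.find()) {
            fields.put(m.group(1), m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String message = ":31C:150318\n:31D:150425 IN BANGLADESH\n:20: MY VALUES : ARE HERE";
        System.out.println(parse(message));
        // {31C=150318, 31D=150425 IN BANGLADESH, 20=MY VALUES : ARE HERE}
    }
}
```

Colons inside a value stay part of the value, which is the case that plain split(":") gets wrong.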

You can use the following to split:
^[:]+([^:]+):

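Try the split function of the String class:
String[] splited = string.split(":");
For your requirements, where the value may itself contain colons, use substring and indexOf instead, so that only the first colon after the key separates key and value: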
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa
