Java - How do I code optional regex patterns with a matcher

Let’s say I am looping through a text file and come across the following two strings with random words and integer values
“foo 11 25”
“foo 38 15 976 24”
I write a regex pattern that would match both strings, for example:
((?:[a-z][a-z]+)\\s+\\d+\\s\\d+)
But, the problem is I don’t think this regex would allow me to get to all 4 integer values in the 2nd string.
Q1.) How can I create a single pattern that leaves these 3rd and 4th integers optional?
Q2.) How do I write the matcher code to only go after the 3rd and 4th values when they are found by the pattern?
Here is a template program to help anyone willing to offer a hand. Thanks.
public void foo(String fooFile) {
//Assume fooFile contains the two strings
//"foo 11 25";
//"foo 38 15 976 24";
Pattern p = Pattern.compile("((?:[a-z][a-z]+)\\s+\\d+\\s\\d+)", Pattern.CASE_INSENSITIVE);
BufferedReader br = new BufferedReader(new FileReader(fooFile));
String line;
while ((line = br.readLine()) != null) {
//Process the patterns
Matcher m1 = p.matcher(line);
if (m1.find()) {
int int1, int2, int3, int4;
//Need help to write the matcher code
}
}
}

If you want to retrieve every int value, you can use regex:
[a-z]+\s(\d+)\s(\d+)\s?(\d+)?\s?(\d+)?
and every int will be in groups 1 to 4. Then you can use something like:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args){
String[] strings = {"foo 11 25","foo 67 45 97",
"foo 38 15 976 24"};
for(String string : strings) {
ArrayList<Integer> numbers = new ArrayList<Integer>();
Matcher matcher = Pattern.compile("[a-z]+\\s(\\d+)\\s(\\d+)\\s?(\\d+)?\\s?(\\d+)?").matcher(string);
if (!matcher.find()) continue; // skip strings that do not match, otherwise group() would throw
for(int i = 0; i < 4; i++){
if(matcher.group(i+1) != null) {
numbers.add(Integer.valueOf(matcher.group(i + 1)));
}else{
System.out.println("group " + (i+1) + " is " + matcher.group(i+1));
}
}
System.out.println("Match from string: "+ "\""+ string + "\"" + " : " + numbers.toString());
}
}
}
with output:
group 3 is null
group 4 is null
Match from string: "foo 11 25" : [11, 25]
group 4 is null
Match from string: "foo 67 45 97" : [67, 45, 97]
Match from string: "foo 38 15 976 24" : [38, 15, 976, 24]
Another way would be to get all int in one group with:
[a-z]+\s((?:\d+\s?)+)
Then split matcher.group(1) on whitespace to get a String[] with the values. Implementation in Java:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args){
String[] strings = {"foo 11 25","foo 67 45 97",
"foo 38 15 976 24"};
for(String string : strings) {
ArrayList<Integer> numbers = new ArrayList<Integer>();
Matcher matcher = Pattern.compile("[a-z]+\\s((?:\\d+\\s?)+)").matcher(string);
if (!matcher.find()) continue; // skip strings that do not match, otherwise group() would throw
String[] nums = matcher.group(1).split("\\s");
for(String num : nums){
numbers.add(Integer.valueOf(num));
}
System.out.println("Match from string: "+ "\""+ string + "\"" + " : " + numbers.toString());
}
}
}
with output:
Match from string: "foo 11 25" : [11, 25]
Match from string: "foo 67 45 97" : [67, 45, 97]
Match from string: "foo 38 15 976 24" : [38, 15, 976, 24]
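For Q2, another option is to attach each optional integer to its own leading whitespace, so the pattern matches either a whole extra number or nothing. This is only a minimal sketch, assuming each line holds a word followed by two to four integers; the class name OptionalGroupsSketch and the helper extractInts are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupsSketch {
    // Two required integers, then up to two optional ones. Each optional
    // integer sits in a non-capturing group together with its whitespace.
    private static final Pattern LINE = Pattern.compile(
            "^[a-z]{2,}\\s+(\\d+)\\s+(\\d+)(?:\\s+(\\d+))?(?:\\s+(\\d+))?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the integers found, or an empty list
    // when the line does not have the assumed shape.
    static List<Integer> extractInts(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return numbers;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            // group(g) is null when an optional group did not take part in the match
            if (m.group(g) != null) {
                numbers.add(Integer.valueOf(m.group(g)));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractInts("foo 11 25"));        // [11, 25]
        System.out.println(extractInts("foo 38 15 976 24")); // [38, 15, 976, 24]
    }
}
```

Inside the while loop of the template program, the int1 to int4 variables could then be filled from the returned list, checking its size before reading the 3rd and 4th values.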

The regex pattern you are currently using requires exactly two numbers at the end (\s+\d+\s\d+). If you want it to allow any number of numbers, each preceded by whitespace, you would use (\s+\d+)+.
So the full regex would be ((?:[a-z][a-z]+)(\s+\d+)+)
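One thing to keep in mind with a repeated capturing group like (\s+\d+)+ is that Java only keeps the text of its last iteration, so group() alone will not hand back every number. A possible workaround, sketched here with made-up names (RepeatedGroupSketch, numbersOf, lastIteration), is to capture the whole run of numbers in one group and scan it with a second \d+ matcher:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupSketch {
    // Group 1: the word. Group 2: the whole run of whitespace-separated numbers.
    private static final Pattern LINE =
            Pattern.compile("([a-z][a-z]+)((?:\\s+\\d+)+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects every number of the line by scanning group 2 a second time.
    static List<Integer> numbersOf(String line) {
        List<Integer> result = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.find()) {
            Matcher n = NUMBER.matcher(m.group(2));
            while (n.find()) {
                result.add(Integer.valueOf(n.group()));
            }
        }
        return result;
    }

    // Shows what a repeated capturing group retains: only its last iteration.
    static String lastIteration(String line) {
        Matcher m = Pattern.compile("[a-z]+(\\s+\\d+)+").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(numbersOf("foo 38 15 976 24"));            // [38, 15, 976, 24]
        System.out.println("[" + lastIteration("foo 38 15 976 24") + "]"); // [ 24]
    }
}
```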

Related

Split string and extract text and number

I have to divide an address into street and number. Examples
Lievensberg 31D
Jablunkovska 21/2
Weimarstraat 113 A
Pastoor Baltesenstraat 22
Van Musschenboek strasse 84
I need to split like this:
Street1: Lievensberg
Number1: 31D
Street2: Jablunkovska
Number2: 21/2
Street3: Weimarstraat
Number3: 113 A
Street4: Pastoor Baltesenstraat
Number4: 22
Street5: Van Musschenboek strasse
Number5: 84
I used this code, but it does not work, because I need to split only when the character after the whitespace is a digit:
String[] arrSplit = address_line.split("\\s");
for (int i = 0; i < arrSplit.length; i++) {
System.out.println(arrSplit[i]);
}
But I don't know how to do it so that all my requirements are met. Any idea?
If the number can be optional, instead of using split, you could use 2 capturing groups where the second group is optional.
^([^\d\r\n]+?)(?:\h*(\d.*)|$)
Explanation
^ Start of string
([^\d\r\n]+?) Match 1+ times any char except a digit or newline non greedy
(?: Non capture group
\h*(\d.*) Match 0+ horizontal whitespace chars, then capture in group 2 a digit followed by the rest of the line
| Or
$ End of string
) Close non capture group
Example code
String regex = "^([^\\d\\r\\n]+?)(?:\\h*(\\d.*)|$)";
String string = "Lievensberg 31D\n"
+ "Jablunkovska 21/2\n"
+ "Weimarstraat 113 A\n"
+ "Pastoor Baltesenstraat 22\n"
+ "Van Musschenboek strasse 84\n"
+ "Lievensberg";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Street: " + matcher.group(1));
if (matcher.group(2) != null) {
System.out.println("Number: " + matcher.group(2));
}
System.out.println("------------------");
}
Output
Street: Lievensberg
Number: 31D
------------------
Street: Jablunkovska
Number: 21/2
------------------
Street: Weimarstraat
Number: 113 A
------------------
Street: Pastoor Baltesenstraat
Number: 22
------------------
Street: Van Musschenboek strasse
Number: 84
------------------
Street: Lievensberg
------------------
Something like this:
ArrayList<String> list = new ArrayList<>();
list.add("Lievensberg 31D");
list.add("Jablunkovska 21/2");
list.add("Weimarstraat 113 A");
list.add("Pastoor Baltesenstraat 22");
list.add("Van Musschenboek strasse 84");
for(int i=0;i<list.size();i++){
System.out.println("Street"+(i+1)+": "+ list.get(i).split("\\s+(?=\\d)")[0]);
System.out.println("Number"+(i+1)+": "+ list.get(i).split("\\s+(?=\\d)")[1]);
}
You can use regex to verify first whether it matches or not, then only process it.
String str1 = "Lievensberg 31D"; // street = Lievensberg, number = 31D
String str2 = "Lievensberg NN31D"; // doesn't match
String str3 = "Lievensberg"; // street = Lievensberg, number = null
String str4 = "Pastoor Baltesenstraat 22"; // street = Pastoor Baltesenstraat, number = 22
Pattern pattern = Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");
Matcher matcher = pattern.matcher(str1);
if(matcher.matches()) {
String street = matcher.group(1);
String number = matcher.group(2) != null ? matcher.group(3) + matcher.group(4) : null;
System.out.println("street = " + street);
System.out.println("number = " + number);
}
You can use this logic:
Find the index of the first number
Split the string based on this index
For a better understanding, use the code below
public static void main(String[] args) {
    String address_line = "Weimarstraat 113 A";
    // Find the index of the first digit (-1 if there is none)
    int i = -1;
    for (int k = 0; k < address_line.length(); k++) {
        if (Character.isDigit(address_line.charAt(k))) {
            i = k;
            break;
        }
    }
    // Split the string at that index
    if (i >= 0) {
        System.out.println(address_line.substring(0, i).trim());
        System.out.println(address_line.substring(i));
    } else {
        System.out.println(address_line); // no number in this address
    }
}
Its output will be:
Weimarstraat
113 A
Here's a simple solution using regex and split:
String str = "Jablunkovska 21/2";
String[] split = str.split("\\s(?=\\d)", 2);
System.out.println(Arrays.toString(split));
Output:
[Jablunkovska, 21/2]
The expression (?=\\d) is a lookahead for a digit, so it doesn't get removed with the split.
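If some addresses have no number at all, the split returns a single element, so reading index 1 directly would throw. A small sketch of a guarded version, assuming an empty string is an acceptable stand-in for a missing number (AddressSplitSketch and splitAddress are illustrative names, not from the question):

```java
import java.util.Arrays;

class AddressSplitSketch {
    // Splits at the first whitespace run that is followed by a digit.
    // Returns {street, number}; number is "" when the address has none.
    static String[] splitAddress(String address) {
        String[] parts = address.split("\\s+(?=\\d)", 2);
        String street = parts[0];
        String number = parts.length > 1 ? parts[1] : "";
        return new String[] { street, number };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAddress("Weimarstraat 113 A")));          // [Weimarstraat, 113 A]
        System.out.println(Arrays.toString(splitAddress("Van Musschenboek strasse 84"))); // [Van Musschenboek strasse, 84]
        System.out.println(Arrays.toString(splitAddress("Lievensberg")));                 // [Lievensberg, ]
    }
}
```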

Java Regular Expression remove Matches from String

I'm trying to figure out how to remove the found matches from my String. So my Code sample currently looks like this:
public void checkText() {
String helper = "985, 913, 123, SomeotherText, MoreText, MoreText";
Pattern pattern = Pattern.compile("\\b\\d{3}");
Matcher matcher = pattern.matcher(helper);
while (matcher.find()) {
String newtext = "Number: " + matcher.group() + "\n"+ newtext;
helper.replaceAll(matcher.group(),"");
}
newtext = newtext + "________________\n"+ helper;
editText.setText(newtext);
}
So my input string is: 985, 913, 123, SomeotherText, MoreText, MoreText
After running the code what I would like to see is this:
Number: 985
Number: 913
Number: 123
________________________
SomeotherText, MoreText, MoreText
Anyone can tell me whats wrong in my current code?
There are a few things you could update in the code:
Assign the result of the replacement back to helper; strings are immutable, so replaceAll returns a new string rather than changing helper
If you only replace the number with an empty string, the commas and the following spaces are left behind and your string will start with , , ,
Initialize the variable before the loop: String newtext = "";
Your code might look like:
String helper = "985, 913, 123, SomeotherText, MoreText, MoreText";
Pattern pattern = Pattern.compile("\\b\\d{3}");
Matcher matcher = pattern.matcher(helper);
String newtext = "";
while (matcher.find()) {
newtext = "Number: " + matcher.group() + "\n"+ newtext;
helper = helper.replaceAll(matcher.group() + ", ","");
}
newtext = newtext + "________________\n"+ helper;
System.out.println(newtext);
Result:
Number: 123
Number: 913
Number: 985
________________
SomeotherText, MoreText, MoreText
Since you are already using the Matcher class you can also use the method Matcher.appendReplacement for the replacement:
public void checkText() {
String helper = "985, 913, 123, SomeotherText, MoreText, MoreText";
Pattern pattern = Pattern.compile("\\b(\\d{3}), ");
Matcher matcher = pattern.matcher(helper);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
System.out.println("Number: " + matcher.group(1)); // group 1 is the number without the trailing ", "
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
System.out.println(sb.toString());
}
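The expected output in the question lists the numbers in their original order, followed by the separator line and the remaining text. One way to get that, sketched under the assumption that the numbers are always exactly three digits and may be followed by a comma and spaces (RemoveMatchesSketch and format are hypothetical names), is to collect the matches in a StringBuilder and remove them in a single replaceAll afterwards:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemoveMatchesSketch {
    // Group 1 is the three-digit number; the optional comma and spaces
    // after it are matched too, so they disappear along with the number.
    private static final Pattern NUMBER = Pattern.compile("\\b(\\d{3})\\b,?\\s*");

    static String format(String input) {
        Matcher matcher = NUMBER.matcher(input);
        StringBuilder numbers = new StringBuilder();
        while (matcher.find()) {
            // Appending (instead of prepending) keeps the original order
            numbers.append("Number: ").append(matcher.group(1)).append('\n');
        }
        // replaceAll resets the matcher and removes every match in one pass
        String rest = matcher.replaceAll("");
        return numbers + "________________\n" + rest;
    }

    public static void main(String[] args) {
        System.out.println(format("985, 913, 123, SomeotherText, MoreText, MoreText"));
    }
}
```

In the original Android code, the returned string could be passed to editText.setText(...) instead of being printed.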

reading data from text file and validate it

I have a text file and I need to read data from it into a 2D array. The file contains strings as well as numbers.
String[][] arr = new String[3][5];
BufferedReader br = new BufferedReader(new FileReader("C:/Users/kp/Desktop/sample.txt"));
String line = " ";
String [] temp;
int i = 0;
while ((line = br.readLine())!= null){
temp = line.split(" ");
for (int j = 0; j<arr[i].length; j++) {
arr[i][j] = (temp[j]);
}
i++;
}
sample text file is :
name age salary id gender
jhon 45 4900 22 M
janey 33 4567 33 F
philip 55 5456 44 M
Now, when the name is a single word without any space in between, the code works, but it doesn't work when the name is like "jhon desuja". How can I overcome this?
I need to store it in a 2D array. How do I validate the input, e.g. that the name does not contain numbers and that the age is not negative and does not contain letters? Any help will be highly appreciated.
A regular expression might be a better option:
Pattern p = Pattern.compile("(.+) (\\d+) (\\d+) (\\d+) ([MF])");
String[] test = new String[]{"jhon 45 4900 22 M","janey 33 4567 33 F","philip 55 5456 44 M","john mayer 56 4567 45 M"};
for(String line : test){
Matcher m = p.matcher(line);
if(m.find())
System.out.println(m.group(1) +", " +m.group(2) +", "+m.group(3) +", " + m.group(4) +", " + m.group(5));
}
which would return
jhon, 45, 4900, 22, M
janey, 33, 4567, 33, F
philip, 55, 5456, 44, M
john mayer, 56, 4567, 45, M
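To also cover the validation part and the 2D array, the pattern can be tightened so that invalid rows simply fail to match. The following is only a sketch under a few assumptions that are not in the question: names consist of letters separated by single spaces, the three numeric fields are unsigned integers, and invalid lines (including the header) are skipped. RecordParserSketch and parseRow are made-up names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordParserSketch {
    // name (letters, single spaces), then age, salary, id (digits only), then M or F
    private static final Pattern ROW =
            Pattern.compile("([A-Za-z]+(?: [A-Za-z]+)*) (\\d+) (\\d+) (\\d+) ([MF])");

    // Returns the five fields of a valid row, or null when the row is invalid.
    static String[] parseRow(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[5];
        for (int j = 0; j < fields.length; j++) {
            fields[j] = m.group(j + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "name age salary id gender",  // header: no digits, so it is rejected
            "jhon desuja 45 4900 22 M",
            "janey 33 4567 33 F",
            "ph1lip 55 5456 44 M",        // digit in the name: rejected
            "philip -55 5456 44 M"        // negative age: rejected
        };
        List<String[]> valid = new ArrayList<>();
        for (String line : lines) {
            String[] row = parseRow(line);
            if (row != null) {
                valid.add(row);
            } else {
                System.out.println("Skipping invalid line: " + line);
            }
        }
        String[][] arr = valid.toArray(new String[0][]);
        for (String[] row : arr) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because a negative sign or a letter in a numeric field makes the whole row fail, no separate checks are needed; if a specific error message per field is wanted, the groups would have to be validated one by one instead.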

Java: Find Integers in a String (Calculator)

If I have a String that looks like this: String calc = "5+3". Can I substring the integers 5 and 3?
In this case, you do know how the String looks, but it could look like this: String calc = "55-23" Therefore, I want to know if there is a way to identify integers in a String.
For something like that, regular expression is your friend:
String text = "String calc = 55-23";
Matcher m = Pattern.compile("\\d+").matcher(text);
while (m.find())
System.out.println(m.group());
Output
55
23
Now, you might need to expand it to support decimals:
String text = "String calc = 1.1 + 22 * 333 / (4444 - 55555)";
Matcher m = Pattern.compile("\\d+(?:\\.\\d+)?").matcher(text);
while (m.find())
System.out.println(m.group());
Output
1.1
22
333
4444
55555
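One detail worth checking in a decimal pattern is the dot: in a regex an unescaped . matches any character, so it has to be written as \\. in a Java string to mean a literal decimal point. The short sketch below (DecimalDotSketch and findAll are illustrative names) compares both variants on the input "55-23" from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalDotSketch {
    // Returns every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "55-23";
        // Unescaped dot: the '-' is swallowed as "any character"
        System.out.println(findAll("\\d+(?:.\\d+)?", text));   // [55-23]
        // Escaped dot: only a literal '.' joins the two digit runs
        System.out.println(findAll("\\d+(?:\\.\\d+)?", text)); // [55, 23]
    }
}
```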
You could use a regex like ([\d]+)([+-])([\d]+) to obtain the full binary expression.
Pattern pattern = Pattern.compile("([\\d]+)([+-])([\\d]+)");
String calc = "5+3";
Matcher matcher = pattern.matcher(calc);
if (matcher.matches()) {
int lhs = Integer.parseInt(matcher.group(1));
int rhs = Integer.parseInt(matcher.group(3));
char operator = matcher.group(2).charAt(0);
System.out.print(lhs + " " + operator + " " + rhs + " = ");
switch (operator) {
case '+': {
System.out.println(lhs + rhs);
break; // without break, execution falls through into the '-' case
}
case '-': {
System.out.println(lhs - rhs);
break;
}
}
}
Output:
5 + 3 = 8
You can read each character and find its ASCII code. If the code is between 48 and 57, the character is a digit; if it is not, it is a symbol.
If the next character is also a digit, you append it to the current number until you reach a symbol.
String calc="55-23";
String intString="";
char tempChar;
for (int i=0;i<calc.length();i++){
tempChar=calc.charAt(i);
int ascii=(int) tempChar;
if (ascii>47 && ascii <58){
intString=intString+tempChar;
}
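else {
System.out.println(intString);
intString="";
}
}
// Print the last number, which is not followed by a symbol
if (!intString.isEmpty()) {
System.out.println(intString);
}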

How to divide string into two parts using regex in java?

String strArray="135(i),15a,14(g)(q)12,67dd(),kk,159"; //splited by ','
divide string after first occurrence of alphanumeric value/character
expected output :
original expected o/p
15a s1=15 s2=a
67dd() s1=67 s2=dd()
kk s1="" s2=kk
159 s1=159 s2=""
Please help.
You could use the group-method of Pattern/Matcher:
String strArray = "135(i),15a,14(g)(q)12,67dd(),kk,159";//splited by ','
Pattern pattern = Pattern.compile("(?<digits>\\d*)(?<chars>[^,]*)");
Matcher matcher = pattern.matcher(strArray);
while (matcher.find()) {
if (!matcher.group().isEmpty()) //omit empty matches
System.out.println(matcher.group() + " : " + matcher.group("digits") + " - " + matcher.group("chars"));
}
The method group(String name) gives you the String found in the pattern's parenthesis with the specific name (here it is 'digits' or 'chars') within the match.
The method group(int i) would give you the String found in the i-th parenthesis of the pattern within the match.
See the Oracle tutorial at http://docs.oracle.com/javase/tutorial/essential/regex/ for more examples of using regex in Java.
You can use a Pattern and a Matcher to find the first index of a letter preceded by a number and split at that position.
Code
public static void main(String[] args) {
String[] inputs = { "15a", "67dd()", "kk", "159" };
for (String input : inputs) {
Pattern p = Pattern.compile("(?<=[0-9])[a-zA-Z]");
Matcher m = p.matcher(input);
System.out.println("Input: " + input);
if (m.find()) {
int splitIndex = m.end();
// System.out.println(splitIndex);
System.out.println("1.\t"+input.substring(0, splitIndex - 1));
System.out.println("2.\t"+input.substring(splitIndex - 1));
} else {
System.out.println("1.");
System.out.println("2.\t"+input);
}
}
}
Output
Input: 15a
1. 15
2. a
Input: 67dd()
1. 67
2. dd()
Input: kk
1.
2. kk
Input: 159
1.
2. 159
Use java.util.regex.Pattern and java.util.regex.Matcher
String strArray="135(i),15a,14(g)(q)12,67dd(),kk,159";
String arr[] = strArray.split(",");
for (String s : arr) {
Matcher m = Pattern.compile("([0-9]*)([^0-9]*)").matcher(s);
System.out.println("String in = " + s);
if(m.matches()){
System.out.println(" s1: " + m.group(1));
System.out.println(" s2: " + m.group(2));
} else {
System.out.println(" unmatched");
}
}
outputs:
String in = 135(i)
s1: 135
s2: (i)
String in = 15a
s1: 15
s2: a
String in = 14(g)(q)12
unmatched
String in = 67dd()
s1: 67
s2: dd()
String in = kk
s1:
s2: kk
String in = 159
s1: 159
s2:
Note how '14(g)(q)12' is not matched. It's not clear what the OP's required output is in this instance (or if a comma is missing from this portion of the example input string).
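If the intended result for such an element is "leading digits, then everything else" (this is an assumption, since the question does not say), the second group can be relaxed to (.*) so that later digits are allowed in it. A minimal sketch with made-up names (LeadingDigitsSketch, split):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingDigitsSketch {
    // Group 1: leading digits (possibly empty). Group 2: the rest (possibly empty).
    // DOTALL lets '.' match line terminators too, so every string matches.
    private static final Pattern PARTS = Pattern.compile("(\\d*)(.*)", Pattern.DOTALL);

    static String[] split(String s) {
        Matcher m = PARTS.matcher(s);
        if (!m.matches()) {
            // Not expected to happen, because both groups may be empty
            throw new IllegalStateException("no match for: " + s);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String strArray = "135(i),15a,14(g)(q)12,67dd(),kk,159";
        for (String s : strArray.split(",")) {
            String[] parts = split(s);
            System.out.println(s + " -> s1=" + parts[0] + " s2=" + parts[1]);
        }
    }
}
```

With this variant, 14(g)(q)12 yields s1=14 and s2=(g)(q)12, while the other elements give the same results as before.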
