Split string and extract text and number - java

I have to divide an address into street and number. Examples:
Lievensberg 31D
Jablunkovska 21/2
Weimarstraat 113 A
Pastoor Baltesenstraat 22
Van Musschenboek strasse 84
I need to split like this:
Street1: Lievensberg
Number1: 31D
Street2: Jablunkovska
Number2: 21/2
Street3: Weimarstraat
Number3: 113 A
Street4: Pastoor Baltesenstraat
Number4: 22
Street5: Van Musschenboek strasse
Number5: 84
I used this code, but it doesn't work, because I need to split only when the character after the whitespace is a digit:
String[] arrSplit = address_line.split("\\s");
for (int i = 0; i < arrSplit.length; i++) {
System.out.println(arrSplit[i]);
}
But I don't know how to do it so that all my requirements are met. Any idea?

If the number is optional, instead of using split you could use two capturing groups, where the second group is optional:
^([^\d\r\n]+?)(?:\h*(\d.*)|$)
Explanation
^ Start of string
([^\d\r\n]+?) Capture group 1: match 1+ times any char except a digit or newline, non-greedy
(?: Non-capturing group
\h*(\d.*) Match 0+ horizontal whitespace chars, then capture group 2: a digit followed by the rest of the line
| Or
$ End of string
) Close non capture group
Example code
String regex = "^([^\\d\\r\\n]+?)(?:\\h*(\\d.*)|$)";
String string = "Lievensberg 31D\n"
+ "Jablunkovska 21/2\n"
+ "Weimarstraat 113 A\n"
+ "Pastoor Baltesenstraat 22\n"
+ "Van Musschenboek strasse 84\n"
+ "Lievensberg";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Street: " + matcher.group(1));
if (matcher.group(2) != null) {
System.out.println("Number: " + matcher.group(2));
}
System.out.println("------------------");
}
Output
Street: Lievensberg
Number: 31D
------------------
Street: Jablunkovska
Number: 21/2
------------------
Street: Weimarstraat
Number: 113 A
------------------
Street: Pastoor Baltesenstraat
Number: 22
------------------
Street: Van Musschenboek strasse
Number: 84
------------------
Street: Lievensberg
------------------

Something like this:
List<String> list = new ArrayList<>();
list.add("Lievensberg 31D");
list.add("Jablunkovska 21/2");
list.add("Weimarstraat 113 A");
list.add("Pastoor Baltesenstraat 22");
list.add("Van Musschenboek strasse 84");
for (int i = 0; i < list.size(); i++) {
// Split once, at whitespace followed by a digit, into at most 2 parts
String[] parts = list.get(i).split("\\s+(?=\\d)", 2);
System.out.println("Street" + (i + 1) + ": " + parts[0]);
// parts has a single element if the address contains no number
System.out.println("Number" + (i + 1) + ": " + (parts.length > 1 ? parts[1] : ""));
}

You can use a regex to first verify whether the input matches, and only then process it.
String str1 = "Lievensberg 31D"; // street = Lievensberg, number = 31D
String str2 = "Lievensberg NN31D"; // doesn't match
String str3 = "Lievensberg"; // street = Lievensberg, number = null
String str4 = "Pastoor Baltesenstraat 22"; // street = Pastoor Baltesenstraat, number = 22
Pattern pattern = Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");
Matcher matcher = pattern.matcher(str1);
if(matcher.matches()) {
String street = matcher.group(1);
String number = matcher.group(2) != null ? matcher.group(3) + matcher.group(4) : null;
System.out.println("street = " + street);
System.out.println("number = " + number);
}
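The snippet above only runs the matcher on str1. Below is a minimal sketch that wraps the same pattern in a helper (the class name AddressMatchDemo and method parse are hypothetical) and runs it on all four sample strings, so the "doesn't match" and "number = null" cases can be seen directly:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressMatchDemo {
    // Same pattern as above: letters/spaces for the street, then an optional
    // whitespace + digits (+ any suffix). matches() requires the whole string.
    private static final Pattern ADDRESS = Pattern.compile("([a-zA-Z ]+?)(\\s(\\d+)(.*))?");

    // Returns {street, number} where number may be null, or null if no match
    static String[] parse(String input) {
        Matcher matcher = ADDRESS.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        String number = matcher.group(2) != null ? matcher.group(3) + matcher.group(4) : null;
        return new String[] {matcher.group(1), number};
    }

    public static void main(String[] args) {
        String[] inputs = {"Lievensberg 31D", "Lievensberg NN31D", "Lievensberg", "Pastoor Baltesenstraat 22"};
        for (String input : inputs) {
            String[] parts = parse(input);
            if (parts == null) {
                System.out.println(input + " -> doesn't match");
            } else {
                System.out.println(input + " -> street = " + parts[0] + ", number = " + parts[1]);
            }
        }
    }
}
```

Because the street part only allows letters and spaces, any input with digits that are not preceded by whitespace (like NN31D) is rejected rather than split incorrectly.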

You can use this logic:
Find the index of the first number
Split the string based on this index
For example:
public static void main(String[] args) {
String address_line = "Weimarstraat 113 A";
// Find the index of the first digit (address_line.length() if there is none)
int i = 0;
for (char c : address_line.toCharArray()) {
if ('0' <= c && c <= '9')
break;
i++;
}
// Split the string at that index; trim() removes the separating whitespace
System.out.println(address_line.substring(0, i).trim());
System.out.println(address_line.substring(i));
}
Its output will be:
Weimarstraat
113 A

Here's a simple solution using regex and split:
String str = "Jablunkovska 21/2";
String[] split = str.split("\\s(?=\\d)", 2);
System.out.println(Arrays.toString(split));
Output:
[Jablunkovska, 21/2]
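The expression (?=\\d) is a lookahead for a digit: it checks that a digit follows without consuming it, so the digit is not removed by the split. The limit of 2 makes split stop after the first match.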
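To see what the lookahead and the limit of 2 each contribute, here is a small comparison. The input "Foo 12 34" is a made-up example with two numeric tokens, and the class name is hypothetical:

```java
import java.util.Arrays;

public class LookaheadSplitDemo {
    // Splits at whitespace that is followed by a digit. The digit is only
    // "looked at", so it stays at the start of the next part.
    static String[] splitAddress(String line, int limit) {
        return line.split("\\s(?=\\d)", limit);
    }

    public static void main(String[] args) {
        // Without the lookahead the digit is consumed by the separator
        System.out.println(Arrays.toString("Foo 12".split("\\s\\d")));  // [Foo, 2]
        // With the lookahead but no limit, every such whitespace splits
        System.out.println(Arrays.toString(splitAddress("Foo 12 34", 0))); // [Foo, 12, 34]
        // With limit 2, the number part stays whole
        System.out.println(Arrays.toString(splitAddress("Foo 12 34", 2))); // [Foo, 12 34]
    }
}
```

The limit matters for inputs such as "Weimarstraat 113 A" only when a later token also starts with a digit; passing 2 guarantees at most one split either way.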

Related

String output with quotes

I get an input string like this:
{key: IsReprint, value:COPY};{key: IsCancelled, value:CANCELLED}
I want to convert the above string into the following output, adding quotes around the keys and values (the key/value pairs):
{"key": "IsReprint", "value":"COPY"};{"key": "IsCancelled", "value":"CANCELLED"}
Please assist. Thanks in advance.
String input="{key: IsReprint, value:COPY};{key: IsCancelled,value:CANCELLED}";
if(input.contains("key:") && input.contains("value:") ){
input=input.replaceAll("key", "\"key\"");
input=input.replaceAll("value", "\"value\"");
input=input.replaceAll(":", ":\"");
input=input.replaceAll("}", "\"}");
input=input.replaceAll(",", "\",");
//System.out.println("OUTPUT----> "+input);
}
The above code fails if the input string looks like this:
{key: BDTV, value:Africa Ltd | Reg No: 433323240833-C23441,GffBLAB | VAT No: 4746660239035Level 6}
You could use a regex to accomplish the same thing more concisely:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JsonScanner {
private final static String JSON_REGEX = "\\{key: (.*?), value:(.*?)(\\};|\\}$)";
/**
* Splits the JSON string into key/value tokens.
*
* @param json the JSON string to format
* @return the formatted JSON string
*/
private String findMatched(String json) {
Pattern p = Pattern.compile(JSON_REGEX);
Matcher m = p.matcher(json);
StringBuilder result = new StringBuilder();
while (m.find()) {
result.append("\"key\"=\"" + m.group(1) + "\", ");
result.append("\"value\"=\"" + m.group(2) + "\" ; ");
System.out.println("m.group(1)=" + m.group(1) + " ");
System.out.println("m.group(2)=" + m.group(2) + " ");
System.out.println("m.group(3)=" + m.group(3) + "\n");
}
return result.toString();
}
public static void main(String... args) {
JsonScanner jsonScanner = new JsonScanner();
String result = jsonScanner.findMatched("{key: TVREG, value:WestAfrica Ltd | VAT No: 1009034324829/{834324}<br/>Plot No.56634773,Road};{key: REGISTRATION, value:SouthAfricaLtd | VAT No: 1009034324829/{834324}<br />Plot No. 56634773, Road}");
System.out.println(result);
}
}
You might have to tweak the regex or output string to meet your exact requirements, but this should give you an idea of how to get started...
You have to escape certain characters (see also the question "How do I escape a string in Java?"). For example:
String s = "{\"key\": \"IsReprint\""; // will be printed as {"key": "IsReprint"
The double quote character has to be escaped with a backslash in a Java string literal. Other characters that need special treatment include:
Carriage return and newline: "\r" and "\n"
Backslash: "\\"
Single quote: "\'"
Horizontal tab and form feed: "\t" and "\f".
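A small sketch of these escapes in action (the strings and the class name are illustrative only):

```java
public class EscapeDemo {
    // \" puts a literal double quote inside the string
    static final String JSON = "{\"key\": \"IsReprint\"}";
    // \\ puts a single literal backslash inside the string
    static final String PATH = "C:\\temp";
    // \t is a horizontal tab, \n a newline
    static final String TABLE = "a\tb\nc\td";

    public static void main(String[] args) {
        System.out.println(JSON);          // {"key": "IsReprint"}
        System.out.println(PATH);          // C:\temp
        System.out.println(TABLE);         // two lines, columns separated by a tab
        System.out.println(PATH.length()); // 7, since \\ is a single character
    }
}
```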
Here is a solution that uses a regexp to split your input into key/value pairs and then aggregates the result in the format you want:
// Split key value pairs
final String regexp = "\\{(.*?)\\}";
final Pattern p = Pattern.compile(regexp);
final Matcher m = p.matcher(input);
final List<String[]> keyValuePairs = new ArrayList<>();
while (m.find())
{
final String[] keyValue = input.substring(m.start() + 1, m.end() - 1) // "key: IsReprint, value:COPY"
.substring(5) // "IsReprint, value:COPY"
.split(", value:"); // ["IsReprint", "COPY"]
keyValuePairs.add(keyValue);
}
// Aggregate
final List<String> newKeyValuePairs = keyValuePairs.stream().map(keyValue ->
{
return "{\"key\": \"" + keyValue[0] + "\", \"value\":\"" + keyValue[1] + "\"}";
}).collect(Collectors.toList());
System.out.println(String.join(";", newKeyValuePairs));
The result for the following input string
final String input = "{key: IsReprint, value:COPY};{key: IsCancelled, value:CANCELLED};{key: BDTV, value:Africa Ltd | Reg No: 433323240833-C23441,GffBLAB | VAT No: 4746660239035<br />Level 6}";
is {"key": "IsReprint", "value":"COPY"};{"key": "IsCancelled", "value":"CANCELLED"};{"key": "BDTV", "value":"Africa Ltd | Reg No: 433323240833-C23441,GffBLAB | VAT No: 4746660239035<br />Level 6"}
This produces the requested format (note that it leaves a trailing ';' after the last pair):
public static void main(String s[]){
String test = "{key: TVREG, value:WestAfrica Ltd | VAT No: 1009034324829/{834324}<br />Plot No. 56634773, Road};{key: REGISTRATION, value:SouthAfricaLtd | VAT No: 1009034324829/{834324}<br />Plot No. 56634773, Road}";
StringBuilder sb= new StringBuilder();
String[] keyValOld = test.split(";");
for(int j=0; j<keyValOld.length; j++){
String keyVal = keyValOld[j].substring(1,keyValOld[j].length()-1);
String[] parts = keyVal.split("(:)|(,)",4);
sb.append("{");
for (int i = 0; i < parts.length; i += 2) {
sb.append("\""+parts[i].trim()+"\": \""+parts[i + 1].trim()+"\"");
if(i+2<parts.length) sb.append(", ");
}
sb.append("};");
}
System.out.println(sb.toString());
}
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class NewClass {
public static void main(String[] args) {
String input="{key: IsReprint, value:COPY};{key: IsCancelled, value:CANCELLED};{key: BDTV,value:Africa Ltd | Reg No: 433323240833-C23441,GffBLAB | VAT No: 4746660239035 Level 6}";
Matcher m1 = Pattern.compile("key:" + "(.*?)" + ",\\s*value:").matcher(input);
Matcher m2 = Pattern.compile("value:" + "(.*?)" + "}").matcher(input);
StringBuilder sb = new StringBuilder();
while(m1.find() && m2.find()){
sb.append("{\"key\": ")
.append("\"")
.append(m1.group(1).trim())
.append("\", \"value\":")
.append("\"")
.append(m2.group(1).trim())
.append("\"};");
}
String output = sb.deleteCharAt(sb.length()-1).toString();
System.out.println(output);
}
}
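If the goal is exactly the requested output, a single pattern with find() can also do it. This is a sketch under the assumption that no value contains the sequence "};" (which would end a pair early); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteKeyValues {
    // One {key: ..., value: ...} block. The value is matched lazily up to a '}'
    // that is followed by ';' or the end of input, so commas, colons and
    // braces inside a value (e.g. "{834324}") are tolerated.
    // Assumption: no value contains the sequence "};".
    private static final Pattern PAIR =
            Pattern.compile("\\{key:\\s*(.*?),\\s*value:\\s*(.*?)\\}(?=;|$)");

    static String quote(String input) {
        Matcher m = PAIR.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add("{\"key\": \"" + m.group(1).trim() + "\", \"value\":\"" + m.group(2).trim() + "\"}");
        }
        return String.join(";", out);
    }

    public static void main(String[] args) {
        String input = "{key: IsReprint, value:COPY};{key: IsCancelled, value:CANCELLED}";
        // {"key": "IsReprint", "value":"COPY"};{"key": "IsCancelled", "value":"CANCELLED"}
        System.out.println(quote(input));
    }
}
```

The lookahead (?=;|$) is what lets a value keep inner braces: a '}' only closes the pair when the next character is ';' or the input ends.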

Java: Find Integers in a String (Calculator)

If I have a String that looks like this: String calc = "5+3", can I extract the integers 5 and 3?
In this case I know what the String looks like, but it could also look like this: String calc = "55-23". Therefore, I want to know whether there is a way to identify the integers in a String.
For something like that, a regular expression is your friend:
String text = "String calc = 55-23";
Matcher m = Pattern.compile("\\d+").matcher(text);
while (m.find())
System.out.println(m.group());
Output
55
23
Now, you might need to expand it to support decimals:
String text = "String calc = 1.1 + 22 * 333 / (4444 - 55555)";
Matcher m = Pattern.compile("\\d+(?:\\.\\d+)?").matcher(text);
while (m.find())
System.out.println(m.group());
Output
1.1
22
333
4444
55555
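Inside a regex, an unescaped . matches any character, which is why the decimal part needs \\. to match only a literal dot. A quick comparison on a made-up input (the class and helper names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecimalTokens {
    // Collects every match of regex in text
    static List<String> tokens(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "1+2 * 3.5";
        // Unescaped '.' matches ANY character, so "1+2" is read as one number
        System.out.println(tokens("\\d+(?:.\\d+)?", text));   // [1+2, 3.5]
        // Escaped '\\.' matches only a literal dot
        System.out.println(tokens("\\d+(?:\\.\\d+)?", text)); // [1, 2, 3.5]
    }
}
```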
You could use a regex like ([\d]+)([+-])([\d]+) to obtain the full binary expression.
Pattern pattern = Pattern.compile("([\\d]+)([+-])([\\d]+)");
String calc = "5+3";
Matcher matcher = pattern.matcher(calc);
if (matcher.matches()) {
int lhs = Integer.parseInt(matcher.group(1));
int rhs = Integer.parseInt(matcher.group(3));
char operator = matcher.group(2).charAt(0);
System.out.print(lhs + " " + operator + " " + rhs + " = ");
switch (operator) {
case '+':
System.out.println(lhs + rhs);
break;
case '-':
System.out.println(lhs - rhs);
break;
}
}
Output:
5 + 3 = 8
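For more operators, one option (a sketch, not part of the original answer) is to widen the operator class to [+*/-] and use an arrow-form switch expression. That form requires Java 14 or later and cannot fall through, so no break is needed. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BinaryCalc {
    // lhs, operator, rhs; '-' is placed last in the class so it is a literal
    private static final Pattern EXPR = Pattern.compile("(\\d+)([+*/-])(\\d+)");

    static int evaluate(String calc) {
        Matcher matcher = EXPR.matcher(calc);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a binary expression: " + calc);
        }
        int lhs = Integer.parseInt(matcher.group(1));
        int rhs = Integer.parseInt(matcher.group(3));
        // Arrow-form switch expression: each case yields a value, no fall-through
        return switch (matcher.group(2).charAt(0)) {
            case '+' -> lhs + rhs;
            case '-' -> lhs - rhs;
            case '*' -> lhs * rhs;
            case '/' -> lhs / rhs; // integer division; throws ArithmeticException if rhs is 0
            default -> throw new IllegalStateException("unexpected operator");
        };
    }

    public static void main(String[] args) {
        for (String calc : new String[] {"5+3", "55-23", "6*7", "9/2"}) {
            System.out.println(calc + " = " + evaluate(calc));
        }
    }
}
```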
You can read each character and look at its ASCII code. If the code is between 48 and 57 ('0' to '9'), the character is a digit; otherwise it is a symbol.
If the next character is also a digit, append it to the current number, until you reach a symbol.
String calc="55-23";
String intString="";
char tempChar;
for (int i=0;i<calc.length();i++){
tempChar=calc.charAt(i);
int ascii=(int) tempChar;
if (ascii>47 && ascii <58){
intString=intString+tempChar;
}
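else {
System.out.println(intString);
intString="";
}
}
// The last number is not followed by a symbol, so print it after the loop
if (!intString.isEmpty()) {
System.out.println(intString);
}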
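The same character scan can collect the numbers into a List<Integer> instead of printing them. This sketch compares against the char literals '0' and '9' (equivalent to the ASCII codes 48 and 57), uses a StringBuilder instead of string concatenation, and flushes the last number after the loop; the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class ScanNumbers {
    // Accumulates consecutive digit characters; any other character ends
    // the current number. The final number is flushed after the loop.
    static List<Integer> numbers(String calc) {
        List<Integer> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : calc.toCharArray()) {
            if (c >= '0' && c <= '9') {
                current.append(c);
            } else if (current.length() > 0) {
                result.add(Integer.parseInt(current.toString()));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            result.add(Integer.parseInt(current.toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbers("55-23"));     // [55, 23]
        System.out.println(numbers("12 + 3*40")); // [12, 3, 40]
    }
}
```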

Java - How do I code optional regex patterns with a matcher

Let’s say I am looping through a text file and come across the following two strings with random words and integer values
“foo 11 25”
“foo 38 15 976 24”
I write a regex pattern that would match both strings, for example:
((?:[a-z][a-z]+)\\s+\\d+\\s\\d+)
But, the problem is I don’t think this regex would allow me to get to all 4 integer values in the 2nd string.
Q1.) How can I create a single pattern that leaves these 3rd and 4th integers optional?
Q2.) How do I write the matcher code to only go after the 3rd and 4th values when they are found by the pattern?
Here is a template program to help anyone willing to offer a hand. Thanks.
public void foo(String fooFile) throws IOException {
//Assume fooFile contains the two strings
//"foo 11 25";
//"foo 38 15 976 24";
Pattern p = Pattern.compile("((?:[a-z][a-z]+)\\s+\\d+\\s\\d+)", Pattern.CASE_INSENSITIVE);
BufferedReader br = new BufferedReader(new FileReader(fooFile));
String line;
while ((line = br.readLine()) != null) {
//Process the patterns
Matcher m1 = p.matcher(line);
if (m1.find()) {
int int1, int2, int3, int4;
//Need help to write the matcher code
}
}
}
If you want to retrieve every int value, you can use a regex like:
[a-z]+\s(\d+)\s(\d+)\s?(\d+)?\s?(\d+)?
Each int will then be in groups 1 to 4, and you can use something like:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args){
String[] strings = {"foo 11 25","foo 67 45 97",
"foo 38 15 976 24"};
for(String string : strings) {
ArrayList<Integer> numbers = new ArrayList<Integer>();
Matcher matcher = Pattern.compile("[a-z]+\\s(\\d+)\\s(\\d+)\\s?(\\d+)?\\s?(\\d+)?").matcher(string);
matcher.find();
for(int i = 0; i < 4; i++){
if(matcher.group(i+1) != null) {
numbers.add(Integer.valueOf(matcher.group(i + 1)));
}else{
System.out.println("group " + (i+1) + " is " + matcher.group(i+1));
}
}
System.out.println("Match from string: "+ "\""+ string + "\"" + " : " + numbers.toString());
}
}
}
with output:
group 3 is null
group 4 is null
Match from string: "foo 11 25" : [11, 25]
group 4 is null
Match from string: "foo 67 45 97" : [67, 45, 97]
Match from string: "foo 38 15 976 24" : [38, 15, 976, 24]
Another way would be to get all the ints in one group with:
[a-z]+\s((?:\d+\s?)+)
Then split matcher.group(1) on whitespace to get a String[] of the values. Implementation in Java:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args){
String[] strings = {"foo 11 25","foo 67 45 97",
"foo 38 15 976 24"};
for(String string : strings) {
ArrayList<Integer> numbers = new ArrayList<Integer>();
Matcher matcher = Pattern.compile("[a-z]+\\s((?:\\d+\\s?)+)").matcher(string);
matcher.find();
String[] nums = matcher.group(1).split("\\s");
for(String num : nums){
numbers.add(Integer.valueOf(num));
}
System.out.println("Match from string: "+ "\""+ string + "\"" + " : " + numbers.toString());
}
}
}
with output:
Match from string: "foo 11 25" : [11, 25]
Match from string: "foo 67 45 97" : [67, 45, 97]
Match from string: "foo 38 15 976 24" : [38, 15, 976, 24]
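The current regex pattern you are using requires the text \s+\d+\s\d+ at the end. If you want it to allow for any number of numbers, each preceded by whitespace, you would use (\s+\d+)+.
So the full regex would be ((?:[a-z][a-z]+)(\s+\d+)+). Note that a repeated capturing group only keeps its last iteration, so group 2 holds just the final number; the individual numbers still have to be extracted separately, for example with a find() loop over \d+.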
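In Java a repeated capturing group such as (\s+\d+)+ only reports its last iteration, so on its own it cannot hand back all four integers. A sketch of that behaviour and one way around it: validate the line with a repeated group around the whole number run, then walk that run with a second pattern and find(). The class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // A word followed by one or more whitespace-separated integers;
    // group 2 wraps the WHOLE run of numbers, not a single one
    private static final Pattern LINE =
            Pattern.compile("([a-z][a-z]+)((?:\\s+\\d+)+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Returns every integer on a valid line, or an empty list otherwise
    static List<Integer> numbers(String line) {
        List<Integer> result = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            Matcher n = NUMBER.matcher(m.group(2));
            while (n.find()) {
                result.add(Integer.parseInt(n.group()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The repeated group itself only keeps the last number
        Matcher last = Pattern.compile("[a-z]+(\\s+\\d+)+").matcher("foo 38 15 976 24");
        if (last.matches()) {
            System.out.println("group(1) = '" + last.group(1) + "'"); // group(1) = ' 24'
        }
        System.out.println(numbers("foo 11 25"));        // [11, 25]
        System.out.println(numbers("foo 38 15 976 24")); // [38, 15, 976, 24]
    }
}
```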

How to divide string into two parts using regex in java?

String strArray="135(i),15a,14(g)(q)12,67dd(),kk,159"; // split by ','
Divide each string at the first occurrence of a non-digit character.
Expected output:
original   expected output
15a        s1=15   s2=a
67dd()     s1=67   s2=dd()
kk         s1=""   s2=kk
159        s1=159  s2=""
Please help.
You could use named groups with Pattern/Matcher and the group(String name) method:
String strArray = "135(i),15a,14(g)(q)12,67dd(),kk,159"; // split by ','
Pattern pattern = Pattern.compile("(?<digits>\\d*)(?<chars>[^,]*)");
Matcher matcher = pattern.matcher(strArray);
while (matcher.find()) {
if (!matcher.group().isEmpty()) //omit empty groups
System.out.println(matcher.group() + " : " + matcher.group("digits") + " - " + matcher.group("chars"));
}
The method group(String name) returns the substring captured by the named group (here 'digits' or 'chars') in the current match.
The method group(int i) returns the substring captured by the i-th capturing group of the pattern in the current match.
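Named groups are still numbered left to right, so group("digits") and group(1) return the same text. A sketch applying the same pattern to single, already-split items (the class name and the helper split are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    private static final Pattern ITEM = Pattern.compile("(?<digits>\\d*)(?<chars>[^,]*)");

    // Returns {digits, chars} for one comma-free item such as "67dd()"
    static String[] split(String item) {
        Matcher m = ITEM.matcher(item);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected item: " + item);
        }
        return new String[] {m.group("digits"), m.group("chars")};
    }

    public static void main(String[] args) {
        Matcher m = ITEM.matcher("67dd()");
        if (m.matches()) {
            // group("digits") is group(1), group("chars") is group(2)
            System.out.println(m.group("digits") + " == " + m.group(1)); // 67 == 67
            System.out.println(m.group("chars") + " == " + m.group(2));  // dd() == dd()
            System.out.println(m.group(0));                              // 67dd()
        }
        for (String item : new String[] {"15a", "kk", "159"}) {
            String[] parts = split(item);
            System.out.println(item + ": s1=" + parts[0] + " s2=" + parts[1]);
        }
    }
}
```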
See the Oracle tutorial at http://docs.oracle.com/javase/tutorial/essential/regex/ for more examples of using regex in Java.
You can use a Pattern and a Matcher to find the first index of a letter preceded by a number and split at that position.
Code
public static void main(String[] args) {
String[] inputs = { "15a", "67dd()", "kk", "159" };
// A letter that is directly preceded by a digit
Pattern p = Pattern.compile("(?<=[0-9])[a-zA-Z]");
for (String input : inputs) {
Matcher m = p.matcher(input);
System.out.println("Input: " + input);
if (m.find()) {
// Index of that letter: everything before it is the number part
int splitIndex = m.start();
System.out.println("1.\t" + input.substring(0, splitIndex));
System.out.println("2.\t" + input.substring(splitIndex));
} else {
System.out.println("1.");
System.out.println("2.\t" + input);
}
}
}
Output
Input: 15a
1. 15
2. a
Input: 67dd()
1. 67
2. dd()
Input: kk
1.
2. kk
Input: 159
1.
2. 159
Use java.util.regex.Pattern and java.util.regex.Matcher:
String strArray="135(i),15a,14(g)(q)12,67dd(),kk,159";
String arr[] = strArray.split(",");
for (String s : arr) {
Matcher m = Pattern.compile("([0-9]*)([^0-9]*)").matcher(s);
System.out.println("String in = " + s);
if(m.matches()){
System.out.println(" s1: " + m.group(1));
System.out.println(" s2: " + m.group(2));
} else {
System.out.println(" unmatched");
}
}
outputs:
String in = 135(i)
s1: 135
s2: (i)
String in = 15a
s1: 15
s2: a
String in = 14(g)(q)12
unmatched
String in = 67dd()
s1: 67
s2: dd()
String in = kk
s1:
s2: kk
String in = 159
s1: 159
s2:
Note how '14(g)(q)12' is not matched. It's not clear what the OP's required output is in this instance (or if a comma is missing from this portion of the example input string).
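If the intended split for 14(g)(q)12 is simply at the first non-digit (an assumption, since the question doesn't say), a pattern whose second group accepts anything, (\\d*)(.*), matches every item. A sketch (the class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitAtFirstNonDigit {
    // Group 1: leading digits (possibly empty). Group 2: the rest, which may
    // contain more digits. DOTALL lets '.' match line breaks too, so every
    // input matches.
    private static final Pattern ITEM = Pattern.compile("(\\d*)(.*)", Pattern.DOTALL);

    static String[] split(String item) {
        Matcher m = ITEM.matcher(item);
        if (!m.matches()) {
            throw new IllegalStateException("cannot happen for this pattern");
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String strArray = "135(i),15a,14(g)(q)12,67dd(),kk,159";
        for (String s : strArray.split(",")) {
            String[] parts = split(s);
            System.out.println(s + ": s1=" + parts[0] + " s2=" + parts[1]);
        }
    }
}
```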

Regular expression in Java

I have a regular expression, but I don't know how to use it in Java. This is the Java code:
String inputString = "he is in cairo on 20-2-20 12 and he will be here on JANUARY 20 2013 the expected time to arrived is 100: 00 ";
String pattern = " ";
Pattern pt = Pattern.compile(pattern);
Matcher m = pt.matcher(inputString);
String resultString=null;
if(m.find()) {
resultString = m.replaceAll(" ");
}
System.out.println(resultString);
The requirements are:
Replace any run of spaces with a single space.
Dates must be in the format dd-mm-yyyy.
If there are spaces between numbers, remove them (only between numbers).
The month JANUARY may also appear in the short form JAN.
The expected output is:
he is in cairo on 20-2-2012 and he will be here on 20-01-2013 the expected time to arrived is 100:00
I have used this:
Matcher m = Pattern.compile("(\\d+)-(\\d+)?\\s*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)").matcher(inputString);
String resultString=null;
String temp_str=null;
while (m.find()) {
if (m.groupCount()==3) {
int first = Integer.valueOf(m.group(1));
int second = Integer.valueOf(m.group(2));
String month = m.group(3);
System.out.println("three parts");
temp_str=m.replaceAll("\\1-\\2-\\3");
System.out.println(temp_str);
} else {
int first = Integer.valueOf(m.group(1));
String month = m.group(2);
System.out.println("two parts");
temp_str=m.replaceAll("\\1-\\2-\\3");
}
}
Many thanks, I found the solution as follows:
Matcher m = Pattern.compile("([0-9]{1,2}) ([0-9]{1,2}) ([0-9]{4})").matcher(inputString);
String resultString = null;
String temp_str = null;
while (m.find()) {
if (m.groupCount() == 3) {
int first = Integer.valueOf(m.group(1));
int second = Integer.valueOf(m.group(2));
String month = m.group(3);
System.out.println("three parts" + month);
if (month.matches("Jan"))
{
System.out.println("three parts wael");
temp_str = m.replaceAll(first + "-" + second + "-" + "JANUARY");
}
System.out.println(temp_str);
}
else {
int first = Integer.valueOf(m.group(1));
String month = m.group(2);
System.out.println("two parts");
temp_str = m.replaceAll("\\1-\\2-\\3");
}
}
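The groupCount() checks in both snippets always see 3, because groupCount() reports how many groups the pattern defines, not how many took part in a match; also, group(3) in the second pattern is the 4-digit year, so comparing it with "Jan" never succeeds. Below is a sketch of one way to meet the four requirements. It assumes a date with a month name always looks like MONTH DAY YEAR with a 4-digit year, accepts a month name or any prefix of it with at least 3 letters (JAN, JANUARY), and converts those dates before removing spaces between digits so that "20 2013" is not joined into one number. It needs Java 9+ for appendReplacement(StringBuilder, ...); the class and method names are hypothetical:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NormalizeDates {
    private static final String[] MONTHS = {
        "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
        "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"
    };

    // A word of 3+ letters, a 1-2 digit day and a 4-digit year, e.g. "JANUARY 20 2013"
    private static final Pattern MONTH_DATE =
            Pattern.compile("\\b([A-Za-z]{3,})\\s+(\\d{1,2})\\s+(\\d{4})\\b");

    // 1..12 if the word is a month name or a prefix of one (JAN, JANUARY), else -1
    static int monthNumber(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        for (int i = 0; i < MONTHS.length; i++) {
            if (MONTHS[i].startsWith(w)) {
                return i + 1;
            }
        }
        return -1;
    }

    static String normalize(String input) {
        // 1. Collapse every run of whitespace into a single space
        String s = input.replaceAll("\\s+", " ").trim();

        // 2. "JANUARY 20 2013" -> "20-01-2013"; non-month words are left unchanged
        Matcher m = MONTH_DATE.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int month = monthNumber(m.group(1));
            String replacement = month < 0
                    ? m.group()
                    : String.format(Locale.ROOT, "%s-%02d-%s", m.group(2), month, m.group(3));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);

        // 3. Drop spaces between a digit (or ':') and the next digit:
        //    "20 12" -> "2012", "100: 00" -> "100:00"
        return sb.toString().replaceAll("(?<=[\\d:])\\s+(?=\\d)", "");
    }

    public static void main(String[] args) {
        String inputString = "he is in cairo on 20-2-20 12 and he will be here on JANUARY 20 2013 "
                + "the expected time to arrived is 100: 00 ";
        System.out.println(normalize(inputString));
    }
}
```

The order of steps 2 and 3 is the key design choice: removing spaces between digits first would merge the day and year of "JANUARY 20 2013" before the month pattern could see them.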
