how to format string line that contains numbers with regex - java

I want to format telephone numbers given in the following formats:
359878123456
0878123456
00359878123456
They are stored in a file where each line holds a name and a phone number, like this:
DarkoT 00359878123456
I want only the numbers (not the names) converted to a standard form, like this:
DarkoT +359 87 8 123456
This should work for all of the formats above.
This is where I am at (my regex):
System.out.println(String.valueOf(inputLine).replaceAll("((\\+|00)359|0)(\\-|\\s)?8[7-9][2-9](\\-|\\s)?\\d{3}(\\s|\\-)?\\d{3}$", "($1)-\\$"));
I am confused about where to put the groups in the replacement. Please advise.

One solution without regex: split the string and group the digits from the back. But I would really prefer doing it with regex, so here goes.
I assume your numbers have this format (if they don't, this will not work):
[OPTIONAL] 00 (length: 2)
[OPTIONAL] 111 (assuming always 3 digits) (length: 3)
If you don't have the country code: 0 (changing zones, I understand?) (length: 1)
22 (length: 2)
3 (length: 1)
444444 (length: 6)
And now the regex to capture this:
(?:(?:00)?(\d{3})|0)(\d{2})(\d{1})(\d{6})
You will have, as a result:
Group 1: 3 digits (country code?) or nothing (if it's the '0').
Group 2: 2 digits (zone?)
Group 3: 1 digit (no idea, in my country we don't use this)
Group 4: last 6 digits
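Using a replace has some limitations, so I would use a Matcher instead. It is as easy as:
String regex = "(?:(?:00)?(\\d{3})|0)(\\d{2})(\\d{1})(\\d{6})"; // the regex above, escaped for Java
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(originalNumber);
String formattedNumber = originalNumber;
if (m.find()) {
    // group 1 is null when the number started with a single '0'
    // DEFAULT_NATIONAL_CODE is a constant you define yourself, e.g. "359"
    String nationalCode = m.group(1) != null ? m.group(1) : DEFAULT_NATIONAL_CODE;
    formattedNumber = "+" + nationalCode + " " + m.group(2) + " " + m.group(3) + " " + m.group(4);
}
If you want more flexibility (for example, 2-digit country codes, not only 3-digit ones), let me know and I will change the regex.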
NOTE: I didn't test this, just coded off the top of my head, let me know if it fails.
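The Matcher approach can be checked end to end with the minimal, self-contained sketch below. It makes three assumptions:
- A number written with a single leading 0 is Bulgarian, so DEFAULT_NATIONAL_CODE is set to "359". The answer leaves that value open.
- The class name PhoneFormatSketch and the helper formatLine are hypothetical. formatLine rewrites only the matched digits, so the name in front of the number is kept.
- Like the regex above, it does not accept a leading "+".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneFormatSketch {
    // Assumption: numbers with a single leading '0' belong to country code 359.
    static final String DEFAULT_NATIONAL_CODE = "359";
    static final Pattern PHONE =
            Pattern.compile("(?:(?:00)?(\\d{3})|0)(\\d{2})(\\d{1})(\\d{6})");

    // Reformats the first number found in the line and leaves the name untouched.
    static String formatLine(String line) {
        Matcher m = PHONE.matcher(line);
        if (!m.find()) {
            return line; // no number in the expected shape: leave the line as-is
        }
        String code = m.group(1) != null ? m.group(1) : DEFAULT_NATIONAL_CODE;
        String formatted = "+" + code + " " + m.group(2) + " " + m.group(3) + " " + m.group(4);
        // Keep everything before and after the matched digits.
        return line.substring(0, m.start()) + formatted + line.substring(m.end());
    }

    public static void main(String[] args) {
        String[] lines = {"DarkoT 359878123456", "DarkoT 0878123456", "DarkoT 00359878123456"};
        for (String line : lines) {
            System.out.println(formatLine(line)); // DarkoT +359 87 8 123456
        }
    }
}
```

All three input formats end up as "DarkoT +359 87 8 123456". The optional (?:00)? is tried first, so "00359..." is not misread as country code "003".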

I think this is a continuation of your previous question, "How to check exact phone number in Java with regex".
Sample.java
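import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Sample {
    public static void main(String[] args) {
        // try-with-resources closes the Scanner even if an exception is thrown
        try (Scanner sc = new Scanner(new File("phonebook.txt"))) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                // only process lines that end with a valid mobile number
                if (line.matches("^((.)*\\s)?((\\+|00)359|0)8[7-9][2-9]\\d{6}$")) {
                    String formatted = line
                            .replaceAll("^([^\\s\\+]*\\s?)?((\\+|00)?359|0)[-\\s]?(8[7-9][2-9])[-\\s]?(\\d{3})[-\\s]?(\\d{3})$", "$1 - $2 $4 $5 $6")
                            .replaceAll("00359", "+359") // 00359 prefix -> +359
                            .replaceAll("- 0", "+359");  // national 0 prefix -> +359 (drops the dash)
                    System.out.println(formatted);
                }
            }
        } catch (FileNotFoundException e) {
            // report the problem instead of silently swallowing it
            System.err.println("Cannot open phonebook.txt: " + e.getMessage());
        }
    }
}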
phonebook.txt
Sagar +359883123456
Test 00359883123565
Someone 0883123456
People 1234567890
Test1 +359873123456
Output
C:\Users\acer\Desktop\Java\programs>javac Sample.java
C:\Users\acer\Desktop\Java\programs>java Sample
Sagar - +359 883 123 456
Test - +359 883 123 565
Someone +359 883 123 456
Test1 - +359 873 123 456
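The two trailing replaceAll calls patch the prefix after the main replacement. An alternative is to build the output from Matcher groups and write +359 directly for every accepted prefix, as in the sketch below. It assumes every line has a single-word name before the number; the original regex also allowed lines without a name. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhonebookLineSketch {
    // Group 1: name, group 2: prefix (+359, 00359, 359 or 0), groups 3-5: 3-3-3 digit blocks.
    static final Pattern LINE = Pattern.compile(
            "^(\\S+)\\s+((?:\\+|00)?359|0)[-\\s]?(8[7-9][2-9])[-\\s]?(\\d{3})[-\\s]?(\\d{3})$");

    // Returns the normalized line, or null when the line holds no valid number.
    static String normalize(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Every accepted prefix stands for the same country code, so emit +359 directly.
        return m.group(1) + " +359 " + m.group(3) + " " + m.group(4) + " " + m.group(5);
    }

    public static void main(String[] args) {
        String[] lines = {"Sagar +359883123456", "Test 00359883123565",
                "Someone 0883123456", "People 1234567890", "Test1 +359873123456"};
        for (String line : lines) {
            String normalized = normalize(line);
            if (normalized != null) {
                System.out.println(normalized);
            }
        }
    }
}
```

Unlike the chained version, every output line has the same "name +359 ddd ddd ddd" shape, with no leftover dash.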

Related

How to split records on whitespace when different lines have spaces at different positions

I have a file with records like the ones below. I am trying to split each record on whitespace and join the fields with commas.
file:
a 3w 12 98 header P6124
e 4t 2 100 header I803
c 12L 11 437 M12
BufferedReader reader = new BufferedReader(new FileReader("/myfile.txt"));
String line = reader.readLine();
while (line != null) {
System.out.println(line);
line = reader.readLine();
String[] splitLine = line.split("\\s+")
If the data is separated by multiple whitespace characters, I usually use split("\\s+") or split(" +").
But in the above case, record c has no value in the header column. Hence "\s+" or " +" just skips that empty field, and I get c,12L,11,437,M12 instead of c,12L,11,437,,M12.
How do I properly split the lines based on any delimiter in this case so that I get data in the below format:
a,3w,12,98,header,P6124
e,4t,2,100,header,I803
c,12L,11,437,,M12
Could anyone let me know how I can achieve this ?
Maybe you could try a more complex approach: a regex that matches exactly six fields on each line and explicitly handles a missing value in the fifth one.
I rewrote your example, adding some console output to illustrate my suggestion:
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    private static final String INPUT = "a 3w 12 98 header P6124\n" +
            "e 4t 2 100 header I803\n" +
            "c 12L 11 437 M12";
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new StringReader(INPUT));
        Pattern pattern = Pattern.compile("^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");
        String line;
        while ((line = reader.readLine()) != null) {
            String[] splitLine = line.split("\\s+");
            System.out.println(splitLine.length);
            System.out.println("Line: " + line);
            Matcher matcher = pattern.matcher(line);
            boolean matches = matcher.matches();
            System.out.println("matches: " + matches);
            System.out.println("groups: " + matcher.groupCount());
            // group(i) throws IllegalStateException if the line did not match
            if (matches) {
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.printf(" Group %d has value '%s'%n", i, matcher.group(i));
                }
            }
        }
    }
}
The key is that the pattern used to match each line requires a sequence of six fields:
for each field, the value is described as [^ ]+
separators between fields are described as " +" (one or more spaces)
the value of the fifth (nullable) field is described as ([^ ]+)?, i.e. the whole group is optional
each value is captured as a group using parentheses: ( ... )
start (^) and end ($) of each line are marked explicitly
Then, each line is matched against the given pattern, obtaining six groups: you can access each group using matcher.group(index), where index is 1-based because group(0) returns the full match.
This is a more complex approach but I think it can help you to solve your problem.
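The answer stops at printing the groups. To get the comma-separated lines the question asks for, one could join the six groups and write an empty string when group 5 did not participate, as in the minimal sketch below. It assumes fields are separated by at least one space and that a missing header leaves a run of two or more spaces, which the pattern needs. SixFieldCsv and toCsv are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SixFieldCsv {
    // Same six-field pattern as above; group 5 is optional.
    static final Pattern SIX_FIELDS = Pattern.compile(
            "^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");

    // Converts one line to CSV, writing "" for a group that did not participate.
    static String toCsv(String line) {
        Matcher m = SIX_FIELDS.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i) == null ? "" : m.group(i));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Assumed column-aligned input: the missing header leaves several spaces.
        String[] lines = {
            "a  3w   12  98   header  P6124",
            "e  4t   2   100  header  I803",
            "c  12L  11  437          M12"
        };
        for (String line : lines) {
            System.out.println(toCsv(line));
        }
    }
}
```

With these inputs it prints a,3w,12,98,header,P6124, then e,4t,2,100,header,I803, then c,12L,11,437,,M12.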
Put a limit on the number of whitespace chars that may be used to split the input.
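In the case of your example data, a maximum of 5 works. This assumes the normal gaps between fields are at most 5 whitespace chars, while the gap left by the missing field is longer. That gap is consumed by two or more matches, and the empty string between them becomes the missing field (a gap of 6 to 10 chars yields exactly one empty field):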
String[] splitLine = line.split("\\s{1,5}");
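The sketch below shows the limited split on data whose spacing is assumed, since the real spacing in the file is not shown. Normal gaps are 2 to 3 spaces, and an 8-space gap stands in for the missing header value. Because \\s{1,5} consumes at most 5 whitespace characters per match, the 8-space gap is split as 5 + 3. The empty string between those two matches becomes the empty field. LimitedSplit and splitToCsv are hypothetical names, and String.repeat needs Java 11 or later.

```java
public class LimitedSplit {
    // Splits on runs of 1 to 5 whitespace chars and joins the pieces with commas.
    static String splitToCsv(String line) {
        return String.join(",", line.split("\\s{1,5}"));
    }

    public static void main(String[] args) {
        // Assumed spacing: gaps of 2-3 spaces, plus 8 spaces where "header" is missing.
        String full = "a  3w   12  98   header  P6124";
        String missing = "c  12L  11  437" + " ".repeat(8) + "M12";
        System.out.println(splitToCsv(full));    // a,3w,12,98,header,P6124
        System.out.println(splitToCsv(missing)); // c,12L,11,437,,M12
    }
}
```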
Are you just trying to switch your delimiters from spaces to commas?
In that case:
cat myFile.txt | sed 's/ \{3,\}/  /g' | sed 's/ /,/g'
Edit: added a stage that collapses runs of more than two spaces into exactly two spaces, which are needed to keep the double comma.

java parse regex multiple capture groups

I need to be able to handle both of these scenarios:
John, party of 4
william, party of 6 dislikes John, jeff
What I want to capture is
From string 1: John, 4
From String 2: william, 6, john, jeff
I'm pretty stumped on how to achieve this.
I know that ([^,])+ gives me the first group (just the name before the comma, without the comma), but I have no clue how to add the other portion of the expression.
You may use
(\w+)(?:,\s*party of (\d+)|(?![^,]))
Details
(\w+) - Group 1: one or more word chars
(?:,\s*party of (\d+)|(?![^,])) - a non-capturing group matching
,\s*party of (\d+) - a comma, then 0+ whitespace chars, then party of followed by a space, and then Group 2 capturing 1+ digits
| - or
(?![^,]) - a position that is followed by a comma or the end of the string.
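Java example:
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String regex = "(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))";
        List<String> strings = Arrays.asList("John, party of 4", "william, party of 6 dislikes John, jeff");
        Pattern pattern = Pattern.compile(regex);
        for (String s : strings) {
            System.out.println("-------- Testing '" + s + "':");
            Matcher matcher = pattern.matcher(s);
            while (matcher.find()) {
                // group 2 is null when the name is not followed by ", party of N"
                System.out.println(matcher.group(1) + ": " + (matcher.group(2) != null ? matcher.group(2) : "N/A"));
            }
        }
    }
}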
Output:
-------- Testing 'John, party of 4':
John: 4
-------- Testing 'william, party of 6 dislikes John, jeff':
william: 6
John: N/A
jeff: N/A
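The question asked for the captures as one comma-separated list per string, e.g. william, 6, John, jeff. The sketch below reuses the regex above, collects group 1 plus group 2 whenever it participated, and joins them. PartyCaptures and captures are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartyCaptures {
    static final Pattern NAME_OR_PARTY =
            Pattern.compile("(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))");

    // Collects every captured value in order: each name, plus the party size when present.
    static String captures(String s) {
        List<String> values = new ArrayList<>();
        Matcher m = NAME_OR_PARTY.matcher(s);
        while (m.find()) {
            values.add(m.group(1));
            if (m.group(2) != null) {
                values.add(m.group(2));
            }
        }
        return String.join(", ", values);
    }

    public static void main(String[] args) {
        System.out.println(captures("John, party of 4"));                        // John, 4
        System.out.println(captures("william, party of 6 dislikes John, jeff")); // william, 6, John, jeff
    }
}
```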

How to remove unneeded white spaces and last word of a string with Regex?

I have a string that contains words in a pattern like this:
2013-2014  XXX 29 
2011-2012  XXXX 44
Please note that there are 2 whitespaces before AND after the year.
I need to remove the first 2 whitespaces, the 1 whitespace after the year and the last word (29/44 etc).
So it will become like this:
2013-2014 XXX
2011-2012 XXXX
I'm really bad with regex, so any help would be appreciated. So far I can remove the last word with
str.replaceAll(" [^ ]+$", "");
Select only what you want and replace the rest (with a space in the middle) :)
This should work for you:
public static void main(String[] args) {
    String s1 = " 2013-2014 XXX 29 ";
    // keep the year range and the next word; drop leading spaces and everything after that word
    System.out.println(s1.replaceAll("^\\s+([\\d-]+)\\s+(\\w+).*", "$1 $2"));
    String s2 = " 2011-2012 XXXX 44 ";
    System.out.println(s2.replaceAll("^\\s+([\\d-]+)\\s+(\\w+).*", "$1 $2"));
}
O/P :
2013-2014 XXX
2011-2012 XXXX
You can use a single regex for this:
str = str.replaceAll("^ +|(?<=\\d{4} ) | [^ ]+ *$", "");
RegEx Breakup:
^ + # 1 or more spaces at start
| # OR
(?<=\\d{4} ) # a space preceded by a 4-digit year and a space
| # OR
[^ ]+ *$ # the last word, with the space before it and any trailing spaces
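The single regex can be checked on the spacing the question describes: two spaces before and after the year, reconstructed here from that description. TrimYearLine and clean are hypothetical names.

```java
public class TrimYearLine {
    // The single regex above: leading spaces, the extra space after the year,
    // and the last word with its surrounding spaces are all replaced with "".
    static String clean(String str) {
        return str.replaceAll("^ +|(?<=\\d{4} ) | [^ ]+ *$", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("  2013-2014  XXX 29 ")); // 2013-2014 XXX
        System.out.println(clean("  2011-2012  XXXX 44")); // 2011-2012 XXXX
    }
}
```

The lookbehind only fires on the second space after the year, because the first one is preceded by the year alone and not by "year + space".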
You could also do it in several easier-to-understand steps, like this:
public static void main(String[]args){
String s = " 2011-2012 XXXX 44";
// Remove leading and trailing whitespace
s = s.trim();
System.out.println(s);
// replace two or more whitespaces with a single whitespace
s = s.replaceAll("\\s{2,}", " ");
System.out.println(s);
// remove the last word and the whitespace before it
s = s.replaceAll("\\s\\w*$", "");
System.out.println(s);
}
O/P:
2011-2012 XXXX 44
2011-2012 XXXX 44
2011-2012 XXXX
You can also try this:
str = str.replaceAll("\\s{2}", " ").trim();
Example:
String str = " 2013-2014 XXX 29 ";
Now:
str.replaceAll("\\s{2}", " ");
Output: " 2013-2014 XXX 29 "
And with .trim() it looks like this: "2013-2014 XXX 29"

Extract a particular number from a string using regex in java

Here is my input:
INPUT:
22 TIRES (2 defs)
1 AP(PEAR + ANC)E (CAN anag)
6 CHIC ("SHEIK" hom)
EXPECTED OUTPUT:
22 TIRES
1 APPEARANCE
6 CHIC
ACTUAL OUTPUT :
TIRES
APPEARANCE
CHIC
I tried the code below and got the output above:
String firstnames = a.split(" \\(.*")[0].replace("(", "").replace(")", "").replace(" + ", "");
Any idea how to extract the text together with the leading numbers? I don't want the numbers inside the trailing parentheses, like the 2 in "22 TIRES (2 defs)"; I need the output to be "22 TIRES". Any help would be great!
I am doing it a bit differently:
String line = "22 TIRES (2 defs)\n\n1 AP(PEAR + ANC)E (CAN anag)\n\n6 CHIC (\"SHEIK\" hom)";
String pattern = "(\\d+\\s+)(.*)\\(";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
String tmp = m.group(1) + m.group(2).replaceAll("[^\\w]", "");
System.out.println(tmp);
}
I would use a single replaceAll function.
str.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
\\s+\\(.* matches one or more spaces followed by ( plus all the remaining characters, so the (CAN anag) part in your example is matched.
\\s*\\+\\s* matches + along with the preceding and following spaces.
[()] matches opening or closing parentheses.
Finally, all matched characters are replaced with an empty string.
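Here is a runnable sketch of the single replaceAll above, applied to each input line separately. ClueAnswer and clean are hypothetical names.

```java
public class ClueAnswer {
    // Removes the trailing "(...)" annotation, any " + " joins, and the remaining parentheses.
    static String clean(String line) {
        return line.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
    }

    public static void main(String[] args) {
        String[] lines = {"22 TIRES (2 defs)", "1 AP(PEAR + ANC)E (CAN anag)", "6 CHIC (\"SHEIK\" hom)"};
        for (String line : lines) {
            System.out.println(clean(line)); // 22 TIRES, 1 APPEARANCE, 6 CHIC
        }
    }
}
```

The leading number survives because the space after it is followed by a letter, not by "(" or "+".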

Regex matches in myregexp.com, but not matching in Java

This is the regex for finding the session ID: "(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))" and the text I am searching is:
ID TYPE USER IDLE
63494 ABC DEEP 3 s
-> 70403 ABC DEEAP 0 s
82446 ABC DEEOP 52 min 27 s
On myregexp.com/signedJar.html, this regex works fine. But when I try it in Java, it does not find the ID. Here is the snippet:
FrameworkControls.regularExpressionPattern = Pattern.compile("(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))");
String deepak = "\n" +
"\n" +
" ID TYPE USER IDLE\n" +
"\n" +
" 63494 ABC DEEP 3 s\n" +
" -> 70403 ABC DEEAP 0 s\n" +
" 82446 ABC DEEOP 52 min 27 s\n";
FrameworkControls.regularExpressionMatcher = FrameworkControls.regularExpressionPattern.matcher(deepak);
if (FrameworkControls.regularExpressionMatcher.find()) {
String h = FrameworkControls.regularExpressionMatcher.group().trim();
System.err.println(h);
}
"FrameworkControls.regularExpressionMatcher.find()" returns true, but the h variable is always empty. Can anyone tell me what I am doing wrong?
Expected Output: 63494
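I think you're trying to print the ID of the user DEEP. Your [0-9]* can match zero digits, so find() can succeed with an empty match (for example at a space that is followed by more spaces), which is why h is empty. Requiring at least one digit with [0-9]+ avoids that, so your code would be: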
Pattern p = Pattern.compile("(?<= )[0-9]+(?=\\s*ABC\\s*DEEP\\s*[0-9]\\s*s)");
Matcher m = p.matcher(deepak);
while (m.find()) {
System.out.println(m.group());
}
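The sketch below shows why [0-9]* can leave h empty while [0-9]+ does not, by comparing the two patterns on one row. It assumes the real output indents the row with several spaces. EmptyMatchDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumption: the row is indented with several spaces.
        String row = "    63494 ABC DEEP 3 s";
        // [0-9]* matches zero digits right after the first space, and the lookahead still succeeds.
        System.out.println("'" + firstMatch("(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))", row) + "'"); // ''
        // [0-9]+ needs at least one digit, so the match starts at the ID.
        System.out.println("'" + firstMatch("(?<= )[0-9]+(?=\\s*ABC\\s*DEEP\\s*[0-9]\\s*s)", row) + "'"); // '63494'
    }
}
```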
I would use the following expression. The (?m) flag turns on MULTILINE mode, so ^ and $ match at each line of the table:
"(?m)^\\s+(\\d+)\\s+(\\w+)\\s+(\\w+).+$"
then
group(1) is ID
group(2) is TYPE
group(3) is USER
You can drop the parentheses around the last two groups if you don't need TYPE and USER.
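The sketch below applies the row-parsing approach to the question's string, with the (?m) flag as above. It uses a hypothetical helper, findIdForUser, that filters on the USER column. Rows that start with -> (the marked session) do not fit ^\\s+(\\d+) and are skipped, so asking for DEEAP returns null with this pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SessionTable {
    // (?m) turns on MULTILINE so ^ and $ match at every line of the table.
    static final Pattern ROW = Pattern.compile("(?m)^\\s+(\\d+)\\s+(\\w+)\\s+(\\w+).+$");
    static final String SAMPLE = "\n" +
            "\n" +
            " ID TYPE USER IDLE\n" +
            "\n" +
            " 63494 ABC DEEP 3 s\n" +
            " -> 70403 ABC DEEAP 0 s\n" +
            " 82446 ABC DEEOP 52 min 27 s\n";

    // Hypothetical helper: returns the ID of the first row whose USER column equals user.
    static String findIdForUser(String table, String user) {
        Matcher m = ROW.matcher(table);
        while (m.find()) {
            if (m.group(3).equals(user)) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findIdForUser(SAMPLE, "DEEP")); // 63494
    }
}
```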
