I want to format telephone numbers from the following format:
359878123456
0878123456
00359878123456
that are placed in a file that has information about name and phone number in the following format:
DarkoT 00359878123456
to be formatted in a standard form, changing only the number (the name should be left as it is). See below:
DarkoT +359 87 2 123456
This applies to all three input formats.
This is where I am so far (my regex):
System.out.println(String.valueOf(inputLine).replaceAll("((\\+|00)359|0)(\\-|\\s)?8[7-9][2-9](\\-|\\s)?\\d{3}(\\s|\\-)?\\d{3}$", "($1)-\\$"));
I am confused about the replacement part. Please advise.
One solution without regex: you could just split the string and group it from the back. But I would prefer doing it with a regex, so here goes:
I suppose you must have this format (if this is not correct, this will not work):
[OPTIONAL] 00 (length: 2)
[OPTIONAL] 111 (Considering always 3 numbers) (length: 3)
If you don't have the former: 0 (changing zones, I understand?) (length: 1)
22 (length: 2)
3 (length: 1)
444444 (length: 6)
And now the regex to capture this:
(?:(?:00)?(\d{3})|0)(\d{2})(\d{1})(\d{6})
You will have, as a result:
Group 1: 3 digits (country code?) or nothing (if it's the '0').
Group 2: 2 digits (zone?)
Group 3: 1 digit (no idea, in my country we don't use this)
Group 4: last 6 digits
Using a plain replace has some limitations, so I would use a Matcher instead. It is as easy as:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(originalNumber);
if (m.find()) {
String nationalCode = m.group(1) != null ? m.group(1) : DEFAULT_NATIONAL_CODE;
formattedNumber = "+" + nationalCode + " " + m.group(2) + " " + m.group(3) + " " + m.group(4);
}
If you want more flexibility (for example, country numbers as 2 digits, not only 3) let me know and I will change the regexp.
NOTE: I didn't test this, just coded off the top of my head, let me know if it fails.
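A minimal, self-contained sketch of the matcher approach above, under a few assumptions that are not in the original: DEFAULT_NATIONAL_CODE is taken to be 359, a leading + is accepted in addition to 00, the pattern is anchored to the whole number, and the hypothetical helper formatLine treats everything up to the last space as the name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFormatter {
    // Assumption: a number written with a single leading 0 belongs to country code 359.
    private static final String DEFAULT_NATIONAL_CODE = "359";

    // Either [+ or 00] plus a 3-digit country code, or a single 0; then 2 + 1 + 6 digits.
    private static final Pattern NUMBER =
            Pattern.compile("^(?:(?:\\+|00)?(\\d{3})|0)(\\d{2})(\\d)(\\d{6})$");

    // Returns the number as "+CCC ZZ D NNNNNN", or the input unchanged if it does not match.
    static String formatNumber(String number) {
        Matcher m = NUMBER.matcher(number);
        if (!m.matches()) {
            return number;
        }
        String nationalCode = m.group(1) != null ? m.group(1) : DEFAULT_NATIONAL_CODE;
        return "+" + nationalCode + " " + m.group(2) + " " + m.group(3) + " " + m.group(4);
    }

    // Hypothetical helper: keeps everything up to the last space (the name) as it is.
    static String formatLine(String line) {
        int split = line.lastIndexOf(' ');
        return line.substring(0, split + 1) + formatNumber(line.substring(split + 1));
    }

    public static void main(String[] args) {
        System.out.println(formatLine("DarkoT 00359878123456"));
        System.out.println(formatLine("DarkoT 359878123456"));
        System.out.println(formatLine("DarkoT 0878123456"));
    }
}
```

With these assumptions, all three input formats produce the same formatted line, and anything that does not look like a phone number is passed through untouched.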
I think this is a continuation of your previous problem, "How to check exact phone number in Java with regex".
Sample.java
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class Sample{
public static void main(String[] args){
try{
File inFile = new File ("phonebook.txt");
Scanner sc = new Scanner (inFile);
while (sc.hasNextLine())
{
String line = sc.nextLine();
if(line.matches("^((.)*\\s)?((\\+|00)359|0)8[7-9][2-9]\\d{6}$")) // here that code doesn't work
{
System.out.println (line.replaceAll("^([^\\s\\+]*\\s?)?((\\+|00)?359|0)[-\\s]?(8[7-9][2-9])[-\\s]?(\\d{3})[-\\s]?(\\d{3})$", "$1 - $2 $4 $5 $6").replaceAll("00359","+359").replaceAll("- 0","+359"));
}
}
sc.close();
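}catch(Exception e){ e.printStackTrace(); } // report errors (e.g. a missing phonebook.txt) instead of silently ignoring them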
}
}
phonebook.txt
Sagar +359883123456
Test 00359883123565
Someone 0883123456
People 1234567890
Test1 +359873123456
Output
C:\Users\acer\Desktop\Java\programs>javac Sample.java
C:\Users\acer\Desktop\Java\programs>java Sample
Sagar - +359 883 123 456
Test - +359 883 123 565
Someone +359 883 123 456
Test1 - +359 873 123 456
Related
I have a file with records as below and I am trying to split the records in it based on white spaces and convert them into comma.
file:
a 3w 12 98 header P6124
e 4t 2 100 header I803
c 12L 11 437 M12
BufferedReader reader = new BufferedReader(new FileReader("/myfile.txt"));
String line = reader.readLine();
while (line != null) {
    System.out.println(line);
    // split the current line before reading the next one, otherwise the last iteration splits null
    String[] splitLine = line.split("\\s+");
    line = reader.readLine();
}
If the data is separated by multiple white spaces, I usually go for regex replace -> split('\\s+') or split(" +").
But in the above case, record c is missing the header field. Hence the regex "\\s+" or " +" will just skip over the gap, and I will get c,12L,11,437,M12 instead of c,12L,11,437,,M12.
How do I properly split the lines based on any delimiter in this case so that I get data in the below format:
a,3w,12,98,header,P6124
e,4t,2,100,header,I803
c,12L,11,437,,M12
Could anyone let me know how I can achieve this ?
Maybe you can try a more elaborate approach: use a regex that matches exactly six fields on each line and explicitly handles a missing value for the fifth one.
I rewrote your example, adding some console output to clarify my suggestion:
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    // The missing fifth field on the last line leaves a gap of at least two spaces.
    private static final String Input = "a 3w 12 98 header P6124\n" +
            "e 4t 2 100 header I803\n" +
            "c 12L 11 437  M12";
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new StringReader(Input));
        String line;
        Pattern pattern = Pattern.compile("^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");
        while ((line = reader.readLine()) != null) {
            String[] splitLine = line.split("\\s+");
            System.out.println(splitLine.length);
            System.out.println("Line: " + line);
            Matcher matcher = pattern.matcher(line);
            boolean matches = matcher.matches();
            System.out.println("matches: " + matches);
            System.out.println("groups: " + matcher.groupCount());
            // group(i) throws IllegalStateException unless the match succeeded
            for (int i = 1; matches && i <= matcher.groupCount(); i++) {
                System.out.printf(" Group %d has value '%s'\n", i, matcher.group(i));
            }
        }
    }
}
The key is that the pattern used to match each line requires a sequence of six fields:
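for each field, the value is described as [^ ]+ (one or more non-space characters)
separators between fields are described as " +" (one or more spaces)
the fifth (nullable) field is described as ([^ ]+)?, an optional group; when the value is missing, both separators around it must still be present, so the gap must be at least two spaces wide
each value is captured as a group using parentheses: ( ... )
start (^) and end ($) of each line are marked explicitly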
Then, each line is matched against the given pattern, obtaining six groups: you can access each group using matcher.group(index), where index is 1-based because group(0) returns the full match.
This is a more complex approach but I think it can help you to solve your problem.
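To get from the captured groups to the comma-separated output asked for in the question, something like the following could work. It is only a sketch: the class and method names are made up for illustration, and it assumes the gap left by a missing fifth field is at least two spaces wide, because the pattern needs a separator on each side of the optional group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldsToCsv {
    // Same six-field pattern as above; group 5 is optional.
    private static final Pattern SIX_FIELDS = Pattern.compile(
            "^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");

    // Returns the fields joined by commas, or null if the line does not have the expected shape.
    static String toCsv(String line) {
        Matcher m = SIX_FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (i > 1) {
                sb.append(',');
            }
            String value = m.group(i);   // null when the optional fifth field is absent
            if (value != null) {
                sb.append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsv("a 3w 12 98 header P6124"));
        // two spaces between 437 and M12 mark the missing field
        System.out.println(toCsv("c 12L 11 437  M12"));
    }
}
```

A line that has only five fields separated by single spaces does not match at all, which is why toCsv returns null in that case rather than guessing which field is missing.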
Put a limit on the number of whitespace chars that may be used to split the input.
In the case of your example data, a maximum of 5 works:
String[] splitLine = line.split("\\s{1,5}");
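Why the limit works, and when it stops working, can be seen in a small sketch. The fixed-width layout built by the hypothetical row helper is an assumption about the data, not something stated in the question: normal gaps are at most 5 spaces wide, and the gap left by the missing field is 10 spaces wide.

```java
class SplitLimitDemo {
    // Hypothetical fixed-width layout: the first five columns are padded to a set width.
    static String row(String... fields) {
        return String.format("%-2s%-4s%-3s%-5s%-8s%s", (Object[]) fields);
    }

    // A gap of 6 to 10 spaces is consumed by two separator matches,
    // which leaves one empty string between them.
    static String toCsv(String line) {
        return String.join(",", line.split("\\s{1,5}"));
    }

    public static void main(String[] args) {
        System.out.println(toCsv(row("a", "3w", "12", "98", "header", "P6124")));
        System.out.println(toCsv(row("c", "12L", "11", "437", "", "M12")));
    }
}
```

The limit has to be tuned to the column widths: a gap of 5 spaces or fewer yields no empty field, and a gap of 11 to 15 spaces yields two.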
Are you just trying to switch your delimiters from spaces to commas?
In that case:
cat myFile.txt | sed 's/   */  /g' | sed 's/ /,/g'
*Edit: added a stage that collapses runs of two or more spaces into exactly two spaces, which is what is needed to retain the double comma.
Hi I need to be able to handle both of these scenarios
John, party of 4
william, party of 6 dislikes John, jeff
What I want to capture is
From string 1: John, 4
From String 2: william, 6, john, jeff
I'm pretty stumped at how to achieve this
I know that ([^,])+ gives me the first group (just the name before the comma, without including the comma) but I have no clue on how to concatenate the other portion of the expression.
You may use
(\w+)(?:,\s*party of (\d+)|(?![^,]))
Details
(\w+) - Group 1: one or more word chars
(?:,\s*party of (\d+)|(?![^,])) - a non-capturing group matching either
,\s*party of (\d+) - a comma, then 0+ whitespace chars, then the text party of followed by a space, and then Group 2 capturing 1+ digits
| - or
(?![^,]) - a position that is immediately followed by a comma or the end of the string.
See Java demo:
String regex = "(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))";
List<String> strings = Arrays.asList("John, party of 4", "william, party of 6 dislikes John, jeff");
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
System.out.println("-------- Testing '" + s + "':");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(1) + ": " + (matcher.group(2) != null ? matcher.group(2) : "N/A"));
}
}
Output:
-------- Testing 'John, party of 4':
John: 4
-------- Testing 'william, party of 6 dislikes John, jeff':
william: 6
John: N/A
jeff: N/A
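If the goal is the flat list from the question (william, 6, John, jeff), the same pattern can feed a list instead of printing name/number pairs. This is a sketch only; PartyParser and extract are illustrative names, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartyParser {
    private static final Pattern TOKEN =
            Pattern.compile("(\\w+)(?:,\\s*party of (\\d+)|(?![^,]))");

    // Collects each name, followed by the party size when one was captured.
    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
            if (m.group(2) != null) {
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", extract("John, party of 4")));
        System.out.println(String.join(", ", extract("william, party of 6 dislikes John, jeff")));
    }
}
```

Names keep the capitalization they have in the input, so the second line yields John rather than john.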
I have a string that contains words in a pattern like this:
2013-2014 XXX 29
2011-2012 XXXX 44
Please note that there are 2 whitespaces before AND after the year.
I need to remove the first 2 whitespaces, the 1 whitespace after the year and the last word (29/44 etc).
So it will become like this:
2013-2014 XXX
2011-2012 XXXX
I'm really bad with regex, so any help would be appreciated. So far I can remove the last word with
str.replaceAll(" [^ ]+$", "");
Select only what you want and replace the rest (with a space in the middle) :)
This should work for you :
public static void main(String[] args) throws IOException {
String s1 = " 2013-2014 XXX 29 ";
System.out.println(s1.replaceAll("^\\s+([\\d-]+)\\s+(\\w+).*", "$1 $2"));
String s2 = " 2011-2012 XXXX 44 ";
System.out.println(s2.replaceAll("^\\s+([\\d-]+)\\s+(\\w+).*", "$1 $2"));
}
O/P :
2013-2014 XXX
2011-2012 XXXX
You can use a single regex for this:
str = str.replaceAll("^ +|(?<=\\d{4} ) | [^ ]+ *$", "");
RegEx Breakup:
^ + # 1 or more spaces at start
| # OR
(?<=\\d{4} ) # space after 4 digit year and a space
| # OR
[^ ]+ *$ # text after last space at end
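A small runnable sketch of the single-regex version, wrapped in a method so it can be tried on both sample lines. The class and method names are made up for illustration.

```java
class YearLineCleaner {
    // Removes leading spaces, the extra space after the year range, and the last word.
    static String clean(String str) {
        return str.replaceAll("^ +|(?<=\\d{4} ) | [^ ]+ *$", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("  2013-2014  XXX 29") + "]");
        System.out.println("[" + clean("  2011-2012  XXXX 44  ") + "]");
    }
}
```

The brackets in the printed output only make it visible that no leading or trailing spaces are left.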
You could also do it in several steps that are easier to understand, like this:
public static void main(String[]args){
String s = " 2011-2012 XXXX 44";
// Remove leading and trailing whitespace
s = s.trim();
System.out.println(s);
// replace two or more whitespaces with a single whitespace
s = s.replaceAll("\\s{2,}", " ");
System.out.println(s);
// remove the last word and the whitespace before it
s = s.replaceAll("\\s\\w*$", "");
System.out.println(s);
}
O/P:
2011-2012 XXXX 44
2011-2012 XXXX 44
2011-2012 XXXX
You can also try this:
str = str.replaceAll("\\s{2}", " ").trim();
Example:
String str = " 2013-2014 XXX 29 ";
Now:
str.replaceAll("\\s{2}", " ");
Output: " 2013-2014 XXX 29 "
And with .trim() it looks like this: "2013-2014 XXX 29"
Here is my string
INPUT:
22 TIRES (2 defs)
1 AP(PEAR + ANC)E (CAN anag)
6 CHIC ("SHEIK" hom)
EXPECTED OUTPUT:
22 TIRES
1 APPEARANCE
6 CHIC
ACTUAL OUTPUT :
TIRES
APPEARANCE
CHIC
I tried using below code and got the above output.
String firstnames = a.split(" \\(.*")[0].replace("(", "").replace(")", "").replace(" + ", "");
Any idea how to extract the text along with the leading numbers? I don't want the numbers inside the parentheses, as in the input "22 TIRES (2 defs)"; I need the output to be "22 TIRES". Any help would be great!
I am doing it a bit differently:
String line = "22 TIRES (2 defs)\n\n1 AP(PEAR + ANC)E (CAN anag)\n\n6 CHIC (\"SHEIK\" hom)";
String pattern = "(\\d+\\s+)(.*)\\(";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
String tmp = m.group(1) + m.group(2).replaceAll("[^\\w]", "");
System.out.println(tmp);
}
I would use a single replaceAll function.
str.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
\\s+\\(.* matches one or more whitespace characters, the ( that follows them, and everything after it up to the end of the line. So the (CAN anag) part in your example gets matched.
\\s*\\+\\s* matches + along with the preceding and following spaces.
[()] matches any remaining opening or closing parenthesis.
Finally, all the matched characters are replaced by an empty string.
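Applied to the three sample clues, the single replaceAll can be tried with a sketch like this (ClueCleaner and clean are illustrative names):

```java
class ClueCleaner {
    // Drops the trailing " (...)" annotation, any " + " joins, and leftover parentheses.
    static String clean(String clue) {
        return clue.replaceAll("\\s+\\(.*|\\s*\\+\\s*|[()]", "");
    }

    public static void main(String[] args) {
        String[] clues = {
            "22 TIRES (2 defs)",
            "1 AP(PEAR + ANC)E (CAN anag)",
            "6 CHIC (\"SHEIK\" hom)"
        };
        for (String clue : clues) {
            System.out.println(clean(clue));
        }
    }
}
```

The leading number survives because none of the three alternatives can match it. The parenthesis in AP(PEAR is not preceded by whitespace, so only the [()] alternative applies there, and the rest of the word is kept.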
This is the regex for finding the session ID: "(?<=( ))([0-9]*)(?=(.*ABC.DEEP. [1-9] s))" and the output is:
ID TYPE USER IDLE
63494 ABC DEEP 3 s
-> 70403 ABC DEEAP 0 s
82446 ABC DEEOP 52 min 27 s
In myregexp.com/signedJar.html, this regex works fine. But when I try to find using Java, it is not able to get the output. Please find the snippet:
FrameworkControls.regularExpressionPattern = Pattern.compile("(?<=( ))([0-9]*)(?=(.*ABC.*DEEP.*[1-9] s))");
String deepak = "\n" +
"\n" +
" ID TYPE USER IDLE\n" +
"\n" +
" 63494 ABC DEEP 3 s\n" +
" -> 70403 ABC DEEAP 0 s\n" +
" 82446 ABC DEEOP 52 min 27 s\n";
FrameworkControls.regularExpressionMatcher = FrameworkControls.regularExpressionPattern.matcher(deepak);
if (FrameworkControls.regularExpressionMatcher.find()) {
String h = FrameworkControls.regularExpressionMatcher.group().trim();
System.err.println(h);
}
FrameworkControls.regularExpressionMatcher.find() returns true, but the h variable is always empty. Can anyone let me know what I might be doing wrong?
Expected Output: 63494
I think you're trying to print ID of the USER DEEPAK. If yes, then your code would be,
Pattern p = Pattern.compile("(?<= )[0-9]+(?=\\s*ABC\\s*DEEP\\s*[0-9]\\s*s)");
Matcher m = p.matcher(deepak);
while (m.find()) {
System.out.println(m.group());
}
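One likely reason the original pattern returns an empty string is that [0-9]* is allowed to match zero digits. The sketch below assumes the real output is padded with several leading spaces (here the ID is right-aligned in an 8-character column, which is a guess about the layout). Under that assumption, the first position preceded by a space is still followed by a space, so the * version succeeds there with an empty match, while the + version moves on to the digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed layout: the ID is right-aligned in an 8-character column.
        String row = String.format("%8s ABC DEEP 3 s", "63494");
        String star = "(?<= )[0-9]*(?=.*ABC.*DEEP.*[1-9] s)";
        String plus = "(?<= )[0-9]+(?=.*ABC.*DEEP.*[1-9] s)";
        System.out.println("[" + firstMatch(star, row) + "]");   // empty match
        System.out.println("[" + firstMatch(plus, row) + "]");   // the ID
    }
}
```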
I would use the following expression:
"^\\s+(\\d+)\\s+(\\w+)\\s+(\\w+).+$"
then
group(1) is ID
group(2) is TYPE
group(3) is USER
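The quantifiers are greedy, and the trailing .+ consumes the rest of the line, so you can remove the last two groups if you don't need them. If you match against the whole multi-line string rather than one line at a time, compile the pattern with Pattern.MULTILINE so that ^ and $ apply to each line.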
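A sketch of how the three groups could be used to look up the ID for a given user. SessionIdFinder and findId are hypothetical names, and the pattern is compiled with Pattern.MULTILINE on the assumption that the whole command output is searched at once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SessionIdFinder {
    // group(1) = ID, group(2) = TYPE, group(3) = USER; MULTILINE makes ^ and $ work per line.
    private static final Pattern ROW =
            Pattern.compile("^\\s+(\\d+)\\s+(\\w+)\\s+(\\w+).+$", Pattern.MULTILINE);

    // Returns the ID of the first row whose USER column equals user, or null if none matches.
    static String findId(String text, String user) {
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            if (m.group(3).equals(user)) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "\n\n ID TYPE USER IDLE\n\n"
                + " 63494 ABC DEEP 3 s\n"
                + " -> 70403 ABC DEEAP 0 s\n"
                + " 82446 ABC DEEOP 52 min 27 s\n";
        System.out.println(findId(text, "DEEP"));
    }
}
```

Rows that start with the -> marker do not match this expression, so the ID of the current session is never returned by this sketch.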