I want to find numbers like
(123).234.4567
(123)-234-4567
123.234.4567
123-456-4567
These digits represent US phone numbers. They can be separated with - (dash) or . (dot) and must have 3 digits, followed by another 3 digits, followed by 4 more. The first 3 digits may or may not be enclosed in parentheses '()'.
My code is below:
Pattern regex = Pattern.compile("^(\\(\\d{3}\\))|^\\d{3}[.-]?\\d{3}[.-]?\\d{4}$");
Matcher matcher = regex.matcher("(425).882.8080 tel");
while(matcher.find()){
System.out.println(matcher.group());
}
But the result is:
(425)
What am I doing wrong? I want to print (425).882.8080 as a single match.
You can use a backreference to require that both separators are the same.
A backreference matches the same text as previously matched by a capturing group.
^(\(\d{3}\)|\d{3})([.-])\d{3}\2\d{4}$
Capture 2nd group -----------^^^^ ^^-------- Same text as 2nd captured group
A group is captured by enclosing it in parentheses (...) and can be referenced by \index, where index is the group number.
Here is online demo
Note: If you want to find the number as a substring of a longer string, remove ^ and $, which anchor the match to the beginning and end of the string respectively.
Pattern explanation:
( group and capture to \1:
\( '('
\d{3} digits (0-9) (3 times)
\) ')'
| OR
\d{3} digits (0-9) (3 times)
) end of \1
( group and capture to \2:
[.-] any character of: '.', '-'
) end of \2
\d{3} digits (0-9) (3 times)
\2 what was matched by capture \2
\d{4} digits (0-9) (4 times)
Sample code:
String regex="(\\(\\d{3}\\)|\\d{3})([.-])\\d{3}\\2\\d{4}";
System.out.println("(123).234.4567".matches(regex)); // true
System.out.println("(123)-234-4567".matches(regex)); // true
System.out.println("123.234.4567".matches(regex)); // true
System.out.println("123-456-4567".matches(regex)); // true
System.out.println("(123)-234.4567".matches(regex)); // false
System.out.println("(123-234-4567".matches(regex)); // false
System.out.println("123.234-4567".matches(regex)); // false
System.out.println("123-456.4567".matches(regex)); // false
Sample code (adapted from yours):
Matcher matcher = Pattern.compile(regex).matcher("(425).882.8080 tel");
while (matcher.find()) {
String str = matcher.group();
System.out.println(str); // (425).882.8080
}
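For context on why the pattern in the question printed only (425): the | operator has the lowest precedence, so the top-level alternation splits the regex into two complete alternatives, and the first alternative is satisfied by the parenthesized area code alone. A minimal sketch of that behaviour follows; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "(425).882.8080 tel";
        // Pattern from the question: the top-level | creates two complete alternatives.
        String original = "^(\\(\\d{3}\\))|^\\d{3}[.-]?\\d{3}[.-]?\\d{4}$";
        // Alternation confined to the area code by a non-capturing group.
        String grouped = "^(?:\\(\\d{3}\\)|\\d{3})[.-]\\d{3}[.-]\\d{4}";
        System.out.println(firstMatch(original, input)); // (425)
        System.out.println(firstMatch(grouped, input));  // (425).882.8080
    }
}
```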
The top-level | splits your pattern into two separate alternatives, and the first one, ^(\\(\\d{3}\\)), matches (425) on its own.
Try
Pattern.compile("^((?:\\(\\d{3}\\)|\\d{3})[.-]?\\d{3}[.-]?\\d{4})");
This limits the alternation to the area code, so group 1 holds the whole matched number.
If you instead need the parts of the number separately, add a capturing group for each part, like
Pattern.compile("^(\\(\\d{3}\\)|\\d{3})[.-]?(\\d{3})[.-]?(\\d{4})");
Btw. by using [.-]? you will also match 1232344567 (without dots/dashes). To fix this, drop the ? after [.-].
An optimized version could be:
Pattern.compile("^((\\((\\d{3})\\)|(\\d{3}))[.-](\\d{3})[.-](\\d{4}))");
This way you get the whole matched number as well as all included numbers separately.
Another point: your initial regexp would also match 123.234-4567. If that's not desirable, another OR is needed to cover each separator case.
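As a rough sketch of how those groups could be read back: group 1 is the whole number, group 3 is set when the area code is parenthesized, group 4 when it is not, and groups 5 and 6 hold the remaining digits. The class name and the parts helper are hypothetical, used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneGroupsDemo {
    static final Pattern PHONE =
        Pattern.compile("^((\\((\\d{3})\\)|(\\d{3}))[.-](\\d{3})[.-](\\d{4}))");

    // Returns {whole number, first 3 digits, middle 3 digits, last 4 digits},
    // or null if the input does not start with a phone number.
    static String[] parts(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Group 3 is set for "(425)", group 4 for a bare "425"; the other one is null.
        String area = m.group(3) != null ? m.group(3) : m.group(4);
        return new String[] { m.group(1), area, m.group(5), m.group(6) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parts("(425).882.8080 tel")));
        // (425).882.8080 | 425 | 882 | 8080
    }
}
```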
E.g.
Pattern.compile("^((?:\\((\\d{3})\\)|(\\d{3}))(?:\\.(\\d{3})\\.(\\d{4})|-(\\d{3})-(\\d{4})))");
Updated for your last edit:
Pattern.compile("^((?:\\(\\d{3}\\)|\\d{3})(?:\\.\\d{3}\\.\\d{4}|-\\d{3}-\\d{4}))");
You don't need start and end anchors in your regex if the phone number may appear anywhere in the input.
\d{3}([.-])?\d{3}\1?\d{4}|\(\d{3}\)([.-])?\d{3}\2?\d{4}
DEMO
Java regex would be,
"\\d{3}([.-])?\\d{3}\\1?\\d{4}|\\(\\d{3}\\)([.-])?\\d{3}\\2?\\d{4}"
Example:
Pattern regex = Pattern.compile("\\d{3}([.-])?\\d{3}\\1?\\d{4}|\\(\\d{3}\\)([.-])?\\d{3}\\2?\\d{4}");
Matcher matcher = regex.matcher("(425).882.8080 tel");
while(matcher.find()){
System.out.println(matcher.group());
} // (425).882.8080
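One thing that may be worth checking with this pattern: both the separator group and its backreference are optional, so a number with no separators, or with only the first separator, is accepted as well, while mixed separators are still rejected. A small sketch to try it out (the class and method names are illustrative only):

```java
class OptionalBackrefDemo {
    static final String REGEX =
        "\\d{3}([.-])?\\d{3}\\1?\\d{4}|\\(\\d{3}\\)([.-])?\\d{3}\\2?\\d{4}";

    // True when the whole string matches the pattern.
    static boolean isPhone(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isPhone("123.234.4567")); // true
        System.out.println(isPhone("1232344567"));   // true: both separators omitted
        System.out.println(isPhone("123.2344567"));  // true: second separator omitted
        System.out.println(isPhone("123.234-4567")); // false: mixed separators
    }
}
```

If separators must always be present, making ([.-]) and the backreference mandatory, as in the stricter patterns above, would rule out the second and third cases.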
Related
Using the following regex
^(\d)(?!\1+$)\d{3}-\d{1}$
It works for the pattern, but I also need to validate that the digits are not all the same, including the digit after the hyphen (-).
Example:
0000-0 not allowed (because of all are same digits)
0000-1 allowed
1111-1 not allowed (because of all are same digits)
1234-2 allowed
TheFourthBird's answer, which uses a negative lookahead, certainly works. Here is another variant of the regex that might be slightly faster:
^(\d)(?!\1{3}-\1$)\d{3}-\d$
RegEx Demo
Explanation:
^(\d) matches and captures first digit after start in group #1
(?!\1{3}-\1$) is a negative lookahead that will fail the match if we have 3 repetitions and a hyphen and another repeat of 1st digit.
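A quick way to check this variant against the four examples from the question, as a minimal sketch (the class and method names are invented for the example):

```java
class SameDigitCheck {
    // Four digits, a hyphen, one digit; rejected when all five digits are identical.
    static final String REGEX = "^(\\d)(?!\\1{3}-\\1$)\\d{3}-\\d$";

    static boolean isValid(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("0000-0")); // false
        System.out.println(isValid("0000-1")); // true
        System.out.println(isValid("1111-1")); // false
        System.out.println(isValid("1234-2")); // true
    }
}
```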
You could use a negative lookahead to assert that what follows, up to the end of the string, does not consist only of - and the same digit:
^(\d)(?!(?:\1|-)*$)\d{3}-\d$
^ Start of string
(\d) Capture group 1, match a digit
(?! Negative lookahead, assert what is to the right is not
(?:\1|-)*$ Optionally repeat either the backreference to what is already captured or - till the end of the string
) Close the negative lookahead
\d{3}-\d Match 3 digits, - and a digit
$ End of string
Regex demo
If you don't want to match a double -- or a - at the end of the string, and want to allow a variable number of digits and hyphenated parts:
^(\d)(?!(?:\1|-)*$)\d*(?:-\d+)*$
Explanation
^ Start of string
(\d) Capture a single digit in group 1
(?!(?:\1|-)*$) Negative lookahead, assert not only - and the same digit till the end of the string
\d* Match optional digits
(?:-\d+)* Optionally repeat matching - and 1+ digits
$ End of string
Regex demo
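To see how the more permissive pattern behaves on inputs of different shapes, here is a small sketch. The test strings other than the ones from the question are my own choices, and the class name is hypothetical.

```java
class FlexibleSameDigitCheck {
    // Digits with optional hyphen-separated parts; rejected when only one
    // distinct digit (plus hyphens) appears in the whole string.
    static final String REGEX = "^(\\d)(?!(?:\\1|-)*$)\\d*(?:-\\d+)*$";

    static boolean isValid(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("0000-0"));   // false: same digit throughout
        System.out.println(isValid("0000-1"));   // true
        System.out.println(isValid("12-34-5"));  // true: repeated hyphenated parts
        System.out.println(isValid("11-11-11")); // false: same digit throughout
        System.out.println(isValid("12--3"));    // false: double hyphen
        System.out.println(isValid("123-"));     // false: trailing hyphen
    }
}
```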
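You'll need a backreference. Note that ^(\d){4}-\1$ only checks that the final digit repeats the fourth one, because a repeated group keeps its last iteration. To match the strings that must be rejected (the same digit throughout), put the backreference after a single captured digit, for example:
^(\d)\1{3}-\1$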
These are my possible inputs:
"#smoke"
"#smoke,#Functional1" (OR condition)
"#smoke,#Functional1,#Functional2" (OR condition)
"#smoke","#Functional1" (AND condition),
"#smoke","~#Functional1" (SKIP condition),
"~#smoke","~#Functional1" (NOT condition)
(Please note: the string input for the regex stops at the last " character on each line; no space or comma follows it.)
The regex I came up with so far is
"((?:[~#]{1}\w*)+),?"
This matches in capturing groups for the samples 1, 4, 5 and 6 but NOT 2 and 3.
I am not sure how to continue tweaking it further, any suggestions?
I would like to capture the preceding boolean meaning of the tag (eg: ~) as well please.
If you have any suggestions to pre-process the string in Java before regex that would make it simpler, I am open to that possibility as well.
Thanks.
It seems that you want to match an optional ~ followed by a # and get iterative matches in group 1. You could make use of the \G anchor, which matches either at the start of the string or at the end of the previous match.
(?:"(?=.*"$)|\G(?!^))(~?#\w+(?:,~?#\w+)*)"?[,\h]?
Explanation
(?: Non capture group
"(?=.*"$) Match " and assert that the string ends with "
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capture group
( Capture group 1
~?#\w+(?:,~?#\w+)* Match an optional ~, then # and 1+ word characters, and repeat 0+ times with a comma prepended
)"? Close group 1 and match an optional "
[,\h]? Match an optional comma or horizontal whitespace char
Regex demo | Java demo
Example code
String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?#\\w+(?:,~?#\\w+)*)\"?[,\\h]?";
String string = "\"#smoke\"\n"
+ "\"#smoke,#Functional1\"\n"
+ "\"#smoke,#Functional1,#Functional2\"\n"
+ "\"#smoke\",\"#Functional1\"\n"
+ "\"#smoke\",\"~#Functional1\"\n"
+ "\"~#smoke\",\"~#Functional1\"";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
#smoke
#smoke,#Functional1
#smoke,#Functional1,#Functional2
#smoke
#Functional1
#smoke
~#Functional1
~#smoke
~#Functional1
Edit
If there are no consecutive matches, you could also use:
"(~?#\w+(?:,~?#\w+)*)"
Regex demo
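Since the question was also open to pre-processing in Java, here is a minimal sketch built on the simpler pattern: each quoted group is extracted with the regex, split on commas, and the leading ~ is checked per tag. Reading tags inside one pair of quotes as OR and separate quoted groups as AND follows the examples in the question; the class name, the nested-list return type and the isNegated helper are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExpressionParser {
    // One quoted group such as "#smoke,~#Functional1"; group 1 is the text between the quotes.
    private static final Pattern QUOTED =
        Pattern.compile("\"(~?#\\w+(?:,~?#\\w+)*)\"");

    // Outer list: one entry per quoted group; inner list: the comma-separated tags in it.
    static List<List<String>> parse(String input) {
        List<List<String>> groups = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            groups.add(Arrays.asList(m.group(1).split(",")));
        }
        return groups;
    }

    // A leading ~ marks a tag as negated.
    static boolean isNegated(String tag) {
        return tag.startsWith("~");
    }

    public static void main(String[] args) {
        System.out.println(parse("\"#smoke,#Functional1\""));      // [[#smoke, #Functional1]]
        System.out.println(parse("\"#smoke\",\"~#Functional1\"")); // [[#smoke], [~#Functional1]]
        System.out.println(isNegated("~#Functional1"));            // true
    }
}
```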
What regular expression would find a repeated set of digits in a numeric string?
Suppose
String s="0.1234523452345234";
From this string I need to obtain "2345". I tried the following regex-
String s="0.1234523452345234";
String regex="(\\d+)\\1+\\b";
Pattern p=Pattern.compile(regex);
Matcher m=p.matcher(s);
if(m.find())
{
System.out.println(m.group(0));
}
But the output is
523452345234
while I need to print
2345
"(\\d+)\\1+\\b" matches any sequence of digits followed immediately by the same sequence at least once. It can be followed by multiple occurrences of the sequence (the + quantifier). The regex also enforces a word boundary after the last matching sequence.
I think what you are looking for is the following regex:
"(\\d+).*\\1" (no word boundary, anything allowed between the two sequences, and only one repetition of the sequence). Example:
0.1234789897897123499
  ^^^^         ^^^^---- (\\d+) and \\1
      ^^^^^^^^^-------- .*
If the sequence needs to be followed immediately by its duplicate (no fillers in between), then drop the .* from the regex.
group(0) will return the full match (e.g. 12347898978971234), group(1) will contain the first capturing group (e.g. 1234).
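A short sketch of the group(0) versus group(1) distinction, using the example string above and, for comparison, the string from the question with the .* dropped. The class and helper names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedRunDemo {
    // Returns {group(0), group(1)} for the first match, or null if there is none.
    static String[] firstRepeat(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(0), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] withFiller = firstRepeat("(\\d+).*\\1", "0.1234789897897123499");
        System.out.println(withFiller[0]); // 12347898978971234
        System.out.println(withFiller[1]); // 1234

        String[] adjacent = firstRepeat("(\\d+)\\1", "0.1234523452345234");
        System.out.println(adjacent[0]); // 23452345
        System.out.println(adjacent[1]); // 2345
    }
}
```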
I tried this regular expression, which finds a sequence of digits that is immediately repeated once; m.group(1) returns the first occurrence:
String s="0.1234523452345234";
String regex="([0-9]+)\\1";
Pattern p=Pattern.compile(regex);
Matcher m=p.matcher(s);
if(m.find())
{
System.out.println(m.group(1));
}
Output :
2345
I have a JCL statement to be matched with a regex pattern.
The statement has the form below:
//name JOB optionalParam,keyword=param,keyword=param,keyword=param
Actual statements look like this:
//ADBB503 JOB ,MSGCLASS=2,CLASS=P
//ABCD JOB Something,MSG=NTNG,CLASS=ABC
I have tried a regular expression that matches in groups, but the final keyword=param pair can occur any number of times, and I need to keep matching for as long as pairs exist.
String regex = "(//)(\\w+)(\\s+)(JOB)(\\s+)(\\w+)?(,)([\\w+=\\w+]+)";
My trial is in the link given below
https://regex101.com/r/gUyRMV/1
The problem I am facing is that only one keyword=parameter pair is matched. Any number of keyword=parameter pairs needs to be matched.
You could match the job statement in the first capturing group and make use of \G to get the parameters in group 2:
(?:(//\w+\s+JOB(?: \w+)?)\h*|\G(?!^)),(\w+=\w+)
Explanation
(?: Non capturing group
( Capture group 1
//\w+\s+JOB Match //, 1+ word chars, 1+ whitespace chars and JOB
(?: \w+)? Match optional param
)\h* Close group 1 and match 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match, not at the start
), Close non capturing group and match ,
( Capture group 2
\w+=\w+ Match 1+ word chars, = and 1+ word chars
) Close group
In Java:
String regex = "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)";
Regex demo | Java demo
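To show how the two groups could be consumed, here is a minimal sketch that reads the job header from group 1 and collects every keyword=param pair from group 2 into a map. The class and method names are hypothetical, and values that are not plain word characters would not be captured by \w+.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JobStatementParser {
    private static final Pattern JOB = Pattern.compile(
        "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)");

    // Group 1 is only set on the first match, which carries the job header.
    static String header(String statement) {
        Matcher m = JOB.matcher(statement);
        return m.find() ? m.group(1) : null;
    }

    // Collects the keyword=param pairs in order of appearance.
    static Map<String, String> parameters(String statement) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = JOB.matcher(statement);
        while (m.find()) {
            String[] pair = m.group(2).split("=", 2);
            params.put(pair[0], pair[1]);
        }
        return params;
    }

    public static void main(String[] args) {
        String first = "//ADBB503 JOB ,MSGCLASS=2,CLASS=P";
        String second = "//ABCD JOB Something,MSG=NTNG,CLASS=ABC";
        System.out.println(header(first));      // //ADBB503 JOB
        System.out.println(parameters(first));  // {MSGCLASS=2, CLASS=P}
        System.out.println(header(second));     // //ABCD JOB Something
        System.out.println(parameters(second)); // {MSG=NTNG, CLASS=ABC}
    }
}
```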
I have two lines in an ArrayList that contain numbers:
line1 1234 5694 7487
line2 10/02/1992 or 1992
I used a different regex for each line. The regex ([0-9]{4}//s?)([0-9]{4}//s?)([0-9]{4}//n) gets the first line fine.
For checking line2 I used ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}).
But instead of returning the last line, this regex returns the first 4 digits of line1.
As stated in the comments below you are using .matches which returns true if the whole string can be matched.
In your pattern ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}) it would also match only 4 digits as the first 2 groups ([0-9]{2}[/-])?([0-9]{2}[/-])? are optional due to the question mark ? leaving the 3rd group ([0-9]{4}) able to match 4 digits.
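A small sketch of that effect, assuming the same line is tested once with find() and once with matches(); the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupsDemo {
    static final Pattern DATE =
        Pattern.compile("([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4})");

    // First substring that the pattern finds, or null.
    static String firstFind(String s) {
        Matcher m = DATE.matcher(s);
        return m.find() ? m.group() : null;
    }

    // True only when the entire string matches the pattern.
    static boolean wholeMatch(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(firstFind("1234 5694 7487"));  // 1234 (both optional groups skipped)
        System.out.println(wholeMatch("1234 5694 7487")); // false
        System.out.println(wholeMatch("10/02/1992"));     // true
        System.out.println(wholeMatch("1992"));           // true
    }
}
```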
What you might do instead is use an alternation to match either a date-like format, in which both two-digit parts and their delimiters are required, or 3 groups of 4 digits.
.*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\h[0-9]{4}){2}).*
Explanation
.*? Match any character except a newline, non-greedy
(?: Non capturing group
(?:[0-9]{2}[/-]){2} Repeat 2 times matching 2 digits and / or -
[0-9]{4} Match 4 digits
| Or
[0-9]{4} Match 4 digits
(?:\h[0-9]{4}){2} Repeat 2 times matching a horizontal whitespace char and 4 digits
) Close non capturing group
.* Match 0+ times any character except a newline
Regex demo | Java demo
For example
List<String> list = Arrays.asList(
new String[]{
"10/02/1992 or 1992",
"10/02/1992",
"10/1992",
"02/1992",
"1992",
"1234 5694 7487"
}
);
String regex = ".*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\\h[0-9]{4}){2}).*";
for (String str: list) {
if (str.matches(regex)){
System.out.println(str);
}
}
Result
10/02/1992 or 1992
10/02/1992
1234 5694 7487
Note that in your first pattern I think you mean \\s instead of //s.
The \\s will also match a newline. If you want to match a single space you could just match that or use \\h to match a horizontal whitespace character.
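The difference between the two classes can be checked with a few one-character strings. This is a minimal sketch with made-up class and method names; \h is available in Java 8 and later.

```java
class WhitespaceClassDemo {
    // True when the whole string is exactly one character of the given class.
    static boolean isSingle(String charClass, String s) {
        return s.matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(isSingle("\\s", "\n")); // true: \s includes newlines
        System.out.println(isSingle("\\h", "\n")); // false: \h is horizontal whitespace only
        System.out.println(isSingle("\\h", " "));  // true
        System.out.println(isSingle("\\h", "\t")); // true
        // The first pattern from the question, written with \h between the groups.
        System.out.println("1234 5694 7487".matches("[0-9]{4}\\h[0-9]{4}\\h[0-9]{4}")); // true
    }
}
```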