I am using the regex "^[(\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?]$" to validate phone numbers. I want it to work for international numbers as well.
It works for this pattern:
+4454475294x364
I also want to allow spaces and '-'.
Example: +44 544-75294 x364
What else do I need to change in my regex?
Thanks.
Description
You provided the following examples of numbers you'd like matched.
+4454475294x364
+44 544-75294 x364
(123) 555-1212x4567
123-555-1232
The Regex
This regex will do the following:
Match international numbers of the format you provided
Match North American numbers
If the phone number is followed by an extension, then match that as well
Allow spaces, hyphens, and parentheses at obvious spots
This is limited to just the formats that you listed in your question
^(?:[+][0-9]{2}\s?[0-9]{3}[-]?[0-9]{3,}|(?:[(][0-9]{3}[)]|[0-9]{3})\s*[-]?\s*[0-9]{3}[-][0-9]{4})(?:\s*x\s*[0-9]+)?
Note: for Java you'll need to escape the backslashes \ so that they look like \\.
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[+] any character of: '+'
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
[-]? any character of: '-' (optional
(matching the most amount possible))
----------------------------------------------------------------------
[0-9]{3,} any character of: '0' to '9' (at least 3
times (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[(] any character of: '('
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
[)] any character of: ')'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
[-]? any character of: '-' (optional
(matching the most amount possible))
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
[-] any character of: '-'
----------------------------------------------------------------------
[0-9]{4} any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
x 'x'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)? end of grouping
Examples
Using the sample text above
Matches
[0][0] = +4454475294x364
[1][0] = +44 544-75294 x364
[2][0] = (123) 555-1212x4567
[3][0] = 123-555-1232
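As a rough sketch of how the expression could be used from Java, the class below compiles it with doubled backslashes and checks the four sample numbers. The class and method names (PhoneMatcher, match) are made up for illustration. Because the expression has no end anchor, find() returns the matching prefix rather than requiring the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneMatcher {
    // The expression from the answer, with every backslash doubled for a Java string literal.
    static final Pattern PHONE = Pattern.compile(
        "^(?:[+][0-9]{2}\\s?[0-9]{3}[-]?[0-9]{3,}"
        + "|(?:[(][0-9]{3}[)]|[0-9]{3})\\s*[-]?\\s*[0-9]{3}[-][0-9]{4})"
        + "(?:\\s*x\\s*[0-9]+)?");

    // Returns the matched text, or null when the input does not start with a supported format.
    static String match(String input) {
        Matcher m = PHONE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "+4454475294x364", "+44 544-75294 x364", "(123) 555-1212x4567", "123-555-1232"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + match(s));
        }
    }
}
```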
Related
I have a text that contains multiple records. All the records begin with the same regex pattern, and each record contains some unique text. I want to fetch only the entry name and value of the record that contains the text "Entertainment Extra 4K". I tried a regex, but because the regex begins matching at the first record, I always get the first record's values.
https://regex101.com/r/MAAc1s/1
From the text in the above link, I want to get only the record info below:
<input type='radio' class="radio" id="bb_radio128411" name='484' value='13'
-----
----Entertainment Extra 4K
Any suggestions would be really appreciated
Use
name='(\d+)'\s+value='(\d+)'[^<>]*Entertainment Extra 4K
EXPLANATION
--------------------------------------------------------------------------------
name=' 'name=\''
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
value=' 'value=\''
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^<>]* any character except: '<', '>' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
Entertainment Extra 4K 'Entertainment Extra 4K'
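A minimal Java sketch of the expression above. The input is hypothetical: the first record and the exact layout are invented here, since the original text is only available through the link. The class and method names are also made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordFinder {
    // [^<>]* cannot cross into the next tag, so name/value must belong to the same record as the text.
    static final Pattern RECORD = Pattern.compile(
        "name='(\\d+)'\\s+value='(\\d+)'[^<>]*Entertainment Extra 4K");

    // Returns {name, value} of the record containing the target text, or null if there is none.
    static String[] find(String html) {
        Matcher m = RECORD.matcher(html);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // Hypothetical two-record input; only the second record mentions "Entertainment Extra 4K".
        String html =
            "<input type='radio' class=\"radio\" name='483' value='12'\n"
            + "----Entertainment 4K\n"
            + "<input type='radio' class=\"radio\" id=\"bb_radio128411\" name='484' value='13'\n"
            + "----Entertainment Extra 4K\n";
        String[] r = find(html);
        System.out.println("name=" + r[0] + ", value=" + r[1]);
    }
}
```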
I want to parse column data and extract the required column info using a regex. Below is the pattern I tried, with a link:
^\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*(\d+)\s*(\d+)
https://regex101.com/r/lwzfQA/1
From the above link, I want to parse only the three rows of data, but the pattern also matches other details such as "Sat Jan 30 15:56:06.144 UTC". The first two matches in the above link are not correct, but the last two look fine. Which regex can I use to parse only the column info?
In your example data, the columns are separated by more than one whitespace character, but your pattern makes those spaces optional by using \s*. Also note that \S matches any non-whitespace character, which is a broad match.
As you tagged Java, I would suggest making use of \h{2,} to match 2 or more horizontal whitespace chars as \s can also match a newline and might give unexpected results.
You could also add an anchor $ to assert the end of the string to prevent partial matches.
^\h{2,}(\S+)\h{2,}(\S+)\h{2,}(\S+)\h{2,}(\S+)\h{2,}(\d+)\h{2,}(\d+)$
In Java with the doubled backslashes
String regex = "^\\h{2,}(\\S+)\\h{2,}(\\S+)\\h{2,}(\\S+)\\h{2,}(\\S+)\\h{2,}(\\d+)\\h{2,}(\\d+)$";
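To illustrate, here is a small sketch that applies that pattern line by line. The sample rows are hypothetical (the real data is only in the linked demo), and the names ColumnParser and parse are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColumnParser {
    static final Pattern ROW = Pattern.compile(
        "^\\h{2,}(\\S+)\\h{2,}(\\S+)\\h{2,}(\\S+)\\h{2,}(\\S+)\\h{2,}(\\d+)\\h{2,}(\\d+)$");

    // Returns the six column values of every line that matches; all other lines are skipped.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                String[] cols = new String[6];
                for (int i = 0; i < 6; i++) {
                    cols[i] = m.group(i + 1);
                }
                rows.add(cols);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical input: a date line, a header line, and two data rows.
        String text = "Sat Jan 30 15:56:06.144 UTC\n"
            + "  Name  State  Protocol  Encap  MTU  BW\n"
            + "  BE100  up  up  ARPA  1500  20000000\n"
            + "  BE200  down  down  ARPA  1514  1000000";
        for (String[] cols : parse(text)) {
            System.out.println(String.join(" | ", cols));
        }
    }
}
```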
Try to replace your first "*" by a "+".
What it means:
"*" = 0 or more
"+" = 1 or more
Given the fact that all your columns begin with some spaces, it excludes the date line which does not begin with a space.
^\s+(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*(\d+)\s*(\d+)
As Peter mentioned, your regex will function if using the + (one or more) operator on your space matching at the beginning instead of * (zero or more).
I would further encourage you to recognize that this is a "fixed width" table format, meaning each column is simply padded with spaces to a predetermined width. If you will be parsing a large file this way, you will find it much more predictable and easier to debug to use a regex only to match the line of hyphens and chop off everything before it, then go line by line with simple substring and trim calls at each column's width.
If you wish to continue using regex for this, you could also explore other range quantifiers and named groups. This would make the regex a little more clear and help identify issues with formatting later. Please see the following example:
https://regex101.com/r/KEnutx/1
The (?<name>\d+), for example, names the capture group. In many languages, you can then refer to the group by this name, easily pulling out your data without tying your code to the index of each group. Also, the name is much easier to find when you are debugging or improving your regex to accommodate changes.
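In Java, named groups are read with Matcher.group(String). The sketch below is not the pattern from the linked demo; the group names (iface, admin, line, encap, mtu, bw) and the sample row are assumptions chosen only to show the mechanism.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedColumns {
    // Hypothetical group names; each one labels a column instead of relying on its index.
    static final Pattern ROW = Pattern.compile(
        "^\\s+(?<iface>\\S+)\\s+(?<admin>\\S+)\\s+(?<line>\\S+)"
        + "\\s+(?<encap>\\S+)\\s+(?<mtu>\\d+)\\s+(?<bw>\\d+)");

    // Returns the value of the named column, or null when the line is not a data row.
    static String column(String line, String name) {
        Matcher m = ROW.matcher(line);
        return m.find() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String row = "  BE100  up  up  ARPA  1500  20000000"; // hypothetical data row
        System.out.println(column(row, "iface") + " mtu=" + column(row, "mtu"));
    }
}
```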
You can try changing your first whitespace filter from ^\s* to ^\s+. This effectively filters out the date line, since it does not begin with whitespace. Also, if possible, it might help to make the filters more specific to the data you're searching. For example, for "BE100" you could use \D+\d+, or something even more specific depending on the data.
\s also matches line breaks. Exclude them with [^\S\r\n] and use + instead of *:
^[^\S\r\n]+(\S+)[^\S\r\n]+(\S+)[^\S\r\n]+(\S+)[^\S\r\n]+(\S+)[^\S\r\n]+(\d+)[^\S\r\n]+(\d+)
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^\S\r\n]+ any character except: non-whitespace (all
but \n, \r, \t, \f, and " "), '\r'
(carriage return), '\n' (newline) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
[^\S\r\n]+ any character except: non-whitespace (all
but \n, \r, \t, \f, and " "), '\r'
(carriage return), '\n' (newline) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
[^\S\r\n]+ any character except: non-whitespace (all
but \n, \r, \t, \f, and " "), '\r'
(carriage return), '\n' (newline) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
[^\S\r\n]+ any character except: non-whitespace (all
but \n, \r, \t, \f, and " "), '\r'
(carriage return), '\n' (newline) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
[^\S\r\n]+ any character except: non-whitespace (all
but \n, \r, \t, \f, and " "), '\r'
(carriage return), '\n' (newline) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \5
--------------------------------------------------------------------------------
[^\S\r\n]+ any character except: non-whitespace (all
but \n, \r, \t, \f, and " "), '\r'
(carriage return), '\n' (newline) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \6:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \6
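The difference between \s and [^\S\r\n] only shows up when a row is short and the next line happens to supply the missing column. The sketch below uses a made-up two-line input to show this; the class and field names are illustrative.

```java
import java.util.regex.Pattern;

class HorizontalWhitespaceDemo {
    // \s+ between columns: \s also matches \r and \n, so a "row" may continue on the next line.
    static final Pattern LOOSE = Pattern.compile(
        "^\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\d+)\\s+(\\d+)");

    // [^\S\r\n]+ between columns: whitespace other than line breaks.
    static final Pattern STRICT = Pattern.compile(
        "^[^\\S\\r\\n]+(\\S+)[^\\S\\r\\n]+(\\S+)[^\\S\\r\\n]+(\\S+)"
        + "[^\\S\\r\\n]+(\\S+)[^\\S\\r\\n]+(\\d+)[^\\S\\r\\n]+(\\d+)");

    static boolean looseMatches(String text) {
        return LOOSE.matcher(text).find();
    }

    static boolean strictMatches(String text) {
        return STRICT.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical input: only five columns on the first line, a lone number on the second.
        String text = "  a  b  c  d  1\n  2";
        System.out.println("loose:  " + looseMatches(text));
        System.out.println("strict: " + strictMatches(text));
    }
}
```

With this input the loose pattern treats the 2 on the second line as the sixth column, while the strict pattern finds no match.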
I want to capture the entire block from the starting title up to, but not including, the end title. For example:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
the match result should be:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
How can I formulate the Pattern for this match using regex in Java?
If your input is something like below
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section3>
Base_Currency=EUR
Description=Revaluation
Grouping_File
Then you can use the following regex
(?s)(<section\d+>.*?)(?=<section\d+>|$)
Explanation for the regex is
NODE EXPLANATION
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
<section '<section'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
<section '<section'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
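A minimal Java sketch of this expression applied to the three-section input shown earlier. The names SectionExtractor and sections are invented, and the trim() call is an addition that drops the newline left at the end of each captured block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    static final Pattern SECTION = Pattern.compile("(?s)(<section\\d+>.*?)(?=<section\\d+>|$)");

    // Collects each section, from its own header up to (not including) the next header.
    static List<String> sections(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            out.add(m.group(1).trim()); // trim() is an addition, not part of the regex
        }
        return out;
    }

    public static void main(String[] args) {
        String body = "\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n";
        String input = "<section1>" + body + "<section2>" + body + "<section3>" + body;
        for (String s : sections(input)) {
            System.out.println(s);
            System.out.println("--");
        }
    }
}
```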
If you want to match only for one tag then you can use
(?s)(<section\d+>[^<]*)
Explanation for this regex is
NODE EXPLANATION
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
<section '<section'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
[^<]* any character except: '<' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
If your entire input is of this format, you can simply split:
String[] sections = input.split("\\R(?=<)");
\R is "any newline sequence" and (?=<) means "the next char is a '<'".
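As a quick illustration of the split approach, the sketch below wraps the call in a helper and runs it on a short two-section input. The class name, method name, and input are assumptions for the example.

```java
class SectionSplit {
    // Splits at every line break that is immediately followed by '<'.
    static String[] split(String input) {
        return input.split("\\R(?=<)");
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\n<section2>\nfoo";
        for (String section : split(input)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

The line break before each header is consumed by the split, so no section ends with a trailing newline.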
However if that's not the case, from the regex toolbox you're going to need:
the DOTALL flag so dot matches newlines too
the MULTILINE flag so ^ matches start of line too
a negative look ahead so you stop consuming at the start of the next section
Assuming "sections" start with a "<" at the start of a line:
"(?sm)^<\\w+>(.(?!^<))*"
Here's how you could use it:
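import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
// (?s) lets . match newlines; (?m) lets ^ match at the start of every line
Matcher matcher = Pattern.compile("(?sm)^<\\w+>(.(?!^<))*").matcher(input);
while (matcher.find()) {
    String section = matcher.group(); // one section, stopping before the next header line
    System.out.println(section);
}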
[a-zA-Z0-9\#\#\$\%\&\*\(\)\-\_\+\]\[\'\;\:\?\.\,\!\^]+$
A valid value is: reahb543)(*&&!##$%^kshABmhbahdxb!#$##%6813741646
This is the expression I have, but I need the value to be 8 to 32 digits only.
So a valid string would be:
8 to 32 characters long
including digits, letters, and special characters
Description
There are a couple of things I would change in your expression:
most of the escaping of characters in the character class is unnecessary
move the dash inside the character class to the end, as this character has a special meaning inside a character class unless it is at the beginning or end of the class
add a look ahead to force the required number of digits in the string
add a start-of-string anchor to prevent the expression from matching inside longer strings, which may contain more digits than allowed
This expression will:
require the string to contain between 8 and 32 digits, any more or less will not be allowed
allow any number of characters from your character set (providing the other rules here are also true)
allow numbers to appear as the first or last character of the string
^(?=(?:\D*?\d){8,32}(?!.*?\d))[a-zA-Z0-9#\#$%&*()_+\]\[';:?.,!^-]+$
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (between 8 and
32 times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\D*? non-digits (all but 0-9) (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
){8,32} end of grouping
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9#\#$%&*()_+\]\[';:?.,!^-]+ any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '#', '\#', '$', '%', '&', '*',
'(', ')', '_', '+', '\]', '\[', ''', ';',
':', '?', '.', ',', '!', '^', '-' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
Example
Samples
reahb)(*&&!##$%^kshABmhbahdxb!#$##%1234567 = bad
reahb)(*&&!##$%^kshABmhbahdxb!#$##%12345678 = good
1234reahb)(*&&!##$%^kshABmhbahdxb!#$##%5678 = good
1234reahb)(*&&!##$%^kshABmhbahdxb!#$##%5678901234567890123456789012 = good
1234reahb)(*&&!##$%^kshABmhbahdxb!#$##%56789012345678901234567890123 = bad
reahb)(*&&!#12345678901234567890123456789012#$%^kshABmhbahdxb!#$##% = good
reahb)(*&&!#123456789012345678901234567890123#$%^kshABmhbahdxb!#$##% = bad
Or
If you're looking to allow only 8-32 characters of any type from your character class, then this will work:
^[a-zA-Z0-9#\#$%&*()_+\]\[';:?.,!^-]{8,32}$
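The two readings of the requirement (8 to 32 digits versus 8 to 32 characters) give different results for the same input. The following sketch compiles both expressions in Java so they can be compared side by side; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LengthRules {
    // First expression: the string must contain between 8 and 32 digits.
    static final Pattern DIGIT_COUNT = Pattern.compile(
        "^(?=(?:\\D*?\\d){8,32}(?!.*?\\d))[a-zA-Z0-9#\\#$%&*()_+\\]\\[';:?.,!^-]+$");

    // Second expression: the string must be 8 to 32 characters long in total.
    static final Pattern TOTAL_LENGTH = Pattern.compile(
        "^[a-zA-Z0-9#\\#$%&*()_+\\]\\[';:?.,!^-]{8,32}$");

    static boolean hasAllowedDigitCount(String s) {
        return DIGIT_COUNT.matcher(s).find();
    }

    static boolean hasAllowedLength(String s) {
        return TOTAL_LENGTH.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = { "abcdefgh", "12345678", "abc", "reahb)(*&&!##$%^1234567" };
        for (String s : samples) {
            System.out.println(s + " digits=" + hasAllowedDigitCount(s)
                + " length=" + hasAllowedLength(s));
        }
    }
}
```

For example, "abcdefgh" passes the length rule but fails the digit rule, because it is eight characters long and contains no digits.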
I have this:
110121 NATURAL 95 1570,40
110121 NATURAL 95 1570,40*
41,110 1 x 38,20 CZK)[A] *
' 31,831 261,791 1308,61)
>01572 PRAVO SO 17,00
1,000 ks x 17,00
1570,40
Every line of this output is saved in a List, and I want to get the number 1570,40.
My regular expressions for this type of format look like this:
"([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)"
"^([1-9][0-9]*[\\.|,][0-9]{2})$"
My problem is that the 1570,40 on the last line is found (by the second regular expression), as is the 1570,40 on the line ending with 1570,40*, but the one on the first line is not found. Do you know where the problem is?
I'm not sure I fully understand your needs, but I think you could use word boundaries like:
\b([1-9]\d*[.,]\d{2})\b
In order to not match dates, you can use:
(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$)
explanation:
The regular expression:
(?-imsx:(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^.,\d] any character except: '.', ',', digits
(0-9)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
[,.] any character of: ',', '.'
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[^.,\d] any character except: '.', ',', digits
(0-9)
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
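A small Java sketch of the second expression, applied with find() to individual lines from the question. The helper name firstAmount is made up, and it returns only the first amount found in a line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountFinder {
    // The amount must not be preceded or followed by a digit, dot, or comma.
    static final Pattern AMOUNT = Pattern.compile(
        "(?:^|[^.,\\d])(\\d+[,.]\\d\\d)(?:[^.,\\d]|$)");

    // Returns the first amount in the line, or null when there is none.
    static String firstAmount(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "110121 NATURAL 95 1570,40",
            "110121 NATURAL 95 1570,40*",
            "41,110 1 x 38,20 CZK)[A] *",
            "1570,40"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + firstAmount(line));
        }
    }
}
```

On the third line the expression skips 41,110, which has three digits after the comma, and returns 38,20.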
The first expression, "([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)", contains [^\\.\\d], which means it expects one character that is neither a digit nor a dot right after the number. On the second line the * satisfies it. On the first line the number is at the end of the line, so there is nothing left to match and the expression fails. I think you need just one regexp that catches all the numbers: [^.\\d]*([1-9][0-9]*[.,][0-9]{2})[^.\\d]*. Also, use find() instead of matches(), so that the pattern is searched for anywhere in the string rather than having to match the whole string. It may also make sense to collect all matches in case a line contains two such numbers; I'm not sure whether that applies to you.
Also, use either [0-9] or \d consistently. Mixing them is confusing: they mean the same thing here but look different.
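To make the find() versus matches() point concrete, here is a short sketch using a simplified form of the number pattern without the surrounding character classes. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    static final Pattern NUMBER = Pattern.compile("([1-9][0-9]*[.,][0-9]{2})");

    // matches() succeeds only when the whole line is a number.
    static boolean wholeLineIsNumber(String line) {
        return NUMBER.matcher(line).matches();
    }

    // find() locates the first number anywhere in the line, or returns null.
    static String firstNumber(String line) {
        Matcher m = NUMBER.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "110121 NATURAL 95 1570,40";
        System.out.println(wholeLineIsNumber(line));
        System.out.println(firstNumber(line));
    }
}
```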
Try this:
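import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "41,110 1 x 38,20 CZK)[A] * ";
// find() returns each occurrence of digits, a comma, then digits
Matcher m = Pattern.compile("\\d+,\\d+").matcher(s);
while (m.find()) {
    System.out.println(m.group()); // prints 41,110 then 38,20
}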