Regex pattern to match a specific number pattern, skip if a different pattern is found - Java

Requirement:
If the pattern 57XXXXXXX or 57XXXXXXX-X is found in a sentence, copy the matched text (the X's after 57 stand for 7 digits, and the leading 57 is a constant that must be present); otherwise ignore the whole sentence.
I have written the regex 57[0-9]{7}|-[0-9]{1} to match both forms.
If the sentence contains 8 digits after 57 instead of 7, the regex above still finds a match (I expect it not to match).
For example, 5712345678-0 (8 digits after 57 in the sentence) --> the regex matches and gives 571234567-0.
I am using Java to compile the pattern.

You could try this:
\b57\d{7}(?:-\d)?\b
In Java, that would be Pattern.compile("\\b57\\d{7}(?:-\\d)?\\b").
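As a rough sketch of how this could be used to pull the numbers out of a sentence (the class name Number57Finder and the method findAll are made up for illustration, not part of the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Number57Finder {
    // 57, exactly seven digits, an optional "-digit" suffix, word boundaries on both sides
    private static final Pattern NUMBER = Pattern.compile("\\b57\\d{7}(?:-\\d)?\\b");

    // Returns every match in the sentence, in order; the list is empty if nothing matches
    static List<String> findAll(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(sentence);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [571234567, 571234567-0]
        System.out.println(findAll("ref 571234567 and 571234567-0 are valid"));
        // prints [] because eight digits follow 57
        System.out.println(findAll("ref 5712345678-0 has one digit too many"));
    }
}
```

An empty list is the signal to ignore the sentence.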

Not very different, but this one allows letters and underscores around the number:
(?:(?<=[^0-9])|^)57[0-9]{7}(?:-[0-9])?(?:(?=[^0-9])|$)

You could use lookahead assertions in this case.
57\d{7}(?:-\d)?(?!\d)
Regular expression:
57 '57'
\d{7} digits (0-9) (7 times)
(?: group, but do not capture:
- '-'
\d digits (0-9)
)? end of grouping
(?! look ahead to see if there is not:
\d digits (0-9)
) end of look-ahead
Or:
(?:57\d{7})(?:-\d)?(?!\d)
Regular expression:
(?: group, but do not capture:
57 '57'
\d{7} digits (0-9) (7 times)
) end of grouping
(?: group, but do not capture
- '-'
\d digits (0-9)
)? end of grouping
(?! look ahead to see if there is not:
\d digits (0-9)
) end of look-ahead
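To see why the pattern from the question misbehaves, it may help to run both side by side. In 57[0-9]{7}|-[0-9]{1} the | splits the whole expression into two independent alternatives, so "57 plus seven digits" and "a hyphen plus one digit" are found as separate matches. The sketch below compares it with the lookahead version; the class and method names are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadComparison {
    // Pattern from the question: two unrelated alternatives
    static final Pattern ORIGINAL = Pattern.compile("57[0-9]{7}|-[0-9]{1}");
    // Lookahead version: the match must not be followed by another digit
    static final Pattern FIXED = Pattern.compile("57\\d{7}(?:-\\d)?(?!\\d)");

    // Collects all matches of the given pattern in the input
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String tooLong = "5712345678-0";
        System.out.println(findAll(ORIGINAL, tooLong)); // prints [571234567, -0]
        System.out.println(findAll(FIXED, tooLong));    // prints []
        System.out.println(findAll(FIXED, "id 571234567-0.")); // prints [571234567-0]
    }
}
```

Note that the lookahead only guards the right-hand side. If digits may also appear directly before 57, a lookbehind such as (?<!\d) at the start would be needed as well.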

Related

How to create a regular expression to validate that the numbers are not all the same, even when separated by a hyphen (-)

Using the following regex
^(\d)(?!\1+$)\d{3}-\d{1}$
It works for the pattern, but I need to validate that the digits are not all the same, even across the hyphen (-).
Example:
0000-0 not allowed (because all digits are the same)
0000-1 allowed
1111-1 not allowed (because all digits are the same)
1234-2 allowed
TheFourthBird's answer, which uses a negative lookahead, surely works. Here is another variant of this regex that might be slightly faster:
^(\d)(?!\1{3}-\1$)\d{3}-\d$
Explanation:
^(\d) matches and captures first digit after start in group #1
(?!\1{3}-\1$) is a negative lookahead that fails the match if the first digit is followed by 3 repetitions of itself, a hyphen, and one more repetition right before the end of the string.
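A quick way to check this against the examples from the question, assuming the input is validated as a whole string (the class name NotAllSameDigits and the method isValid are illustrative):

```java
import java.util.regex.Pattern;

class NotAllSameDigits {
    // Four digits, a hyphen, one digit; rejected when all five digits are equal
    private static final Pattern P = Pattern.compile("^(\\d)(?!\\1{3}-\\1$)\\d{3}-\\d$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "0000-0", "0000-1", "1111-1", "1234-2" };
        for (String s : samples) {
            // prints: 0000-0 -> false, 0000-1 -> true, 1111-1 -> false, 1234-2 -> true
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```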
You could rule out strings in which only - or the same digit appears to the right, up to the end of the string:
^(\d)(?!(?:\1|-)*$)\d{3}-\d$
^ Start of string
(\d) Capture group 1, match a digit
(?! Negative lookahead, assert what is to the right is not
(?:\1|-)*$ Optionally repeat either the backreference to what is already captured or - till the end of the string
) Close the negative lookahead
\d{3}-\d Match 3 digits - and a digit
$ End of string
If you don't want to match double -- or an - at the end of the string and match optional repetitions:
^(\d)(?!(?:\1|-)*$)\d*(?:-\d+)*$
Explanation
^ Start of string
(\d) Capture a single digit in group 1
(?!(?:\1|-)*$) Negative lookahead, assert not only - and the same digit till the end of the string
\d* Match optional digits
(?:-\d+)* Optionally repeat matching - and 1+ digits
$ End of string
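A small sketch to exercise this more general pattern. The sample inputs are my own, not from the question, and the class name is illustrative:

```java
import java.util.regex.Pattern;

class GeneralSameDigitCheck {
    // Digits in hyphen-separated groups; rejected when every digit equals the first one
    private static final Pattern P =
        Pattern.compile("^(\\d)(?!(?:\\1|-)*$)\\d*(?:-\\d+)*$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0000-0"));  // false: all digits are 0
        System.out.println(isValid("00-00-1")); // true: the last digit differs
        System.out.println(isValid("12--3"));   // false: double hyphen
        System.out.println(isValid("123-"));    // false: trailing hyphen
    }
}
```

One consequence worth knowing: a single digit such as 7 is rejected too, because the lookahead sees nothing but the end of the string after it.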
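You'll need a back reference. For example, this pattern matches the strings in which all five digits are the same, so you can reject any input that matches it:
^(\d)\1{3}-\1$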

How do I get a regex expression to contain only uppercase letters or numbers?

Regex expression: [A-Z]([^0-9]|[^A-Z])+[A-Z]
The requirements are that the string should start and end with a capital letter A-Z, and contain at least one number in between. It should not have anything else besides capital letters on the inside. However, it's accepting spaces and punctuation too.
My expression fails the following test case A65AJ3L 3F,D due to the comma and whitespace.
Why does this happen when I explicitly said only numbers and uppercase letters can be in the string?
Starting the character class with [^ makes it a negated character class.
Using ([^0-9]|[^A-Z])+ matches any char except a digit (but does match A-Z), or any char except A-Z (but does match a digit).
This way it can match any character.
If you would turn it into [A-Z]([0-9]|[A-Z])+[A-Z] it still does not make it mandatory to match at least a single digit on the inside due to the alternation | and it can still match AAA for example.
You might use:
^[A-Z]+[0-9][A-Z0-9]*[A-Z]$
The pattern matches:
^ Start of string
[A-Z]+ Match 1+ times A-Z
[0-9] Match a single digit
[A-Z0-9]* Optionally match either A-Z or 0-9
[A-Z] Match a single char A-Z
$ End of string
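For a quick check in Java, something along these lines should do. The test strings other than A65AJ3L 3F,D are my own, and the class name is illustrative:

```java
import java.util.regex.Pattern;

class UpperAlnumCheck {
    // Starts and ends with A-Z, contains at least one digit, nothing but A-Z and 0-9
    private static final Pattern P = Pattern.compile("^[A-Z]+[0-9][A-Z0-9]*[A-Z]$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("A65AJ3L"));      // true
        System.out.println(isValid("A65AJ3L 3F,D")); // false: space and comma
        System.out.println(isValid("AAA"));          // false: no digit inside
        System.out.println(isValid("A1A"));          // true
    }
}
```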
Use
^(?=\D*\d\D*$)[A-Z][A-Z\d]*[A-Z]$
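(?=\D*\d\D*$) requires exactly one digit in the string, no more, no less. If several digits should be allowed, as in A65AJ3L, a lookahead such as (?=\D*\d) requires at least one digit instead.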
EXPLANATION
^ the beginning of the string
(?= look ahead to see if there is:
\D* non-digits (all but 0-9) (0 or more times (matching the most amount possible))
\d digits (0-9)
\D* non-digits (all but 0-9) (0 or more times (matching the most amount possible))
$ before an optional \n, and the end of the string
) end of look-ahead
[A-Z] any character of: 'A' to 'Z'
[A-Z\d]* any character of: 'A' to 'Z', digits (0-9) (0 or more times (matching the most amount possible))
[A-Z] any character of: 'A' to 'Z'
$ before an optional \n, and the end of the string

value which should match only numbers, dots and commas

I have a regex:
"(\\d+\\.\\,?)+"
And the value:
3.053,500
But my regex pattern does not match it.
I want to have a pattern which validates numbers, dots and commas.
For example, these values are valid:
1
12
1,2
1.2
1,23,456
1,23.456
1.234,567
etc.
Your (\d+\.\,?)+ regex matches 1 or more repetitions of 1+ digits, a dot, and an optional comma. This means the string must end with a dot or a comma, and every group of digits must be followed by a dot. 3.053,500 ends with a digit, so it cannot match.
You may use
s.matches("\\d+(?:[.,]\\d+)*")
See the regex demo
Note that the ^ and $ anchors are not necessary in Java's .matches() method as the match is anchored to the start/end of the string automatically. At regex101.com, the anchors are meant to match start/end of the line (since the demo is run against a multiline string).
Pattern details
\d+ - 1+ digits
(?: - start of a non-capturing group:
[.,] - a dot or ,
\d+ - 1+ digits
)* - 0 or more repetitions.
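A minimal sketch that runs the pattern over some of the values from the question plus a few invalid ones that I added for contrast (the class name is illustrative):

```java
class NumberDotsCommas {
    // matches() anchors at both ends, so no ^ or $ is needed
    static boolean isValid(String s) {
        return s.matches("\\d+(?:[.,]\\d+)*");
    }

    public static void main(String[] args) {
        String[] samples = { "3.053,500", "1", "1,23.456", "1.", "1,,2", "" };
        for (String s : samples) {
            // the first three print true, the last three print false
            System.out.println("[" + s + "] -> " + isValid(s));
        }
    }
}
```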

Java - regular expression for get number format

I have this:
110121 NATURAL 95 1570,40
110121 NATURAL 95 1570,40*
41,110 1 x 38,20 CZK)[A] *
' 31,831 261,791 1308,61)
>01572 PRAVO SO 17,00
1,000 ks x 17,00
1570,40
Every line of this output is saved in a List, and I want to get the number 1570,40.
My regular expressions for this type of format look like this:
"([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)"
"^([1-9][0-9]*[\\.|,][0-9]{2})$"
My problem is that 1570,40 on the last line is found (by the second regular expression), and so is 1570,40 on the line that ends with 1570,40*, but the one on the first line is not found. Do you know where the problem is?
I'm not sure I fully understand your needs, but I think you could use word boundaries, like:
\b([1-9]\d*[.,]\d{2})\b
In order to not match dates, you can use:
(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$)
explanation:
The regular expression:
(?-imsx:(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$))
matches as follows:
NODE EXPLANATION
(?-imsx: group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally):
(?: group, but do not capture:
^ the beginning of the string
| OR
[^.,\d] any character except: '.', ',', digits (0-9)
) end of grouping
( group and capture to \1:
\d+ digits (0-9) (1 or more times (matching the most amount possible))
[,.] any character of: ',', '.'
\d digits (0-9)
\d digits (0-9)
) end of \1
(?: group, but do not capture:
[^.,\d] any character except: '.', ',', digits (0-9)
| OR
$ before an optional \n, and the end of the string
) end of grouping
) end of grouping
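Here is a rough sketch of how that pattern could be applied line by line with find(), taking the first amount on each line. The names AmountExtractor and firstAmount are invented for the example, and returning null for "no amount" is just one possible convention:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountExtractor {
    // An amount with exactly two decimals, not glued to other digits, dots or commas
    private static final Pattern AMOUNT =
        Pattern.compile("(?:^|[^.,\\d])(\\d+[,.]\\d\\d)(?:[^.,\\d]|$)");

    // Returns the first amount on the line, or null if the line has none
    static String firstAmount(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "110121 NATURAL 95 1570,40",
            "110121 NATURAL 95 1570,40*",
            "' 31,831 261,791 1308,61)",
            "1570,40"
        };
        for (String line : lines) {
            // prints 1570,40 / 1570,40 / 1308,61 / 1570,40
            System.out.println(firstAmount(line));
        }
    }
}
```

Values with three decimals such as 31,831 are skipped because the third digit is neither a separator nor the end of the line.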
The pattern "([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)" contains [^\\.\\d], which requires one character that is neither a digit nor a dot right after the number. On the second line, the * satisfies it. On the first line the number sits at the end of the line, so there is nothing left for it to match and the pattern fails. I think you need just one regexp that catches all the numbers: [^.\\d]*([1-9][0-9]*[.,][0-9]{2})[^.\\d]*. Note that inside a character class | is a literal character, so [\\.|,] also accepts a pipe; [.,] is enough. You should also use find() rather than matches(), so that any substring can match instead of the whole string. It may also make sense to collect all matches in case a line contains two such numbers; I'm not sure whether that applies to you.
Also, pick either [0-9] or \d and stick with it. Mixing them is confusing: they mean the same thing here but look different.
Try this:
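import java.util.regex.*;
//...
String s = "41,110 1 x 38,20 CZK)[A] * ";
// find() scans the string for every "digits,digits" substring
Matcher m = Pattern.compile("\\d+,\\d+").matcher(s);
while (m.find()) {
    System.out.println(m.group()); // prints 41,110 and then 38,20
}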

Regex expression in plain english

I'm working on a new Java project, so I'm reading the existing code. In a very important part of the code I found the following regular expressions, and I can't really tell what they do. Can anybody explain in plain English what they do?
1)
[^,]*|.+(,).+
2)
(\()?\d+(?(1)\))
Next time you need a regex explained, you can use the following explain.pl service from Rick Measham:
Regex: [^,]*|.+(,).+
NODE EXPLANATION
[^,]* any character except: ',' (0 or more times (matching the most amount possible))
| OR
.+ any character except \n (1 or more times (matching the most amount possible))
( group and capture to \1:
, ','
) end of \1
.+ any character except \n (1 or more times (matching the most amount possible))
Regex: (\()?\d+(?(1)\))
NODE EXPLANATION
( group and capture to \1 (optional (matching the most amount possible)):
\( '('
)? end of \1 (NOTE: because you're using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \1)
\d+ digits (0-9) (1 or more times (matching the most amount possible))
(?(1) if back-reference \1 matched, then:
\) ')'
| else:
succeed
) end of conditional on \1
Links
http://rick.measham.id.au/paste/explain.pl
Note on conditionals
JAVA DOES NOT SUPPORT CONDITIONALS! An unconditionalized regex for the second pattern would be something like:
\d+|\(\d+\)
i.e. a non-zero repetition of digits, with or without surrounding parentheses.
Links
regular-expressions.info/If-then-else conditionals
Conditionals are supported by the JGsoft engine, Perl, PCRE and the .NET framework.
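If you want to see this for yourself, a small sketch like the following tries to compile both versions. As far as I know, java.util.regex rejects the (?(1)...) construct with a PatternSyntaxException; the class and method names here are illustrative.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ConditionalCheck {
    // Returns true if java.util.regex accepts the pattern, false if it throws
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String conditional = "(\\()?\\d+(?(1)\\))"; // the original, with a conditional
        String alternation = "\\d+|\\(\\d+\\)";     // the rewrite without one

        System.out.println(compiles(conditional)); // expected: false
        System.out.println(compiles(alternation)); // prints true
        System.out.println(Pattern.matches(alternation, "(007)")); // prints true
        System.out.println(Pattern.matches(alternation, "(007"));  // prints false
    }
}
```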
The patterns in depth
Here's a test harness for the first pattern
import java.util.regex.*;
//...
Pattern p = Pattern.compile("[^,]*|.+(,).+");
String[] tests = {
    "",       // [] is a match with no commas
    "abc",    // [abc] is a match with no commas
    ",abc",   // [,abc] is not a match
    "abc,",   // [abc,] is not a match
    "ab,c",   // [ab,c] is a match with separating comma
    "ab,c,",  // [ab,c,] is a match with separating comma
    ",",      // [,] is not a match
    ",,",     // [,,] is not a match
    ",,,",    // [,,,] is a match with separating comma
};
for (String test : tests) {
    Matcher m = p.matcher(test);
    System.out.format("[%s] is %s %n", test,
        !m.matches() ? "not a match"
        : m.group(1) != null
            ? "a match with separating comma"
            : "a match with no commas"
    );
}
Conclusion
To match, the string must fall into one of these two cases:
Contains no comma (potentially an empty string)
Contains a comma that separates two non-empty strings
On a match, \1 can be used to distinguish between the two cases
And here's a similar test harness for the second pattern, rewritten without conditionals (which Java doesn't support):
Pattern p = Pattern.compile("\\d+|(\\()\\d+\\)");
String[] tests = {
    "",       // [] is not a match
    "0",      // [0] is a match without parenthesis
    "(0)",    // [(0)] is a match with surrounding parenthesis
    "007",    // [007] is a match without parenthesis
    "(007)",  // [(007)] is a match with surrounding parenthesis
    "(007",   // [(007] is not a match
    "007)",   // [007)] is not a match
    "-1",     // [-1] is not a match
};
for (String test : tests) {
    Matcher m = p.matcher(test);
    System.out.format("[%s] is %s %n", test,
        !m.matches() ? "not a match"
        : m.group(1) != null
            ? "a match with surrounding parenthesis"
            : "a match without parenthesis"
    );
}
As previously said, this matches a non-zero number of digits, possibly surrounded by parentheses (and \1 distinguishes between the two).
1)
[^,]* means any number of characters that are not a comma
.+(,).+ means 1 or more characters followed by a comma followed by 1 or more characters
| means either the first one or the second one
2)
(\()? means zero or one '(' note* backslash is to escape '('
\d+ means 1 or more digits
(?(1)\)) means if back-reference \1 matched, then ')' note* no else is given
Also note that parentheses are used to capture certain parts of the regular expression, except, of course, when they are escaped with a backslash.
1) Any string that contains no comma at all, or any string that has a comma with at least one character on each side of it.
2) A number, optionally wrapped in parentheses: the closing parenthesis is required only if the opening one was present.
