How to allow specific delimiters in between numeric pattern - java

I have a big Java regex pattern composed of multiple subpatterns concatenated by OR (|). I want to allow multiple delimiters anywhere in between the numbers.
For example, I have the following pattern "(3[47][0-9]{13})|(56022[1-5][0-9]{10}|(5610)[0-9]{12})". How do I allow the following delimiters: equals sign (=), backslash (\), dot (.), hyphen (-) and whitespace ( )?
These delimiters can appear anywhere (except start and end) and any number of times in between the numbers which match the numeric pattern.

You will have to insert the pattern [\s=\\.-]* (it matches zero or more whitespace characters, =, \, . or -) between all digit-matching subpatterns, and rewrite each quantified \d{X} as \d(?:[\s=\\.-]*\d){X-1}:
(3[\s=\\.-]*[47][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){12})|(5[\s=\\.-]*6[\s=\\.-]*0[\s=\\.-]*2[\s=\\.-]*2[\s=\\.-]*[1-5][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){9}|(5[\s=\\.-]*6[\s=\\.-]*1[\s=\\.-]*0)[\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){11})
Do not forget to double the backslashes when using the pattern inside a Java string literal:
String part_of_regex = "(3[\\s=\\\\.-]*[47][\\s=\\\\.-]*[0-9](?:[\\s=\\\\.-]*[0-9]){12})|(5[\\s=\\\\.-]*6[\\s=\\\\.-]*0[\\s=\\\\.-]*2[\\s=\\\\.-]*2[\\s=\\\\.-]*[1-5][\\s=\\\\.-]*[0-9](?:[\\s=\\\\.-]*[0-9]){9}|(5[\\s=\\\\.-]*6[\\s=\\\\.-]*1[\\s=\\\\.-]*0)[\\s=\\\\.-]*[0-9](?:[\\s=\\\\.-]*[0-9]){11})";
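Because the delimiter class is repeated so many times, it can be easier to assemble the pattern from a constant. The following is a minimal sketch that should build the same regex as above; the names DelimitedNumberDemo, SEP, join, repeat and isValid are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

public class DelimitedNumberDemo {
    // Zero or more of: whitespace, '=', '\', '.', '-'
    static final String SEP = "[\\s=\\\\.-]*";

    // Joins consecutive subpatterns, allowing delimiters between them.
    static String join(String... parts) {
        return String.join(SEP, parts);
    }

    // Equivalent of digit{n} with delimiters allowed between digits: digit(?:SEP digit){n-1}
    static String repeat(String digit, int n) {
        return digit + "(?:" + SEP + digit + "){" + (n - 1) + "}";
    }

    static final Pattern NUMBER = Pattern.compile(
            "(" + join("3", "[47]", repeat("[0-9]", 13)) + ")"
          + "|(" + join("5", "6", "0", "2", "2", "[1-5]", repeat("[0-9]", 10))
          + "|(" + join("5", "6", "1", "0") + ")" + SEP + repeat("[0-9]", 12) + ")");

    // matches() requires the whole input to match, so leading or trailing delimiters are rejected.
    static boolean isValid(String input) {
        return NUMBER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("3400-000000-00000"));   // true
        System.out.println(isValid("5610 0000.0000=0000")); // true
        System.out.println(isValid("-3400-000000-00000"));  // false
    }
}
```

Using matches() rather than find() is what enforces the "except start and end" requirement in this sketch.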

Related

Regex Pattern Format Validation

I need a RegEx pattern which will be sent by the client where the starting characters will be alphanumeric, the length of this starting String will be defined by the number after this String. This is followed by a special character which will always be a single character. This is again followed by a variable length string of alphanumeric characters.
I have come closest to the below String and formats.
[A-Za-z0-9]{4}-[A-Za-z0-9]{5} - RegEx Input String
[A-Za-z0-9]{2}#[A-Za-z0-9]{6} - RegEx Input String
[0-9]{3}#[0-9]{5} - RegEx Input String
[a-z]{5}#[a-z]{5} - RegEx Input String
[A-Z]{4}#[a-z]{4} - RegEx Input String
[\w]{\d{1,1}}(\S{1,1})[\w]{\d{1,1}} - RegEx Format
Is the above pattern and format correct?
Can we validate the RegEx input string against the required RegEx format?
This is a web service which will have an input such as [A-Za-z0-9]{4}-[A-Za-z0-9]{5}. I need two things here. First, how do I validate this input to see if it matches the format I want, and then proceed? The format is the one I mentioned above as the RegEx format.
This regular expression should match the subset of regular expressions you're interested in:
\[(?:[a-zA-Z0-9](?:-[a-zA-Z0-9])?)*\](?:\{\d\})?\S\[(?:[a-zA-Z0-9](?:-[a-zA-Z0-9])?)*\](?:\{\d\})?
Let's break it down:
It matches a string which contains, in sequence, a character class, an optional quantifier, a separator, a second character class and a second optional quantifier.
The separator is any single non-whitespace character, \S (you might want to change that to something more specific, or to something that also allows whitespace).
The optional quantifier is a single digit surrounded by literal curly brackets, wrapped in a non-capturing group that is made optional: (?:\{\d\})?. Note that this accepts neither multi-digit lengths (change \d to \d+ for that) nor the {m,n} range quantifier.
A character class is a sequence of zero or more characters or character ranges, enclosed in literal square brackets.
A character is a letter or digit: [a-zA-Z0-9](?:-[a-zA-Z0-9])? when the optional non-capturing group is not matched.
A character range is a character followed by a literal - followed by another character: [a-zA-Z0-9](?:-[a-zA-Z0-9])? when the optional non-capturing group is matched.
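To show how the validating expression could be used in a service, here is a minimal sketch. The class and method names (FormatValidator, isAllowedFormat, compileIfAllowed) are hypothetical, and returning null for a rejected input is just one possible design.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class FormatValidator {
    // One bracketed class of letters, digits and ranges, with an optional single-digit {n}.
    static final String CLASS_WITH_COUNT =
            "\\[(?:[a-zA-Z0-9](?:-[a-zA-Z0-9])?)*\\](?:\\{\\d\\})?";

    // Character class, one non-whitespace separator, character class.
    static final Pattern FORMAT =
            Pattern.compile(CLASS_WITH_COUNT + "\\S" + CLASS_WITH_COUNT);

    static boolean isAllowedFormat(String clientRegex) {
        return FORMAT.matcher(clientRegex).matches();
    }

    // Returns the compiled client regex, or null if it is rejected.
    static Pattern compileIfAllowed(String clientRegex) {
        if (!isAllowedFormat(clientRegex)) {
            return null;
        }
        try {
            return Pattern.compile(clientRegex);
        } catch (PatternSyntaxException e) {
            // The shape was fine but the regex is still invalid, e.g. a reversed range like [z-a].
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowedFormat("[A-Za-z0-9]{4}-[A-Za-z0-9]{5}")); // true
        System.out.println(isAllowedFormat("[a-z]{12}#[a-z]{5}"));            // false: two-digit length
    }
}
```

Since \S also accepts regex metacharacters as the separator, the extra Pattern.compile step guards against inputs that have the right shape but are not valid regular expressions.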

Java regex "[.]" vs "."

I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.
[.] matches a dot (.) literally, while . matches any character except newline (\n) (unless you use DOTALL mode).
You can also use \. (written "\\." in a Java string literal) to match a literal dot.
The [ and ] are metacharacters that let you define a character class. Most metacharacters, including the dot, lose their special meaning inside square brackets and are interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the class has a special meaning: it inverts what the class matches. For example, [^.] matches any character except a dot.
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
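The differences described above can be checked with a few calls to Pattern.matches. This is only an illustrative sketch; the class name DotDemo and the helper matches are made up for the example.

```java
import java.util.regex.Pattern;

public class DotDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(".at", "cat"));        // true: . matches 'c'
        System.out.println(matches("[.]at", "cat"));      // false: [.] needs a literal dot
        System.out.println(matches("[.]at", ".at"));      // true
        System.out.println(matches("\\.at", ".at"));      // true: escaped dot, same meaning as [.]
        System.out.println(matches("[^.]at", ".at"));     // false: the negated class excludes the dot
        System.out.println(matches("[A-Za-z0-9]", "7"));  // true: a digit falls in the 0-9 range
        // With DOTALL, . also matches a line terminator.
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches()); // true
    }
}
```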
Among the regular-expression constructs:
. => any character (may or may not match line terminators)
To match a literal dot, use either of the following:
[.] => matches a dot
\\. => matches a dot (as written in a Java string literal)
NOTE: A character class in a Java regular expression is defined using square brackets "[ ]"; this subexpression matches a single character from the specified set of possible characters.
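Example: in the string address, replace every "." with "[.]":
public class Main {
    public static void main(String[] args) {
        String address = "1.1.1.1";
        // As a regex, "[.]" matches a literal dot; as a replacement, "[.]" is plain text.
        System.out.println(address.replaceAll("[.]", "[.]")); // prints 1[.]1[.]1[.]1
    }
}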
if anything is missed please add :)

Regular expression Java Merge Pattern

I have these three regular expressions. They work individually, but I would like to merge them into a single pattern.
regex1 = [0-9]{16}
regex2 = [0-9]{4}[-][0-9]{4}[-][0-9]{4}[-][0-9]{4}
regex3 = [0-9]{4}[ ][0-9]{4}[ ][0-9]{4}[ ][0-9]{4}
I use this method:
Pattern.compile(regex);
What regex string would merge them?
You can use backreferences:
[0-9]{4}([ -]|)([0-9]{4}\1){2}[0-9]{4}
This will only match if the separators are all the same, that is, all of them are one of:
spaces
hyphens
empty (no separator at all)
\1 means "this matches exactly what the first capturing group – expression in parentheses – matched".
Since ([ -]|) is that group, both other separators need to be the same for the pattern to match.
You can simplify it further to:
\d{4}([ -]|)(\d{4}\1){2}\d{4}
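The effect of the backreference can be seen in a small test. This is a minimal sketch; the class name CardSeparatorDemo and the method isValid are made up for the example.

```java
import java.util.regex.Pattern;

public class CardSeparatorDemo {
    // Group 1 captures the first separator (space, hyphen or empty);
    // \1 forces the remaining separators to repeat exactly that text.
    static final Pattern SAME_SEPARATOR =
            Pattern.compile("\\d{4}([ -]|)(\\d{4}\\1){2}\\d{4}");

    static boolean isValid(String input) {
        return SAME_SEPARATOR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0000000000000000"));    // true: no separators
        System.out.println(isValid("0000-0000-0000-0000")); // true: all hyphens
        System.out.println(isValid("0000 0000 0000 0000")); // true: all spaces
        System.out.println(isValid("0000-0000 0000-0000")); // false: mixed separators
    }
}
```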
The following should match anything the three patterns match:
regex = [0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}
That is, I'm assuming you are happy with either a hyphen, a space or nothing between the numbers?
Note: this will also match situations where you have any combination of the three, e.g.
0000-0000 00000000
which may not be desired?
Alternatively, if you need to match any of the three individual patterns then just concatenate them with |, as follows:
([0-9]{16})|([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})|([0-9]{4} [0-9]{4} [0-9]{4} [0-9]{4})
(Your original example appears to have unnecessary square brackets around the space and hyphen)
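To compare the two expressions from this answer, here is a minimal sketch. The names MergedPatternDemo, LENIENT, STRICT and whichFormat are hypothetical; whichFormat relies on the fact that, in the alternation, only the capturing group of the alternative that matched is non-null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergedPatternDemo {
    // Optional hyphen or space at each position: mixed separators are accepted.
    static final Pattern LENIENT =
            Pattern.compile("[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}");

    // The three original patterns joined with |, one capturing group each.
    static final Pattern STRICT = Pattern.compile(
            "([0-9]{16})|([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})|([0-9]{4} [0-9]{4} [0-9]{4} [0-9]{4})");

    // Returns 1, 2 or 3 for the original regex that matched, or 0 if none did.
    static int whichFormat(String input) {
        Matcher m = STRICT.matcher(input);
        if (!m.matches()) {
            return 0;
        }
        for (int group = 1; group <= 3; group++) {
            if (m.group(group) != null) {
                return group;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String mixed = "0000-0000 00000000";
        System.out.println(LENIENT.matcher(mixed).matches()); // true
        System.out.println(whichFormat(mixed));               // 0
        System.out.println(whichFormat("1234-5678-9012-3456")); // 2
    }
}
```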

Splitting a string that has escape sequence using regular expression in Java

String to be split
abc:def:ghi\:klm:nop
The string should be split on ":".
"\" is the escape character, so "\:" should not be treated as a delimiter.
split(":") gives
[abc]
[def]
[ghi\]
[klm]
[nop]
Required output is array of string
[abc]
[def]
[ghi\:klm]
[nop]
How can the \: be ignored?
Use a look-behind assertion:
split("(?<!\\\\):")
This will only match a colon that is not preceded by a \. The four backslashes \\\\ are needed because the backslash has to be escaped twice: once for the Java string literal and once for the regular expression.
Note however that this will not allow you to escape backslashes, in the case that you want to allow a token to end with a backslash. To do that you will have to first replace all double backslashes with
string.replaceAll("\\\\\\\\", ESCAPE_BACKSLASH)
(where ESCAPE_BACKSLASH is a string which will not occur in your input) and then, after splitting using the look-behind assertion, replace the ESCAPE_BACKSLASH string with an unescaped backslash with
token.replaceAll(ESCAPE_BACKSLASH, "\\\\")
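Put together, the three steps might look like the sketch below. It is illustrative only: the class name EscapedSplitDemo and the method splitOnColon are made up, the NUL character used for ESCAPE_BACKSLASH is an assumption (any string that cannot occur in your input would do), and String.replace is used instead of replaceAll so that the placeholder and the backslashes are treated literally rather than as regex syntax.

```java
import java.util.Arrays;

public class EscapedSplitDemo {
    // Assumption: the NUL character never occurs in the input.
    static final String ESCAPE_BACKSLASH = "\0";

    static String[] splitOnColon(String input) {
        // 1. Hide escaped backslashes (two backslashes in a row) behind the placeholder.
        String hidden = input.replace("\\\\", ESCAPE_BACKSLASH);
        // 2. Split on every colon that is not preceded by a backslash.
        String[] tokens = hidden.split("(?<!\\\\):");
        // 3. Restore each placeholder as a single unescaped backslash.
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace(ESCAPE_BACKSLASH, "\\");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input text: abc:def:ghi\:klm:nop
        System.out.println(Arrays.toString(splitOnColon("abc:def:ghi\\:klm:nop")));
        // prints [abc, def, ghi\:klm, nop]
    }
}
```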
Gumbo was right to use a look-behind assertion, but if your string contains an escaped escape character (e.g. \\) right in front of the delimiter, the split might break. See this example, which uses a comma as the delimiter:
test1\,test1,test2\\,test3\\\,test3\\\\,test4
If you do a simple look-behind split with (?<!\\), as Gumbo suggested, the string gets split into only two parts: test1\,test1 and test2\\,test3\\\,test3\\\\,test4. This is because the look-behind only checks one character back for the escape character. The correct behavior is to split on commas that are preceded by no escape characters or by an even number of them.
To achieve this a slightly more complex (double) look-behind expression is needed:
(?<!(?<![^\\]\\(?:\\{2}){0,10})\\),
Using this more complex regular expression in Java again requires escaping every \ as \\. So this should be a more robust answer to your question:
"any comma separated string".split("(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),");
Note: Java does not support unbounded repetition inside look-behinds. Therefore only up to 10 repetitions of the doubled escape character are checked, via the quantifier {0,10}. If needed, you can raise this limit by increasing the second number.
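The two look-behind expressions can be compared on the example string with a short program. This is a minimal sketch; the class name EvenEscapeSplitDemo and its constants are hypothetical.

```java
import java.util.Arrays;

public class EvenEscapeSplitDemo {
    // Simple variant: split on a comma not directly preceded by a backslash.
    static final String SIMPLE = "(?<!\\\\),";

    // Double look-behind: split on a comma preceded by zero or an even number of
    // backslashes (within the bound given by {0,10}).
    static final String EVEN_AWARE = "(?<!(?<![^\\\\]\\\\(?:\\\\{2}){0,10})\\\\),";

    static String[] splitSimple(String input) {
        return input.split(SIMPLE);
    }

    static String[] splitEvenAware(String input) {
        return input.split(EVEN_AWARE);
    }

    public static void main(String[] args) {
        // Input text: test1\,test1,test2\\,test3\\\,test3\\\\,test4
        String input = "test1\\,test1,test2\\\\,test3\\\\\\,test3\\\\\\\\,test4";
        System.out.println(Arrays.toString(splitSimple(input)));    // 2 tokens
        System.out.println(Arrays.toString(splitEvenAware(input))); // 4 tokens
    }
}
```

With the double look-behind, the tokens are test1\,test1, then test2\\, then test3\\\,test3\\\\ and finally test4.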

Particular java regular expression

How would I check that a String input in Java has the format:
xxxx-xxxx-xxxx-xxxx
where x is a digit 0..9?
Thanks!
To start, this is a great source on regexps: http://www.regular-expressions.info. Visit it, poke and play around. Furthermore, the java.util.regex.Pattern API documentation has a concise overview of regex patterns.
Now, back to your question: you want to match four consecutive groups of four digits separated by a hyphen. A single group of 4 digits can in regex be represented as
\d{4}
Four of those separated by a hyphen can be represented as:
\d{4}-\d{4}-\d{4}-\d{4}
To make it shorter you can also represent a single group of four digits and three consecutive groups of four digits prefixed with a hyphen:
\d{4}(-\d{4}){3}
Now, in Java you can use String#matches() to test whether a string matches the given regex.
boolean matches = value.matches("\\d{4}(-\\d{4}){3}");
Note that I escaped each backslash \ with another backslash, because the backslash has a special meaning in Java string literals. To represent an actual backslash, you have to write \\.
String objects in Java have a matches method which can check against a regular expression:
myString.matches("^\\d{4}(-\\d{4}){3}$")
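This particular expression checks for four digits followed by three repetitions of (a hyphen and four digits), thus representing your required format. The ^ and $ anchors are optional here, because matches() already requires the whole string to match.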
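If the check runs often, the pattern can be compiled once and reused, since String#matches compiles the regex on every call. The following is a minimal sketch; the class name CardFormatCheck and the method hasValidFormat are made up for the example, and treating null as invalid is an assumption.

```java
import java.util.regex.Pattern;

public class CardFormatCheck {
    // Four digits, then three times (a hyphen and four digits). Compiled once and reused.
    static final Pattern FORMAT = Pattern.compile("\\d{4}(-\\d{4}){3}");

    static boolean hasValidFormat(String value) {
        // Assumption: a null input counts as invalid rather than throwing.
        return value != null && FORMAT.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasValidFormat("1234-5678-9012-3456")); // true
        System.out.println(hasValidFormat("1234-5678-9012-345"));  // false: last group too short
        System.out.println(hasValidFormat("1234567890123456"));    // false: hyphens missing
    }
}
```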
