Using regular expression to validate colon separated inputs - java

I'm reading in a file for a Java application which has data separated by colons in the format:
test : test : 0 : 0
Where the first two segments are names of something and the last two are digits.
The match should fail if the input is not formatted in that exact way above (aside from the data being different)
test : test : 0 : 0 -----> pass
: test: 0 : 0 -----> fail
0 : test : 0 : test -----> fail
test test : 0 : 0 -----> fail
So the match should fail if any segment is omitted or if the digits and words do not appear where they should (i.e. word : word : digit : digit). There must be exactly 3 colons and 4 segments, no more and no less, as above.
This is what I have so far, but it's not quite right:
^\D+(?:\s\:\s\w+)*$

You may use a regex like
^[a-zA-Z]+\s*:\s*[a-zA-Z]+(?:\s*:\s*\d+){2}$
Details
^ - start of string (implicit in String#matches)
[a-zA-Z]+ - 1+ ASCII letters
\s*:\s* - a : enclosed with 0+ whitespaces
[a-zA-Z]+ - 1+ ASCII letters
(?:\s*:\s*\d+){2} - two occurrences of : enclosed with 0+ whitespaces and then 1+ digits
$ - end of string (implicit in String#matches)
NOTE: If there must be an obligatory single space between the items, you need to replace \s* with \s. To match 1 or more whitespaces, \s* must be turned into \s+.
In Java, you may write it as
s.matches("[a-zA-Z]+\\s*:\\s*[a-zA-Z]+(?:\\s*:\\s*\\d+){2}")
See the regex demo
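As a quick check, the pattern above can be run against the four sample lines from the question. This is only a minimal sketch: the class name ColonLineValidator and the method isValid are made up for illustration, and the pattern is the one given in this answer (optional whitespace around each colon).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class ColonLineValidator {
    // word : word : digits : digits, with optional whitespace around each colon.
    private static final Pattern LINE =
            Pattern.compile("[a-zA-Z]+\\s*:\\s*[a-zA-Z]+(?:\\s*:\\s*\\d+){2}");

    static boolean isValid(String line) {
        // matches() must consume the whole line, so ^ and $ are not needed.
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "test : test : 0 : 0",
            ": test: 0 : 0",
            "0 : test : 0 : test",
            "test test : 0 : 0"
        };
        for (String s : samples) {
            System.out.println(s + " -----> " + (isValid(s) ? "pass" : "fail"));
        }
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every line of the file, which String#matches would otherwise do.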

Here you go (demo at Regex101):
[a-zA-Z]+\s+:\s+[a-zA-Z]+\s+:\s+\d+\s+:\s+\d+
Explanation:
[a-zA-Z]+ stands for 1 or more letters (+ is the quantifier that matches the preceding element at least once)
\s+ stands for 1 or more whitespace characters
: is the : character, literally
\d+ stands for at least one digit (remove the + to match one digit exactly)
Finally, compose those parts according to your needs. You might want to make the regex stricter by replacing \s+ with a single literal space.
Validate the String using the method String::matches (don't forget to double each backslash as \\ inside a Java string literal):
boolean isValid = string.matches("[a-zA-Z]+\\s+:\\s+[a-zA-Z]+\\s+:\\s+\\d+\\s+:\\s+\\d+");

I would just use String#matches on each line, with the following pattern:
[a-z]+ : [a-z]+ : [0-9]+ : [0-9]+
For example:
String line = "test : test : 0 : 0";
if (line.matches("[a-z]+ : [a-z]+ : [0-9]+ : [0-9]+")) {
    System.out.println("Found a match");
}

Related

Regex to validate custom format

I have this format: xx:xx:xx or xx:xx:xx-y, where x can be 0-9 a-f A-F and y can be only 0 or 1.
I came up with this regex: ([0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}|[-][0-1]{1})
(See regexr).
But this matches 0a:0b:0c-3 too, which is not expected.
Is there any way to remove these cases from result?
[:] means a character from the list that contains only :, so it is the same as :. Likewise, [-] has the same result as -.
Also, {1} means "the previous piece exactly one time". It does not have any effect, you can remove it altogether.
To match xx:xx:xx or xx:xx:xx-y, the part that matches -y must be optional. The quantifier ? after the optional part marks it as optional.
All in all, your regex should be like this:
[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}(-[01])?
If the regex engine you use can be told to ignore the character case then you can get rid of A-F (or a-f) from all character classes and the regex becomes:
[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?
How it works, piece by piece:
[0-9a-f] # any digit or letter from (and including) 'a' to 'f'
{2} # the previous piece exactly 2 times
: # the character ':'
[0-9a-f]
{2}
:
[0-9a-f]
{2}
( # start a group; it does not match anything
- # the character '-'
[01] # any character from the class (i.e. '0' or '1')
) # end of group; the group is needed for the next quantifier
? # the previous piece (i.e. the group) is optional
# it can appear zero or one times
See it in action: https://regexr.com/4rfvr
Update
As @the-fourth-bird mentions in a comment, if the regex must match the entire string then you need to anchor its ends:
^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$
^ as the first character of a regex matches the beginning of the string, $ as the last character matches the end of the string. This way the regex matches the entire string only (when there aren't other characters before or after the xx:xx:xx or xx:xx:xx-y part).
If you use the regex to find xx:xx:xx or xx:xx:xx-y in a larger string then you don't need to add ^ and $. Of course, you can add only ^ or $ to let the regex match only at the beginning or at the end of the string.
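Since the rest of this page uses Java, here is a rough Java sketch of the same distinction. It assumes java.util.regex, where Matcher#matches behaves like the anchored form and Matcher#find like the unanchored one; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class HexTripletCheck {
    // CASE_INSENSITIVE lets [0-9a-f] also accept A-F.
    private static final Pattern P = Pattern.compile(
            "[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?",
            Pattern.CASE_INSENSITIVE);

    // Whole-string check: equivalent to wrapping the regex in ^ and $.
    static boolean matchesWhole(String s) {
        return P.matcher(s).matches();
    }

    // Substring search: succeeds if the pattern occurs anywhere in s.
    static boolean foundInside(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("0a:0b:0c"));   // true
        System.out.println(matchesWhole("0A:0B:0C-1")); // true
        System.out.println(matchesWhole("0a:0b:0c-3")); // false
        System.out.println(foundInside("0a:0b:0c-3"));  // true, finds "0a:0b:0c"
    }
}
```

The last line shows why the unanchored regex still "matches" 0a:0b:0c-3: the optional group is simply skipped and the leading xx:xx:xx part is reported as the match.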
You want xx:xx:xx, or, if it is followed by a -, the next character must be 0 or 1 and the match must end there (word boundary).
So you don't want any of these
0a:0b:0c-123
0a:0b:0cd
10a:0b:0c
either.
Then you want a negative lookahead: if you match the first part, it must not be followed by a - (the first pattern) and it must end there (word boundary); if it is followed by a -, then a 0 or 1 must come next, followed by a word boundary:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
To prevent any digit in front, a word boundary is added to the front as well.
Example: https://regexr.com/4rg42
The following almost worked:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
but if 3a:2b:11 sits at the very end of the input, [^-] still has to consume one character that is not a -, there is none left, and the match fails.
Example: https://regexr.com/4rg4q
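For comparison, the lookahead version can be tried in Java as well. The sketch below assumes java.util.regex and replaces the /i flag with Pattern.CASE_INSENSITIVE; HexTripletFinder and findAll are illustrative names only, and the redundant [:] classes are written as plain colons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class HexTripletFinder {
    private static final String TRIPLET = "[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}";

    // Either a bare triplet not followed by '-', or a triplet followed by -0 / -1.
    private static final Pattern P = Pattern.compile(
            "\\b" + TRIPLET + "(?!-)\\b|\\b" + TRIPLET + "-[01]\\b",
            Pattern.CASE_INSENSITIVE);

    // Collects every match found in a larger piece of text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "0a:0b:0c 0a:0b:0c-1 0a:0b:0c-123 0a:0b:0cd 10a:0b:0c 3a:2b:11";
        // Prints [0a:0b:0c, 0a:0b:0c-1, 3a:2b:11]
        System.out.println(findAll(text));
    }
}
```

Because the lookahead consumes no characters, the final 3a:2b:11 still matches at the end of the input, which is exactly the case where the [^-] variant fails.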

How to use regex groups in Java

I need to replace string 'name' with fullName in the following kind of strings:
software : (publisher:abc and name:oracle)
This needs to be replaced as:
software : (publisher:abc and fullName:xyz)
The "name:xyz" part can appear anywhere inside the parentheses, e.g.
software:(name:xyz)
I am trying to use groups, and the regex I built looks like this:
(\bsoftware\s*?:\s*?\()((.*?)(\s*?(and|or)\s*?))(\bname:.*?\)\s|:.*?\)$)
You may use
\b(software\s*:\s*\([^()]*)\bname:\w+
and replace with $1fullName:xyz. See the regex demo.
Details
\b - word boundary
(software\s*:\s*\([^()]*) - Capturing group 1 ($1 in the replacement pattern is a placeholder for the value captured in this group):
software - a word
\s*:\s* - a : enclosed with 0+ whitespaces
\( - a ( char
[^()]* - 0 or more chars other than ( and )
\bname - whole word name
: - colon
\w+ - 1 or more letters, digits or underscores.
Java sample code:
String result = s.replaceAll("\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+", "$1fullName:xyz");
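To see the replacement in context, here is a small self-contained sketch that applies the same replaceAll call to the two inputs from the question plus one line that should stay untouched. The class name SoftwareNameRewriter, the method rewrite, and the "hardware" input are invented for illustration.

```java
// Hypothetical wrapper around the replaceAll call shown above.
final class SoftwareNameRewriter {
    // Group 1 keeps everything from "software" up to (but not including) "name:".
    private static final String REGEX =
            "\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+";

    static String rewrite(String s) {
        // $1 re-inserts group 1; the old name:value pair is replaced wholesale.
        return s.replaceAll(REGEX, "$1fullName:xyz");
    }

    public static void main(String[] args) {
        // Prints: software : (publisher:abc and fullName:xyz)
        System.out.println(rewrite("software : (publisher:abc and name:oracle)"));
        // Prints: software:(fullName:xyz)
        System.out.println(rewrite("software:(name:xyz)"));
        // No "software" prefix, so the line is printed unchanged.
        System.out.println(rewrite("hardware : (name:abc)"));
    }
}
```

Note that the replacement string hardcodes the value xyz, as in the answer; the original value after name: is discarded.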

Regex help in android

I have two lines in an ArrayList that contain numbers:
line1 1234 5694 7487
line2 10/02/1992 or 1992
I used a different regex for each line. With the regex ([0-9]{4}//s?)([0-9]{4}//s?)([0-9]{4}//n) I get the first line fine.
But for checking line2 I used ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}).
Instead of returning the last line, this regex returns the first 4 digits of line1.
As stated in the comments below you are using .matches which returns true if the whole string can be matched.
In your pattern ([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4}) it would also match only 4 digits as the first 2 groups ([0-9]{2}[/-])?([0-9]{2}[/-])? are optional due to the question mark ? leaving the 3rd group ([0-9]{4}) able to match 4 digits.
What you might do instead is use an alternation that matches either a date-like format in which both leading two-digit parts (including their delimiters) are required, or three groups of 4 digits.
.*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\h[0-9]{4}){2}).*
Explanation
.*? Match any character except a newline, non-greedy
(?: Non-capturing group
(?:[0-9]{2}[/-]){2} Repeat 2 times matching 2 digits and / or -
[0-9]{4} Match 4 digits
| Or
[0-9]{4} Match 4 digits
(?:\h[0-9]{4}){2} Repeat 2 times matching a horizontal whitespace char and 4 digits
) Close non-capturing group
.* Match 0+ times any character except a newline
Regex demo | Java demo
For example
List<String> list = Arrays.asList(
    "10/02/1992 or 1992",
    "10/02/1992",
    "10/1992",
    "02/1992",
    "1992",
    "1234 5694 7487"
);
String regex = ".*?(?:(?:[0-9]{2}[/-]){2}[0-9]{4}|[0-9]{4}(?:\\h[0-9]{4}){2}).*";
for (String str : list) {
    if (str.matches(regex)) {
        System.out.println(str);
    }
}
Result
10/02/1992 or 1992
10/02/1992
1234 5694 7487
Note that in your first pattern I think you mean \\s instead of //s.
The \\s will also match a newline. If you want to match a single space you could just match that or use \\h to match a horizontal whitespace character.
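The effect of the two optional groups can be reproduced in isolation. The following sketch is an illustration only: OptionalGroupDemo and firstMatch are hypothetical names, LOOSE is the pattern from the question, and STRICT is just the date branch of the alternation above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class OptionalGroupDemo {
    // Pattern from the question: both leading groups are optional.
    static final Pattern LOOSE =
            Pattern.compile("([0-9]{2}[/-])?([0-9]{2}[/-])?([0-9]{4})");
    // Date branch from the answer: both leading groups are required.
    static final Pattern STRICT =
            Pattern.compile("(?:[0-9]{2}[/-]){2}[0-9]{4}");

    // Returns the first substring matched by p, or null if there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(LOOSE, "1234 5694 7487"));      // 1234
        System.out.println(firstMatch(STRICT, "1234 5694 7487"));     // null
        System.out.println(firstMatch(STRICT, "10/02/1992 or 1992")); // 10/02/1992
    }
}
```

With both groups optional, any run of 4 digits satisfies the pattern, which is why the first 4 digits of line1 come back.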

REGEX greediness or just wrong syntax

I am trying to delete all the [.!?] characters from quotes in a text. To do so, I first want to capture all my quotes that contain [.!?] with a regex, and then delete the punctuation.
My regex doesn't work, maybe because it's greedy. It takes from my "«" (character at index 569) to the last character which is another "»" (character at index 2730).
My regex was:
Pattern full = Pattern.compile("«.*[.!?].*?»");
Matcher mFull = full.matcher(result);
while (mFull.find()) {
    System.out.println(mFull.start() + " " + mFull.end());
}
So I got:
569 2731
I also have the same greediness problem when capturing sentences (beginning with any [A-Z] and ending with any [.!?]).
You may use
s = s.replaceAll("(\\G(?!^)|«)([^«».!?]*)[.!?](?=[^«»]*»)", "$1$2");
See the regex demo
Details
(\G(?!^)|«) - Group 1 (whose value is referred to with $1 from the replacement pattern): either the end of the previous match or «
([^«».!?]*) - Group 2 ($2): any 0+ chars other than «, », !, . and ?
[.!?] - any of the three symbols
(?=[^«»]*») - there must be a » after 0 or more chars other than « and » immediately to the right of the current location.
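A short run on a made-up sentence shows the \G chaining at work. In this sketch the class name, the method strip, and the sample text are assumptions; « and » are written as the escapes \u00AB and \u00BB so the source compiles regardless of file encoding.

```java
// Hypothetical wrapper around the replaceAll call shown above.
final class QuotePunctuationStripper {
    // Same regex as in the answer, with the guillemets written as Unicode escapes
    // (U+00AB is the opening quote, U+00BB the closing quote).
    private static final String REGEX =
            "(\\G(?!^)|\u00AB)([^\u00AB\u00BB.!?]*)[.!?](?=[^\u00AB\u00BB]*\u00BB)";

    static String strip(String s) {
        // Each match drops one punctuation mark and keeps groups 1 and 2.
        return s.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        String text = "He said. \u00ABStop! Now. Go\u00BB Then left.";
        // Punctuation inside the quote is removed; the text outside is untouched.
        System.out.println(strip(text));
    }
}
```

For the sample above the result is He said. «Stop Now Go» Then left.: the first match starts at «, the second continues from the end of the first via \G, and nothing outside the quote qualifies because it is neither preceded by « nor adjacent to a previous match.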

Validate range for fetching records

I am working on a regular expression which requires the following pattern with and without spaces
Comma separated list of alphanumeric values : DG1, D3 OR R4,UI2
Comma separated list of alphanumeric and numeric values : D1, 2 OR D1,2
Range of alphanumeric values : DG1 - DG5 OR DG1-DG5
Range of alphanumeric and numeric values : DG1 - 8 OR DG1-8 OR 8-DG11 OR 8 - DG13
A combination of range and comma separated values : DG1, DG3-DG7 OR DG1,DG3-DG6
A combination of range and comma separated alphanumeric and numeric values : DG1, 3-DG7 OR 1,DG3-6
No other special character should be allowed other than comma and hyphen
2 special characters cannot come together
Cannot start or end with a special character
Invalid values may be
,1,DG1
-DG1-5
DG1 - 3 - GP9
1,F4,
RE3-
1,-G3
5,S3,-9
I'd use:
^([A-Z0-9]+(?:-[A-Z0-9]+)?)(?:,(?1))*$
Explanation:
^ : Start of string
( : begin group 1
[A-Z0-9]+ : 1 or more alphanum
(?: : begin NON capture group
- : literally -
[A-Z0-9]+ : 1 or more alphanum
)? : end group, optional
) : end group 1
(?: : begin NON capture group
, : literally ,
(?1) : repeat regex in group 1
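)* : end group, repeated 0 or more times
$ : End of string
If (?1) doesn't work (it is a PCRE-style subroutine call, which java.util.regex does not support), you have to duplicate the relevant part: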
^[A-Z0-9]+(?:-[A-Z0-9]+)?(?:,[A-Z0-9]+(?:-[A-Z0-9]+)?)*$
If you want to deal with optional spaces:
^\s*[A-Z0-9]+(?:\s*-\s*[A-Z0-9]+)?(?:\s*,\s*[A-Z0-9]+(?:\s*-\s*[A-Z0-9]+)?)*\s*$
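In Java, one way to avoid typing the repeated part twice is to build the pattern from a string constant, which gives roughly the effect of (?1). This is a sketch under that assumption, using the optional-spaces variant; RangeListValidator, ITEM, and isValid are illustrative names.

```java
import java.util.regex.Pattern;

// Hypothetical validator, not part of the original answer.
final class RangeListValidator {
    // One item: an alphanumeric value, optionally followed by "- value" (a range).
    private static final String ITEM = "[A-Z0-9]+(?:\\s*-\\s*[A-Z0-9]+)?";

    // One item, then zero or more ", item"; surrounding whitespace is allowed.
    private static final Pattern LIST =
            Pattern.compile("\\s*" + ITEM + "(?:\\s*,\\s*" + ITEM + ")*\\s*");

    static boolean isValid(String s) {
        // matches() checks the whole input, so ^ and $ are implied.
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "DG1, D3", "DG1 - DG5", "8-DG11", "DG1, 3-DG7",
            ",1,DG1", "DG1 - 3 - GP9", "RE3-", "5,S3,-9"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first four samples are accepted and the last four rejected, in line with the valid and invalid lists in the question.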
