Regex to match comma separated values - java

I'm new to regex in Java and I wanted to know how I can build one that only accepts a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace.
I would need to filter out strings that start with a comma, that end with a comma or strings that have multiple consecutive commas.
All these would be invalid:
"D,, D"
"D D,,"
"D, ,D"
"D, ,,D"
"D,, ,D"
"D,,"
",,A"
",A"
"A,"
All these would be valid:
"D,D T,F"
"D,D T"
"A,A"
"A"
I used (\s?("[\w\s]*"|\d*)\s?(,,|$)) for consecutive commas, but it doesn't do the trick when the comma is at the end or beginning of one of the whitespace-separated substrings, like "D, ,D".
Should I aim to split by whitespace and look for a simpler regex for each of the substrings?

That would be something like this:
^[A-Z](,[A-Z])*( [A-Z](,[A-Z])*)*$
What happens here is the following:
We expect a letter, optionally followed by one or more occurrences of a comma immediately followed by another letter.
Then we optionally accept a space followed by the same pattern again, and this space-plus-list part may be repeated.
Test: https://regex101.com/r/kzLhtw/1
You could, of course, slightly optimize the regex by making all capturing groups non-capturing: just put ?: immediately after the (, so that it becomes (?:.
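For what it's worth, here is a minimal Java sketch of the pattern above with non-capturing groups. Note that, as written, the pattern accepts any number of space-separated lists. The second pattern is my own assumption about how to restrict it to "one or two" lists (the trailing group is made optional instead of repeated); the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class CommaListCheck {
    // Pattern from the answer above, using non-capturing groups.
    // It accepts any number of space-separated lists.
    static final Pattern ANY_LISTS =
            Pattern.compile("^[A-Z](?:,[A-Z])*(?: [A-Z](?:,[A-Z])*)*$");

    // Assumed variant: the second list is optional (?) rather than repeated (*),
    // so at most two lists are accepted.
    static final Pattern ONE_OR_TWO_LISTS =
            Pattern.compile("^[A-Z](?:,[A-Z])*(?: [A-Z](?:,[A-Z])*)?$");

    static boolean isValid(String s) {
        return ONE_OR_TWO_LISTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"D,D T,F", "D,D T", "A,A", "A", "D,, D", "D, ,D", ",A", "A,", "A B C"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

With the restricted variant, "A B C" is rejected because it contains three lists, while the original pattern would accept it.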

You might use
^[A-Z](?: [A-Z])*(?:,[A-Z](?: [A-Z])*){0,2}$
^ Start of string
[A-Z] Match a single char A-Z
(?: [A-Z])* Optionally repeat a space and a single char A-Z
(?: Non capture group
,[A-Z](?: [A-Z])* Match a comma and a char A-Z, optionally followed by repeated pairs of a space and a char A-Z
){0,2} Close the group and repeat 0-2 times
$ End of string
Regex demo

"a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace"
Not sure exactly how to interpret the above, but my reading is: one or two comma-separated lists, where each list may only consist of uppercase characters. In the case of two lists, the two lists are separated by a single space.
You could try:
^(?!.* .* )[A-Z](?:[ ,][A-Z])*$
See the online demo
^ - Start string anchor.
(?!.* .* ) - Negative lookahead to prevent two spaces from being present.
[A-Z] - A single uppercase alpha-char.
(?: - Open non-capture group:
[ ,] - A comma or space.
[A-Z] - A single uppercase alpha-char.
)* - Close non-capture group and match 0+ times up to;
$ - End string anchor.
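A quick way to try the lookahead version in Java could look like the sketch below. The class and method names are just placeholders I picked; only the pattern itself comes from the answer.

```java
import java.util.regex.Pattern;

class AtMostOneSpace {
    // The negative lookahead (?!.* .* ) fails as soon as two spaces occur
    // anywhere in the input, so at most two lists can be present.
    static final Pattern LISTS =
            Pattern.compile("^(?!.* .* )[A-Z](?:[ ,][A-Z])*$");

    static boolean isValid(String s) {
        return LISTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"D,D T,F", "A", "D, ,D", "A B C", "A,"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Because every separator must be followed directly by a letter, inputs such as "D, ,D" or "A," are rejected without any extra checks.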

Related

Every word in the sentence capital letter

I'm trying to write a regex for my task.
Every word in the sentence starts with a capital letter, the rest is lower case letter.
(^[A-Z]{1}[a-z\s]+)+
e.g.
Java Test - ok
Java test - not ok
JaVa Test - not ok
java Test - not ok
The pattern you tried will also match Java test, because the character class [a-z\s]+ matches one or more of the listed characters, including a space, and so does not force the second word to start with an uppercase char.
You could repeat the part matching an uppercase char followed by 1+ lower case chars for every iteration.
Note that \s will also match a newline, and that you can omit {1}.
^[A-Z][a-z]+(?: [A-Z][a-z]+)*$
^ Start of string
[A-Z][a-z]+ Match 1 uppercase A-Z and 1+ lowercase a-z
(?: Non capturing group
 [A-Z][a-z]+ Match a space, 1 uppercase A-Z and 1+ lowercase chars a-z
)* Close non capturing group and repeat 0+ times
$ End of string
Regex demo
Instead of matching a single space you could also match 1+ horizontal whitespace chars using \h (in Java \\h)
Regex demo
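In Java, the \h variant could be tried roughly like this. This is only a sketch: the class and method names are my own, and \h needs Java 8 or later.

```java
import java.util.regex.Pattern;

class CapitalizedWords {
    // "\\h" in a Java string literal is the regex \h (horizontal whitespace).
    // Each word is one uppercase letter followed by 1+ lowercase letters.
    static final Pattern WORDS =
            Pattern.compile("^[A-Z][a-z]+(?:\\h+[A-Z][a-z]+)*$");

    static boolean isCapitalized(String sentence) {
        return WORDS.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Java Test", "Java test", "JaVa Test", "java Test"}) {
            System.out.println(s + " -> " + isCapitalized(s));
        }
    }
}
```

Since \h only covers horizontal whitespace, a tab between two words is accepted but a newline is not.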
If there can be single character in any of the words like :
This Is A Test
I Am A Programmer
then you can use :
^(\b[A-Z][a-z]*\s?\b)+$
Demo and explanation can be found here
Otherwise, if there is always more than one character in every word, you can use:
^(\b[A-Z][a-z]+\s?\b)+$
Demo and explanation can be found here.

How to avoid a hyphen from splitting a regex?

I'm writing a simple android app for saving your favorite games in a list.
In the first screen a user has to enter their gamertag (as a String). The gamertag should only contain letters from a-z (uppercase and lowercase), numbers (0-9) and underscores/hyphens (_ and -).
I can get it to work with an underscore in every position or a hyphen at the beginning. But if the String contains a hyphen in the middle it gets "split" into two pieces and if the hyphen is at the end, it stands alone.
I came up with this regex:
[a-zA-Z0-9_\-]\w+
in java it looks a little different because the \ needs to be escaped:
[a-zA-Z0-9_\\-]\\w+
Gamertags that should validate:
- GamerTag
- Gamer_Tag
- _GamerTag
- GamerTag_
- -GamerTag
- Gamer-Tag
- GamerTag-
Gamertags that shouldn't validate:
- !GamerTag
- Gamer%Tag
- Gamer Tag
Gamertags that should validate, but my regex fails:
- Gamer-Tag
- GamerTag-
Your pattern [a-zA-Z0-9_\-]\w+ matches 1 character out of the character class followed by 1+ times a word character \w which does not match a -.
You could repeat the character class 1+ times with the hyphen included in it; if the hyphen is at the end of the character class, you don't have to escape it.
[a-zA-Z0-9_-]+
The Gamer-Tag does not get split but has 2 matches. The character class matches G and the \w+ matches amer. Then in the next match the character class matches - and \w+ matches Tag.
If those are the only values allowed, you could use anchors ^ to assert the start and $ to assert the end of the string.
^[a-zA-Z0-9_-]+$
Regex demo
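To see the "two matches" behaviour and the fix side by side in Java, something like the following sketch might help. The class name and helper methods are hypothetical; the two patterns are the ones discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GamertagCheck {
    // Pattern from the question (Java-escaped).
    static final Pattern ORIGINAL = Pattern.compile("[a-zA-Z0-9_\\-]\\w+");
    // Anchored pattern from the answer; the hyphen is last, so no escape is needed.
    static final Pattern FIXED = Pattern.compile("^[a-zA-Z0-9_-]+$");

    // Collects every separate match that find() reports for the input.
    static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    static boolean isValid(String tag) {
        return FIXED.matcher(tag).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAll(ORIGINAL, "Gamer-Tag")); // two separate matches
        System.out.println(isValid("Gamer-Tag"));
        System.out.println(isValid("Gamer Tag"));
    }
}
```

For validation, matches() on the anchored pattern is usually what you want, because it requires the whole input to fit rather than reporting partial hits.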

Regex to allow a space instead of following numbers after first two letters

I need a regex that allows a single space after two letters, i.e. AB123 should not be allowed but AB 123 should be allowed.
Here is the regex [a-zA-Z]{2}\s\S*
[a-zA-Z] means a letter, a-z or A-Z
{2} means the letter is matched twice
\s means a whitespace character
\S means a non-whitespace character
* means the preceding token is repeated 0 or more times
https://regex101.com/r/uWYci4/1
This pattern will do the work: ^[a-zA-Z]{2} \d+$
Explanation:
^ - match beginning of a string
[a-zA-Z]{2} - match two letters (upper- or lowercase),
(space) - match a space
\d+ - match one or more digits
$ - match end of a string
Demo
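A small Java sketch comparing the two suggestions above. The class and constant names are made up for this example; String.matches already requires the whole input to match, so the anchors are left out here.

```java
class LetterSpaceDigits {
    // First suggestion: two letters, one whitespace char, then any non-whitespace chars.
    static final String LOOSE = "[a-zA-Z]{2}\\s\\S*";
    // Second suggestion: two letters, one space, then one or more digits.
    static final String STRICT = "[a-zA-Z]{2} \\d+";

    static boolean matchesLoose(String s) {
        return s.matches(LOOSE);
    }

    static boolean matchesStrict(String s) {
        return s.matches(STRICT);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"AB 123", "AB123", "AB xyz", "AB "}) {
            System.out.println("\"" + s + "\" loose=" + matchesLoose(s) + " strict=" + matchesStrict(s));
        }
    }
}
```

Which one fits depends on whether only digits may follow the space: the loose pattern also accepts inputs like "AB xyz" or "AB " with nothing after the space.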

Java Regex to match String password

I have recently encountered this question in the text book:
I am supposed to write a method to check if a string has:
at least ten characters
only letters and digits
at least three digits
I am trying to solve it with a regex, rather than iterating through every character; this is what I've got so far:
String regx = "[a-z0-9]{10,}";
But this only matches the first two conditions. How should I go about the 3rd condition?
You could use a positive lookahead for 3rd condition, like this:
^(?=(?:.*\d){3,})[a-z0-9]{10,}$
^ indicates start of string.
(?= ... ) is the positive lookahead, which will search the whole string to match whatever is between (?= and ).
(?:.*\d){3,} matches at least 3 digits anywhere in the string.
.*\d matches a digit preceded by zero or more of any character (if .* were omitted, only consecutive digits would match).
{3,} matches three or more of .*\d.
(?: ... ) is a non-capturing group.
$ indicates end of string.
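As a rough Java sketch of the lookahead approach: the first pattern is the one from the answer, which only accepts lowercase letters because the question's character class was [a-z0-9]. The second pattern, which also allows uppercase letters, is my own assumption about what "only letters and digits" is meant to cover. Class and method names are placeholders.

```java
import java.util.regex.Pattern;

class PasswordCheck {
    // Pattern from the answer: at least 3 digits (lookahead), then
    // 10+ characters that are lowercase letters or digits.
    static final Pattern LOWERCASE_ONLY =
            Pattern.compile("^(?=(?:.*\\d){3,})[a-z0-9]{10,}$");

    // Assumed variant that also accepts uppercase letters.
    static final Pattern ANY_CASE =
            Pattern.compile("^(?=(?:.*\\d){3,})[a-zA-Z0-9]{10,}$");

    static boolean isValid(String password) {
        return ANY_CASE.matcher(password).matches();
    }

    public static void main(String[] args) {
        for (String p : new String[] {"abcdefg123", "abcdefgh12", "a1b2c3", "Abcdefg123"}) {
            System.out.println(p + " -> " + isValid(p));
        }
    }
}
```

The lookahead only inspects the input without consuming it, so the length and character checks still run from the first character.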

Java Regular Expression: what is " '- "

I came up to a line in java that uses regular expressions.
It needs a user input of Last Name
return lastName.matches( "[a-zA-z]+([ '-][a-zA-Z]+)*" );
I would like to know what is the function of the [ '-].
Also why do we need both a "+" and a "*" at the same time, and the [ '-][a-zA-Z] is in brackets?
Your RE is: [a-zA-z]+([ '-][a-zA-Z]+)*
I'll break it down into its component parts:
[a-zA-Z]+
The string must begin with any letter, a-z or A-Z, repeated one or more times (+).
([ '-][a-zA-Z]+)*
[ '-]
Any single character of <space>, ', or -.
[a-zA-Z]+
Again, any letter, a-z or A-Z, repeated once or more times.
This combination (a separator from [ '-] followed by one or more letters a-zA-Z) may then be repeated zero or more times.
Why [ '-]? To allow for hyphenated names, such as Higgs-Boson or names with apostrophes, such as O'Reilly, or names with spaces such as Van Dyke.
The expression [ '-] means "one of space, ', or -". The position of the dash matters: it must come last (or first), otherwise it would define a range. For example, [ -'] would accept every character with a code point between the space and the quote '.
+ means "one or more repetitions"; * means "zero or more repetitions", referring to the term of the regular expression preceding the + or * modifier.
Overall, the expression matches groups of lowercase and uppercase letters separated by spaces, dashes, or single quotes.
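The effect of the dash position can be checked with a tiny Java sketch like the one below. The class and method names are invented for the illustration; the character classes are the ones discussed above plus the escaped form.

```java
class HyphenPlacement {
    // Returns true if the single-character string s is accepted by the
    // given character class (passed as a regex such as "[ '-]").
    static boolean inClass(String charClass, String s) {
        return s.matches(charClass);
    }

    public static void main(String[] args) {
        // Dash last: a literal dash, so only space, ' and - are accepted.
        System.out.println(inClass("[ '-]", "-"));
        System.out.println(inClass("[ '-]", "!"));
        // Dash in the middle: a range from space to ', which includes '!'.
        System.out.println(inClass("[ -']", "!"));
        System.out.println(inClass("[ -']", "-"));
        // Escaped dash: literal regardless of position.
        System.out.println(inClass("[ \\-']", "-"));
    }
}
```

Escaping the dash is a reasonable habit if the class is likely to be edited later, since it keeps working wherever the dash ends up.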
It means it can be any of the characters space, ' or - (space, quote, dash).
The - can also be written as \-, since an unescaped - inside a character class can denote a range, like a-z.
This looks like it is a pattern to match double-barreled (space or hyphen) or I-don't-know-what-to-call-it names like O'Grady... for example:
It would match
counter-terrorism
De'ville
O'Grady
smith-jones
smith and wesson
But it will not match
jones-
O'Learys'
#hashtag
Bob & Sons
The idea is, after the first [A-Za-z]+ consumes all the letters it can, the match will end right there unless the next character is a space, an apostrophe, or a hyphen ([ '-]). If one of those characters is present, it must be followed by at least one more letter.
A lot of people have difficulty with this. They naively write something like [A-Za-z]+[ '-]?[A-Za-z]*, figuring both the separator and the extra chunks of letters are optional. But they're not independently optional; if there is a separator ([ '-]), it must be followed by at least one more letter. Otherwise it would treat strings like R'- j'-' as valid. Your regex doesn't have that problem.
By the way, you've got a typo in your regex: [a-zA-z]. You want to watch out for that, because [A-z] does match all the uppercase and lowercase letters, so it will seem to be working correctly as long as the inputs are valid. But it also matches several non-letter characters whose code points happen to lie between those of Z and a. And very few IDEs or regex tools will catch that error.
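To make the [A-z] pitfall concrete, here is a short Java sketch comparing the regex with the typo against the corrected one. The class, constant and method names are my own; the input "Van_Dyke" is just an example I chose, since the underscore is one of the characters that sit between Z and a.

```java
class RangeTypoDemo {
    // Regex as posted in the question, with the [a-zA-z] typo.
    static final String TYPO = "[a-zA-z]+([ '-][a-zA-Z]+)*";
    // Corrected regex with [a-zA-Z].
    static final String FIXED = "[a-zA-Z]+([ '-][a-zA-Z]+)*";

    static boolean matchesTypo(String s) {
        return s.matches(TYPO);
    }

    static boolean matchesFixed(String s) {
        return s.matches(FIXED);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"O'Reilly", "Van Dyke", "Van_Dyke", "jones-"}) {
            System.out.println(s + " typo=" + matchesTypo(s) + " fixed=" + matchesFixed(s));
        }
    }
}
```

Both versions agree on ordinary names, which is why the typo is easy to miss; they only differ on inputs containing one of [ \ ] ^ _ or the backtick.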
