I'm new to Regex in Java and I wanted to know how can I build one that only takes a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace.
I would need to filter out strings that start with a comma, that end with a comma or strings that have multiple consecutive commas.
All these would be invalid:
"D,, D"
"D D,,"
"D, ,D"
"D, ,,D"
"D,, ,D"
"D,,"
",,A"
",A"
"A,"
All these would be valid:
"D,D T,F"
"D,D T"
"A,A"
"A"
I used (\s?("[\w\s]*"|\d*)\s?(,,|$)) for consecutive commas but it doesn't do the trick when the comma is at the end or beggining of one of the whitespace separated substring like "D, ,D"
Should I aim to split by whitespace and look for a simpler regex for each of the substrings?
That would be something like this:
^[A-Z](,[A-Z])*( [A-Z](,[A-Z])*)*$
What happens here, is the following:
We expect a letter, optionally followed by one or more times a comma-immediately-followed-by-another-letter.
Then we optionally accept a space, and then the abovementioned pattern. And this is repeated.
Test: https://regex101.com/r/kzLhtw/1
You could, of course, slightly optimize the regex by making all capturing groups non-capturing: just put ?: immediately behind the (, that is, (?:.
You might use
^[A-Z](?: [A-Z])*(?:,[A-Z](?: [A-Z])*){0,2}$
^ Start of string
[A-Z] Match a single char A-Z
(?: [A-Z])* Optionally repeat a space and and a single char A-Z
(?: Non capture group
,[A-Z](?: [A-Z])* Match a comma, char A-Z followed by optionally repeat matching a space and a char A-Z
){0,2} Close the group and repeat 0-2 times
$ End of string
Regex demo
"a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace"
Not sure how to exactly interpretate the above, but my reading is: One or two comma-seperated lists where each list may only consist of uppercase characters. In the case of two lists, the two lists are seperated by a single space.
You could try:
^(?!.* .* )[A-Z](?:[ ,][A-Z])*$
See the online demo
^ - Start string anchor.
(?!.* .* ) - Negative lookahead to prevent two spaces present.
[A-Z] - A single uppercase alpha-char.
(?: - Open non-capture group:
[ ,] - A comma or space.
[A-Z] - A single uppercase alpha-char.
)* - Close non-capture group and match 0+ times upt to;
$ - End string anchor.
I'm writing a syntax checker (in Java) for a file that has the keywords and comma (separation)/semicolon (EOL) separated values. The amount of spaces between two complete constructions is unspecified.
What is required:
Find any duplicate words (consecutive and non-consecutive) in the multiline file.
// Example_1 (duplicate 'test'):
item1 , test, item3 ;
item4,item5;
test , item6;
// Example_2 (duplicate 'test'):
item1 , test, test ;
item2,item3;
I've tried to apply the (\w+)(s*\W\s*\w*)*\1 pattern, which doesn't catch duplicate properly.
You may use this regex with mode DOTALL (single line):
(?s)(\b\w+\b)(?=.*\b\1\b)
RegEx Demo
RegEx Details:
(?s): Enable DOTALL mode
(\b\w+\b): Match a complete word and capture it in group #1
(?=.*\b\1\b): Lookahead to assert that we have back-reference \1 present somewhere ahead. \b is used to make sure we match exact same word again.
Additionally:
Based on earlier comments below if intent was to not match consecutive word repeats like item1 item1, then following regex may be used:
(?s)(\b\w+\b)(?!\W+\1\b)(?=.*\b\1\b)
RegEx Demo 2
There is one extra negative lookahead assertion here to make sure we don't match consecutive repeats.
(?!\W+\1\b): Negative lookahead to fail the match for consecutive repeats.
You may use
\b(\w+)\b(?:\s*[^\w\s]\s*\w+)+\s*[^\w\s]\s*\b\1\b
See the regex demo
Details
\b(\w+)\b - Group 1: one or more word chars as a whole word
(?:\s*[^\w\s]\s*\w+)+ - 1 or more occurrences of:
\s* - 0+ whitespaces
[^\w\s] - 1 char other than a word and whitespace char
\s* - 0+ whitespaces
\w+ - 1+ word chars
\s* - 0+ whitespaces
[^\w\s] - 1 char other than a word and whitespace char
\s* - 0+ whitespaces
\b\1\b - the same value as in Group 1 as whole word.
To only match the word, put the second part of the regex into a positive lookahead:
\b(\w+)\b(?=(?:\s*[^\w\s]\s*\w+)+\s*[^\w\s]\s*\b\1\b)
^^^ ^
See this regex demo.
Java regex variable declaration:
String regex = "\\b(\\w+)\\b(?:\\s*[^\\w\\s]\\s*\\w+)+\\s*[^\\w\\s]\\s*\\b\\1\\b";
To make it fully Unicode aware add (?U):
String regex = "(?U)\\b(\\w+)\\b(?:\\s*[^\\w\\s]\\s*\\w+)+\\s*[^\\w\\s]\\s*\\b\\1\\b";
I am trying to replace everything except a specific expression including digits in java using only the replaceAll() method and a single regex.
Given the String P=32 N=5 M=2 I want to extract each variable independently.
I can match the expression N=5 with the regex N=\d, but I can't seem to find an inverse expression that will match anything but N=\d, where x may be any digit.
I do not want to use Pattern or Matcher but solve this using regex only. So for x, y, z being any digit, I want to be able to replace everything but the expression N=y in a String P=x N=y M=z:
String input = "P=32 N=5 M=2";
output = input.replaceAll(regex, "");
System.out.println(output);
// expected "N=5"
You may use
s = s.replaceAll("\\s*\\b(?!N=\\d)\\w+=\\d+", "").trim();
See the Java demo and the regex demo.
Details
\s* - 0+ whitespaces
\b - a word boundary
(?!N=\d) - immediately to the right, there should be no N= and any digit
\w+ - 1+ letters/digits/_
= - an = sign
\d+ - 1+ digits.
I need a RegEx that allow a single space after two letters i.e. AB123 should not be allowed but AB 123 should be allowed ?
Here is the regex [a-zA-Z]{2}\s\S*
[a-zA-Z] means character from a to Z
{2} means character twice
\s means white space
\S means non white space.
* duplicate with 0 or more
https://regex101.com/r/uWYci4/1
This pattern will do the work: ^[a-zA-Z]{2} \d+$
Explanation:
^ - match beginning of a string
[a-zA-Z]{2} - match two letters (upper- or lowercase),
- match space
\d+ - match one or more digits
$ - match end of a string
Demo
Trying to match a string of numbers with spaces in between while ignoring other strings of numbers without spaces in between them. I'd like to match 16 characters.
eg. Would like to match 12345 67890 1234 but NOT 1234567890123456
I have tried this:
[0-9 ]{16}
But this matches both sets of strings.
I used and corrected #Wiktor Stribiżew regex, because original regex will match a space at the beginning and the end of the number.
Regex: \b(?![0-9]{16})\d[0-9 ]{14}\d\b
Details:
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
(?!) Negative Lookahead
[] Match a single character present in the list 0-9
{n} Matches exactly n times
\d matches a digit (equal to [0-9])
RegEx demo
You can use this regex to enforcees at least one space in between numbers:
\d+(?:\h+\d+)+
RegEx Demo
\d+: Match 1+ digits
(?:\h+\d+)+: Match 1+ group of 1+ whitespace and 1+ digits