Every word in the sentence capital letter - java

I'm trying to write a regex for my task.
Every word in the sentence must start with a capital letter, and the rest of each word must be lowercase letters.
(^[A-Z]{1}[a-z\s]+)+
e.g.
Java Test - ok
Java test - not ok
JaVa Test - not ok
java Test - not ok

The pattern you tried will also match Java test, because the character class [a-z\s]+ matches 1+ of any of the listed characters, including a space, so nothing forces the second word to start with an uppercase char.
Instead, you could repeat the part that matches an uppercase char followed by 1+ lowercase chars once for every word.
Note that \s will also match a newline, and that you can omit {1}.
^[A-Z][a-z]+(?: [A-Z][a-z]+)*$
^ Start of string
[A-Z][a-z]+ Match 1 uppercase A-Z and 1+ lowercase a-z
(?: Non capturing group
 [A-Z][a-z]+ Match a space, 1 uppercase A-Z and 1+ lowercase chars a-z
)* Close non capturing group and repeat 0+ times
$ End of string
Instead of matching a single space, you could also match 1+ horizontal whitespace chars using \h (in Java, \\h).
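A minimal Java sketch of how the pattern above might be applied to the examples from the question. The class and method names (TitleCaseCheck, isTitleCase, isTitleCaseLoose) are made up for illustration, and the second pattern assumes that 1+ horizontal whitespace chars between words is acceptable.

```java
import java.util.regex.Pattern;

final class TitleCaseCheck {
    // Single-space variant: every word is one uppercase letter plus 1+ lowercase letters.
    static final Pattern SINGLE_SPACE =
            Pattern.compile("^[A-Z][a-z]+(?: [A-Z][a-z]+)*$");

    // Variant that accepts 1+ horizontal whitespace chars (\h) between words.
    static final Pattern HORIZONTAL_WS =
            Pattern.compile("^[A-Z][a-z]+(?:\\h+[A-Z][a-z]+)*$");

    static boolean isTitleCase(String s) {
        return SINGLE_SPACE.matcher(s).matches();
    }

    static boolean isTitleCaseLoose(String s) {
        return HORIZONTAL_WS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Java Test", "Java test", "JaVa Test", "java Test"};
        for (String s : samples) {
            System.out.println(s + " -> " + isTitleCase(s));
        }
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors are redundant here, but keeping them makes the pattern behave the same with find().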

If any of the words can be a single character, like:
This Is A Test
I Am A Programmer
then you can use:
^(\b[A-Z][a-z]*\s?\b)+$
Otherwise, if every word always has more than one character, you can use:
^(\b[A-Z][a-z]+\s?\b)+$
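The difference between the two patterns can be checked with a short Java sketch. The class and method names below are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

final class TitleCaseWordBoundary {
    // [a-z]* lets a word consist of the capital letter alone ("A", "I").
    static final Pattern SINGLE_LETTER_OK =
            Pattern.compile("^(\\b[A-Z][a-z]*\\s?\\b)+$");

    // [a-z]+ requires at least one lowercase letter after the capital.
    static final Pattern TWO_OR_MORE =
            Pattern.compile("^(\\b[A-Z][a-z]+\\s?\\b)+$");

    static boolean allowsSingleLetterWords(String s) {
        return SINGLE_LETTER_OK.matcher(s).matches();
    }

    static boolean requiresLongerWords(String s) {
        return TWO_OR_MORE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(allowsSingleLetterWords("This Is A Test"));
        System.out.println(requiresLongerWords("This Is A Test"));
    }
}
```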

Related

Regex to match comma separated values

I'm new to Regex in Java and I wanted to know how can I build one that only takes a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace.
I would need to filter out strings that start with a comma, that end with a comma or strings that have multiple consecutive commas.
All these would be invalid:
"D,, D"
"D D,,"
"D, ,D"
"D, ,,D"
"D,, ,D"
"D,,"
",,A"
",A"
"A,"
All these would be valid:
"D,D T,F"
"D,D T"
"A,A"
"A"
I used (\s?("[\w\s]*"|\d*)\s?(,,|$)) for consecutive commas, but it doesn't do the trick when the comma is at the end or beginning of one of the whitespace-separated substrings, like "D, ,D".
Should I aim to split by whitespace and look for a simpler regex for each of the substrings?
That would be something like this:
^[A-Z](,[A-Z])*( [A-Z](,[A-Z])*)*$
What happens here, is the following:
We expect a letter, optionally followed by one or more times a comma-immediately-followed-by-another-letter.
Then we optionally accept a space, and then the abovementioned pattern. And this is repeated.
Test: https://regex101.com/r/kzLhtw/1
You could, of course, slightly optimize the regex by making all capturing groups non-capturing: just put ?: immediately behind the (, that is, (?:.
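As a rough check, the pattern can be run against the valid and invalid samples from the question. This is only a sketch: the class name CommaListCheck and the method isValid are invented for the example, and the non-capturing form of the groups is used.

```java
import java.util.regex.Pattern;

final class CommaListCheck {
    // A letter, then 0+ ",letter" pairs; further lists are introduced by a single space.
    static final Pattern LISTS =
            Pattern.compile("^[A-Z](?:,[A-Z])*(?: [A-Z](?:,[A-Z])*)*$");

    static boolean isValid(String s) {
        return LISTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"D,D T,F", "D,D T", "A,A", "A", "D,, D", "D, ,D", ",A", "A,"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Note that, as written, the pattern accepts any number of space-separated lists, not only one or two.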
You might use
^[A-Z](?: [A-Z])*(?:,[A-Z](?: [A-Z])*){0,2}$
^ Start of string
[A-Z] Match a single char A-Z
(?: [A-Z])* Optionally repeat a space and a single char A-Z
(?: Non capture group
,[A-Z](?: [A-Z])* Match a comma and a char A-Z, then optionally repeat a space and a char A-Z
){0,2} Close the group and repeat 0-2 times
$ End of string
"a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace"
I'm not sure exactly how to interpret the above, but my reading is: one or two comma-separated lists, where each list may only consist of uppercase characters. In the case of two lists, the two lists are separated by a single space.
You could try:
^(?!.* .* )[A-Z](?:[ ,][A-Z])*$
^ - Start string anchor.
(?!.* .* ) - Negative lookahead to prevent two spaces present.
[A-Z] - A single uppercase alpha-char.
(?: - Open non-capture group:
[ ,] - A comma or space.
[A-Z] - A single uppercase alpha-char.
)* - Close non-capture group and match 0+ times, up to:
$ - End string anchor.
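A small Java sketch of the lookahead-based pattern, under the reading given above (at most two lists, so at most one space). The names CommaListLookahead and isValid are assumptions made for the example.

```java
import java.util.regex.Pattern;

final class CommaListLookahead {
    // The negative lookahead rejects any input that contains two spaces.
    static final Pattern AT_MOST_TWO_LISTS =
            Pattern.compile("^(?!.* .* )[A-Z](?:[ ,][A-Z])*$");

    static boolean isValid(String s) {
        return AT_MOST_TWO_LISTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"D,D T,F", "A", "D D D", "D, ,D", ",A"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```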

Regex to identify consecutive and non-consecutive duplicate words in multiline text

I'm writing a syntax checker (in Java) for a file that has the keywords and comma (separation)/semicolon (EOL) separated values. The amount of spaces between two complete constructions is unspecified.
What is required:
Find any duplicate words (consecutive and non-consecutive) in the multiline file.
// Example_1 (duplicate 'test'):
item1 , test, item3 ;
item4,item5;
test , item6;
// Example_2 (duplicate 'test'):
item1 , test, test ;
item2,item3;
I've tried to apply the (\w+)(s*\W\s*\w*)*\1 pattern, which doesn't catch duplicates properly.
You may use this regex with mode DOTALL (single line):
(?s)(\b\w+\b)(?=.*\b\1\b)
RegEx Details:
(?s): Enable DOTALL mode
(\b\w+\b): Match a complete word and capture it in group #1
(?=.*\b\1\b): Lookahead to assert that we have back-reference \1 present somewhere ahead. \b is used to make sure we match exact same word again.
Additionally:
Based on earlier comments: if the intent is to not match consecutive word repeats like item1 item1, then the following regex may be used:
(?s)(\b\w+\b)(?!\W+\1\b)(?=.*\b\1\b)
There is one extra negative lookahead assertion here to make sure we don't match consecutive repeats.
(?!\W+\1\b): Negative lookahead to fail the match for consecutive repeats.
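Here is a minimal sketch of how the first pattern, (?s)(\b\w+\b)(?=.*\b\1\b), might be used from Java to collect duplicated words. The class name DuplicateWords and the method findDuplicates are invented for illustration. Each word is reported once per later repetition, so a word that occurs three times would appear twice in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DuplicateWords {
    // (?s) lets . cross line breaks, so the lookahead can see later lines.
    static final Pattern DUPLICATE =
            Pattern.compile("(?s)(\\b\\w+\\b)(?=.*\\b\\1\\b)");

    static List<String> findDuplicates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            found.add(m.group(1)); // the word that occurs again further ahead
        }
        return found;
    }

    public static void main(String[] args) {
        String example1 = "item1 , test, item3 ;\nitem4,item5;\ntest , item6;";
        System.out.println(findDuplicates(example1));
    }
}
```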
You may use
\b(\w+)\b(?:\s*[^\w\s]\s*\w+)+\s*[^\w\s]\s*\b\1\b
Details
\b(\w+)\b - Group 1: one or more word chars as a whole word
(?:\s*[^\w\s]\s*\w+)+ - 1 or more occurrences of:
\s* - 0+ whitespaces
[^\w\s] - 1 char other than a word and whitespace char
\s* - 0+ whitespaces
\w+ - 1+ word chars
\s* - 0+ whitespaces
[^\w\s] - 1 char other than a word and whitespace char
\s* - 0+ whitespaces
\b\1\b - the same value as in Group 1 as whole word.
To only match the word, put the second part of the regex into a positive lookahead:
\b(\w+)\b(?=(?:\s*[^\w\s]\s*\w+)+\s*[^\w\s]\s*\b\1\b)
The additions are the (?= opened right after the first \b(\w+)\b and the closing ) at the end.
Java regex variable declaration:
String regex = "\\b(\\w+)\\b(?:\\s*[^\\w\\s]\\s*\\w+)+\\s*[^\\w\\s]\\s*\\b\\1\\b";
To make it fully Unicode aware add (?U):
String regex = "(?U)\\b(\\w+)\\b(?:\\s*[^\\w\\s]\\s*\\w+)+\\s*[^\\w\\s]\\s*\\b\\1\\b";
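A hedged sketch of using the lookahead variant with a Matcher, so that only the word itself is consumed. The class name DuplicateWordsSeparated and the method find are hypothetical. As the pattern is written, the + on the middle group means at least one other word must sit between the two occurrences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DuplicateWordsSeparated {
    // Word, then (in a lookahead) 1+ "separator word" pairs, a separator, and the same word.
    static final Pattern DUPLICATE = Pattern.compile(
            "\\b(\\w+)\\b(?=(?:\\s*[^\\w\\s]\\s*\\w+)+\\s*[^\\w\\s]\\s*\\b\\1\\b)");

    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String example1 = "item1 , test, item3 ;\nitem4,item5;\ntest , item6;";
        System.out.println(find(example1));
    }
}
```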

How to avoid a hyphen from splitting a regex?

I'm writing a simple android app for saving your favorite games in a list.
In the first screen a user has to enter his gamertag (as a String). The gamertag should only contain letters from a-z (uppercase and lowercase), numbers (0-9) and underscores/hyphens (_ and -).
I can get it to work with an underscore in every position or a hyphen at the beginning. But if the String contains a hyphen in the middle it gets "split" into two pieces and if the hyphen is at the end, it stands alone.
I came up with this regex:
[a-zA-Z0-9_\-]\w+
In Java it looks a little different because the \ needs to be escaped:
[a-zA-Z0-9_\\-]\\w+
Gamertags that should validate:
- GamerTag
- Gamer_Tag
- _GamerTag
- GamerTag_
- -GamerTag
- Gamer-Tag
- GamerTag-
Gamertags that shouldn't validate:
- !GamerTag
- Gamer%Tag
- Gamer Tag
Gamertags that should validate, but my regex fails:
- Gamer-Tag
- GamerTag-
Your pattern [a-zA-Z0-9_\-]\w+ matches 1 character from the character class followed by 1+ word characters (\w), and \w does not match a hyphen.
You could instead repeat the character class itself 1+ times, since it includes the hyphen. If the hyphen is at the end of the character class, you don't have to escape it.
[a-zA-Z0-9_-]+
Gamer-Tag does not get split; it produces 2 matches. In the first match, the character class matches G and \w+ matches amer. In the next match, the character class matches - and \w+ matches Tag.
If those are the only values allowed, you could use anchors ^ to assert the start and $ to assert the end of the string.
^[a-zA-Z0-9_-]+$
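The two behaviours described above can be reproduced with a short Java sketch. The class GamertagCheck and its methods are made-up names for this example: findAll lists every match a pattern produces, and isValid applies the anchored pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GamertagCheck {
    // Anchored pattern: the whole string must consist of the allowed characters.
    static final Pattern VALID = Pattern.compile("^[a-zA-Z0-9_-]+$");

    static boolean isValid(String tag) {
        return VALID.matcher(tag).matches();
    }

    // Collects every match of an arbitrary regex, to show how an input gets "split".
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The original pattern yields two separate matches for Gamer-Tag.
        System.out.println(findAll("[a-zA-Z0-9_\\-]\\w+", "Gamer-Tag"));
        System.out.println(isValid("Gamer-Tag"));
    }
}
```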

Java regex. Match any "value" that is not preceded by a given string

I need some help with a Java regexp.
I'm working with a file that has a JSON-like format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value on the sclass property.
I've been working on an almost-working regexp, but I need a final push to match only what I've described above:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from sclass that is not followed by 'notranslate'.
Thanks for your help
The values of your current regex are in the 4th capturing group
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
That would match:
^ Assert start of the string
(?! Negative lookahead to assert that what is on the right does not match:
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes and word boundaries
) Close negative lookahead
.* match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in a group matching not ' 1+ times and match '
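A minimal sketch of applying this pattern line by line in Java, assuming each widget definition sits on its own line as in the sample. The class name TranslatableValue and the method extract are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TranslatableValue {
    // Fails at once if the line has sclass:'... notranslate ...';
    // otherwise captures the text of the last label/value on the line.
    static final Pattern TRANSLATABLE = Pattern.compile(
            "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*').*\\b(?:label|value):'([^']+)'");

    static Optional<String> extract(String line) {
        Matcher m = TRANSLATABLE.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String ok = "['zul.wgt.Label','f6DQof',{sclass:'class',prolog:' ',value:'xxxx'},{},[]],";
        String skip = "['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',prolog:' ',value:'xxxx'},";
        System.out.println(extract(ok));
        System.out.println(extract(skip));
    }
}
```

Since ^ is used without the MULTILINE flag, passing the whole file as one string would only ever test the first line, which is why the sketch takes one line at a time.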

regex expression to remove eed from string

I am trying to replace 'eed' and 'eedly' with 'ee' from words where there is a vowel before either term ('eed' or 'eedly') appears.
So for example, the word indeed would become indee because there is a vowel ('i') that happens before the 'eed'. On the other hand the word 'feed' would not change because there is no vowel before the suffix 'eed'.
I have this regex: (?i)([aeiou]([aeiou])*[e{2}][d]|[dly]\\b)
You can see what is happening with this here.
As you can see, this is correctly identifying words that end with 'eed', but it is not correctly identifying 'eedly'.
Also, when it does the replace, it replaces every word that ends with 'eed', even words like feed, where the 'eed' should be left alone.
What should I be considering here in order to make it correctly identify the words based on the rules I specified?
You can use:
str = str.replaceAll("(?i)\\b(\\w*?[aeiou]\\w*)eed(?:ly)?", "$1ee");
\\b(\\w*?[aeiou]\\w*) before eed or eedly makes sure there is at least one vowel in the same word before this.
To speed this regex up, you can use a negated character class:
\\b([^\\Waeiou]*[aeiou]\\w*)eed(?:ly)?
RegEx Breakup:
\\b # word boundary
( # start captured group #1
[^\\Waeiou]* # match 0 or more word characters that are not vowels
[aeiou] # match one vowel
\\w* # followed by 0 or more word characters
) # end captured group #1
eed # followed by literal "eed"
(?: # start non-capturing group
ly # match literal "ly"
)? # end non-capturing group, ? makes it optional
Replacement is:
"$1ee" which means back reference to captured group #1 followed by "ee"
Try to match dly before d; otherwise the regex evaluation stops after finding eed.
(?i)([aeiou]([aeiou])*[e{2}](dly|d))
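For reference, a small runnable sketch of the replaceAll call from the first answer above, wrapped in a hypothetical helper (the class EedSuffix and the method strip are names invented for this example).

```java
final class EedSuffix {
    // Replaces "eed" or "eedly" with "ee" when a vowel occurs earlier in the same word.
    static String strip(String str) {
        return str.replaceAll("(?i)\\b(\\w*?[aeiou]\\w*)eed(?:ly)?", "$1ee");
    }

    public static void main(String[] args) {
        for (String word : new String[] {"indeed", "feed", "agreedly"}) {
            System.out.println(word + " -> " + strip(word));
        }
    }
}
```

The word feed stays unchanged because no vowel comes before its eed, so group 1 cannot match.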
