match whole sentence with regex - java

I'm trying to match sentences without capital letters with regex in Java:
"Hi this is a test" -> Shouldn't match
"hi thiS is a test" -> Shouldn't match
"hi this is a test" -> Should match
I've tried the following regex, but it also matches my second example ("hi, thiS is a test").
[a-z]+
It seems like it's only looking at the first word of the sentence.
Any help?

[a-z]+ will match if your string contains any lowercase letter.
If you want to make sure your string doesn't contain uppercase letters, you could use a negative character class: ^[^A-Z]+$
Be aware that this won't handle accentuated characters (like É) though.
To make this work, you can use Unicode properties: ^\P{Lu}+$
\P means is not in Unicode category, and Lu is the uppercase letter that has a lowercase variant category.

^[a-z ]+$
Try this.This will validate the right ones.

It's not matching because you haven't used a space in the match pattern, so your regex is only matching whole words with no spaces.
try something like ^[a-z ]+$ instead (notice the space is the square brackets) you can also use \s which is shorthand for 'whitespace characters' but this can also include things like line feeds and carriage returns so just be aware.
This pattern does the following:
^ matches the start of a string
[a-z ]+ matches any a-z character or a space, where 1 or more exists.
$ matches the end of the string.

I would actually advise against regex in this case, since you don't seem to employ extended characters.
Instead try to test as following:
myString.equals(myString.toLowerCase());

Related

How to Match a String Against Multiple Regex Patterns in Java

I understand how to match a single String against multiple regex patterns using the pipe symbol as explained in some of the answers to this question: Match a string against multiple regex patterns
My question is that when I have the following String:
this_isAnExample of What nav-input a-autoid-9-announce thisIsAnExampleToo
And I use the following regex to extract text:
[A-Z][a-z]*|(?<=_)[A-Za-z-]*
I am expecting to get the following matches:
is
An
Example
What
Is
An
Example
Too
But I actually get is:
isAnExample
What
Is
An
Example
Too
Basically the engine is automatically linking the word An with Example bec it matches the underscore pattern but I want it to treat them as two words (non greedy?) bec according to the other pattern there is another match.
You probably ment the regex to be
[A-Z][a-z]*|(?<=_)[a-z-]*
The first part being lowercase word starting with uppercase letter, or the second: lowercase word preceded by underscore.
The part of your posted regex (?<=_)[A-Za-z-]* matches lower and upper case letters after underscore, i.e. does not stop matching when uppercase letter met, which should be in fact start of another word.
You can use this alternation regex to capture all the lower case text that is wither preceded by _ OR mixed case text:
((?<=_)[a-z][a-z-]*|[A-Z][a-z]*)
RegEx Demo

Is there a wildcard character I can use to solve this?

I'm learning Java and I have a simple problem but I'm stuck.
I need to search a string for the text "bob", except the "o" can be any character.
Is there a wildcard character I can use, or another simpler method?
Thanks in advance
You can use the matches() method with the right regex:
if (str.matches(".*b.b.*"))
Note that the regex must match the whole string to return true.
If you want to match the word "bob", you'll need "word boundaries":
if (str.matches("(?i).*\\bb[a-z]b\\b.*"))
Note that the "case insensitive" flag has been added to allow any case to match.
This regex matches a whole word starting and ending with 'b' with any alpha-numeric (or underscore) character between:
\bb\wb\b
To match any lowercase letter between, use
\bb[a-z]b\b
To match any letter between, use
\bb[a-zA-Z]b\b
To escape these for Java strings, change each \ to \\

Stop regular expression from matching across lines

I have a regular expression,
end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
which is supposed to match a line with the specifications
end abcdef123
where abcdef123 must start with a letter and subsequent alphanumeric characters.
However currently it is also matching this
foobar barfooend
bar fred bob
It's picking up that end at the end of barfooend and also picking up bar in effect returning end bar as a legitimate result.
I tried
^end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
but that doesn't seem to work at all. It ends up matching nothing.
It should be fairly simple but I can't seem to nut it out.
\s includes also newline characters. So you either need to specify a character class that has only the wanted whitespace charaters or exclude the not wanted.
Use instead of \\s+ one of those:
[^\\S\r\n] this includes all whitespace but not \r and \n. See end[^\S\r\n]+[a-zA-Z][a-zA-Z_0-9]+ here on Regexr
[ \t] this includes only space and tab. See end[ \t]+[a-zA-Z][a-zA-Z_0-9]+ here on Regexr
You can use \b (word boundary detection) to check a word boundary. In our case we will use it to match the beginning of the word end. It can also be used to match the end of a word.
As #nhahtdh stated in his comment the {1} is redundant as [a-zA-Z] already matches one letter in the given range.
Also your regex does not do what you want because it only matches one alphanumeric character after the first letter. Add a + at the end (for one or more times) or * (for zero or more times).
This should work:
"\\bend\\s+[a-zA-Z]{1}[a-zA-Z_0-9]*"
Edit : I think \b is better than ^ because the latter only matches the beginning of a line.
For example take this input : "end azd123 end bfg456" There will be only one match for ^ when \b will help matching both.
Try the regular expression:
end[ ]+[a-zA-Z]\w+
\w is a word character: [a-zA-Z_0-9]

Java regex "[.]" vs "."

I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.
[.] matches a dot (.) literally, while . matches any character except newline (\n) (unless you use DOTALL mode).
You can also use \. ("\\." if you use java string literal) to literally match dot.
The [ and ] are metacharacters that let you define a character class. Anything enclosed in square brackets is interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the group has a special meaning: it inverts what's matched by the group. For example, [^.] matches any character except a dot .
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
regular-expression constructs
. => Any character (may or may not match line terminators)
and to match the dot . use the following
[.] => it will matches a dot
\\. => it will matches a dot
NOTE: The character classes in Java regular expression is defined using the square brackets "[ ]", this subexpression matches a single character from the specified or, set of possible characters.
Example : In string address replaces every "." with "[.]"
public static void main(String[] args) {
String address = "1.1.1.1";
System.out.println(address.replaceAll("[.]","[.]"));
}
if anything is missed please add :)

Java regex match all characters except

What is the correct syntax for matching all characters except specific ones.
For example I'd like to match everything but letters [A-Z] [a-z] and numbers [0-9].
I have
string.matches("[^[A-Z][a-z][0-9]]")
Is this incorrect?
Yes, you don't need nested [] like that. Use this instead:
"[^A-Za-z0-9]"
It's all one character class.
If you want to match anything but letters, you should have a look into Unicode properties.
\p{L} is any kind of letter from any language
Using an uppercase "P" instead it is the negation, so \P{L} would match anything that is not a letter.
\d or \p{Nd} is matching digits
So your expression in modern Unicode style would look like this
Either using a negated character class
[^\p{L}\p{Nd}]
or negated properties
[\P{L}\P{Nd}]
The next thing is, matches() matches the expression against the complete string, so your expression is only true with exactly one char in the string. So you would need to add a quantifier:
string.matches("[^\p{L}\p{Nd}]+")
returns true, when the complete string has only non alphanumerics and at least one of them.
Almost right. What you want is:
string.matches("[^A-Za-z0-9]")
Here's a good tutorial
string.matches("[^A-Za-z0-9]")
Lets say that you want to make sure that no Strings have the _ symbol in them, then you would simply use something like this.
Pattern pattern = Pattern.compile("_");
Matcher matcher = Pattern.matcher(stringName);
if(!matcher.find()){
System.out.println("Valid String");
}else{
System.out.println("Invalid String");
}
You can negate character classes:
"[^abc]" // matches any character except a, b, or c (negation).
"[^a-zA-Z0-9]" // matches non-alphanumeric characters.

Categories

Resources