Java regex match all characters except - java

What is the correct syntax for matching all characters except specific ones?
For example, I'd like to match everything but letters [A-Z] [a-z] and numbers [0-9].
I have
string.matches("[^[A-Z][a-z][0-9]]")
Is this incorrect?

Yes, you don't need nested [] like that. Use this instead:
"[^A-Za-z0-9]"
It's all one character class.

If you want to match anything but letters, you should have a look into Unicode properties.
\p{L} is any kind of letter from any language
An uppercase "P" negates the property, so \P{L} matches anything that is not a letter.
\p{Nd} matches decimal digits from any script. (In Java, \d matches only ASCII 0-9 unless the UNICODE_CHARACTER_CLASS flag is set.)
So your expression in modern Unicode style would look like this
Either using a negated character class
[^\p{L}\p{Nd}]
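or intersecting the negated properties
[\P{L}&&\P{Nd}]
(A plain [\P{L}\P{Nd}] would not work: it is a union, and every character is either a non-letter or a non-digit, so it would match anything.)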
The next thing is that matches() tests the expression against the complete string, so your expression is only true for a string of exactly one character. You need to add a quantifier, and in a Java string literal each backslash must be doubled:
string.matches("[^\\p{L}\\p{Nd}]+")
This returns true when the complete string consists only of non-alphanumeric characters and contains at least one of them.
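As a rough illustration of the quantified Unicode pattern above, here is a minimal sketch. The class and method names are made up for this example, and the pattern is precompiled only as a convenience.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class NonAlphanumericCheck {
    // Negated class: anything that is neither a Unicode letter nor a decimal digit.
    // The + requires at least one such character.
    private static final Pattern NON_ALNUM_ONLY = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    // True only if the whole string consists of non-alphanumeric characters.
    static boolean isOnlyNonAlphanumeric(String s) {
        return NON_ALNUM_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isOnlyNonAlphanumeric("!?-"));     // punctuation only
        System.out.println(isOnlyNonAlphanumeric("!a?"));     // contains a letter
        System.out.println(isOnlyNonAlphanumeric("\u00e9"));  // accented letter still counts as \p{L}
    }
}
```

The empty string is rejected because + needs at least one character; swapping it for * would accept it.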

Almost right. What you want is:
string.matches("[^A-Za-z0-9]")

string.matches("[^A-Za-z0-9]")

Let's say you want to make sure that no string contains the _ symbol. Then you could simply use something like this:
Pattern pattern = Pattern.compile("_");
// matcher() is an instance method, so call it on the compiled pattern
Matcher matcher = pattern.matcher(stringName);
if (!matcher.find()) {
    System.out.println("Valid String");
} else {
    System.out.println("Invalid String");
}

You can negate character classes:
"[^abc]" // matches any character except a, b, or c (negation).
"[^a-zA-Z0-9]" // matches non-alphanumeric characters.
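To see the negated class in action, the sketch below uses it in two ways: detecting whether a string contains any non-alphanumeric character, and stripping such characters. The class and method names are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the negated character class above.
class NegatedClassDemo {
    // Matches a single character that is not an ASCII letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // find() succeeds if the pattern matches anywhere in the string.
    static boolean containsNonAlphanumeric(String s) {
        return NON_ALNUM.matcher(s).find();
    }

    // Removes every character matched by the negated class.
    static String stripNonAlphanumeric(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }
}
```

For instance, containsNonAlphanumeric("abc_123") is true because of the underscore, while stripNonAlphanumeric("a-b_c 1!") leaves "abc1".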

Related

Java regular expression matching two consecutive consonants

I'm trying to match only strings with two consecutive consonants, but no matter what input I give to myString this never evaluates to true, so I have to assume something is wrong with the syntax of my regex. Any ideas?
if (Pattern.matches("([^aeiou]&&[^AEIOU]){2}", myString)) {...}
Additional info:
myString is a substring of at most two characters
There is no whitespace, as this string is the output of a .split with a whitespace delimiter
I'm not worried about special characters, as the program just concatenates and prints the result, though if you'd like to show me how to include something like [b-z]&&[^eiou] in your answer I would appreciate it.
Edit:
After going through these answers and testing a little more, the code I finally used was
if (myString.matches("(?i)[b-z&&[^eiou]]{2}")) {...}
[^aeiou] matches non-letter characters as well, so you should use a different pattern:
Pattern rx = Pattern.compile("[bcdfghjklmnpqrstuvwxyz]{2}", Pattern.CASE_INSENSITIVE);
if (rx.matcher(myString).matches()) {...}
If you would like to use && for an intersection, you can do it like this:
"[a-z&&[^aeiou]]{2}"
To use character class intersection, you need to wrap your syntax inside of a bracketed expression. The below matches characters that are both lowercase letters and not vowels.
[a-z&&[^aeiou]]{2}
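A small sketch of the intersection syntax follows. Both cases are spelled out inside the class instead of relying on a case-insensitive flag, which is a choice made for this example only; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of character class intersection.
class ConsonantPairDemo {
    // (a-z or A-Z) intersected with "not a vowel", repeated exactly twice.
    private static final Pattern TWO_CONSONANTS =
            Pattern.compile("[a-zA-Z&&[^aeiouAEIOU]]{2}");

    // True only if the whole string is exactly two consonants.
    static boolean isConsonantPair(String s) {
        return TWO_CONSONANTS.matcher(s).matches();
    }
}
```

With this, isConsonantPair("st") and isConsonantPair("ST") are true, while "at" (contains a vowel), "s1" (contains a digit), and "str" (three characters) are rejected.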

match whole sentence with regex

I'm trying to match sentences without capital letters with regex in Java:
"Hi this is a test" -> Shouldn't match
"hi thiS is a test" -> Shouldn't match
"hi this is a test" -> Should match
I've tried the following regex, but it also matches my second example ("hi thiS is a test").
[a-z]+
It seems like it's only looking at the first word of the sentence.
Any help?
[a-z]+ will match if your string contains any lowercase letter.
If you want to make sure your string doesn't contain uppercase letters, you could use a negative character class: ^[^A-Z]+$
Be aware that this won't handle accented characters (like É) though.
To make this work, you can use Unicode properties: ^\P{Lu}+$
\P{...} means "not in this Unicode category", and Lu is the category of uppercase letters.
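The Unicode-aware check can be tried out with a sketch like the following. The class and method names are invented for this illustration; matches() already anchors the pattern to the whole string, so the ^ and $ are left out here.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the \P{Lu} approach.
class NoUppercaseDemo {
    // One or more characters, none of which is in the Unicode category Lu.
    private static final Pattern NO_UPPERCASE = Pattern.compile("\\P{Lu}+");

    // True if the string is non-empty and contains no uppercase letter.
    static boolean hasNoUppercase(String s) {
        return NO_UPPERCASE.matcher(s).matches();
    }
}
```

Under these assumptions, "hi this is a test" passes, while "Hi this is a test", "hi thiS is a test", and a string starting with É ("\u00c9") are rejected.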
^[a-z ]+$
Try this. It will validate the right ones.
It's not matching because you haven't used a space in the match pattern, so your regex is only matching whole words with no spaces.
Try something like ^[a-z ]+$ instead (notice the space inside the square brackets). You can also use \s, which is shorthand for whitespace characters, but be aware that it also matches things like line feeds and carriage returns.
This pattern does the following:
^ matches the start of a string
[a-z ]+ matches any a-z character or a space, where 1 or more exists.
$ matches the end of the string.
I would actually advise against regex in this case, since you don't seem to employ extended characters.
Instead try to test as following:
myString.equals(myString.toLowerCase());

simple java regular expression not working

I have this simple example of a regular expression. But it is not working. I don't know what I am doing wrong:
String name = "abc";
System.out.println(name.matches("[a-zA-Z]"));
it returns false, it should be true.
Use:
name.matches("[a-zA-Z]+") // matches one or more letters
or:
name.matches("\\w+") // matches one or more word characters (letters, digits, underscore)
name.matches("[a-zA-Z]") // matches exactly one letter
Add + to your regex to match one or more letters:
String name = "abc"; System.out.println(name.matches("[a-zA-Z]+"));
Your regex [a-zA-Z] matches exactly one letter, not more than one.
[a-zA-Z] matches one lowercase letter from a-z or one uppercase letter from A-Z.
The reason this evaluates to false is that it tries to match the entire string (see the documentation of String.matches()) against the pattern [A-Za-z], which only matches a single character. Either use
Pattern.compile("[A-Za-z]").matcher(str).find() to see if a substring matches (this will return true in this case), or alter the regex to account for multiple characters. The cleanest way of doing so is
Pattern.compile("^[A-Za-z]+$");
The ^ marks "start of string" and $ marks "end of string". + means "previous token at least once".
If you want to allow the empty String as well, use
Pattern.compile("^[A-Za-z]*$");
instead (* means "match the previous token 0 or more times")
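The difference between whole-string matching and substring searching can be sketched as follows. The class and method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo contrasting matches() and find().
class MatchesVsFind {
    // matches() must consume the entire input, so + is needed for longer strings.
    static boolean isAllLetters(String s) {
        return Pattern.compile("[A-Za-z]+").matcher(s).matches();
    }

    // find() succeeds as soon as any substring matches.
    static boolean containsLetter(String s) {
        return Pattern.compile("[A-Za-z]").matcher(s).find();
    }

    // * also accepts the empty string.
    static boolean isAllLettersOrEmpty(String s) {
        return Pattern.compile("[A-Za-z]*").matcher(s).matches();
    }
}
```

For the input "abc", "abc".matches("[a-zA-Z]") is false, whereas isAllLetters("abc") and containsLetter("abc") are both true.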
Try with [a-zA-Z]+
[a-zA-Z] indicates:

Is there a wildcard character I can use to solve this?

I'm learning Java and I have a simple problem but I'm stuck.
I need to search a string for the text "bob", except the "o" can be any character.
Is there a wildcard character I can use, or another simpler method?
Thanks in advance
You can use the matches() method with the right regex:
if (str.matches(".*b.b.*"))
Note that the regex must match the whole string to return true.
If you want to match the word "bob", you'll need "word boundaries":
if (str.matches("(?i).*\\bb[a-z]b\\b.*"))
Note that the "case insensitive" flag has been added to allow any case to match.
This regex matches a whole word starting and ending with 'b' with any alpha-numeric (or underscore) character between:
\bb\wb\b
To match any lowercase letter between, use
\bb[a-z]b\b
To match any letter between, use
\bb[a-zA-Z]b\b
To escape these for Java strings, change each \ to \\
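As a sketch of the escaped form in Java, the example below searches with find() so no leading or trailing .* is needed. The class and method names are hypothetical, and the letter-only variant of the pattern is picked arbitrarily.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the word-boundary pattern with doubled backslashes.
class BobWildcardDemo {
    // \b b <any ASCII letter> b \b, written as a Java string literal.
    private static final Pattern B_ANY_B = Pattern.compile("\\bb[a-zA-Z]b\\b");

    // True if the text contains a three-letter word of the form b?b.
    static boolean containsBobLike(String text) {
        return B_ANY_B.matcher(text).find();
    }
}
```

Here "say hi to bob today" and "a bxb." are accepted, while "bobby" fails the trailing word boundary and "b1b" fails because the middle character is not a letter.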

Regular Expression for a string that contains one or more letters somewhere in it

What would be a regular expression that would evaluate to true if the string has one or more letters anywhere in it.
For example:
1222a3999 would be true
a222aZaa would be true
aaaAaaaa would be true
but:
1111112())-- would be false
I tried: ^[a-zA-Z]+$ and [a-zA-Z]+ but neither work when there are any numbers and other characters in the string.
.*[a-zA-Z].*
The above means one letter, and before/after it - anything is fine.
In java:
String regex = ".*[a-zA-Z].*";
System.out.println("1222a3999".matches(regex));
System.out.println("a222aZaa ".matches(regex));
System.out.println("aaaAaaaa ".matches(regex));
System.out.println("1111112())-- ".matches(regex));
Will provide:
true
true
true
false
as expected
^.*[a-zA-Z].*$
Depending on the implementation, match() functions check if the entire string matches (which is probably why your [a-zA-Z] or [a-zA-Z]+ patterns didn't work).
Either use match() with the above pattern or use some sort of search() method instead.
This regexp should do it:
[a-zA-Z]
It matches as long as there's a single letter anywhere in the string, it doesn't care about any of the other characters.
[a-zA-Z]+
should have worked as well, I don't know why it didn't for you.
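In Java, whether the bare [a-zA-Z] pattern works depends on which method is called. The sketch below, with made-up class and method names, runs the inputs from the question through find(), and the comments note how String.matches() behaves differently.

```java
import java.util.regex.Pattern;

// Hypothetical demo: "contains at least one letter" via find().
class ContainsLetterDemo {
    private static final Pattern LETTER = Pattern.compile("[a-zA-Z]");

    // find() scans for a match anywhere, so surrounding characters are ignored.
    static boolean containsLetter(String s) {
        return LETTER.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"1222a3999", "a222aZaa", "aaaAaaaa", "1111112())--"};
        for (String input : inputs) {
            // String.matches("[a-zA-Z]+") would require the whole input to be letters.
            System.out.println(input + " -> " + containsLetter(input));
        }
    }
}
```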
.*[a-zA-Z]+.*
Should get you the result you want.
The period matches any character except a newline, and the asterisk says it may occur zero or more times. Then the pattern [a-zA-Z]+ says give me at least one character that is in the brackets, because of the plus sign. (A question mark here would mean zero or one, which would let every string match.) Finally, the ending .* says that the letters can be followed by zero or more characters of any type.
