I'm learning Java and I have a simple problem but I'm stuck.
I need to search a string for the text "bob", except the "o" can be any character.
Is there a wildcard character I can use, or another simpler method?
Thanks in advance
You can use the matches() method with the right regex:
if (str.matches(".*b.b.*"))
Note that the regex must match the whole string to return true.
If you want to match the word "bob", you'll need "word boundaries":
if (str.matches("(?i).*\\bb[a-z]b\\b.*"))
Note that the "case insensitive" flag has been added to allow any case to match.
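As a rough illustration of the two checks above, here is a minimal sketch. The class name BobMatch and the two method names are made up for this example; only the regexes come from the answer.

```java
final class BobMatch {
    // "b", any single character, "b", anywhere in the string.
    // matches() must consume the whole string, hence the ".*" on both sides.
    static boolean containsBobLike(String str) {
        return str.matches(".*b.b.*");
    }

    // Same idea, but only as a standalone three-letter word, ignoring case.
    static boolean containsBobWord(String str) {
        return str.matches("(?i).*\\bb[a-z]b\\b.*");
    }

    public static void main(String[] args) {
        System.out.println(containsBobLike("my name is bxb")); // true
        System.out.println(containsBobLike("bb"));             // false
        System.out.println(containsBobWord("Hello BOB!"));     // true
        System.out.println(containsBobWord("bobcat"));         // false
    }
}
```

Note that "bobcat" passes the first check but not the second, because there is no word boundary after the second "b".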
This regex matches a whole word starting and ending with 'b' with any alpha-numeric (or underscore) character between:
\bb\wb\b
To match any lowercase letter between, use
\bb[a-z]b\b
To match any letter between, use
\bb[a-zA-Z]b\b
To escape these for Java strings, change each \ to \\
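A small sketch of how the three patterns look once escaped for Java, assuming you search with Matcher.find() instead of String.matches(). The class name WordBoundaryDemo and the helper found() are hypothetical.

```java
import java.util.regex.Pattern;

final class WordBoundaryDemo {
    // Each "\" of the regex is written as "\\" inside a Java string literal.
    static final Pattern ANY_WORD_CHAR = Pattern.compile("\\bb\\wb\\b");
    static final Pattern LOWERCASE = Pattern.compile("\\bb[a-z]b\\b");
    static final Pattern ANY_LETTER = Pattern.compile("\\bb[a-zA-Z]b\\b");

    // find() looks for the pattern anywhere in the input, so no ".*" padding is needed.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found(ANY_WORD_CHAR, "say b_b now")); // true: \w includes "_"
        System.out.println(found(LOWERCASE, "say b_b now"));     // false
        System.out.println(found(ANY_LETTER, "say bOb now"));    // true
        System.out.println(found(LOWERCASE, "say bOb now"));     // false
    }
}
```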
Want to match the character at position 7 to either be - or an Uppercase letter
This is what I have ^.{6}[-(A-Z)]
Though this matches the first 7 characters, it doesn't match the whole string. Any help appreciated.
I am using Java and wanting .matches() to return true for this String
Though this matches the first 7 characters, it doesn't match the whole string.
That's the right explanation of what is going on. You can skip over the rest of the string by adding .* at the end. Additionally, because matches() always tests the whole string, the ^ anchor at the front of the expression is implied, so you can drop it for a pattern of
.{6}[A-Z-].*
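As mentioned, you can use .* to match anything after your specific character, so use
^.{6}[-A-Z].*
Also, drop the ( and ): inside a character class they do not group or capture anything. They are literal characters, so the original class would also accept ( or ) at position 7.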
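A minimal sketch of the position-7 check, under the assumption that "position 7" means the seventh character of the string. The class and method names are invented for illustration; the second method keeps the asker's original character class to show that the parentheses inside [...] are treated as literal characters.

```java
final class SeventhChar {
    // Six arbitrary characters, then "-" or an uppercase ASCII letter, then anything.
    static boolean seventhIsDashOrUpper(String s) {
        return s.matches(".{6}[A-Z-].*");
    }

    // The original class [-(A-Z)] with ".*" appended: "(" and ")" are literals here.
    static boolean originalClass(String s) {
        return s.matches("^.{6}[-(A-Z)].*");
    }

    public static void main(String[] args) {
        System.out.println(seventhIsDashOrUpper("abcdef-rest")); // true
        System.out.println(seventhIsDashOrUpper("abcdefGrest")); // true
        System.out.println(seventhIsDashOrUpper("abcdefgrest")); // false
        System.out.println(seventhIsDashOrUpper("abcdef(rest")); // false
        System.out.println(originalClass("abcdef(rest"));        // true
    }
}
```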
I wrote a program to detect palindromes. It works with what I have, but I stumbled upon another bit of syntax, and I would like to know what it means exactly?
This is the line of code I'm using:
userString = userString.toLowerCase().replaceAll("[^a-zA-Z]", "");
I understand that the replaceAll code snippet means to "match characters ([...]) that are not (^) in the range a-z and A-Z (a-zA-Z)."
However, this worked as well:
replaceAll("[^(\p{L}')]", "");
I just don't understand how to translate that into English. I am completely new to regular expressions, and I find them quite fascinating. Thanks to anyone who can tell me what it means.
You should check this website:
https://regex101.com
It helped me a lot when I was writing/testing/debugging some regexes ;)
It gives the following explanation:
[^(\p{L}')] match a single character not present in the list below:
( the literal character (
\p{L} matches any kind of letter from any language
') a single character in the list ') literally
The two regexes are not the same:
[^a-zA-Z] matches any char not an English letter
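[^(\p{L}')] matches any char that is not a letter, an apostrophe or a parenthesis
i.e. with replaceAll, the second one keeps parentheses and apostrophes (and letters from any language) instead of stripping them.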
\p{L} is the Unicode property class for "any letter" (it is not a POSIX class). For text that contains only English letters, these two regexes are equivalent:
[a-zA-Z]
\p{L}
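To make the difference concrete, here is a small sketch that runs the same input through both replaceAll patterns, plus a plain \P{L} variant. The class and method names are hypothetical, and the accented letter is written as a \u escape so the example does not depend on the source file encoding.

```java
final class LetterClassDemo {
    // Removes everything that is not an ASCII letter.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // Removes everything that is not a letter, "(", ")" or "'".
    static String keepLettersParensQuotes(String s) {
        return s.replaceAll("[^(\\p{L}')]", "");
    }

    // Removes everything that is not a letter in any language.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("\\P{L}", "");
    }

    public static void main(String[] args) {
        String input = "caf\u00E9 (don't) 42!";
        System.out.println(keepAsciiLetters(input));        // cafdont
        System.out.println(keepLettersParensQuotes(input)); // café(don't)
        System.out.println(keepUnicodeLetters(input));      // cafédont
    }
}
```

For a palindrome check, the third form is probably closer to what was intended than the second, since stray parentheses and apostrophes would otherwise take part in the comparison.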
I'm trying to match sentences without capital letters with regex in Java:
"Hi this is a test" -> Shouldn't match
"hi thiS is a test" -> Shouldn't match
"hi this is a test" -> Should match
I've tried the following regex, but it also matches my second example ("hi thiS is a test").
[a-z]+
It seems like it's only looking at the first word of the sentence.
Any help?
[a-z]+ will match if your string contains any lowercase letter.
If you want to make sure your string doesn't contain uppercase letters, you could use a negative character class: ^[^A-Z]+$
Be aware that this won't handle accented characters (like É) though.
To make this work, you can use Unicode properties: ^\P{Lu}+$
\P{...} means "not in this Unicode category", and Lu is the category of uppercase letters (those that have a lowercase variant).
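A short sketch comparing the two "no uppercase" patterns, assuming String.matches() is used (so the ^ and $ anchors can be left out). The class and method names are made up for this example.

```java
final class NoUppercase {
    // ASCII-only check: rejects A-Z but lets accented capitals through.
    static boolean asciiCheck(String s) {
        return s.matches("[^A-Z]+");
    }

    // Unicode-aware check: rejects any character in category Lu.
    static boolean unicodeCheck(String s) {
        return s.matches("\\P{Lu}+");
    }

    public static void main(String[] args) {
        System.out.println(unicodeCheck("hi this is a test")); // true
        System.out.println(unicodeCheck("hi thiS is a test")); // false
        System.out.println(unicodeCheck("Hi this is a test")); // false
        // "\u00C9" is the capital E with acute accent.
        System.out.println(asciiCheck("\u00C9t\u00E9"));       // true
        System.out.println(unicodeCheck("\u00C9t\u00E9"));     // false
    }
}
```

Both patterns use +, so an empty string is rejected by either check.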
^[a-z ]+$
Try this. This will validate the right ones.
It's not matching because you haven't used a space in the match pattern, so your regex is only matching whole words with no spaces.
Try something like ^[a-z ]+$ instead (notice the space in the square brackets). You can also use \s, which is shorthand for 'whitespace characters', but be aware that it also matches things like line feeds and carriage returns.
This pattern does the following:
^ matches the start of a string
[a-z ]+ matches any a-z character or a space, where 1 or more exists.
$ matches the end of the string.
I would actually advise against regex in this case, since you don't seem to employ extended characters.
Instead, try testing as follows:
myString.equals(myString.toLowerCase());
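Here is a sketch of the regex-free test next to the ^[a-z ]+$ pattern suggested earlier in the thread. The class and method names are hypothetical, and passing Locale.ROOT to toLowerCase() is my own addition (the answer calls it without arguments) to keep the result independent of the machine's default locale.

```java
import java.util.Locale;

final class LowercaseCheck {
    // True when lowercasing changes nothing, i.e. there are no uppercase letters.
    static boolean hasNoUppercase(String s) {
        return s.equals(s.toLowerCase(Locale.ROOT));
    }

    // Regex variant: only lowercase ASCII letters and spaces are allowed at all.
    static boolean onlyLowercaseAndSpaces(String s) {
        return s.matches("[a-z ]+");
    }

    public static void main(String[] args) {
        System.out.println(hasNoUppercase("hi this is a test"));          // true
        System.out.println(hasNoUppercase("hi thiS is a test"));          // false
        // Punctuation is fine for equals(), but rejected by the regex.
        System.out.println(hasNoUppercase("hi, this is a test"));         // true
        System.out.println(onlyLowercaseAndSpaces("hi, this is a test")); // false
    }
}
```

The two checks answer slightly different questions: the first only forbids uppercase letters, while the second forbids everything except a-z and the space.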
What is the correct syntax for matching all characters except specific ones.
For example I'd like to match everything but letters [A-Z] [a-z] and numbers [0-9].
I have
string.matches("[^[A-Z][a-z][0-9]]")
Is this incorrect?
Yes, you don't need nested [] like that. Use this instead:
"[^A-Za-z0-9]"
It's all one character class.
If you want to match anything but letters, you should have a look into Unicode properties.
\p{L} is any kind of letter from any language
Using an uppercase "P" instead it is the negation, so \P{L} would match anything that is not a letter.
\d or \p{Nd} is matching digits
So your expression in modern Unicode style would look like this
Either using a negated character class
[^\p{L}\p{Nd}]
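or by intersecting negated properties (a plain [\P{L}\P{Nd}] would be a union and match every character, so Java's && intersection is needed)
[\P{L}&&\P{Nd}]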
The next thing is, matches() matches the expression against the complete string, so your expression is only true with exactly one char in the string. So you would need to add a quantifier:
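string.matches("[^\\p{L}\\p{Nd}]+")
This returns true when the complete string consists only of non-alphanumeric characters and contains at least one of them. (In a Java string literal each backslash of the regex is doubled.)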
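A minimal sketch of the effect of the quantifier and of the Unicode properties, assuming String.matches() as in the question. The class and method names are invented for this example.

```java
final class NonAlnum {
    // No quantifier: true only for a string of exactly one non-alphanumeric char.
    static boolean singleNonAlnum(String s) {
        return s.matches("[^A-Za-z0-9]");
    }

    // With "+": one or more non-alphanumeric ASCII characters.
    static boolean onlyNonAlnumAscii(String s) {
        return s.matches("[^A-Za-z0-9]+");
    }

    // Unicode-aware: no letters or decimal digits from any script.
    static boolean onlyNonAlnumUnicode(String s) {
        return s.matches("[^\\p{L}\\p{Nd}]+");
    }

    public static void main(String[] args) {
        System.out.println(singleNonAlnum("!"));           // true
        System.out.println(singleNonAlnum("!?"));          // false
        System.out.println(onlyNonAlnumAscii("!?-"));      // true
        System.out.println(onlyNonAlnumAscii("!a?"));      // false
        // "\u00E9" (e with acute accent) is a letter, but not an ASCII one.
        System.out.println(onlyNonAlnumAscii("\u00E9"));   // true
        System.out.println(onlyNonAlnumUnicode("\u00E9")); // false
    }
}
```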
Almost right. What you want is:
string.matches("[^A-Za-z0-9]")
Here's a good tutorial
string.matches("[^A-Za-z0-9]")
Let's say that you want to make sure that no strings contain the _ symbol; then you would simply use something like this.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("_");
// matcher() is an instance method, so call it on the compiled pattern
Matcher matcher = pattern.matcher(stringName);
if (!matcher.find()) {
    System.out.println("Valid String");
} else {
    System.out.println("Invalid String");
}
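The same find() approach can be extended from one forbidden character to a whole class of them. The sketch below assumes "valid" means "contains only ASCII letters and digits"; the class name ForbiddenChars and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

final class ForbiddenChars {
    // Negated class: any single character that is NOT an ASCII letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Valid when find() cannot locate any forbidden character.
    static boolean isValid(String s) {
        return !NON_ALNUM.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc123"));  // true
        System.out.println(isValid("abc_123")); // false
        System.out.println(isValid("abc 123")); // false
    }
}
```

Because find() searches anywhere in the input, no quantifier or .* padding is needed here. An empty string counts as valid with this approach, since it contains no forbidden character.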
You can negate character classes:
"[^abc]" // matches any character except a, b, or c (negation).
"[^a-zA-Z0-9]" // matches non-alphanumeric characters.
I have the following string:
"Perl is the only language that looks the same before and after RSA encryption." :)
This pattern "\\p{javaUpperCase}.*\\." looks for an uppercase character followed by anything up to a period. It returns true for that string, but if I remove the word "Perl" it gives me false. Why is that? The word "RSA" is still there, and it is uppercase too.
\p{javaUpperCase} - stands for UpperCase character
. means any character after that UpperCase
* is a greedy quantifier: zero or more times
\\. - period.
Where am I wrong? Why does it look only at the beginning and at the end?
Probably because it is trying to match the whole string. (Reference: http://www.regular-expressions.info/java.html says "It is important to remember that String.matches() only returns true if the entire string can be matched"). Depending on what regular expression library/function you use, it might require a match on everything.
Without "Perl", the string doesn't start with an uppercase character, so even though a substring matches, the whole string doesn't.
Try .*(\p{javaUpperCase}.*\.).* to match substrings (double each backslash when writing it as a Java string literal).
The .* added on both ends allows extra characters on either end of the substring of interest.
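To see the whole-string behaviour in action, here is a sketch that runs the original pattern with matches() and find(), and the padded pattern with String.matches(). The class, constant and method names are made up for this example; the sentence is the one from the question.

```java
import java.util.regex.Pattern;

final class UpperToPeriod {
    static final String WITH_PERL =
        "Perl is the only language that looks the same before and after RSA encryption.";
    static final String WITHOUT_PERL =
        "is the only language that looks the same before and after RSA encryption.";

    private static final Pattern P = Pattern.compile("\\p{javaUpperCase}.*\\.");

    // matches(): the pattern has to cover the entire input.
    static boolean wholeString(String s) {
        return P.matcher(s).matches();
    }

    // find(): the pattern may match any substring.
    static boolean anywhere(String s) {
        return P.matcher(s).find();
    }

    // The padded pattern from the answer, used with String.matches().
    static boolean padded(String s) {
        return s.matches(".*(\\p{javaUpperCase}.*\\.).*");
    }

    public static void main(String[] args) {
        System.out.println(wholeString(WITH_PERL));    // true
        System.out.println(wholeString(WITHOUT_PERL)); // false: starts with lowercase "i"
        System.out.println(anywhere(WITHOUT_PERL));    // true: "RSA encryption."
        System.out.println(padded(WITHOUT_PERL));      // true
    }
}
```

Using find() gives the same answer as padding the pattern with .* on both sides, without having to change the regex itself.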