What does this regex syntax actually mean in Java? - java

I wrote a program to detect palindromes. It works with what I have, but I stumbled upon another bit of syntax, and I would like to know what it means exactly?
This is the line of code I'm using:
userString = userString.toLowerCase().replaceAll("[^a-zA-Z]", "");
I understand that the replaceAll code snippet means to "match characters ([...]) that are not (^) in the range a-z and A-Z (a-zA-Z)."
However, this worked as well:
replaceAll("[^(\p{L}')]", "");
I just don't understand how to translate that into English. I am completely new to regular expressions, and I find them quite fascinating. Thanks to anyone who can tell me what it means.

You should check this website:
https://regex101.com
It helped me a lot when I was writing/testing/debugging some regexes ;)
It gives the following explanation:
[^(\p{L}')] match a single character not present in the list below:
( the literal character (
\p{L} matches any kind of letter from any language
') a single character in the list ') literally

The two regexes are not the same:
[^a-zA-Z] matches any char not an English letter
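[^(\p{L}')] matches any char that is not a letter (of any language), an apostrophe or a parenthesis
i.e. with replaceAll, the 2nd one keeps parentheses, apostrophes and non-English letters, while the 1st one removes them.
\p{L} is the Unicode property class for "any letter" (it is not a POSIX class; Java's POSIX-style classes look like \p{Alpha}). For input that contains only English letters, these two regexes match the same characters: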
[a-zA-Z]
\p{L}
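To make the difference concrete, here is a minimal sketch that applies both patterns from the question to the same input. The class and method names are made up for illustration, and Locale.ROOT is an assumption added so the lowercasing does not depend on the default locale.

```java
import java.util.Locale;

class LetterClassDemo {
    // Keeps only ASCII letters, as in the question's first pattern.
    static String asciiLettersOnly(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-zA-Z]", "");
    }

    // Keeps any Unicode letter plus '(' , ')' and the apostrophe,
    // as in the question's second pattern (note the doubled backslash).
    static String lettersParensApostrophes(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^(\\p{L}')]", "");
    }

    public static void main(String[] args) {
        String input = "A man's (plan)!";
        System.out.println(asciiLettersOnly(input));
        System.out.println(lettersParensApostrophes(input));
    }
}
```

For a palindrome check the first pattern (or plain [^\p{L}]) is probably the safer choice, since leftover parentheses and apostrophes would take part in the comparison.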

Related

openapi - regex for not allowing whitespace or hyphen [duplicate]

I tried this but it doesn't work :
[^\s-]
Any Ideas?
[^\s-]
should work and so will
[^-\s]
[] : The char class
^ : Inside the char class ^ is the
negator when it appears in the beginning.
\s : short for a white space
- : a literal hyphen. A hyphen is a
meta char inside a char class but not
when it appears in the beginning or
at the end.
It can be done much easier:
If only whitespace has to be excluded, \S is simpler; it equals [^ \t\r\n\v\f]. Note that \S on its own still matches a hyphen.
Which programming language are you using? Maybe you just need to escape the backslash, like "[^\\s-]"
In Java:
String regex = "[^-\\s]";
System.out.println("-".matches(regex)); // prints "false"
System.out.println(" ".matches(regex)); // prints "false"
System.out.println("+".matches(regex)); // prints "true"
The regex [^-\s] works as expected. [^\s-] also works.
See also
Regular expressions and escaping special characters
regular-expressions.info/Character class
Metacharacters Inside Character Classes
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret.
Note that regex is not one standard, and each language implements its own based on what the library designers felt like. Take for instance the regex standard used by bash, documented here: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05.
If you are having problems with regular expressions not working, it might be good to simplify it, for instance using "[^ -]" if this covers all forms of whitespace in your case.
Try [^- ]. Note that \s matches 5 other characters besides the space (like tab, newline, form feed, carriage return).
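The points above about hyphen placement can be checked with a small Java sketch. The class name and the accepts helper are hypothetical; the patterns are the ones discussed in the answers, plus an escaped-hyphen variant added for comparison.

```java
class HyphenClassDemo {
    // Hypothetical helper: true when the whole string c is matched by regex.
    static boolean accepts(String regex, String c) {
        return c.matches(regex);
    }

    public static void main(String[] args) {
        // Hyphen last, hyphen right after the caret, and an escaped hyphen.
        String[] variants = {"[^\\s-]", "[^-\\s]", "[^\\-\\s]"};
        for (String regex : variants) {
            System.out.println(regex
                    + " letter=" + accepts(regex, "a")
                    + " hyphen=" + accepts(regex, "-")
                    + " space=" + accepts(regex, " ")
                    + " tab=" + accepts(regex, "\t"));
        }
        // The simplified class only knows about the space character,
        // so a tab is still accepted by it.
        System.out.println("[^ -] tab=" + accepts("[^ -]", "\t"));
    }
}
```

All three variants behave the same way; the simplified [^ -] differs only for whitespace other than the plain space.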

Problem coming up with appropriate Regex expression

I need to match text similar to the following text in an if statement.
REG#John Smith#14102245862#7 johns road new york#John Anthony Smith
The expression is meant to match a REG keyword at the beginning of the string then username followed by an account number composed of numbers with no specific restriction on the number of digits, then the address and lastly the name of the individual the address is registered to.
The Regex expression I had come up with is not working. The regex expression is below:
^REG\#\w\#[0-9]\#\w\#\w
May you kindly assist in showing me where I went wrong and how to make it work.
Thank you in advance
The problem is that you don't use quantifiers (* or +) and space is not included within \w which stands for [A-Za-z0-9_]. The character # does not need to be escaped (at least as far as I know in Java). Try the following Regex:
^REG#[\w ]+#\d+#[\w ]+#[\w ]+
^REG matches the beginning of the string (REG) literally
# matches self literally
[\w ]+ stands for at least one word character or space
\d+ stands for at least one digit
In Java, don't forget the double escaping:
String regex = "^REG#[\\w ]+#\\d+#[\\w ]+#[\\w ]+";
Try ^REG\#.*?\#[0-9]*?\#.*?\#.* . The *? operator is a lazy (reluctant) quantifier: it repeats as few times as possible, so each field stops at the next part of the expression, in this case \#.
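As a rough check of the suggestions above, the following sketch runs the original pattern, the quantified pattern and the lazy variant against the sample line from the question. The class, constant and method names are invented for this example.

```java
class RegLineDemo {
    // Pattern from the question: each \w and [0-9] matches exactly one character.
    static final String ORIGINAL = "^REG\\#\\w\\#[0-9]\\#\\w\\#\\w";
    // Pattern from the first answer: quantifiers added, space allowed in text fields.
    static final String FIXED = "^REG#[\\w ]+#\\d+#[\\w ]+#[\\w ]+";
    // Pattern from the second answer: lazy quantifiers, fields may be empty.
    static final String LAZY = "^REG\\#.*?\\#[0-9]*?\\#.*?\\#.*";

    static boolean isRegLine(String line) {
        return line.matches(FIXED);
    }

    public static void main(String[] args) {
        String sample = "REG#John Smith#14102245862#7 johns road new york#John Anthony Smith";
        System.out.println(sample.matches(ORIGINAL));
        System.out.println(isRegLine(sample));
        System.out.println(sample.matches(LAZY));
    }
}
```

One design difference worth noting: the lazy variant accepts empty fields (for example "REG####"), while the [\w ]+ version requires at least one character per field but rejects punctuation such as commas in an address.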

match whole sentence with regex

I'm trying to match sentences without capital letters with regex in Java:
"Hi this is a test" -> Shouldn't match
"hi thiS is a test" -> Shouldn't match
"hi this is a test" -> Should match
I've tried the following regex, but it also matches my second example ("hi thiS is a test").
[a-z]+
It seems like it's only looking at the first word of the sentence.
Any help?
[a-z]+ will match if your string contains any lowercase letter.
If you want to make sure your string doesn't contain uppercase letters, you could use a negative character class: ^[^A-Z]+$
Be aware that this won't handle accented characters (like É) though.
To make this work, you can use Unicode properties: ^\P{Lu}+$
\P{...} means "not in this Unicode category", and Lu is the category of uppercase letters that have a lowercase variant.
^[a-z ]+$
Try this. It will validate the right ones.
It's not matching because you haven't used a space in the match pattern, so your regex is only matching whole words with no spaces.
Try something like ^[a-z ]+$ instead (notice the space inside the square brackets). You can also use \s, which is shorthand for 'whitespace characters', but be aware that it also matches things like line feeds and carriage returns.
This pattern does the following:
^ matches the start of a string
[a-z ]+ matches any a-z character or a space, where 1 or more exists.
$ matches the end of the string.
I would actually advise against regex in this case, since you don't seem to employ extended characters.
Instead try to test as following:
myString.equals(myString.toLowerCase());
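The approaches suggested in these answers can be compared side by side. This is only a sketch: the class and method names are hypothetical, and Locale.ROOT is an assumption added to keep toLowerCase independent of the default locale.

```java
import java.util.Locale;

class LowercaseSentenceDemo {
    // Rejects ASCII uppercase only; accented capitals slip through.
    static boolean asciiNoUpper(String s) {
        return s.matches("^[^A-Z]+$");
    }

    // Rejects any character in the Unicode category Lu.
    static boolean unicodeNoUpper(String s) {
        return s.matches("^\\P{Lu}+$");
    }

    // Accepts only lowercase ASCII letters and spaces.
    static boolean lettersAndSpaces(String s) {
        return s.matches("^[a-z ]+$");
    }

    // No regex: the string is unchanged by lowercasing.
    static boolean noRegex(String s) {
        return s.equals(s.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] inputs = {"Hi this is a test", "hi thiS is a test", "hi this is a test"};
        for (String s : inputs) {
            System.out.println(s + " -> " + asciiNoUpper(s) + " " + unicodeNoUpper(s)
                    + " " + lettersAndSpaces(s) + " " + noRegex(s));
        }
    }
}
```

The variants disagree on edge cases: punctuation is rejected by ^[a-z ]+$ but allowed by the others, and an empty string passes only the toLowerCase comparison.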

Is there a wildcard character I can use to solve this?

I'm learning Java and I have a simple problem but I'm stuck.
I need to search a string for the text "bob", except the "o" can be any character.
Is there a wildcard character I can use, or another simpler method?
Thanks in advance
You can use the matches() method with the right regex:
if (str.matches(".*b.b.*"))
Note that the regex must match the whole string to return true.
If you want to match the word "bob", you'll need "word boundaries":
if (str.matches("(?i).*\\bb[a-z]b\\b.*"))
Note that the "case insensitive" flag has been added to allow any case to match.
This regex matches a whole word starting and ending with 'b' with any alpha-numeric (or underscore) character between:
\bb\wb\b
To match any lowercase letter between, use
\bb[a-z]b\b
To match any letter between, use
\bb[a-zA-Z]b\b
To escape these for Java strings, change each \ to \\
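Put together as Java code, the patterns above might look like the sketch below. The class and method names are made up; the find-based variant is an addition that avoids the leading and trailing .* by searching for the pattern anywhere in the string.

```java
import java.util.regex.Pattern;

class BobDemo {
    // "b", any single character, "b", anywhere in the string.
    static boolean containsBxb(String s) {
        return s.matches(".*b.b.*");
    }

    // Same idea, but as a whole word, case-insensitive, letter in the middle.
    static boolean containsBxbWord(String s) {
        return s.matches("(?i).*\\bb[a-z]b\\b.*");
    }

    // Alternative: find() searches for a match instead of requiring
    // the pattern to cover the entire string.
    private static final Pattern BXB = Pattern.compile("b.b");

    static boolean findsBxb(String s) {
        return BXB.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsBxb("I saw bob today"));
        System.out.println(containsBxbWord("Meet BOB now"));
        System.out.println(containsBxbWord("bobcat"));
        System.out.println(findsBxb("kabob"));
    }
}
```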

Java regex match all characters except

What is the correct syntax for matching all characters except specific ones.
For example I'd like to match everything but letters [A-Z] [a-z] and numbers [0-9].
I have
string.matches("[^[A-Z][a-z][0-9]]")
Is this incorrect?
Yes, you don't need nested [] like that. Use this instead:
"[^A-Za-z0-9]"
It's all one character class.
If you want to match anything but letters, you should have a look into Unicode properties.
\p{L} is any kind of letter from any language
Using an uppercase "P" instead it is the negation, so \P{L} would match anything that is not a letter.
\d matches digits (only ASCII 0-9 by default in Java), and \p{Nd} matches any Unicode decimal digit
So your expression in modern Unicode style would look like this
Either using a negated character class
[^\p{L}\p{Nd}]
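or an intersection of negated properties (a plain union [\P{L}\P{Nd}] would match every character, because every letter is a non-digit and every digit is a non-letter)
[\P{L}&&\P{Nd}]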
The next thing is, matches() matches the expression against the complete string, so your expression is only true with exactly one char in the string. So you would need to add a quantifier:
string.matches("[^\\p{L}\\p{Nd}]+")
returns true, when the complete string has only non alphanumerics and at least one of them.
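A short sketch of how this behaves in practice, including the difference from the ASCII-only class. The class and method names are hypothetical; the intersection syntax (&&) is part of Java's character class syntax.

```java
class NonAlnumDemo {
    // True when the string is non-empty and contains no Unicode letter or decimal digit.
    static boolean onlyNonAlnum(String s) {
        return s.matches("[^\\p{L}\\p{Nd}]+");
    }

    // ASCII-only version of the same check, for comparison.
    static boolean onlyNonAsciiAlnum(String s) {
        return s.matches("[^A-Za-z0-9]+");
    }

    // Single character that is neither a letter nor a digit,
    // written as an intersection of two negated properties.
    static boolean isNonAlnumChar(String c) {
        return c.matches("[\\P{L}&&\\P{Nd}]");
    }

    public static void main(String[] args) {
        System.out.println(onlyNonAlnum("!?#"));
        System.out.println(onlyNonAlnum("a!"));
        // An accented letter is "non-alphanumeric" only for the ASCII class.
        System.out.println(onlyNonAsciiAlnum("\u00E9"));
        System.out.println(onlyNonAlnum("\u00E9"));
    }
}
```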
Almost right. What you want is:
string.matches("[^A-Za-z0-9]")
Here's a good tutorial
string.matches("[^A-Za-z0-9]")
Let's say that you want to make sure that no Strings have the _ symbol in them, then you would simply use something like this.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("_");
// matcher() is an instance method, so call it on the compiled pattern
Matcher matcher = pattern.matcher(stringName);
if (!matcher.find()) {
    System.out.println("Valid String");
} else {
    System.out.println("Invalid String");
}
You can negate character classes:
"[^abc]" // matches any character except a, b, or c (negation).
"[^a-zA-Z0-9]" // matches non-alphanumeric characters.
