Matching '_' and '-' in java regexes - java

I had this regex in Java that matched either a string of alphanumeric characters or the tilde (~):
^([a-z0-9])+|~$
Now I also have to add the characters - and _. I've tried a few combinations, none of which work, for example:
^([a-zA-Z0-9_-])+|~$
^([a-zA-Z0-9]|-|_)+|~$
Sample input strings that must match:
woZOQNVddd
00000
ncnW0mL14-
dEowBO_Eu7
7MyG4XqFz-
A8ft-y6hDu ~
Any clues or suggestions?

The hyphen (-) is a special character within square brackets: it indicates a range. If it's not at either end of the character class, it needs to be escaped by putting a \ before it.
It's worth pointing out a shortcut: \w is equivalent to [0-9a-zA-Z_] (in Java, as long as the UNICODE_CHARACTER_CLASS flag is not set), so I think this is more readable:
^([\w-])+|~$
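A minimal Java sketch to check the hyphen rules above, using a balanced form of the \w pattern. HyphenClassDemo and isValid are illustrative names; in a Java string literal, \w has to be written \\w.

```java
import java.util.List;
import java.util.regex.Pattern;

class HyphenClassDemo {
    // \w is [a-zA-Z_0-9] by default; the - is literal because it is the last thing in the class.
    private static final Pattern VALID = Pattern.compile("^[\\w-]+|~$");

    // Matcher.matches() must consume the whole input, just like String.matches().
    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : List.of("woZOQNVddd", "ncnW0mL14-", "dEowBO_Eu7", "~", "a#b")) {
            System.out.println(s + " -> " + isValid(s));
        }
        // Three ways to get a literal hyphen into a class, and one class without it.
        System.out.println("-".matches("[a-z-]"));   // true: hyphen last
        System.out.println("-".matches("[-a-z]"));   // true: hyphen first
        System.out.println("-".matches("[a-z\\-]")); // true: hyphen escaped
        System.out.println("-".matches("[a-z]"));    // false
    }
}
```

The sample strings are accepted and "a#b" is rejected, because # is neither a word character nor a hyphen.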

You need to escape the - inside the character class, like \-, since it is a special character there (the range operator). _ is fine.
So ^([a-zA-Z0-9_\-])+|~$.
Edit: your last input string will not match, because the regular expression you are using matches either a string of alphanumeric characters (plus - and _) or a tilde (because of the pipe), but not both. If you want to allow an optional tilde at the end, change it to:
^([a-zA-Z0-9_\-])+(~?)$
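To see the difference between the alternation and the optional-suffix version side by side, here is a small Java sketch. It assumes uppercase letters should be accepted too (the samples contain them), so both patterns use a-zA-Z; TildeDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

class TildeDemo {
    // Alternation: EITHER a run of letters/digits/_/- OR a lone tilde, never both.
    static final Pattern EITHER = Pattern.compile("^([a-zA-Z0-9_\\-])+|~$");
    // Optional suffix: a run of letters/digits/_/-, optionally followed by one tilde.
    static final Pattern OPTIONAL_TILDE = Pattern.compile("^([a-zA-Z0-9_\\-])+(~?)$");

    static boolean either(String s) { return EITHER.matcher(s).matches(); }
    static boolean optionalTilde(String s) { return OPTIONAL_TILDE.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"A8ft-y6hDu", "A8ft-y6hDu~", "~", "A8ft-y6hDu ~"}) {
            System.out.println(s + "  either=" + either(s) + "  optionalTilde=" + optionalTilde(s));
        }
    }
}
```

With these inputs, either accepts "A8ft-y6hDu" and "~", while optionalTilde accepts "A8ft-y6hDu" and "A8ft-y6hDu~" but not a lone "~". Neither accepts "A8ft-y6hDu ~", because the space before the tilde is not in the class.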

If you put the - first, it won't be interpreted as the range indicator.
^([-a-zA-Z0-9_])+|~$
Using the following code, this matches all of your examples except the last one:
String str = "A8ft-y6hDu ~";
System.out.println("Result: " + str.matches("^([-a-zA-Z0-9_])+|~$"));
That last example won't match because it doesn't fit your description. The regex will match any combination of alphanumerics, -, and _, OR a ~ character.
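The ^ and $ in this pattern bind to the individual alternatives, so it reads as (^[-a-zA-Z0-9_]+)|(~$). With String.matches the whole input has to match anyway, so the anchors change nothing; with Matcher.find the grouping becomes visible. A minimal sketch, with PrecedenceDemo as an illustrative name:

```java
import java.util.regex.Pattern;

class PrecedenceDemo {
    static final Pattern P = Pattern.compile("^([-a-zA-Z0-9_])+|~$");

    // Whole-input match: this is what String.matches does.
    static boolean whole(String s) { return P.matcher(s).matches(); }

    // Search anywhere: each alternative is tried with its own anchor.
    static boolean anywhere(String s) { return P.matcher(s).find(); }

    public static void main(String[] args) {
        String str = "A8ft-y6hDu ~";
        System.out.println("matches(): " + whole(str));    // false
        System.out.println("find():    " + anywhere(str)); // true: "A8ft-y6hDu" is found at the start
    }
}
```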

Related

How to make a regex to match string that starts with 0-9 or a-z with accents and must accept only this special character - _ ' between words?

My pattern must match a String that:
Can start with number
Can start with letters with accents or without accents too
Can't start with spaces
Can't start with special characters
Allow spaces between words
Do not accept special characters except: - _ '
My current pattern is: ^[^_\W][\p{L}\s0-9À-ÖØ-öø-ÿ.'-]+$
Valid examples:
Blockquote
João Antonio
João-Antonio
João's Company
Peter Müller
François Hollande
Patrick O'Brian
Silvana Koch-Mehrin
Invalid examples:
Company N#me
100% Company
\Company
\s Company
_Blockquote
Please help me!
My best was:
/^[^_\W][\p{L}\s0-9À-ÖØ-öø-ÿ.'-]+$/gi
Test: https://regexr.com/521r2
First letter:
Start with number, letter
Exclude accents and special chars
[^\W_]
Rest of the text:
Include number, letter and accents
Include _, -, ' and space
[0-9] & [A-Za-zÀ-ÖØ-öø-ÿ] & [_\-\' ]
Here you are:
^[^\W_][0-9A-Za-zÀ-ÖØ-öø-ÿ_\-\' ]+$
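To try this pattern against the question's examples, here is a minimal Java sketch. The accented ranges are written as Unicode escapes (U+00C0 to U+00D6 and so on, the same ranges as À-Ö, Ø-ö, ø-ÿ) so that the source does not depend on the file encoding; AccentNameDemo and isValid are illustrative names.

```java
import java.util.regex.Pattern;

class AccentNameDemo {
    // First char: [^\W_] = an ASCII letter or digit (UNICODE_CHARACTER_CLASS is not set).
    // Rest: digits, ASCII letters, Latin-1 accented letters U+00C0-U+00D6, U+00D8-U+00F6,
    // U+00F8-U+00FF, then underscore, hyphen, apostrophe and space.
    static final Pattern NAME = Pattern.compile(
            "^[^\\W_][0-9A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF_\\-\\' ]+$");

    static boolean isValid(String s) { return NAME.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] valid = {"Blockquote", "Jo\u00e3o Antonio", "Jo\u00e3o-Antonio", "Jo\u00e3o's Company",
                "Peter M\u00fcller", "Fran\u00e7ois Hollande", "Patrick O'Brian", "Silvana Koch-Mehrin"};
        String[] invalid = {"Company N#me", "100% Company", "\\Company", "\\s Company", "_Blockquote"};
        for (String s : valid) System.out.println(isValid(s) + "  " + s);
        for (String s : invalid) System.out.println(isValid(s) + "  " + s);
    }
}
```

As the answer says, the first character cannot be accented here: without UNICODE_CHARACTER_CLASS, [^\W_] only covers ASCII letters and digits, so a name such as "Émile" is rejected.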
See this question
When you have to deal with complicated regexes, use Regexr!
I think the requirements are not too clear, but based on your examples:
^[a-zA-ZÀ-ÖØ-öø-ÿ][ '_a-zA-ZÀ-ÖØ-öø-ÿ-]+$
^ = beginning of line
$ = end of line
[a-zA-ZÀ-ÖØ-öø-ÿ] = matches any one of these characters, including the accented ones
[ '_a-zA-ZÀ-ÖØ-öø-ÿ-] = same as above, plus the space, apostrophe, underscore and hyphen (the hyphen goes last so that it is not read as a range operator)
+ = one or more (greedy)
See this link for more details and examples.
Best.
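Placing the hyphen in the middle of a class, as in [ '_-a-zA-Z], is an easy mistake to make. In Java, _-a is parsed as a range from _ (U+005F) to a (U+0061), which includes the backtick; the next - then comes right after a range and is taken literally, and z becomes a single letter, so b through y drop out of the class. A small sketch restricted to the ASCII part (RangeDemo and in are illustrative names):

```java
class RangeDemo {
    // Hyphen in the middle: "_-a" is a range (_ ` a), "-" is literal, "z" is a single letter.
    static final String HYPHEN_IN_MIDDLE = "[ '_-a-zA-Z]";
    // Hyphen moved to the end: all of a..z and A..Z, plus literal space, ', _ and -.
    static final String HYPHEN_LAST = "[ '_a-zA-Z-]";

    // True if the single character ch belongs to the character class cls.
    static boolean in(String cls, String ch) { return ch.matches(cls); }

    public static void main(String[] args) {
        for (String ch : new String[] {"o", "`", "-", "_"}) {
            System.out.println("'" + ch + "' middle=" + in(HYPHEN_IN_MIDDLE, ch)
                    + " last=" + in(HYPHEN_LAST, ch));
        }
    }
}
```

With the hyphen in the middle, "o" is rejected and the backtick is accepted; with the hyphen last, it is the other way round.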

Regex-How to prevent repeated special characters?

I don't have any experience with regular expressions. I need a regular expression that doesn't allow repeated special characters (+-*/& etc.).
The string can contain digits, alphanumerics, and special characters.
This should be valid : abc,df
This should be invalid : abc-,df
I would really appreciate it if you can help me! Thanks in advance.
Two solutions presented so far match a string that is not allowed.
But the title is "How to prevent...", so I assume that the regex should match the allowed string. That means the regex should:
match the whole string if it does not contain 2 consecutive special characters,
not match otherwise.
You can achieve this by putting together the following parts:
^ - start of string anchor,
(?!.*[...]{2}) - a negative lookahead for 2 consecutive special characters (marked here as ...), anywhere in the string,
a regex matching the whole (non-empty) string,
$ - end of string anchor.
So the whole regex should be:
^(?!.*[!##$%^&*()\-_+={}\[\]|\\;:'",<.>\/?]{2}).+$
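Note that within a character class (between [ and ]), a backslash escaping the following character should be placed before - (if it is in the middle of the sequence), the closing square bracket, a backslash itself and / (the regex terminator in delimited flavors such as JavaScript). In Java, the opening square bracket must be escaped too, because an unescaped [ inside a class starts a nested class.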
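A minimal Java sketch of the whole-string version, assuming the same set of special characters. Inside the class the opening bracket is escaped as \[ (required in Java), every regex backslash is doubled in the string literal, and / needs no escaping because Java patterns have no delimiters. NoRepeatedSpecials and isValid are illustrative names.

```java
import java.util.regex.Pattern;

class NoRepeatedSpecials {
    // Negative lookahead: fail if two special characters appear next to each other anywhere,
    // then .+ consumes the (non-empty) string.
    static final Pattern VALID = Pattern.compile(
            "^(?!.*[!##$%^&*()\\-_+={}\\[\\]|\\\\;:'\",<.>/?]{2}).+$");

    static boolean isValid(String s) { return VALID.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isValid("abc,df"));  // true
        System.out.println(isValid("abc-,df")); // false
    }
}
```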
Or, if you want to apply the regex to individual words (not the whole string), then the regex should be:
\b(?!\S*[!##$%^&*()\-_+={}\[\]|\\;:'",<.>\/?]{2})\S+
[\,\+\-\*\/\&]{2,} (add more characters inside the square brackets if you want)
Demo https://regex101.com/r/CBrldL/2
Use the following regex to match the invalid string.
[^A-Za-z0-9]{2,}
[^\w!\s]{2,} This would be the shortest version to match any two consecutive special characters, ignoring whitespace (note that ! is excluded from the set as well).
If you want to count whitespace too, use [^\w]{2,}.
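These last patterns describe the invalid part of the input, so in Java they would be used with Matcher.find() (search anywhere) rather than matches(), and a hit means the string should be rejected. A small sketch; FindRepeatedSpecials and hasRun are illustrative names:

```java
import java.util.regex.Pattern;

class FindRepeatedSpecials {
    // Any run of two or more characters that are not ASCII letters or digits (spaces count).
    static final Pattern NON_ALNUM_RUN = Pattern.compile("[^A-Za-z0-9]{2,}");
    // Any run of two or more characters that are not word characters, !, or whitespace.
    static final Pattern SPECIAL_RUN_IGNORING_SPACE = Pattern.compile("[^\\w!\\s]{2,}");

    // find() succeeds if such a run occurs anywhere in s.
    static boolean hasRun(Pattern p, String s) { return p.matcher(s).find(); }

    public static void main(String[] args) {
        System.out.println(hasRun(NON_ALNUM_RUN, "abc,df"));               // false
        System.out.println(hasRun(NON_ALNUM_RUN, "abc-,df"));              // true
        System.out.println(hasRun(NON_ALNUM_RUN, "abc, df"));              // true: ", " counts as a run
        System.out.println(hasRun(SPECIAL_RUN_IGNORING_SPACE, "abc, df")); // false
    }
}
```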

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but I need to disallow consecutive - characters only.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest me how to restrict consecutive - only.
You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
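A minimal Java sketch of this grouping pattern. Since every hyphen must be followed by at least one allowed non-hyphen character, it rejects "--" and also a leading or trailing hyphen. NoDoubleHyphenGrouping is an illustrative name.

```java
import java.util.regex.Pattern;

class NoDoubleHyphenGrouping {
    // One or more allowed chars, then zero or more groups of "-" followed by one or more allowed chars.
    static final Pattern P = Pattern.compile("^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$");

    static boolean isValid(String s) { return P.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"www.google.com/flights-usa", "www.google.com/flights--usa",
                "flights-", "-usa"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Only the first input is accepted.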
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
^^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
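The two \w forms can be compared directly. This sketch (NoDoubleHyphenLookahead is an illustrative name) shows that they agree on "--" but not at the edges of the string: the lookahead version only forbids two hyphens in a row, so a single trailing hyphen passes.

```java
import java.util.regex.Pattern;

class NoDoubleHyphenLookahead {
    // The two \w-based forms given above, as Java string literals.
    static final Pattern GROUPED = Pattern.compile("^[\\w/.?]+(?:-[\\w/.?]+)*$");
    static final Pattern LOOKAHEAD = Pattern.compile("^(?!.*--)[\\w/.?-]+$");

    static boolean grouped(String s) { return GROUPED.matcher(s).matches(); }
    static boolean lookahead(String s) { return LOOKAHEAD.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"www.google.com/flights-usa", "www.google.com/flights--usa", "flights-"}) {
            System.out.println(s + "  grouped=" + grouped(s) + "  lookahead=" + lookahead(s));
        }
    }
}
```

Which behavior is right depends on whether a leading or trailing hyphen should be allowed; the question does not say.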
One approach is to restrict multiple dashes with a negative lookbehind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.
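The lookbehind version in Java, as a minimal sketch (NoDoubleHyphenLookbehind is an illustrative name). The escapes before / . ? _ are unnecessary inside a Java character class, so they are dropped here; the behavior is the same.

```java
import java.util.regex.Pattern;

class NoDoubleHyphenLookbehind {
    // Each character is either an allowed non-hyphen, or a hyphen not preceded by another hyphen.
    static final Pattern P = Pattern.compile("^(?:[a-zA-Z0-9/.?_]|(?<!-)-)+$");

    static boolean isValid(String s) { return P.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isValid("www.google.com/flights-usa"));  // true
        System.out.println(isValid("www.google.com/flights--usa")); // false
        System.out.println(isValid("www.google.com/flights#usa"));  // false: # is not allowed
    }
}
```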
I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([a-zA-Z0-9\/\.\?\_]|$))+$
For each character, this regex checks if it can match [a-zA-Z0-9\/\.\?\_], which is everything you included in your regex except the hyphen. If that does not match, it instead tries to match \-([a-zA-Z0-9\/\.\?\_]|$), which matches a hyphen followed by one of the other allowed characters, or a hyphen at the end of the string. (A plain [^\-] after the hyphen would also let through disallowed characters such as # or a space.)
Here's a demo.

Java Regular Expression: what is " '- "

I came across a line in Java that uses regular expressions.
It validates a last name entered by the user:
return lastName.matches( "[a-zA-z]+([ '-][a-zA-Z]+)*" );
I would like to know what is the function of the [ '-].
Also, why do we need both a "+" and a "*" at the same time, and why is [ '-][a-zA-Z]+ in parentheses?
Your RE is: [a-zA-z]+([ '-][a-zA-Z]+)*
I'll break it down into its component parts:
[a-zA-Z]+
The string must begin with any letter, a-z or A-Z, repeated one or more times (+).
([ '-][a-zA-Z]+)*
[ '-]
Any single character of <space>, ', or -.
[a-zA-Z]+
Again, any letter, a-z or A-Z, repeated once or more times.
This combination (one separator from [ '-] followed by letters from a-zA-Z) may then be repeated zero or more times.
Why [ '-]? To allow for hyphenated names, such as Higgs-Boson, names with apostrophes, such as O'Reilly, or names with spaces, such as Van Dyke.
The expression [ '-] means "one of space, ', or -". The order is very important: the dash must be the last one (or the first), otherwise the character class would be considered a range, and other characters with code points between the space and the quote ' would be accepted as well.
+ means "one or more repetitions"; * means "zero or more repetitions", referring to the term of the regular expression preceding the + or * modifier.
Overall, the expression matches groups of lowercase and uppercase letters separated by spaces, dashes, or single quotes.
It means it can be any one of the characters space, ' or - (space, quote, dash).
The - can also be written as \- (\\- in a Java string literal), since it can also mean a range, like a-z.
This looks like it is a pattern to match double-barreled (space or hyphen) or I-don't-know-what-to-call-it names like O'Grady... for example:
It would match
counter-terrorism
De'ville
O'Grady
smith-jones
smith and wesson
But it will not match
jones-
O'Learys'
#hashtag
Bob & Sons
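A small Java sketch that runs the examples above through the pattern, with the [a-zA-z] typo corrected to [a-zA-Z] (the typo is discussed further below). LastNameDemo and isValid are illustrative names.

```java
import java.util.regex.Pattern;

class LastNameDemo {
    // Letters, then any number of (one separator from [ '-] + at least one letter).
    static final Pattern NAME = Pattern.compile("[a-zA-Z]+([ '-][a-zA-Z]+)*");

    static boolean isValid(String s) { return NAME.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"counter-terrorism", "De'ville", "O'Grady", "smith-jones", "smith and wesson",
                "jones-", "O'Learys'", "#hashtag", "Bob & Sons"};
        for (String s : samples) {
            System.out.println(isValid(s) + "  " + s);
        }
    }
}
```

The first five print true and the last four print false, matching the lists above.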
The idea is, after the first [A-Za-z]+ consumes all the letters it can, the match will end right there unless the next character is a space, an apostrophe, or a hyphen ([ '-]). If one of those characters is present, it must be followed by at least one more letter.
A lot of people have difficulty with this. They naively write something like [A-Za-z]+[ '-]?[A-Za-z]*, figuring both the separator and the extra chunks of letters are optional. But they're not independently optional; if there is a separator ([ '-]), it must be followed by at least one more letter. Otherwise it would treat strings like R'- j'-' as valid. Your regex doesn't have that problem.
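A quick comparison of the naive pattern and the grouped one in Java (SeparatorDemo is an illustrative name). The naive version accepts a dangling separator and, because the separator appears only once, it cannot handle a name with two separators.

```java
class SeparatorDemo {
    // Separator and trailing letters are each optional on their own.
    static final String NAIVE = "[A-Za-z]+[ '-]?[A-Za-z]*";
    // Every separator must be followed by at least one letter; the group may repeat.
    static final String STRICT = "[A-Za-z]+([ '-][A-Za-z]+)*";

    public static void main(String[] args) {
        for (String s : new String[] {"O'Grady", "jones-", "O'", "Smith-Jones-Brown"}) {
            System.out.println(s + "  naive=" + s.matches(NAIVE) + "  strict=" + s.matches(STRICT));
        }
    }
}
```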
By the way, you've got a typo in your regex: [a-zA-z]. You want to watch out for that, because [A-z] does match all the uppercase and lowercase letters, so it will seem to be working correctly as long as the inputs are valid. But it also matches several non-letter characters whose code points happen to lie between those of Z and a. And very few IDEs or regex tools will catch that error.
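The characters between Z (U+005A) and a (U+0061) are [ \ ] ^ _ and the backtick. This sketch (AzTypoDemo is an illustrative name) shows the typo'd pattern accepting inputs that the corrected one rejects.

```java
class AzTypoDemo {
    // [a-zA-z] also covers [ \ ] ^ _ and the backtick.
    static final String TYPO = "[a-zA-z]+([ '-][a-zA-Z]+)*";
    static final String FIXED = "[a-zA-Z]+([ '-][a-zA-Z]+)*";

    public static void main(String[] args) {
        for (String s : new String[] {"O'Grady", "Smith_Jones", "a^b", "x[y]"}) {
            System.out.println(s + "  typo=" + s.matches(TYPO) + "  fixed=" + s.matches(FIXED));
        }
    }
}
```

Only "O'Grady" is accepted by both; the other three are accepted only by the typo'd pattern.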

Java regex "[.]" vs "."

I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.
[.] matches a dot (.) literally, while . matches any character except a line terminator such as newline (\n), unless you use DOTALL mode.
You can also use \. ("\\." if you use java string literal) to literally match dot.
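A minimal Java sketch of the three forms and of the DOTALL flag (DotDemo and matchesDotAll are illustrative names):

```java
import java.util.regex.Pattern;

class DotDemo {
    // Compiles the regex with DOTALL so that . also matches line terminators.
    static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("cat".matches(".at"));     // true: . matches any character
        System.out.println("cat".matches("[.]at"));   // false: [.] matches only a literal dot
        System.out.println(".at".matches("[.]at"));   // true
        System.out.println(".at".matches("\\.at"));   // true: \. is also a literal dot
        System.out.println("\n".matches("."));        // false: . skips line terminators by default
        System.out.println(matchesDotAll(".", "\n")); // true
    }
}
```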
The [ and ] are metacharacters that let you define a character class. Most characters enclosed in square brackets are interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the group has a special meaning: it inverts what's matched by the group. For example, [^.] matches any character except a dot .
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
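A short Java sketch of these rules. The last two lines use class intersection (&&), which is one of the Java-specific additions documented for java.util.regex.Pattern; CharClassDemo is an illustrative name.

```java
class CharClassDemo {
    public static void main(String[] args) {
        // Inside [...], . = * & ^ $ are literal (the ^ is literal because it is not first).
        System.out.println("*".matches("[.=*&^$]"));        // true
        System.out.println("^".matches("[.=*&^$]"));        // true
        // A leading ^ negates the class.
        System.out.println("a".matches("[^.]"));            // true
        System.out.println(".".matches("[^.]"));            // false
        // Several ranges in one class.
        System.out.println("Q".matches("[A-Za-z0-9]"));     // true
        // Java-specific: && intersects classes (lowercase letters that are not vowels).
        System.out.println("b".matches("[a-z&&[^aeiou]]")); // true
        System.out.println("e".matches("[a-z&&[^aeiou]]")); // false
    }
}
```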
regular-expression constructs
. => Any character (may or may not match line terminators)
and to match the dot . use the following
[.] => matches a dot
\\. => matches a dot (this is the Java string literal form of \.)
NOTE: Character classes in Java regular expressions are defined using square brackets "[ ]"; such a subexpression matches a single character from the specified set of possible characters.
Example: in the string address, replace every "." with "[.]":
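public class Main {
    public static void main(String[] args) {
        String address = "1.1.1.1";
        // In the pattern, [.] is a literal dot; in the replacement, [ . ] have no special meaning.
        System.out.println(address.replaceAll("[.]", "[.]")); // prints 1[.]1[.]1[.]1
    }
}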
If anything is missing, please add it :)
