I'm looking for a regular expression that follows the rules below.
Allowed characters:
Alphabet: a-z A-Z
Numbers: 0-9
I am using [^a-zA-Z0-9], but when I call
regex = "[^a-zA-Z0-9]" ;
String key = "message";
if (!key.matches(regex))
message = "Invalid key";
the system shows "Invalid key", although the key should be valid. Could you please help me?
If you want to allow the characters [a-zA-Z0-9], you should not use ^ inside the brackets, since it negates what is inside the [].
The expression [^a-zA-Z0-9] means anything that is not a-z, A-Z or a digit 0-9.
You may have seen ^ used outside the [] at the beginning of a regular expression to indicate the beginning of the string, as in ^[a-zA-Z0-9].
The regex below allows one or more alphanumeric characters:
^[A-Za-z0-9]+$
Your regex [^a-zA-Z0-9] matches a single character that is not alphanumeric. [^...] is called a negated character class; it matches any one character that is not listed inside the class.
You don't need start or end anchors in the regex when it is passed to the matches method, because matches already requires the whole string to match. So [A-Za-z0-9]+ would be enough.
Explanation:
^ Anchor which denotes the start.
[A-Za-z0-9]+ , + repeats the preceding token [A-Za-z0-9] one or more times.
$ End of the line.
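A minimal sketch of the validation described above. The class and method names (KeyValidator, isValid) are hypothetical and only for illustration; the sketch relies on matches() testing the whole string, so the pattern needs no anchors.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class KeyValidator {
    // Compiled once so repeated validations do not recompile the pattern.
    private static final Pattern ALPHANUMERIC = Pattern.compile("[A-Za-z0-9]+");

    /** Returns true only if key is non-empty and consists solely of letters and digits. */
    static boolean isValid(String key) {
        return ALPHANUMERIC.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("message"));                 // true
        System.out.println(isValid("me$sage"));                 // false
        // The negated class matches exactly one non-alphanumeric character,
        // so it can never match the whole seven-letter key.
        System.out.println("message".matches("[^a-zA-Z0-9]"));  // false
    }
}
```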
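I think you just have to remove the not-operator and search for an invalid character instead of matching the whole string (matches only succeeds if the entire key matches the pattern). Here is the same example with the variable renamed:
String invalidChars = "[^a-zA-Z0-9]";
String key = "message";
// find() looks for the pattern anywhere in the key
if (java.util.regex.Pattern.compile(invalidChars).matcher(key).find()) {
    message = "Invalid key";
}
(However, the negated logic is not very readable.)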
Try the alphanumeric regex below:
"^[a-zA-Z0-9]+$"
^ - Start of string
[a-zA-Z0-9]+ - one or more of the allowed characters
$ - End of string
For validation, use the \A and \z anchors instead of ^ and $, since $ can also match just before a trailing line terminator:
\\A[a-zA-Z0-9]+\\z
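A small sketch of where the two anchor styles differ. It assumes the pattern is applied with find() and the input ends with a newline; the class and method names are hypothetical. With String.matches() the difference disappears, because the whole input must be consumed either way.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class AnchorDemo {
    private static final Pattern CARET_DOLLAR = Pattern.compile("^[a-zA-Z0-9]+$");
    private static final Pattern A_AND_Z = Pattern.compile("\\A[a-zA-Z0-9]+\\z");

    static boolean foundWithCaretDollar(String s) {
        return CARET_DOLLAR.matcher(s).find();
    }

    static boolean foundWithAz(String s) {
        return A_AND_Z.matcher(s).find();
    }

    public static void main(String[] args) {
        String withNewline = "abc\n";
        // $ is allowed to match just before the final line terminator.
        System.out.println(foundWithCaretDollar(withNewline)); // true
        // \z only matches at the very end of the input.
        System.out.println(foundWithAz(withNewline));          // false
    }
}
```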
I have a Java regex:
^[a-zA-Z_][a-zA-Z0-9_]{1,126}$
It means:
Begin with an alphabetic character or underscore character.
Subsequent characters may include letters, digits or underscores.
Be between 1 and 127 characters in length.
Now I want to replace every character in a string that this regex does not allow with an underscore.
Example:
final String label = "23_fgh99##";
System.out.println(label.replaceAll("^[^a-zA-Z_][^a-zA-Z0-9_]{1,126}$", "_"));
But the result is still 23_fgh99##.
How can I "convert" it to _3_fgh99__?
Use this code:
final String label = "23_fgh99##";
System.out.println(label.replaceAll("^[^a-zA-Z_]|(?<!^)[^a-zA-Z0-9_]", "_"));
It outputs _3_fgh99__.
To replace what is "not in the original pattern", you need to negate the first character class and check it only at the beginning of the string (^[^a-zA-Z_]), and then check every other character, not at the beginning, against the negated second character class ((?<!^)[^a-zA-Z0-9_]). The alternation symbol | then applies both patterns in one replacement operation.
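The replacement can be wrapped in a small helper, sketched here. LabelSanitizer and sanitize are hypothetical names, and the sketch does not enforce the length limit from the question.

```java
// Hypothetical helper, not part of the original answer.
class LabelSanitizer {
    // First alternative: a disallowed first character.
    // Second alternative: a disallowed character anywhere after the start.
    private static final String DISALLOWED = "^[^a-zA-Z_]|(?<!^)[^a-zA-Z0-9_]";

    /** Replaces every character the label pattern would reject with an underscore. */
    static String sanitize(String label) {
        return label.replaceAll(DISALLOWED, "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("23_fgh99##")); // _3_fgh99__
        System.out.println(sanitize("abc_1"));      // abc_1
    }
}
```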
I want to transform every "*" into ".*", except "\*".
String regex01 = "\\*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("*toto".matches(regex01));// True
String regex02 = "toto*".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex02));// True
String regex03 = "*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex03));// Error
If "*" is the first character, an error occurs:
java.util.regex.PatternSyntaxException:
Dangling meta character '*' near index 0
What is the correct regex ?
This is currently the only solution capable of dealing with multiple escaped \ in a row:
String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
How it works
Let's print the string regex to have a look at the actual string being parsed by the regex engine:
\G((?:[^\\*]|\\[\\*])*)[*]
((?:[^\\*]|\\[\\*])*) matches a sequence of characters not \ or *, or escape sequence \\ or \*. We match all the characters that we don't want to touch, and put it in a capturing group so that we can put it back.
The above sequence is followed by an unescaped asterisk, as described by [*].
In order to make sure that we don't "jump" when the regex can't match an unescaped *, \G is used to make sure the next match can only start at the beginning of the string, or from where the last match ends.
Why such a long solution? It is necessary, since the look-behind construct to check whether the number of consecutive \ preceding a * is odd or even is not officially supported by Java regex. Therefore, we need to consume the string from left to right, taking into account escape sequences, until we encounter an unescaped * and replace it with .*.
Test program
String inputs[] = {
"toto*",
"\\*toto",
"\\\\*toto",
"*toto",
"\\\\\\\\*toto",
"\\\\*\\\\\\*\\*\\\\\\\\*"};
for (String input: inputs) {
String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
System.out.println(input);
System.out.println(Pattern.compile(regex));
System.out.println();
}
Sample output
toto*
toto.*

\*toto
\*toto

\\*toto
\\.*toto

*toto
.*toto

\\\\*toto
\\\\.*toto

\\*\\\*\*\\\\*
\\.*\\\*\*\\\\.*
You need to use a negative lookbehind here:
String regex01 = input.replaceAll("(?<!\\\\)\\*", ".*");
(?<!\\\\) is a negative lookbehind: it matches * only if it is not preceded by a backslash.
Examples:
regex01 = "\\*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> \*toto
regex01 = "*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> .*toto
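The lookbehind approach can be sketched as a helper like the one below (WildcardToRegex and convert are hypothetical names). Note the limitation: the lookbehind inspects only one preceding character, so a * that follows an escaped backslash (\\*) is still treated as escaped.

```java
// Hypothetical helper, not part of the original answer.
class WildcardToRegex {
    /** Turns every * not directly preceded by a backslash into .* */
    static String convert(String input) {
        return input.replaceAll("(?<!\\\\)\\*", ".*");
    }

    public static void main(String[] args) {
        System.out.println(convert("*toto"));   // .*toto
        System.out.println(convert("toto*"));   // toto.*
        System.out.println(convert("\\*toto")); // \*toto (left untouched)
        System.out.println("tototo".matches(convert("*toto"))); // true
    }
}
```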
You have to cater for the case of a string starting with * in your regex:
(^|[^\\\\])\\*
The single caret represents the 'beginning of the string' ( 'start anchor' ).
Edit
Apart from the correction above, the replacement string in the replaceAll call must be $1.* instead of .* lest a matched character before an unescaped * be lost.
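A short sketch of why the replacement needs $1. The class and method names are hypothetical; both methods use the same pattern and differ only in the replacement string.

```java
// Hypothetical demo class, not part of the original answer.
class LeadingStarDemo {
    // Group 1 captures either the start of the string or one non-backslash character.
    private static final String UNESCAPED_STAR = "(^|[^\\\\])\\*";

    /** Puts the captured character back before the inserted .* */
    static String keepPrefix(String input) {
        return input.replaceAll(UNESCAPED_STAR, "$1.*");
    }

    /** Drops the captured character, which corrupts the result. */
    static String dropPrefix(String input) {
        return input.replaceAll(UNESCAPED_STAR, ".*");
    }

    public static void main(String[] args) {
        System.out.println(keepPrefix("toto*")); // toto.*
        System.out.println(dropPrefix("toto*")); // tot.*  (the 'o' before * is lost)
        System.out.println(keepPrefix("*toto")); // .*toto
    }
}
```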
I have this simple example of a regular expression. But it is not working. I don't know what I am doing wrong:
String name = "abc";
System.out.println(name.matches("[a-zA-Z]"));
It returns false, but it should be true.
Use:
name.matches("[a-zA-Z]+") // matches one or more letters
or name.matches("\\w+") // matches one or more word characters (letters, digits, underscore)
name.matches("[a-zA-Z]") // matches exactly one character.
Add + to your regex to match one or more letters:
String name = "abc"; System.out.println(name.matches("[a-zA-Z]+"));
Your regex [a-zA-Z] matches exactly one letter, not more.
[a-zA-Z] matches a lowercase letter from a-z or an uppercase letter from A-Z.
The reason this evaluates to false is that it tries to match the entire string (see the documentation of String.matches()) against the pattern [A-Za-z], which only matches a single character. Either use
Pattern.compile("[A-Za-z]").matcher(str).find() to see if a substring matches (this returns true in this case), or alter the regex to account for multiple characters. The cleanest way of doing so is
Pattern.compile("^[A-Za-z]+$");
The ^ marks "start of string" and $ marks "end of string". + means "previous token at least once".
If you want to allow the empty String as well, use
Pattern.compile("^[A-Za-z]*$");
instead (* means "match the previous token 0 or more times")
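The difference between matching the whole string and finding a substring can be sketched as follows. MatchVsFind and its two methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class MatchVsFind {
    /** True only if the entire input matches the regex (same rule as String.matches). */
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    /** True if any substring of the input matches the regex. */
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("[a-zA-Z]", "abc"));  // false
        System.out.println(containsMatch("[a-zA-Z]", "abc"));       // true
        System.out.println(wholeStringMatches("[a-zA-Z]+", "abc")); // true
        System.out.println(wholeStringMatches("[a-zA-Z]*", ""));    // true
    }
}
```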
Try with [a-zA-Z]+
[a-zA-Z] indicates:
I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.
[.] matches a dot (.) literally, while . matches any character except a line terminator such as \n (unless you use DOTALL mode).
You can also use \. ("\\." in a Java string literal) to match a dot literally.
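A quick sketch of the three cases, using the "cat" example from the question. DotDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class DotDemo {
    static boolean anyCharAt(String s) {
        return s.matches(".at");      // . is a wildcard
    }

    static boolean literalDotAt(String s) {
        return s.matches("[.]at");    // [.] is a literal dot; "\\.at" behaves the same
    }

    static boolean anyCharAtDotAll(String s) {
        // With DOTALL, . also matches line terminators.
        return Pattern.compile(".at", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(anyCharAt("cat"));         // true
        System.out.println(literalDotAt("cat"));      // false
        System.out.println(literalDotAt(".at"));      // true
        System.out.println(anyCharAt("\nat"));        // false
        System.out.println(anyCharAtDotAll("\nat"));  // true
    }
}
```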
The [ and ] are metacharacters that let you define a character class. Most characters enclosed in square brackets are interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the group has a special meaning: it inverts what's matched by the group. For example, [^.] matches any character except a dot .
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
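The literal, negated and range forms described above can be checked with a short sketch like this one. CharClassDemo and its method names are hypothetical.

```java
// Hypothetical demo class, not part of the original answer.
class CharClassDemo {
    /** True if s is exactly one of the listed punctuation characters. */
    static boolean isListedSymbol(String s) {
        return s.matches("[.=*&^$]");   // every character here is literal
    }

    /** True if s is exactly one character other than a dot. */
    static boolean isNotDot(String s) {
        return s.matches("[^.]");       // leading ^ negates the class
    }

    /** True if s is exactly one ASCII letter or digit. */
    static boolean isAlphanumeric(String s) {
        return s.matches("[A-Za-z0-9]"); // dashes define ranges
    }

    public static void main(String[] args) {
        System.out.println(isListedSymbol("*")); // true
        System.out.println(isListedSymbol("a")); // false
        System.out.println(isNotDot("."));       // false
        System.out.println(isAlphanumeric("7")); // true
    }
}
```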
Regular-expression constructs:
. => Any character (may or may not match line terminators)
To match a literal dot, use either of the following:
[.] => matches a dot
\\. => matches a dot
NOTE: Character classes in Java regular expressions are defined using square brackets "[ ]"; such a subexpression matches a single character from the specified set of possible characters.
Example: in the string address, replace every "." with "[.]":
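public static void main(String[] args) {
    String address = "1.1.1.1";
    // The first "[.]" is a regex matching a literal dot; the second is plain replacement text.
    System.out.println(address.replaceAll("[.]", "[.]")); // prints 1[.]1[.]1[.]1
}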
if anything is missed please add :)
Pattern pattern = Pattern.compile("^[a-z]+$");
String string = "abc-def";
assertTrue( pattern.matcher(string).matches() ); // obviously fails
Is it possible to have the character class match a "-" ?
Don't put the minus sign between characters.
"[a-z-]"
Escape the minus sign
[a-z\\-]
Inside a character class [...], a - is treated specially (as a range operator) if it is surrounded by characters on both sides. That means if you put the - at the beginning or at the end of the character class, it is treated literally (non-special).
So you can use the regex:
^[a-z-]+$
or
^[-a-z]+$
Since the - that we added is being treated literally there is no need to escape it. Although it's not an error if you do it.
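A brief sketch comparing the placements discussed above: trailing, leading and escaped hyphen. HyphenInClass and allMatch are hypothetical names.

```java
// Hypothetical demo class, not part of the original answer.
class HyphenInClass {
    private static final String[] EQUIVALENT_PATTERNS = {
        "^[a-z-]+$",    // hyphen last: literal
        "^[-a-z]+$",    // hyphen first: literal
        "^[a-z\\-]+$"   // hyphen escaped: literal
    };

    /** True if every variant accepts the input. */
    static boolean allMatch(String input) {
        for (String regex : EQUIVALENT_PATTERNS) {
            if (!input.matches(regex)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch("abc-def"));            // true
        System.out.println("abc-def".matches("^[a-z]+$"));  // false: no hyphen in the class
    }
}
```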
Another (less recommended) way is to not include the - in the character class:
^(?:[a-z]|-)+$
Note that the parentheses are not optional in this case, as | has very low precedence. Without the parentheses,
^[a-z]|-+$
will match either a lowercase letter at the beginning of the string or one or more - at the end.
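The effect of the missing group shows up when searching with find(), as in this sketch. The class and method names are hypothetical, and the inputs are chosen only to expose the difference.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class AlternationPrecedence {
    private static final Pattern GROUPED = Pattern.compile("^(?:[a-z]|-)+$");
    private static final Pattern UNGROUPED = Pattern.compile("^[a-z]|-+$");

    static boolean groupedFinds(String s) {
        return GROUPED.matcher(s).find();
    }

    static boolean ungroupedFinds(String s) {
        return UNGROUPED.matcher(s).find();
    }

    public static void main(String[] args) {
        // The trailing digit makes the input invalid for the grouped pattern.
        System.out.println(groupedFinds("abc-def1"));   // false
        // Ungrouped: "^[a-z]" alone matches the leading 'a'.
        System.out.println(ungroupedFinds("abc-def1")); // true
        // Ungrouped: "-+$" alone matches the trailing dashes.
        System.out.println(ungroupedFinds("123--"));    // true
    }
}
```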
I'd rephrase the "don't put it between characters" a little more concretely.
Make the dash the first or last character in the character class. For example, "[-a-z0-9]" matches lower-case letters, digits, or a dash.
This works for me
Pattern p = Pattern.compile("^[a-z\\-]+$");
String line = "abc-def";
Matcher matcher = p.matcher(line);
System.out.println(matcher.matches()); // true