I am trying to write a regex filter that will only allow 0-9, a-z, A-Z, _, -, and the & sign.
So far I have this, "^[A-Za-z0-9_-]$" but I am unsure on how to include the & sign as part of the allowed characters. Thanks.
Just add & inside the character class, and make the class repeat one or more times by putting a + quantifier right after it.
"^[A-Za-z0-9_&-]+$"
[A-Za-z0-9_] can be shortened to \w.
"^[\\w&-]+$"
If you want to allow only a single character, the + after the character class isn't needed.
"^[\\w&-]$"
& has no special meaning in regex. The problem may have been that you added it to the end of your character class, like this:
[A-Za-z0-9_-&]
The dash character - has special meaning inside a character class when it is not first or last: it is the "range" operator. By ending with _-& you are asking for the range from _ to &, and since _ comes after & in Unicode order, that range is backwards (Java rejects it with a PatternSyntaxException).
Instead, add the & before the dash:
[A-Za-z0-9_&-]
When the dash is first or last in a character class, it's just a literal dash character. This last version should work.
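A minimal Java sketch of both points, assuming the check is done with String.matches and Pattern.compile (the class and method names are made up for illustration). It shows the working filter, and that the version with the dash between _ and & does not even compile as a pattern:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DashPlacementDemo {
    // Dash is last, so it is a literal dash; & needs no escaping.
    static final String FILTER = "^[A-Za-z0-9_&-]+$";

    static boolean isAllowed(String input) {
        return input.matches(FILTER);
    }

    // Returns true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Tom_&-Jerry42")); // true
        System.out.println(isAllowed("Tom Jerry"));     // false: space is not allowed
        // "_-&" is read as a range from '_' (U+005F) to '&' (U+0026), which is backwards.
        System.out.println(compiles("[A-Za-z0-9_-&]")); // false
        System.out.println(compiles("[A-Za-z0-9_&-]")); // true
    }
}
```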
I'm struggling with REGEX and require it for a program.
The input should accept only alphanumeric characters, plus these special characters: comma, colon, space, / and -.
I have tried: (^[a-zA-Z0-9,:\S/-]*$)
As far as I understand it (please correct me if I'm wrong):
a-zA-Z0-9 - The alphanumerical keys.
,: - Comma and colon
\S - Space
/ - I'm not sure how to represent a forward slash thus i escaped it
- - Dash also not sure if it is needed to escape it.
I would appreciate it if this could be corrected, along with an explanation of each part.
Thanks in advance.
You can replace a-zA-Z0-9 with \\w, which is short for [a-zA-Z_0-9] (note that this also allows the underscore). Furthermore, \\S matches any character that is not whitespace; you should use \\s instead. You don't need to escape /, nor - when it is the first or last character in the class; placed between two characters it would be interpreted as a range and you would have to escape it. So you can write your regex as ^([\w,:\s/-]*)$
The \S shorthand matches any character except whitespace, just the opposite of what you want. Lowercase \s matches whitespace, which in Java is [ \t\n\x0B\f\r]. But if you only want spaces, just put a space in the character class.
A hyphen - needs to be escaped inside a character class unless it's the first or last character in the class, but you can always escape it just to be sure.
Slashes / don't need to be escaped. They're escaped in other languages where they serve as pattern delimiters, e.g. /regex/i.
Besides hyphens, only backslashes \\, closing brackets \] and a caret ^ in first position need to be escaped. In Java, an opening bracket [ and && are also special inside a character class.
Remember that in Java you always need to use double backslashes (one is interpreted by Java, the other by the regex engine).
Regex
pattern = "^[a-zA-Z0-9 ,:/\\-]*$"
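As a rough sketch of how that pattern behaves in Java (the class and method names are hypothetical, and String.matches is assumed as the way the check is run):

```java
class InputValidator {
    // Alphanumerics plus space, comma, colon, slash and dash.
    // "\\-" in the Java literal becomes \- in the regex: an escaped, literal dash.
    static final String PATTERN = "^[a-zA-Z0-9 ,:/\\-]*$";

    static boolean isValid(String input) {
        return input.matches(PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(isValid("Report 12/05: draft-2, final")); // true
        System.out.println(isValid(""));                             // true: * allows an empty string
        System.out.println(isValid("tab\there"));                    // false: only a plain space is allowed
        System.out.println(isValid("user@example"));                 // false: @ is not in the class
    }
}
```

Because the quantifier is *, an empty input passes; switching to + would require at least one character.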
Move the start-of-line ^ and end-of-line $ anchors outside the group, and use \s instead of \S so that whitespace is what gets matched:
^([a-zA-Z0-9,:\s/-]*)$
That should do it.
I would like to match a line with a regular expression. The line contains two numbers which could be divided by plus, minus or a star (multiplication). However, I am not sure how to escape the star.
line.matches("[0-9]*[+-*][0-9]*");
I also tried line.matches("[0-9]*[+-\\*][0-9]*"); but that does not work either.
Should I put the star into a separate group? Why does escaping it as \\* not work in this case?
* is not a metacharacter inside a character class ([...]), so you don't need to escape it at all. What you need to escape is -, because inside a character class it creates a range of characters, like [a-z].
So instead of "[+-*]", which is read as a range from + to * (an invalid one, since + comes after * in the Unicode table), use
"[+\\-*]"
or place - where it can't act as a range indicator:
at the start of the character class: [-+*]
at the end of the character class: [+*-]
or right after another range, if you have one: [a-z-+*]
BTW if you would like to add a literal \ to your operators, you need to write it in the regex as \\ (it is a metacharacter used for escaping or for accessing predefined character classes like \w, so it needs escaping itself). But since \ is also special in String literals (it is used to write characters such as \n, \r, \t or \uXXXX), you need to escape it there as well. So the regex \\ is written as the string literal "\\\\".
BTW 2: to represent a digit you can use \d instead of [0-9] (written in a string literal as "\\d", since \ is special there and requires escaping).
BTW 3: if you want to make sure there is a number on each side of the operator, use + instead of * after [0-9], since + means one or more occurrences of the previous element, while * means zero or more.
So your code can look like
line.matches("\\d+[-+*]\\d+");
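A short sketch of that final pattern in use, with made-up class and method names, to show which lines it accepts:

```java
class OperatorLineDemo {
    // One or more digits, a single operator (-, + or *), one or more digits.
    // The dash is first in the class, so it is a literal and needs no escaping.
    static boolean isSimpleExpression(String line) {
        return line.matches("\\d+[-+*]\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isSimpleExpression("12+7")); // true
        System.out.println(isSimpleExpression("3*4"));  // true
        System.out.println(isSimpleExpression("10-2")); // true
        System.out.println(isSimpleExpression("+7"));   // false: no number before the operator
        System.out.println(isSimpleExpression("8/2"));  // false: / is not in the class
    }
}
```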
I came across a line in Java that uses regular expressions.
It validates a last name entered by the user:
return lastName.matches( "[a-zA-z]+([ '-][a-zA-Z]+)*" );
I would like to know what is the function of the [ '-].
Also, why do we need both a "+" and a "*" at the same time, and why is [ '-][a-zA-Z]+ in parentheses?
Your RE is: [a-zA-z]+([ '-][a-zA-Z]+)*
I'll break it down into its component parts:
[a-zA-Z]+
The string must begin with any letter, a-z or A-Z, repeated one or more times (+).
([ '-][a-zA-Z]+)*
[ '-]
Any single character of <space>, ', or -.
[a-zA-Z]+
Again, any letter, a-z or A-Z, repeated one or more times.
This combination (a separator from [ '-] followed by letters a-zA-Z) may then be repeated zero or more times (*).
Why [ '-]? To allow for hyphenated names, such as Higgs-Boson, names with apostrophes, such as O'Reilly, or names with spaces, such as Van Dyke.
The expression [ '-] means "one of space, ', or -". The order is very important - the dash must be the last one (or the first), otherwise the character class would be considered a range, and other characters with code points between the space and the quote ' would be accepted as well.
+ means "one or more repetitions"; * means "zero or more repetitions", referring to the term of the regular expression preceding the + or * modifier.
Overall, the expression matches groups of lowercase and uppercase letters separated by spaces, dashes, or single quotes.
It means any one of the characters space, ' or - (space, quote, dash).
The - could also be written as \-, since an unescaped - can also mean a range, like a-z.
This looks like it is a pattern to match double-barreled (space or hyphen) or I-don't-know-what-to-call-it names like O'Grady... for example:
It would match
counter-terrorism
De'ville
O'Grady
smith-jones
smith and wesson
But it will not match
jones-
O'Learys'
#hashtag
Bob & Sons
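The two lists above can be checked with a small Java sketch. The class name is hypothetical, and the pattern is the one from the question with the letter ranges written as a-zA-Z:

```java
class LastNameDemo {
    // Letters, then zero or more groups of (one separator + letters).
    static final String NAME = "[a-zA-Z]+([ '-][a-zA-Z]+)*";

    static boolean isValidName(String s) {
        return s.matches(NAME);
    }

    public static void main(String[] args) {
        String[] samples = {
            "counter-terrorism", "De'ville", "O'Grady", "smith-jones", "smith and wesson",
            "jones-", "O'Learys'", "#hashtag", "Bob & Sons"
        };
        // Prints each sample with the result of the match.
        for (String s : samples) {
            System.out.println(s + " -> " + isValidName(s));
        }
    }
}
```

The first five samples print true and the last four print false.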
The idea is, after the first [A-Za-z]+ consumes all the letters it can, the match will end right there unless the next character is a space, an apostrophe, or a hyphen ([ '-]). If one of those characters is present, it must be followed by at least one more letter.
A lot of people have difficulty with this. They naively write something like [A-Za-z]+[ '-]?[A-Za-z]*, figuring both the separator and the extra chunks of letters are optional. But they're not independently optional; if there is a separator ([ '-]), it must be followed by at least one more letter. Otherwise the regex would treat strings with a dangling separator, like jones-, as valid. Your regex doesn't have that problem.
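To make the difference concrete, here is a small comparison in Java. The class and constant names are made up for this sketch; the two patterns are the naive one and the grouped one discussed above:

```java
class SeparatorDemo {
    // Separator and trailing letters are each optional on their own.
    static final String NAIVE = "[A-Za-z]+[ '-]?[A-Za-z]*";
    // A separator is only allowed together with at least one following letter.
    static final String GROUPED = "[A-Za-z]+([ '-][A-Za-z]+)*";

    static boolean naive(String s) { return s.matches(NAIVE); }
    static boolean grouped(String s) { return s.matches(GROUPED); }

    public static void main(String[] args) {
        System.out.println(naive("jones-"));             // true: dangling separator slips through
        System.out.println(grouped("jones-"));           // false
        System.out.println(naive("smith-jones"));        // true
        System.out.println(grouped("smith-jones"));      // true
        System.out.println(naive("Mary-Jane O'Neil"));   // false: only one separator is allowed
        System.out.println(grouped("Mary-Jane O'Neil")); // true: the group can repeat
    }
}
```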
By the way, you've got a typo in your regex: [a-zA-z]. You want to watch out for that, because [A-z] does match all the uppercase and lowercase letters, so it will seem to be working correctly as long as the inputs are valid. But it also matches several non-letter characters whose code points happen to lie between those of Z and a. And very few IDEs or regex tools will catch that error.
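A quick way to see the effect of that typo, sketched in Java with a hypothetical class name. The test string consists only of the six characters that sit between Z and a:

```java
class RangeTypoDemo {
    static boolean matchesTypoClass(String s) {
        return s.matches("[A-z]+");    // the typo: range runs from 'A' to 'z'
    }

    static boolean matchesLetterClass(String s) {
        return s.matches("[A-Za-z]+"); // the intended class
    }

    public static void main(String[] args) {
        // Characters between 'Z' (U+005A) and 'a' (U+0061): [ \ ] ^ _ `
        String sneaky = "[\\]^_`";
        System.out.println(matchesTypoClass(sneaky));    // true
        System.out.println(matchesLetterClass(sneaky));  // false
        System.out.println(matchesTypoClass("OReilly")); // true: valid input hides the bug
    }
}
```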
To match any non-word and non-digit character (special characters) I use this: [\\W\\D]. What should I add if I want to also ignore some concrete characters? Let's say, underscore.
First of all, you must know that \W is equivalent to [^a-zA-Z0-9_]. So, you can change your current regex to:
[\\W]
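\W already excludes digits, so \D is not needed. (As written, [\\W\\D] is a union of the two classes, so it also matches letters, since letters are non-digits.)
Now, if you want to ignore some other character, say & (underscore is already excluded in \W), you can use a negated character class: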
[^\\w&]
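A small Java sketch of the negated class, with illustrative names, which also shows why the original [\\W\\D] was too broad:

```java
class SpecialCharDemo {
    // True for a single character that is neither a word character
    // ([a-zA-Z0-9_]) nor '&'.
    static boolean isSpecial(String ch) {
        return ch.matches("[^\\w&]");
    }

    public static void main(String[] args) {
        System.out.println(isSpecial("#")); // true
        System.out.println(isSpecial(" ")); // true
        System.out.println(isSpecial("&")); // false
        System.out.println(isSpecial("_")); // false
        System.out.println(isSpecial("a")); // false
        // [\W\D] is a union: letters are non-digits, so they match too.
        System.out.println("a".matches("[\\W\\D]")); // true
        System.out.println("a".matches("\\W"));      // false
    }
}
```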
I had this regex in java that matched either an alphanumeric character or the tilde (~)
^([a-z0-9])+|~$
Now I also have to add the characters - and _. I've tried a few combinations, none of which work, for example:
^([a-zA-Z0-9_-])+|~$
^([a-zA-Z0-9]|-|_)+|~$
Sample input strings that must match:
woZOQNVddd
00000
ncnW0mL14-
dEowBO_Eu7
7MyG4XqFz-
A8ft-y6hDu
~
Any clues / suggestion?
- is a special character within square brackets. It indicates a range. If it's not at either end of the character class it needs to be escaped by putting a \ before it.
It's worth pointing out a shortcut: \w is equivalent to [0-9a-zA-Z_] so I think this is more readable:
^([\w-]+|~)$
You need to escape the -, like \-, since it is a special character (the range operator). _ is ok.
So ^([a-z0-9_\-])+|~$.
Edit: your last input String will not match because the regular expression you are using matches a string of alphanumeric characters (plus - and _) OR a tilde (because of the pipe). But not both. If you want to allow an optional tilde on the end, change to:
^([a-z0-9_\-])+(~?)$
If you put the - first, it won't be interpreted as the range indicator.
^([-a-zA-Z0-9_])+|~$
This matches all of your examples except the last one using the following code:
String str = "A8ft-y6hDu ~";
System.out.println("Result: " + str.matches("^([-a-zA-Z0-9_])+|~$"));
That last example won't match because it doesn't fit your description. The regex will match any combination of alphanumerics, -, and _, OR a ~ character.
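For reference, a sketch that runs the dash-first pattern against the sample inputs from the question (the class and method names are made up for illustration):

```java
class IdOrTildeDemo {
    // The dash is first in the class, so it is a literal dash.
    static final String PATTERN = "^([-a-zA-Z0-9_])+|~$";

    static boolean isValid(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {
            "woZOQNVddd", "00000", "ncnW0mL14-", "dEowBO_Eu7",
            "7MyG4XqFz-", "A8ft-y6hDu", "~", "A8ft-y6hDu ~"
        };
        // Prints each sample with the result of the match.
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Every sample prints true except "A8ft-y6hDu ~", which contains a space and mixes both alternatives.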