RegEx: match any non-word and non-digit character except - java

To match any non-word and non-digit character (special characters) I use this: [\\W\\D]. What should I add if I want to also ignore some concrete characters? Let's say, underscore.

First of all, note that \W is equivalent to [^a-zA-Z0-9_]. Inside a character class, [\\W\\D] is a union, so it matches every non-digit, letters included. To match only special characters, change your current regex to:
[\\W]
Everything \W matches is already a non-digit, so \D is not needed.
Now, if you want to ignore some other character, say & (underscore is already excluded in \W), you can use a negated character class:
[^\\w&]
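A quick way to see the difference between the two classes is to run both over a sample string. This is only an illustrative sketch: the class name SpecialCharDemo, the helper strip and the sample input are made up for the example and are not from the original post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample input are assumptions.
final class SpecialCharDemo {
    // Union of \W and \D: any non-word OR non-digit character,
    // which boils down to "any non-digit", so letters match too.
    static final Pattern UNION = Pattern.compile("[\\W\\D]");
    // Negated class: anything that is neither a word character nor '&'.
    static final Pattern SPECIAL_EXCEPT_AMP = Pattern.compile("[^\\w&]");

    // Removes every character matched by the given pattern.
    static String strip(Pattern p, String input) {
        return p.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "a_b&c-d 1!2";
        System.out.println(strip(UNION, sample));              // 12
        System.out.println(strip(SPECIAL_EXCEPT_AMP, sample)); // a_b&cd12
    }
}
```

The second pattern leaves letters, digits, the underscore and & in place and removes only the remaining special characters.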

Related

Replacing characters in String using Meta characters or character classes

I am writing code to remove all non-alphanumeric characters from a String that contains only lowercase letters.
I am using the replaceAll function and have looked at a few regexes.
My reference is https://www.vogella.com/tutorials/JavaRegularExpressions/article.html, which shows that
\s : A whitespace character, short for [ \t\n\x0b\r\f]
\W : A non-word character [^\w]
I tried the following in Java, but the result still contained spaces and symbols:
lowercased = lowercased.replaceAll("\\W\\s", "");
output:
amanaplanac analp anam a
May I know what is wrong?
Regex \W\s means "a non-word character followed by a whitespace character".
If you want to replace any character that is one of those, use one of these:
\W|\s where | means or
[\W\s] where [ ] is a character class that in this case merges the built-in special character classes \W and \s, because that's what those are.
Of the two, I recommend using the second.
Of course, having \s there is redundant, because \s means whitespace character, and \W means non-word character, and since whitespaces are not word characters, using \W alone is enough.
lowercased = lowercased.replaceAll("\\W+", "");
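The difference between the sequence \W\s and the character class [\W\s] can be checked with a small sketch like the one below. The class name NonWordStripDemo, the helper method names and the sample sentence are assumptions made for illustration.

```java
// Hypothetical demo class; names and sample input are assumptions.
final class NonWordStripDemo {
    // Sequence: a non-word character FOLLOWED BY a whitespace character.
    static String stripSequence(String s) {
        return s.replaceAll("\\W\\s", "");
    }

    // Character class: any single character that is non-word OR whitespace.
    static String stripClass(String s) {
        return s.replaceAll("[\\W\\s]", "");
    }

    // \W alone already covers whitespace.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W+", "");
    }

    public static void main(String[] args) {
        String input = "a man, a plan, a canal: panama";
        System.out.println(stripSequence(input)); // a mana plana canalpanama
        System.out.println(stripClass(input));    // amanaplanacanalpanama
        System.out.println(stripNonWord(input));  // amanaplanacanalpanama
    }
}
```

Only the pairs ", " and ": " match the sequence form, which is why single spaces survive in the first result.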
Regex \W matches characters that are not digits (0-9), letters (A-Z and a-z), or the underscore (_). And \s matches whitespace.
Since \W already matches all non-alphanumeric characters (excluding the underscore), there is no need to use \s.
So if you use \W alone, underscores are kept along with the alphanumeric characters.
Use the following to remove underscores as well.
lowercased = lowercased.replaceAll("\\W|_", "");
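As a sketch of the underscore case: the alternation \W|_ can also be written as the character class [\W_], and both should give the same result. The class name UnderscoreStripDemo, the method names and the sample input are hypothetical.

```java
// Hypothetical demo class; names and sample input are assumptions.
final class UnderscoreStripDemo {
    // \W leaves the underscore alone, because '_' is a word character.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // Alternation: a non-word character OR an underscore.
    static String stripWithAlternation(String s) {
        return s.replaceAll("\\W|_", "");
    }

    // Same set of characters expressed as one character class.
    static String stripWithClass(String s) {
        return s.replaceAll("[\\W_]", "");
    }

    public static void main(String[] args) {
        String input = "snake_case name!";
        System.out.println(stripNonWord(input));         // snake_casename
        System.out.println(stripWithAlternation(input)); // snakecasename
        System.out.println(stripWithClass(input));       // snakecasename
    }
}
```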
Use | (the or operator), as in \W|\s, since \W and \s are independent cases that you want to replace. And since whitespace characters are not word characters, \W alone is enough.
lowercased = lowercased.replaceAll("\\W|\\s", "");

regex for allowing only certain special characters and also including the alphanumerical characters

I'm struggling with regex and need it for a program.
The input should allow only alphanumeric characters plus a few special characters: comma, colon, space, / and -.
I have tried (^[a-zA-Z0-9,:\S/-]*$)
As far as I understand it (please correct me if I'm wrong):
a-zA-Z0-9 - The alphanumerical keys.
,: - Comma and colon
\S - Space
/ - I'm not sure how to represent a forward slash, so I escaped it
- - Dash; I'm also not sure whether it needs to be escaped.
I would appreciate it if this could be corrected, along with an explanation of each part.
Thanks in advance.
You can replace a-zA-Z0-9 with just \\w, which is short for [a-zA-Z_0-9] (note that this also allows the underscore). Furthermore, \\S matches any character that is not whitespace; you should use \\s instead. You don't need to escape /, and you don't need to escape - either if it's the first or the last character in the class; placed between two characters it could be interpreted as a range, and then you would have to escape it. So, you can write your regex as ^([\w,:\s/-]*)$
The \S shorthand matches any character except whitespace, just the opposite of what you want. Lowercase \s matches whitespace [\t\v\n\r\f ]. But if you only want spaces, just put a space in the character class.
A hyphen - needs to be escaped inside a character class, unless it's the first or last character in the class, but you could always escape it just to be sure.
Slashes / don't need to be escaped. They're escaped in other languages where you use them as pattern delimiters, e.g. /regex/i.
Besides hyphens and shorthands, backslashes \\ and closing brackets \] need to be escaped; in Java, an opening bracket [ and the sequence && are also special inside a character class.
Remember that in a Java string literal you always need double backslashes (one is consumed by the Java compiler, the other by the regex engine).
Regex
pattern = "^[a-zA-Z0-9 ,:/\\-]*$"
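One possible way to use this pattern for validation is sketched below. The class name AllowedCharsDemo, the method isAllowed and the sample inputs are assumptions made for the example; note that matches() already requires the whole input to match, so the ^ and $ anchors are kept only to mirror the pattern above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are assumptions.
final class AllowedCharsDemo {
    // Letters, digits, space, comma, colon, slash and hyphen only.
    // "\\-" in the Java literal becomes \- in the regex (an escaped hyphen).
    static final Pattern ALLOWED = Pattern.compile("^[a-zA-Z0-9 ,:/\\-]*$");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Room 12/B, floor-3: ok")); // true
        System.out.println(isAllowed("a_b"));                    // false (underscore)
        System.out.println(isAllowed("a\tb"));                   // false (tab is not a space)
        System.out.println(isAllowed(""));                       // true (* allows empty input)
    }
}
```

If empty input should be rejected, replacing * with + would be the usual adjustment.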
Move the start-of-line ^ and end-of-line $ anchors outside the group, and use \s (whitespace) instead of \S (non-whitespace), like
^([a-zA-Z0-9,:\s/-]*)$
That should do it.

How to replace all non-digit characters in a string?

I need to replace all non-digit characters in the string. For instance:
String: 987sdf09870987=-0\\\`42
Replaced: 987**sdf**09870987**=-**0**\\\`**42
That is, every non-digit character sequence should be wrapped in ** characters. How can I do that with String::replaceAll()? I tried:
(?![0-9]+$).*
but the regex doesn't match what I want.
(\\D+)
You can use this and replace with **$1**. See demo:
https://regex101.com/r/fM9lY3/2
You can use a negated character class for a non-digit and use the 0th group back-reference to avoid overhead with capturing groups (it is minimal here, but still is):
String x = "987sdf09870987=-0\\\\\\`42";
x = x.replaceAll("[^0-9]+", "**$0**");
System.out.println(x);
See demo on IDEONE. Output: 987**sdf**09870987**=-**0**\\\`**42.
Also, in Java regex, character classes look neater than multiple escape symbols, that is why I prefer this [^0-9]+ pattern meaning match 1 or more (+) symbols other than (because of ^) digits from 0 to 9 ([0-9]).
A couple of words about your (?![0-9]+$).* regex. It consists of a negative lookahead (?![0-9]+$), which fails only when everything from the current position to the end of the string is digits, and .*, which matches any characters but a newline. On your sample input the lookahead succeeds at the very start, so .* matches the whole string at once. I do not think it can help you, since you need to actually match non-digits, not just check that the rest of the string is not all digits.
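To make the behaviour of the two patterns concrete, here is a small sketch that prints the first match of each. The class name LookaheadDemo, the helpers firstMatch and wrapNonDigits, and the shortened sample input are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample input are assumptions.
final class LookaheadDemo {
    // Returns the first substring matched by the regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Wraps every run of non-digits in ** using the whole-match reference $0.
    static String wrapNonDigits(String input) {
        return input.replaceAll("[^0-9]+", "**$0**");
    }

    public static void main(String[] args) {
        String input = "987sdf42";
        // The lookahead succeeds at index 0, so .* takes the whole string.
        System.out.println(firstMatch("(?![0-9]+$).*", input)); // 987sdf42
        // The negated class matches only the run of non-digits.
        System.out.println(firstMatch("[^0-9]+", input));       // sdf
        System.out.println(wrapNonDigits(input));               // 987**sdf**42
    }
}
```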

Java ampersand (&) regex

I am trying to write a regex filter that will only allow 0-9, a-z, A-Z, _, -, and the & sign.
So far I have this, "^[A-Za-z0-9_-]$" but I am unsure on how to include the & sign as part of the allowed characters. Thanks.
Just add & inside the character class, and make the class repeat one or more times by adding the + quantifier after it.
"^[A-Za-z0-9_&-]+$"
[A-Za-z0-9_] would be written as \w.
"^[\\w&-]+$"
If you want to allow only a single character, the + after the character class isn't needed.
"^[\\w&-]$"
& has no special meaning in regex. The problem may have been that you added it to the end of your character class, like this:
[A-Za-z0-9_-&]
The dash character - has special meaning inside a character class when it is not first or last: it is the range operator, so by ending with _-& you are specifying a range from _ to &. Since _ comes after & in Unicode order, that range is reversed, and Java rejects the pattern with a PatternSyntaxException.
Instead, add the & before the dash:
[A-Za-z0-9_&-]
When the dash is first or last in a character class, it's just a literal dash character. This last version should work.
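The effect of the dash position can be checked with a short sketch like this one. The class name AmpersandFilterDemo, the helpers isAllowed and compiles, and the sample inputs are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names and sample inputs are assumptions.
final class AmpersandFilterDemo {
    // Dash placed last, so it is a literal dash rather than a range operator.
    static final Pattern ALLOWED = Pattern.compile("^[A-Za-z0-9_&-]+$");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    // Reports whether Java accepts the regex at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("R&D_team-1"));        // true
        System.out.println(isAllowed("R&D team"));          // false (space)
        System.out.println(compiles("[A-Za-z0-9_&-]"));     // true
        System.out.println(compiles("[A-Za-z0-9_-&]"));     // false (reversed range _-&)
    }
}
```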

Java Replacing Characters

This is pretty simple but how would I create a regex to strip anything but
letters a-Z,
numbers 0-9
and commas?
I think the regex expression for the first two is [^a-zA-Z_0-9] but how could I add commas to it.
Also, would it be the following?
"string".replaceAll("expression", null);
First of all, you cannot use null as the replacement value; it throws java.lang.NullPointerException. You must pass a string, for example the empty string "" instead of null.
About the regex: if you need to add anything to your character class [], just put it inside the brackets. For example [^a-z,*.]
Furthermore, your a-zA-Z_0-9 can be replaced with \\w
[^\\w,]
You can simply add comma to your negated character class
[^a-zA-Z0-9,]
           ^ add this
Also, strings are immutable, so replaceAll will not change the original string; it creates a new one with the characters replaced, which you need to store somewhere (for example, back in the reference to the original string).
The last thing is that you need to pass an empty string "" as the replacement, not null.
So try with
yourString = yourString.replaceAll("[^a-zA-Z0-9,]","");
Another thing is that the regex you are currently using also prevents _ from being removed. If that was intentional, then instead of a-zA-Z_0-9 you can simply use the predefined character class \w (which in a Java string literal needs to be written as "\\w", because \ needs to be escaped), so your code can look like
yourString = yourString.replaceAll("[^\\w,]","");
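The two variants above can be compared with a minimal sketch such as the following. The class name KeepCommasDemo, the method names and the sample input are assumptions made for illustration.

```java
// Hypothetical demo class; names and sample input are assumptions.
final class KeepCommasDemo {
    // Keeps letters, digits and commas; removes everything else.
    static String keepAlnumAndCommas(String s) {
        return s.replaceAll("[^a-zA-Z0-9,]", "");
    }

    // Same idea with \w, which additionally keeps the underscore.
    static String keepWordAndCommas(String s) {
        return s.replaceAll("[^\\w,]", "");
    }

    public static void main(String[] args) {
        String original = "a_b, c-d; 42!";
        // replaceAll returns a new string; 'original' itself is unchanged.
        String cleaned = keepAlnumAndCommas(original);
        System.out.println(cleaned);                     // ab,cd42
        System.out.println(keepWordAndCommas(original)); // a_b,cd42
        System.out.println(original);                    // a_b, c-d; 42!
    }
}
```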
No, you should do:
value = "string".replaceAll("[\\W_&&[^,]]", "");
This pattern matches non-word characters and the underscore, and the intersection with [^,] leaves commas alone. A plain [\\W_,] would strip the commas too, because a comma is already a non-word character.
Replace with an empty string, not null, and assign the result back to your variable, because strings are immutable.
Otherwise, just add , to your negated character class.
[\w,]+ is the regex which matches alphanumeric characters, underscores and commas.
Here \w is equivalent to [A-Za-z0-9_]
[^\w,]+ is the regex which matches everything except alphanumeric characters, underscores and commas.
Note that \W matches any character that is not a word character (alphanumeric or underscore) and is equivalent to [^A-Za-z0-9_], so [\W,]+ would not work here: it matches commas as well.
