Java Replacing Characters - java

This is pretty simple but how would I create a regex to strip anything but
letters a-Z,
numbers 0-9
and commas?
I think the regex expression for the first two is [^a-zA-Z_0-9] but how could I add commas to it.
Also, would it be the following?
"string".replaceAll("expression", null);

First of all, you can not use null for the replacement value. It will give you java.lang.NullPointerException. You must use string there. For example instead of null use empty "".
About the regex, if you need anything to add inside your character class [], just add them there. For example [^a-z,*.]
Furthermore, your a-zA-Z_0-9 can be replaced with \\w
[^\\w,]

You can simply add comma to your negated character class
[^a-zA-Z0-9,]
^ add this
Also Strings are immutable so replaceAll will not affect original string but create new one with replaced characters so you need to store it somewhere (maybe in reference to original String).
Last thing is that you need to pass empty string "" as replacement, not null.
So try with
yourString = yourString.replaceAll("[^a-zA-Z0-9,]","");
Another thing is that regex you are currently using also prevents _ from being removed. If that was intentional then instead of _ a-z A-Z 0-9 you can simply use predefined character class \w (which in Javas String needs to be written as "\\w" because \ needs to be escaped) so your code can look like
yourString = yourString.replaceAll("[^\\w,]","");

No, you should do:
value = "string".replaceAll("[\\W_,]", "");
My pattern doesn't use negation.
You should replace it with empty string and not null and you've to assign the result to your string as strings are immutable.
You can just simplify your regex to mine.
Otherwise just add , to your negated character class.

[\w,]+ is the regex which matches alphanumeric, underscore and comma.
Here \w is equivalent to [A-Za-z0-9_]
[\W,]+ is the regex which matches everything except alphanumeric, underscore and comma.
Here \W - Matches any character that is not a word character (alphanumeric & underscore) which is equivalent to [^A-Za-z0-9_]

Related

How to use regex in matches() to look for letters, dots, and apostrophe?

In the following code, I tried many expressions to check if the string str has only letters, dots, or apostrophe by using matches() method.
However, it's not returning true for this string, for example, one o'clock. :
String str = "One o'clock.";
System.out.println(str.matches("[^a-zA-Z'. ]"));
You have two problems:
your regex is using negated character class [^...]
your regex can match strings with only one character.
So use standard character class [a-zA-Z'. ] and to let your regex match strings one or more characters use + quantifier.
System.out.println(str.matches("[a-zA-Z'. ]+"));
You can try this one [\p{Alpha}\s'.]*

Java regex match all characters except

What is the correct syntax for matching all characters except specific ones.
For example I'd like to match everything but letters [A-Z] [a-z] and numbers [0-9].
I have
string.matches("[^[A-Z][a-z][0-9]]")
Is this incorrect?
Yes, you don't need nested [] like that. Use this instead:
"[^A-Za-z0-9]"
It's all one character class.
If you want to match anything but letters, you should have a look into Unicode properties.
\p{L} is any kind of letter from any language
Using an uppercase "P" instead it is the negation, so \P{L} would match anything that is not a letter.
\d or \p{Nd} is matching digits
So your expression in modern Unicode style would look like this
Either using a negated character class
[^\p{L}\p{Nd}]
or negated properties
[\P{L}\P{Nd}]
The next thing is, matches() matches the expression against the complete string, so your expression is only true with exactly one char in the string. So you would need to add a quantifier:
string.matches("[^\p{L}\p{Nd}]+")
returns true, when the complete string has only non alphanumerics and at least one of them.
Almost right. What you want is:
string.matches("[^A-Za-z0-9]")
Here's a good tutorial
string.matches("[^A-Za-z0-9]")
Lets say that you want to make sure that no Strings have the _ symbol in them, then you would simply use something like this.
Pattern pattern = Pattern.compile("_");
Matcher matcher = Pattern.matcher(stringName);
if(!matcher.find()){
System.out.println("Valid String");
}else{
System.out.println("Invalid String");
}
You can negate character classes:
"[^abc]" // matches any character except a, b, or c (negation).
"[^a-zA-Z0-9]" // matches non-alphanumeric characters.

Check string contains whitespace along with some other char sequence using regex in java

am using regex expression to check if a string contains white space.
my regex is : ^\\s+$
for example if my string is my name then regex matches should return true.
but it is returning true only if my string contains only spaces no other character.
How to check if a string contains a whitespace or tab or carriage return characters in between/start/end of some string.
^(.*\s+.*)+$ seems to work for me. Accepts anything as long as there is at least one space in the string. This will match the entire string.
If you only want to check for the presence of a space, you can just use \s without any begin or end markers in the string. The difference is that this will only match the individual spaces.
Your regex is not correct.
That's a string representing a regular expression. (as tchrist pointed out correctly)
The corresponding pattern that you get when using Pattern.compile() matches only strings containing one or more whitespace characters, starting from the beginning until the end. Thus, the matching string only consists of whitespace characters.
Try this string instead for Pattern.compile():
"\\s+"
The difference is that without the anchors "^" and "$" there may be other characters around the whitespace character. The whitespace character(s) may be everywhere in the string.
Using this pattern-string the whitespace character(s) must be at the beginning:
"^\\s+"
And here the sequence of whitespace characters has to be at the end:
"\\s+$"
Use org.apache.commons.lang.StringUtils.containsAny(). See http://commons.apache.org/lang/api-3.1/org/apache/commons/lang3/StringUtils.html.

Split string with dot as delimiter

I am wondering if I am going about splitting a string on a . the right way? My code is:
String[] fn = filename.split(".");
return fn[0];
I only need the first part of the string, that's why I return the first item. I ask because I noticed in the API that . means any character, so now I'm stuck.
split() accepts a regular expression, so you need to escape . to not consider it as a regex meta character. Here's an example :
String[] fn = filename.split("\\.");
return fn[0];
I see only solutions here but no full explanation of the problem so I decided to post this answer
Problem
You need to know few things about text.split(delim). split method:
accepts as argument regular expression (regex) which describes delimiter on which we want to split,
if delim exists at end of text like in a,b,c,, (where delimiter is ,) split at first will create array like ["a" "b" "c" "" ""] but since in most cases we don't really need these trailing empty strings it also removes them automatically for us. So it creates another array without these trailing empty strings and returns it.
You also need to know that dot . is special character in regex. It represents any character (except line separators but this can be changed with Pattern.DOTALL flag).
So for string like "abc" if we split on "." split method will
create array like ["" "" "" ""],
but since this array contains only empty strings and they all are trailing they will be removed (like shown in previous second point)
which means we will get as result empty array [] (with no elements, not even empty string), so we can't use fn[0] because there is no index 0.
Solution
To solve this problem you simply need to create regex which will represents dot. To do so we need to escape that .. There are few ways to do it, but simplest is probably by using \ (which in String needs to be written as "\\" because \ is also special there and requires another \ to be escaped).
So solution to your problem may look like
String[] fn = filename.split("\\.");
Bonus
You can also use other ways to escape that dot like
using character class split("[.]")
wrapping it in quote split("\\Q.\\E")
using proper Pattern instance with Pattern.LITERAL flag
or simply use split(Pattern.quote(".")) and let regex do escaping for you.
Split uses regular expressions, where '.' is a special character meaning anything. You need to escape it if you actually want it to match the '.' character:
String[] fn = filename.split("\\.");
(one '\' to escape the '.' in the regular expression, and the other to escape the first one in the Java string)
Also I wouldn't suggest returning fn[0] since if you have a file named something.blabla.txt, which is a valid name you won't be returning the actual file name. Instead I think it's better if you use:
int idx = filename.lastIndexOf('.');
return filename.subString(0, idx);
the String#split(String) method uses regular expressions.
In regular expressions, the "." character means "any character".
You can avoid this behavior by either escaping the "."
filename.split("\\.");
or telling the split method to split at at a character class:
filename.split("[.]");
Character classes are collections of characters. You could write
filename.split("[-.;ld7]");
and filename would be split at every "-", ".", ";", "l", "d" or "7". Inside character classes, the "." is not a special character ("metacharacter").
As DOT( . ) is considered as a special character and split method of String expects a regular expression you need to do like this -
String[] fn = filename.split("\\.");
return fn[0];
In java the special characters need to be escaped with a "\" but since "\" is also a special character in Java, you need to escape it again with another "\" !
String str="1.2.3";
String[] cats = str.split(Pattern.quote("."));
Wouldn't it be more efficient to use
filename.substring(0, filename.indexOf("."))
if you only want what's up to the first dot?
Usually its NOT a good idea to unmask it by hand. There is a method in the Pattern class for this task:
java.util.regex
static String quote(String s)
The split must be taking regex as a an argument... Simply change "." to "\\."
The solution that worked for me is the following
String[] fn = filename.split("[.]");
Note: Further care should be taken with this snippet, even after the dot is escaped!
If filename is just the string ".", then fn will still end up to be of 0 length and fn[0] will still throw an exception!
This is, because if the pattern matches at least once, then split will discard all trailing empty strings (thus also the one before the dot!) from the array, leaving an empty array to be returned.
Using ApacheCommons it's simplest:
File file = ...
FilenameUtils.getBaseName(file.getName());
Note, it also extracts a filename from full path.
split takes a regex as argument. So you should pass "\." instead of "." because "." is a metacharacter in regex.

How to escape a square bracket for Pattern compilation?

I have comma separated list of regular expressions:
.{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]
I have done a split on the comma. Now I'm trying to match this regex against a generated password. The problem is that Pattern.compile does not like square brackets that is not escaped.
Can some please give me a simple function that takes a string like so: [0-9] and returns the escaped string \[0-9\].
For some reason, the above answer didn't work for me. For those like me who come after, here is what I found.
I was expecting a single backslash to escape the bracket, however, you must use two if you have the pattern stored in a string. The first backslash escapes the second one into the string, so that what regex sees is \]. Since regex just sees one backslash, it uses it to escape the square bracket.
\\]
In regex, that will match a single closing square bracket.
If you're trying to match a newline, for example though, you'd only use a single backslash. You're using the string escape pattern to insert a newline character into the string. Regex doesn't see \n - it sees the newline character, and matches that. You need two backslashes because it's not a string escape sequence, it's a regex escape sequence.
You can use Pattern.quote(String).
From the docs:
public static String quote​(String s)
Returns a literal pattern String for the specified String.
This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.
You can use the \Q and \E special characters...anything between \Q and \E is automatically escaped.
\Q[0-9]\E
Pattern.compile() likes square brackets just fine. If you take the string
".{8},[0-9],[^0-9A-Za-z ],[A-Z],[a-z]"
and split it on commas, you end up with five perfectly valid regexes: the first one matches eight non-line-separator characters, the second matches an ASCII digit, and so on. Unless you really want to match strings like ".{8}" and "[0-9]", I don't see why you would need to escape anything.

Categories

Resources