Using NOT in Regex in replaceAll - java

I have this string:
String a = "$$bar$55^$$";
I want to remove all symbols. I wrote this regex:
String b = a.replaceAll("(?<=[^[\\p{Alpha}][\\p{Digit}]])", "");
But, I get:
$$bar$55^$$
But I want to get this string:
bar55
What am I doing wrong? How can I filter out all characters except letters and numbers?
In Oracle, this works for me:
select regexp_replace('$$bar$55^$$','[^[:alpha:][:digit:]]*') from dual;

You are using a lookaround, which is a non-consuming pattern: the match value will always be empty, since only a position inside the string is matched. Replacing an empty match with an empty string changes nothing. Use
String b = a.replaceAll("\\P{Alnum}+", "");
The \\P{Alnum}+ pattern matches one or more chars other than ASCII alphanumeric chars. Also, see Predefined Character classes.
Alternatively, you may use
String b = a.replaceAll("[^\\p{L}\\p{N}]+", "");
This will remove chunks of 1 or more chars other than Unicode letters and numbers.
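As a minimal sketch of the difference between a consuming pattern and a lookaround, the following compares the two on the string from the question. The class and method names are hypothetical, and the lookbehind is a simplified stand-in for the one in the question rather than a copy of it.

```java
class StripNonAlnum {
    // Removes every run of characters that are not ASCII letters or digits.
    static String stripNonAlnum(String input) {
        return input.replaceAll("\\P{Alnum}+", "");
    }

    public static void main(String[] args) {
        String a = "$$bar$55^$$";

        // Consuming pattern: the symbols themselves are matched and removed.
        System.out.println(stripNonAlnum(a)); // bar55

        // Lookbehind only: every match is an empty position, so replacing it
        // with "" leaves the string unchanged.
        System.out.println(a.replaceAll("(?<=\\P{Alnum})", "")); // $$bar$55^$$
    }
}
```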

Related

Modifying part of a regex in replaceAll call

I am trying to format a string with a regex as follows:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0[^0-9]","");
What I think will happen is the string will become:
5.07+122.14 //the regex will delete the .0+ next to the 12
How can I create the regex so that it deletes only the .0 not the + sign?
I would prefer to do everything in the same call to "replaceAll"
thanks for any suggestions
Matched characters will be replaced. So, instead of matching the non-digit at the end, you can use lookahead, which will perform the desired check but won't consume any characters. Also, the shorthand for a non-digit is \D, which is a bit nicer to read than [^0-9]:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0(?=\\D)","");
If you want to remove all trailing zeros after the decimal point (for example, turn 5.00 into 5, which the single-0 pattern above leaves unchanged), then repeat the 0 one or more times with +:
String string = "5.07+12.000+2.14";
string = string.replaceAll("\\.0+(?=\\D)","");
If the string never contains alphabetical or underscore _ characters (those and numeric characters count as word characters), then you can make it even prettier with a word boundary instead of a lookahead. A word boundary, as it sounds, will match a position with a word character on one side and a non-word character on the other side, with \b:
string = string.replaceAll("\\.0+\\b","");
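The two variants can be compared side by side with a small sketch like the one below. The class and method names are made up for illustration. One difference worth noting: the lookahead needs an actual non-digit character after the zeros, so it does not match at the very end of the string, while the word boundary does.

```java
class TrailingZeros {
    // Removes ".0", ".00", ... only when a non-digit character follows.
    static String withLookahead(String s) {
        return s.replaceAll("\\.0+(?=\\D)", "");
    }

    // Removes ".0", ".00", ... when the zeros end at a word boundary.
    static String withWordBoundary(String s) {
        return s.replaceAll("\\.0+\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("5.07+12.0+2.14"));      // 5.07+12+2.14
        System.out.println(withLookahead("5.07+12.000+2.14"));    // 5.07+12+2.14
        System.out.println(withWordBoundary("5.07+12.000+2.14")); // 5.07+12+2.14

        // At the end of the string only the word-boundary version matches.
        System.out.println(withLookahead("3.0"));    // 3.0
        System.out.println(withWordBoundary("3.0")); // 3
    }
}
```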

How can we remove ':' characters from a string?

I have strings like
#lle #mme: #crazy #upallnight:
I would like to remove the words which start with either # or #. It works perfectly fine if those words don't contain the ':' character. However, the ':' character is left behind whenever I delete the words. Therefore I decided to replace the ':' characters before deleting the words, using the string.replace() function. However, they are still not removed.
String example = "#lle #mme: #crazy #upallnight:";
example.replace(':',' ');
The result : #lle #mme: #crazy #upallnight:
I am pretty stuck here, any help would be appreciated.
You can do this:
example = example.replaceAll(" +[##][^ ]+", "");
What this will do is replace any substrings in your string that match the regex pattern [##][^ ]+ with the empty string. Since that pattern matches the words you want to dump, it'll do what you want.
From Java docs:
String s = "Abc: abc#:";
String result = s.replace(':',' ');
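The value of result is "Abc  abc# " (each ':' has been replaced by a space).
I think you forgot to store the value returned by the replace() method. Strings are immutable, so replace() does not modify example; it returns a new String that you have to assign to a variable.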
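A short sketch of that point, using the string from the question: the first call discards its result, the second assigns it back. The class and method names here are hypothetical.

```java
class ReplaceReturnsNewString {
    // Returns a copy of s with every ':' removed; s itself is never modified.
    static String stripColons(String s) {
        return s.replace(":", "");
    }

    public static void main(String[] args) {
        String example = "#lle #mme: #crazy #upallnight:";

        // The return value is thrown away, so example is unchanged.
        example.replace(':', ' ');
        System.out.println(example); // #lle #mme: #crazy #upallnight:

        // Assigning the result keeps the change.
        example = stripColons(example);
        System.out.println(example); // #lle #mme #crazy #upallnight
    }
}
```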

Java regex negative lookahead to replace non-triple characters

I'm trying to take a number, convert it into a string and replace all characters that are not a triple.
E.g., if I pass in 1222331, my replace method should return 222. I can find that this pattern exists, but I need to get the value and save it into a string for additional logic. I don't want to use a for loop to iterate through this string.
I have the following code:
String first = Integer.toString(num1);
String x = first.replaceAll("^((?!([0-9])\\3{2})).*$","");
But it's replacing the triple digits also. I only need it to replace the rest of the characters. Is my approach wrong?
You can use
first = first.replaceAll("((\\d)\\2{2})|\\d", "$1");
The regex - ((\d)\2{2})|\d - matches either a digit that repeats thrice (and captures it into Group 1), or just matches any other digit. $1 just restores the captured text in the resulting string while removing all others.
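To see the capture-and-restore trick in action, here is a small sketch wrapped in a hypothetical helper, keepTriples. It assumes a non-negative number; a minus sign is not a digit and would be left in place.

```java
class KeepTriples {
    // Keeps each run of three identical digits and drops every other digit.
    // When the second alternative (\d) matches, group 1 did not participate,
    // so "$1" contributes nothing to the result.
    static String keepTriples(int num) {
        String first = Integer.toString(num);
        return first.replaceAll("((\\d)\\2{2})|\\d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepTriples(1222331)); // 222
        System.out.println(keepTriples(111222));  // 111222
        System.out.println(keepTriples(2222));    // 222 (the fourth 2 is dropped)
    }
}
```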

How to write this Java regex?

I need to break the string into words by a hyphen. For example:
"WorkInProgress" is converted to "Work-In-Progress"
"NotComplete" is converted to "Not-Complete"
In most cases, a word starts with a capital letter and ends with a lowercase letter.
But there is one exception: "CIInProgress" should be converted to "CI-In-Progress".
I wrote the code below: for any pattern that has lowercase letters or "CI" followed by a capital letter, a "-" is inserted in the middle. But it still doesn't work for "CIInProgress". Can anyone tell me how to correct it?
String str;
String pattern = "([a-z|CI]+)([A-Z])";
str= str.replaceAll(pattern, "$1\\-$2");
You could use a negative lookbehind,
Regex:
(?<!^)([A-Z][a-z])
Replacement string:
-$1
Explanation:
(?<!^) is a negative lookbehind. It asserts that what precedes the match, an uppercase letter [A-Z] followed by a lowercase letter [a-z], is not the start of the string, so such a pair is matched only when it is not at the very beginning. The parentheses () form a capturing group that stores the matched characters; you can later refer to them by the group's index number, as $1 does in the replacement string.
Code:
System.out.println("WorkInProgress".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
System.out.println("NotComplete".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
System.out.println("CIInProgress".replaceAll("(?<!^)([A-Z][a-z])", "-$1"));
Output:
Work-In-Progress
Not-Complete
CI-In-Progress
You can't have | in a character class; it will just get interpreted as a literal vertical bar character. Try:
String pattern = "([a-z]+|CI)([A-Z])";
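Try this. It inserts a hyphen at every position between a lowercase and an uppercase letter; note that this does not cover the "CI" exception, since "CIInProgress" becomes "CIIn-Progress":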
str= str.replaceAll("(?<=\\p{javaLowerCase})(?=\\p{javaUpperCase})", "-");
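For comparison, the three suggestions above can be run against the sample inputs with a sketch like this one. The class and method names are invented for illustration; the patterns are the ones given in the answers.

```java
class CamelCaseHyphenator {
    // Hyphen before every "Uppercase + lowercase" pair that is not at the start.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!^)([A-Z][a-z])", "-$1");
    }

    // Alternation outside the character class: lowercase run or "CI", then a capital.
    static String withAlternation(String s) {
        return s.replaceAll("([a-z]+|CI)([A-Z])", "$1-$2");
    }

    // Hyphen at every lowercase-to-uppercase transition.
    static String betweenCases(String s) {
        return s.replaceAll("(?<=\\p{javaLowerCase})(?=\\p{javaUpperCase})", "-");
    }

    public static void main(String[] args) {
        System.out.println(withLookbehind("CIInProgress"));    // CI-In-Progress
        System.out.println(withAlternation("CIInProgress"));   // CI-In-Progress
        System.out.println(betweenCases("CIInProgress"));      // CIIn-Progress
        System.out.println(withAlternation("WorkInProgress")); // Work-In-Progress
    }
}
```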

how to check all character in a string is lowercase using java

I tried it like this, but it outputs false. Please help me.
String inputString1 = "dfgh";// but not dFgH
String regex = "[a-z]";
boolean result;
Pattern pattern1 = Pattern.compile(regex);
Matcher matcher1 = pattern1.matcher(inputString1);
result = matcher1.matches();
System.out.println(result);
Your solution is nearly correct. The regex must be "[a-z]+": it needs a quantifier, so that you match one or more lowercase characters rather than exactly one. Note that the uber-correct solution, which matches any lowercase char in Unicode, and not only those from the English alphabet, is this:
"\\p{javaLowerCase}+"
Additionally note that you can achieve this with much less code:
System.out.println(input.matches("\\p{javaLowerCase}*"));
(here I am alternatively using the * quantifier, which means zero or more. Choose according to the desired semantics.)
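A minimal sketch of how the quantifier and the character class change the result of matches(), which always tests the whole string. The helper names are hypothetical.

```java
class AllLowercaseRegex {
    // True if s consists of one or more ASCII lowercase letters.
    static boolean isAsciiLowercase(String s) {
        return s.matches("[a-z]+");
    }

    // True if s consists of one or more characters that
    // Character.isLowerCase reports as lowercase.
    static boolean isUnicodeLowercase(String s) {
        return s.matches("\\p{javaLowerCase}+");
    }

    public static void main(String[] args) {
        System.out.println("dfgh".matches("[a-z]"));            // false: exactly one char required
        System.out.println(isAsciiLowercase("dfgh"));           // true
        System.out.println(isAsciiLowercase("dFgH"));           // false
        System.out.println(isAsciiLowercase("\u00e0bcd"));      // false: U+00E0 is outside a-z
        System.out.println(isUnicodeLowercase("\u00e0bcd"));    // true
        System.out.println("".matches("\\p{javaLowerCase}*"));  // true: * accepts the empty string
    }
}
```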
You are almost there, except that you are only checking for one character.
String regex = "[a-z]+";
With matches(), the regex above checks that the whole input string consists of one or more characters from a to z.
Read about how to use quantifiers in regex.
Use this pattern:
String regex = "[a-z]*";
Your current pattern only works if the tested string is one char only.
Note that it does exactly what it looks like: it doesn't really test whether the string is in lowercase, but whether it contains no chars outside [a-z]. This means it returns false for lowercase strings like "àbcd". A correct solution in a Unicode world would be to use the Character.isLowerCase() function and loop over the string.
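The loop mentioned above could look roughly like this. It is a sketch under two assumptions that are not in the answer: the empty string is treated as not lowercase, and the loop walks code points rather than chars so that characters outside the Basic Multilingual Plane are handled. The class and method names are hypothetical.

```java
class AllLowercaseLoop {
    // True if every code point in s is lowercase according to Character.isLowerCase.
    static boolean isAllLowerCase(String s) {
        if (s.isEmpty()) {
            return false; // assumption: an empty string does not count as lowercase
        }
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (!Character.isLowerCase(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint); // advance by 1 or 2 chars
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllLowerCase("dfgh"));       // true
        System.out.println(isAllLowerCase("dFgH"));       // false
        System.out.println(isAllLowerCase("\u00e0bcd"));  // true
    }
}
```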
It should be
^[a-z]+$
^ is the start of string
$ is the end of string
[a-z]+ matches one or more lowercase letters
You need to use quantifiers: * matches the preceding character or range zero or more times, and + matches it one or more times.
Why bother with a regular expression?
String inputString1 = "dfgh";// but not dFgH
boolean result = inputString1.toLowerCase().equals( inputString1 );
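One thing to keep in mind with this comparison is that it answers a slightly different question: it is true whenever the string contains no uppercase letters, so digits, punctuation and the empty string also pass. A small sketch with a hypothetical helper name:

```java
class LowercaseByComparison {
    // True if lowercasing s changes nothing, i.e. s has no uppercase letters.
    static boolean hasNoUppercase(String s) {
        return s.toLowerCase().equals(s);
    }

    public static void main(String[] args) {
        System.out.println(hasNoUppercase("dfgh")); // true
        System.out.println(hasNoUppercase("dFgH")); // false
        System.out.println(hasNoUppercase("1234")); // true, although it contains no letters
        System.out.println(hasNoUppercase(""));     // true
    }
}
```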
