Modifying part of a regex in replaceAll call - java

I am trying to format a string with a regex as follows:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0[^0-9]","");
What I think will happen is the string will become:
5.07+122.14 //the regex will delete the .0+ next to the 12
How can I create the regex so that it deletes only the .0 not the + sign?
I would prefer to do everything in the same call to "replaceAll"
thanks for any suggestions

Matched characters will be replaced. So, instead of matching the non-digit at the end, you can use lookahead, which will perform the desired check but won't consume any characters. Also, the shorthand for a non-digit is \D, which is a bit nicer to read than [^0-9]:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0(?=\\D)","");
If you want to strip all trailing zeros (for example, turn 5.00 into 5, which the pattern above leaves untouched as 5.00 because the first 0 is followed by a digit), then repeat the 0 one or more times with + so that every zero after the decimal point gets removed:
String string = "5.07+12.000+2.14";
string = string.replaceAll("\\.0+(?=\\D)","");
If the string never contains letters or underscore _ characters (those and digits count as word characters), then you can simplify the pattern with a word boundary \b instead of a lookahead. A word boundary matches a position with a word character on one side and either a non-word character or the start or end of the string on the other side:
string = string.replaceAll("\\.0+\\b","");
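To compare the three patterns above side by side, here is a small sketch. The class name TrailingZeroDemo, the method names and the extra sample inputs are made up for illustration; only the patterns themselves come from the answer. One thing it shows is that the lookahead (?=\D) needs an actual character after the zeros, so it does not fire at the very end of the string, while the word boundary does.

```java
class TrailingZeroDemo {
    // Lookahead: remove ".0" only when a non-digit follows; the non-digit is not consumed.
    static String singleZero(String s) {
        return s.replaceAll("\\.0(?=\\D)", "");
    }

    // Same idea, but removes a whole run of zeros after the decimal point.
    static String zeroRun(String s) {
        return s.replaceAll("\\.0+(?=\\D)", "");
    }

    // Word-boundary variant; it also matches when the zeros end the string.
    static String zeroRunBoundary(String s) {
        return s.replaceAll("\\.0+\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(singleZero("5.07+12.0+2.14"));   // 5.07+12+2.14
        System.out.println(zeroRun("5.07+12.000+2.14"));    // 5.07+12+2.14
        System.out.println(zeroRun("1.5+12.0"));            // 1.5+12.0 (nothing follows the zero)
        System.out.println(zeroRunBoundary("1.5+12.0"));    // 1.5+12
    }
}
```

If the input can end with a number such as 12.0, the word-boundary version (or a lookahead like (?!\d)) is probably the safer choice.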

Related

Find a three-digit number in a string using replaceAll()

I have String from which I need to extract a keyword.
Something like: "I have 100 friends and 1 evil".
I need to extract "100" from that String using only replaceAll function and appropriate regex.
I tried to do it in that way:
String input = "I have 100 friends and 1 evil";
String result = input.replaceAll("[^\\d{3}]", "");
But it doesn't work. Any help would be appreciated.
You can consider any of the solutions below:
String result = input.replaceFirst(".*?(\\d{3}).*", "$1");
String result = input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1");
String result = input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1");
String result = input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1");
Note that you may use replaceAll here, too, but there is little point, since the replacement must occur only once in this case.
Here,
.*? - matches any zero or more chars other than line break chars, as few as possible
(\d{3}) - captures into Group 1 any three digits
.* - matches any zero or more chars other than line break chars, as many as possible.
The (?<!\d) / (?!\d) lookarounds are digit boundaries: there is no match if the sequence is four or more digits. \b are word boundaries: there will be no match if the three digits are glued to a letter, digit or underscore. The (?<!\S) / (?!\S) lookarounds are whitespace boundaries: there must be a whitespace character or the start of the string before the match, and a whitespace character or the end of the string after it.
The replacement is $1, the value of Group 1.
See the Java demo:
String input = "I have 100 friends and 1 evil";
System.out.println(input.replaceFirst(".*?(\\d{3}).*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1"));
System.out.println(input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1"));
All output 100.
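The variants only differ once the input contains longer digit runs. A rough sketch, where the class name ThreeDigitDemo, the method names and the sample sentences are assumptions of mine rather than part of the answer:

```java
class ThreeDigitDemo {
    // No boundaries: takes the first three consecutive digits, even inside a longer number.
    static String firstThreeDigits(String s) {
        return s.replaceFirst(".*?(\\d{3}).*", "$1");
    }

    // Digit boundaries: the three digits must not touch another digit.
    static String exactlyThreeDigits(String s) {
        return s.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1");
    }

    public static void main(String[] args) {
        String input = "I have 1000 friends and 250 enemies";
        System.out.println(firstThreeDigits(input));    // 100
        System.out.println(exactlyThreeDigits(input));  // 250
        // Without a match, the original string comes back unchanged.
        System.out.println(exactlyThreeDigits("no numbers here")); // no numbers here
    }
}
```

Because a failed match returns the input as is, the caller may want to check the result (for example with matches("\\d{3}")) before treating it as the extracted number.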

Using NOT in Regex in replaceAll

I have this string:
String a = "$$bar$55^$$";
I want to remove all symbols. I wrote this regex:
String b = a.replaceAll("(?<=[^[\\p{Alpha}][\\p{Digit}]])", "");
But, I get:
$$bar$55^$$
But I want to get this string:
bar55
What am I doing wrong? How can I filter out all characters except letters and numbers?
In Oracle this works for me:
select regexp_replace('$$bar$55^$$','[^[:alpha:][:digit:]]*') from dual;
You are using a lookaround that is a non-consuming pattern, i.e. the match value will always be empty since only a location inside a string will be matched. Use
String b = a.replaceAll("\\P{Alnum}+", "");
The \\P{Alnum}+ pattern matches one or more chars other than ASCII alphanumeric chars. Also, see Predefined Character classes.
Alternatively, you may use
String b = a.replaceAll("[^\\p{L}\\p{N}]+", "");
This will remove chunks of 1 or more chars other than Unicode letters and numbers, so punctuation and symbols are stripped.
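To see why the lookaround version changes nothing while a consuming pattern works, here is a minimal sketch. The class name StripSymbolsDemo, the method names and the simplified lookbehind (?<=\P{Alnum}) are my own stand-ins for illustration, not the asker's exact pattern.

```java
class StripSymbolsDemo {
    // A lookbehind on its own matches only empty positions, so nothing is removed.
    static String zeroWidthAttempt(String s) {
        return s.replaceAll("(?<=\\P{Alnum})", "");
    }

    // Consuming pattern: removes runs of characters that are not ASCII letters or digits.
    static String keepAsciiAlnum(String s) {
        return s.replaceAll("\\P{Alnum}+", "");
    }

    public static void main(String[] args) {
        String a = "$$bar$55^$$";
        System.out.println(zeroWidthAttempt(a)); // $$bar$55^$$
        System.out.println(keepAsciiAlnum(a));   // bar55
        // Alnum is ASCII-only by default, so an accented letter is removed as well.
        System.out.println(keepAsciiAlnum("caf\u00e9 42")); // caf42
    }
}
```

If non-ASCII letters must survive, a Unicode-aware class such as \p{L} is likely the better fit.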

Merge multiple regex in Java

I have written a regex to omit the characters after the first occurrence of some characters (, and #)
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to get only numbers (remove spaces and other non numeric characters)
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regexes into one, like piping the output of the first regex into the second?
You can combine both regex in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
So you need to remove all symbols other than digits and the whole rest of the string after the first hash symbol or a comma.
You cannot just concatenate the patterns with the | operator because one of the patterns is anchored implicitly at the end of the string.
You need to remove any symbols other than digits, hashes and commas first, since the regex engine processes the string from left to right; then you can add the alternative that matches a comma or hash together with all the text after it. Use the DOTALL modifier in case your input contains newline characters.
Use
 (?s)[,#].*$|[^#,0-9]+
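A quick way to check that the single pattern behaves like the two chained calls is sketched below. The class name MergeRegexDemo, the method names and the second sample input are assumptions for illustration.

```java
class MergeRegexDemo {
    // Two passes, as in the question: cut at the first ',' or '#', then drop non-digits.
    static String twoPasses(String s) {
        return s.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
    }

    // One pass with the combined alternation.
    static String onePass(String s) {
        return s.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        String number = "(123) (456) (7890)#123";
        System.out.println(twoPasses(number));       // 1234567890
        System.out.println(onePass(number));         // 1234567890
        System.out.println(onePass("12 34,56#78"));  // 1234
    }
}
```

The two versions agree on single-line input. With newlines in the input they can differ, because the first pattern of the two-pass version has no DOTALL flag and therefore stops at the line break.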

Java regex negative lookahead to replace non-triple characters

I'm trying to take a number, convert it into a string and replace all characters that are not a triple.
Eg. if I pass in 1222331 my replace method should return 222. I can find that this pattern exists but I need to get the value and save it into a string for additional logic. I don't want to do a for loop to iterate through this string.
I have the following code:
String first = Integer.toString(num1);
String x = first.replaceAll("^((?!([0-9])\\3{2})).*$","");
But it's replacing the triple digits also. I only need it to replace the rest of the characters. Is my approach wrong?
You can use
first = first.replaceAll("((\\d)\\2{2})|\\d", "$1");
The regex - ((\d)\2{2})|\d - matches either a digit followed by two more copies of itself (capturing all three into Group 1), or just any other single digit. The $1 replacement puts the captured triple back into the result, while every other digit is replaced with nothing because Group 1 is empty for it.
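Wrapped in a helper, the replacement can be tried on a few inputs. This is only a sketch; the class name TripleDigitDemo, the method name keepTriples and the extra inputs are hypothetical.

```java
class TripleDigitDemo {
    // Group 1 captures a digit repeated three times. Any other digit matches the
    // second alternative, leaves group 1 unset, and is replaced by nothing.
    static String keepTriples(String digits) {
        return digits.replaceAll("((\\d)\\2{2})|\\d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepTriples(Integer.toString(1222331))); // 222
        System.out.println(keepTriples("1111"));                    // 111
        System.out.println(keepTriples("12345").isEmpty());         // true
    }
}
```

Note that a run of four equal digits keeps only the first three, since the fourth one falls through to the plain \d alternative.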

regex to strip leading zeros treated as string

I have numbers like this that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)
You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string with the empty string. The + quantifier means one or more (similarly, the * quantifier means zero or more), and the caret ^ anchors the match to the beginning of the string.
The accepted solution fails if you need to get "0" from "00", because it returns an empty string. This version handles that case:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) matches one or more leading zeros only if the match is not followed by the end of the string, so the last zero of an all-zero input is kept.
Thank you to the commenter - I have updated the formula to match the description from the author.
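The difference between the two patterns shows up only on all-zero input. A minimal sketch, with the class name LeadingZeroDemo and the method names chosen just for this example:

```java
class LeadingZeroDemo {
    // Strips every leading zero, so an all-zero input becomes the empty string.
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    // The lookahead (?!$) makes the regex give back one zero when the input is all zeros.
    static String stripKeepLast(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAll("0003030435243"));          // 3030435243
        System.out.println(stripAll("00").isEmpty());           // true
        System.out.println(stripKeepLast("00"));                // 0
        System.out.println(stripKeepLast("00000004334300343")); // 4334300343
    }
}
```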
If you know the input strings contain only digits, then you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
By converting to Long it will automatically strip off all leading zeroes.
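One caveat with the parsing approach: Long.parseLong and Long.valueOf throw NumberFormatException when the digits do not fit into a 64-bit long. The sketch below illustrates that and uses java.math.BigInteger as a fallback; BigInteger is my own suggestion here, not something the answer proposes, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

class ParseZeroDemo {
    // Parses the digits as a number of arbitrary size and prints it back without leading zeros.
    static String viaBigInteger(String digits) {
        return new BigInteger(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(Long.parseLong("00000004334300343")); // 4334300343

        String nines = "99999999999999999999"; // larger than Long.MAX_VALUE
        try {
            Long.parseLong("000" + nines);
        } catch (NumberFormatException e) {
            System.out.println("too large for long");
        }
        System.out.println(viaBigInteger("000" + nines).equals(nines)); // true
    }
}
```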
Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - matches the beginning of the input (or of each line in MULTILINE mode)
0+ - matches the zero digit character one or more times
An exhaustive list of pattern constructs can be found in the Pattern documentation.
\b0+\B will do the job. \b anchors the match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end at a position that is not a word boundary (so that the last 0 is not removed when the input consists only of zeros, 00...000).
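Since this pattern is not anchored to the start of the string, it acts on every number in the text. Here is a small sketch of that behaviour; the class name BoundaryZeroDemo, the method name and the sample inputs are mine.

```java
class BoundaryZeroDemo {
    // Removes leading zeros from every digit run that starts at a word boundary.
    static String strip(String s) {
        return s.replaceAll("\\b0+\\B", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("00000004334300343"));    // 4334300343
        System.out.println(strip("000"));                  // 0
        System.out.println(strip("007 and 000 and 0420")); // 7 and 0 and 420
        // Pitfall: the dot is a non-word character, so zeros after a decimal point count as "leading".
        System.out.println(strip("1.05"));                 // 1.5
    }
}
```

The last line suggests this pattern is best kept to inputs that contain integers only.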
The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex matches one or more 0 characters at the beginning of the string.
There is no reason to worry about the replaceAll method, as the regex has the ^ (beginning of input) special character, which ensures the replacement happens only once.
Alternatively, you can use a built-in Java feature to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
// number is now 4334300343
The leading zeros will be stripped for you automatically.
I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
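The three cases described above can be checked with a short sketch. The class name TwoGroupDemo and the method name normalize are hypothetical, and the backslash in \d is doubled as a Java string literal requires.

```java
class TwoGroupDemo {
    // Group 1 (optional) takes leading zeros; group 2 must keep at least one digit.
    static String normalize(String s) {
        return s.replaceAll("(^0+)?(\\d+)", "$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("000"));            // 0
        System.out.println(normalize("0003030435243"));  // 3030435243
        System.out.println(normalize("4334300343"));     // 4334300343
    }
}
```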
Here is a simple solution for JavaScript (not Java, which has no regex literals):
str = str.replaceAll(/^0+/g, "");
The global flag g is required when calling JavaScript's replaceAll with a regular expression.
