Java regex negative lookahead to replace non-triple characters - java

I'm trying to take a number, convert it to a string, and remove every character that is not part of a triple (three identical consecutive digits).
E.g., if I pass in 1222331, my replace method should return 222. I can detect that this pattern exists, but I need to extract the value and save it in a string for additional logic. I don't want to iterate through the string with a for loop.
I have the following code:
String first = Integer.toString(num1);
String x = first.replaceAll("^((?!([0-9])\\3{2})).*$","");
But it's replacing the triple digits also. I only need it to replace the rest of the characters. Is my approach wrong?

You can use
first = first.replaceAll("((\\d)\\2{2})|\\d", "$1");
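The regex - ((\d)\2{2})|\d - either matches a digit followed by two more copies of itself (three identical digits in a row) and captures that run into Group 1, or matches any other single digit. The replacement $1 restores the captured triple in the result and drops everything else, because Group 1 is empty when the second alternative matches.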
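As a minimal sketch of the replacement above wrapped in a helper (the class and method names are illustrative and not from the original post):

```java
final class TripleDigits {
    // Keeps each run of three identical digits and drops every other digit.
    // The first alternative captures a triple into group 1; the second
    // alternative matches one leftover digit and leaves group 1 empty.
    static String keepTriples(int num) {
        String digits = Integer.toString(num);
        return digits.replaceAll("((\\d)\\2{2})|\\d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepTriples(1222331)); // 222
        System.out.println(keepTriples(1112223)); // 111222
    }
}
```

Note that a run of four identical digits, such as 2222, yields 222: the first three digits form the triple and the fourth is consumed by the second alternative.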

Related

Java Regex to replace only part of string (url)

I want to replace only the numeric sections of a string. In most cases it's either a full URL or part of a URL, but it can be just a normal string as well.
/users/12345 becomes /users/XXXXX
/users/234567/summary becomes /users/XXXXXX/summary
/api/v1/summary/5678 becomes /api/v1/summary/XXXX
http://example.com/api/v1/summary/5678/single becomes http://example.com/api/v1/summary/XXXX/single
Notice that I am not replacing 1 from /api/v1
So far, I only have the following, which seems to work in most cases:
input.replaceAll("/[\\d]+$", "/XXXXX").replaceAll("/[\\d]+/", "/XXXXX/");
But this has 2 problems:
The replacement size doesn't match with the original string length.
The replacement character is hardcoded.
Is there a better way to do this?
In Java you can use:
str = str.replaceAll("(/|(?!^)\\G)\\d(?=\\d*(?:/|$))", "$1X");
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
(/|(?!^)\\G): Match / or end of the previous match (but not at start) in capture group #1
\\d: Match a digit
(?=\\d*(?:/|$)): Ensure that digits are followed by a / or end.
Replacement: $1X: replace it with capture group #1 followed by X
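The pattern described above can be exercised against the inputs from the question. This is only a sketch; the class and method names are made up for illustration:

```java
final class MaskDigits {
    // Replaces each digit of a purely numeric path segment with X.
    // "/" starts a segment, and (?!^)\G continues from the previous match,
    // so the digits of one segment are replaced one at a time.
    static String mask(String input) {
        return input.replaceAll("(/|(?!^)\\G)\\d(?=\\d*(?:/|$))", "$1X");
    }

    public static void main(String[] args) {
        System.out.println(mask("/users/12345"));          // /users/XXXXX
        System.out.println(mask("/api/v1/summary/5678"));  // /api/v1/summary/XXXX
    }
}
```

The 1 in /api/v1 is left alone because it is preceded by v, not by / or by a previous match.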
Not a Java guy here, but the idea should be transferable. Just capture a /, the digits, and an optional /, count the length of the second group, and put it back again.
So
(/)(\d+)(/?)
becomes
$1XYZ$3
See a demo on regex101.com and this answer for a lambda equivalent to e.g. Python or PHP.
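In Java, one way to sketch the "count the length of the group" idea is Matcher.replaceAll with a replacement function (Java 9+) together with String.repeat (Java 11+). The class name, the mask parameter, and the use of a lookahead instead of the optional third group are assumptions of this sketch, not part of the answer above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaskByLength {
    // A slash, then a run of digits that must be followed by a slash or the end of input.
    // The lookahead does not consume the trailing slash, so adjacent numeric
    // segments such as /12/34 are both matched.
    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("(/)(\\d+)(?=/|$)");

    static String mask(String input, String maskChar) {
        return NUMERIC_SEGMENT.matcher(input).replaceAll(match ->
                // quoteReplacement keeps mask characters like $ from being read as group references
                Matcher.quoteReplacement(match.group(1) + maskChar.repeat(match.group(2).length())));
    }

    public static void main(String[] args) {
        System.out.println(mask("/users/234567/summary", "X")); // /users/XXXXXX/summary
        System.out.println(mask("/api/v1/summary/5678", "*"));  // /api/v1/summary/****
    }
}
```

This addresses both concerns from the question: the replacement length follows the matched digits, and the mask character is a parameter rather than a hardcoded literal.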
First of all, you need something like this:
String new_s1 = s3.replaceAll("(\\/)(\\d)+(\\/)?", "$1XXXXX$3");

Replace all non numeric characters by only one word

I want to do the next replacement:
WORD1234 -> W1234
So, I'm using the regex:
([^\d]*)([0-9]+)([^\d]*)
Replacement: W$2
If the word is WORD1234AAAAA, using the previous regex I have the same result: W1234, which is what I want.
But if the word is WO12RD34 the result I have is: W12W34
What I want basically in all the cases is to remove all non-numeric characters and add the letter W at the beginning.
Update:
The input string does not always start with a W. It can be for example ABC12DE34 and the desired result is: FA1234. Meaning, remove all non-numeric characters and add a word at the beginning.
Try this:
String regex = "(?<start>^W)|(\\D)";
String replacement = "${start}";
System.out.println("WO12RD34".replaceAll(regex, replacement)); //prints W1234
System.out.println("WORD1234AAAAA".replaceAll(regex, replacement)); //prints W1234
With this regex, the "start" capturing group will only be set when the first character is matched. Otherwise, it will be empty.
The idea is that, when W at the start of the string is matched, the named group "start" captures that W, and the replacement ${start} puts it back, so the W is effectively replaced with itself.
Otherwise, when any non-digit character is matched, then the start pattern will not be set (and be empty). Then, also replace the non-digit character with nothing.
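For the updated requirement, where the input does not necessarily start with W, a hedged alternative is to strip every non-digit and prepend the prefix in plain Java. This is a different technique from the named-group regex above, and the class and method names are hypothetical:

```java
final class DigitsWithPrefix {
    // Removes every non-digit character, then prepends the given prefix.
    static String withPrefix(String input, String prefix) {
        return prefix + input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(withPrefix("WO12RD34", "W"));   // W1234
        System.out.println(withPrefix("ABC12DE34", "FA")); // FA1234
    }
}
```

The prefix is added outside the regex, so it does not depend on what the input starts with.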

Modifying part of a regex in replaceAll call

I am trying to format a string with a regex as follows:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0[^0-9]","");
What I think will happen is the string will become:
5.07+122.14 //the regex will delete the .0+ next to the 12
How can I create the regex so that it deletes only the .0 not the + sign?
I would prefer to do everything in the same call to "replaceAll"
thanks for any suggestions
Matched characters will be replaced. So, instead of matching the non-digit at the end, you can use lookahead, which will perform the desired check but won't consume any characters. Also, the shorthand for a non-digit is \D, which is a bit nicer to read than [^0-9]:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0(?=\\D)","");
If you want to replace all trailing zeros (for example, replace 5.00 with 5 instead of 50, which you probably don't want), then repeat the 0 one or more times with + to ensure that all zeros after the decimal point get replaced:
String string = "5.07+12.000+2.14";
string = string.replaceAll("\\.0+(?=\\D)","");
If the string never contains alphabetical or underscore _ characters (those and numeric characters count as word characters), then you can make it even prettier with a word boundary instead of a lookahead. A word boundary, as it sounds, will match a position with a word character on one side and a non-word character on the other side, with \b:
string = string.replaceAll("\\.0+\\b","");
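The variants above differ in one case worth checking: a .0 at the very end of the string. The sketch below compares them on a made-up input. The helper names are illustrative, and the third variant, a negative lookahead (?!\d), is an addition of this sketch rather than part of the answer:

```java
final class DecimalZeros {
    // Requires a non-digit character after the zeros, so it cannot match at end of input.
    static String withLookahead(String s) {
        return s.replaceAll("\\.0+(?=\\D)", "");
    }

    // A word boundary also exists between the last 0 and the end of input.
    static String withBoundary(String s) {
        return s.replaceAll("\\.0+\\b", "");
    }

    // "Not followed by a digit" is true both before a non-digit and at end of input.
    static String withNegativeLookahead(String s) {
        return s.replaceAll("\\.0+(?!\\d)", "");
    }

    public static void main(String[] args) {
        String expr = "5.07+12.000+3.0";
        System.out.println(withLookahead(expr));         // 5.07+12+3.0
        System.out.println(withBoundary(expr));          // 5.07+12+3
        System.out.println(withNegativeLookahead(expr)); // 5.07+12+3
    }
}
```

In all three variants 5.07 is untouched, because the 0 after the decimal point is followed by another digit.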

Merge multiple regex in Java

I have written a regex to omit the characters after the first occurrence of some characters (, and #)
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to get only numbers (remove spaces and other non numeric characters)
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regexes into one, like piping the output of the first regex into the second?
You can combine both regex in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
So you need to remove all symbols other than digits and the whole rest of the string after the first hash symbol or a comma.
You cannot just concatenate the patterns with the | operator because one of the patterns is anchored implicitly at the end of the string.
Exclude hashes and commas, as well as digits, from the negated character class: the regex engine processes the string from left to right, so otherwise the second alternative would consume the , or # before the first alternative could match it together with the text that follows. Then add the alternative that matches a comma or hash with any text after it. Use the DOTALL modifier in case the input contains newline characters.
Use
(?s)[,#].*$|[^#,0-9]+
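A short sketch of the merged pattern applied in a single replaceAll call (the class and method names are illustrative):

```java
final class ExtractDigits {
    // First alternative: a comma or hash plus everything after it (DOTALL lets . cross newlines).
    // Second alternative: any run of characters that are not digits, commas, or hashes.
    static String digitsBeforeSeparator(String input) {
        return input.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsBeforeSeparator("(123) (456) (7890)#123")); // 1234567890
        System.out.println(digitsBeforeSeparator("12,34#56"));               // 12
    }
}
```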

regex to strip leading zeros treated as string

I have numbers like this that need leading zero's removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)
You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string with an empty string. The + quantifier means one or more (similarly, the * quantifier means zero or more), and the caret ^ anchors the match at the beginning of the string.
The accepted solution will fail if you need to get "0" from "00". This is the right one:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) means match one or more zeros if it is not followed by end of string.
Thank you to the commenter - I have updated the formula to match the description from the author.
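To make the difference between the two patterns concrete, here is a small sketch comparing them on all-zero and ordinary inputs (the helper names are made up for this example):

```java
final class LeadingZeros {
    // Removes every leading zero, so an all-zero input becomes empty.
    static String stripAll(String s) {
        return s.replaceAll("^0+", "");
    }

    // The negative lookahead forbids the match from ending at the end of the
    // string, so the engine backtracks and leaves the final zero in place.
    static String stripKeepLast(String s) {
        return s.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAll("00").isEmpty());       // true
        System.out.println(stripKeepLast("00"));            // 0
        System.out.println(stripKeepLast("0003030435243")); // 3030435243
    }
}
```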
If you know the input strings contain only digits, you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
By converting to Long it will automatically strip off all leading zeroes.
Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of a line
0+ - match the zero digit character one or more times
An exhaustive list of pattern constructs can be found in the Pattern class documentation.
\b0+\B will do the job. \b anchors the match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end at a position that is not a word boundary (so the last 0 is not removed when the input is only 00...000).
The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex matches one or more 0 characters at the beginning of the string.
There is no reason to worry about using the replaceAll method: the regex contains the ^ (beginning of input) special character, which ensures the replacement happens only once.
Alternatively, you can use Java's built-in number parsing to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
// number is 4334300343
The leading zeros will be stripped for you automatically.
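One caveat with the parsing approach: Long.parseLong throws a NumberFormatException when the value does not fit in a long or the string contains non-digit characters. For digit strings of arbitrary length, a hedged alternative is java.math.BigInteger. The class and method names below are illustrative:

```java
import java.math.BigInteger;

final class ParseLeadingZeros {
    // Parses a digits-only string of any length and prints it back without leading zeros.
    // Like Long.parseLong, this throws NumberFormatException on non-numeric input.
    static String normalize(String digits) {
        return new BigInteger(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("00000004334300343"));                 // 4334300343
        System.out.println(normalize("000123456789012345678901234567890")); // 123456789012345678901234567890
        System.out.println(normalize("000"));                               // 0
    }
}
```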
I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
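The three cases described above can be checked with a short sketch. It assumes a digits-only input, and the class and method names are made up:

```java
final class OptionalZeroGroup {
    // Group 1 (optional) takes the leading zeros; group 2 must keep at least one digit,
    // so an all-zero input backtracks until exactly one zero is left for group 2.
    static String strip(String s) {
        return s.replaceAll("(^0+)?(\\d+)", "$2");
    }

    public static void main(String[] args) {
        System.out.println(strip("000"));               // 0
        System.out.println(strip("00000004334300343")); // 4334300343
        System.out.println(strip("4334300343"));        // 4334300343
    }
}
```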
Here is a simple solution in JavaScript (note that this is not Java syntax):
str = str.replaceAll(/^0+/g, "");
In JavaScript, the global flag g is required when calling replaceAll with a regex.
