Merge multiple regex in Java

I have written a regex to omit the characters after the first occurrence of some characters (, and #)
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to get only numbers (remove spaces and other non numeric characters)
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regexes into one, like piping the output of the first regex into the second?

You can combine both regex in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");

So you need to remove every character other than digits, plus the whole rest of the string after the first hash symbol or comma.
You cannot just concatenate the two patterns with the | operator, because one of them is implicitly anchored at the end of the string.
Since the regex engine processes the string from left to right, the "non-digit" alternative has to leave hashes and commas alone as well as digits; then you can add an alternative that matches a comma or hash together with all the text after it. Use the DOTALL modifier, (?s), in case your input contains newline characters.
Use
 (?s)[,#].*$|[^#,0-9]+
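As a quick sanity check, the sketch below runs the chained two-step replacement and the single combined pattern on the sample input. The class and method names are made up for illustration; only the two patterns come from the answers above.

```java
class MergedRegexDemo {
    // Two-step version from the question: cut at the first ',' or '#',
    // then strip everything that is not a digit.
    static String twoStep(String s) {
        return s.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
    }

    // Single-pattern version: the first alternative removes the tail starting
    // at ',' or '#', the second removes any other non-digit run.
    static String oneStep(String s) {
        return s.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        String number = "(123) (456) (7890)#123";
        System.out.println(twoStep(number)); // 1234567890
        System.out.println(oneStep(number)); // 1234567890
    }
}
```

Both methods agree on the sample input. They can differ on multi-line input, because only the combined pattern uses (?s).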

Related

Remove leading zeros from a string using regex

I am learning regex and java and working on a problem related to that.
I have an input string which can be any dollar amount like
$123,456.78
$0012,345.67
$123,04.56
$123,45.06
it also could be
$0.1
I am trying to find if the dollar amount has leading zeros and trying to remove it.
so far I have tried this
string result = input_string.replaceAll(("[!^0]+)" , "");
But I guess I'm doing something wrong.
I just want to remove the leading zeros, not the ones between the amount part and not the one in cents. And if the amount is $0.1, I don't want to remove it.
Match zeroes or commas that are preceded by a dollar sign and followed by a digit:
str = str.replaceAll("(?<=\\$)[0,]+(?=\\d)", "");
This covers the edge cases:
$001.23 -> $1.23
$000.12 -> $0.12
$00,123.45 -> $123.45
$0,000,000.12 -> $0.12
The regex:
(?<=\\$) means the preceding character is a dollar sign
(?=\\d) means the following character is a digit
[0,]+ means one or more zeroes or commas
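To see the lookbehind and lookahead at work, here is a minimal sketch that applies the pattern to the amounts from the question. The class and method names are hypothetical.

```java
class LeadingZeroDemo {
    // Removes zeros/commas that sit directly after '$' and before a digit.
    static String stripLeadingZeros(String amount) {
        return amount.replaceAll("(?<=\\$)[0,]+(?=\\d)", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "$123,456.78", "$0012,345.67", "$123,04.56", "$0.1", "$000.12"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + stripLeadingZeros(s));
        }
    }
}
```

Amounts without leading zeros, and $0.1, come back unchanged because the run of zeros must be followed by a digit.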
To handle any currency symbol:
str = str.replaceAll("(?<=[^\\d,.])[0,]+(?=\\d)", "");
You can use
String result = input_string.replaceFirst("(?<=\\p{Sc})[0,]+(?=\\d)", "");
Details:
(?<=\p{Sc}) - immediately on the left, there must be any currency char
[0,]+ - one or more zeros or commas
(?=\d) - immediately on the right, there must be any one digit.
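A small sketch of the \p{Sc} variant with non-dollar symbols. The sample amounts and the class name are assumptions for illustration; the currency symbols are written as Unicode escapes so the source stays ASCII.

```java
class CurrencyZeroDemo {
    // \p{Sc} matches any Unicode currency symbol ($, euro, pound, ...).
    static String stripLeadingZeros(String amount) {
        return amount.replaceFirst("(?<=\\p{Sc})[0,]+(?=\\d)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("\u20AC007.50"));   // euro sign, then 7.50
        System.out.println(stripLeadingZeros("\u00A30,001.00")); // pound sign, then 1.00
        System.out.println(stripLeadingZeros("$0.1"));           // $0.1
    }
}
```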
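This is a simple regex that fits your needs. The $ in the replacement must be escaped, because an unescaped $ starts a group reference in replaceAll and a lone "$" throws an IllegalArgumentException. Note that it does not handle a single leading zero such as $012.34:
str = str.replaceAll("^[$][,0]+0(?![.])", "\\$");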

Modifying part of a regex in replaceAll call

I am trying to format a string with a regex as follows:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0[^0-9]","");
What I think will happen is the string will become:
5.07+122.14 //the regex will delete the .0+ next to the 12
How can I create the regex so that it deletes only the .0 not the + sign?
I would prefer to do everything in the same call to "replaceAll"
thanks for any suggestions
Matched characters will be replaced. So, instead of matching the non-digit at the end, you can use lookahead, which will perform the desired check but won't consume any characters. Also, the shorthand for a non-digit is \D, which is a bit nicer to read than [^0-9]:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0(?=\\D)","");
If you want to replace all trailing zeros (for example, replace 5.00 with 5 instead of 50, which you probably don't want), then repeat the 0 one or more times with + to ensure that all zeros after the decimal point get replaced:
String string = "5.07+12.000+2.14";
string = string.replaceAll("\\.0+(?=\\D)","");
If the string never contains alphabetical or underscore _ characters (those and numeric characters count as word characters), then you can make it even prettier with a word boundary instead of a lookahead. A word boundary, as it sounds, will match a position with a word character on one side and a non-word character on the other side, with \b:
string = string.replaceAll("\\.0+\\b","");
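The two variants behave the same on the sample string but differ at the end of the input, which the following sketch tries to show. The class and method names are illustrative.

```java
class TrailingZeroDemo {
    // Lookahead version: needs a non-digit character after the zeros.
    static String withLookahead(String expr) {
        return expr.replaceAll("\\.0+(?=\\D)", "");
    }

    // Word-boundary version: also matches when the zeros end the string.
    static String withWordBoundary(String expr) {
        return expr.replaceAll("\\.0+\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("5.07+12.000+2.14"));    // 5.07+12+2.14
        System.out.println(withWordBoundary("5.07+12.000+2.14")); // 5.07+12+2.14
        System.out.println(withLookahead("2.5+3.0"));             // 2.5+3.0
        System.out.println(withWordBoundary("2.5+3.0"));          // 2.5+3
    }
}
```

(?=\\D) requires an actual character to follow, so a trailing .0 at the very end of the string is left in place, while \\b also matches there.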

Regular expression to remove unwanted characters from the String

I have a requirement where I need to remove unwanted characters from a String in Java.
For example,
Input String is
Income ......................4,456
liability........................56,445.99
I want the output as
Income 4,456
liability 56,445.99
What is the best approach to write this in Java? I am parsing large documents with it, so it should be optimized for performance.
You can do this replace with this line of code:
System.out.println("asdfadf ..........34,4234.34".replaceAll("[ ]*\\.{2,}"," "));
For this particular example, I might use the following replacement:
String input = "Income ......................4,456";
input = input.replaceAll("(\\w+)\\s*\\.+(.*)", "$1 $2");
System.out.println(input);
Here is an explanation of the pattern being used:
(\\w+) match AND capture one or more word characters
\\s* match zero or more whitespace characters
\\.+ match one or more literal dots
(.*) match AND capture the rest of the line
The two quantities in parentheses are known as capture groups. The regex engine remembers what these were while matching, and makes them available, in order, as $1 and $2 to use in the replacement string.
Output:
Income 4,456
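Since the question mentions large documents, one possible refinement is to compile the pattern once and reuse it, because String.replaceAll compiles its regex on every call. The sketch below is an assumption about how that could look; the class, field, and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DotLeaderDemo {
    // Same capture-group pattern as above, compiled once and reused.
    private static final Pattern LEADER =
            Pattern.compile("(\\w+)\\s*\\.+(.*)");

    // Rebuilds the line as "<label> <value>" using groups 1 and 2.
    static String collapseLeader(String line) {
        return LEADER.matcher(line).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(collapseLeader("Income ......................4,456"));
        System.out.println(collapseLeader("liability........................56,445.99"));
    }
}
```

Whether this makes a measurable difference depends on how many lines are processed, so it is worth profiling before relying on it.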
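Another way to do it is:
String result = yourString.replaceAll("[-+.^:,]","");
That replaces each of the listed special characters with nothing. Note that it also strips the commas and decimal points inside the numbers, so 56,445.99 becomes 5644599.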

Java regex negative lookahead to replace non-triple characters

I'm trying to take a number, convert it into a string and replace all characters that are not a triple.
Eg. if I pass in 1222331 my replace method should return 222. I can find that this pattern exists but I need to get the value and save it into a string for additional logic. I don't want to do a for loop to iterate through this string.
I have the following code:
String first = Integer.toString(num1);
String x = first.replaceAll("^((?!([0-9])\\3{2})).*$","");
But it's replacing the triple digits also. I only need it to replace the rest of the characters. Is my approach wrong?
You can use
first = first.replaceAll("((\\d)\\2{2})|\\d", "$1");
The regex - ((\d)\2{2})|\d - matches either a digit that repeats thrice (and captures it into Group 1), or just matches any other digit. $1 just restores the captured text in the resulting string while removing all others.
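A minimal runnable sketch of this replacement, wrapped in a hypothetical helper that takes the number as an int as in the question:

```java
class TripleDigitDemo {
    // Keeps every run of three identical digits and drops all other digits.
    static String keepTriples(int num) {
        String first = Integer.toString(num);
        // Group 1 is only set when a triple matched; for a lone digit it is
        // unset, so "$1" inserts nothing and the digit disappears.
        return first.replaceAll("((\\d)\\2{2})|\\d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepTriples(1222331)); // 222
        System.out.println(keepTriples(111222));  // 111222
    }
}
```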

Find substring between two strings

I have a string such as:
file = "UserTemplate324.txt"
I'd like to extract "324". How can I do this without any external libraries like Apache StringUtils?
Assuming you want "the digits just before the dot":
String number = file.replaceAll(".*?(\\d+)\\..*", "$1");
This uses regex to find and capture the digits and replace the entire input with them.
Of minor note is the use of a non-greedy quantifier to consume the minimum leading input (so as not to consume the leading part of the numbers too).
If you want to exclude the non-digit characters:
String number = file.replaceAll("\\D+", "");
\\D+ means a series of one or more non-digit characters (anything other than 0-9). Replacing each such series with "", i.e. nothing, leaves only the digits.
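An alternative to replaceAll, sketched here as an assumption rather than as part of the answers above, is to use Matcher.find and read the capture group directly. That makes the "no digits found" case explicit, whereas the replaceAll approach returns the whole input unchanged when nothing matches. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractDigitsDemo {
    // One or more digits immediately followed by a literal dot.
    private static final Pattern DIGITS_BEFORE_DOT = Pattern.compile("(\\d+)\\.");

    // Returns the digits before the first such dot, or "" if there are none.
    static String digitsBeforeDot(String name) {
        Matcher m = DIGITS_BEFORE_DOT.matcher(name);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(digitsBeforeDot("UserTemplate324.txt")); // 324
        System.out.println(digitsBeforeDot("readme.txt").isEmpty()); // true
    }
}
```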
