I am learning regex and java and working on a problem related to that.
I have an input string which can be any dollar amount like
$123,456.78
$0012,345.67
$123,04.56
$123,45.06
it also could be
$0.1
I am trying to detect whether the dollar amount has leading zeros and, if so, remove them.
So far I have tried this:
string result = input_string.replaceAll(("[!^0]+)" , "");
But I guess I'm doing something wrong.
I just want to remove the leading zeros, not the ones between the amount part and not the one in cents. And if the amount is $0.1, I don't want to remove it.
Match zeroes or commas that are preceded by a dollar sign and followed by a digit:
str = str.replaceAll("(?<=\\$)[0,]+(?=\\d)", "");
This covers the edge cases:
$001.23 -> $1.23
$000.12 -> $0.12
$00,123.45 -> $123.45
$0,000,000.12 -> $0.12
The regex:
(?<=\\$) means the preceding character is a dollar sign
(?=\\d) means the following character is a digit
[0,]+ means one or more zeroes or commas
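To check the pattern against the inputs from the question, a small harness like the following can help. It is only a sketch: the class name LeadingZeroDemo and the helper stripLeadingZeros are made up for illustration.

```java
class LeadingZeroDemo {
    // Hypothetical helper: removes zeros and commas that directly follow "$"
    // and are themselves followed by a digit.
    static String stripLeadingZeros(String amount) {
        return amount.replaceAll("(?<=\\$)[0,]+(?=\\d)", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "$123,456.78", "$0012,345.67", "$123,04.56", "$0.1", "$0,000,000.12"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + stripLeadingZeros(in));
        }
    }
}
```

Amounts without leading zeros, and $0.1, come back unchanged because the lookahead requires a digit right after the removed run.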
To handle any currency symbol:
str = str.replaceAll("(?<=[^\\d,.])[0,]+(?=\\d)", "");
You can use
String result = input_string.replaceFirst("(?<=\\p{Sc})[0,]+(?=\\d)", "");
Details:
(?<=\p{Sc}) - immediately on the left, there must be any currency char
[0,]+ - one or more zeros or commas
(?=\d) - immediately on the right, there must be any one digit.
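As a rough illustration of the \p{Sc} variant with symbols other than the dollar sign, here is a minimal sketch. The class and method names are hypothetical, and the euro and pound signs are written as Unicode escapes only to keep the source file ASCII-safe.

```java
class CurrencyZeroDemo {
    // Hypothetical helper: strips zeros/commas that directly follow
    // any Unicode currency symbol and precede a digit.
    static String stripLeadingZeros(String amount) {
        return amount.replaceFirst("(?<=\\p{Sc})[0,]+(?=\\d)", "");
    }

    public static void main(String[] args) {
        // \u20AC is the euro sign, \u00A3 is the pound sign.
        String[] inputs = { "\u20AC0012,345.67", "\u00A30.50", "$007" };
        for (String in : inputs) {
            System.out.println(in + " -> " + stripLeadingZeros(in));
        }
    }
}
```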
This is a simple regex that fits your needs:
str = str.replaceAll("^[$][,0]+0(?![.])", "\\$");
I am facing problems when trying to split a String by "..."
String text ="Here…It is safer.";
I tried:
String [] output = text.split("[\\...]");
String [] output = text.split("\\.");
and many others, but I haven't found the solution yet.
I know that the question is very simple, but I will be happy if somebody can explain how to make it work.
Regex for matching three dots is \\.{3} or \\.\\.\\. or [.][.][.] or \\Q...\\E.
Both [\\...] and \\. match a single dot, because repeated characters inside a character class are treated as a single character.
Horizontal ellipsis is a different character. It is not a metacharacter in regex language, so it can be matched directly with no escaping:
String [] output = text.split("…");
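To see the difference between these patterns, a small comparison can be run. This is a sketch only: DotSplitDemo and its show helper are invented names, and the ellipsis is written as the escape \u2026 to avoid source-encoding problems.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DotSplitDemo {
    // Hypothetical helper: splits text by regex and renders the pieces.
    static String show(String text, String regex) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        String dots = "a...b";
        // Three literal dots as one delimiter.
        System.out.println(show(dots, "\\.{3}"));
        // Pattern.quote produces the \Q...\E form.
        System.out.println(show(dots, Pattern.quote("...")));
        // A single-dot pattern splits at every dot, leaving empty pieces.
        System.out.println(show(dots, "\\."));
        // The horizontal ellipsis is one character and needs no escaping.
        System.out.println(show("Here\u2026It is safer.", "\u2026"));
    }
}
```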
In general, you can use
String[] chunks = text.split("…|\\.{3}");
To also remove the enclosing whitespace:
String[] chunks = text.split("\\s*(?:…|\\.{3})\\s*");
If you need to make sure the triple dots are NOT enclosed with other dot chars, you can add lookarounds:
String[] chunks = text.split("\\s*(?:…|(?<!\\.)\\.{3}(?!\\.))\\s*");
Details:
\s* - zero or more whitespaces
(?:...) - a non-capturing group
… - an ellipsis
| - or
(?<!\.) - a negative lookbehind that fails the match if there is a dot char immediately to the left of the current location
\.{3} - triple dots
(?!\.) - a negative lookahead that fails the match if there is a dot char immediately to the right of the current location.
See a Java demo:
String text = "Here…It is safer... The end.";
String[] chunks = text.split("\\s*(?:…|\\.{3})\\s*");
System.out.println(Arrays.toString(chunks));
// => [Here, It is safer, The end.]
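The effect of the lookarounds only shows up when more than three dots appear in a row. The following sketch compares the two patterns on such an input; the class name, constants, and show helper are hypothetical.

```java
import java.util.Arrays;

class EllipsisBoundaryDemo {
    // Pattern without lookarounds: any run of three dots is a delimiter.
    static final String PLAIN = "\\s*(?:\u2026|\\.{3})\\s*";
    // Pattern with lookarounds: exactly three dots, not part of a longer run.
    static final String STRICT = "\\s*(?:\u2026|(?<!\\.)\\.{3}(?!\\.))\\s*";

    // Hypothetical helper: splits text by regex and renders the pieces.
    static String show(String text, String regex) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        String fourDots = "a....b";
        System.out.println(show(fourDots, PLAIN));   // splits inside the run
        System.out.println(show(fourDots, STRICT));  // leaves the run alone
        System.out.println(show("Wait ... what", STRICT));
    }
}
```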
Regex for multiple dots would be:
(\.)*
Java would require something like this, if I remember correctly:
(\\.)*
Edit: Just noticed you asked for triple dot only. Since there is a correct answer already I'm going to leave this here just in case.
I have written a regex to omit the characters after the first occurrence of some characters (, and #)
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to get only numbers (remove spaces and other non numeric characters)
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regex into one like piping the O/p from first regex to the second.
You can combine both regex in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
So you need to remove all symbols other than digits and the whole rest of the string after the first hash symbol or a comma.
You cannot just concatenate the two patterns with the | operator, because one of them is implicitly anchored at the end of the string.
Since the regex engine processes the string from left to right, the branch that removes non-digits must leave hashes and commas alone; then you can add an alternative that matches a comma or hash together with all the text after it. Use the DOTALL modifier in case your input contains newline characters.
Use
(?s)[,#].*$|[^#,0-9]+
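Applied in Java, the combined pattern might be used as in this sketch. The class and method names are assumptions for illustration; the backslash-free pattern needs no extra escaping in a string literal.

```java
class MergedRegexDemo {
    // Hypothetical helper: drops everything from the first ',' or '#' onward,
    // and every non-digit before that point, in a single pass.
    static String digitsBeforeMarker(String s) {
        return s.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        String number = "(123) (456) (7890)#123";
        System.out.println(digitsBeforeMarker(number)); // 1234567890
    }
}
```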
I have numbers like this that need their leading zeros removed.
Here is what I need:
00000004334300343 -> 4334300343
0003030435243 -> 3030435243
I can't figure this out as I'm new to regular expressions. This does not work:
(^0)
You're almost there. You just need a quantifier:
str = str.replaceAll("^0+", "");
It replaces one or more occurrences of 0 at the beginning of the string with an empty string. The + quantifier means one or more (similarly, * means zero or more), and the caret ^ anchors the match to the start of the string.
Accepted solution will fail if you need to get "0" from "00". This is the right one:
str = str.replaceAll("^0+(?!$)", "");
^0+(?!$) means match one or more zeros if it is not followed by end of string.
Thank you to the commenter - I have updated the formula to match the description from the author.
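A short sketch showing how ^0+(?!$) behaves on all-zero input, compared with ordinary input. The class and method names are illustrative only.

```java
class StripZerosDemo {
    // Hypothetical helper: removes leading zeros but keeps a final "0"
    // when the whole input consists of zeros.
    static String stripLeadingZeros(String digits) {
        return digits.replaceAll("^0+(?!$)", "");
    }

    public static void main(String[] args) {
        String[] inputs = { "00000004334300343", "0003030435243", "00", "0", "100" };
        for (String in : inputs) {
            System.out.println(in + " -> " + stripLeadingZeros(in));
        }
    }
}
```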
If you know input strings are all containing digits then you can do:
String s = "00000004334300343";
System.out.println(Long.valueOf(s));
// 4334300343
By converting to Long it will automatically strip off all leading zeroes.
Another solution (might be more intuitive to read)
str = str.replaceFirst("^0+", "");
^ - match the beginning of a line
0+ - match the zero digit character one or more times
An exhaustive list of pattern constructs can be found in the Pattern class documentation.
\b0+\B will do the work. \b anchors the match to a word boundary, 0+ matches a sequence of one or more zeros, and \B requires the match to end at a position that is not a word boundary (so the last 0 is kept when the input is only 00...000).
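Because this pattern is anchored on word boundaries rather than on the start of the string, it also works on numbers embedded in longer text. A minimal sketch, with hypothetical class and method names:

```java
class WordBoundaryZerosDemo {
    // Hypothetical helper: strips leading zeros from every number in the text,
    // keeping one zero when a number consists only of zeros.
    static String strip(String text) {
        return text.replaceAll("\\b0+\\B", "");
    }

    public static void main(String[] args) {
        String[] inputs = { "0003030435243", "000", "id 007 and 000", "100" };
        for (String in : inputs) {
            System.out.println(in + " -> " + strip(in));
        }
    }
}
```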
The correct regex to strip leading zeros is
str = str.replaceAll("^0+", "");
This regex matches one or more 0 characters at the beginning of the string.
There is no reason to worry about using the replaceAll method: the regex contains the ^ (beginning of input) anchor, which ensures the replacement happens only once.
Alternatively, you can use a built-in Java feature to do the same:
String str = "00000004334300343";
long number = Long.parseLong(str);
// outputs 4334300343
The leading zeros will be stripped for you automatically.
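One caveat with the numeric approach is that Long.parseLong throws NumberFormatException when the value does not fit in a long. As a hedged alternative for arbitrarily long digit strings, java.math.BigInteger can be used instead; the class and helper names below are made up for this sketch, and the input is assumed to contain digits only.

```java
import java.math.BigInteger;

class NumericParseDemo {
    // Hypothetical helper: normalizes a digit string of any length.
    static String viaBigInteger(String digits) {
        return new BigInteger(digits).toString();
    }

    // Hypothetical helper: reports whether Long.parseLong accepts the input.
    static boolean fitsInLong(String digits) {
        try {
            Long.parseLong(digits);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String small = "00000004334300343";
        String huge = "00012345678901234567890"; // value exceeds Long.MAX_VALUE
        System.out.println(fitsInLong(small) + " " + viaBigInteger(small));
        System.out.println(fitsInLong(huge) + " " + viaBigInteger(huge));
    }
}
```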
I know this is an old question, but I think the best way to do this is actually
str = str.replaceAll("(^0+)?(\\d+)", "$2");
The reason I suggest this is because it splits the string into two groups. The second group is at least one digit. The first group matches 1 or more zeros at the start of the line. However, the first group is optional, meaning that if there are no leading zeros, you just get all of the digits. And, if str is only a zero, you get exactly one zero (because the second group must match at least one digit).
So if it's any number of 0s, you get back exactly one zero. If it starts with any number of 0s followed by any other digit, you get no leading zeros. If it starts with any other digit, you get back exactly what you had in the first place.
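The three cases described above can be checked with a short sketch like this one (class and method names are illustrative, not part of the original answer):

```java
class OptionalGroupDemo {
    // Hypothetical helper: group 1 is the optional run of leading zeros,
    // group 2 is the remaining digits (at least one), which is what is kept.
    static String normalize(String digits) {
        return digits.replaceAll("(^0+)?(\\d+)", "$2");
    }

    public static void main(String[] args) {
        String[] inputs = { "000", "00123", "123", "0" };
        for (String in : inputs) {
            System.out.println(in + " -> " + normalize(in));
        }
    }
}
```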
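Here is a simple solution in JavaScript syntax (note that this is not valid Java):
str = str.replaceAll(/^0+/g, "");
The global flag g is required when calling JavaScript's replaceAll with a regex. Java's String.replaceAll takes the pattern as a string instead and has no such flag.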
This is what I am after:
Replace all characters that are not digits and not semicolon ; with nothing: "".
Numbers must be at least 5 digits long.
Trim leading and trailing semicolon ;
So:
567834 is valid
123456;654321;3456789 is valid
123;456 is not valid(too short numbers), will be replaced with empty string ""
;123456; will be trimmed to 123456
;567890 will be trimmed to 567890
456789; will be trimmed to 456789
I was thinking of using replaceAll method to do the work.
str.replaceAll("(\\d+\\;?)*\\d+", "");
But this doesn't take care of trimming leading and trailing semicolons and doesn't replace too short numbers with "".
Any help is appreciated!
I'd recommend breaking the problem into steps. This is an easy problem if you do. A single regex will be challenging, both to develop today and to read for every day after. Readable, easily understandable code should be your objective.
String trimmedStr = str.trim();
String noSemicolons = trimmedStr.replaceAll(";", "");
Matcher matcher = Pattern.compile("^\\d{5,}$").matcher(noSemicolons);
boolean isValid = matcher.matches();
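One possible way to carry the step-by-step idea through to the output the question asks for is sketched below. It assumes that each too-short number is dropped individually (the question does not say what should happen to a mix of long and short numbers), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SemicolonNumbersDemo {
    static String clean(String input) {
        // Step 1: drop everything that is neither a digit nor a semicolon.
        String filtered = input.replaceAll("[^0-9;]", "");
        // Step 2: split on semicolons, keep numbers with at least 5 digits,
        // and rejoin them. Empty pieces from leading, trailing, or repeated
        // semicolons fail the length check, so the result is trimmed as well.
        return Arrays.stream(filtered.split(";"))
                .filter(part -> part.length() >= 5)
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        String[] inputs = { "567834", "123456;654321;3456789", "123;456", ";123456;" };
        for (String in : inputs) {
            System.out.println("\"" + in + "\" -> \"" + clean(in) + "\"");
        }
    }
}
```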
You can use:
String repl = input.replaceAll(";?\\b(\\d{5,})\\b;?|[\\d;]*", "$1");
You can use this replacement:
String result = input.replaceAll("(\\d{5,})|\\d{1,4}(?:;+|\\z)|;+\\d{0,4}\\z|\\A;", "$1");
The idea is to preserve numbers with at least 5 digits first in a capture group (because the first branch on the left that succeeds wins). The other branches describe what you need to remove.
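To see the "first branch wins" behavior on the examples from the question, a sketch like the following can be used. The class and method names are invented for illustration, and the input is assumed to contain only digits and semicolons already.

```java
class KeepLongNumbersDemo {
    // Hypothetical helper: branch 1 captures numbers of 5+ digits, which "$1"
    // puts back; the other branches match short numbers and stray semicolons,
    // for which group 1 is unset and therefore replaced by nothing.
    static String clean(String input) {
        return input.replaceAll(
                "(\\d{5,})|\\d{1,4}(?:;+|\\z)|;+\\d{0,4}\\z|\\A;", "$1");
    }

    public static void main(String[] args) {
        String[] inputs = { "123456;654321;3456789", "123;456", ";123456;", "456789;" };
        for (String in : inputs) {
            System.out.println("\"" + in + "\" -> \"" + clean(in) + "\"");
        }
    }
}
```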
Another way:
String result = input.replaceAll("((?:\\d{5,}(?:;(?!\\z))?)*+)(?:;*\\d{0,4}(?:;+|\\z))++", "$1");
This one describes the string as a succession of parts to remove preceded by an optional part to preserve.
I have a string such as:
file = "UserTemplate324.txt"
I'd like to extract "324". How can I do this without any external libraries like Apache StringUtils?
Assuming you want "the digits just before the dot":
String number = str.replaceAll(".*?(\\d+)\\..*", "$1");
This uses regex to find and capture the digits and replace the entire input with them.
Of minor note is the use of a non-greedy quantifier to consume the minimum leading input (so as not to consume the leading part of the numbers too).
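One thing to keep in mind with the replaceAll approach is that it returns the input unchanged when nothing matches. If that matters, an explicit Matcher makes the "no digits found" case visible. The following is a sketch under that assumption; the class name, constant, and extract method are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitsBeforeDotDemo {
    // Digits immediately followed by a dot; group 1 holds the digits.
    private static final Pattern DIGITS_BEFORE_DOT = Pattern.compile("(\\d+)\\.");

    // Hypothetical helper: returns the first run of digits that precedes a dot,
    // or an empty Optional when there is none.
    static Optional<String> extract(String fileName) {
        Matcher m = DIGITS_BEFORE_DOT.matcher(fileName);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("UserTemplate324.txt")); // Optional[324]
        System.out.println(extract("readme.txt"));          // Optional.empty
    }
}
```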
If you want to exclude the non-digit characters:
String number = file.replaceAll("\\D+", "");
\\D+ means a series of one or more non digit (0-9) characters and you replace any such series by "" i.e. nothing, which leaves the digits only.