Find a three-digit number in a string using replaceAll() - java

I have a String from which I need to extract a keyword.
Something like: "I have 100 friends and 1 evil".
I need to extract "100" from that String using only the replaceAll function and an appropriate regex.
I tried to do it this way:
String input = "I have 100 friends and 1 evil";
String result = input.replaceAll("[^\\d{3}]", "");
But it doesn't work. Any help would be appreciated.

You can consider any of the solutions below:
String result = input.replaceFirst(".*?(\\d{3}).*", "$1");
String result = input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1");
String result = input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1");
String result = input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1");
Note that you may use replaceAll here, too, but it makes little sense, as the replacement must occur only once in this case.
Here,
.*? - matches any zero or more chars other than line break chars, as few as possible
(\d{3}) - captures into Group 1 any three digits
.* - matches any zero or more chars other than line break chars, as many as possible.
The (?<!\d) / (?!\d) lookarounds are digit boundaries: there is no match if the sequence is four or more digits long. \b are word boundaries: there will be no match if the three digits are glued to a letter, digit or underscore. The (?<!\S) / (?!\S) lookarounds are whitespace boundaries: there must be whitespace or the start of the string before the match, and whitespace or the end of the string after it.
The replacement is $1, the value of Group 1.
See the Java demo:
String input = "I have 100 friends and 1 evil";
System.out.println(input.replaceFirst(".*?(\\d{3}).*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1"));
System.out.println(input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1"));
System.out.println(input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1"));
All output 100.
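The four variants only differ on inputs where the digits touch other characters. The following is a minimal sketch of that difference, not part of the original answer: the class name, the method names and the sample string are made up for illustration. It also shows what the attempt from the question does: [^\\d{3}] is a negated character class (any char that is not a digit, {, 3 or }), so every digit survives.

```java
class ThreeDigitBoundaries {
    // The attempt from the question: removes everything except digits, '{', '3' and '}'.
    static String originalAttempt(String input) {
        return input.replaceAll("[^\\d{3}]", "");
    }

    // First run of three digits, even when it is part of a longer number or word.
    static String firstThree(String input) {
        return input.replaceFirst(".*?(\\d{3}).*", "$1");
    }

    // Three digits with no digit directly before or after.
    static String digitBounded(String input) {
        return input.replaceFirst(".*?(?<!\\d)(\\d{3})(?!\\d).*", "$1");
    }

    // Three digits not glued to a letter, digit or underscore.
    static String wordBounded(String input) {
        return input.replaceFirst(".*?\\b(\\d{3})\\b.*", "$1");
    }

    // Three digits enclosed by whitespace or the string start/end.
    static String spaceBounded(String input) {
        return input.replaceFirst(".*?(?<!\\S)(\\d{3})(?!\\S).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(originalAttempt("I have 100 friends and 1 evil")); // 1001

        String sample = "id=x123, qty:456, total 789 ok"; // hypothetical input
        System.out.println(firstThree(sample));   // 123
        System.out.println(digitBounded(sample)); // 123
        System.out.println(wordBounded(sample));  // 456
        System.out.println(spaceBounded(sample)); // 789

        // When nothing matches, replaceFirst returns the input unchanged.
        System.out.println(firstThree("no digits here")); // no digits here
    }
}
```

Because a failed match leaves the input untouched, callers that need to know whether a number was found may prefer to check with a Matcher first.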

Related

How to match all combinations of numbers in a string that do not start with an English letter in regular matching in Java

I have a String like
String str = "305556710S or 100596269C OR CN111111111";
I just want to match the tokens in this string that start with digits, optionally ending with an English letter,
and then prefix each matched token with "??".
I wrote a Pattern like
Pattern pattern = Pattern.compile("^[0-9]{1,10}[A-Z]{0,1}", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
String matchStr = matcher.group();
System.err.println(matchStr);
}
But it can only match the first token, "305556710S".
But if I modify the Pattern to
Pattern pattern = Pattern.compile("[0-9]{1,10}[A-Z]{0,1}", Pattern.CASE_INSENSITIVE);
it matches "305556710S", "100596269C" and "111111111". But "111111111" is preceded by the English letters "CN", so matching it is not my goal.
I only want to match "305556710S" and "100596269C" and add "??" before each match. Can somebody help me?
First, you should avoid the ^ in this particular regexp. As you noticed, you can't get more than one result with it, because "^" is an instruction to match at the beginning of the string.
Using \b can be a solution, but you may get unwanted results. For example, given
305556710S or -100596269C OR CN111111111
the regexp "\\b[0-9]{1,10}[A-Z]{0,}\\b" will match 100596269C (because the hyphen is not a word character, so there is a word boundary between - and 1).
The following regexp matches exactly what you want: all numbers, that may be followed by some English chars, either at the beginning of the string or after a space, and either followed by a space or at the end of the string.
(?<=^| )[0-9]{1,10}[A-Z]*(?= |$)
Explanations:
(?<=^| ) is a lookbehind. It makes sure that there is either ^ (string start) or a space behind the current position. Note that lookbehinds don't add matching chars to the result: the space won't be part of the result.
[0-9]{1,10}[A-Z]* matches digits (at least one, up to ten), then zero or more letters.
(?= |$) is a lookahead. It makes sure that there is either a space or $ (end of string) after this match. As with lookbehinds, the chars aren't added to the result and the position remains the same: the space read here can, for example, also be read by the lookbehind of the next match.
Examples: 305556710S or 100596269C OR CN111111111 matches: at index 0 [305556710S], at index 14 [100596269C]; 100596269C123 does not match.
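To get the "??" prefix the question asks for, the lookaround pattern can be passed to replaceAll with $0 (the whole match) in the replacement. This is a minimal sketch under that assumption; the class and method names are hypothetical, and CASE_INSENSITIVE is carried over from the question's code.

```java
import java.util.regex.Pattern;

class PrefixStandaloneTokens {
    // Digits plus optional letters, enclosed by spaces or the string start/end.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<=^| )[0-9]{1,10}[A-Z]*(?= |$)", Pattern.CASE_INSENSITIVE);

    static String prefix(String input) {
        // $0 is the whole match; '?' has no special meaning in a replacement string.
        return TOKEN.matcher(input).replaceAll("??$0");
    }

    public static void main(String[] args) {
        System.out.println(prefix("305556710S or 100596269C OR CN111111111"));
        // ??305556710S or ??100596269C OR CN111111111

        // The token after the hyphen is not preceded by a space, so it is left alone.
        System.out.println(prefix("305556710S or -100596269C OR CN111111111"));
        // ??305556710S or -100596269C OR CN111111111
    }
}
```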
I think you need to use word boundaries \b. Try this changed pattern:
"\\b[0-9]{1,10}[A-Z]{0,1}\\b"
This prints out:
305556710S
100596269C
Why it works:
The difference here is that only character sequences enclosed by a pair of word boundaries are checked. With the earlier pattern, a character sequence from the middle of a word could also match, which is why 11111... from CN1111... was matched.
A word boundary also matches at the end of the input string. So, even if a candidate word appears at the end of the line, it will get picked up.
If more than one English letter can come at the end, then remove the maximum occurrence indicator, 1 in this case:
"\\b[0-9]{1,10}[A-Z]{0,}\\b"

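The behaviour of the two word-boundary patterns can be checked with a small helper that collects every match. This is only a sketch: the class, the helper and the extra sample strings are invented for illustration and are not from the answers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryTokens {
    static final String ONE_LETTER = "\\b[0-9]{1,10}[A-Z]{0,1}\\b";
    static final String ANY_LETTERS = "\\b[0-9]{1,10}[A-Z]{0,}\\b";

    // Returns every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll(ONE_LETTER, "305556710S or 100596269C OR CN111111111"));
        // [305556710S, 100596269C]

        // With at most one trailing letter, "12AB" is skipped; with {0,} it matches.
        System.out.println(findAll(ONE_LETTER, "12AB 34C"));  // [34C]
        System.out.println(findAll(ANY_LETTERS, "12AB 34C")); // [12AB, 34C]

        // A hyphen is not a word character, so the token after it still matches.
        System.out.println(findAll(ONE_LETTER, "305556710S or -100596269C"));
        // [305556710S, 100596269C]
    }
}
```
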
Java int to fraction

How can I change 4 -1/4 -5 to 4/1 -1/4 -5/1 using regex?
String str = "4 -1/4 -5";
String regex = "(-?\\d+/\\d+)";
Matcher matcher = Pattern.compile(regex).matcher(str);
My code finds only the fractions, but I want to find the integers that are not part of a fraction.
String result = str.replaceAll("(?<!/)\\b\\d+\\b(?!/)", "$0/1");
looks for entire numbers (\b\d+\b), not preceded by ((?<!/)) nor followed by a slash ((?!/)), and adds /1 to them.
Try (?<=-| |^)(\d+)(?!\d*\/)
Explanation:
(?<=...) - positive lookbehind: asserts that what precedes matches the pattern inside
-| |^ - match either -, a space, or the beginning of a line ^
(\d+) - match one or more digits and store them in the first capturing group
(?!\d*\/) - negative lookahead: asserts that what follows is not zero or more digits followed by /.
Replace it with \1/1 (written $1/1 in a Java replacement string), so the first capturing group followed by /1
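Both suggested patterns can be tried side by side in Java. The sketch below uses hypothetical class and method names; the second method is a Java translation of the lookbehind pattern, with $1 as the group reference and the slash left unescaped, since Java does not need \/.

```java
class IntegersToFractions {
    // Whole numbers that are neither preceded nor followed by a slash.
    static String withWordBoundaries(String s) {
        return s.replaceAll("(?<!/)\\b\\d+\\b(?!/)", "$0/1");
    }

    // Digits after '-', a space or the string start, not leading into a slash.
    static String withCapturingGroup(String s) {
        return s.replaceAll("(?<=-| |^)(\\d+)(?!\\d*/)", "$1/1");
    }

    public static void main(String[] args) {
        System.out.println(withWordBoundaries("4 -1/4 -5")); // 4/1 -1/4 -5/1
        System.out.println(withCapturingGroup("4 -1/4 -5")); // 4/1 -1/4 -5/1

        // Multi-digit numerators and denominators are left alone as well.
        System.out.println(withWordBoundaries("12/34 56")); // 12/34 56/1
        System.out.println(withCapturingGroup("12/34 56")); // 12/34 56/1
    }
}
```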
I'm not sure I understand what you want to do here, but if you want to remove the slashes you can use:
str.replaceAll("\\/", " ");
This will leave you with a string having only the integers.

Modifying part of a regex in replaceAll call

I am trying to format a string with a regex as follows:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0[^0-9]","");
What I think will happen is the string will become:
5.07+122.14 //the regex will delete the .0+ next to the 12
How can I create the regex so that it deletes only the .0 not the + sign?
I would prefer to do everything in the same call to "replaceAll"
thanks for any suggestions
Matched characters will be replaced. So, instead of matching the non-digit at the end, you can use lookahead, which will perform the desired check but won't consume any characters. Also, the shorthand for a non-digit is \D, which is a bit nicer to read than [^0-9]:
String string = "5.07+12.0+2.14";
string = string.replaceAll("\\.0(?=\\D)","");
If you want to replace all trailing zeros (for example, replace 5.00 with 5 instead of 50, which you probably don't want), then repeat the 0 one or more times with + to ensure that all zeros after the decimal point get replaced:
String string = "5.07+12.000+2.14";
string = string.replaceAll("\\.0+(?=\\D)","");
If the string never contains alphabetical or underscore _ characters (those and numeric characters count as word characters), then you can make it even prettier with a word boundary instead of a lookahead. A word boundary, as it sounds, will match a position with a word character on one side and a non-word character on the other side, with \b:
string = string.replaceAll("\\.0+\\b","");
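The three approaches can be compared in a short sketch (class and method names are hypothetical). One difference worth checking in your own data: the lookahead (?=\\D) needs an actual non-digit character after the zeros, so it does not fire at the very end of the string, whereas \\b does.

```java
class TrailingZeroFractions {
    // The attempt from the question: the non-digit is matched, so it is removed too.
    static String consuming(String s) {
        return s.replaceAll("\\.0[^0-9]", "");
    }

    // Lookahead: the following non-digit is checked but not consumed.
    static String lookahead(String s) {
        return s.replaceAll("\\.0+(?=\\D)", "");
    }

    // Word boundary: also matches when the zeros end the string.
    static String wordBoundary(String s) {
        return s.replaceAll("\\.0+\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(consuming("5.07+12.0+2.14"));      // 5.07+122.14
        System.out.println(lookahead("5.07+12.000+2.14"));    // 5.07+12+2.14
        System.out.println(wordBoundary("5.07+12.000+2.14")); // 5.07+12+2.14

        // ".0" at the end of the input: no character follows, so (?=\\D) fails.
        System.out.println(lookahead("5.07+12.0"));    // 5.07+12.0
        System.out.println(wordBoundary("5.07+12.0")); // 5.07+12
    }
}
```

If the lookahead form is preferred, a negative lookahead such as (?!\\d) is one way to cover the end-of-string case as well; that variant is a suggestion, not part of the answer above.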

Merge multiple regex in Java

I have written a regex to omit the characters after the first occurrence of some characters (, and #)
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", ""); //This is the 1st regex
Then a second regex to get only numbers (remove spaces and other non numeric characters)
number = number.replaceAll("[^0-9]+", ""); //This is the 2nd regex
Output: 1234567890
How can I merge the two regexes into one, like piping the output of the first regex into the second?
You can combine both regex in the following way.
String number = "(123) (456) (7890)#123";
number = number.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
So you need to remove all symbols other than digits and the whole rest of the string after the first hash symbol or a comma.
You cannot just concatenate the patterns with the | operator because one of the patterns is implicitly anchored at the end of the string.
The alternative that removes non-digits must also exclude hashes and commas, since the regex engine processes the string from left to right and would otherwise consume the hash or comma before the other alternative could match it together with the text after it. Use the DOTALL modifier in case you have newline symbols in your input.
Use
(?s)[,#].*$|[^#,0-9]+
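A minimal sketch comparing the chained calls with the single merged pattern follows; the class and method names and the extra inputs are illustrative only.

```java
class DigitsBeforeDelimiter {
    // The two chained calls from the question.
    static String twoStep(String s) {
        return s.replaceAll("[,#](.*)", "").replaceAll("[^0-9]+", "");
    }

    // One call: drop everything from the first ',' or '#' onwards,
    // and drop any other run of non-digits before it.
    static String merged(String s) {
        return s.replaceAll("(?s)[,#].*$|[^#,0-9]+", "");
    }

    public static void main(String[] args) {
        String number = "(123) (456) (7890)#123";
        System.out.println(twoStep(number)); // 1234567890
        System.out.println(merged(number));  // 1234567890

        System.out.println(merged("12, 34#56")); // 12

        // With (?s), the tail after '#' is removed across line breaks too.
        System.out.println(merged("(12)\n34#56\n78")); // 1234
    }
}
```

On multi-line input the two versions are not equivalent: without DOTALL, the first step of the chained version only removes the rest of the line that contains the delimiter.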

How to work with regex to check a content of String

I need to check whether a string contains a minimum of two and a maximum of three commas, and one hyphen. I'm trying to write a regex to validate this String.
Ex:
String address = "Av. Rocio, 45, - Center";
String regex = "//,{2,3}|-{1}";
boolean isValid = address.matches(regex);
But it isn't working; it always returns false. What did I do wrong? Thanks!
To match a string that has ONLY 2 or 3 commas and exactly 1 hyphen, use:
String regex = "(?s)^(?=([^,]*,){2,3}[^,]*$)(?=[^-]*-[^-]*$).*";
The matches method requires a full string match, thus, we need to add .*.
Note that {1} limiting quantifier is redundant, as - will match exactly 1 hyphen.
The regex (where . matches a newline due to (?s) inline dotall modifier) matches:
^ - start of string
(?=([^,]*,){2,3}[^,]*$) - Lookahead that checks the presence of 2 or 3 commas
(?=[^-]*-[^-]*$) - lookahead that requires only 1 hyphen to be in the string
.* - match all the string if the 2 conditions above are satisfied.
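The lookahead-based check can be exercised with a few inputs, as in the sketch below (the class name, method name and all inputs except the address from the question are made up). It also shows why the regex from the question returns false: "//,{2,3}|-{1}" describes a string that is either two slashes followed by 2 or 3 commas, or a single hyphen, and matches() has to cover the whole input.

```java
class AddressCheck {
    private static final String REGEX =
            "(?s)^(?=([^,]*,){2,3}[^,]*$)(?=[^-]*-[^-]*$).*";

    // True when the string has 2 or 3 commas and exactly one hyphen.
    static boolean isValid(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String address = "Av. Rocio, 45, - Center";
        System.out.println(address.matches("//,{2,3}|-{1}")); // false
        System.out.println(isValid(address));                 // true

        System.out.println(isValid("a,b,c,-d"));    // true  (3 commas, 1 hyphen)
        System.out.println(isValid("a,b-c"));       // false (only 1 comma)
        System.out.println(isValid("a,b,c,d,e-f")); // false (4 commas)
        System.out.println(isValid("a,b,c"));       // false (no hyphen)
        System.out.println(isValid("a,b,-c-d"));    // false (2 hyphens)
    }
}
```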
