Java Regular expression for replacing specific strings

I want to replace numbers in a string when they have at least 3 digits (phone numbers should be replaced). A number should not be replaced if it is followed by $ or if it has a decimal point. I used the expression below.
"\d{3,}+(?!\$/\.)"
The issues I face: it is not replacing numbers with more than ten digits, and I do want those replaced because they are IDs. Also, if a number has 3 or more digits after the decimal point, those digits get replaced. I don't want a number to be replaced if it has a decimal point. Can somebody help?
For example, take the number string "3452678916381914". It has to be replaced, but the above regex does not replace it. Numbers like $1234,45.567 shouldn't be replaced, but the above regex replaces 45.567.

Use lookbehind and lookahead assertions: first assert that the starting word boundary is not preceded by $ or ., then assert that the ending word boundary is not followed by $ or .
It works for both examples you provided; you might need to tweak it a little to handle some corner cases.
(?<![\$\.])\b\d{3,}\b(?![\$\.])
In the test lines below, it matches the first two but not the rest:
3452678916381914 # match
1234 56789 # match
$1234,45.567
$1234
12.345
12345.6678
123$
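
To show how this pattern could be used for the actual replacement, here is a minimal Java sketch. The class name NumberMasker, the method mask and the placeholder text "[number]" are assumptions made for illustration; they do not come from the question.

```java
import java.util.regex.Pattern;

final class NumberMasker {
    // Same pattern as above; inside a character class, $ and . need no escaping.
    private static final Pattern LONG_NUMBER =
            Pattern.compile("(?<![$.])\\b\\d{3,}\\b(?![$.])");

    // Replaces every qualifying digit run with a fixed placeholder (hypothetical choice).
    static String mask(String input) {
        return LONG_NUMBER.matcher(input).replaceAll("[number]");
    }

    public static void main(String[] args) {
        // prints: ID [number], call [number] [number]
        System.out.println(mask("ID 3452678916381914, call 1234 56789"));
        // prints the input unchanged
        System.out.println(mask("price $1234,45.567 or 12345.6678 or 123$"));
    }
}
```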

Related

Constructing regex in Java with variable number of certain characters in pattern

A text file is given that should follow a format known in advance. I would like to check that the file follows the format by reading each line and comparing it to a regex. The first line of each text file has the following format:
First character is "O" (capital o)
Characters 2-16 are digits, with the exception of the 6th character, which is a blank space
Characters 17-30 are a decimal number, where character 28 is the decimal point
Characters 31-40 are an integer
...
The specification continues, however I only need help with steps 3 and 4. For instance, a decimal number could be 1000.55, but in the text file it would be preceded by 7 blank spaces so that it fits the format. The same goes for step 4: if the number is 10, then this would be preceded by 8 blank spaces in the text file so that it fits.
How can I construct a regex that detects this pattern? Since the number of blank spaces may change, I am not sure. My idea was something like this:
String regex = "O[0-9]{4} [0-9]{10}[ ]*[0-9]*,[0-9]{2}"
The first letter is "O", followed by four digits, then a blank space, then 10 digits, then an unspecified number of blank spaces followed by an unspecified number of digits. Then finally decimal point and two digits. But this does not restrict the decimal number to only 14 characters! This is unfortunate, I do not think it will work.

You could match the first part, for which you know the number of occurrences.
For steps 3 and 4 you could use positive lookaheads to assert the number of occurrences.
In Java you could also use \h to match a horizontal whitespace character.
^O\d{4} \d{10}(?=[ \d]{11}\.) *\d*\.\d\d(?=[ \d]{10}) {0,9}\d+
In Java with the doubled backslashes:
String regex = "^O\\d{4} \\d{10}(?=[ \\d]{11}\\.) *\\d*\\.\\d\\d(?=[ \\d]{10}) {0,9}\\d+";
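^O Match O at the start of the string
\d{4} \d{10} Match 4 digits, a space and 10 digits
(?=[ \d]{11}\.) Positive lookahead, assert 11 occurrences of either a space or digit followed by a dot to the right of the current position
 *\d*\.\d\d Match optional spaces, optional digits, a dot and 2 digits (the digits before the dot are optional so that a value such as .22 also matches)
(?=[ \d]{10}) Positive lookahead, assert 10 occurrences of either a space or digit to the right of the current position
 {0,9}\d+ Match 0-9 spaces and 1+ digits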
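If the whole string is exactly 40 characters long, you can assert the total length right after the O with (?=[\d .]{39}$) and keep only the lookahead (?=[ \d]{11}\.) for the decimal field. The integer field then needs no lookahead of its own, because the fixed total length already determines its width.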
^O(?=[\d .]{39}$)\d{4} \d{10}(?=[ \d]{11}\.) *\d*\.\d\d *\d+$
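
As a rough check of the 40-character pattern, the following Java sketch builds a sample header line and validates it. The names HeaderLineCheck, isValidHeader and buildHeader, as well as the sample field values, are assumptions for illustration only.

```java
import java.util.regex.Pattern;

final class HeaderLineCheck {
    // 40-character pattern from above: decimal point at character 28.
    private static final Pattern HEADER = Pattern.compile(
            "^O(?=[\\d .]{39}$)\\d{4} \\d{10}(?=[ \\d]{11}\\.) *\\d*\\.\\d\\d *\\d+$");

    static boolean isValidHeader(String line) {
        return HEADER.matcher(line).matches();
    }

    // Hypothetical helper: right-aligns the decimal field to 14 and the integer field to 10 characters.
    static String buildHeader(String decimal, String integer) {
        return String.format("O1234 1234567890%14s%10s", decimal, integer);
    }

    public static void main(String[] args) {
        System.out.println(isValidHeader(buildHeader("1000.55", "10")));  // true
        System.out.println(isValidHeader("O1234 1234567890 1000.55 10")); // false, fields not padded
        System.out.println(isValidHeader(buildHeader("100.555", "10")));  // false, dot at the wrong position
    }
}
```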

Extract exactly n digits in a sentence using REGEX

Example
The no.s 1234 65
Input: n
For n=4, the output should be 1234
For n=2, the output should be : 65 (not 12)
I tried \d{n}, which gives 12 for n=2, and \d{n,}, which gives 1234, but I want only the run that has exactly n digits.
Pattern p = Pattern.compile("//\d{n,}");

You need negative lookaround assertions: (?<!...) is a negative lookbehind and (?!...) is a negative lookahead.
(?<!\d)\d{4}(?!\d)
However, not all regex engines support them. A workaround is to also match the preceding and following character (unlike lookarounds, which are zero-width matches) and read the digits from the capture group (\D matches anything except a digit):
(?:^|\D)(\d{4})(?:\D|$)
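
Since n is an input, the lookaround pattern can be assembled at run time. This is a minimal Java sketch of that idea; the class name ExactDigits and the method find are made up for the example, and n is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExactDigits {
    // Returns every run of exactly n digits that is not part of a longer run.
    static List<String> find(String text, int n) {
        Pattern p = Pattern.compile("(?<!\\d)\\d{" + n + "}(?!\\d)");
        Matcher m = p.matcher(text);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(find("The no.s 1234 65", 4)); // [1234]
        System.out.println(find("The no.s 1234 65", 2)); // [65]
    }
}
```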

I think what you are looking for is the word boundary \b.
Hence, the regex you're looking for would be (for n=2):
\b\d{2}\b

From what I understand, you're looking for a regex that will match a number with n digits in a string, taking into account the spacing between the numbers. If that's the case, you're looking for something like this:
\b\d{4}\b
The \b constrains the match to the start/end of a 'word', that is, to a boundary between anything matched by \w (which includes digits) and anything matched by the opposite, \W (which includes spaces).
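
Because \w also includes letters, \b and the digit lookarounds behave differently when digits sit directly next to letters. The Java sketch below compares the two on a made-up input, "ab12 34"; the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryVsLookaround {
    // Collects all matches of the given regex in the text.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "ab12 34";
        // No word boundary exists between 'b' and '1', so only 34 is found.
        System.out.println(findAll("\\b\\d{2}\\b", text));          // [34]
        // The lookarounds only care about neighbouring digits, so 12 is found too.
        System.out.println(findAll("(?<!\\d)\\d{2}(?!\\d)", text)); // [12, 34]
    }
}
```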

I don't code in Java, but I can try to answer this using regex in general.
If your number is in the format d1d2d3d4 d5d6 and you want to extract the digits d5d6, create 3 groups as r'([0-9]+)(\s)([0-9]+)', where each pair of parentheses () represents one group. Then extract only the third group into another object; that is your required output.

Regex for matching different float formats

I'm looking for a regex in Scala to match several floats:
9,487,346 -> should match
9.487.356,453->should match
38,4 -> match
-38,4 -> should match
-38.5
-9,487,346.76
-38 -> should match
So basically it should match a number that:
possibly has thousand separators (either comma or dot)
possibly is a decimal, again with either comma or dot as separator
Currently I'm stuck with
val pattern="\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)"
Edit: I'm mostly concerned with European notation.
Example where the current pattern does not match: 1,052,161
I guess it would be close enough to check that the string contains only digits, sign, comma and dot.

If, as your edit suggests, you are willing to accept a string that simply "contains numbers, sign, comma and dot" then the task is trivial.
[+-]?\d[\d.,]*
update
After thinking it over, and considering some options, I realize that your original request is possible if you'll allow for 2 different RE patterns, one for US-style numbers (commas before dot) and one for Euro-style numbers (dots before comma).
def isValidNum(num: String): Boolean =
  num.matches("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?") ||
  num.matches("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?")
Note that the thousand separators are not optional, so a number like "1234" is not evaluated as valid. That can be changed by adding more RE patterns: || num.matches("[+-]?\\d+")
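
For readers working in Java rather than Scala, the same check could be sketched as below. This is a port under assumptions, not the answerer's code: the class name NumberFormats is invented, and the optional third pattern for plain digit strings is included.

```java
final class NumberFormats {
    static boolean isValidNum(String num) {
        return num.matches("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?")   // US style: 9,487,346.76
            || num.matches("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?")   // European style: 9.487.356,453
            || num.matches("[+-]?\\d+");                          // optional: plain digits such as 1234
    }

    public static void main(String[] args) {
        System.out.println(isValidNum("9.487.356,453"));  // true
        System.out.println(isValidNum("-9,487,346.76"));  // true
        System.out.println(isValidNum("201,350,780,88")); // false
    }
}
```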

Based on your rules, it should match a number that:
possibly has thousand separators (either comma or dot)
possibly is a decimal, again with either comma or dot as separator
Regex:
^[+-]?\d{1,3}(?:[,.]\d{3})*(?:[,.]\d+)?$
[+-]? allows +, - or nothing at the start
\d{1,3} allows one to three digits
(?:[,.]\d{3})* allows . or , as thousands separator followed by 3 digits (* allows any number of such groups)
(?:[,.]\d+)? allows . or , as decimal separator followed by at least one digit
This matches all of the OP's example cases.
However, one limitation is that it allows either . or , as thousands separator and as decimal separator, and does not check that if , is the thousands separator then . must be the decimal separator. As a result, the cases below incorrectly show up as matches:
201,350,780,88
211.950.266.4
To fix this as well, the previous regex can have 2 alternatives - one to check for a notation that has , as thousands separator and . as decimal, and another one to check vice-versa. Regex:
^[+-]?\d{1,3}(?:(?:(?:\.\d{3})*(?:\,\d+)?)|(?:(?:\,\d{3})*(?:\.\d+)?))$
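
The difference between the two regexes in this answer can be checked with a short Java sketch. The class name SeparatorConsistency and the constant names are assumptions; the patterns are the ones given above, with the comma left unescaped since it has no special meaning in a regex.

```java
import java.util.regex.Pattern;

final class SeparatorConsistency {
    // First regex from this answer: any mix of , and . is accepted.
    static final Pattern PERMISSIVE =
            Pattern.compile("^[+-]?\\d{1,3}(?:[,.]\\d{3})*(?:[,.]\\d+)?$");

    // Second regex: one separator for thousands, the other for decimals.
    static final Pattern STRICT = Pattern.compile(
            "^[+-]?\\d{1,3}(?:(?:(?:\\.\\d{3})*(?:,\\d+)?)|(?:(?:,\\d{3})*(?:\\.\\d+)?))$");

    public static void main(String[] args) {
        String mixed = "201,350,780,88";
        System.out.println(PERMISSIVE.matcher(mixed).matches());        // true
        System.out.println(STRICT.matcher(mixed).matches());            // false
        System.out.println(STRICT.matcher("9.487.356,453").matches());  // true
    }
}
```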
Hope this helps!

Using Regex, is it possible to use an expression such as 'Followed by' or 'Preceded by'

I have the following expression, where I want to extract an identifier that is 12 digits long:
([12]\d{3})(\d{6})(\d{2})
This works fine if the string is in the following format:
ABCD123456789101
123456789101
When it gets a string like the following, how does it know which 12 digits to match on:
ABCD1234567894837376383439434343232
1234567894837376383439434343232
In the above scenario, I don't want to select the twelve digits. So I think the answer is to select the twelve digits only if they are not preceded or followed by other digits. I tried this change:
[^0-9]([12]\d{3})(\d{6})(\d{2})[^0-9]
This basically says: get me the 12 digits only if the characters before and after them are non-numeric. The problem is that I also get those non-numeric characters as part of the match, i.e.
ABCD123456789483X7376383439434343232 returns D123456789483X
Is there any way of checking the preceding and following characters without including them in the match result? That is, match only if the characters before and after are non-numeric, but leave those characters out of the match.

You can use lookarounds:
(?<!\d)([12]\d{3})(\d{6})(\d{2})(?!\d)
Here:
(?<!\d) is a negative lookbehind, which means your pattern is not preceded by a digit
(?!\d) is a negative lookahead, which means your pattern is not followed by a digit
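
A small Java sketch of the lookaround version, run against the strings from the question, might look like this. The class name IdExtractor and the method findId are hypothetical; in a Java string literal every backslash of the regex is doubled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdExtractor {
    private static final Pattern ID =
            Pattern.compile("(?<!\\d)([12]\\d{3})(\\d{6})(\\d{2})(?!\\d)");

    // Returns the first 12-digit identifier that has no digit directly before or after it.
    static Optional<String> findId(String input) {
        Matcher m = ID.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findId("ABCD123456789101"));                     // Optional[123456789101]
        System.out.println(findId("ABCD1234567894837376383439434343232"));  // Optional.empty
        System.out.println(findId("ABCD123456789483X7376383439434343232")); // Optional[123456789483]
    }
}
```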

How can I express such requirement using Java regular expression?

I need to check that a file contains amounts that match a specific format:
between 1 and 15 characters (digits or ",")
may contain at most one "," separator for decimals
must have at least one digit before the separator
the amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not meet the requirement, since it can match up to 30 characters.
Unfortunately, for some reasons that are too long to explain here, I cannot simply pick a substring or use any other java call. The match has to be in a single, java-compatible, regular expression.

^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
(?=.{1,15}$) positive lookahead to make sure that the total length of string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and more digits
$ end of the string (the lookahead only checks the total length, so this anchor is still needed to stop the match from ending before the end of the string).
You might have to escape backslashes for Java: ^(?=.{1,15}$)\\d+(,\\d+)?$
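
As a quick check of the whole-string version, here is a minimal Java sketch. The class name AmountCheck and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

final class AmountCheck {
    private static final Pattern AMOUNT =
            Pattern.compile("^(?=.{1,15}$)\\d+(,\\d+)?$");

    static boolean isAmount(String s) {
        return AMOUNT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAmount("123,45"));           // true
        System.out.println(isAmount("123456789012,45"));  // true, exactly 15 characters
        System.out.println(isAmount("1234567890123,45")); // false, 16 characters
        System.out.println(isAmount(",45"));              // false, no digit before the comma
    }
}
```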
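update: If you're looking for this in the middle of another string, use word boundaries \b instead of string boundaries (^ and $). Note that \b only matches between a word character (letter, digit or underscore) and a non-word character, so the amount has to be delimited by something like a space; \b does not match between a letter and a digit.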
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For java:
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"

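The question says the amount is bounded by alphabetical characters, and \b does not match between a letter and a digit. One possible variant, shown here as a sketch and not as part of the answer above, replaces the word boundaries with lookarounds that reject a neighbouring digit or comma. The class name EmbeddedAmount, the constant names and the sample strings are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbeddedAmount {
    // Word-boundary version from the answer above.
    static final Pattern WORD_BOUNDED =
            Pattern.compile("\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b");

    // Assumed variant: the characters next to the amount must not be a digit or a comma.
    static final Pattern LETTER_BOUNDED = Pattern.compile(
            "(?<![0-9,])(?=[0-9,]{1,15}(?![0-9,]))[0-9]+(,[0-9]+)?(?![0-9,])");

    static Optional<String> first(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(first(WORD_BOUNDED, "AMT 123,45 EUR"));   // Optional[123,45]
        System.out.println(first(WORD_BOUNDED, "AMT123,45EUR"));     // Optional.empty
        System.out.println(first(LETTER_BOUNDED, "AMT123,45EUR"));   // Optional[123,45]
        System.out.println(first(LETTER_BOUNDED, "AMT12,34,56EUR")); // Optional.empty
    }
}
```
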