Constructing regex in Java with variable number of certain characters in pattern - java

A text file is given that should follow a format known in advance. I would like to check that the file indeed follows the format by reading each line and comparing it to a regex. The first line of each text file has the following format:
First character is "O" (capital o)
Characters 2-16 are digits, with the exception of the 6th character, which is a blank space
Characters 17-30 are a decimal number, where character 28 is the decimal point
Characters 31-40 are an integer number
...
The specification continues; however, I only need help with steps 3 and 4 (the decimal number and the integer). For instance, a decimal number could be 1000.55, but in the text file it would be preceded by 7 blank spaces so that it fills the 14-character field. The same goes for step 4: if the number is 10, it would be preceded by 8 blank spaces in the text file so that it fills the 10-character field.
How can I construct a regex that detects this pattern? I am not sure how to do it, since the number of blank spaces may change. My idea was something like this:
String regex = "O[0-9]{4} [0-9]{10}[ ]*[0-9]*,[0-9]{2}"
The first letter is "O", followed by four digits, then a blank space, then 10 digits, then an unspecified number of blank spaces followed by an unspecified number of digits. Then finally decimal point and two digits. But this does not restrict the decimal number to only 14 characters! This is unfortunate, I do not think it will work.

You could match the first part directly, since you know the exact number of occurrences there.
For steps 3 and 4 you could use positive lookaheads to assert the number of characters in each field.
In Java you could also use \h to match a horizontal whitespace character.
^O\d{4} \d{10}(?=[ \d]{11}\.) *\d*\.\d\d(?=[ \d]{10}) {0,9}\d+
In Java with the doubled backslashes:
String regex = "^O\\d{4} \\d{10}(?=[ \\d]{11}\\.) *\\d*\\.\\d\\d(?=[ \\d]{10}) {0,9}\\d+";
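^O Match O at the start of the string
\d{4} \d{10} Match 4 digits, a space and 10 digits
(?=[ \d]{11}\.) Positive lookahead, assert 11 occurrences of either a space or digit followed by a dot to the right of the current position
 *\d*\.\d\d Match optional spaces, optional digits, a dot and 2 digits (the digits before the dot are optional so that a value such as .22 also matches)
(?=[ \d]{10}) Positive lookahead, assert 10 occurrences of either a space or digit to the right of the current position
 {0,9}\d+ Match 0-9 spaces and 1+ digits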
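If the whole string is exactly 40 characters long, you can assert the total length once with (?=[\d .]{39}$) right after the O and keep only the lookahead (?=[ \d]{11}\.) for the decimal field; the integer field then no longer needs a lookahead of its own.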
^O(?=[\d .]{39}$)\d{4} \d{10}(?=[ \d]{11}\.) *\d*\.\d\d *\d+$
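As a minimal sketch of how the 40-character pattern above might be applied in Java: the class name FixedWidthLineCheck, the method isValid and the sample lines are made up for illustration, and String.repeat assumes Java 11 or later.

```java
import java.util.regex.Pattern;

final class FixedWidthLineCheck {
    // Pattern from the answer above; assumes the whole line is exactly 40 characters.
    static final Pattern LINE = Pattern.compile(
        "^O(?=[\\d .]{39}$)\\d{4} \\d{10}(?=[ \\d]{11}\\.) *\\d*\\.\\d\\d *\\d+$");

    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String head = "O1234 0123456789"; // characters 1-16
        // 14-character decimal field followed by a 10-character integer field
        String ok = head + " ".repeat(7) + "1000.55" + " ".repeat(8) + "10";
        // Same total length, but the decimal point sits at character 27 instead of 28
        String shifted = head + " ".repeat(6) + "1000.55" + " ".repeat(9) + "10";
        System.out.println(isValid(ok));
        System.out.println(isValid(shifted));
    }
}
```

The first sample line is accepted and the second is rejected, because the lookahead (?=[ \d]{11}\.) pins the decimal point to character 28 even though both lines are 40 characters long.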

Related

Regex for a random set of chars and digits

I am looking for a regex that matches only strings made up of a mix of digits and letters.
For example, adfak332arg3 is allowed but 332352 and fagaaah are not. .*[^\\s] looks fine for strings with only letters, but how can I fix it so that it accepts the desired strings and rejects the other two types?
Use positive lookaheads (?=...) to ensure that the string contains the required characters.
^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]+$
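A small sketch of the lookahead pattern above, checked against the three strings from the question. The class name MixedAlnumCheck and the method isMixed are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

final class MixedAlnumCheck {
    // One lookahead demands a letter, the other a digit;
    // the body then restricts the input to letters and digits only.
    static final Pattern MIXED =
        Pattern.compile("^(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z\\d]+$");

    static boolean isMixed(String s) {
        return MIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"adfak332arg3", "332352", "fagaaah"}) {
            System.out.println(s + " -> " + isMixed(s));
        }
    }
}
```

Only adfak332arg3 passes: 332352 fails the letter lookahead and fagaaah fails the digit lookahead.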
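You can try this regex, which requires a digit directly next to a letter somewhere in the string (note that \\w cannot stand in for [a-zA-Z] here, since \\w also matches digits):
"[a-zA-Z\\d]*\\d[a-zA-Z][a-zA-Z\\d]*|[a-zA-Z\\d]*[a-zA-Z]\\d[a-zA-Z\\d]*"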
If you need a mixed string that contains at least one character from each of A-Z, a-z and 0-9, you can use:
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).+$
If you want to enforce a minimum length (e.g. at least 8 characters):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{8,}$
If you want a length between a minimum and a maximum (e.g. at least 5 and at most 20 characters):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{5,20}$
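The length bounds can also be passed in at runtime. The following is only a sketch: the helper mixedWithLength and the class BoundedMixedCheck are hypothetical names, and the sample strings are invented.

```java
import java.util.regex.Pattern;

final class BoundedMixedCheck {
    // Builds the lookahead pattern above with caller-supplied length bounds.
    static Pattern mixedWithLength(int min, int max) {
        return Pattern.compile(
            "^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{" + min + "," + max + "}$");
    }

    public static void main(String[] args) {
        Pattern p = mixedWithLength(5, 20);
        for (String s : new String[] {"Abc12", "abc12", "Ab1", "Ab1!?"}) {
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

Abc12 and Ab1!? match, abc12 fails for lack of an uppercase letter and Ab1 is too short. Because the body is a plain dot, punctuation is accepted as well; replacing the dot with [A-Za-z0-9] would restrict the input to letters and digits.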
To ensure that an input contains digits as well as letters, you could use this regex:
^(?:[A-Za-z]+\\d+|\\d+[A-Za-z]+)[A-Za-z\\d]*$
The regex ensures that the input contains at least one digit and one letter, and allows only digits and letters (no special characters etc.)
(?:[A-Za-z]+\d+|\d+[A-Za-z]+) ensures that the input starts with one or more letters followed by one or more digits or, alternatively, with one or more digits followed by one or more letters
[A-Za-z\d]* allows any number of letters or digits after the previous check
^ and $ anchor the match to the start and end of the input
Try this regex with find(); it looks for a letter directly next to a digit anywhere in the input:
[A-Za-z][0-9]|[0-9][A-Za-z]

Java Regular expression for replacing specific strings

I want to replace numbers in a string if they have more than 3 digits (phone numbers should be replaced), but a number should not be replaced if it is followed by $ or if it has a decimal point. I used the expression below.
"\d{3,}+(?!\$/\.)"
The issues I face: it is not replacing numbers that have more than ten digits, and I do want to replace them, since some are IDs with more than 10 digits. Also, if a number has more than 3 digits after the decimal point, those digits are replaced as well. I don't want a number to be replaced if it has a decimal point. Can somebody help?
For example, take the number string "3452678916381914". It has to be replaced, but the above regex does not replace it. Numbers like $1234,45.567 shouldn't be replaced, but the above regex replaces 45.567.
Use a lookbehind and a lookahead: first assert that the starting word boundary is not preceded by a $ or ., then assert that the ending word boundary is not followed by a $ or .
It works for both examples you provided; you might need to tweak it a little to handle some corner cases.
(?<![\$\.])\b\d{3,}\b(?![\$\.])
It matches the first two lines below but not the rest:
3452678916381914 # match
1234 56789 # match
$1234,45.567
$1234
12.345
12345.6678
123$
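A sketch of how the lookaround pattern could be used for the replacement itself in Java. The class name NumberMasker, the method mask and the placeholder ### are assumptions for illustration; the question does not say what the numbers should be replaced with.

```java
import java.util.regex.Pattern;

final class NumberMasker {
    // Same pattern as above; inside a character class, $ and . need no escaping.
    static final Pattern LONG_NUMBER =
        Pattern.compile("(?<![$.])\\b\\d{3,}\\b(?![$.])");

    // Replaces every matching number with the hypothetical placeholder "###".
    static String mask(String text) {
        return LONG_NUMBER.matcher(text).replaceAll("###");
    }

    public static void main(String[] args) {
        System.out.println(mask("3452678916381914"));
        System.out.println(mask("1234 56789"));
        System.out.println(mask("$1234,45.567"));
        System.out.println(mask("12345.6678"));
    }
}
```

The first two inputs are masked and the last two are returned unchanged. Note that \d{3,} matches numbers of three or more digits, so a number with exactly three digits is replaced too.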

regular expression not containing two or more consecutive comma or hyphen and between numbers at most one hyphen

I have a scenario where I need to validate a number range, which can consist of multiple ranges or single numbers.
For example, the numbers 1 to 10 can be written as (1-3,4,5-8,9,10), where 1-3 indicates the range (1,2,3). I have tried this Java regex:
Pattern.matches("^[0-9][0-9,-]*,[0-9,-]*[0-9]$","11,131-132-134,45,12--10,,10");
This pattern allows consecutive hyphens and commas.
Valid Input
1) 1-3,4,5-8,9,10
2) 1-3,4-5,6-10
3) 1,2,3,4,5
4) 1,2-5,6
Invalid Input
1) ,2,3,4-10,
2) -2,3,4-10-
3) 2,3,,4-10
4) 2,3,4--10
5) 2,3,4-6-10 (Invalid range)
Can someone suggest how to check that commas and hyphens do not appear twice consecutively, that the input starts and ends with a number, and that a range is not chained (4-8-10)?
This should be the regex you want:
^\\d+(-\\d+)?(,\\d+(-\\d+)?)*$
It checks for the following sequence:
\\d+ : One or more digits
(-\\d+)? : An optional sequence of hyphen followed by one or more digits
(,\\d+(-\\d+)?)* : Zero or more occurrences of a comma followed by one or more digits, followed by an optional sequence of a hyphen and one or more digits
As the regex looks for a digit at the beginning, a string starting with hyphen or comma will not be allowed.
As every hyphen and comma must be immediately followed by a digit, a string with consecutive hyphens or commas, or with a hyphen immediately followed by a comma or the reverse, is not allowed.
As the ? in (-\\d+)? allows exactly zero or one occurrence of the (-\\d+) sequence, a range like 1-2-3 will not be matched.
If you don't need to allow a single number alone, replace the * in ^\\d+(-\\d+)?(,\\d+(-\\d+)?)*$ with +.
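To illustrate, here is a short sketch that runs the regex above against some of the valid and invalid inputs from the question. RangeListValidator and isValid are hypothetical names used only in this example.

```java
import java.util.regex.Pattern;

final class RangeListValidator {
    // number or range, followed by any number of ",number" or ",range" items
    static final Pattern RANGES =
        Pattern.compile("^\\d+(-\\d+)?(,\\d+(-\\d+)?)*$");

    static boolean isValid(String input) {
        return RANGES.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1-3,4,5-8,9,10", "1,2,3,4,5",            // valid in the question
            ",2,3,4-10,", "-2,3,4-10-", "2,3,,4-10",  // invalid in the question
            "2,3,4--10", "2,3,4-6-10"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The regex only checks the shape of the input. It does not compare the numbers, so a descending range such as 8-5 would still be accepted and would need a separate check.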
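A repeated group is a simple way to validate a string made of a repeating sequence (this version requires at least one comma, so a single number or range on its own is rejected):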
(?:\\d+(?:-\\d+)?,)+(?:\\d+(?:-\\d+)?$)

I need help on regular expression to allow number with character

condition:
123 not valid
123 A valid
abc123 valid
abc123Ab valid
How do I write a regular expression that makes a letter compulsory alongside the number?
This will match any string starting with an optional set of digits followed by a combination of white spaces, letters and digits. But it still matches "123 " (that is, 123 followed by a space) and even 123 on its own, because nothing in the pattern forces a letter to be present.
^\d*[\sa-zA-Z0-9]+$
The following will check if you have at least one letter in your string combined with optional digits, white spaces and letters.
[a-zA-Z\s\d]*[a-zA-Z]+?[a-zA-Z\s\d]*
[a-zA-Z\s\d] matches a single character present in the brackets.
Quantifier * : between zero and unlimited times, as many times as possible, giving back as needed [greedy]
Quantifier +? : between one and unlimited times, as few times as possible, expanding as needed [lazy]
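As a quick check, the pattern that requires at least one letter can be run against the four conditions from the question. This is a sketch in Java, which the question does not specify; the names LetterRequiredCheck and isValid are made up, and matches() is used so that the whole input has to fit the pattern.

```java
import java.util.regex.Pattern;

final class LetterRequiredCheck {
    // Letters, digits and whitespace in any order, with at least one letter.
    static final Pattern WITH_LETTER =
        Pattern.compile("[a-zA-Z\\s\\d]*[a-zA-Z]+?[a-zA-Z\\s\\d]*");

    static boolean isValid(String input) {
        // matches() anchors the pattern to the whole input
        return WITH_LETTER.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123 A", "abc123", "abc123Ab"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Only 123 is rejected, since it contains no letter for [a-zA-Z]+? to match.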
(([a-zA-Z\s])*(\d{1,})([a-zA-Z\s]){1,}|([a-zA-Z\s]){1,}(\d{1,})([a-zA-Z\s])*)
The first part of this expression allows the string to start without any letters, but at least 1 digit must be present and the string must end with 1 or more letters (or whitespace characters). The second part requires the string to start with at least 1 letter, followed by at least 1 digit and then by 0 or more letters.

Using Regex, is it possible to use an expression such as 'Followed by' or 'Preceded by'

I have the following expression where I want to extract an identifier that is 12 digits long:
([12]\d{3})(\d{6})(\d{2})
This works fine if the string is in the following format:
ABCD123456789101
123456789101
When it gets a string like the following, how does it know which 12 digits to match on:
ABCD1234567894837376383439434343232
1234567894837376383439434343232
In the above scenario, I don't want to select the twelve digits. So the answer, I think, is to select the twelve digits only if they are not preceded or followed by other digits. I tried this change:
[^0-9]([12]\d{3})(\d{6})(\d{2})[^0-9]
This basically says: get me the 12 digits only if the characters before and after the 12 digits are non-numeric. The problem I have is that I am also getting those non-numeric characters as part of the match, i.e.
ABCD123456789483X7376383439434343232 returns D123456789483X
Is there any way of checking what the preceding and following characters are without including them in the match result? That is, only match if the preceding and following characters are non-numeric, but don't include those non-numeric characters in the match result.
You can use lookarounds:
(?<!\d)([12]\d{3})(\d{6})(\d{2})(?!\d)
Here:
(?<!\d) is a negative lookbehind, which means your pattern is not preceded by a digit
(?!\d) is a negative lookahead, which means your pattern is not followed by a digit
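A sketch of the lookaround version in Java (the question does not name a language, so Java is an assumption here, as are the names IdExtractor and extractId). It shows that the surrounding non-digit characters stay out of the match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdExtractor {
    // 12 digits starting with 1 or 2, with no digit directly before or after.
    static final Pattern ID =
        Pattern.compile("(?<!\\d)([12]\\d{3})(\\d{6})(\\d{2})(?!\\d)");

    // Returns the first 12-digit identifier found, if any.
    static Optional<String> extractId(String input) {
        Matcher m = ID.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("ABCD123456789101"));
        System.out.println(extractId("ABCD1234567894837376383439434343232"));
        System.out.println(extractId("ABCD123456789483X7376383439434343232"));
    }
}
```

The first input yields 123456789101, the second yields nothing because the digit run is longer than 12, and the third yields 123456789483 without the surrounding D and X. The three capture groups remain available through m.group(1) to m.group(3) if the parts are needed separately.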
