Regular expression in Java for matching series of integers

I'm trying to write a regular expression in Java to match strings that look like
(1, 2, 3, 4, 5, 6)
That is, a left parenthesis, followed by one or more nonnegative integers (separated by a comma and then any amount of whitespace), and ending with a right parenthesis.
I've tried
([0-9]+,\s+)
Does anyone know how to write such a regular expression?

You can try this pattern:
Pattern pattern = Pattern.compile("\\(\\d+(,\\s*\\d+)*\\)");
\\d: a digit (0 to 9)
\\s: a whitespace character
+: one or more occurrences
*: zero or more occurrences
See http://regex101.com/r/wT5wX7/1
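Your regex ([0-9]+,\s+) is close to matching the input string, but the comma can occur only once (you'd expect zero or more commas), and it should be followed by digits, not just whitespace. It also leaves out the surrounding parentheses and requires at least one whitespace character (\s+), whereas you want to allow any amount (\s*).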
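A minimal Java sketch of the pattern from this answer, as an illustration rather than a tested library. The class and method names (SeriesMatcher, isSeries) are hypothetical. Matcher.matches() requires the whole string to match, so no ^ or $ anchors are needed.

```java
import java.util.regex.Pattern;

public class SeriesMatcher {
    // "(", one number, then zero or more ", number" groups (any whitespace after
    // the comma), then ")". Backslashes are doubled inside the Java string literal.
    static final Pattern SERIES = Pattern.compile("\\(\\d+(,\\s*\\d+)*\\)");

    // matches() succeeds only if the entire input matches the pattern.
    static boolean isSeries(String s) {
        return SERIES.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"(1, 2, 3, 4, 5, 6)", "(1)", "(1,2,   3)", "()", "(1, 2,)", "1, 2"};
        for (String in : inputs) {
            System.out.println(in + " -> " + isSeries(in));
        }
    }
}
```

The first three inputs match; "()" (no number), "(1, 2,)" (trailing comma) and "1, 2" (no parentheses) do not.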

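Use this: \(([0-9]+[\,]{1}[\s]*)+[0-9]+\)
Edit: because the group above is repeated with +, it requires at least two numbers. Using * instead also matches a single number such as (1): \(([0-9]+[\,]{1}[\s]*)*[0-9]+\)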
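A quick Java check of the two patterns in this answer (class name SeriesVariants is hypothetical). In a Java string literal every backslash is doubled; [\,]{1} is simply one literal comma, so it could also be written as a plain ",".

```java
import java.util.regex.Pattern;

public class SeriesVariants {
    // Group repeated with +: at least one "number, whitespace" unit before the last number.
    static final Pattern AT_LEAST_TWO = Pattern.compile("\\(([0-9]+[\\,]{1}[\\s]*)+[0-9]+\\)");
    // Group repeated with *: the unit is optional, so a single number is accepted too.
    static final Pattern ONE_OR_MORE = Pattern.compile("\\(([0-9]+[\\,]{1}[\\s]*)*[0-9]+\\)");

    public static void main(String[] args) {
        for (String in : new String[] {"(1, 2, 3, 4, 5, 6)", "(1)"}) {
            System.out.println(in + ": + version=" + AT_LEAST_TWO.matcher(in).matches()
                    + ", * version=" + ONE_OR_MORE.matcher(in).matches());
        }
    }
}
```

Both accept the six-number series; only the * version accepts (1).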

Something like this could possibly help you.
(\(){1}(\d+,[ ]+)*\d+(\)){1}
or, with the leading and trailing /:
/(\(){1}(\d+,[ ]+)*\d+(\)){1}/
The pattern you tried, ([0-9]+,\s+),
says that you can have as many digits as you like, followed by a comma, followed by whitespace. It does not account for several such numbers in a row, for the last number (which has no comma after it), or for the leading and trailing parentheses.
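A pitfall worth checking with this family of patterns: if the repeated group (\d+,[ ]+) is followed directly by \), every number, including the last one, must be followed by a comma and a space. The Java sketch below (class name hypothetical) compares that form with one that ends in a bare \d+ before the closing parenthesis.

```java
import java.util.regex.Pattern;

public class TrailingCommaCheck {
    // Every number must be followed by ", " before the closing parenthesis.
    static final Pattern EVERY_NUMBER_THEN_COMMA = Pattern.compile("(\\(){1}(\\d+,[ ]+)+(\\)){1}");
    // Zero or more "number, " groups, then a final number with no trailing comma.
    static final Pattern LAST_NUMBER_BARE = Pattern.compile("(\\(){1}(\\d+,[ ]+)*\\d+(\\)){1}");

    public static void main(String[] args) {
        String input = "(1, 2, 3, 4, 5, 6)";
        System.out.println(EVERY_NUMBER_THEN_COMMA.matcher(input).matches()); // false
        System.out.println(LAST_NUMBER_BARE.matcher(input).matches());        // true
    }
}
```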

Related

Extract exactly n digits in a sentence using REGEX

Example
The no.s 1234 65
Input: n
For n=4, the output should be 1234
For n=2, the output should be 65 (not 12).
I tried \d{n}, which gives 12, and \d{n,}, which gives 1234, but I want only the number that has exactly n digits.
Pattern p = Pattern.compile("\\d{" + n + ",}");
You need negative lookaround assertions: (?<!...) is a negative lookbehind and (?!...) is a negative lookahead (see regex101):
(?<!\d)\d{4}(?!\d)
However, not every regex engine supports them (Java's does). As a workaround, you can also match the preceding and following characters (unlike lookarounds, which are zero-width matches) and capture only the digits; \D matches any character except a digit:
(?:^|\D)(\d{4})(?:\D|$)
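A Java sketch of both approaches with n supplied at run time (the class and method names are hypothetical). The lookaround version returns the whole match; the workaround consumes the neighbouring non-digit, so the number has to be read from capturing group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactDigits {
    // Zero-width lookarounds: only the n digits themselves are part of the match.
    static List<String> withLookaround(String text, int n) {
        Matcher m = Pattern.compile("(?<!\\d)\\d{" + n + "}(?!\\d)").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Workaround: the surrounding non-digits (or string edges) are consumed,
    // so the digits are taken from capturing group 1.
    static List<String> withoutLookaround(String text, int n) {
        Matcher m = Pattern.compile("(?:^|\\D)(\\d{" + n + "})(?:\\D|$)").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The no.s 1234 65";
        System.out.println(withLookaround(text, 4));    // [1234]
        System.out.println(withLookaround(text, 2));    // [65]
        System.out.println(withoutLookaround(text, 2)); // [65]
    }
}
```

One caveat of the consuming version: because the separator is used up by the first match, two numbers separated by a single character (for example "12 34" with n=2) yield only the first one, whereas the lookaround version finds both.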
I think what you need is \b, the word-boundary assertion.
Hence, the regex you're looking for would be (for n=2):
\b\d{2}\b
From what I understand, you're looking for a regex that matches a number with exactly n digits in a string, taking into account the spacing between the numbers. If that's the case, you're looking for something like this:
\b\d{4}\b
The \b ensures that the match starts and ends at a word boundary: a position between a character matched by \w (which includes digits) and one matched by its opposite, \W (which includes spaces), or the start or end of the string.
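A small Java sketch of the \b approach (class and method names hypothetical). Because letters are also \w characters, \b differs from the lookaround version when a number touches a letter: there is no boundary inside "abc1234", so \b\d{4}\b finds nothing there, while (?<!\d)\d{4}(?!\d) would still match 1234.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDigits {
    // \b is zero-width, so the match contains only the digits.
    static String firstMatch(String text, int n) {
        Matcher m = Pattern.compile("\\b\\d{" + n + "}\\b").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("The no.s 1234 65", 2)); // 65
        System.out.println(firstMatch("The no.s 1234 65", 4)); // 1234
        // "c" and "1" are both \w characters, so there is no word boundary between them.
        System.out.println(firstMatch("abc1234", 4));          // null
    }
}
```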
I don't code in Java, but I can try to answer this using regex in general.
If your numbers are in the format d1d2d3d4 d5d6 and you want to extract the digits d5d6, create three groups, as in r'([0-9]+)(\s)([0-9]+)', where each pair of parentheses () represents one group. Then extract only the third group, which is your required output.
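A Java translation of this grouping idea, as a sketch that assumes the input contains exactly two numbers separated by one whitespace character (the class and method names are hypothetical). Note that it does not use n at all; it simply returns whatever follows the first number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtract {
    // Three capturing groups: digits, a single whitespace character, digits.
    private static final Pattern THREE_GROUPS = Pattern.compile("([0-9]+)(\\s)([0-9]+)");

    // Returns the third group of the first match, or null if nothing matches.
    static String thirdGroup(String text) {
        Matcher m = THREE_GROUPS.matcher(text);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdGroup("The no.s 1234 65")); // 65
    }
}
```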

Using Regex, is it possible to use an expression such as 'Followed by' or 'Preceded by'

I have the following expression, with which I want to extract an identifier that is 12 digits long:
([12]\d{3})(\d{6})(\d{2})
This works fine if the string is in the following format:
ABCD123456789101
123456789101
When it gets a string like the following, how does it know which 12 digits to match?
ABCD1234567894837376383439434343232
1234567894837376383439434343232
In the above scenario, I don't want to select the twelve digits. So I think the answer is to select the twelve digits only if they are not preceded or followed by other digits. I tried this change:
[^0-9]([12]\d{3})(\d{6})(\d{2})[^0-9]
This basically says: get me the 12 digits only if the characters before and after them are non-numeric. The problem I have is that I am also getting those non-numeric characters as part of the match, i.e.
ABCD123456789483X7376383439434343232 returns D123456789483X
Is there any way of checking the preceding and following characters without including them in the match result? That is, only match if the preceding and following characters are non-numeric, but don't include those non-numeric characters in the match.
You can use lookarounds:
(?<!\d)([12]\d{3})(\d{6})(\d{2})(?!\d)
(In a Java string literal, every backslash must be doubled, e.g. "(?<!\\d)".)
Here:
(?<!\d) is a negative lookbehind, which means your pattern is not preceded by a digit
(?!\d) is a negative lookahead, which means your pattern is not followed by a digit
Read more about lookarounds
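A Java sketch applying the lookaround pattern to the sample strings from the question (class and method names hypothetical). Because the lookarounds are zero-width, the surrounding "D" and "X" are checked but not included in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdFinder {
    // Not preceded by a digit, 12 digits starting with 1 or 2, not followed by a digit.
    static final Pattern ID = Pattern.compile("(?<!\\d)([12]\\d{3})(\\d{6})(\\d{2})(?!\\d)");

    // Returns the first 12-digit identifier, or null if there is none.
    static String findId(String s) {
        Matcher m = ID.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "ABCD123456789101",
            "123456789101",
            "ABCD1234567894837376383439434343232",
            "ABCD123456789483X7376383439434343232"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + findId(in));
        }
    }
}
```

The long run of 31 digits yields null, and the last input yields 123456789483 without the D and X. The three capturing groups remain available through m.group(1) to m.group(3) if the parts of the identifier are needed separately.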

Java regular expression to validate numeric comma separated number and hyphen

Valid1: 2
Valid2: 3-5
Valid3: 2,4-6
Valid4: 2,4,5
Valid5: 2-7,8-9
Valid6: 2,5-7,9-13,15,17-20
All of the expressions above should be matched by a single regex.
The number on the left side of a hyphen should be smaller than the number on the right side.
First, as @MikeFHay suggested above, regexes were not made to check whether one number is bigger than another (for that you'll have to parse the expression). If we ignore that requirement, the rest can be achieved with the following regex:
((\d\,(?=\d))|(\d\-(?=\d))|\d)+
In Java:
"((\\d\\,(?=\\d))|(\\d\\-(?=\\d))|\\d)+"
Explanation:
This regex uses a lookahead to validate that each comma or dash is preceded and followed by a digit: with (\d\,(?=\d)), each "substring" that contains a comma or dash has to be in the format digit,digit or digit-digit.
Of course, a number without commas or dashes is also valid, hence the rightmost alternative, which is simply \d.
Link to online demo
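A Java sketch that runs this answer's regex on the examples and, following the advice to parse for the ordering rule, adds a hypothetical helper isValidRangeList. The helper's stricter shape pattern (items "n" or "n-m" separated by commas) and the assumption that every number fits in an int are mine, not the answer's. The comparison shows two inputs the lookahead regex accepts but the helper rejects: "7-2" (wrong order) and "2-3-4" (a chained range).

```java
import java.util.regex.Pattern;

public class RangeListCheck {
    // The answer's regex: every comma or dash must be directly followed by a digit.
    static final Pattern ANSWER = Pattern.compile("((\\d\\,(?=\\d))|(\\d\\-(?=\\d))|\\d)+");

    // Hypothetical stricter shape: "n" or "n-m" items separated by commas.
    static final Pattern SHAPE = Pattern.compile("\\d+(-\\d+)?(,\\d+(-\\d+)?)*");

    // Hypothetical helper: regex for the shape, plain parsing for "left < right".
    // Assumes each number fits in an int (otherwise parseInt throws).
    static boolean isValidRangeList(String s) {
        if (!SHAPE.matcher(s).matches()) {
            return false;
        }
        for (String item : s.split(",")) {
            String[] bounds = item.split("-");
            if (bounds.length == 2
                    && Integer.parseInt(bounds[0]) >= Integer.parseInt(bounds[1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"2", "3-5", "2,4-6", "2,4,5", "2-7,8-9",
                           "2,5-7,9-13,15,17-20", "7-2", "2-3-4"};
        for (String in : inputs) {
            System.out.println(in + ": answer regex=" + ANSWER.matcher(in).matches()
                    + ", regex + parsing=" + isValidRangeList(in));
        }
    }
}
```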

Regex expression for timestamp with or without leading zero?

I'm struggling with Regex.
This is a sample timestamp: 00:00:00.00 (hours:minutes:seconds.decimal). I also want to match values like 00:0:0.00; notice that the leading zero is optional in the middle fields.
I was using [1-60]:[1-60]:[1-60].[1-100], but that requires no leading zero. I would like help writing a SINGLE regex that works for both of the formats above.
A complete solution would be fantastic, but if you could just point me in the right direction, that would be helpful as well.
Your solution won't actually match what you've described; in each of the first three positions it only matches a single character from 0123456 (and in the last position only 0 or 1). You probably want something like
[0-5]?\d:[0-5]?\d:[0-5]?\d\.\d{1,2}
Your pattern has a number of problems. First, [1-60] is a character class: it matches a single 1, 2, 3, 4, 5, 6, or 0 character. Second, the unescaped . in your pattern matches any character, not just a literal period.
I think what you're looking for is something like this instead:
\d{1,2}:\d{1,2}:\d{1,2}\.\d{1,2}
This will match any one or two digits, followed by a literal :, followed by any one or two digits, followed by a literal :, followed by any one or two digits, followed by a literal ., followed by any one or two digits.
Or, to match only particular ranges for each time component, you can use a pattern like the one chrylis suggests, although I'd generally recommend actually parsing the time value if you really need to do this.
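A Java sketch comparing the range-limited pattern [0-5]?\d:... with the looser \d{1,2}:... pattern (class name hypothetical). Both accept the two formats from the question; the difference shows up with an out-of-range field such as 99 minutes, and both reject a non-period separator because the period is escaped.

```java
import java.util.regex.Pattern;

public class TimestampPatterns {
    // Each of the first three fields: an optional digit 0-5 followed by one digit.
    static final Pattern RANGED = Pattern.compile("[0-5]?\\d:[0-5]?\\d:[0-5]?\\d\\.\\d{1,2}");
    // Any one or two digits in every field.
    static final Pattern LOOSE = Pattern.compile("\\d{1,2}:\\d{1,2}:\\d{1,2}\\.\\d{1,2}");

    public static void main(String[] args) {
        String[] inputs = {"00:00:00.00", "00:0:0.00", "00:99:00.00", "00:00:00x00"};
        for (String in : inputs) {
            System.out.println(in + ": ranged=" + RANGED.matcher(in).matches()
                    + ", loose=" + LOOSE.matcher(in).matches());
        }
    }
}
```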
Another option:
(?:\d{1,2}:){2}\d{1,2}\.\d{1,2}
Regular expression:
(?: group, but do not capture (2 times):
\d{1,2} digits (0-9) (between 1 and 2 times)
: ':'
){2} end of grouping
\d{1,2} digits (0-9) (between 1 and 2 times)
\. '.'
\d{1,2} digits (0-9) (between 1 and 2 times)
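The same grouped pattern in Java, as a short sketch (class and method names hypothetical). Because (?: ) does not capture, the pattern reports zero capturing groups, while {2} still repeats the "digits and colon" part exactly twice.

```java
import java.util.regex.Pattern;

public class GroupedTimestamp {
    // Non-capturing group "one or two digits, then a colon", repeated exactly twice.
    static final Pattern GROUPED = Pattern.compile("(?:\\d{1,2}:){2}\\d{1,2}\\.\\d{1,2}");

    static boolean isTimestamp(String s) {
        return GROUPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTimestamp("00:00:00.00")); // true
        System.out.println(isTimestamp("00:0:0.00"));   // true
        System.out.println(isTimestamp("0:0.00"));      // false: only one "digits:" part
        // Non-capturing groups do not count towards groupCount().
        System.out.println(GROUPED.matcher("").groupCount()); // 0
    }
}
```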

Java regex split with any number of asterisks

I am learning regex (with this site) and trying to figure out how to parse the string 1***2 to give me [1,2] (without writing a specific case for three asterisks). Any number of consecutive asterisks should be treated as one delimiter, so I am looking for the * character followed by the * quantifier. The delimiters could be letters as well.
The output should only be numbers, so I use [^-^0-9] to split on everything else.
So far I have tried:
input.split("[^-^0-9]"); // Gives me [1, , ,2]
input.split("[^-^0-9\\**]"); // Gives me [1***2]
input.split("[^-^0-9+\\**]"); // Gives me [1***2]
Writing \* in the Java string does not work, because \* is not a valid escape sequence in a Java string literal.
You are looking for
input.split("[*]+");
This splits the string on one or more consecutive asterisks.
To allow other characters (e.g. letters) within delimiters, add them to the [*] character class.
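A Java sketch of this answer (class and method names hypothetical). Inside a character class * is literal, so "[*]+" needs no escaping; the escaped form "\\*+" (backslash doubled for the Java string literal) is equivalent. The letter variant is one possible way to follow the suggestion of adding letters to the class.

```java
import java.util.Arrays;

public class SplitAsterisks {
    // One or more consecutive asterisks form a single delimiter.
    static String[] splitOnAsterisks(String input) {
        return input.split("[*]+");
    }

    // Hypothetical variant: ASCII letters added to the same character class.
    static String[] splitOnAsterisksOrLetters(String input) {
        return input.split("[*a-zA-Z]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnAsterisks("1***2")));            // [1, 2]
        System.out.println(Arrays.toString("1***2".split("\\*+")));                // [1, 2]
        System.out.println(Arrays.toString(splitOnAsterisksOrLetters("1**ab*2"))); // [1, 2]
    }
}
```

Note that String.split keeps a leading empty string when the input starts with a delimiter, so "*1*2" gives ["", "1", "2"].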
If the delimiters could also be letters,
you can use
\D+
OR
[^\d]+
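In Java, the backslash in these patterns must be doubled inside the string literal. A short sketch (class and method names hypothetical):

```java
import java.util.Arrays;

public class SplitNonDigits {
    // \D+ is one or more non-digit characters of any kind.
    static String[] splitOnNonDigits(String input) {
        return input.split("\\D+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnNonDigits("1***2")));   // [1, 2]
        System.out.println(Arrays.toString(splitOnNonDigits("1**ab*2"))); // [1, 2]
        // [^\\d]+ is an equivalent spelling.
        System.out.println(Arrays.toString("1**ab*2".split("[^\\d]+")));  // [1, 2]
    }
}
```

Unlike the [^-^0-9] class from the question, \D also treats "-" as a delimiter, so "3-5" splits into [3, 5] and any minus signs are lost.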
