Java specific regex - java

I want to write some code that checks whether the user's input matches a specific format: 's' followed by one or more numbers, each preceded by a space. If the input were "s 1 2 3 4", it would call some function; "s 1 2" would also be acceptable. So far I have found that this regex works for a fixed count of numbers:
if (inputLine.matches("s \\d+ \\d+")) { }
It works for exactly 2 numbers after the s, but I need to accept any number of numbers after the s.
Any idea for a regex that would suit my needs? Thank you.

Change your regex to
if (inputLine.matches("s(?: \\d+)+")) { }
to match s followed by one or more sequences of a space and 1+ digits.
If you want to allow 0 numbers after s, replace the last + quantifier with * to match zero or more occurrences.
Since a repeated capturing group keeps only the contents of its last iteration, a capturing group is of no use here, so I suggest a non-capturing one, (?:...).
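A minimal sketch of how the check above might be wired up. The class SCommand and the helper parseCommand are hypothetical names used only for illustration, and the sketch assumes every number fits in an int:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SCommand {
    // 's' followed by one or more " <digits>" groups; compiled once for reuse.
    private static final Pattern FORMAT = Pattern.compile("s(?: \\d+)+");

    // Hypothetical helper: returns the numbers after 's',
    // or an empty list if the line does not match the format.
    static List<Integer> parseCommand(String inputLine) {
        List<Integer> numbers = new ArrayList<>();
        if (!FORMAT.matcher(inputLine).matches()) {
            return numbers;
        }
        // Skip the leading "s " and split the rest on single spaces.
        // Assumption: each number fits in an int.
        for (String token : inputLine.substring(2).split(" ")) {
            numbers.add(Integer.parseInt(token));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parseCommand("s 1 2 3 4")); // [1, 2, 3, 4]
        System.out.println(parseCommand("s 1 2"));     // [1, 2]
        System.out.println(parseCommand("s"));         // []
    }
}
```

Because the regex has already validated the line, splitting on a single space is enough to recover the numbers.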

Related

Regex: Match wildcard followed by variable length of digits

I'm trying to extract the personal number from a string like Personal number: 123456 with the following regex:
(Personal number|Personalnummer).*(\d{2,10})
When I read the second group, it contains only the last 2 digits of the personal number. If I change the digit range to {3,10}, it contains the last 3 digits of the personal number.
I cannot just add the whitespace as an additional group, because I cannot be sure that there will always be whitespace: there might be none, or some other characters, but the personal number will always be at the end.
Is there any way I could instruct the parser to capture the whole digit string?
.* is a greedy quantifier. It consumes as many characters as it can and gives back only the last 2, the minimum that \d{2,10} needs in order to match.
You have to make it reluctant by appending ?, like below:
(Personal number|Personalnummer).*?(\d{2,10})
Now the second group contains the whole number.
You can also convert the first group into a non-capturing group, so that the number is the only captured group (group 1), like below:
(?:Personal number|Personalnummer).*?(\d{2,10})
Use a reluctant quantifier on the wildcard match (e.g. *?). For instance, .*? will result in the full numeric expression being captured:
Pattern p = Pattern.compile("(Personal number|Personalnummer).*?(\\d{2,10})"); // note the ? after .*
Matcher m = p.matcher("Personal number: 123456");
if (m.find()) {
    System.out.println(m.group(2));
}
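To see the difference between the greedy and the reluctant wildcard side by side, here is a small sketch. The class GreedyVsReluctant and the helper firstNumber are illustrative names, not part of the answers above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsReluctant {
    // Hypothetical helper: returns capture group 1 of the first match, or null if there is none.
    static String firstNumber(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Personal number: 123456";
        String prefix = "(?:Personal number|Personalnummer)";
        // Greedy .* leaves only the minimum 2 digits for the group.
        System.out.println(firstNumber(prefix + ".*(\\d{2,10})", input));  // 56
        // Reluctant .*? stops at the first digit, so the group gets them all.
        System.out.println(firstNumber(prefix + ".*?(\\d{2,10})", input)); // 123456
    }
}
```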

Regex to allow only 10 or 16 digit comma separated number

I want to validate a text field in a Java-based app so that it allows only comma-separated numbers, each of which must have either 10 or 16 digits. I have a regex, ^[0-9,;]+$, that allows only digits and separators, but it does not restrict each number to 10 or 16 digits.
You can use {n} (or {n,m} for a range) to specify the number of repetitions.
So matching one number with either 10 or 16 digits would be
^(\d{10}|\d{16})$
Meaning: match exactly 10 or exactly 16 digits, with start-of-line before them and end-of-line after them.
Now add the separator:
^((\d{10}|\d{16})[,;])*(\d{10}|\d{16})$
That is: zero or more 10-or-16-digit sequences, each followed by either , or ;, and then one final 10-or-16-digit sequence before end-of-line.
You need to escape the backslashes in a Java string literal.
public static void main(String[] args) {
    String regex = "^((\\d{10}|\\d{16})[,;])*(\\d{10}|\\d{16})$";
    String y = "0123456789,0123456789123456,0123456789";
    System.out.println(y.matches(regex)); // should be true
    String n = "0123456789,01234567891234567,0123456789";
    System.out.println(n.matches(regex)); // should be false (the middle number has 17 digits)
}
I would probably use this regex:
(\d{10}(?:\d{6})?,?)+
Explanation:
( - Begin capture group
\d{10} - Match exactly 10 digits
(?: - Begin non-capturing group
\d{6} - Match 6 more digits
)? - End group, mark as optional using ?
,? - Optionally match a comma
)+ - End outer capture group, require 1 or more repetitions (maybe change to * for 0 or more)
The following inputs match this regex:
1234567890123456,1234567890
1234567890123456
1234567890
These inputs do not match (when the whole input must match):
123,1234567890
12355
123456789012
You need to have both anchors and word boundaries:
/^(?:\b(?:\d{10}|\d{16})\b,?)*$/
The anchors are necessary so you don't get false positives for partial matches and the word boundaries are necessary so you don't get false positives for 20, 26, 30, 32 digit numbers.
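The point about 20-digit input can be checked with a short sketch that runs both patterns through a whole-string match. The class BoundaryCheck and the helper accepts are hypothetical names; the two patterns are the ones quoted in the answers above, with backslashes doubled for Java:

```java
import java.util.regex.Pattern;

final class BoundaryCheck {
    // Pattern from the earlier answer: the comma between numbers is optional.
    static final Pattern OPTIONAL_COMMA =
            Pattern.compile("(\\d{10}(?:\\d{6})?,?)+");
    // Anchored pattern with word boundaries around each number.
    static final Pattern WITH_BOUNDARIES =
            Pattern.compile("^(?:\\b(?:\\d{10}|\\d{16})\\b,?)*$");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String twenty = "12345678901234567890"; // 20 digits, no comma
        System.out.println(accepts(OPTIONAL_COMMA, twenty));  // true: read as 10 + 10 digits
        System.out.println(accepts(WITH_BOUNDARIES, twenty)); // false
        System.out.println(accepts(WITH_BOUNDARIES, "1234567890,1234567890123456")); // true
    }
}
```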
Here is my version
(?:\d+,){9}\d+|(?:\d+,){15}\d+
Let's review it. Note that it accepts 10 or 16 comma-separated numbers of any length. First of all, there is no single quantifier that says "10 or 16", so I actually have to write 2 expressions with | between them.
Second, the expression itself. Your version just says that you allow digits and commas. That is not what you really want because, for example, a string like ,,, will match your regex.
So the regex should be like (?:\d+,){n}\d+, which means: n sequences of digits, each terminated by a comma, followed by one more sequence of digits, e.g. 123,45,678 with n = 2 (where 123,45, matches the first part and 678 matches the second part).
Finally we get the regex that I wrote at the beginning of my answer:
(?:\d+,){9}\d+|(?:\d+,){15}\d+
And do not forget that when you write a regex in your Java code you have to double the backslashes, like this:
Pattern.compile("(?:\\d+,){9}\\d+|(?:\\d+,){15}\\d+")
EDIT: I have just added the non-capturing group (?: ... ).

finding repeated characters in a row (3 times or more) in a string

Here is the code for finding a repeated character like A in AAbbbc:
String stringToMatch = "abccdef";
Pattern p = Pattern.compile("((\\w)\\2+)+");
Matcher m = p.matcher(stringToMatch);
while (m.find()) {
    System.out.println("Duplicate character " + m.group(0));
}
Now the problem is that I want to find only the characters that are repeated 3 times or more in a row.
When I change 2 to 3 in the above code, it does not work.
Can anyone help?
You shouldn't change 2 to 3, because \\2 is a backreference to capture group 2, not a repetition count. You can use two backreferences here:
"((\\w)\\2\\2)+"
But your regex still doesn't match whole strings like your example, since it only matches the repeated characters themselves. For that you can use the following regex:
"((\\w)\\2+\\2)+.*"
You may use the repetition quantifier.
Pattern p = Pattern.compile("(\\w)\\1{2,}");
Matcher m = p.matcher(stringToMatch);
while (m.find()) {
    System.out.println("Duplicate character " + m.group(1));
}
Now the duplicate character is captured in group 1, not group 0, which refers to the whole match. Just change the number inside the repetition quantifier to match a character that repeats n or more times, like "(\\w)\\1{5,}".
That original regex is flawed. It only finds "word" characters (alpha, numeric, underscore). The requirement is "find characters that repeat 3 or more times in a row." The dot is the any-character metacharacter.
(?=(.)\1{3})(\1+)
So, that will find a character that occurs 4 or more consecutive times (i.e., meets your requirement of a character that "repeats" three or more times). If you really meant "occurs," change the 3 to 2. Anyway, it does a non-consuming "zero-length assertion" before capturing any data, so should be more efficient. It will only consume and capture data once you've found your minimum requirement (a single character that repeats at least 3 times). You can then consume it with the one-or-more '+' quantifier because you know it's a match you want; further quantification is redundant--your positive lookahead has already assured (asserted) that. Your results are in capture group 2 "(\1+)" and you can refer to it as \2.
Note: I tested that with perl command-line utility, so that's the raw regex. It looks like you may need to escape certain characters prior to using it in the programming language you're using.
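For Java, the escaping mentioned in the note amounts to doubling each backslash inside the string literal. A minimal sketch, using {2} in the lookahead so that a character occurring 3 or more times in a row is reported; the class RepeatedRuns and the helper findRuns are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedRuns {
    // The lookahead requires a character plus 2 repeats (3 in a row);
    // group 2 then consumes the whole run of that character.
    private static final Pattern RUN = Pattern.compile("(?=(.)\\1{2})(\\1+)");

    // Hypothetical helper: collects every run of 3 or more identical characters.
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group(2));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("AAbbbcdddd")); // [bbb, dddd]
    }
}
```

Note that the dot does not match line terminators by default, so runs of newlines are not reported by this sketch.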

I need help on regular expression to allow number with character

condition:
123 not valid
123 A valid
abc123 valid
abc123Ab valid
I need a regular expression that requires at least one letter along with the numbers. How can I write it?
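This will match any string starting with an optional run of digits followed by a combination of whitespace, letters and digits. But it still matches "123 " (that is, 123 followed by a space), and it also matches digits-only input such as 123, because the character class itself contains digits.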
^\d*[\sa-zA-Z0-9]+$
The following will check that you have at least one letter in your string, combined with optional digits, whitespace and letters.
[a-zA-Z\s\d]*[a-zA-Z]+?[a-zA-Z\s\d]*
[a-zA-Z\s\d] matches a single character from the set: a letter, a whitespace character or a digit.
Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
Quantifier +?: between one and unlimited times, as few times as possible, expanding as needed [lazy]
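A quick way to check the "at least one letter" pattern against the four sample inputs from the question is sketched below. The class LetterRequired and the method isValid are hypothetical names, and String.matches() is used so that the whole input has to match:

```java
final class LetterRequired {
    // Any mix of letters, whitespace and digits, with at least one letter somewhere.
    private static final String AT_LEAST_ONE_LETTER =
            "[a-zA-Z\\s\\d]*[a-zA-Z]+?[a-zA-Z\\s\\d]*";

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean isValid(String s) {
        return s.matches(AT_LEAST_ONE_LETTER);
    }

    public static void main(String[] args) {
        System.out.println(isValid("123"));      // false
        System.out.println(isValid("123 A"));    // true
        System.out.println(isValid("abc123"));   // true
        System.out.println(isValid("abc123Ab")); // true
    }
}
```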
(([a-zA-Z\s])*(\d{1,})([a-zA-Z\s]){1,}|([a-zA-Z\s]){1,}(\d{1,})([a-zA-Z\s])*)
The first part of this expression allows the string to start with zero or more letters or whitespace characters, requires at least 1 digit, and requires 1 or more letters or whitespace characters at the end. The second part requires at least 1 letter or whitespace character at the start, followed by at least 1 digit, and then 0 or more letters or whitespace characters.

Reg Expression Validation on a String

Can I use a regular expression for the following use case?
I need to write a boolean method that takes a String parameter which must satisfy the following conditions.
20 character length string.
First 9 characters will be a number
Next 2 characters will be alphabets
Next 2 characters will be a number.(1 to 31 or 99)
Next 1 character will be an alphabet
Last 6 characters will be a number.
So far, I have written the expression for the first requirement:
[a-zA-Z0-9]{20} - This expression works well for the first case. I don't know how to write a complete regular expression that meets the entire requirement.
Please help.
Yes, it is possible to use regexes for this.
Ignore the "20 characters" part and describe a string created by concatenating 9 digits, 2 letters, 2 digits, 1 letter and 6 more digits.
Start with the string start: ^
Then 9 digits. The \d conveniently describes the character set [0-9], so \d{9} means "nine digits".
Then 2 letters. The \w class is too broad, so stick to [a-zA-Z] for a letter.
Then another two digits. They seem to be from a restricted set, so describe the set with alternation and grouping.
Then another letter and six more digits.
And, finally, you have to end at the end of the string: $
For reference, this regex means "the string is nine letters, then 12-15 or 99, then another letter":
^[a-zA-Z]{9}(1[2-5]|99)[a-zA-Z]$
Read the String JavaDocs, especially the part about String.matches() as well as the documentation about regular expressions in Java.
Your first requirement is already implicit in the remaining ones, so I would just skip it. Then, just write the regex code that matches each part one after the other:
[0-9]{9}[a-zA-Z]{2}...
There is one special consideration for the number that might be 1 to 31. While it is possible to match this in one regex, it would be verbose and difficult to understand. Instead, perform basic matching in the regex and extract this part as a capturing group by putting it into parentheses:
([0-9]{2})
If you use Pattern and Matcher to apply your regex, and your string matches the pattern, you can then easily get at just those two characters, use Integer.parseInt() to convert them to an integer (which is completely safe because you know the two characters are digits), and then check the value normally.
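The capture-and-check approach described above could be sketched as follows. The class CodeValidator and the method isValid are hypothetical names, and the sketch assumes a non-null input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodeValidator {
    // 9 digits, 2 letters, 2 digits (captured), 1 letter, 6 digits: 20 characters in total.
    private static final Pattern FORMAT =
            Pattern.compile("[0-9]{9}[a-zA-Z]{2}([0-9]{2})[a-zA-Z][0-9]{6}");

    // Hypothetical helper; assumes s is not null.
    static boolean isValid(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            return false;
        }
        // Safe to parse: the regex guarantees exactly two digits in group 1.
        int n = Integer.parseInt(m.group(1));
        return (n >= 1 && n <= 31) || n == 99;
    }

    public static void main(String[] args) {
        System.out.println(isValid("123456789AB07C123456")); // true
        System.out.println(isValid("123456789AB99C123456")); // true
        System.out.println(isValid("123456789AB32C123456")); // false
        System.out.println(isValid("123456789AB00C123456")); // false
    }
}
```

Keeping the range check in plain Java leaves the regex short and makes the allowed values easy to change later.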
This regular expression
^[0-9]{9}[a-zA-Z]{2}(0[1-9]|[12][0-9]|3[01]|99)[a-zA-Z]([0-9]{6})$
takes
9 digits at the start,
followed by 2 letters,
followed by a two-digit number from 01 to 31, or 99,
followed by a letter,
followed by 6 digits.
