Regex based evaluations of given string - java

I generate a random string that is 6 characters long. It can be all numeric, all alphabetic, or alphanumeric. I have to validate this string against a regular expression. For example:
If the string is numeric [0-9], then it must not be all zeroes.
If the string is alphabetic [a-zA-Z], then the last character cannot be X or x, and the string cannot start with SVC or svc.
If the string is alphanumeric [0-9a-zA-Z], then it cannot be all zeroes, cannot start with triple zeroes 000, and cannot end with x or X.
I need regular expressions for these that can be used with a Java Matcher.

This should work:
/^((?!.{7,})[0-9]*[1-9]+[0-9]*|(?!(SVC|svc))[a-zA-Z]{5}[a-wy-zA-WY-Z]|(?!(000|.{7,}))[0-9a-zA-Z]*([a-zA-Z][0-9]|[0-9][a-zA-Z])+[0-9a-zA-Z]*[a-wy-zA-WY-Z])$/gm
I cannot explain this regex, as it is too long and just a repetitive application of the same concepts over and over. However, given the following input, it matches only the first five lines:
002000
jfkasd
002dfd
sVcabc
abc65i
000000
00012c
0123ax
SVCabx
svcabc
abc65x
abc65X
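To try the pattern from Java, a minimal sketch could look like the following. The class and method names are made up for illustration, the /.../gm delimiters are dropped, and each candidate is checked on its own with matches() instead of relying on the multiline flag:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class SixCharRegexCheck {
    // The combined pattern from above, split over three lines for readability.
    static final Pattern CODE = Pattern.compile(
        "^((?!.{7,})[0-9]*[1-9]+[0-9]*"
        + "|(?!(SVC|svc))[a-zA-Z]{5}[a-wy-zA-WY-Z]"
        + "|(?!(000|.{7,}))[0-9a-zA-Z]*([a-zA-Z][0-9]|[0-9][a-zA-Z])+[0-9a-zA-Z]*[a-wy-zA-WY-Z])$");

    // Returns the candidates that the pattern accepts, in input order.
    static List<String> accepted(String... candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (CODE.matcher(candidate).matches()) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [002000, jfkasd, 002dfd, sVcabc, abc65i]
        System.out.println(accepted(
            "002000", "jfkasd", "002dfd", "sVcabc", "abc65i", "000000",
            "00012c", "0123ax", "SVCabx", "svcabc", "abc65x", "abc65X"));
    }
}
```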
Here's the original attempt I proposed, which does not satisfy all the conditions of the OP, but it is accompanied by an explanation:
/^((?!.{7,})[0-9]*[1-9]+[0-9]*|[a-zA-Z]{5}[a-wy-zA-WY-Z]|(?!000)[0-9a-zA-Z]{6})$/gm
Explanation (which can also be read on the linked page itself):
We have three alternatives that have to match the whole line: ^(…|…|…)$;
The 2nd alternative is easy: five letters followed by one letter which is not x or X, [a-zA-Z]{5}[a-wy-zA-WY-Z] ([^xX] would match digits or anything else, too).
The 3rd alternative is slightly more complex: six letters or digits that do not start with 000; this uses a negative lookahead, and it works because of the anchor ^ (if you remove that, it breaks).
The 1st alternative is similar: zero or more digits, followed by one or more non-0 digits, followed by zero or more digits; the negative lookahead (?!.{7,}) rejects any line of 7 or more characters.
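If a single long pattern is hard to maintain, the rules from the question can also be split into small checks in plain Java. This is only a sketch of a different technique, not the regex above: the class and method names are invented, and it assumes the string must be exactly 6 characters long and that only the exact prefixes SVC and svc are forbidden.

```java
// Hypothetical helper class illustrating one check per rule.
final class SixCharRules {
    static boolean isValid(String s) {
        // Assumption: exactly six ASCII letters or digits.
        if (s == null || !s.matches("[0-9a-zA-Z]{6}")) {
            return false;
        }
        boolean endsWithX = s.endsWith("x") || s.endsWith("X");
        if (s.matches("[0-9]+")) {
            // Numeric: must not be all zeroes.
            return !s.equals("000000");
        }
        if (s.matches("[a-zA-Z]+")) {
            // Alphabetic: no trailing x/X, no SVC/svc prefix.
            return !endsWithX && !s.startsWith("SVC") && !s.startsWith("svc");
        }
        // Mixed letters and digits: no 000 prefix, no trailing x/X.
        return !s.startsWith("000") && !endsWithX;
    }

    public static void main(String[] args) {
        System.out.println(isValid("002dfd")); // true
        System.out.println(isValid("00012c")); // false
    }
}
```

Unlike the combined regex, this sketch also accepts mixed strings that end in a digit (for example abc123), which the stated rules do not forbid.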

Related

Java regex - First needs to be the letter X(case insensitve) the rest digits

I need to check whether a string is in the format X[d].... It has to have the letter X (case-insensitive) at the start and AT LEAST 1 digit after. I tried the following regex, but it doesn't match anything:
^(?i)[x](?=.*[0-9])*$
// ^(?i)[x] - first character needs to be x (case-insensitive)
// (?=.*[0-9]) - should have at least one digit after, and everything after must be digits
Use the following.
^(?i)x\d+$
This translates to case insensitive x followed by one or more digits 0-9. There's no need for brackets around the x because it's not a set. It's only one character.
Alternatively, you can create a set that consists of upper and lower case x.
^[xX]\d+$
In Java, you may use
s.matches("(?i)x[0-9]+")
It will match a string starting with x or X and then having 1 or more digits.
You should not quantify a lookahead. It is a zero-width assertion, so it matches an empty location; matching it repeatedly leaves you at the same place, and the regex index is not advanced.
However, Java regex just ignores a quantified lookahead. Your current regex, ^(?i)[x](?=.*[0-9])*$, matches x but not x5, as there is only one consuming part to match, [x]. See the Java demo.
Even if you remove the * quantifier, ^(?i)[x](?=.*[0-9])$ does not match any string: $, the end of string, is required right after x, while the positive lookahead (?=.*[0-9]) requires a digit after any 0+ chars other than line break chars.
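A short sketch of the suggested matches() call, with a few sample inputs. The class and method names are only for illustration:

```java
// Hypothetical helper class wrapping the suggested pattern.
final class XDigits {
    // True when s is x or X followed by one or more ASCII digits and nothing else.
    static boolean isXFollowedByDigits(String s) {
        // matches() tests the whole string, so no ^ and $ anchors are needed.
        return s.matches("(?i)x[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isXFollowedByDigits("x5"));    // true
        System.out.println(isXFollowedByDigits("X2024")); // true
        System.out.println(isXFollowedByDigits("x"));     // false, no digit
        System.out.println(isXFollowedByDigits("x5a"));   // false, trailing letter
    }
}
```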

Regex for a random set of chars and digits

I am looking for a regex that matches only when it sees a string that is randomly filled by digits and chars.
For example, adfak332arg3 is allowed but 332352 and fagaaah are not allowed. .*[^\\s] looks fine for strings with only chars, but how can I fix it so that it accepts the desired strings and rejects the other two types?
Use a positive lookahead (?=) to ensure that the string contains required characters.
^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]+$
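You can try this regex (letters are written as [a-zA-Z] because \w also matches digits and underscores):
"[a-zA-Z\\d]*\\d[a-zA-Z][a-zA-Z\\d]*|[a-zA-Z\\d]*[a-zA-Z]\\d[a-zA-Z\\d]*"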
If you need just a mixed string of characters A-Z, a-z and 0-9 you can use (the lookaheads consume nothing, so .+ is needed to match the string itself):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).+$
If you want to force the string to have a minimum number of characters in your string you can use (e.g. minimum 8 in the string):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{8,}$
If you want to have a string length from min-length to max-length then use (e.g. string of at least 5 characters and max 20 characters):
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9]).{5,20}$
To ensure that an input contains digits as well as letters, you could use this regex:
^(?:[A-Za-z]+\\d+|\\d+[A-Za-z]+)[A-Za-z\\d]*$
The regex ensures that the input contains at least one digit and one letter, and allows only digits and letters (no special characters etc.)
(?:[A-Za-z]+\d+|\d+[A-Za-z]+) ensures that it starts with one or more letters followed by one or more digits, or alternatively (|\d+[A-Za-z]+) with one or more digits followed by one or more letters
[A-Za-z\d]* allows any number of letters or digits after the previous check
^ and $ anchor the match to the start and end of the string
Try this regex ([A-z] would also match the characters [ \ ] ^ _ and `, so write the letters as [A-Za-z]):
[A-Za-z][0-9]|[0-9][A-Za-z]
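As a quick check of the lookahead-based pattern against the examples from the question, here is a minimal Java sketch. The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper class for the lookahead-based pattern.
final class MixedAlnum {
    // At least one letter, at least one digit, and nothing but letters and digits.
    static final Pattern MIXED =
        Pattern.compile("^(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z\\d]+$");

    static boolean isMixed(String s) {
        return MIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMixed("adfak332arg3")); // true
        System.out.println(isMixed("332352"));       // false, digits only
        System.out.println(isMixed("fagaaah"));      // false, letters only
    }
}
```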

Regex to match exactly n occurrences of letters and m occurrences of digits

I have to match an 8-character string, which must contain exactly 2 letters (1 uppercase and 1 lowercase) and exactly 6 digits, in any order.
So, basically:
K82v6686 would pass
3w28E020 would pass
1276eQ900 would fail (too long)
98Y78k9k would fail (three letters)
A09B2197 would fail (two capital letters)
I've tried using the positive lookahead to make sure that the string contains digits, uppercase and lowercase letters, but I have trouble with limiting it to a certain number of occurrences. I suppose I could go about it by including all possible combinations of where the letters and digits can occur:
(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z]) ([A-Z][a-z][0-9]{6})|([A-Z][0-9][a-z][0-9]{5})| ... | ([0-9]{6}[a-z][A-Z])
But that's a very roundabout way of doing it, and I'm wondering if there's a better solution.
You can use
^(?=[^A-Z]*[A-Z][^A-Z]*$)(?=[^a-z]*[a-z][^a-z]*$)(?=(?:\D*\d){6}\D*$)[a-zA-Z0-9]{8}$
See the regex demo (a bit modified due to the multiline input). In Java, do not forget to use double backslashes (e.g. \\d to match a digit).
Here is a breakdown:
^ - start of string (assuming no multiline flag is to be used)
(?=[^A-Z]*[A-Z][^A-Z]*$) - check that there is exactly 1 uppercase letter (use \p{Lu} to match any Unicode uppercase letter and \P{Lu} to match any character other than that)
(?=[^a-z]*[a-z][^a-z]*$) - a similar check that there is exactly 1 lowercase letter (alternatively, use \p{Ll} and \P{Ll} to match Unicode letters)
(?=(?:\D*\d){6}\D*$) - check that there are exactly six digits in the string: from the current position, six repetitions of 0 or more non-digit characters (\D matches any character but a digit; you may also replace it with [^0-9]) followed by a digit (\d), and then 0 or more non-digit characters (\D*) up to the end of the string ($)
[a-zA-Z0-9]{8} - match exactly 8 alphanumeric characters.
$ - end of string.
Following the logic, we can even reduce this to just
^(?=[^a-z]*[a-z][^a-z]*$)(?=(?:\D*\d){6}\D*$)[a-zA-Z0-9]{8}$
One condition can be removed as we only allow lower- and uppercase letters and digits with [a-zA-Z0-9], and when we apply 2 conditions the 3rd one is automatically performed when matching the string (one character must be an uppercase in this case).
When using it with Java's matches() method, there is no need for the ^ and $ anchors at the start and end of the pattern, but you still need $ inside the lookaheads:
String s = "K82v6686";
String rx = "(?=[^a-z]*[a-z][^a-z]*$)" + // 1 lowercase letter check
            "(?=(?:\\D*\\d){6}\\D*$)" +   // 6 digits check
            "[a-zA-Z0-9]{8}";             // matching 8 alphanum chars exactly
if (s.matches(rx)) {
    System.out.println("Valid");
}
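boolean valid = Pattern.matches(".*[A-Z].*", s) &&        // at least one uppercase letter
                Pattern.matches(".*[a-z].*", s) &&        // at least one lowercase letter
                Pattern.matches(".*(\\D*\\d){6}.*", s) && // at least six digits
                Pattern.matches(".{8}", s);               // exactly eight characters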
As we need an alternating automaton to be created for this task, it's much simpler to use a conjunction of regexps, one for each constituent character type.
We require at least one uppercase letter, at least one lowercase letter and at least 6 digits; these three classes are mutually exclusive. The last condition requires the length of the string to be exactly the sum of these counts, which leaves no room for extra characters of any kind. Of course, we could write s.length() == 8 as the last condition, but this would break the style :).
Sort the characters of the string and then match against ^[0-9]{6}[A-Z][a-z]$ (in ASCII order, digits sort before uppercase letters, which sort before lowercase letters).
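The sorting idea can be sketched in Java as follows. This assumes the characters are sorted by code unit, as Arrays.sort does for a char[], so digits come first, then the uppercase letter, then the lowercase letter; the class and method names are invented for the example:

```java
import java.util.Arrays;

// Hypothetical helper class for the sort-then-match approach.
final class TwoLettersSixDigits {
    static boolean isValid(String s) {
        char[] chars = s.toCharArray();
        // Sorts by char value: '0'-'9' < 'A'-'Z' < 'a'-'z'.
        Arrays.sort(chars);
        // After sorting, a valid string is six digits, one uppercase, one lowercase.
        return new String(chars).matches("[0-9]{6}[A-Z][a-z]");
    }

    public static void main(String[] args) {
        System.out.println(isValid("K82v6686"));  // true
        System.out.println(isValid("3w28E020"));  // true
        System.out.println(isValid("1276eQ900")); // false, too long
        System.out.println(isValid("98Y78k9k"));  // false, three letters
        System.out.println(isValid("A09B2197"));  // false, two capital letters
    }
}
```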

Using Regex, is it possible to use an expression such as 'Followed by' or 'Preceded by'

I have the following expression, where I want to extract an identifier that is 12 digits long:
([12]\d{3})(\d{6})(\d{2})
This works fine if the string is in the following format:
ABCD123456789101
123456789101
When it gets a string like the following, how does it know which 12 digits to match on:
ABCD1234567894837376383439434343232
1234567894837376383439434343232
In the above scenario, I don't want to select the twelve digits. So the answer, I think, is to only select the twelve digits if they are not preceded or followed by other digits. I tried this change:
[^0-9]([12]\d{3})(\d{6})(\d{2})[^0-9]
This basically says: get me the 12 digits only if the characters before and after the 12 digits are non-numeric. The problem I have is that I am also getting those non-numeric characters as part of the match, i.e.
ABCD123456789483X7376383439434343232 returns D123456789483X
Is there any way of checking what the preceding and following characters are without including them in the match result? That is, only match if the preceding and following characters are non-numeric, but don't include those non-numeric characters in the match result.
You can use lookarounds:
(?<!\d)([12]\d{3})(\d{6})(\d{2})(?!\d)
Here:
(?<!\d) is a negative lookbehind, which means your pattern is not preceded by a digit
(?!\d) is a negative lookahead, which means your pattern is not followed by a digit
Read more about lookarounds
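For illustration, here is how the lookaround pattern might be used from Java, assuming Java is the target language (the question does not say). Backslashes are doubled inside the string literal, and the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for extracting stand-alone 12-digit identifiers.
final class TwelveDigitId {
    // Twelve digits starting with 1 or 2, with no digit directly before or after.
    static final Pattern ID =
        Pattern.compile("(?<!\\d)([12]\\d{3})(\\d{6})(\\d{2})(?!\\d)");

    // Returns every stand-alone 12-digit identifier found in the text.
    static List<String> findIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(text);
        while (m.find()) {
            ids.add(m.group()); // the lookarounds are not part of the match
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIds("ABCD123456789101"));                     // [123456789101]
        System.out.println(findIds("ABCD1234567894837376383439434343232"));  // []
        System.out.println(findIds("ABCD123456789483X7376383439434343232")); // [123456789483]
    }
}
```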

Reg Expression Validation on a String

Can I use Reg Expression for the following use case?
I need to write a boolean method which takes a String parameter that should satisfy the following conditions:
A string of 20 characters.
The first 9 characters are digits.
The next 2 characters are letters.
The next 2 characters are a number (1 to 31, or 99).
The next 1 character is a letter.
The last 6 characters are digits.
So far, I have written the code for the first requirement:
[a-zA-Z0-9]{20} - This expression works well for the first case. I don't know how to write a complete regular expression to meet the entire requirement.
Please help.
Yes, it is possible to use regexes for this.
Ignore the "20 characters" part and describe a string created by concatenating 9 digits, 2 letters, 2 digits, 1 letter and 6 more digits.
Start with the string start: ^
Then 9 digits. The \d conveniently describes the character set [0-9], so \d{9} means "nine digits"
Then 2 letters. The \w class is too broad, so stick to [a-zA-Z] for a letter.
Then another two digits. They seem to be from a restricted set, so describe the set with alternation and grouping.
Then another letter and six more digits.
And, finally, you have to end at the end of the string: $
For reference, this regex means "the string is nine letters, then 12-15 or 99, then another letter":
^[a-zA-Z]{9}(1[2-5]|99)[a-zA-Z]$
Read the String JavaDocs, especially the part about String.matches() as well as the documentation about regular expressions in Java.
Your first requirement is already implicit in the remaining ones, so I would just skip it. Then, just write the regex code that matches each part one after the other:
[0-9]{9}[a-zA-Z]{2}...
There is one special consideration for the number that might be 1 to 31. While it is possible to match this in one regex, it would be verbose and difficult to understand. Instead, perform basic matching in the regex and extract this part as a capturing group by putting it into parentheses:
([0-9]{2})
If you use Pattern and Matcher to apply your regex, and your string matches the pattern, you can then easily get at just those two characters, use Integer.parseInt() to convert them to an integer (which is completely safe because you know the two characters are digits), and then check the value normally.
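A minimal sketch of that capture-group approach might look like this. The class and method names are hypothetical, and the range check assumes that "1 to 31 or 99" is written with two digits (01 to 31, or 99):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: the regex checks the shape, Java checks the range.
final class RecordIdValidator {
    // 9 digits, 2 letters, 2 digits (captured), 1 letter, 6 digits.
    static final Pattern SHAPE =
        Pattern.compile("[0-9]{9}[a-zA-Z]{2}([0-9]{2})[a-zA-Z][0-9]{6}");

    static boolean isValid(String s) {
        Matcher m = SHAPE.matcher(s);
        if (!m.matches()) {
            return false; // wrong length or wrong character classes
        }
        // Safe to parse: group 1 is known to be exactly two digits.
        int n = Integer.parseInt(m.group(1));
        return (n >= 1 && n <= 31) || n == 99;
    }

    public static void main(String[] args) {
        System.out.println(isValid("123456789AB15C123456")); // true
        System.out.println(isValid("123456789AB99C123456")); // true
        System.out.println(isValid("123456789AB32C123456")); // false, 32 is out of range
    }
}
```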
This regular expression
^[0-9]{9}[a-zA-Z]{2}(0[1-9]|[1-2][0-9]|3[0-1]|99)[a-zA-Z]([0-9]{6})$
takes
9 digits at the start,
followed by 2 letters,
followed by a two-digit number from 01 to 31, or 99,
followed by a letter,
followed by 6 digits.
