Regexp: Limiting where and how often certain chars are allowed - java

I have some form numbers I need to validate. I've tried multiple attempts but am not getting it right yet. While much is allowed in a form number there are some limits I need to impose:
All of these rules should be enforced:
A-Z allowed but not required (see bullet 4)
0-9 allowed but not required (see bullet 4)
period (decimal point) and dash, if present, only allowed once per form number . -
Minimum length is one character and it cannot be a space, dash or period
multiple spaces are allowed but two spaces may not be next to each other; also no leading or trailing spaces are allowed
This is what I had before but not all the above rules were enforced.
[A-Z0-9]([A-Za-z0-9 -.])*[A-Z0-9]
So these would be examples of valid form numbers under the new requirements:
123
123 456
A1 IL 23 MN
CL-100 2.0
These would be examples of invalid form numbers under the new requirements:
123  456 (two consecutive spaces)
25! 25
25-IL 30-1
aa bb CC

This should work
^([A-Z0-9]|(?! )(?!.* $)(?!.*  )(?!.*-.*-)(?!.*\..*\.)(?![.-]$)[A-Z0-9 .-]+)$
There are two parts. The first, [A-Z0-9], handles a single-character form number. Otherwise a series of exclusion rules applies, (?! )(?!.* $)(?!.*  )(?!.*-.*-)(?!.*\..*\.)(?![.-]$), which mean, in order: no leading space, no trailing space, no two consecutive spaces (the third lookahead contains two spaces), no two -, no two ., and no lone . or - as the whole string. Then comes the "base" pattern, one or more of [A-Z0-9 .-].
Note that you'll have to escape the \ with another \, so \\.
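As a quick check, the pattern can be run against the sample form numbers from the question. This is only a sketch: the class and method names are made up for illustration, and the "no consecutive spaces" lookahead is written as (?!.* {2}), which is equivalent to the two-space form but harder to mangle when copying.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the name FormNumberRegex is not from the original post.
class FormNumberRegex {
    // Backslashes are doubled for the Java string literal.
    // (?!.* {2}) rejects two consecutive spaces anywhere in the input.
    static final Pattern FORM_NUMBER = Pattern.compile(
        "^([A-Z0-9]|(?! )(?!.* $)(?!.* {2})(?!.*-.*-)(?!.*\\..*\\.)(?![.-]$)[A-Z0-9 .-]+)$");

    static boolean isValid(String s) {
        return FORM_NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // First four samples are the valid ones from the question, the rest the invalid ones.
        String[] samples = {"123", "123 456", "A1 IL 23 MN", "CL-100 2.0",
                            "123  456", "25! 25", "25-IL 30-1", "aa bb CC"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

With the question's samples, the first four should print true and the last four false.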

Does it have to be all in one regex for some reason? You could first search for \s\s and make sure nothing is found. Then you can go through each character and make sure that no more than one of them is a . and no more than one is a -. You can also check for leading or trailing spaces, or you can be kind to your users and simply trim the input. You can then make sure that you have at least one character and that, if the length is exactly one character, it is not a dash or a period.
Finally, since all of your other conditions are now satisfied, you can match the whole string against [A-Z0-9 .-]* and you will have your answer. (Keep the dash last in the class: written as [A-Z0-9 -.], the " -." part is a range from space to period that also admits characters such as ! and &.)
Based on your unsuccessful regex, I suspect you have a lot more conditions you actually want met, but hopefully this was enough help to allow you to figure out how to meet them on your own.
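The step-by-step approach described above might look roughly like the following. It is a sketch under the rules as stated in the question; the class name and the count helper are invented here, and a plain contains check stands in for the \s\s search because the rules only mention spaces.

```java
import java.util.regex.Pattern;

// Hypothetical class; checks the rules one at a time instead of in a single regex.
class StepwiseFormNumberCheck {
    // Dash is last in the class so it is a literal, not a range operator.
    private static final Pattern ALLOWED = Pattern.compile("[A-Z0-9 .-]+");

    static boolean isValid(String input) {
        if (input == null || input.isEmpty()) return false;                   // at least one character
        if (input.startsWith(" ") || input.endsWith(" ")) return false;       // no leading/trailing space
        if (input.contains("  ")) return false;                               // no two spaces in a row
        if (count(input, '.') > 1 || count(input, '-') > 1) return false;     // at most one of each
        if (input.equals(".") || input.equals("-")) return false;             // lone . or - is not enough
        return ALLOWED.matcher(input).matches();                              // only permitted characters
    }

    // Counts how often c occurs in s.
    private static long count(String s, char c) {
        return s.chars().filter(ch -> ch == c).count();
    }
}
```

Each rule gets its own line, so a caller could also be told which rule failed, which a single regex cannot report.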

Related

Regular expression for allowing only 1 of a set of characters

I am trying to use some regex to validate some input inside of Java code. I have been successful in implementing "basic" regex, but this one seems to be out of my scope of knowledge. I am working through RegEgg tutorials to learn more.
Here are the conditions that need to be validated:
Field will always have 8 characters
Can be all spaces
Or
Valid characters: a-zA-Z0-9 -!& or a space
Cannot begin with a space
If one of the special characters is used, it can be the only one used
Legal: "B-123---" "AB&& &" "A!!!!!!!"
Illegal: "B-123!!!" "AB&& -" "A-&! "
Has to have at least one alphanumeric character (can't be all special characters, i.e. "!!!!!!!!")
This was my regex before additional validations were added:
^(\s{8}|[A-Za-z\-\!\&][ A-Za-z0-9\-\!\&]{7})$
Then came the additional validations for not allowing multiple of the special characters, and I am a bit stuck. I have been successful in using a positive lookahead, but I get stuck when trying to use a positive lookbehind. (I think the data before the lookbehind was consumed, but I am speculating, as I am a neophyte with this part of regex.)
Using the or construct (a|b) is a large part of this, and you've begun applying it, so that's a good start.
You've made the rule that it can't start with a digit; nothing in the spec says this. Also, - inside [] has a special meaning, so escape it, or make sure it is first or last, because then you don't have to. That gets us to:
^(\s{8}|[A-Za-z0-9!& -]{8})$
Next up is the rule that it has to be all the same special character if used at all. Given that there are only 3 special characters, it could be easier to just explicitly list them all:
^(\s{8}|[A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8})$
Next up: Can't start with a space, and can't be all-special. Confirming the negative (that it ISN'T all special characters) gets complicated; lookahead seems like a better plan here. Some background:
^ is regexp-ese for "start of line". Note that this doesn't 'consume' a character. 1 is regexp-ese for 'only the exact character 1 will match here, nothing else', but as it matches, it also 'consumes' that character, whereas ^ doesn't do that. 'Start of line' is not a concept that can be consumed.
This notion of 'a match may fail, but if it succeeds, nothing is consumed' isn't limited to ^ and $; you can write your own:
(?=abc) will match if abc would match at this position, but does not consume it. Thus, the regexp ^(?=abc)ab.d$ would match the input string abcd and nothing else. This is called positive lookahead (it 'looks ahead' and matches if it sees the regular expression in the parens, failing if it does not).
(?!abc) is negative lookahead. It matches if it DOESNT see the thing in the parens. (?!abc)a.c will match the input adc but not the input abc.
(?<=abc) is positive lookbehind. It matches if the pattern you provide would match such that the match ends at the position you find yourself.
(?<!abc) is negative lookbehind.
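The four lookaround forms can be tried out in Java with a few one-liners. This is a small illustrative sketch; the class and helper names are invented, and the patterns are the ones used as examples in the list above plus a trailing d for the lookbehind cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for lookahead and lookbehind.
class LookaroundDemo {
    // True if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Start index of the first match found anywhere in the input, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("^(?=abc)ab.d$", "abcd"));   // true: lookahead sees abc, consumes nothing
        System.out.println(fullMatch("(?!abc)a.c", "adc"));       // true: adc is not abc
        System.out.println(fullMatch("(?!abc)a.c", "abc"));       // false: negative lookahead rejects abc
        System.out.println(firstMatchStart("(?<=abc)d", "abcd")); // 3: the d is preceded by abc
        System.out.println(firstMatchStart("(?<!abc)d", "abcd")); // -1: the only d is preceded by abc
    }
}
```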
Note that lookbehind can be somewhat limited: many engines, Java included, require a lookbehind pattern to have a bounded length (lookahead has no such restriction in Java). But, fortunately, your requirements make it easy to limit ourselves to fixed-size patterns here. Thus, we can introduce (?![&!-]{8}) as a non-consuming unit in our regexp that will fail the match if all 8 characters are special characters.
We can use this trick to fail on starting space too: (?! ) is all we need for that one.
Let's replace \s, which is any whitespace, with a literal space character (the problem description says 'space', not 'whitespace').
Putting it all together:
^( {8}|(?! )(?![&!-]{8})([A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8}))$
That's:
1. 8 spaces, or...
2. not a leading space, and not all-8 special characters, then,
3. any of the valid chars, any amount of spaces, and any amount of one of the 3 allowed special symbols, as long as we have precisely 8 characters in total...
4. ... OR the same thing as #3 but with the second of the three special symbols
5. ... OR with the third of the three.
Plug em in at regex101 along with your various examples of 'legal' and 'not legal' and you can play around with it some more.
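If you would rather check it from Java than from regex101, a sketch along these lines should do. The class name is invented, and the test strings are my own 8-character examples, since the spacing of the samples in the question is hard to read reliably.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the combined pattern from the answer above.
class EightCharField {
    static final Pattern FIELD = Pattern.compile(
        "^( {8}|(?! )(?![&!-]{8})([A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8}))$");

    static boolean isValid(String s) {
        return FIELD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("B-123---")); // true: only '-' is used as a special character
        System.out.println(isValid("B-123!!!")); // false: mixes '-' and '!'
        System.out.println(isValid("!!!!!!!!")); // false: all special characters
        System.out.println(isValid("        ")); // true: exactly 8 spaces
    }
}
```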
NB: You can also use backreferences to attempt to solve the 'only one special character is allowed' part of this, but attempting to tackle the 'not all special characters' part seems quite unwieldy if you don't get to use (negative) lookahead.
It's a matter of asserting the right conditions at the start of the regex.
^(?=[ ]*$|(?![ ]))(?!.*([!&-]).*(?!\1)[!&-])[a-zA-Z0-9 !&-]{8}$
see -> https://regex101.com/r/tN5y4P/1
Some discussion:
^ # Begin of text
(?= # Assert, cannot start with a space
[ ]* $ # unless it's all spaces
| (?! [ ] )
)
(?! # Assert, not mixed special chars
.*
( [!&-] ) # (1)
.*
(?! \1 )
[!&-]
)
[a-zA-Z0-9 !&-]{8} # Consume 8 valid characters from within this class
$ # End of text
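For comparison, here is the same kind of check for this backreference-based pattern. It is a sketch with an invented class name and my own test strings. One thing it shows: as written, the pattern does not include a check for "at least one alphanumeric", so a field such as "!!!!!!!!" is accepted; a further lookahead such as the (?![&!-]{8}) from the previous answer would be needed for that rule.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the lookahead/backreference pattern above.
class EightCharFieldBackref {
    // \\1 refers to the first special character found; (?!\\1)[!&-] is a *different* special character.
    static final Pattern FIELD = Pattern.compile(
        "^(?=[ ]*$|(?![ ]))(?!.*([!&-]).*(?!\\1)[!&-])[a-zA-Z0-9 !&-]{8}$");

    static boolean isValid(String s) {
        return FIELD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("B-123---")); // true
        System.out.println(isValid("B-123!!!")); // false: two different special characters
        System.out.println(isValid(" ABCDEFG")); // false: leading space, not all spaces
        System.out.println(isValid("!!!!!!!!")); // true: this pattern has no alphanumeric check
    }
}
```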

regular expression in Java (Spring configuration) with 2 specific characters in begining

I need regular expression which will start with 2 specific letters and will be 28 characters long.
The regular expression is needed, as this is in conjunction with Spring configuration, which will only take a regular expression.
I've been trying to do with this, it's not working (^[AK][28]*)
If you mean that the string should be like "AKxxxxxxxx" (28 characters in total), then you can use:
^AK.{26}$ //using 26 since AK already count for 2 characters
Regex is nothing specific to Java, nor is it that difficult if you have a look at any tutorial (and there's plenty!).
To answer your question:
AK[a-zA-Z]{26}
The above regex should solve your issue regarding a 28 character String with the first two letters being AK.
Elaboration:
AK[a-zA-Z]{26}, the AK part: characters written as such, without any special characters, are matched literally (they must appear exactly where and how they were written)
AK[a-zA-Z]{26}, the [a-zA-Z] part: square brackets define a set of characters, signs, etc., one of which must match at that position (a set matches a single character by default). You can list every allowed character or use ranges and predefined classes (e.g. a-z, \d for digits, and so forth)
AK[a-zA-Z]{26}, the {26} part: for each set you can define a repetition count, which says how often the set can or must be applied. E.g. {26} means it must match exactly 26 times. Other possibilities are {2,26}, meaning at least 2 but at most 26 times, or an operator like *, + or ?, which mean 0 or more times, at least once, and 0 or 1 time respectively
In case you need it matching a whole line you would likely want to add ^ and $ at the beginning and end respectively, to tell the regex parser that it has to match a whole line/String and not just a part:
^AK[a-zA-Z]{26}$
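A short sketch comparing the two suggested patterns in Java. The class name and sample strings are invented; which of the two fits depends on whether the 26 trailing characters must be letters, which the question does not say.

```java
import java.util.regex.Pattern;

// Hypothetical holder for the two candidate patterns.
class AkPrefix {
    static final Pattern LETTERS_ONLY = Pattern.compile("^AK[a-zA-Z]{26}$"); // AK + 26 letters
    static final Pattern ANY_CHARS = Pattern.compile("^AK.{26}$");           // AK + any 26 characters

    public static void main(String[] args) {
        String letters = "AK" + "abcdefghijklmnopqrstuvwxyz"; // 28 characters in total
        String digits = "AK" + "01234567890123456789012345";  // 28 characters in total

        System.out.println(LETTERS_ONLY.matcher(letters).matches()); // true
        System.out.println(LETTERS_ONLY.matcher(digits).matches());  // false: digits are not in [a-zA-Z]
        System.out.println(ANY_CHARS.matcher(digits).matches());     // true
    }
}
```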
If you need to count the number of repetitions, use the {min,max} syntax. Omitting both the comma and max tells the regex parser to look for exactly min repetitions.
For example :
.{1,3} will match for any character (shown by the dot) sequence between 1 and 3 characters long.
[AK]{2} will match for exactly 2 characters that are either A or K :
AK, AA, KA or KK.
Additionally, your regex uses [AK]. This means that it will match against one of the characters given, i.e. A or K.
If you need to match the specific "AK" sequence, then you need to get rid of the '[' and ']' tokens.
Therefore your regex could be AK.{26}, meaning it will match AK followed by exactly 26 characters, 28 characters in total.
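The difference between a character set and a literal sequence is easy to see in a few calls. This is only an illustration; the class and method names are made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo of sets versus literal sequences and of counted repetition.
class QuantifierDemo {
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input); // must match the entire input
    }

    public static void main(String[] args) {
        System.out.println(full("[AK]{2}", "KA"));       // true: two characters, each A or K
        System.out.println(full("AK", "KA"));            // false: literal AK only
        System.out.println(full(".{1,3}", "abc"));       // true: between 1 and 3 characters
        System.out.println(full(".{1,3}", "abcd"));      // false: 4 characters
        System.out.println(full("^[AK][28]*", "A2828")); // true: one of A/K, then only 2s and 8s
        System.out.println(full("^[AK][28]*", "AK"));    // false: K is not a 2 or an 8
    }
}
```

The last two lines show why the attempt from the question, ^[AK][28]*, does not do what was intended: [28] is a set containing the characters 2 and 8, not a length.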

Pattern/Regular expression to grab a number *only* if it's the only field in the record

This has been driving me crazy the past couple of days. I'm trying to kill two birds with one stone by validating a record and extracting a field at the same time. My strategy has been to do this with a regular expression:
private Pattern firstNumber = Pattern.compile("\\d{1}");
Which I understand to mean "the first number in the line (record)." So far this has been effective at grabbing the first field (and ensuring that it's a number), but I want to take this a step further:
How can I tweak the regexp to specify that I want the number only if it's the sole field?
That is, if the record is simply 10, I want to grab 10. But if the record is 10 4, I don't want to grab anything (as this is an invalid record for the project).
I tried:
private Pattern oneNumberOnly = Pattern.compile("\\d{1}\n");
But -- to my chagrin -- this (and any other permutation of it) does not pick up any numbers. Is there something I'm missing here?
You can denote beginning of line/string with ^ and end of line/string with $, so the pattern would be
^\d+$
The {1} won't work because it excludes anything with more than one digit, such as 10. Using \d+ indicates one or more digits. \d matches digit characters only, never a decimal point or a minus sign; in Java it is equivalent to [0-9] unless the UNICODE_CHARACTER_CLASS flag is set, so write [0-9] if you want to be explicit.
Specifying {1} is always redundant, by the way, because by default an atom is matched once.
You can use the start line character and end line character. If you are trying to grab a number that is on its own line you can use:
Pattern.compile("^(\\d)++$");
By adding the {1} you will only get 1 digit of a number. You should also trim the string you are comparing against to get rid of any extra whitespace.
^ - Start of line character
\\d - digit character [0-9]
+ - 1 or more characters that match \d
+ - possessive (this will grab all the digits and is quicker than greedy quantifiers)
$ - End of line character
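Validation and extraction can then happen in one step by capturing the digits. A minimal sketch, assuming the record arrives as a single String; the class and method names are invented, and the value is returned as a String (or null) to sidestep overflow questions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the number only if it is the sole field in the record.
class SoleNumberField {
    private static final Pattern ONLY_NUMBER = Pattern.compile("^(\\d+)$");

    static String extract(String record) {
        // trim() drops surrounding whitespace, including a trailing newline.
        Matcher m = ONLY_NUMBER.matcher(record.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("10"));   // 10
        System.out.println(extract("10 4")); // null: more than one field
    }
}
```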

Help with regex

I'm constructing a regex which will accept at least 1 alpha numerical character and any number of spaces.
Right now I've got...[A-Za-z0-9]+[ \t\r\n]* which I understand to be at least 1 alphanumeric OR at least 1 space. How would I fix this?
EDIT: To answer the comments below I want it to accept strings which contain ATLEAST 1 alphanumeric AND any number of (including no) spaces. Right now it will accept JUST a whitespace.
EDIT2: To clarify, I don't want the any number of whitespace (including 0) to be accepted unless there is at least 1 alphanumeric character
\s*\p{Alnum}[\p{Alnum}\s]*
Your regex, [A-Za-z0-9]+[ \t\r\n]*, requires the string to start with a letter or digit (or, more accurately, it doesn't start matching until it sees one). Adding \s* allows the match to start with whitespace, but you still won't match any alphanumerics after the first whitespace character that follows an alphanumeric (for example, it won't match the xyz in abc xyz). Changing the trailing \s* to [\p{Alnum}\s]* fixes that problem.
On a side note, \p{Alnum} is exactly equivalent to [A-Za-z0-9] in Java, which is not the case in all regex flavors. I used \p{Alnum}, not just because it's shorter, but because it gives more protection from typos like [A-z] (which is syntactically valid, but almost certainly not what the author really meant).
EDIT: Performance should be considered, too. I originally included a + after the first \p{Alnum}, but I realized that wasn't a good idea. If this were part of a longer regex, and the regex didn't match right away, it could end up wasting a lot of time trying to match the same groups of characters with \p{Alnum}+ or [\p{Alnum}\s]*. The leading \s* is okay, though, because \s doesn't match any of the characters that \p{Alnum} matches.
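A small sketch of how this pattern behaves when the whole input must match. The class name is invented and the sample inputs are mine.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper: at least one alphanumeric, any amount of whitespace anywhere.
class AlnumWithSpaces {
    private static final Pattern P = Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("abc xyz")); // true
        System.out.println(accepts("  a  "));   // true
        System.out.println(accepts("   "));     // false: whitespace only
        System.out.println(accepts(""));        // false: empty
    }
}
```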
One or more word characters followed by zero or more whitespace characters:
\w+\s*
Hey, try this: ([^\s]+\s*). [^\s] means catch everything that is not whitespace, while \s* means that whitespace is optional (if you really want at least one whitespace character, put + instead of *).
Edit: sorry, mine catches everything, not only alphanumerics (use ([a-zA-Z0-9]+\s*) for alphanumerics only).
This should do the trick:
\s*\p{Alnum}+\s*
\p{Alnum} is an alphanumeric character: [\p{Alpha}\p{Digit}]
* says "zero or more times"
+ says "at least one" (not "or" as you seem to believe, or is written |)
| means "or"
\s is a whitespace character: [ \t\n\x0B\f\r]
EDIT: To answer the comments below I want it to accept strings which contain AT LEAST 1 alphanumeric AND any number of (including no) spaces.
The pattern I suggested requires at least one alpha numeric character.
EDIT2: To clarify, I don't want the any number of whitespace (including 0) to be accepted unless there is at least 1 alphanumeric character
The pattern I suggested will not accept a string made up of whitespace characters only.
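For reference, this is how \s*\p{Alnum}+\s* behaves with matches(). It is a sketch with an invented class name; note that it treats the alphanumerics as a single run, so whether an input with inner spaces such as abc xyz should pass depends on how the requirement is read.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper: one run of alphanumerics with optional surrounding whitespace.
class AlnumSingleRun {
    private static final Pattern P = Pattern.compile("\\s*\\p{Alnum}+\\s*");

    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(" abc "));   // true
        System.out.println(accepts("   "));     // false: no alphanumeric character
        System.out.println(accepts("abc xyz")); // false: whitespace between two runs
    }
}
```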

How can I express such requirement using Java regular expression?

I need to check that a file contains some amounts that match a specific format:
between 1 and 15 characters (numbers or ",")
may contains at most one "," separator for decimals
must at least have one number before the separator
this amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude the malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not match with the requirement as I might catch up to 30 characters here.
Unfortunately, for some reasons that are too long to explain here, I cannot simply pick a substring or use any other java call. The match has to be in a single, java-compatible, regular expression.
^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
(?=.{1,15}$) positive lookahead to make sure that the total length of string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and more digits
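$ end of the string (redundant when the pattern is applied with matches(); with find() it is still needed, because the lookahead only constrains the total length and would not stop \d+(,\d+)? from matching just a prefix)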
You might have to escape backslashes for Java: ^(?=.{1,15}$)\\d+(,\\d+)?$
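A quick sketch of the whole-string version in Java, with an invented class name and sample amounts of my own choosing, to show the length cap and the single-comma rule working together.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper for the anchored pattern: 1 to 15 characters, digits with at most one comma.
class AmountWholeString {
    private static final Pattern AMOUNT = Pattern.compile("^(?=.{1,15}$)\\d+(,\\d+)?$");

    static boolean isAmount(String s) {
        return AMOUNT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAmount("123,45"));           // true
        System.out.println(isAmount("1234567,1234567"));  // true: exactly 15 characters
        System.out.println(isAmount("12345678,1234567")); // false: 16 characters
        System.out.println(isAmount(",5"));               // false: no digit before the comma
    }
}
```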
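Update: If you're looking for this in the middle of another string, use word boundaries \b instead of string boundaries (^ and $). Note that \b only matches between a word character and a non-word character, so this works when the amount is delimited by spaces or punctuation, not when letters touch the digits directly.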
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For java:
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"

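Since the question says the amount is bounded by alphabetical characters, and letters and digits are both word characters, \b will not match between them. One possible variation, offered only as a sketch and not as the answer's method, replaces the word boundaries with lookarounds that forbid a digit or comma on either side. The class and method names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds an amount embedded between letters.
class EmbeddedAmount {
    // (?<![0-9,]) and (?![0-9,]) ensure the run of digits/commas is taken as a whole,
    // so over-long or malformed runs are rejected instead of partially matched.
    static final Pattern LOOKAROUND = Pattern.compile(
        "(?<![0-9,])(?=[0-9,]{1,15}(?![0-9,]))[0-9]+(,[0-9]+)?(?![0-9,])");

    // The word-boundary version from the answer above, kept for comparison.
    static final Pattern WORD_BOUNDARY = Pattern.compile(
        "\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b");

    // Returns the first amount found, or null if there is none.
    static String find(String text) {
        Matcher m = LOOKAROUND.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("abc123,45def"));                         // 123,45
        System.out.println(find("abc1234567890123456def"));               // null: 16 digits
        System.out.println(find("abc12,34,56def"));                       // null: two commas
        System.out.println(WORD_BOUNDARY.matcher("abc123,45def").find()); // false: no boundary between c and 1
    }
}
```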