How to limit the number of "/" in a string - Java

How do I use a lookahead assertion to limit the number of "/" to a range?
I have tried the following:
^(?=/{1,3})$
but it doesn't work.

The easiest solution is to use a negative lookahead:
^(?!(?:[^/]*/){4})
That basically means the string cannot contain 4 slashes.
This assumes you allow other characters between slashes, but a maximum of 3 slashes.
A positive version would be ^(?=[^/]*(?:/[^/]*){0,3}$) or ^[^/]*(?:/[^/]*){0,3}$, without the lookahead.
Of course, if you are not forced to use a regular expression, the problem is trivial: just count the slashes.
Let's try to break that last one down:
^ - Start of the string.
[^/]* - Some characters that are not slashes (or none)
(?: ) - A logical group. Similar to (), but does not capture the result (we do not need it after validation)
/[^/]* - Slash, followed by non-slash characters.
{0,3} - From 0 to 3 times.
$ - End of the string.
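Here is a minimal Java sketch of the approaches above: the negative lookahead, the anchored pattern without lookahead, and plain counting. The class and method names are made up for illustration. One Java-specific detail matters here. Matcher.matches() (and String.matches) require the whole input to match, so the zero-width lookahead version is run with find(), while the anchored pattern can use matches().

```java
import java.util.regex.Pattern;

final class SlashLimit {
    // Zero-width check: fails as soon as four slashes can be found from the start
    private static final Pattern NO_FOUR_SLASHES = Pattern.compile("^(?!(?:[^/]*/){4})");
    // Consumes the whole input: non-slashes, then at most three "slash + non-slashes" groups
    private static final Pattern UP_TO_THREE = Pattern.compile("^[^/]*(?:/[^/]*){0,3}$");

    // find() because the pattern consumes nothing; matches() would fail on any non-empty input
    static boolean byLookahead(String s) {
        return NO_FOUR_SLASHES.matcher(s).find();
    }

    static boolean byAnchoredPattern(String s) {
        return UP_TO_THREE.matcher(s).matches();
    }

    // The regex-free alternative: count the slashes directly
    static boolean byCounting(String s) {
        return s.chars().filter(c -> c == '/').count() <= 3;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/b/c/d", "a/b/c/d/e", "", "////"}) {
            System.out.println("'" + s + "' -> " + byLookahead(s) + " "
                    + byAnchoredPattern(s) + " " + byCounting(s));
        }
    }
}
```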

You could try the following (you have to state that no further / may follow):
^(?=/{1,3}([^/]|$))
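This pattern reads the question differently: it only looks at the run of slashes at the very start of the string, requiring one to three of them with no fourth one straight after. A short Java sketch (the class name is illustrative); the pattern is zero-width, so it is used with find() rather than matches().

```java
import java.util.regex.Pattern;

final class LeadingSlashes {
    // 1 to 3 slashes at the start, followed by a non-slash or the end of input
    private static final Pattern LEADING = Pattern.compile("^(?=/{1,3}([^/]|$))");

    static boolean startsWithOneToThreeSlashes(String s) {
        return LEADING.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/abc", "///abc", "////abc", "abc", "a/b"}) {
            System.out.println(s + " -> " + startsWithOneToThreeSlashes(s));
        }
    }
}
```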

Related

Modifying existing Java regex

I have the following regex that validates the allowed characters:
^[a-zA-Z0-9-?\/:;(){}\[\]|`~´.\,'+÷ !@#$£%^"&*_<>=àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ\\]*$
I need to modify it so that the string being validated:
may not begin with space or “/”
may not contain “//”
may not end with “/”
For the space at the beginning I have adapted it to
^[^\s][a-zA-Z0-9-?\\/:;(){}\\[\\]|`~´.\\,'+÷ !@#$£%^\"&*_<>=àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ\\\\]*$
Not sure what to do about the other two requirements.
For the second one I tried combining it with ^((?!//))*$ in various ways, without success.
Note that ^((?!\/\/))*$ can only match an empty string: the lookahead is a non-consuming pattern, so repeating the group never consumes any characters, and on an empty string the lookahead always succeeds.
[^\s] at the start of your pattern will match any char other than a whitespace char, even chars you did not list in the character class.
You can use
^(?![\s/])(?!.*//)[a-zA-Z0-9?/:;(){}\[\]|`~´.,'+÷ !@#$£%^\"&*_<>=àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ\\-]*$(?<!/)
See the regex demo. Details:
^(?![\s/])(?!.*//) - at the start of the string, two checks are performed:
(?![\s/]) - no whitespace or / allowed (right at the start)
(?!.*//) - no // allowed anywhere after zero or more chars other than line break chars, as many as possible
(?<!/) is the check after the end of string is hit, and it fails the match if the last char in string is /.
Note that in Java regex declarations, you do not need to escape / since regex delimiter notation is not used, and / itself is not a special regex metacharacter.
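Here is a Java sketch of this answer's pattern, written as string literals so the extra escaping is visible. Assumption: to keep the source file ASCII-only, the character class is cut down to an ASCII-only subset of the allowed characters. The accented letters and signs such as £, ´ and ÷ are left out; add them back, as Unicode escapes if your source encoding is not UTF-8. The class name is illustrative. The last method checks the point made about ^((?!//))*$: it can only match an empty string.

```java
import java.util.regex.Pattern;

final class SlashRules {
    // ASCII-only subset of the allowed characters (assumption: accented letters omitted); '-' is last so it is literal
    private static final String ALLOWED = "a-zA-Z0-9?/:;(){}\\[\\]|`~.,'+ !#$%^\"&*_<>=\\\\-";

    // No leading whitespace or '/', no "//" anywhere, no trailing '/'
    private static final Pattern VALID =
            Pattern.compile("^(?![\\s/])(?!.*//)[" + ALLOWED + "]*$(?<!/)");

    // The attempt from the question: the repeated group consumes nothing
    static final String ATTEMPT = "^((?!//))*$";

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    static boolean attemptMatches(String s) {
        return s.matches(ATTEMPT);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/b/c", "a b", " abc", "/abc", "a//b", "abc/"}) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
        System.out.println(attemptMatches("") + " " + attemptMatches("abc"));
    }
}
```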
It seems like the following regex should be enough and simpler: (?!.*//)^[^ /].*[^/]$
So at the beginning you can use a negative lookahead to prevent any occurrence of // in the text. Then any character except space and / is accepted at the beginning, then anything can be present (except //, which was excluded by the negative lookahead), and anything except / is accepted at the end.
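A quick Java check of this shorter pattern (the class name is illustrative). Two properties are worth keeping in mind when comparing it with the longer one. Because [^ /] and [^/] each consume a character, it needs at least two characters. It also no longer restricts which characters are allowed.

```java
import java.util.regex.Pattern;

final class ShortSlashRules {
    // Lookahead rejects "//" anywhere; first char is not space or '/', last char is not '/'
    private static final Pattern SHORT = Pattern.compile("(?!.*//)^[^ /].*[^/]$");

    static boolean isValid(String s) {
        return SHORT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/b", "a!b", "a//b", "/ab", " ab", "ab/", "a"}) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```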
Since 95% of the time the special conditions on the space and forward slash will not occur, it might be better to take those two characters out of your big class and handle them separately if and when they occur.
The big class can also be condensed to speed things up a bit.
^(?>[a-zA-Z0-9\\!-.:-@\[\]-`{-~£´ÄÖäö÷À-ÂÇ-ÏÑ-ÔÙ-Üß-âç-ïñ-ôù-ý]+|(?:/(?!/|$)|[ ])(?<!^.))*$
https://regex101.com/r/LpCwt6/1
^
(?>
[a-zA-Z0-9\\!-.:-@\[\]-`{-~£´ÄÖäö÷À-ÂÇ-ÏÑ-ÔÙ-Üß-âç-ïñ-ôù-ý]+
| (?:
/
(?! / | $ )
| [ ]
)
(?<! ^ . )
)*
$
And if you let the ranges absorb all of the class characters, it can get very small.
^(?>[!-.0-~£´ÄÖäö÷À-ÂÇ-ÏÑ-ÔÙ-Üß-âç-ïñ-ôù-ý]+|(?:/(?!/|$)|[ ])(?<!^.))*$
https://regex101.com/r/EYdM5C/1
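Here is a Java sketch of the compact pattern (the class name is illustrative). Java supports both the atomic group (?>...) and the fixed-length lookbehind (?<!^.) used here. Assumption: to keep the source ASCII-only, the non-ASCII members of the class are written as Unicode escapes. They are the same characters as £´ÄÖäö÷À-ÂÇ-ÏÑ-ÔÙ-Üß-âç-ïñ-ôù-ý above. With a UTF-8 source file you could paste them in directly.

```java
import java.util.regex.Pattern;

final class CompactRules {
    // Non-ASCII members of the class (pound, acute accent, division sign, accented letters),
    // written as Unicode escapes so the source file stays ASCII-only
    private static final String EXTRA =
            "\u00A3\u00B4\u00C4\u00D6\u00E4\u00F6\u00F7"
            + "\u00C0-\u00C2\u00C7-\u00CF\u00D1-\u00D4\u00D9-\u00DC"
            + "\u00DF-\u00E2\u00E7-\u00EF\u00F1-\u00F4\u00F9-\u00FD";

    // Runs of ordinary characters, or a '/' not followed by '/' or the end, or a space;
    // (?<!^.) rejects that '/' or space when it is the very first character
    private static final Pattern VALID = Pattern.compile(
            "^(?>[!-.0-~" + EXTRA + "]+|(?:/(?!/|$)|[ ])(?<!^.))*$");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/b c", "\u00C4rger", " abc", "/abc", "a//b", "abc/"}) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```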

Regular expression for allowing only 1 of a set of characters

I am trying to use some regex to validate some input inside of Java code. I have been successful in implementing "basic" regex, but this one seems to be beyond my current knowledge. I am working through the RexEgg tutorials to learn more.
Here are the conditions that need to be validated:
Field will always have 8 characters
Can be all spaces
Or
Valid characters: a-zA-Z0-9 -!& or a space
Cannot begin with a space
If one of the special characters is used, it can be the only one used
Legal: "B-123---" "AB&& &" "A!!!!!!!"
Illegal: "B-123!!!" "AB&& -" "A-&! "
Has to have at least one alphanumeric character (can't be all special characters, i.e. "!!!!!!!!")
This was my regex before additional validations were added:
^(\s{8}|[A-Za-z\-\!\&][ A-Za-z0-9\-\!\&]{7})$
Then came the additional validation for not allowing a mix of the special characters, and I am a bit stuck. I have been successful in using a positive lookahead, but I am stuck when trying to use a positive lookbehind (I think the data before the lookbehind was consumed), though I am speculating, as I am a neophyte with this part of regex.
Using the or construct (a|b) is a large part of this, and you've begun applying it, so that's a good start.
You've made the rule that it can't start with a digit; nothing in the spec says this. Also, - inside [] has special meaning, so escape it, or make sure it is first or last, because then you don't have to. That gets us to:
^(\s{8}|[A-Za-z0-9!& -]{8})$
Next up is the rule that it has to be all the same special character if one is used at all. Given that there are only 3 special characters, it could be easier to just list them all explicitly:
^(\s{8}|[A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8})$
Next up: it can't start with a space, and it can't be all special characters. Confirming the negative (that it ISN'T all special characters) gets complicated; lookahead seems like a better plan here. Some background first:
^ is regexp-ese for "start of line". Note that this doesn't 'consume' a character. 1 is regexp-ese for 'only the exact character 1 will match here, nothing else', but as it matches, it also 'consumes' that character, whereas ^ doesn't do that. 'Start of line' is not a concept that can be consumed.
This notion of 'a match may fail, but if it succeeds, nothing is consumed' isn't limited to ^ and $; you can write your own:
(?=abc) will match if abc would match at this position, but does not consume it. Thus, the regexp ^(?=abc)ab.d$ would match the input string abcd and nothing else. This is called positive lookahead (it 'looks ahead' and matches if it sees the regular expression in the parens, failing if it does not).
(?!abc) is negative lookahead. It matches if it DOESN'T see the thing in the parens. (?!abc)a.c will match the input adc but not the input abc.
(?<=abc) is positive lookbehind. It matches if the pattern you provide would match such that the match ends at the position you find yourself.
(?<!abc) is negative lookbehind.
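The four lookaround forms can be tried directly in Java with String.matches, which requires the whole input to match. The lookahead inputs are the ones from the explanation above. The lookbehind patterns and their inputs are illustrative additions: "..." consumes three characters, then the lookbehind inspects exactly those three.

```java
final class LookaroundDemo {
    static final String POSITIVE_LOOKAHEAD = "^(?=abc)ab.d$";
    static final String NEGATIVE_LOOKAHEAD = "(?!abc)a.c";
    // Lookbehind: after consuming three characters, check what those three were
    static final String POSITIVE_LOOKBEHIND = "...(?<=abc)d";
    static final String NEGATIVE_LOOKBEHIND = "...(?<!abc)d";

    public static void main(String[] args) {
        System.out.println("abcd".matches(POSITIVE_LOOKAHEAD));  // true
        System.out.println("abxd".matches(POSITIVE_LOOKAHEAD));  // false
        System.out.println("adc".matches(NEGATIVE_LOOKAHEAD));   // true
        System.out.println("abc".matches(NEGATIVE_LOOKAHEAD));   // false
        System.out.println("abcd".matches(POSITIVE_LOOKBEHIND)); // true
        System.out.println("xbcd".matches(NEGATIVE_LOOKBEHIND)); // true
    }
}
```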
Note that lookbehind in particular can be somewhat limited, in that it may not allow variable-length patterns (Java, for example, requires a lookbehind to have a bounded maximum length). But, fortunately, your requirements make it easy to limit ourselves to fixed-size patterns here. Thus, we can introduce (?![&!-]{8}) as a non-consuming unit in our regexp that will fail the match if all 8 characters are special characters.
We can use this trick to fail on starting space too: (?! ) is all we need for that one.
Let's replace \s, which matches any whitespace, with just a literal space character (the problem description says 'space', not 'whitespace').
Putting it all together:
^( {8}|(?! )(?![&!-]{8})([A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8}))$
That's:
1. 8 spaces, or...
2. not a space, and not all 8 special characters, then,
3. any of the valid chars, any number of spaces, and any number of the first of the 3 allowed special symbols (-), as long as we have precisely 8 of them...
4. ...OR the same thing as #3 but with the second of the three special symbols (!)
5. ...OR with the third of the three (&).
Plug them into regex101 along with your various examples of 'legal' and 'not legal' and you can play around with it some more.
NB: You can also use backreferences to attempt to solve the 'only one special character is allowed' part of this, but attempting to tackle the 'not all special characters' part seems quite unwieldy if you don't get to use (negative) lookahead.
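Here is a Java sketch of the combined pattern (the class name is illustrative), checked against the question's examples. One assumption: the shorter examples such as "AB&& &" appear to have lost their trailing spaces in the question, so they are padded to the required 8 characters here.

```java
import java.util.regex.Pattern;

final class EightCharField {
    // 8 spaces, or: no leading space, not 8 specials, and 8 chars using at most one kind of special
    private static final Pattern FIELD = Pattern.compile(
            "^( {8}|(?! )(?![&!-]{8})([A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8}))$");

    static boolean isValid(String s) {
        return FIELD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"B-123---", "AB&& &  ", "A!!!!!!!", "        ",
                            "B-123!!!", "AB&& -  ", "A-&!    ", "!!!!!!!!", " ABCDEFG"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isValid(s));
        }
    }
}
```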
It's a matter of asserting the right conditions at the start of the regex.
^(?=[ ]*$|(?![ ]))(?!.*([!&-]).*(?!\1)[!&-])[a-zA-Z0-9 !&-]{8}$
see -> https://regex101.com/r/tN5y4P/1
Some discussion:
^ # Begin of text
(?= # Assert, cannot start with a space
[ ]* $ # unless it's all spaces
| (?! [ ] )
)
(?! # Assert, not mixed special chars
.*
( [!&-] ) # (1)
.*
(?! \1 )
[!&-]
)
[a-zA-Z0-9 !&-]{8} # Consume 8 valid characters from within this class
$ # End of text
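Here are the question's examples run against this single-class pattern in Java. The class name is illustrative, and the short examples are padded to 8 characters as an assumption, since they seem to have lost trailing spaces. One difference from the question's list of rules shows up here: as written, nothing in the pattern requires an alphanumeric character, so an all-special input such as !!!!!!!! is accepted.

```java
import java.util.regex.Pattern;

final class EightCharFieldBackref {
    // Lookahead 1: no leading space unless all spaces; lookahead 2: no two different specials
    private static final Pattern FIELD = Pattern.compile(
            "^(?=[ ]*$|(?![ ]))(?!.*([!&-]).*(?!\\1)[!&-])[a-zA-Z0-9 !&-]{8}$");

    static boolean isValid(String s) {
        return FIELD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"B-123---", "AB&& &  ", "A!!!!!!!", "        ",
                            "B-123!!!", "AB&& -  ", "A-&!    ", " ABCDEFG", "!!!!!!!!"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isValid(s));
        }
    }
}
```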

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but I need to restrict consecutive "-" only.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to restrict consecutive - only.
You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
^^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
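Here are both Java strings above, checked against the question's two URLs (the class name is illustrative). One difference in behavior shows up with a hyphen at either end. The grouped version rejects it, because every hyphen must be followed by at least one allowed character and the string must start with one. The lookahead version accepts it.

```java
final class NoDoubleHyphen {
    // Runs of allowed characters separated by single hyphens
    static final String GROUPED = "^[\\w/.?]+(?:-[\\w/.?]+)*$";
    // Any allowed characters, as long as "--" does not occur anywhere
    static final String LOOKAHEAD = "^(?!.*--)[\\w/.?-]+$";

    public static void main(String[] args) {
        String[] inputs = {"www.google.com/flights-usa", "www.google.com/flights--usa", "abc-"};
        for (String s : inputs) {
            System.out.println(s + " -> " + s.matches(GROUPED) + " " + s.matches(LOOKAHEAD));
        }
    }
}
```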
One approach is to restrict multiple dashes with negative look-behind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.
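Here is the lookbehind pattern above in Java (the class name is illustrative). Inside a character class, /, ., ? and _ need no escaping, so the backslashes are dropped here; the behavior is the same.

```java
final class HyphenLookbehind {
    // Each character is either a non-hyphen allowed character, or a hyphen not preceded by a hyphen
    static final String PATTERN = "^(?:[a-zA-Z0-9/.?_]|(?<!-)-)+$";

    static boolean isValid(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"www.google.com/flights-usa", "www.google.com/flights--usa"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```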
I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([^\-]|$))+$
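For each character, this regex checks if it can match [a-zA-Z0-9\/\.\?\_], which is everything you included in your regex except the hyphen. If that does not match, it instead tries to match \-([^\-]|$), which matches a hyphen followed by one non-hyphen character, or a hyphen at the end of the string. Note that [^\-] consumes that following character and accepts anything except a hyphen, so a character outside your allowed set is let through when it directly follows a hyphen.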
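Here is the alternation version in Java (the class name is illustrative), with the unnecessary escapes dropped: /, ., ? and _ need none inside a class, and - needs none outside a class or right after ^ in a negated class.

```java
final class HyphenAlternation {
    // A non-hyphen allowed character, or a hyphen followed by a non-hyphen (consumed) or the end
    static final String PATTERN = "^([a-zA-Z0-9/.?_]|-([^-]|$))+$";

    static boolean isValid(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"www.google.com/flights-usa", "www.google.com/flights--usa", "abc-"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```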
Here's a demo.

Java Regex matching with parenthesis

I am trying to come up with the regex to find strings matching the following pattern:
(someNumber - someNumber), with the parentheses included.
I tried:
"\\([1-9]*-[1-9]*\\)"
but that doesn't seem to work.
I also need to match:
The letter W or L followed by (someNumber - someNumber), with the parentheses included.
I tried to use the same pattern above, slightly modified, but again, no luck:
"W|L \\([1-9]*-[1-9]*\\)"
Any help would be appreciated
Include W|L in parentheses:
(W|L)
If you want to allow space characters before and after the minus, add \s or a space before and after the -:
"((W|L)\\s)?\\([1-9]*\\s-\\s[1-9]*\\)"
If you already know that there will be at least one digit, use + instead of *, as * matches zero or more, whereas + matches 1 or more.
The pattern given above matches with and without a W or L in front.
Here's a pattern that matches with and without space around the - and with or without W or L in front. Additionally, it also captures numbers containing 0, which you excluded in your original regular expression.
"((W|L)\\s)?\\(\\d+\\s?-\\s?\\d+\\)"
Further to blueygh2's answer, your regex will fail if the numbers contain zeroes. My guess is you want to avoid leading zeroes, in which case use [1-9]\d* (or [1-9][0-9]*). If you want to allow the numbers to equal 0 but otherwise avoid leading zeroes, do ([1-9]\d*|0).
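Here is a Java sketch that runs the last pattern above with find(), next to the leading-zero variant suggested here, where each number is [1-9]\d* or a lone 0. The class name, the helper and the sample strings are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScoreMatcher {
    // Optional "W " or "L ", then "(digits - digits)" with optional spaces around '-'
    static final Pattern SCORE = Pattern.compile("((W|L)\\s)?\\(\\d+\\s?-\\s?\\d+\\)");
    // Same shape, but each number is 0 or has no leading zero
    static final Pattern NO_LEADING_ZERO =
            Pattern.compile("((W|L)\\s)?\\(([1-9]\\d*|0)\\s?-\\s?([1-9]\\d*|0)\\)");

    // Returns the first match in the text, or null if there is none
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(SCORE, "Result: W (10 - 3)"));
        System.out.println(firstMatch(SCORE, "L (07-3)"));
        System.out.println(firstMatch(NO_LEADING_ZERO, "L (07-3)"));
    }
}
```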
You can try this:
"(W|L)\\s*\\(\\d+-\\d+\\)"

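The last pattern above, "(W|L)\\s*\\(\\d+-\\d+\\)", requires the W or L, allows any amount of whitespace (including none) after it, and allows no spaces around the -. Here is a small Java check with find() (the class name and samples are illustrative):

```java
import java.util.regex.Pattern;

final class RequiredPrefix {
    // Mandatory W or L, optional whitespace, then "(digits-digits)" with no inner spaces
    static final Pattern RESULT = Pattern.compile("(W|L)\\s*\\(\\d+-\\d+\\)");

    static boolean containsResult(String text) {
        return RESULT.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"W(3-1)", "L  (12-0)", "(3-1)", "W (3 - 1)"}) {
            System.out.println(s + " -> " + containsResult(s));
        }
    }
}
```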
How can I express such requirement using Java regular expression?

I need to check that a file contains some amounts that match a specific format:
between 1 and 15 characters (digits or ",")
may contain at most one "," separator for decimals
must have at least one digit before the separator
this amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude the malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not meet the requirement, as it can match up to 30 characters.
Unfortunately, for some reasons that are too long to explain here, I cannot simply pick a substring or use any other java call. The match has to be in a single, java-compatible, regular expression.
^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
(?=.{1,15}$) positive lookahead to make sure that the total length of string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and more digits
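$ end of the string. It is still needed when the pattern is used with find(), because the lookahead only checks the length, not that every character is a digit or comma; with matches() the whole input has to match anyway.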
In a Java string literal the backslashes have to be escaped: ^(?=.{1,15}$)\\d+(,\\d+)?$
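Here is a Java sketch of the anchored version (the class name is illustrative). With matches() the whole input has to match. The lookahead is what caps the total length at 15 characters, commas included.

```java
import java.util.regex.Pattern;

final class AmountFormat {
    // Lookahead caps the total length at 15; then digits with at most one ",digits" part
    static final Pattern AMOUNT = Pattern.compile("^(?=.{1,15}$)\\d+(,\\d+)?$");

    static boolean isAmount(String s) {
        return AMOUNT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123,45", "1234567890123,5", "12345678901234,5", ",5", "1,2,3"}) {
            System.out.println(s + " -> " + isAmount(s));
        }
    }
}
```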
Update: If you're looking for this in the middle of another string, use word boundaries (\b) instead of string boundaries (^ and $).
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For java:
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"

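Used with find(), the word-boundary version "\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b" picks an amount out of surrounding text. One caveat is worth checking against your files. Letters and digits are both word characters, so \b only fires where the amount touches a non-word character, such as a space or the start or end of the input. An amount glued directly to letters, such as EUR123,45, will therefore not be matched as a whole. The class name and samples are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmountInText {
    static final Pattern AMOUNT = Pattern.compile("\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b");

    // Returns the first amount found in the text, or null if there is none
    static String firstAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"total 123,45 EUR", "id 1234567890123456 x", "no amount here"}) {
            System.out.println(s + " -> " + firstAmount(s));
        }
    }
}
```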