Restrict consecutive characters using Java Regex - java

I need to allow alphanumeric characters, "?", ".", "/" and "-" in a given string, but consecutive "-" characters must be rejected.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to disallow consecutive "-" characters only.

You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
 ^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
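To compare the two patterns above side by side, here is a minimal sketch. The class and method names (HyphenRunCheck, grouped, lookahead) are made up for illustration, and the last two sample strings are not from the question.

```java
import java.util.regex.Pattern;

class HyphenRunCheck {
    // Grouping variant: runs of allowed characters separated by single hyphens.
    static final Pattern GROUPED = Pattern.compile("^[\\w/.?]+(?:-[\\w/.?]+)*$");
    // Lookahead variant: reject "--" anywhere, then check the allowed set.
    static final Pattern LOOKAHEAD = Pattern.compile("^(?!.*--)[\\w/.?-]+$");

    static boolean grouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean lookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "www.google.com/flights-usa",   // from the question: valid
            "www.google.com/flights--usa",  // from the question: invalid
            "-usa",                         // extra sample: leading hyphen
            "usa-"                          // extra sample: trailing hyphen
        };
        for (String s : samples) {
            System.out.println(s + " grouped=" + grouped(s) + " lookahead=" + lookahead(s));
        }
    }
}
```

Note one difference between the two: the grouping variant also rejects a leading or trailing hyphen, while the lookahead variant accepts both. Which behavior is wanted is not stated in the question.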

One approach is to restrict multiple dashes with negative look-behind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.
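A small sketch of the look-behind approach in Java, with the unnecessary escapes inside the character class dropped. The class name SingleDashLookbehind and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

class SingleDashLookbehind {
    // Either an allowed non-hyphen character, or a hyphen that is
    // not immediately preceded by another hyphen.
    static final Pattern P = Pattern.compile("^(?:[a-zA-Z0-9/.?_]|(?<!-)-)+$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("www.google.com/flights-usa"));
        System.out.println(isValid("www.google.com/flights--usa"));
    }
}
```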

I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([^\-]|$))+$
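For each character, this regex checks if it can match [a-zA-Z0-9\/\.\?\_], which is everything you included in your regex except the hyphen. If that does not match, it instead tries to match \-([^\-]|$), which matches a hyphen followed by a character other than a hyphen, or a hyphen at the end of the string. Note that [^\-] consumes the following character without checking it against the allowed set, so a string such as a-#b is also accepted; replacing that alternative with \-(?!\-) avoids this.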
Here's a demo.

Related

Regex for only two comma separated values, keeping second value optional

I am creating a regex for two comma-separated values (for example, coordinates). I am using the regex below:
^(\-?\d+(\.\d+)?),\s*(\-?\d+(\.\d+)?)$
The above regex requires two comma-separated values, but I want the second value, including the comma, to be optional, so I changed the regex like this:
^(\-?\d+(\.\d+)?)(,\s*(\-?\d+(\.\d+)?)$)?
This works and keeps the second value optional, but it also allows a comma without any second value, like below:
3456,
What can be added to the regex to disallow the comma if the second value is not present? Thanks.
You misplaced the quantifier and the anchor: the $ must come after the optional group, not inside it.
Use
^(-?\d+(\.\d+)?)(,\s*(-?\d+(\.\d+)?))?$
See the regex demo.
You may adjust the number of capturing groups in your pattern and convert the optional group into a non-capturing one by adding ?: after the opening (. I'd use it like
^(-?\d+(?:\.\d+)?)(?:,\s*(-?\d+(?:\.\d+)?))?$
See another demo.
Also note you do not need to escape a hyphen outside a character class.
When using it in Java, do not forget to use double backslashes to define a literal backslash in the string literal, and omit ^ and $ if you use the pattern with the .matches() method:
s.matches("-?\\d+(?:\\.\\d+)?(?:,\\s*-?\\d+(?:\\.\\d+)?)?")
Details:
^ - start of string anchor
(-?\d+(\.\d+)?) - Group 1 matching an optional hyphen, 1+ digits, then an optional sequence (Group 2) of a dot followed with one or more digits
(,\s*(-?\d+(\.\d+)?))? - an optional sequence (Group 3) matching one or zero occurrences of:
, - comma
\s* - zero or more whitespaces
(-?\d+(\.\d+)?) - Group 4 matching
-? - an optional hyphen
\d+ - one or more digits
(\.\d+)? - Group 5 matching an optional sequence of a dot followed with 1 or more digits
$ - end of string
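As a rough illustration of how the non-capturing version could be used from Java to pull out the one or two values, here is a sketch. The class name OptionalSecondValue and the parse helper are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalSecondValue {
    // Anchors are omitted because matches() already requires a full-string match.
    // Group 1 = first number, group 2 = optional second number.
    static final Pattern PAIR =
        Pattern.compile("(-?\\d+(?:\\.\\d+)?)(?:,\\s*(-?\\d+(?:\\.\\d+)?))?");

    // Returns {first, second}; second is null when absent.
    // Returns null when the input does not match at all.
    static String[] parse(String s) {
        Matcher m = PAIR.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "12.5, -7", "3456", "3456," }) {
            String[] r = parse(s);
            System.out.println(s + " -> " + (r == null ? "no match" : r[0] + " / " + r[1]));
        }
    }
}
```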

Java Regex, match pattern, pair of words

I am using a regex to check the correctness of a string in my application. I want to check whether the string has the following pattern: x=y&a=b&... where x, y, a, b etc. can be empty.
Example of correct strings:
abc=def&gef=cda&pdf=cdf
=&gef=def
abc=&gef=def
=abc&gef=def
Example of incorrect strings:
abc=def&gef=cda&
abc=def&gef==cda&
abc=defgef=cda&abc=gda
This is my code showing current solution:
String pattern = "[[a-zA-Z0-9]*[=]{1}[a-zA-Z0-9]*[&]{1}]*";
if (!Pattern.matches(pattern, s)) {
    throw new IllegalArgumentException(s);
}
This solution is bad because it accepts strings like:
abc=def&gef=def&
Can anyone help me with correct pattern?
You may use the following regex:
^[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*$
See the regex demo
When used with matches(), the ^ and $ anchors may be omitted.
Details:
^ - start of string
[a-zA-Z0-9]* - 0+ alphanumeric chars (may be replaced with \p{Alnum})
= - a = symbol
[a-zA-Z0-9]* - 0+ alphanumeric chars
(?: - start of a non-capturing group matching sequences of...
& - a & symbol
[a-zA-Z0-9]*=[a-zA-Z0-9]* - same as above
)* - ... zero or more occurrences
$ - end of string
NOTE: If you want to make the pattern more generic, you may match any char other than = and & with a [^&=] pattern that would replace a more restrictive [a-zA-Z0-9] pattern:
^[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*$
See this regex demo
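A quick way to check the alphanumeric pattern against the strings from the question is sketched below. The class name QueryShapeCheck and its methods are hypothetical; the anchors are left out because matches() is used.

```java
import java.util.regex.Pattern;

class QueryShapeCheck {
    // One key=value pair, then zero or more "&key=value" pairs.
    static final Pattern PAIRS = Pattern.compile(
        "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    // Mirrors the validation style used in the question.
    static void requireValid(String s) {
        if (!isValid(s)) {
            throw new IllegalArgumentException(s);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc=def&gef=cda&pdf=cdf", "=&gef=def", "abc=&gef=def", "=abc&gef=def",
            "abc=def&gef=cda&", "abc=def&gef==cda&", "abc=defgef=cda&abc=gda"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```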
I believe you want this.
([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*
This matches any number of repetitions like x=y, with a & after each one; followed by one repetition like x=y without the following &.
Here you go:
^\w*=\w*(?:&(?:\w*=\w*))*$
^ is the starting anchor
(\w*=\w*) is to represent parameters like abc=def
\w matches a word character [a-zA-Z0-9_]
\w* represents 0 or more characters
& represents the actual ampersand literal
(&(\w*=\w*))* matches any subsequent parameters like &b=d etc.
$ represents the ending anchor
Regex101 Demo
EDIT: Made all groups non-capturing.
Note: As @WiktorStribiżew has pointed out in the comments, \w will match _ as well, so the above regex should be modified to exclude underscores if they are to be avoided in the pattern, i.e. use [A-Za-z0-9] instead.

How to replace all non-digit characters in a string?

I need to replace all non-digit characters in the string. For instance:
String: 987sdf09870987=-0\\\`42
Replaced: 987**sdf**09870987**=-**0**\\\`**42
That is, every non-digit character sequence should be wrapped in ** characters. How can I do that with String::replaceAll()? I tried
(?![0-9]+$).*
but the regex doesn't match what I want.
(\\D+)
You can use this and replace with **$1**. See demo.
https://regex101.com/r/fM9lY3/2
You can use a negated character class for a non-digit and use the 0th group back-reference to avoid overhead with capturing groups (it is minimal here, but still is):
String x = "987sdf09870987=-0\\\\\\`42";
x = x.replaceAll("[^0-9]+", "**$0**");
System.out.println(x);
See demo on IDEONE. Output: 987**sdf**09870987**=-**0**\\\`**42.
Also, in Java regex, character classes look neater than multiple escape symbols, that is why I prefer this [^0-9]+ pattern meaning match 1 or more (+) symbols other than (because of ^) digits from 0 to 9 ([0-9]).
A couple of words about your (?![0-9]+$).* regex. It consists of a negative lookahead (?![0-9]+$) that checks if from the current position onward there are no digits only (if there are only digits up to the end of string, the match fails), and .* matching any characters but a newline. You can see example of what it is doing here. I do not think it can help you since you need to actually match non-numbers, not just check if digits are absent.

Match first occurrence of semicolon in string, only if not preceded by '--'

I'm trying to write a regular expression for Java that matches if there is a semicolon that does not have two (or more) leading '-' characters.
I'm only able to get the opposite working: A semicolon that has at least two leading '-' characters.
([\-]{2,}.*?;.*)
But I need something like
([^([\-]{2,})])*?;.*
I'm somehow not able to express 'not at least two - characters'.
Here are some examples I need to evaluate with the expression:
; -- a : should match
-- a ; : should not match
-- ; : should not match
--; : should not match
-;- : should match
---; : should not match
-- semicolon ; : should not match
bla ; bla : should match
bla : should not match (; is mandatory)
-;--; : should match (the first occurring semicolon must not have two or more consecutive leading '-')
It seems that this regex matches what you want
String regex = "[^-]*(-[^-]+)*-?;.*";
DEMO
Explanation: matches will accept string that:
[^-]* can start with non dash characters
(-[^-]+)*-?; is a bit tricky: before we match ; we need to make sure that no - has another - right after it, so:
(-[^-]+)* each - has at least one non - character after it
-? or a - is placed right before ;
;.* if the earlier conditions were fulfilled, we can accept ; and any characters (.*) after it.
A more readable version, but probably a little slower:
((?!--)[^;])*;.*
Explanation:
To make sure that there is ; in string we can use .*;.* in matches.
But we need to add some conditions to characters before first ;.
So to make sure that matched ; will be first one we can write such regex as
[^;]*;.*
which means:
[^;]* zero or more non semicolon characters
; first semicolon
.* zero or more of any characters (actually . can't match line separators like \n or \r)
So now all we need to do is make sure that character matched by [^;] is not part of --. To do so we can use look-around mechanisms for instance:
(?!--)[^;] before matching [^;] (?!--) checks that next two characters are not --, in other words character matched by [^;] can't be first - in series of two --
[^;](?<!--) checks if after matching [^;] regex engine will not be able to find -- if it will backtrack two positions, in other words [^;] can't be last character in series of --.
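To try both regexes from this answer against the sample strings in the question, something like the sketch below could be used. The class name FirstSemicolonCheck and the method names are made up; both patterns are applied with matches(), as the answer assumes.

```java
import java.util.regex.Pattern;

class FirstSemicolonCheck {
    // Variant 1: every '-' before the semicolon is followed by a non-dash,
    // or is the single dash directly before ';'.
    static final Pattern DASH_PAIRS = Pattern.compile("[^-]*(-[^-]+)*-?;.*");
    // Variant 2: no character before the first ';' may start a "--" pair.
    static final Pattern LOOKAHEAD = Pattern.compile("((?!--)[^;])*;.*");

    static boolean dashPairs(String s) {
        return DASH_PAIRS.matcher(s).matches();
    }

    static boolean lookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "; -- a", "-- a ;", "-- ;", "--;", "-;-",
            "---;", "-- semicolon ;", "bla ; bla", "bla", "-;--;"
        };
        for (String s : samples) {
            System.out.println("[" + s + "] dashPairs=" + dashPairs(s)
                + " lookahead=" + lookahead(s));
        }
    }
}
```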
How about just splitting the string along -- and if there are two or more sub strings, checking if the last one contains a semicolon?
How about using this regex in Java:
[^;]*;(?<!--[^;]{0,999};).*
The only caveat is that it works only when there are at most 999 characters between -- and ;
Java Regex Demo
I think this is what you're looking for:
^(?:(?!--).)*;.*$
In other words, match from the start of the string (^), zero or more characters (.*) followed by a semicolon. But replacing the dot with (?:(?!--).) causes it to match any character unless it's the beginning of a two-hyphen sequence (--).
If performance is an issue, you can exclude the semicolon as well, so it never has to backtrack:
^(?:(?!--|;).)*;.*$
EDIT: I just noticed your comment that the regex should work with the matches() method, so I padded it out with .*. The anchors aren't really necessary, but they do no harm.
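You need a negative lookahead!
This regex rejects strings that your original pattern matches from the start:
(?!-{2,}.*?;.*).*?;.*
It matches a string that contains a semicolon, unless the string begins with two or more dashes followed later by a semicolon. Because the lookahead is evaluated only at the start of the string when used with matches(), a -- that appears later, as in "a -- ;", is not rejected.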

How to Java Regex to match everything but specified pattern

I am trying to match everything but garbage values in the entire string. The pattern I am trying to use is:
^.*(?!\w|\s|-|\.|[#:,]).*$
I have been testing the pattern on regexPlanet and it seems to be matching the entire string. The input string I was using was:
Vamsi///#k03#g!!!l.com 123**5
How can I get it to match only the characters outside the allowed set? I would like to replace anything that matches with an empty space or a special character of my choice.
The pattern, as written, is supposed to match the whole string.
^ - start of string.
.* - zero or more of any character.
(?!\w|\s|-|\.|[#:,]) - negative look-ahead for some characters.
.* - zero or more of any character.
$ - end of string.
If you only want to match characters which aren't one of the supplied characters, try simply:
[^-\w\s.#:,]
[^...] is a negated character class, it will match any characters not supplied in the brackets. See this for more information.
Test.
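For the replacement step the question asks about, the negated character class can be passed to replaceAll. This is only a sketch: the class name GarbageStripper and the clean helper are hypothetical, and Matcher.quoteReplacement is used so that a replacement such as $ is treated literally.

```java
import java.util.regex.Matcher;

class GarbageStripper {
    // Replaces every character that is NOT a hyphen, word character,
    // whitespace, '.', '#', ':' or ',' with the given replacement.
    static String clean(String input, String replacement) {
        return input.replaceAll("[^-\\w\\s.#:,]", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "Vamsi///#k03#g!!!l.com 123**5";
        System.out.println(clean(input, "_"));
        System.out.println(clean(input, ""));
    }
}
```

Each disallowed character is replaced individually; add a + after the class if a whole run of garbage should collapse into a single replacement.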
