What does this regular expression mean?

In a recent interview I was asked to decipher this regex
^\^[^^]
Can you please help me with it? Also, please provide some links where I can learn regex for interviews.

It matches strings that begin with ^ followed by any character other than ^.
So it would match:
^foo
^b
but not
foo
^^b
Explanation:
Caret (^) is a regex meta character with two different meanings:
Outside a character class (1st use in your regex) it works as the start anchor.
Inside a character class it acts as a negator if it is the first character of the class (3rd use in your regex).
Preceding a regex metacharacter with \ escapes it (makes it non-special). The 2nd use of ^ in your regex is escaped, so it matches a literal ^ in the string.
Inside a character class a ^ which is not the first character of the character class is treated literally. So the 4th use in your regex is a literal ^.
Some more examples to make it clear:
^a : Matches a string beginning with a
^ab : Matches a string beginning with a followed by b
[a] : Matches a string that contains an a
[^a] : Matches a string that contains at least one character other than a
^a[^a] : Matches a string beginning with an a followed by any character other than a.
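The examples above can be checked with a small Java sketch. The class and method names are hypothetical, and find() is an assumption about how the regex is applied (it only describes the start of the string, so matches() would reject longer inputs).

```java
import java.util.regex.Pattern;

final class CaretRegexDemo {
    // The regex ^\^[^^] written as a Java string literal (backslash doubled).
    static final Pattern CARET = Pattern.compile("^\\^[^^]");

    // True when the string starts with a literal ^ followed by a non-^ character.
    static boolean startsWithSingleCaret(String s) {
        return CARET.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"^foo", "^b", "foo", "^^b"}) {
            System.out.println(s + " -> " + startsWithSingleCaret(s));
        }
    }
}
```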

I'm testing this regex here however it does not seem to be valid.
The first ^ denotes the start of the line.
The first \ escapes the following \.
Thus the second "^" is not escaped
Finally, the first caret inside the square brackets ([^) acts as the negation, and the second one (^]) is not escaped; as a result, the regex is not valid.
IMHO the correct regexp should be ^\^[^\^]
Guys, kindly confirm. Many thanks

Match beginning of line or string
followed by a literal \
followed by the beginning of the line or string
followed by any character that is not a space, return or new line character

The first ^ is the beginning of line.
The second one is a literal character of ^ (\ is to escape the other usual meaning of ^)
The third one starts a class of characters which does not include the character ^.
Some examples using Ruby:
ruby-1.9.2-p0 > "hello" =~ /^h/ # it found a match at position 0
=> 0
ruby-1.9.2-p0 > "hello" =~ /^e/ # nil means can't find it
=> nil
ruby-1.9.2-p0 > "he^llo" =~ /\^/ # found at position 2
=> 2
ruby-1.9.2-p0 > "he^llo"[/[^^]*/] # anything repeatedly but not including the ^ character
=> "he"

Related

Regex to validate custom format

I have this format: xx:xx:xx or xx:xx:xx-y, where x can be 0-9 a-f A-F and y can be only 0 or 1.
I came up with this regex: ([0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}|[-][0-1]{1})
(See regexr).
But this matches 0a:0b:0c-3 too, which is not expected.
Is there any way to remove these cases from result?

[:] means a character from the list that contains only :. It is the same as :. The same goes for [-], which has the same result as -.
Also, {1} means "the previous piece exactly one time". It does not have any effect; you can remove it altogether.
To match xx:xx:xx or xx:xx:xx-y, the part that matches -y must be optional. The quantifier ? after the optional part marks it as optional.
All in all, your regex should be like this:
[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}(-[01])?
If the regex engine you use can be told to ignore the character case then you can get rid of A-F (or a-f) from all character classes and the regex becomes:
[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?
How it works, piece by piece:
[0-9a-f] # any digit or letter from (and including) 'a' to 'f'
{2} # the previous piece exactly 2 times
: # the character ':'
[0-9a-f]
{2}
:
[0-9a-f]
{2}
( # start a group; it does not match anything
- # the character '-'
[01] # any character from the class (i.e. '0' or '1')
) # end of group; the group is needed for the next quantifier
? # the previous piece (i.e. the group) is optional
# it can appear zero or one times
See it in action: https://regexr.com/4rfvr
Update
As @the-fourth-bird mentions in a comment, if the regex must match the entire string then you need to anchor its ends:
^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$
^ as the first character of a regex matches the beginning of the string, $ as the last character matches the end of the string. This way the regex matches the entire string only (when there aren't other characters before or after the xx:xx:xx or xx:xx:xx-y part).
If you use the regex to find xx:xx:xx or xx:xx:xx-y in a larger string then you don't need to add ^ and $. Of course, you can add only ^ or $ to let the regex match only at the beginning or at the end of the string.
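In Java, whole-string validation with this regex could look like the sketch below. The class and method names are hypothetical; it assumes Pattern.CASE_INSENSITIVE as the "ignore case" switch, and it relies on matches(), which requires the whole input to match, so the ^ and $ anchors are left out.

```java
import java.util.regex.Pattern;

final class HexTripletDemo {
    // xx:xx:xx with an optional -0 or -1 suffix; case is ignored via the flag.
    static final Pattern HEX_TRIPLET = Pattern.compile(
            "[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?",
            Pattern.CASE_INSENSITIVE);

    // matches() only succeeds when the entire string fits the pattern.
    static boolean isValid(String s) {
        return HEX_TRIPLET.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0a:0b:0c", "0A:0B:0C-1", "0a:0b:0c-3"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```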

You want
xx:xx:xx or, if it is followed by a -, then it must be a 0 or 1 and then it is the end (word boundary).
So you don't want any of these
0a:0b:0c-123
0a:0b:0cd
10a:0b:0c
either.
Then you want a "negative lookahead": if you match the first part, you don't want it to be followed by a - (the first pattern) and it should end there (word boundary); and if it is followed by a -, then it must be a 0 or 1, and then a word boundary:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
To prevent any digit in front, a word boundary is added to the front as well.
Example: https://regexr.com/4rg42
The following almost worked:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
but if it is the end of file and it is 3a:2b:11, then the [^-] will try to match a non - character and it won't match.
Example: https://regexr.com/4rg4q

Java Regex, match pattern, pair of words

I am using a regex to check the correctness of a string in my application. I want to check whether the string has the following pattern: x=y&a=b&... where x, y, a, b etc. can be empty.
Example of correct strings:
abc=def&gef=cda&pdf=cdf
=&gef=def
abc=&gef=def
=abc&gef=def
Example of incorrect strings:
abc=def&gef=cda&
abc=def&gef==cda&
abc=defgef=cda&abc=gda
This is my code showing current solution:
String pattern = "[[a-zA-Z0-9]*[=]{1}[a-zA-Z0-9]*[&]{1}]*";
if(!Pattern.matches(pattern, s)){
throw new IllegalArgumentException(s);
}
This solution is bad because it accepts strings like:
abc=def&gef=def&
Can anyone help me with correct pattern?

You may use the following regex:
^[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*$
See the regex demo
When used with matches(), the ^ and $ anchors may be omitted.
Details:
^ - start of string
[a-zA-Z0-9]* - 0+ alphanumeric chars (may be replaced with \p{Alnum})
= - a = symbol
[a-zA-Z0-9]* - 0+ alphanumeric chars
(?: - start of a non-capturing group matching sequences of...
& - a & symbol
[a-zA-Z0-9]*=[a-zA-Z0-9]* - same as above
)* - ... zero or more occurrences
$ - end of string
NOTE: If you want to make the pattern more generic, you may match any char other than = and & with a [^&=] pattern that would replace a more restrictive [a-zA-Z0-9] pattern:
^[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*$
See this regex demo
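Plugged into the validation code from the question, the first pattern could be used roughly as follows. This is a sketch with hypothetical class and method names; the anchors are omitted because matches() already requires a full-string match.

```java
import java.util.regex.Pattern;

final class QueryPairsDemo {
    // One key=value pair, followed by zero or more &key=value pairs.
    static final Pattern PAIRS = Pattern.compile(
            "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    // Same contract as the snippet in the question: throw on bad input.
    static void validate(String s) {
        if (!isValid(s)) {
            throw new IllegalArgumentException(s);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"abc=def&gef=cda&pdf=cdf", "=&gef=def", "abc=def&gef=cda&"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```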

I believe you want this.
([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*
This matches any number of repetitions like x=y, with a & after each one; followed by one repetition like x=y without the following &.

Here you go:
^\w*=\w*(?:&(?:\w*=\w*))*$
^ is the starting anchor
(\w*=\w*) is to represent parameters like abc=def
\w matches a word character [a-zA-Z0-9_]
\w* represents 0 or more characters
& represents the actual ampersand literal
(&(\w*=\w*))* matches any subsequent parameters like &b=d etc.
$ represents the ending anchor
Regex101 Demo
EDIT: Made all groups non-capturing.
Note: As @WiktorStribiżew has pointed out in the comments, \w will match _ as well, so the above regex should be modified to exclude underscores if they are to be avoided in the pattern, i.e. use [A-Za-z0-9]

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but I need to disallow consecutive - characters.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to disallow only consecutive - characters.

You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
 ^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
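A short Java sketch comparing the two \w-based patterns; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class HyphenRunDemo {
    // Grouping with quantifiers: every hyphen must sit between two allowed runs.
    static final Pattern GROUPED = Pattern.compile("^[\\w/.?]+(?:-[\\w/.?]+)*$");
    // Negative lookahead: any allowed characters, as long as "--" never occurs.
    static final Pattern LOOKAHEAD = Pattern.compile("^(?!.*--)[\\w/.?-]+$");

    static boolean groupedAccepts(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean lookaheadAccepts(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"www.google.com/flights-usa", "www.google.com/flights--usa", "-usa"};
        for (String s : samples) {
            System.out.println(s + " -> grouped=" + groupedAccepts(s)
                    + ", lookahead=" + lookaheadAccepts(s));
        }
    }
}
```

The two patterns are not fully equivalent: the grouped version also rejects a leading or trailing hyphen (for example -usa), while the lookahead version accepts it.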

One approach is to restrict multiple dashes with negative look-behind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.
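A minimal Java sketch of the look-behind approach; the class and method names are hypothetical, and the backslashes before /, ., ? and _ are dropped because they are not needed inside a character class.

```java
import java.util.regex.Pattern;

final class LookbehindHyphenDemo {
    // Each character is either an allowed non-hyphen character,
    // or a hyphen that is not immediately preceded by another hyphen.
    static final Pattern NO_DOUBLE_HYPHEN =
            Pattern.compile("^(?:[a-zA-Z0-9/.?_]|(?<!-)-)+$");

    static boolean isValid(String s) {
        return NO_DOUBLE_HYPHEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("www.google.com/flights-usa"));
        System.out.println(isValid("www.google.com/flights--usa"));
    }
}
```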

I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([^\-]|$))+$
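For each character, this regex checks if it can match [a-zA-Z0-9\/\.\?\_], which is everything you included in your regex except the hyphen. If that does not match, it instead tries to match \-([^\-]|$), which matches a hyphen not followed by another hyphen, or a hyphen at the end of the string. Note that [^\-] consumes the character after the hyphen and accepts any non-hyphen character, so a disallowed character directly after a hyphen (for example a-!b) would still be accepted.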
Here's a demo.

Java string validation

I'm looking for a regular expression that adheres to the rules below.
Allowed Characters
Alphabet : a-z A-Z
Numbers : 0-9
I am using [^a-zA-Z0-9], but when I call
regex = "[^a-zA-Z0-9]" ;
String key = "message";
if (!key.matches(regex))
message = "Invalid key";
the system shows "Invalid key", although the key should be valid. Could you please help me?

If you want to allow these characters [a-zA-Z0-9] you should not use ^ since it negates what is inside the [].
This expression [^a-zA-Z0-9] means anything that is not a-z A-Z or numbers : 0-9.
You may have seen the ^ being used outside the [] at the beginning of a regular expression to indicate the beginning of the string, like ^[a-zA-Z0-9].

The below regex would allow one or more alphanumeric characters,
^[A-Za-z0-9]+$
Your regex [^a-zA-Z0-9] matches a single character that is not alphanumeric. [^..] is called a negated character class; it matches any character that is not listed inside the class.
You don't need to give start or end anchors in the regex when it is passed to matches method. So [A-Za-z0-9]+ would be enough.
Explanation:
^ Anchor which denotes the start.
[A-Za-z0-9]+ , + repeats the preceding token [A-Za-z0-9] one or more times.
$ End of the line.
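As a quick check in Java (the class and method names are hypothetical), String.matches() with the unanchored pattern behaves like this:

```java
final class AlnumKeyDemo {
    // matches() must cover the whole key, so no ^ or $ is needed.
    static boolean isValidKey(String key) {
        return key.matches("[A-Za-z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("message"));
        System.out.println(isValidKey("mess age"));
        // The negated class from the question only matches a single
        // non-alphanumeric character, so "message" never matches it.
        System.out.println("message".matches("[^a-zA-Z0-9]"));
    }
}
```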

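I think you just have to remove the not-operator and search for an invalid character with find(), because matches() has to match the whole string. Here is the same example with the variable renamed:
import java.util.regex.Pattern;
String invalidChars = "[^a-zA-Z0-9]";
String key = "message";
// find() succeeds if any single character of the key is not alphanumeric
if (Pattern.compile(invalidChars).matcher(key).find()) {
message = "Invalid key";
}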
(However, the negated logic is not very readable.)

Try the alphanumeric regex below:
"^[a-zA-Z0-9]+$"
^ - Start of string
[a-zA-Z0-9]+ - one or more of the allowed characters
$ - End of string

For validation, use the \A and \z anchors instead of ^ and $:
\\A[a-zA-Z0-9]+\\z
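The difference shows up when the pattern is applied with find() to input that ends in a line terminator: by default $ in Java can also match just before a final line terminator, while \z matches only at the very end of the input. A small sketch, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

final class EndAnchorDemo {
    static final Pattern DOLLAR = Pattern.compile("^[a-zA-Z0-9]+$");
    static final Pattern STRICT = Pattern.compile("\\A[a-zA-Z0-9]+\\z");

    // Applies the ^...$ version with find().
    static boolean findsWithDollar(String s) {
        return DOLLAR.matcher(s).find();
    }

    // Applies the \A...\z version with find().
    static boolean findsWithStrictEnd(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        String withNewline = "abc\n";
        System.out.println(findsWithDollar(withNewline));
        System.out.println(findsWithStrictEnd(withNewline));
    }
}
```

With matches() both patterns reject the trailing newline, because the whole input has to be consumed.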

Matching '_' and '-' in java regexes

I had this regex in java that matched either an alphanumeric character or the tilde (~)
^([a-z0-9])+|~$
Now I also have to add the characters - and _. I've tried a few combinations, none of which work, for example:
^([a-zA-Z0-9_-])+|~$
^([a-zA-Z0-9]|-|_)+|~$
Sample input strings that must match:
woZOQNVddd
00000
ncnW0mL14-
dEowBO_Eu7
7MyG4XqFz-
A8ft-y6hDu
~
Any clues / suggestion?

- is a special character within square brackets. It indicates a range. If it's not at either end of the character class it needs to be escaped by putting a \ before it.
It's worth pointing out a shortcut: \w is equivalent to [0-9a-zA-Z_], so I think this is more readable:
^([\w-]+|~)$

You need to escape the -, like \-, since it is a special character (the range operator). _ is ok.
So ^([a-z0-9_\-])+|~$.
Edit: your last input String will not match because the regular expression you are using matches a string of alphanumeric characters (plus - and _) OR a tilde (because of the pipe). But not both. If you want to allow an optional tilde on the end, change to:
^([a-z0-9_\-])+(~?)$

If you put the - first, it won't be interpreted as the range indicator.
^([-a-zA-Z0-9_])+|~$
This matches all of your examples except the last one using the following code:
String str = "A8ft-y6hDu ~";
System.out.println("Result: " + str.matches("^([-a-zA-Z0-9_])+|~$"));
That last example won't match because it doesn't fit your description. The regex will match any combination of alphanumerics, -, and _, OR a ~ character.
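The sample inputs from the question can be run through this pattern with a sketch like the following; the class and method names are hypothetical.

```java
final class TildeOrWordDemo {
    // Hyphen placed first in the class so it is taken literally.
    static final String REGEX = "^([-a-zA-Z0-9_])+|~$";

    static boolean isValid(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "woZOQNVddd", "00000", "ncnW0mL14-", "dEowBO_Eu7",
            "7MyG4XqFz-", "A8ft-y6hDu", "~", "A8ft-y6hDu ~"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```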
