Regular Expressions \w character class and equals sign - java

I am creating a regular expression to match the string
#servername:port:databasename
and through https://regex101.com/ I came up with
\#(((\w+.*-*)+)?\w+)(:\d+)(:\w+)
which matches
e.g. #CORA-PC:1111:databasename or #111.111.1.111:111:databasename
However, when I use this regular expression to pattern match in my Java code, the String #CORA-PC:1111:database=name is also matched.
Why is \w matching the = equals sign? I also tried [0-9a-zA-Z], but it also matched the = equals sign.
Can anyone help me with this?
Thanks!

The .* is a greedy dot-matching subpattern: it first consumes the rest of the line and then backtracks to accommodate the subsequent subpatterns. That is why the pattern can match a = symbol (see demo - Group 3 matches that part with =).
Your pattern is rather fragile, as the first part contains nested quantifiers with optional subpatterns, which slow down regex execution and cause other issues. You need to make it more linear.
#(\w+(?:[-.]\w+)*)?(:\d+)(:\w+)
See the regex demo
The regex will match
# - # symbol
(\w+(?:[-.]\w+)*)? - an optional group matching
\w+ - 1+ word chars
(?:[-.]\w+)* - 0+ sequences of a - or . ([-.]) followed with 1+ word chars
(:\d+) - a : symbol followed with 1+ digits
(:\w+) - a : symbol followed with 1+ word chars
If you need to avoid partial matching, use String#matches() (see demo).
NOTE: In Java string literals, backslashes must be doubled.
Code example (Java):
String s = "#CORA-PC:1111:databasename";
String rx = "#(?:\\w+(?:[-.]\\w+)*)?:\\d+:\\w+";
System.out.println(s.matches(rx));
Code example (JS):
var str = '#CORA-PC:1111:databasename';
alert(/^#(?:\w+(?:[-.]\w+)*)?:\d+:\w+$/.test(str));
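The difference between a partial match and a whole-string match can be sketched as follows. This is only an illustration built on the pattern from the answer above; the class and method names (ServerStringDemo, isValid, firstMatch) are made up for the example. With find(), the string containing = still yields a match, but only for the fragment before the = sign, whereas matches() rejects the string outright.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServerStringDemo {
    // Linear pattern from the answer, with backslashes doubled for a Java string literal.
    static final Pattern SERVER = Pattern.compile("#(?:\\w+(?:[-.]\\w+)*)?:\\d+:\\w+");

    // Whole-string check: the entire input must fit the pattern.
    static boolean isValid(String s) {
        return SERVER.matcher(s).matches();
    }

    // Substring search: returns the first matching fragment, or null if there is none.
    static String firstMatch(String s) {
        Matcher m = SERVER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String good = "#CORA-PC:1111:databasename";
        String bad = "#CORA-PC:1111:database=name";

        System.out.println(isValid(good));   // true
        System.out.println(isValid(bad));    // false: "=name" cannot be consumed
        System.out.println(firstMatch(bad)); // #CORA-PC:1111:database (partial match)
    }
}
```

If the code uses Matcher#find(), a true result therefore does not mean that \w matched the = sign; the match simply ended before it.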

Related

Java regex. Match any "value" that is not preceded by given string

I need some help with a Java regexp.
I'm working with a file that has a JSON-like format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value in the sclass property.
I've got an almost-working regexp, but I need a final push to match only what I described above:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from sclass that is not followed by 'notranslate'.
Thanks for your help
The values matched by your current regex are in the 4th capturing group.
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
Regex demo
That would match:
^ Assert start of the string
(?! Negative lookahead, assert that what is on the right is not
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes and word boundaries
) Close negative lookahead
.* match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in a group matching not ' 1+ times and match '
Java demo
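A minimal Java sketch of the single-group pattern above, applied line by line. The class and method names are hypothetical, and the sample values ('Save', 'Hidden', 'Cancel') are invented so the results are easy to tell apart. Each line is tested on its own, so no MULTILINE flag is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslatableValueDemo {
    // Pattern from the answer above, escaped for a Java string literal.
    static final Pattern TRANSLATABLE = Pattern.compile(
        "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*').*\\b(?:label|value):'([^']+)'");

    // Returns the label/value text, or null when the line is marked notranslate
    // or contains no label/value property.
    static String extract(String line) {
        Matcher m = TRANSLATABLE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'Save'},{},[]],",
            "['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'Hidden'},",
            "['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'Cancel'},{},[]]"
        };
        for (String line : lines) {
            System.out.println(extract(line)); // Save, null, Cancel
        }
    }
}
```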

Java Regex, match pattern, pair of words

I am using regex to check the correctness of a string in my application. I want to check whether the string has the following pattern: x=y&a=b&... where x, y, a, b etc. can be empty.
Example of correct strings:
abc=def&gef=cda&pdf=cdf
=&gef=def
abc=&gef=def
=abc&gef=def
Example of incorrect strings:
abc=def&gef=cda&
abc=def&gef==cda&
abc=defgef=cda&abc=gda
This is my code showing current solution:
String pattern = "[[a-zA-Z0-9]*[=]{1}[a-zA-Z0-9]*[&]{1}]*";
if(!Pattern.matches(pattern, s)){
throw new IllegalArgumentException(s);
}
This solution is bad because it accepts strings like:
abc=def&gef=def&
Can anyone help me with correct pattern?
You may use the following regex:
^[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*$
See the regex demo
When used with matches(), the ^ and $ anchors may be omitted.
Details:
^ - start of string
[a-zA-Z0-9]* - 0+ alphanumeric chars (may be replaced with \p{Alnum})
= - a = symbol
[a-zA-Z0-9]* - 0+ alphanumeric chars
(?: - start of a non-capturing group matching sequences of...
& - a & symbol
[a-zA-Z0-9]*=[a-zA-Z0-9]* - same as above
)* - ... zero or more occurrences
$ - end of string
NOTE: If you want to make the pattern more generic, you may match any char other than = and & with a [^&=] pattern that would replace a more restrictive [a-zA-Z0-9] pattern:
^[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*$
See this regex demo
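As a rough check, the alphanumeric pattern above can be run against the correct and incorrect examples from the question. The class and method names are made up for this sketch; the anchors are omitted because matches() already requires the whole string to match.

```java
import java.util.regex.Pattern;

class QueryPairsDemo {
    // One key=value pair, then zero or more "&key=value" pairs.
    static final Pattern PAIRS = Pattern.compile(
        "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] correct = {"abc=def&gef=cda&pdf=cdf", "=&gef=def", "abc=&gef=def", "=abc&gef=def"};
        String[] incorrect = {"abc=def&gef=cda&", "abc=def&gef==cda&", "abc=defgef=cda&abc=gda"};

        for (String s : correct) {
            System.out.println(s + " -> " + isValid(s));   // true for each
        }
        for (String s : incorrect) {
            System.out.println(s + " -> " + isValid(s));   // false for each
        }
    }
}
```

Because every & must be followed by another pair, a trailing & can no longer be accepted.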
I believe you want this.
([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*
This matches any number of repetitions like x=y, with a & after each one; followed by one repetition like x=y without the following &.
Here you go:
^\w*=\w*(?:&(?:\w*=\w*))*$
^ is the starting anchor
(\w*=\w*) is to represent parameters like abc=def
\w matches a word character [a-zA-Z0-9_]
\w* represents 0 or more characters
& represents the actual ampersand literal
(&(\w*=\w*))* matches any subsequent parameters like &b=d etc.
$ represents the ending anchor
Regex101 Demo
EDIT: Made all groups non-capturing.
Note: As @WiktorStribiżew pointed out in the comments, \w also matches _, so if underscores must be excluded, replace \w in the regex above with [A-Za-z0-9].

Regex with Whitespace

I am trying to write a regex to match the following:
act=MATCHME
act=Match me too
I have two regexes, each of which matches one of the lines but not the other. Here is my effort:
matches MATCHME: act=(\w+)
matches Match me too: (\w+\s\w+\s\w+)
Is there any way to combine the two with OR, or am I looking at this wrong?
I am using the Java regex engine.
You may use an optional non-capturing group:
act=(\w+(?:\s+\w+\s+\w+)?)
        ^^^^^^^^^^^^^^^^^
See the regex demo
The ? matches 1 or 0 occurrences of the quantified subpattern. When it is applied to a grouping construct, the quantification is applied to the whole pattern sequence, so (?:\s+\w+\s+\w+)? matches 1 or 0 sequences of 1+ whitespaces, 1+ word chars, 1+ whitespaces and again 1+ word chars.
You may further subsegment the pattern if you need to capture 2-word substrings after act=.
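A small Java sketch of the optional-group pattern above (the class and method names are hypothetical). It also shows what happens with a two-word value: the optional group needs two more words, so it matches nothing and only the first word is captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActValueDemo {
    // One word, optionally followed by exactly two more whitespace-separated words.
    static final Pattern ACT = Pattern.compile("act=(\\w+(?:\\s+\\w+\\s+\\w+)?)");

    // Returns the captured value after "act=", or null if there is no match.
    static String extract(String input) {
        Matcher m = ACT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("act=MATCHME"));      // MATCHME
        System.out.println(extract("act=Match me too")); // Match me too
        System.out.println(extract("act=two words"));    // two
    }
}
```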
Surely you know how to compose regular expressions by alternation.
This regular expression may help you
^[a-zA-Z ]*$

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but consecutive - characters must be rejected.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to reject consecutive - characters only.
You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
 ^^^^^^^^
See the demo here
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
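Both \w-based variants can be compared side by side in Java. This is a sketch with made-up class and field names. One difference worth noting: the grouped pattern requires allowed characters on both sides of every hyphen, so it also rejects a leading or trailing hyphen, while the lookahead pattern only rejects two hyphens in a row.

```java
import java.util.regex.Pattern;

class HyphenRuleDemo {
    // Grouping variant: runs of allowed chars separated by single hyphens.
    static final Pattern GROUPED = Pattern.compile("^[\\w/.?]+(?:-[\\w/.?]+)*$");
    // Lookahead variant: any allowed chars, as long as "--" occurs nowhere.
    static final Pattern LOOKAHEAD = Pattern.compile("^(?!.*--)[\\w/.?-]+$");

    static boolean grouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean lookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"www.google.com/flights-usa", "www.google.com/flights--usa", "-usa"};
        for (String s : inputs) {
            System.out.println(s + " grouped=" + grouped(s) + " lookahead=" + lookahead(s));
        }
        // www.google.com/flights-usa  grouped=true  lookahead=true
        // www.google.com/flights--usa grouped=false lookahead=false
        // -usa                        grouped=false lookahead=true
    }
}
```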
One approach is to restrict multiple dashes with negative look-behind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
Demo.
I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([^\-]|$))+$
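For each character, this regex first tries [a-zA-Z0-9\/\.\?\_], which is everything in your original character class except the hyphen. If that fails, it tries \-([^\-]|$), which matches a hyphen followed by any non-hyphen character, or a hyphen at the end of the string. Note that [^\-] consumes the character after the hyphen without checking it against the allowed set, so an input such as a-! is also accepted; using a lookahead, \-(?!\-), in place of \-([^\-]|$) avoids that.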
Here's a demo.

Java regex: Negative lookahead

I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are:
/foo/.+ and /foo/.+/bar/.+ respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it
public static void main(String[] args) {
String shouldWork = "/foo/abc123doremi";
String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
String regex = "/foo/.+(?!bar)";
System.out.println("ShouldWork: " + shouldWork.matches(regex));
System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.
Thanks,
Try
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars which I assume you do want that regex to match.
Explanation: (without the double backslashes required by Java strings)
/foo/ # Match "/foo/"
(?! # Assert that it's impossible to match the following regex here:
.* # any number of characters
\b # followed by a word boundary
bar # followed by "bar"
\b # followed by a word boundary.
) # End of lookahead assertion
.+ # Match one or more characters
\b, the "word boundary anchor", matches the empty position between a word character (letter, digit or underscore) and a non-word character (or between the start/end of the string and a word character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.
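The original attempt /foo/.+(?!bar) returns true for both strings because .+ greedily consumes the rest of the input, and at the end of the string the lookahead (?!bar) trivially succeeds, since nothing follows. Placing the lookahead before .+ makes it inspect the whole remainder of the path. A short sketch comparing the three patterns (class and method names are made up for the example):

```java
class FooUriDemo {
    static final String ORIGINAL = "/foo/.+(?!bar)";         // lookahead is checked too late
    static final String LOOSE = "/foo/(?!.*bar).+";           // rejects any "bar" substring
    static final String STRICT = "/foo/(?!.*\\bbar\\b).+";    // rejects only the whole word "bar"

    // Whole-string match, as in the question's test code.
    static boolean test(String regex, String uri) {
        return uri.matches(regex);
    }

    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
        String crowbars = "/foo/baz/crowbars";

        System.out.println(test(ORIGINAL, shouldntWork)); // true (the problem in the question)
        System.out.println(test(STRICT, shouldWork));     // true
        System.out.println(test(STRICT, shouldntWork));   // false
        System.out.println(test(LOOSE, crowbars));        // false: "bar" inside "crowbars"
        System.out.println(test(STRICT, crowbars));       // true: no word boundary around "bar"
    }
}
```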
