Regex for a dynamic string - java

I have a string that can come in different forms - one example below
%VAR('SERVER','DEFAULT') %VAR('LOC','NYC')Run_ServerRestart.ps1
I want to split it into individual items, but I cannot get the command at the end to come out as its own match. My regex returns 2 matches for the example above, and I need 3. Any ideas on what I may be doing wrong, how to get the output below, and whether there is a more efficient way to achieve this? We also need to allow for spaces anywhere in between.
%VAR('SERVER','DEFAULT')
%VAR('LOC','NYC')
Run_ServerRestart.ps1
Here is my regex so far:
/%VAR\(\s*'([^']+)'\s*\)*\,*\s*'([^']+)'\s*\)|/*\\*[\w-]+\s*\.?\S*/g
However, this above regex is not matching some of my examples below.
c:\abc\def.txt - should show 1 match
%VAR('SERVER','USA')\C:\batch.bat - should show 2 matches - %VAR('SERVER','USA') and \C:\batch.bat
%VAR('SERVER','NYC') - should show 1 match
%VAR('SERVER','NYC') %VAR('APP','NNJ')Run_Command.ps1 - should show 3 matches - %VAR('SERVER','NYC'), %VAR('APP','NNJ') and Run_Command.ps1
%VAR('SERVER','NYC') -File - should show 2 matches - %VAR('SERVER','NYC') and -File
/usr/bin/cat - should show 1 match
%VAR('SERVER','NYC')BATCH1.bat - should show 2 matches - %VAR('SERVER','NYC') and BATCH1.bat
ftp -s:D:\\apps\\scripts\\Intel\\daily_job.ftp - should show 1 match
%VAR('SERVER') - should show 1 match

Regex: %VAR\([^\)]+\)|[\S]+(?:\s[\S]+)?
Details:
[^] Match a single character not present in the list
[] Match a single character present in the list
+ Matches between one and unlimited times
* Matches between zero and unlimited times
| or
(?:) Non capturing group
? Matches between zero and one times
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
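In Java, the pattern above can be applied with Matcher.find() to collect every token. This is only a minimal sketch: the class and method names (VarTokenizer, tokens) are made up for illustration and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VarTokenizer {
    // Same pattern as above, with backslashes doubled for a Java string literal.
    private static final Pattern TOKEN =
            Pattern.compile("%VAR\\([^\\)]+\\)|[\\S]+(?:\\s[\\S]+)?");

    // Returns every match in order of appearance.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

Calling VarTokenizer.tokens(...) on the first example string should give the three items the question asks for. One caveat: the second alternative absorbs at most one whitespace-separated word after a token, so a command followed by two or more arguments would be split into several matches.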

Assuming the input always looks exactly like this (two %VAR(...) items followed by a command), you can just abuse the ) as a separator:
(.*\))\s(.*\))(.*)
Example with explanation here: https://regex101.com/r/CLBFKa/1

Having problems with java regex

I have the following regex:
/[-A-Z]{4}\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}.png
Basically I want to check for strings of the basic type
ABCD12/<here_is_a_random_uuid_as_a_string>.png
The UUID (which is in UPPER CASE) checking works fine, but now let's take a look at a special case. I want to accept strings like this
--CD12/...
AB--12/...
but NOT like this:
A--D12/...
But I cannot get the first part of the regex right. Basically I need to check for either two letters or two - in a row, twice.
To my understanding, [-A-Z]{4} means "either - or something between A and Z, with a length of 4". So why doesn't my pattern work?
EDIT:
This answer was posted within the comments and it works:
(?mi)^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\.png$
Can somebody explain to me what (?mi) and what (?:...) means? The normal ? means 0 or 1 time, but what is the : for?
EDIT 2:
Just for those who might have a similar problem and do not want to read all of those regexes ;)
I slightly modified an answer to also accept patterns like ----12. The end result:
"^/(?:--[A-Z]{2}|-{4}|[A-Z]{2}(?:--|[A-Z]{2}))\\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\\.png$"
It works like a charm.
You may use this regex for your cases:
^(?:--[A-Z]{2}|[A-Z]{2}(?:--|[A-Z]{2}))\d{2}/[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}\.png$
Details about first part:
^: Start
(?:: Start non-capture group
--[A-Z]{2}: Match -- followed by 2 letters
|: OR
[A-Z]{2}: Match 2 letters
(?:--|[A-Z]{2}): Match -- OR 2 letters
): End non-capture group
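By the way, (?:...) is a non-capturing group: it groups a sub-pattern without storing what it matched. (?mi) is an inline flag group that turns on multiline mode (m, so ^ and $ also match at line boundaries) and case-insensitive matching (i).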
Your [-A-Z]{4} matches any four occurrences of an uppercase ASCII letter or -, so it can also match ----, A---, ---B, -B--, etc.
You want to make sure that if there are hyphens, they come after or before two letters:
(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})
It means:
(?: - start of a non-capturing group:
[A-Z]{2}-- - two uppercase ASCII letters and then --
| - or
--[A-Z]{2} - -- and then any two uppercase ASCII letters
| - or
[A-Z]{4} - any four uppercase ASCII letters
) - end of the non-capturing group.
The full pattern:
(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\.png
To force the entire string match, add ^ (start of string) and $ (end of string) anchors:
^(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\d{2}/[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\.png$
Note that an unescaped . matches any character; to match a literal dot, escape it as \. (as in \.png above).
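A minimal Java sketch of the check above, assuming the input is the bare ABCD12/<uuid>.png form without a leading slash. The class and method names (ImagePathCheck, isValid) and the sample UUID are placeholders for illustration.

```java
import java.util.regex.Pattern;

class ImagePathCheck {
    // Prefix is two letters + "--", or "--" + two letters, or four letters.
    private static final Pattern NAME = Pattern.compile(
            "(?:[A-Z]{2}--|--[A-Z]{2}|[A-Z]{4})\\d{2}/"
            + "[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\\.png");

    // matches() requires the whole input to match, so ^ and $ are not needed here.
    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }
}
```

With this, ImagePathCheck.isValid("AB--12/123E4567-E89B-12D3-A456-426614174000.png") is accepted, while the same string with the prefix A--D12 is rejected.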

Regex to validate custom format

I have this format: xx:xx:xx or xx:xx:xx-y, where x can be 0-9 a-f A-F and y can be only 0 or 1.
I came up with this regex: ([0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}|[-][0-1]{1})
But this matches 0a:0b:0c-3 too, which is not expected.
Is there any way to remove these cases from result?
[:] means a character from the list that contains only :. It is the same as
:. The same for [-] which has the same result as -.
Also, {1} means "the previous piece exactly one time". It does not have any effect, you can remove it altogether.
To match xx:xx:xx or xx:xx:xx-y, the part that matches -y must be optional. The quantifier ? placed after that part marks it as optional.
All in all, your regex should be like this:
[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}(-[01])?
If the regex engine you use can be told to ignore the character case then you can get rid of A-F (or a-f) from all character classes and the regex becomes:
[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?
How it works, piece by piece:
[0-9a-f] # any digit or letter from (and including) 'a' to 'f'
{2} # the previous piece exactly 2 times
: # the character ':'
[0-9a-f]
{2}
:
[0-9a-f]
{2}
( # start a group; it does not match anything
- # the character '-'
[01] # any character from the class (i.e. '0' or '1')
) # end of group; the group is needed for the next quantifier
? # the previous piece (i.e. the group) is optional
# it can appear zero or one times
See it in action: https://regexr.com/4rfvr
Update
As @the-fourth-bird mentions in a comment, if the regex must match the entire string then you need to anchor its ends:
^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$
^ as the first character of a regex matches the beginning of the string, $ as the last character matches the end of the string. This way the regex matches the entire string only (when there aren't other characters before or after the xx:xx:xx or xx:xx:xx-y part).
If you use the regex to find xx:xx:xx or xx:xx:xx-y in a larger string then you don't need to add ^ and $. Of course, you can add only ^ or $ to let the regex match only at the beginning or at the end of the string.
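In Java, the anchored, case-insensitive version could be used as in the following sketch. The class and method names (TripletFormat, isValid) are invented for this example.

```java
import java.util.regex.Pattern;

class TripletFormat {
    // CASE_INSENSITIVE lets [0-9a-f] accept A-F as well.
    private static final Pattern FORMAT = Pattern.compile(
            "^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$",
            Pattern.CASE_INSENSITIVE);

    // matches() tests the entire input, so the anchors are redundant but harmless.
    static boolean isValid(String s) {
        return FORMAT.matcher(s).matches();
    }
}
```

Here 0a:0b:0c and 0A:0B:0C-1 pass, while 0a:0b:0c-3 is rejected.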
You want
xx:xx:xx on its own, or, if it is followed by a -, the next character must be 0 or 1 and the match must end there (word boundary).
So you don't want any of these
0a:0b:0c-123
0a:0b:0cd
10a:0b:0c
either.
Then you want a negative lookahead: if you match the first part, it must not be followed by a - (the first pattern) and it should end there (word boundary); if it is followed by a -, then the next character must be 0 or 1, followed by a word boundary:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
To prevent any digit in front, a word boundary is added to the front as well.
Example: https://regexr.com/4rg42
The following almost worked:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
but if it is the end of file and it is 3a:2b:11, then the [^-] will try to match a non - character and it won't match.
Example: https://regexr.com/4rg4q
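To search a larger text from Java, the lookahead version can be used with Matcher.find(). This is a sketch only; the names TripletFinder and findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TripletFinder {
    // The /i flag from the answer becomes Pattern.CASE_INSENSITIVE.
    private static final Pattern TRIPLET = Pattern.compile(
            "\\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\\b"
            + "|\\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\\b)",
            Pattern.CASE_INSENSITIVE);

    // Collects every xx:xx:xx or xx:xx:xx-y occurrence in the text.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TRIPLET.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Because (?!-) is a zero-width check, it also succeeds at the very end of the input, which is the case where the [^-] variant fails.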

java regex to capture any number of periods within a string

I am trying to match on any of the following:
$tag:parent.child$
$tag:grand.parent.child$
$tag:great.grand.parent.child$
I have tried a bunch of combos but not sure how to do this without an exp for each one: https://regex101.com/r/cMvx9I/1
\$tag:[a-z]*\.[a-z]*\$
I know this is wrong, but haven't been able to find the right method yet. Help is greatly appreciated.
Your regex was: \$tag:[a-z]*\.[a-z]*\$
You need a repeating group of .name, so use: \$tag:[a-z]+(?:\.[a-z]+)+\$
That assumes there has to be at least 2 names. If only one name is allowed, i.e. no period, then change last + to *.
You can use \$tag:(?:[a-z]+\.)*[a-z]+\$
\$ a literal $
tag: literal tag:
(?:...) a non-capturing group of:
[a-z]+ one or more lower-case letters and
\. a literal dot
* any number of the previous group (including zero of them)
[a-z]+ one or more lower-case letters
\$ a literal $
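The two patterns suggested so far differ only in whether at least one period is required. A small Java sketch of both follows; the class and method names (TagMatcher, isDottedTag, isTag) are made up for the example.

```java
import java.util.regex.Pattern;

class TagMatcher {
    // name(.name)+ : at least one period is required.
    private static final Pattern DOTTED =
            Pattern.compile("\\$tag:[a-z]+(?:\\.[a-z]+)+\\$");

    // (name.)*name : periods are optional.
    private static final Pattern ANY =
            Pattern.compile("\\$tag:(?:[a-z]+\\.)*[a-z]+\\$");

    static boolean isDottedTag(String s) {
        return DOTTED.matcher(s).matches();
    }

    static boolean isTag(String s) {
        return ANY.matcher(s).matches();
    }
}
```

Both methods accept $tag:great.grand.parent.child$; only isTag accepts $tag:child$.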
The following pattern will match any periods within a string:
\.
Not sure if this is what you want, but you can make a non-capturing group out of a pattern and then find that a certain number of times:
\$tag:(?:[a-z]+?\.*){1,4}\$
\$tag: - Literal $tag:
(?:[a-z]+?\.*) - Non-capturing group of one or more lower-case letters (shortest match) followed by zero or more literal periods
{1,4} - The non-capturing group appears anywhere between 1-4 times (you can change this as needed, or use a simple + if it could be any number of times).
\$ - Literal $
I normally prefer \w instead of [a-z] as it is equivalent to [a-zA-Z0-9_], but using this depends on what you are trying to find.
Hope this helps.

Regex for partial path

I have paths like these (single lines):
/
/abc
/def/
/ghi/jkl
/mno/pqr/
/stu/vwx/yz
/abc/def/ghi/jkl
I just need patterns that match up to the third "/". In other words, paths containing just "/" and up to the first 2 directories. However, some of my directories end with a "/" and some don't. So the result I want is:
/
/abc
/def/
/ghi/jkl
/mno/pqr/
/stu/vwx/
/abc/def/
So far, I've tried (\/|.*\/) but this doesn't get the path ending without a "/".
I would recommend this pattern:
/^(\/[^\/]+){0,2}\/?$/gm
It works like this:
^ searches for the beginning of a line
(\/[^\/]+) searches for a path element
( starts a group
\/ searches for a slash
[^\/]+ searches for some non-slash characters
{0,2} says that 0 to 2 of those path elements should be found
\/? allows a trailing slash
$ searches for the end of the line
Use these modifiers:
g to search for several matches within the input
m to treat every line as a separate input
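A Java sketch of the same idea follows. Note that the anchored pattern only accepts or rejects a whole line; to get /stu/vwx/ out of /stu/vwx/yz, as the question's expected output shows, the sketch also matches just a prefix with lookingAt(). That prefix variation, the non-capturing group, and the names ShallowPath, isShallow and prefix are assumptions of this example, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShallowPath {
    // Up to two "/name" elements, then an optional trailing slash.
    private static final Pattern PATH = Pattern.compile("(?:/[^/]+){0,2}/?");

    // Whole-string check, equivalent to wrapping the pattern in ^...$.
    static boolean isShallow(String path) {
        return PATH.matcher(path).matches();
    }

    // Prefix match: anchored at the start only, so deeper paths are cut off.
    static String prefix(String path) {
        Matcher m = PATH.matcher(path);
        return m.lookingAt() ? m.group() : "";
    }
}
```

ShallowPath.prefix("/abc/def/ghi/jkl") gives /abc/def/, while ShallowPath.isShallow("/abc/def/ghi/jkl") is false.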
You need a pattern like ^(\/\w+){0,2}\/?$. It checks that (/ followed by a name) occurs no more than 2 times and that the path may end with /.
Details :
^ : beginning of the string
(\/\w+) : slash (escaped) and word-char, all in a group
{0,2} the group can be 0/1/2 times
\/? : slash (escaped) can be 0 or 1 time
Your regex (\/|.*\/) uses an alternation which matches either a forward slash or any characters 0+ times greedy followed by matching a forward slash.
So in /ghi/jkl, for example, the first match will be the first forward slash. Then the .* part of the second alternative will match from the first g until the end of the string. The engine will backtrack to the last forward slash to fulfill the whole .*\/ pattern.
The trailing jkl cannot be matched by either alternative.
Note that you don't have to escape the forward slash.
You could use:
^/(?:\w+/?){0,2}$
In Java:
String regex = "^/(?:\\w+/?){0,2}$";
Explanation
^ Start of the string
/ Match forward slash
(?: Non capturing group
\w+ Match 1+ word characters (If you want to match more than \w you could use a character class and add to that what you want match)
/? Match optional forward slash
){0,2} Close non capturing group and repeat 0 - 2 times
$ End of the string
^((/[^/]+){0,2}/?)$
To break it down
^ is the start of the string
{0,2} means repeat the previous between 0 and 2 times.
Then it ends with an optional slash by using a ?
String end is $ so it doesn't match longer strings.
() Around the whole thing to capture it.
But I'll point out that this is almost always the wrong answer for directory matching. Some path elements have special meaning, like /../.. which actually goes up two directories, not down. It is better to use the system's directory API instead for more robust results.
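As a rough illustration of the API-based route, here is a sketch using java.nio.file.Path. It assumes Unix-style absolute paths, and the names PathHead and firstTwo are hypothetical. Unlike the regex, it resolves .. elements first and does not preserve a trailing slash.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathHead {
    // Returns the root plus at most the first two name elements of the path.
    static Path firstTwo(String raw) {
        // normalize() resolves "." and ".." elements before counting.
        Path p = Paths.get(raw).normalize();
        int n = Math.min(2, p.getNameCount());
        Path root = p.getRoot();
        if (n == 0) {
            // Nothing but the root (for example "/").
            return root != null ? root : p;
        }
        Path head = p.subpath(0, n);
        return root != null ? root.resolve(head) : head;
    }
}
```

On a Unix-like system, PathHead.firstTwo("/abc/../def/ghi/jkl") should give /def/ghi, because the .. element is resolved before the first two directories are taken.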

Restrict consecutive characters using Java Regex

I need to allow alphanumeric characters, "?", ".", "/" and "-" in the given string, but I need to disallow consecutive - characters.
For example:
www.google.com/flights-usa should be valid
www.google.com/flights--usa should be invalid
Currently I'm using ^[a-zA-Z0-9\\/\\.\\?\\_\\-]+$.
Please suggest how to restrict consecutive - only.
You may use grouping with quantifiers:
^[a-zA-Z0-9/.?_]+(?:-[a-zA-Z0-9/.?_]+)*$
Details:
^ - start of string
[a-zA-Z0-9/.?_]+ - 1 or more characters from the set defined in the character class (can be replaced with [\w/.?]+)
(?:-[a-zA-Z0-9/.?_]+)* - zero or more sequences ((?:...)*) of:
- - hyphen
[a-zA-Z0-9/.?_]+ - see above
$ - end of string.
Or use a negative lookahead:
^(?!.*--)[a-zA-Z0-9/.?_-]+$
^^^^^^^^^
Details:
^ - start of string
(?!.*--) - a negative lookahead that will fail the match once the regex engine finds a -- substring after any 0+ chars other than a newline
[a-zA-Z0-9/.?_-]+ - 1 or more chars from the set defined in the character class
$ - end of string.
Note that [a-zA-Z0-9_] = \w if you do not use the Pattern.UNICODE_CHARACTER_CLASS flag. So, the first would look like "^[\\w/.?]+(?:-[\\w/.?]+)*$" and the second as "^(?!.*--)[\\w/.?-]+$".
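Both variants can be tried side by side in Java. The sketch below uses the \w forms given above; the class and method names (HyphenRule, validGrouped, validLookahead) are invented for illustration.

```java
import java.util.regex.Pattern;

class HyphenRule {
    // Runs of allowed characters separated by single hyphens.
    private static final Pattern GROUPED =
            Pattern.compile("^[\\w/.?]+(?:-[\\w/.?]+)*$");

    // Any allowed characters, as long as "--" appears nowhere in the input.
    private static final Pattern LOOKAHEAD =
            Pattern.compile("^(?!.*--)[\\w/.?-]+$");

    static boolean validGrouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean validLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }
}
```

The two are not fully equivalent: the grouped pattern also rejects a leading or trailing hyphen (for example -usa), while the lookahead pattern accepts it.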
One approach is to restrict multiple dashes with negative look-behind on a dash, like this:
^(?:[a-zA-Z0-9\/\.\?\_]|(?<!-)-)+$
The right side of the |, i.e. (?<!-)-, means "a dash, unless preceded by another dash".
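A short Java sketch of the look-behind approach, with the unnecessary escapes inside the character class dropped. The names NoDoubleHyphen and isValid are placeholders.

```java
import java.util.regex.Pattern;

class NoDoubleHyphen {
    // Either an allowed non-hyphen character, or a hyphen not preceded by a hyphen.
    private static final Pattern RULE =
            Pattern.compile("^(?:[a-zA-Z0-9/.?_]|(?<!-)-)+$");

    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }
}
```

With this, www.google.com/flights-usa is accepted and www.google.com/flights--usa is rejected.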
I'm not sure of the efficiency of this, but I believe this should work.
^([a-zA-Z0-9\/\.\?\_]|\-([^\-]|$))+$
For each character, this regex checks if it can match [a-zA-Z0-9\/\.\?\_], which is everything you included in your regex except the hyphen. If that does not match, it instead tries to match \-([^\-]|$), which matches a hyphen not followed by another hyphen, or a hyphen at the end of the string. Be aware that [^\-] consumes the character after the hyphen without checking it against the allowed set, so a string such as a-!b also passes.
