How to match and exclude "!x" with regex? - java

I have been trying to come up with a regex for Java to match a bot command:
!x play search words here
where the x can be any alphanumeric character and it works with:
"(?:\\w)(\\w+)"
However, if I want to use the alias "p" for "play", the regex skips the "p" as well. I've also been trying to get the skip to work with the exclamation mark, without success.
One workaround I found was to use:
"[^\\!\\w]+(\\w+)"
but then the first match is " p" with whitespace. I just can't figure this out!

To avoid matching words preceded with !, you may use
"\\b(?<!!)\\w+"
See the regex demo
Details:
\b - word boundary
(?<!!) - a negative lookbehind making sure there cannot be ! right before the current position
\w+ - 1 or more word chars.
Note that lookbehinds are zero-width assertions: they only tell the regex engine whether to go on matching or stop (the text they check is not added to the current match).
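A minimal sketch of how the lookbehind pattern above could be used from Java. The class name CommandWords and the helper words() are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandWords {
    // Pattern from the answer: a word that starts at a word boundary
    // and is not directly preceded by '!'.
    private static final Pattern WORD = Pattern.compile("\\b(?<!!)\\w+");

    // Hypothetical helper: collects every word except the one glued to '!'.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("!x play search words here")); // [play, search, words, here]
        System.out.println(words("!x p search words here"));    // [p, search, words, here]
    }
}
```

The first element of the list is then the command (or its one-letter alias), and no leading whitespace ends up in the match.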

Related

Regex Match Reset \K Equivalent In Java

I have come up with a regex pattern to match part of a JSON value, but only the PCRE engine supports it. I want to know the Java equivalent of this regex.
Simplified version
cif:\K.*(?=(.+?){4})
Matches part of the value, leaving the last 4 characters.
cif:test1234
Matched value will be test
https://regex101.com/r/xV4ZNa/1
Note: I can only define the regex and the replacement text. I don't have access to the Java code since it's handled by a proprietary log masking framework.
You can simplify the pattern to:
(?<=cif:).*(?=....)
Explanation
(?<=cif:) Positive lookbehind, assert cif: to the left
.* Match 0+ times any character without newlines
(?=....) Positive lookahead, assert 4 characters (which can include spaces)
See a regex demo.
If you don't want to match empty strings, then you can use .+ instead
(?<=cif:).+(?=....)
You can use a lookbehind assertion instead:
(?<=cif:).*(?=(.+?){4})
Demo: https://regex101.com/r/xV4ZNa/3
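As a rough check of the lookbehind variant in plain Java: the sketch below assumes the masking framework applies the pattern with something equivalent to String.replaceAll, which the question does not confirm. The class name CifMask, the helper mask() and the replacement text "****" are illustrative only.

```java
public class CifMask {
    // Lookbehind replaces PCRE's \K: "cif:" is asserted but not consumed.
    // The lookahead keeps the last 4 characters out of the match.
    static final String REGEX = "(?<=cif:).+(?=....)";

    // Hypothetical stand-in for what the framework might do with the
    // regex and the replacement text.
    static String mask(String line, String replacement) {
        return line.replaceAll(REGEX, replacement);
    }

    public static void main(String[] args) {
        System.out.println(mask("cif:test1234", "****")); // cif:****1234
        // Fewer than 5 characters after "cif:" leaves nothing to mask.
        System.out.println(mask("cif:1234", "****"));     // cif:1234
    }
}
```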

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern behaves differently with 2 different strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I am not able to figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than 2 strings are present, separated by commas.
I am not sure what the correct pattern is for more than 2 strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | with a mandatory comma in both the left and the right alternative.
matches() has to match the whole string, and that is the case for 207e/160, 149/80, so it returns true.
The string 207e/160, 149/80, 11 contains 2 commas, so you do get a partial match for the first part of the string, but the pattern does not match the whole string and matches() returns false.
See the matches in this regex demo.
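The difference between a whole-string match and a partial match can be reproduced with a short sketch. The regex is built the same way as in the question; the class name MatchesVsFind and the helpers buildRegex() and firstPartialMatch() are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Builds the pattern exactly as in the question.
    static String buildRegex(String value) {
        String item = "[NnoneOoff0-9\\-\\+\\/]+";
        return Pattern.quote(value) + ", " + item + "|" + item + ", " + Pattern.quote(value);
    }

    // Hypothetical helper: first partial match found by find(), or null.
    static String firstPartialMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = buildRegex("207e/160");
        String two = "207e/160, 149/80";
        String three = "207e/160, 149/80, 11";

        // matches() succeeds only if the pattern covers the entire string.
        System.out.println(two.matches(regex));   // true
        System.out.println(three.matches(regex)); // false

        // find() is satisfied with a part of the string.
        System.out.println(firstPartialMatch(regex, three)); // 207e/160, 149/80
    }
}
```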
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ times any of the listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ times any of the listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end so that it does not have to be escaped; the + and / do not have to be escaped either.
You can use regex101 to test your regex. It describes everything that is going on, which helps with debugging. There is a quick reference section at the bottom right that shows what you can do, with examples.
A few things: you can add literals with \, so \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
Regex101 Test

Get all unique file names

To preface, I am a beginner with regex. I have a string that looks something like:
my_folder/foo.xml::someextracontent
my_folder/foo.xml::someextracontent
another_folder/foo.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/hello.xml::someextracontent
I want to return unique XML files which are part of my_folder. So the regex will return:
my_folder/foo.xml
my_folder/bar.xml
my_folder/hello.xml
I've taken a look at Extract All Unique Lines which is close to what I need but I am not sure where to go from there.
The closest attempt I got was (?sm)(my_folder\/.*?.xml)(?=.*\1) which gets all the duplicates but I want the opposite, so I tried doing a negative lookahead instead (?sm)(my_folder\/.*?.xml)(?!.*\1) but the capture groups are totally wrong.
What am I missing here in my regex? Here's link to the regex: https://regex101.com/r/ggY2RB/1
This RegEx might help you to find the unique strings that you might be looking for:
/(\w+\/\w+\.xml)(?![\s\S]*\1)/s
If you only wish to match my_folder, you might try this:
/(my_folder\/\w+\.xml)(?![\s\S]*\1)/s
Instead of using a positive lookahead (?=, to get the unique strings you could use a negative lookahead (?! to assert what is on the right is not what you have captured in group 1.
In your pattern you make the dot match a newline using (?s) and use a non-greedy dot star .*?, but you might also use a negated character class matching any character except a newline or a forward slash.
If the folder can also contain nested folders, you might use a pattern that repeats 0+ times 1+ characters other than a forward slash or a newline, followed by a forward slash.
(?s)(my_folder/(?:[^/\n]+/)*[^/\n]+\.xml)::(?!.*\1)
(?s) Inline modifier, make the dot match a newline
( Capture group
my_folder/ Match literally
(?:[^/\n]+/)* Repeat 0+ times: 1+ chars other than a forward slash or a newline, followed by a forward slash
[^/\n]+\.xml Match 1+ chars other than a forward slash or a newline, followed by .xml
) Close capture group
::(?!.*\1) Match :: followed by asserting what is on the right does not contain what is captured in group 1
In Java
String regex = "(?s)(my_folder/(?:[^/\\n]+/)*[^/\\n]+\\.xml)::(?!.*\\1)";
Regex demo | Java demo
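A sketch of how that Java pattern could be applied to the sample input from the question. The class name UniqueXmlFiles and the helper uniqueFiles() are hypothetical. Note that the pattern keeps the last occurrence of each file, so the result is ordered by last appearance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueXmlFiles {
    // Matches a my_folder path only if the same path does not occur again later.
    private static final Pattern LAST_OCCURRENCE = Pattern.compile(
            "(?s)(my_folder/(?:[^/\\n]+/)*[^/\\n]+\\.xml)::(?!.*\\1)");

    // Hypothetical helper: returns group 1 of every match.
    static List<String> uniqueFiles(String input) {
        List<String> files = new ArrayList<>();
        Matcher m = LAST_OCCURRENCE.matcher(input);
        while (m.find()) {
            files.add(m.group(1));
        }
        return files;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "my_folder/foo.xml::someextracontent",
                "my_folder/foo.xml::someextracontent",
                "another_folder/foo.xml::someextracontent",
                "my_folder/bar.xml::someextracontent",
                "my_folder/bar.xml::someextracontent",
                "my_folder/hello.xml::someextracontent");
        // [my_folder/foo.xml, my_folder/bar.xml, my_folder/hello.xml]
        System.out.println(uniqueFiles(input));
    }
}
```

If a regex is not a hard requirement, splitting each line on :: and collecting the paths in a LinkedHashSet would give the same set without backreferences.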

How to match regex pattern on single line only?

I have the following regex and sample input:
http://regex101.com/r/xK9dE3
As you can see, it is matching the first "yo". I only want the pattern to match when "yo" is on the same line as "cut me" (the second "yo").
How can I make sure that the regex match is only on the same line?
Output:
Hi
Expected Output (this is what I really want):
Hi
yo keep this here
Keep this here
You can use this regex with s (DOTALL) regex flag:
^.*?(?=yo\b[^\n]*cut me:)
Online Demo: http://regex101.com/r/oV3eP7
yo\b[^\n]*cut me: is the lookahead pattern that makes sure that yo with a word boundary and cut me: are matched on the same line.
Remove the s or DOTALL flag and change your regex to the following:
^.*?((\byo\b.*?(cut me:)[\s\S]*))
With the DOTALL flag enabled . will match newline characters, so your match can span multiple lines including lines before yo or between yo and cut me. By removing this flag you can ensure that you only match the line with both yo and cut me, and then change the .* at the end to [\s\S]* which will match any character including newlines so that you can match to the end of the string.
http://regex101.com/r/sX2kL0
edit: Note that this takes a slightly different approach than the other answer. It matches the portion of the string that you want deleted, so you can replace that portion with an empty string to remove it.
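The original sample input is only available behind the regex101 link, so the sketch below uses a made-up input that is merely consistent with the expected output in the question. It applies the DOTALL lookahead pattern from the first answer; the class name SameLineMatch and the helper keptPart() are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SameLineMatch {
    // [^\n]* in the lookahead cannot cross a line break, so "yo" and
    // "cut me:" have to be on the same line.
    private static final Pattern PREFIX =
            Pattern.compile("^.*?(?=yo\\b[^\\n]*cut me:)", Pattern.DOTALL);

    // Hypothetical helper: everything before the "yo ... cut me:" line part.
    static String keptPart(String input) {
        Matcher m = PREFIX.matcher(input);
        return m.find() ? m.group() : input;
    }

    public static void main(String[] args) {
        // Assumed input, not the one from the question's demo link.
        String input = "Hi\nyo keep this here\nKeep this here\nyo cut me: bye";
        // Prints the first three lines; the first "yo" is kept because
        // "cut me:" is not on its line.
        System.out.print(keptPart(input));
    }
}
```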

Stop regular expression from matching across lines

I have a regular expression,
end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
which is supposed to match a line with the specifications
end abcdef123
where abcdef123 must start with a letter, followed by alphanumeric characters.
However currently it is also matching this
foobar barfooend
bar fred bob
It's picking up that end at the end of barfooend and also picking up bar in effect returning end bar as a legitimate result.
I tried
^end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
but that doesn't seem to work at all. It ends up matching nothing.
It should be fairly simple but I can't seem to nut it out.
\s also includes newline characters. So you either need to specify a character class that contains only the wanted whitespace characters, or exclude the unwanted ones.
Instead of \\s+, use one of these:
[^\\S\r\n] this includes all whitespace but not \r and \n. See end[^\S\r\n]+[a-zA-Z][a-zA-Z_0-9]+ here on Regexr
[ \t] this includes only space and tab. See end[ \t]+[a-zA-Z][a-zA-Z_0-9]+ here on Regexr
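A small comparison of \s+ and [ \t]+ on the input from the question. This is only a sketch; the class name EndStatement and the helper firstMatch() are made up, and the \s+ pattern has a trailing + added so that both patterns differ only in the whitespace class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndStatement {
    // \s+ also matches line breaks, so the match can span two lines.
    static final Pattern ACROSS_LINES =
            Pattern.compile("end\\s+[a-zA-Z][a-zA-Z_0-9]+");
    // [ \t]+ allows only spaces and tabs between "end" and the name.
    static final Pattern SAME_LINE =
            Pattern.compile("end[ \\t]+[a-zA-Z][a-zA-Z_0-9]+");

    // Hypothetical helper: first match or null.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String twoLines = "foobar barfooend\nbar fred bob";
        // Matches "end", the line break and "bar".
        System.out.println(firstMatch(ACROSS_LINES, twoLines) != null); // true
        System.out.println(firstMatch(SAME_LINE, twoLines));            // null
        System.out.println(firstMatch(SAME_LINE, "end abcdef123"));     // end abcdef123
    }
}
```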
You can use \b (word boundary detection) to check a word boundary. In our case we will use it to match the beginning of the word end. It can also be used to match the end of a word.
As @nhahtdh stated in his comment, the {1} is redundant, as [a-zA-Z] already matches exactly one letter in the given range.
Also your regex does not do what you want because it only matches one alphanumeric character after the first letter. Add a + at the end (for one or more times) or * (for zero or more times).
This should work:
"\\bend\\s+[a-zA-Z]{1}[a-zA-Z_0-9]*"
Edit: I think \b is better than ^ because the latter only matches at the beginning of the input (or of a line in multiline mode).
For example, take this input: "end azd123 end bfg456". There will be only one match with ^, whereas \b will match both.
Try the regular expression:
end[ ]+[a-zA-Z]\w+
\w is a word character: [a-zA-Z_0-9]
