I'm trying to match all the characters after the last / and before the . (dot). My current challenge is that the . is only sometimes present.
I have an example here: https://regex101.com/r/ThWZwX/3
I'm hoping to match the 'match' text in both scenarios.
Thanks,
You can use a negated character class in a capture group, without any need for a lookahead:
.*\/([^.]*)
RegEx Demo
We use .*\/ to match up to the last / (the greedy .* consumes as much as it can), and then the negated character class [^.]* matches until the next dot, or to the end of the string if no dot is found.
Also note that we use ([^.]*) to capture this match.
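If you want to check this outside regex101, here is a rough sketch in Python. The helper name last_segment and the sample paths are made up for illustration; only the pattern comes from the answer.

```python
import re

# Pattern from the answer above; the helper name and sample paths are hypothetical.
LAST_SEGMENT = re.compile(r".*/([^.]*)")

def last_segment(path):
    """Return the text after the last '/' and before the next '.', or None."""
    m = LAST_SEGMENT.match(path)
    return m.group(1) if m else None

print(last_segment("some/path/match.txt"))  # match
print(last_segment("some/path/match"))      # match
```

Group 1 holds the wanted text in both cases; a string with no / at all gives no match.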
No need for anything complex
/([^/.]*)(?!.*/)
https://regex101.com/r/KHyXcJ/1
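A quick way to try this variant, sketched in Python. The function name and test strings are assumptions of mine. Note that it needs a search rather than a match from the start of the string, because the pattern begins at the last slash.

```python
import re

# Pattern from the answer above: a '/', then non-slash/non-dot characters,
# with a negative lookahead ensuring no further '/' follows.
AFTER_LAST_SLASH = re.compile(r"/([^/.]*)(?!.*/)")

def segment_after_last_slash(path):
    """Hypothetical helper: group 1 of the first successful search, or None."""
    m = AFTER_LAST_SLASH.search(path)
    return m.group(1) if m else None

print(segment_after_last_slash("some/path/match.txt"))  # match
print(segment_after_last_slash("some/path/match"))      # match
```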
I have come up with a regex pattern to match part of a JSON value, but only the PCRE engine supports it. I want to know the Java equivalent of this regex.
Simplified version
cif:\K.*(?=(.+?){4})
Matches part of the value, leaving the last 4 characters.
cif:test1234
Matched value will be test
https://regex101.com/r/xV4ZNa/1
Note: I can only define the regex and the replacement text. I don't have access to the Java code since it's handled by a proprietary log masking framework.
You can simplify the pattern to:
(?<=cif:).*(?=....)
Explanation
(?<=cif:) Positive lookbehind, assert cif: to the left
.* Match any character except a newline, 0 or more times
(?=....) Positive lookahead, assert 4 characters (which can include spaces)
See a regex demo.
If you don't want to match empty strings, then you can use .+ instead
(?<=cif:).+(?=....)
You can use a lookbehind assertion instead:
(?<=cif:).*(?=(.+?){4})
Demo: https://regex101.com/r/xV4ZNa/3
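To see the lookbehind version act as a masking rule, here is a small sketch. It is in Python rather than Java, purely for a quick check; as far as I know Python's built-in re module, like Java, does not accept \K, while both accept this fixed-width lookbehind. The mask text "****" and the helper name are my own assumptions, not part of the framework in the question.

```python
import re

# The .+ variant from the answer above, so empty values are never matched.
MASK = re.compile(r"(?<=cif:).+(?=....)")

def mask_cif(line):
    """Hypothetical helper: replace everything after 'cif:' except the last 4 chars."""
    return MASK.sub("****", line)

print(MASK.search("cif:test1234").group(0))  # test
print(mask_cif("cif:test1234"))              # cif:****1234
print(mask_cif("cif:1234"))                  # cif:1234 (nothing left to mask)
```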
To preface, I am a beginner with regex. I have a string that looks something like:
my_folder/foo.xml::someextracontent
my_folder/foo.xml::someextracontent
another_folder/foo.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/hello.xml::someextracontent
I want to return unique XML files which are part of my_folder. So the regex will return:
my_folder/foo.xml
my_folder/bar.xml
my_folder/hello.xml
I've taken a look at Extract All Unique Lines which is close to what I need but I am not sure where to go from there.
The closest attempt I got was (?sm)(my_folder\/.*?.xml)(?=.*\1) which gets all the duplicates but I want the opposite, so I tried doing a negative lookahead instead (?sm)(my_folder\/.*?.xml)(?!.*\1) but the capture groups are totally wrong.
What am I missing here in my regex? Here's a link to the regex: https://regex101.com/r/ggY2RB/1
This regex might help you find the unique strings you are looking for:
/(\w+\/\w+\.xml)(?![\s\S]*\1)/s
If you only wish to match my_folder, you might try this:
/(my_folder\/\w+\.xml)(?![\s\S]*\1)/s
Instead of using a positive lookahead (?=, to get the unique strings you could use a negative lookahead (?! to assert what is on the right is not what you have captured in group 1.
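Here is a small sketch of the negative-lookahead idea in Python, run against the sample from the question, with my_folder written literally in the pattern. Since the lookahead rejects any occurrence that shows up again later, each file is reported at its last occurrence.

```python
import re

text = """my_folder/foo.xml::someextracontent
my_folder/foo.xml::someextracontent
another_folder/foo.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/hello.xml::someextracontent"""

# [\s\S]* crosses newlines, so no DOTALL flag is needed here.
unique = re.findall(r"(my_folder/\w+\.xml)(?![\s\S]*\1)", text)
print(unique)  # ['my_folder/foo.xml', 'my_folder/bar.xml', 'my_folder/hello.xml']
```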
In your pattern you make the dot match a newline using (?s) and use a non-greedy dot-star .*?, but you could also use a negated character class that matches anything except a newline or a forward slash.
If the folder can also contain nested folders, you might use a pattern that repeats, 0+ times, 1+ characters other than a forward slash or a newline, followed by a forward slash.
(?s)(my_folder/(?:[^/\n]+/)*[^/\n]+\.xml)::(?!.*\1)
(?s) Inline modifier, make the dot also match a newline
( Capture group
my_folder/ Match literally
(?:[^/\n]+/)* Repeat 0+ times not a forward slash or a newline followed by a forward slash
[^/\n]+\.xml Match 1+ times not a forward slash or a newline, followed by .xml
) Close capture group
::(?!.*\1) Match :: followed by asserting what is on the right does not contain what is captured in group 1
In Java
String regex = "(?s)(my_folder/(?:[^/\\n]+/)*[^/\\n]+\\.xml)::(?!.*\\1)";
Regex demo | Java demo
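For a quick check of the nested-folder pattern, here is a sketch in Python (the pattern syntax is the same for this case). The sample lines with sub/dir are invented for illustration.

```python
import re

# Same pattern as above; (?s) lets the .* in the lookahead cross newlines.
PATTERN = re.compile(r"(?s)(my_folder/(?:[^/\n]+/)*[^/\n]+\.xml)::(?!.*\1)")

# Hypothetical input with a nested path and some duplicates.
text = "\n".join([
    "my_folder/foo.xml::a",
    "my_folder/sub/dir/foo.xml::b",
    "my_folder/foo.xml::c",
    "another_folder/foo.xml::d",
    "my_folder/sub/dir/foo.xml::e",
])

print(PATTERN.findall(text))
# ['my_folder/foo.xml', 'my_folder/sub/dir/foo.xml']
```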
Want to match the character at position 7 to either be - or an Uppercase letter
This is what I have ^.{6}[-(A-Z)]
Though this matches the first 7 characters, it doesn't match the whole string. Any help appreciated.
I am using Java and wanting .matches() to return true for this String
Though this matches the first 7 characters, it doesn't match the whole string.
That's the right explanation of what is going on. You can skip over the rest of the string by adding .* at the end. Additionally, because matches() has to match the entire string, the ^ anchor at the front of the expression is implied, so you can drop it for a pattern of
.{6}[A-Z-].*
As mentioned, you can use .* to match anything after your specific character, so use
^.{6}[-A-Z].*
Also, there is no need for ( and ) inside the character class: there they are literal characters, not a capture group.
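The whole-string behaviour is easy to reproduce. The sketch below uses Python's re.fullmatch, which, like Java's String.matches(), only succeeds if the pattern consumes the entire string. The sample strings and the helper name are made up.

```python
import re

def position_seven_ok(s):
    """Hypothetical helper: True if the 7th character is '-' or an uppercase letter."""
    # The trailing .* lets the pattern consume the rest of the string.
    return re.fullmatch(r".{6}[A-Z-].*", s) is not None

print(position_seven_ok("ABC123-XYZ"))  # True
print(position_seven_ok("ABC123qXYZ"))  # False

# Without the trailing .*, a whole-string match fails on longer input.
print(re.fullmatch(r".{6}[A-Z-]", "ABC123-XYZ") is None)  # True
```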
I am coming from this question. Now what I want is the exact opposite.
I want to match all characters except this pattern:
yearid="[0-9]+"
How do I do that, please?
I have tried (?!yearid="[0-9]+") but it refuses to match.
There are actually two ways to do this. You can use [^0-9]+ where the ^ negates the term inside the brackets, or \D+ where \D is any non-digit character.
re.sub(r'yearid="[0-9]+"', '', string_to_fix)
Capture the group like normal, then substitute nothing for it, and return the complete string.
Or, if you want to go the hard way and negate it:
re.sub(r'(.*?)(?:yearid="[0-9]+")(.*)', r'\1\2', string_to_fix)
This first matches everything lazily (.*?), until it finds the yearid="XXXX", matches that as a noncapturing group (?:yearid="[0-9]+"), then matches everything else (.*). Finally, it replaces the original full string with just the 1st and 2nd capture groups, essentially cutting out the section you want.
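Here is a small runnable comparison of the two approaches. The sample value of string_to_fix is invented. Note that the replacement in the second call is a raw string; in a plain Python string '\1\2' would be read as two control characters instead of group references.

```python
import re

# Hypothetical input; the real data was not shown in the question.
string_to_fix = '<row yearid="2012" name="x"/>'

# Approach 1: delete the unwanted part directly.
direct = re.sub(r'yearid="[0-9]+"', '', string_to_fix)

# Approach 2: capture what surrounds it and keep only the two groups.
grouped = re.sub(r'(.*?)(?:yearid="[0-9]+")(.*)', r'\1\2', string_to_fix)

print(direct)             # <row  name="x"/>
print(direct == grouped)  # True
```

One difference to keep in mind: the first call removes every occurrence, while the second consumes the rest of the line with (.*), so it only cuts out the first occurrence on each line.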
I have a state machine which is capable of matching the comments. So it can handle :
/* /* */ */
But I am bogged down on skipping the contents inside the comments. Currently my comment-word regex looks rather strange:
[0-9A-Za-zA-Z0-9\*\(\*\*\)\.\{\}\_\;\,\-\:" "\#]*
Is there a simple regex (in Java) that matches all characters, letters as well as special characters?
Thanks for the help.
Use . (dot) if you want to match any character.
See here: Dot
. matches anything once. .* will match 0 or more of anything, while .+ will match one or more, depending on your needs.
. is the character that matches all other characters, with the possible exception of newlines (depending on whether DOTALL is enabled).
If you want to match everything EXCEPT a certain character or two, use [^...] syntax (such as [^0-9a-fA-F] to avoid matching any hexadecimal digit).
It is often useful to add a trailing ? to expressions with a dot, to match as few characters as possible (such as .*? or .+?). Otherwise, a greedy dot expression may match all the way to the end of the string.
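The difference between greedy and lazy dots, and the effect of DOTALL, can be seen in a short sketch. It is written in Python, where re.DOTALL plays the role of Java's Pattern.DOTALL; the sample text is made up and deliberately contains no nested comments.

```python
import re

text = "a /* one */ b /* two\nlines */ c"

# Greedy: .* runs to the last */ it can find, swallowing both comments.
greedy = re.findall(r"/\*.*\*/", text, re.DOTALL)
# Lazy: .*? stops at the first */ after each /*.
lazy = re.findall(r"/\*.*?\*/", text, re.DOTALL)
# Without DOTALL the dot does not cross the newline in the second comment.
no_dotall = re.findall(r"/\*.*?\*/", text)

print(greedy)     # ['/* one */ b /* two\nlines */']
print(lazy)       # ['/* one */', '/* two\nlines */']
print(no_dotall)  # ['/* one */']

# Negated class: runs of characters that are not hexadecimal digits.
print(re.findall(r"[^0-9a-fA-F]+", "12zx9Gf"))  # ['zx', 'G']
```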