Simple regex to match strings containing <n> chars - java

I'm writing this regexp because I need a method to find strings that do not have n dots.
I thought that a negative lookahead would be the best choice; so far my regexp is:
"^(?!\\.{3})$"
The way I read this is: between the start and end of the string, there can be more or fewer than 3 dots, but not exactly 3.
Surprisingly, this does not match hello.here.im.greetings,
which I would expect to match.
I'm writing in Java, so it's a Perl-like flavor; I'm not escaping the curly braces as that's not needed in Java.
Any advice?

You're on the right track:
"^(?!(?:[^.]*\\.){3}[^.]*$)"
will work as expected.
Your regex means
^ # Match the start of the string
(?!\\.{3}) # Make sure that there aren't three dots at the current position
$ # Match the end of the string
so it could only ever match the empty string.
My regex means:
^ # Match the start of the string
(?! # Make sure it's impossible to match...
(?: # the following:
[^.]* # any number of characters except dots
\\. # followed by a dot
){3} # exactly three times.
[^.]* # Now match only non-dot characters
$ # until the end of the string.
) # End of lookahead
Use it as follows:
Pattern regex = Pattern.compile("^(?!(?:[^.]*\\.){3}[^.]*$)");
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.find(); // false only if subjectString has exactly three dots
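A minimal, self-contained sketch of how the pattern above could be exercised. The class name `DotCountDemo` and the helper `lacksExactlyThreeDots` are made up for illustration; only the regex comes from the answer.

```java
import java.util.regex.Pattern;

class DotCountDemo {
    // Pattern from the answer: the negative lookahead fails only when the
    // whole string contains exactly three dots.
    private static final Pattern NOT_THREE_DOTS =
            Pattern.compile("^(?!(?:[^.]*\\.){3}[^.]*$)");

    // Hypothetical helper: true when the input does NOT contain exactly three dots.
    static boolean lacksExactlyThreeDots(String input) {
        return NOT_THREE_DOTS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(lacksExactlyThreeDots("hello.here.im.greetings")); // false (three dots)
        System.out.println(lacksExactlyThreeDots("hello.here"));              // true  (one dot)
        System.out.println(lacksExactlyThreeDots("a.b.c.d.e"));               // true  (four dots)
    }
}
```

Note that hello.here.im.greetings contains exactly three dots, so under the stated requirement it is expected to be rejected.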

Your regular expression only rules out three consecutive dots at the start of the string. Your example suggests that you want to reject three dots anywhere in the string.
Try this: ^(?!(?:.*\\.){3})
Note that this rejects strings with three or more dots, not exactly three.
Demo+explanation: http://regex101.com/r/bS0qW1
Check out Tim's answer instead.

Related

Regular expression to mask email except the three characters before the domain

I am trying to mask email addresses in the following ways.
Mask all characters except the first three and the ones that follow the # symbol.
This expression works fine.
(?<=.{3}).(?=[^#]*?#)
abcdefgh#gmail.com -> abc*****#gmail.com
Mask all characters except last three before # symbol.
Example : abcdefgh#gmail.com -> *****fgh#gmail.com
I am not sure how to check for the # and match backwards from it.
Can someone give me some pointers on this?
Maybe you could do a positive lookahead:
.(?=.*...#)
See the online Demo
. - Any character other than newline.
(?=.*...#) - Positive lookahead for zero or more characters other than newline followed by three characters other than newline and #.
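As a rough sketch, the lookahead pattern above can be applied with `String.replaceAll`. The class and method names here are hypothetical, and the sample address is the one from the question.

```java
class EmailMaskDemo {
    // Replaces every character that still has at least three characters
    // plus a '#' somewhere to its right with '*'.
    static String maskAllButLastThree(String address) {
        return address.replaceAll(".(?=.*...#)", "*");
    }

    public static void main(String[] args) {
        System.out.println(maskAllButLastThree("abcdefgh#gmail.com")); // *****fgh#gmail.com
    }
}
```

A local part with three characters or fewer is left unchanged, because no character then has three characters between itself and the #.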
You could use a negated character class [^\s#] matching a non whitespace char except an #. Then assert what is on the right is that negated character class 3 times followed by matching the # sign.
In the replacement use *
[^\s#](?=[^#\s]*[^#\s]{3}#)
[^\s#] Negated character class, match a non whitespace char except #
(?= Positive lookahead, assert what is on the right is
[^#\s]* Match 0+ times a non whitespace char except #
[^#\s]{3} Match 3 times a non whitespace char except #
# Match the #
) Close lookahead
Regex demo
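The negated character class matters once the address is embedded in longer text, because the lookahead can then no longer run across whitespace. A small sketch under that assumption (the class name, method name, and sample sentence are invented for illustration):

```java
class NegatedClassMaskDemo {
    // Same idea as above, but neither the matched character nor the
    // lookahead may cross whitespace or a '#'.
    static String maskInText(String text) {
        return text.replaceAll("[^\\s#](?=[^#\\s]*[^#\\s]{3}#)", "*");
    }

    public static void main(String[] args) {
        // Only the local part of the address is masked; the surrounding words stay intact.
        System.out.println(maskInText("contact abcdefgh#gmail.com now"));
        // contact *****fgh#gmail.com now
    }
}
```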
If there can be only a single # in the email address, you could for example make use of a finite quantifier in the positive lookbehind:
(?<=(?<!\S)[^\s#]{0,1000})[^\s#](?=[^#\s]*[^#\s]{3}#[^\s#]+\.[a-z]{2,}(?!\S))
Regex demo

extract set of lines from file based on pattern match

I have a file that contains thousands of tuples (a header line followed by a set of three lines) as follows:
# dev2
SAMETEXT %{URI} ^dev2-00.XXX.XXX.XXX
SAMETEXT %{URI} ^/XXX/
DIFFTEXT ^/XXX/(.*) https://XXX-XXX-XXX-XXX-dev2.XXX.XXX.XXX.XXX.XXX/XXX/$1 [X,Y]
There are multiple sets of the same kind with different data, such as dev1, dev2, dev3. I want to get all lines, in the same order as they appear in the file, except those for dev2. The groups appear in random or mixed order, but every group is a tuple of the same lines as shown above.
I tried the following pattern, but it also captures all the other tuples that lie inside this span.
Pattern dev2Pattern = Pattern.compile("dev2\\R.*dev2-00.*\\RRewriteRule.*dev2", Pattern.DOTALL);
However, my objective is NOT to have the matched pattern in the resulting file. Thanks in advance.
If you want to match all the lines after # dev except when it is # dev2, you could use a negative lookahead to assert that what comes right after dev is not 2.
Then match all following lines that do not start with # dev followed by a digit.
^# dev(?!2\b)[0-9]+(?:\R(?!# dev[0-9]).*)*
^ Start of a line (this requires multiline mode)
# dev(?!2\b) Match # dev and assert that what is directly to the right is not 2 followed by a word boundary
[0-9]+ Match 1+ digits
(?: Non-capturing group
\R Match a Unicode newline sequence
(?!# dev[0-9]) Assert that what is directly to the right is not # dev and a digit
.* If that is the case, match 0+ times any char except a newline
)* Close group and repeat 0+ times
Regex demo | Java Demo
In Java (compile it with Pattern.MULTILINE so that ^ matches at the start of every line):
String regex = "^# dev(?!2\\b)[0-9]+(?:\\R(?!# dev[0-9]).*)*";
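A minimal sketch of how this regex could be used to collect every block except dev2. The class name, the helper `blocksExceptDev2`, and the shortened sample data are assumptions for illustration; the pattern itself is the one above, compiled with `Pattern.MULTILINE`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DevBlockFilterDemo {
    // MULTILINE lets ^ match at the start of each line, not only at the start of the input.
    private static final Pattern NON_DEV2_BLOCK = Pattern.compile(
            "^# dev(?!2\\b)[0-9]+(?:\\R(?!# dev[0-9]).*)*", Pattern.MULTILINE);

    // Hypothetical helper: returns each "# devN" block (header plus following lines), skipping dev2.
    static List<String> blocksExceptDev2(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = NON_DEV2_BLOCK.matcher(text);
        while (matcher.find()) {
            blocks.add(matcher.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Invented sample data with two lines per block to keep the example short.
        String text = String.join("\n",
                "# dev1", "A dev1", "B dev1",
                "# dev2", "A dev2", "B dev2",
                "# dev3", "A dev3");
        // Prints the dev1 and dev3 blocks only.
        blocksExceptDev2(text).forEach(System.out::println);
    }
}
```

Because of the word boundary in (?!2\b), headers such as # dev20 are still kept; only # dev2 itself is skipped.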

regex expression to remove eed from string

I am trying to replace 'eed' and 'eedly' with 'ee' from words where there is a vowel before either term ('eed' or 'eedly') appears.
So for example, the word indeed would become indee because there is a vowel ('i') that happens before the 'eed'. On the other hand the word 'feed' would not change because there is no vowel before the suffix 'eed'.
I have this regex: (?i)([aeiou]([aeiou])*[e{2}][d]|[dly]\\b)
You can see what is happening with this here.
As you can see, this is correctly identifying words that end with 'eed', but it is not correctly identifying 'eedly'.
Also, when it does the replacement, it replaces all words that end with 'eed', even words like feed, where the 'eed' should not be removed.
What should I be considering here in order to make it correctly identify the words based on the rules I specified?
You can use:
str = str.replaceAll("(?i)\\b(\\w*?[aeiou]\\w*)eed(?:ly)?", "$1ee");
Updated RegEx Demo
\\b(\\w*?[aeiou]\\w*) before eed or eedly makes sure there is at least one vowel in the same word before this.
To speed up this regex you can use a negated character class:
\\b([^\\Waeiou]*[aeiou]\\w*)eed(?:ly)?
RegEx Breakup:
\\b # word boundary
( # start captured group #1
[^\\Waeiou]* # match 0 or more word characters that are not vowels
[aeiou] # match one vowel
\\w* # followed by 0 or more word characters
) # end captured group #1
eed # followed by literal "eed"
(?: # start non-capturing group
ly # match literal "ly"
)? # end non-capturing group, ? makes it optional
Replacement is:
"$1ee" which means back reference to captured group #1 followed by "ee"
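To make the behaviour concrete, here is a small sketch that wraps the `replaceAll` call from this answer in a method. The class name `EedStemDemo` and the method name `stripEed` are hypothetical.

```java
class EedStemDemo {
    // Replaces a trailing "eed" or "eedly" with "ee", but only when a vowel
    // occurs earlier in the same word (regex taken from the answer).
    static String stripEed(String str) {
        return str.replaceAll("(?i)\\b(\\w*?[aeiou]\\w*)eed(?:ly)?", "$1ee");
    }

    public static void main(String[] args) {
        System.out.println(stripEed("indeed"));   // indee
        System.out.println(stripEed("feed"));     // feed (no vowel before "eed")
        System.out.println(stripEed("agreedly")); // agree
    }
}
```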
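Try to match dly before d; otherwise your regex evaluation stops after finding eed. Also note that [e{2}] is a character class that matches a single e, {, 2 or }; to require two e's, write e{2} or ee outside of brackets.
(?i)([aeiou]([aeiou])*[e{2}](dly|d))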

Java regex: Negative lookahead

I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are:
/foo/.+ and /foo/.+/bar/.+ respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it
public static void main(String[] args) {
String shouldWork = "/foo/abc123doremi";
String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
String regex = "/foo/.+(?!bar)";
System.out.println("ShouldWork: " + shouldWork.matches(regex));
System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.
Thanks,
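In your regex, .+ first consumes the whole rest of the string, and at that point (?!bar) trivially succeeds because nothing follows. Put the lookahead before .+ so that it scans the rest of the path. Try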
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars which I assume you do want that regex to match.
Explanation: (without the double backslashes required by Java strings)
/foo/ # Match "/foo/"
(?! # Assert that it's impossible to match the following regex here:
.* # any number of characters
\b # followed by a word boundary
bar # followed by "bar"
\b # followed by a word boundary.
) # End of lookahead assertion
.+ # Match one or more characters
\b, the "word boundary anchor", matches the empty space between an alphanumeric character and a non-alphanumeric character (or between the start/end of the string and an alnum character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.
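The test program from the question can be adapted to check the suggested regex. This is only a sketch; the class name `UriLookaheadDemo` and the helper `isFooOnly` are invented for the example.

```java
class UriLookaheadDemo {
    // Regex from the answer: "/foo/" followed by anything, as long as the
    // whole word "bar" does not appear anywhere after "/foo/".
    static boolean isFooOnly(String uri) {
        return uri.matches("/foo/(?!.*\\bbar\\b).+");
    }

    public static void main(String[] args) {
        System.out.println(isFooOnly("/foo/abc123doremi"));                  // true
        System.out.println(isFooOnly("/foo/abc123doremi/bar/def456fasola")); // false
        System.out.println(isFooOnly("/foo/baz/crowbars"));                  // true

        // The original attempt: .+ swallows the rest of the input, so the
        // trailing lookahead never sees "bar" and this also prints true.
        System.out.println("/foo/abc123doremi/bar/def456fasola".matches("/foo/.+(?!bar)"));
    }
}
```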

Regex for multiple lines

I am looking for a pattern that matches multiple lines.
I am new to regex and am using them heavily in my project.
I need to come up with a pattern that will match a few groups of lines. The pattern should
match either these lines
* Source: Test *
* *
or
Ord. 429 Tckt. 1
or
Guest:
Yes, it is not clear. I got a pattern for the second line ( Ord. 429 Tckt. 1) which is:
[\s]+[\w]+[\.][\s]+[\d]+[\s]+[\w]+[\.][\s]+[\d]+
If you need one large regex to match all of these, the following should work if you have the Pattern.DOTALL and Pattern.MULTILINE flags set (see Rubular):
^\*[^\n]*\*$.*?^\*[^\n]*\*$|^\w+\.[ \t]+\d+[ \t]+\w+\.[ \t]+\d+$|^Guest:[^\n]*$
Here is a breakdown of the different sections (split by the |):
Your first group of lines:
^\*[^\n]*\*$.*?^\*[^\n]*\*$
---------------------------
^ # start of a line
\* # a literal '*'
[^\n]* # any number of non-newline characters
\* # a literal '*'
$ # end of a line
.*? # any number of characters, as few as possible (includes newlines)
^\*[^\n]*\*$ # repeat of the first five elements of the pattern as described above
The second line portion (for lines like 'Ord. 429 Tckt. 1') is adapted from yours with some minor changes.
^\w+\.[ \t]+\d+[ \t]+\w+\.[ \t]+\d+$
As for the third, it should be pretty basic, start of a line followed by 'Guest:' and then any number of non-newline characters.
^Guest:[^\n]*$
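A sketch of how the combined pattern might be used from Java with both flags set. The class name, the helper `findBlocks`, and the sample input (which assumes the lines start in the first column) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReceiptBlockDemo {
    // DOTALL lets ".*?" cross line breaks; MULTILINE makes ^ and $ work per line.
    private static final Pattern BLOCK = Pattern.compile(
            "^\\*[^\\n]*\\*$.*?^\\*[^\\n]*\\*$"
          + "|^\\w+\\.[ \\t]+\\d+[ \\t]+\\w+\\.[ \\t]+\\d+$"
          + "|^Guest:[^\\n]*$",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Hypothetical helper: returns every chunk of text matched by one of the three alternatives.
    static List<String> findBlocks(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "* Source: Test *",
                "* *",
                "Ord. 429 Tckt. 1",
                "Guest:");
        // Three matches: the two starred lines together, the Ord./Tckt. line, and the Guest: line.
        findBlocks(text).forEach(block -> System.out.println("[" + block + "]"));
    }
}
```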
Add the dot-all switch (?s), also known as single-line mode, to the front of your regex:
(?s)[\s]+[\w]+[\.][\s]+[\d]+[\s]+[\w]+[\.][\s]+[\d]+
I'm assuming that you are using Java. You would be using the java.util.regex package. You are probably looking for the Pattern.DOTALL flag on Pattern. This treats line terminators as characters that you can match with the dot (.).
Pattern.compile("^\\*\\sSource: Test\\s*\\*\\s*", Pattern.DOTALL);
It depends on how strict you want to be, but the above will match the first line in the first snippet (including the line terminator).
If you need more help with the API or this is the wrong API, edit your question to be clearer.
Are you trying to match all three in a single regex? It can be done, but the pattern will be a bit ugly. I can probably help with that too.
A decent regex tester page is: http://www.fileformat.info/tool/regex.htm. You can do a google search for something like regex java tester.
Just one last thing, the pattern at the bottom won't do what you want if I understand fully.
[\s]+ matches one or more whitespace characters, so whitespace is required at the front. Also, you don't need the square brackets. They work, but a character class is only needed when you want to match one of several characters. If you wanted to match either a or b: [ab]. But if you want to match just a, you just put a in your pattern.
\s+ one or more whitespace characters
\w+ one or more word chars (letters, digits or underscore; no punctuation, etc.)
\. period
\s+ some whitespace
\d+ some digits
\s+ some whitespace
\w+ some word chars
\. period
\s+ some whitespace
\d+ one or more digits
so,
\s+\w+\.\s+\d+\s+\w+\.\s+\d+
Are there supposed to be blank lines in between the Source: Test and the line with just the stars?
You are going to end up with something like this:
(?: # non-capturing group
\s*\* Source: Test\s+\* # first line of the first block
\s+\*\s+\* # second line, assuming that there is no space
# between lines or an arbitrary amount of whitespace
) # end of first group
| # or....
(?: # second group (non capturing)
\s+\w+\.\s+\d+\s+\w+\.\s+\d+ # what we discussed before for Ord./Tckt.
)
|
(?:\s+Guest:) # the last one is easy :)
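You may or may not know this, but comments like the ones I have up there can be put into your code via the Pattern.COMMENTS flag. Some people like that. Keep in mind that this flag also makes the regex engine ignore whitespace in the pattern, so a literal space such as the one in Source: Test has to be written as \s or escaped. I've also broken up the different groups into their own constants and then pasted them together when compiling the pattern. I like that pretty well.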
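A rough sketch of that approach, with one constant per alternative and `Pattern.COMMENTS` enabled. The class name, the constant names, and the helper `isKnownLine` are invented here, and only the Ord./Tckt. and Guest: alternatives are shown to keep the example short.

```java
import java.util.regex.Pattern;

class CommentedPatternDemo {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and "#" starts
    // a comment that runs to the end of the line, hence the "\n" after each comment.
    private static final String ORD_TCKT =
            "(?: \\s* \\w+ \\. \\s+ \\d+     # e.g. Ord. 429\n"
          + "    \\s+ \\w+ \\. \\s+ \\d+ )   # e.g. Tckt. 1\n";

    private static final String GUEST =
            "(?: \\s* Guest: )               # the Guest: line\n";

    // The constants are pasted together with "|" when compiling.
    private static final Pattern LINE =
            Pattern.compile(ORD_TCKT + "|" + GUEST, Pattern.COMMENTS);

    // Hypothetical helper: true when the whole line matches one of the alternatives.
    static boolean isKnownLine(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isKnownLine("  Ord. 429 Tckt. 1")); // true
        System.out.println(isKnownLine("Guest:"));             // true
        System.out.println(isKnownLine("Something else"));     // false
    }
}
```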
I hope all of this helps.
