regex expression to remove eed from string

regex expression to remove eed from string - java

I am trying to replace 'eed' and 'eedly' with 'ee' from words where there is a vowel before either term ('eed' or 'eedly') appears.
So for example, the word indeed would become indee because there is a vowel ('i') that happens before the 'eed'. On the other hand the word 'feed' would not change because there is no vowel before the suffix 'eed'.
I have this regex: (?i)([aeiou]([aeiou])*[e{2}][d]|[dly]\\b)
You can see what is happening with this here.
As you can see, this is correctly identifying words that end with 'eed', but it is not correctly identifying 'eedly'.
Also, when it does the replace, it is replacing all words that end with 'eed' , even words like feed which it should not remove the eed
What should I be considering here in order to make it correctly identify the words based on the rules I specified?

You can use:
str = str.replaceAll("(?i)\\b(\\w*?[aeiou]\\w*)eed(?:ly)?", "$1ee");
Updated RegEx Demo
\\b(\\w*?[aeiou]\\w*) before eed or eedly makes sure there is at least one vowel in the same word before this.
To expedite this regex you can use negated expression regex:
\\b([^\\Waeiou]*[aeiou]\\w*)eed(?:ly)?
RegEx Breakup:
\\b # word boundary
( # start captured group #`
[^\\Waeiou]* # match 0 or more of non-vowel and non-word characters
[aeiou] # match one vowel
\\w* # followed by 0 or more word characters
) # end captured group #`
eed # followed by literal "eed"
(?: # start non-capturing group
ly # match literal "ly"
)? # end non-capturing group, ? makes it optional
Replacement is:
"$1ee" which means back reference to captured group #1 followed by "ee"

find dly before finding d. otherwise your regex evaluation stops after finding eed.
(?i)([aeiou]([aeiou])*[e{2}](dly|d))

Related

Regex to validate custom format

I have this format: xx:xx:xx or xx:xx:xx-y, where x can be 0-9 a-f A-F and y can be only 0 or 1.
I come up with this regex: ([0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}[:][0-9A-Fa-f]{2}|[-][0-1]{1})
(See regexr).
But this matches 0a:0b:0c-3 too, which is not expected.
Is there any way to remove these cases from result?

[:] means a character from the list that contains only :. It is the same as
:. The same for [-] which has the same result as -.
Also, {1} means "the previous piece exactly one time". It does not have any effect, you can remove it altogether.
To match xx:xx:xx or xx:xx:xx-y, the part that matches -y must be optional. The quantifier ? after the optional part mark it as optional.
All in all, your regex should be like this:
[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}(-[01])?
If the regex engine you use can be told to ignore the character case then you can get rid of A-F (or a-f) from all character classes and the regex becomes:
[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?
How it works, piece by piece:
[0-9a-f] # any digit or letter from (and including) 'a' to 'f'
{2} # the previous piece exactly 2 times
: # the character ':'
[0-9a-f]
{2}
:
[0-9a-f]
{2}
( # start a group; it does not match anything
- # the character '-'
[01] # any character from the class (i.e. '0' or '1')
) # end of group; the group is needed for the next quantifier
? # the previous piece (i.e. the group) is optional
# it can appear zero or one times
See it in action: https://regexr.com/4rfvr
Update
As #the-fourth-bird mentions in a comment, if the regex must match the entire string then you need to anchor its ends:
^[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}(-[01])?$
^ as the first character of a regex matches the beginning of the string, $ as the last character matches the end of the string. This way the regex matches the entire string only (when there aren't other characters before or after the xx:xx:xx or xx:xx:xx-y part).
If you use the regex to find xx:xx:xx or xx:xx:xx-y in a larger string then you don't need to add ^ and $. Of course, you can add only ^ or $ to let the regex match only at the beginning or at the end of the string.

You want
xx:xx:xx or if it is followed by a -, then it must be a 0 or 1 and then it is the end (word boundry).
So you don't want any of these
0a:0b:0c-123
0a:0b:0cd
10a:0b:0c
either.
Then you want "negative lookingahead", so if you match the first part, you don't want it to be followed by a - (the first pattern) and it should end there (word boundary), and if it is followed by a -, then it must be a 0 or 1, and then a word boundary:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
To prevent any digit in front, a word boundary is added to the front as well.
Example: https://regexr.com/4rg42
The following almost worked:
/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i
but if it is the end of file and it is 3a:2b:11, then the [^-] will try to match a non - character and it won't match.
Example: https://regexr.com/4rg4q

Java Regular Expression (greedy/nongreedy)

So I'm trying to separate the following two groups formatted as:
FIRST - GrouP second.group.txt
The first group can contain any character
The second group is a dot(.) delimited string.
I'm using the following regex to separate these two groups:
([A-Z].+).*?([a-z]+\.[a-z]+)
However, it gives a wrong result:
1: FIRST - GrouP second.grou
2: p.txt
I don't understand because I'm using "nongreedy" separater (.*?) instead of the greedy one (. *)
What am I doing wrong here?
Thanks

You can this regex to match both groups:
\b([A-Z].+?)\s*\b([a-z]+(?:\.[a-z]+)+)\b
RegEx Demo
Breakup:
\b # word boundary
([A-Z].+?) # match [A-Z] followed by 1 or more chars (lazy)
\s* # match 0 or more spaces
\b # word boundary
([a-z]+ # match 1 or more of [a-z] chars
(?:\.[a-z]+)+) # match a group of dot followed by 1 or more [a-z] chars
\b # word boundary
PS: (?:..) is used for non-capturing group.

This is one possible solution that should be pretty compact:
(.*?-\s*\S+)|(\S+\.?)+
https://regex101.com/r/iW8mE5/1
It is looking for anything followed by a dash, zero or more spaces, and then non-whitespace characters. And if it doesn't find that, it looks for non-whitespace followed by an optional decimal.

Regex in java to check for double consonant

I want to write a regex in Java to check if a string ends in double consonant.
My regex is not working.
\\w+[^aeiou]\\1$
Appreciate your help
Thanks a ton.

It doesn't work since \1 references a non-existent subpattern. You need to assign a capturing group. Capturing groups could be used later on in the regular expression as a backreference to what was matched in that captured group.
\\w+([^aeiou])\\1$
Based off the comment above about your regular expression not only matching double consonants, I would consider combining an intersection with negation to make sure the grouped character is an actual letter character.
(?i)\\w+([a-z&&[^aeiou]])\\1$

This might work.
# "(?i)\\w+(?:(?![aeiou])[a-z]){2}$"
(?i) # Case independent
\w+
(?:
(?! [aeiou] ) # Not a vowel ahead
[a-z] # Consonant only
){2}
$

Simple regex to match strings containing <n> chars

I'm writing this regexp as i need a method to find strings that does not have n dots,
I though that negative look ahead would be the best choice, so far my regexp is:
"^(?!\\.{3})$"
The way i read this is, between start and end of the string, there can be more or less then 3 dots but not 3.
Surprisingly for me this is not matching hello.here.im.greetings
Which instead i would expect to match.
I'm writing in Java so its a Perl like flavor, i'm not escaping the curly braces as its not needed in Java
Any advice?

You're on the right track:
"^(?!(?:[^.]*\\.){3}[^.]*$)"
will work as expected.
Your regex means
^ # Match the start of the string
(?!\\.{3}) # Make sure that there aren't three dots at the current position
$ # Match the end of the string
so it could only ever match the empty string.
My regex means:
^ # Match the start of the string
(?! # Make sure it's impossible to match...
(?: # the following:
[^.]* # any number of characters except dots
\\. # followed by a dot
){3} # exactly three times.
[^.]* # Now match only non-dot characters
$ # until the end of the string.
) # End of lookahead
Use it as follows:
Pattern regex = Pattern.compile("^(?!(?:[^.]*\\.){3}[^.]*$)");
Matcher regexMatcher = regex.matcher(subjectString);
foundMatch = regexMatcher.find();

Your regular expression only matches 'not' three consecutive dots. Your example seems to show you want to 'not' match 3 dots anywhere in the sentence.
Try this: ^(?!(?:.*\\.){3})
Demo+explanation: http://regex101.com/r/bS0qW1
Check out Tims answer instead.

Regex email address validation

Can someone please explain this java Regex to me?
^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum)$
This regex is used to validate an email address.

Validating email addresses is now considered bad practice (stop validating email addresses with regex), especially with such expression as in your question. For example here's a more complete expression.
As for this expression let's break it in parts:
Beginning of the matched string
^
Matches at least one character from the list
[a-z0-9!#$%&'*+/=?^_`{|}~-]+
Non-capturing (see backreference) group which can be repeated 0..n times, that matches a . and then at least one character from the list.
(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
Just this character
#
Non-capturing group matching one character in this list [a-z0-9] and then possibly more characters from the following lists. Matched string must start and end with [a-z0-9] and inside it can have [a-z0-9-].
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+
Non-capturing group that matches 2 uppercase letters or one of the words.
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum)
End of the string.
$

^ # Beginning of the line
[a-z0-9!#$%&'*+/=?^_`{|}~-]+ # One or more (+) characters from the
bracket expression, i.e., letters [a-z],
numbers [0-9], !, $, %, et cetera
(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)* # Zero or more (*) of the above
expression, preceded by a dot \\.
# # Literal #
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+ # A digit or a letter, followed by
optional digits, letters, or dashes,
followed by a a dot
(?:[A-Z]{2}|com|org|net...) # Country code ([A-Z]{2}), or a top level
domain, such as com, org, net.
$ # End of the line
Using a concrete example, john#foo.com. The first part of the e-mail, john, will be matched by ^[a-z0-9!#$%&'*+/=?^_{|}~-]+. The # will be matched by, well, #. The domain foo, as well as the dot, is matched by (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+. Finally, the TLD com is matched by the alternation (?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

regex expression to remove eed from string - java

find dly before finding d. otherwise your regex evaluation stops after finding eed. (?i)([aeiou]([aeiou])*[e{2}](dly|d))

Related

Regex to validate custom format

Java Regular Expression (greedy/nongreedy)

Regex in java to check for double consonant

Simple regex to match strings containing <n> chars

Regex email address validation

Categories

Resources