Regular expression for year with optional closing parenthesis - java

I am struggling to get the following regular expression (in Java) to work nicely. I want to see if a string has a year, and the strings can be
Mar 3, 2014
or sometimes with a closing parenthesis such as
Mar 3, 2014)
I am using
text.matches("\\b((19|20)\\d{2})(\\)?)\\b")
which works in most cases, but does not match if string ends at the parenthesis
If I use
text.matches("\\b((19|20)\\d{2})(\\)?)$")
it matches text that ends after the parenthesis but not a string that has another space
I thought that \b would include end of string, but cannot get it to work.
I know I can use two regex's but that seems really ugly.

Your main problem is that matches checks whether the entire string matches the regex. What you want is to test whether the string contains a substring that the regex can match. To do so, use
Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(stringYouWantToTest);
if (m.find()){
//tested string contains part which can be matched by regex
}else{
//part which could be matched by regex couldn't be found
}
You can also surround your regex with .* so that it matches the characters around the part you want to find, and keep using matches as you do now:
if(yourString.matches(".*"+yourRegex+".*"))
but this has to scan the entire string.
In other words, you can either look for \\b(19|20)\\d{2}\\b with Pattern/Matcher, or use something like matches(".*\\b(19|20)\\d{2}\\b.*").
BTW, the closing parenthesis ) is not part of the \w class, so \b treats the position between a \w character and ) as a word boundary; for instance, "9)" is matched by the regex \d\b\). For the same reason, \)?\b fails when the string ends with ): there is no word boundary between ) and the end of the string.
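The difference between find and matches described above can be sketched as follows. This is only a minimal illustration; the class name YearFinder and the method containsYear are made up for the example and are not part of the original question.

```java
import java.util.regex.Pattern;

class YearFinder {
    // Word-bounded four-digit year starting with 19 or 20.
    private static final Pattern YEAR = Pattern.compile("\\b(?:19|20)\\d{2}\\b");

    // find() looks for the pattern anywhere in the text,
    // whereas String.matches() requires the whole text to match.
    static boolean containsYear(String text) {
        return YEAR.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsYear("Mar 3, 2014"));
        System.out.println(containsYear("Mar 3, 2014)"));
        System.out.println(containsYear("Mar 3, 2014) and more"));
        // Whole-string matching fails because of the surrounding text.
        System.out.println("Mar 3, 2014)".matches("\\b(?:19|20)\\d{2}\\b"));
    }
}
```

Since ) is not a word character, the trailing \b is satisfied right after the last digit, so no special handling of the optional parenthesis is needed here.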

Your question isn't very clear, but from what I understand, this should work for you:
text.matches("((?:19|20)(?:\\d){2})\\)?");
Demo: http://regex101.com/r/lO0aH4/3

You could try something like :
".*(19|20)[0-9]{2}\\)?$"
I'm not sure this will help; it would be better to give us a complete example of a string to match. Must the string end with a year (with an optional parenthesis), or may something else follow it?

Related

Why does this pattern not match? ([\\\\A\\\\W]its[\\\\W\\\\z])

I'm trying to do a replace with this pattern, so I need to match this:
String pattern = "[\\\\A\\\\W]its[\\\\W\\\\z]";
The way I'm interpreting my pattern is: either a beginning of the string OR a non word character like a space or comma, then an "its", then a non word character OR the end of the string.
Why doesn't it match on this "its" inside this string?
its about time
The idea is to detect incorrectly written words like "its" and fix them to "it's".
Also why do I need so many escape characters in order for the pattern to be accepted by the vm at all?
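\\A and \\z are boundary matchers. They cannot go inside character classes. With four backslashes, each pair becomes a literal backslash, so your classes simply match a backslash or one of the letters A, W or z. If you wrote them properly, i.e. with two backslashes instead of four, the regex compiler would throw an exception, because \A and \z cannot appear inside [] blocks.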
Use straight | syntax with non-capturing groups instead:
String pattern = "(?:\\A|\\W)its(?:\\W|\\z)";
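For the replacement itself, one possible sketch is to turn the two non-capturing groups into capturing groups so that the surrounding characters can be put back. The class name ItsFixer and the method fix are hypothetical names used only for this illustration.

```java
class ItsFixer {
    // Same alternation as in the answer, but with capturing groups so the
    // delimiter before and after "its" is preserved in the replacement.
    static String fix(String text) {
        return text.replaceAll("(\\A|\\W)its(\\W|\\z)", "$1it's$2");
    }

    public static void main(String[] args) {
        System.out.println(fix("its about time"));
        System.out.println(fix("bits and pieces"));
    }
}
```

Because the delimiters are consumed by the match, two occurrences separated by a single character (such as "its its") would not both be replaced; lookarounds or \b avoid that.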

Java regular expression matching two consecutive consonants

I'm trying to match only strings with two consecutive consonants, but no matter what input I give to myString this never evaluates to true, so I have to assume something is wrong with the syntax of my regex. Any ideas?
if (Pattern.matches("([^aeiou]&&[^AEIOU]){2}", myString)) {...}
Additional info:
myString is a substring of at most two characters
There is no whitespace, as this string is the output of a .split with a whitespace delimiter
I'm not worried about special characters, as the program just concatenates and prints the result, though if you'd like to show me how to include something like [b-z]&&[^eiou] in your answer I would appreciate it.
Edit:
After going through these answers and testing a little more, the code I finally used was
if (myString.matches("(?i)[b-z&&[^eiou]]{2}")) {...}
[^aeiou] matches non-letter characters as well, so you should use a different pattern:
Pattern rx = Pattern.compile("[bcdfghjklmnpqrstuvwxyz]{2}", Pattern.CASE_INSENSITIVE);
if (rx.matcher(myString).matches()) {...}
If you would like to use && for an intersection, you can do it like this:
"[a-z&&[^aeiou]]{2}"
To use character class intersection, you need to wrap your syntax inside of a bracketed expression. The below matches characters that are both lowercase letters and not vowels.
[a-z&&[^aeiou]]{2}
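A small sketch of the intersection syntax in use. It spells out both letter cases in the class rather than relying on a case-insensitive flag, and it treats y as a consonant; the class name ConsonantPair and the method isTwoConsonants are assumptions made for the example.

```java
import java.util.regex.Pattern;

class ConsonantPair {
    // Letters (either case) intersected with "not a vowel" (either case),
    // repeated exactly twice.
    private static final Pattern TWO_CONSONANTS =
            Pattern.compile("[a-zA-Z&&[^aeiouAEIOU]]{2}");

    // matches() is appropriate here: the whole string must be two consonants.
    static boolean isTwoConsonants(String s) {
        return TWO_CONSONANTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTwoConsonants("st"));
        System.out.println(isTwoConsonants("sa"));
        System.out.println(isTwoConsonants("s1"));
    }
}
```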

Extract specific data from string with regex

I want to capture multiple substrings that match some specific patterns.
For example, my string looks like this:
String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
I want to get the text between two # characters whenever it contains a given search string, such as UK.
The output should be as follows:
if the search string is UK, then
the output should be 1_Label for UK
if the search string is label, then
the output should be 1_Label for UK, 2_Label for US and 4_Label for FR
if the search string is 1_, then
the output should be 1_Label for UK
I don't want to extract the data via an array list, and the extraction should be case insensitive.
Can you please help me with this problem?
Regards,
Ashish Mishra
You can use this regex for search:
#([^#]*?Label[^#]*)(?=#)
Replace Label with your search keyword.
Java Pattern:
Pattern p = Pattern.compile( "#([^#]*?" + Pattern.quote(keyword) + "[^#]*)(?=#)" );
If the data always is between two hashes, try a regex like this: (?i)#.*your_match.*# where your_match would be UK, label, 1_ etc.
Then use this expression in conjunction with the Pattern and Matcher classes.
If you want to match multiple strings, you'd need to exclude the hashes from the match by using look-around methods as well as reluctant modifiers, e.g. (?i)(?<=#).*?label.*?(?=#).
Short breakdown:
(?i) will make the expression case insensitive
(?<=#) is a positive look-behind, i.e. the match must be preceded by a hash (but doesn't include the hash)
.*? matches any sequence of characters but is reluctant, i.e. it tries to match as few characters as possible
(?=#) is a positive look-ahead, which means the match must be followed by a hash (also not included in the match)
Without the look-around methods the hashes would be included in the match and thus using Matcher.find() you'd skip every other label in your test string, i.e. you'd get the matches #1_Label for UK# and #4_Label for FR# but not #2_Label for US#.
Without the reluctant modifiers the expression would match everything between the first and the last hash.
A better alternative is to replace .*? with [^#]*, which means that the match cannot contain any hash. This removes the need for reluctant modifiers, and also removes the problem that looking for US would match 1_Label for UK#2_Label for US.
So most probably the final regex you're after looks like this: (?i)(?<=#)[^#]*your_match[^#]*(?=#).
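Put together with Pattern and Matcher, the look-around expression could be used roughly like this. It is a sketch only: the class name HashSegments and the method find are invented for the example, Pattern.quote is used on the assumption that the keyword should be taken literally, and the results are collected into a List purely to make them easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashSegments {
    // Returns every hash-delimited segment that contains the keyword,
    // ignoring case. The hashes themselves are not part of the match.
    static List<String> find(String text, String keyword) {
        Pattern p = Pattern.compile(
                "(?<=#)[^#]*" + Pattern.quote(keyword) + "[^#]*(?=#)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> segments = new ArrayList<>();
        while (m.find()) {
            segments.add(m.group());
        }
        return segments;
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
        System.out.println(find(textData, "UK"));
        System.out.println(find(textData, "label"));
        System.out.println(find(textData, "1_"));
    }
}
```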
([^#]*UK[^#]*) for UK
([^#]*Label[^#]*) for Label
([^#]*1_[^#]*) for 1_
Try this. Grab the captures. See demo.
http://regex101.com/r/kQ0zR5/3
http://regex101.com/r/kQ0zR5/4
http://regex101.com/r/kQ0zR5/5
I have solved this problem with the pattern below:
(?i)([^#]*?us[^#]*)(?=#)
Thank you so much Anubhava, VKS and Thomas for your replies.
Regards,
Ashish Mishra

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when that character is repeated (i.e. underscore '_'), like this:
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and it only starts matching after finding an underscore.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. According to this regex tester, it looks like just doing /()_+$/ would work: the empty parentheses match anything before a single or repeated match at the end of the line. Would that be correct?
Thank you!
There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from a match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
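Both options can be sketched in Java as below. The class name UnderscoreTrimmer and its method names are hypothetical and chosen only for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreTrimmer {
    private static final Pattern PER_LINE =
            Pattern.compile("^_+|_+$", Pattern.MULTILINE);
    private static final Pattern CAPTURE = Pattern.compile("^_*(.*?)_*$");

    // Option 1: delete leading and trailing runs of underscores.
    static String trim(String s) {
        return s.replaceAll("^_+|_+$", "");
    }

    // Option 1 applied to every line of a multi-line string.
    static String trimEachLine(String s) {
        return PER_LINE.matcher(s).replaceAll("");
    }

    // Option 2: capture whatever lies between the outer underscores.
    static String extract(String s) {
        Matcher m = CAPTURE.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(trim("__this_is_my___example_line_4__"));
        System.out.println(extract("_this_is_my_ _example_line_3_"));
        System.out.println(trimEachLine("this_is_my_example_line_1_\n__this_is_my___example_line_4__"));
    }
}
```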
How about this regex: [^_\n\r](.*[^_\n\r])?
Demo
String data=
"this_is_my_example_line_0\n" +
"this_is_my_example_line_1_\n" +
"this_is_my_example_line_2___\n" +
"_this_is_my_ _example_line_3_\n" +
"__this_is_my___example_line_4__";
Pattern p=Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m=p.matcher(data);
while(m.find()){
System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine, just that you are missing a \ before \d
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is the result of ^(\d+)(1+1).* replacement, as you miss some backslash in the string (e.g. "^(\\d+)(1\+1).*").
Your regex looks fine to me. I don't have access to Java, but in JavaScript the code
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
Returns 1234 as you'd expect. This works on a string with many 'codes' in too e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
Personally, I wouldn't use a Regex at all (it'd be like using a hammer on a thumbtack), I'd just create a substring from (Pseudocode)
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that it should match as little as possible before moving on in the pattern. Thus ^(\d+?)(1\+1) matches as few digits as possible until it finds "1+1", and the "1+1" is not included in the first group. Note that the + in 1\+1 has to be escaped either way. The greedy original initially consumes the trailing 1 as well, but then backtracks and gives it back, so both versions capture the same digits here.
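To compare the regex approach (with the backslashes doubled for a Java string literal) against the plain substring approach suggested above, here is a short sketch. The class name PrefixBeforeOnePlusOne and both method names are made up for the example.

```java
class PrefixBeforeOnePlusOne {
    // Regex version: in a Java string literal, \d and \+ need doubled backslashes.
    static String viaRegex(String s) {
        return s.replaceAll("^(\\d+)1\\+1.*", "$1");
    }

    // Non-regex version: cut the string at the first "1+1", if there is one.
    static String viaSubstring(String s) {
        int index = s.indexOf("1+1");
        return index < 0 ? s : s.substring(0, index);
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("12341+1"));
        System.out.println(viaRegex("12241+1R1"));
        System.out.println(viaSubstring("100001+1R2"));
    }
}
```

Unlike the pseudocode version, viaSubstring returns the input unchanged when "1+1" is absent, instead of failing on an index of -1.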
