Match the second substring using regular expression - java

I need a regular expression that matches the second "abc" in "abcasdabchjkabc".
I tried writing code like this:
Pattern p = Pattern.compile("(?<=abc(.*?))abc");
but it throws a java.util.regex.PatternSyntaxException:
Look-behind group does not have an obvious maximum length near index 11
(?<=abc(.*?))abc
           ^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.group0(Pattern.java:2488)
at java.util.regex.Pattern.sequence(Pattern.java:1806)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
Please show me the right one!

You cannot use * or + in a look-behind assertion.
Why does the look-behind expression in this regex not have an "obvious maximum length"?
Regex look-behind without obvious maximum length in Java
Do you actually want to match everything in between the two abcs?
Pattern.compile("abc(.*?)abc");
Or do you just want to check that there are two abcs?
Pattern.compile("abc.*?abc");
I don't see a need for lookbehind in either case.

I guess you want something like:
java.util.regex.Pattern.compile("(?<=abc.{1,99})abc");
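It finds the second abc, because the bounded quantifier {1,99} gives the look-behind an obvious maximum length. It assumes that between 1 and 99 characters separate the matched abc from an earlier one.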
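As a rough check of the bounded look-behind above, the following sketch collects the start index of every match in the question's input. The class and method names are hypothetical, chosen only for illustration. The pattern also matches the third abc, so take the first result of find() if only the second occurrence is wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Look-behind with a bounded quantifier, as suggested in the answer above.
    private static final Pattern AFTER_ABC = Pattern.compile("(?<=abc.{1,99})abc");

    // Returns the start index of every "abc" that follows an earlier "abc"
    // with 1 to 99 characters in between.
    static List<Integer> matchStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = AFTER_ABC.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The first "abc" (index 0) has nothing before it, so it is skipped.
        System.out.println(matchStarts("abcasdabchjkabc")); // prints [6, 12]
    }
}
```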

A simple option is to match your pattern twice:
String input = "abcXYabcZRabc";
Pattern p = Pattern.compile("abc");
Matcher m = p.matcher(input);
m.find(); // what to do when there is no match?
m.find(); // what to do when there is only one match?
System.out.println("Second match is between " + m.start() + " and " + m.end());
Working example: http://ideone.com/uVZL3j
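The two questions left open in the comments above (no match, or only one match) can be handled by checking the result of every find() call. A minimal sketch, assuming a hypothetical helper named nthMatch that returns an empty Optional when there are fewer than n matches:

```java
import java.util.Optional;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NthMatch {
    // Returns the n-th match (1-based) of regex in input, or empty if
    // the input contains fewer than n matches.
    static Optional<MatchResult> nthMatch(String regex, String input, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        for (int i = 0; i < n; i++) {
            if (!m.find()) {
                return Optional.empty(); // ran out of matches before reaching n
            }
        }
        return Optional.of(m.toMatchResult());
    }

    public static void main(String[] args) {
        Optional<MatchResult> second = nthMatch("abc", "abcXYabcZRabc", 2);
        if (second.isPresent()) {
            MatchResult r = second.get();
            System.out.println("Second match is between " + r.start() + " and " + r.end());
        } else {
            System.out.println("Fewer than two matches");
        }
    }
}
```

For the sample input this reports the range 5 to 8; with an input such as "abcXY" it takes the else branch instead of throwing an IllegalStateException.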

Related

java regex tell which column not match

Good day,
My Java code is as follows:
Pattern p = Pattern.compile("^[a-zA-Z0-9$&+,:;=\\[\\]{}?##|\\\\'<>._^*()%!/~\"`  -]*$");
String i = "f698fec0-dd89-11e8-b06b-☺";
Matcher tagmatch = p.matcher(i);
System.out.println("tagmatch is " + tagmatch.find());
As expected, the answer is false, because the string contains the ☺ character. However, I would like to report the column that does not match. In this example, it should report that column 25 contains the invalid character.
How can I do this?
Remove the anchors from your regex and then use Matcher#end() to get the position where the match stopped, like this:
String i = "f698fec0-dd89-11e8-b06b-☺";
Pattern p = Pattern.compile("[\\w$&+,:;=\\[\\]{}?##|\\\\'<>.^*()%!/~\"` -]+");
Matcher m = p.matcher(i);
if (m.lookingAt() && i.length() > m.end()) {
    System.out.println("Match <" + m.group() + "> failed at: " + m.end());
}
Output:
Match <f698fec0-dd89-11e8-b06b-> failed at: 24
PS: I have used lookingAt() to ensure that the pattern is matched starting from the beginning of the region. You can also use find() to get the next match anywhere, or keep the start anchor in the pattern as
"^[\\w$&+,:;=\\[\\]{}?##|\\\\'<>.^*()%!/~\"` -]+"
and use find() to effectively make it behave like the above code with lookingAt().
Read difference between lookingAt() and find()
I have refactored your regex to use \w instead of [a-zA-Z0-9_] and used quantifier + (meaning match 1 or more) instead of * (meaning match 0 or more) to avoid returning success for zero-length matches.
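Note that Matcher#end() is a 0-based index, so the 24 printed above corresponds to column 25 in the question's 1-based counting. The sketch below wraps that conversion in a hypothetical helper, firstInvalidColumn, which is an assumption for illustration and not part of the original answer. It counts UTF-16 code units, so characters outside the Basic Multilingual Plane would occupy two columns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstInvalidColumn {
    // Allowed characters, copied from the answer above (no anchors).
    private static final Pattern ALLOWED =
            Pattern.compile("[\\w$&+,:;=\\[\\]{}?##|\\\\'<>.^*()%!/~\"` -]+");

    // Returns the 1-based column of the first disallowed character,
    // or -1 if every character is allowed (including the empty string).
    static int firstInvalidColumn(String s) {
        Matcher m = ALLOWED.matcher(s);
        // Length of the valid prefix; 0 when the very first character is invalid.
        int validPrefix = m.lookingAt() ? m.end() : 0;
        return validPrefix == s.length() ? -1 : validPrefix + 1;
    }

    public static void main(String[] args) {
        // \u263A is the smiley character from the question.
        System.out.println(firstInvalidColumn("f698fec0-dd89-11e8-b06b-\u263A")); // prints 25
    }
}
```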

regex for not matching alpha plus numeric range

I have the following regex
.{19}_.{3}PDR_.{8}(ABCD|CTNE|PFRE)006[0-9][0-9].{3}_.{6}\.POC
A match is, for example,
NRM_0157F0680884976_598PDR_T0060000ABCD00619_00_6I1N0T.POC
and I would like to negate the (ABCD|CTNE|PFRE)006[0-9][0-9]
portion such that
NRM_0157F0680884976_598PDR_T0060000ABCD00719_00_6I1N0T.POC
is a match but
NRM_0157F0680884976_598PDR_T0060000ABCD007192_00_6I1N0T.POC
or
NRM_0157F0680884976_598PDR_T0060000ABCD0061_00_6I1N0T.POC
is not (the negated part must be 9 chars long just like the non negated part for a total length of 58 chars).
Consider using the following pattern:
\b(?:ABCD|CTNE|PFRE)006[0-9][0-9]\b
Sample Java code:
String input = "Matching value is ABCD00601 but EFG123 is non matching";
Pattern r = Pattern.compile("\\b(?:ABCD|CTNE|PFRE)006[0-9][0-9]\\b");
Matcher m = r.matcher(input);
while (m.find()) {
    System.out.println("Found a match: " + m.group());
}
This prints:
Found a match: ABCD00601
I would like to propose this expression
(ABCD|CTNE|PFRE)006\d{1,2}
where \d{1,2} matches any one- or two-digit number.
That is, it accepts values such as ABCD0060~ABCD00699, CTNE0060~CTNE00699 or PFRE0060~PFRE00699.
Edit #1:
As user @Hao Wu pointed out, the regex above also accepts ABCD0060, which is not wanted.
Changing the quantifier from {1,2} to {2} restricts it to
values from ABCD00600~ABCD00699, CTNE00600~CTNE00699 or PFRE00600~PFRE00699,
so the resulting regex would be
(ABCD|CTNE|PFRE)006\d{2}
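The patterns above match the ABCD006xx-style portion rather than negating it. One possible way to express the negation asked for in the question is a negative lookahead followed by .{9}, which keeps the segment at exactly 9 characters while rejecting the listed values. This is only a sketch under that reading of the question; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class NegatedSegment {
    // Same layout as the question's regex, but the 9-character segment must
    // NOT be (ABCD|CTNE|PFRE)006 followed by two digits.
    private static final Pattern NEGATED = Pattern.compile(
            ".{19}_.{3}PDR_.{8}(?!(?:ABCD|CTNE|PFRE)006[0-9][0-9]).{9}.{3}_.{6}\\.POC");

    // matches() requires the whole name to fit, which enforces the total length of 58.
    static boolean accepts(String name) {
        return NEGATED.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Segment is ABCD00719, so the lookahead does not block it.
        System.out.println(accepts("NRM_0157F0680884976_598PDR_T0060000ABCD00719_00_6I1N0T.POC")); // true
        // Segment is ABCD00619, which is one of the negated values.
        System.out.println(accepts("NRM_0157F0680884976_598PDR_T0060000ABCD00619_00_6I1N0T.POC")); // false
    }
}
```

Because every other part of the pattern has a fixed width, names whose segment is longer or shorter than 9 characters fail the full match.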

Java Matcher Pattern issue

I am trying to extract everything that comes after the path /share/attachments/docs/. All my strings start with /share/attachments/docs/
For example: /share/attachments/docs/image2.png
The number of characters after ../docs/ is not fixed!
I tried with
Pattern p = Pattern.compile("^(.*)/share/attachments/docs/(\\d+)$");
Matcher m = p.matcher("/share/attachments/docs/image2.png");
m.find();
String link = m.group(2);
System.out.println("Link #: "+link);
But I am getting an exception: No match found.
Strange because if I use this:
Pattern p = Pattern.compile("^(.*)ABC Results for draw no (\\d+)$");
Matcher m = p.matcher("ABC Results for draw no 2888");
then it works!!!
Also, in some very rare cases my string does not start with /share/attachments/docs/, and then I should not parse anything. That is not directly related to the issue, but it would be good to handle.
I am getting Exception that: No match found.
This is because image2.png doesn't match \d+. Use a more appropriate pattern such as .+, assuming that you want to extract image2.png.
Your regular expression will then be ^(.*)/share/attachments/docs/(.+)$
In the case of ABC Results for draw no 2888, the regexp ^(.*)ABC Results for draw no (\\d+)$ works because the String ends with several successive digits, while in the first case image2.png is a mix of letters, digits and a dot, which is why no match was found.
Generally speaking, to avoid getting an IllegalStateException: No match found, you first need to check the result of find(). If it returns true, the input String matches:
if (m.find()) {
    // The String matches the pattern
    String link = m.group(2);
    System.out.println("Link #: " + link);
} else {
    System.out.println("Input value doesn't match the pattern");
}
The regular expression \d+ (expressed as \\d+ inside a string literal) matches a run of one or more digits. Your example input does not have a corresponding digit run, so it is not matched. The regex metacharacter . matches any character (+/- newline, depending on regex options); it seems like that may be what you're really after.
Additionally, when you use Matcher.find() it is unnecessary for the pattern to match the whole string, so it is needless to include .* to match leading context. Furthermore, find() returns a value that tells you whether a match to the pattern was found. You generally want to use this return value, and in your particular case you can use it to reject those rare non-matching strings.
Maybe this is more what you want:
Pattern p = Pattern.compile("/share/attachments/docs/(.+)$");
Matcher m = p.matcher("/share/attachments/docs/image2.png");
String link;
if (m.find()) {
    link = m.group(1);
    System.out.println("Link #: " + link);
} else {
    link = null;
    System.out.println("Link #: (not found)");
}
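The same idea can be packaged so that the rare strings without the prefix are rejected explicitly. The sketch below assumes a hypothetical helper, afterDocsPrefix, and uses matches() with an anchored pattern so that the prefix must be at the very start of the string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocsPath {
    // Everything after the fixed prefix is captured in group 1.
    private static final Pattern DOCS =
            Pattern.compile("^/share/attachments/docs/(.+)$");

    // Returns the part after the prefix, or empty when the path does not
    // start with the prefix (or has nothing after it).
    static Optional<String> afterDocsPrefix(String path) {
        Matcher m = DOCS.matcher(path);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(afterDocsPrefix("/share/attachments/docs/image2.png")); // Optional[image2.png]
        System.out.println(afterDocsPrefix("/somewhere/else/image2.png"));         // Optional.empty
    }
}
```

Returning an Optional forces the caller to deal with the non-matching case instead of relying on a null check.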

Searching characters with regular expressions

How do I search a string that can have a "<=", ">=" or a "="?
I've reached this point:
[<>][=]
which finds the first two.
Is there any character that, inside the [<>], matches "nothing", so that I just get the [=] that follows?
To make a pattern optional (zero or one occurrence), use the ? quantifier:
[<>]?=
In Java, you can use it with matches() to check if a string contains <=, >= or just =:
if (s.matches("(?s).*[<>]?=.*")) {...}
Or using Matcher#find():
String s = "Some = equal sign";
Pattern pattern = Pattern.compile("[<>]?=");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println("Found " + matcher.group());
} // => Found =
An alternative to @stribizhev's suggestion to use ? is to explicitly enumerate the three cases:
(<=|>=|=)
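To see which of the three operators was actually found, the matches can be collected with find(). A small sketch, assuming a hypothetical helper named findOperators and the [<>]?= pattern from the first answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComparisonOperators {
    // Optional < or > followed by a mandatory =
    private static final Pattern OPERATOR = Pattern.compile("[<>]?=");

    // Returns every "<=", ">=" or "=" in s, in order of appearance.
    static List<String> findOperators(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = OPERATOR.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findOperators("a <= b, c >= d, e = f")); // prints [<=, >=, =]
    }
}
```

A bare < or > without a following = is not reported, since the = is the only mandatory part of the pattern.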

Regular expression matching "dictionary words"

I'm a Java user but I'm new to regular expressions.
I just want to have a tiny expression that, given a word (we assume that the string is only one word), answers with a boolean, telling if the word is valid or not.
For example, I want to accept all words that could plausibly be in a dictionary. So I just want words with characters from a-z and A-Z, a hyphen (for example: man-in-the-middle) and an apostrophe (like I'll or Tiffany's).
Valid words:
"food"
"RocKet"
"man-in-the-middle"
"kahsdkjhsakdhakjsd"
"JESUS", etc.
Non-valid words:
"gipsy76"
"www.google.com"
"me@gmail.com"
"745474"
"+-x/", etc.
I use this code, but it doesn't give the correct answer:
Pattern p = Pattern.compile("[A-Za-z&-&']");
Matcher m = p.matcher(s);
System.out.println(m.matches());
What's wrong with my regex?
Add a + after the expression to say "one or more of those characters".
Escape the hyphen with \ (or put it last).
Remove those & characters.
Here's the code:
Pattern p = Pattern.compile("[A-Za-z'-]+");
Matcher m = p.matcher(s);
System.out.println(m.matches());
Complete test:
String[] ok = {"food","RocKet","man-in-the-middle","kahsdkjhsakdhakjsd","JESUS"};
String[] notOk = {"gipsy76", "www.google.com", "me@gmail.com", "745474", "+-x/"};
Pattern p = Pattern.compile("[A-Za-z'-]+");
for (String shouldMatch : ok)
    if (!p.matcher(shouldMatch).matches())
        System.out.println("Error on: " + shouldMatch);
for (String shouldNotMatch : notOk)
    if (p.matcher(shouldNotMatch).matches())
        System.out.println("Error on: " + shouldNotMatch);
(Produces no output.)
This should work:
"[A-Za-z'-]+"
But it also accepts "-word" and "word-", which are not valid words. So you can use this pattern instead (note that it no longer allows apostrophes):
WORD_EXP = "^[A-Za-z]+(-[A-Za-z]+)*$"
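The hyphen-only pattern above rejects "-word" and "word-" but also drops the apostrophe the question asked for. A possible variant, offered only as a sketch, allows either separator between runs of letters; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class WordValidator {
    // One or more letters, then any number of groups consisting of a single
    // apostrophe or hyphen followed by one or more letters.
    private static final Pattern WORD =
            Pattern.compile("[A-Za-z]+(?:['-][A-Za-z]+)*");

    // matches() checks the whole string, so no ^ or $ anchors are needed.
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWord("man-in-the-middle")); // true
        System.out.println(isWord("Tiffany's"));         // true
        System.out.println(isWord("word-"));             // false
        System.out.println(isWord("gipsy76"));           // false
    }
}
```

With this shape, a separator can never appear at the start, at the end, or twice in a row.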
Regex - /^([a-zA-Z]*('|-)?[a-zA-Z]+)*/
You can use the regex above if you don't want consecutive "'" or "-" characters. It has no end anchor, so apply it to the whole string (for example with matches()).
It accepts
man-in-the-middle
asd'asdasd'asd
It rejects the following strings
man--in--midle
asdasd''asd
Hi Aloob, please try this. It is a bit lengthy and there may be a shorter version, but still:
[A-z]*||[[A-z]*[-]*]*||[[A-z]*[-]*[']*]*
