Searching characters with regular expressions - java

How do I search a string that can have a "<=", ">=" or a "="?
I've reached this point:
[<>][=]
which matches only the first two.
Is there a character I can put inside the [<>] that matches "nothing", so that I also match the [=] on its own?

To make a pattern optional (zero or one occurrence), use the ? quantifier:
[<>]?=
In Java, you can use it with matches() to check if a string contains <=, >= or just =:
if (s.matches("(?s).*[<>]?=.*")) {...}
Or using a Matcher#find() (demo):
String s = "Some = equal sign";
Pattern pattern = Pattern.compile("[<>]?=");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println("Found " + matcher.group());
} // => Found =

An alternative to @stribizhev's suggestion to use ? is to explicitly enumerate the three cases:
(<=|>=|=)
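As a quick check that the two forms behave the same way, here is a minimal sketch that runs both patterns over one input. The class name OperatorFinder, the helper findAll, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorFinder {
    // Optional-prefix form: an optional < or > followed by =
    public static final Pattern OPTIONAL = Pattern.compile("[<>]?=");
    // Explicit alternation of the three operators
    public static final Pattern ENUMERATED = Pattern.compile("<=|>=|=");

    // Collects every match of the given pattern in the input, in order.
    public static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a <= b, c >= d, e = f"; // hypothetical sample input
        System.out.println(findAll(OPTIONAL, input));   // [<=, >=, =]
        System.out.println(findAll(ENUMERATED, input)); // [<=, >=, =]
    }
}
```

In the enumerated form the two-character operators are listed before the bare =, although here the order does not change the result because no alternative is a prefix of another.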


Match longest string in Regex OR in case of common substring

In a regex alternation (OR), when several alternatives share a common prefix, the regex matches the first alternative that succeeds rather than the longest one.
For example, for the regular expression regex = (KA|KARNATAKA) and input = KARNATAKA, the output is 2 matches: match1 = KA and match2 = KA.
What I want is the longest possible match among the alternatives, which is match1 = KARNATAKA in my example.
Here is the example in a regex client
What I am doing right now is sorting the alternatives by length in descending order.
My question is: can we specify in the regex itself that the longest possible string should be matched, or is sorting the only way to do it?
I have already referred to this question and I don't see a solution other than sorting.
You can use a word boundary (\b) to avoid matching prefixes.
For the case you mentioned, the following regex will match KA or KARNATAKA only as whole words:
(\bKA\b|\bKARNATAKA\b)
Try here
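To see the difference the boundaries make, here is a small sketch comparing the plain alternation with the word-boundary version. The class name WordBoundaryDemo and the helper findAll are hypothetical. Note that this approach only helps when the alternatives appear as whole words in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Returns all matches of regex in input, in the order they are found.
    public static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Plain alternation: the shorter alternative wins at each position.
        System.out.println(findAll("KA|KARNATAKA", "KARNATAKA"));                    // [KA, KA]
        // With word boundaries, KA can no longer match inside KARNATAKA.
        System.out.println(findAll("\\bKA\\b|\\bKARNATAKA\\b", "KARNATAKA"));        // [KARNATAKA]
        System.out.println(findAll("\\bKA\\b|\\bKARNATAKA\\b", "KA KARNATAKA"));     // [KA, KARNATAKA]
    }
}
```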
You can create a helper method for this:
public final class PatternHelper {
public static Pattern compileSortedOr(String regex) {
Matcher matcher = Pattern.compile("(.*)\\((.*\\|.*)\\)(.*)").matcher(regex);
if (matcher.matches()) {
List<String> conditions = Arrays.asList(matcher.group(2).split("\\|"));
List<String> sortedConditions = conditions.stream()
.sorted((c1, c2) -> c2.length() - c1.length())
.collect(Collectors.toList());
return Pattern.compile(matcher.group(1) +
"(" +
String.join("|", sortedConditions) +
")" +
matcher.group(3));
}
return Pattern.compile(regex);
}
}
Matcher matcher = PatternHelper.compileSortedOr("(KA|KARNATAKA)").matcher("KARNATAKA");
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
Output:
KARNATAKA
P.S. This only works for simple expressions without nested brackets. You would need to adapt it if you expect more complex expressions.
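If the alternatives are plain literal strings rather than sub-expressions, one possible variation is to build the alternation from a list instead of parsing an existing regex. The sketch below assumes exactly that; the class name LongestFirstAlternation and its methods are hypothetical, and Pattern.quote is used so that special characters in the literals are not interpreted as regex syntax.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LongestFirstAlternation {
    // Builds (alt1|alt2|...) from literal strings, longest first,
    // so that a longer alternative is tried before any of its prefixes.
    public static Pattern compile(List<String> literals) {
        String body = literals.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote) // treat each alternative as a literal
                .collect(Collectors.joining("|"));
        return Pattern.compile("(" + body + ")");
    }

    // Returns the first match found in input, or null if there is none.
    public static String firstMatch(List<String> literals, String input) {
        Matcher m = compile(literals).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(Arrays.asList("KA", "KARNATAKA"), "KARNATAKA")); // KARNATAKA
    }
}
```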

Get a substring from string multiple times

I have a String of unknown length that may contain any characters.
I want to search the string and get every substring found inside "".
I tried to use Pattern.compile, but it always returns an empty string:
Pattern p = Pattern.compile("\".\"");
Matcher m = p.matcher(mystring);
while(m.find()){
System.out.println(m.group().toString());
}
How can I do it?
Use .+? to match all the characters inside the "":
Pattern p = Pattern.compile("\".+?\"");
The .+ specifies that you want one or more characters inside the quotation marks. The ? makes the quantifier reluctant, which means it matches as few characters as possible, so each quoted substring is found as a separate match.
Unit test example:
@Test
public void test() {
String test = "speak \"friend\" and \"enter\"";
Pattern p = Pattern.compile("\".+?\"");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group().toString().replace("\"", ""));
}
}
Output:
friend
enter
That is because your regex actually searches for exactly one character between " and ". If you want to match more characters, you should rewrite your regex to "\".*?\""
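A possible alternative to stripping the quotes afterwards is to put the content in a capturing group and exclude the quote character from it. This is only a sketch: the class name QuotedStrings and the method extract are hypothetical, and the pattern assumes the quoted text never contains an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // A quote, then any run of non-quote characters (captured), then a quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the text found between each pair of double quotes.
    public static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the part without the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("speak \"friend\" and \"enter\"")); // [friend, enter]
    }
}
```

Because the group uses * rather than +, an empty pair of quotes is reported as an empty string instead of being skipped.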

Java regex returns full string instead of capture

Java Code:
String imagesArrayResponse = xmlNode.getChildText("files");
Matcher m = Pattern.compile("path\":\"([^\"]*)").matcher(imagesArrayResponse);
while (m.find()) {
String path = m.group(0);
}
String:
[{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}{"path":"upload\/files\/56727570aaa08922_0.png","dir":"files","name":"56727570aaa08922_0","original_name":"56727570aaa08922_0.png"}]
m.group returns
path":"upload\/files\/56727570aaa08922_0.png"
instead of the captured value of path. Where am I wrong?
See the documentation of the group(int) method.
When called with 0, it returns the entire match. Group 1 is the first capturing group.
To avoid this trap, you can use a named group with this syntax:
"path\":\"(?<mynamegroup>[^\"]*)"
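Here is a minimal sketch of reading a named group back with Matcher.group(String), which is available since Java 7. The class name PathExtractor, the group name path, and the shortened sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathExtractor {
    // The named group "path" captures everything up to the next double quote.
    private static final Pattern PATH = Pattern.compile("path\":\"(?<path>[^\"]*)");

    // Returns the value of every "path" entry found in the input.
    public static List<String> extract(String input) {
        List<String> paths = new ArrayList<>();
        Matcher m = PATH.matcher(input);
        while (m.find()) {
            paths.add(m.group("path")); // look the group up by name, not by index
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the response string
        String response = "[{\"path\":\"upload\\/files\\/a.png\",\"dir\":\"files\"},"
                + "{\"path\":\"upload\\/files\\/b.png\",\"dir\":\"files\"}]";
        for (String path : extract(response)) {
            System.out.println(path);
        }
    }
}
```

The captured values still contain the backslashes from the raw text (upload\/files\/a.png), since the regex does not decode JSON escapes.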
javadoc:
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
m.group(1) will give you the captured value. If there is more than one capturing group (), use m.group(2), m.group(3), ...
By convention, AFAIK, in regex engines group 0 is always the whole matched string. Capturing groups start at 1.
Check out the grouping options in Matcher.
Matcher m =
Pattern.compile(
//<- (0) -> that's group(0)
// <-(1)-> that's group(1)
"path\":\"([^\"]*)").matcher(imagesArrayResponse);
Change your code to
while (m.find()) {
String path = m.group(1);
}
And you should be okay. This is also worth checking out: What is a non-capturing group? What does a question mark followed by a colon (?:) mean?

Match the second substring using regular expression

I need a regular expression that matches the second "abc" in "abcasdabchjkabc".
I attempted to write code like this:
Pattern p = Pattern.compile("(?<=abc(.*?))abc");
but it throws a java.util.regex.PatternSyntaxException:
Look-behind group does not have an obvious maximum length near index 11
(?<=abc(.*?))abc
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.group0(Pattern.java:2488)
at java.util.regex.Pattern.sequence(Pattern.java:1806)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
Please show me the right one!
You cannot use * or + in a look-behind assertion.
Why does the look-behind expression in this regex not have an "obvious maximum length"?
Regex look-behind without obvious maximum length in Java
Do you actually want to match everything in between the two abcs?
Pattern.compile("abc(.*?)abc");
Or do you just want to check that there are two abcs?
Pattern.compile("abc.*?abc");
I don't see a need for lookbehind in either case.
I guess you want something like:
java.util.regex.Pattern.compile("(?<=abc.{1,99})abc");
The first find() returns the second abc, provided that between 1 and 99 characters separate it from the first one.
A simple option is to match your pattern twice:
String input = "abcXYabcZRabc";
Pattern p = Pattern.compile("abc");
Matcher m = p.matcher(input);
m.find(); // what to do when there is no match?
m.find(); // what to do when there is only one match?
System.out.println("Second match is between " + m.start() + " and " + m.end());
Working example: http://ideone.com/uVZL3j
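One way to answer the two questions left in the comments above is to wrap the repeated find() calls in a helper that reports a missing match explicitly. This is only a sketch; the class name NthMatch, the method startOfNthMatch, and the choice of -1 as the "not found" value are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NthMatch {
    // Returns the start index of the n-th match (1-based),
    // or -1 if the input contains fewer than n matches.
    public static int startOfNthMatch(Pattern pattern, String input, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Matcher m = pattern.matcher(input);
        for (int i = 0; i < n; i++) {
            if (!m.find()) {
                return -1; // ran out of matches before reaching the n-th one
            }
        }
        return m.start();
    }

    public static void main(String[] args) {
        Pattern abc = Pattern.compile("abc");
        System.out.println(startOfNthMatch(abc, "abcasdabchjkabc", 2)); // 6
        System.out.println(startOfNthMatch(abc, "abcasd", 2));          // -1
    }
}
```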

Regular expression matching "dictionary words"

I'm a Java user but I'm new to regular expressions.
I just want a tiny expression that, given a word (we assume that the string is only one word), returns a boolean telling whether the word is valid or not.
For example, I want to accept all words that could plausibly be in a dictionary. So I just want words made of the characters a-z and A-Z, hyphens (for example: man-in-the-middle) and apostrophes (like I'll or Tiffany's).
Valid words:
"food"
"RocKet"
"man-in-the-middle"
"kahsdkjhsakdhakjsd"
"JESUS", etc.
Non-valid words:
"gipsy76"
"www.google.com"
"me@gmail.com"
"745474"
"+-x/", etc.
I use this code, but it doesn't give the correct answer:
Pattern p = Pattern.compile("[A-Za-z&-&']");
Matcher m = p.matcher(s);
System.out.println(m.matches());
What's wrong with my regex?
Add a + after the expression to say "one or more of those characters".
Escape the hyphen with \ (or put it last).
Remove those & characters.
Here's the code:
Pattern p = Pattern.compile("[A-Za-z'-]+");
Matcher m = p.matcher(s);
System.out.println(m.matches());
Complete test:
String[] ok = {"food","RocKet","man-in-the-middle","kahsdkjhsakdhakjsd","JESUS"};
String[] notOk = {"gipsy76", "www.google.com", "me@gmail.com", "745474", "+-x/"};
Pattern p = Pattern.compile("[A-Za-z'-]+");
for (String shouldMatch : ok)
if (!p.matcher(shouldMatch).matches())
System.out.println("Error on: " + shouldMatch);
for (String shouldNotMatch : notOk)
if (p.matcher(shouldNotMatch).matches())
System.out.println("Error on: " + shouldNotMatch);
(Produces no output.)
This should work:
"[A-Za-z'-]+"
But "-word" and "word-" are not valid, so you can use this pattern (note that it does not allow apostrophes):
WORD_EXP = "^[A-Za-z]+(-[A-Za-z]+)*$"
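Since the question also asks for apostrophes (I'll, Tiffany's), one possible extension of the pattern above is to allow either separator between runs of letters. The sketch below is an assumption about what "valid" should mean here: a single hyphen or apostrophe is allowed only between letters. The class name WordValidator and the method isWord are hypothetical.

```java
import java.util.regex.Pattern;

public class WordValidator {
    // One or more letters, optionally followed by groups of
    // a single hyphen or apostrophe and one or more letters.
    public static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:['-][A-Za-z]+)*");

    // matches() requires the whole string to fit the pattern, so no ^ or $ is needed.
    public static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"food", "man-in-the-middle", "Tiffany's", "-word", "word-", "man--in", "gipsy76"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWord(s));
        }
    }
}
```

With these samples, the first three are accepted and the remaining four are rejected.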
Regex - /^([a-zA-Z]*('|-)?[a-zA-Z]+)*/
You can use the regex above if you don't want consecutive "'" or "-" characters.
Used for a full match (for example with matches(), or with a trailing $ added), it behaves as follows.
It accepts
man-in-the-middle
asd'asdasd'asd
It rejects the following strings
man--in--midle
asdasd''asd
Hi Aloob, please check this one. It is a bit lengthy and there may be a shorter version, but still:
[A-z]*||[[A-z]*[-]*]*||[[A-z]*[-]*[']*]*
