Java regex matching each occurrence separately - java

I have this regex:
<a href(.*foo.bar.*)a>
For this string, it gives me only 1 match, but I need it to give 3 matches.
First RANDOM TEXT COULD BE HERE Second RANDOM TEXT COULD BE HERE Third
So each a href should be individual.
How could I accomplish this?
EDIT:
This code searches for matches:
Pattern pattern = Pattern.compile("<a href(.*foo.bar.*)a>");
Matcher matcher = pattern.matcher(body);
List<String> matches = new ArrayList<String>();
while (matcher.find()) {
matches.add(matcher.group());
}

Change to:
<a href(.*?foo\.bar.*?)a>
The ? after each .* makes the quantifier reluctant (non-greedy), so each match stops as early as possible. Literal dots should be escaped as \.

Use .*? instead of .*. The greedy quantifier matches as many characters as possible, while the reluctant quantifier matches as few characters as possible in a single find operation.
Besides, use foo\.bar if you intend to match a literal text of "foo.bar".
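To make the difference concrete, here is a small sketch that runs both patterns over the same text. The sample input is an assumption, since the markup of the string in the question is not shown, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {

    // Collects every non-overlapping match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical input: three links separated by arbitrary text.
        String body = "<a href=\"https://foo.bar/1\">First</a> RANDOM TEXT "
                + "<a href=\"https://foo.bar/2\">Second</a> RANDOM TEXT "
                + "<a href=\"https://foo.bar/3\">Third</a>";

        // Greedy: .* first runs to the end of the input and then backtracks,
        // so everything from the first "<a href" to the last "a>" is one match.
        System.out.println(findAll("<a href(.*foo.bar.*)a>", body).size()); // 1

        // Reluctant: each find() stops at the nearest "a>" after "foo.bar".
        System.out.println(findAll("<a href(.*?foo\\.bar.*?)a>", body).size()); // 3
    }
}
```

For real HTML an HTML parser is usually more robust than a regex; the sketch only illustrates the quantifier behaviour.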

I hope the code below helps:
int noOfTimefoundString = 0;
Pattern pattern = Pattern.compile("<a href=\"https://foo.bar");
Matcher matcher = pattern.matcher(body);
List<String> matches = new ArrayList<String>();
while (matcher.find()) {
matches.add(matcher.group());
noOfTimefoundString++;
}
// Print every match; a for-each loop avoids the raw Iterator type
for (String match : matches) {
System.out.println(match);
}
System.out.println("No. of times search string found = "+noOfTimefoundString);

Related

Match longest string in Regex OR in case of common substring

In a regex alternation (OR), when several alternatives share a common prefix, the regex matches the first alternative listed rather than the longest one.
For example, for the regular expression regex = (KA|KARNATAKA) and input = KARNATAKA, the output is 2 matches: match1 = KA and match2 = KA.
What I want is the longest possible match among the alternatives, which is match1 = KARNATAKA in this example.
Here is the example in a regex client
What I am doing right now is sorting the alternatives in the regex OR by length in descending order.
My question is: can we specify in the regex itself that it should match the longest possible string? Or is sorting the only way to do it?
I have already referred to this question, and I don't see a solution other than sorting.
You can use a word boundary (\b) to avoid matching prefixes.
For the case you mentioned, the following regex will only match KA or KARNATAKA as whole words:
(\bKA\b|\bKARNATAKA\b)
Try here
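As a quick check of the word-boundary idea, the sketch below compares the plain alternation with the bounded one. The class and helper names are hypothetical, and the second test input is an assumption added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationBoundaryDemo {

    // Collects every non-overlapping match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Without boundaries the first alternative wins wherever it fits.
        System.out.println(findAll("(KA|KARNATAKA)", "KARNATAKA")); // [KA, KA]

        // With boundaries, KA only matches when it stands as a whole word.
        System.out.println(findAll("(\\bKA\\b|\\bKARNATAKA\\b)", "KARNATAKA")); // [KARNATAKA]
        System.out.println(findAll("(\\bKA\\b|\\bKARNATAKA\\b)", "KA KARNATAKA")); // [KA, KARNATAKA]
    }
}
```

Note that this only helps when the alternatives appear as whole words in the input; if they can be embedded in longer tokens, sorting the alternatives by length is still needed.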
You can create a helper method for this:
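import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public final class PatternHelper {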
public static Pattern compileSortedOr(String regex) {
Matcher matcher = Pattern.compile("(.*)\\((.*\\|.*)\\)(.*)").matcher(regex);
if (matcher.matches()) {
List<String> conditions = Arrays.asList(matcher.group(2).split("\\|"));
List<String> sortedConditions = conditions.stream()
.sorted((c1, c2) -> c2.length() - c1.length())
.collect(Collectors.toList());
return Pattern.compile(matcher.group(1) +
"(" +
String.join("|", sortedConditions) +
")" +
matcher.group(3));
}
return Pattern.compile(regex);
}
}
Matcher matcher = PatternHelper.compileSortedOr("(KA|KARNATAKA)").matcher("KARNATAKA");
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
Output:
KARNATAKA
P.S. This only works for simple expressions without nested brackets. You would need to tweak it if you expect more complex expressions.

Android Java regexp pattern

I ping a host and get its standard output as the result. Below is my regex, but it does not work correctly. Where did I make a mistake?
String REGEXP ="time=(\\\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
You only need \\d+ in your regex.
A Matcher looks for the pattern it was created from and tries to find each occurrence of that pattern in the string being matched.
Use while (matcher.find()) in case of multiple occurrences, and read matcher.group(1) inside the loop.
Each () represents a capturing group.
You have too many backslashes. Assuming you want to get the number from a string like "time=32ms", then you need:
String REGEXP ="time=(\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
Explanation: The search pattern you are looking for is "\d", meaning a decimal digit; the "+" means 1 or more occurrences.
To get the "\" to the matcher, it needs to be escaped, and the escape character is also "\".
The brackets define the matching group that you want to pick out.
With "\\\\d+", the matcher sees this as "\\d+", which would match a backslash followed by one or more "d"s. The first backslash protects the second backslash, and the third protects the fourth.
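The escaping rules above can be checked with a short sketch. The sample ping output is made up for illustration (real ping output varies by platform), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PingTimeDemo {

    // Extracts every "time=<digits>ms" value from the given output.
    static List<Integer> extractTimes(String output) {
        List<Integer> times = new ArrayList<>();
        Matcher matcher = Pattern.compile("time=(\\d+)ms").matcher(output);
        while (matcher.find()) {
            times.add(Integer.parseInt(matcher.group(1)));
        }
        return times;
    }

    public static void main(String[] args) {
        // Hypothetical ping output with two replies.
        String result = "reply 1: time=32ms\nreply 2: time=41ms";
        System.out.println(extractTimes(result)); // [32, 41]

        // Four backslashes in the Java literal produce the regex \\d+,
        // which means a literal backslash followed by one or more 'd'.
        Pattern tooManyBackslashes = Pattern.compile("time=(\\\\d+)ms");
        System.out.println(tooManyBackslashes.matcher(result).find()); // false
        System.out.println(tooManyBackslashes.matcher("time=\\ddms").find()); // true
    }
}
```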

Extracting some pattern using regex

I'm trying to write a regex pattern that will match a "digit~digit~string~sentence". eg 14~742091~065M998~P E ROUX 214. I've come up with the following so far:
String regex = "\\d+~?\\d+~?\\w+~?";
How do I extract the sentence after the last ~?
Use Capturing Groups:
\d+~?\d+~?\w+~(.*)
group(1) contains the part you want.
Another solution is using String#split:
String[] splitted = myString.split("~");
String res = splitted[splitted.length - 1]; // arrays expose length as a field, not a method
Use capturing groups (), as demonstrated in this pattern: "\\d+~\\d+~\\w+~(.*)". Note that you don't need the optional quantifier ?, since every ~ separator is mandatory.
String input = "14~742091~065M998~P E ROUX 214";
Pattern pattern = Pattern.compile("\\d+~\\d+~\\w+~(.*)");
//Pattern pattern = Pattern.compile("(?:\\d+~){2}\\w+~(.*)"); (would also work)
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println(matcher.group(1));
}
Prints:
P E ROUX 214
You should use ( ) to capture the part you want;
for more details see here
.*~(.*)$
This simple regex should work for you.
See demo
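The pattern .*~(.*)$ works because the leading .* is greedy: it consumes as much as it can, so the ~ it finally settles on is the last one in the line. A minimal sketch, with a hypothetical class and method name:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastFieldDemo {

    private static final Pattern LAST_FIELD = Pattern.compile(".*~(.*)$");

    // Returns the text after the last '~', or an empty Optional if there is no '~'.
    static Optional<String> lastField(String input) {
        Matcher matcher = LAST_FIELD.matcher(input);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(lastField("14~742091~065M998~P E ROUX 214").orElse("(no match)"));
        // prints: P E ROUX 214
    }
}
```

Unlike the stricter \d+~\d+~\w+~(.*) pattern, this version does not validate the earlier fields; it only picks out whatever follows the final ~.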
Try the regex below, which assumes the sentence contains only alphanumeric characters and spaces:
^\d+~\d+~\w+~[\w\s]+

java Pattern Matching issue

I'm having trouble writing a proper regex to match a URL.
String input = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
String regex = "www.*.com"; // To match www.gmail.com URL
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while(m.find()){
}
Here I want to remove the URL www.gmail.com. However, the pattern matches up to the end of the string, so it also swallows the email address, which ends with gmail.com too.
Can someone help me get a proper regex that matches only the URL?
.* does a greedy match. You have to add ? after * to make it a reluctant match.
"www\\..*?\\.com"
Your code would be,
String s = "AAAhttp://www.gmail.comBBBBabc#gmail.com";
Pattern p = Pattern.compile("www\\..*?\\.com");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
IDEONE
String regex = "www\\..*?\\.com";
Use non-greedy repetition of the wildcard '.' and escape the dot where it is meant literally.
A negated character class is faster than .*?
Use this regex:
www\.[^.]+\.com
[^.]+ means one or more characters that are not a dot.
In Java we need to escape some characters:
// for instance
Pattern regex = Pattern.compile("www\\.[^.]+\\.com");
// etc
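Since the question asks about removing the URL, here is a short sketch that applies the negated character class with replaceAll and contrasts it with the original greedy pattern. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlRemovalDemo {

    // "www.", then one or more non-dot characters, then ".com".
    private static final Pattern URL = Pattern.compile("www\\.[^.]+\\.com");

    // Removes every substring that matches the URL pattern.
    static String removeUrls(String input) {
        return URL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "AAAhttp://www.gmail.comBBBBabc#gmail.com";

        // The original greedy pattern runs on to the last ".com" in the input.
        Matcher greedy = Pattern.compile("www.*.com").matcher(input);
        if (greedy.find()) {
            System.out.println(greedy.group()); // www.gmail.comBBBBabc#gmail.com
        }

        System.out.println(removeUrls(input)); // AAAhttp://BBBBabc#gmail.com
    }
}
```

Because [^.]+ cannot cross a dot, this pattern will not match hosts with additional labels such as www.mail.example.com; that trade-off may or may not suit your data.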

Java Regex for changing every ith index in every word of a string

I've written a regex \b\S\w(\S(?=.)) to find every third symbol in a word and replace it with '1'. Now I'm trying to use this expression but really don't know how to do it right.
Pattern pattern = Pattern.compile("\\b\\S\\w(\\S(?=.))");
Matcher matcher = pattern.matcher("lemon apple strawberry pumpkin");
while (matcher.find()) {
System.out.print(matcher.group(1) + " ");
}
So result is:
m p r m
And how can I use this to make a string like this
le1on ap1le st1awberry pu1pkin
You could use something like this:
"lemon apple strawberry pumpkin".replaceAll("(?<=\\b\\S{2})\\S", "1")
This produces your example output. The regex replaces any non-space character that is preceded by exactly two non-space characters which themselves start at a word boundary.
This means that "words" like 12345 would be changed into 12145 since 3 is matched by \\S (not space).
Edit:
Updated the regex to better cater to the revised question title, change 2 to i-1 to replace the ith letter of the word.
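The "change 2 to i-1" suggestion can be wrapped in a small helper. This is a sketch under the assumption that i is 1-based; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;

class IthCharReplacer {

    // Replaces the i-th (1-based) non-space character of every word.
    // Words shorter than i characters are left unchanged.
    static String replaceIth(String text, int i, char replacement) {
        if (i < 1) {
            throw new IllegalArgumentException("i must be at least 1");
        }
        // Lookbehind: exactly i-1 non-space characters that start at a word boundary.
        String regex = "(?<=\\b\\S{" + (i - 1) + "})\\S";
        // quoteReplacement guards against '$' or '\' being used as the replacement.
        return text.replaceAll(regex, Matcher.quoteReplacement(String.valueOf(replacement)));
    }

    public static void main(String[] args) {
        System.out.println(replaceIth("lemon apple strawberry pumpkin", 3, '1'));
        // prints: le1on ap1le st1awberry pu1pkin
    }
}
```

Java requires lookbehind patterns to have a bounded length, which the fixed {i-1} repetition satisfies.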
Another way is to use the index of each match reported by the matcher,
like this:
Pattern pattern = Pattern.compile("\\b\\S\\w(\\S(?=.))");
String string = "lemon apple strawberry pumpkin";
char[] c = string.toCharArray();
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
c[matcher.end() - 1] = '1'; // Maybe not perfect, but useful when you need the index at which the pattern matched: end() - 1 is the last character of the match
}
System.out.println(c);
