Pattern matching to match longest substring - java

I have this regex D+U.
It should match once in the string UDDDUDUU, but with Java it matches three times. It matches DDDU and DU. I am using https://regex101.com/ to check my regex, and it should only match once, on DDDU.
I am trying to solve this HackerRank challenge. I am also using the Pattern and Matcher classes because I want to practice with them.
What exactly am I doing wrong?
This is my code:
static int match(int n, String s) {
    Matcher matcher = Pattern.compile("D+U").matcher(s);
    int count = 0;
    int i = 0;
    while (matcher.find(i)) {
        count++;
        i = matcher.end() + 1;
    }
    return count;
}

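The + quantifier matches one or more of the preceding character or subexpression, so D+U matches any run of Ds followed by a U. Every such run is a separate, valid match; the regex engine does not pick the longest one for you.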
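To make this concrete, here is a small sketch (the class and method names are made up for illustration) that collects every match of D+U with a plain find() loop. It shows that DDDU and DU are both matches in UDDDUDUU, so choosing the longest one has to happen in Java code rather than in the regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListAllMatches {
    // Collects every non-overlapping match of regex in input, from left to right.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // each find() resumes at the first character after the previous match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("D+U", "UDDDUDUU")); // prints [DDDU, DU]
    }
}
```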
If you want to return the longest match you could do:
static String match(String s) {
    ArrayList<String> matches = new ArrayList<>();
    Matcher matcher = Pattern.compile("D+U").matcher(s);
    int i = 0;
    while (matcher.find(i)) {
        matches.add(matcher.group());
        i = matcher.end();
    }
    // Collections.max throws NoSuchElementException on an empty list
    if (matches.isEmpty()) {
        return null;
    }
    return Collections.max(matches, Comparator.comparingInt(String::length));
}
With the test case UDDDUDUU, this returns DDDU. Also note that I removed the parameter n, as you never used it.

Related

how to find the number of a specific pattern appeared in an other pattern using regular expression?

I'm trying to solve a problem with a regular expression where I need to count how many times one pattern occurs in another, but I have a problem with overlapping (interleaved) occurrences.
For example, with 1010 and 10101010 the answer should be 3, but my code gives 2:
int count = 0;
Pattern expression = Pattern.compile(s);
Matcher matcher = expression.matcher(scanner.next());
while (matcher.find()) {
    count++;
}
System.out.println(count);
Once a match is found, find() continues searching from the end of that match, so overlapping occurrences are skipped, and it gets complicated to find all of them your way.
Here is a recursive solution I've modified from an old homework project. It is probably not optimized, but it works.
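public static void main(String[] args) {
    String str1 = "1010";
    String str2 = "10101010";
    int len = str2.length();
    System.out.println(match(str1, str2, len, len, 0));
}

// Counts (possibly overlapping) occurrences of str1 in str2: after each match,
// the search restarts one character after the position where that match began.
// j bounds the recursion depth; len is passed along but not otherwise used.
static int match(String str1, String str2, int len, int j, int times) {
    if (j == 0)
        return times;
    Pattern p = Pattern.compile(str1);
    Matcher m = p.matcher(str2);
    int n = 0;
    if (m.find()) {
        n = m.start() + 1; // drop everything up to one character past the match start
        times++;
    }
    return match(str1, str2.substring(n, str2.length()), str2.length(), --j, times);
}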
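A non-recursive alternative (a different technique from the answer above) is a zero-width lookahead: (?=X) matches an empty string at every position where X starts, so matches never consume each other's characters and overlapping occurrences are all counted in a single find() loop. This sketch assumes the first string should be treated literally, hence Pattern.quote; drop the quote if it is meant as a regex. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingCount {
    // Counts occurrences of needle in haystack, including overlapping ones.
    // Assumes needle is a literal string (Pattern.quote escapes regex metacharacters).
    static int countOverlapping(String needle, String haystack) {
        // The lookahead matches zero characters at each position where needle starts;
        // after an empty match, find() advances by one position before searching again.
        Pattern p = Pattern.compile("(?=" + Pattern.quote(needle) + ")");
        Matcher m = p.matcher(haystack);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("1010", "10101010")); // prints 3
    }
}
```

Note that an empty needle would match at every position, so you may want to reject it before calling this.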

Finding the longest "number sequence" in a string using only a single regex

I want to find a single regex which matches the longest numerical string in a URL.
E.g. for the URL http://stackoverflow.com/1234/questions/123456789/ask, I would like it to return 123456789.
I thought I could use ([\d]+).
However, this returns the first match from the left, not the longest.
Any ideas? :)
This regex will be used as an input to a strategy pattern, which extracts certain characteristics from URLs:
public static String parse(String url, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Matcher m = pattern.matcher(url);
    if (m.find()) {
        return m.group(1);
    }
    return null;
}
So it would be much tidier if I could use a single regex. :(
Don't use regex. Just iterate over the characters:
// str is the input string, e.g. the URL
String longest = "";
int i = 0;
while (i < str.length()) {
    // skip non-digits
    while (i < str.length() && !Character.isDigit(str.charAt(i))) {
        ++i;
    }
    int start = i;
    // consume a run of digits
    while (i < str.length() && Character.isDigit(str.charAt(i))) {
        ++i;
    }
    if (i - start > longest.length()) {
        longest = str.substring(start, i);
    }
}
@Andy already gave a non-regex answer, which is probably faster, but if you want to use a regex you must, as @Jan points out, add some logic, e.g.:
public String findLongestNumber(String input) {
    String longestMatch = "";
    int maxLength = 0;
    Matcher m = Pattern.compile("([\\d]+)").matcher(input);
    while (m.find()) {
        String currentMatch = m.group();
        int currentLength = currentMatch.length();
        if (currentLength > maxLength) {
            maxLength = currentLength;
            longestMatch = currentMatch;
        }
    }
    return longestMatch;
}
This is not possible with a pure regex; however, I would do it this way (using a stream max and a regex):
String url = "http://stackoverflow.com/1234/questions/123456789/ask";
Pattern biggest = Pattern.compile("/(\\d+)/");
Matcher m = biggest.matcher(url);
List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group(1));
}
System.out.println(matches.stream().max(Comparator.comparingInt(String::length)).get());
This prints: 123456789
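On Java 9 or later, the collect-then-max step can be folded into one stream using Matcher.results(), which yields one MatchResult per find(). This is a sketch with an illustrative class and method name; unlike the /(\\d+)/ pattern above, it uses \\d+ so it also finds digit runs that are not delimited by slashes, and it returns null instead of throwing when there are no digits.

```java
import java.util.Comparator;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class LongestNumber {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the longest run of digits in input, or null if there is none.
    // Requires Java 9+ for Matcher.results().
    static String longestNumber(String input) {
        return DIGITS.matcher(input)
                .results()                                    // Stream<MatchResult>
                .map(MatchResult::group)                      // matched text
                .max(Comparator.comparingInt(String::length)) // longest by length
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(longestNumber("http://stackoverflow.com/1234/questions/123456789/ask"));
    }
}
```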

Get element starting with letter from List

I have a list and I want to get the position of the string which starts with a specific letter.
I am trying this code, but it isn't working.
List<String> sp = Arrays.asList(splited);
int i2 = sp.indexOf("^w.*$");
The indexOf method doesn't accept a regex pattern. Instead you could do a method like this:
public static int indexOfPattern(List<String> list, String regex) {
    Pattern pattern = Pattern.compile(regex);
    for (int i = 0; i < list.size(); i++) {
        String s = list.get(i);
        if (s != null && pattern.matcher(s).matches()) {
            return i;
        }
    }
    return -1;
}
Then you can simply write:
int i2 = indexOfPattern(sp, "^w.*$");
indexOf doesn't accept a regex; you should iterate over the list and use Pattern and Matcher to achieve that:
Pattern pattern = Pattern.compile("^w.*$");
Matcher matcher = pattern.matcher(str); // str is one string from the list
while (matcher.find()) {
    System.out.print(matcher.start());
}
Maybe I misunderstood your question. If you want to find the index in the list of the first string that begins with "w", then my answer is irrelevant. You should iterate over the list, check whether each string startsWith that letter, and then return its index.
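For a fixed prefix, the startsWith approach from the last paragraph needs no regex at all. Here is a sketch of it with Java 8 streams; the method name and sample list are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class IndexOfPrefix {
    // Returns the index of the first non-null element starting with prefix, or -1.
    static int indexOfFirstStartingWith(List<String> list, String prefix) {
        return IntStream.range(0, list.size())
                .filter(i -> list.get(i) != null && list.get(i).startsWith(prefix))
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        List<String> sp = Arrays.asList("hello", "big", "world", "wide");
        System.out.println(indexOfFirstStartingWith(sp, "w")); // prints 2
    }
}
```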

How to Insert Commas Into a Number WITHIN a String of Other Words

I have a String like the following:
"The answer is 1000"
I want to insert commas into the number 1000 without destroying the rest of the String.
NOTE: I also want to use this for other Strings of differing lengths, so substring(int index) would not be advised for getting the number.
The best way that I can think of is to use a regex command, but I have no idea how.
Thanks in advance!
The following formats all the non-decimal numbers:
public String formatNumbers(String input) {
    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher(input);
    NumberFormat nf = NumberFormat.getInstance();
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        String g = m.group();
        m.appendReplacement(sb, nf.format(Double.parseDouble(g)));
    }
    return m.appendTail(sb).toString();
}
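For example, if you call formatNumbers("The answer is 1000 1000000")
the result is "The answer is 1,000 1,000,000" (with an English default locale, since NumberFormat.getInstance() uses the default locale's grouping separator).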
See: NumberFormat and Matcher.appendReplacement().
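On Java 9 or later, the appendReplacement/appendTail loop can be replaced by Matcher.replaceAll(Function<MatchResult, String>). This sketch also makes two assumptions explicit: it fixes Locale.US so the separator is always a comma, and it parses with BigInteger instead of Double.parseDouble so very long digit runs keep every digit. As with the original, leading zeros are dropped (007 becomes 7). The class name is illustrative.

```java
import java.math.BigInteger;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormatNumbers {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Adds grouping separators to every run of digits; requires Java 9+.
    static String formatNumbers(String input) {
        NumberFormat nf = NumberFormat.getIntegerInstance(Locale.US);
        return DIGITS.matcher(input).replaceAll(
                // quoteReplacement guards against '$' or '\' in the formatted text
                mr -> Matcher.quoteReplacement(nf.format(new BigInteger(mr.group()))));
    }

    public static void main(String[] args) {
        System.out.println(formatNumbers("The answer is 1000 1000000"));
    }
}
```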
Modified from the answers to "Most efficient way to extract all the (natural) numbers from a string":
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Example {
    private static final String REGEX = "\\d+";

    public static void main(String[] args) {
        String input = "dog dog 1342 dog doggie 2321 dogg";
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(input); // get a matcher object
        int end = 0;
        StringBuilder result = new StringBuilder();
        while (m.find()) {
            // copy the text between the previous number and this one
            result.append(input, end, m.start());
            // copy this number with commas added
            result.append(addCommas(m.group()));
            end = m.end();
        }
        // copy whatever follows the last number
        result.append(input.substring(end));
        System.out.println(result); // dog dog 1,342 dog doggie 2,321 dogg
    }

    private static String addCommas(String s) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // comma before position i when the remaining digits (s.length() - i)
            // are a multiple of 3, but never before the first digit (no ",300")
            if (i > 0 && s.length() % 3 == i % 3)
                result.append(',');
            result.append(s.charAt(i));
        }
        return result.toString();
    }
}
You could use the regular expression:
[0-9]+
to find contiguous runs of digits, so it would match 1000, 7500, 22387234, etc. You can test this on http://regexpal.com/. Note that this doesn't handle numbers with decimal points.
This isn't a complete answer with code, but the basic algorithm is as follows:
1. Use that pattern to find the index(es) of the match(es) within the string (the index of the character where each match starts).
2. From each of those indexes, copy the digits into a temporary string that contains only the digits of that number.
3. Write a function that starts at the end of that string and, for every 3rd digit (counting from the end), inserts a comma before it, unless the index of the current character is 0 (which prevents 300 from being turned into ,300).
4. Replace the original number in the source string with the comma-separated string, using the replace() method.
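For comparison, the same result can be reached without an explicit loop by a single replaceAll with lookarounds. This is a different technique from the steps above, shown as a sketch: the pattern matches the empty position that is preceded by a digit and followed by a multiple of three digits up to the end of the digit run, and inserts a comma there. Like [0-9]+, it does not understand decimals (3.14159 would become 3.14,159). The class name is illustrative.

```java
public class CommaRegex {
    // (?<=\d)                 a digit immediately before this position
    // (?=(?:\d{3})+(?!\d))    followed by one or more groups of exactly three digits,
    //                         after which no further digit follows
    static String addCommas(String input) {
        return input.replaceAll("(?<=\\d)(?=(?:\\d{3})+(?!\\d))", ",");
    }

    public static void main(String[] args) {
        System.out.println(addCommas("The answer is 1000 1000000"));
    }
}
```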

How can I count the number of matches for a regex?

Let's say I have a string which contains this:
HelloxxxHelloxxxHello
I compile a pattern to look for 'Hello'
Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");
It should find three matches. How can I get a count of how many matches there were?
I've tried various loops and using the matcher.groupCount() but it didn't work.
matcher.find() does not find all matches, only the next match.
Solution for Java 9+
long matches = matcher.results().count();
Solution for Java 8 and older
You'll have to do the following. (Starting from Java 9, there is a nicer solution)
int count = 0;
while (matcher.find())
    count++;
By the way, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.
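A quick sketch (class and method names are illustrative) that puts the two side by side: groupCount() depends only on how many capturing groups the pattern declares, while the find() loop counts matches in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Number of capturing groups declared in the pattern (independent of any input).
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Number of non-overlapping matches actually found in the input.
    static int matches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        System.out.println(groups("Hello"));          // prints 0
        System.out.println(groups("(Hel)(lo)"));      // prints 2
        System.out.println(matches("Hello", hello));  // prints 3
    }
}
```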
Complete example:
import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);
        int count = 0;
        while (matcher.find())
            count++;
        System.out.println(count); // prints 3
    }
}
Handling overlapping matches
When counting matches of aa in aaaa, the above snippet will give you 2:
aaaa
aa
  aa
To get 3 matches, i.e. this behavior:
aaaa
aa
 aa
  aa
You have to search for a match at index <start of last match> + 1 as follows:
String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);
int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}
System.out.println(count); // prints 3
This should work for matches that might overlap:
public static void main(String[] args) {
    String input = "aaaaaaaa";
    String regex = "aa";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    int from = 0;
    int count = 0;
    while (matcher.find(from)) {
        count++;
        from = matcher.start() + 1;
    }
    System.out.println(count);
}
From Java 9, you can use the stream provided by Matcher.results()
long matches = matcher.results().count();
If you want to use Java 8 streams and are allergic to while loops, you could try this:
public static int countPattern(String references, Pattern referencePattern) {
    Matcher matcher = referencePattern.matcher(references);
    return Stream.iterate(0, i -> i + 1)
            .filter(i -> !matcher.find())
            .findFirst()
            .get();
}
Disclaimer: this only works for disjoint matches.
Example:
public static void main(String[] args) {
    Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
    System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
    System.out.println(countPattern("[ ]", referencePattern));
}
This prints out:
2
0
1
0
This is a stream-based solution; because each search restarts one character after the start of the previous match, it also counts overlapping matches:
public static int countPattern(String references, Pattern referencePattern) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            new Iterator<Integer>() {
                Matcher matcher = referencePattern.matcher(references);
                int from = 0;

                @Override
                public boolean hasNext() {
                    return matcher.find(from);
                }

                @Override
                public Integer next() {
                    from = matcher.start() + 1;
                    return 1;
                }
            },
            Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}
Use the code below to count the number of matches that the regex finds in your input:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL); // "regex" is your predefined regex
Matcher m = p.matcher(input); // "input" is the string to search
int count = 0;
while (m.find())
    count++;
This is generic code rather than a specific solution; tailor it to suit your needs.
Please feel free to correct me if there are any mistakes.
