java regex multiple patterns sequential matching

I have a specific question to which I couldn't find an answer online. I would like to run a pattern-matching operation on a text with multiple patterns. However, I don't want the matcher to give me all the results at once; instead, each pattern should be tried at a different stage of the loop, with a specific operation performed at each stage. For instance, given Pattern1, Pattern2, and Pattern3, I would like something like:
if (Pattern 1 = true) {
delete Pattern1;
} else if (Pattern 2 = true) {
delete Pattern2;
} else if (Pattern 3 = true) {
replace with 'something';
} .....and so on
(This is just an illustration of the loop, so the syntax is probably not correct.)
My question is then: how can I compile different patterns, while calling them separately?
(I've only seen multiple patterns compiled together and searched together with the help of AND/OR and so on, which unfortunately is not what I'm looking for.) Could I save the patterns in an array and call each of them in my loop?

Prepare your Pattern objects pattern1, pattern2, pattern3 and store them in any container (an array or a list). Then loop over this container, calling the Matcher object's usePattern(Pattern newPattern) method on each iteration.
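A minimal sketch of that loop, assuming three made-up patterns and a hypothetical class name SequentialPatterns. One detail worth knowing: usePattern keeps the matcher's current position in the input, so the sketch calls reset() before each search to start again from the beginning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialPatterns {
    // Illustrative patterns only; substitute your own.
    private static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("\\d+"),
            Pattern.compile("[a-z]+"),
            Pattern.compile("[A-Z]+"));

    /** Returns the first match of each pattern, in pattern order. */
    static List<String> firstMatches(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = PATTERNS.get(0).matcher(text);
        for (Pattern p : PATTERNS) {
            m.usePattern(p); // swap the pattern, keep the same Matcher
            m.reset();       // usePattern does not rewind the input
            if (m.find()) {
                results.add(m.group());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(firstMatches("abc 123 XYZ"));
    }
}
```

Inside the loop you can branch on which pattern is active and delete or replace the match instead of collecting it.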

You can make a common interface, and make anonymous implementations that use patterns or whatever else you may want to transform your strings:
interface StringProcessor {
String process(String source);
}
StringProcessor[] processors = new StringProcessor[] {
new StringProcessor() {
private final Pattern p = Pattern.compile("[0-9]+");
public String process(String source) {
String res = source;
if (p.matcher(source).find()) {
res = ... // delete
}
return res;
}
}
, new StringProcessor() {
private final Pattern p = Pattern.compile("[a-z]+");
public String process(String source) {
String res = source;
if (p.matcher(source).find()) {
res = ... // replace
}
return res;
}
}
, new StringProcessor() {
private final Pattern p = Pattern.compile("[%^##]{2,5}");
public String process(String source) {
String res = source;
if (p.matcher(source).find()) {
res = ... // do whatever else
}
return res;
}
}
};
String res = "My starting string 123 and more 456";
for (StringProcessor p : processors) {
res = p.process(res);
}
Note that implementations of StringProcessor.process do not need to use regular expressions at all. The loop at the bottom has no idea the regexp is involved in obtaining the results.
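On Java 8 or later the same idea can be written more compactly with lambdas. The following is only a sketch: the class name ProcessorChain, the two patterns, and the choice of operations (delete digits, collapse runs of whitespace, trim) are assumptions made for illustration, not part of the answer above.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

class ProcessorChain {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");
    private static final Pattern SPACES = Pattern.compile("\\s{2,}");

    // Each step takes the current string and returns the transformed one.
    static final List<UnaryOperator<String>> STEPS = List.of(
            s -> DIGITS.matcher(s).replaceAll(""),  // delete digit runs
            s -> SPACES.matcher(s).replaceAll(" "), // collapse whitespace
            String::trim);                          // a step with no regex at all

    static String run(String source) {
        String res = source;
        for (UnaryOperator<String> step : STEPS) {
            res = step.apply(res);
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(run("My starting string 123 and more 456"));
    }
}
```

As with the interface version, the loop in run does not know or care whether a step uses a regular expression.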

Related

Java console input handling

This is my first question here, I hope it's not too based on opinions. I've searched on the internet for quite a while now, but couldn't find a similar question.
I need to write a Java program that reads commands from the console, validates the input, gets the parameters and passes them on to a different class.
There are some restrictions on what I can do and use (university).
Only the packages java.util, java.lang and java.io are allowed
Each method can only be 80 lines long
Each line can only be 120 characters long
I am not allowed to use System.exit / Runtime.exit
The Terminal class is used to handle user input. Terminal.readLine() will read a line from the console, like Scanner.nextLine()
I have a fully working program - however my solution will not be accepted because of the way I handle console inputs (runInteractionLoop() method too long). I'm doing it like this:
The main class has the main method and an "interaction loop" where console inputs are handled. The main method calls the interaction loop in a while loop, with a boolean "quit" as a guardian.
private static boolean quit = false;
...
public static void main(String[] args) {
...
while (quit == false) {
runInteractionLoop();
}
}
The interaction loop handles console input. I need to check for 16 different commands - each with their own types of parameters. I chose to work with Patterns and Matchers, because I can use the groups for convenience. Now the problems start - I have never learned how to correctly handle user inputs. What I have done here is, for each possible command, create a new Matcher, see if the input matches, if it does then do whatever needs to be done for this input.
private static void runInteractionLoop() {
Matcher m;
String query = Terminal.readLine();
m = Pattern.compile("sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSth(Integer.parseInt(m.group(1)), ......);
...
return;
}
m = Pattern.compile("record ([a-z]+) (-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSthElse(m.group(1), Double.parseDouble(m.group(2)));
return;
}
...
if (query.equals("quit")) {
quit = true;
return;
}
Terminal.printError("invalid input");
}
As you can see, doing this 16 times stretches out the method to more than 80 lines (5 lines per input max). It's also obviously very inefficient and to be honest, I'm quite ashamed to be posting this here (crap code). I just don't know how to do this correctly, using only java.util and having some way to quickly get the parameters (e.g. the Matcher groups here).
Any ideas? I would be very grateful for suggestions. Thanks.
EDIT/UPDATE:
I have made the decision to split the verification into two methods - one for each half of the commands. Looks ugly, but passes the Uni's checkstyle requirements. However, I'd still be more than happy if someone shows me a better solution to my problem - for the future (because I obviously have no idea how to make this prettier, shorter and/or more efficient).

I guess you could try something painful like this where you separate everything into a chain of method calls:
private static void runInteractionLoop() {
Matcher m;
String query = Terminal.readLine();
m = Pattern.compile("sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSth(Integer.parseInt(m.group(1)), ......);
...
return;
} else {
tryDouble(query, m);
}
}
private static void tryDouble(String query, Matcher m) {
m = Pattern.compile("record ([a-z]+) (-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSthElse(m.group(1), Double.parseDouble(m.group(2)));
return;
} else {
trySomethingElse(query, m);
}
}
private static void trySomethingElse(String query, Matcher m) {
...
if (query.equals("quit")) {
quit = true;
return;
}
Terminal.printError("invalid input");
}

I would solve this with an abstract class CommandValidator:
public abstract class CommandValidator {
/* getter and setter */
public Matcher resolveMatcher(String query) {
return Pattern.compile(getCommand()).matcher(query);
}
public abstract String getCommand();
public abstract void doSth();
}
and would implement 16 different CommandValidators for each handler and implement the abstract methods differently:
public class IntegerCommandValidator extends CommandValidator {
@Override
public String getCommand() {
return "sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)";
}
@Override
public void doSth() {
/* magic here, parameter input the matcher and xyz, or have it defined as field at the class */
// xyz.doSth(Integer.parseInt(m.group(1)), ......);
}
}
Since you need the matcher in your CommandValidator, you might store it as a field of the class, or just pass it to the doSth() method.
Then you can instantiate each concrete validator in a list, iterate over the validators, resolve the matcher and check whether it matches:
private static final List<CommandValidator> allConcreteValidators = new ArrayList<>();
public static void main(String[] args) {
/* */
allConcreteValidators.add(new IntegerCommandValidator());
/* */
while (quit == false) {
runInteractionLoop();
}
}
private static void runInteractionLoop() {
String query = Terminal.readLine();
for (CommandValidator validator : allConcreteValidators) {
if (validator.resolveMatcher(query).matches()) {
validator.doSth();
return; // stop at the first command that matches
}
}
}
Of course, you could first look up whether any validator fits at all, and handle the case where none is defined for the input.
This might be a bit over-engineered for your exercise. If the concrete validators share the same doSth logic, you could instead pass the command regex into their constructor.
You should also find better names for these classes, because they do more than validate.
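If one class per command feels heavy, a lighter variant of the same idea is a table that maps each precompiled pattern to a handler. This is only a sketch under assumptions: CommandDispatcher, register and dispatch are hypothetical names, the handlers just append to a log instead of calling xyz, and it relies on java.util.function (Java 8+), which you would need to check against your assignment's package rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandDispatcher {
    // Insertion-ordered, so commands are tried in registration order.
    private final Map<Pattern, Consumer<Matcher>> handlers = new LinkedHashMap<>();
    final List<String> log = new ArrayList<>();

    CommandDispatcher() {
        // Placeholder handlers; real ones would call into your own classes.
        register("record ([a-z]+) (-?\\d+(?:\\.\\d+)?)",
                m -> log.add("record " + m.group(1) + "=" + Double.parseDouble(m.group(2))));
        register("quit", m -> log.add("quit"));
    }

    /** Compiles the regex once and stores it with its handler. */
    void register(String regex, Consumer<Matcher> action) {
        handlers.put(Pattern.compile(regex), action);
    }

    /** Runs the first handler whose pattern matches the whole query. */
    boolean dispatch(String query) {
        for (Map.Entry<Pattern, Consumer<Matcher>> e : handlers.entrySet()) {
            Matcher m = e.getKey().matcher(query);
            if (m.matches()) {
                e.getValue().accept(m);
                return true;
            }
        }
        return false; // caller can report "invalid input"
    }

    public static void main(String[] args) {
        CommandDispatcher d = new CommandDispatcher();
        System.out.println(d.dispatch("record temp 2.5") + " " + d.log);
        System.out.println(d.dispatch("bogus"));
    }
}
```

With this layout the interaction loop stays a few lines long no matter how many commands are registered, and each pattern is compiled only once.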

You can boil down each possibility to two lines (or three if there must be a closing bracket on a separate line) by delegating the match work to a submethod:
Matcher m;
if ((m = matches(query, "sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)")) != null)
xyz.doSth(Integer.parseInt(m.group(1)), ......);
else if ((m = matches(query, "record ([a-z]+) (-?\\d+(?:\\.\\d+)?)")) != null)
xyz.doSthElse(m.group(1), Double.parseDouble(m.group(2)));
...
else
Terminal.printError("invalid input");
// Returns the matcher if the whole input matches, otherwise null.
private static Matcher matches(String input, String regexp)
{
Matcher result = Pattern.compile(regexp).matcher(input);
if ( result.matches() )
return result;
else
return null;
}

Match a string against multiple regex patterns

I have an input string.
I am thinking how to match this string against more than one regular expression effectively.
Example Input: ABCD
I'd like to match against these reg-ex patterns, and return true if at least one of them matches:
[a-zA-Z]{3}
^[^\\d].*
([\\w&&[^b]])*
I am not sure how to match against multiple patterns at once. Can some one tell me how do we do it effectively?

If you have just a few regexes, and they are all known at compile time, then this can be enough:
private static final Pattern
rx1 = Pattern.compile("..."),
rx2 = Pattern.compile("..."),
...;
return rx1.matcher(s).matches() || rx2.matcher(s).matches() || ...;
If there are more of them, or they are loaded at runtime, then use a list of patterns:
final List<Pattern> rxs = new ArrayList<>();
for (Pattern rx : rxs) if (rx.matcher(input).matches()) return true;
return false;

you can make one large regex out of the individual ones:
[a-zA-Z]{3}|^[^\\d].*|([\\w&&[^b]])*
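A sketch of how the combined pattern could be built in code rather than by hand, assuming Java 9+ for List.of. The class and method names (AnyOf, anyOf, matchesAny) are made up for this example. Each alternative is wrapped in a non-capturing group so that the pieces cannot run into each other.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AnyOf {
    /** Joins the regexes with '|', wrapping each in a non-capturing group. */
    static Pattern anyOf(List<String> regexes) {
        String combined = regexes.stream()
                .map(r -> "(?:" + r + ")")
                .collect(Collectors.joining("|"));
        return Pattern.compile(combined);
    }

    // The three patterns from the question, compiled once.
    static final Pattern COMBINED = anyOf(List.of(
            "[a-zA-Z]{3}", "^[^\\d].*", "([\\w&&[^b]])*"));

    /** True if the whole input matches at least one alternative. */
    static boolean matchesAny(String input) {
        return COMBINED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("ABCD"));
        System.out.println(matchesAny("1b"));
    }
}
```

Keep in mind that capturing groups inside the individual regexes are renumbered once they are joined, so this suits a yes/no check better than group extraction.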

To avoid recreating instances of the Pattern and Matcher classes, you can create one of each and reuse them. To reuse a Matcher, use its reset(newInput) method.
Warning: This approach is not thread-safe. Use it only when you can guarantee that only one thread at a time will call this method; otherwise create a separate Matcher instance for each method call.
Here is one possible example:
private static Matcher m1 = Pattern.compile("regex1").matcher("");
private static Matcher m2 = Pattern.compile("regex2").matcher("");
private static Matcher m3 = Pattern.compile("regex3").matcher("");
public boolean matchesAtLeastOneRegex(String input) {
return m1.reset(input).matches()
|| m2.reset(input).matches()
|| m3.reset(input).matches();
}

As explained in "Running multiple regex patterns on String", it is better to concatenate the regexes into one large regex and then run the matcher only once. This is a large improvement if you reuse the regex often.

I'm not sure what effectively means, but if it's about performance and you want to check a lot of strings, I'd go for this
...
static Pattern p1 = Pattern.compile("[a-zA-Z]{3}");
static Pattern p2 = Pattern.compile("^[^\\d].*");
static Pattern p3 = Pattern.compile("([\\w&&[^b]])*");
public static boolean test(String s){
return p1.matcher(s).matches() ? true:
p2.matcher(s).matches() ? true:
p3.matcher(s).matches();
}
I'm not sure how it will affect performance, but combining them all in one regexp with | could also help.

Here's an alternative.
Note that one thing this doesn't do is return them in a specific order. But one could do that by sorting by m.start() for example.
private static HashMap<String, String> regs = new HashMap<String, String>();
...
regs.put("COMMA", ",");
regs.put("ID", "[a-z][a-zA-Z0-9]*");
regs.put("SEMI", ";");
regs.put("GETS", ":=");
regs.put("DOT", "\\.");
for (HashMap.Entry<String, String> entry : regs.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
Matcher m = Pattern.compile(value).matcher("program var a, b, c; begin a := 0; end.");
boolean f = m.find();
while(f)
{
System.out.println(key);
System.out.print(m.group() + " ");
System.out.print(m.start() + " ");
System.out.println(m.end());
f = m.find();
}
}
}

Option to ignore case with .contains method?

Is there an option to ignore case with .contains() method?
I have an ArrayList of DVD object. Each DVD object has a few elements, one of them is a title. And I have a method that searches for a specific title. It works, but I'd like it to be case insensitive.

If you're using Java 8
List<String> list = new ArrayList<>();
boolean containsSearchStr = list.stream().anyMatch("search_value"::equalsIgnoreCase);

I'm guessing you mean ignoring case when searching in a string?
I don't know of any, but you could convert the string you are searching in to lower or upper case and then search.
// s is the String to search in, and seq the (lower-case) sequence you are searching for.
boolean doesContain = s.toLowerCase().contains(seq);
Edit:
As Ryan Schipper suggested, you can also (and probably would be better off) do seq.toLowerCase(), depending on your situation.

private boolean containsIgnoreCase(List<String> list, String soughtFor) {
for (String current : list) {
if (current.equalsIgnoreCase(soughtFor)) {
return true;
}
}
return false;
}

In Java 8 you can use the Stream interface:
return dvdList.stream().anyMatch(d -> d.getTitle().equalsIgnoreCase("SomeTitle"));

I know I'm a little late to the party but in Kotlin you can easily use:
fun Collection<String>.containsIgnoreCase(item: String) = any {
it.equals(item, ignoreCase = true)
}
val list = listOf("Banana")
println(list.contains("banana"))
println(list.containsIgnoreCase("BaNaNa"))

You can replace contains() with equalsIgnoreCase() using stream(), as below:
List<String> names = List.of("One","tWo", "ThrEe", "foUR", "five", "Six", "THREE");
boolean contains = names.stream().anyMatch(i -> i.equalsIgnoreCase("three"));

This probably isn't the best way for your particular problem, but you can use the String.matches(String regex) method or the matcher equivalent. We just need to construct a regular expression from your prospective title. Here it gets complex.
List<DVD> matchingDvds(String titleFragment) {
String escapedFragment = Pattern.quote(titleFragment);
// The pattern may have contained an asterisk, dollar sign, etc.
// For example, M*A*S*H, directed by Robert Altman.
Pattern pat = Pattern.compile(escapedFragment, Pattern.CASE_INSENSITIVE);
List<DVD> foundDvds = new ArrayList<>();
for (DVD dvd: catalog) {
Matcher m = pat.matcher(dvd.getTitle());
if (m.find()) {
foundDvds.add(dvd);
}
}
return foundDvds;
}
But this is inefficient, and it's being done purely in Java. You would do better to try one of these techniques:
Learn the Collator and CollationKey classes.
If you have no choice but to stay in the Java world, add a method to DVD, boolean matches(String fragment). Have the DVD tell you what it matches.
Use a database. If it supports case-insensitive collations, declare the title column of the DVD table that way. Use JDBC or Hibernate or JPA or Spring Data, whichever you choose.
If the database supports advanced text search, like Oracle, use that.
Back in the Java world, use Apache Lucene and possibly Apache Solr.
Use a language tuned for case-insensitive matches.
If you can wait until Java 8, use lambda expressions. You can avoid the Pattern and Matcher class that I used above by building the regex this way:
String escapedFragment = Pattern.quote(titleFragment);
String fragmentAnywhereInString = ".*" + escapedFragment + ".*";
String caseInsensitiveFragment = "(?i)" + fragmentAnywhereInString;
// and in the loop, use:
if(dvd.getTitle().matches(caseInsensitiveFragment)) {
foundDvds.add(dvd);
}
But this compiles the pattern too many times. What about lower-casing everything?
if (dvd.getTitle().toLowerCase().contains(titleFragment.toLowerCase()))
Congratulations; you've just discovered the Turkish problem. Unless you state the locale in toLowerCase, Java finds the current locale. And the lower-casing is slow because it has to take into account the Turkish dotless i and dotted I. At least you have no patterns and no matchers.
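To make the locale issue concrete, here is a small sketch. The class name LocaleSafeContains is hypothetical; the point is only that passing an explicit locale such as Locale.ROOT keeps the result independent of the machine's default locale.

```java
import java.util.Locale;

class LocaleSafeContains {
    /** Case-insensitive contains that does not depend on the default locale. */
    static boolean containsIgnoreCase(String text, String fragment) {
        return text.toLowerCase(Locale.ROOT)
                .contains(fragment.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Under Turkish rules, 'I' lower-cases to dotless 'ı', not 'i',
        // so this comparison fails on a machine with a Turkish default locale.
        System.out.println("TITANIC".toLowerCase(turkish).equals("titanic"));
        System.out.println(containsIgnoreCase("TITANIC", "tan"));
    }
}
```

This still allocates a lower-cased copy per call, so it addresses correctness rather than speed.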

You can't guarantee that you're always going to get String objects back, or that the object you're working with in the List implements a way to ignore case.
If you do want to compare Strings in a collection to something independent of case, you'd want to iterate over the collection and compare them without case.
String word = "Some word";
List<String> aList = new ArrayList<>(); // presume that the list is populated
for(String item : aList) {
if(word.equalsIgnoreCase(item)) {
// operation upon successful match
}
}

Kotlin Devs, go with any / none
private fun compareCategory(
categories: List<String>?,
category: String
) = categories?.any { it.equals(category, true) } ?: false

The intuitive solution to transform both operands to lower case (or upper case) has the effect of instantiating an extra String object for each item which is not efficient for large collections. Also, regular expressions are an order of magnitude slower than simple characters comparison.
String.regionMatches() allows to compare two String regions in a case-insensitive way. Using it, it's possible to write an efficient version of a case-insensitive "contains" method. The following method is what I use; it's based on code from Apache commons-lang:
public static boolean containsIgnoreCase(final String str, final String searchStr) {
if (str == null || searchStr == null) {
return false;
}
final int len = searchStr.length();
final int max = str.length() - len;
for (int i = 0; i <= max; i++) {
if (str.regionMatches(true, i, searchStr, 0, len)) {
return true;
}
}
return false;
}

It's very simple using the power of Kotlin's extension function, this answer may help Java and Kotlin developers.
inline fun List<String>.contains(text: String, ignoreCase: Boolean = false) = this.any { it.equals(text, ignoreCase) }
// Usage
list.contains("text", ignoreCase = true)

With a null check on the dvdList and your searchString
if (!StringUtils.isEmpty(searchString)) {
return Optional.ofNullable(dvdList)
.map(Collection::stream)
.orElse(Stream.empty())
.anyMatch(dvd -> searchString.equalsIgnoreCase(dvd.getTitle()));
}

private List<String> FindString(String stringToLookFor, List<String> arrayToSearchIn)
{
List<String> ReceptacleOfWordsFound = new ArrayList<String>();
if(!arrayToSearchIn.isEmpty())
{
for(String lCurrentString : arrayToSearchIn)
{
if(lCurrentString.toUpperCase().contains(stringToLookFor.toUpperCase()))
ReceptacleOfWordsFound.add(lCurrentString);
}
}
return ReceptacleOfWordsFound;
}

For Java 8, You can have one more solution like below
List<String> list = new ArrayList<>();
String searchTerm = "dvd";
if(String.join(",", list).toLowerCase().contains(searchTerm)) {
System.out.println("Element Present!");
}

If you are looking for "contains" rather than "equals", I would propose the solution below.
The only drawback is that a partial searchItem such as "DE" would also match.
public static final String[] LIST_OF_ELEMENTS = { "ABC", "DEF","GHI" };
String searchItem= "def";
if(String.join(",", LIST_OF_ELEMENTS).contains(searchItem.toUpperCase())) {
System.out.println("found element");
}

For Java 8+, I recommend to use following library method.
org.apache.commons.lang3.StringUtils
list.stream()
.filter(text -> StringUtils.containsIgnoreCase(text, textToSearch))

public List<DdsSpreadCoreBean> filteredByGroupName(DdsSpreadCoreBean ddsSpreadFilterBean, List<DdsSpreadCoreBean> spreadHeaderList){
List<DdsSpreadCoreBean> filteredByGroupName = new ArrayList<>();
filteredByGroupName = spreadHeaderList.stream().
filter(s->s.getGroupName()
.toLowerCase()
.contains(ddsSpreadFilterBean.getGroupName())).collect(Collectors.toList());
return filteredByGroupName;
}

Option to ignore case with .contains method?
Check the below example
boolean contains = employeeTypes.stream().anyMatch(i -> i.equalsIgnoreCase(employeeType));
I added Custom Annotation for validation in my project
@Target({ElementType.FIELD, ElementType.PARAMETER})
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Constraint(validatedBy = EmployeeTypeValidator.class)
public @interface ValidateEmployeeType {
public String message() default "Invalid employeeType: It should be either Permanent or Vendor";
Class<?>[] groups() default { };
Class<? extends Payload>[] payload() default { };
}
Validation of EmployeeType
public class EmployeeTypeValidator implements ConstraintValidator<ValidateEmployeeType, String> {
@Override
public boolean isValid(String employeeType, ConstraintValidatorContext constraintValidatorContext) {
List<String> employeeTypes = Arrays.asList("Permanent", "vendor", "contractual");
boolean contains = employeeTypes.stream().anyMatch(i -> i.equalsIgnoreCase(employeeType));
return contains;
}
}
Entity of Employee
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Employee {
private int empId;
@NotBlank(message = "firstName shouldn't be null or empty")
private String firstName;
@NotBlank(message = "lastName shouldn't be null or empty")
private String lastName;
@Past(message = "start shouldn't be before current date")
@JsonFormat(pattern = "dd-MM-yyyy")
private Date doj;
@NotNull(message = "department shouldn't be null")
@NotEmpty(message = "department shouldn't be empty")
private String dept;
@Email(message = "invalid email id")
private String email;
@ValidateEmployeeType
private String employeeType;
}
For Validation, We need Dependency in pom.xml
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-validation</artifactId>
</dependency>
Note: SNAPSHOT, M1, M2, M3, and M4 releases typically WORK IN PROGRESS. The Spring team is still working on them, Recommend NOT using them.

You can apply a little trick here:
change all the strings to the same case, either upper or lower.
For C# Code:
List searchResults = sl.FindAll(s => s.ToUpper().Contains(seachKeyword.ToUpper()));
For Java Code:
import java.util.*;
class Test
{
public static void main(String[] args)
{
String itemCheck="check";
ArrayList<String> listItem =new ArrayList<String>();
listItem.add("Check");
listItem.add("check");
listItem.add("CHeck");
listItem.add("Make");
listItem.add("CHecK");
for(String item :listItem)
{
if(item.toUpperCase().equals(itemCheck.toUpperCase()))
{
System.out.println(item);
}
}
}
}

Highest performance for finding substrings

I have an array of strings (keywords), and I need to check how many of those strings exist within a larger string (text read from a file). The check must be case-insensitive.
At this moment what I do is this:
private void findKeywords() {
String body = email.getMessage();
for (String word : keywords) {
if (body.toLowerCase().contains(word.toLowerCase())) {
//some actions
}
if (email.getSubject().contains(word)) {
//some actions
}
}
}
From reading questions in here another solution came up:
private void findKeywords() {
String body = email.getMessage();
for (String word : keywords) {
boolean body_match = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE).matcher(body).find();
boolean subject_match = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE).matcher(email.getSubject()).find();
if (body_match) {
rating++;
}
if (subject_match) {
rating++;
}
}
}
Which of these solutions is more efficient? Also is there another way to do this that is better? Any accepted solutions must be simple to implement(on par with the above) and preferably without external libraries as this is not very important issue in this case.

Both of the solutions seem viable to me. One improvement I would suggest is moving functions out of the loop. In your current code you are repeatedly doing actions such as toLowerCase() and Pattern.compile which you only need to do once.
Obviously there are much faster methods to solve this problem, but they require much more complex code than these 5-liners.
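A sketch of what moving the work out of the loop could look like for the toLowerCase variant. The method signature and the class name HoistedKeywordCheck are assumptions; the question's email object is replaced by plain subject and body strings so the example is self-contained.

```java
import java.util.List;
import java.util.Locale;

class HoistedKeywordCheck {
    /** Counts keyword hits in subject and body, ignoring case. */
    static int rate(List<String> keywords, String subject, String body) {
        // Lower-case the large strings once, not once per keyword.
        String bodyLc = body.toLowerCase(Locale.ROOT);
        String subjectLc = subject.toLowerCase(Locale.ROOT);
        int rating = 0;
        for (String word : keywords) {
            String wordLc = word.toLowerCase(Locale.ROOT);
            if (bodyLc.contains(wordLc)) {
                rating++;
            }
            if (subjectLc.contains(wordLc)) {
                rating++;
            }
        }
        return rating;
    }

    public static void main(String[] args) {
        System.out.println(rate(List.of("Java", "regex"),
                "java question", "Using REGEX in Java"));
    }
}
```

If the keyword list is fixed, the keywords could also be lower-cased once up front and stored that way.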

Better: build a single pattern with all keywords. Then search on that pattern. Assuming your keywords do not contain meta-characters (characters with special meanings in patterns), then use:
StringBuilder keywordRegex = new StringBuilder();
for (String w : keywords) {
keywordRegex.append("|"+w);
}
Pattern p = Pattern.compile(keywordRegex.substring(1));
Matcher m = p.matcher(textToMatch);
while (m.find()) {
// match is at m.start(); word is m.group(0);
}
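This avoids iterating over the keywords yourself: the pattern is compiled once and the text is scanned in a single pass. Note that java.util.regex is a backtracking engine, so at each position the alternatives are still tried one after another; whether this beats a plain loop depends on the number of keywords and is worth measuring.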
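If the keywords may contain meta-characters, or the search has to ignore case as in the question, the combined pattern can be built with Pattern.quote and Pattern.CASE_INSENSITIVE. The following is a sketch with hypothetical names (KeywordScanner, findKeywords), not a drop-in replacement for the code above.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordScanner {
    /** Returns the distinct keywords found in the text, lower-cased and sorted. */
    static Set<String> findKeywords(Collection<String> keywords, String text) {
        Set<String> found = new TreeSet<>();
        if (keywords.isEmpty()) {
            return found; // an empty alternation would match everywhere
        }
        // Quote each keyword so characters like '+' or '.' are taken literally.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern p = Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group().toLowerCase(Locale.ROOT));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findKeywords(List.of("invoice", "C++"),
                "Your INVOICE for c++ training"));
    }
}
```

One caveat: when one keyword is a prefix of another (say "car" and "carpet"), the alternative listed first wins and the match is consumed, so the longer keyword may never be reported.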

I think the explicit regex solution you mentioned would be more efficient since it doesn't have the toLowerCase operation, which would copy the input string in memory and make chars lowercase.
Both solutions should be practical and your question is mostly academic, but I think the regexes provide cleaner code.

If your email bodies are very large, writing a specialized case-insensitive contains may be justified, because you can avoid calling toUpperCase() on big strings:
static boolean containsIgnoreCase(String big, String small) {
if (small == null || big == null || small.length() > big.length()) {
return false;
}
String smallLC = small.toLowerCase();
String smallUC = small.toUpperCase();
for (int i = 0; i < big.length(); ++i) {
if (matchesAt(big, i, smallLC, smallUC)) {
return true;
}
}
return false;
}
private static boolean matchesAt(String big, int index, String lc, String uc) {
if (index + lc.length() > big.length()) {
return false;
}
for (int i = 0; i < lc.length(); ++i) {
char c = big.charAt(i + index);
if ((c != lc.charAt(i)) && (c != uc.charAt(i))) {
return false;
}
}
return true;
}

avoid code duplication

consider the following code:
if (matcher1.find()) {
String str = line.substring(matcher1.start()+7,matcher1.end()-1);
/*+7 and -1 indicate the prefix and suffix of the matcher... */
method1(str);
}
if (matcher2.find()) {
String str = line.substring(matcher2.start()+8,matcher2.end()-1);
method2(str);
}
...
I have n matchers, all matchers are independent (if one is true, it says nothing about the others...), for each matcher which is true - I am invoking a different method on the content it matched.
question: I do not like the code duplication nor the "magic numbers" in here, but I'm wondering if there is better way to do it...? (maybe Visitor Pattern?) any suggestions?

Create an abstract class and define the offsets in each subclass (along with the string processing, depending on your requirements).
Then put the instances in a list and process the list.
Here is a sample abstract processor:
public abstract class AbstractProcessor {
public void find(Pattern pattern, String line) {
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
process(line.substring(matcher.start() + getStartOffset(), matcher.end() - getEndOffset()));
}
}
protected abstract int getStartOffset();
protected abstract int getEndOffset();
protected abstract void process(String str);
}

Simply mark the part of the regex that you want to pass to the method with a capturing group.
For example, if your regex is foo.*bar and you are not interested in foo or bar, make the regex foo(.*)bar. Then always grab group 1 from the Matcher.
Your code would then look like this:
method1(matcher1.group(1));
method2(matcher2.group(1));
...
One further step would be to replace your methods with classes implementing an interface like this:
public interface MatchingMethod {
String getRegex();
void apply(String result);
}
Then you can easily automate the task:
for (MatchingMethod mm : getAllMatchingMethods()) {
Pattern p = Pattern.compile(mm.getRegex());
Matcher m = p.matcher(input);
while (m.find()) {
mm.apply(m.group(1));
}
}
Note that if performance is important, then pre-compiling the Pattern can improve runtime if you apply this to many inputs.
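One way to get the pre-compilation is to store the compiled Pattern next to its action instead of the regex string. This is a sketch only: Rule, applyAll and demo are hypothetical names, and the two rules are invented to show the flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledRules {
    /** Pairs a pattern, compiled once, with the action to run on group 1. */
    static final class Rule {
        final Pattern pattern;
        final Consumer<String> action;

        Rule(String regex, Consumer<String> action) {
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    /** Applies every rule to the input, once per match found. */
    static void applyAll(List<Rule> rules, String input) {
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(input);
            while (m.find()) {
                rule.action.accept(m.group(1));
            }
        }
    }

    /** Demo with two made-up rules that record what they captured. */
    static List<String> demo(String input) {
        List<String> seen = new ArrayList<>();
        List<Rule> rules = List.of(
                new Rule("name=(\\w+)", s -> seen.add("name:" + s)),
                new Rule("id=(\\d+)", s -> seen.add("id:" + s)));
        applyAll(rules, input);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo("name=alice id=42 id=7"));
    }
}
```

In real code the rule list would typically be built once (for example as a static final field) and reused for every input line.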

You could make it a little bit shorter, but the question is whether this is really worth the effort:
private String getStringFromMatcher(Matcher matcher, int magicNumber) {
return line.substring(matcher.start() + magicNumber, matcher.end() - 1);
}
if (matcher1.find()) {
method1(getStringFromMatcher(matcher1, 7));
}
if (matcher2.find()) {
method2(getStringFromMatcher(matcher2, 8));
}

use Cochard's solution combined with a factory (switch statement) with all the methodX methods. so you can call it like this:
Factory.CallMethodX(myEnum.MethodX, str)
you can assign the myEnum.MethodX in the population step of Cochard's solution
