Java regex - Determine which capture group was matched and count occurrences - java

Suppose that I want to build a very large regex with capture groups at run time, based on the user's decisions.
Simple example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
static boolean findTag, findWordA, findOtherWord, findWordX;
static final String TAG = "(<[^>]+>)";
static final String WORD_A = "(wordA)";
static final String OTHER_WORD = "(anotherword)";
static final String WORD_X = "(wordX)";
static int tagCount = 0;
static int wordACount = 0;
static int otherWordCount = 0;
static int wordXCount = 0;
public static void main(String[] args) {
// Boolean options that will be supplied by the user
// make them all true in this example
findTag = true;
findWordA = true;
findOtherWord = true;
findWordX = true;
String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
StringBuilder regex = new StringBuilder();
if (findTag)
regex.append(TAG + "|");
if (findWordA)
regex.append(WORD_A + "|");
if (findOtherWord)
regex.append(OTHER_WORD + "|");
if (findWordX)
regex.append(WORD_X + "|");
if (regex.length() > 0) {
regex.setLength(regex.length() - 1);
Pattern pattern = Pattern.compile(regex.toString());
System.out.println("\nWHOLE REGEX: " + regex.toString());
System.out.println("\nINPUT STRING: " + input);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
// only way I know of to find out which group was matched:
if (matcher.group(1) != null) tagCount++;
if (matcher.group(2) != null) wordACount++;
if (matcher.group(3) != null) otherWordCount++;
if (matcher.group(4) != null) wordXCount++;
}
System.out.println();
System.out.println("Group1 matches: " + tagCount);
System.out.println("Group2 matches: " + wordACount);
System.out.println("Group3 matches: " + otherWordCount);
System.out.println("Group4 matches: " + wordXCount);
} else {
System.out.println("No regex to build.");
}
}
}
The problem is that I can count each group's matches only when I know beforehand which regexes/groups the user wants to find.
Note that the full regex will contain a lot more capture groups and they will be more complex.
How can I determine which capture group was matched so that I can count each group's occurrences, without knowing beforehand which groups the user wants to find?

Construct the regex to use named groups:
(?<tag>wordA)|(?<wordx>wordX)|(?<anotherword>anotherword)
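As a minimal sketch of how named groups let the counting loop work without hard-coded group numbers: the class name NamedGroupCounter, the method countMatches, and the map of enabled options are hypothetical names chosen for illustration. It assumes the caller puts only the user-selected sub-patterns into the map and that each key is a valid Java group name (ASCII letters and digits, starting with a letter).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupCounter {

    // Counts, per named group, how many matches came from that alternative.
    // 'enabled' maps a group name to its sub-pattern (hypothetical structure).
    public static Map<String, Integer> countMatches(Map<String, String> enabled, String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringJoiner regex = new StringJoiner("|");
        for (Map.Entry<String, String> e : enabled.entrySet()) {
            // Wrap every sub-pattern in its own named group: (?<name>pattern)
            regex.add("(?<" + e.getKey() + ">" + e.getValue() + ")");
            counts.put(e.getKey(), 0);
        }
        if (enabled.isEmpty()) {
            return counts; // no regex to build
        }
        Matcher matcher = Pattern.compile(regex.toString()).matcher(input);
        while (matcher.find()) {
            for (String name : enabled.keySet()) {
                if (matcher.group(name) != null) {
                    counts.merge(name, 1, Integer::sum);
                    break; // only one top-level alternative takes part in a match
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> enabled = new LinkedHashMap<>();
        enabled.put("tag", "<[^>]+>");
        enabled.put("wordA", "wordA");
        enabled.put("anotherword", "anotherword");
        enabled.put("wordX", "wordX");
        String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
        System.out.println(countMatches(enabled, input));
    }
}
```

Because the same map drives both the regex construction and the counting, adding or removing an option needs no change to the matching loop.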

Related

I have a masked credit card number from which I need to find the card type

I am passing creditCardNumber as 4242***4242, which is masked.
How can I get the card type based on the masked credit card number?
String regVisa = "^4[0-9]{2}(?:[0-9]{3})?$";
String reVisa = "(?:4[0-9]{12}(?:[0-9]{3})?$)";
String regMaster = "^5[1-5][0-9]{14}$";
String regExpress = "^3[47][0-9]{13}$";
String regDiners = "^3(?:0[0-5]|[68][0-9])[0-9]{11}$";
String regDiscover = "^6(?:011|5[0-9]{2})[0-9]{12}$";
String regJCB= "^(?:2131|1800|35\\d{3})\\d{11}$";
if(creditCardNumber.matches(regVisa))
return "visa";
if (creditCardNumber.matches(regMaster))
return "mastercard";
if (creditCardNumber.matches(regExpress))
return "amex";
if (creditCardNumber.matches(regDiners))
return "DINERS";
if (creditCardNumber.matches(regDiscover))
return "discover";
if (creditCardNumber.matches(regJCB))
return "jcb";
if (creditCardNumber.matches(reVisa))
return "VISA";
return "invalid";
Maybe you have to design a "masked" expression for each card type, such as:
^4[0-9]{2}[0-9]\\*{3}[0-9]{4}$
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class re{
public static void main(String[] args){
final String regex = "^4[0-9]{2}[0-9]\\*{3}[0-9]{4}$";
final String string = "4242***4242\n"
+ "5242***4242";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
Output
Full match: 4242***4242
If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.
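To extend the idea to the other card types, here is a rough sketch that keeps only the leading-digit rules from the question's full-length patterns. It assumes every masked number has the shape of the example (four leading digits, three asterisks, four trailing digits); the class name MaskedCardType, the method cardType, and the reduced prefix table are illustrative assumptions, not a complete issuer list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class MaskedCardType {

    // Assumed masked shape: 4 leading digits, then "***", then 4 trailing digits.
    private static final String MASKED_TAIL = "\\*{3}[0-9]{4}$";

    // Prefix rules taken from the start of the question's patterns (simplified).
    private static final Map<String, Pattern> TYPES = new LinkedHashMap<>();
    static {
        TYPES.put("visa", Pattern.compile("^4[0-9]{3}" + MASKED_TAIL));
        TYPES.put("mastercard", Pattern.compile("^5[1-5][0-9]{2}" + MASKED_TAIL));
        TYPES.put("amex", Pattern.compile("^3[47][0-9]{2}" + MASKED_TAIL));
        TYPES.put("discover", Pattern.compile("^6(?:011|5[0-9]{2})" + MASKED_TAIL));
    }

    // Returns the first card type whose masked pattern matches, or "invalid".
    public static String cardType(String masked) {
        for (Map.Entry<String, Pattern> e : TYPES.entrySet()) {
            if (e.getValue().matcher(masked).matches()) {
                return e.getKey();
            }
        }
        return "invalid";
    }

    public static void main(String[] args) {
        System.out.println(cardType("4242***4242")); // visa
        System.out.println(cardType("5242***4242")); // mastercard
        System.out.println(cardType("1234***4242")); // invalid
    }
}
```

Since the mask hides the middle digits, only the prefix can be checked; the total length of the real card number cannot be validated this way.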

How to get a certain word out of a string

I need help finding a certain word in a given string. I've made a string and used a for loop to get the word I want, but it doesn't seem to be working. I only want to get the 2019 out of the string.
public void wStart() throws Exception {
String folder = "file/print/system/2019/12 - December";
String[] folderSplit = folder.split("/");
for (int i=3; i < folderSplit.length; i++) {
String folderResult = folderSplit[i];
System.out.println(folderResult);
}
}
If we only wish to get the year from a string that contains no other four-digit number, we can simply use this expression:
(\d{4})
Or we can add additional boundaries, such as:
\/(\d{4})\/
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class YearExtractor { // class name chosen for illustration
    public static void main(String[] args) {
        final String regex = "(\\d{4})";
        final String string = "file/print/system/2019/12 - December";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            // Print every capture group of the current match
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
If the year is always the second-to-last path element, then just access that element:
String folder = "file/print/system/2019/12 - December";
String[] parts = folder.split("/");
String year = parts[parts.length-2];
If instead the year could be any path element, then we can try fishing it out:
String year = folder.replaceAll(".*\\b(\\d{4})\\b.*", "$1");
You can try this method, in which I just added a statement that compares each loop value with the target string. Note that strings must be compared with equals(), not ==:
public static void wStart() {
    String folder = "file/print/system/2019/12 - December";
    String[] folderSplit = folder.split("/");
    for (int i = 3; i < folderSplit.length; i++) {
        String folderResult = folderSplit[i];
        // equals() compares contents; == would compare object references
        if ("2019".equals(folderResult)) {
            System.out.println("Expected " + folderResult);
        } else {
            System.out.println("Not expected " + folderResult);
        }
    }
}

Java regex in capturing groups

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main(String args[]) {
// String to be scanned to find the pattern.
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
} else {
System.out.println("NO MATCH");
}
}
}
The output
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
I would like to understand how this regular expression code works in Java.
I think that you want to extract a number from the given string.
Pattern pattern = Pattern.compile("(?<prefix>\\D*)(?<number>\\d+)(?<suffix>\\D*)");
Matcher matcher = pattern.matcher("This order was placed for QT3000! OK?");
if (matcher.matches()) {
System.out.println("Prefix: " + matcher.group("prefix")); // Prefix: This order was placed for QT
System.out.println("Number: " + matcher.group("number")); // Number: 3000
System.out.println("Suffix: " + matcher.group("suffix")); // Suffix: ! OK?
} else
System.out.println("NO MATCH");
If you want to check whether the whole string matches the regular expression, use Matcher.matches().
if (matcher.matches()) {
// string matches with regular expression
} else {
// string does not match with regular expression
}
If you want to find multiple matches, use Matcher.find() in a loop.
while (matcher.find()) {
// next match found
}
Demo at www.regex101.com
You can use the Scanner class to parse the integers inside of a string of text. I also added utility methods to grow and fit an array.
import java.util.*;
public class NumberExtractor {
public static void main(String[] args) {
String test = "This order was placed for QT3000! OK?";
int[] numbers = extractNumbers(test);
System.out.println(Arrays.toString(numbers)); // [3000]
}
public static int[] extractNumbers(String str) {
return extractNumbers(str, 10);
}
public static int[] extractNumbers(String str, int defaultSize) {
int count = 0;
int[] result = new int[defaultSize];
Scanner scanner = new Scanner(str);
scanner.useDelimiter("[^\\d]+"); // Number pattern
while (scanner.hasNextInt()) {
if (count == result.length) {
result = growArray(result, 1.5f);
}
result[count++] = scanner.nextInt();
}
scanner.close();
return clipArray(result, count);
}
private static int[] growArray(int[] original, float growthPercent) {
int[] copy = new int[(int) (original.length * growthPercent)];
System.arraycopy(original, 0, copy, 0, Math.min(original.length, copy.length));
return copy;
}
private static int[] clipArray(int[] original, int length) {
return clipArray(original, 0, length);
}
private static int[] clipArray(int[] original, int start, int length) {
int[] copy = new int[length];
System.arraycopy(original, start, copy, 0, length);
return copy;
}
}
First, as Aaron explained, the regex engine lets the first group match the whole input string. It then backtracks to find a part of the string that matches the second group, and a single digit is enough to satisfy it. Finally, the rest of the string is matched by the third group.
Now consider the code below, based on your sample code, with a changed pattern and one more print statement:
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{4})(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
System.out.println("Found value: " + m.group(3));
} else {
System.out.println("NO MATCH");
}
Added print statement: m.group(0) is equivalent to m.group() and returns the entire match of the pattern in the input string. The pattern also defines three capture groups, so printing all of them shows what each part of the pattern matched.
Pattern change: replacing \d+ with \d{4} confirms the explanation above of how the Java regex engine handles the original pattern. The second group now needs four digits, so the first group has to give back all of them, and the output changes to:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
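Another way to see the effect of greediness, offered here as a small sketch rather than as part of the answer above, is to make the first group reluctant with .*? so that it stops as early as possible and leaves all digits to the second group. The class name ReluctantDemo and the helper split are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {

    // Returns {prefix, number, suffix}, or an empty array when no digits are found.
    public static String[] split(String line) {
        // (.*?) is reluctant: it takes as little as possible,
        // so (\d+) can greedily take the whole run of digits.
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String part : split("This order was placed for QT3000! OK?")) {
            System.out.println("Found value: " + part);
        }
        // Found value: This order was placed for QT
        // Found value: 3000
        // Found value: ! OK?
    }
}
```

Unlike \d{4}, this variant does not depend on the number having exactly four digits.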

Split a string using regex when there are no delimiters, using Java

I have a string eg : DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67
Now I want to split the string and create a map as
DIGITAL SPORTS $8.95
HD AO $9.95
UCC REC $1.28
RENTAL FEE $7.00
LOCAL FRANCHISE $4.67
I wrote a regular expression to split the string. Please find the code below:
private static String ledgerString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
private static Pattern pattern1 = Pattern.compile("([[a-zA-Z ]*\\$[0-9]*.[0-9][0-9]]*)");
private static Matcher matcher = null;
public static void main(String[] args) {
// TODO Auto-generated method stub
matcher = pattern1.matcher(ledgerString.trim());
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Could someone please help me extract the data from the above string?
Your pattern in group 1 is wrapped in a character class [...], which is probably not what you were trying to do. The dot should also be escaped so that it matches only a literal decimal point. Maybe change your pattern to
Pattern.compile("([a-zA-Z ]*)(\\$[0-9]*\\.[0-9][0-9]*)");
and use it like this
while (matcher.find()) {
System.out.println(matcher.group(1)+" "+matcher.group(2));
}
Also, since Java 7 you can name groups with (?<name>...), so this is also possible:
Pattern.compile("(?<name>[a-zA-Z ]*)(?<price>\\$[0-9]*\\.[0-9][0-9]*)");
while (matcher.find()) {
System.out.println(matcher.group("name")+" "+matcher.group("price"));
}
Output
DIGITAL SPORTS $8.95
HD AO $9.95
UCC REC $1.28
RENTAL FEE $7.00
LOCAL FRANCHISE $4.67
private static String ledgerString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
private static Pattern pattern1 = Pattern.compile("([a-zA-Z ]+)(\\$[0-9]*\\.[0-9][0-9])");
private static Matcher matcher = null;
public static void main(String[] args) {
// TODO Auto-generated method stub
matcher = pattern1.matcher(ledgerString.trim());
while (matcher.find()) {
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
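Since the question asks for a map, the matching loop can put each pair into a LinkedHashMap instead of printing it. This is only a sketch: the class name LedgerParser and the method parse are hypothetical, and it assumes that descriptions contain only letters and spaces and that every price has exactly two decimal places (otherwise the end of one price and the start of the next entry would be ambiguous).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LedgerParser {

    // One entry = a description (letters and spaces) followed by a price like $8.95.
    private static final Pattern ENTRY =
            Pattern.compile("(?<name>[a-zA-Z ]+)(?<price>\\$[0-9]+\\.[0-9]{2})");

    // Returns description -> price, in the order the entries appear in the ledger.
    public static Map<String, String> parse(String ledger) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(ledger.trim());
        while (m.find()) {
            result.put(m.group("name").trim(), m.group("price"));
        }
        return result;
    }

    public static void main(String[] args) {
        String ledger = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
        parse(ledger).forEach((name, price) -> System.out.println(name + " " + price));
    }
}
```

A LinkedHashMap keeps the original order of the entries; if two entries share a description, the later price overwrites the earlier one.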
Try this:
The Regex: (?:(.+?)(\$\d*(?:\.\d+)?))
String regex = "(?:(.+?)(\\$\\d*(?:\\.\\d+)?))";
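Maybe you could replace every "$" symbol with ",$" (comma dollar) and then split on "," (comma). Use String.replace rather than replaceAll here, because replaceAll treats "$" as a regex anchor in the pattern and as a group reference in the replacement. Do something like:
ledgerString = ledgerString.replace("$", ",$");
String[] tokens = ledgerString.split(",");
Note that every token after the first still has a price fused with the next description (for example "$8.95HD AO"), so the tokens need further splitting.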
The regex you want to use is one that matches each entry you are interested in. Therefore you want to use
Pattern.compile("([a-zA-Z ]+\\$[0-9]+\\.[0-9][0-9])");
as this identifies each 'line' you're interested in. You can then use split("\\$") on each line to separate the description from the price; the dollar sign must be escaped because split takes a regex.
Here is yet another way of doing that:
String mainString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
String[] splittedArray = mainString.split("[0-9][A-Z]");
int currentLength = 0;
for(int i =0; i < splittedArray.length; i++) {
String splitedString;
if(i == 0) {
char endChar = mainString.charAt(splittedArray[i].length());
splitedString = splittedArray[i] + endChar;
currentLength += splittedArray[i].length();
}
else if(i == splittedArray.length -1){
char beginChar = mainString.charAt(currentLength + 1);
splitedString = beginChar + splittedArray[i];
}
else {
char beginChar = mainString.charAt(currentLength + 1);
char endChar = mainString.charAt(currentLength+splittedArray[i].length()+2);
splitedString = beginChar + splittedArray[i] + endChar;
currentLength += splittedArray[i].length()+2;
}
System.out.println(splitedString);
}

Problem with formatting a regular expression

I am trying to format this regular expression into a String pattern
(^(234\d{7,12})$)|(\b(234\d{7,12}\b\s*,\s*)(\b(234\d{7,12})\b)*)
This is an accurate regex (it has been validated as such on regexpal.com).
But when I try it in Java, it shows errors. Even if I escape it using //, it still doesn't give the right logic. How can I solve this?
package MCast;
import java.util.StringTokenizer;
import java.util.regex.*;
import javax.swing.JOptionPane;
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
/**
*
* @author nnanna
*/
public class Verify {
private static final String STOP = "STOP";
private static final String VALID = "Valid Java Identifier";
private static final String INVALID = " Not Valid Number Format: Must be of the form 23400000023";
private static final String VALID_IDENTIFIER_PATTERN = "(^(234\\d{7,12})$)|(\\b(234\\d{7,12}\\b\\s*,\\s*)(\\b(234\\d{7,12})\\b)*)";
// private static final String VALID_IDENTIFIER_PATTERN2 = "[[2-3][2-3][3-4][0-9]*[ ][2-3][2-3][3-4][0-9]*]*";//[,][2[0-9]]{11}]*";
static String str;
boolean reply;
public Verify() {
}
public int countNo(String stringToCount) {
int j = stringToCount.length();
int count = 0;
for (int i = 0; i < j; i++) {
if (stringToCount.charAt(i) == ',') {
count += 1;
}
}
// System.out.println(count);
return count + 1;
}
public boolean pattern(String str){
Matcher match;
Pattern pattern = Pattern.compile(VALID_IDENTIFIER_PATTERN);
match = pattern.matcher(str);
if (match.matches()) {
reply = true;
JOptionPane.showMessageDialog(null, str + ":\n" + reply + "\n" +countNo(str));
} else {
reply = false;
JOptionPane.showMessageDialog(null, str + ":\n" + reply + "\n");
}
return reply;
}
public static void main(String args[]){
Verify a = new Verify();
String test1 = "23439869450";
String test2 = "23439869450,23439869450";
String test3 = "23439869450,23439869450,23439869450";
String test4 = "23439869450,23439869450,23439869450,23439869450,23439869450,23439869450";
String test5 = "07039869450,23439869450,23439869450,23439869450,23439869450,23439869450";
// a.pattern(test1);
// System.out.println(a.countNo(test1));
a.pattern(test3);
System.out.println(a.countNo(test2));
System.out.println(a.pattern(test1));
System.out.println(a.pattern(test2));
System.out.println(a.pattern(test3));
System.out.println(a.pattern(test4));
System.out.println(a.pattern(test4));
//
// a.pattern(null);
// System.out.println(a.countNo(test1));
}
}
You need to double the backslashes. And your regex isn't doing what you think it is. Use this:
Pattern regex = Pattern.compile(
"^\n" +
"234\\d{7,12}\\s*,\\s*234\\d{7,12} # match a pair\n" +
"(?:\\s*,\\s*234\\d{7,12}\\s*,\\s*234\\d{7,12})* # optionally match more pairs\n" +
"$",
Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
foundMatch = regexMatcher.matches();
This allows pairs of numbers that start with 234 and are 10 to 15 digits long. All numbers must be comma-separated.
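If the goal is instead to accept one or more comma-separated numbers, as the test strings in the question suggest, a different pattern is needed. The sketch below is an assumption about that intent and deliberately differs from the pair-based pattern above; the class name NumberListValidator and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

public class NumberListValidator {

    // One number starting with 234 (10 to 15 digits in total),
    // optionally followed by more such numbers, each preceded by a comma.
    private static final Pattern LIST =
            Pattern.compile("234\\d{7,12}(?:\\s*,\\s*234\\d{7,12})*");

    public static boolean isValid(String input) {
        // matches() requires the whole input to fit, so ^ and $ are not needed.
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("23439869450"));                         // true
        System.out.println(isValid("23439869450,23439869450,23439869450")); // true
        System.out.println(isValid("07039869450,23439869450"));             // false
    }
}
```

Note that every backslash of the regex is doubled inside the Java string literal, which is the escaping the question was missing.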
Don't you mean escaping the \ with \\ and not //?
For example: \\d
