Java regex in capturing groups - java

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main(String args[]) {
// String to be scanned to find the pattern.
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
} else {
System.out.println("NO MATCH");
}
}
}
The output
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
I would like to understand how this regular expression code works in Java, and why it produces this output.

I think that you want to extract a number from the given string.
Pattern pattern = Pattern.compile("(?<prefix>\\D*)(?<number>\\d+)(?<suffix>\\D*)");
Matcher matcher = pattern.matcher("This order was placed for QT3000! OK?");
if (matcher.matches()) {
System.out.println("Prefix: " + matcher.group("prefix")); // Prefix: This order was placed for QT
System.out.println("Number: " + matcher.group("number")); // Number: 3000
System.out.println("Suffix: " + matcher.group("suffix")); // Suffix: ! OK?
} else
System.out.println("NO MATCH");
If you want to check that the whole string matches the regular expression, use Matcher.matches().
if(matcher.matches()) {
// string matches with regular expression
} else {
// string does not match with regular expression
}
If you want to find multiple matches, use Matcher.find() in a loop.
while (matcher.find()) {
// next match found
}
Demo at www.regex101.com
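The difference between matches() and find() described above can be sketched as follows. This is a minimal illustration, not part of the original answer; the class name MatchesVersusFind and its two helper methods are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to contrast find() and matches().
class MatchesVersusFind {
    // Collects every run of digits; each call to find() moves on to the next match.
    static List<String> findAllNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    // matches() succeeds only when the pattern covers the entire input.
    static boolean isOnlyDigits(String input) {
        return Pattern.compile("\\d+").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAllNumbers("QT3000 and AB42")); // [3000, 42]
        System.out.println(isOnlyDigits("QT3000"));            // false
        System.out.println(isOnlyDigits("3000"));              // true
    }
}
```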

You can use the Scanner class to parse the integers inside of a string of text. I also added utility methods to grow and fit an array.
import java.util.*;
public class NumberExtractor {
public static void main(String[] args) {
String test = "This order was placed for QT3000! OK?";
int[] numbers = extractNumbers(test);
System.out.println(Arrays.toString(numbers)); // [3000]
}
public static int[] extractNumbers(String str) {
return extractNumbers(str, 10);
}
public static int[] extractNumbers(String str, int defaultSize) {
int count = 0;
int[] result = new int[defaultSize];
Scanner scanner = new Scanner(str);
scanner.useDelimiter("[^\\d]+"); // Number pattern
while (scanner.hasNextInt()) {
if (count == result.length) {
result = growArray(result, 1.5f);
}
result[count++] = scanner.nextInt();
}
scanner.close();
return clipArray(result, count);
}
private static int[] growArray(int[] original, float growthPercent) {
int[] copy = new int[(int) (original.length * growthPercent)];
System.arraycopy(original, 0, copy, 0, Math.min(original.length, copy.length));
return copy;
}
private static int[] clipArray(int[] original, int length) {
return clipArray(original, 0, length);
}
private static int[] clipArray(int[] original, int start, int length) {
int[] copy = new int[length];
System.arraycopy(original, start, copy, 0, length);
return copy;
}
}

First, as Aaron explained, the regex engine lets the first group (.*) match the entire input string, because .* is greedy. It then backtracks, giving characters back one at a time, until the rest of the pattern can match; a single digit is enough to satisfy the second group (\d+). Finally, the rest of the string is matched by the third group.
Now consider the code below, based on your sample code, with a changed pattern and one additional print statement:
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{4})(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
System.out.println("Found value: " + m.group(3));
} else {
System.out.println("NO MATCH");
}
Added print statement: m.group(0) is equivalent to m.group() and returns the entire text matched by the pattern. The pattern also defines three capturing groups, with indexes 1 to 3. Printing all of them helps show what happens when the pattern is applied to the string.
Pattern change: the new pattern confirms the explanation above of how the Java regex engine handles the original pattern. Because the second group now requires exactly four digits, the first group has to backtrack until all four digits are available, and the output changes to:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
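Another way to see the effect of greedy matching is to make the first group reluctant with .*?, so it takes as few characters as possible and leaves all the digits to the second group. This is a sketch that swaps in a different quantifier from the one used in the answer above; the class name ReluctantFirstGroup and the split helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: same input as above, but with a reluctant first group.
class ReluctantFirstGroup {
    // Returns groups 1 to 3, or null when the pattern does not match.
    static String[] split(String line) {
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("This order was placed for QT3000! OK?");
        System.out.println("Group 1: " + parts[0]); // Group 1: This order was placed for QT
        System.out.println("Group 2: " + parts[1]); // Group 2: 3000
        System.out.println("Group 3: " + parts[2]); // Group 3: ! OK?
    }
}
```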

Related

Java regex - Determine which capture group was matched and count occurences

Suppose that I want to build a very large regex with capture groups on run-time based on user's decisions.
Simple example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
static boolean findTag, findWordA, findOtherWord, findWordX;
static final String TAG = "(<[^>]+>)";
static final String WORD_A = "(wordA)";
static final String OTHER_WORD = "(anotherword)";
static final String WORD_X = "(wordX)";
static int tagCount = 0;
static int wordACount = 0;
static int otherWordCount = 0;
static int wordXCount = 0;
public static void main(String[] args) {
// Boolean options that will be supplied by the user
// make them all true in this example
findTag = true;
findWordA = true;
findOtherWord = true;
findWordX = true;
String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
StringBuilder regex = new StringBuilder();
if (findTag)
regex.append(TAG + "|");
if (findWordA)
regex.append(WORD_A + "|");
if (findOtherWord)
regex.append(OTHER_WORD + "|");
if (findWordX)
regex.append(WORD_X + "|");
if (regex.length() > 0) {
regex.setLength(regex.length() - 1);
Pattern pattern = Pattern.compile(regex.toString());
System.out.println("\nWHOLE REGEX: " + regex.toString());
System.out.println("\nINPUT STRING: " + input);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
// only way I know of to find out which group was matched:
if (matcher.group(1) != null) tagCount++;
if (matcher.group(2) != null) wordACount++;
if (matcher.group(3) != null) otherWordCount++;
if (matcher.group(4) != null) wordXCount++;
}
System.out.println();
System.out.println("Group1 matches: " + tagCount);
System.out.println("Group2 matches: " + wordACount);
System.out.println("Group3 matches: " + otherWordCount);
System.out.println("Group4 matches: " + wordXCount);
} else {
System.out.println("No regex to build.");
}
}
}
The problem is that I can count each group's matches only when I know beforehand which regexes/groups the user wants to find.
Note that the full regex will contain a lot more capture groups and they will be more complex.
How can I determine which capture group was matched so that I can count each group's occurrences, without knowing beforehand which groups the user wants to find?
Construct the regex to use named groups, for example:
(?<tag>wordA)|(?<wordx>wordX)|(?<anotherword>anotherword)
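One possible way to apply the named-group idea to the counting problem is to keep the list of group names that were actually added to the regex, and check each name after every find(). The sketch below assumes the user's choices arrive as a map from group name to sub-pattern; the class NamedGroupCounter, the count method, and the names tag, wordA, otherWord, and wordX are all illustrative. Group names must start with a letter and contain only letters and digits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: counts matches per named group without hard-coded group indexes.
class NamedGroupCounter {
    static Map<String, Integer> count(Map<String, String> alternatives, String input) {
        List<String> names = new ArrayList<>(alternatives.keySet());
        int[] counts = new int[names.size()];
        Map<String, Integer> result = new LinkedHashMap<>();
        if (!names.isEmpty()) {
            // Build (?<name1>pattern1)|(?<name2>pattern2)|... from the selected entries.
            StringJoiner regex = new StringJoiner("|");
            for (String name : names) {
                regex.add("(?<" + name + ">" + alternatives.get(name) + ")");
            }
            Matcher matcher = Pattern.compile(regex.toString()).matcher(input);
            while (matcher.find()) {
                for (int i = 0; i < counts.length; i++) {
                    // group(name) is null when that alternative did not take part in the match.
                    if (matcher.group(names.get(i)) != null) {
                        counts[i]++;
                    }
                }
            }
        }
        for (int i = 0; i < counts.length; i++) {
            result.put(names.get(i), counts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> selected = new LinkedHashMap<>();
        selected.put("tag", "<[^>]+>");
        selected.put("wordA", "wordA");
        selected.put("otherWord", "anotherword");
        selected.put("wordX", "wordX");
        String input = "<b>this is an <i>input</i> string that contains wordX, wordX, anotherword and wordA</b>";
        System.out.println(count(selected, input)); // {tag=4, wordA=1, otherWord=1, wordX=2}
    }
}
```

Because the names are collected while the regex is built, the counting loop does not need to know in advance which options the user enabled.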

How to extract a multiple quoted substrings in Java

I have a string that contains multiple substrings which have to be extracted. The strings to be extracted are enclosed in ' characters.
I could only extract the first or the last one when I used indexOf or regex.
How could I extract them all and put them into an array or list without parsing the same string repeatedly?
resultData = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
I have tried the code below:
protected static String errorsTPrinted(String errStr, int errCode) {
if (errCode== 202 ) {
ArrayList<String> ar = new ArrayList<String>();
Pattern p = Pattern.compile("'(.*?)'");
Matcher m = p.matcher(errStr);
String text;
for (int i = 0; i < errStr.length(); i++) {
m.find();
text = m.group(1);
ar.add(text);
}
return errStr = "Err 202: " + ar.get(0) + " ... " + ar.get(1) + " ..." + ar.get(2) + " ... " + ar.get(3);
}
Edit
I used @MinecraftShamrock's approach.
if (errCode== 202 ) {
List<String> getQuotet = getQuotet(errStr, '\'');
return errStr = "Err 202: " + getQuotet.get(0) + " ... " + getQuotet.get(1) + " ..." + getQuotet.get(2) + " ... " + getQuotet.get(3);
}
You could use this very straightforward algorithm to do so and avoid regex (as one can't be 100% sure about its complexity):
public List<String> getQuotet(final String input, final char quote) {
final ArrayList<String> result = new ArrayList<>();
int n = -1;
for(int i = 0; i < input.length(); i++) {
if(input.charAt(i) == quote) {
if(n == -1) { //not currently inside quote -> start new quote
n = i + 1;
} else { //close current quote
result.add(input.substring(n, i));
n = -1;
}
}
}
return result;
}
This works with any desired quote-character and has a runtime complexity of O(n). If the string ends with an open quote, it will not be included. However, this can be added quite easily.
I think this is preferable to regex, as you can be absolutely sure about its complexity. Also, it works with a minimum of library classes. If you care about efficiency for big inputs, use this.
And last but not least, it does absolutely not care about what is between two quote characters so it works with any input string.
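The remark above about a string that ends with an open quote could be handled roughly as follows. This is only a sketch of one option: the class QuotedSubstrings, the extract method, and the keepUnclosed flag are hypothetical additions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the single-pass scan that can also keep an unclosed trailing quote.
class QuotedSubstrings {
    static List<String> extract(String input, char quote, boolean keepUnclosed) {
        List<String> result = new ArrayList<>();
        int start = -1; // index just after the opening quote, or -1 when outside a quote
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != quote) {
                continue;
            }
            if (start == -1) {
                start = i + 1;                         // opening quote
            } else {
                result.add(input.substring(start, i)); // closing quote
                start = -1;
            }
        }
        // Assumption: an unclosed quote runs to the end of the input.
        if (keepUnclosed && start != -1) {
            result.add(input.substring(start));
        }
        return result;
    }

    public static void main(String[] args) {
        String resultData = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
        System.out.println(extract(resultData, '\'', false));     // [x, y, z, t]
        System.out.println(extract("a 'b' and 'c", '\'', true));  // [b, c]
    }
}
```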
Simply use the pattern:
'([^']++)'
And a Matcher like so:
final Pattern pattern = Pattern.compile("'([^']++)'");
final Matcher matcher = pattern.matcher(resultData);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
This loops through each match in the String and prints it.
Output:
x
y
z
t
Here is a simple approach (assuming there are no escaping characters etc.):
// Compile a pattern to find the wanted strings
Pattern p = Pattern.compile("'([^']+)'");
// Create a matcher for given input
Matcher m = p.matcher(resultData);
// A list to put the found strings into
List<String> list = new ArrayList<String>();
// Loop over all occurrences
while(m.find()) {
// Retrieve the matched text
String text = m.group(1);
// Do something with the text, e.g. add it to a List
list.add(text);
}

split the string using regex when there are no delimiters using java

I have a string, e.g.: DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67
Now I want to split the string and create a map as
DIGITAL SPORTS $8.95
HD AO $9.95
UCC REC $1.28
RENTAL FEE $7.00
LOCAL FRANCHISE $4.67
I wrote a regular expression to split the string. Please find the code below:
private static String ledgerString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
private static Pattern pattern1 = Pattern.compile("([[a-zA-Z ]*\\$[0-9]*.[0-9][0-9]]*)");
private static Matcher matcher = null;
public static void main(String[] args) {
// TODO Auto-generated method stub
matcher = pattern1.matcher(ledgerString.trim());
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Could someone please help me extract the data from the above string?
Your pattern in group 1 is inside a character class [...], which is probably not what you were trying to do. Maybe change your pattern to the following (note that the dot is escaped so that it matches only a literal period):
Pattern.compile("([a-zA-Z ]*)(\\$[0-9]*\\.[0-9][0-9]*)");
and use it like this
while (matcher.find()) {
System.out.println(matcher.group(1)+" "+matcher.group(2));
}
Also, since Java 7 you can name groups with (?<name>...), so this is also possible:
Pattern.compile("(?<name>[a-zA-Z ]*)(?<price>\\$[0-9]*\\.[0-9][0-9]*)");
while (matcher.find()) {
System.out.println(matcher.group("name")+" "+matcher.group("price"));
}
Output
DIGITAL SPORTS $8.95
HD AO $9.95
UCC REC $1.28
RENTAL FEE $7.00
LOCAL FRANCHISE $4.67
private static String ledgerString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
private static Pattern pattern1 = Pattern.compile("([a-zA-Z ]+)(\\$[0-9]*\\.[0-9][0-9])");
private static Matcher matcher = null;
public static void main(String[] args) {
// TODO Auto-generated method stub
matcher = pattern1.matcher(ledgerString.trim());
while (matcher.find()) {
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
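Since the question asks for a map, the matches found by the loop above could be collected into one instead of being printed. The sketch below reuses the pattern from the preceding snippet; the class LedgerParser and the parse method are hypothetical names, and it assumes that descriptions are unique (a repeated description would overwrite the earlier price).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collects description -> price pairs in the order they appear.
class LedgerParser {
    private static final Pattern ENTRY =
            Pattern.compile("([a-zA-Z ]+)(\\$[0-9]*\\.[0-9][0-9])");

    static Map<String, String> parse(String ledger) {
        Map<String, String> entries = new LinkedHashMap<>();
        Matcher matcher = ENTRY.matcher(ledger.trim());
        while (matcher.find()) {
            // group(1) is the description, group(2) the price including the dollar sign.
            entries.put(matcher.group(1).trim(), matcher.group(2));
        }
        return entries;
    }

    public static void main(String[] args) {
        String ledger = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
        for (Map.Entry<String, String> entry : parse(ledger).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```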
Try this:
The Regex: (?:(.+?)(\$\d*(?:\.\d+)?))
String regex = "(?:(.+?)(\\$\\d*(?:\\.\\d+)?))";
Demo
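Maybe you could replace all occurrences of the "$" symbol with ",$" (comma dollar), and then split on "," (comma). Use replace() rather than replaceAll() here, because replaceAll() treats "$" as a regex anchor in the pattern and as a group reference in the replacement. Do something like:
ledgerString = ledgerString.replace("$", ",$");
String[] tokens = ledgerString.split(",");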
The regex you want to use is one that matches each string that you are interested in. Therefore you want to use
Pattern.compile("([a-zA-Z ]+\\$[0-9]+\\.[0-9][0-9])");
as this identifies each 'line' you're interested in. You can then use split("\\$") on each line to separate the description from the price (the dollar sign has to be escaped because split takes a regex).
Here is, yet another way of doing that:
String mainString = "DIGITAL SPORTS$8.95HD AO$9.95UCC REC$1.28RENTAL FEE$7.00LOCAL FRANCHISE$4.67";
String[] splittedArray = mainString.split("[0-9][A-Z]");
int currentLength = 0;
for(int i =0; i < splittedArray.length; i++) {
String splitedString;
if(i == 0) {
char endChar = mainString.charAt(splittedArray[i].length());
splitedString = splittedArray[i] + endChar;
currentLength += splittedArray[i].length();
}
else if(i == splittedArray.length -1){
char beginChar = mainString.charAt(currentLength + 1);
splitedString = beginChar + splittedArray[i];
}
else {
char beginChar = mainString.charAt(currentLength + 1);
char endChar = mainString.charAt(currentLength+splittedArray[i].length()+2);
splitedString = beginChar + splittedArray[i] + endChar;
currentLength += splittedArray[i].length()+2;
}
System.out.println(splitedString);
}

Palindrome Checker Java boolean insight

So I'm making a palindrome checker in Java, and I seem to have hit a roadblock. This is my code so far:
public class StringUtil
{
public static void main(String[] args)
{
System.out.println("Welcome to String Util.");
Scanner word = new Scanner(System.in);
String X = word.nextLine();
String R = palindrome(X);
System.out.println();
System.out.println("Original Word: " + X);
System.out.println("Palindrome: " + R);
}
public static boolean palindrome(String word)
{
int t = word.length(); //length of the word as a number
int r = 0;
if(word.charAt(t) == word.charAt(r))
{
return true;
}
else
return false;
}
So far I only want it to check if the first letter is the same as the last, but when I compile it I get an incompatible types error on "String R = palindrome(X);". How would I get it to print true or false in the output statement below it?
Your palindrome method returns a boolean, but you're attempting to assign it to a String. Change the definition of the R variable to boolean.
Since the characters are 0-indexed in a string (or array) in java, the last character is at length - 1
try
int t = word.length() - 1;
Oops, that's not the problem you're having. However, you would notice it immediately once the type error is resolved.
Use word.charAt(t - 1); otherwise you index one position past the end of the string. Also, return "true" and "false" as strings if you would like to keep String as your result type.
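Putting the two fixes from the answers together (a boolean result variable and a last index of length - 1), a corrected version might look like the sketch below. The class name PalindromeCheck and the method names are hypothetical, the input is hard-coded instead of read from a Scanner to keep the example self-contained, and the full isPalindrome loop goes one step beyond what the question asked for.

```java
// Hypothetical sketch combining the fixes suggested above.
class PalindromeCheck {
    // What the question currently attempts: compare only the first and last characters.
    static boolean firstEqualsLast(String word) {
        if (word.isEmpty()) {
            return false; // assumption: an empty word has no first or last character to compare
        }
        return word.charAt(0) == word.charAt(word.length() - 1);
    }

    // Full check: walk inward from both ends until the indexes meet.
    static boolean isPalindrome(String word) {
        int left = 0;
        int right = word.length() - 1;
        while (left < right) {
            if (word.charAt(left) != word.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return true;
    }

    public static void main(String[] args) {
        String x = "level";
        boolean r = isPalindrome(x); // boolean, not String, so the types match
        System.out.println("Original Word: " + x);
        System.out.println("Palindrome: " + r); // Palindrome: true
    }
}
```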

java comparing the first 2 characters

I want to find lines in a text that start with a number followed by a period (.).
example:
1.
11.
111.
My code works for x. (where x is a single digit). The issue is when x has more than one digit.
x= Character.isDigit(line.charAt(0));
if(x)
if (line.charAt(1)=='.')
How can I extend this logic to check whether x is an integer followed by a period?
My first issue is:
I need to find out whether the given line has the x. format or not, where x is an integer.
You can use the regex [0-9]\. to see if there exists a digit followed by a period in the string.
If you need to ensure that the pattern is always at the beginning of the string you can use ^[0-9]+\.
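The anchored pattern above could be used as in the following sketch. It relies on Matcher.find() so that text after the period is allowed; the class NumberedLine and both method names are hypothetical, and the capturing group around the digits is an addition for extracting the number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: checks whether a line starts with an integer followed by a period.
class NumberedLine {
    private static final Pattern PREFIX = Pattern.compile("^([0-9]+)\\.");

    static boolean startsWithNumberAndDot(String line) {
        // find() with ^ only inspects the start of the line; the rest may be anything.
        return PREFIX.matcher(line).find();
    }

    // Returns the leading digits as a string, or null when the line has no such prefix.
    static String leadingNumber(String line) {
        Matcher matcher = PREFIX.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(startsWithNumberAndDot("1."));         // true
        System.out.println(startsWithNumberAndDot("111. item"));  // true
        System.out.println(startsWithNumberAndDot("x1."));        // false
        System.out.println(leadingNumber("11. text"));            // 11
    }
}
```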
Why not use a regular expression?
([0-9]+)[.]
You can use regex:
Pattern.compile("C=(\\d+\\.\\d+)")
However, more general would be:
Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?")
Now to work with the Pattern you do something like:
Pattern pattern = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?");
Matcher matcher = pattern.matcher(EXAMPLE_TEST);
// Check all occurrences
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
Edit: Whoops, misread.
try this:
public static boolean prefix(String s) {
return s.matches("[0-9]+\\.");
}
public class ParsingData {
public static void main(String[] args) {
//String one = "1.";
String one = "11.";
int index = one.indexOf(".");
String num = (String) one.subSequence(0, index);
if(isInteger(num)) {
int number = Integer.parseInt(num);
System.out.println(number);
}
else
System.out.println("Not an int");
}
public static boolean isInteger(String string) {
try {
Integer.valueOf(string);
return true;
} catch (NumberFormatException e) {
return false;
}
}
}
