Getting match of Group with Asterisk?


How can I get the content for a group with an asterisk?
For example, I'd like to parse a comma-separated list, e.g. 1,2,3,4,5.
private static final String LIST_REGEX = "^(\\d+)(,\\d+)*$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);
public static void main(String[] args) {
final String list = "1,2,3,4,5";
final Matcher matcher = LIST_PATTERN.matcher(list);
System.out.println(matcher.matches());
for (int i = 0, n = matcher.groupCount(); i < n; i++) {
System.out.println(i + "\t" + matcher.group(i));
}
}
And the output is
true
0 1,2,3,4,5
1 1
How can I get every single entry, i.e. 1, 2, 3, ...?
I am looking for a general solution. This is only a demonstrative example.
Please imagine a more complicated regex like ^\\[(\\d+)(,\\d+)*\\]$ to match a list like [1,2,3,4,5].

You can use String.split().
for (String segment : "1,2,3,4,5".split(","))
System.out.println(segment);
Or you can match repeatedly with find(), capturing one number per match:
Pattern pattern = Pattern.compile("(\\d+),?");
for (Matcher m = pattern.matcher("1,2,3,4,5"); m.find();)
System.out.println(m.group(1));
For the second example you added, you can do a similar match.
for (String segment : "!!!!![1,2,3,4,5] //"
.replaceFirst("^\\D*(\\d+(?:,\\d+)*)\\D*$", "$1")
.split(","))
System.out.println(segment);
I made an online code demo. I hope this is what you wanted.
How can I get all the matches (zero, one or more) for an arbitrary group with an asterisk (xyz)*? [The group is repeated and I would like to get every repeated capture.]
No, you cannot. Regex Capture Groups and Back-References explains why:
The Returned Value for a Given Group is the Last One Captured
Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
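A minimal sketch of that behaviour in Java, using the A_B_C_D_ example from the quote. The class and method names (RepeatedGroupDemo, lastCapture, allCaptures) are made up for illustration. The second method shows one common workaround: drop the quantifier and call find() in a loop, so that each repetition becomes its own match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // A quantified group keeps only its last capture.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("([A-Z]_)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Workaround: match the group body on its own and iterate with find().
    static List<String> allCaptures(String input) {
        List<String> captures = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Z]_").matcher(input);
        while (m.find()) {
            captures.add(m.group());
        }
        return captures;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("A_B_C_D_"));  // prints D_
        System.out.println(allCaptures("A_B_C_D_"));  // prints [A_, B_, C_, D_]
    }
}
```

Note that the find() loop does not validate the input as a whole; if that matters, check it first with the full pattern and matches().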

I assume you are looking for something like the following, which handles both of your examples.
private static final String LIST_REGEX = "^\\[?(\\d+(?:,\\d+)*)\\]?$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);
public static void main(String[] args) {
final String list = "[1,2,3,4,5]";
final Matcher matcher = LIST_PATTERN.matcher(list);
matcher.find();
int i = 0;
String[] vals = matcher.group(1).split(",");
System.out.println(matcher.matches());
System.out.println(i + "\t" + matcher.group(1));
for (String x : vals) {
i++;
System.out.println(i + "\t" + x);
}
}
Output
true
0 1,2,3,4,5
1 1
2 2
3 3
4 4
5 5

Related

What RegEx separates terms of Polynomial

I have a String 5x^3-2x^2+5x
I want a regex which splits this string as
5x^3,
-2x^2,
5x
I tried "(-)|(\\+)",
but this did not work, as it did not consider negative power terms.

You can split your string using this regex:
\+|(?=-)
It splits on a + and consumes it, but at a - it splits without consuming the character, because the - is only matched inside a lookahead.
Check out this Java code:
String s = "5x^3-2x^2+5x";
System.out.println(Arrays.toString(s.split("\\+|(?=-)")));
Gives your expected output below,
[5x^3, -2x^2, 5x]
Edit:
Although the OP said in a comment on the post that there won't be negative powers, in case you do have them, you can use this regex, which handles negative powers as well:
\+|(?<!\^)(?=-)
Check this updated Java code,
List<String> list = Arrays.asList("5x^3-2x^2+5x", "5x^3-2x^-2+5x");
for (String s : list) {
System.out.println(s + " --> " +Arrays.toString(s.split("\\+|(?<!\\^)(?=-)")));
}
New output,
5x^3-2x^2+5x --> [5x^3, -2x^2, 5x]
5x^3-2x^-2+5x --> [5x^3, -2x^-2, 5x]

Maybe
-?[^\r\n+-]+(?=[+-]|$)
or a similar expression would also work, in case the equations contain constants.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "-?[^\r\n+-]+(?=[+-]|$)";
final String string = "5x^3-2x^2+5x\n"
+ "5x^3-2x^2+5x-5\n"
+ "-5x^3-2x^2+5x+5";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.

The program below breaks the expression into its individual parts, one capture group per part, so you can debug it and combine the sub-patterns as you need. Note that it only matches inputs with exactly this three-term shape.
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="5x^3-2x^2+5x";
String re1="([-+]?\\d+)"; // Coefficient 1 (sign is optional)
String re2="((?:[a-z][a-z0-9_]*))"; // Variable Name 1
String re3="(\\^)"; // Caret 1
String re4="([-+]?\\d+)"; // Exponent 1
String re5="([-+]?\\d+)"; // Coefficient 2
String re6="((?:[a-z][a-z0-9_]*))"; // Variable Name 2
String re7="(\\^)"; // Caret 2
String re8="([-+]?\\d+)"; // Exponent 2
String re9="([-+]?\\d+)"; // Coefficient 3
String re10="((?:[a-z][a-z0-9_]*))"; // Variable Name 3
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String int1=m.group(1);
String var1=m.group(2);
String c1=m.group(3);
String int2=m.group(4);
String signed_int1=m.group(5);
String var2=m.group(6);
String c2=m.group(7);
String int3=m.group(8);
String signed_int2=m.group(9);
String var3=m.group(10);
System.out.print("("+int1.toString()+")"+"("+var1.toString()+")"+"("+c1.toString()+")"+"("+int2.toString()+")"+"("+signed_int1.toString()+")"+"("+var2.toString()+")"+"("+c2.toString()+")"+"("+int3.toString()+")"+"("+signed_int2.toString()+")"+"("+var3.toString()+")"+"\n");
}
}
}
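As a possible generalisation of the fixed ten-group pattern, here is a sketch that matches one term at a time and loops with find(), so the number of terms does not matter. It assumes every term has an explicit integer coefficient, at most one single-letter variable, and an optional integer exponent; the names PolynomialTerms, TERM and describeTerms are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PolynomialTerms {
    // Assumed term shape: signed integer coefficient, optional variable, optional ^exponent.
    private static final Pattern TERM =
            Pattern.compile("([-+]?\\d+)([a-z])?(?:\\^(-?\\d+))?");

    static List<String> describeTerms(String polynomial) {
        List<String> terms = new ArrayList<>();
        Matcher m = TERM.matcher(polynomial);
        while (m.find()) {
            int coefficient = Integer.parseInt(m.group(1));
            String variable = m.group(2); // null for a constant term
            int exponent;
            if (variable == null) {
                exponent = 0;             // constant term
            } else if (m.group(3) == null) {
                exponent = 1;             // variable without an explicit power
            } else {
                exponent = Integer.parseInt(m.group(3));
            }
            terms.add("coefficient=" + coefficient + ", exponent=" + exponent);
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : describeTerms("5x^3-2x^2+5x")) {
            System.out.println(term);
        }
    }
}
```

Terms with an implicit coefficient, such as x^2, are not matched under these assumptions and would need a looser pattern.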

Add all the numbers which have + symbol and replace the same with the added value

I would like to group all the numbers that are supposed to be added.
Test string: 82+18-10.2+3+37=6 + 7
Here 82+18 can be added and replaced with the value 100.
The test string then becomes: 100-10.2+3+37=6 + 7
Again, 2+3+37 can be added and replaced in the test string as
follows: 100-10.42=6 + 7
Now 6 + 7 cannot be added because there is a space after the value
6.
My idea was to extract the numbers which are supposed to be added like below:
82+18
2+3+37
And then add them up and replace each group with its sum using the replace() method of String.
Tried Regex:
(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))
Sample Input:
82+18-10.2+3+37=6 + 7
Java Code for identifying the groups to be added and replaced:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceAddition {
static String regex = "(?=([0-9]{1,}[\\+]{1}[0-9]{1,}))";
static String testStr = "82+18-10.2+3+37=6 + 7 ";
public static void main(String[] args) {
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(testStr);
while (matcher.find()) {
System.out.println(matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
Output:
82+18
2+18
2+3
3+37
I can't see what I'm missing. Help would be appreciated.

I tried simplifying the regexp by removing the positive lookahead operator
(?=...)
And the enclosing parenthesis
(...)
After these changes, the regexp is as follows
static String regex = "[0-9]{1,}[\\+]{1}[0-9]{1,}";
When I run it, I'm getting the following result:
82+18
2+3
This is closer to the expected, but still not perfect, because we're getting "2+3" instead of 2+3+37. In order to handle any number of added numbers instead of just two, the expression can be further tuned up to:
static String regex = "[0-9]{1,}(?:[\\+]{1}[0-9]{1,})+";
What I added here is a non-capturing group
(?:...)
with a plus sign meaning one or more repetitions. Now the program produces the output
82+18
2+3+37
as expected.

Another solution is like so:
public static void main(String[] args)
{
final var p = Pattern.compile("(?:\\d+(?:\\+\\d+)+)");
var text = new StringBuilder("82+18-10.2+3+37=6 + 7 ");
var m = p.matcher(text);
while(m.find())
{
var sum = 0;
var split = m.group(0).split("\\+");
for(var str : split)
{
sum += Integer.parseInt(str);
}
text.replace(m.start(0),m.end(0),""+sum);
m.reset();
}
System.out.println(text);
}
The regex (?:\\d+(?:\\+\\d+)+) finds:
(?: Non-capturing group
\\d+ One or more digits, followed by
(?: Non-capturing group
\\+ A plus symbol, and
\\d+ One or more digits
)+ One or more times
) Once
So this regex matches a run of two or more integers separated by '+'.
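For comparison, a sketch of the same idea in a single pass, assuming Java 9 or later, where Matcher.replaceAll accepts a function from each match to its replacement. The class and method names (SumInPlace, collapseSums) are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SumInPlace {
    // Two or more integers joined by '+', with no spaces in between.
    private static final Pattern SUM = Pattern.compile("\\d+(?:\\+\\d+)+");

    static String collapseSums(String text) {
        // Each match is split on '+', summed, and replaced by the total.
        return SUM.matcher(text).replaceAll(match ->
                String.valueOf(Arrays.stream(match.group().split("\\+"))
                        .mapToLong(Long::parseLong)
                        .sum()));
    }

    public static void main(String[] args) {
        System.out.println(collapseSums("82+18-10.2+3+37=6 + 7")); // prints 100-10.42=6 + 7
    }
}
```

Because the replacement consists of digits only, it cannot be misread as a group reference such as $1.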

How to skip a portion of a line in Java- Don't know what I am doing

I need to use only every other number in a string of numbers.
This is how the XML file content comes to us, but I only want to use the first group and then every other group.
The second number in each group can be ignored. The most important numbers are the first ones, like 1, 3, 5 and 29.
Can you help? Each group has the form "x":"x",
<CatcList>{"1":"15","2":"15","3":"25","4":"25","5":"35","6":"35","29":"10","30":"10"}</CatcList>
Right now my script looks like this, but I am not the one who wrote it.
I only included the portion that would be needed. The StartPage would be the variable used.
If you have knowledge of how to add 1 to the EndPage Integer, that would be very helpful as well.
Thank you!
Util.StringList xs;
line.parseLine(",", "", xs);
for (Int i=0; i<xs.Size; i++) {
Int qty = xs[i].right(xs[i].Length - xs[i].find(":")-1).toInt()-1;
for (Int j=0; j<qty; j++) {
Output_0.File.DocId = product;
Output_0.File.ImagePath = Image;
Output_0.File.ImagePath1 = Image;
Output_0.File.StartPage = xs[i].left(xs[i].find(("-"))).toInt()-1;
Output_0.File.EndPage = xs[i].mid(xs[i].find("-")+1, (xs[i].find(":") - xs[i].find("-")-1)).toInt()-0;
Output_0.File.Quantity = qty.toString();
Output_0.File.commit();

You can use a Pattern with a loop and a condition to extract this information:
String string = "<CatcList>{\"1\":\"15\",\"2\":\"15\",\"3\":\"25\",\"4\":\"25\","
+ "\"5\":\"35\",\"6\":\"35\",\"29\":\"10\",\"30\":\"10\"}</CatcList> ";
String regex = "\"(\\d+)\":\"\\d+\"";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
int index = 0;
while (matcher.find()) {
if (index % 2 == 0) { // From your question, you need alternate results: 1, 3, 5, ...
System.out.println(matcher.group(1) + " ---> " + matcher.group());
}
index++;
}
Outputs
1 ---> "1":"15"
3 ---> "3":"25"
5 ---> "5":"35"
29 ---> "29":"10"
The regex "(\d+)":"\d+" matches any "number":"number" pair. I used the capturing group (\d+) so that I can retrieve just the first number.

That XML value looks like a JSON map. All that .right, .mid, and .left code looks pretty confusing to me without details of how those methods work. Does something like this seem clearer?
// the embedded quotes are escaped so that the literal compiles
String xmlElement = "{\"1\":\"15\",\"2\":\"15\",\"3\":\"25\",\"4\":\"25\",\"5\":\"35\",\"6\":\"35\",\"29\":\"10\",\"30\":\"10\"}";
// strip the braces and the quotes so that the keys parse as integers
xmlElement = xmlElement.replaceAll("[}{\"]", "");
String[] mapEntryStrings = xmlElement.split(",");
Map<Integer, String> oddStartPages = new HashMap<Integer, String>();
for (String entry : mapEntryStrings) {
String[] keyAndValue = entry.split(":");
int key = Integer.parseInt(keyAndValue[0]);
if (key % 2 == 1) {// if odd
oddStartPages.put(key, keyAndValue[1]);
}
}
Then the set of keys in the oddStartPages map is exactly the set of first numbers in your "first and every other group" requirement, at least for data like your sample, where the keys alternate between odd and even.

Try this:
,?("(?<num>\d+)":"\d+"),?("\d+":"\d+")?
The group called num will contain every other occurrence of the first part of "x":"x"
so for the values:
"1":"14","2":"14","3":"14","4":"24","5":"33","6":"44","7":"55"
the group called 'num' will contain 1 3 5 and 7.
see example here
Edit: Once you have the numbers extracted, you can do with it what you need:
Pattern datePatt = Pattern.compile(",?(\"(?<num>\\d+)\":\"\\d+\"),?(\"\\d+\":\"\\d+\")?");
String dateStr = "\"1\":\"14\",\"2\":\"14\",\"3\":\"14\",\"4\":\"24\",\"5\":\"33\",\"6\":\"44\",\"7\":\"55\"";
Matcher m = datePatt.matcher(dateStr);
while (m.find()) {
System.out.printf("%s%n", m.group("num"));
}

Add +n to the names of Strings that end with "v + number"

I have an array of Strings
Value[0] = "Documento v1.docx";
Value[1] = "Some_things.pdf";
Value[2] = "Cosasv12.doc";
Value[3] = "Document16.docx";
Value[4] = "Nodoc";
I want to rename the documents by adding 1 to the version of each one, but only for the Strings that end with v{number} (v1, v12, etc.).
I used the regex [v]+a*^, but I only get the "v" and not the number after it.

If all your strings ending with v + digits + extension are to be processed, use a pattern like v(\\d+)(?=\\.[^.]+$) and then manipulate the value of Group 1 inside the Matcher#appendReplacement method:
String[] strs = { "Documento v1.docx", "Some_things.pdf", "Cosasv12.doc", "Document16.docx", "Nodoc"};
Pattern pat = Pattern.compile("v(\\d+)(?=\\.[^.]+$)");
for (String s: strs) {
StringBuffer result = new StringBuffer();
Matcher m = pat.matcher(s);
while (m.find()) {
int n = 1 + Integer.parseInt(m.group(1));
m.appendReplacement(result, "v" + n);
}
m.appendTail(result);
System.out.println(result.toString());
}
See the Java demo
Output:
Documento v2.docx
Some_things.pdf
Cosasv13.doc
Document16.docx
Nodoc
Pattern details
v - a v
(\d+) - Group 1 value: one or more digits
(?=\.[^.]+$) - that are followed with a literal . and then 1+ chars other than . up to the end of the string.
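If Java 9 or later is available, the appendReplacement/appendTail loop can probably be shortened with the Matcher.replaceAll overload that takes a function. This is only a sketch using the same pattern; VersionBump and bump are made-up names.

```java
import java.util.regex.Pattern;

class VersionBump {
    // Same pattern as above: "v", digits, then a final ".extension".
    private static final Pattern VERSION = Pattern.compile("v(\\d+)(?=\\.[^.]+$)");

    static String bump(String fileName) {
        // Names without a trailing version are returned unchanged.
        return VERSION.matcher(fileName)
                .replaceAll(match -> "v" + (Integer.parseInt(match.group(1)) + 1));
    }

    public static void main(String[] args) {
        String[] names = { "Documento v1.docx", "Some_things.pdf", "Cosasv12.doc", "Document16.docx", "Nodoc" };
        for (String name : names) {
            System.out.println(bump(name));
        }
    }
}
```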

The regex v\d+ should match the letter v followed by a number (please note that you may need to write it as v\\d+ when assigning it to a String). Further enhancement of the regex depends on what your code looks like. You may want to wrap it in a capturing group like (v\d+), or even (v(\d+)).
The first reference a quick search turns up is
https://docs.oracle.com/javase/tutorial/essential/regex/ ,
which should be a good starting point.

Try a regex like this:
([v])([0-9]{1,3})(\.)
Notice that I've already included the dot in order to have fewer "collisions", and allowed a maximum of 999 versions ({1,3}).
Furthermore, I've used 3 different groups so that you can easily retrieve the version number, increase it, and replace it in the string.
Example:
String fileName = "Cosasv12.doc"; // sample value taken from the question
String regex = "([v])([0-9]{1,3})(\\.)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(fileName);
if (matcher.find()) { // find(), because the pattern covers only part of the name
int version = Integer.parseInt(matcher.group(2)); // groups are 1-based; group(0) is the whole match
}

Splitting up input using regular expressions in Java

I am making a program that lets a user input a chemical formula, for example C9H11N02. When they enter it, I want to split it up into pieces such as C9, H11, N, 02. Once I have the pieces, I want to change them so I can make it C10H12N203 and then put it back together. This is what I have done so far. Using the regular expression below I can extract the integer values, but how would I go about getting C10, H11, etc.?
System.out.println("Enter Data");
Scanner k = new Scanner( System.in );
String input = k.nextLine();
String reg = "\\s\\s\\s";
String [] data;
data = input.split( reg );
int m = Integer.parseInt( data[0] );
int n = Integer.parseInt( data[1] );

It can be done using look arounds:
String[] parts = input.split("(?<=.)(?=[A-Z])");
Look arounds are zero-width, non-consuming assertions.
This regex splits the input where the two look arounds match:
(?<=.) means "there is a preceding character" (ie not at the start of input)
(?=[A-Z]) means "the next character is a capital letter" (All elements start with A-Z)
Here's a test, including a double-character symbol for some edge cases:
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
String[] parts = input.split("(?<=.)(?=[A-Z])");
System.out.println(Arrays.toString(parts));
}
Output:
[C9, Kr, Br2, H11, N, O2]
If you then wanted to split up the individual components, use a nested call to split():
public static void main(String[] args) {
String input = "C9KrBr2H11NO2";
for (String component : input.split("(?<=.)(?=[A-Z])")) {
// split on non-digit/digit boundary
String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
String element = symbolAndNumber[0];
// elements without numbers won't be split
String count = symbolAndNumber.length == 1 ? "1" : symbolAndNumber[1];
System.out.println(element + " x " + count);
}
}
Output:
C x 9
Kr x 1
Br x 2
H x 11
N x 1
O x 2

Did you accidentally put zeroes into some of those formulas where the letter "O" (oxygen) was supposed to be? If so:
"C10H12N2O3".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C10, H12, N2, O3]
"CH2BrCl".split("(?<=[0-9A-Za-z])(?=[A-Z])");
[C, H2, Br, Cl]

I believe the following code should allow you to extract the various elements and their associated count. Of course, brackets make things more complicated, but you didn't ask about them!
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String element = matcher.group(1);
int count = 1;
// group 2 is the empty string when the element has no explicit count
if (!matcher.group(2).isEmpty()) {
count = Integer.parseInt(matcher.group(2));
}
// Do stuff with this component
}
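To cover the "change it and put it back together" part of the question, here is a rough sketch built on the same pattern. It assumes a flat formula without brackets and uses the letter O rather than a zero for oxygen; FormulaEditor, parse and format are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaEditor {
    // Element symbol (capital plus optional lowercase letters), then an optional count.
    private static final Pattern COMPONENT = Pattern.compile("([A-Z][a-z]*)(\\d*)");

    // Parses a flat formula into element -> count, keeping first-seen order.
    static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = COMPONENT.matcher(formula);
        while (m.find()) {
            int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), count, Integer::sum);
        }
        return counts;
    }

    // Rebuilds the formula, omitting a count of 1.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        counts.forEach((element, count) -> {
            sb.append(element);
            if (count != 1) {
                sb.append(count);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse("C9H11NO2");
        counts.replaceAll((element, count) -> count + 1); // add one atom of each element
        System.out.println(format(counts)); // prints C10H12N2O3
    }
}
```

A LinkedHashMap is used so that the elements come back out in the order they were entered; repeated elements are merged into one entry.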
