Using regex to find substring in Java

I am trying to pull two strings (each representing an integer i where -999999 <= i <= 999999) out of the string left. There will always be exactly two such strings. Also, I want the regex to match {"-1", "2"} for "-1-2", not {"-1", "-2"}. I've been going through the tutorials on http://www.regular-expressions.info and the Stack Overflow regex page for almost four hours now. I am testing my expressions in a Java program. Here's what I've got:
String left = "-123--4567";
Pattern pattern = Pattern.compile("-?[0-9]{1,6}");
Matcher matcher = pattern.matcher(left);
arg1 = matcher.group(1);
arg2 = matcher.group(2);
System.out.println("arg1: " + arg1 + " arg2: " + arg2);
This code should produce
arg1: -123 arg2: -4567

Here's a self-contained example of what you're probably trying to do:
String[] examples = {
"-123--4567",
"123-4567",
"-123-4567",
"123--4567"
};
// Pattern breakdown, left to right:
//   (-?\\d+)  group 1: zero or one "-", then one or more digits
//   -?        zero or one "-" as separator
//   (-?\\d+)  group 2: zero or one "-", then one or more digits
Pattern p = Pattern.compile("(-?\\d+)-?(-?\\d+)");
// iterating over examples
for (String s: examples) {
// matching
Matcher m = p.matcher(s);
// iterating over matches (only 1 per example here)
while (m.find()) {
// printing out group1 --> group 2 back references
System.out.printf("%s --> %s%n", m.group(1), m.group(2));
}
}
Output
-123 --> -4567
123 --> 4567
-123 --> 4567
123 --> -4567

You can use this regex:
(-?[0-9]{1,6})-?
And grab capture group #1
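To tie this back to the question's code: group(1) is only available after a successful find(), and the pattern needs a capturing group. Below is a minimal sketch that uses the regex above and calls find() once per integer. The class name TwoIntsDemo and the helper firstTwo are made up for illustration, and the sketch assumes the input really does contain two integers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoIntsDemo {
    // Pattern from the answer above: a signed integer of 1 to 6 digits,
    // followed by an optional "-" separator that is consumed but not captured.
    private static final Pattern NUMBER = Pattern.compile("(-?[0-9]{1,6})-?");

    // Hypothetical helper: returns the first two integers found in the input.
    static String[] firstTwo(String input) {
        Matcher m = NUMBER.matcher(input);
        String[] out = new String[2];
        for (int i = 0; i < 2; i++) {
            // find() must succeed before group(1) may be read
            if (!m.find()) {
                throw new IllegalArgumentException("expected two integers in: " + input);
            }
            out[i] = m.group(1);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-123--4567", "-1-2", "123-4567"}) {
            String[] r = firstTwo(s);
            System.out.println("arg1: " + r[0] + " arg2: " + r[1]);
        }
    }
}
```

Because the separator is consumed by the first match, "-1-2" yields -1 and 2 rather than -1 and -2.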

Related

How can I get values of the same group several times in Matcher regex Java?

After multiple searches, I come to you.
I made a regex in java :
^([a-zA-Z][a-zA-Z0-9]{0,10})( \(([0-9]),([a-zA-Z][a-zA-Z0-9]{0,10})\))+$
When I try this : "S1 (4,S5)"
It returns :
Group 1 -> "S1"
Group 2 -> " (4,S5)"
Group 3 -> "(4,S5)"
it works well.
But when I try this : "S1 (4,S5) (2,S3)"
it returns :
Group 1 -> "S1"
Group 2 -> " (2,S3)"
Group 3 -> "(2,S3)"
It doesn't return the (4,S5).
How can I get every repetition of the same group?
Thanks for your help !
You could make use of the \G anchor to get contiguous matches with 2 capture groups instead of multiple capture groups.
(?:^([a-zA-Z][a-zA-Z0-9]{0,10})|\G(?!^))\h*(\([0-9]+,[a-zA-Z][a-zA-Z0-9]{0,10}\))
The pattern matches:
(?: Non capture group (for the alternation)
^ Start of string
( Capture group 1
[a-zA-Z][a-zA-Z0-9]{0,10} Match a single char a-zA-Z, followed by 0-10 chars from a-zA-Z0-9
) Close group 1
| Or
\G(?!^) Assert the position at the end of the previous match, or at the start of the string. As we specify that we have a specific match at the start of the string in the first part of the alternation, we can rule out that position using a negative lookahead using (?!^)
) Close non capture group
\h* Match optional horizontal whitespace characters
( Capture group 2
\( Match (
[0-9]+, Match 1+ digits and a comma
[a-zA-Z][a-zA-Z0-9]{0,10} Match a single char a-zA-Z, followed by 0-10 chars from a-zA-Z0-9
\) Match )
) Close group 2
String regex = "(?:^([a-zA-Z][a-zA-Z0-9]{0,10})|\\G(?!^))\\h*(\\([0-9]+,[a-zA-Z][a-zA-Z0-9]{0,10}\\))";
String string = "S1 (4,S5)\n"
+ "S2 (4,S5) (2,S3)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
if (matcher.group(i) != null) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
Output
Group 1: S1
Group 2: (4,S5)
Group 1: S2
Group 2: (4,S5)
Group 2: (2,S3)
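For context: a repeated capture group only keeps the text of its last iteration, which is why the original pattern returns only (2,S3). If \G feels too subtle, a plainer alternative is to validate the whole line with the original pattern and then walk the pairs with a second, smaller pattern. This is a different technique from the answer above, sketched under the assumption that each input string holds one record. RepeatedGroupDemo, WHOLE, PAIR and pairs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The asker's original pattern, used here only to validate the whole line.
    private static final Pattern WHOLE = Pattern.compile(
            "^([a-zA-Z][a-zA-Z0-9]{0,10})( \\(([0-9]),([a-zA-Z][a-zA-Z0-9]{0,10})\\))+$");

    // A single "(digit,name)" pair, applied repeatedly with find().
    private static final Pattern PAIR = Pattern.compile(
            "\\(([0-9]),([a-zA-Z][a-zA-Z0-9]{0,10})\\)");

    // Returns every pair in the line, or an empty list if the line is malformed.
    static List<String> pairs(String line) {
        List<String> out = new ArrayList<>();
        if (!WHOLE.matcher(line).matches()) {
            return out;
        }
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            out.add(m.group()); // group(1) and group(2) hold the digit and the name
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("S1 (4,S5) (2,S3)")); // prints [(4,S5), (2,S3)]
    }
}
```

The trade-off is two passes over the string instead of one, in exchange for patterns that are easier to read.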

Add +n to the name of some Strings if it ends with "v + number"

I have an array of Strings
Value[0] = "Documento v1.docx";
Value[1] = "Some_things.pdf";
Value[2] = "Cosasv12.doc";
Value[3] = "Document16.docx";
Value[4] = "Nodoc";
I want to change the name of the documents and add 1 to the version of every document, but only for the Strings whose document name ends with v{number} (v1, v12, etc.).
I used the regex [v]+a*^ but I only obtain the "v" and not the number after the "v".
If all your strings ending with v + digits + extension are to be processed, use a pattern like v(\\d+)(?=\\.[^.]+$) and then manipulate the value of Group 1 inside the Matcher#appendReplacement method:
String[] strs = { "Documento v1.docx", "Some_things.pdf", "Cosasv12.doc", "Document16.docx", "Nodoc"};
Pattern pat = Pattern.compile("v(\\d+)(?=\\.[^.]+$)");
for (String s: strs) {
StringBuffer result = new StringBuffer();
Matcher m = pat.matcher(s);
while (m.find()) {
int n = 1 + Integer.parseInt(m.group(1));
m.appendReplacement(result, "v" + n);
}
m.appendTail(result);
System.out.println(result.toString());
}
Output:
Documento v2.docx
Some_things.pdf
Cosasv13.doc
Document16.docx
Nodoc
Pattern details
v - a v
(\d+) - Group 1 value: one or more digits
(?=\.[^.]+$) - that are followed with a literal . and then 1+ chars other than . up to the end of the string.
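On Java 9 or later, the same replacement can be written more compactly with Matcher.replaceAll(Function<MatchResult, String>). The following is a minimal sketch, assuming Java 9+ and version numbers that fit in an int. VersionBumpDemo and bump are illustrative names, not part of the answer above.

```java
import java.util.regex.Pattern;

class VersionBumpDemo {
    // Same pattern as above: "v", digits (group 1), then a lookahead for ".ext" at the end.
    private static final Pattern VERSION = Pattern.compile("v(\\d+)(?=\\.[^.]+$)");

    // Hypothetical helper: increments the version number, or returns the name unchanged.
    static String bump(String name) {
        return VERSION.matcher(name)
                .replaceAll(mr -> "v" + (Integer.parseInt(mr.group(1)) + 1));
    }

    public static void main(String[] args) {
        String[] names = {"Documento v1.docx", "Some_things.pdf", "Cosasv12.doc", "Document16.docx", "Nodoc"};
        for (String name : names) {
            System.out.println(bump(name));
        }
    }
}
```

The replacement string here contains only "v" and digits, so it needs no Matcher.quoteReplacement call; that would change if the replacement could contain $ or \.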
The regex v\d+ should match the letter v followed by a number (note that you need to write it as v\\d+ in a Java string literal). Further refinement of the regex depends on what your code looks like. You may want to wrap it in a capturing group like (v\d+), or even (v(\d+)).
The first reference a quick search turns up is https://docs.oracle.com/javase/tutorial/essential/regex/, which should be a good starting point.
Try a regex like this:
(v)([0-9]{1,3})(\.)
Notice that I've included the dot in order to have fewer "collisions", and that {1,3} allows a maximum of 999 versions.
Furthermore, I've used 3 different groups so that you can easily retrieve the version number, increase it, and replace the string.
Example:
String name = "Cosasv12.doc"; // sample input
String regex = "(v)([0-9]{1,3})(\\.)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(name);
if (matcher.find()) { // find(), not matches(): the pattern covers only part of the name
int version = Integer.parseInt(matcher.group(2)); // groups are 1-based; group(0) is the whole match
}

Java Regex: Grouping consecutive 1 or 0 in a binary string

I want to capture all the consecutive groups in a binary string
1000011100001100111100001
should give me
1
0000
111
0000
11
00
1111
0000
1
I wrote the regex ([1?|0?]+) in my Java application to group the consecutive 1s or 0s in a string like 10000111000011.
But when I run it in my code, nothing is printed to the console:
String name ="10000111000011";
regex("(\\[1?|0?]+)" ,name);
public static void regex(String regex, String searchedString) {
Pattern pattern = Pattern.compile(regex);
Matcher regexMatcher = pattern.matcher(searchedString);
while (regexMatcher.find())
if (regexMatcher.group().length() > 0)
System.out.println(regexMatcher.group());
}
To avoid a regex syntax error at runtime, I changed ([1?|0?]+) to (\\[1?|0?]+).
Why are no groups found with this regex?
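First, as an explanation: your original regex defines a character class ([ ... ]) that matches any of the characters 1, ?, | or 0 one or more times (+). In the version in your Java code, the escaped \\[ turns the bracket into a literal [, so the pattern can never match a string that contains only 0s and 1s, which is why nothing is printed. I think you meant to have ( ... ) in it, among other things, which would make the | an alternation between an optional 1 and an optional 0. But that is not what you want either (I think ;).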
Now, the solution might be this:
([01])\1*
which matches a 0 or a 1 and captures it, then matches the same digit any number of times (\1 is a back reference to whatever was captured by the first capture group, in this case the 0 or the 1).
Check it out at ideone.
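As a minimal sketch of how that pattern could be used from Java (the class name BitRunsDemo and the helper runs are assumptions for illustration), note that the whole run is returned by group(), while group(1) holds only its first digit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BitRunsDemo {
    // One digit (0 or 1), then as many repeats of that same digit as possible.
    private static final Pattern RUN = Pattern.compile("([01])\\1*");

    // Hypothetical helper: splits a binary string into its consecutive runs.
    static List<String> runs(String bits) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(bits);
        while (m.find()) {
            out.add(m.group()); // whole run; group(1) would be just its first digit
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runs("1000011100001100111100001"));
        // prints [1, 0000, 111, 0000, 11, 00, 1111, 0000, 1]
    }
}
```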
You can try this:
(1+|0+)
Sample Code:
final String regex = "(1+|0+)";
final String string = "10000111000011\n"
+ "11001111110011";
final Pattern pattern = Pattern.compile(regex); // no flags needed: the pattern contains only the digits 1 and 0
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Group " + 1 + ": " + matcher.group(1));
}

Regex to find Integers in particular string lines

I have this regex to find integers in a string (with newlines). However, I want to filter the results: the regex should find the number in certain lines and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after a comma, in all the lines. What if I want a specific regex to find the integer in the line containing "test1", another for "test2", and another for "test3"? How should I do this? I want to create three different regexes, but my regex skills are weak.
The first regex should print 2, the second 8, and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The ,[^,]{1,100}, portion skips everything between the two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside lookbehind expressions, because the Java regex engine requires lookbehinds to have a bounded maximum length. If you need to allow skipping more than 100 characters, adjust the maximum length accordingly.
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// Pattern breakdown, left to right:
//   test    literal "test"
//   \\d+    one or more digits
//   ,\\d+,  comma, one or more digits, comma
//   (\\d+)  group 1: your digits
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example, parsing the strings is only done for demonstration purposes, but it could be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0", you could change the pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (preceding the comma-separated ints).
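Since the question asks for one regex per line label, another option is to build each pattern from the label at runtime. This is only a sketch under the assumption that the label is immediately followed by the comma-separated numbers, as in the sample data. PerLabelDemo and secondValueAfter are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerLabelDemo {
    // Hypothetical helper: returns the second number after the given label,
    // e.g. 2 for "test1" in "...test1,0,2,0".
    static int secondValueAfter(String label, String text) {
        // Pattern.quote keeps any regex metacharacters in the label literal.
        Pattern p = Pattern.compile(Pattern.quote(label) + ",\\d+,(\\d+)");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no entry for " + label);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        String test = "ytrt.ytrwyt.ytreytre.test1,0,2,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0\n"
                + "sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
        for (String label : new String[] {"test1", "test2", "test3"}) {
            System.out.println(label + ": " + secondValueAfter(label, test));
        }
    }
}
```

Because the comma must directly follow the label, looking up "test1" does not accidentally match a line labelled "test10".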

How to split based on a combination of strings, white spaces and symbols?

I am trying to decompose a conditional string, shown below, by discarding the meaningless parts and splitting it into an array that contains only the useful parts:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
System.out.println(s1);
String [] s2 = s1.split("([\\d]{2,3}?(:IF))?[\\s,&]+(with)?");
for(int i=0;i<s2.length;i++)System.out.println(s2[i]);
The "01:IF", "with", "&", "," and any white space are separators and must be eliminated. The execution result is:
01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;
<--- unwanted empty string
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
<--- unwanted empty string
0.4610;
An empty string appears as the first and ninth element of the split result. How can I get rid of these extra elements? Also, I would like more good examples of how to use the different options of the regex passed to split, and how to combine them in one regex. Most of the answers on Stack Overflow are based on a single separator; there are no illustrated examples of complex combinations.
Thanks.
You could achieve the same using Pattern and Matcher classes.
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
Matcher m = Pattern.compile("(?:\\d{2,3}?(?::IF))?[\\s,&]+(?:with)?|(\\S+)").matcher(s1);
while(m.find())
{
if(m.group(1) != null)
System.out.println(m.group(1));
}
I turned all the capturing groups in your regex into non-capturing groups and added an extra alternative, |(\\S+), at the end. The first alternative consumes the separators, so the second alternative only ever matches in the remaining text; (\\S+) captures one or more non-space characters.
Output:
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
0.4610;
I would use a different strategy instead of splitting and sanitizing.
Assuming the entities listed in your desired output represent all the patterns you are willing to keep:
String test = "01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
// Pattern breakdown, left to right:
//   (?<=^|\\s)                 positive lookbehind for start of input or whitespace
//   \\p{Alpha}+\\.\\p{Alpha}+  "rd.h" etc.
//   |                          OR
//   \\d+\\.\\d+;               "0.4610;" etc.
//   (?=\\s|$)                  positive lookahead for whitespace or end of input
Pattern p = Pattern.compile("(?<=^|\\s)(\\p{Alpha}+\\.\\p{Alpha}+|\\d+\\.\\d+;)(?=\\s|$)");
Matcher m = p.matcher(test);
StringBuilder result = new StringBuilder();
while (m.find()) {
result.append(m.group()).append(System.getProperty("line.separator"));
}
System.out.println(result);
Output
rd.h
dq.L
o.LL
v.L
la.VHB
av.VHR
0.4610;
Explanation
The Pattern here looks for positive matches rather than sanitizing and splitting.
It includes the pattern for "rd.h" etc., or the pattern for "0.4610;" etc., generalized as far as possible.
It then iterates over the matches and builds a String with the desired output, one match per line.
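If you would rather stay with split, the two empty strings have separate causes: a separator at the very start of the input produces a leading empty element, and " with " is matched as two adjacent separators (" with", then " "), leaving an empty element between them. Here is a minimal sketch that removes the prefix first and lets the separator swallow "with" together with the white space around it. SplitDemo and tokens are hypothetical names, and the prefix is assumed to always have the form of 2-3 digits followed by ":IF".

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: returns the useful parts of a rule string.
    static String[] tokens(String rule) {
        // Drop the "01:IF " prefix up front so split never sees a leading separator.
        String body = rule.replaceFirst("^\\d{2,3}:IF\\s+", "");
        // A separator is a run of white space, commas or ampersands,
        // optionally followed by "with" and another such run.
        return body.split("[\\s,&]+(?:with[\\s,&]+)?");
    }

    public static void main(String[] args) {
        String s1 = "01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
        System.out.println(Arrays.toString(tokens(s1)));
        // prints [rd.h, dq.L, o.LL, v.L, THEN, la.VHB, av.VHR, 0.4610;]
    }
}
```

Unlike the Matcher-based answers, this keeps THEN in the result, matching the asker's expected output.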
