Is regex from Java buggy or am I missing something? - java

With this regex :
private static String p = "^\\(([-+]?([1-8]?\\d(\\.\\d+)?|90(\\.0+)?))\\,([-+]?(180(\\.0+)?|((1[0-7]\\d)|([1-9]?\\d))(\\.\\d+)?))\\)$";//"^(\\-?\d+(\.\d+)?),\s*(\\-?\d+(\\.\d+)?)$";
I cannot extract the values and I don't understand why.
With an input like this:
(50,180) //or even
(-50,-180)
Why doesn't my regex give me the number 180 when it can give me the value 50?
I mean, my Matcher always returns the first value (after the opening parenthesis and before the ",") but never the value after the ",".
What's the problem with my regex?
My code:
private static String patternGeographicCoordinates = "^\\(([-+]?([1-8]?\\d(\\.\\d+)?|90(\\.0+)?))\\,([-+]?(180(\\.0+)?|((1[0-7]\\d)|([1-9]?\\d))(\\.\\d+)?))\\)$";
....
Pattern geographicCoordinates = Pattern.compile(patternGeographicCoordinates);
try(BufferedReader br = new BufferedReader(new FileReader(file))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
....
Matcher m1 = geographicCoordinates.matcher(line); //line is a line from a file (String)
....
if(m1.matches()){
System.out.println("IT DID WORK, LINE: "+line+", M.GROUP: "+m1.group(3));
sb.append(line);
sb.append(System.lineSeparator());
}

Why don't you just remove the parentheses and split around the comma?
import org.apache.commons.lang3.StringUtils;
...
theString = StringUtils.strip(theString, "()");
String[] tokens = theString.split(",");
double number2 = Double.parseDouble(tokens[1]);
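If you would rather not pull in Apache Commons, the same idea can be sketched with the JDK alone. The class name CoordinateSplit and the method parse are made up for this illustration, and the sketch assumes the input has the shape "(a,b)" and does no range checking:

```java
class CoordinateSplit {
    // Hypothetical helper: strips one pair of surrounding parentheses,
    // splits on the comma and parses both halves as doubles.
    static double[] parse(String s) {
        String inner = s.trim();
        if (inner.startsWith("(") && inner.endsWith(")")) {
            inner = inner.substring(1, inner.length() - 1);
        }
        String[] tokens = inner.split(",");
        if (tokens.length != 2) {
            throw new IllegalArgumentException("expected two values: " + s);
        }
        return new double[] {
            Double.parseDouble(tokens[0].trim()),
            Double.parseDouble(tokens[1].trim())
        };
    }

    public static void main(String[] args) {
        double[] c = parse("(-50,-180)");
        System.out.println(c[0] + " " + c[1]); // prints "-50.0 -180.0"
    }
}
```

Unlike the regex in the question, this does not reject out-of-range coordinates such as 91 degrees of latitude; that check would have to be added separately.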

If you want to use regex anyway, you can do it like this:
Pattern p = Pattern.compile("\\(([-]?\\d+)\\s*\\,\\s*([-]?\\d+)\\)$");
String input = "(-50,-80)";
Matcher m = p.matcher(input);
if(m.find())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}

You're looking at the wrong group indices. Check your regexp with this parser: https://regex101.com/
Here are the matching groups for the input (50,180):
1. [1-3] `50`
2. [1-3] `50`
5. [4-7] `180`
6. [4-7] `180`
Update
The regexp is made for more complex inputs than you supply in your example, that's why there are groups with null values. The additional groups are for decimal parts and special cases (apparently meaningful for coordinate parsing).
Look at the input (90.00,180.00). It's parsed into the following groups:
1. [1-6] `90.00`
2. [1-6] `90.00`
4. [3-6] `.00`
5. [7-13] `180.00`
6. [7-13] `180.00`
7. [10-13] `.00`
Now group 4 matches the (\.0+)? after 90 and group 7 matches the (\.0+)? after 180. You can see that |90 is an alternative, presumably a special case for exactly 90.00 degrees. That's why group 3 is still empty but group 4 is filled.
With input (85.21,150.34) you will get more groups filled:
1. [1-6] `85.21`
2. [1-6] `85.21`
3. [3-6] `.21`
5. [7-13] `150.34`
6. [7-13] `150.34`
8. [7-10] `150`
9. [7-10] `150`
11. [10-13] `.34`
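Now group 3 is filled but group 4 is not, because the [1-8]?\d branch matched.
Also, since you have nested groups, the same value can be captured twice, in groups 1 and 2 for instance (group 1 additionally includes the sign, if any). To read the second number, use m1.group(5) rather than m1.group(3).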
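As a minimal sketch of what the listings above mean in code, the following reads the two numbers from groups 1 and 5 of the pattern from the question. The class name CoordinateGroups and the method parse are invented for this example; only the pattern itself comes from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CoordinateGroups {
    // The pattern from the question, unchanged (only split across two lines).
    static final Pattern COORDS = Pattern.compile(
        "^\\(([-+]?([1-8]?\\d(\\.\\d+)?|90(\\.0+)?))\\,"
        + "([-+]?(180(\\.0+)?|((1[0-7]\\d)|([1-9]?\\d))(\\.\\d+)?))\\)$");

    // Returns {latitude, longitude} as text, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = COORDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is the signed latitude, group 5 the signed longitude.
        // Group 3 is only the fractional part of the latitude, so it is
        // null for an input such as (50,180).
        return new String[] { m.group(1), m.group(5) };
    }

    public static void main(String[] args) {
        String[] c = parse("(-50,-180)");
        System.out.println(c[0] + " / " + c[1]); // prints "-50 / -180"
    }
}
```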

Related

Regex capturing groups within logical OR

I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23224a2f4
/auto/445478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is:
1212, d3fe
23224, a2f4
445478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
UPDATED QUESTION:
I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23e24a2f4
/auto/df5478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is everything after the 2nd '/', split as follows: 4 characters if 'apple', 5 characters if 'cat' and 6 characters if 'auto', like:
1212, d3fe
23e24, a2f4
df5478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
I can do it without the regex OR(|) but it breaks when I include it. Any help with the right regex?
Updated Answer:
As per your updated question you can use this regex based on lookbehind assertions:
/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$
This regex uses 2 capture groups after matching /
In 1st group we have 3 lookbehind conditions with alternations.
(?<=apple/).{4} makes sure that we match 4 characters that have apple/ on their left-hand side. Likewise, we match 5- and 6-character strings that have cat/ and auto/ on their left-hand side.
In 2nd capture group we match remaining characters before end of line.
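Here is a short sketch of how that lookbehind regex could be used from Java. The class name LookbehindSplit and the method split are made up for this illustration; the pattern is the one given above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindSplit {
    static final Pattern P = Pattern.compile(
        "/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$");

    // Returns {first part, remainder}, or null when the prefix is not
    // apple, cat or auto.
    static String[] split(String path) {
        Matcher m = P.matcher(path);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "/apple/1212d3fe", "/cat/23e24a2f4",
            "/auto/df5478eefd", "/somethingelse/1234fded"
        };
        for (String s : inputs) {
            String[] parts = split(s);
            System.out.println(parts == null ? "null" : parts[0] + ", " + parts[1]);
        }
        // prints:
        // 1212, d3fe
        // 23e24, a2f4
        // df5478, eefd
        // null
    }
}
```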
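You could use the regex \/(?:apple|auto|cat)\/(\d*)(.*). Note that the alternatives have to go in a group: a character class such as [apple|auto|cat]+ would match any run of those letters, not the three words.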
If you want the last group to have exactly 4 characters you can use this regex:
/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})
Here is a working example:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd");
Pattern pattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
}
If you want four digits after apple, 5 after cat and 6 after auto, you can split your algorithm into 2 parts:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd", "/some/445478eefd");
Pattern firstPattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)");
for (String string : strings) {
Matcher firstMatcher = firstPattern.matcher(string);
if (firstMatcher.find()) {
String first = firstMatcher.group(1);
System.out.println(first);
int length = getLength(first);
Pattern secondPattern = Pattern.compile("([0-9a-f]{" + length + "})([0-9a-f]{4})");
Matcher secondMatcher = secondPattern.matcher(string);
if (secondMatcher.find()) {
System.out.println(secondMatcher.group(1));
System.out.println(secondMatcher.group(2));
}
}
}
private static int getLength(String key) {
switch (key) {
case "apple":
return 4;
case "cat":
return 5;
case "auto":
return 6;
}
throw new IllegalArgumentException("key not allowed");
}

Split String at different lengths in Java

I want to split a string after a certain length.
Let's say we have a string of "message"
123456789
Split like this :
"12" "34" "567" "89"
I thought of first splitting it into pieces of 2 using
"(?<=\\G.{2})"
as the regexp, then joining the last two pieces and splitting again into 3, but is there any way to do it in a single go using a regexp? Please help me out.
Use ^(.{2})(.{2})(.{3})(.{2}).* to group the String into the specified lengths and grab the groups as separate Strings:
String input = "123456789";
List<String> output = new ArrayList<>();
Pattern pattern = Pattern.compile("^(.{2})(.{2})(.{3})(.{2}).*");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
output.add(matcher.group(i));
}
}
System.out.println(output);
NOTE: Capturing groups are numbered from 1, because group 0 is the whole match.
And a magnificent piece of sorcery from @YCF_L in the comments:
String pattern = "^(.{2})(.{2})(.{3})(.{2}).*";
String[] vals = "123456789".replaceAll(pattern, "$1-$2-$3-$4").split("-");
The magic here is that replaceAll() can refer to the captured groups in its replacement string: use $n (where n is a digit) to refer to the n-th captured subsequence.
NOTE: this assumes that no input string contains a - character. If one might, pick any other character that cannot occur in your input strings and use it as the delimiter.
Test this regex in regex101 with the test string 123456789:
^(\d{2})(\d{2})(\d{3})(\d{2})$
Output:
Match 1
Full match 0-9 `123456789`
Group 1. 0-2 `12`
Group 2. 2-4 `34`
Group 3. 4-7 `567`
Group 4. 7-9 `89`
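If the piece lengths are not fixed in advance, a non-regex alternative is to cut the string with substring(). This is not taken from the answers above; it is only a sketch, and the class FixedWidthSplit and the helper splitByLengths are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthSplit {
    // Hypothetical helper: cuts `input` into consecutive pieces of the
    // given lengths. Characters left over after the last piece are
    // ignored, like the trailing .* in the regex version.
    static List<String> splitByLengths(String input, int... lengths) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int len : lengths) {
            if (start + len > input.length()) {
                throw new IllegalArgumentException("input shorter than the requested lengths");
            }
            parts.add(input.substring(start, start + len));
            start += len;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitByLengths("123456789", 2, 2, 3, 2)); // prints [12, 34, 567, 89]
    }
}
```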

RegEx to extract command line arguments - Java

I know there is a lot out there, but I couldn't figure out how to extract two parameters from a line read with Scanner(System.in):
commandline = scanner.nextLine();
Two parameters are allowed:
The first one can be one of A, H, G or a digit from 4 to 9.
The second parameter is again a digit from 4 to 9, or any number.
It should handle all the scenarios:
" A 3 " //spaces before and after
"A 3" // spaces between the params
"A 7 6" // Unwanted 3rd parameter
" 6 " // Only one param with spaces.
So how do I write a regex to extract the above?
I tried \\w\\s, but it did not work. I am poor with regex.
Use this on the line you read:
String[] arguments = commandLine.trim().split("\\s+");
The \\s+ stands for at least one whitespace character as separator; the trim() keeps leading whitespace from producing an empty first element.
Then check how many elements the array has.
Finally, check the formats of the two arguments:
arguments[0].matches("[AHG4-9]");
arguments[1].matches("\\d");
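Put together, the steps above might look like the sketch below. The class ArgSplitter and the method parse are names invented here, and two things are assumptions rather than part of the answer: extra tokens after the second one are silently ignored, and an invalid first argument yields an empty list:

```java
import java.util.ArrayList;
import java.util.List;

class ArgSplitter {
    // Returns the valid leading arguments (zero, one or two of them).
    static List<String> parse(String commandLine) {
        List<String> result = new ArrayList<>();
        String trimmed = commandLine.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        // First argument: one of A, H, G or a digit from 4 to 9.
        if (!tokens[0].matches("[AHG4-9]")) {
            return result;
        }
        result.add(tokens[0]);
        // Optional second argument: a single digit, as in the checks above.
        if (tokens.length > 1 && tokens[1].matches("\\d")) {
            result.add(tokens[1]);
        }
        // Assumption: any further tokens are ignored.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("   A   3  ")); // prints [A, 3]
        System.out.println(parse("A 7 6"));      // prints [A, 7]
        System.out.println(parse(" 6 "));        // prints [6]
    }
}
```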
Try:
public static ArrayList<String> parseArguments(String argument){
Pattern regex = Pattern.compile("^\\s*([AHG4-9])\\s*(\\d)?\\s*$",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(argument);
if (regexMatcher.find()) {
ArrayList<String> arguments = new ArrayList<String>();
arguments.add(regexMatcher.group(1));
if(regexMatcher.group(2) != null)
{
arguments.add(regexMatcher.group(2));
}
return arguments;
}
return null;
}
For the input " A 3 " it will return:
[A, 3]
The regex above also enforces the argument rules: as you mention, the first parameter can be A, H, G or a number between 4 and 9; the second argument is a single digit and is optional.

Regex - Match numbers & special cases

I'm trying to make a regex that would produce the following results :
for 7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3 : 7.0, 5, :asc, 8.256, :b, 2, :d, 3
for -+*-/^^ )ç# : nothing
It should first match numbers, which can be floats, so in my regex I have: [0-9]+(\\.[0-9])? but it should also match special cases like :a or :Abc.
To be more precise, it should (if possible) match anything but mathematical operators /*+^- and parentheses.
So here is my final regex: ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+) but it's not working because matcher.groupCount() returns 3 for both of the examples I gave.
Groups are what you specifically group in the regex. Anything surrounded in parentheses is a group. (Hello) World has 1 group, Hello. What you need to be doing is finding all the matches.
In your code ([0-9]+(\\.[0-9])?)|(:[a-zA-Z]+), 3 sets of parentheses can be seen. This is why you will always be given 3 groups in every match.
Your code works fine as it is, here is an example:
String text = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3";
Pattern p = Pattern.compile("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)");
Matcher m = p.matcher(text);
List<String> matches = new ArrayList<String>();
while (m.find()) matches.add(m.group());
for (String match : matches) System.out.println(match);
The ArrayList matches will contain all of the matches that your regex finds.
The only change I made was add a + after the second [0-9].
Here is the output:
7.0
5
:asc
8.256
:b
2
:d
3
Your regex is correct, run the following code:
String input = "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3"; // your input
String regex = "(\\d+(\\.\\d+)?)|(:[a-zA-Z]+)"; // yours, with \\d for [0-9] and a + after the fractional digit
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
Your problem is a misunderstanding of the method matcher.groupCount(). The JavaDoc clearly says:
Returns the number of capturing groups in this matcher's pattern.
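To make the difference concrete, here is a small sketch that prints both numbers side by side. The class GroupCountDemo and the method tokens are names chosen for this example; the pattern is the corrected one from the answers above, and the second input is the asker's second example without its non-ASCII tail:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    static final Pattern TOKEN =
        Pattern.compile("([0-9]+(\\.[0-9]+)?)|(:[a-zA-Z]+)");

    // Collects every match of the pattern in the text.
    static List<String> tokens(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] inputs = { "7.0 + 5 - :asc + (8.256 - :b)^2 + :d/3", "-+*-/^^ )" };
        for (String input : inputs) {
            // groupCount() depends only on the pattern, so it is 3 both times;
            // the number of matches is what differs between the inputs.
            System.out.println(TOKEN.matcher(input).groupCount() + " groups, "
                + tokens(input).size() + " matches");
        }
        // prints:
        // 3 groups, 8 matches
        // 3 groups, 0 matches
    }
}
```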
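Alternatively, match every run of characters that is not an operator, a parenthesis or whitespace:
[^()+\-*/^\s]+ //put any other mathematical operator to exclude inside the square brackets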

Parsing a value using regular expression in java

I am trying to read a line and parse a value using regular expression in java. The line that contains the value looks something like this,
...... TESTYY912345 .......
...... TESTXX967890 ........
Basically, it contains 4 letters, then any two ASCII characters, followed by the digit 9 and then any digits. I want to get the values 912345 and 967890.
This is what I have so far as a regular expression:
... TEST[\x00-\xff]{2}[9]{1} ...
But this skips the 9 and parses 12345 and 67890. (I want to include the 9 as well.)
Thanks for your help.
You are pretty close. Capture the entire group (9\\d+) after matching TEST\\p{ASCII}{2}. This way, you'll capture the 9 and the following digits:
String s = "...... TESTYY912345 ......";
Pattern p = Pattern.compile("TEST\\p{ASCII}{2}(9\\d+)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(1)); // 912345
}
See my comment for a working expression, "TEST.{2}(9\\d*)".
final Pattern pattern = Pattern.compile("TEST.{2}(9\\d*)");
for (final String str : Arrays.asList("...... TESTYY912345 .......",
"...... TESTXX967890 ........")) {
final Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
final int value = Integer.parseInt(matcher.group(1));
System.out.println(value);
}
}
Output:
912345
967890
This will match any two characters (except a line terminator) for what is XX and YY in your example, and will take any digits after the 9.
