Parsing array syntax using regex - java

I think what I am asking is either very trivial or already asked, but I have had a hard time finding answers.
We need to capture the inner number characters between brackets within a given string.
so given the string
StringWithMultiArrayAccess[0][9][4][45][1]
and the regex
^\w*?(\[(\d+)\])+?
I would expect 6 capture groups and access to the inner data.
However, I end up only capturing the last "1" character in capture group 2.
If it is important, here's my Java JUnit test:
@Test
public void ensureThatJsonHandlerCanHandleNestedArrays() {
    String stringWithArr = "StringWithMultiArray[0][0][4][45][1]";
    Pattern pattern = Pattern.compile("^\\w*?(\\[(\\d+)\\])+?");
    Matcher matcher = pattern.matcher(stringWithArr);
    matcher.find();
    assertTrue(matcher.matches()); // passes
    System.out.println(matcher.group(2)); // prints 1 (matched from the last array brackets)
    assertEquals("0", matcher.group(2)); // expected, but it's 1, not 0
    assertEquals("45", matcher.group(5)); // only 2 capture groups exist: the whole string and the 1 from the last array brackets
}

In order to capture each number, you need to change your regex so it (a) captures a single number and (b) is not anchored to--and therefore limited by--any other part of the string ("^\w*?" anchors it to the start of the string). Then you can loop through them:
Matcher mtchr = Pattern.compile("\\[(\\d+)\\]").matcher(arrayAsStr);
while(mtchr.find()) {
System.out.print(mtchr.group(1) + " ");
}
Output:
0 9 4 45 1
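A repeated capturing group only keeps the text of its last iteration, and the number of groups is fixed by the number of parentheses in the pattern, which is why the original regex exposes only the final "1". As a minimal sketch of the loop above that collects the indices as integers (the class name ArrayIndexParser and the method parseIndices are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArrayIndexParser {
    // Matches one bracketed index at a time, e.g. "[45]"; group 1 holds the digits.
    private static final Pattern INDEX = Pattern.compile("\\[(\\d+)\\]");

    // Hypothetical helper: returns every bracketed index in order of appearance.
    static List<Integer> parseIndices(String text) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = INDEX.matcher(text);
        while (m.find()) {
            indices.add(Integer.parseInt(m.group(1)));
        }
        return indices;
    }

    public static void main(String[] args) {
        // prints [0, 9, 4, 45, 1]
        System.out.println(parseIndices("StringWithMultiArrayAccess[0][9][4][45][1]"));
    }
}
```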

Related

regex for not matching alpha plus numeric range

I have the following regex
.{19}_.{3}PDR_.{8}(ABCD|CTNE|PFRE)006[0-9][0-9].{3}_.{6}\.POC
a match is for example
NRM_0157F0680884976_598PDR_T0060000ABCD00619_00_6I1N0T.POC
and would like to negate the (ABCD|CTNE|PFRE)006[0-9][0-9]
portion such that
NRM_0157F0680884976_598PDR_T0060000ABCD00719_00_6I1N0T.POC
is a match but
NRM_0157F0680884976_598PDR_T0060000ABCD007192_00_6I1N0T.POC
or
NRM_0157F0680884976_598PDR_T0060000ABCD0061_00_6I1N0T.POC
is not (the negated part must be 9 chars long just like the non negated part for a total length of 58 chars).
Consider using the following pattern:
\b(?:ABCD|CTNE|PFRE)006[0-9][0-9]\b
Sample Java code:
String input = "Matching value is ABCD00601 but EFG123 is non matching";
Pattern r = Pattern.compile("\\b(?:ABCD|CTNE|PFRE)006[0-9][0-9]\\b");
Matcher m = r.matcher(input);
while (m.find()) {
System.out.println("Found a match: " + m.group());
}
This prints:
Found a match: ABCD00601
I would like to propose this expression
(ABCD|CTNE|PFRE)006\d{1,2}
where \d{1,2} catches any one or two digit number
that is it would get any alphanumeric values from ABCD0060~ABCD00699 or CTNE0060~CTNE00699 or PFRE0060~PFRE00699
Edit #1:
As user @Hao Wu mentioned, the regex above would also accept ABCD0060, which is not ideal.
Removing the 1 from the { } fixes this, so that we only get
alphanumeric values from ABCD00600~ABCD00699 or CTNE00600~CTNE00699 or PFRE00600~PFRE00699
so the resulting regex would be
(ABCD|CTNE|PFRE)006\d{2}
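If the goal is the negation described in the question (the 9-character slot may be anything except one of the listed codes followed by 006 and two digits), one possible sketch is to put a negative lookahead in front of a fixed-width .{9}. This is an assumption about what the asker wants, not something taken from the answers above, and the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class NegatedCodeFilter {
    // Same layout as the pattern in the question, but the 9-character slot
    // must NOT be (ABCD|CTNE|PFRE)006 followed by two digits.
    private static final Pattern NEGATED = Pattern.compile(
        ".{19}_.{3}PDR_.{8}(?!(?:ABCD|CTNE|PFRE)006[0-9][0-9]).{9}.{3}_.{6}\\.POC");

    // matches() requires the whole name to fit, so the total length stays at 58.
    static boolean isNegatedMatch(String name) {
        return NEGATED.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNegatedMatch("NRM_0157F0680884976_598PDR_T0060000ABCD00719_00_6I1N0T.POC"));  // true
        System.out.println(isNegatedMatch("NRM_0157F0680884976_598PDR_T0060000ABCD00619_00_6I1N0T.POC"));  // false
        System.out.println(isNegatedMatch("NRM_0157F0680884976_598PDR_T0060000ABCD007192_00_6I1N0T.POC")); // false
        System.out.println(isNegatedMatch("NRM_0157F0680884976_598PDR_T0060000ABCD0061_00_6I1N0T.POC"));   // false
    }
}
```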

Partially mask data of a group of number using regex

I would like to partially mask data using regex. Here is the input :
123-12345-1234567
And here is what I'd like as output :
1**-*****-*****67
I figured out how to do the replacement for the last group, but I don't know how to do it for the rest of the data.
String s = "123-12345-1234567";
System.out.println(s.replaceAll("\\d(?=\\d{2})", "*")); // output is *23-***45-*****67
Also, I'd like to use only regex because I have different type of data, so different type of mask. I don't want to create functions for each type of data.
For example :
AAAAAAAAA // becomes *******AA
12334567 // becomes 123*****
Thanks for your help !
We can use the following regex replacement approach:
String input = "123-12345-1234567";
String output = input.substring(0, 1) +
input.substring(1, input.length()-2).replaceAll("\\d", "*") +
input.substring(input.length()-2);
System.out.println(output); // 1**-*****-*****67
Here we concatenate together the first digit, followed by the middle portion with all digits replaced by *, along with the final two digits.
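The same concatenation idea can be parameterised so that one helper covers the other mask shapes from the question. This is only a sketch: the name mask and the parameters keepStart and keepEnd are made up for illustration, and it assumes both counts are non-negative and that only letters and digits should be masked.

```java
class Masker {
    // Hypothetical helper: keeps the first keepStart and the last keepEnd characters,
    // replaces every letter or digit in between with '*', and leaves separators untouched.
    static String mask(String input, int keepStart, int keepEnd) {
        if (keepStart + keepEnd >= input.length()) {
            return input; // nothing left to mask
        }
        int tailStart = input.length() - keepEnd;
        return input.substring(0, keepStart)
            + input.substring(keepStart, tailStart).replaceAll("[A-Za-z0-9]", "*")
            + input.substring(tailStart);
    }

    public static void main(String[] args) {
        System.out.println(mask("123-12345-1234567", 1, 2)); // 1**-*****-*****67
        System.out.println(mask("AAAAAAAAA", 0, 2));         // *******AA
        System.out.println(mask("12334567", 3, 0));          // 123*****
    }
}
```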
Edit: A pure regex solution, which, however, is more lines of code than the above and might be less performant.
String input = "123-12345-1234567";
String pattern = "^(\\d)(.*)(\\d{2})$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
String output = m.group(1) + m.group(2).replaceAll("\\d", "*") + m.group(3);
System.out.println(output); // 1**-*****-*****67
}
Java supports a finite quantifier in a lookbehind, so if you must use a regex only, you could use a pattern with an alternation to account for the different scenarios.
Using the lookarounds, you can select a single character to be replaced by *.
Note that this is hard to maintain, and it would be a better option to write separate functions for the different data formats using separate patterns or string functions (perhaps accompanied by unit tests).
(?<=^\d{3,7})\d(?=\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$)|\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?)
The separate parts match:
(?<=^\d{3,7})\d(?=\d*$) Match a digit asserting 3-7 digits to the left and only digits to the right
| Or
(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$) Match A-Z asserting 0-6 chars to the left and only chars A-Z to the right
| Or
\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$) Match a digit asserting 2-3 digits to the left and optional digit, - with 5 digits and - with 7 digits to the right
| Or
\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?) Match a digit asserting 3 digits to the left followed - and 1-5 digits and optionally - with 1-5 digits
String regex = "(?<=^\\d{3,7})\\d(?=\\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\\d(?<=^\\d{2,3})(?=\\d?-\\d{5}-\\d{7}$)|\\d(?<=^\\d{3}-\\d{1,5}(?:-\\d{1,5})?)";
String s1 = "123-12345-1234567";
String s2 = "AAAAAAAAA";
String s3 = "12334567";
System.out.println(s1.replaceAll(regex, "*"));
System.out.println(s2.replaceAll(regex, "*"));
System.out.println(s3.replaceAll(regex, "*"));
Output
1**-*****-*****67
*******AA
123*****
public static void main(String[] args) {
    // Bounded lookbehinds; the lookahead needs only 2 trailing characters so that "67" stays visible
    System.out.println("123-12345-1234567".replaceAll("(?<=.)\\d(?=.{2,})", "*"));
    System.out.println("AAAAAAAAA".replaceAll(".(?=.{2,})", "*"));
    System.out.println("12334567".replaceAll("(?<=.{3}).", "*"));
}
output:
1**-*****-*****67
*******AA
123*****

I want to match exactly 2 times with regex

I have something like this string.
XXXX^^^141409i1^^^XXXX.
I want to match those 3 ^ in a group and the group exactly 2 times. I wrote this but it doesn't seem to work.
(?:(\^){3}){2}
EDIT
I have to split it and extract the number in the middle. The point is that that group should consist of exactly 3 ^ and exactly 2 times. If the first group has only 1 or 2 ^ it will stop matching. That string is user input and if he inputs more than that string, for example XXXX^^^141409i1^^^XXXX^^^^XXXX then it shouldn't match the last group, only the first 2. (Sorry if I'm too ambiguous.)
EDIT2
The point of the exercise is to split the string and get the number in the middle. I wrote this line, but the problem is that it matches every ^^^ and I only want to match exactly 2 times.
String[] split = s.split("(\\^){3}");
If I correctly understood what you want, I hope this will help you:
String input = "XXXX^^^141409i1^^^XXXX^^^^XXXX";
Pattern pattern = Pattern.compile(".*?\\^{3}(\\w+)\\^{3}");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
System.out.println("The number in the middle: " + matcher.group(1));
}
Output:
The number in the middle: 141409i1
Here you can see how it works: https://regexr.com/51r9e
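Since the question is phrased in terms of split, another option is the two-argument String.split(regex, limit): with a limit of 3 the delimiter is applied at most twice, so anything after the second ^^^ stays in the last element. A minimal sketch (the class and method names are made up for illustration; note that it does not reject a run of four or more ^ at the first or second position):

```java
class CaretSplitter {
    // Hypothetical helper: returns the text between the first two "^^^" delimiters,
    // or null when the input does not contain two of them.
    static String middlePart(String s) {
        // limit = 3 means the pattern is applied at most 2 times
        String[] parts = s.split("\\^{3}", 3);
        return parts.length == 3 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(middlePart("XXXX^^^141409i1^^^XXXX^^^^XXXX")); // 141409i1
        System.out.println(middlePart("XXXX^^141409i1^^^XXXX"));          // null
    }
}
```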

Regular Expression in Java. Unexpected behaviour

I am trying to match mostly numbers, but depending on the Words which follow the Expression I need to make a difference.
I match every Number which is not followed by a Temperature Term like °C or a Time Specification.
My Regular Expression looks like this:
(((\d+?)(\s*)(\-)(\s*))?(\d+)(\s*))++(?!minuten|Minuten|min|Min|Stunden|stunden|std|Std|°C| °C)
Here is an Example: http://regexr.com?33jeg
While this Behavior is what I expected Java does the Following:
The index is the group number. This is the match for the 4:
0: "4 " 1: "4 " 2: "0 - " 3: "0" 4: " " 5: "-" 6: " " 7: "4" 8: " " 9: "°C"
Note that I match every string separately, so the match for the 5 looks like this:
0: "5 " 1: "5 " 2: "null" 3: "null" 4: "null" 5: "null" 6: "null" 7: "5" 8: " " 9: "null"
This is how I'd like the other match to be. This unpleasant behavior only occurs when a "-" is somewhere in the string before the match.
My Java Code is the following:
public static void adaptPortionDetails(EList<Step> steps, double multiplicator) {
    String portionMatcher = "(((\\d+?)(\\s*)(\\-)(\\s*))?(\\d+)(\\s*))++(?!°C|Grad|minuten|Minuten|min|Min|Stunden|stunden|std|Std)";
    for (int i = 0; i < steps.size(); i++) {
        Matcher matcher = Pattern.compile(portionMatcher).matcher(
                steps.get(i).getDescription());
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            printGroups(matcher);
            String newValue1Str;
            if (matcher.group(3) == null) {
                newValue1Str = "";
                System.out.println("test");
            } else {
                double newValue1 = Integer.parseInt(matcher.group(3)) * multiplicator;
                newValue1Str = Fraction.getFraction(newValue1).toProperString();
            }
            double newValue2 = Integer.parseInt(matcher.group(7)) * multiplicator;
            String newValue2Str = Fraction.getFraction(newValue2).toProperString();
            matcher.appendReplacement(sb, newValue1Str + "$4$5$6" + newValue2Str + "$8");
        }
        matcher.appendTail(sb);
        steps.get(i).setDescription(sb.toString());
    }
}
Hope you can tell what I'm missing.
This seems to be a bug (or feature?) in Java's implementation. It doesn't seem to reset the captured text for the capturing group when the matching has to be redone from the next index.
This test reveals the discrepancy in behavior between Java regex engine and PHP's PCRE.
Regex: (\d+(-\d+)?){1}+(?!x)
Input: 34 34-43x 78 90
Java result: 3 matches (34, 78, 90). The 2nd capturing group of the 2nd match is -43. The 2nd capturing group captures nothing for 1st and 3rd match.
PHP result: Also the same 3 matches, but 2nd capturing group captures nothing for all matches. For PHP's PCRE implementation, when the match has to be redone, the captured text of the capturing groups are reset.
This was tested on JRE 6 Update 37 and JRE 7 Update 11.
Same result for this, just to prove the point that captured text is not reset when matching has to be redone:
Regex: a(\d+(-\d+)?){1}+(?!x)
Input: a34 a34-43x a78 a90
PHP result
Some comment about your regex
I think the ++ should be {1}+, since it seems that you want to modify one number or one range of number at a time, while making the match possessive to discard unwanted numbers.
Workaround
The first group (the outermost capturing group), which captures everything (one number or a range of numbers), will always be overwritten when a match is found. Hence you can rely on it. You can check whether there is a - in group 1 (with the contains method). If there is, then you can tell that capturing group 2 contains captured text from the current match, and you can use the captured text. If there is not, then you can ignore all the captured text in capturing group 2 and its nested capturing groups.
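A minimal sketch of this workaround, using the simplified test regex from above rather than the full pattern from the question (the class and method names are made up for illustration). Group 2 is only trusted when group 1 actually contains a -:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaleGroupWorkaround {
    // Simplified test regex from the answer: one number or one range, not followed by 'x'.
    private static final Pattern NUMBER_OR_RANGE =
        Pattern.compile("(\\d+(-\\d+)?){1}+(?!x)");

    // Hypothetical helper: lists each match together with the range part,
    // ignoring group 2 unless group 1 of the current match contains a '-'.
    static List<String> describeMatches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = NUMBER_OR_RANGE.matcher(input);
        while (m.find()) {
            String whole = m.group(1); // always rewritten by a successful match
            String rangePart = whole.contains("-") ? m.group(2) : null;
            out.add(whole + " -> " + rangePart);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("34 34-43x 78 90"));
        System.out.println(describeMatches("1-2 5"));
    }
}
```

With this check, a leftover -43 in group 2 is never used for the match 78, regardless of whether the engine resets the group.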

Capturing groups using If then else regular expression construct in java

I have an input string in the following format
String input = "00IG356001110002005064007000000";
Characters 3-7 is the code.
Characters 8-12 is the amount.
Based on the code in the input string (IG356 in the sample input string), i need to capture the amount(00111 in the sample).
The value in the amount (characters 8-12) should be picked up only for specific codes and the logic is detailed below.
The code should not be SG356. If it is SG356, not a match and exit.
a. If the code is not SG356, check if the codes are IG902 or SG350, in this case capture the amount(00111)
else
b. Check for the 3 numbers in the code (characters 5-7, 356 in this sample). If they are 200,201,356,370. go ahead and capture the amount
I am using the regular expression shown below:
Using positive lookahead and if then else construct.
String regex= ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
The regular expression works fine if the code in the input string is IG902 or SG350 (when the 'if' part of the regex is getting matched). but if the 'else' is getting matched, i am unable to capture the amount.
This regular expression is working fine while just checking for a match.
.{2}(?!SG356)((?=IG902|SG350).+|.{2}(?=200|201|356|370).+)
The problem is only while capturing the group.
I am running this in Java. Any help would be greatly appreciated.
The java code i am using is :
public String getTsqlSum(String input, String regex) {
    String value = null;
    Matcher m = Pattern.compile(regex).matcher(input);
    System.out.println("Group Count: " + m.groupCount());
    if (m.matches()) {
        for (int i=0;i<m.groupCount();i++){
            System.out.println("For i: " + i +" Value: " + m.group(i));
        }
    }
    return value;
}
public void forumTest() {
    //String input = "00IG902001110002005064007000000";
    String input = "00IG356001110002005064007000000";
    String regex= ".{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+";
    System.out.println(match(input, regex));
    String match = getTsqlSum(input, regex);
    System.out.println("Match: " + match);
}
The regular expression works fine if the code in the input string is IG902 or SG350 (when the 'if' part of the regex is getting matched). but if the 'else' is getting matched, i am unable to capture the amount.
You are not unable to capture the amount, the expression is working fine. But if you are in the second part of the alternation (This is not a regex if-then-else) then your result is in a different capturing group. You will find it in the capturing group 3 and not in the second one like when you are matching in the first part of the alternation.
String regex= ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
Group numbers: 1 = the outer group around the whole alternation, 2 = the first (.{5}), 3 = the second (.{5})
In a regular expression the capturing groups are numbered by their opening brackets, and this numbering continues across an alternation. In Perl there would be a construct that gives the capturing groups of an alternation the same number, but I think that's the only flavour that is able to do this.
In Java you need to check after the expression in which group you have the result.
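As a rough sketch of that check with the original regex from the question (the class and method names are made up for illustration): whichever branch matched, exactly one of group 2 and group 3 is non-null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountByBranch {
    // Original regex from the question: group 2 belongs to the first branch,
    // group 3 to the second branch of the alternation.
    private static final Pattern CODE_AND_AMOUNT = Pattern.compile(
        ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)");

    // Hypothetical helper: returns the amount, or null when the input does not match.
    static String amount(String input) {
        Matcher m = CODE_AND_AMOUNT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Only the branch that actually matched has a non-null group.
        return m.group(2) != null ? m.group(2) : m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(amount("00IG902001110002005064007000000")); // 00111 (first branch)
        System.out.println(amount("00IG356001110002005064007000000")); // 00111 (second branch)
        System.out.println(amount("00SG356001110002005064007000000")); // null
    }
}
```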
You can change your regex and make the alternation before the capturing group
try this
.{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+
You will find your result in both cases in the group 1. (I made the first one a non capturing group using the ?:)
Update after the source was added
Your loop is wrong: capturing groups are numbered starting at 1, so if you want the content of group one, you have to use m.group(1), and the loop has to run up to and including m.groupCount().
In group m.group(0) you will find the whole matched string.
Try this
for (int i=1;i<=m.groupCount();i++){
System.out.println("For i: " + i +" Value: " + m.group(i));
}
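Putting the revised regex and the corrected group index together, a minimal self-contained sketch could look like this (the class name AmountExtractor and the method extractAmount are made up for illustration; the sample inputs are the ones from the question plus an SG356 variant):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountExtractor {
    // Revised regex: the alternation is a non-capturing group,
    // so the amount is always in group 1.
    private static final Pattern CODE_AND_AMOUNT = Pattern.compile(
        ".{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+");

    // Hypothetical helper: returns the amount when the code qualifies, otherwise empty.
    static Optional<String> extractAmount(String input) {
        Matcher m = CODE_AND_AMOUNT.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractAmount("00IG356001110002005064007000000")); // Optional[00111]
        System.out.println(extractAmount("00IG902001110002005064007000000")); // Optional[00111]
        System.out.println(extractAmount("00SG356001110002005064007000000")); // Optional.empty
    }
}
```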
