I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23224a2f4
/auto/445478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is:
1212, d3fe
23224, a2f4
445478, eefd
null
I need a regex with capturing groups that does this. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
UPDATED QUESTION:
I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23e24a2f4
/auto/df5478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is everything after the second '/', split as follows: 4 characters for 'apple', 5 characters for 'cat' and 6 characters for 'auto', like:
1212, d3fe
23e24, a2f4
df5478, eefd
null
I need a regex with capturing groups that does this. I am able to extract the second part but not the first one. The closest I came up with is the same code as in the original question above, built around this regex:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
It works without the alternation operator (|), but it breaks when I include it. Any help with the right regex?
Updated Answer:
As per your updated question you can use this regex based on lookbehind assertions:
/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$
This regex matches a / and then uses 2 capture groups.
In the 1st group we have 3 lookbehind conditions combined with alternation.
(?<=apple/).{4} makes sure that we match 4 characters that have apple/ on the left-hand side. Likewise, we match 5- and 6-character strings that have cat/ and auto/ on the left.
In the 2nd capture group we match the remaining characters up to the end of the line.
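A minimal Java sketch of the lookbehind regex above, run on the four strings from the updated question. The class and method names (LookbehindSplit, split) are hypothetical; split returns null when there is no apple/, cat/ or auto/ prefix, which gives the expected null for the last input.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindSplit {
    // The lookbehind regex from above, escaped for a Java string literal
    private static final Pattern PATTERN =
            Pattern.compile("/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$");

    // Returns "first, second", or null when no apple/, cat/ or auto/ prefix is found
    static String split(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) + ", " + m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : List.of("/apple/1212d3fe", "/cat/23e24a2f4",
                "/auto/df5478eefd", "/somethingelse/1234fded")) {
            System.out.println(split(s));
        }
    }
}
```

Running main should print 1212, d3fe then 23e24, a2f4 then df5478, eefd and finally null. Note that find() is used, so the match does not have to start at the beginning of the string.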
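You could use the regex \/(?:apple|auto|cat)\/(\d*)(.*). Don't write the alternatives inside square brackets, as in [apple|auto|cat]+: that is a character class, which matches any run of the letters a, p, l, e, u, t, o, c and the | character, not the three words.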
If you want the last group to have exactly 4 hex characters, you can use this regex:
/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})
Here is a working example:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd");
Pattern pattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
}
If you want 4 characters after apple, 5 after cat and 6 after auto, you can split your algorithm into two steps:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd", "/some/445478eefd");
Pattern firstPattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)");
for (String string : strings) {
Matcher firstMatcher = firstPattern.matcher(string);
if (firstMatcher.find()) {
String first = firstMatcher.group(1);
System.out.println(first);
int length = getLength(first);
Pattern secondPattern = Pattern.compile("([0-9a-f]{" + length + "})([0-9a-f]{4})");
Matcher secondMatcher = secondPattern.matcher(string);
if (secondMatcher.find()) {
System.out.println(secondMatcher.group(1));
System.out.println(secondMatcher.group(2));
}
}
}
private static int getLength(String key) {
switch (key) {
case "apple":
return 4;
case "cat":
return 5;
case "auto":
return 6;
}
throw new IllegalArgumentException("key not allowed");
}
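A single-regex alternative, sketched under the assumption that every character after the second / is a lowercase hex digit. The question's regex puts apple/, cat/ or auto/ inside group 1, and its [0-9]{5} rejects a value like 23e24. Moving the word out of the group and giving each alternative its own capture group avoids both problems: after a match, exactly one of groups 1-3 is non-null. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerAlternativeGroups {
    // The literal word stays outside the groups; each alternative has its own group:
    // group 1 = apple part, group 2 = cat part, group 3 = auto part, group 4 = last 4 chars
    private static final Pattern PATTERN = Pattern.compile(
            "^/(?:apple/([0-9a-f]{4})|cat/([0-9a-f]{5})|auto/([0-9a-f]{6}))([0-9a-f]{4})$");

    // Returns "first, second", or null when the whole string does not match
    static String split(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Only the group of the alternative that matched is non-null
        String first = m.group(1) != null ? m.group(1)
                : m.group(2) != null ? m.group(2)
                : m.group(3);
        return first + ", " + m.group(4);
    }

    public static void main(String[] args) {
        for (String s : List.of("/apple/1212d3fe", "/cat/23e24a2f4",
                "/auto/df5478eefd", "/somethingelse/1234fded")) {
            System.out.println(split(s));
        }
    }
}
```

Because the lengths are encoded per alternative, no second pattern or getLength lookup is needed; the trade-off is that adding a new prefix means editing the regex and the group selection.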
Related
I would like to partially mask data using regex. Here is the input:
123-12345-1234567
And here is what I'd like as output:
1**-*****-*****67
I figured out how to replace the last group, but I don't know how to do it for the rest of the data.
String s = "123-12345-1234567";
System.out.println(s.replaceAll("\\d(?=\\d{2})", "*")); // output is *23-***45-*****67
Also, I'd like to use only regex because I have different type of data, so different type of mask. I don't want to create functions for each type of data.
For example:
AAAAAAAAA // becomes *******AA
12334567 // becomes 123*****
Thanks for your help!
We can use the following regex replacement approach:
String input = "123-12345-1234567";
String output = input.substring(0, 1) +
input.substring(1, input.length()-2).replaceAll("\\d", "*") +
input.substring(input.length()-2);
System.out.println(output); // 1**-*****-*****67
This concatenates the first digit, the middle portion with every digit replaced by *, and the final two digits.
Edit: a pure regex solution, which, however, takes more lines of code than the above and might be less performant.
String input = "123-12345-1234567";
String pattern = "^(\\d)(.*)(\\d{2})$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
String output = m.group(1) + m.group(2).replaceAll("\\d", "*") + m.group(3);
System.out.println(output); // 1**-*****-*****67
}
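The substring approach generalizes to the other formats in the question with one parameterized method, so no per-type function is needed. A minimal sketch, assuming each format can be described by how many characters to keep at the start and at the end; the name mask and its parameters are hypothetical, and only letters and digits in the middle are hidden, so separators such as - survive.

```java
class Masker {
    // Hypothetical helper: keeps the first keepStart and last keepEnd characters
    // and replaces every letter or digit in between with '*'
    static String mask(String input, int keepStart, int keepEnd) {
        if (keepStart + keepEnd >= input.length()) {
            return input; // nothing left to mask
        }
        int end = input.length() - keepEnd;
        return input.substring(0, keepStart)
                + input.substring(keepStart, end).replaceAll("[A-Za-z0-9]", "*")
                + input.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(mask("123-12345-1234567", 1, 2)); // 1**-*****-*****67
        System.out.println(mask("AAAAAAAAA", 0, 2));         // *******AA
        System.out.println(mask("12334567", 3, 0));          // 123*****
    }
}
```

Each data type then becomes a pair of numbers (for example in a map or config) rather than a separate function or regex.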
Java supports bounded quantifiers such as {3,7} inside a lookbehind, so if you must use a regex only, you could use a pattern with an alternation that accounts for the different scenarios.
Using the lookarounds, you can select a single character to be replaced by *.
Note that this is hard to maintain, and it would be a better option to write separate functions for the different data formats using separate patterns or string functions (perhaps accompanied by unit tests).
(?<=^\d{3,7})\d(?=\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$)|\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?)
The separate parts match:
(?<=^\d{3,7})\d(?=\d*$) Match a digit asserting 3-7 digits to the left and only digits to the right
| Or
(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$) Match A-Z asserting 0-6 chars to the left and only chars A-Z to the right
| Or
\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$) Match a digit asserting 2-3 digits to the left and optional digit, - with 5 digits and - with 7 digits to the right
| Or
\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?) Match a digit asserting 3 digits to the left followed by - and 1-5 digits, and optionally - with 1-5 digits
String regex = "(?<=^\\d{3,7})\\d(?=\\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\\d(?<=^\\d{2,3})(?=\\d?-\\d{5}-\\d{7}$)|\\d(?<=^\\d{3}-\\d{1,5}(?:-\\d{1,5})?)";
String s1 = "123-12345-1234567";
String s2 = "AAAAAAAAA";
String s3 = "12334567";
System.out.println(s1.replaceAll(regex, "*"));
System.out.println(s2.replaceAll(regex, "*"));
System.out.println(s3.replaceAll(regex, "*"));
Output
1**-*****-*****67
*******AA
123*****
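Alternatively, use one short pattern per data type:
public static void main(String[] args) {
// a digit with at least 1 character before it and at least 2 after it
System.out.println("123-12345-1234567".replaceAll("(?<=.)\\d(?=.{2})", "*"));
// any character with at least 2 characters after it
System.out.println("AAAAAAAAA".replaceAll(".(?=.{2})", "*"));
// any character with at least 3 characters before it
System.out.println("12334567".replaceAll("(?<=.{3}).", "*"));
}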
output:
1**-*****-*****67
*******AA
123*****
I want to split a string into pieces of given lengths.
Let's say the message string is:
123456789
I want to split it like this:
"12" "34" "567" "89"
I thought of first splitting it into pairs using the regex
"(?<=\\G.{2})"
and then joining the last two pieces and splitting again into 3, but is there any way to do it in a single go using a regex? Please help me out.
Use ^(.{2})(.{2})(.{3})(.{2}).* to group the string by the specified lengths, then read each group as a separate string (you can try the regex on regex101):
String input = "123456789";
List<String> output = new ArrayList<>();
Pattern pattern = Pattern.compile("^(.{2})(.{2})(.{3})(.{2}).*");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
output.add(matcher.group(i));
}
}
System.out.println(output);
NOTE: Group numbering starts at 1, because group 0 matches the whole string.
And a Magnificent Sorcery from @YCF_L in the comments:
String pattern = "^(.{2})(.{2})(.{3})(.{2}).*";
String[] vals = "123456789".replaceAll(pattern, "$1-$2-$3-$4").split("-");
The magic here is that replaceAll() can refer to captured groups in the replacement string: use $n (where n is a digit) to insert the nth captured subsequence.
NOTE: This assumes that no input string contains -. If one might, pick another character that never appears in your input strings and use it as the delimiter.
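If the lengths vary, the same grouping pattern can be built at run time. A minimal sketch with a hypothetical helper split(input, lengths...): it turns {2, 2, 3, 2} into ^(.{2})(.{2})(.{3})(.{2})$. Unlike the .* version above, it requires the lengths to add up to the whole string and returns an empty list otherwise, a design choice you may want to relax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitByLengths {
    // Hypothetical helper: builds one capturing group per requested length
    static List<String> split(String input, int... lengths) {
        StringBuilder regex = new StringBuilder("^");
        for (int len : lengths) {
            regex.append("(.{").append(len).append("})");
        }
        regex.append("$");
        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        List<String> parts = new ArrayList<>();
        if (m.matches()) {
            // Group 0 is the whole match, so the pieces start at group 1
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("123456789", 2, 2, 3, 2)); // [12, 34, 567, 89]
    }
}
```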
Test this regex on regex101 with the test string 123456789:
^(\d{2})(\d{2})(\d{3})(\d{2})$
Output:
Match 1
Full match 0-9 `123456789`
Group 1. 0-2 `12`
Group 2. 2-4 `34`
Group 3. 4-7 `567`
Group 4. 7-9 `89`
I want to use Pattern and Matcher to return the following string as multiple variables.
ArrayList <Pattern> pArray = new ArrayList <Pattern>();
pArray.add(Pattern.compile("\\[[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}\\]"));
pArray.add(Pattern.compile("\\[\\d{1,5}\\]"));
pArray.add(Pattern.compile("\\[[a-zA-Z[^#0-9]]+\\]"));
pArray.add(Pattern.compile("\\[#.+\\]"));
pArray.add(Pattern.compile("\\[[0-9]{10}\\]"));
Matcher iMatcher;
String infoString = "[03/12/13 10:00][30][John Smith][5554215445][#Comment]";
for (int i = 0 ; i < pArray.size() ; i++)
{
//out.println(pArray.get(i).toString());
iMatcher = pArray.get(i).matcher(infoString);
while (iMatcher.find())
{
String found = iMatcher.group();
out.println(found.substring(1, found.length()-1));
}
}
}
the program outputs:
[03/12/13 10:00]
[30]
[John Smith]
[#Comment]
[5554215445]
The only thing I need is to have the program not print the brackets and the # character.
I can easily avoid printing the brackets using substrings inside the loop, but I cannot avoid the # character. # is only a comment identifier in the string.
Can this be done inside the loop?
How about this?
public static void main(String[] args) {
String infoString = "[03/12/13 10:00][30][John Smith][5554215445][#Comment]";
final Pattern pattern = Pattern.compile("\\[#?(.+?)\\]");
final Matcher matcher = pattern.matcher(infoString);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
You just need to make the .+ non-greedy (.+?) so that it matches only the text between one pair of square brackets. We then use a capturing group, written (pattern), to grab what we want rather than the whole matched text. The #? matches an optional hash before the group, so the # doesn't end up in the group.
The group is retrieved using matcher.group(1).
Output:
03/12/13 10:00
30
John Smith
5554215445
Comment
Use lookarounds. That is, replace every \\[ in your regexes with a positive lookbehind:
(?<=\\[)
and then replace every \\] in your regexes with a positive lookahead:
(?=\\])
Finally, replace \\[# in your regex with a positive lookbehind:
(?<=\\[#)
Lookarounds are zero-width, so the brackets and the # are checked but are not part of group().
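Applied to the question's five patterns, the change could look like this. A minimal sketch, assuming Java 9+ for List.of; the class and method names (LookaroundFields, extract) are hypothetical. Because the lookarounds are zero-width, group() already excludes the brackets and the #, so no substring call is needed. Results come out in pattern order, not string order, just as in the question's loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundFields {
    // The question's patterns with \[ , \] and \[# turned into lookarounds
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("(?<=\\[)[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}(?=\\])"),
            Pattern.compile("(?<=\\[)\\d{1,5}(?=\\])"),
            Pattern.compile("(?<=\\[)[a-zA-Z[^#0-9]]+(?=\\])"),
            Pattern.compile("(?<=\\[#).+(?=\\])"),
            Pattern.compile("(?<=\\[)[0-9]{10}(?=\\])"));

    static List<String> extract(String info) {
        List<String> found = new ArrayList<>();
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(info);
            while (m.find()) {
                found.add(m.group()); // brackets and '#' are outside the match
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String infoString = "[03/12/13 10:00][30][John Smith][5554215445][#Comment]";
        for (String field : extract(infoString)) {
            System.out.println(field);
        }
    }
}
```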
I am trying to read a line and parse a value using regular expression in java. The line that contains the value looks something like this,
...... TESTYY912345 .......
...... TESTXX967890 ........
Basically, it contains 4 letters, then any two ASCII characters, then the digit 9, then any number of digits. I want to get the values 912345 and 967890.
This is what I have so far in regular expression,
... TEST[\x00-\xff]{2}[9]{1} ...
But, this skips the 9 and parse 12345 and 67890. (I want to include 9 as well).
Thanks for your help.
You are pretty close. Capture the entire group (9\\d+) after matching TEST\\p{ASCII}{2}. This way, you'll capture the 9 and the following digits:
String s = "...... TESTYY912345 ......";
Pattern p = Pattern.compile("TEST\\p{ASCII}{2}(9\\d+)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(1)); // 912345
}
A working expression is "TEST.{2}(9\\d*)":
final Pattern pattern = Pattern.compile("TEST.{2}(9\\d*)");
for (final String str : Arrays.asList("...... TESTYY912345 .......",
"...... TESTXX967890 ........")) {
final Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
final int value = Integer.parseInt(matcher.group(1));
System.out.println(value);
}
}
Output:
912345
967890
This will match any two characters (except a line terminator) for what is XX and YY in your example, and will take any digits after the 9.
I want to use a regex to find an unknown number of arguments in a string. It is easier to show than to explain, so here is an example:
The regex: #ISNULL\('(.*?)','(.*?)','(.*?)'\)
The String: #ISNULL('1','2','3')
The result:
Group[0] "#ISNULL('1','2','3')" at 0 - 20
Group[1] "1" at 9 - 10
Group[2] "2" at 13 - 14
Group[3] "3" at 17 - 18
That's working great.
The problem begins when I need to find unknown number of arguments (2 and more).
What changes do I need to do to the regex in order to find all the arguments that will occur in the string?
So, if I parse this string "#ISNULL('1','2','3','4','5','6')" I'll find all the arguments.
If you don't know how many times a repeated construct will match, you need a regex engine that exposes every capture made by a repeated capturing group, not just the last one. Only .NET and Perl 6 offer this currently.
In C#:
string pattern = @"#ISNULL\(('([^']*)',?)+\)";
string input = @"#ISNULL('1','2','3','4','5','6')";
Match match = Regex.Match(input, pattern);
if (match.Success) {
Console.WriteLine("Matched text: {0}", match.Value);
for (int ctr = 1; ctr < match.Groups.Count; ctr++) {
Console.WriteLine(" Group {0}: {1}", ctr, match.Groups[ctr].Value);
int captureCtr = 0;
foreach (Capture capture in match.Groups[ctr].Captures) {
Console.WriteLine(" Capture {0}: {1}",
captureCtr, capture.Value);
captureCtr++;
}
}
}
In other regex flavors, you have to do it in two steps. E.g., in Java (code snippets courtesy of RegexBuddy):
First, find the part of the string you need:
String subjectString = "#ISNULL('1','2','3','4','5','6')"; // example input from the question
String resultString = "";
Pattern regex = Pattern.compile("#ISNULL\\(('([^']*)',?)+\\)");
// or, using non-capturing groups:
// Pattern regex = Pattern.compile("#ISNULL\\((?:'(?:[^']*)',?)+\\)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group(); // the whole #ISNULL(...) call
}
Then use another regex to find and iterate over your matches:
List<String> matchList = new ArrayList<String>();
Pattern argRegex = Pattern.compile("'([^']*)'");
Matcher argMatcher = argRegex.matcher(resultString);
while (argMatcher.find()) {
matchList.add(argMatcher.group(1)); // the text between the quotes
}
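A different technique from the answers here, sketched as an assumption rather than a recommendation: Java's \G boundary ("end of the previous match") lets consecutive find() calls walk through the arguments in a single pattern. The first match consumes #ISNULL( plus the first argument; each later match must start exactly where the previous one ended, with ,'...'. The (?!^) stops \G from matching at the very start of the input. The class and method names are hypothetical, and this sketch does not check for the closing parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IsNullArgs {
    // Either "#ISNULL(" plus the first argument, or a continuation ",'...'"
    // that starts exactly at the end of the previous match
    private static final Pattern ARG =
            Pattern.compile("(?:#ISNULL\\(|\\G(?!^),)'([^']*)'");

    static List<String> arguments(String call) {
        List<String> found = new ArrayList<>();
        Matcher m = ARG.matcher(call);
        while (m.find()) {
            found.add(m.group(1)); // the text between the quotes
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(arguments("#ISNULL('1','2','3','4','5','6')")); // [1, 2, 3, 4, 5, 6]
    }
}
```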
This answer is somewhat speculative, as I have no clue what regex engine you are using.
If the parameters are always numbers and always enclosed in single quotes, why not try the digit class, like this:
'(\d+)'
This is just the \d class with the extraneous #ISNULL part removed, as I assume you are only interested in the parameters themselves. Keep the + inside the group: with '(\d)+?' the group would hold only the last digit of a multi-digit argument. Just give it a go.