Partially mask data of a group of number using regex - java

I would like to partially mask data using regex. Here is the input :
123-12345-1234567
And here is what I'd like as output :
1**-*****-*****67
I figure out how to replace for the last group but I don't know to do for the rest of the data.
String s = "123-12345-1234567";
System.out.println(s.replaceAll("\\d(?=\\d{2})", "*")); // output is *23-***45-*****67
Also, I'd like to use only regex because I have different type of data, so different type of mask. I don't want to create functions for each type of data.
For example :
AAAAAAAAA // becomes ********AA
12334567 // becomes 123******
Thanks for your help !

We can use the following regex replacement approach:
String input = "123-12345-1234567";
String output = input.substring(0, 1) +
input.substring(1, input.length()-2).replaceAll("\\d", "*") +
input.substring(input.length()-2);
System.out.println(output); // 1**-*****-*****67
Here we concatenate together the first digit, followed by the middle portion with all digits replaced by *, along with the final two digits.
Edit: A pure regex solution, which, however, is more lines of code than the above and might be less performant.
String input = "123-12345-1234567";
String pattern = "^(\\d)(.*)(\\d{2})$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
String output = m.group(1) + m.group(2).replaceAll("\\d", "*") + m.group(3);
System.out.println(output); // 1**-*****-*****67
}

Java supports a fixed quantifier in a lookbehind, so what you might do is use a pattern with an alternation to account for the different scenario's if you must use a regex only.
Using the lookarounds you can select a single character to be replaced by *
Note that this is hard to maintain, and it would be a better option to write separate functions for the different data formats using separate patterns or string functions (perhaps accompanied by unit tests)
(?<=^\d{3,7})\d(?=\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$)|\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?)
The separate parts match:
(?<=^\d{3,7})\d(?=\d*$) Match a digit asserting 3-7 digits to the left and only digits to the right
| Or
(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$) Match A-Z asserting 0-6 chars to the left and only chars A-Z to the right
| Or
\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$) Match a digit asserting 2-3 digits to the left and optional digit, - with 5 digits and - with 7 digits to the right
| Or
\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?) Match a digit asserting 3 digits to the left followed - and 1-5 digits and optionally - with 1-5 digits
Regex demo | Java demo
String regex = "(?<=^\\d{3,7})\\d(?=\\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\\d(?<=^\\d{2,3})(?=\\d?-\\d{5}-\\d{7}$)|\\d(?<=^\\d{3}-\\d{1,5}(?:-\\d{1,5})?)";
String s1 = "123-12345-1234567";
String s2 = "AAAAAAAAA";
String s3 = "12334567";
System.out.println(s1.replaceAll(regex, "*"));
System.out.println(s2.replaceAll(regex, "*"));
System.out.println(s3.replaceAll(regex, "*"));
Output
1**-*****-*****67
*******AA
123*****

public static void main(String[] args) {
System.out.println("123-12345-1234567".replaceAll("(?<=.{1,})\\d(?=.{3,})", "*"));
System.out.println("AAAAAAAAA".replaceAll(".(?=.{2,})", "*"));
System.out.println("12334567".replaceAll("(?<=.{3,}).", "*"));
}
output:
1**-*****-*****67
*******AA
123*****

Related

Regex for finding only single alphabets in a string and ignore consecutive double

I have searched a lot but I am unable to find a regex that could select only single alphabets and double them while those alphabets which are already double, should remain untouched.
I tried
String str = "yahoo";
str = str.replaceAll("(\\w)\\1+", "$0$0");
But since this (\\w)\\1+ selects all double elements, my output becomes yahoooo. I tried to add negation to it !(\\w)\\1+ but didn't work and output becomes same as input. I have tried
str.replaceAll(".", "$0$0");
But that doubles every character including which are already doubled.
Please help to write an regex that could replace all single character with double while double character should remain untouched.
Example
abc -> aabbcc
yahoo -> yyaahhoo (o should remain untouched)
opinion -> ooppiinniioonn
aaaaaabc -> aaaaaabbcc
You can match using this regex:
((.)\2+)|(.)
And replace it with:
$1$3$3
RegEx Demo
RegEx Explanation:
((.)\2+): Match a character and capture in group #2 and using \2+ next to it to make sure we match all multiple repeats of captured character. Capture all the repeated characters in group #1
|: OR
(.): Match any character and capture in group #3
Code Demo:
import java.util.List;
class Ideone {
public static void main(String[] args) {
List<String> input = List.of("aaa", "abc", "yahoo",
"opinion", "aaaaaabc");
for (String s: input) {
System.out.println( s + " => " +
s.replaceAll("((.)\\2+)|(.)", "$1$3$3") );
}
}
}
Output:
aaa => aaa
abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc
The solution by #anubhava, if viable in Java, is probably the best way to go. For a more brute force approach, we can try a regex iteration approach on the following pattern:
(\\w)\\1+|\\w
This matches, eagerly, a series of similar letters (two or more of them), followed by, that failing, a single letter. For each match, we can no-op on the multi-letter match, and double up any other single letter. Here is a short Java code which does this:
List<String> inputs = Arrays.asList(new String[] {"abc", "yahoo", "opinion", "aaaaaabc"});
String pattern = "(\\w)\\1+|\\w";
Pattern r = Pattern.compile(pattern);
for (String input : inputs) {
Matcher m = r.matcher(input);
StringBuffer buffer = new StringBuffer();
while (m.find()) {
if (m.group().matches("(\\w)\\1+")) {
m.appendReplacement(buffer, m.group());
}
else {
m.appendReplacement(buffer, m.group() + m.group());
}
}
m.appendTail(buffer);
System.out.println(input + " => " + buffer.toString());
}
}
This prints:
abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc
I've got two different understandings of the question.
If the goal is to get an even amount of each word character:
Search for (\w)\1? and replace with $1$1 (regex101 demo).
If just solely characters should be duplicated and others left untouched:
Search for (\w)\1?(\1*) and replace with $1$1$2 (regex 101 demo).
Captures a word character \w to $1, optionally matches the same character again. The second variant captures any more of the same character to $2 for attaching in the replacement.
FYI: If using as a Java string remember to escape the pattern. E.g. \1 -> \\1, \w ->\\w, ...

Masking using regular expressions for below format

I am trying to write a regular expression to mask the below string. Example below.
Input
A1../D//FASDFAS--DFASD//.F
Output (Skip first five and last two Alphanumeric's)
A1../D//FA***********D//.F
I am trying using below regex
([A-Za-z0-9]{5})(.*)(.{2})
Any help would be highly appreciated.
You solve your issue by using Pattern and Matcher with a regex which match multiple groups :
String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("(.*?\\/\\/..)(.*?)(.\\/\\/.*)");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
str = matcher.group(1)
+ matcher.group(2).replaceAll(".", "*")
+ matcher.group(3);
}
Detail
(.*?\\/\\/..) first group to match every thing until //
(.*?) second group to match every thing between group one and three
(.\\/\\/.*) third group to match every thing after the last character before the // until the end of string
Outputs
A1../D//FA***********D//.F
I think this solution is more readable.
If you want to do that with a single regex you may use
text = text.replaceAll("(\\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}).", "$1*");
Or, using the POSIX character class Alnum:
text = text.replaceAll("(\\G(?!^|(?:\\p{Alnum}\\P{Alnum}*){2}$)|^(?:\\P{Alnum}*\\p{Alnum}){5}).", "$1*");
See the Java demo and the regex demo. If you plan to replace any code point rather than a single code unit with an asterisk, replace . with \P{M}\p{M}*+ ("\\P{M}\\p{M}*+").
To make . match line break chars, add (?s) at the start of the pattern.
Details
(\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}) -
\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$) - a location after the successful match that is not followed with 2 occurrences of an alphanumeric char followed with 0 or more chars other than alphanumeric chars
| - or
^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5} - start of string, followed with five occurrences of 0 or more non-alphanumeric chars followed with an alphanumeric char
. - any code unit other than line break characters (if you use \P{M}\p{M}*+ - any code point).
Usually, masking of characters in the middle of a string can be done using negative lookbehind (?<!) and positive lookahead groups (?=).
But in this case lookbehind group can't be used because it does not have an obvious maximum length due to unpredictable number of non-alphanumeric characters between first five alphanumeric characters (. and / in the A1../D//FA).
A substring method can used as a workaround for inability to use negative lookbehind group:
String str = "A1../D//FASDFAS--DFASD//.F";
int start = str.replaceAll("^((?:\\W{0,}\\w{1}){5}).*", "$1").length();
String maskedStr = str.substring(0, start) +
str.substring(start).replaceAll(".(?=(?:\\W{0,}\\w{1}){2})", "*");
System.out.println(maskedStr);
// A1../D//FA***********D//.F
But the most straightforward way is to use java.util.regex.Pattern and java.util.regex.Matcher:
String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2})");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
String maskedStr = matcher.group(1) +
"*".repeat(matcher.group(2).length()) +
matcher.group(3);
System.out.println(maskedStr);
// A1../D//FA***********D//.F
}
\W{0,} - 0 or more non-alphanumeric characters
\w{1} - exactly 1 alphanumeric character
(\W{0,}\w{1}){5} - 5 alphanumeric characters and any number of alphanumeric characters in between
(?:\W{0,}\w{1}){5} - do not capture as a group
^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2})$ - substring with first five alphanumeric characters (group 1), everything else (group 2), substring with last 2 alphanumeric characters (group 3)

Regex capturing groups within logical OR

I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23224a2f4
/auto/445478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is:
1212, d3fe
23224, a2f4
445478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
UPDATED QUESTION:
I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23e24a2f4
/auto/df5478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is the everything after the 2nd '/' split as follows: 4 characters if 'apple', 5 characters if 'cat' and 6 characters if 'auto' like:
1212, d3fe
23e24, a2f4
df5478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
I can do it without the regex OR(|) but it breaks when I include it. Any help with the right regex?
Updated Answer:
As per your updated question you can use this regex based on lookbehind assertions:
/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$
RegEx Demo
This regex uses 2 capture groups after matching /
In 1st group we have 3 lookbehind conditions with alternations.
(?<=apple/).{4} makes sure that we match 4 characters that have apple/ on left hand side. Likewise we match 5 and 6 character strings that have cat/ and /auto/.
In 2nd capture group we match remaining characters before end of line.
You could use the regex \/[apple|auto|cat]+\/(\d*)(.*), See here
If you want the last group to have exactly 4 digits you can use this regex:
/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})
Here is a working example:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd");
Pattern pattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
}
If you want for digits after apple, 5 after cat and 6 after auto you can split your algorithm in 2 parts:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd", "/some/445478eefd");
Pattern firstPattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)");
for (String string : strings) {
Matcher firstMatcher = firstPattern.matcher(string);
if (firstMatcher.find()) {
String first = firstMatcher.group(1);
System.out.println(first);
int length = getLength(first);
Pattern secondPattern = Pattern.compile("([0-9a-f]{" + length + "})([0-9a-f]{4})");
Matcher secondMatcher = secondPattern.matcher(string);
if (secondMatcher.find()) {
System.out.println(secondMatcher.group(1));
System.out.println(secondMatcher.group(2));
}
}
}
private static int getLength(String key) {
switch (key) {
case "apple":
return 4;
case "cat":
return 5;
case "auto":
return 6;
}
throw new IllegalArgumentException("key not allowed");
}

Split String at different lengths in Java

I want to split a string after a certain length.
Let's say we have a string of "message"
123456789
Split like this :
"12" "34" "567" "89"
I thought of splitting them into 2 first using
"(?<=\\G.{2})"
Regexp and then join the last two and again split into 3 but is there any way to do it on a single go using RegExp. Please help me out
Use ^(.{2})(.{2})(.{3})(.{2}).* (See it in action in regex101) to group the String to the specified length and grab the groups as separate Strings
String input = "123456789";
List<String> output = new ArrayList<>();
Pattern pattern = Pattern.compile("^(.{2})(.{2})(.{3})(.{2}).*");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
output.add(matcher.group(i));
}
}
System.out.println(output);
NOTE: Group capturing starts from 1 as the group 0 matches the whole String
And a Magnificent Sorcery from #YCF_L from comment
String pattern = "^(.{2})(.{2})(.{3})(.{2}).*";
String[] vals = "123456789".replaceAll(pattern, "$1-$2-$3-$4").split("-");
Whats the magic here is you can replace the captured group by replaceAll() method. Use $n (where n is a digit) to refer to captured subsequences. See this stackoverflow question for better explanation.
NOTE: here its assumed that no input string contains - in it.
if so, then find any other character that will not be in any of
your input strings so that it can be used as a delimiter.
test this regex in regex101 with 123456789 test string.
^(\d{2})(\d{2})(\d{3})(\d{2})$
output :
Match 1
Full match 0-9 `123456789`
Group 1. 0-2 `12`
Group 2. 2-4 `34`
Group 3. 4-7 `567`
Group 4. 7-9 `89`

Parsing array syntax using regex

I think what I am asking is either very trivial or already asked, but I have had a hard time finding answers.
We need to capture the inner number characters between brackets within a given string.
so given the string
StringWithMultiArrayAccess[0][9][4][45][1]
and the regex
^\w*?(\[(\d+)\])+?
I would expect 6 capture groups and access to the inner data.
However, I end up only capturing the last "1" character in capture group 2.
If it is important heres my java junit test:
#Test
public void ensureThatJsonHandlerCanHandleNestedArrays(){
String stringWithArr = "StringWithMultiArray[0][0][4][45][1]";
Pattern pattern = Pattern.compile("^\\w*?(\\[(\\d+)\\])+?");
Matcher matcher = pattern.matcher(stringWithArr);
matcher.find();
assertTrue(matcher.matches()); //passes
System.out.println(matcher.group(2)); //prints 1 (matched from last array symbols)
assertEquals("0", matcher.group(2)); //expected but its 1 not zero
assertEquals("45", matcher.group(5)); //only 2 capture groups exist, the whole string and the 1 from the last array brackets
}
In order to capture each number, you need to change your regex so it (a) captures a single number and (b) is not anchored to--and therefore limited by--any other part of the string ("^\w*?" anchors it to the start of the string). Then you can loop through them:
Matcher mtchr = Pattern.compile("\\[(\\d+)\\]").matcher(arrayAsStr);
while(mtchr.find()) {
System.out.print(mtchr.group(1) + " ");
}
Output:
0 9 4 45 1

Categories

Resources