How to surround all Bracket groups with * in a string - java

I have been trying to get a string replaceAll to work in Java that was originally from a JavaScript code block. I have the following
String regexSearch = "((?!([ \\*]))|^)\\[[A-Za-z0-9\\s]*\\](?!\\*)"; //Java Version must escape special characters again
String regexReplacement = "*$&*";
String inputString = "This is a User, [USER 1], and a second user [USER 2]";
Pattern p = Pattern.compile(regexSearch);
Matcher m = p.matcher(inputString);
System.out.println(m.replaceAll(regexReplacement));
My desired output is
This is a User, *[USER 1]*, and a second user *[USER 2]*
I keep getting illegal group reference errors.
Requirements are as follows. Any text that is surrounded by square brackets "[" and "]" will be surrounded by "*" while still retaining the brackets. However if within the bracketed text there is a "|" character then this will not apply.

Your initial ((?!([ \*]))|^)\[[A-Za-z0-9\s]*\](?!\*) regex attempts (but fails) to match [...] strings when not enclosed with * chars. In Java, you would write it as
(?<!\*)\[[A-Za-z0-9\s]*](?!\*)
String regexSearch = "(?<!\\*)\\[[A-Za-z0-9\\s]*](?!\\*)";
However, you may use a more lenient expression like
String regexSearch = "\\[[^\\]\\[|]*]";
Or, if you need to keep the original behavior to fail the matches inside asterisks:
String regexSearch = "(?<!\\*)\\[[^\\]\\[|]*](?!\\*)";
See the regex demo.
It matches:
(?<!\*) - a negative lookbehind that fails the match if there is a * char immediately to the left of the current location
\[ - a [ char
[^\]\[|]* - 0 or more chars other than [, ] and |
] - a ] char
(?!\*) - a negative lookahead that fails the match if there is a * char immediately to the right of the current location.
So, it will match from the [ to the closest ] without matching other [ and | chars inside, i.e. it will match the innermost substrings between square brackets. It will also allow any other special and non-special chars inside the brackets, like hyphens, apostrophes, etc. [A-Za-z0-9\s] only allowed ASCII letters, digits and whitespace.
Java demo:
String regexSearch = "\\[[^\\]\\[|]*]";
String regexReplacement = "*$0*";
String inputString = "This is a User, [USER 1], and a second user [USER 2] not [USER | 3]";
Pattern p = Pattern.compile(regexSearch);
Matcher m = p.matcher(inputString);
System.out.println(m.replaceAll(regexReplacement));
// => This is a User, *[USER 1]*, and a second user *[USER 2]* not [USER | 3]
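The lookaround variant matters when the replacement may run more than once over the same text. Here is a minimal sketch of that case; the class and method names are only illustrative and not from the question. It uses $0, which is Java's reference to the whole match, where the JavaScript original used $&.

```java
import java.util.regex.Pattern;

class BracketWrapper {
    // Innermost [...] groups without '|' that are not already next to a '*'.
    private static final Pattern GROUP =
            Pattern.compile("(?<!\\*)\\[[^\\]\\[|]*](?!\\*)");

    static String wrap(String input) {
        // $0 is the whole match; "$&" is what triggers the illegal group reference error.
        return GROUP.matcher(input).replaceAll("*$0*");
    }

    public static void main(String[] args) {
        String once = wrap("This is a User, [USER 1], and a second user [USER 2] not [USER | 3]");
        System.out.println(once);
        // A second pass leaves the text unchanged because of the lookarounds.
        System.out.println(wrap(once).equals(once));
    }
}
```

With the more lenient pattern (no lookarounds), a second pass would add another pair of asterisks around each group.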

You don't need to worry about matching the whole line, the following is sufficient:
\[(.*?)\]
Replacing this with *[$1]*.
Here's a demo on RegExr.
Further explanation: taking each element in the regex in turn:
\[ - we need to escape the opening square bracket because square brackets are a reserved character in regular expressions.
(.*?) - the .*? matches zero or more of any character lazily. This is surrounded in parentheses to indicate it's a capture group.
] - close the square bracket.
We then replace this with an asterisk followed by an open square bracket (*[), the first capture group ($1), and then the closing square bracket and another asterisk (]*).

It can be done as simple as this:
String s = inputString.replaceAll("\\[.*?]", "*$0*");
No capture groups needed.
Result
This is a User, *[USER 1]*, and a second user *[USER 2]*
Explanation
\\[ Match '[', escaped since '[' has special meaning, double-escaped because of Java
.*? Match any text on single line, match as little as possible
] Match ']', no need to escape since it's not in a character class
* Literal '*'
$0 Entire matched text '[XXX]'
* Literal '*'

This should do it.
String.replaceAll -- first argument is a regex.
The second argument is the replacement string. In it, $1 refers to capture group 1, so the part to keep must be wrapped in parentheses.
String regexSearch = "(\\[.*?])";
String inputString = "This is a User, [USER 1], and a second user [USER 2]";
inputString = inputString.replaceAll(regexSearch, "*$1*");
System.out.println(inputString);
Prints
This is a User, *[USER 1]*, and a second user *[USER 2]*

Try replacing every [ with *[ and every ] with ]* using the string method .replace(target, replacement) in Java.
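A rough sketch of that plain-replace idea, with illustrative names. Note that it is purely literal: it also wraps groups that contain "|", so it does not meet the requirement stated in the question.

```java
class PlainReplace {
    static String wrapAll(String input) {
        // String.replace(CharSequence, CharSequence) replaces literal text; no regex is involved.
        return input.replace("[", "*[").replace("]", "]*");
    }

    public static void main(String[] args) {
        System.out.println(wrapAll("[USER 1] and [USER | 3]"));
    }
}
```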

Related

Java regex, replace certain characters except if it matches a pattern

I have this string "person","hobby","key" and I want to remove " " for all words except for key so the output will be person,hobby,"key"
String str = "\"person\",\"hobby\",\"key\"";
System.out.println(str+"\n");
str=str.replaceAll("/*regex*/","");
System.out.println(str); //person,hobby,"key"
You may use the following pattern:
\"(?!key\")(.+?)\"
And replace with $1
Details:
\" - Match a double quotation mark character.
(?!key\") - Negative Lookahead (not followed by the word "key" and another double quotation mark).
(.+?) - Match one or more characters (lazy) and capture them in group 1.
\" - Match another double quotation mark character.
Substitution: $1 - back reference to whatever was matched in group 1.
Regex demo.
Here's a full example:
String str = "\"person\",\"hobby\",\"key\"";
String pattern = "\"(?!key\")(.+?)\"";
String result = str.replaceAll(pattern, "$1");
System.out.println(result); // person,hobby,"key"
Try it online.
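One caveat: the lookahead pattern works for the sample, where "key" comes last. If the kept word appears earlier, its closing quote can pair up with the opening quote of the next word. Here is a sketch of an alternative that consumes every quoted token and decides per token. It assumes Java 9+ for Matcher.replaceAll(Function), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnquoteExcept {
    // One quoted token; group 1 is the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static String unquoteExcept(String input, String keep) {
        return QUOTED.matcher(input).replaceAll(mr ->
                // Keep the quotes only for the chosen word; quoteReplacement guards '$' and '\'.
                Matcher.quoteReplacement(mr.group(1).equals(keep) ? mr.group() : mr.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(unquoteExcept("\"person\",\"hobby\",\"key\"", "key"));
        System.out.println(unquoteExcept("\"key\",\"hobby\"", "key"));
    }
}
```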

Masking credit card number using regex

I am trying to mask the CC number, in a way that third character and last three characters are unmasked.
For example, 7108898787654351 should become **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to unmask only the 3rd character?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have exactly 2 characters between the start of the string and the current position (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
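To see the two assertions at work, here is a small sketch (the class and method names are made up for illustration) that applies the pattern to the sample number and to a shorter string.

```java
class CardMask {
    static String mask(String number) {
        // Skip the 3rd character (lookbehind) and the last three (lookahead).
        return number.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
        System.out.println(mask("12345"));            // **345
    }
}
```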
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
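In Java the pattern could be used roughly like this. It is only a sketch: the class and method names are illustrative, and it assumes the number has 16 digits in total.

```java
class DigitMask {
    static String mask(String number) {
        // Mask a digit unless exactly 14 digits remain (the 3rd of 16) or 1 to 3 remain (the last three).
        return number.replaceAll("(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```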
Apart from where the dashes are after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
Don't think you should use a regex to do what you want. You could use StringBuilder to create the required string
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
    if (i == 2 || i >= str.length() - 3) {
        sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
    }
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if there are exactly two chars other than line break chars between the start of the string and the current location
. - any char but a line break char
(?=.*...) - there must be any three chars other than line break chars immediately to the right of the current location.
If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
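A short sketch of how this could look in Java, with an illustrative class name and assuming a 16-digit string without separators:

```java
class FixedLengthMask {
    static String mask(String number) {
        // A digit is kept when 0 to 2 digits follow it (the last three) or exactly 13 follow it (the 3rd of 16).
        return number.replaceAll("\\d(?!\\d{0,2}$|\\d{13}$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```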

How can I strip all non digits in a string except the first character?

I have a string that I want to make sure that the format is always a + followed by digits.
The following would work:
String parsed = inputString.replaceAll("[^0-9]+", "");
if (inputString.charAt(0) == '+') {
    result = "+" + parsed;
} else {
    result = parsed;
}
But is there a way to have a regex in the replaceAll that would keep the + (if exists) in the beginning of the string and replace all non digits in the first line?
The following statement with the given regex would do the job:
String result = inputString.replaceAll("(^\\+)|[^0-9]", "$1");
(^\\+) matches a plus sign at the beginning of the string and captures it in group 1 ($1),
| or
[^0-9] matches any character which is not a digit
$1 replaces each match with the captured plus sign, or with nothing when group 1 did not take part in the match
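A quick sketch of this behaviour, with illustrative names. In Java, a group that did not take part in the match contributes nothing to the replacement, which is why "$1" is safe for the second alternative.

```java
class PlusDigits {
    static String normalize(String input) {
        // A leading '+' is captured and put back; every other non-digit is replaced by an empty $1.
        return input.replaceAll("(^\\+)|[^0-9]", "$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+1 (555) 123-4567")); // +15551234567
        System.out.println(normalize("1+2"));               // 12
    }
}
```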
You can use this expression:
String r = s.replaceAll("((?<!^)[^0-9]|^[^0-9+])", "");
The idea is to replace any non-digit when it is not the initial character of the string (that's the (?<!^)[^0-9] part with a lookbehind) or any character that is not a digit or plus that is the initial character of the string (the ^[^0-9+] part).
Demo.
What about just
(?!^)\D+
Java string:
"(?!^)\\D+"
Demo at regex101.com
\D matches a character that is not a digit [^0-9]
(?!^) using a negative lookahead to check, if it is not the initial character
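A small sketch of this pattern in use (names are illustrative). One thing to be aware of: the lookahead protects whatever character comes first, not only a +, so a leading letter survives as well.

```java
class KeepFirstChar {
    static String strip(String input) {
        // Remove runs of non-digits everywhere except at the very start of the string.
        return input.replaceAll("(?!^)\\D+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("+1-2 3")); // +123
        System.out.println(strip("a12b3"));  // a123
    }
}
```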
Yes you can use this kind of replacement:
String parsed = inputString.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
if present and before the first digit in the string, the + character is captured in group 1. For example: dfd+sdfd12+sdf12 returns +1212 (the second + is removed since its position is after the first digit).
Try this:
1- This allows negative and positive numbers; it matches all special chars except a - or + in the first position.
(?!^[-+])[^0-9.]
2- If you only want to allow + at first position
(?!^[+])[^0-9.]

Java Regular Expression: match any number of digits in round brackets if the closing bracket is the last char in the String

I need some help to save my day (or my night). I would like to match:
Any number of digits
Enclosed by round brackets "()" [The brackets contain nothing else than digits]
If the closing bracket ")" is the last character in the String.
Here's the code I have come up with:
// this how the text looks, the part I want to match are the digits in the brackets at the end of it
String text = "Some text 45 Some text, text and text (1234)";
String regex = "[no idea how to express this.....]"; // this is where the regex should be
Pattern regPat = Pattern.compile(regex);
Matcher matcher = regPat.matcher(text);
String matchedText = "";
if (matcher.find()) {
    matchedText = matcher.group();
}
Please help me out with the magic expression. I have only managed to match any number of digits, but not when they are enclosed in brackets and at the end of the line...
Thanks!
You can try this regex:
String regex = "\\(\\d+\\)$";
If you need to extract just the digits, you can use this regex:
String regex = "\\((\\d+)\\)$";
and get the value of matcher.group(1). (Explanation: The ( and ) characters preceded by backslashes match the round brackets literally; the ( and ) characters not preceded by backslashes tell the matcher that the part inside, i.e. just the digits, forms a capture group, and the part matching the group can be obtained by matcher.group(1), since this is the first, and only, capture group in the regex.)
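Put together, the capture-group version could look like the sketch below. The class name and the use of Optional are my own choices, not part of the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingNumber {
    // Digits inside round brackets, with the closing bracket at the end of the text.
    private static final Pattern TRAILING = Pattern.compile("\\((\\d+)\\)$");

    static Optional<String> extract(String text) {
        Matcher m = TRAILING.matcher(text);
        // group(1) holds only the digits, without the brackets.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("Some text 45 Some text, text and text (1234)")); // Optional[1234]
        System.out.println(extract("text (12) more"));                               // Optional.empty
    }
}
```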
This is the required regex for your condition
\\(\\d+\\)$

Latin Regex with symbols

I need to split a text and get only words, numbers and hyphenated compound words. I also need to get Latin words, so I used \p{L}, which gives me é, ú, ü, ã, and so forth. The example is:
String myText = "Some latin text with symbols, ? 987 (A la pointe sud-est de l'île se dresse la cathédrale Notre-Dame qui fut lors de son achèvement en 1330 l'une des plus grandes cathédrales d'occident) : ! # # $ % ^& * ( ) + - _ #$% \" ' : ; > < / \\ | , here some is wrong… * + () e -";
Pattern pattern = Pattern.compile("[^\\p{L}+(\\-\\p{L}+)*\\d]+");
String words[] = pattern.split( myText );
What is wrong with this regex? Why does it match symbols like "(", "+", "-", "*" and "|"?
Some of results are:
dresse // OK
sud-est // OK
occident) // WRONG
987 // OK
() // WRONG
(a // WRONG
* // WRONG
- // WRONG
+ // WRONG
( // WRONG
| // WRONG
The regex explanation is:
[^\p{L}+(\-\p{L}+)*\d]+
* Word separator will be:
* [^ ... ] No sequence in:
* \p{L}+ Any latin letter
* (\-\p{L}+)* Optionally hyphenated
* \d or numbers
* [ ... ]+ once or more.
If my understanding of your requirement is correct, this regex will match what you want:
"\\p{IsLatin}+(?:-\\p{IsLatin}+)*|\\d+"
It will match:
A contiguous sequence of Unicode Latin script characters. I restrict it to Latin script, since \p{L} will match letter in any script. Change \\p{IsLatin} to \\pL if your version of Java doesn't support the syntax.
Or several such sequences, hyphenated
Or a contiguous sequence of decimal digits (0-9)
The regex above is to be used by calling Pattern.compile, and call matcher(String input) to obtain a Matcher object, and use a loop to find matches.
Pattern pattern = Pattern.compile("\\p{IsLatin}+(?:-\\p{IsLatin}+)*|\\d+");
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
    System.out.println(matcher.group());
}
If you want to allow words with apostrophe ':
"\\p{IsLatin}+(?:['\\-]\\p{IsLatin}+)*|\\d+"
I also escape - in the character class ['\\-] just in case you want to add more. Actually - doesn't need escaping if it is the first or last in the character class, but I escape it anyway just to be safe.
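For reference, a self-contained sketch that collects the matches of the first pattern (without the apostrophe) into a list. The class and method names are illustrative, and the non-ASCII letter in the sample is written as a Unicode escape only to avoid source-encoding problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LatinWords {
    // Latin-script words, optionally hyphenated, or runs of digits.
    private static final Pattern WORD =
            Pattern.compile("\\p{IsLatin}+(?:-\\p{IsLatin}+)*|\\d+");

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Symbols, brackets and the lone "-" and "+" are skipped.
        System.out.println(words("sud-est de l'\u00eele en 1330 (occident) - +"));
    }
}
```

With the apostrophe variant of the pattern, l'île would come back as a single token instead of two.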
If the opening bracket of a character class is followed by a ^ then the characters listed inside the class are not allowed. So your regex allows anything except unicode letter,+,(,-,),* and digit occurring one or more times.
Note that characters like +,(,),* etc. don't have any special meaning inside a character class.
What pattern.split does is that it splits the string at patterns matching the regex. Your regex matches whitespace and hence split occurs at each occurrence of one or more whitespace. So result will be this.
For example consider this
Pattern pattern = Pattern.compile("a");
for (String s : pattern.split("sda a f g")) {
    System.out.println("==>"+s);
}
Output will be
==>sd
==>
==> f g
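The point about +, (, ), and * being ordinary characters inside a character class can be checked directly with the pattern from the question. A minimal sketch, with a made-up class name and a short sample input:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CharClassDemo {
    // The question's pattern: a negated class that lists letters, '+', '(', '-', ')', '*' and digits.
    private static final Pattern SEPARATOR =
            Pattern.compile("[^\\p{L}+(\\-\\p{L}+)*\\d]+");

    static String[] tokens(String text) {
        return SEPARATOR.split(text);
    }

    public static void main(String[] args) {
        // Only the spaces act as separators here, so the symbols survive as tokens.
        System.out.println(Arrays.toString(tokens("a (b) + c-d * 9")));
    }
}
```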
A regular expression set description with [] can contain only letters, classes (\p{...}), sequences (e.g. a-z) and the complement symbol (^). You have to place the other magic characters you are using (+*()) outside the [ ] block.
