Regex pattern for repeated words


I am very new to regex and am learning it now. I have a requirement like this:
Any string starts with #newline# and also ends with #newline#. Between these two words, there can be any combination of spaces and further #newline# markers (zero or more of each).
Below is an example:
#newline# #newline# #newline##newline# #newline##newline##newline#.
How can I write a regex for this?
I have tried this, but it is not working:
^#newline#|(\s+#newline#)|#newline#|#newline#$

Your ^#newline#|(\s+#newline#)|#newline#|#newline#$ matches any one of: a #newline# at the start of the string (^#newline#), one or more whitespace characters followed by #newline# ((\s+#newline#)), a #newline# anywhere (#newline#), or a #newline# at the end of the string (#newline#$). The last alternative never matches, because the previous one already catches every #newline#. None of the alternatives checks the string as a whole.
You may match these strings with
^#newline#(?:\s*#newline#)*$
or (if there should be at least 2 occurrences of #newline# in the string)
^#newline#(?:\s*#newline#)+$
See the regex demo.
^ - start of string
#newline# - literal string
(?:\s*#newline#)* - zero or more (replacing * with + requires at least one) sequences of:
\s* - zero or more whitespace characters
#newline# - a literal substring
$ - end of string.
Java demo:
String s = "#newline# #newline# #newline##newline# #newline##newline##newline#";
System.out.println(s.matches("#newline#(?:\\s*#newline#)+"));
// => true
Note: inside matches(), the expression is already anchored, and ^ and $ can be removed.
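A small sketch (the class name MarkerCheck is only illustrative) that runs both variants above through String.matches() to show the difference between * and +: a lone #newline# passes only the * version, and non-whitespace text between markers fails both.

```java
public class MarkerCheck {
    // * : zero or more further markers, so a single #newline# is accepted
    static final String ZERO_OR_MORE = "#newline#(?:\\s*#newline#)*";
    // + : at least one further marker, so two or more #newline# are required
    static final String ONE_OR_MORE = "#newline#(?:\\s*#newline#)+";

    public static void main(String[] args) {
        String[] inputs = {
            "#newline#",
            "#newline# #newline# #newline##newline#",
            "#newline# text #newline#"
        };
        for (String s : inputs) {
            // String.matches() must consume the whole string, so no ^ or $ is needed
            System.out.println(s + " -> " + s.matches(ZERO_OR_MORE) + " / " + s.matches(ONE_OR_MORE));
        }
    }
}
```

This prints true / false for the single marker, true / true for the second input and false / false for the one containing text.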

As far as I understand the requirements, it should be this:
^#newline#(\s|#newline#)*#newline#$
This will not match your example string, since it does not start with #newline#.
Without the ^ and the $, it matches a substring.
Check out http://www.regexplanet.com/ to play around with regular expressions.

Please use the Pattern and Matcher classes to find the matches.
You can supply patternString at runtime, e.g. patternString = "newline";
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public void findtheMatch(String patternString)
{
String text = "#newline# #newline# #newline##newline# #newline##newline##newline# ";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
// group() returns the whole match; group(1) would throw IndexOutOfBoundsException because the pattern has no capturing group
System.out.println("found: " + matcher.group());
}
}

You can try this as well:
#newline#[\s\S]+#newline#
It matches anything that starts with #newline#, followed by one or more characters of any kind (whitespace or not), and ends with #newline#. Note that this also accepts arbitrary text between the two markers.
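To see how much looser [\s\S]+ is, here is a minimal comparison (class and field names are illustrative) against the stricter #newline#(?:\s*#newline#)+ pattern from the first answer. Arbitrary text between the markers passes only the loose pattern, and because + needs at least one character between the markers, two adjacent markers fail it.

```java
public class LooseVsStrict {
    // Anything (including line breaks) between an opening and a closing #newline#
    static final String LOOSE = "#newline#[\\s\\S]+#newline#";
    // Only whitespace and further #newline# markers allowed in between
    static final String STRICT = "#newline#(?:\\s*#newline#)+";

    public static void main(String[] args) {
        String withText = "#newline# anything at all #newline#";
        String adjacent = "#newline##newline#";
        System.out.println(withText.matches(LOOSE) + " " + withText.matches(STRICT)); // true false
        System.out.println(adjacent.matches(LOOSE) + " " + adjacent.matches(STRICT)); // false true
    }
}
```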

Related

Masking credit card number using regex

I am trying to mask the CC number so that the third character and the last three characters remain unmasked.
For example, 7108898787654351 should become **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to leave only the 3rd character unmasked?

You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Details:
(?<!^..): Negative lookbehind to assert that there are not exactly two characters between the start of the string and the current position (this excludes the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
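Wrapped in a method (the name mask is only illustrative), the replacement can be checked against the example. Because . matches any character, this pattern is position-based: on a dashed number such as 7108-8987-8765-4351 the dashes are masked too, and "third character" means the third character of the string, not the third digit.

```java
public class MaskThird {
    // Masks every character except the 3rd one and the last three
    static String mask(String cc) {
        return cc.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));
        System.out.println(mask("7108-8987-8765-4351")); // dashes are masked as well
    }
}
```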

I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
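The fixed length 16 and the hard-coded indices 13 to 15 can be derived from the input instead. A minimal sketch of that variant (class and method names are illustrative), assuming the number has at least three characters:

```java
import java.util.Arrays;

public class MaskWithArray {
    // Keeps the 3rd character and the last three; everything else becomes '*'
    static String mask(String cc) {
        char[] m = new char[cc.length()];
        Arrays.fill(m, '*');
        m[2] = cc.charAt(2);
        for (int i = cc.length() - 3; i < cc.length(); i++) {
            m[i] = cc.charAt(i);
        }
        return new String(m);
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));
        System.out.println(mask("123456")); // **3456
    }
}
```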

Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy, but since a credit card number should have 16 digits, I opted to use a negative lookahead that counts the digits (each optionally preceded by non-digits) from the current position to the end of the string.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End-of-string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End-of-string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
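To check that delimiters do not matter, here is a minimal harness (class name illustrative) that applies the same pattern with replaceAll to a plain and a dashed 16-digit number. Only digits are ever matched, so the dashes survive.

```java
public class MaskDigits {
    // Mask a digit unless exactly 14 digits (the 3rd of 16) or 1-3 digits (the last three)
    // remain from it to the end, skipping any non-digit delimiters
    static final String REGEX = "(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d";

    static String mask(String cc) {
        return cc.replaceAll(REGEX, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```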

Assuming no dash appears before the 3rd digit, you can leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351

I don't think you should use a regex to do this. You could use a StringBuilder to create the required string (note that this masks the dashes as well):
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
if (i == 2 || i >= str.length() - 3) {
sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
}
}
System.out.print(sb.toString()); // output: **0*************351

You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equivalent to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if there are any two chars other than line break chars immediately to the left of the current location
. - any char but a line break char
(?=.*...) - there must be at least three chars other than line break chars to the right of the current location.

If the CC number always has 16 digits, as it does in the example and as is typical for Visa and MasterCard numbers, matches of the following regular expression can be replaced with an asterisk:
\d(?!\d{0,2}$|\d{13}$)

Masking using regular expressions for below format

I am trying to write a regular expression to mask the string below. Example:
Input
A1../D//FASDFAS--DFASD//.F
Output (skip the first five and the last two alphanumeric characters)
A1../D//FA***********D//.F
I am trying the regex below:
([A-Za-z0-9]{5})(.*)(.{2})
Any help would be highly appreciated.

You can solve your issue by using Pattern and Matcher with a regex that matches multiple groups:
String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("(.*?\\/\\/..)(.*?)(.\\/\\/.*)");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
str = matcher.group(1)
+ matcher.group(2).replaceAll(".", "*")
+ matcher.group(3);
}
Details
(.*?\\/\\/..) first group, matching everything up to and including the first //, plus the two characters after it
(.*?) second group, matching everything between groups one and three
(.\\/\\/.*) third group, matching the character before the next // and everything after it up to the end of the string
Outputs
A1../D//FA***********D//.F
I think this solution is more readable.

If you want to do that with a single regex you may use
text = text.replaceAll("(\\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}).", "$1*");
Or, using the POSIX character class Alnum:
text = text.replaceAll("(\\G(?!^|(?:\\p{Alnum}\\P{Alnum}*){2}$)|^(?:\\P{Alnum}*\\p{Alnum}){5}).", "$1*");
See the Java demo and the regex demo. If you want each base character together with any combining marks that follow it (rather than each single code point) to be replaced by one asterisk, replace . with \P{M}\p{M}*+ ("\\P{M}\\p{M}*+").
To make . match line break chars, add (?s) at the start of the pattern.
Details
(\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[0-9A-Za-z]*[0-9A-Za-z]){5}) - capturing group 1, either:
\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$) - the position right after the previous match, provided it is not the start of the string and is not followed by exactly 2 occurrences of an alphanumeric char followed by 0 or more non-alphanumeric chars up to the end of the string
| - or
^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5} - start of string, followed with five occurrences of 0 or more non-alphanumeric chars followed with an alphanumeric char
. - any char other than a line break char (with \P{M}\p{M}*+, a base char followed by any combining marks).
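A runnable wrapper for the single-regex approach (class and method names are illustrative). From tracing the pattern, it appears to assume at least eight alphanumeric characters: the first masked character is not checked against the last-two rule, so with fewer alphanumerics some of the last two may get masked as well.

```java
public class MaskAlnum {
    // Group 1 is either the first five alphanumerics (with anything between them)
    // or an empty \G position that is not followed by just two more alphanumerics
    static final String REGEX =
        "(\\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}).";

    static String mask(String text) {
        // "$1" puts the kept prefix back on the first match; later matches capture ""
        return text.replaceAll(REGEX, "$1*");
    }

    public static void main(String[] args) {
        System.out.println(mask("A1../D//FASDFAS--DFASD//.F"));
        System.out.println(mask("ABCDEFGH")); // ABCDE*GH
    }
}
```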

Usually, masking characters in the middle of a string can be done using a negative lookbehind (?<!) and a positive lookahead (?=).
But in this case a lookbehind can't be used, because it would not have an obvious maximum length: the number of non-alphanumeric characters among the first five alphanumeric characters (. and / in A1../D//FA) is unpredictable.
The substring method can be used as a workaround for the inability to use a lookbehind:
String str = "A1../D//FASDFAS--DFASD//.F";
int start = str.replaceAll("^((?:\\W{0,}\\w{1}){5}).*", "$1").length();
String maskedStr = str.substring(0, start) +
str.substring(start).replaceAll(".(?=(?:\\W{0,}\\w{1}){2})", "*");
System.out.println(maskedStr);
// A1../D//FA***********D//.F
But the most straightforward way is to use java.util.regex.Pattern and java.util.regex.Matcher:
String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2})");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
String maskedStr = matcher.group(1) +
"*".repeat(matcher.group(2).length()) +
matcher.group(3);
System.out.println(maskedStr);
// A1../D//FA***********D//.F
}
\W{0,} - 0 or more non-word characters (anything other than a letter, digit or _)
\w{1} - exactly 1 word character (a letter, digit or _)
(\W{0,}\w{1}){5} - 5 word characters with any number of non-word characters in between
(?:\W{0,}\w{1}){5} - the same, but as a non-capturing group
^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2}) - substring with the first five word characters (group 1), everything else (group 2), and the substring with the last 2 word characters (group 3)

Why \z for regular expression doesn't work for me?

I read in the Oracle documentation that \z means the end of the input, but my code throws an error.
I need to find the word "java" in a text when java is the last word. Any suggestions on how to deal with this?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Why {
public static void main(String[] args) {
String language = "java";
String text = "I'm fan of java";
Pattern p = Pattern.compile("\\s" + language + "[\\W|\\z]"); // <-------------- Exception (thrown by compile)
Matcher m = p.matcher(text);
System.out.println(m.find());
}
}
// Exception in thread "main" java.util.regex.PatternSyntaxException:
// Illegal/unsupported escape sequence near index 11 \sjava[\W|\z]

The [...] defines a character class, which matches a single character from the set you define. \z is an anchor, a zero-width assertion, and no zero-width assertion (\A, \b, \G, ^, $) keeps its "special" meaning when put inside a character class.
The error you get is due to the fact that
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language.
You seem to want to match a word that is preceded by whitespace or the start of the string, and followed by a non-word char, a digit or the end of the string. I suggest using
Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(language) + "(?![^\\W\\d])");
The (?<!\\S) is a negative lookbehind that only matches a position immediately preceded by a whitespace character or at the start of the string. [^\\W\\d] matches any word character except a digit (a letter or _), so the negative lookahead (?![^\\W\\d]) fails the match if the next char is a letter or an underscore; a digit, a non-word char or the end of the string may follow.
See the regex demo.
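If you want to keep the shape of your original pattern, the minimal fix is to move \z out of the character class into an alternation, (?:\W|\z). The sketch below (class and method names are illustrative) compares that with the suggested pattern; note the original shape needs whitespace before the word, so a text that starts with java does not match it.

```java
import java.util.regex.Pattern;

public class LastWord {
    // Original idea, fixed: \z is used as an alternative in a group, not inside [...]
    static boolean original(String text, String word) {
        return Pattern.compile("\\s" + Pattern.quote(word) + "(?:\\W|\\z)").matcher(text).find();
    }

    // Pattern suggested in the answer above
    static boolean suggested(String text, String word) {
        return Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?![^\\W\\d])").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(original("I'm fan of java", "java"));       // true
        System.out.println(original("I'm fan of javascript", "java")); // false
        System.out.println(original("java is fun", "java"));           // false: no whitespace before it
        System.out.println(suggested("java is fun", "java"));          // true
    }
}
```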

Remove leading trailing non numeric characters from a string in Java

I need to strip off all the leading and trailing characters from a string, up to the first and last digit respectively.
Example: OBC9187A-1%A
Should return: 9187A-1
How do I achieve this in Java?
I understand regex is the solution, but I am not good at it.
I tried this: replaceAll("([^0-9.*0-9])","")
But it returns only digits and strips all the alpha/special characters.

Here is a self-contained example of using regex in Java to solve your problem. I would suggest looking at a regex tutorial of some kind.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String test = "OBC9187A-1%A";
Pattern p = Pattern.compile("\\d.*\\d");
Matcher m = p.matcher(test);
while (m.find()) {
System.out.println("Match: " + m.group());
}
}
}
Output:
Match: 9187A-1
\d matches any digit, .* matches any character (other than a line terminator) zero or more times, and the final \d matches another digit. In the Java string the backslash is doubled (\\d) because \ is the escape character in Java string literals. So this regex matches a digit, followed by anything, followed by another digit. Because .* is greedy, it takes the longest possible match, running from the first digit to the last digit and including everything in between. The while loop iterates over all matches in case there is more than one; here there can only be one match, so you can keep the while loop or change it to an if like this:
if(m.find())
{
System.out.println("Match: " + m.group());
}
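One caveat worth checking: \d.*\d needs at least two digits, so an input with a single digit yields no match at all. A sketch of a variant that makes the second part optional (the names are illustrative, and the variant is not from the answer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitSpan {
    // From the answer: a digit, anything, a digit (needs at least two digits)
    static final Pattern TWO_OR_MORE = Pattern.compile("\\d.*\\d");
    // Variant: the trailing ".*\d" part is optional, so a single digit is enough
    static final Pattern ONE_OR_MORE = Pattern.compile("\\d(?:.*\\d)?");

    // Returns the first match, or null if there is none
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(TWO_OR_MORE, "OBC9187A-1%A")); // 9187A-1
        System.out.println(firstMatch(TWO_OR_MORE, "AB5C"));         // null
        System.out.println(firstMatch(ONE_OR_MORE, "AB5C"));         // 5
    }
}
```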

This will strip leading and trailing non-digit characters from string s.
String s = "OBC9187A-1%A";
s = s.replaceAll("^\\D+", "").replaceAll("\\D+$", "");
System.out.println(s);
// prints 9187A-1
Regex explanation
^\D+
^ assert position at start of the string
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
\D+$
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
$ assert position at end of the string
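The two replaceAll calls can also be combined into one by joining both patterns with an alternation. A minimal sketch (class and method names are illustrative):

```java
public class StripNonDigits {
    // ^\D+ removes the leading non-digits and \D+$ the trailing ones, in one replaceAll call
    static String strip(String s) {
        return s.replaceAll("^\\D+|\\D+$", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("OBC9187A-1%A")); // 9187A-1
    }
}
```

A string with no digits at all is reduced to an empty string, since ^\D+ then consumes everything.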

Java regex to match tuples

I need to extract tuples out of a string,
e.g. (1,1,A)(2,1,B)(1,1,C)(1,1,D)
and thought a regex like:
String tupleRegex = "(\\(\\d,\\d,\\w\\))*";
would work, but it just gives me the first tuple. What would be the proper regex to match all the tuples in the string?

Remove the * from the regex and iterate over the matches using a java.util.regex.Matcher:
String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
String tupleRegex = "(\\(\\d,\\d,\\w\\))";
Pattern pattern = Pattern.compile(tupleRegex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
System.out.println(matcher.group());
}
The * character is a quantifier that repeats the tuple group zero or more times, so your original regex matches the entire input string as a single match.
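If you also need the individual fields, capturing groups inside each tuple can pull them out. A sketch (class and method names are illustrative), which also shows that with the * version the first find() returns the whole input as one match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tuples {
    // One tuple per match; the three fields are captured as groups 1-3
    static final Pattern TUPLE = Pattern.compile("\\((\\d),(\\d),(\\w)\\)");

    // Collects every whole tuple, e.g. "(1,1,A)"
    static List<String> tuples(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TUPLE.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Collects only the third field of each tuple, e.g. "A"
    static List<String> thirdFields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TUPLE.matcher(input);
        while (m.find()) {
            out.add(m.group(3));
        }
        return out;
    }

    // The original pattern with *: the first find() swallows the whole run of tuples
    static String firstFindWithStar(String input) {
        Matcher m = Pattern.compile("(\\(\\d,\\d,\\w\\))*").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
        System.out.println(tuples(input));            // [(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
        System.out.println(thirdFields(input));       // [A, B, C, D]
        System.out.println(firstFindWithStar(input)); // (1,1,A)(2,1,B)(1,1,C)(1,1,D)
    }
}
```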

A one-line solution uses the String.split() method with the pattern (?!^\\()(?=\\():
Arrays.toString("(1,1,A)(2,1,B)(1,1,C)(1,1,D)".split("(?!^\\()(?=\\()"))
output:
[(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
Pattern explanation:
(?! look ahead to see if there is not:
^ the beginning of the string
\( '('
) end of look-ahead
(?= look ahead to see if there is:
\( '('
) end of look-ahead
