How to use regular expression character class properly? - java

String a = "77*b+7-77/98+6";
String b[] = a.split("[*+-/]"); // works fine
b[] = a.split("[+/- *]"); // gives pattern syntax exception because of " * "
b[] = a.split("[*/+-]"); // works fine
b[] = a.split("[-*]"); // works fine
Please, help me to figure out this.

In Regex square brackets [] denote a character class. A character class can have two characters separated by a hyphen a-z to denote a range of characters.
This means that if the hyphen is used, and either end of the range is invalid, this is an invalid pattern. This hyphen must be escaped in this case, \\- in Java.
But if the hyphen is placed at the beginning or end of the character class, it is not treated as a metacharacter, because it cannot form a range. So your other patterns work because the hyphen is effectively escaped.
b[] = a.split("[*/+-]"); // works fine: the hyphen is at the end of the class
b[] = a.split("[-*]"); // works fine: the hyphen is at the start of the class
The first expression has +-/, which is a valid range from + to / in the ASCII character set, equivalent to the literal characters +,-./.
The failing expression has "/- ", i.e. the range from / to SPACE. SPACE is character 32 and / is character 47, so the range 47-32 is backwards, which is why the pattern is rejected.
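The rules above can be checked with a short program. This is only a minimal sketch: the class name CharClassDemo and the helper compiles are made up for illustration, and the escaped variant "[+/\\- *]" is one possible fix rather than the only one.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassDemo {
    // Hypothetical helper: true if the regex compiles, false if Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String a = "77*b+7-77/98+6";

        // Hyphen at the end of the class: treated as a literal.
        System.out.println(Arrays.toString(a.split("[*/+-]")));    // [77, b, 7, 77, 98, 6]

        // Hyphen escaped: treated as a literal anywhere in the class.
        System.out.println(Arrays.toString(a.split("[+/\\- *]"))); // [77, b, 7, 77, 98, 6]

        // "/- " is a reversed range (47 down to 32), so the pattern is rejected.
        System.out.println(compiles("[+/- *]"));                   // false

        // "+-/" is the range + , - . / so commas and dots also act as separators.
        System.out.println(Arrays.toString("1,2.3".split("[*+-/]"))); // [1, 2, 3]
    }
}
```

The last line shows why "[*+-/]" only appears to work: it silently accepts two extra separators.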

Related

Masking credit card number using regex

I am trying to mask the CC number so that the third character and the last three characters are unmasked.
For example, 7108898787654351 becomes **0**********351.
I have tried (?<=.{3}).(?=.*...). It unmasked last three characters. But it unmasks first three also.
Can you throw some pointers on how to unmask 3rd character alone?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we are not exactly 2 characters past the start of the string (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
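A minimal sketch of how this pattern might be applied in Java, assuming the input contains exactly 16 digits with optional non-digit delimiters. The class name MaskFormattedDemo and the method mask are illustrative names, not part of the answer.

```java
class MaskFormattedDemo {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    static final String MASK_REGEX = "(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d";

    // Replaces every digit except the 14th-from-last and the last three with '*'.
    static String mask(String cc) {
        return cc.replaceAll(MASK_REGEX, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
        System.out.println(mask("7108 8987 8765 4351")); // **0* **** **** *351
    }
}
```

Because the lookahead counts digits from the end of the string, the delimiters are left untouched wherever they appear.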
Regardless of where the dashes are after the first 3 digits, this leaves the 3rd digit unmatched and makes sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you need a regex for this. You could use a StringBuilder to build the required string:
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
    if (i == 2 || i >= str.length() - 3) {
        sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
    }
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or the current location is at the start of the string or right after a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if there are exactly two chars other than line break chars between the start of the string and the current location
. - any char but a line break char
(?=.*...) - there must be any three chars other than line break chars immediately to the right of the current location.
If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
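As a rough sketch, the replacement could look like this in Java. It relies on the 16-digit assumption stated above, and the class name Mask16Demo and the method mask are invented for the example.

```java
class Mask16Demo {
    // Masks a digit unless it is among the last three digits
    // or is followed by exactly 13 digits (the third digit of 16).
    static String mask(String cc) {
        return cc.replaceAll("\\d(?!\\d{0,2}$|\\d{13}$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```

With a different number of digits the unmasked position shifts, since the lookahead counts 13 digits back from the end rather than 2 forward from the start.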

Scanning letters and floats using the java scanner

I have a string which looks like this:
"m 535.71429,742.3622 55.71428,157.14286 c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z"
and I want the Java Scanner to output the following strings: "m", "535.71429", "742.3622", "55.71428", "157.14286", "c", ...
so everything separated by a comma or a space, but I am having trouble getting it to work.
This is what my code looks like:
Scanner scanner = new Scanner(path_string);
scanner.useDelimiter(",||//s");
String s = scanner.next();
if (s.equals("m")){
s = scanner.next();
point[0] = Float.parseFloat(s);
s = scanner.next();
point[1] = Float.parseFloat(s);
....
but the strings that come out are: "m", " ", "5", "3", ...
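I think the trouble is with //s, which matches two literal slashes followed by s rather than whitespace. The empty alternative between the two | characters also matches the empty string, which is why the input is split at every character. Use this pattern instead: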
scanner.useDelimiter("(,|\\s)");
Regex patterns:
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any Alphanumeric character
\W Any Non-alphanumeric character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
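We double the \ because the backslash is an escape character in Java string literals, so the regex \s has to be written as \\s; | has no special meaning in a string literal and needs no escaping.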
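Putting the delimiter fix together, tokenizing the path string might look like the sketch below. It swaps in the delimiter "[,\\s]+" as a variation on the pattern above, so that runs such as ", " do not produce empty tokens; that choice, the class name PathTokenDemo, and the method tokens are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PathTokenDemo {
    // Splits the input on runs of commas and/or whitespace.
    static List<String> tokens(String path) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            scanner.useDelimiter("[,\\s]+");
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String path = "m 535.71429,742.3622 55.71428,157.14286 "
                + "c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z";
        // Prints the 13 tokens: m, 535.71429, 742.3622, ..., -157.14286, z
        System.out.println(tokens(path));
    }
}
```

Each numeric token can then be passed to Float.parseFloat or Double.parseDouble, while the single-letter tokens select the command.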
If you want the output to be strings, Float.parseFloat(s) is of no use for your problem. Is your array a float array?
Because if it is, you should not get any output but a NumberFormatException, because the string "m" cannot be parsed into a float.
Furthermore, to solve the problem of the single values, you could use a StringBuilder which constructs your numbers and ignores the letters and commas. Special handling for the letters would have to be implemented.
Finally, if it is not absolutely necessary, use double instead of float. It's just so much safer and might save you from some more problems within your program!

How can I strip all non digits in a string except the first character?

I have a string that I want to make sure that the format is always a + followed by digits.
The following would work:
String parsed = inputString.replaceAll("[^0-9]+", "");
if(inputString.charAt(0) == '+') {
result = "+" + parsed;
}
else {
result = parsed;
}
But is there a way to have a regex in the replaceAll that would keep the + (if exists) in the beginning of the string and replace all non digits in the first line?
The following statement with the given regex would do the job:
String result = inputString.replaceAll("(^\\+)|[^0-9]", "$1");
(^\\+) matches a plus sign at the beginning of the string and captures it in group 1,
| or
[^0-9] matches a character which is not a digit;
$1 replaces each match with the contents of group 1, which is the plus sign when the leading + was matched and nothing otherwise.
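A small sketch of that statement in use. The class name PlusDigitsDemo, the method normalize, and the sample inputs are made up for the example.

```java
class PlusDigitsDemo {
    // Keeps a leading '+' (captured in group 1) and drops every other non-digit.
    static String normalize(String input) {
        return input.replaceAll("(^\\+)|[^0-9]", "$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+1 (555) 123-4567")); // +15551234567
        System.out.println(normalize("1+555-12"));          // 155512
        System.out.println(normalize("abc").isEmpty());     // true
    }
}
```

In the second input the + is not at the start of the string, so it falls under [^0-9] and is removed like any other non-digit.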
You can use this expression:
String r = s.replaceAll("((?<!^)[^0-9]|^[^0-9+])", "");
The idea is to replace any non-digit when it is not the initial character of the string (that's the (?<!^)[^0-9] part with a lookbehind) or any character that is not a digit or plus that is the initial character of the string (the ^[^0-9+] part).
Demo.
What about just
(?!^)\D+
Java string:
"(?!^)\\D+"
Demo at regex101.com
\D matches a character that is not a digit [^0-9]
(?!^) using a negative lookahead to check, if it is not the initial character
Yes you can use this kind of replacement:
String parsed = inputString.replaceAll("^[^0-9+]*(\\+)|[^0-9]+", "$1");
if present and before the first digit in the string, the + character is captured in group 1. For example: dfd+sdfd12+sdf12 returns +1212 (the second + is removed since its position is after the first digit).
try this
1- This will allow negative and positive numbers and will match all special characters except - and + in the first position.
(?!^[-+])[^0-9.]
2- If you only want to allow + at first position
(?!^[+])[^0-9.]

Why isn't my regex matching uppercase characters and underscores?

I have the following Java code:
public static void main(String[] args) {
    String var = "ROOT_CONTEXT_MATCHER";
    boolean matches = var.matches("/[A-Z][a-zA-Z0-9_]*/");
    System.out.println("The value of 'matches' is: " + matches);
}
This prints: The value of 'matches' is: false
Why doesn't my var match the regex? If I am reading my regex correctly, it matches any String:
Beginning with an upper-case char, A-Z; then
Consisting of zero or more:
Lower-case chars a-z; or
Upper-case chars A-Z; or
Digits 0-9; or
An underscore
The String "ROOT_CONTEXT_MATCHER":
Starts with an A-Z char; and
Consists of 19 subsequent characters that are all upper-case A-Z or are an underscore
What's going on here?!?
The issue is with the forward slash characters at the beginning and at the end of the regex. They don't have any special meaning here and are treated as literals. Simply remove them to get it fixed:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
If you intended to use metacharacters for boundary matching, the correct characters are ^ for the beginning of the line, and $ for the end of the line:
boolean matches = var.matches("^[A-Z][a-zA-Z0-9_]*$");
although these are not needed here because String#matches would match the entire string.
You need to remove the regex delimiters, i.e. /, from the Java regex:
boolean matches = var.matches("[A-Z][a-zA-Z0-9_]*");
That can be further shortened to:
boolean matches = var.matches("[A-Z]\\w*");
since \\w (a word character) is equivalent to [a-zA-Z0-9_].
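The difference can be seen side by side in a short sketch. The class name MatchesDemo and both helper methods are hypothetical names used only for this illustration.

```java
class MatchesDemo {
    // Pattern as written in the question: the slashes are literal characters.
    static boolean matchesWithSlashes(String s) {
        return s.matches("/[A-Z][a-zA-Z0-9_]*/");
    }

    // Pattern with the slashes removed and the class shortened to \w.
    static boolean matchesWithoutSlashes(String s) {
        return s.matches("[A-Z]\\w*");
    }

    public static void main(String[] args) {
        String name = "ROOT_CONTEXT_MATCHER";
        System.out.println(matchesWithSlashes(name));             // false
        System.out.println(matchesWithSlashes("/" + name + "/")); // true
        System.out.println(matchesWithoutSlashes(name));          // true
    }
}
```

The second call succeeds only because the input itself begins and ends with a slash, which confirms that Java treats them as ordinary characters.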

Java RegEx - for an Integer not containing a "."

I need to be able to return signed and unsigned integer constants with no
intervening symbols, possibly preceded by + or -. The only allowed digits are 3, 4, and 5.
I can't figure out a way to say that the expression must not contain a period before or after the integer.
This is what I have so far, but if I pass say "34.5 - 43" the string returned will be: "34 5 43".
All that needs to be returned is "43".
public String getInts(String toBeScanned){
    String INT = "";
    Pattern p = Pattern.compile("\\b[+-]?[3-5]+\\b");
    Matcher m = p.matcher(toBeScanned);
    if (m.matches() == true){
        INT = toBeScanned;
    }
    else{
        m = p.matcher(" " + toBeScanned);
        while (m.find()){
            INT = INT + m.group() + " ";
        }
    }
    return INT;
}
Any thoughts or pushes in the right direction are appreciated. Is there a way to say it that the first and last character can be [\b and not .]
This is frustrating the heck out of me. Help!
You don't want a word boundary \b here. I think the best approach is to create your own assertion. Try this:
(?<![.\d])[+-]?[3-5]+(?![.\d])
See it here on Regexr
(?<![.\d]) is a negative lookbehind assertion. It says that no dot and no digit is allowed before the pattern.
(?![.\d]) is a negative lookahead assertion. It says that no dot and no digit is allowed after the pattern.
Improvement
To avoid matching things like "hf34", we can make it stricter:
(?<![.\w])[+-]?[3-5]+(?![.\w])
See it on Regexr
The word boundary \b
\b matches at a change from a word character to a non-word character, or vice versa. A word character is a letter, a digit, or an _. That means you will also get problems with your \b before the [+-], because there is no \b between a space or the start of the string and a [+-].
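Applied to the method from the question, the lookaround version might be used as in the sketch below. It uses the stricter \w variant; the class name IntScanDemo and the List return type are choices made for this example, not something the question requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntScanDemo {
    // Optional sign, then digits 3-5, not adjacent to a dot or a word character.
    private static final Pattern INT =
            Pattern.compile("(?<![.\\w])[+-]?[3-5]+(?![.\\w])");

    // Collects every match of the pattern in the input, in order.
    static List<String> getInts(String toBeScanned) {
        List<String> found = new ArrayList<>();
        Matcher m = INT.matcher(toBeScanned);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(getInts("34.5 - 43"));    // [43]
        System.out.println(getInts("+345 hf34 -5")); // [+345, -5]
    }
}
```

Using find() alone makes the separate matches() branch and the prepended space from the question unnecessary.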
"\b[+-]?[3-5]+[.][3-5]+\b"
This pattern says that in order to match, there must be at least one number before, and one number after the decimal point.
Is there a way to say it that the first and last character can be [\b and not .]
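[^\.\b]
is meant to match any character other than '.'. Note that inside a character class \b does not denote a word boundary, and that a negated class always consumes one character.
Is that what you are looking for?
[^\.\b][+-]?[3-5]+[^\.\b]
is intended to match '43' but not '34.5'. Because each negated class must consume a character, it can only match when the integer has another character on both sides.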
