Masking part of the string with a regex - java

The idea is to mask a string the way it's done with credit card numbers. It can be done with this one line of code, and it works. However, I can't find a straightforward explanation of the regex used here.
public class Solution {
    public static void main(String[] args) {
        String t1 = "518798673672531762319871";
        System.out.println(t1.replaceAll(".(?=.{4})", "*"));
    }
}
Output is: ********************9871

Explanation of regex:
.(?=.{4})
.: Match any character
(?=: Start of a lookahead condition
.{4}: that asserts presence of 4 characters
): End of the lookahead condition
In simple words, it matches any character in the input that has at least 4 characters to the right of the current position.
The replacement is "*", which means each matched character in the input is replaced by a single * character. This replaces every character in the credit card number except the last 4, where the lookahead condition fails because there are fewer than 4 characters ahead of the current position.
Read more on look arounds in regex

(?=.{4}) is a positive lookahead. It checks that the pattern inside the brackets (the next 4 characters after the current character) is present, without including it in the main result. The main result is the . outside the brackets, which matches each of the other characters for replacement by *.
Imagine that your regex goes through the input char by char. On the first digit (5) it asks "is there a single char followed by 4 other chars? yes, ok.. replace [the 5] with *"
It repeats this until the 9 (4th from the end), at which point the answer to "are there another 4 characters after this?" becomes "no" and the replacing stops.
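To see the char-by-char behaviour described above, here is a minimal sketch. The class name MaskDemo and the helper maskAllButLast are made up for illustration; the regex is the one from the question with the count made configurable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskDemo {
    // Hypothetical helper: replaces every character that still has at least
    // `visible` characters to its right with '*'.
    static String maskAllButLast(String input, int visible) {
        return input.replaceAll(".(?=.{" + visible + "})", "*");
    }

    public static void main(String[] args) {
        // Only '1' and '2' have 4 or more characters after them.
        System.out.println(maskAllButLast("123456", 4)); // **3456

        // List the positions where the lookahead condition holds.
        Matcher m = Pattern.compile(".(?=.{4})").matcher("123456");
        while (m.find()) {
            System.out.println("matched '" + m.group() + "' at index " + m.start());
        }
    }
}
```

A string of 4 characters or fewer comes back unchanged, because the lookahead never succeeds.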

Related

positive lookbehind not behaving correctly

The code snippet for positive lookbehind is below
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PositiveLookBehind {
    public static void main(String[] args) {
        String regex = "[a-z](?<=9)";
        String input = "a9es m9x us9s w9es";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        System.out.println("===starting====");
        while (matcher.find()) {
            System.out.println("found:" + matcher.group()
                    + " start index:" + matcher.start()
                    + " end index is " + matcher.end());
        }
        System.out.println("===ending=====");
    }
}
I was expecting 4 matches, but to my surprise the output shows no match.
Can anyone point out my mistake?
As far as I understand, the regex here means a letter preceded by the digit 9, which is satisfied in 4 locations.
Problem
Notice that (?<=9) is placed after [a-z]. What does that mean?
Let's consider data like "a9c".
At the start, the regex engine places its "cursor" at the beginning of the string it iterates over, here:
|a9c
^-regex cursor is here
Then the regex engine tries to match each part of the pattern from left to right. So in the case of [a-z](?<=9), it first tries to find a match for [a-z], and only after finding one does it move on to evaluate the (?<=9) part.
So match for [a-z] will happen here:
a9c
*<-- match for `[a-z]`
After that match, the regex engine moves the cursor here:
a|9c
*^--- regex-engine cursor
^---- match for [a-z]
So now (?<=9) is evaluated (notice the position of the cursor |). (?<=subregex) checks whether the text immediately before the cursor can be matched by subregex. Since the cursor is directly after a, the look-behind "sees" that a as the data the subexpression should test. Because a can't be matched by 9, the evaluation fails.
Solution(s)
You probably wanted to check whether 9 is placed before an acceptable letter. To achieve that you can modify your regex in several ways:
with [a-z](?<=9.) you make the look-behind test the two previous characters
a9c|
^^
9. - `9` matches 9, `.` matches any character (the one directly before the cursor)
or, more simply, (?<=9)[a-z] to first look for 9 and then for [a-z], which lets the regex match the c in 9c when the cursor is at 9|c.
Your current pattern, [a-z](?<=9), means: match a lowercase letter and assert that the position right after the letter is preceded by 9, which is a contradiction.
If you want to match a letter preceded by 9, use (?<=9)[a-z], which means: assert that what precedes is 9 and, if so, match a lowercase letter.
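The three patterns discussed in these answers can be compared side by side. This is only a sketch: the class LookBehindDemo and the helper matchStarts are hypothetical names, and the input is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindDemo {
    // Hypothetical helper: collects the start index of every match.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "a9es m9x us9s w9es";
        // Look-behind runs after the letter, so it tests the letter itself against 9.
        System.out.println(matchStarts("[a-z](?<=9)", input));  // []
        // Look-behind runs before the letter, so it tests the preceding character.
        System.out.println(matchStarts("(?<=9)[a-z]", input));  // [2, 7, 12, 16]
        // Look-behind runs after the letter but covers two characters: 9 plus the letter.
        System.out.println(matchStarts("[a-z](?<=9.)", input)); // [2, 7, 12, 16]
    }
}
```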

How to insert spaces after full stops at the end of sentences, but not in abbreviations or floating point numbers?

I have a JTextArea in which I want to add a space after every full stop that is not already followed by one, e.g. turning "This is a sentence.This is another C.O.D sentence.This is yet another C.A.T. sentence." into "This is a sentence. This is another C.O.D sentence. This is yet another C.A.T. sentence.". But I don't want abbreviations or floating point numbers to gain extra spaces, e.g. "This is a C.A.T. float 5.5" should not become "This is a C. A. T. float 5. 5"! I am using string.replaceAll(".",". ") for this, which is not proving to be sufficient.
Keeping it simple, without negative look-behinds and such:
s = s.replaceAll("([^A-Z0-9.])\\.([^0-9 \t])", "$1. $2");
Replace the period when not:
after a capital itself (U.N.C. or M.Twain)
after a digit (1. - hoping the sentence does not end in a digit)
after a period (...)
before a digit (.5 - hoping the next sentence does not start with a digit)
before a space or tab
You can use the regex
([^A-Z])\.(?!\d)
which matches every "." that is not followed by a digit and not preceded by an uppercase letter
see the regex demo, online compiler
(You should edit your question to clearly state your requirement, e.g. handling of abbreviation)
You could replace (?<!\b[A-Z])\.(?!\d) with .<space>
Demonstration: https://regex101.com/r/g1g7Yg/1
Explanation:
(?<! ) negative look-behind group
\b[A-Z] word boundary followed by one uppercase character
(i.e. a single uppercase character)
\. a dot
(?!\d) negative look-ahead group, for a single digit
This basically means: replace a dot if it is NOT preceded by a single uppercase character and NOT followed by a digit.
There are still some flaws, e.g. it will not add a space in "Hello world.1 apple 1 day". It shouldn't be difficult to change the regex to fix this once you understand the regex above.
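For comparison, here is a small sketch running two of the suggested patterns on the same input. The class and method names are made up, and the test sentence is my own, chosen so that it contains an abbreviation and a float but no trailing period.

```java
class SentenceSpacingDemo {
    // Variant without look-arounds: captures the character on each side
    // of the period and puts both back around ". ".
    static String withGroups(String s) {
        return s.replaceAll("([^A-Z0-9.])\\.([^0-9 \t])", "$1. $2");
    }

    // Variant with look-arounds: only the period itself is matched.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<!\\b[A-Z])\\.(?!\\d)", ". ");
    }

    public static void main(String[] args) {
        String input = "One.Two C.A.T. is 5.5 long.End";
        System.out.println(withGroups(input));      // One. Two C.A.T. is 5.5 long. End
        System.out.println(withLookarounds(input)); // One. Two C.A.T. is 5.5 long. End
    }
}
```

One difference to be aware of: the look-around variant, as written, does not check what follows the period apart from digits, so a period that already has a space after it, or that ends the string, would also get a space appended. The group variant skips those cases because it requires a following character that is not a space or tab.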

Java Regex to match String password

I recently encountered this question in a textbook:
I am supposed to write a method to check whether a string has:
at least ten characters
only letters and digits
at least three digits
I am trying to solve it with a regex rather than by iterating through every character; this is what I have so far:
String regx = "[a-z0-9]{10,}";
But this only covers the first two conditions. How should I go about the third condition?
You could use a positive lookahead for the third condition, like this:
^(?=(?:.*\d){3,})[a-z0-9]{10,}$
^ indicates start of string.
(?= ... ) is the positive lookahead, which will search the whole string to match whatever is between (?= and ).
(?:.*\d){3,} matches at least 3 digits anywhere in the string.
.*\d matches a digit preceded by any number of characters, including none (without the .*, only consecutive digits would match).
{3,} matches three or more of .*\d.
(?: ... ) is a non-capturing group.
$ indicates end of string.
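A quick way to check the pattern is to run it against a few sample strings. The sketch below uses a hypothetical isValid helper; the pattern is copied from the answer, so it only accepts lowercase letters, just like the original [a-z0-9] attempt.

```java
class PasswordCheckDemo {
    // Hypothetical helper wrapping the pattern from the answer above.
    // The lookahead demands three digits anywhere; the main part demands
    // ten or more characters from [a-z0-9] and nothing else.
    static boolean isValid(String s) {
        return s.matches("^(?=(?:.*\\d){3,})[a-z0-9]{10,}$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcdefg123"));  // true: 10 chars, 3 digits
        System.out.println(isValid("a1b2c3defg"));  // true: digits need not be adjacent
        System.out.println(isValid("abcdefgh12"));  // false: only 2 digits
        System.out.println(isValid("abc123"));      // false: too short
        System.out.println(isValid("abcdefg12_3")); // false: '_' is not allowed
    }
}
```

If uppercase letters should count as letters too, the character class would presumably need to become [a-zA-Z0-9].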

Java specific regex

I want to write some code that checks whether the input from the user matches a specific format. The format is 's' followed by a space and a number; if the input was "s 1 2 3 4", it would call some function. "s 1 2" would also be acceptable. So far I have found a regex that works for a fixed count of numbers:
if (inputLine.matches("s \\d+ \\d+")) { }
This works for 2 numbers after the s, but I need to be able to accept any number of numbers after the s.
Any idea on a regex that would suit my needs? Thank you
Change your regex to
if (inputLine.matches("s(?: \\d+)+")) { }
to match s followed by one or more sequences of a space and one or more digits.
If you allow 0 numbers after s, replace the last + quantifier with * to match zero or more occurrences.
Since a repeated capturing group overwrites its contents on each iteration, there is no point in using a capturing group here, so I suggest a non-capturing one, (?:...).
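Both variants can be tried out with a few inputs. This is just a sketch with made-up method names; the patterns are the ones given above.

```java
class CommandFormatDemo {
    // 's' followed by one or more " <digits>" groups.
    static boolean isCommand(String line) {
        return line.matches("s(?: \\d+)+");
    }

    // Same, but a bare "s" with no numbers is accepted too.
    static boolean isCommandOrBareS(String line) {
        return line.matches("s(?: \\d+)*");
    }

    public static void main(String[] args) {
        System.out.println(isCommand("s 1 2 3 4"));  // true
        System.out.println(isCommand("s 1 2"));      // true
        System.out.println(isCommand("s"));          // false: at least one number required
        System.out.println(isCommand("s 1  2"));     // false: exactly one space per number
        System.out.println(isCommandOrBareS("s"));   // true
    }
}
```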

Reg Expression Validation on a String

Can I use a regular expression for the following use case?
I need to write a boolean method that takes a String parameter, which should satisfy the following conditions:
20 character length string.
First 9 characters will be a number
Next 2 characters will be alphabets
Next 2 characters will be a number.(1 to 31 or 99)
Next 1 character will be an alphabet
Last 6 characters will be a number.
So far I have written the expression for the first requirement:
[a-zA-Z0-9]{20} - this works well for the first case, but I don't know how to write a complete regular expression that meets the entire requirement.
Please help.
Yes, it is possible to use regexes for this.
Ignore the "20 characters" part and describe a string created by concatenating 9 digits, 2 letters, 2 digits, 1 letter and another 6 digits.
Start with the string start: ^
Then 9 digits. The \d conveniently describes the character set [0-9], so \d{9} means "nine digits"
Then 2 letters. The \w class is too broad, so stick to [a-zA-Z] for a letter.
Then another two digits. They seem to be from a restricted set, so describe the set with alternation and grouping.
Then another letter and another six digits.
And, finally, you have to end at the end of the string: $
For reference, this regex means "the string is nine letters, then 12-15 or 99, then another letter":
^[a-zA-Z]{9}(1[2-5]|99)[a-zA-Z]$
Read the String JavaDocs, especially the part about String.matches() as well as the documentation about regular expressions in Java.
Your first requirement is already implicit in the remaining ones, so I would just skip it. Then, just write the regex code that matches each part one after the other:
[0-9]{9}[a-zA-Z]{2}...
There is one special consideration for the number that might be 1 to 31. While it is possible to match this in one regex, it would be verbose and difficult to understand. Instead, perform basic matching in the regex and extract this part as a capturing group by putting it into parentheses:
([0-9]{2})
If you use Pattern and Matcher to apply your regex, and your string matches the pattern, you can then easily get at just those two characters, use Integer.parseInt() to convert them to an integer (which is completely safe because you know the two characters are digits), and then check the value normally.
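The capturing-group approach described above might look roughly like this. It is a sketch, not a definitive implementation: the class and method names are invented, and it assumes the values 1 to 9 are written with a leading zero (01 to 09) so that the field is always two characters wide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeValidatorDemo {
    // 9 digits, 2 letters, 2 digits (captured as group 1), 1 letter, 6 digits.
    private static final Pattern FORMAT =
            Pattern.compile("[0-9]{9}[a-zA-Z]{2}([0-9]{2})[a-zA-Z][0-9]{6}");

    static boolean isValid(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            return false; // wrong shape, which also covers the wrong length
        }
        // Safe to parse: the regex guarantees group 1 is exactly two digits.
        int n = Integer.parseInt(m.group(1));
        return (n >= 1 && n <= 31) || n == 99;
    }

    public static void main(String[] args) {
        System.out.println(isValid("123456789AB15C123456")); // true
        System.out.println(isValid("123456789AB99C123456")); // true
        System.out.println(isValid("123456789AB32C123456")); // false: 32 is out of range
        System.out.println(isValid("123456789AB15C12345"));  // false: only 19 characters
    }
}
```

Matcher.matches() requires the whole string to match, so no ^ or $ anchors are needed in the pattern.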
This regular expression
^[0-9]{9}[a-zA-Z]{2}(0[1-9]|[1-2][0-9]|3[0-1]|99)[a-zA-Z]([0-9]{6})$
takes
9 digits at the start,
followed by 2 letters,
followed by a two-digit number from 01 to 31, or 99,
followed by a letter,
followed by 6 digits.
