Given a String containing numbers (possibly with decimals), parentheses and any amount of whitespace, I need to iterate through the String and handle each number and parenthesis.
The code below works for the String "1 ( 2 3 ) 4", but it does not work if I remove the whitespace between the parentheses and the numbers, as in "1 (2 3) 4".
Scanner scanner = new Scanner(expression);
while (scanner.hasNext()) {
    String token = scanner.next();
    // handle token ...
    System.out.println(token);
}
Scanner uses whitespace as its default delimiter. You can change this to a different regex pattern, for example:
(?:\\s+)|(?<=[()])|(?=[()])
This pattern matches either one or more whitespace characters, or the empty position right after or right before a parenthesis. Because the lookarounds are zero-width, the left and right brackets are not consumed, so they are still returned as tokens (as I think you want to include those in your parsing?), while the whitespace is discarded.
Here is an example of using this:
String test = "123(3 4)56(7)";
Scanner scanner = new Scanner(test);
scanner.useDelimiter("(?:\\s+)|(?<=[()])|(?=[()])");
while (scanner.hasNext()) {
    System.out.println(scanner.next());
}
Output:
123
(
3
4
)
56
(
7
)
Detailed regex explanation:
(?:\\s+)|(?<=[()])|(?=[()])
1st alternative: (?:\\s+)
(?:\\s+) Non-capturing group
\\s+ matches any whitespace character (in Java: [ \t\n\x0B\f\r])
Quantifier: + between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd alternative: (?<=[()])
(?<=[()]) Positive lookbehind: asserts that the character immediately before the current position matches the class below
[()] matches a single character from the list: ( or ), literally
3rd alternative: (?=[()])
(?=[()]) Positive lookahead: asserts that the character immediately after the current position matches the class below
[()] matches a single character from the list: ( or ), literally
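Since the question also mentions decimals, here is a minimal, self-contained sketch that reuses the delimiter above and then tells numbers from parentheses. The class name ParenTokenizer and the tokenize helper are hypothetical. Parsing each number with Double.parseDouble, rather than Scanner.nextDouble (which is locale-sensitive), is my own choice and not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ParenTokenizer {
    // Delimiter from the answer above: a run of whitespace, or the empty
    // position right after or right before a parenthesis.
    private static final String DELIMITER = "(?:\\s+)|(?<=[()])|(?=[()])";

    // Hypothetical helper: returns every number and parenthesis token in order.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(expression)) {
            scanner.useDelimiter(DELIMITER);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("1 (2.5 3) 4")) {
            if (token.equals("(") || token.equals(")")) {
                System.out.println("paren:  " + token);
            } else {
                // Double.parseDouble always uses '.' as the decimal separator.
                System.out.println("number: " + Double.parseDouble(token));
            }
        }
    }
}
```

Keeping the parentheses as separate tokens means a later parser only has to compare each token against "(" and ")" before treating it as a number.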
Scanner's .next() method uses whitespace as its delimiter. Luckily, we can change the delimiter!
For example, if you need the scanner to treat both whitespace and parentheses as delimiters, you could run this code immediately after constructing your Scanner (the parentheses are then discarded rather than returned as tokens):
scanner.useDelimiter("[\\s()]+");
I am trying to mask a CC number so that the third character and the last three characters are unmasked.
For example, 7108898787654351 should become **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to leave only the 3rd character unmasked at the start?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Details:
(?<!^..): Negative lookbehind to assert that the current position is not preceded by the start of the string plus exactly 2 characters (this excludes the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
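A small check of this regex, with hypothetical class and method names. Because `.` matches any character, the pattern also masks separators such as dashes in a formatted number, not just digits.

```java
class LookaroundMask {
    // (?<!^..) skips the 3rd character; (?=.{3}) skips the last three characters.
    static String mask(String cc) {
        return cc.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*************351
    }
}
```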
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
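If the length is not always 16, the same array-based idea can be generalized by deriving the indices from cc.length(). This is a sketch: the ArrayMask class and mask method are hypothetical names, and the behaviour for very short inputs (fewer than four characters) is my own assumption.

```java
import java.util.Arrays;

class ArrayMask {
    // Hypothetical generalization of the fixed-length version: keeps index 2
    // and the last three characters, and masks everything else.
    static String mask(String cc) {
        char[] m = new char[cc.length()];
        Arrays.fill(m, '*');
        if (cc.length() > 2) {
            m[2] = cc.charAt(2);
        }
        // Copy back the last three characters (or fewer for very short input).
        for (int i = Math.max(0, cc.length() - 3); i < cc.length(); i++) {
            m[i] = cc.charAt(i);
        }
        return new String(m);
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```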
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
It may seem a bit unwieldy, but since a credit card number should have 16 digits, I opted to use negative lookaheads that count the digits remaining up to the end of the string, where each digit may be preceded by any number of non-digits:
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This masks every digit except the third and the last three, regardless of where delimiters appear in the formatted CC number.
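One way to check the claim about delimiters is the sketch below. The class and method names are hypothetical, and like the regex itself it assumes exactly 16 digits in total. Only digits are replaced, so the dashes or spaces survive.

```java
class DigitLookaheadMask {
    // A digit is left alone when exactly 14 digits (the 3rd digit) or 1 to 3
    // digits (the last three) remain from it to the end, ignoring non-digits.
    private static final String REGEX = "(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d";

    static String mask(String cc) {
        return cc.replaceAll(REGEX, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
        System.out.println(mask("7108 8987 8765 4351")); // **0* **** **** *351
    }
}
```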
Assuming any dashes come after the first 3 digits, leave the 3rd digit unmatched and make sure there are always 3 digits (with optional hyphens) at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Example code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String[] strings = { "7108898787654351", "7108-8987-8765-4351" };
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you need a regex for this. You could use a StringBuilder to build the required string (note that this masks every other character, including the dashes):
String str = "7108-8987-8765-4351";
// Start with an all-asterisk string of the same length (String.repeat needs Java 11+)
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
    // Copy back the 3rd character and the last three characters
    if (i == 2 || i >= str.length() - 3) {
        sb.setCharAt(i, str.charAt(i));
    }
}
System.out.print(sb.toString()); // output: **0*************351
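If the dashes should stay visible, as they do with the digit-based regexes, the same loop idea can count only digits. This is a hypothetical extension (the DigitMask name is made up), not part of the answer above. It still keeps the 3rd digit and the last three digits.

```java
class DigitMask {
    // Masks digits only; separators such as '-' or ' ' are copied unchanged.
    static String mask(String cc) {
        int totalDigits = 0;
        for (int i = 0; i < cc.length(); i++) {
            if (Character.isDigit(cc.charAt(i))) totalDigits++;
        }
        StringBuilder sb = new StringBuilder(cc.length());
        int digitIndex = 0; // 0-based position counting digits only
        for (int i = 0; i < cc.length(); i++) {
            char c = cc.charAt(i);
            if (!Character.isDigit(c)) {
                sb.append(c); // keep separators as-is
                continue;
            }
            boolean keep = digitIndex == 2 || digitIndex >= totalDigits - 3;
            sb.append(keep ? c : '*');
            digitIndex++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```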
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equivalent to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if the current location is preceded by the start of the string and exactly two chars other than line break chars (that is, if the current char is the 3rd one)
. - any char but a line break char
(?=.*...) - there must be at least three chars other than line break chars to the right of the current location.
If the CC number always has 16 digits, as it does in the example and as most Visa and MasterCard numbers do, every match of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
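A quick sanity check of that pattern, with a hypothetical helper name. It relies on the 16 digits being contiguous. On a dash-formatted number the 3rd digit is no longer followed by exactly 13 digits, so it gets masked too.

```java
class SixteenDigitMask {
    // Leave a digit alone if 0-2 digits follow it up to the end (the last
    // three), or if exactly 13 digits follow it (the 3rd of 16 digits).
    static String mask(String cc) {
        return cc.replaceAll("\\d(?!\\d{0,2}$|\\d{13}$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // ****-****-****-*351
    }
}
```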
I have a string
String 1 (excluding the quotes) -> "my car number is #8746253 which is actually cool"
Conditions:
- The number 8746253 could be of any length, and
- the number can also be immediately followed by an end-of-line.
I want to capture 8746253 in a group, but only when it is not followed by a dot (".").
I have tried,
.*#(\d+)[^.].*
This gets me the number, but it also matches when there is a dot, because \d+ backtracks and gives up the last digit of the number, which [^.] then matches (for example, 3 in the case below).
String 2 (excluding the quotes) -> "earth is #8746253.Kms away, which is very far"
I want to match only the string 1 type and not the string 2 types.
To match any number of digits after # that are not followed with a dot, use
(?<=#)\d++(?!\.)
The ++ is a possessive quantifier: the regex engine checks the lookahead (?!\.) only after the last matched digit and won't backtrack into the digits if there is a dot after them. So the whole match fails if there is a dot after the last digit of a digit chunk.
To match the whole line and put the digits into capture group #1:
.*#(\d++)(?!\.).*
Or, a version without a lookahead:
^.*#(\d++)(?:[^.\r\n].*)?$
In this last version, the digit chunk can only be followed by an optional sequence consisting of one char that is not ., CR or LF, followed by zero or more chars other than line break chars ((?:[^.\r\n].*)?), and then the end of the string ($).
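The effect of the possessive quantifier is easiest to see by running the lookahead pattern with and without it. This is a minimal sketch; the class name, the firstMatch helper and the third test string are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashNumber {
    // Possessive: once the digits are matched they are never given back.
    static final String POSSESSIVE = "(?<=#)\\d++(?!\\.)";
    // Plain greedy version, for comparison: it can backtrack one digit.
    static final String GREEDY = "(?<=#)\\d+(?!\\.)";

    // Hypothetical helper: first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s1 = "my car number is #8746253 which is actually cool";
        String s2 = "earth is #8746253.Kms away, which is very far";
        String s3 = "the number is at the end #8746253";
        System.out.println(firstMatch(POSSESSIVE, s1)); // 8746253
        System.out.println(firstMatch(POSSESSIVE, s2)); // null
        System.out.println(firstMatch(POSSESSIVE, s3)); // 8746253
        System.out.println(firstMatch(GREEDY, s2));     // 874625 (the bug ++ avoids)
    }
}
```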
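This works as you have described (a lookahead is used instead of [^\\.] so that a number at the very end of the line also matches):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyRegex {
    public static void main(String[] args) {
        // (?!\\.) only checks that no dot follows; it also succeeds at end of input
        Pattern pattern = Pattern.compile("#(\\d++)(?!\\.)");
        Matcher matcher1 = pattern.matcher("my car number is #8746253 which is actually cool");
        if (matcher1.find()) {
            System.out.println(matcher1.group(1));
        }
        Matcher matcher2 = pattern.matcher("earth is #8746253.Kms away, which is very far");
        if (matcher2.find()) {
            System.out.println(matcher2.group(1));
        } else {
            System.out.println("No match found");
        }
    }
}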
Outputs:
> 8746253
> No match found
So if I have 22332, I want to replace that with BEA, as on a mobile keypad. I want to see how many times a digit appears in a row so that I can count A--2, B--22, C--222, D--3, E--33, F--333, etc. (and a 0 is a pause). I want to write a decoder that takes in a digit string and replaces the digit runs with letters. Example: 44335557075557777 will be decoded as HELP PLS.
This is the key portion of the code:
public void printMessages() throws Exception {
    File msgFile = new File("messages.txt");
    Scanner input = new Scanner(msgFile);
    while (input.hasNext()) {
        String x = input.next();     // first token on the line
        String y = input.nextLine(); // rest of that same line
        System.out.println(x + ":" + y);
    }
}
It reads the input from a file as a digit String, and then the Scanner prints the digits. I tried to split the digit string, but then I don't know how to detect runs of repeated digits like the ones described above.
for(String x : b.split(""))
System.out.print(x);
gives: 44335557075557777(input from the file).
I don't know how to pick out each run of repeated digits and map it to a letter as on a mobile keypad. If I use a for loop, I have to cycle through the whole string and use lots of if statements. There must be some other way.
Another suggestion: use a regex to break up the encoded string.
Using lookaround plus a back-reference makes it easy to split the string at the positions where the preceding and following characters differ.
e.g.
String line = "44335557075557777";
String[] tokens = line.split("(?<=(.))(?!\\1)");
// tokens will contain ["44", "33", "555", "7", "0", "7", "555", "7777"]
Then it should be trivial for you to map each string to its corresponding character, either with a Map or even naively with a bunch of if-elses.
Edit: Some background on the regex
(?<=(.))(?!\1)
(?<= )  : Lookbehind group, which means finding something (a zero-length pattern in this example) preceded by this group's pattern
( )     : Capture group #1
.       : Any char
(empty) : The zero-length pattern between the lookbehind and lookahead groups
(?! )   : Negative lookahead group, which means finding a pattern (zero-length in this example) NOT followed by this group's pattern
\1      : Back-reference to whatever capture group #1 matched
So it finds every zero-length position where the characters before and after that position differ, and splits the string at those positions.
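Putting the split together with a lookup table gives a small decoder. This is a sketch under stated assumptions: the KeypadDecoder name is hypothetical, the letters follow the classic phone keypad, key 1 is assumed to have no letters, and key 0 is treated as a space because that is what the question's HELP PLS example produces.

```java
class KeypadDecoder {
    // Assumed classic phone keypad; the array index is the digit.
    // Key 1 has no letters here, and key 0 is treated as a space.
    private static final String[] KEYS = {
        " ", "", "ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ"
    };

    static String decode(String digits) {
        if (digits.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        // Split wherever two adjacent characters differ, e.g. "4433" -> ["44", "33"].
        for (String run : digits.split("(?<=(.))(?!\\1)")) {
            char c = run.charAt(0);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit: " + c);
            }
            String letters = KEYS[c - '0'];
            if (letters.isEmpty()) {
                continue; // no letters on this key in the assumed layout
            }
            // Pressing a key n times selects its n-th letter, wrapping around.
            out.append(letters.charAt((run.length() - 1) % letters.length()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("22332"));             // BEA
        System.out.println(decode("44335557075557777")); // HELP PLS
    }
}
```

An array indexed by digit is used here instead of a Map only because the keys are the contiguous digits 0 to 9. A Map<Character, String> would work just as well.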
sc = new Scanner(new File(dataFile));
sc.useDelimiter(",|\r\n");
I don't understand how the delimiter works. Can someone explain this in layman's terms?
The scanner can also use delimiters other than whitespace.
A simple example from the Scanner API documentation:
String input = "1 fish 2 fish red fish blue fish";
// \\s* means 0 or more repetitions of any whitespace character
// fish is the literal word that, together with surrounding whitespace, acts as the delimiter
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt()); // prints: 1
System.out.println(s.nextInt()); // prints: 2
System.out.println(s.next()); // prints: red
System.out.println(s.next()); // prints: blue
// don't forget to close the scanner!!
s.close();
The point is to understand the regular expression (regex) passed to Scanner::useDelimiter. The most common regex constructs are summarized below.
Notes
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any word character (letter, digit, or underscore)
\W Any non-word character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional (zero or one repetition)
\s Any Whitespace
\S Any Non-whitespace character
^…$ Start and end of the input
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
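Applied to the delimiter from the question, `,|\r\n` means "a token ends at a comma, or at a Windows-style line break (CR followed by LF)". Below is a minimal sketch that reads from a String instead of a file. The class name, the tokens helper and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommaOrNewline {
    // Hypothetical helper: collects every token produced by the given delimiter.
    static List<String> tokens(String data, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(data)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "\r\n" here is a real CR LF pair in the Java string; the regex then
        // matches those two characters literally.
        System.out.println(tokens("Anna,Female,18\r\nBob,Male,42", ",|\r\n"));
        // [Anna, Female, 18, Bob, Male, 42]
    }
}
```

With Unix line endings ("\n" only) that delimiter never matches the line break, so "18\nBob" would come back as a single token. On Java 8 and later, the delimiter ",|\\R" accepts any line break sequence.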
With Scanner the default delimiters are the whitespace characters.
But Scanner can also decide where a token starts and ends based on a set of delimiters, which can be specified in two ways:
Using the Scanner method useDelimiter(String pattern), where the String is compiled as a regular expression
Using the Scanner method useDelimiter(Pattern pattern), where Pattern is a precompiled regular expression that specifies the delimiter set.
So the useDelimiter() methods are used to tokenize the Scanner input, much like the StringTokenizer class (except that StringTokenizer takes a set of delimiter characters rather than a regex). Take a look at these tutorials for further information:
Setting Delimiters for Scanner
Java.util.Scanner.useDelimiter() Method
And here is an example:
public static void main(String[] args) {
    // Initialize Scanner object
    Scanner scan = new Scanner("Anna Mills/Female/18");
    // Initialize the string delimiter
    scan.useDelimiter("/");
    // Printing the tokenized Strings
    while (scan.hasNext()) {
        System.out.println(scan.next());
    }
    // Closing the scanner stream
    scan.close();
}
Prints this output:
Anna Mills
Female
18
For example:
String myInput = null;
Scanner myscan = new Scanner(System.in).useDelimiter("\\n");
System.out.println("Enter your input: ");
myInput = myscan.next();
System.out.println(myInput);
This will let you use Enter as a delimiter.
Thus, if you input:
Hello world (ENTER)
it will print 'Hello world'. (With the default whitespace delimiter, next() would return only 'Hello'.)
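To see the difference without typing at a console, you can try the same delimiter on a String source. The input text and the helper name below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EnterDelimiter {
    // Hypothetical helper: collects tokens; a null delimiter keeps the default.
    static List<String> tokens(String input, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            if (delimiter != null) {
                sc.useDelimiter(delimiter);
            }
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String typed = "Hello world\nSecond line\n";
        System.out.println(tokens(typed, null));  // [Hello, world, Second, line]
        System.out.println(tokens(typed, "\\n")); // [Hello world, Second line]
    }
}
```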
I'm asking the user for input through the Scanner in Java, and now I want to parse out their selections using a regular expression. In essence, I show them an enumerated list of items, and they type in the numbers for the items they want to select, separated by a space. Here is an example:
1 yorkshire terrier
2 staffordshire terrier
3 goldfish
4 basset hound
5 hippopotamus
Type the numbers that correspond to the words you wish to exclude: 3 5
The enumerated list of items can be just a few elements or several hundred. The current regex I'm using looks like this: ^|\\.\\s+)\\d+\\s+, but I know it's wrong. I don't fully understand regular expressions yet, so if you can explain what it is doing, that would be helpful too!
Pattern pattern = Pattern.compile("^([0-9]*\\s+)*[0-9]*$");
Explanation of the RegEx:
^ : beginning of input
[0-9] : only digits
'*' : zero or more digits
\s : a whitespace character
'+' : at least one space
'()*' : any number of this digit space combination
$: end of input
This treats all of the following inputs as valid (because every part of the pattern is optional, an empty input is accepted as well):
"1"
"123 22"
"123 23"
"123456 33 333 3333 "
"12321 44 452 23 "
etc.
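A quick way to try the pattern is with Matcher.matches(), which requires the whole input to match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SelectionValidator {
    // The pattern from this answer, written as a Java string literal.
    private static final Pattern SELECTION = Pattern.compile("^([0-9]*\\s+)*[0-9]*$");

    static boolean isValid(String input) {
        return SELECTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("3 5"));  // true
        System.out.println(isValid("3, 5")); // false: ',' is neither a digit nor whitespace
        System.out.println(isValid(""));     // true: every part of the pattern is optional
    }
}
```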
You want integers:
\d+
followed by any number of repetitions of a single space and another integer:
\d+( \d+)*
Note that if you want to use a regex in a Java string you need to escape every \ as \\.
To "parse out" the integers, you don't necessarily want to match the input, but rather you want to split it on spaces (which uses regex):
String[] nums = input.trim().split("\\s+");
If you actually want int values:
List<Integer> selections = new ArrayList<>();
for (String num : input.trim().split("\\s+")) {
    selections.add(Integer.parseInt(num));
}
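The validating regex and the split can also be combined, so that bad input is rejected before any parsing happens. Below is a minimal sketch with a hypothetical SelectionParser.parse helper. Unlike `\d+( \d+)*`, it uses `\s+` between numbers so that repeated spaces are tolerated, and it throws IllegalArgumentException on invalid input, which is my own choice of error handling.

```java
import java.util.ArrayList;
import java.util.List;

class SelectionParser {
    // Hypothetical helper: checks that the line holds only whitespace-separated
    // integers, then returns them as ints.
    static List<Integer> parse(String input) {
        String trimmed = input.trim();
        if (!trimmed.matches("\\d+(\\s+\\d+)*")) {
            throw new IllegalArgumentException("Expected numbers separated by spaces: " + input);
        }
        List<Integer> selections = new ArrayList<>();
        for (String num : trimmed.split("\\s+")) {
            selections.add(Integer.parseInt(num));
        }
        return selections;
    }

    public static void main(String[] args) {
        System.out.println(parse(" 3  5 ")); // [3, 5]
    }
}
```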
If you want to ensure that your string contains only numbers and spaces (with a variable number of spaces and trailing/leading spaces allowed) and extract number at the same time, you can use the \G anchor to find consecutive matches.
String source = "1 3 5 8";
List<String> result = new ArrayList<>();
Pattern p = Pattern.compile("\\G *(\\d++) *(?=[\\d ]*$)");
Matcher m = p.matcher(source);
while (m.find()) {
    result.add(m.group(1));
}
for (String r : result) {
    System.out.println(r);
}
Note: \G matches at the end of the previous match; on the first call to find(), it matches at the start of the string.
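It is worth checking what happens on invalid input. Every match must also satisfy the lookahead, so a stray non-digit anywhere makes the first match fail, and \G then prevents any later match. The sketch below uses a hypothetical extract helper that wraps the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorExtractor {
    // \G chains each match to the end of the previous one; the lookahead
    // requires that only digits and spaces remain until the end of the input.
    private static final Pattern P = Pattern.compile("\\G *(\\d++) *(?=[\\d ]*$)");

    static List<String> extract(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("1 3 5 8")); // [1, 3, 5, 8]
        System.out.println(extract(" 12  7 ")); // [12, 7]
        System.out.println(extract("1 a 3"));   // []
    }
}
```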