Regular expression to find money value - java

I have response text:
Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:
32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat\",15
I want to get the money values that appear before "VND": -2.134, 32.666 and 0.
I have this regex:
String regex = "(?<![^=])([\\d]*)(?!$[VND])";
but it does not work.
Please help me!

You could use a positive lookahead (?=...) together with a word boundary \b after VND:
-?\d+(?:\.\d+)?(?=VND\b)
Regex demo
That would match
-? Optional minus sign (to also allow a plus, you could use an optional character class [+-]?)
\d+ Match one or more digits
(?:\.\d+)? An optional non-capturing group matching a dot and one or more digits
(?=VND\b) Positive lookahead that asserts that what is on the right is VND followed by a word boundary
In Java:
-?\\d+(?:\\.\\d+)?(?=VND\\b)
Demo Java
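A minimal, self-contained sketch of the lookahead pattern above, wrapped in a helper method. The class name VndAmounts and the method extract are hypothetical. The amounts are returned as matched text (for example "-2.134"), and because of the \b inside the lookahead, an input such as "100VNDX" produces no match at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VndAmounts {
    // Optional minus, digits, optional ".digits", then "VND" followed by a word boundary
    private static final Pattern AMOUNT = Pattern.compile("-?\\d+(?:\\.\\d+)?(?=VND\\b)");

    // Collects every amount that sits immediately before "VND"
    static List<String> extract(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            amounts.add(m.group()); // the lookahead does not consume "VND"
        }
        return amounts;
    }

    public static void main(String[] args) {
        String text = "Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:\n"
                + "32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat";
        System.out.println(extract(text)); // [-2.134, 32.666, 0]
    }
}
```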

You may use this regex with a lookahead:
[+-]?\d+\.?\d*(?=VND)
RegEx Demo
RegEx Details:
[+-]?: Match optional + or -
\d+\.?\d*: Match a floating point number or integer number
(?=VND): Assert that we have VND at next position
Java Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "[+-]?\\d+\\.?\\d*(?=VND)";
final String string = "Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:\n"
        + "32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat\\\",15";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(0)); // prints -2.134, 32.666 and 0
}

Solution:
String regex = "(-?[0-9]+[.]?[0-9]*)VND";
Description:
You should create a capture group right before every VND string (the amount is then in group 1).
Inside the group, first check whether a minus sign is present, so we have: -?
Then we need to capture the digits, and we expect one or more of them: [0-9]+
There might be a dot (zero or one) in case of a decimal, so we have: [.]?
Again, there might be another series of digits (zero or more) after the decimal point: [0-9]*
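A short sketch of the capture-group approach, with the dot made optional ([.]?) as the description intends. Since "VND" is consumed by this pattern, the amount has to be read from group(1) rather than the whole match. The class VndGroup and its extract method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VndGroup {
    // Group 1 holds the amount; the literal "VND" is part of the match
    private static final Pattern AMOUNT = Pattern.compile("(-?[0-9]+[.]?[0-9]*)VND");

    static List<String> extract(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            amounts.add(m.group(1)); // group(0) would be e.g. "-2.134VND"
        }
        return amounts;
    }

    public static void main(String[] args) {
        System.out.println(extract("-2.134VND and 32.666VND and 0VND")); // [-2.134, 32.666, 0]
    }
}
```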

If I understand the requirement, then the regex, used in a repeated fashion, might be:
(-?)[\d][\d.]*(?=VND)
The idea being that you need at least one digit, followed by more digits or a decimal, then followed by VND.
A slightly improved approach would be to split the [.] to be between the digits, so:
((-?)[\d]+[.]?[\d]*)(?=VND)
Online Example
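To illustrate why moving [.] between the digit runs is an improvement, here is a small comparison on a made-up malformed input "1..2VND". The first pattern accepts any mix of digits and dots, while the improved one allows at most one dot between digits. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotPlacement {
    // First version: one digit, then any mix of digits and dots
    static final String LOOSE = "(-?)[\\d][\\d.]*(?=VND)";
    // Improved version: digits, at most one dot, then optional digits
    static final String STRICT = "((-?)[\\d]+[.]?[\\d]*)(?=VND)";

    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(LOOSE, "1..2VND"));  // [1..2]
        System.out.println(findAll(STRICT, "1..2VND")); // [2]
    }
}
```

On well-formed input such as "-2.134VND 0VND" both patterns return the same amounts.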

Try this simple regex
In Java
String regex = "(-?\\d*?\\.?\\d+)(?=VND)";
regex
(-?\d*?\.?\d+)(?=VND)
see regex sample
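A quick sketch of this pattern in use. Because both \d*? and \.? are optional, the amount may also start with a dot, so an input like the made-up "Fee .5VND" yields ".5". The class LazyAmount is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyAmount {
    // Optional minus, optional digits, optional dot, then at least one digit before VND
    private static final Pattern AMOUNT = Pattern.compile("(-?\\d*?\\.?\\d+)(?=VND)");

    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("-2.134VND, 32.666VND, 0VND")); // [-2.134, 32.666, 0]
        System.out.println(extract("Fee .5VND"));                  // [.5]
    }
}
```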

Related

Masking credit card number using regex

I am trying to mask the CC number, in a way that third character and last three characters are unmasked.
For example, 7108898787654351 to **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it leaves the first three unmasked as well.
Can you throw some pointers on how to unmask 3rd character alone?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have 2 characters after the start behind us (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
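A side-by-side sketch of the question's attempt and this answer on the sample number, showing why the negative lookbehind fixes the first three characters. The class MaskThird and its two methods are hypothetical names.

```java
public class MaskThird {
    // Question's attempt: masks chars with at least 3 chars before and 3 after them
    static String attempt(String s) {
        return s.replaceAll("(?<=.{3}).(?=.*...)", "*");
    }

    // Answer: masks every char except the 3rd one, as long as 3 chars follow it
    static String answer(String s) {
        return s.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        String cc = "7108898787654351";
        System.out.println(attempt(cc)); // 710**********351
        System.out.println(answer(cc));  // **0**********351
    }
}
```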
I would suggest that regex isn't the only way to do this.
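import java.util.Arrays;

String cc = "7108898787654351";
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);   // keep the 3rd character
m[13] = cc.charAt(13); // keep the last three characters
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);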
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
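The same char[] idea can be sketched without hard-coding 16, by sizing the array from the input and keeping index 2 plus the last three positions. This assumes the input has at least 3 characters; the class MaskArray is a hypothetical name.

```java
import java.util.Arrays;

public class MaskArray {
    // Keeps the 3rd character and the last three; assumes cc.length() >= 3
    static String mask(String cc) {
        char[] m = new char[cc.length()];
        Arrays.fill(m, '*');
        m[2] = cc.charAt(2);
        for (int i = cc.length() - 3; i < cc.length(); i++) {
            m[i] = cc.charAt(i);
        }
        return new String(m);
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```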
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy, but since a credit card number should have 16 digits, I opted to use negative lookaheads that count the remaining digits (each preceded by any number of non-digits) up to the end of the string.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End of string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End of string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
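A minimal sketch of this pattern applied with replaceAll, on the sample number with and without dashes. Note the {14} count assumes a 16-digit number. The class MaskDigits is a hypothetical name.

```java
public class MaskDigits {
    // Masks every digit except the one with exactly 14 digits left (the 3rd)
    // and the ones with 1 to 3 digits left (the last three); separators are never matched
    static String mask(String cc) {
        return cc.replaceAll("(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```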
Regardless of where the dashes are after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String[] strings = { "7108898787654351", "7108-8987-8765-4351" };
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you need a regex to do what you want. You could use a StringBuilder to build the required string:
String str = "7108-8987-8765-4351";
// String.repeat requires Java 11+
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
    // keep the 3rd character and the last three characters
    if (i == 2 || i >= str.length() - 3) {
        sb.setCharAt(i, str.charAt(i));
    }
}
System.out.print(sb.toString()); // output: **0*************351 (the dashes are masked too)
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if there are any two chars other than line break chars immediately to the left of the current location
. - any char but a line break char
(?=.*...) - there must be at least three chars other than line break chars somewhere to the right of the current location.
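A small sketch checking that the two patterns from this answer behave the same: the alternation inside the positive lookbehind and the shorter negative lookbehind. The second test input "abcdefg" is made up; the class and method names are hypothetical.

```java
public class LookbehindVariants {
    // Positive lookbehind with alternatives of different, bounded lengths
    static String withAlternation(String s) {
        return s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*");
    }

    // Equivalent negative lookbehind: anything except the 3rd character
    static String withNegativeLookbehind(String s) {
        return s.replaceAll("(?<!^..).(?=.*...)", "*");
    }

    public static void main(String[] args) {
        String s = "7108898787654351";
        System.out.println(withAlternation(s));        // **0**********351
        System.out.println(withNegativeLookbehind(s)); // **0**********351
    }
}
```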
If the CC number always has 16 digits, as it does in the example and as Visa and MasterCard numbers do, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
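A minimal sketch of replacing those matches in Java, assuming a plain 16-digit string with no separators. A digit is kept when 0 to 2 digits follow it (one of the last three) or exactly 13 follow it (the 3rd digit). The class Mask16 is a hypothetical name.

```java
public class Mask16 {
    // Assumes exactly 16 digits and no separators
    static String mask(String cc) {
        return cc.replaceAll("\\d(?!\\d{0,2}$|\\d{13}$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```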

extract specific word from comma separated string in java using regex

input - [1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]
I want to extract BMW here, and I am using this regex: (^(?:[^\\,]*\\,){3})
This results in: BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558].
Could anyone help me with this? Thanks in advance.
Since you can only supply a pattern (without using capture groups), you could use a finite repetition such as {0,1000} in the positive lookbehind, because Java does not support unbounded repetition inside a lookbehind.
(?<=^\\[[^,]{0,1000},[^,]{0,1000},[^,]{0,1000},\\h{0,10})\\w{3,10}(?=[^\\]\\[]*\\])
Explanation
(?<= Positive lookbehind, assert what is on the left is
^\[ Start of string, match [
[^,]{0,1000},[^,]{0,1000},[^,]{0,1000}, Match 3 times any char except , followed by the ,
\h{0,10} Match 0-10 times a horizontal whitespace char
) Close lookbehind
\w{3,10} Match 3-10 word chars
(?= Positive lookahead, assert what is on the right is
[^\]\[]*\] Match until the ]
) Close lookahead
Java demo
Code example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "(?<=^\\[[^,]{0,1000},[^,]{0,1000},[^,]{0,1000},\\h{0,10})\\w{3,10}(?=[^\\]\\[]*\\])";
final String string = "[1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
Output
BMW
If it is a comma-separated string, you could just use the split function of the String class, which converts the comma-separated string to an array.
The string split() method breaks a given string around matches of the given regular expression.
Syntax: public String[] split(String regex, int limit)
Input String: 016-78967
Regular Expression: -
Output: {"016", "78967"}
Then you can look through the array to find the particular keyword.
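A minimal sketch of the split approach on the question's input, assuming the string always starts with "[" and ends with "]" and that fields are separated by a comma plus optional spaces. The class ThirdField and the method field are hypothetical names.

```java
public class ThirdField {
    // Strips the surrounding brackets, splits on commas, returns the field at a zero-based index
    static String field(String input, int index) {
        String inner = input.substring(1, input.length() - 1); // assumes "[...]"
        String[] parts = inner.split(",\\s*");
        return parts[index];
    }

    public static void main(String[] args) {
        String input = "[1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]";
        System.out.println(field(input, 3)); // BMW
    }
}
```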

Masking using regular expressions for below format

I am trying to write a regular expression to mask the below string. Example below.
Input
A1../D//FASDFAS--DFASD//.F
Output (skip the first five and the last two alphanumeric characters)
A1../D//FA***********D//.F
I am trying using below regex
([A-Za-z0-9]{5})(.*)(.{2})
Any help would be highly appreciated.
You can solve your issue by using Pattern and Matcher with a regex that matches multiple groups:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("(.*?\\/\\/..)(.*?)(.\\/\\/.*)");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
    str = matcher.group(1)
            + matcher.group(2).replaceAll(".", "*")
            + matcher.group(3);
}
Detail
(.*?\\/\\/..) first group: everything up to and including the first //, plus the two characters after it
(.*?) second group: everything between group one and group three
(.\\/\\/.*) third group: the next character that is followed by //, and everything after it until the end of the string
Outputs
A1../D//FA***********D//.F
I think this solution is more readable.
If you want to do that with a single regex you may use
text = text.replaceAll("(\\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}).", "$1*");
Or, using the POSIX character class Alnum:
text = text.replaceAll("(\\G(?!^|(?:\\p{Alnum}\\P{Alnum}*){2}$)|^(?:\\P{Alnum}*\\p{Alnum}){5}).", "$1*");
See the Java demo and the regex demo. If you plan to replace any code point rather than a single code unit with an asterisk, replace . with \P{M}\p{M}*+ ("\\P{M}\\p{M}*+").
To make . match line break chars, add (?s) at the start of the pattern.
Details
(\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}) -
\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$) - a location after the successful match that is not followed with 2 occurrences of an alphanumeric char followed with 0 or more chars other than alphanumeric chars
| - or
^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5} - start of string, followed with five occurrences of 0 or more non-alphanumeric chars followed with an alphanumeric char
. - any code unit other than line break characters (if you use \P{M}\p{M}*+ - any code point).
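A short sketch wrapping the single-regex replacement in a method and running it on the question's input. Everything up to and including the 5th alphanumeric character, and everything from the second-to-last alphanumeric character on, is kept; the characters in between become "*". The class MaskAlnum is a hypothetical name.

```java
public class MaskAlnum {
    // Group 1 re-emits the kept prefix on the first match; later \G matches add only "*"
    static String mask(String text) {
        return text.replaceAll(
                "(\\G(?!^|(?:[0-9A-Za-z][^0-9A-Za-z]*){2}$)|^(?:[^0-9A-Za-z]*[0-9A-Za-z]){5}).",
                "$1*");
    }

    public static void main(String[] args) {
        System.out.println(mask("A1../D//FASDFAS--DFASD//.F")); // A1../D//FA***********D//.F
    }
}
```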
Usually, masking of characters in the middle of a string can be done using negative lookbehind (?<!) and positive lookahead groups (?=).
But in this case a lookbehind group can't be used, because it has no obvious maximum length: there is an unpredictable number of non-alphanumeric characters among the first five alphanumeric characters (. and / in A1../D//FA).
A substring method can be used as a workaround for the inability to use a negative lookbehind group:
String str = "A1../D//FASDFAS--DFASD//.F";
int start = str.replaceAll("^((?:\\W{0,}\\w{1}){5}).*", "$1").length();
String maskedStr = str.substring(0, start) +
str.substring(start).replaceAll(".(?=(?:\\W{0,}\\w{1}){2})", "*");
System.out.println(maskedStr);
// A1../D//FA***********D//.F
But the most straightforward way is to use java.util.regex.Pattern and java.util.regex.Matcher:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "A1../D//FASDFAS--DFASD//.F";
Pattern pattern = Pattern.compile("^((?:\\W{0,}\\w{1}){5})(.+)((?:\\W{0,}\\w{1}){2})$");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
    String maskedStr = matcher.group(1) +
            "*".repeat(matcher.group(2).length()) + // String.repeat requires Java 11+
            matcher.group(3);
    System.out.println(maskedStr);
    // A1../D//FA***********D//.F
}
\W{0,} - 0 or more non-alphanumeric characters (strictly, non-word characters: \w also matches _)
\w{1} - exactly 1 alphanumeric character
(\W{0,}\w{1}){5} - 5 alphanumeric characters with any number of non-alphanumeric characters in between
(?:\W{0,}\w{1}){5} - the same, but not captured as a group
^((?:\W{0,}\w{1}){5})(.+)((?:\W{0,}\w{1}){2})$ - substring with the first five alphanumeric characters (group 1), everything else (group 2), substring with the last 2 alphanumeric characters (group 3)

Regex to match a digit not followed by a dot(".")

I have a string
string 1(excluding the quotes) -> "my car number is #8746253 which is actually cool"
conditions - The number 8746253, could be of any length and
- the number can also be immediately followed by an end-of-line.
I want to capture 8746253 only when it is not followed by a dot "."
I have tried,
.*#(\d+)[^.].*
This will get me the number for sure, but it will also match when there is a dot, because \d+ can backtrack and let [^.] match the last digit of the number (for example, 3 in the case below)
string 2(excluding the quotes) -> "earth is #8746253.Kms away, which is very far"
I want to match only the string 1 type and not the string 2 types.
To match any number of digits after # that are not followed with a dot, use
(?<=#)\d++(?!\.)
The ++ is a possessive quantifier: the regex engine checks the lookahead (?!\.) only after the last matched digit and won't backtrack if there is a dot after it. So the whole match fails if there is a dot right after the last digit of a digit chunk.
See the regex demo
To match the whole line and put the digits into capture group #1:
.*#(\d++)(?!\.).*
See this regex demo. Or a version without a lookahead:
^.*#(\d++)(?:[^.\r\n].*)?$
See another demo. In this last version, the digit chunk can only be followed with an optional sequence of a char that is not a ., CR and LF followed with any 0+ chars other than line break chars ((?:[^.\r\n].*)?) and then the end of string ($).
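A small sketch contrasting the possessive quantifier with a plain greedy one on the question's strings. With \d+ the engine backtracks one digit, so the lookahead sees "3" instead of "." and a shortened number is returned. The class NumberNotBeforeDot and its helper first are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberNotBeforeDot {
    // Returns the first match of regex in text, or null when there is none
    static String first(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String ok = "my car number is #8746253 which is actually cool";
        String dot = "earth is #8746253.Kms away, which is very far";
        // Possessive: no backtracking into the digits
        System.out.println(first("(?<=#)\\d++(?!\\.)", ok));  // 8746253
        System.out.println(first("(?<=#)\\d++(?!\\.)", dot)); // null
        // Greedy: gives back the last digit so the lookahead succeeds
        System.out.println(first("(?<=#)\\d+(?!\\.)", dot));  // 874625
    }
}
```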
This works like you have described:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyRegex {
    public static void main(String[] args) {
        // the digits must be followed by a non-dot character or by the end of the input
        Pattern pattern = Pattern.compile("#(\\d++)(?:[^.]|$)");
        Matcher matcher1 = pattern.matcher("my car number is #8746253 which is actually cool");
        if (matcher1.find()) {
            System.out.println(matcher1.group(1));
        }
        Matcher matcher2 = pattern.matcher("earth is #8746253.Kms away, which is very far");
        if (matcher2.find()) {
            System.out.println(matcher2.group(1));
        } else {
            System.out.println("No match found");
        }
    }
}
Outputs:
> 8746253
> No match found

split on integer values but not floating point values

I have a java program where I need to split on integer values but not floating point values
i.e. "1/\\2" should produce: [1,/\\,2]
but "1.0/\\2.0" should produce: [1.0,/\\,2.0]
does anybody have any ideas?
or could anybody point me in the direction of how to split on the specific strings "\\/" and "/\\" ?
UPDATE: Sorry! One more case! For the string "100 /\ 3.4e+45" I need to split it into:
[100,/\,3.4,e,+,45]
My current regex is (kind of really ugly):
line.split("\\s+|(?<=[-+])|(?=[-+])|(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))|(?<=[-+()])|(?=[-+()])|(?<=e)|(?=e)");
and for the string "100 /\ 3.4e+45" it gives me:
[100,/\,3.4,+,45]
This regex should do it:
(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))
It's two checks, basically matching:
The position after a digit that is not followed by a digit, a decimal point, or the end of text.
The position before a digit that is not preceded by a digit, a decimal point, or the start of text.
It matches the zero-width position after/before the digit, so you can use this regex in split().
See regex101 for demo.
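A minimal sketch of this regex used with String.split on the question's two examples. Only the "1/\2" and "1.0/\2.0" cases are covered; the 3.4e+45 update is not handled by this pattern. The class SplitIntegers is a hypothetical name.

```java
import java.util.Arrays;

public class SplitIntegers {
    // Zero-width split points: after a digit not followed by a digit, dot or end,
    // or before a digit not preceded by a digit, dot or start
    static final String REGEX = "(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))";

    static String[] split(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("1/\\2")));     // [1, /\, 2]
        System.out.println(Arrays.toString(split("1.0/\\2.0"))); // [1.0, /\, 2.0]
    }
}
```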
Follow-up
could anybody point me in the direction of how to split on the specific strings "\/" and "/\"
If you want to split before a specific pattern, use a positive lookahead: (?=xxx). If you want to split after a specific pattern, use a positive lookbehind: (?<=xxx). To do either, separate by |:
(?<=xxx)|(?=xxx)
where xxx is the text \/ or /\, i.e. the regex \\/|/\\, and doubling for Java string literal:
"(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)"
See regex101 for demo.
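A quick sketch of that Java literal in a split call, on a made-up input "1\/2/\3" (written "1\\/2/\\3" in Java source). It splits both before and after each two-character token. The class SplitSlashes is a hypothetical name.

```java
import java.util.Arrays;

public class SplitSlashes {
    // Split right after or right before \/ or /\
    static final String REGEX = "(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)";

    static String[] split(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("1\\/2/\\3"))); // [1, \/, 2, /\, 3]
    }
}
```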
You could try something like this:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "\\d+(\\.\\d+)?", str = "1//2"; // the dot must be escaped
Matcher m = Pattern.compile(regex).matcher(str);
List<String> list = new ArrayList<>();
int index = 0;
for (; m.find(); index = m.end()) {
    if (index != m.start()) list.add(str.substring(index, m.start())); // text between numbers
    list.add(str.substring(m.start(), m.end()));                        // the number itself
}
if (index < str.length()) list.add(str.substring(index));             // trailing text, if any
The idea is to find numbers using a regex and a Matcher, and also add the strings in between.
