split on integer values but not floating point values - java

I have a Java program where I need to split on integer values but not on floating-point values,
i.e. "1/\\2" should produce: [1,/\\,2]
but "1.0/\\2.0" should produce: [1.0,/\\,2.0]
Does anybody have any ideas?
Or could anybody point me in the direction of how to split on the specific strings "\\/" and "/\\"?
UPDATE: sorry! one more case! for the string "100 /\ 3.4e+45" I need to split it into:
[100,/\,3.4,e,+,45]
my current regex is (kind of really ugly):
line.split("\\s+|(?<=[-+])|(?=[-+])|(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))|(?<=[-+()])|(?=[-+()])|(?<=e)|(?=e)");
and for the string: "100 /\ 3.4e+45" is giving me:
[100,/\,3.4,+,45]

This regex should do it:
(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))
It's two checks, basically matching:
A digit not followed by a digit, a decimal point, or the end of text.
A digit not preceded by a digit, a decimal point, or the start of text.
It will match the empty space after/before the digit, so you can use this regex in split().
See regex101 for demo.
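As a quick check, the pattern can be passed to String.split unchanged. This is only a sketch: the class name SplitOnIntegers and the helper splitAroundNumbers are made up for the example, and the inputs are the two strings from the question.

```java
import java.util.Arrays;

class SplitOnIntegers {
    // The answer's pattern; it contains no backslashes, so no extra escaping is needed.
    static final String BOUNDARY =
        "(?:(?<=[0-9])(?![0-9.]|$))|(?:(?<![0-9.]|^)(?=[0-9]))";

    // Hypothetical helper: splits at the zero-width positions around each number.
    static String[] splitAroundNumbers(String line) {
        return line.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAroundNumbers("1/\\2")));     // [1, /\, 2]
        System.out.println(Arrays.toString(splitAroundNumbers("1.0/\\2.0"))); // [1.0, /\, 2.0]
    }
}
```

Because every match is zero-width, nothing is consumed by the split and the operator between the numbers survives as its own element.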
Follow-up
could anybody point me in the direction of how to split on the specific strings "\/" and "/\"?
If you want to split before a specific pattern, use a positive lookahead: (?=xxx). If you want to split after a specific pattern, use a positive lookbehind: (?<=xxx). To do either, separate by |:
(?<=xxx)|(?=xxx)
where xxx is the text \/ or /\, i.e. the regex \\/|/\\, and doubling for Java string literal:
"(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)"
See regex101 for demo.
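A minimal sketch of that lookbehind/lookahead split, assuming the input only contains the two operators as separate, non-overlapping pairs. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitAroundOperators {
    // Lookbehind and lookahead around the two operators; each regex
    // backslash is doubled once more for the Java string literal.
    static final String AROUND_OPERATOR = "(?<=\\\\/|/\\\\)|(?=\\\\/|/\\\\)";

    static String[] splitAroundOperators(String line) {
        return line.split(AROUND_OPERATOR);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAroundOperators("1/\\2")));     // [1, /\, 2]
        System.out.println(Arrays.toString(splitAroundOperators("a\\/b/\\c"))); // [a, \/, b, /\, c]
    }
}
```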

You could try something like this:
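import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The dot is escaped so that it only matches a decimal point, not any character.
String regex = "\\d+(\\.\\d+)?", str = "1//2";
Matcher m = Pattern.compile(regex).matcher(str);
ArrayList<String> list = new ArrayList<String>();
int index = 0;
for(index = 0 ; m.find() ; index = m.end()) {
if(index != m.start()) list.add(str.substring(index, m.start())); // text before this number
list.add(str.substring(m.start(), m.end())); // the number itself
}
if(index != str.length()) list.add(str.substring(index)); // trailing text, if any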
The idea is to find number using regex and Matcher, and also add the strings in between.
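The same find-instead-of-split idea can be pushed further so that it also covers the "100 /\ 3.4e+45" case from the question's update. The token pattern below is an assumption about which tokens are wanted (numbers, the two operators, and the single characters - + ( ) e); it is a sketch, not the original answer's code, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // Assumed token set: a number with an optional fraction, the operators
    // /\ and \/, or one of the single characters - + ( ) e.
    static final Pattern TOKEN =
        Pattern.compile("\\d+(?:\\.\\d+)?|/\\\\|\\\\/|[-+()e]");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group()); // anything that is not a token (e.g. a space) is skipped
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("100 /\\ 3.4e+45")); // [100, /\, 3.4, e, +, 45]
        System.out.println(tokenize("1.0/\\2.0"));       // [1.0, /\, 2.0]
    }
}
```

Unlike the loop above, this version silently drops characters that match no token, which may or may not be what you want.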

Related

Masking credit card number using regex

I am trying to mask the CC number, in a way that third character and last three characters are unmasked.
For eg.. 7108898787654351 to **0**********351
I have tried (?<=.{3}).(?=.*...). It unmasked last three characters. But it unmasks first three also.
Can you throw some pointers on how to unmask 3rd character alone?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have exactly 2 characters between the start and the current position (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
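For reference, here is the same call wrapped in a small helper so it can be tried on other inputs. The class and method names are made up for this sketch.

```java
class MaskWithLookarounds {
    // Hypothetical helper around the replaceAll call shown above.
    static String mask(String cc) {
        return cc.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
        System.out.println(mask("12345"));            // **345
    }
}
```

Note that the pattern masks any character, not only digits, so separators such as dashes would be masked as well.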
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
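A short Java sketch of this pattern, assuming a 16-digit number with optional non-digit delimiters. The class and method names are hypothetical.

```java
class MaskFormattedNumber {
    // Matches every digit except the 14th from the end (the 3rd of 16)
    // and the last three, ignoring any non-digit delimiters in between.
    static final String DIGIT_TO_MASK =
        "(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d";

    static String mask(String cc) {
        return cc.replaceAll(DIGIT_TO_MASK, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```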
Regardless of where the dashes are after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you need a regex for this. You could use a StringBuilder to build the required string (String.repeat requires Java 11 or later):
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
if (i == 2 || i >= str.length() - 3) {
sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
}
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if there are exactly two chars other than line break chars between the start of the string and the current location
. - any char but a line break char
(?=.*...) - there must be at least three chars other than line break chars somewhere to the right of the current location.
If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)

Regular expression to find money value

I have response text:
Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:
32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat\",15
I want to get the money values that appear before "VND": -2.134, 32.666 and 0.
I have this regex:
String regex = "(?<![^=])([\\d]*)(?!$[VND])";
but it does not work.
Please help me!
You could use a positive lookahead (?=...) and a word boundary \b after VND.
-?\d+(?:\.\d+)?(?=VND\b)
Regex demo
That would match
-? Optional minus sign (to also allow a plus, you could use an optional character class [+-]?)
\d+ Match one or more digits
(?:\.\d+)? An optional non capturing group matching a dot and one or more digits
(?=VND\b) Positive lookahead that asserts what is on the right is VND
In Java:
-?\\d+(?:\\.\\d+)?(?=VND\\b)
Demo Java
You may use this regex with a lookahead:
[+-]?\d+\.?\d*(?=VND)
RegEx Demo
RegEx Details:
[+-]?: Match optional + or -
\d+\.?\d*: Match a floating point number or integer number
(?=VND): Assert that we have VND at next position
Java Code:
final String regex = "[+-]?\\d+\\.?\\d*(?=VND)";
final String string = "Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:\n"
+ "32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat\\\",15";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Solution:
String regex = "(-?[0-9]+[.]?[0-9]*)VND";
Description:
Create a capturing group in front of every VND string.
Inside the group, first check whether a minus sign is present, so we have: -?
Then we need to capture the digits, and we expect one or more, so: [0-9]+
There might be a dot (zero or one) in case of a decimal, so we have: [.]?
There might be another series of digits (zero or more) after the decimal point, so: [0-9]*
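Since VND is part of the match in this approach, the amount has to be read from group 1 rather than from the whole match. A minimal sketch, assuming the dot is optional ([.]?) as the description says; the class and method names are made up, and the text is the sample from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoneyValues {
    // Group 1 holds the amount; the trailing VND is matched but not captured.
    static final Pattern AMOUNT = Pattern.compile("(-?[0-9]+[.]?[0-9]*)VND");

    static List<String> amounts(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:\n"
                + "32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat";
        System.out.println(amounts(text)); // [-2.134, 32.666, 0]
    }
}
```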
If I understand the requirement, then the regex, used in a repeated fashion, might be:
(-?)[\\d][\\d.]*(?=VND)
The idea being that you need at least one digit, followed by more digits or a decimal, then followed by VND.
A slightly improved approach would be to split the [.] to be between the digits, so:
((-?)[\d]+[.]?[\d]*)(?=VND)
Online Example
Try this simple regex
In Java
String regex = "(-?\\d*?\\.?\\d+)(?=VND)";
regex
(-?\d*?\.?\d+)(?=VND)
see regex sample

What is the Regex for decimal numbers in Java?

I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
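To match a non-negative decimal number (one that contains a decimal point) you can use this regex. The group makes both anchors apply to both alternatives:
^(\d*\.\d+|\d+\.\d*)$
or in Java syntax: "^(\\d*\\.\\d+|\\d+\\.\\d*)$"
String regex = "^(\\d*\\.\\d+|\\d+\\.\\d*)$";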
String string = "123.43253";
if(string.matches(regex))
System.out.println("true");
else
System.out.println("false");
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with java escape it becomes :
"[0-9]*\\.?[0-9]*";
If you need to make the dot mandatory, remove the ? mark:
[0-9]*\.[0-9]*
But this will also accept just a dot without any digits. So, if you want the validation to require digits, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown because when using matcher.find(), your regex will find the first valid match within the string, and all of your examples would match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
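The difference between find() and matches() can be seen in a short sketch. The class and method names are hypothetical; the sample values are the ones listed in the question.

```java
import java.util.regex.Pattern;

class DecimalCheck {
    static final Pattern NON_NEGATIVE_DECIMAL =
        Pattern.compile("^([0-9]+\\.?[0-9]*|\\.[0-9]+)$");

    static boolean isNonNegativeDecimal(String s) {
        // matches() requires the whole input to match, unlike find().
        return NON_NEGATIVE_DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12.2", "3.7", "2.", "0.3", ".89", "19"}) {
            System.out.println(s + " -> " + isNonNegativeDecimal(s)); // true for each
        }
        System.out.println(isNonNegativeDecimal("5p4")); // false

        // The question's pattern can match a substring, so find() succeeds on "5p4".
        Pattern loose = Pattern.compile("[0-9]*[.]?[0-9]*");
        System.out.println(loose.matcher("5p4").find());    // true
        System.out.println(loose.matcher("5p4").matches()); // false
    }
}
```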
There are actually two ways to match a literal dot. One is backslash-escaping, as you do with \\., and the other is to enclose it in a character class (square brackets), like [.]. Most special characters become literal characters inside square brackets, including the dot. Using \\. shows your intention more clearly than [.] if all you want is to match a literal dot. Use [] when you need to match one of several things; for example, the regex [\\d.] matches a single digit or a literal dot.
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

Extracting numbers into a string array

I have a string which is of the form
String str = "124333 is the otp of candidate number 9912111242. "
+ "Please refer txn id 12323335465645 while referring blah blah.";
I need 124333, 9912111242 and 12323335465645 in a string array. I have tried this with
while (Character.isDigit(sms.charAt(i)))
I feel that running the above said method on every character is inefficient. Is there a way I can get a string array of all the numbers?
Use a regex (see Pattern and matcher):
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(<your string here>);
while (m.find()) {
//m.group() contains the digits you want
}
you can easily build ArrayList that contains each matched group you find.
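One way to do that, as a sketch with made-up class and method names, is to collect each match into a list and convert it to an array at the end:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String[] extractNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            found.add(m.group()); // each run of consecutive digits
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str = "124333 is the otp of candidate number 9912111242. "
                + "Please refer txn id 12323335465645 while referring blah blah.";
        System.out.println(String.join(", ", extractNumbers(str)));
        // 124333, 9912111242, 12323335465645
    }
}
```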
Or, as others suggested, you can split on non-digit characters (\D):
"blabla 123 blabla 345".split("\\D+")
Note that \ has to be escaped in Java, hence the need of \\.
You can use String.split():
String[] nbs = str.split("[^0-9]+");
This will split the String on any run of non-digit characters.
And this works perfectly for your input.
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
System.out.println(Arrays.toString(str.split("\\D+")));
Output:
[124333, 9912111242, 12323335465645]
\\D+ Matches one or more non-digit characters. Splitting the input according to one or more non-digit characters will give you the desired output.
Java 8 style:
long[] numbers = Pattern.compile("\\D+")
.splitAsStream(str)
.mapToLong(Long::parseLong)
.toArray();
Ah, if you only need a String array, then you can just use String.split as the other answers suggest.
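One caveat with the stream version: if the text starts with a non-digit, splitting yields an empty first token, and Long.parseLong("") throws a NumberFormatException. A sketch that filters empty tokens first (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class NumbersAsLongs {
    static long[] parseNumbers(String text) {
        return Pattern.compile("\\D+")
                .splitAsStream(text)
                .filter(token -> !token.isEmpty()) // the input may start with a non-digit
                .mapToLong(Long::parseLong)
                .toArray();
    }

    public static void main(String[] args) {
        long[] numbers = parseNumbers("txn id 12323335465645 for 9912111242.");
        System.out.println(Arrays.toString(numbers)); // [12323335465645, 9912111242]
    }
}
```

Digit runs too long to fit in a long would still fail to parse, so this only suits inputs with bounded number lengths.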
Alternatively, you can try this:
String str = "124333 is the otp of candidate number 9912111242. Please refer txn id 12323335465645 while referring blah blah.";
str = str.replaceAll("\\D+", ",");
System.out.println(Arrays.asList(str.split(",")));
\\D+ matches one or more non digits
Output
[124333, 9912111242, 12323335465645]
The first thing that came to my mind was filter and split, then I realized that it can be done via
String[] result =str.split("\\D+");
\D matches any non-digit character, + says that one or more of these are needed, and the leading \ escapes the other \, since "\D" on its own is an invalid escape sequence in a Java string literal.

split strings with uppercase

I have some strings that I want to split word by word. They are in different formats like:
THIS-IS-MY-STRING
ThisIsMyString
This_Is_My_String
This is my string
I use:
String[] x = str1.split("(?=[A-Z])|[_]|[-]|[ ]");
But there are some problems:
some elements in x array will be empty
for the first string I want “THIS” but the result of split is “T”, “H”, “I”, “S”
How should I change split to reach my purpose? Could you please help me?
You need to include look-behind as well, here you go:
String[] x = str1.split("([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))");
[-_ ] means - or _ or space.
(?<=[^-_ A-Z]) means the previous character isn't a -, _, space, or A-Z.
(?=[A-Z]) means the next character is A-Z.
Reference.
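Applied to the four sample strings from the question, the split can be checked with a small sketch (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class WordSplitter {
    // A separator character, or a zero-width position before an uppercase letter
    // that is not preceded by a separator or another uppercase letter.
    static final String SEPARATOR = "([-_ ]|(?<=[^-_ A-Z])(?=[A-Z]))";

    static String[] words(String s) {
        return s.split(SEPARATOR);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("THIS-IS-MY-STRING"))); // [THIS, IS, MY, STRING]
        System.out.println(Arrays.toString(words("ThisIsMyString")));    // [This, Is, My, String]
        System.out.println(Arrays.toString(words("This_Is_My_String"))); // [This, Is, My, String]
        System.out.println(Arrays.toString(words("This is my string"))); // [This, is, my, string]
    }
}
```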
EDIT:
Unfortunately there is no way (I know of) that you can use split to split _CITY_ABC while avoiding _CITY or an empty string.
You can however only process the first and last string if not empty, but this is not ideal.
For this I suggest Matcher:
String str1 = "_CityCITY_";
Pattern p = Pattern.compile("[A-Z][a-z]+(?=[A-Z]|$)|[A-Za-z]+(?=[-_ ]|$)");
Matcher m = p.matcher(str1);
while (m.find())
System.out.println(m.group());
Try Regex.Split(). Note that this is the .NET API; in Java the counterparts are String.split and Pattern.split. The first parameter is the string to split and the second is your regular expression. Hope this helps.
