regex match two sets of digits from line - java

Matching two sets of numbers from a line (2.66 and 34.3).
The numbers vary in length but are surrounded by whitespace, e.g.
Ox 2.66 abcda 34.3 abfdasd
I got 2.66 with \b(?:Ox)\s+(\d*\.*?\d+)
Any resources that can guide me in the right direction? I'm stuck on matching the second number separately.
cheers

You may continue the regex pattern and capture the second number after another word:
\bOx\s+(\d*\.?\d+)\s+\S+\s+(\d*\.?\d+)
See the regex demo. The second number will be in Group 2.
Details:
\b - a word boundary
Ox - a word Ox
\s+ - one or more whitespaces
(\d*\.?\d+) - Group 1: zero or more digits, an optional ., one or more digits
\s+ - one or more whitespaces
\S+ - one or more non-whitespaces
\s+ - one or more whitespaces
(\d*\.?\d+) - Group 2: zero or more digits, an optional ., one or more digits.
See a Java demo:
import java.util.*;
import java.util.regex.*;
class Test
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String s = "Ox 2.66 abcda 34.3 abfdasd";
        Pattern pattern = Pattern.compile("\\bOx\\s+(\\d*\\.?\\d+)\\s+\\S+\\s+(\\d*\\.?\\d+)");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()){
            System.out.println(matcher.group(1)); // => 2.66
            System.out.println(matcher.group(2)); // => 34.3
        }
    }
}

You could write the following regular expression: ([\d\D\s]*).
If you want numeric values only then ([\d\.]*).
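If the text between the two numbers is not always a single word, a different approach may help. The following is a minimal sketch, not taken from the answers above: it calls find() repeatedly and collects every number that stands alone between whitespace (or at the edges of the line). The class name NumberTokens and the method extract are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberTokens {
    // (?<!\S) and (?!\S) require whitespace or a string edge on each side of the number.
    private static final Pattern NUMBER = Pattern.compile("(?<!\\S)\\d*\\.?\\d+(?!\\S)");

    // Returns all whitespace-delimited numbers in the order they appear.
    static List<String> extract(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Ox 2.66 abcda 34.3 abfdasd")); // [2.66, 34.3]
    }
}
```

Unlike the pattern with two capturing groups, this version does not check for the leading Ox, so it fits only when every standalone number on the line is wanted.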

Related

Is there a way to find special subStrings in this case with regex?

I have a string from which numbers are extracted at the end of the String with regex.
String:
'0 DB'!$B$460
subString:
460
I solve this as follows:
String str = "'0 DB'!$B$460";
String sStr = str.replaceAll(".*?([0-9]+)$", "$1");
Old question Link:
Is there a way to find out how many numbers are at the end of a string without knowing the exact index?
Now I have a different kind of string from which I want to extract certain ranges.
String:
'0 DB'!$U$305:$AH$376
Here I would like to extract certain parts to the left and to the right of the colon.
In each case that is the part between the dollar signs ($) and the number after it. The respective parts can have different lengths. The part before the first dollar sign can consist of letters as well as numbers.
So that would be 4 substrings.
subStrings:
1: U
2: 305
3: AH
4: 376
I was thinking of solving this with regex as well. But unfortunately my knowledge in this regard is limited.
Does anyone have an idea how I can solve this with regex? Or are there other ways?
Thanks
Another option is to use a specific pattern to get the 4 parts as capturing groups.
^.*?([A-Z])\$(\d+):\$([A-Z]+)\$(\d+)$
Explanation
^ Start of string
.*? Match any char except a newline 0+ times in a non greedy way
([A-Z])\$ Capture a char A-Z in group 1 and match $
(\d+):\$ Capture 1+ digits group 2 and match :$
([A-Z]+)\$ Capture 1+ chars A-Z in group 3 and match $
(\d+) Capture 1+ digits in group 4
$ End of string
Regex demo | Java demo
Example code
String regex = "^.*?([A-Z])\\$(\\d+):\\$([A-Z]+)\\$(\\d+)$";
String string = "'0 DB'!$U$305:$AH$376";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println(matcher.group(i));
    }
}
To match both example strings, you can make the second part optional.
^.*?([A-Z])\$(\d+)(?::\$([A-Z]+)\$(\d+))?$
See another regex demo
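When the second part is optional, groups 3 and 4 are null for a single-cell reference, so the calling code has to check for that. Here is a rough sketch of how that might look. It uses a slight variant of the pattern (an explicit \$ before the first group and [A-Z]+ instead of [A-Z]) on the assumption that the left-hand column can also have more than one letter. The class name CellRangeParts and the method parts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellRangeParts {
    // Variant of the optional-second-part pattern; the leading \$ and [A-Z]+ are assumptions.
    private static final Pattern REF =
            Pattern.compile("^.*?\\$([A-Z]+)\\$(\\d+)(?::\\$([A-Z]+)\\$(\\d+))?$");

    // Returns the captured parts, skipping groups that did not participate in the match.
    static List<String> parts(String ref) {
        List<String> out = new ArrayList<>();
        Matcher m = REF.matcher(ref);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) {
                    out.add(m.group(i));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parts("'0 DB'!$U$305:$AH$376")); // [U, 305, AH, 376]
        System.out.println(parts("'0 DB'!$B$460"));         // [B, 460]
    }
}
```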
For this requirement, you can simply use the regex, (?<=\\$)\\w+ which means one or more word characters preceded by $.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String str = "'0 DB'!$U$305:$AH$376";
        Matcher matcher = Pattern.compile("(?<=\\$)\\w+").matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
Output:
U
305
AH
376

Masking credit card number using regex

I am trying to mask the CC number, in a way that third character and last three characters are unmasked.
For example, 7108898787654351 should become **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to unmask the 3rd character alone?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have exactly 2 characters between the start of the string and the current position (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
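To try this pattern from Java, something along these lines should work. It is only a sketch: the class name DelimitedCardMask and the method mask are made up for the example, and the input is assumed to contain exactly 16 digits with arbitrary non-digit delimiters.

```java
class DelimitedCardMask {
    // The lookahead counts the digits left up to the end of the string (current digit included):
    // exactly 14 left means this is the 3rd of 16 digits; 1 to 3 left means one of the last three.
    private static final String REGEX = "(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d";

    static String mask(String cc) {
        return cc.replaceAll(REGEX, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```

Delimiters are never replaced because the pattern only ever consumes a single \d.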
Wherever the dashes are after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you need a regex for this. You could use a StringBuilder to build the required string:
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
    if (i == 2 || i >= str.length() - 3) {
        sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
    }
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if there are exactly two chars other than line break chars between the start of the string and the current location
. - any char but a line break char
(?=.*...) - there must be any three chars other than line break chars immediately to the right of the current location.
If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
Start your engine!
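A quick way to check this last pattern in Java is sketched below. The class name SixteenDigitMask and the method mask are hypothetical, and the second call illustrates the stated assumption: with anything other than 16 digits the \d{13}$ branch no longer lines up with the third digit.

```java
class SixteenDigitMask {
    // A digit is replaced unless 0-2 digits follow it (last three) or exactly 13 follow it (3rd of 16).
    private static final String REGEX = "\\d(?!\\d{0,2}$|\\d{13}$)";

    static String mask(String cc) {
        return cc.replaceAll(REGEX, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
        // With 15 digits the third digit is masked too; only the last three survive.
        System.out.println(mask("710889878765435"));
    }
}
```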

Tokenize Words separated by non-word characters except single quote

I have the following method I'm trying to implement: parses the input into “word tokens”: sequences of word characters separated by non-word characters. However, non-word characters can become part of a token if they are quoted (in single quotes).
I want to use regex but have trouble getting my code just right:
public static List<String> wordTokenize(String input) {
    Pattern pattern = Pattern.compile ("\\b(?:(?<=\')[^\']*(?=\')|\\w+)\\b");
    Matcher matcher = pattern.matcher (input);
    ArrayList ans = new ArrayList();
    while (matcher.find ()){
        ans.add (matcher.group ());
    }
    return ans;
}
My regex fails to recognize that a quote opening in the middle of a word, without a preceding space, does not start a new token. Examples:
The input: this-string 'has only three tokens' // works
The input:
"this*string'has only two#tokens'"
Expected :[this, stringhas only two#tokens]
Actual :[this, string, has only two#tokens]
The input: "one'two''three' '' four 'twenty-one'"
Expected :[onetwothree, , four, twenty-one]
Actual :[one, two, three, four, twenty-one]
How do I fix the spaces?
You want to match one or more occurrences of a word char or a substring between the closest single straight apostrophes, and remove all those apostrophes from the tokens.
Use the following regex and .replace("'", "") on the matches:
(?:\w|'[^']*')+
See the regex demo. Details:
(?: - start of a non-capturing group
\w - a word char
| - or
' - a straight single quotation mark
[^']* - any 0+ chars other than a straight single quotation mark
' - a straight single quotation mark
)+ - end of the group, 1+ occurrences.
See the Java demo:
// String s = "this*string'has only two#tokens'"; // => [this, stringhas only two#tokens]
String s = "one'two''three' '' four 'twenty-one'"; // => [onetwothree, , four, twenty-one]
Pattern pattern = Pattern.compile("(?:\\w|'[^']*')+", Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = pattern.matcher(s);
List<String> tokens = new ArrayList<>();
while (matcher.find()){
tokens.add(matcher.group(0).replace("'", ""));
}
Note the Pattern.UNICODE_CHARACTER_CLASS is added for the \w pattern to match all Unicode letters and digits.
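Put back into the method signature from the question, the approach might look roughly like this. It is a sketch under the assumption that quotes are always balanced; the wrapper class name WordTokenizer is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordTokenizer {
    // One token is a run of word chars and/or quoted substrings with nothing in between.
    private static final Pattern TOKEN =
            Pattern.compile("(?:\\w|'[^']*')+", Pattern.UNICODE_CHARACTER_CLASS);

    public static List<String> wordTokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            // The quotes only delimit; they are not part of the token.
            tokens.add(matcher.group().replace("'", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(wordTokenize("this*string'has only two#tokens'"));
        // [this, stringhas only two#tokens]
        System.out.println(wordTokenize("one'two''three' '' four 'twenty-one'"));
        // [onetwothree, , four, twenty-one]
    }
}
```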

Regex to match a digit not followed by a dot(".")

I have a string
string 1(excluding the quotes) -> "my car number is #8746253 which is actually cool"
conditions - The number 8746253, could be of any length and
- the number can also be immediately followed by an end-of-line.
I want to group-out 8746253 which should not be followed by a dot "."
I have tried,
.*#(\d+)[^.].*
This will get me the number for sure, but it will match even if there is a dot, because after backtracking [^.] will match the last digit of the number (for example, 3 in the case below)
string 2(excluding the quotes) -> "earth is #8746253.Kms away, which is very far"
I want to match only the string 1 type and not the string 2 types.
To match any number of digits after # that are not followed with a dot, use
(?<=#)\d++(?!\.)
The ++ is a possessive quantifier: the regex engine checks the lookahead (?!\.) only after the last matched digit and won't backtrack into the digits if a dot follows. So, the whole match fails if there is a dot after the last digit in a digit chunk.
See the regex demo
To match the whole line and put the digits into capture group #1:
.*#(\d++)(?!\.).*
See this regex demo. Or a version without a lookahead:
^.*#(\d++)(?:[^.\r\n].*)?$
See another demo. In this last version, the digit chunk can only be followed with an optional sequence of a char that is not a ., CR and LF followed with any 0+ chars other than line break chars ((?:[^.\r\n].*)?) and then the end of string ($).
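For completeness, here is a small Java sketch of the first pattern, (?<=#)\d++(?!\.), run against both sample strings and against a number at the very end of the input. The class name NumberNotBeforeDot and the method find are hypothetical; returning an Optional is just one possible design.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberNotBeforeDot {
    // Possessive \d++ keeps all digits, so the lookahead is tested only after the last one.
    private static final Pattern P = Pattern.compile("(?<=#)\\d++(?!\\.)");

    // Returns the digits after '#' if they are not followed by a dot.
    static Optional<String> find(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find("my car number is #8746253 which is actually cool")); // Optional[8746253]
        System.out.println(find("earth is #8746253.Kms away, which is very far"));    // Optional.empty
        System.out.println(find("parcel #42"));                                       // Optional[42]
    }
}
```

Because the lookahead consumes nothing, a number directly followed by the end of the line still matches.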
This works like you have described
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MyRegex {
    public static void main(String[] args) {
        // Note: [^\\.] consumes one character, so a number at the very end of the input does not match.
        Pattern pattern = Pattern.compile("#(\\d++)[^\\.]");
        Matcher matcher1 = pattern.matcher("my car number is #8746253 which is actually cool");
        if (matcher1.find()) {
            System.out.println(matcher1.group(1));
        }
        Matcher matcher2 = pattern.matcher("earth is #8746253.Kms away, which is very far");
        if (matcher2.find()) {
            System.out.println(matcher2.group(1));
        } else {
            System.out.println("No match found");
        }
    }
}
Outputs:
> 8746253
> No match found

Remove leading trailing non numeric characters from a string in Java

I need to strip off all the leading and trailing characters from a string upto the first and last digit respectively.
Example : OBC9187A-1%A
Should return : 9187A-1
How do I achieve this in Java?
I understand regex is the solution, but I am not good at it.
I tried this replaceAll("([^0-9.*0-9])","")
But it returns only digits and strips all the alpha/special characters.
Here is a self-contained example of using regex and Java to solve your problem. I would suggest looking at a regex tutorial of some kind; here is a nice one.
public static void main(String[] args) {
    String test = "OBC9187A-1%A";
    Pattern p = Pattern.compile("\\d.*\\d");
    Matcher m = p.matcher(test);
    while (m.find()) {
        System.out.println("Match: " + m.group());
    }
}
Output:
Match: 9187A-1
\d matches any digit, .* matches anything 0 or more times, and the final \d matches any digit. We write \\d to escape the \ for Java, since \ is a special character in string literals. So this regex matches a digit, followed by anything, followed by another digit. Because .* is greedy, it takes the longest possible match, running from the first digit to the last digit with everything in between. The while loop is there in case there is more than one match; here there can be only one, so you can keep the while loop or change it to an if like this:
if (m.find())
{
    System.out.println("Match: " + m.group());
}
This will strip leading and trailing non-digit characters from string s.
String s = "OBC9187A-1%A";
s = s.replaceAll("^\\D+", "").replaceAll("\\D+$", "");
System.out.println(s);
// prints 9187A-1
DEMO
Regex explanation
^\D+
^ assert position at start of the string
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
\D+$
\D+ match any character that's not a digit [^0-9]
Quantifier: + Between one and unlimited times, as many times as possible
$ assert position at end of the string
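The two replaceAll calls can also be folded into a single call with an alternation. This is a minimal sketch rather than part of the answer above; the class name DigitTrim and the method stripToDigits are hypothetical.

```java
class DigitTrim {
    // Removes a run of non-digits at the start and a run of non-digits at the end.
    static String stripToDigits(String s) {
        return s.replaceAll("^\\D+|\\D+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripToDigits("OBC9187A-1%A")); // 9187A-1
    }
}
```

A string without any digit becomes empty, since ^\D+ then matches the whole input.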
