extract specific word from comma separated string in java using regex

Input: [1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]
I want to extract BMW from it, and I am using this regex: (^(?:[^\\,]*\\,){3})
This results in: BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]
Could anyone help me with this? Thanks in advance.

As you can only enter a pattern without making use of groups, you could use a positive lookbehind. Because Java does not support infinite repetition in a lookbehind, use a finite quantifier there, for example {0,1000}.
(?<=^\\[[^,]{0,1000},[^,]{0,1000},[^,]{0,1000},\\h{0,10})\\w{3,10}(?=[^\\]\\[]*\\])
Explanation
(?<= Positive lookbehind, assert what is on the left is
^\[ Start of string, match [
[^,]{0,1000},[^,]{0,1000},[^,]{0,1000}, Match 3 times any char except , followed by the ,
\h{0,10} Match 0-10 times a horizontal whitespace char
) Close lookbehind
\w{3,10} Match 3-10 word chars
(?= Positive lookahead, assert what is on the right is
[^\]\[]*\] Match until the ]
) Close lookahead
Code example
final String regex = "(?<=^\\[[^,]{0,1000},[^,]{0,1000},[^,]{0,1000},\\h{0,10})\\w{3,10}(?=[^\\]\\[]*\\])";
final String string = "[1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
BMW
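If capturing groups are an option after all, the asker's original pattern can be extended with one group instead of a lookbehind. The following is a minimal sketch under that assumption; the class name FourthFieldExample and the method fourthField are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourthFieldExample {
    // Skip "[" and three comma-terminated fields, skip whitespace,
    // then capture everything up to the next comma or "]".
    private static final Pattern FOURTH_FIELD =
            Pattern.compile("^\\[(?:[^,]*,){3}\\s*([^,\\]]+)");

    // Hypothetical helper: returns the fourth field, or null if there is none.
    static String fourthField(String input) {
        Matcher m = FOURTH_FIELD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "[1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]";
        System.out.println(fourthField(input)); // prints BMW
    }
}
```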

If it is a comma-separated string, you could just use the split function of the String class, which converts the comma-separated string to an array.
The string split() method breaks a given string around matches of the
given regular expression.
Syntax: public String[] split(String regex, int limit)
Input String: 016-78967
Regular Expression: -
Output: {"016", "78967"}
Then you could look into the array to find the particular keyword.
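Applied to the input from the question, the split approach might look like the sketch below. It assumes the surrounding brackets should be stripped first and that fields are separated by a comma with optional whitespace; the names SplitExample and fieldAt are made up for this example.

```java
class SplitExample {
    // Hypothetical helper: returns the field at the given zero-based index.
    static String fieldAt(String input, int index) {
        // Remove the leading "[" and the trailing "]".
        String body = input.replaceAll("^\\[|\\]$", "");
        // Split on commas, swallowing whitespace around them.
        String[] parts = body.split("\\s*,\\s*");
        return parts[index];
    }

    public static void main(String[] args) {
        String input = "[1, 1111, 2020, BMW, Frontier, EXTENDED CAB PICKUP 2-DR, Silver, 16558]";
        System.out.println(fieldAt(input, 3)); // prints BMW
    }
}
```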

Related

Is there a way to find special subStrings in this case with regex?

I have a string from which numbers are extracted at the end of the String with regex.
String:
'0 DB'!$B$460
subString:
460
I solve this as follows:
String str = "'0 DB'!$B$460";
String sStr = str.replaceAll(".*?([0-9]+)$", "$1");
Old question Link:
Is there a way to find out how many numbers are at the end of a string without knowing the exact index?
Now I have a different kind of string from which I want to extract certain ranges.
String:
'0 DB'!$U$305:$AH$376
Here I want to extract certain parts to the left and to the right of the colon:
in each case the part between the dollar signs ($) and the number after it. The respective parts can have different lengths. The part before the first dollar sign can consist of letters as well as numbers.
So that would be 4 substrings.
subStrings:
1: U
2: 305
3: AH
4: 376
I was thinking of solving this with regex as well. But unfortunately my knowledge in this regard is limited.
Does anyone have an idea how I can solve this with regex? Or are there other ways?
Thanks

Another option is to use a specific pattern to get the 4 parts as capturing groups.
^.*?([A-Z])\$(\d+):\$([A-Z]+)\$(\d+)$
Explanation
^ Start of string
.*? Match any char except a newline 0+ times in a non greedy way
([A-Z])\$ Capture a char A-Z in group 1 and match $
(\d+):\$ Capture 1+ digits group 2 and match :$
([A-Z]+)\$ Capture 1+ chars A-Z in group 3 and match $
(\d+) Capture 1+ digits in group 4
$ End of string
Example code
String regex = "^.*?([A-Z])\\$(\\d+):\\$([A-Z]+)\\$(\\d+)$";
String string = "'0 DB'!$U$305:$AH$376";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
To match both example strings, you can make the second part optional.
^.*?([A-Z])\$(\d+)(?::\$([A-Z]+)\$(\d+))?$
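A short sketch of how the optional variant might be used on both strings from the question. The class and method names are invented for illustration. Groups 3 and 4 are null when the part after the colon is absent, so the loop skips them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellRangeExample {
    // The pattern from above with the ":$COL$ROW" part made optional.
    private static final Pattern RANGE =
            Pattern.compile("^.*?([A-Z])\\$(\\d+)(?::\\$([A-Z]+)\\$(\\d+))?$");

    // Hypothetical helper: returns the captured parts, leaving out unmatched groups.
    static List<String> parts(String ref) {
        List<String> out = new ArrayList<>();
        Matcher m = RANGE.matcher(ref);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) { // optional groups may not participate
                    out.add(m.group(i));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parts("'0 DB'!$B$460"));         // prints [B, 460]
        System.out.println(parts("'0 DB'!$U$305:$AH$376")); // prints [U, 305, AH, 376]
    }
}
```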

For this requirement, you can simply use the regex, (?<=\\$)\\w+ which means one or more word characters preceded by $.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String str = "'0 DB'!$U$305:$AH$376";
Matcher matcher = Pattern.compile("(?<=\\$)\\w+").matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Output:
U
305
AH
376

Masking credit card number using regex

I am trying to mask the CC number in such a way that the third character and the last three characters are unmasked.
For example, 7108898787654351 becomes **0**********351
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to unmask the 3rd character alone?

You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have exactly 2 characters between the start and the current position (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead

I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.

Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
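To try the pattern above from Java, a sketch along these lines could be used. It assumes the input contains exactly 16 digits, optionally separated by non-digit delimiters; MaskDigitsExample and mask are hypothetical names.

```java
class MaskDigitsExample {
    // Hypothetical helper: masks every digit except the 3rd and the last three.
    // The lookahead counts the digits remaining up to the end of the string,
    // so delimiters such as "-" do not shift the positions.
    static String mask(String cc) {
        return cc.replaceAll("(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // prints **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // prints **0*-****-****-*351
    }
}
```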

Apart from where the dashes are after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351

I don't think you need a regex for this. You could use a StringBuilder to create the required string (note that String.repeat requires Java 11 or later):
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
if (i == 2 || i >= str.length() - 3) {
sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
}
}
System.out.print(sb.toString()); // output: **0*************351

You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if the current location is immediately preceded by exactly two chars (other than line break chars) counted from the start of the string
. - any char but a line break char
(?=.*...) - there must be any three chars other than line break chars immediately to the right of the current location.

If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
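A minimal sketch of that replacement in Java, under the stated assumption that the input is exactly 16 consecutive digits with no delimiters. The class and method names are invented for this example.

```java
class SixteenDigitMaskExample {
    // Hypothetical helper: a digit is masked unless it is followed by
    // 0-2 digits (the last three) or by exactly 13 digits (the third one).
    static String mask(String cc) {
        return cc.replaceAll("\\d(?!\\d{0,2}$|\\d{13}$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // prints **0**********351
    }
}
```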

Regular expression to find money value

I have this response text:
Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:
32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat\",15
I want to get the money values before "VND": -2.134, 32.666 and 0.
I have this regex:
String regex = "(?<![^=])([\\d]*)(?!$[VND])";
but it does not work.
Please help me!

You could use a positive lookahead (?= and a word boundary after VND \b.
-?\d+(?:\.\d+)?(?=VND\b)
That would match
-? Optional minus sign (to also allow a plus, you could use an optional character class [+-]?)
\d+ Match one or more digits
(?:\.\d+)? An optional non capturing group matching a dot and one or more digits
(?=VND\b) Positive lookahead that asserts what is on the right is VND
In Java:
-?\\d+(?:\\.\\d+)?(?=VND\\b)
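For completeness, here is a sketch that collects all matches of this pattern from the response text in the question. The class name MoneyExample and the method amounts are hypothetical, and the values are kept as strings because the question does not say how the dot should be interpreted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoneyExample {
    // Optional minus, digits, optional ".digits", and VND must follow.
    private static final Pattern AMOUNT =
            Pattern.compile("-?\\d+(?:\\.\\d+)?(?=VND\\b)");

    // Hypothetical helper: returns every amount that directly precedes "VND".
    static List<String> amounts(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:\n"
                + "32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat";
        System.out.println(amounts(text)); // prints [-2.134, 32.666, 0]
    }
}
```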

You may use this regex with a lookahead:
[+-]?\d+\.?\d*(?=VND)
RegEx Details:
[+-]?: Match optional + or -
\d+\.?\d*: Match a floating point number or integer number
(?=VND): Assert that we have VND at next position
Java Code:
final String regex = "[+-]?\\d+\\.?\\d*(?=VND)";
final String string = "Cuoc no truoc -2.134VND. Cuoc phat sinh tam tinh den 31/08/2018:\n"
+ "32.666VND (da tru KM,goi cuoc...). TKFastpay: 0VND.Tra 01.Trang sau 02.Thoat\\\",15";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}

Solution:
String regex = "(-?[0-9]+[.]?[0-9]*)VND";
Description:
Create a capturing group before every VND string.
Inside the group, first check whether a minus sign is present, so we have: -?
Then we need to capture the digits, and we expect one or more of them, so: [0-9]+
There might be a dot (zero or one) in case of a decimal, so we have: [.]?
There might be another series of digits (zero or more) after the decimal point, so: [0-9]*

If I understand the requirement, then the regex, used in a repeated fashion, might be:
(-?)[\\d][\\d.]*(?=VND)
The idea being that you need at least one digit, followed by more digits or a decimal, then followed by VND.
A slightly improved approach would be to split the [.] to be between the digits, so:
((-?)[\d]+[.]?[\d]*)(?=VND)

Try this simple regex
In Java
String regex = "(-?\\d*?\\.?\\d+)(?=VND)";
regex
(-?\d*?\.?\d+)(?=VND)

How to build a Regex in java to detect a whitespace or end of a string?

I am trying to build a regex to find and extract the part of a string containing a post office box.
Here are some examples:
str = "some text p.o. box 12456 Floor 105 streetName Street";
str = "po box 1011";
str = "post office Box 12 Floor 105 Tallapoosa Street";
str = "leclair ryan pc p.o. Box 2499 8th floor 951 east byrd street";
str = "box 1 slot 3 building 2 136 harvey road";
Here is my pattern and code:
Pattern p = Pattern.compile("p.*o.*box \\d+(\\z|\\s)");
Matcher m = p.matcher(str);
int count =0;
while(m.find()) {
count++;
System.out.println("Match number "+count);
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
}
It works with the second example and not with the first one!
If I change my pattern to the following:
Pattern p = Pattern.compile("p.*o.*box \\d+ ");
it works just for the first example.
The question is how to group the regex for end of string "\z" and the regex for whitespace "\s" or " "?
New Pattern:
Pattern p = Pattern.compile("(?i)((p.*o.box\\s\\w\\s*\\d*(\\z|\\s*)|(box\\s*\\w\\s*\\d*(\\z|\\s*)) ))");

You can leverage the following code:
String str = "some text p.o. box 12456 Floor 105 streetName Street";
Pattern p = Pattern.compile("(?i)\\bp\\.?\\s*o\\.?\\s*box\\s*(\\d+)(?:\\z|\\s)");
Matcher m = p.matcher(str);
int count =0;
while(m.find()) {
count++;
System.out.println("Match: "+m.group(0));
System.out.println("Digits: "+m.group(1));
System.out.println("Match number "+count);
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
}
To make the pattern case insensitive, just add Pattern.CASE_INSENSITIVE flag to the Pattern.compile declaration or pre-pend the inline (?i) modifier to the pattern.
Also, .* matches any characters other than a newline zero or more times; I guess you wanted to match the . optionally. So you just need the ? quantifier, and the dot must be escaped to match a literal dot. Note how I used (...) to capture the digits into Group 1 (this is called a capturing group). The group that matches the end of the string or a space is a non-capturing group ((?:...)), which is used for grouping only, not for storing its value in the memory buffer. Since you effectively want a word boundary there, I suggest replacing (?:\\z|\\s) with a mere \\b:
Pattern p = Pattern.compile("(?i)\\bp\\.?\\s*o\\.?\\s*box\\s*(\\d+)\\b");
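A sketch that runs the \\b variant over sample strings from the question. The helper findBoxNumber and the class name are assumptions made for this example. Note that the pattern only covers the abbreviated forms (p.o. box, po box), so "post office Box 12 ..." and a bare "box 1 ..." produce no match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PoBoxExample {
    private static final Pattern PO_BOX =
            Pattern.compile("(?i)\\bp\\.?\\s*o\\.?\\s*box\\s*(\\d+)\\b");

    // Hypothetical helper: returns the box number (Group 1) if the pattern is found.
    static Optional<String> findBoxNumber(String address) {
        Matcher m = PO_BOX.matcher(address);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = {
            "some text p.o. box 12456 Floor 105 streetName Street",
            "po box 1011",
            "leclair ryan pc p.o. Box 2499 8th floor 951 east byrd street",
            "box 1 slot 3 building 2 136 harvey road"
        };
        for (String s : samples) {
            // Prints 12456, 1011, 2499 and (no match), one per line.
            System.out.println(findBoxNumber(s).orElse("(no match)"));
        }
    }
}
```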

There are a couple items in your regex that look like they need work. From what I understand you want to extract the P.O. Box number from strings of such format that you've provided. Given that, the following regex will accomplish what you want, with a following explanation. See it in action here: https://regex101.com/r/cQ8lH3/2
Pattern p = Pattern.compile("p\\.?o\\.? box [^ \r\n\t]+");
Firstly, an escape sequence needs only ONE backslash in the regex itself; inside a Java string literal that backslash must be doubled, which is why the dot is written as \\. in the code above. Also, you must escape the dots. If you do not escape the dots, regex will match . as ANY single character. \. will instead match a dot symbol.
Next, you need to change the * quantifier after the \. to a ?. Why? The * symbol will match zero or more of the preceding symbol while the ? quantifier will match only one or none.
Finally, rethink how you're matching the box number. Instead of matching all characters AND THEN white space, just match everything that isn't a whitespace. [^ \r\n\t]+ will match all characters that are NOT a space ( ), carriage return (\r), newline (\n), or tab (\t). Therefore it will consume the box number and stop as soon as it hits any whitespace or the end of the string.
Some of these changes may not be necessary to get your code to work for the examples you gave, but they are the proper way to build the regex you want.

RegEx: Matching n-char long sequence of repeating character

I want to split a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning I have the same special character n times, and I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
This one would return for the first example
( (only 1 time) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.

I suggest the following solution with 2 regexps: (?s)(\\W)\\1+\\w.* for checking if the string contains same repeating non-word symbols at the start, and if yes, split with a mere (?<=\\W)(?=\\w) pattern (between non-word and a word character), else, just return a list containing the whole string (as if not split):
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
if (str.matches("(?s)(\\W)\\1+\\w.*")) {
System.out.println(Arrays.toString(str.split(ptrn)));
}else { System.out.println(Arrays.asList(str)); }
}
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
Pattern p = Pattern.compile(ptrn);
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(Arrays.asList(m.group(1), m.group(3)));
}
else {
System.out.println(Arrays.asList(str));
}
}
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Capture group 1 matching and capturing into Group 2 a non-word character followed by the same character (since a backreference \2 is used) 1 or more times.
(\\w.*) - matches and captures into Group 3 a word character and then zero or more characters.
