get everything after a particular string

I have a String coming as "process_client_123_Tree" and "process_abc_pqr_client_123_Tree". I want to extract everything after "process_client_" and "process_abc_pqr_client_" and store it in a String variable.
Here currentKey variable can contain either of above two strings.
String clientId = // how to use currentKey here so that I can get remaining portion in this variable
What is the right way to do this? Should I just use split here or some regex?

import java.util.regex.*;
class test
{
public static void main(String args[])
{
Pattern pattern=Pattern.compile("^process_(client_|abc_pqr_client_)(.*)$");
Matcher matcher = pattern.matcher("process_client_123_Tree");
while(matcher.find())
System.out.println("String 1 Group 2: "+matcher.group(2));
matcher = pattern.matcher("process_abc_pqr_client_123_Tree");
while(matcher.find())
System.out.println("String 2 Group 2: "+matcher.group(2));
System.out.println("Another way..");
System.out.println("String 1 Group 2: "+"process_client_123_Tree".replace("process_client_", ""));
System.out.println("String 2 Group 2: "+"process_abc_pqr_client_123_Tree".replace("process_abc_pqr_client_", ""));
}
}
Output:
$ java test
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Another way..
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Regex breakup:
^ matches the start of the string
process_(client_|abc_pqr_client_) matches "process_" followed by "client_" or "abc_pqr_client_" (captured as group 1)
(.*)$ . means any character and * means zero or more times, so this matches the rest of the string up to the end ($) and captures it as group 2
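The replace(...) calls in the "Another way" part remove the prefix wherever it occurs in the string. If only a leading prefix should be dropped, a plain-string version could look like the sketch below. The class and method names (PrefixStripper, stripPrefix) are made up for illustration, and returning the key unchanged when no prefix matches is an assumption rather than something the question specifies.

```java
class PrefixStripper {
    // The two prefixes named in the question.
    private static final String[] PREFIXES = {
        "process_abc_pqr_client_",
        "process_client_"
    };

    // Returns the part of key after the first matching prefix,
    // or key unchanged if neither prefix matches (assumption).
    static String stripPrefix(String key) {
        for (String prefix : PREFIXES) {
            if (key.startsWith(prefix)) {
                return key.substring(prefix.length());
            }
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("process_client_123_Tree"));         // 123_Tree
        System.out.println(stripPrefix("process_abc_pqr_client_123_Tree")); // 123_Tree
    }
}
```

This avoids regex entirely, which may be enough when the set of prefixes is small and fixed.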

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Matchit {
    public static void main(String[] args) {
        String str = "process_abc_pqr_client_123_Tree";
        Pattern p = Pattern.compile("process_abc_pqr_client_(.*)|process_client_(.*)");
        Matcher m = p.matcher(str);
        if (m.find()) {
            // Only the group of the alternative that matched is non-null.
            String value = m.group(1) != null ? m.group(1) : m.group(2);
            System.out.println("Found value: " + value);
        }
    }
}
Gets you:
Found value: 123_Tree
The parentheses in the regexp define the match groups. The pipe is a logical or. Dot means any character and star means zero or more repetitions. So, I create a pattern object with that regexp and then use a matcher object to get the part of the string that has been matched. Because each alternative has its own group, the value ends up in group 1 for the "process_abc_pqr_client_" form and in group 2 for the "process_client_" form.

A regex pattern could be: "process_(?:abc_pqr_)?client_(\\w+)"
(?:abc_pqr_)? is the optional part
(?: opens a non-capturing group, and )? closes it and makes it match zero or one times
\w+ matches one or more word characters [A-Za-z0-9_]
The match will be in group(1), the first capturing group.
To bound the match on the right as well, match lazily up to the token that follows, for example a trailing _trace_count:
"process_(?:abc_pqr_)?client_(\\w+?)_trace_count"
where \w+? matches as few word characters as possible while still letting the rest of the pattern match.
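As a rough sketch, the optional-prefix idea could be wrapped in a helper that fills clientId from currentKey. The names ClientIdExtractor and extractClientId are hypothetical, the capture uses (.+) instead of \w+ so that any remaining characters are kept, and returning null for an unrecognised key is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClientIdExtractor {
    // "process_", an optional "abc_pqr_", then "client_"; group 1 is the remainder.
    private static final Pattern KEY =
            Pattern.compile("^process_(?:abc_pqr_)?client_(.+)$");

    // Returns the text after the prefix, or null if the key has neither prefix (assumption).
    static String extractClientId(String currentKey) {
        Matcher m = KEY.matcher(currentKey);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractClientId("process_client_123_Tree"));         // 123_Tree
        System.out.println(extractClientId("process_abc_pqr_client_123_Tree")); // 123_Tree
        System.out.println(extractClientId("something_else"));                  // null
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call.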

Related

Regex to match a list of exact strings with some variable characters

I'm looking for a way to match a list of parameters that include some predefined characters and some variable characters using Java's String#matches method. For instance:
Possible Parameter 1: abc;[variable lowercase letters with maybe an underscore]
Possible Parameter 2: cde;[variable lowercase letters with maybe an underscore]
Possible Parameter 3: g;4
Example 1: abc;erga_sd,cde;dfgef,g;4
Example 2: g;4,abc;dsfaweg
Example 3: cde;df_ger
Each of the parameters would be comma-separated but they can come in any order and include 1, 2, and/or 3 (no duplicates)
This is the regex I have so far that partially works:
(abc;[a-z_,]+){0,1}|(cde;[a-z,]+){0,1}|(g;4,){0,1}
The problem is that it also finds something like this valid: abc;dsfg,dfvser where the beginning of the string after the comma does not start with a valid abc; or cde; or g;4

As you said:
The problem is that it also finds something like this valid:
abc;dsfg,dfvser where the beginning of the string after the comma does
not start with a valid abc; or cde; or g;4
Therefore every element that follows a comma must itself match one of the valid patterns. You can split each input on the delimiter "," and apply the regex to every element; the whole input line is valid only if all of its elements match.
Your regex should be:
(abc;[a-z_]+)|(cde;[a-z_]+)|(g;4)
Each element obtained by splitting the input line has to match one of these three alternatives, as described in your post.
Here's the code:
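String regex = "(abc;[a-z_]+)|(cde;[a-z_]+)|(g;4)";
String inputLine = "abc;erga_sd,cde;dfgef,g;4"; // one of the sample inputs
boolean finalResult = true;
// The limit -1 keeps trailing empty elements, so an input such as "g;4," is rejected.
for (String input : inputLine.split(",", -1)) {
    finalResult = finalResult && Pattern.matches(regex, input);
}
System.out.println(finalResult);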
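The question also asks for no duplicates, which the split-and-match loop does not check on its own. One possible extension, sketched here with hypothetical names (ParameterListValidator, isValid), tracks the part before the semicolon in a set and rejects an element whose key was already seen.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class ParameterListValidator {
    // One valid element: abc;<letters/underscores>, cde;<letters/underscores>, or the literal g;4.
    private static final Pattern ELEMENT = Pattern.compile("(?:abc|cde);[a-z_]+|g;4");

    static boolean isValid(String inputLine) {
        Set<String> seenKeys = new HashSet<>();
        // The limit -1 keeps trailing empty elements, so "g;4," is rejected.
        for (String element : inputLine.split(",", -1)) {
            if (!ELEMENT.matcher(element).matches()) {
                return false;
            }
            // The key is the part before the semicolon: abc, cde or g.
            String key = element.substring(0, element.indexOf(';'));
            if (!seenKeys.add(key)) {
                return false; // the same parameter was given twice
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc;erga_sd,cde;dfgef,g;4")); // true
        System.out.println(isValid("abc;dsfg,dfvser"));           // false
        System.out.println(isValid("abc;a,abc;b"));               // false
    }
}
```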

If you want to use matches, then the whole string has to match.
^(?:(?:abc|cde);[a-z_]+|g;4)(?:,(?:(?:abc|cde);[a-z_]+|g;4))*$
Explanation
^ Start of string
(?: Non capture group
(?:abc|cde);[a-z_]+ match either abc; or cde; and 1+ chars a-z or _
| Or
g;4 Match literally
) Close non capture group
(?: Non capture group
,(?:(?:abc|cde);[a-z_]+|g;4) Match a comma, and repeat the first pattern
)* Close non capture group and optionally repeat
$ End of string
Example code
String[] strings = {
"abc;erga_sd,cde;dfgef,g;4",
"g;4,abc;dsfaweg",
"cde;df_ger",
"g;4",
"abc;dsfg,dfvser"
};
String regex = "^(?:(?:abc|cde);[a-z_]+|g;4)(?:,(?:(?:abc|cde);[a-z_]+|g;4))*$";
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.printf("Match for %s%n", s);
} else {
System.out.printf("No match for %s%n", s);
}
}
Output
Match for abc;erga_sd,cde;dfgef,g;4
Match for g;4,abc;dsfaweg
Match for cde;df_ger
Match for g;4
No match for abc;dsfg,dfvser
If abc;, cde; or g;4 must not occur more than once, you can rule that out with a negative lookahead at the start of the pattern that uses a backreference to detect the same prefix appearing twice.
^(?!.*(abc;|cde;|g;4).*\1)(?:(?:abc|cde);[a-z_]+|g;4)(?:,(?:(?:abc|cde);[a-z_]+|g;4))*$
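A small sketch of how the duplicate-rejecting pattern might be used from Java. The class name NoDuplicateParameters and the helper isValid are assumptions; the pattern is the one shown in this answer, assembled from a shared element fragment only to keep the string readable.

```java
import java.util.regex.Pattern;

class NoDuplicateParameters {
    // A single element of the comma-separated list.
    private static final String ELEMENT = "(?:(?:abc|cde);[a-z_]+|g;4)";

    // The negative lookahead fails as soon as the same prefix (group 1) occurs twice.
    private static final Pattern LIST = Pattern.compile(
            "^(?!.*(abc;|cde;|g;4).*\\1)" + ELEMENT + "(?:," + ELEMENT + ")*$");

    static boolean isValid(String s) {
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc;erga_sd,cde;dfgef,g;4")); // true
        System.out.println(isValid("abc;a,abc;b"));               // false
        System.out.println(isValid("g;4,g;4"));                   // false
    }
}
```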

Is there a way to find special subStrings in this case with regex?

I have a string from which numbers are extracted at the end of the String with regex.
String:
'0 DB'!$B$460
subString:
460
I solve this as follows:
String str = "'0 DB'!$B$460";
String sStr = str.replaceAll(".*?([0-9]+)$", "$1");
Old question Link:
Is there a way to find out how many numbers are at the end of a string without knowing the exact index?
Now I have a different kind of string from which I want to extract certain ranges.
String:
'0 DB'!$U$305:$AH$376
Here I would extract certain areas to the left of colon and to the right of colon.
On each side I want the part between the dollar signs ($) and the number after it. The respective parts can have different lengths. The part before the first dollar sign can consist of letters as well as numbers.
So that would be 4 substrings.
subStrings:
1: U
2: 305
3: AH
4: 376
I was thinking of solving this with regex as well. But unfortunately my knowledge in this regard is limited.
Does anyone have an idea how I can solve this with regex? Or are there other ways?
Thanks

Another option is to use a specific pattern to get the 4 parts as capturing groups.
^.*?([A-Z]+)\$(\d+):\$([A-Z]+)\$(\d+)$
Explanation
^ Start of string
.*? Match any char except a newline 0+ times in a non greedy way
([A-Z]+)\$ Capture 1+ chars A-Z in group 1 and match $
(\d+):\$ Capture 1+ digits in group 2 and match :$
([A-Z]+)\$ Capture 1+ chars A-Z in group 3 and match $
(\d+) Capture 1+ digits in group 4
$ End of string
Example code
String regex = "^.*?([A-Z]+)\\$(\\d+):\\$([A-Z]+)\\$(\\d+)$";
String string = "'0 DB'!$U$305:$AH$376";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println(matcher.group(i));
    }
}
To match both example strings (with and without the part after the colon), you can make the second part optional. Groups 3 and 4 are then null when that part is absent.
^.*?([A-Z]+)\$(\d+)(?::\$([A-Z]+)\$(\d+))?$
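With the optional second part, the trailing groups can be null, so calling code has to check for that. A minimal sketch follows, with the hypothetical names CellReferenceParser and parts. It uses [A-Z]+ for the first column as well, on the assumption that multi-letter columns such as AH may also appear on the left side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellReferenceParser {
    // Column letters and row digits; the ":$COL$ROW" part is optional.
    private static final Pattern REF =
            Pattern.compile("^.*?([A-Z]+)\\$(\\d+)(?::\\$([A-Z]+)\\$(\\d+))?$");

    // Returns the captured parts in order, skipping groups that did not participate.
    static List<String> parts(String reference) {
        List<String> result = new ArrayList<>();
        Matcher m = REF.matcher(reference);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) { // groups 3 and 4 are null for a single cell
                    result.add(m.group(i));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parts("'0 DB'!$U$305:$AH$376")); // [U, 305, AH, 376]
        System.out.println(parts("'0 DB'!$B$460"));         // [B, 460]
    }
}
```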

For this requirement, you can simply use the regex, (?<=\\$)\\w+ which means one or more word characters preceded by $.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String str = "'0 DB'!$U$305:$AH$376";
Matcher matcher = Pattern.compile("(?<=\\$)\\w+").matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}
Output:
U
305
AH
376

Regex matcher not giving expected result. Not matching number properly

I cannot understand why 2nd group is giving me only 0. I expect 3000. And do point me to a resource where I can understand better.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main( String args[] ) {
// String to be scanned to find the pattern.
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );//?
System.out.println("Found value: " + m.group(3) );
}else {
System.out.println("NO MATCH");
}
}
}

Make the pattern more precise: add QT before the \d+ pattern, or use .*? instead of the first .* to match as few chars as possible.
String pattern = "(.*QT)(\\d+)(.*)";
or
String pattern = "(.*?)(\\d+)(.*)";
will do.
The (.*QT)(\\d+)(.*) pattern will match and capture into Group 1 any 0+ chars other than line break chars, as many as possible, up to the last occurrence of QT that is followed by the subsequent subpatterns, then will match and capture 1+ digits into Group 2, and then will match and capture into Group 3 the rest of the line.
The .*? in the alternative pattern will match and capture into Group 1 any 0+ chars other than line break chars, as few as possible, up to the first chunk of 1 or more digits.
You may also use a simpler pattern like String pattern = "QT(\\d+)"; to get all digits after QT, and the result will be in Group 1 then (you won't have the text before and after the number).

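The * quantifier will try to match as many characters as possible, because it is a greedy quantifier. In (.*)(\d+)(.*) the first group therefore consumes everything it can and gives back only the single digit that \d+ needs, which is why the 2nd group contains just 0.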
You can make it non-greedy (lazy) by changing it to *?
Then, your regex will become :
(.*?)(\d+)(.*)
And you will match 3000 in the 2nd capturing group.
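The difference can be checked directly with a short sketch that prints group 2 for the greedy and the lazy pattern against the line from the question. The class name GreedyVsLazy and the helper secondGroup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns group 2 of the given pattern applied to the line, or null if nothing matches.
    static String secondGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy first group: it leaves only the last digit for \d+.
        System.out.println(secondGroup("(.*)(\\d+)(.*)", line));  // 0
        // Lazy first group: it stops before the first digit, so \d+ takes all four.
        System.out.println(secondGroup("(.*?)(\\d+)(.*)", line)); // 3000
    }
}
```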

RegEx: Matching n-char long sequence of repeating character

I want to split a text string that might look like this:
(((Hello! --> ((( and Hello!
or
########No? --> ######## and No?
At the beginning I have n-times the same special character, but I want to match the longest possible sequence.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
This one would return for the first example
( (only 1 time) and Hello!
and for the second
# and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.

I suggest the following solution with 2 regexps: (?s)(\\W)\\1+\\w.* for checking if the string contains same repeating non-word symbols at the start, and if yes, split with a mere (?<=\\W)(?=\\w) pattern (between non-word and a word character), else, just return a list containing the whole string (as if not split):
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
if (str.matches("(?s)(\\W)\\1+\\w.*")) {
System.out.println(Arrays.toString(str.split(ptrn)));
}else { System.out.println(Arrays.asList(str)); }
}
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
Also, your original regex can be modified to fit the requirement like this:
String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
Pattern p = Pattern.compile(ptrn);
Matcher m = p.matcher(str);
if (m.matches()) {
System.out.println(Arrays.asList(m.group(1), m.group(3)));
}
else {
System.out.println(Arrays.asList(str));
}
}
That regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Group 1 captures the whole run: Group 2 captures a single non-word character, and the backreference \2 then matches that same character 1 or more times.
(\\w.*) - matches and captures into Group 3 a word character and then zero or more characters.

regex for letters or numbers in brackets

I am using Java to process text using regular expressions. I am using the following regular expression
^[\([0-9a-zA-Z]+\)\s]+
to match one or more letters or numbers in parentheses one or more times. For instance, I like to match
(aaa) (bb) (11) (AA) (iv)
or
(111) (aaaa) (i) (V)
I tested this regular expression on http://java-regex-tester.appspot.com/ and it is working. But when I use it in my code, the code does not compile. Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("^[\([0-9a-zA-Z]+\)\s]+");
String[] words = pattern.split("(a) (1) (c) (xii) (A) (12) (ii)");
String w = pattern.
for(String s:words){
System.out.println(s);
}
}
}
I tried to use \\ instead of \, but the regex gave different results than what I expected (it matches only one group like (aaa), not multiple groups like (aaa) (111) (ii)).
Two questions:
How can I fix this regex and be able to match multiple groups?
How can I get the individual matches separately (like (aaa) alone and then (111) and so on). I tried pattern.split but did not work for me.

Firstly, you need to escape every backslash inside the string literal with another backslash. The regex engine will then see a single backslash. (E.g. write a word character as \\w inside the quotation marks.)
Secondly, you got to finish the line that reads:
String w = pattern.
That line explains why it doesn't compile.
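To illustrate the first point, here is a small sketch showing what the doubled backslashes look like in Java source and what the regex engine receives. The class name EscapeDemo, the constant REGEX and the helper isBracketToken are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, "\\(" is the two-character string \( ,
    // which the regex engine reads as a literal opening parenthesis.
    static final String REGEX = "\\([0-9a-zA-Z]+\\)";

    // True if the whole token is letters or digits wrapped in parentheses.
    static boolean isBracketToken(String token) {
        return Pattern.matches(REGEX, token);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                   // \([0-9a-zA-Z]+\)
        System.out.println(isBracketToken("(xii)")); // true
        System.out.println(isBracketToken("xii"));   // false
    }
}
```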

Here is my final solution to match the individual groups of letters/numbers in brackets that appear at the beginning of a line and ignore the rest
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tester {
static ArrayList<String> listOfEnums;
public static void main(String[] args) {
listOfEnums = new ArrayList<String>();
Pattern pattern = Pattern.compile("^\\([0-9a-zA-Z]+\\)");
String p = "(a) (1) (c) (xii) (A) (12) (ii) and the good news (1)";
Matcher matcher = pattern.matcher(p);
boolean isMatch = matcher.find();
int index = 0;
//once you find a match, remove it and store it in the arrayList.
while (isMatch) {
String s = matcher.group();
System.out.println(s);
//Store it in an array
listOfEnums.add(s);
//Remove it from the beginning of the string.
p = p.substring(listOfEnums.get(index).length(), p.length()).trim();
matcher = pattern.matcher(p);
isMatch = matcher.find();
index++;
}
}
}

1) Your regex is incorrect. You want to match individual groups of letters / numbers in brackets, and the current regex will match only a single string of one or more such groups. I.e. it will match
(abc) (def) (123)
as a single group rather than three separate groups.
A better regex that would match only up to the closing bracket would be
\([0-9a-zA-Z]+\)
2) Java requires you to escape every backslash in a string literal with another backslash.
3) The split() method will not do what you want. It treats every match as a delimiter, throws the matches away and returns an array of what is left over. You want to use matcher() instead:
Pattern pattern = Pattern.compile("\\([0-9a-zA-Z]+\\)");
Matcher matcher = pattern.matcher("(a) (1) (c) (xii) (A) (12) (ii)");
while (matcher.find()) {
System.out.println(matcher.group());
}
