Reg expression - split string between matching strings - java

I am trying to get an array of strings from a lengthy string. The array should consist of the substrings found between two delimiter strings (??? and ??? in my case). I tried the following code, but it's not giving me the expected results:
Pattern pattern = Pattern.compile("\\?\\?\\?(.*?)\\?\\?\\?");
String[] arrayOfKeys = pattern.split("???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj");
for (String key : arrayOfKeys) {
System.out.println(key);
}
My expected result is:
["label.missing", "some.label"]

Use Pattern.matcher() to obtain a Matcher for the input string, then call Matcher.find() repeatedly. Each call to Matcher.find() locates the next substring that matches the Pattern.
Pattern pattern = Pattern.compile("\\?{3}(.*?)\\?{3}");
Matcher m = pattern.matcher(inputString);
while (m.find()) {
System.out.println(m.group(1));
}
Pattern.split() uses your pattern as the delimiter to split the string and discards the delimiter part, which is not what you want in this case: your regex is designed to match the text that you want to extract.
I shortened the pattern by using the quantifier {3}, which repeats \? exactly 3 times, instead of writing \? 3 times.
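To make the difference concrete, here is a minimal sketch that runs the same pattern through both Pattern.split() and Matcher.find(). The class and method names (SplitVersusFind, splitOnPattern, findKeys) are made up for this illustration and are not part of the question or the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class SplitVersusFind {
    private static final Pattern KEY = Pattern.compile("\\?{3}(.*?)\\?{3}");

    // Pattern.split(): every match acts as a delimiter and is thrown away,
    // so only the text OUTSIDE the ???...??? markers is returned.
    static String[] splitOnPattern(String input) {
        return KEY.split(input);
    }

    // Matcher.find(): every match is kept; group 1 is the text between the markers.
    static List<String> findKeys(String input) {
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(input);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    public static void main(String[] args) {
        String input = "???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj";
        System.out.println(Arrays.toString(splitOnPattern(input)));
        System.out.println(findKeys(input)); // [label.missing, some.label]
    }
}
```

With split, the keys themselves are the part that disappears; with find, they are the part that is collected.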

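I would put the text you're trying to split in a String called input and call input.split() on it, using ??? as the delimiter. Note that this returns every segment, including the text outside the markers, so the keys are the elements at odd indices (1 and 3 here).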
String input = "???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj";
String[] split = input.split("\\?\\?\\?");
Try it here:
http://ideone.com/VAmCyu

Pattern pattern = Pattern.compile("\\?{3}(.+?)\\?{3}");
Matcher matcher= pattern.matcher("???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj");
List<String> aList = new ArrayList<String>();
while(matcher.find()) {
aList.add(matcher.group(1));
}
for (String key : aList) {
System.out.println(key);
}

Related

Extracting a group from matched String in Java using regex

I have a list of String containing values like this:
String [] arr = {"${US.IDX_CA}", "${UK.IDX_IO}", "${NZ.IDX_BO}", "${JP.IDX_TK}", "${US.IDX_MT}", "more-elements-with-completely-different-patterns-which-is-irrelevant"};
I'm trying to extract all the IDX_XX values from this list using regex in Java. So from the list above, I should get IDX_CA, IDX_IO, IDX_BO, etc.
I wrote following code:
Pattern pattern = Pattern.compile("(.*)IDX_(\\w{2})");
for (String s : arr){
Matcher m = pattern.matcher(s);
if (m.matches()){
String extract = m.group(1);
System.out.println(extract);
}
}
But this does not print anything. Can someone please tell me what mistake I am making? Thanks.

Use the following fix:
String [] arr = {"${US.IDX_CA}", "${UK.IDX_IO}", "${NZ.IDX_BO}", "${JP.IDX_TK}", "${US.IDX_MT}", "more-elements-with-completely-different-patterns-which-is-irrelevant"};
Pattern pattern = Pattern.compile("\\bIDX_(\\w{2})\\b");
for (String s : arr){
Matcher m = pattern.matcher(s);
while (m.find()){
System.out.println(m.group(0)); // Get the whole match
System.out.println(m.group(1)); // Get the 2 chars after IDX_
}
}
See the Java demo, output:
IDX_CA
CA
IDX_IO
IO
IDX_BO
BO
IDX_TK
TK
IDX_MT
MT
NOTES:
The pattern \bIDX_(\w{2})\b matches IDX_ followed by exactly 2 word characters, enclosed in word boundaries, and captures the 2 characters after IDX_ into Group 1.
m.matches() requires the whole string to match, so it is replaced with m.find(), which looks for the pattern anywhere in the string.
if is replaced with while in case a string contains more than one match.
m.group(0) contains the whole match.
m.group(1) contains the Group 1 value.
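The difference between matches() and find() described in the notes can be checked with a small sketch like the one below. The class name MatchesVersusFind and its two methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatchesVersusFind {
    private static final Pattern IDX = Pattern.compile("\\bIDX_(\\w{2})\\b");

    // matches() succeeds only if the ENTIRE string fits the pattern.
    static boolean matchesWhole(String s) {
        return IDX.matcher(s).matches();
    }

    // find() scans for the pattern anywhere in the string; looping with
    // while collects every occurrence, not just the first one.
    static List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = IDX.matcher(s);
        while (m.find()) {
            found.add(m.group(0)); // the whole match, e.g. IDX_CA
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("${US.IDX_CA}")); // false: "${US." and "}" are not covered
        System.out.println(matchesWhole("IDX_CA"));       // true
        System.out.println(findAll("${US.IDX_CA} ${UK.IDX_IO}")); // [IDX_CA, IDX_IO]
    }
}
```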

Regex expression to split a field with multiple values [duplicate]

I want to split the string first on "duration=" to get the remaining part of the string, which I'm able to do with the code below. Now I want to check whether that second part contains any commas (,) and split the values accordingly.
String data = "duration=WEEKLY,MONTHLY";
pattern = Pattern.compile("duration=(\\S*),(\\S*)", Pattern.CASE_INSENSITIVE);
matcher = pattern.matcher(data);
if (matcher.find()) {
System.out.println(matcher.group(2)); //this prints MONTHLY(as it is group(2))
}
I want to print "WEEKLY,MONTHLY". How can I get the entire string?
matcher.toMatchResult() or matcher.toString() returns the object instance.
Any help would be highly appreciated.
Thanks

A regex like this will not work properly if you have only one value or more than two. For example, it will not match "duration=MONTHLY".
But you can use a regex to get the "value" part, then simply use String.split to get the individual values:
String data = "duration=WEEKLY,MONTHLY";
Pattern pattern = Pattern.compile("duration=(.*)", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(data);
if (matcher.find()) {
String value = matcher.group(1); //get the values
String[] values = value.split(","); //split on the comma
for(String s : values){ //iterate the values
System.out.println(s);
}
}
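As a rough sketch, the same idea can be wrapped in a method that handles one value, several values, or no match at all. The names DurationValues and parse are invented for this example, and returning an empty list when nothing matches is an assumption about what the caller wants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DurationValues {
    private static final Pattern DURATION =
            Pattern.compile("duration=(.*)", Pattern.CASE_INSENSITIVE);

    // Returns the comma-separated values after "duration=",
    // or an empty list if the key is missing or has no value.
    static List<String> parse(String data) {
        List<String> values = new ArrayList<>();
        Matcher matcher = DURATION.matcher(data);
        if (matcher.find()) {
            String value = matcher.group(1); // everything after "duration="
            if (!value.isEmpty()) {
                values.addAll(Arrays.asList(value.split(",")));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("duration=WEEKLY,MONTHLY")); // [WEEKLY, MONTHLY]
        System.out.println(parse("duration=MONTHLY"));        // [MONTHLY]
    }
}
```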

multiple regex matches in a string

i have the following text:
bla [string1] bli [string2]
I would like to match string1 and string2 with a regex in a loop in Java.
How can I do this?
My code so far only matches the first string (string1), but not string2:
String sRegex="(?<=\\[).*?(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
if (m.find())
{
String sString1 = m.group(0);
String sString2 = m.group(1); // << no match
}

Your regex does not contain any capturing groups, so this call will throw an IndexOutOfBoundsException:
m.group(1);
You can just use:
String sRegex="(?<=\\[)[^]]*(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
while (m.find()) {
System.out.println( m.group() );
}
Also, if should be replaced by while so that find() is called repeatedly and returns all matches.

Your approach is confused. Either write your regex so that it matches two [....] sequences in the one pattern, or call find multiple times. Your current attempt has a regex that "finds" just one [...] sequence.
Try something like this:
Pattern p = Pattern.compile("\\[([^\\]]+)]");
Matcher m = p.matcher(formula);
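if (m.find()) {
String string1 = m.group(1); // text inside the first [...]
if (m.find()) { // the second call continues after the previous match
String string2 = m.group(1); // text inside the second [...]
}
}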
Or generalize using a loop and an array of String for the extracted strings.
(You don't need any fancy look-behind patterns in this case. Also, "Hungarian notation" is frowned upon in Java, so get out of the habit of using it.)
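One possible way to generalize this with a loop is sketched below. It collects the results in a List rather than an array, which is a choice made for this example; the names BracketExtractor and extract are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class BracketExtractor {
    // Group 1 captures one or more non-"]" characters between "[" and "]".
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]+)\\]");

    // Returns the contents of every [...] sequence, in order of appearance.
    static List<String> extract(String formula) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKETED.matcher(formula);
        while (m.find()) {          // each call resumes after the previous match
            result.add(m.group(1)); // text inside the brackets, without the brackets
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("bla [string1] bli [string2]")); // [string1, string2]
    }
}
```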

Regex to split around a word

I am struggling to get the String.split() to do what I would like it to do.
I have an Input of a string of words separated by spaces. Some words have a special function. They look something like this: "special:word".
The input string I am using to test my regex looks like this:
String str = "Hello wonderful special:world what a great special:day";
The result I would like to get from str.split(regex) is an array with the words "world" and "day";
I tried doing it with lookahead (?<=special\:)(\w+) but this splits the string at the words I am looking for. How do I inverse this expression to get the result I am looking for and what exactly do lookaheads and reverse lookaheads do?

Using split in this case would create a few problems:
you would need an overcomplicated regex to match the parts that you want to split on
Hello wonderful special:world what a great special:day
^^^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^
after the split, your first element would be an empty string "", because split removes trailing empty strings but keeps a leading one, so your result would be
["", "world", "day"]
To avoid this, use a more intuitive approach: instead of matching everything that is NOT the part you want, match only the part you are interested in. To do this, use the Pattern and Matcher classes. Here is an example of how you can find all your special words:
String str = "Hello wonderful special:world what a great special:day";
Pattern p = Pattern.compile("\\b\\w+:(\\w+)\\b");//word after : will be in group 1
Matcher m = p.matcher(str);
while(m.find()){//this will iterate over all found substrings
//here we can use found substrings
System.out.println(m.group(1));
}
Output:
world
day
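Since the question also asks what lookarounds do, here is a small sketch of the lookbehind from the question used with find() instead of split(). A lookbehind such as (?<=special:) only checks that special: appears immediately before the current position and does not become part of the match, so m.group() is just the word. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class LookbehindDemo {
    // (?<=special:) is a zero-width check: the prefix must be there,
    // but it is not included in the matched text.
    private static final Pattern SPECIAL_WORD = Pattern.compile("(?<=special:)\\w+");

    static List<String> specialWords(String str) {
        List<String> words = new ArrayList<>();
        Matcher m = SPECIAL_WORD.matcher(str);
        while (m.find()) {
            words.add(m.group()); // only the word, without "special:"
        }
        return words;
    }

    public static void main(String[] args) {
        String str = "Hello wonderful special:world what a great special:day";
        System.out.println(specialWords(str)); // [world, day]
    }
}
```

Passing the same regex to split() would treat world and day as the delimiters, which is why the original attempt removed exactly the words that were wanted.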

Use Pattern and Matcher. A simple example:
public static ArrayList<String> parseOut(String s)
{
ArrayList<String> list = new ArrayList<String>();
String regex = "([:])(\\w+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
list.add(matcher.group().substring(1));
}
return list;
}

Try Pattern and Matcher
String searchPattern = "Hello wonderful special:world what a great special:day";
Pattern pa = Pattern.compile(":[a-zA-Z0-9]+");
Matcher ma = pa.matcher(searchPattern);
while(ma.find()){
System.out.println(ma.group().replaceFirst(":",""));
}
output:
world
day
Using split(), we can do it like this:
String searchPattern1 = "Hello wonderful special:world what a great special:day";
for(String i:searchPattern1.split("\\s")){
if(i.contains(":")){
System.out.println(i.split(":")[1]);
}
}
Here also we get the same output as above.

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?

You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero-width assertions to find the places where the string should be cut: a lookbehind (preceded by a digit, [0-9]) and a lookahead (followed by a letter, [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
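A quick way to see that no characters are lost is to run the split and print the pieces, as in the sketch below. The class name ZeroWidthSplit and the method splitTokens are hypothetical names for this example.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class ZeroWidthSplit {
    // Splits at every position that has a digit on its left and a
    // lowercase letter on its right; the match itself is empty,
    // so nothing is removed from the input.
    static String[] splitTokens(String s) {
        return s.split("(?<=[0-9])(?=[a-z])");
    }

    public static void main(String[] args) {
        String[] result = splitTokens("xyz213123kop234430099kpf4532");
        System.out.println(Arrays.toString(result)); // [xyz213123, kop234430099, kpf4532]
    }
}
```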

You could split at each position that lies between a digit and a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<![^\\d])(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532

There's nothing in your string that matches the regular expression, because your expression starts with ^ (beginning of string) and ends with $ (end of string), so it could only match the whole string or nothing at all. (In addition, the surrounding / characters are not regex delimiters in Java; they are treated as literal slashes, which your string does not contain.) Because nothing matches, split never finds a delimiter, and that's why you get just one big token.
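This can be verified with a short sketch that splits the string with the original regex and with a variant that drops the slashes. The class name AnchoredSplit and the method splitWith are invented for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class AnchoredSplit {
    private static final String INPUT = "xyz213123kop234430099kpf4532";

    static String[] splitWith(String regex) {
        return INPUT.split(regex);
    }

    public static void main(String[] args) {
        // Regex from the question: literal slashes plus ^...$ anchors, so nothing matches.
        System.out.println(Arrays.toString(splitWith("/^[a-zA-z]+[0-9]+$/")));
        // Without the slashes the anchors still demand letters-then-digits for the
        // WHOLE string, which fails because more letters follow the first digits.
        System.out.println(Arrays.toString(splitWith("^[a-zA-z]+[0-9]+$")));
    }
}
```

In both cases split finds no delimiter, so each call returns a one-element array holding the original string.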

You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)
