Get a substring from string multiple times - java

I have a String of unknown length, and I don't know which characters it contains.
I want to search the string and get every substring enclosed in double quotes ("...").
I tried to use Pattern.compile, but it always returns an empty string:
Pattern p = Pattern.compile("\".\"");
Matcher m = p.matcher(mystring);
while(m.find()){
System.out.println(m.group().toString());
}
How can I do it?

Use .+? to match all characters between the quotes:
Pattern p = Pattern.compile("\".+?\"");
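The .+ specifies that you want one or more characters inside the quotation marks. The ? makes it a reluctant (lazy) quantifier. It stops at the first closing quote it reaches, so each quoted string comes back as a separate match. Without it, you would get one long match running from the first quote to the last.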
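The difference between a greedy and a reluctant quantifier is easiest to see by running both on the same input. The sketch below also runs the question's one-character pattern for comparison. The class name GreedyVsReluctant and the allMatches helper are illustrative and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects every match of regex in input, in the order find() reports them.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "speak \"friend\" and \"enter\"";
        // Greedy .+ runs on to the LAST quote: one match, ["friend" and "enter"]
        System.out.println(allMatches("\".+\"", input));
        // Reluctant .+? stops at the FIRST closing quote: ["friend", "enter"]
        System.out.println(allMatches("\".+?\"", input));
        // The question's pattern allows exactly one character between quotes: []
        System.out.println(allMatches("\".\"", input));
    }
}
```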
Unit test example:
@Test
public void test() {
String test = "speak \"friend\" and \"enter\"";
Pattern p = Pattern.compile("\".+?\"");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group().toString().replace("\"", ""));
}
}
Output:
friend
enter
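A capturing group can return only the text between the quotes, so there is no need to strip them afterwards with replace. The sketch below is a variant that swaps the reluctant .+? for a negated character class [^"]*. It assumes the quoted text never contains escaped quotes (\"), which this pattern does not handle. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    // A quote, then any run of non-quote characters captured in group 1, then a quote.
    // [^"]* cannot run past a closing quote, so no reluctant quantifier is needed,
    // and an empty pair of quotes yields an empty string.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            found.add(m.group(1)); // group 1 excludes the surrounding quotes
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : extract("speak \"friend\" and \"enter\"")) {
            System.out.println(s);
        }
    }
}
```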

That is because your regex searches for exactly one character between the two quotes. To match more characters, add a quantifier, for example "\".*?\"". A bare ".?" only allows zero or one character.

Related

How to match a string between two same delimiters?

some-string-test-moretext.csv
I want to extract the string test, which always sits between the 2nd and 3rd - delimiters.
The expression [-](.*?)[-] matches -string-, so it's probably close, but how can I move on to the next match?
If it matters, I'm using Java.
If you know the number of delimiters in advance, you can just split the String.
String[] test = {
"some-string-test-moretext.csv",
"another-string-test-andthensome.csv"
};
for (String s: test) {
System.out.println(s.split("-")[2]);
}
Output
test
test
This should give you quite a good head start:
[^-]+-[^-]+-(.*?)-[^-]+\.csv
https://regex101.com/r/YjWDkv/1
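The regex above is shown without Java code. Here is a minimal sketch of using it from Java, assuming the whole file name has to fit the pattern, which is why it calls matches() rather than find(). The class name ThirdField and the thirdField helper are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThirdField {
    // Two hyphen-free runs, a lazily captured third field, then one more
    // hyphen-free run ending in ".csv" (the regex from the answer above).
    private static final Pattern FIELD =
            Pattern.compile("[^-]+-[^-]+-(.*?)-[^-]+\\.csv");

    // Returns the captured third field, or null if the name does not fit the shape.
    static String thirdField(String fileName) {
        Matcher m = FIELD.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdField("some-string-test-moretext.csv"));
        System.out.println(thirdField("another-string-test-andthensome.csv"));
    }
}
```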
I would propose this short regex solution:
String str = "some-string-test-moretext.csv\n";
Matcher m = Pattern.compile("\\w+-\\w+-(\\w+).*").matcher(str);
String res = m.find() ? m.group(1) : "";
System.out.println(res);
Of course, String.split() is another way:
String res = str.split("-")[2];
In sed:
$ echo 'some-string-test-moretext.csv' | sed 's/[^-]*-[^-]*-\([^-]*\)-.*/\1/'
test
[^-]* means "zero or more occurrences of any character except -". Let's call that notHyphen. So we're matching notHyphen-notHyphen-\(notHyphen\)-.* and replacing the whole match with \1, that is, whatever the \(\) captured.
In Java, you won't need to escape ( to \(, and the technique for extracting from capturing groups is different:
Pattern patt = Pattern.compile("[^-]*-[^-]*-([^-]*)-.*");
Matcher m = patt.matcher(filename);
String extracted = null;
if (m.matches()) {
extracted = m.group(1);
}
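The question also asks how to move on to the next match. With [-](.*?)[-], the first find() consumes the hyphen that closes -string-, so the next find() cannot start with that hyphen. On this input, a second find() then finds nothing at all. One alternative, not taken from the answers above, is to check the hyphens with lookbehind and lookahead so they are not consumed. Each find() then continues with the next segment. This is a minimal sketch, assuming the wanted text is the second hyphen-delimited segment. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenHyphens {
    // (?<=-) and (?=-) require a hyphen on each side without consuming it,
    // so the hyphen that ends one match can also start the next one.
    private static final Pattern SEGMENT = Pattern.compile("(?<=-)[^-]*(?=-)");

    static List<String> segments(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = SEGMENT.matcher(input);
        while (m.find()) { // each find() resumes where the previous match ended
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = segments("some-string-test-moretext.csv");
        System.out.println(parts);        // [string, test]
        System.out.println(parts.get(1)); // test: between the 2nd and 3rd hyphen
    }
}
```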

Is it possible to use a quantifier on a non-capturing group? - regex

I match a group with a regex, and I would like to capture
everything except the group(s).
The group can occur several times, at different positions in the String.
My first thought was to solve it with a negative lookahead, but that failed. Then I tried a non-capturing group, and I got stuck there too.
(bar) (baz) foo
I want foo.
This is what I have so far:
String input = "(bar) (baz) foo";
String matchesGroup = "((?=\\().*?\\))"; //matches (...)
// as Casimir et Hippolyte commented, I now use
// ((?:(...))+) for the non-capturing group
String matchesFoo = "((?:"+ matchesGroup +")+)\\s(.*)";
Pattern pattern = Pattern.compile(matchesFoo);
Matcher matcher = pattern.matcher(input);
while (matcher.find()){
System.out.println(matcher.group());
}
but nothing is captured at all:
actual:
expected: foo
What is wrong with my regex?
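Since you want to match multiple (...) groups, account for the possible trailing space and move the + so that it quantifies one or more of those groups. I moved the space into the group and put the + on that whole structure. The text after the groups is then captured in group 1, so read it with matcher.group(1) rather than matcher.group():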
String matchesFoo = "(?:(?:(?=\\().*?\\))\\s?)+(.*)";
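Here is a runnable check of that regex on the question's input. It is a minimal sketch; the class name TextAfterGroups and the rest helper are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextAfterGroups {
    // One or more "(...)" blocks, each optionally followed by one whitespace
    // character, all inside non-capturing groups; only the trailing text is captured.
    private static final Pattern P =
            Pattern.compile("(?:(?:(?=\\().*?\\))\\s?)+(.*)");

    // Returns the text after the parenthesised groups, or null if there are none.
    static String rest(String input) {
        Matcher m = P.matcher(input);
        // group() would be the whole match; group(1) is only the captured tail.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(rest("(bar) (baz) foo")); // foo
    }
}
```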
Why don't you try this:
String testString = "matches matches foo";
testString = testString.replaceAll("matches", "");
System.out.println(testString);

multiple regex matches in a string

I have the following text:
bla [string1] bli [string2]
I'd like to match string1 and string2 with a regex in a loop in Java.
How do I do this?
My code so far matches only the first one, string1, but not string2:
String sRegex="(?<=\\[).*?(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
if (m.find())
{
String sString1 = m.group(0);
String sString2 = m.group(1); // << no match
}
Your regex does not define any capturing groups, so this call will throw an IndexOutOfBoundsException:
m.group(1);
You can just use:
String sRegex="(?<=\\[)[^]]*(?=\\])";
Pattern p = Pattern.compile(sRegex); // create the pattern only once,
Matcher m = p.matcher(sFormula);
while (m.find()) {
System.out.println( m.group() );
}
Also, replace if with while so that find() runs repeatedly and returns every match.
Your approach is confused. Either write your regex so that it matches two [....] sequences in the one pattern, or call find multiple times. Your current attempt has a regex that "finds" just one [...] sequence.
Try something like this:
Pattern p = Pattern.compile("\\[([^\\]]+)]");
Matcher m = p.matcher(formula);
if (m.find()) {
String string1 = m.group(1); // group 1 is the text inside the brackets
if (m.find()) { // continues searching after the end of the first match
String string2 = m.group(1);
}
}
Or generalize using a loop and an array of String for the extracted strings.
(You don't need any fancy lookbehind patterns in this case. And ugly "Hungarian notation" is frowned upon in Java, so get out of the habit of using it.)
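Here is a minimal sketch of the generalization mentioned above: a loop that collects every bracketed string. It uses a List<String> instead of an array so the number of matches need not be known in advance. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketContents {
    // "[", then one or more non-"]" characters captured in group 1, then "]".
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]+)]");

    static List<String> contents(String formula) {
        List<String> found = new ArrayList<>();
        Matcher m = BRACKETED.matcher(formula);
        while (m.find()) { // one iteration per [...] sequence
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(contents("bla [string1] bli [string2]")); // [string1, string2]
    }
}
```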

Why does this pattern matching code not work?

I'm trying to do some pattern matching in Java:
Pattern p = Pattern.compile("(\\d+) (\\.+)");
Matcher m = p.matcher("5 soy milk");
String qty = m.group(1);
String name = m.group(2);
I want to end up with one string that contains "5" and one string that contains "soy milk". However, this pattern matching code gives me an IllegalStateException.
You have to call matches() before you attempt to get the groups.
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#matches()
public boolean matches()
Attempts to match the entire region against the pattern.
If the match succeeds then more information can be obtained via the start, end, and group methods.
Try this:
Pattern p = Pattern.compile("(\\d+) (.+)"); // ".+" matches any characters; "\\.+" would only match dots
Matcher m = p.matcher("5 soy milk");
if (m.matches())
{
String qty = m.group(1);
String name = m.group(2);
}
This is because you never run the Matcher. You should call p.matcher(...).matches() (or .find(), or .lookingAt(), depending on the desired behavior). Real regex matching, i.e. searching anywhere in the input, is done with .find().
And check the result of .matches(), since in your case it returns false. \.+ ("\\.+" in a Java string) tries to match a literal dot one or more times. You should use .+ (".+" in a Java string) to match "any character, one or more times".
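Here is a minimal sketch that combines both answers: it uses (.+) instead of (\.+), and it calls matches() before group(...). It also reproduces the original IllegalStateException. The class name QuantityParser and its helpers are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantityParser {
    // (\d+) captures the digits; after one space, (.+) captures the rest of the line.
    private static final Pattern LINE = Pattern.compile("(\\d+) (.+)");

    // Returns {quantity, name}, or null if the input does not have that shape.
    static String[] parse(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.matches()) { // a match operation must succeed before group(...)
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Reproduces the question's failure: group(...) before any match attempt.
    static boolean groupBeforeMatchThrows() {
        Matcher m = LINE.matcher("5 soy milk");
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] parts = parse("5 soy milk");
        System.out.println(parts[0]);                 // 5
        System.out.println(parts[1]);                 // soy milk
        System.out.println(groupBeforeMatchThrows()); // true
    }
}
```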

Pattern/Matcher group() to obtain substring in Java?

UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I thought it should do. When I switched to m.find() instead, and also adjusted the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)
You use m.matches(), which tries to match the whole string. If you use m.find() instead, it searches for a match anywhere inside the string. I also improved your regex a bit to exclude the TZID= prefix using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}
This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // capturing groups are numbered from 1; group 0 is the whole match
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.
You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches anything at the beginning, up to TZID=. Then TZID= matches literally and the group begins, matching everything up to the :. The group closes there, : matches, and .* matches the rest of the String. Now you can get what you need from group(1).
You are missing a dot before the asterisk. As written, your expression matches TZI followed by any number of uppercase Ds and then a colon.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group, unless you want to capture everything, including the "TZID" and the ":".
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City
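Several answers here turn on the difference between matches(), lookingAt() and find(). The sketch below checks it directly on the question's input. It uses this answer's pattern and the padded pattern from the earlier answer, and assumes nothing beyond java.util.regex. The class name and helpers are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    static final String S = "DTSTART;TZID=America/Mexico_City:20121125T153000";

    // Covers only the interesting part of the input.
    static final Pattern PARTIAL = Pattern.compile("TZID=([^:]*):");
    // Padded with .*? and .* so that it covers the whole input.
    static final Pattern WHOLE = Pattern.compile(".*?TZID=([^:]+):.*");

    // Group 1 after a successful find(), else null.
    static String zoneByFind(String s) {
        Matcher m = PARTIAL.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // Group 1 after a successful matches(), else null.
    static String zoneByMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(PARTIAL.matcher(S).matches());   // false: must cover the whole input
        System.out.println(PARTIAL.matcher(S).lookingAt()); // false: must match from index 0
        System.out.println(zoneByFind(S));                  // America/Mexico_City
        System.out.println(zoneByMatches(WHOLE, S));        // America/Mexico_City
    }
}
```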
Why not simply use split:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];
