Extracting a string using Regex


I have the following code to extract the string within double quotes using Regex.
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
The output I get now is Java and programming. But from the String str I want only the content of the second pair of double quotes, which is programming. Can anyone tell me how to do that using regex?

If you take your example, and change it slightly to:
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile("\"([^\"]*)\"");
final Matcher matcher = pattern.matcher(str);
int i = 0;
while(matcher.find()){
System.out.println("match " + ++i + ": " + matcher.group(1));
}
You should find that it prints:
match 1: Java
match 2: programming
This shows that you are able to loop over all of the matches. If you only want the last match, then you have a number of options:
Store the match in the loop, and when the loop is finished, you have the last match.
Change the regex to skip everything before the final match by adding a greedy prefix, with something like: Pattern.compile(".*\"([^\"]*)\""). Group 1 then holds the last quoted string.
If you explicitly want the second match, then the simplest solution is something like Pattern.compile("\"([^\"]*)\"[^\"]*\"([^\"]*)\""). This gives two capturing groups, and the second match is in group(2).
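The first option, and a counting variant of it, can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the class name QuotedTokens and the helper methods nth and last are hypothetical names introduced only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokens {
    // Same pattern as in the question: group 1 is the text between a pair of double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the n-th (1-based) quoted token, or null if there are fewer than n tokens.
    static String nth(String input, int n) {
        Matcher matcher = QUOTED.matcher(input);
        int count = 0;
        while (matcher.find()) {
            if (++count == n) {
                return matcher.group(1);
            }
        }
        return null;
    }

    // Returns the last quoted token, or null if there is none.
    static String last(String input) {
        Matcher matcher = QUOTED.matcher(input);
        String result = null;
        while (matcher.find()) {
            result = matcher.group(1); // overwritten on every match, so the last one wins
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "\"Java\",\"programming\"";
        System.out.println(nth(str, 2));  // programming
        System.out.println(last(str));    // programming
    }
}
```

Counting matches keeps the regex itself simple, which may be easier to maintain than a longer pattern with two groups.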

If you want the last token inside double quotes, add an end-of-line anchor ($):
final Pattern pattern = Pattern.compile("\"([^\"]*)\"$");
In this case, you can replace while with if, provided your input is a single line.

Great answer from Paul. You can also try this pattern:
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
Java program
String str ="\"Java\",\"programming\"";
final Pattern pattern = Pattern.compile(",\"(\\w+)\"");
final Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Explanation
,\": matches a comma, followed by a quotation mark "
(\\w+): matches one or more word characters (letters, digits, or underscores)
\": matches the closing quotation mark "
The text matched by (\\w+) is then captured as group 1
Output
programming

Related

java.util.regex matching parentheses

I am using java.util.regex for matching, as shown below:
public static void main(String[] args) {
String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
String patternStr = "I love you (LT):";
String noParentStr = "I love you";
Pattern pattern = Pattern.compile(patternStr);
Pattern noParentPattern = Pattern.compile(noParentStr);
Matcher matcher = pattern.matcher(input);
Matcher noParrentTheseMatcher = noParentPattern.matcher(input);
System.out.println("result:" + matcher.find());
System.out.println("result no parenthese:" + noParrentTheseMatcher.find());
}
I can see that the input string contains patternStr "I love you (LT):". But I get the result
result:false
result no parenthese:true
How can I match a string that contains the parentheses '(' and ')'?

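In regex, parentheses are metacharacters, i.e. they are reserved for special use, specifically a feature called "capture groups".
Try escaping each parenthesis with a \ before it:
I love you \(LT\):
In a Java string literal the backslash itself must be doubled, so the pattern is written as "I love you \\(LT\\):".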
List of all special characters that need to be escaped in a regex

As it has been pointed out in the comments, you don't need to use a regex to check if your input String contains I love you (LT):. In fact, there is no actual pattern to represent, only a character by character comparison between a portion of your input and the string you're looking for.
To achieve what you want, you could use the contains method of the String class, which suits your needs perfectly.
String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
String strToLookFor = "I love you (LT):";
System.out.println("Result w Contains: " + input.contains(strToLookFor)); //Returns true
If you actually need a regex because it is a requirement, then, as @Yarin already said, you need to escape the parentheses, since they have a special meaning: they delimit capturing groups.
String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
String strPattern = "I love you \\(LT\\):";
Pattern pattern = Pattern.compile(strPattern);
Matcher matcher = pattern.matcher(input);
System.out.println("Result w Pattern: " + matcher.find()); //Returns true
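When the text to search for is not known in advance, escaping by hand is error-prone. One possible alternative is Pattern.quote from java.util.regex, which turns arbitrary text into a pattern that matches it literally. The sketch below assumes that alternative; the class name LiteralSearch and the method containsLiteral are hypothetical.

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // Pattern.quote wraps the text so that every character is matched literally.
    static boolean containsLiteral(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "<b>I love you (LT): </b>xxxxxxxxxxxxxxxxxxxxxxxxx";
        System.out.println(containsLiteral(input, "I love you (LT):")); // true

        // Without quoting, (LT) is a capturing group that matches just "LT",
        // so the pattern looks for "I love you LT:" and finds nothing.
        System.out.println(Pattern.compile("I love you (LT):").matcher(input).find()); // false
    }
}
```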

How to parse a range input in java

I want to parse a range of data (e.g. 100-2000) in Java. Is this code correct:
String patternStr = "^(\\\\d+)-(\\\\d+)$";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.find()){
// parse the range here
}

Too many backslashes, and you can use matches() without anchors (^$).
String inputStr = "100-2000";
String patternStr = "(\\d+)-(\\d+)";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
if (matcher.matches()) {
System.out.println(matcher.group(1) + " - " + matcher.group(2));
}
As for your question "Is this code correct", all you had to do was wrap the code in a class with a main method and run it, and you'd get the answer: No.

No, you're double (well, quadruple)-escaping the digits.
It should be: "^(\\d+)-(\\d+)$".
Meaning:
Start of input: ^
Group 1: 1+ digit(s): (\\d+)
Hyphen literal: -
Group 2: 1+ digit(s): (\\d+)
End of input: $
Notes
The groups are useful for back-references and for reading the two numbers back with matcher.group(1) and matcher.group(2). If you need neither, you can drop the parentheses around the \\d+ expressions.
You are parsing the textual representation of a range in this example.
If you want an actual character range inside the regex, use a character class of the form [min-max], where "min" and "max" are single characters, for instance [0-9].
As mentioned by Andreas, matches() has to match the whole input, so ^ and $ become redundant. String.matches does the same check without the Pattern-Matcher idiom.
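Putting the corrected pattern to work, a range such as 100-2000 could be turned into two integers roughly like this. It is a sketch under assumptions: the class RangeParser, the method parse, and the choice to return null for malformed input are illustrative and not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeParser {
    private static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");

    // Returns {min, max}, or null if the input is not of the form "digits-digits".
    static int[] parse(String input) {
        Matcher matcher = RANGE.matcher(input);
        if (!matcher.matches()) {   // matches() requires the whole input to fit the pattern
            return null;
        }
        // Integer.parseInt throws NumberFormatException for values beyond the int range.
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2))
        };
    }

    public static void main(String[] args) {
        int[] range = parse("100-2000");
        System.out.println(range[0] + " to " + range[1]); // 100 to 2000
        System.out.println(parse("100-") == null);         // true
    }
}
```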

Android Java regexp pattern

I ping a host and get its standard output as the result. Below is my REGEXP, but it does not work correctly. Where did I make a mistake?
String REGEXP ="time=(\\\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}

You only need \\d+ in your regex. Some notes:
Matcher looks for the pattern it was created from and tries to find every occurrence of that pattern in the string being matched.
Use while (matcher.find()) in case of multiple occurrences, and read each match with matcher.group(1).
Each () represents a capturing group.

You have too many backslashes. Assuming you want to get the number from a string like "time=32ms", then you need:
String REGEXP ="time=(\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
Explanation: The search pattern you are looking for is "\d", meaning a decimal digit; the "+" means one or more occurrences.
To get the "\" to the matcher, it needs to be escaped in the Java string literal, and the escape character is also "\".
The parentheses define the capturing group that you want to pick out.
With "\\\\d+", the matcher sees this as "\\d+", which would match a backslash followed by one or more "d"s. The first backslash protects the second backslash, and the third protects the fourth.
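The difference between the two spellings can be checked directly. In the sketch below the ping line is a made-up sample, and the class PingTime and its method extract are hypothetical helpers used only to run both patterns against the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PingTime {
    // Returns group 1 of the first match, or null if the pattern does not occur in the text.
    static String extract(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=32ms"; // made-up sample

        // Two backslashes in the literal: the regex engine sees \d+ (one or more digits).
        System.out.println(extract("time=(\\d+)ms", line));              // 32

        // Four backslashes: the engine sees \\d+ (a literal backslash, then one or more d's).
        System.out.println(extract("time=(\\\\d+)ms", line));            // null
        System.out.println(extract("time=(\\\\d+)ms", "time=\\dddms"));  // \ddd
    }
}
```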

Pattern/Matcher group() to obtain substring in Java?

UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)

You use m.matches(), which tries to match the whole string. If you use m.find(), it will search for a match inside the string. I also improved your regexp a bit to exclude the TZID= prefix, using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}

This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // group 0 is the whole match; capturing groups are numbered from 1
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.

You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches anything at the beginning, up to TZID=. Then TZID= matches, and a group begins that matches everything up to the colon. The group closes there, the : matches, and .* matches the rest of the string. Now you can get what you need from group(1).

You are missing a dot before the asterisk. As written, your expression matches TZI followed by any number of uppercase Ds and then a colon.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City
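The contrast between the two Matcher methods can be seen by running the same pattern through both. This is only a sketch; the class FindVersusMatches and its two helper methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    private static final Pattern TZID = Pattern.compile("TZID=([^:]*):");

    // matches() succeeds only if the pattern covers the entire string.
    static boolean wholeStringMatches(String s) {
        return TZID.matcher(s).matches();
    }

    // find() scans the string for the first region that fits the pattern.
    static String timeZone(String s) {
        Matcher m = TZID.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        System.out.println(wholeStringMatches(s)); // false
        System.out.println(timeZone(s));           // America/Mexico_City
    }
}
```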

Why not simply use split, as in:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];

forming correct regular expression in dynamic string

I have a FileInputStream that reads a file which somewhere contains a string subset looking like:
...
OperatorSpecific(XXX)
{
Customer(someContent)
SaveImage()
{
...
I would like to identify the Customer(someContent) part of the string and switch the someContent inside the parenthesis for something else.
someContent will be a dynamic parameter and will contain a string of maybe 5-10 chars.
I have used regEx before, like once or twice, but I feel that in a context such as this where I don't know what value will be inside the parenthesis I'm at a loss of how I should express it...
In summary I want to have a string returned to me which has my someContent value inside the Customer-parenthesis.
Does anyone have any bright ideas of how to get this done?

Try this one (double the backslashes when you use it in Java!):
(?<=Customer\()[^\)]*
And replace with your content.
See it here at Regexr
(?<=Customer\() is a lookbehind assertion. At every position it checks whether there is a "Customer(" on the left. If there is, [^\)]* matches all following characters that are not a ")". This is the part that will be replaced.
Some working Java code:
Pattern p = Pattern.compile("(?<=Customer\\()[^\\)]*");
String original = "Customer(someContent)";
String replacement = "NewContent";
Matcher m = p.matcher(original);
String result = m.replaceAll(replacement);
System.out.println(result);
This will print
Customer(NewContent)
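Because the new content is dynamic, it may contain $ or \, which replaceAll treats as special characters in the replacement string. A possible safeguard is Matcher.quoteReplacement, sketched below; the class CustomerReplacer and the method replaceCustomer are hypothetical names, and the multi-line sample mirrors the file excerpt from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustomerReplacer {
    // Lookbehind for "Customer(", then everything up to (not including) the next ")".
    private static final Pattern CUSTOMER_ARG = Pattern.compile("(?<=Customer\\()[^)]*");

    static String replaceCustomer(String text, String newContent) {
        // quoteReplacement escapes $ and \ so that they are inserted literally.
        return CUSTOMER_ARG.matcher(text).replaceAll(Matcher.quoteReplacement(newContent));
    }

    public static void main(String[] args) {
        String text = "OperatorSpecific(XXX)\n{\nCustomer(someContent)\nSaveImage()\n{";
        System.out.println(replaceCustomer(text, "NewContent"));
        System.out.println(replaceCustomer("Customer(someContent)", "$5 off")); // Customer($5 off)
    }
}
```

Only the text inside Customer(...) changes; OperatorSpecific(XXX) and SaveImage() are left alone because the lookbehind does not match there.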

Using a capturing group with a non-greedy quantifier works:
String s =
"OperatorSpecific(XXX)\n {\n" +
" Customer(someContent)\n" +
" SaveImage() {";
Pattern p = Pattern.compile("Customer\\((.*?)\\)");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
will print
someContent

Untested, but something like the following should work:
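Pattern pattern = Pattern.compile("\\s+Customer\\(\\s*(\\w+)\\s*\\)\\s*");
Matcher matcher = pattern.matcher(input);
// find() searches inside the input; matches() would require the whole input to fit the pattern
if (matcher.find()) {
System.out.println(matcher.group(1));
}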
EDIT
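This of course won't work in all possible cases. \w matches only letters, digits and underscores, so a value containing any other character is not matched:
// matched, since \w includes the underscore
Customer(_someContent)
// not matched, because of the $
Customer($some_Content)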
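If such values need to be supported, one option is to accept any character other than the closing parenthesis instead of \w. The following is a sketch of that idea, not a tested drop-in; the class CustomerValue and the method value are hypothetical, and it assumes the content itself never contains a ")".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustomerValue {
    // [^)]*? takes any characters except ")"; the surrounding \s* trims optional whitespace.
    private static final Pattern CUSTOMER = Pattern.compile("Customer\\(\\s*([^)]*?)\\s*\\)");

    // Returns the text inside Customer(...), or null if no such call is present.
    static String value(String input) {
        Matcher matcher = CUSTOMER.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(value("Customer(_someContent)"));   // _someContent
        System.out.println(value("Customer($some_Content)"));  // $some_Content
        System.out.println(value("Customer( padded )"));       // padded
    }
}
```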
