This Java program throws an IndexOutOfBoundsException when it tries to invoke group(1). If I replace 1 with 0, the whole line is printed. What do I have to do?
Pattern pattern = Pattern.compile("<abhi> abhinesh </abhi>");
Matcher matcher = pattern.matcher("<abhi> abhinesh </abhi>");
if (matcher.find())
System.out.println(matcher.group(1));
else
System.out.println("Not found");
Group indexes start at 0, and group 0 is the whole match. Your pattern has no capturing group, so use matcher.group(0).
Edit: To match the text between the tags, use this regex: <abhi>(.*)<\\/abhi>
This post may shed more light on your question.
Confused about Matcher Group.
In short you haven't defined any regular expression grouping to reference an alternate group. You only have the full matching string.
If you add capturing groups to the regular expression to parse the XML, as below, you'll notice that group 0 has the full string, 1 has the begin tag, 2 has the value, and 3 has the end tag.
Pattern pattern = Pattern.compile("<([a-z]+)>([a-z ]+)</([a-z]+)>");
Matcher matcher = pattern.matcher("<abhi> abhinesh </abhi>");
if (matcher.find()){
System.out.println(matcher.group(0));//<abhi> abhinesh </abhi>
System.out.println(matcher.group(1));//abhi
System.out.println(matcher.group(2));// abhinesh
System.out.println(matcher.group(3));//abhi
}else{
System.out.println("Not found");
}
Try this regex:
<abhi>(.*)<\\/abhi>
The text you're after will be stored in the first capture group.
Example:
String regex = "<abhi>(.*)<\\/abhi>";
String input = "<abhi>foo</abhi>";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
if (m.find()) {
System.out.println(m.group(1));
}
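A minimal sketch of the point made in the answers above: group(0) is always available after a successful match, while group(1) exists only if the pattern contains a capturing group. The class name GroupDemo and the helper firstGroupOrWhole are made up for illustration; groupCount() is the standard Matcher method that reports how many capturing groups the pattern defines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns group(1) when the pattern defines one,
    // otherwise falls back to the whole match; null when nothing matches.
    static String firstGroupOrWhole(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.groupCount() >= 1 ? m.group(1) : m.group(0);
    }

    public static void main(String[] args) {
        String input = "<abhi> abhinesh </abhi>";
        // No parentheses in the pattern: only group(0) is available.
        System.out.println(firstGroupOrWhole("<abhi> abhinesh </abhi>", input));
        // One capturing group: group(1) holds the text between the tags.
        System.out.println(firstGroupOrWhole("<abhi>(.*)</abhi>", input));
    }
}
```

Checking groupCount() before calling group(1) is one way to avoid the IndexOutOfBoundsException from the question when the pattern comes from somewhere you do not control.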
Related
I have a string as follows:
"ValueFilter("val1") AND ColumnFilter("val2") AND ValueFilter("val3")"
I have stored the following regexes in an array, and I try to match each pattern in a for loop:
"ValueFilter\\((.*?)\\)","ColumnFilter\\((.*?)\\)"
What I want to do is replace the value inside the brackets and copy the result to a new string.
When I run the regexes above against the string, the first loop iteration uses the ValueFilter pattern, so it matches both ValueFilter occurrences before ColumnFilter is tried. But I want to process the matches in order.
Here is what I want to achieve:
First I want to match ValueFilter, then ColumnFilter, then ValueFilter again. How can I achieve this?
Edit : Added Code
String expr = "\"ValueFilter(\"val1\") AND ColumnFilter(\"val2\") AND ValueFilter(\"val3\")\"";
String[] patterns = {"ValueFilter\\((.*?)\\)", "ColumnFilter\\((.*?)\\)"};
for (String pattern : patterns) {
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(expr);
while (m.find()) {
//do something
}
}
Expected Output
ValueFilter("val1")
ColumnFilter("val2")
ValueFilter("val3")
You can use the regex [XY]Filter\((.*?)\) with Pattern, and loop through the matches like this:
String str = "\"XFilter(\"val1\") AND YFilter(\"val2\") AND XFilter(\"val3\")\"";
String regex = "[XY]Filter\\((.*?)\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
Note that [XY] is a character class that matches either X or Y.
Output
XFilter("val1")
YFilter("val2")
XFilter("val3")
If you want only the value, print matcher.group(1) instead; the output should be:
"val1"
"val2"
"val3"
Edit
what if I have filtername as "ValueFilter" and "ColumnFilter" instead
of X and Y
In this case you can use (Value|Column) instead of [XY], which matches either ValueFilter or ColumnFilter. Note that this adds a capturing group, so the value in the brackets is now group 2. The regex should look like this:
String str = "\"ValueFilter(\"val1\") AND ColumnFilter(\"val2\") AND ValueFilter(\"val3\")\"";
String regex = "(Value|Column)Filter\\((.*?)\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output
ValueFilter("val1")
ColumnFilter("val2")
ValueFilter("val3")
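The question also mentions replacing the value inside the brackets and copying the result to a new string. One way to do that in a single left-to-right pass is sketched below, using Matcher.appendReplacement and Matcher.appendTail. The class name FilterRewriter and the upper-casing rule are assumptions for illustration only; substitute whatever replacement logic you actually need.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterRewriter {
    private static final Pattern FILTER =
            Pattern.compile("(Value|Column)Filter\\((.*?)\\)");

    // Rewrites the argument of every filter, in order of appearance.
    // Upper-casing is only a stand-in for the real replacement rule.
    static String rewrite(String expr) {
        Matcher m = FILTER.matcher(expr);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1); // "Value" or "Column"
            String arg = m.group(2);  // text between the parentheses
            String replacement = name + "Filter(" + arg.toUpperCase(Locale.ROOT) + ")";
            // quoteReplacement keeps '$' and '\' in the new text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String expr = "ValueFilter(\"val1\") AND ColumnFilter(\"val2\") AND ValueFilter(\"val3\")";
        System.out.println(rewrite(expr));
    }
}
```

Because a single alternation pattern is used, the matches arrive in the order they occur in the input, which is the ordering the question asks for.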
String s = "aaa-bbb-ccc-ddd-ee-23-xyz";
I need to convert the above string into aaa-bbb-ccc-ddd-ee, which means my output should only print the words before the fifth delimiter. Could anyone help me solve this?
You could use a Regex:
String s = "aaa-bbb-ccc-ddd-ee-23-xyz";
Pattern p = Pattern.compile("^\\w+\\-\\w+\\-\\w+\\-\\w+\\-\\w+");
Matcher matcher = p.matcher(s);
matcher.find();
System.out.println(matcher.group(0));
Output is aaa-bbb-ccc-ddd-ee
If the fields can contain more than word characters (letters, digits and underscore), you can replace \\w with [^\\-], which matches any character except the delimiter.
Use Pattern and Matcher like this:
String s = "aaa-bbb-ccc-ddd-ee-23-xyz";
Pattern pattern = Pattern.compile("^((.+?-){4}[^-]+).*$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
s = matcher.group(1);
}
.+? - one or more of any character; the ? makes the match lazy
(.+?-) - a character sequence that ends with the symbol '-'
{4} - repeat that group 4 times, because the result string contains '-' 4 times
[^-]+ - after that, one or more characters other than '-'
.* - whatever characters remain after the part you searched for
matcher.group(1) - returns the first group, which is ((.+?-){4}[^-]+)
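The same idea can be generalized to "keep everything before the n-th delimiter" by building the repetition count into the pattern. This is only a sketch: the class name PrefixDemo and the helper firstFields are hypothetical, and the delimiter is fixed to '-' as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixDemo {
    // Hypothetical helper: keeps the first n dash-separated fields of s.
    // Returns s unchanged when it has fewer than n fields.
    static String firstFields(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // (n - 1) fields, each followed by '-', then one more field.
        Pattern p = Pattern.compile("^(?:[^-]*-){" + (n - 1) + "}[^-]*");
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : s;
    }

    public static void main(String[] args) {
        System.out.println(firstFields("aaa-bbb-ccc-ddd-ee-23-xyz", 5));
    }
}
```

Using the negated class [^-] instead of a lazy .+? means each field can never run across a delimiter, so the pattern needs no backtracking to find the field boundaries.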
I started with regex in Java recently, and I can't wrap my head around this problem.
Pattern p = Pattern.compile("[^A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
System.out.println("Matched.");
} else {
System.out.println("Did not match.");
}
Result: Did not match. (Unexpected result)
I get the output "Did not match." This is strange to me. According to https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html,
I'm using X+, which matches "one or more times".
I thought my code in words would go something like this:
"Check if there is one or more characters in the string "GETs" which does not belong in A to Z."
So I'm expecting the following result:
"Yes, there is one character that does not belong to A-Z in "GETs", the regex was a match."
However, this is not the case, and I'm confused as to why.
I tried the following:
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
System.out.println("Matched.");
} else {
System.out.println("Did not match.");
}
Result: Did not match. (Expected result)
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GET");
if (matcher.matches()) {
System.out.println("Matched.");
} else {
System.out.println("Did not match.");
}
Result: Matched. (Expected result)
Please, explain why my first example did not work.
Matcher.matches returns true only if the ENTIRE region matches the pattern.
For the output you are looking for, use Matcher.find instead.
Explanation of each case:
Pattern p = Pattern.compile("[^A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
Fails because the ENTIRE region 'GETs' does not consist solely of characters outside A-Z ('G', 'E' and 'T' are uppercase)
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GETs");
if (matcher.matches()) {
This fails because the ENTIRE region 'GETs' isn't uppercase
Pattern p = Pattern.compile("[A-Z]+");
Matcher matcher = p.matcher("GET");
if (matcher.matches()) {
The ENTIRE region 'GET' is uppercase, the pattern matches.
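The three cases above, plus the find() variant the asker was expecting, can be checked side by side. A small sketch; the class name MatchModes and the helpers whole and anywhere are made up for illustration, and they simply wrap Matcher.matches() and Matcher.find().

```java
import java.util.regex.Pattern;

class MatchModes {
    // True only if the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if any part of the input matches the regex.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("[^A-Z]+", "GETs"));    // false: G, E, T are uppercase
        System.out.println(anywhere("[^A-Z]+", "GETs")); // true: the "s" matches
        System.out.println(whole("[A-Z]+", "GETs"));     // false: the "s" is lowercase
        System.out.println(whole("[A-Z]+", "GET"));      // true
    }
}
```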
Your very first regex asks to match one or more characters that are not in the uppercase range A-Z. With find(), the match is on the lowercase "s" in GETs.
If you want a regex that matches regardless of UPPERCASE or lowercase, you can use the (?i) flag:
String test = "yes";
String test2= "YEs";
test.matches("(?i).*\\byes\\b.*");
test2.matches("(?i).*\\byes\\b.*");
Both calls return true.
I want to match every file name which ends with .js and is stored in a directory called lib.
Therefore I created the following regular expression: (lib/)(.*?).js$.
I tested the expression (lib/)(.*?).js$ in a Regex Tester and matched this filename: src/main/lib/abc/DocumentHandler.js.
To use my expression in Java, I escaped it to: (lib/)(.*?)\\.js$.
Nevertheless, Java tells me that my expression does not match.
Here is my code:
String regEx = "(lib/)(.*?).js$";
String escapedRegEx = "(lib/)(.*?)\\.js$";
Pattern pattern = Pattern.compile(escapedRegEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
System.out.println("Matches: " + matcher.matches()); // false :-(
Did I forget to escape something?
Use Matcher.find() instead of Matcher.matches() to look for the pattern in any part of the string.
As per Java Doc:
Matcher#matches()
Attempts to match the entire region against the pattern.
Matcher#find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
sample code:
String regEx = "(lib/)(.*)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) { // <== returns true if found
System.out.println("Matches: " + matcher.group());
System.out.println("Path: " + matcher.group(2));
}
output:
Matches: lib/abc/DocumentHandler.js
Path: abc/DocumentHandler
Use Matcher#group(index) to get the text captured by a group, i.e. a part of the regex pattern enclosed in parentheses (...).
You can use the String#matches() method to match the whole string.
String regEx = "(.*)(/lib/)(.*?)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
System.out.println("Matched :" + str.matches(regEx)); // Matched : true
Note: Don't forget to escape the dot (.), which has a special meaning in a regex pattern: it matches any character other than a newline.
Try this RegEx pattern
String regEx = "(.*)(lib\\/)(.*)(\\.js$)";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
It's working for me:
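Firstly, the pattern also matches without escaping the dot (although an unescaped . then matches any character, not only a literal dot), and secondly you are not matching the first part of the string.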
String regEx = "(.*)(lib/)(.*?).js$";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
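To see what difference escaping the dot makes, compare both variants on a path that does not really end in .js. This is a sketch only: the class name DotEscapeDemo, the helper found and the file name notes_js are invented for the example.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // True if the regex matches somewhere in the path (uses find()).
    static boolean found(String regex, String path) {
        return Pattern.compile(regex).matcher(path).find();
    }

    public static void main(String[] args) {
        String escaped = "lib/(.*?)\\.js$";  // literal ".js" at the end
        String unescaped = "lib/(.*?).js$";  // any character followed by "js"

        String real = "src/main/lib/abc/DocumentHandler.js";
        String fake = "src/main/lib/abc/notes_js"; // hypothetical non-JS file

        System.out.println(found(escaped, real));   // true
        System.out.println(found(unescaped, real)); // true
        System.out.println(found(escaped, fake));   // false
        System.out.println(found(unescaped, fake)); // true: '.' matched '_'
    }
}
```

Both patterns accept the file from the question, which is why the unescaped version appears to work; only the escaped one rejects names that merely end in "js".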
I want to break a string like :
String s = "xyz213123kop234430099kpf4532";
into tokens where each token starts with a letter and ends with a number. So the above string can be broken down into 3 tokens:
xyz213123
kop234430099
kpf4532
This string s could be very big but the pattern will remain the same, i.e. each token will start with 3 letters and end with a number.
How do I split them ?
Try this:
\w+?\d+
Java Matcher:
Pattern pattern = Pattern.compile("\\w+?\\d+"); //compiles the pattern we want to use
Matcher matcher = pattern.matcher("xyz213123kop234430099kpf4532"); //we create the matcher on certain string using our pattern
while(matcher.find()) //while the matcher can find the next match
{
System.out.println(matcher.group()); //print it
}
In C#, you could use Regex.Matches:
foreach(Match m in Regex.Matches("xyz213123kop234430099kpf4532", @"\w+?\d+"))
{
Console.WriteLine(m.Value);
}
And for the future this:
RegExr
Do it like this,
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("\\w+?\\d+");
Matcher match = p.matcher(s);
while(match.find()){
System.out.println(match.group());
}
OUTPUT
xyz213123
kop234430099
kpf4532
You can start from such regexp: (\w+?\d+)
http://regexr.com?36utt
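Since \w also matches digits and the underscore, a slightly stricter variant is to spell out "letters, then digits" and collect the tokens into a list. A minimal sketch, assuming the tokens contain only ASCII letters followed by digits as in the question; the class name Tokenizer and the method tokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // One or more ASCII letters followed by one or more digits.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+\\d+");

    // Returns every letters-then-digits token, in order of appearance.
    static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String token : tokens("xyz213123kop234430099kpf4532")) {
            System.out.println(token);
        }
    }
}
```

If every token must start with exactly 3 letters, as the question states, [A-Za-z]{3}\\d+ would be a tighter pattern.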