Java regex pattern matching not working for second occurrence

I am using java.util.regex to match a regular expression against a string. The string is basically an HTML document.
Within that string I have two lines:
<style>templates/style/color.css</style>
<style>templates/style/style.css</style>
My requirement is to get the content inside the style tag (<style>). I am using this pattern:
String stylePattern = "<style>(.+?)</style>";
I then try to get the result using:
Pattern styleRegex = Pattern.compile(stylePattern);
Matcher matcher = styleRegex.matcher(html);
System.out.println("Matcher count : "+matcher.groupCount()+ " and "+matcher.find()); //output 1
if(matcher.find()) {
System.out.println("Inside find");
for (int i = 0; i < matcher.groupCount(); i++) {
String matchSegment = matcher.group(i);
System.out.println(matchSegment); //output 2
}
}
The result I am getting from output 1 as :
Matcher count : 1 and true
And from output 2 as;
<style>templates/style/style.css</style>
Now I am lost after a lot of trying: how do I get both lines? I tried many other suggestions on Stack Overflow itself, but none worked.
I think I am making some conceptual mistake.
Any help would be appreciated. Thanks in advance.
EDIT
I have changed the code to:
Matcher matcher = styleRegex.matcher(html);
//System.out.println("find : "+matcher.find() + "Groupcount = " +matcher.groupCount());
//matcher.reset();
int i = 0;
while(matcher.find()) {
System.out.println(matcher.group(i));
i++;
}
Now the result is:
`<style>templates/style/color.css</style>
templates/style/style.css`
Why does one line include the style tag while the other does not?

This will find all occurrences from your string.
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group());
}

You can try this:
String text = "<style>templates/style/color.css</style>\n" +
"<style>templates/style/style.css</style>";
Pattern pattern = Pattern.compile("<style>(.+?)</style>");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(text.substring(matcher.start(), matcher.end()));
}
Or:
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
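For what it's worth, the behavior in the question seems to follow from two details of Matcher. Every call to find() advances to the next match, so the find() inside the first println already consumes the color.css match. And group(0) is the whole match while group(1) is the first capturing group, so incrementing i across loop iterations prints the full tag for the first match and only the captured path for the second. A minimal sketch, assuming the two-line input from the question; the class name StyleExtractor and the method extractStyles are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StyleExtractor {
    private static final Pattern STYLE = Pattern.compile("<style>(.+?)</style>");

    // Hypothetical helper: returns the text inside every <style>...</style> pair.
    static List<String> extractStyles(String html) {
        List<String> contents = new ArrayList<>();
        Matcher matcher = STYLE.matcher(html);
        // Each find() call moves to the next match, so call it only in the loop condition.
        while (matcher.find()) {
            // group(1) is the first capturing group of the current match,
            // regardless of how many matches came before it.
            contents.add(matcher.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        String html = "<style>templates/style/color.css</style>\n"
                + "<style>templates/style/style.css</style>";
        extractStyles(html).forEach(System.out::println);
    }
}
```

Note that groupCount() reports the number of capturing groups in the pattern (1 here), not the number of matches in the input.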

Related

Java : RegEx to find a substring Collection

I am using the Java program below to find a list of js files as substrings.
String str = "jsLib//connect.facebook.net/en_US/fbevents.js , jsLib//connect.facebook.net/en_US/fbevents2.js;";
String patternStr = "(\\/.*?\\.js)";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println("Count:" + matcher.groupCount());
jsLib = matcher.group(1);
jsLib = jsLib.substring(jsLib.lastIndexOf('/') + 1, jsLib.length());
System.out.println("jsLib:" + jsLib);
}
Regex : I used String patternStr="(\\/.*?\\.js)";
Expected Result : both fbevents.js and fbevents2.js should be matched and part of result
Actual Result : only fbevents.js is matched
You can get all the results using a while loop and a regex like [^/]*\.js:
String str = "jsLib//connect.facebook.net/en_US/fbevents.js , jsLib//connect.facebook.net/en_US/fbevents2.js;";
String patternStr = "[^/]*\\.js";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println("jsLib:" + matcher.group());
}
Output:
jsLib:fbevents.js
jsLib:fbevents2.js
The [^/]*\.js pattern matches any 0+ chars other than / (with [^/]*) and then a .js substring.

regular expression to match a string in order

I have a string as follows:
"ValueFilter("val1") AND ColumnFilter("val2") AND ValueFilter("val3")"
I have stored the following regexes in an array, and I try to match each pattern in a for loop:
"ValueFilter\\((.*?)\\)","ColumnFilter\\((.*?)\\)"
For each match I replace the value in the brackets and copy the result to a new string.
When I run the regexes above against the string, the first loop iteration uses the ValueFilter pattern, so it matches both ValueFilter occurrences before ColumnFilter is tried. But I want to process the matches in order.
Here is what I want to achieve:
I want to match ValueFilter first, then ColumnFilter, then ValueFilter again. How can I achieve this?
Edit : Added Code
String expr = "\"ValueFilter(\"val1\") AND ColumnFilter(\"val2\") AND ValueFilter(\"val3\")\"";
String[] patterns = {"ValueFilter\\((.*?)\\)", "ColumnFilter\\((.*?)\\)"};
for (String pattern : patterns) {
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(expr);
while (m.find()) {
//do something
}
}
Expected Output
ValueFilter("val1")
ColumnFilter("val2")
ValueFilter("val3")
You can use the regex [XY]Filter\((.*?)\) with Pattern and loop through the matches like this:
String str = "\"XFilter(\"val1\") AND YFilter(\"val2\") AND XFilter(\"val3\")\"";
String regex = "[XY]Filter\\((.*?)\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
Note that [XY] is a character class that matches either X or Y.
Output
XFilter("val1")
YFilter("val2")
XFilter("val3")
If you want only the value, use group 1 via matcher.group(1) instead; the output is then:
"val1"
"val2"
"val3"
Edit
what if I have filtername as "ValueFilter" and "ColumnFilter" instead
of X and Y
In this case you can use (Value|Column) instead of [XY], which matches ValueFilter or ColumnFilter. The regex then looks like this:
String str = "\"ValueFilter(\"val1\") AND ColumnFilter(\"val2\") AND ValueFilter(\"val3\")\"";
String regex = "(Value|Column)Filter\\((.*?)\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output
ValueFilter("val1")
ColumnFilter("val2")
ValueFilter("val3")
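One detail to watch with the (Value|Column) version: the alternation is itself a capturing group, so the text in the brackets moves from group(1) to group(2). A non-capturing group, (?:Value|Column), keeps the value in group(1). A minimal sketch under that assumption; the class name FilterScanner and the method filterValues are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterScanner {
    // (?:...) groups the alternation without creating a capturing group,
    // so the bracket contents remain group 1.
    private static final Pattern FILTER =
            Pattern.compile("(?:Value|Column)Filter\\((.*?)\\)");

    // Hypothetical helper: returns the bracket contents in the order they appear.
    static List<String> filterValues(String expr) {
        List<String> values = new ArrayList<>();
        Matcher matcher = FILTER.matcher(expr);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String expr = "ValueFilter(\"val1\") AND ColumnFilter(\"val2\") AND ValueFilter(\"val3\")";
        filterValues(expr).forEach(System.out::println);
    }
}
```

Because a single pattern scans the input from left to right, the matches come back in the order they occur, which is what the question asks for.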

RegEx to extract text between tags in Java

I need to extract the values after :70: in the following text file using RegEx. Value may contain line breaks as well.
My current solution is to extract the string between :70: and : but this always returns only one match, the whole text between the first :70: and last :.
:32B:xxx,
:59:yyy
something
:70:ACK1
ACK2
:21:something
:71A:something
:23E:something
value
:70:ACK2
ACK3
:71A:something
How can I achieve this using Java? Ideally I want to iterate through all values, i.e.
ACK1\nACK2,
ACK2\nACK3
Thanks :)
Edit: What I'm doing right now,
Pattern pattern = Pattern.compile("(?<=:70:)(.*)(?=\n)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
System.out.println(matcher.group());
}
Try this.
String data = ""
+ ":32B:xxx,\n"
+ ":59:yyy\n"
+ "something\n"
+ ":70:ACK1\n"
+ "ACK2\n"
+ ":21:something\n"
+ ":71A:something\n"
+ ":23E:something\n"
+ "value\n"
+ ":70:ACK2\n"
+ "ACK3\n"
+ ":71A:something\n";
Pattern pattern = Pattern.compile(":70:(.*?)\\s*:", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find())
System.out.println("found="+ matcher.group(1));
result:
found=ACK1
ACK2
found=ACK2
ACK3
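The pattern in the question returns a single match because .* is greedy and, with DOTALL, runs to the last line break in the input; the reluctant .*? above stops at the first point where the rest of the pattern can match. Since that pattern ends at the first colon, a value that itself contains a colon would be cut short. A possible variant is sketched below, assuming that every field starts on its own line with a tag made of digits and upper-case letters between colons (such as :70: or :71A:); the class name Field70Extractor and the method extractField70 are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Field70Extractor {
    // Assumption: the value ends right before a line that starts with a tag
    // such as :21: or :71A:, or at the end of the input (\z).
    private static final Pattern FIELD_70 =
            Pattern.compile(":70:(.*?)(?=\\n:[0-9A-Z]+:|\\z)", Pattern.DOTALL);

    // Hypothetical helper: returns every :70: value, line breaks included.
    static List<String> extractField70(String data) {
        List<String> values = new ArrayList<>();
        Matcher matcher = FIELD_70.matcher(data);
        while (matcher.find()) {
            // trim() drops a trailing line break when :70: is the last field.
            values.add(matcher.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String data = ":59:yyy\n:70:ACK1\nACK2\n:21:something\n:70:ACK2\nACK3\n:71A:something\n";
        for (String value : extractField70(data)) {
            System.out.println("found=" + value);
        }
    }
}
```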
You need a loop to do this.
Pattern p = Pattern.compile(regexPattern);
List<String> list = new ArrayList<String>();
Matcher m = p.matcher(input);
while (m.find()) {
list.add(m.group());
}
As seen in: Create array of regex matches
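On Java 9 or later, the same loop can also be written with Matcher.results(), which returns a stream of MatchResult objects. A minimal sketch, assuming Java 9+; the class name MatchCollector and the method allMatches are made up for illustration:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MatchCollector {
    // Hypothetical helper: collects the full text of every match into a list.
    static List<String> allMatches(Pattern pattern, CharSequence input) {
        return pattern.matcher(input)
                .results()                 // Stream<MatchResult>, available since Java 9
                .map(MatchResult::group)   // group() is the whole match (group 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(allMatches(digits, "a1b22c333"));
    }
}
```

To collect a capturing group instead of the whole match, map with r -> r.group(1).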

Get an array of Strings matching a pattern from a String

I have a long string let's say
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now I want to get all the hashtags in an array. How can I use this expression to get an array of all hashtags from the string, something like
ArrayList hashtags = getArray(pattern, str)
You can write it like this:
private static List<String> getArray(Pattern tagMatcher, String str) {
Matcher m = tagMatcher.matcher(str);
List<String> l = new ArrayList<String>();
while(m.find()) {
String s = m.group(); //will give you "#computer"
s = s.substring(1); // will give you just "computer"
l.add(s);
}
return l;
}
You can also use \\w in place of A-Za-z0-9_ (keeping the hyphen), making the regex [#]+[\\w-]+\\b
This link should be helpful for achieving what you want.
It says:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
You got the hint now.
Here is one way, using Matcher
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]
You can use :
String val="I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while(matcher.find()){
list.add(matcher.group());
}
(?<=#) is a positive lookbehind: it asserts that the match is immediately preceded by a literal #, without including the # in the match.
You can use the following code to get the names:
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while(m.find()) {
String s = m.group(1);
System.out.println(s);
}
It will print
akka
kumar

rules for use reg exp java

I have a pattern:
host=([a-z0-9./:]*)
It finds the host address for me. And I have this content:
host=http://sdf3452.domain.com/
And my code is:
Matcher m;
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
m=hostP.matcher(content);//string 1
String match = m.group();//string 2
Log.i("host", ""+hostP.matcher(content).find());
If I delete the lines marked string 1 and string 2, I see true in logcat. If I leave them as is, I get an exception saying that no match was found.
I've tried all kinds of patterns. Looking at the m variable in the debugger, it finds no match. Please teach me how to use regular expressions!
Before you group() a match, you need to invoke find().
Try it like this:
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
Matcher m = hostP.matcher(content);
if(m.find()) {
String match = m.group();
// ...
}
EDIT
and a little demo that shows what each match-group contains:
Pattern p = Pattern.compile("host=([a-z0-9./:]*)");
Matcher m = p.matcher("host=http://sdf3452.domain.com/");
if (m.find()) {
for(int i = 0; i <= m.groupCount(); i++) {
System.out.printf("m.group(%d) = '%s'\n", i, m.group(i));
}
}
which will print:
m.group(0) = 'host=http://sdf3452.domain.com/'
m.group(1) = 'http://sdf3452.domain.com/'
As you can see, group(0), which is the same as group(), contains what the entire pattern matches.
But realize that a URL can contain much more than what you defined in [a-z0-9./:]*!
String content = "host=http://sdf3452.domain.com/";
Matcher mm;
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
mm=hostP.matcher(content);
String match = "";
if (mm.find()) { // call mm.find() first
match = mm.group(1); // 1 is the index of the capturing group (the first pair of parentheses)
}
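The exception in the question can be reproduced directly: a freshly created Matcher has not attempted any match yet, so group() throws IllegalStateException until find() (or matches() / lookingAt()) has succeeded. A minimal sketch of both cases; the class name HostMatchDemo and its method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostMatchDemo {
    private static final Pattern HOST = Pattern.compile("host=([a-z0-9./:]*)");

    // Hypothetical helper: returns the captured host, or null if there is no host= entry.
    static String extractHost(String content) {
        Matcher m = HOST.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    // Returns true if calling group() before any find() throws IllegalStateException.
    static boolean groupBeforeFindThrows(String content) {
        Matcher m = HOST.matcher(content);
        try {
            m.group(); // no match has been attempted yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String content = "host=http://sdf3452.domain.com/";
        System.out.println(extractHost(content));
        System.out.println(groupBeforeFindThrows(content));
    }
}
```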
