Rules for using regular expressions in Java - java

I have this pattern:
host=([a-z0-9./:]*)
It finds the host address for me. And I have this content:
host=http://sdf3452.domain.com/
And my code is:
Matcher m;
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
m=hostP.matcher(content);//string 1
String match = m.group();//string 2
Log.i("host", ""+hostP.matcher(content).find());
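If I delete the lines marked string 1 and string 2, I see true in logcat. If I leave the code as is, I get an exception saying that no match was found.
I've tried all kinds of patterns. Stepping through with the debugger, I looked at the m variable and it finds no match. Please teach me how to use regular expressions!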

Before you group() a match, you need to invoke find().
Try it like this:
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
Matcher m = hostP.matcher(content);
if (m.find()) {
    String match = m.group();
    // ...
}
EDIT
and a little demo that shows what each match-group contains:
Pattern p = Pattern.compile("host=([a-z0-9./:]*)");
Matcher m = p.matcher("host=http://sdf3452.domain.com/");
if (m.find()) {
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.printf("m.group(%d) = '%s'\n", i, m.group(i));
    }
}
which will print:
m.group(0) = 'host=http://sdf3452.domain.com/'
m.group(1) = 'http://sdf3452.domain.com/'
As you can see, group(0), which is the same as group(), contains what the entire pattern matches.
But realize that a URL can contain much more than what you defined in [a-z0-9./:]*!
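As a rough sketch of one way to loosen the character class: capture every non-whitespace character after host= with \S+ and let java.net.URI pull out the host. The names HostExtractor and extractHost are hypothetical, and treating whitespace as the end of the value is an assumption about how content is laid out.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostExtractor {
    // \S+ accepts any run of non-whitespace characters, so dashes, ports and
    // query strings are captured too (assumption: whitespace ends the value).
    private static final Pattern HOST_ENTRY = Pattern.compile("host=(\\S+)");

    /** Returns the host of the first host=... entry, or null if none is usable. */
    static String extractHost(String content) {
        Matcher m = HOST_ENTRY.matcher(content);
        if (!m.find()) {
            return null; // no host= entry at all
        }
        try {
            return new URI(m.group(1)).getHost();
        } catch (URISyntaxException e) {
            return null; // the captured text is not a valid URI
        }
    }

    public static void main(String[] args) {
        // prints my-server.domain.com
        System.out.println(extractHost("host=http://my-server.domain.com:8080/path?x=1"));
    }
}
```

The regex only locates the value here; deciding what counts as a valid URL is left to the URI parser.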

String content = "host=http://sdf3452.domain.com/";
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
Matcher mm = hostP.matcher(content);
String match = "";
if (mm.find()) { // call mm.find() first
    match = mm.group(1); // 1 is the index of the first pair of parentheses
}

Related

Java regex pattern matching not working for second occurrence

I am using java.util.regex to match a regular expression in a string. The string is basically an HTML string.
Within that string I have two lines:
<style>templates/style/color.css</style>
<style>templates/style/style.css</style>
My requirement is to get the content inside the style tag (<style>). I am using this pattern:
String stylePattern = "<style>(.+?)</style>";
When I try to get the result using:
Pattern styleRegex = Pattern.compile(stylePattern);
Matcher matcher = styleRegex.matcher(html);
System.out.println("Matcher count : "+matcher.groupCount()+ " and "+matcher.find()); //output 1
if(matcher.find()) {
    System.out.println("Inside find");
    for (int i = 0; i < matcher.groupCount(); i++) {
        String matchSegment = matcher.group(i);
        System.out.println(matchSegment); //output 2
    }
}
The result I get from output 1 is:
Matcher count : 1 and true
And from output 2:
<style>templates/style/style.css</style>
Now, after a lot of trying, I am lost as to how to get both lines. I tried many other suggestions on Stack Overflow itself, but none worked.
I think I am making some conceptual mistake.
Any help would be much appreciated. Thanks in advance.
EDIT
I have changed the code to:
Matcher matcher = styleRegex.matcher(html);
//System.out.println("find : "+matcher.find() + "Groupcount = " +matcher.groupCount());
//matcher.reset();
int i = 0;
while (matcher.find()) {
    System.out.println(matcher.group(i));
    i++;
}
Now the result is:
<style>templates/style/color.css</style>
templates/style/style.css
Why does one line include the style tag while the other does not?
This will find all occurrences in your string.
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println("Full match: " + matcher.group());
}
You can try this:
String text = "<style>templates/style/color.css</style>\n" +
        "<style>templates/style/style.css</style>";
Pattern pattern = Pattern.compile("<style>(.+?)</style>");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(text.substring(matcher.start(), matcher.end()));
}
Or:
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group());
}
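The mixed output in the question's edit comes from passing the loop counter to group(): the first iteration calls group(0), which is the whole match including the tags, and the second calls group(1), which is only the captured text. In the original code, the find() call inside the println had also already consumed the first match. A minimal sketch that always reads group(1), using the hypothetical names StyleExtractor and extractStyles:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StyleExtractor {
    private static final Pattern STYLE = Pattern.compile("<style>(.+?)</style>");

    /** Collects the text between each <style>...</style> pair, in order. */
    static List<String> extractStyles(String html) {
        List<String> paths = new ArrayList<>();
        Matcher matcher = STYLE.matcher(html);
        // Each successful find() advances to the next match; the group index
        // stays fixed at 1 so only the text between the tags is collected.
        while (matcher.find()) {
            paths.add(matcher.group(1));
        }
        return paths;
    }

    public static void main(String[] args) {
        String html = "<style>templates/style/color.css</style>\n"
                + "<style>templates/style/style.css</style>";
        // prints templates/style/color.css, then templates/style/style.css
        for (String path : extractStyles(html)) {
            System.out.println(path);
        }
    }
}
```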

Check if the text has more than one link

I want to check whether a text contains more than one link.
For that, I started with the following code:
private static void twoOrMorelinks(String commentstr){
    String urlPattern = "^.*((?:http|https):\\/\\/\\S+){1,}.*((?:http|https):\\/\\/\\S+){1,}.*$";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    if (m.find()) {
        System.out.println("yes");
    }
}
But the above code is not very elegant, and I am looking for something like the following:
private static void twoOrMorelinks(String commentstr){
    String urlPattern = "^.*((?:http|https):\\/\\/\\S+){2,}.*$";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    if (m.find()) {
        System.out.println("yes");
    }
}
But this code does not work. For instance, I expect it to report a match for the following text, but it does not:
They say 2's company watch live on...? http://www.le testin this code http://www.lexilogos.com
Any idea?
Just use this to count how many links you have:
private static int countLinks(String str) {
    int total = 0;
    Pattern p = Pattern.compile("(?:http|https):\\/\\/");
    Matcher m = p.matcher(str);
    while (m.find()) {
        total++;
    }
    return total;
}
Then:
boolean hasTwoOrMore = countLinks("They say 2's company watch live on...? http://www.le testin this code http://www.lexilogos.com") >= 2;
If you just want to know if you have two or more, just exit after you found two.
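The early exit could look roughly like this. The names LinkCheck and hasAtLeast are made up for the illustration, and the scheme pattern is shortened to https?:// with CASE_INSENSITIVE, which is my own variation on the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkCheck {
    private static final Pattern SCHEME =
            Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

    /** Returns true as soon as `minimum` scheme prefixes have been found. */
    static boolean hasAtLeast(String text, int minimum) {
        Matcher m = SCHEME.matcher(text);
        int seen = 0;
        // The loop stops scanning once the threshold is reached,
        // so the rest of a long text is never searched.
        while (seen < minimum && m.find()) {
            seen++;
        }
        return seen >= minimum;
    }

    public static void main(String[] args) {
        String text = "They say 2's company watch live on...? "
                + "http://www.le testin this code http://www.lexilogos.com";
        System.out.println(hasAtLeast(text, 2)); // prints true
    }
}
```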
I suggest using the find method instead of matches, which must check the whole string. I rewrote your pattern to limit the amount of backtracking:
String urlPattern = "\\bhttps?://[^h]*+(?:(?:\\Bh|h(?!ttps?://))[^h]*)*+https?://";
Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
if (m.find()) {
    // true
} else {
    // false
}
Pattern details:
\\b                      # word boundary
https?://                # scheme for http or https
[^h]*+                   # all that is not an "h"
(?:
    (?:
        \\Bh             # an "h" not preceded by a word boundary
      |                  # OR
        h(?!ttps?://)    # an "h" not followed by "ttp://" or "ttps://"
    )
    [^h]*
)*+
https?://                # another scheme
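A small harness for trying the pattern above on the sample text from the question and on a single-link input. The class name TwoLinksCheck, the method hasTwoLinks and the second input string are invented for this sketch; the pattern itself is copied unchanged.

```java
import java.util.regex.Pattern;

final class TwoLinksCheck {
    private static final Pattern TWO_LINKS = Pattern.compile(
            "\\bhttps?://[^h]*+(?:(?:\\Bh|h(?!ttps?://))[^h]*)*+https?://",
            Pattern.CASE_INSENSITIVE);

    /** True if a second scheme can be found somewhere after a first one. */
    static boolean hasTwoLinks(String text) {
        // find() searches for the pattern anywhere in the text;
        // it does not have to match the whole string.
        return TWO_LINKS.matcher(text).find();
    }

    public static void main(String[] args) {
        String two = "They say 2's company watch live on...? "
                + "http://www.le testin this code http://www.lexilogos.com";
        String one = "only one link: http://example.com here";
        System.out.println(hasTwoLinks(two)); // prints true
        System.out.println(hasTwoLinks(one)); // prints false
    }
}
```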

Matcher not finding matches

I'm trying to extract the numbers in the following string:
09/29/2014
I am currently using the code:
Pattern p = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
Matcher m = p.matcher(startDatepicker);
String startYear = m.group(3);
String startMonth = m.group(1);
String startDay = m.group(2);
startDatepicker contains: 09/29/2014
However, I am not receiving any matches. I also tried escaping the forward slashes with \\, but that didn't work either.
Am I missing something?
Thanks for your help.
Before you can access the matched groups, you need to call find() on the matcher and check that it has found a match:
Pattern p = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
Matcher m = p.matcher(startDatepicker);
if (!m.find()) {
    return;
}
String startYear = m.group(3);
String startMonth = m.group(1);
String startDay = m.group(2);
The call of m.find() positions the matcher on the first match.
You need to call find() to iterate through the matches before reading their groups.
Pattern p = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
Matcher m = p.matcher(startDatepicker);
while (m.find()) {
    ...
}
The find() method searches for occurrences of the regex in the input passed to p.matcher(). If multiple matches can be found, this method will find the first, and then move to the next match for each subsequent call.
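To illustrate the "move to the next match" behaviour, here is a sketch that pulls every date out of a longer string. The names DateScan and toIsoDates and the input text are hypothetical; the month/day/year order follows the variable names in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateScan {
    private static final Pattern DATE =
            Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");

    /** Rewrites every MM/dd/yyyy date found in the text as yyyy-MM-dd. */
    static List<String> toIsoDates(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        // Every call to find() positions the matcher on the next date,
        // so the groups always refer to the current match.
        while (m.find()) {
            String month = m.group(1);
            String day = m.group(2);
            String year = m.group(3);
            result.add(year + "-" + month + "-" + day);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [2014-09-29, 2014-10-05]
        System.out.println(toIsoDates("from 09/29/2014 to 10/05/2014"));
    }
}
```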

Pattern Matching with dynamic matcher

Sample input string: Customer ${/xml:Name} has Ordered Product ${/xml:product} of ${/xml:unit} units.
I am able to find the strings that match ${ ...... } using "\\$\\{.*?\\}".
I resolve the value for each string from XML, and now I have to put the value back into the input string.
I am using this method:
Pattern MY_PATTERN = Pattern.compile("\\$\\{.*?\\}");
Matcher m = MY_PATTERN.matcher(inputstring);
while (m.find()) {
    String s = m.group(0); // s is ${/xml:Name}
    // escaping wild characters
    s = s.replaceAll("${", "\\$\\{"); // s is \$\{/xml:Name}
    s = s.replaceAll("}", "\\}"); // s is \$\{/xml:Name\}
    Pattern inner_pattern = Pattern.compile(s);
    Matcher m1 = inner_pattern.matcher(inputstring);
    name = m1.replaceAll(xPathValues.get(s));
}
But I get an error at s = s.replaceAll("${", "\\$\\{"); it throws a PatternSyntaxException.
You must escape the { too; try \\$\\{
Instead of:
s = s.replaceAll("${", "\\$\\{"); // s is \$\{/xml:Name}
s = s.replaceAll("}", "\\}"); // s is \$\{/xml:Name\}
You can do it with the non-regex method String#replace(CharSequence, CharSequence):
s = s.replace("${", "\\$\\{").replace("}", "\\}"); // s is \$\{/xml:Name\}
It's because { starts a quantifier in a regexp: a{1,4} means match a one to four times (a, aa, aaa or aaaa). Java tries to interpret your regexp like this, so escape the {.
Yes, you must escape the {, but I would rather capture what's inside the braces:
Pattern MY_PATTERN = Pattern.compile("\\$\\{/xml:(.*?)\\}");
Matcher m = MY_PATTERN.matcher(inputstring);
while (m.find()) {
    name = m.group(1); // name is "Name" for ${/xml:Name}
    ...
}
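Once the name is captured, the substitution itself can be done in the same pass with Matcher.appendReplacement and appendTail, which avoids building a second pattern from the matched text. This is only a sketch: TemplateFiller and fill are hypothetical names, and the Map stands in for whatever lookup the question's xPathValues performs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateFiller {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{/xml:(.*?)\\}");

    /** Replaces each ${/xml:key} with values.get(key); unknown keys are left as-is. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the replacement text
            // from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last placeholder
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>(); // assumed sample data
        values.put("Name", "Alice");
        values.put("product", "Widget");
        values.put("unit", "3");
        // prints Customer Alice has Ordered Product Widget of 3 units.
        System.out.println(fill("Customer ${/xml:Name} has Ordered Product "
                + "${/xml:product} of ${/xml:unit} units.", values));
    }
}
```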

Get an array of Strings matching a pattern from a String

I have a long string let's say
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now I want to get all the hashtags in an array. How can I use this expression to get an array of all hashtags from the string, something like:
ArrayList hashtags = getArray(pattern, str)
You can write it like this:
private static List<String> getArray(Pattern tagMatcher, String str) {
    Matcher m = tagMatcher.matcher(str);
    List<String> l = new ArrayList<String>();
    while (m.find()) {
        String s = m.group(); // will give you "#computer"
        s = s.substring(1); // will give you just "computer"
        l.add(s);
    }
    return l;
}
Also, you can use \\w- instead of A-Za-z0-9-_, making the regex [#]+[\\w-]+\\b
This link would surely be helpful for achieving what you want.
It says:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
        "This is the text which is to be searched " +
        "for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while (matcher.find()) {
    count++;
    System.out.println("found: " + count + " : "
            + matcher.start() + " - " + matcher.end());
}
You got the hint now.
Here is one way, using Matcher
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
    hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]
You can use:
String val = "I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while (matcher.find()) {
    list.add(matcher.group());
}
(?<=#) is a positive lookbehind: it asserts that the match is immediately preceded by a literal # character, without making the # part of the match.
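To see what the lookbehind changes, it may help to compare the matched text and its start offset with and without it. A small sketch, where LookbehindDemo, firstTag and the sample text are invented for the comparison:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindDemo {
    /** Returns "<match>@<start index>" for the first match, or null if there is none. */
    static String firstTag(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() + "@" + m.start() : null;
    }

    public static void main(String[] args) {
        String text = "buy #computer now";
        // The # is consumed, so it is part of the match.
        System.out.println(firstTag("#[A-Za-z0-9_-]+", text));      // prints #computer@4
        // The # is only checked, so the match starts one character later.
        System.out.println(firstTag("(?<=#)[A-Za-z0-9_-]+", text)); // prints computer@5
    }
}
```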
You can use the following code to get the names:
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while (m.find()) {
    String s = m.group(1);
    System.out.println(s);
}
It will print:
akka
kumar
