Match only "<span>something</span>endofline" and not "<span>something</span></span>" - java

I have a pattern to match something like
...
<span class="count">1036</span>
...
But I don't want to match
<span class="count">1036</span></span>
Because it will catch
1036</span>
I don't need that data, so I don't want to match the double span at all.
I only need the content of a span that is followed directly by the end of the line.
I tried adding \n after the closing span tag, but it didn't work.
Here's the pattern:
private static final Pattern COUNT = Pattern.compile("<span class=\"count\">(.+?)</span> ");
Thank you for your answers

Use a capturing group (the part of the pattern enclosed in parentheses) and read it with Matcher#group(1). Excluding < from the captured text keeps the group from running past the first closing tag.
Regex pattern
<span class="count">([^<]*?)</span>
Sample code:
Pattern pattern = Pattern.compile("<span class=\"count\">([^<]*?)</span>");
Matcher matcher = pattern.matcher("<span class=\"count\">1036</span></span>");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
1036

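The regex anchor for "end of line" is $. Without the multi-line flag it matches only at the end of the whole input, and nothing may follow it inside the pattern (the trailing space in your pattern has to go).
Try:
private static final Pattern COUNT = Pattern.compile("<span class=\"count\">(.+?)</span>$");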

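Use the multi-line flag (?m), which makes ^ and $ match at the start and end of every line. Capture with [^<]+ rather than .+? so that the group cannot stretch across a following </span>; with .+? the double span would still match and capture "1036</span>".
Pattern COUNT = Pattern.compile("(?m)<span class=\"count\">([^<]+)</span>$");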
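The two ideas above (exclude < from the group, anchor at end of line in multi-line mode) can be combined. Below is a minimal sketch, assuming the input uses ordinary line breaks; the class name CountExtractor and the method extractCounts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountExtractor {
    // (?m) lets $ match at the end of each line; [^<]+ cannot run past a tag.
    private static final Pattern COUNT =
            Pattern.compile("(?m)<span class=\"count\">([^<]+)</span>$");

    /** Returns the captured text of every count span that ends its line. */
    static List<String> extractCounts(String html) {
        List<String> counts = new ArrayList<>();
        Matcher matcher = COUNT.matcher(html);
        while (matcher.find()) {
            counts.add(matcher.group(1));
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<span class=\"count\">1036</span>\n"
                + "<span class=\"count\">2048</span></span>\n";
        // Only the first line qualifies; the second has another tag before the line end.
        System.out.println(extractCounts(html)); // prints [1036]
    }
}
```

The second line is skipped because after 2048</span> the next character is <, so $ cannot match there, and [^<]+ has no way to absorb the extra tag.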

Related

extract a set of a characters between some characters

I have a string email = John.Mcgee.r2d2#hitachi.com
How can I write Java code that uses a regex to extract just the r2d2?
I tried this but got an error in Eclipse:
String email = John.Mcgee.r2d2#hitachi.com
Pattern pattern = Pattern.compile(".(.*)\#");
Matcher matcher = patter.matcher
for (Strimatcher.find()){
System.out.println(matcher.group(1));
}
To match after the last dot when the input may contain several dots, require that the sequence you capture does not itself contain a dot:
(?<=[.])([^.]*)(?=#)
(?<=[.]) means "preceded by a single dot"
(?=#) means "followed by # sign"
Note that since dot . is a metacharacter, it needs to be escaped either with \ (doubled for Java string literal) or with square brackets around it.
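As a rough illustration of the lookaround pattern above, here is a small self-contained sketch; the class name LastSegmentDemo and the helper segmentBeforeHash are hypothetical names, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastSegmentDemo {
    // Lookbehind for a dot, capture a run of non-dot characters, lookahead for '#'.
    private static final Pattern SEGMENT = Pattern.compile("(?<=[.])([^.]*)(?=#)");

    /** Returns the dot-free segment directly before '#', or null if there is none. */
    static String segmentBeforeHash(String input) {
        Matcher matcher = SEGMENT.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(segmentBeforeHash("John.Mcgee.r2d2#hitachi.com")); // prints r2d2
    }
}
```

Because the lookarounds consume nothing, matcher.group() and matcher.group(1) return the same text here.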
Not sure if you're posting the right code. I'll rewrite it based on what it should look like, though:
String email = "John.Mcgee.r2d2#hitachi.com";
// escape the dot, and exclude dots from the group so only the last segment is captured
Pattern pattern = Pattern.compile("\\.([^.]*)#");
Matcher matcher = pattern.matcher(email);
while (matcher.find()) {
// the pattern has a single capturing group, so always read group(1)
System.out.println(matcher.group(1));
}
but I think you just want something like this:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^.]*)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
You don't need Pattern here; replaceAll with the regex .*\.([^\.]+)#.* is enough. The group ([^\.]+) matches one or more characters other than a dot, sitting between a dot \. and the #, and the replacement $1 keeps only that group.
email = email.replaceAll(".*\\.([^\\.]+)#.*", "$1");
Output
r2d2
If you want to go with Pattern, use the regex \\.([^\\.]+)# :
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^\\.]+)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
System.out.println(matcher.group(1));// Output : r2d2
}
Another solution is to use split:
String[] split = email.replaceAll("#.*", "").split("\\.");
email = split[split.length - 1];// email is now r2d2
Notes:
Strings in Java must be enclosed in double quotes: "John.Mcgee.r2d2#hitachi.com"
You don't need to escape # in Java, but you have to escape the dot with a double backslash \\.
There is no for-loop syntax like for (Strimatcher.find()){; you probably meant while

Getting multiple matches via regex

I want to retrieve several substrings from a larger string via Matcher and Pattern using a regex.
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("<strong>(.*)</strong>");
Matcher matcher = pattern.matcher(str);
My problem is that the matcher gives me just one match that is inside the global tag strong:
ABC</strong>123<strong>DEF
My objective is to get 2 matches:
ABC
DEF
Thank you very much for your help.
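You need a non-greedy regex:
Pattern pattern = Pattern.compile("<strong>.*?</strong>");
Adding ? after a quantifier makes it non-greedy (lazy): .*? matches as few characters as possible, so each match stops at the first </strong> instead of running to the last one. Note that this pattern has no capturing group, so matcher.group() returns the tags as well.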
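To get only the text between the tags with the non-greedy approach, a capturing group can be added around .*?. The following is a minimal sketch; StrongTagDemo and strongTexts are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrongTagDemo {
    // The lazy .*? stops at the first closing tag; the group excludes the tags.
    private static final Pattern STRONG = Pattern.compile("<strong>(.*?)</strong>");

    /** Collects the text inside every <strong>...</strong> pair. */
    static List<String> strongTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = STRONG.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String str = "<strong>ABC</strong>123<strong>DEF</strong>";
        System.out.println(strongTexts(str)); // prints [ABC, DEF]
    }
}
```

With the greedy (.*) from the question, the same loop would produce a single entry, ABC</strong>123<strong>DEF.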
If you only want ABC and DEF then you can do something like this using lookaheads and lookbehinds:
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("((?<=<strong>).*?(?=</strong>))");
Matcher matcher = pattern.matcher(str);
while(matcher.find())
{
System.out.println(matcher.group());
}
If you do a google search you should be able to find information on lookaheads and lookbehinds...
I recommend using jsoup to parse your HTML instead of a regex, for example:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
// select your tag
Elements elements = doc.select("strong");
// get the iterator to traverse all elements
Iterator<Element> it = elements.iterator();
// loop through all elements and fetch their text
while (it.hasNext()) {
System.out.println(it.next().text());
}
Output :
ABC
DEF
Or get the output as a single string:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
Elements elements = doc.select("strong");
System.out.println(elements.text());
Output:
ABC DEF
Download Jsoup and add it as a dependency

Extracting a CSS image path with a Java Pattern expression

I want to write a regex pattern that extracts "/images/colorbox/ie6/borderBottomRight.png" from cssContent=".cboxIE6 #cboxBottomRight{background:url(../images/colorbox/ie6/borderBottomRight.png);}"
Can someone write such a pattern for me? Thanks a lot.
My regex doesn't work:
Pattern pattern = Pattern.compile("[.*]*/:url/(/././/(.+?)/)/;[.*]*");
Matcher matcher = pattern.matcher(cssContent);
if(matcher.find()){
System.out.println(matcher.group(0));
}
Pattern pattern = Pattern.compile(":url\\(\\.\\.([^)]+)\\)");
Matcher matcher = pattern.matcher(cssContent);
if(matcher.find()){
System.out.println(matcher.group(1));
}
The regex used to match is (quoted, with the Java string escaping removed)
":url\(\.\.([^)]+)\)"
It looks for :url(.. followed by [^)]+ (one or more characters that are not a closing parenthesis), and finally the closing ). The captured group is available as group(1), whereas group(0) gives the complete matched text, from :url to the closing ).
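To make the difference between group(0) and group(1) concrete, here is a small sketch using the same pattern on the CSS rule from the question; the class CssUrlDemo and the method matchGroups are hypothetical helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssUrlDemo {
    // Matches :url(.. then captures everything up to the closing parenthesis.
    private static final Pattern URL = Pattern.compile(":url\\(\\.\\.([^)]+)\\)");

    /** Returns {whole match, captured path}, or null if the pattern is not found. */
    static String[] matchGroups(String css) {
        Matcher matcher = URL.matcher(css);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(0), matcher.group(1) };
    }

    public static void main(String[] args) {
        String cssContent =
                ".cboxIE6 #cboxBottomRight{background:url(../images/colorbox/ie6/borderBottomRight.png);}";
        String[] groups = matchGroups(cssContent);
        System.out.println(groups[0]); // :url(../images/colorbox/ie6/borderBottomRight.png)
        System.out.println(groups[1]); // /images/colorbox/ie6/borderBottomRight.png
    }
}
```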
The biggest error was using "/" to escape your literal characters. You need "\", and, annoyingly, in a Java string literal "\" must itself be escaped with another "\", so the full escape sequence is "\\". Also, you print matcher.group(0), which is the entire match. You need matcher.group(1) to get the first (and only) group in your regex, which contains your string of interest. Here's the corrected code:
String cssContent = "cssContent=\".cboxIE6 #cboxBottomRight{background:url(../images/colorbox/ie6/borderBottomRight.png);}\"";
String regex = ".*?:url\\(\\.\\.(.+?)\\);[.*]*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(cssContent);
if(matcher.find()){
System.out.println(matcher.group(1));
}

Regex for matching pattern within quotes

I have some input data such as
some string with 'hello' inside 'and inside'
How can I write a regex that returns every occurrence of quoted text, no matter how many times it appears?
I have code that returns a single quoted string, but I want it to return multiple occurrences:
String mydata = "some string with 'hello' inside 'and inside'";
Pattern pattern = Pattern.compile("'(.*?)+'");
Matcher matcher = pattern.matcher(mydata);
while (matcher.find())
{
System.out.println(matcher.group());
}
This finds all occurrences for me:
String mydata = "some '' string with 'hello' inside 'and inside'";
Pattern pattern = Pattern.compile("'[^']*'");
Matcher matcher = pattern.matcher(mydata);
while(matcher.find())
{
System.out.println(matcher.group());
}
Output:
''
'hello'
'and inside'
Pattern description:
' // opening quote
[^'] // any character that is not a single quote
* // zero or more of those non-quote characters
' // closing quote
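If the surrounding quotes are not wanted in the result, the same pattern can be given a capturing group. A minimal sketch, assuming quotes are never nested or escaped; QuotedTextDemo and quotedTexts are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTextDemo {
    // Same pattern as above, with a group around the non-quote characters.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    /** Returns the text inside each pair of single quotes, without the quotes. */
    static List<String> quotedTexts(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String mydata = "some string with 'hello' inside 'and inside'";
        System.out.println(quotedTexts(mydata)); // prints [hello, and inside]
    }
}
```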
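I believe this should fit your requirements, as long as the quoted text contains only word characters (it will not match 'and inside', which contains a space):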
\'\w+\'
\'.*?' is the regex you are looking for.

Question Pattern/Matcher

I want to extract the value 5342test from the input with name="buddyname" inside a fieldset tag.
But there are multiple fieldsets in the HTML code.
Below is an example of the string in the HTML.
<fieldset style="display:none"><input type="hidden" name="buddyname" value="5342test" /></fieldset>
I'm having difficulty working out which pattern to pass to Pattern.compile. I just want the value 5342test displayed, not the other results. Could somebody please help?
Thank you.
My code:
String stringToSearch = "5342test";
Pattern pattern = Pattern.compile("(\\value=\\})");
Matcher m = pattern.matcher(stringToSearch);
while (m.find())
{
// get the matching group
String codeGroup = m.group(1);
// print the group
System.out.format("'%s'\n", codeGroup); // should be 5342test
}
Use this pattern: Pattern pattern = Pattern.compile("<input[^>]*?value\\s*?=\\s*?\\\"(.*?)\\\"");
Since you want the input values inside a fieldset tag, you can use this regex pattern.
Pattern pattern = Pattern.compile("<fieldset[^>]*>[^<]*<input.+?value\\s*=\\s*\\\"([^\\\"]*)\\\"");
Matcher matcher = pattern.matcher("<fieldset style=\"display:none\"><input type=\"hidden\" name=\"buddyname\" value=\"5342test\" /></fieldset>");
if (matcher.find())
System.out.println(matcher.group(1)); //this prints 5342test
else
System.out.println("Input html does not have a fieldset");
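Since the question mentions several fieldsets, a pattern keyed on name="buddyname" may be closer to what is wanted than one that takes the first input it finds. The sketch below assumes that attributes are double-quoted and that name appears before value inside the tag; BuddyNameDemo and buddyNameValue are hypothetical names. For markup that does not follow these assumptions, an HTML parser is the safer route.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BuddyNameDemo {
    // [^>]* keeps the match inside one <input ...> tag.
    // Assumes name="buddyname" comes before value="..." in that tag.
    private static final Pattern BUDDY = Pattern.compile(
            "<input[^>]*\\bname=\"buddyname\"[^>]*\\bvalue=\"([^\"]*)\"");

    /** Returns the value of the buddyname input, or null if it is not present. */
    static String buddyNameValue(String html) {
        Matcher matcher = BUDDY.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<fieldset style=\"display:none\"><input type=\"hidden\" name=\"other\" value=\"x\" /></fieldset>"
              + "<fieldset style=\"display:none\"><input type=\"hidden\" name=\"buddyname\" value=\"5342test\" /></fieldset>";
        System.out.println(buddyNameValue(html)); // prints 5342test
    }
}
```

The first input is skipped because [^>]* cannot cross the end of its tag to reach the buddyname attribute of the second one.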
