Question Pattern/Matcher

Question Pattern/Matcher - java

I want to extract the value 5342test from the input with name="buddyname" inside a fieldset tag.
However, the HTML contains multiple fieldsets.
Below is the relevant piece of the HTML:
<fieldset style="display:none"><input type="hidden" name="buddyname" value="5342test" /></fieldset>
I am having difficulty working out which pattern to pass to Pattern.compile. I only want the value 5342test displayed, not the other results. Could somebody please help?
Thank you.
My code:
String stringToSearch = "5342test";
Pattern pattern = Pattern.compile("(\\value=\\})");
Matcher m = pattern.matcher(stringToSearch);
while (m.find())
{
// get the matching group
String codeGroup = m.group(1);
// print the group
System.out.format("'%s'\n", codeGroup); // should be 5342test
}

Use this pattern: Pattern pattern = Pattern.compile("<input[^>]*?value\\s*?=\\s*?\\\"(.*?)\\\"");

Since you want the input values inside a fieldset tag, you can use this regex pattern.
Pattern pattern = Pattern.compile("<fieldset[^>]*>[^<]*<input.+?value\\s*=\\s*\\\"([^\\\"]*)\\\"");
Matcher matcher = pattern.matcher("<fieldset style=\"display:none\"><input type=\"hidden\" name=\"buddyname\" value=\"5342test\" /></fieldset>");
if (matcher.find())
System.out.println(matcher.group(1)); //this prints 5342test
else
System.out.println("Input html does not have a fieldset");
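Because the page contains several fieldsets, it may help to anchor the match on name="buddyname" rather than on the first input found. The following is a minimal sketch, not a general HTML parser: it assumes the name attribute comes before the value attribute inside the tag and that both use double quotes, as in the sample. The class and method names (BuddyNameExtractor, extractBuddyName) are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BuddyNameExtractor {
    // Matches an <input> tag whose name attribute is "buddyname" and captures its value.
    // Assumption: name appears before value inside the same tag, both double-quoted.
    private static final Pattern BUDDY_NAME = Pattern.compile(
            "<input[^>]*\\bname\\s*=\\s*\"buddyname\"[^>]*\\bvalue\\s*=\\s*\"([^\"]*)\"");

    // Returns the captured value, or an empty Optional if no such input exists.
    public static Optional<String> extractBuddyName(String html) {
        Matcher m = BUDDY_NAME.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html =
                "<fieldset><input type=\"text\" name=\"other\" value=\"ignored\" /></fieldset>"
                + "<fieldset style=\"display:none\"><input type=\"hidden\" "
                + "name=\"buddyname\" value=\"5342test\" /></fieldset>";
        // The first fieldset is skipped because its input is not named buddyname.
        System.out.println(extractBuddyName(html).orElse("not found"));
    }
}
```

Since [^>]* cannot cross a closing angle bracket, the name and value have to belong to the same tag, which is what keeps the other fieldsets out of the result.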

Related

How to parse a string to get array of #tags out of the string?

I have a string like
"#tag1 #tag2 #tag3 not_tag1 not_tag2 #tag4" (there can be many spaces between tags). From this string I want to parse just tag1, tag2 and so on. They are similar to the #tags we see on LinkedIn or any other social media. Is there an easy way to do this using regex or any other function in Java, or should I do it the hard way (i.e. using loops and conditions)?
A tag starts with "#" and ends at a space " ". In between there can be letters or digits, but the first character must be a letter.
example,
input : "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4"
output : ["tag1", "tag2", "tag3", "tag4"]

split by regex: "#\w+"
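EDIT: this is the correct regex, but split is not the right method, because split returns the text between the matches rather than the matches themselves.
This is the same solution javadev suggested, but using find() instead: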
String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
Matcher matcher = Pattern.compile("#\\w+").matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
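The output includes the leading #, as expected. Note that \w+ also accepts a digit in the first position, so #12tag is printed too.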

Maybe something like:
public static void main(String[] args ) {
String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
// [A-Za-z] rather than [A-z]: the range A-z also covers [ \ ] ^ _ and the backtick
Pattern pattern = Pattern.compile("#([A-Za-z][A-Za-z0-9]*) *");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
worked for me :)
Output:
tag1
tag2
tag3
tag4
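The question asks for the tags as an array, so the loop above can collect the matches instead of printing them. This is a small sketch under the stated tag rules (starts with a letter, then letters or digits, ends at whitespace or end of input); the class and method names (TagParser, parseTags) are hypothetical. It does not check what precedes the #, so a token like foo#bar would still yield bar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagParser {
    // '#', then a letter, then letters or digits. The lookahead requires
    // whitespace or end of input, so the tag must end where the token ends.
    private static final Pattern TAG =
            Pattern.compile("#([A-Za-z][A-Za-z0-9]*)(?=\\s|$)");

    // Collects the tag names (without the leading '#') in order of appearance.
    public static List<String> parseTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String input = "#tag1 #tag2   #tag3 not_tag1 not_tag2 #12tag #tag4";
        System.out.println(parseTags(input)); // prints [tag1, tag2, tag3, tag4]
    }
}
```

If a real array is needed, parseTags(input).toArray(new String[0]) converts the list.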

Capture the value of attribute in string?

I have the content of a file as a String in Java. I need to capture the value of the code attribute, i.e. key.test.text and key.test.text1, from lines like these:
<input type="button" value="<s:message code="key.test.text" />"
<input type="button2" value='<s:message code="key.test.text1' />"
There can be spaces around the =, like <input type="button" value = "<s:message code="key.test.text" />"
I am not sure how to capture it, with a regex or with plain string methods.

Use regex pattern
String regex = "value\\s*=\\s*[\"']<s:message\\s+code\\s*=\\s*[\"']([^\"']+)[\"']\\s*\\/>";
Capture group #1 will return the desired string for each match.
Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "...";
String regex = "value\\s*=\\s*[\"']<s:message\\s+code\\s*=\\s*[\"']([^\"']+)[\"']\\s*\\/>";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
allMatches.add(m.group(1));
}
System.out.println(allMatches);

You just need to json_encode the string and assign it to your button value; then you can read it.
Here is another solution.
First use StringEscapeUtils#unescapeHtml4() (or #unescapeXml(), depending on the original format) to unescape the string. Then use String#replaceAll() to get rid of the characters that are causing the issue. The printable ASCII range is a useful guide for which characters to keep.
Then assign the result to the button value.

Based on your latest requirement, as described in your post and in the comments below the accepted answer:
Matcher matcher = Pattern.compile(
"<s:message.*?code.*?=.*?[\"'](.*?)[\"'].*?>")
.matcher(content);
int count = 0;
while (matcher.find()) {
System.out.println(matcher.group(1));
++count;
}

Getting multiple matches via regex

I want to retrieve several substrings from a larger string via Matcher and Pattern using a regex.
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("<strong>(.*)</strong>");
Matcher matcher = pattern.matcher(str);
My problem is that the matcher gives me just one match that is inside the global tag strong:
ABC</strong>123<strong>DEF
My objective is to get 2 matches:
ABC
DEF
Thank you very much for your help.

You need a non-greedy regex:
Pattern pattern = Pattern.compile("<strong>.*?</strong>");
Put ? after the quantifier to make it non-greedy (reluctant). It then matches as little as possible, so each match stops at the first closing </strong> it finds instead of the last one.
If you only want ABC and DEF, you can do something like this using a lookbehind and a lookahead:
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("((?<=<strong>).*?(?=</strong>))");
Matcher matcher = pattern.matcher(str);
while(matcher.find())
{
System.out.println(matcher.group());
}
If you do a google search you should be able to find information on lookaheads and lookbehinds...
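An alternative to lookarounds is to keep the tags in the pattern and read only the capturing group, which is close to what the question already tried. A minimal sketch, with hypothetical names (StrongExtractor, extract), assuming the tags are not nested and carry no attributes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrongExtractor {
    // The reluctant quantifier .*? stops at the first closing tag,
    // and group 1 holds only the text between the tags.
    private static final Pattern STRONG = Pattern.compile("<strong>(.*?)</strong>");

    // Returns the text of every <strong>...</strong> pair, in order.
    public static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = STRONG.matcher(html);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "<strong>ABC</strong>123<strong>DEF</strong>";
        System.out.println(extract(str)); // prints [ABC, DEF]
    }
}
```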

I recommend using jsoup to parse your HTML instead of a regex, like this:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
// select your tag
Elements elements = doc.select("strong");
// get the iterator to traverse all elements
Iterator<Element> it = elements.iterator();
// loop through all elements and fetch their text
while (it.hasNext()) {
System.out.println(it.next().text());
}
Output :
ABC
DEF
Or get the output as a single string:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
Elements elements = doc.select("strong");
System.out.println(elements.text());
Output:
ABC DEF
Download Jsoup and add it as a dependency

Match only "<span>something</span>endofline" and not "<span>something</span></span>"

I have a pattern to match something like
...
<span class="count">1036</span>
...
But I don't want to match
<span class="count">1036</span></span>
Because it will catch
1036</span>
But anyway I don't want to catch the double span because I don't need this data.
I need the data between a span and end of line.
I tried with \n at the end of the span but it didn't work...
Here's the pattern:
private static final Pattern COUNT = Pattern.compile("<span class=\"count\">(.+?)</span> ");
Thank you for your answers

Use a capturing group, i.e. the part of the regex enclosed in parentheses (), and read it with Matcher#group(1). Using [^<] inside the group keeps the capture from running into the closing tag.
Regex pattern
<span class="count">([^<]*?)</span>
Sample code:
Pattern pattern = Pattern.compile("<span class=\"count\">([^<]*?)</span>");
Matcher matcher = pattern.matcher("<span class=\"count\">1036</span></span>");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
1036

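The regex anchor for "end of line" is $. Without the multi-line flag it only matches at the end of the whole input, and the pattern must not have a trailing space after it.
Try:
private static final Pattern COUNT = Pattern.compile("<span class=\"count\">(.+?)</span>$");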

Use the multi-line switch (?m), which makes ^ and $ match start/end of line.
Pattern COUNT = Pattern.compile("(?m)<span class=\"count\">(.+?)</span>$");
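To see the effect of (?m) on input with several lines, here is a small sketch. It swaps (.+?) for ([^<]+) so the capture cannot run across a tag; the class and method names (CountMatcher, counts) and the sample values 2048 and 7 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountMatcher {
    // (?m) makes $ match at the end of every line, not only at the end of input.
    // [^<]+ keeps the captured text from spilling into a following tag.
    private static final Pattern COUNT =
            Pattern.compile("(?m)<span class=\"count\">([^<]+)</span>$");

    // Returns the count values whose closing </span> is the last thing on its line.
    public static List<String> counts(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = COUNT.matcher(text);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "<span class=\"count\">1036</span>\n"
                + "<span class=\"count\">2048</span></span>\n"
                + "<span class=\"count\">7</span>";
        // The second line is skipped because </span> is followed by another </span>.
        System.out.println(counts(text)); // prints [1036, 7]
    }
}
```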

Using regex to get information inside an HTML tag

I'm wondering how I could extract '4151' from the following code:
</th><td><a class="external exitstitial" rel="nofollow" href="http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151">Look up price</a>
I would like to use regex but if there is a better way I'm open for it!

The following works for me, assuming the href attribute value was already extracted:
String href = "http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151";
Pattern p = Pattern.compile("\\?obj=(\\d+)");
Matcher m = p.matcher(href);
if (m.find()) {
System.out.println(m.group(1));
}
Outputs "4151"
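If the href has not been extracted yet, the same idea can be applied to the HTML fragment directly by anchoring on href=" first. This is only a sketch for the fragment shown in the question: it assumes a double-quoted href and an obj parameter made of digits, and the names (ObjIdExtractor, extractObjId) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjIdExtractor {
    // Looks inside a double-quoted href for an obj parameter introduced by ? or &
    // and captures its digits. [^"]* keeps the match within the attribute value.
    private static final Pattern OBJ_ID =
            Pattern.compile("href=\"[^\"]*[?&]obj=(\\d+)");

    // Returns the obj id as a string, or an empty Optional if none is found.
    public static Optional<String> extractObjId(String html) {
        Matcher m = OBJ_ID.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "</th><td><a class=\"external exitstitial\" rel=\"nofollow\" "
                + "href=\"http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=4151\">"
                + "Look up price</a>";
        System.out.println(extractObjId(html).orElse("not found")); // prints 4151
    }
}
```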

Here are a few parser libraries: htmlparser, jsoup, and jtidy.
In your case, regex may be fine, but there is a classic post on why you should avoid regex for HTML parsing.

This regex would get you the first number in the string:
String resultString = null;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group();
}
This code is not tested and presumes your HTML string is assigned to the 'subjectString' variable. It relies on 4151 being the first run of digits in that string.
