I have a file's content as a String in Java. I need to capture the value of the code attribute, i.e. key.test.text and key.test.text1, from lines like these:
<input type="button" value="<s:message code="key.test.text" />"
<input type="button2" value='<s:message code="key.test.text1' />"
There can be spaces around =, like <input type="button" value = "<s:message code="key.test.text" />"
I am not sure how to capture it, with a regex or with String methods?
Use regex pattern
String regex = "value\\s*=\\s*[\"']<s:message\\s+code\\s*=\\s*[\"']([^\"']+)[\"']\\s*\\/>";
Capture group #1 will return a desired string for each match.
Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "...";
String regex = "value\\s*=\\s*[\"']<s:message\\s+code\\s*=\\s*[\"']([^\"']+)[\"']\\s*\\/>";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
allMatches.add(m.group(1));
}
System.out.println(allMatches);
Test this demo code here.
You just need to json_encode the string, assign it to your button value, and then you can read it back.
Here is another solution.
First use StringEscapeUtils#unescapeHtml4() (or #unescapeXml(), depending on the original format) to unescape. Then use String#replaceAll() to get rid of the characters that are causing the issue. The printable ASCII range can help you decide which characters to keep.
Then send it to the button value.
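The replaceAll step can be sketched as follows. This is only an illustration: the class and method names are made up, the unescape step is left out because StringEscapeUtils comes from Apache Commons rather than the JDK, and "printable ASCII" is assumed to mean the range from space (0x20) to tilde (0x7E).

```java
class PrintableAsciiFilter {
    // Hypothetical helper: removes every character outside printable ASCII
    // (assumed here to be 0x20 through 0x7E).
    static String keepPrintableAscii(String text) {
        return text.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        // The accented letter, the tab and the bell character are dropped.
        System.out.println(keepPrintableAscii("caf\u00e9\tok\u0007"));
    }
}
```

Note that this also strips legitimate non-ASCII text such as accented letters, so it only fits when the button value is expected to be plain ASCII.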
For your latest requirement, as described in your post and in the comments below the accepted answer:
Matcher matcher = Pattern.compile(
"<s:message.*?code.*?=.*?[\"'](.*?)[\"'].*?>")
.matcher(content);
int count = 0;
while (matcher.find()) {
System.out.println(matcher.group(1));
++count;
}
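For reference, here is a self-contained sketch that runs this idea on the two sample lines from the question. The class name, the method name and the slightly tighter pattern (it uses [^"']+ for the value instead of .*?) are assumptions made for illustration, not part of the answers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageCodeExtractor {
    // Matches <s:message code="..."> with either quote style and optional
    // spaces around '='. Group 1 holds the attribute value.
    private static final Pattern CODE =
            Pattern.compile("<s:message\\s+code\\s*=\\s*[\"']([^\"']+)[\"']");

    static List<String> extractCodes(String content) {
        List<String> codes = new ArrayList<>();
        Matcher m = CODE.matcher(content);
        while (m.find()) {
            codes.add(m.group(1));
        }
        return codes;
    }

    public static void main(String[] args) {
        String content =
                "<input type=\"button\" value=\"<s:message code=\"key.test.text\" />\"\n"
              + "<input type=\"button2\" value='<s:message code=\"key.test.text1' />\"";
        System.out.println(extractCodes(content));
    }
}
```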
Related
I might receive the following cookie string.
hello=world;JSESSIONID=sdsfsf;Path=/ei
I need to extract the value of JSESSIONID
I use the following pattern but it doesn't seem to work. However https://regex101.com shows it's correct.
Pattern PATTERN_JSESSIONID = Pattern.compile(".*JSESSIONID=(?<target>[^;\\n]*)");
You can reach your goal with a simpler regex: (^|;)JSESSIONID=(.*);. Here is the demo on Regex101 (you forgot to link your regular expression using the save button). Take a look at the following code. You have to extract the matched values using the Matcher class:
String cookie = "hello=world;JSESSIONID=sdsfsf;Path=/ei";
Pattern PATTERN_JSESSIONID = Pattern.compile("(^|;)JSESSIONID=(.*);");
Matcher m = PATTERN_JSESSIONID.matcher(cookie);
if (m.find()) {
System.out.println(m.group(2)); // group 0 would be the whole match ";JSESSIONID=sdsfsf;"
}
Output value:
sdsfsf
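Of course, the result depends on the possible variations of the input text. The snippet above works when the value sits between JSESSIONID= and a following ; character. Be aware that .* is greedy: if several ; characters follow, the group runs up to the last one, which ([^;]*) would avoid.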
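The named group from the question can also be kept. The sketch below assumes the original problem was that the group was read without a successful find() call first; the class and method names are invented for illustration, and the leading .* is replaced by an anchor on the start of the string or a preceding ;.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SessionIdExtractor {
    // Same named group as in the question; the value stops at ';' or a newline.
    private static final Pattern JSESSIONID =
            Pattern.compile("(?:^|;\\s*)JSESSIONID=(?<target>[^;\\n]*)");

    static Optional<String> extract(String cookie) {
        Matcher m = JSESSIONID.matcher(cookie);
        // find() must succeed before any group can be read.
        return m.find() ? Optional.of(m.group("target")) : Optional.empty();
    }

    public static void main(String[] args) {
        String cookie = "hello=world;JSESSIONID=sdsfsf;Path=/ei";
        System.out.println(extract(cookie).orElse("not found"));
    }
}
```

Returning an Optional makes the "cookie not present" case explicit instead of relying on an exception from group().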
You can try below regex:
JSESSIONID=([^;]+)
regex explanation
String cookies = "hello=world;JSESSIONID=sdsfsf;Path=/ei;submit=true";
Pattern pat = Pattern.compile("\\bJSESSIONID=([^;]+)");
Matcher matcher = pat.matcher(cookies);
boolean found = matcher.find();
System.out.println("Session ID: " + (found ? matcher.group(1): "not found"));
DEMO
You can also get what you are aiming for by splitting the string and replacing the prefix. Below is what works for me.
String s = "hello=world;JSESSIONID=sdsfsf;Path=/ei";
// Split into name=value pairs and pick the one that starts with the cookie name,
// so the session value does not have to be known in advance
for (String part : s.split(";")) {
    if (part.trim().startsWith("JSESSIONID=")) {
        System.out.println(part.trim().replace("JSESSIONID=", ""));
    }
}
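A more general take on the splitting approach is to parse every name=value pair into a map and look up the name afterwards. This is a minimal sketch; the class and method names are hypothetical, and it assumes pairs are separated by ; and that the first = separates name from value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieParser {
    // Splits "a=b;c=d" into an insertion-ordered map; parts without '=' are skipped.
    static Map<String, String> parse(String cookie) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String part : cookie.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                pairs.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = parse("hello=world;JSESSIONID=sdsfsf;Path=/ei");
        System.out.println(pairs.get("JSESSIONID"));
    }
}
```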
<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>....
I want to extract everything between <b>Topic1</b> and the next <b> opening tag. In this case that would be: <ul>asdasd</ul><br/>.
Problem: it does not necessarily have to be the <b> tag; it could be any other repeating tag.
So my question is: how can I dynamically extract that text? The only static things are:
The signal keyword to look for is always "Topic1". I'd like to take the surrounding tags as the one to look for.
The tag is always repeated. In this case it's always <b>, it might as well be <i> or <strong> or <h1> etc.
I know how to write the java code, but what would the regex be like?
String regex = ">Topic1<";
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
for (int i = 1; i <= m.groupCount(); i++) {
System.out.println(m.group(i));
}
}
The following should work
Topic1</(.+?)>(.*?)<\\1>
Input: <b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>
Output: <ul>asdasd</ul><br/>
Code:
Pattern p = Pattern.compile("Topic1</(.+?)>(.*?)<\\1>");
// get a matcher object
Matcher m = p.matcher("<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>");
while(m.find()) {
System.out.println(m.group(2)); // <ul>asdasd</ul><br/>
}
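The backreference idea can be wrapped in a small helper that takes the keyword as a parameter. This is a sketch under a few assumptions: the class and method names are invented, the tag name is taken from the opening tag in front of the keyword, and the next occurrence of that tag is required to be followed by whitespace or > so that, for example, <br/> is not mistaken for <b>.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopicExtractor {
    // Returns the text between <tag>keyword</tag> and the next opening <tag ...>.
    static Optional<String> contentAfter(String html, String keyword) {
        Pattern p = Pattern.compile(
                "<(\\w+)[^>]*>" + Pattern.quote(keyword) + "</\\1>(.*?)<\\1[\\s>]",
                Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>";
        System.out.println(contentAfter(html, "Topic1").orElse("not found"));
    }
}
```

If the keyword belongs to the last such tag in the text, there is no following opening tag and the helper returns an empty Optional.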
Try this
String pattern = "\\<.*?\\>Topic1\\<.*?\\>"; // this will see the tag no matter what tag it is
String text = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b>"; // your string to be split
String[] attributes = text.split(pattern);
for(String atr : attributes)
{
System.out.println(atr);
}
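This prints an empty first line (the match sits at the very start of the string, so split returns a leading empty element), followed by: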
<ul>asdasd</ul><br/><b>Topic2</b>
I have a pattern to match something like
...
<span class="count">1036</span>
...
But I don't want to match
<span class="count">1036</span></span>
Because it will catch
1036</span>
But anyway I don't want to catch the double span because I don't need this data.
I need the data between a span and end of line.
I tried with \n at the end of the span but it didn't work...
Here's the pattern:
private static final Pattern COUNT = Pattern.compile("<span class=\"count\">(.+?)</span> ");
Thank you for your answers
Use a capturing group, i.e. the part of the regex enclosed in parentheses (), and read it with Matcher#group(1). Using [^<]* instead of .+? keeps the group from running past the first closing tag.
Regex pattern
<span class="count">([^<]*?)</span>
DEMO
Sample code:
Pattern pattern = Pattern.compile("<span class=\"count\">([^<]*?)</span>");
Matcher matcher = pattern.matcher("<span class=\"count\">1036</span></span>");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
1036
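The regex anchor for "end of line" is $. Note that it matches at the end of every line only in multi-line mode, and that nothing (not even a space) may follow it in the pattern.
Try:
private static final Pattern COUNT = Pattern.compile("<span class=\"count\">(.+?)</span>$", Pattern.MULTILINE);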
Use the multi-line switch (?m), which makes ^ and $ match start/end of line.
Pattern COUNT = Pattern.compile("(?m)<span class=\"count\">(.+?)</span>$");
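Putting the multi-line anchor and the [^<] group together, a sketch could look like this. The class and method names are assumptions, and the input is assumed to have no trailing whitespace after </span>; otherwise the $ anchor would not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountExtractor {
    // MULTILINE makes $ match at the end of each line, so a span that is
    // followed by another </span> on the same line is skipped.
    private static final Pattern COUNT = Pattern.compile(
            "<span class=\"count\">([^<]+)</span>$", Pattern.MULTILINE);

    static List<String> counts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = COUNT.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<span class=\"count\">1036</span>\n"
                    + "<span class=\"count\">2048</span></span>\n"
                    + "<span class=\"count\">7</span>";
        System.out.println(counts(html));
    }
}
```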
I have a bunch of HTML files. In these files I need to correct the src attribute of the IMG tags.
The IMG tags look typically like this:
<img alt="" src="./Suitbert_files/233px-Suitbertus.jpg" class="thumbimage" height="243" width="233" />
where the attributes are NOT in any specific order.
I need to remove the dot and the forward slash at the beginning of the src attribute of the IMG tags so they look like this:
<img alt="" src="Suitbert%20%E2%80%93%20Wikipedia_files/233px-Suitbertus.jpg" class="thumbimage" height="243" width="233" />
I have the following class so far:
import java.util.regex.*;
public class Replacer {
// this PATTERN should find all img tags with 0 or more attributes before the src-attribute
private static final String PATTERN = "<img\\.*\\ssrc=\"\\./";
private static final String REPLACEMENT = "<img\\.*\\ssrc=\"";
private static final Pattern COMPILED_PATTERN = Pattern.compile(PATTERN, Pattern.CASE_INSENSITIVE);
public static void findMatches(String html){
Matcher matcher = COMPILED_PATTERN.matcher(html);
// Check all occurrences
System.out.println("------------------------");
System.out.println("Following Matches found:");
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
System.out.println("------------------------");
}
public static String replaceMatches(String html){
//Pattern replace = Pattern.compile("\\s+");
Matcher matcher = COMPILED_PATTERN.matcher(html);
html = matcher.replaceAll(REPLACEMENT);
return html;
}
}
So, my method findMatches(String html) seems to correctly find all IMG tags where the src attribute starts with ./.
However, my method replaceMatches(String html) does not replace the matches correctly.
I am a newbie to regex, but I assume that either the REPLACEMENT regex is incorrect, or the usage of the replaceAll method, or both.
As you can see, the replacement String contains 2 parts which are identical in all IMG tags:
<img and src="./. In between these 2 parts, there should be the 0 or more HTML attributes from the original string.
How do I formulate such a REPLACEMENT string?
Can somebody please enlighten me?
Don't use regex for HTML. Use a parser, obtain the src attribute and replace it.
Try these:
PATTERN = "(<img[^>]*\\ssrc=\")\\./"
REPLACEMENT = "$1"
Basically, you capture everything except the ./ in group #1, then plug it back in using the $1 placeholder, effectively stripping off the ./.
Notice how I changed your .* to [^>]*, too. If there happened to be two IMG tags on the same line, like this:
<img src="good" /><img src="./bad" />
...your regex would match this:
<img src="good" /><img src="./
It would do that even if you used a non-greedy .*?. [^>]* makes sure the match is always contained within the one tag.
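As a runnable sketch of this capture-and-reinsert approach (the class and method names are made up for illustration; the pattern and replacement are the ones given above):

```java
import java.util.regex.Pattern;

class SrcFixer {
    // Group 1 captures everything from "<img" up to and including src=",
    // the trailing "./" is matched but not captured.
    private static final Pattern DOT_SLASH_SRC = Pattern.compile(
            "(<img[^>]*\\ssrc=\")\\./", Pattern.CASE_INSENSITIVE);

    static String stripDotSlash(String html) {
        // "$1" puts the captured prefix back, which drops the "./".
        return DOT_SLASH_SRC.matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        String html = "<img src=\"good\" /><img alt=\"\" src=\"./bad.jpg\" />";
        System.out.println(stripDotSlash(html));
    }
}
```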
Your replacement is incorrect. The matched text is replaced by the replacement string, which is not interpreted as a regexp. To achieve what you want, you need to use groups. A group is delimited by parentheses in the regexp; each opening parenthesis starts a new group.
You can use $i in the replacement string to reproduce what a group has matched, where i is the group number. See the documentation of Matcher#appendReplacement for the details.
// Here is an example (it looks a bit like your case but not exactly)
String input = "<img name=\"foobar\" src=\"img.png\">";
String regexp = "<img(.+)src=\"[^\"]+\"(.*)>";
Matcher m = Pattern.compile(regexp).matcher(input);
StringBuffer sb = new StringBuffer();
while(m.find()) {
// Found a match!
// Append all chars before the match and then replaces the match by the
// replacement (the replacement refers to group 1 & 2 with $1 & $2
// which match respectively everything between '<img' and 'src' and,
// everything after the src value and the closing >
m.appendReplacement(sb, "<img$1src=\"something else\"$2>");
}
m.appendTail(sb);// No more match, we append the end of input
Hope this helps you
If src attributes only occur in your HTML within img tags, you can just do this:
input.replace("src=\"./", "src=\"")
You could also do this without Java by using sed if you're on a *nix OS.
I want to extract the value 5342test behind the name="buddyname" from a fieldset tag.
But there are multiple fieldsets in the HTML code.
Below the example of the string in the HTML.
<fieldset style="display:none"><input type="hidden" name="buddyname" value="5342test" /></fieldset>
I am having difficulty working out the right pattern for Pattern.compile, and I just want the value 5342test displayed, not the other results. Could somebody please help?
Thank you.
My code:
String stringToSearch = "5342test";
Pattern pattern = Pattern.compile("(\\value=\\})");
Matcher m = pattern.matcher(stringToSearch);
while (m.find())
{
// get the matching group
String codeGroup = m.group(1);
// print the group
System.out.format("'%s'\n", codeGroup); // should be 5342test
}
Use this pattern: Pattern pattern = Pattern.compile("<input[^>]*?value\\s*?=\\s*?\\\"(.*?)\\\"");
Since you want the input values inside a fieldset tag, you can use this regex pattern.
Pattern pattern = Pattern.compile("<fieldset[^>]*>[^<]*<input.+?value\\s*=\\s*\\\"([^\\\"]*)\\\"");
Matcher matcher = pattern.matcher("<fieldset style=\"display:none\"><input type=\"hidden\" name=\"buddyname\" value=\"5342test\" /></fieldset>");
if (matcher.find())
System.out.println(matcher.group(1)); //this prints 5342test
else
System.out.println("Input html does not have a fieldset");
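Since there are several fieldsets, it may help to key the match on name="buddyname" rather than on the first input found. The following is a sketch only: the class and method names are hypothetical, and it assumes the name attribute appears before value inside the same input tag and that both use double quotes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BuddyNameExtractor {
    // [^>]* keeps the match inside a single <input ...> tag.
    private static final Pattern BUDDY = Pattern.compile(
            "<input\\b[^>]*\\bname\\s*=\\s*\"buddyname\"[^>]*\\bvalue\\s*=\\s*\"([^\"]*)\"");

    static Optional<String> buddyName(String html) {
        Matcher m = BUDDY.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html =
                "<fieldset><input type=\"hidden\" name=\"other\" value=\"nope\" /></fieldset>"
              + "<fieldset style=\"display:none\"><input type=\"hidden\" name=\"buddyname\" value=\"5342test\" /></fieldset>";
        System.out.println(buddyName(html).orElse("not found"));
    }
}
```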