I have to remove the data between two strings, as shown below:
<PACKET>752</PACKET>
<TIME>23-Oct-2013 12:05:46 GMT Standard Time</TIME>
<INTERVAL>2</INTERVAL>
<HEADER>hi this should not be printed only</HEADER>
<DATA></DATA>
In this, I have to remove the data between <HEADER> and </HEADER>. Can anybody give me a regex for this?
I think this can be done with a regex:
String str="b1<HEADER>aaaaa</HEADER>b2";
String newstring = str.replaceAll("<HEADER[^>]*>([^<]*)<\\/HEADER>", "");
System.out.println(newstring);
This prints b1b2
If you have other tags inside <HEADER>, the above will fail. Consider the example below:
String str = "b1<HEADER>aa<xxx>xx</xxx>aaa</HEADER>b2";
String newstring = str.replaceAll("<HEADER[^>]*>([^<]*)<\\/HEADER>", "");
System.out.println(newstring);
This prints: b1<HEADER>aa<xxx>xx</xxx>aaa</HEADER>b2
To handle this and also remove the nested tags, match the content reluctantly with .*? instead of [^<]* (the (?s) flag lets . also match line breaks):
newstring = str.replaceAll("(?s)<HEADER[^>]*>.*?</HEADER>", "");
This will print b1b2.
Maroun is right that using a regex here is not a good idea, but if you have to do it, then this might work:
(?ms)(.*<HEADER>).*(<\/HEADER>.*)
This captures everything up to and including <HEADER> in group 1, and everything from </HEADER> onwards in group 2. You can then concatenate the two to remove the bit in the middle.
See here: http://regex101.com/r/bC2eQ7
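A minimal Java sketch of that idea, assuming the input holds a single HEADER element (with several, the greedy groups would only empty the last one). The class and method names are hypothetical. Concatenating the two groups is equivalent to calling replaceAll with "$1$2" as the replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepAroundHeader {
    // Group 1: everything up to and including <HEADER>; group 2: </HEADER> to the end.
    private static final Pattern AROUND_HEADER =
            Pattern.compile("(?ms)(.*<HEADER>).*(</HEADER>.*)");

    static String removeHeaderBody(String input) {
        Matcher m = AROUND_HEADER.matcher(input);
        // matches() requires the whole input to fit the pattern; otherwise return it unchanged.
        return m.matches() ? m.group(1) + m.group(2) : input;
    }

    public static void main(String[] args) {
        String input = "<PACKET>752</PACKET>\n"
                + "<HEADER>hi this should not be printed only</HEADER>\n"
                + "<DATA></DATA>";
        System.out.println(removeHeaderBody(input));
    }
}
```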
This regex replaces everything inside the tag with an empty String:
String input = "<PACKET>752</PACKET>...<HEADER>hi this should be printed only</HEADER><DATA></DATA>";
String output = input.replaceAll("(?<=<HEADER>).*?(?=</HEADER>)", "");
Result:
<PACKET>752</PACKET>...<HEADER></HEADER><DATA></DATA>
Related
I'm working on a program that formats HTML code extracted from a PDF file.
I have a String array that contains the text split into paragraphs.
As the PDF has hyperlinks, I decided to replace them with a footnote number such as "[1]".
This will be used for citing sources. Eventually I plan to put the number at the end of a paragraph or sentence, so you can look up the sources as you would in a book.
My Problem
For some reason, not all of the hyperlinks are replaced.
Most likely this is because there is text directly next to the tag:
Hell<a href="http://www.example.com">o old chap!
Specifically, the "o" part and the "Hell" part seem to keep Java's .replaceAll function from doing its job.
Expected Result
Hello [1] old chap!
EDIT:
If I just added a space before and after the URL, it might split words like "help" into "hel p", which is not an option either.
My code has to replace the URL tag (without the ) without creating any extra spaces.
This is the part of my code where the problem occurs:
for (int i = 0; i < EN.length; i++) {
    Pattern pattern_URL = Pattern.compile("<a(.+?)\">", Pattern.DOTALL);
    Matcher matcher_URL = pattern_URL.matcher(EN[i]); // Checks the current array element.
    if (matcher_URL.find() == true) {
        source_number++;
        String extractedURL = matcher_URL.group(0);
        //System.out.println(extractedURL);
        String extractedURL_fully = extractedURL.replaceAll("href=\"", ""); // Quotation marks
        //System.out.println(extractedURL_fully);
        String nobracketURL = extractedURL.replaceAll("\\)", ""); // Remove round brackets from the URL
        EN[i] = EN[i].replaceAll("\\)\"", "\""); /* Remove round brackets from the URLs in the array. (For some reason, some href URLs had a bracket at the end; this was already in the PDF. They caused massive problems because the bracket was not escaped, so the entire replaceAll call failed.) */
        EN[i] = EN[i].replaceAll(nobracketURL, "[" + source_number + "]"); // Replace the URL tag with a number in square brackets
    }
    else {
        //System.out.println("FALSE: " + "[" + i + "]");
    }
}
The whole idea is that it loops through the array and replaces every URL, including its opening tag, from <a to the end of that tag "> (as the regex pattern shows).
Correct me if I'm wrong, but what you need is to remove all the <a> tags from a given string, right? If that's the case, all you need is code like the following:
final String string = "<a href=\"http://www.example.com\">Sen";
final Pattern pattern = Pattern.compile("<a(.+?)>", Pattern.DOTALL);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll("");
System.out.println(result); // prints "Sen"
Notice I didn't use replaceAll from the String object, but from the Matcher object. This replaces every match with the empty string "".
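The answer above strips the tags but does not produce the numbered markers the question asks for. Below is a minimal sketch, assuming the paragraphs are in a String[] like the question's EN array, that replaces each opening <a ...> tag with an incrementing "[n]" marker and drops closing </a> tags. The class and method names are hypothetical. Because it uses Matcher.appendReplacement instead of building a regex out of the extracted URL, brackets or ? inside a URL cannot break the replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FootnoteLinks {
    // Opening <a ...> tag; [^>]* keeps the match inside a single tag, \b avoids matching <abbr>.
    private static final Pattern OPEN_A = Pattern.compile("<a\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Closing tag, removed without leaving a marker.
    private static final Pattern CLOSE_A = Pattern.compile("</a\\s*>", Pattern.CASE_INSENSITIVE);

    // Replaces each opening <a> tag with "[n]", numbering continuously across all paragraphs.
    static String[] numberLinks(String[] paragraphs) {
        String[] result = new String[paragraphs.length];
        int sourceNumber = 0;
        for (int i = 0; i < paragraphs.length; i++) {
            Matcher m = OPEN_A.matcher(paragraphs[i]);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                sourceNumber++;
                // quoteReplacement guards against '$' or '\' being read as group references.
                m.appendReplacement(sb, Matcher.quoteReplacement("[" + sourceNumber + "]"));
            }
            m.appendTail(sb);
            result[i] = CLOSE_A.matcher(sb).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] en = {
            "Hell<a href=\"http://www.example.com\">o old chap!",
            "See <a href=\"http://a.example/?q=(x)\">this</a> and <a href=\"b\">that</a>."
        };
        for (String p : numberLinks(en)) {
            System.out.println(p);
        }
    }
}
```

This puts the marker exactly where the tag was (Hell[1]o old chap!); moving it to the end of the word or sentence, as the question eventually wants, would need extra logic.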
I am using the Wikimedia API to get content from Wikipedia pages. The API returns a lot of "\n" as plain text, and I want to remove them from a string:
s = s.replaceAll("\\n", "");
s = s.replaceAll("\n", "");
Neither of these works. Any ideas?
When your String contains a plain-text \n (a backslash followed by n), it is written as "\\n" in a Java string literal; otherwise it would be displayed as a line break. That is why I found s = s.replaceAll("\\\\n","") to work for me. An example snippet:
class Main{
public static void main(String[] args){
String s = "Hello\\nHello";
System.out.println(s);
s = s.replaceAll("\\\\n","");
System.out.println(s);
}
}
Remember that replaceAll takes a regex. To match one literal backslash, the regex needs \\ (an escaped backslash), and each of those two backslashes must again be escaped in the Java string literal, which gives "\\\\n".
Please use the code below:
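A small sketch comparing the regex route with String.replace, which takes its first argument as literal text and therefore needs only the Java-level escape. The class and method names are hypothetical.

```java
public class LiteralBackslashN {
    // Regex version: the source literal "\\\\n" is the regex \\n, i.e. an escaped backslash, then 'n'.
    static String viaRegex(String s) {
        return s.replaceAll("\\\\n", "");
    }

    // Literal version: String.replace does no regex parsing, so only the Java escape is needed.
    static String viaLiteral(String s) {
        return s.replace("\\n", "");
    }

    public static void main(String[] args) {
        String s = "Hello\\nHello"; // backslash + 'n', not a real line break
        System.out.println(viaRegex(s));   // HelloHello
        System.out.println(viaLiteral(s)); // HelloHello
    }
}
```

Neither method touches real line breaks, only the two-character sequence backslash + n.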
s= s.replace("\n", "").replace("\r", "");
You can use the code below:
s = s.replace("\n", "");
However, the newline character can differ between environments,
so you can use this instead:
s = s.replace(System.getProperty("line.separator"), "");
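Since the text comes from a web API, its line endings may not match the line.separator of the machine running the code. A sketch that removes any common line ending regardless of platform, using the \R linebreak matcher available in Java 8 and later (the class and method names are hypothetical):

```java
public class RemoveLineBreaks {
    // \R (Java 8+) matches \r\n as one unit, as well as \n, \r and other Unicode line breaks.
    static String removeLineBreaks(String s) {
        return s.replaceAll("\\R", "");
    }

    public static void main(String[] args) {
        System.out.println(removeLineBreaks("one\r\ntwo\nthree\rfour")); // onetwothreefour
    }
}
```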
I have the following string variable:
[ <row id="0">
<test>123abc</test>
<INV>123456789</INV>
</row>
,
<row id="1">
<test>456def</test>
<INV>123456789</INV>
</row>
]
It's XML, but I am passing it around in Java code as a plain string:
String check = inXml;
System.out.println("incoming payload is "+check);
check = check.replaceAll(",","");
check = check.replaceAll("\\[", "");
check = check.replaceAll("\\]", "");
I used replaceAll to remove "[", "," and "]". Now I want the values inside <test> to be stored in a separate string, like String s = "123abc,456def".
You should use a parser. However, if you don't want to, you can write a regex that extracts the values for you, for example:
String line = check; // your XML string
Pattern p = Pattern.compile("<test>(.*?)</test>");
Matcher m = p.matcher(line);
while (m.find()) {
// m.group(1) is the text you want
// store it in a variable
}
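A self-contained version of that loop, assuming the payload from the question; it collects each group(1) into a StringJoiner to get the comma-separated result. The class and method names are hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTestValues {
    static String extractTests(String xml) {
        // Reluctant (.*?) stops at the first </test>, so each element is matched separately.
        Matcher m = Pattern.compile("<test>(.*?)</test>").matcher(xml);
        StringJoiner joined = new StringJoiner(",");
        while (m.find()) {
            joined.add(m.group(1)); // group 1 is the text between the tags
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String check = "[ <row id=\"0\">\n<test>123abc</test>\n<INV>123456789</INV>\n</row>\n,\n"
                + "<row id=\"1\">\n<test>456def</test>\n<INV>123456789</INV>\n</row>\n]";
        System.out.println(extractTests(check)); // 123abc,456def
    }
}
```\n<INV>123456789</INV>\n</row>\n,\n<row id=\"1\">\n<test>456def</test>\n<INV>123456789</INV>\n</row>\n]";
assert ExtractTestValues.extractTests(check).equals("123abc,456def");
assert ExtractTestValues.extractTests("<row></row>").equals("");
</test>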
Option 1: use a DOM parser. It is very easy to use: get the 'test' nodes and read the value inside each tag.
Option 2: use indexOf() to find an exact match of . Now you know the start position of your substring and the number of characters the value will have (6), so you can extract it with substring(), starting right after the matched text.
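For Option 1, a minimal sketch with the JDK's built-in DOM parser (javax.xml.parsers). The payload in the question is not well-formed XML on its own (two root elements plus the [, ] and , separators), so this sketch assumes the same cleanup as the question's replaceAll calls and wraps the rows in a hypothetical <rows> root element. The class and method names are also hypothetical.

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTestValues {
    static String extractTests(String check) {
        // Same cleanup as in the question, then an assumed <rows> wrapper so there is one root.
        String body = check.replace("[", "").replace("]", "").replace(",", "");
        String xml = "<rows>" + body + "</rows>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList tests = doc.getElementsByTagName("test");
            StringJoiner joined = new StringJoiner(",");
            for (int i = 0; i < tests.getLength(); i++) {
                joined.add(tests.item(i).getTextContent()); // text inside each <test>
            }
            return joined.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Payload is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String check = "[ <row id=\"0\">\n<test>123abc</test>\n<INV>123456789</INV>\n</row>\n,\n"
                + "<row id=\"1\">\n<test>456def</test>\n<INV>123456789</INV>\n</row>\n]";
        System.out.println(extractTests(check)); // 123abc,456def
    }
}
```\n<INV>123456789</INV>\n</row>\n,\n<row id=\"1\">\n<test>456def</test>\n<INV>123456789</INV>\n</row>\n]";
assert DomTestValues.extractTests(check).equals("123abc,456def");
</test>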
I have a bunch of HTML files. In these files I need to correct the src attribute of the IMG tags.
The IMG tags look typically like this:
<img alt="" src="./Suitbert_files/233px-Suitbertus.jpg" class="thumbimage" height="243" width="233" />
where the attributes are NOT in any specific order.
I need to remove the dot and the forward slash at the beginning of the src attribute of the IMG tags so they look like this:
<img alt="" src="Suitbert_files/233px-Suitbertus.jpg" class="thumbimage" height="243" width="233" />
I have the following class so far:
import java.util.regex.*;
public class Replacer {
// this PATTERN should find all img tags with 0 or more attributes before the src-attribute
private static final String PATTERN = "<img\\.*\\ssrc=\"\\./";
private static final String REPLACEMENT = "<img\\.*\\ssrc=\"";
private static final Pattern COMPILED_PATTERN = Pattern.compile(PATTERN, Pattern.CASE_INSENSITIVE);
public static void findMatches(String html){
Matcher matcher = COMPILED_PATTERN.matcher(html);
// Check all occurrences
System.out.println("------------------------");
System.out.println("Following Matches found:");
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
System.out.println("------------------------");
}
public static String replaceMatches(String html){
//Pattern replace = Pattern.compile("\\s+");
Matcher matcher = COMPILED_PATTERN.matcher(html);
html = matcher.replaceAll(REPLACEMENT);
return html;
}
}
So my method findMatches(String html) seems to correctly find all IMG tags whose src attribute starts with ./.
Now my method replaceMatches(String html) does not correctly replace the matches.
I am a newbie to regex, but I assume that either the REPLACEMENT regex is incorrect, or my use of the replaceAll method, or both.
As you can see, the replacement String contains 2 parts that are identical in all IMG tags:
<img and src="./. Between these 2 parts, there should be the 0 or more HTML attributes from the original string.
How do I formulate such a REPLACEMENT string?
Can somebody please enlighten me?
Don't use regex for HTML. Use a parser, obtain the src attribute and replace it.
Try these:
PATTERN = "(<img[^>]*\\ssrc=\")\\./"
REPLACEMENT = "$1"
Basically, you capture everything except the ./ in group #1, then plug it back in using the $1 placeholder, effectively stripping off the ./.
Notice how I changed your .* to [^>]*, too. If there happened to be two IMG tags on the same line, like this:
<img src="good" /><img src="./bad" />
...your regex would match this:
<img src="good" /><img src="./
It would do that even if you used a non-greedy .*?. [^>]* makes sure the match is always contained within the one tag.
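Putting the suggested PATTERN and REPLACEMENT together, a runnable sketch (the class and method names are hypothetical) that uses the question's sample tag and the two-tags-on-one-line case:

```java
import java.util.regex.Pattern;

public class StripDotSlash {
    // Group 1 captures "<img", any attributes before src (never crossing a '>'), and src=".
    private static final Pattern IMG_SRC =
            Pattern.compile("(<img[^>]*\\ssrc=\")\\./", Pattern.CASE_INSENSITIVE);

    static String fix(String html) {
        // $1 re-inserts the captured prefix, so only the "./" is dropped.
        return IMG_SRC.matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        String html = "<img alt=\"\" src=\"./Suitbert_files/233px-Suitbertus.jpg\" class=\"thumbimage\" />"
                + "<img src=\"good\" /><img src=\"./bad\" />";
        System.out.println(fix(html));
    }
}
```

The first tag becomes src="Suitbert_files/233px-Suitbertus.jpg", the "good" tag is left alone, and only the "./" in front of "bad" is removed.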
Your replacement is incorrect. The replacement string is not interpreted as a regexp, so the matched text is replaced by it more or less literally. To achieve what you want, you need to use groups. A group is delimited by a pair of parentheses in the regexp; each opening parenthesis starts a new group.
You can use $i in the replacement string to reproduce what a group has matched, where i is the group number. See the documentation of Matcher.appendReplacement for the details.
// Here is an example (it looks a bit like your case but not exactly)
String input = "<img name=\"foobar\" src=\"img.png\">";
String regexp = "<img(.+)src=\"[^\"]+\"(.*)>";
Matcher m = Pattern.compile(regexp).matcher(input);
StringBuffer sb = new StringBuffer();
while(m.find()) {
// Found a match!
// Append all chars before the match and then replaces the match by the
// replacement (the replacement refers to group 1 & 2 with $1 & $2
// which match respectively everything between '<img' and 'src' and,
// everything after the src value and the closing >
m.appendReplacement(sb, "<img$1src=\"something else\"$2>");
}
m.appendTail(sb);// No more match, we append the end of input
If src attributes only occur in your HTML within img tags, you can just do this (String.replace works on literal text and returns a new string, so assign the result):
input = input.replace("src=\"./", "src=\"");
You could also do this without Java by using sed, if you're on a *nix OS.
I want to display special characters in an alert box using JavaScript and JSP:
String encodeString = "ss\ncc";
String test = "DisplayNext('"+encodeString+"')";
String NextLink = "<br> Next";
And the JavaScript function is:
function DisplayNext(Next){
alert(Next);
}
Even though the string contains special characters, I am not able to display them in the alert box. How can I sort this out?
Your code produces something like this:
<br><a href='#' onclick="DisplayNext('ss
cc');"> Next</a>
And what you need is:
<br> Next
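If you want a line break in JavaScript, it must be written as \\n in Java, so that the generated JavaScript contains the escape sequence \n rather than a real line break. So use: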
String encodeString = "ss\\ncc";
String test = "DisplayNext('"+encodeString+"')";
String NextLink = "<br> Next";
Also consider using a dedicated function to escape your String objects as JavaScript values. Google will easily help you find one ;)
If your String is URL-encoded in Java, you need to decode it in JavaScript.
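As an illustration of such a function, here is a hypothetical escapeJs helper, a minimal sketch that only handles backslashes, quotes and common control characters; it is not a complete escaper. If the result is placed inside an HTML attribute such as onclick, it additionally needs HTML-attribute escaping.

```java
public class JsEscape {
    // Hypothetical helper: escapes a Java string for use inside a quoted JavaScript string literal.
    static String escapeJs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encodeString = "ss\ncc"; // contains a real line break, as in the question
        String test = "DisplayNext('" + escapeJs(encodeString) + "')";
        System.out.println(test); // DisplayNext('ss\ncc')
    }
}
```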
Java:
String s = "ë";
System.out.println(URLEncoder.encode(s, "ISO-8859-1"));
This prints %EB.
Javascript:
alert(unescape('%EB'));
This shows the character ë in the alert message.
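The charset passed to URLEncoder decides what the JavaScript side has to decode. A sketch (class and method names are hypothetical; the Charset overload of URLEncoder.encode needs Java 10 or later) showing that ISO-8859-1 gives %EB, which the legacy unescape() handles, while UTF-8 gives %C3%AB, which is what JavaScript's decodeURIComponent() expects.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeForJs {
    // URLEncoder.encode(String, Charset) exists since Java 10 and throws no checked exception.
    static String encode(String s, Charset charset) {
        return URLEncoder.encode(s, charset);
    }

    public static void main(String[] args) {
        String s = "\u00EB"; // the character e with diaeresis
        System.out.println(encode(s, StandardCharsets.ISO_8859_1)); // %EB
        System.out.println(encode(s, StandardCharsets.UTF_8));      // %C3%AB
    }
}
```

Note that URLEncoder encodes spaces as "+", which neither unescape() nor decodeURIComponent() turns back into spaces.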