How to automatically unescape the escape characters in a string


I am receiving data from a service that contains escape sequences. I have managed to eliminate them with this code:
// Turn every \" into a plain "
results = results.replace("\\\"", "\"");
// Strip the surrounding quotes, if present
if (results.startsWith("\"")) {
    results = results.substring(1);
}
if (results.endsWith("\"")) {
    results = results.substring(0, results.length() - 1);
}
It works fine, but for some strings it throws an exception while creating the JSON object. How do I automatically unescape all the escape characters in the result? I have searched for answers, but many of them say to use a third-party library. What is the best way to achieve this?

I think Apache Commons works pretty well. Its StringEscapeUtils class has a bunch of static methods for escaping and unescaping strings (unescapeJava, for example), so I think you should check it out.
Good luck!
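If adding Apache Commons is not an option, a small hand-written unescaper covers the common cases. This is only a sketch: the class SimpleUnescaper and its unescape method are hypothetical names, and it assumes the service uses Java/JSON-style backslash escapes (escaped quote, backslash, n, t, r, b, f, slash and the four-hex-digit "u" form). Unknown escapes are left untouched rather than guessed at.

```java
/**
 * Hypothetical helper (not a library class): turns backslash escape
 * sequences back into the characters they represent. Handles escaped
 * quote, apostrophe, backslash, slash, n, t, r, b, f and the
 * four-hex-digit "u" form. Unknown escapes are kept unchanged.
 */
class SimpleUnescaper {

    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // Ordinary character, or a lone backslash at the very end: copy as-is
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            switch (next) {
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'u':
                    // Needs exactly four hex digits after the 'u'
                    if (i + 6 > s.length()) {
                        throw new IllegalArgumentException("Truncated unicode escape at index " + i);
                    }
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                default:
                    // Unknown escape: keep both characters
                    out.append(c).append(next);
            }
            i += 2;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A quoted payload with escaped quotes, as in the question
        String raw = "\"{\\\"city\\\":\\\"Overland Park\\\"}\"";
        String body = raw.substring(1, raw.length() - 1); // strip the outer quotes
        System.out.println(unescape(body)); // prints {"city":"Overland Park"}
    }
}
```

Unlike a single replace of \" with ", this walks the string once, so an escaped backslash followed by a quote is handled correctly instead of being mangled.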

Place this code right after parsing the array:
// Remove all <p>, </p> and <br /> tags by replacing them with ""
content = content.replace("<br />", "");
content = content.replace("<p>", "");
content = content.replace("</p>", "");
Here content is my own variable; substitute whichever variable holds your string.

Related

Safely pass string to javascript

I am trying to pass a string from my java code to javascript like so:
myData.data = "${data.myString}";
This breaks if myString contains a double quote (").
I tried storing a JavaScript-safe string instead, just replacing " with \", but then when I use myString elsewhere in my JSP I get ugly output with \" showing instead of ".
What is the best way to safely pass a string and not mess up the rest of my output.

Encode it into the html in the JSP:
<input id="test_hide" type="hidden" value="${URIUtil.encodeAll("http://www.google.com?q=a b","UTF-8")}">
Then in the JavaScript:
myData.data = decodeURIComponent(document.getElementById('test_hide').getAttribute('value'));
See also: Java - Convert String to valid URI object

Replacing the double quotes with &quot; should work.
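Another way to avoid the broken output is to escape the value for a JavaScript string literal on the Java side and only use the escaped copy inside the script block, keeping the raw value for normal JSP output. The sketch below is illustrative: JsStringEscaper and escapeForJs are hypothetical names, and it assumes the value is placed inside a quoted literal in a script block. Apache Commons Lang offers a maintained equivalent (StringEscapeUtils.escapeJavaScript in 2.x, escapeEcmaScript in 3.x).

```java
/**
 * Hypothetical helper: escapes a string so it can be placed inside a
 * double- or single-quoted JavaScript string literal in a script block.
 */
class JsStringEscaper {

    static String escapeForJs(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\'': out.append("\\'");  break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                // '<' as a hex escape, so "</script>" in the data cannot end the script block
                case '<':  out.append("\\x3C"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String myString = "He said \"hi\" & left";
        // Emits: myData.data = "He said \"hi\" & left";
        System.out.println("myData.data = \"" + escapeForJs(myString) + "\";");
    }
}
```

Because both quote styles are escaped, the literal stays valid whether the JSP wraps it in single or double quotes, which avoids the "both kinds of quotes" problem.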

A bad solution:
Check if your string has double quotes; if so, use
myData.data = '${data.myString}';
If it contains single quotes, use
myData.data = "${data.myString}";
This will explode if you have both single and double quotes.
A good solution:
Just use
&quot;

Regex to remove html does not get rid of img tag

I am using a regex to remove HTML tags. I do something like this:
result = result.replaceAll("<.*?>", "");
However, it does not help me get rid of the img tags in the html. Any idea what is a good way to do that?

If you cannot use HTML parsers/cleaners, then I would at least suggest using the Pattern.DOTALL flag so that tags spanning multiple lines are matched as well. Consider code like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
String str = "123 <img \nsrc='ping.png'>abd foo";
// DOTALL lets '.' also match the line break inside the <img> tag
Pattern pt = Pattern.compile("<.*?>", Pattern.DOTALL);
Matcher matcher = pt.matcher(str);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
System.out.println("Output: " + sb);
OUTPUT
Output: 123 abd foo

To give a more concrete recommendation, use JSoup (or NekoHTML) to parse the HTML into a Java object.
Once you've got a Document object it can easily be traversed to remove the tags. This cookbook recipe shows how to get attributes and text from the DOM.

Another suggestion is HtmlCleaner

I'm just reiterating what others have said already, but this point cannot be overstated: DO NOT USE REGEXES TO PARSE HTML. There are a thousand similar questions on SO. Use a proper HTML parser; it will make your life so much easier and is far more robust and reliable. Take a look at Dom4j, Jericho, JSoup. Please.

So, a piece of code for you.
I use http://htmlparser.sourceforge.net/ to parse HTML. It is not overcomplicated and quite straightforward to use.
Basically it looks like this:
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
...
String html; /* read your HTML into variable 'html' */
String result=null;
....
try {
Parser p = new Parser(html);
NodeList nodes = p.parse(null);
result = nodes.asString();
} catch (ParserException e) {
e.printStackTrace();
}
That will give you plain text stripped of tags (but HTML entities are not decoded back into characters). And of course you can do plenty more with this library, like applying filters and visitors, iterating over nodes, and so on.

Use an HTML parser instead. Iterate over the resulting objects, print them however you like, and you will get the best result.

I have been able to achieve this with the code snippet below.
String htmlContent = values.get(position).getContentSnippet();
String plainTextContent = htmlContent.replaceAll("<img .*?/>", "");
I used the above regex to clean the img tags in my RSS content.
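The <img .*?/> pattern above only matches self-closing tags written on one line with a space after img. If you must stay with a regex, a slightly more tolerant sketch is below; ImgTagStripper is a hypothetical name, and the pattern still assumes no '>' appears inside an attribute value, so a real HTML parser remains the more robust choice.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that removes <img> tags with a single regex. */
class ImgTagStripper {

    // "<img" as a whole word, then everything up to the first '>'.
    // [^>] also matches line breaks, so multi-line tags need no DOTALL flag.
    // Assumes no '>' appears inside an attribute value.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    static String stripImgTags(String html) {
        return IMG_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hi<IMG\n src=\"a.png\"><img src='b.png' /></p>";
        System.out.println(stripImgTags(html)); // prints <p>Hi</p>
    }
}
```

The \b after img keeps the pattern from touching unrelated tags whose names merely start with "img".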

Convert HTML symbols and HTML names to HTML number using Java

I have an XML file which contains many special symbols like ® (HTML number &#174;)
and HTML names like &atilde; (HTML number &#227;).
I am trying to replace these HTML symbols and HTML names with corresponding HTML number using Java. For this, I first converted XML file to string and then used replaceAll method as:
File fn = new File("myxmlfile.xml");
String content = FileUtils.readFileToString(fn);
content = content.replaceAll("®", "&\#174");
FileUtils.writeStringToFile(fn, content);
But this is not working.
Can anyone please tell how to do it.
Thanks !!!

The signature for the replaceAll method is:
public String replaceAll(String regex, String replacement)
You have to be careful that your first parameter is a valid regular expression. The Java Pattern class describes the constructs used in a Java regular expression.
Based on what I see in the Pattern class description, I don't see what's wrong with:
content = content.replaceAll("®", "&\#174");
You could try:
content = content.replaceAll("\\p(®)", "&\#174");
and see if that works better.

I don't think \# is a valid escape sequence (a Java string literal containing it will not even compile).
BTW, what's wrong with "&#174;"?
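Instead of one replaceAll call per symbol, every non-ASCII character can be turned into its decimal reference in a single pass. This is a sketch under a few assumptions: NumericCharRefs and toNumericRefs are hypothetical names, the file is read with an explicit, correct charset so that ® actually arrives as that character, and named references already present as text (such as &atilde;) are left unchanged, since converting those would need a lookup table.

```java
class NumericCharRefs {

    // Replaces every non-ASCII character (code point above 0x7F) with a
    // decimal character reference; ASCII text, markup and whitespace pass
    // through unchanged. Works on code points, so characters outside the
    // BMP (surrogate pairs) become a single reference.
    static String toNumericRefs(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Registered sign and a-tilde, written as Java escapes to avoid source-encoding issues
        System.out.println(toNumericRefs("Brand\u00AE S\u00E3o")); // prints Brand&#174; S&#227;o
    }
}
```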

If you want HTML numbers, try escaping for XML first.
Use StringEscapeUtils from Apache Commons Lang.
Java may have trouble dealing with these characters, so I prefer to escape for Java first, and after that for XML or HTML.
String escapedStr = StringEscapeUtils.escapeJava(yourString);
// Commons Lang 2.x names; 3.x uses escapeXml10 / escapeHtml4
escapedStr = StringEscapeUtils.escapeXml(escapedStr); // or escapeHtml(escapedStr) for HTML

What is the most efficient way to format UTF-8 strings in java?

I am doing the following:
String url = String.format(WEBSERVICE_WITH_CITYSTATE, cityName, stateName);
String urlUtf8 = new String(url.getBytes(), "UTF8");
Log.d(TAG, "URL: [" + urlUtf8 + "]");
Reader reader = WebService.queryApi(url);
The output that I am looking for is essentially to get the city name with blanks (e.g., "Overland Park") to be formatted as Overland%20Park.
Is this the best way?

Assuming you actually want to encode your string for use in a URL (i.e., "Overland Park" can also be formatted as "Overland+Park"), you want URLEncoder.encode(..., "UTF-8"), applied to each parameter value such as cityName and stateName rather than to the whole URL. Other unsafe characters will be converted to the %xx format you are asking for.
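URLEncoder performs HTML form encoding, so a space becomes "+". If you specifically want Overland%20Park, one common approach is to replace "+" afterwards; that is safe because a literal "+" in the input is itself encoded as %2B. A minimal sketch, with UrlValueEncoder and encodeValue as hypothetical names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlValueEncoder {

    // Encodes a single value (not a whole URL) for a query string or path
    // segment. URLEncoder turns spaces into '+'; any '+' left in its output
    // therefore came from a space, so swapping it for %20 is safe.
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeValue("Overland Park")); // prints Overland%20Park
    }
}
```

You would then pass encodeValue(cityName) and encodeValue(stateName) to String.format, so the separators in the format string itself are left alone.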

The simple answer is to use URLEncoder.encode(...) as stated by @Recurse. However, if part or all of the URL has already been encoded, then this can lead to double encoding. For example:
http://foo.com/pages/Hello%20There
or
http://foo.com/query?keyword=what%3f
Another concern with URLEncoder.encode(...) is that it doesn't understand that certain characters should be escaped in some contexts and not others. So for example, a '?' in a query parameter should be escaped, but the '?' that marks the start of the "query part" should not be escaped.
I think a safer way to add missing escapes would be the following:
String safeURI = new URI(url).toASCIIString();
However, I haven't tested this ...
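One caution on the untested suggestion above: the single-argument URI constructor does not add escapes; it rejects a raw space and throws URISyntaxException. The multi-argument constructors, by contrast, quote illegal characters according to the component they appear in. The sketch below uses the (scheme, host, path, fragment) constructor; UriQuoting and buildUri are hypothetical names. Note that these constructors always quote '%', so they are meant for raw, not-yet-encoded components and would double-encode a path like Hello%20There.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriQuoting {

    // Builds a URI from raw components; illegal characters in the path
    // (such as a space) are percent-encoded, and '%' becomes %25.
    static String buildUri(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUri("http", "foo.com", "/pages/Hello There"));
        // prints http://foo.com/pages/Hello%20There
    }
}
```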

Escaping & in a URL

I am using JSPs, and in my URL I have a variable whose value is, say, "L & T". When I try to retrieve it with request.getParameter, I get only "L": the "&" is treated as a parameter separator, so the value is not read as a whole string.
How do I solve this problem?

java.net.URLEncoder.encode("L & T", "UTF-8")
This outputs the URL-encoded value, which is safe to use as a GET parameter:
L+%26+T

A literal ampersand in a URL should be encoded as: %26
// Your URL
http://www.example.com?a=l&t
// Encoded
http://www.example.com?a=l%26t

You need to "URL encode" the parameters to avoid this problem. The format of the URL query string is:
...?<name>=<value>&<name>=<value>&<etc>
All <name>s and <value>s need to be URL encoded, which basically means transforming all the characters that could be interpreted wrongly (like the &) into %-escaped values. See this page for more information:
http://www.w3schools.com/TAGS/ref_urlencode.asp
If you're generating the problem URL with Java, you use this method:
String str = URLEncoder.encode(input, "UTF-8");
If you are generating the URL elsewhere (in templates, JavaScript, or raw markup), you need to fix the problem at the source.
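To make "URL encode every name and value" concrete, here is a small sketch that assembles a whole query string from name/value pairs. QueryStringBuilder and build are hypothetical names; the point is that encoding happens per pair before joining with '&', so an '&' or '=' inside a value can no longer be mistaken for a separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringBuilder {

    // Joins name=value pairs with '&', URL-encoding every name and value
    // individually (insertion order is preserved by the caller's map).
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("company", "L & T");
        System.out.println("http://www.example.com?" + build(params));
        // prints http://www.example.com?company=L+%26+T
    }
}
```

On the receiving side, the servlet container decodes the value, so request.getParameter("company") should then return the full "L & T".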

You can use UriUtils#encode(String source, String encoding) from Spring Web. This utility class also provides means for encoding only some parts of the URL, like UriUtils#encodePath.
