Jsoup Parsing double quotes as &quot; and single quotes as double quotes - java

I am trying to parse an HTML document with Jsoup. The document contains
span-data-personalization = '{"one":["two"]}' which is converted to
span-data-personalization = "{&quot;one&quot;:[&quot;two&quot;]}" during parsing. The double quotes become &quot; and the single quotes become double quotes. I have tried doc.outputSettings().prettyPrint(false); with no success. I also made the changes suggested in "jsoup - stop jsoup from making quotes into &", but that did not work either. I have also tried updating the Jsoup version. Nothing seems to work. Does anybody have any suggestions?
Thank you.

The Jsoup Parser class has a built-in unescapeEntities method. From the Jsoup documentation:
public static String unescapeEntities(String string, boolean inAttribute)
Utility method to unescape HTML entities from a string
Parameters:
string - HTML escaped string
inAttribute - if the string is to be escaped in strict mode (as attributes are)
Returns:
an unescaped string

Related

How do I have to format my feeder csv file, to inject a list of string elements into my JSON request in Gatling?

I am having trouble adding a list of string IDs to my JSON request body. I tried many different formatting styles but could not figure out how to get this to work in Gatling using the Java DSL.
This is one of my CSV formatting attempts to represent a list:
playerId, dateIds
113489013, {"20210820TT", "20220211TT"}
Here is the code that feeds my CSV data into the JSON request body:
public static ScenarioBuilder isPlayingScenario = scenario("is playing")
.feed(playerIdFeeder)
.exec(isPlaying);
public static final ChainBuilder isPlaying =
exec(http(IS_PLAYING)
.post(IS_PLAYING_URL + "#{playerId}")
.headers(headers)
.body(ElFileBody("data/requests/is-playing-request.json"))
.asJson()
);
And here is the very simple request body, containing only a list of IDs:
{
"dateIds": ["#{dateIds}"]
}
This particular attempt is resolved to:
body:StringChunksRequestBody{contentType='application/json', charset=UTF-8, content={
"dateIds": [" {"20210820TT""]
}}
So it neither resolves to valid JSON nor includes the second ID...
Any help is well appreciated! Thanks.
Your file is malformed; please read the CSV specification (RFC 4180) carefully. In particular:
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
Your file should look like:
playerId, dateIds
113489013, "{""20210820TT"", ""20220211TT""}"
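The quoting rules quoted above can be sketched in a few lines of Java. This is only an illustration: the class name CsvQuoteDemo and the helper quoteCsvField are hypothetical names, not part of Gatling or of any CSV library, and the sketch handles a single field at a time.

```java
public class CsvQuoteDemo {

    // Hypothetical helper: quotes one field following the two rules quoted above.
    // A field containing a comma, a double quote or a line break is wrapped in
    // double quotes, and every embedded double quote is doubled.
    static String quoteCsvField(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String dateIds = "{\"20210820TT\", \"20220211TT\"}";
        // prints: 113489013,"{""20210820TT"", ""20220211TT""}"
        System.out.println("113489013," + quoteCsvField(dateIds));
    }
}
```

The sketch writes no space after the separating comma, because RFC 4180 treats spaces as part of the field.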

Store Backward slash and forward slash in java string

I am passing the string value below into an OData query from Java code.
objects.put("EndDate", "\/Date(1441756800)\/";
How can I write \/Date(1441756800)\/ as a string in Java?
I have tried the following:
objects.put("EndDate", ""\\""//"Date(1441756800)""\\""//"";
It throws an error :(
I have never used OData, so I may not understand your question correctly. If you are asking how to write \/Date(1441756800)\/ as a Java String, you need to escape each \ because it is a special character in string literals (it introduces escape sequences such as the newline \n). The forward slash needs no escaping.
So try "\\/Date(1441756800)\\/"
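As a quick check of the suggestion above, the following sketch prints the literal and its length. The class name BackslashDemo and the method endDate are made up for this illustration; the objects map from the question is left out because its type is not shown.

```java
public class BackslashDemo {

    // Hypothetical helper returning the value from the question.
    static String endDate() {
        // "\\" in source code is ONE backslash at runtime; "/" needs no escape.
        return "\\/Date(1441756800)\\/";
    }

    public static void main(String[] args) {
        String value = endDate();
        System.out.println(value);          // prints \/Date(1441756800)\/
        System.out.println(value.length()); // prints 20
    }
}
```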
Try this: objects.put("EndDate", "'Date(1441756800)'");

Escaping special characters in Java

The application I support is going through security review and there are some questions regarding escaping special characters. I have not been supporting this application for a long time and I'm not very knowledgeable about escaping special characters. The question I was asked is "Why are you JavaScript encoding the value and then HTML encoding it? Is that value written out in a context that requires the value to be encoded for both contexts?"
What is the difference between JavaScript encoding used and HTML encoding used? Why would I need both in my code?
Any information regarding this will be greatly appreciated!
public class HTMLEncodedResultSet extends ResultSetWrapper {
public HTMLEncodedResultSet(ResultSet resultSet) {
super(resultSet);
}
public String getString(int columnIndex) throws SQLException {
return StringEscapeUtils.escapeHtml(StringEscapeUtils.escapeJavaScript(super.getString(columnIndex)));
}
public String getString(String columnName) throws SQLException {
return StringEscapeUtils.escapeHtml(StringEscapeUtils.escapeJavaScript(super.getString(columnName)));
}
}
From the official documentation:
escapeHtml
Escapes the characters in a String using HTML entities.
For example:
"bread" & "butter"
becomes: &quot;bread&quot; &amp; &quot;butter&quot;.
escapeJavaScript
Escapes the characters in a String using JavaScript String rules.
Escapes any values it finds into their JavaScript String form. Deals
correctly with quotes and control-chars (tab, backslash, cr, ff, etc.)
So a tab becomes the characters '\' and 't'.
Example:
input string: He didn't say, "Stop!"
output string: He didn\'t say, \"Stop!\"
So, given that JS and HTML reserved characters are not the same, in your case if the input has HTML and JS code it may be necessary to invoke both methods.
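To make the difference between the two contexts concrete, here is a minimal sketch. The helpers escapeHtmlMinimal and escapeJsMinimal are hypothetical, simplified stand-ins written for this illustration; they are not the Commons Lang implementations and cover only a handful of characters.

```java
public class EscapeContextsDemo {

    // Simplified stand-in for HTML escaping: handles only & < > and ".
    // The ampersand is replaced first so later entities are not escaped twice.
    static String escapeHtmlMinimal(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Simplified stand-in for JavaScript string escaping: handles only the
    // backslash, both quote characters, tab and newline.
    static String escapeJsMinimal(String s) {
        return s.replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\"", "\\\"")
                .replace("\t", "\\t")
                .replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String input = "He didn't say, \"Stop!\" & <left>";
        // HTML context: He didn't say, &quot;Stop!&quot; &amp; &lt;left&gt;
        System.out.println(escapeHtmlMinimal(input));
        // JavaScript string context: He didn\'t say, \"Stop!\" & <left>
        System.out.println(escapeJsMinimal(input));
        // Both, in the order used in the question (JavaScript first, then HTML):
        // He didn\'t say, \&quot;Stop!\&quot; &amp; &lt;left&gt;
        System.out.println(escapeHtmlMinimal(escapeJsMinimal(input)));
    }
}
```

Each function protects a different set of characters, so the combined form is only appropriate when the value really ends up in a JavaScript string that is itself embedded in HTML.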
It looks like your application stores JavaScript snippets in the database. These snippets might create or contain HTML parts (for example, to generate dynamic HTML based on user interaction). When such a snippet is loaded from the DB as a string in Java, both JavaScript and HTML encoding are required.
Here is an example of a value that could be stored in the DB:
var obj = $('#fire');
var fps = 200;
var letters = obj.html().split('');
obj.empty();
$.each(letters,function(el){
obj.append($('<span>'+this+'</span>'));
});
var animateLetters = obj.find('span');
setInterval(function(){
animateLetters.each(function(){
$(this).css('fontSize', 80+(Math.floor(Math.random()*50)));
});
},fps);
Referring to the documentation:
escapeHtml: Escapes the characters in a String using HTML entities.
For example:
"bread" & "butter"
becomes: &quot;bread&quot; &amp; &quot;butter&quot;.
and
escapeJavaScript: Escapes any values it finds into their JavaScript
String form. Deals correctly with quotes and control-chars (tab,
backslash, cr, ff, etc.)
So a tab becomes the characters '\' and 't'.
The only difference between Java strings and JavaScript strings is
that in JavaScript, a single quote must be escaped.
Example:
input string: He didn't say, "Stop!"
output string: He didn\'t say, \"Stop!\"

Padding quotes in JSONObject

I'm building a JSON string to send to my web service. Since one of the pieces is user-inputted, there is the possibility for double quotes. I'm trying to resolve the issue by escaping it.
String strValue = "height of 6\"";
JSONObject json = new JSONObject();
json.put("key", strValue.replaceAll("\"","\\\""));
The problem here is when I do json.toString(), I get 3 slashes.
Ex:
{"key":"height of 6\\\""}
If I don't try to do any replacing, json.toString() gives me broken JSON.
Ex:
{"key":"height of 6""}
How can I do this correctly?
Note: When my website saves this value and displays it, it displays height of 6\"
UPDATE:
It appears the culprit is json.toString()
When I call the replaceAll method it -- correctly -- only escapes the double quote. It appears json.toString() escapes slashes. To fix the issue, I must do json.toString().replace("\\\\", ""). This begs the question: Why on Earth does JSONObject escape slashes and not double quotes?????

Convert HTML symbols and HTML names to HTML number using Java

I have an XML file which contains many special symbols like ® (HTML number &#174;)
and HTML names like &atilde; (HTML number &#227;).
I am trying to replace these HTML symbols and HTML names with the corresponding HTML numbers using Java. To do this, I first read the XML file into a string and then used the replaceAll method:
File fn = new File("myxmlfile.xml");
String content = FileUtils.readFileToString(fn);
content = content.replaceAll("®", "&\#174");
FileUtils.writeStringToFile(fn, content);
But this is not working.
Can anyone please tell how to do it.
Thanks !!!
The signature for the replaceAll method is:
public String replaceAll(String regex, String replacement)
You have to be careful that your first parameter is a valid regular expression. The Java Pattern class describes the constructs used in a Java regular expression.
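Based on the Pattern class description, the regular expression "®" in your call is valid:
content = content.replaceAll("®", "&\#174");
If the literal character is not matched, you could try the Unicode escape in the regex instead:
content = content.replaceAll("\\u00AE", "&#174;");
and see if that works better.
The problem is the replacement string: \# is not a valid escape sequence in a Java string literal, so the code does not compile.
Use "&#174;" instead.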
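Following on from the replaceAll discussion above, here is a minimal sketch that needs no regular expression at all. It assumes that the goal is to turn every non-ASCII character into a decimal numeric reference; the class name NumericEntityDemo and the helper toNumericReferences are hypothetical. It does not convert named entities such as &atilde; that already appear as text in the file.

```java
public class NumericEntityDemo {

    // Hypothetical helper: replaces every character outside the ASCII range
    // with a decimal numeric character reference such as &#174;.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00AE is the registered sign, \u00E3 is "a" with a tilde.
        String content = "Acme\u00AE S\u00E3o Paulo";
        // String.replace takes plain text, so no regex escaping is involved.
        System.out.println(content.replace("\u00AE", "&#174;"));
        // prints: Acme&#174; S&#227;o Paulo
        System.out.println(toNumericReferences(content));
    }
}
```

Writing the characters as \u escapes keeps the source file pure ASCII, which avoids depending on the encoding the compiler assumes.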
If you want HTML numbers, try escaping for XML first.
Use StringEscapeUtils from Apache Commons Lang.
Java may have trouble dealing with it, so I prefer to escape for Java first, and after that for XML or HTML.
String escapedStr = StringEscapeUtils.escapeJava(yourString);
escapedStr = StringEscapeUtils.escapeXml(yourString);
escapedStr = StringEscapeUtils.escapeHtml(yourString);
