Using PrintWriter, I am getting Chinese junk characters in browser - java

I am using PrintWriter as follows to get the output in the browser:
PrintWriter pw = response.getWriter();
StringBuffer sb = getTextFromDatabase();
pw.print(sb);
However, this prints the following Chinese junk characters:
格㸳潃浭湥獴⼼㍨‾琼扡敬㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴倾獯整⁤湏›〱㈭ⴷ〲〱ㄠ㨴㌰㔺਱‬祂›教桳慷瑮丠祡歡⠊湹祡歡捀獩潣挮浯਩硅散汬湥㱴琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴㰾琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴倾獯整⁤湏›〱㈭ⴷ〲〱ㄠ㨴㐰ㄺ਱‬祂›教桳慷瑮丠祡歡⠊湹祡歡捀獩潣挮浯਩敶祲朠潯㱤琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴㰾琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴倾獯整⁤湏›〱㈭ⴷ〲〱ㄠ㨴㜱㌺ਸ਼‬祂›教桳慷瑮丠祡歡⠊湹祡歡捀獩潣挮浯਩桔獩椠⁳潴琠獥㱴琯㹤⼼牴㰾琯扡敬㰾牢⼠‾格㸳潐瑳夠畯⁲潃浭湥㱴栯㸳㰠潦浲愠瑣潩㵮䌢浯敭瑮即牥汶瑥•敭桴摯∽敧≴渠浡㵥挢浯敭瑮潆浲•湯畳浢瑩∽爠瑥牵慖楬慤整潆浲⤨∻‾琼扡敬†眠摩桴∽〳∰栠楥桧㵴㌢〰㸢ठ琼㹲琼㹤氼扡汥映牯∽慮敭㸢潃浭湥㩴猼慰汣獡㵳洢湡呤汃獡≳⨾⼼灳湡㰾氯扡汥㰾牢㸯琼硥慴敲⁡慮敭∽潣瑮湥≴椠㵤挢浯敭瑮硔䅴敲≡挠慬獳∽整瑸牡慥氠牡敧•潣獬∽㠲•潲獷∽∶㸠⼼整瑸牡慥㰾琯㹤⼼牴㰾牴㰾摴㰾慬敢潦㵲渢浡≥举浡㩥猼慰汣獡㵳洢湡呤汃獡≳⨾⼼灳湡㰾氯扡汥㰾牢㸯椼灮瑵椠㵤渢浡≥琠灹㵥琢硥≴渠浡㵥渢浡≥挠慬獳∽慮敭•慶畬㵥∢洠硡敬杮桴∽㔲∵†楳敺∽㘳⼢㰾琯㹤⼼牴㰾牴㰾摴㰾慬敢潦㵲攢慭汩㸢ⵅ慍汩㰺灳湡挠慬獳∽慭摮䍔慬獳㸢㰪猯慰㹮⼼慬敢㹬戼⽲㰾湩異⁴摩∽浥楡≬琠灹㵥琢硥≴渠浡㵥攢慭汩•汣獡㵳攢慭汩•慶畬㵥∢洠硡敬杮桴∽㔲∵†楳敺∽㘳⼢㰾琯㹤⼼牴㰾牴㰾摴㰾湩異⁴琠灹㵥猢扵業≴†慮敭∽潰瑳•慶畬㵥倢獯≴㸯⼼摴㰾琯㹲⼼慴汢㹥⼼潦浲
I tried to use String instead of StringBuffer, but that didn't help. I also tried to set the content type header as follows
response.setContentType("text/html;charset=UTF-8");
before getting the response writer, but that did not help either.
The data in the DB itself is fine, because I use the same data in two different places: one shows the correct output, the other shows the junk above. The code above runs in a JSP scriptlet, and I have also set the content type for the JSP.

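Getting Chinese characters as mojibake here indicates that single-byte text (ASCII, which is also valid UTF-8) is being decoded as UTF-16LE somewhere before it reaches the writer. UTF-16LE stores each of these characters in 2 bytes, so every pair of ASCII bytes is read as one 16-bit code unit, and those values mostly land in the CJK (Chinese/Japanese/Korean) range. For example, the bytes 0x3C 0x68 ("<h") read as a little-endian 16-bit value give U+683C, which is 格, the first junk character.
To fix this, the code that reads the data has to use the same encoding the data was written in. Since you're attempting to display it as UTF-8, I think the simplest consistent setup is UTF-8 end to end: check how the DB column and the connection are configured, and reconfigure/convert whatever is set to UTF-16LE to use UTF-8 instead.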
Unrelated to the concrete problem, storing HTML (which is what those characters originally represented) in a database is really a bad idea ;) This was the original content:
<h3>Comments</h3> <table><tr bgcolor='#E7E7EF'><td>Posted On: 10-27-2010 14:03:51
, By: Yeshwant Nayak
(ynayak@cisco.com)
Excellent</td></tr><tr bgcolor='#E7E7EF'><td></td></tr><tr bgcolor='#E7E7EF'><td>Posted On: 10-27-2010 14:04:11
, By: Yeshwant Nayak
(ynayak@cisco.com)
very good</td></tr><tr bgcolor='#E7E7EF'><td></td></tr><tr bgcolor='#E7E7EF'><td>Posted On: 10-27-2010 14:17:36
, By: Yeshwant Nayak
(ynayak@cisco.com)
This is to test</td></tr></table><br /> <h3>Post Your Comment</h3> <form action="CommentsServlet" method="get" name="commentForm" onsubmit=" return ValidateForm();"> <table width="300" height="300"> <tr><td><label for="name">Comment:<span class="mandTClass">*</span></label><br/><textarea name="content" id="commentTxtArea" class="textarea large" cols="28" rows="6" ></textarea></td></tr><tr><td><label for="name">Name:<span class="mandTClass">*</span></label><br/><input id="name" type="text" name="name" class="name" value="" maxlength="255" size="36"/></td></tr><tr><td><label for="email">E-Mail:<span class="mandTClass">*</span></label><br/><input id="email" type="text" name="email" class="email" value="" maxlength="255" size="36"/></td></tr><tr><td><input type="submit" name="post" value="Post"/></td></tr></table></form
Here's how you can turn this incorrectly encoded Chinese back to normal characters:
String incorrect = "格㸳潃浭湥獴⼼㍨‾琼扡敬㰾牴戠捧汯";
String original = new String(incorrect.getBytes("UTF-16LE"), "UTF-8");
Note that this should not be used as a solution! It was only posted as evidence of the root cause of the problem.
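To see the mechanism end to end, here is a minimal, self-contained sketch of the round trip. The class and method names (MojibakeDemo, garble, repair) are made up for illustration; only the String and charset calls are standard Java. It turns the plain text "<h3>Comments" into the first six junk characters and then reverses them, which is the same conversion as the two lines above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original application.
class MojibakeDemo {

    // Encode the text as UTF-8 bytes, then wrongly decode those bytes as UTF-16LE.
    static String garble(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_16LE);
    }

    // Reverse the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.UTF_16LE);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 12 ASCII bytes, so they pair up into exactly 6 UTF-16 code units.
        String original = "<h3>Comments";
        String garbled = garble(original);
        // Print code points in hex so the result does not depend on the console encoding.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
        System.out.println(repair(garbled));
    }
}
```

The first code unit printed is U+683C, the same 格 that opens the junk output. Input with an odd number of bytes does not pair up cleanly, so its last character cannot be recovered this way.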

Clearly, you have some kind of encoding problem here, but my guess is it is on the server or database side, not in the browser.
In the DB there are no issues with the data as i have used the same data for 2 different options,but in one i get correct output n in other junk.
I don't find that argument convincing. In fact, I think you may be overlooking the real cause of the problem.
What I think you need to do is add some server-side logging to capture what is actually in that StringBuffer that you are sending to the PrintWriter.
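One possible way to do that logging is to print the code points rather than the characters, so the log cannot be distorted by the console or log file encoding. This is only a sketch: the class name CodePointDump and the limit parameter are assumptions for illustration, not anything from the original code.

```java
// Hypothetical helper for diagnosing what a String/StringBuffer really contains.
class CodePointDump {

    // Returns the first 'limit' UTF-16 code units of the text as "U+XXXX" values.
    static String dump(CharSequence text, int limit) {
        StringBuilder out = new StringBuilder();
        int end = Math.min(text.length(), limit);
        for (int i = 0; i < end; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Healthy HTML starts with small values such as U+003C ('<').
        System.out.println(dump(new StringBuffer("<h3>Comments"), 4));
    }
}
```

If the dump of the buffer already shows values like U+683C instead of U+003C, the text was damaged before it reached the PrintWriter, and the response encoding is not the culprit.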
Also, look at what is different about the way the server side handles the "2 different options". (What do you mean by that phrase?)
Finally, please provide some REAL code, not just 3 line snippets that won't compile.

Related

JSON Data Limit

We have a Grails 2.2.4 application running on Tomcat that works with the user's camera and keystrokes, collects some data on the client side with JavaScript and sends it using POST.
In the view that collects data we have:
<g:form name="testResultsForm" id="testResultsForm" controller="customer" action="thankYou" method="post">
<h3>Dummy data!</h3>
<input type="text" style="visibility: hidden" name="testResults" id="testResults"/>
<button type="submit" class="btn btn-default">Submit dummy data</button>
</g:form>
In the JS, we assign all camera data to this html element and submit the form:
TestUtils.setValue('testResults', sendData);
$("#testResultsForm").submit();
In the grails controller we have the following line to parse the JSON:
def data = JSON.parse(params.testResults)
Everything works as expected except when the user takes longer than normal and enters lots of keystrokes. The error looks something like:
2014-06-14 01:22:14,323 [http-8443-16] ERROR (org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver) - JSONException occurred when processing request: [POST] /qbcheck/customer/thankYou
Expected a ',' or ']' at character 524288 of {"patkey":"","test_version":"1.4","data_version":"1.3","patientid":"","test_date":"","test_duration":0,"gender":"","dob":"","fov":62,"fps":26,"scale_factor":0,"country":46,"camera_data":{"x":[353,353,353,353,3......
It always fails at character 524288, which led us to suspect a limit on the amount of data. We looked at Tomcat and found that it allows 2 MB of POST data by default via the maxPostSize attribute; we still raised it to a larger value just to be sure. We also looked on the Grails and JS side but could not find any limit there.
Looking for any pointers in this regard. We are able to provide more details as required.
We found that the HTML "input" element has a hard limit of 512 KB, which matches the failure at character 524288 (512 * 1024). Ideally, we should have received some kind of error or warning when assigning more data to an input value through JS, but that does not happen.
So we changed the input which was previously defined as:
<input type="text" style="visibility: hidden" name="testResults" id="testResults"/>
to a textarea:
<textarea style="visibility: hidden" name="testResults" id="testResults"></textarea>
And this allowed us to transfer data greater than 512 KB.

xhtml: using '#' in a value attribute makes the characters after '#' being ignored

First of all, sorry for the bad title but I don't know how to describe it better.
My problem:
I want to display pictures in a <p:lightBox> element. Unfortunately the pictures contain '#' characters in their filename, so they look like this for example: desert_#1#.jpg.
Here is my code:
<p:panel id="showPics" closable="false" header="Fotos: ">
<p:lightBox styleClass="imagebox">
<p:dataList value="#{myBean.fotoList}" var="fl" >
<h:outputLink value="#{request.contextPath}/resources/pics/#{fl.PictureName}" title="#{fl.PictureName}" >
<h:graphicImage value="#{request.contextPath}/resources/pics/#{fl.PictureName}"/>
</h:outputLink>
</p:dataList>
</p:lightBox>
</p:panel>
The bean value #{fl.PictureName} returns the filename, so in our example desert_#1#.jpg.
Now when I'm running my application, I get this error message:
Problem accessing /resources/pics/desert_. Reason: Not Found
So my guess is that the # characters in the picturename are recognized as references (or whatever you call them) to a beanmethod/value, which they of course aren't. Therefore the string after the first '#' in the filename isn't recognized anymore.
Unfortunately I cannot simply change the filename to get rid of the '#'s.
Could somebody tell me how to fix this? Thank you in advance!
UPDATE: I'm using JSF2.0 with Primefaces and Primefaces mobile components (since my application is a mobile web application) and Spring webflow framework. My IDE is Netbeans.
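The # is not being treated as an EL expression here. In a URL, # starts the fragment, and the browser does not send the fragment to the server, which is why the request stops at desert_. On the server side, encode the bean value using URI percent-encoding. In this case, the property should contain %23 wherever # appears in the filename.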
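A minimal sketch of that encoding step, using only the standard library, might look like the following. The class and method names are hypothetical; in the application this logic would presumably sit behind an extra bean getter (for example an assumed getEncodedPictureName()) that the view uses for the URL while the title keeps the raw name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the name is an assumption, not part of the original bean.
class PictureNameEncoder {

    // Percent-encodes a file name so it can be used as one segment of a URL path.
    static String encodePathSegment(String fileName) {
        try {
            // URLEncoder targets form encoding and writes a space as '+'.
            // A literal '+' is already emitted as %2B, so any '+' left in the
            // result stands for a space and can safely be turned into %20.
            return URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePathSegment("desert_#1#.jpg"));
    }
}
```

For desert_#1#.jpg this yields desert_%231%23.jpg, so the whole name reaches the server as part of the path instead of being cut off at the first #.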

Data from database not same in java string

Here is my data when I view it in the SQL Developer tool:
introduction
topic 1
topic end
After I read it using a ResultSet,
ResultSet result = stmt.executeQuery();
result.getString("description")
and display it in a JSP page as
<bean:write name="data" property="description" />
it is displayed like this:
introduction topic 1 topic end
How can I keep the display the same as in SQL Developer?
Newlines aren't preserved in HTML. You need to either tell the browser it's preformatted:
<pre>
<bean:write name="data" property="description"/>
</pre>
Or replace the newlines with HTML line breaks. See this question for examples.
How can I keep the display the same as in SQL Developer?
The data presumably contains line breaks, e.g. "\r\n" or "\n". If you look at the source of your JSP, you'll probably see them there. However, HTML doesn't treat those as line breaks for display purposes - you'll need to either use the <br /> tag, or put each line in a separate paragraph, or something similar.
Basically, I don't think this is a database problem at all - I think it's an HTML problem. You can experiment with a static HTML file which you edit locally and display in your browser. Once you know the HTML you want to generate, then work on integrating it into your JSP.
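The "replace the newlines with HTML line breaks" option could be sketched in plain Java as follows. The class name LineBreakFormatter and the method toHtml are illustrative assumptions. The text is HTML-escaped first so that the only markup in the result is the <br /> tags the helper adds itself.

```java
// Hypothetical helper that prepares database text for display in HTML.
class LineBreakFormatter {

    // Escapes HTML special characters, then turns every line break into <br />.
    static String toHtml(String text) {
        if (text == null) {
            return "";
        }
        String escaped = text
                .replace("&", "&amp;")   // must run first, or later entities get double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
        // Handle Windows (\r\n), old Mac (\r) and Unix (\n) line endings alike.
        return escaped.replaceAll("\r\n|\r|\n", "<br />");
    }

    public static void main(String[] args) {
        System.out.println(toHtml("introduction\ntopic 1\r\ntopic end"));
    }
}
```

Because the result already contains markup, whatever tag prints it must not escape it a second time; with <bean:write> that would mean setting filter="false".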

Removing HTML entities while preserving line breaks with JSoup

I have been using JSoup to parse lyrics and it has been great until now, but I have run into a problem.
I can use Node.html() to return the full HTML of the desired node, which retains line breaks as such:
Glóandi augu, silfurnátt
<br />Blóð alvöru, starir á
<br />Óður hundur er í vígamóð, í maga... mér
<br />
<br />Kolniður gref, kvik sem dreg hér
<br />Kolniður svart, hvergi bjart né
But this has the unfortunate side-effect, as you can see, of retaining HTML entities and tags.
However, if I use Node.text(), I can get a better looking result, free of tags and entities:
Glóandi augu, silfurnátt Blóð alvöru, starir á Óður hundur er í vígamóð, í maga... mér Kolniður gref, kvik sem dreg hér Kolniður svart,
This has another unfortunate side-effect: it removes the line breaks and compresses everything into a single line.
Simply replacing <br /> in the node before calling Node.text() yields the same result; it seems that the method itself compresses the text onto a single line, ignoring newlines.
Is it possible to have the best of both worlds, and have tags and entities replaced correctly while preserving the line breaks, or is there another method or way of decoding entities and removing tags without having to replace them manually?
(disclaimer) I haven't used this API ...
but a quick look at the docs suggests that you could visit each descendent node and dump out its text contents. Breaks could be inserted when special tags like <br> are encountered.
The TextNode.getWholeText() call also looks useful.
Based on another answer from Stack Overflow, I added a few fixes and came up with:
String text = Jsoup.parse(html.replaceAll("(?i)<br[^>]*>", "br2nl").replaceAll("\n", "br2nl")).text();
text = text.replaceAll("br2nl ", "\n").replaceAll("br2nl", "\n").trim();
Hope this helps

Getting wrong characters in parameter

In files.jsp I am using the following anchor and JSTL c:url combination -
<c:url value="downloadfile.jsp" var="dwnUrl" scope="request">
<c:param name="fileType" value="PDF"/>
<c:param name="fileId" value="${file.fileId}"/>
<c:param name="fileName" value="${file.fileName}"/>
</c:url>
<a href="${dwnUrl}">Download</a>
In downloadfile.jsp I read the file name value into a JavaScript variable as -
selectedFile = <c:out value='${param.fileName}'>
Now, if the file name contains some extra character, e.g. XYZ 2/3" Technical, then on the other page I get a different character sequence: XYZ 2/3&#034; Technical
However, if I print request.getParameter("fileName"), it gives the correct name. What is wrong?
By default, <c:out> escapes XML special characters, such as the double quote. This is done to produce well-formed XML and to avoid XSS.
To fix this, you should either get rid of <c:out> (since JSP 2.0, EL works perfectly fine in template text as well):
selectedFile = '${param.fileName}';
.. or, if you're still on legacy JSP 1.2 or older, set its escapeXml attribute to false:
selectedFile = '<c:out value="${param.fileName}" escapeXml="false" />';
Note that I have added the single quotes and semicolon to make the JS code valid.
Needless to say, you'll need to keep XSS risks in mind if you do so.
The funky characters in your <c:param> values are being URL encoded by <c:url> as they should be. As far as downloadfile.jsp is concerned, the servlet container takes care of URL decoding incoming variables so you don't have to. This is normal behavior and shouldn't pose any problems for you.
If you simply turn escapeXml to false as @BalusC suggests, you will add an XSS vulnerability to your page. Instead, you should encode the user input at the time of injection into the destination language, and escape characters that would be evaluated in the destination language. In this case, if the user input contained a single quote character (I'm assuming the string literal in your original example was supposed to be wrapped in single quotes, but the same would be true for double quotes if you were using them), any JavaScript code that followed it would be interpreted by the browser and executed. To safely do what you are trying to do, you should change the line in downloadfile.jsp to:
selectedFile = '${fn:replace(param.fileName, "'", "\\'")}';
That will escape only single quotes, which would otherwise end the string literal declaration.
If you were using double quotes, then this would be appropriate:
selectedFile = "${fn:replace(param.fileName, '"', '\\"')}";
It is worth noting that escapeXml could be appropriate for escaping JavaScript string literals (and it often is) when the string literal will eventually be dumped into HTML markup. However, in this case, the value should not be XML escaped as it is evaluated in the context of a file path, rather than in the context of HTML.
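Escaping only the quote character leaves other ways out of the string literal, such as a backslash, a line break or a closing script tag. As a hedged sketch of what a fuller server-side escape could look like, here is a small Java helper. The class name JsStringEscaper is an assumption for illustration, the set of characters handled is not exhaustive, and a maintained encoder library would be preferable in production code.

```java
// Hypothetical helper for embedding a value inside a quoted JavaScript string literal.
class JsStringEscaper {

    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash itself
                case '\'': out.append("\\'");  break; // ends a single-quoted literal
                case '"':  out.append("\\\""); break; // ends a double-quoted literal
                case '\n': out.append("\\n");  break; // raw line breaks are illegal in a literal
                case '\r': out.append("\\r");  break;
                case '<':  out.append("\\x3C"); break; // keeps "</script>" from closing the block
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("selectedFile = '" + escape("XYZ 2/3\" Technical") + "';");
    }
}
```

The escaped value would be computed on the server, for example in a servlet or a custom EL function, and then written into the page between the quotes.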
