Jackson JSON Handling of Unicode symbols - java

I'm calling a web service that returns text including the ASCII characters representing the ® symbol. For example:
ACME Corp® Services
I use Spring to return this textual data directly as a JSON object, and by the time it gets into the browser the JSON data remains correct:
"service": "ACME Corp® Services"
But upon being rendered via a Handlebars template and written into the page I get:
ACME Corp® Services
Do I need to sanitize the JSON data before using it? If so, what are the best practices for doing that? Otherwise, perhaps there is a change I should make on the back-end but I am not sure what that would be.

You do not need to sanitize the content, but you must make sure it uses a valid encoding allowed by the JSON specification: typically UTF-8 (the alternatives being UTF-16 and UTF-32).
If the content is not encoded as UTF-8 but as something else (like ISO-8859-1, aka "Latin-1"), you will need to construct a Reader to decode it properly:
Reader r = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
MyPOJO pojo = mapper.readValue(r, MyPOJO.class);
The problem you seem to be having is that the encoding used is incorrect.
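To illustrate why the charset passed to the Reader matters, here is a minimal sketch (not a diagnosis of the setup in the question) showing what happens when UTF-8 bytes are decoded as ISO-8859-1. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

final class CharsetMismatch {
    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00AE is the registered sign; in UTF-8 it is the two bytes C2 AE,
        // which ISO-8859-1 reads as two separate characters.
        System.out.println(decodeUtf8BytesAsLatin1("ACME Corp\u00AE Services"));
    }
}
```

If the output of a real pipeline shows an extra "Â" in front of the symbol, a mismatch of this kind is a likely cause.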

Related

Extending JBoss Data Virt ws translator to handle JSONP

What is the best way to extend org.teiid.translator.ws to connect to a webservice that returns JSONP (whose mediatype is usually application/javascript)? The existing ws translator can read only JSON or XML. In general, was the translator designed to facilitate the injection of transformation logic to handle any webpage structure/format (e.g., JSONP, plaintext, html, etc.)?
For JSONP, I am leaning towards creating my own implementation of org.teiid.core.types.InputStreamFactory, say com.acme.JsonpToJsonInputStreamFactory, in which I define my own JsonpToJsonReaderInputStream (extending ReaderInputStream) that skips the leading
randomFunctionName(
and trailing
)
of a JSONP payload, and modify ClobInputStreamFactory.getInputStream to return that, instead of ReaderInputStream. Then I replace both instances of
ds = new InputStreamFactory.ClobInputStreamFactory(...);
in translator-ws-jsonp.BinaryWSProcedureExecution (where translator-ws-jsonp is based on translator-ws) with
ds = new JsonpToJsonInputStreamFactory.ClobInputStreamFactory(...);
The WS translator returns the results in Blob form; how you unpack the results is up to you. IMO, you do not really need to build another translator.
Currently, the typical use case in JDV is to read the blob and use the JSONTOXML function to convert it into XML, so that the results can then be parsed into a tabular structure using constructs like XMLTABLE. So, you can write a UDF like JSONPTOJSON that does the stripping you describe, then use JSONTOXML(JSONPTOJSON(blob)) as input to XMLTABLE.
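The core of such a JSONPTOJSON function could look roughly like the sketch below. It only shows the string handling; the class name, the method name, and the choice to work on a String instead of a Blob are assumptions for illustration, and registering it as a UDF is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsonpUnwrapper {
    // Matches: optional whitespace, a callback name, "(", the payload, ")", optional ";".
    private static final Pattern JSONP = Pattern.compile(
            "^\\s*[A-Za-z_$][\\w$.]*\\s*\\((.*)\\)\\s*;?\\s*$", Pattern.DOTALL);

    // Returns the JSON inside a JSONP wrapper, or the input unchanged if no wrapper is found.
    static String jsonpToJson(String payload) {
        Matcher m = JSONP.matcher(payload);
        return m.matches() ? m.group(1).trim() : payload;
    }
}
```

Requiring a callback name before the opening parenthesis keeps plain JSON that merely contains parentheses inside string values from being truncated.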

Android parse special characters in json response

In my Android application I get a JSON response string from a PHP URL. In the response, some hotel names contain an apostrophe, but I get the &#039 character sequence instead of the apostrophe. How can I parse hotel names with special characters in Android? I can see the apostrophe in the browser but not in the Android logcat.
I have tried jresponse = URLEncoder.encode(jresponse,"UTF-8"); but I still could not get the apostrophe in the hotel name.
This is one of the hotel names in the response.
I see the following in the browser.
{"id":12747,
"name":"Oscar's",
....
}
But in the logcat:
id 12747
name Oscar's
Use the decoder instead of encoder. URLDecoder.decode(jresponse,"UTF-8")
Use ISO-8859-2 when you create the UrlEncodedFormEntity that you send off. You can set this as a parameter in the constructor.
Without a specified charset, you are probably sending the data in UTF-8/UTF-16 (most common) which the server is interpreting in a different way.
EDIT: It looks like ISO-8859-2 doesn't support ñ. You may have to change something server-side. http://en.wikipedia.org/wiki/ISO/IEC_8859-2
You can try the Html class, e.g.:
jresponse = Html.fromHtml(jresponse).toString();
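Outside Android, where the Html class is not available, decimal character references such as &#039; can be decoded by hand. The following is a minimal sketch with a made-up class name; it handles only decimal references that end in a semicolon, not named entities like &amp; or hexadecimal ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDecoder {
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal character reference with the character it names.
    static String decode(String text) {
        Matcher m = DECIMAL_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave the reference untouched if it is not a valid Unicode code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

It would be applied to the individual string values after parsing the JSON, not to the raw response.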

How to check encoding in java?

I am facing an encoding problem.
For example, I have a message in XML whose declared encoding is "UTF-8".
<message>
<product_name>apple</product_name>
<price>1.3</price>
<product_name>orange</product_name>
<price>1.2</price>
.......
</message>
Now, this message is supporting multiple languages:
Traditional Chinese (big5),
Simplified Chinese (gb),
English (utf-8)
And it will only change the encoding in specific fields.
For example (Traditional Chinese),
<product_name>蘋果</product_name>
<price>1.3</price>
<product_name>橙</product_name>
<price>1.2</price>
.......
Only "蘋果" and "橙" are using big5, "<product_name>" and "</product_name>" are still using utf-8.
<price>1.3</price> and <price>1.2</price> are using utf-8.
How do I know which word is using different encoding?
It looks like whoever is providing the XML is providing incorrect XML. They should be using a consistent encoding.
http://sourceforge.net/projects/jchardet/files/ is a pretty good heuristic charset detector.
It's a port of the one used in Firefox to detect the encoding of pages that are missing a charset in content-type or a BOM.
You could use that to try and figure out the encoding for substrings in a malformed XML file if you can't get the provider to fix their output.
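If an external detector is not an option, one heuristic that needs only the JDK is to try a strict UTF-8 decode of each field's raw bytes and fall back to another charset when that fails. This is a sketch under the assumption that you can get at the raw bytes of each field; the class name is made up, and for the case in the question the fallback would be Charset.forName("Big5"), provided the runtime ships that charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class FallbackDecoder {
    // Decodes as UTF-8 if the bytes are well-formed UTF-8, otherwise with the fallback charset.
    static String decode(byte[] bytes, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, fallback);
        }
    }
}
```

This is only a heuristic: pure ASCII decodes identically either way, and some byte sequences from other encodings happen to be valid UTF-8, so a field can still be misread.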
You should use only one encoding in one XML file. The Big5 characters all have counterparts in UTF-8.
Because I cannot get the provider to fix the output, I have to handle it myself, and I cannot use an external library in this project.
The only way I can solve it is like this,
String str = new String(big5String.getBytes("UTF-8"));
before displaying the message.

send arabic SMS on mobile in java

My application supports both Arabic and English, but I am facing a problem: when the mobile receives an Arabic SMS, it is displayed as ??? ???? (question marks). The mobile I am using for testing supports Arabic, and all the Arabic text in the application itself works fine; the problem occurs only when an Arabic SMS is received by my mobile.
String ff = new String(smsContent.getBytes("UTF-8"), "UTF-8");
StringWriter stringBuffer = new StringWriter();
PrintWriter pOut = new PrintWriter(stringBuffer);
pOut.print("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
pOut.print("<!DOCTYPE MESSAGE SYSTEM \"http://127.0.0.1/psms/dtd/messagev12.dtd\" >");
pOut.print("<MESSAGE VER=\"1.2\"><USER USERNAME=\""+userName+"\" PASSWORD=\""+password+"\"/>");
pOut.print("<SMS UDH=\"0\" CODING=\"1\" TEXT=\""+ff+"\" PROPERTY=\"0\" ID=\"2\">");
pOut.print("<ADDRESS FROM=\""+fromNo+"\" TO=\""+toNO+"\" SEQ=\"1\" TAG=\"\" />");
pOut.print("</SMS>");
pOut.print("</MESSAGE>");
pOut.flush();
pOut.close();
URL url = new URL("url");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setDoOutput(true);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(connection.getOutputStream()));
out.write("data="+message+"&action=send");
out.flush();
SMS in English works fine in my application.
First, new String(smsContent.getBytes("UTF-8"), "UTF-8") is a redundant roundtrip, equivalent to smsContent. First you encode the string as bytes via UTF-8, and then immediately decode it back from the bytes again.
Second, your method of puzzling together XML is completely broken. You can't just concatenate strings and hope to end up with well-formed XML. Just for example think about what happens if someone tries to send a "? Use an XML library.
Third, you're implicitly using the platform default encoding for your OutputStreamWriter instead of explicitly specifying one, which means your code only works on those machines which randomly happen to have the correct encoding as default. I'm guessing yours does not.
Fourth, your method of puzzling together POST parameters is broken. You haven't specified what the variable message is. I'm guessing it's the complete XML document, but then you're trying to send it as a POST parameter to some kind of HTTP service, in which case it needs to be escaped/url-encoded. Just for example, what happens if someone tries to send the message &data=<whatever>&? Please clarify.
See also Using java.net.URLConnection to fire and handle HTTP requests
Fifth, since you're sending to some HTTP service, there's probably some documentation for that service stating which encoding to send or how to specify it, possibly with an HTTP header (probably "Content-type: application/x-www-form-urlencoded; charset=UTF-8"?). Point us to the documentation if you can't figure it out yourself.
Edit: Found the documentation: http://www.google.se/search?q=valuefirst+pace
It pretty clearly states that you need to url encode the XML document, so that's probably what you're missing, in which case the encoding for the OutputStreamWriter won't matter as long as it's ASCII-compatible.
However, the documentation does not specify which character encoding to use for url-encoding, which is pretty weak. UTF-8 is the most likely though.
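A minimal sketch of the URL-encoding step follows. The parameter names data and action=send are taken from the code in the question; the class name is made up, and UTF-8 is an assumption, since the documentation reportedly does not name the encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SmsFormBody {
    // Builds the POST body with the XML document percent-encoded as a form value.
    static String build(String xmlDocument) {
        try {
            return "data=" + URLEncoder.encode(xmlDocument, "UTF-8") + "&action=send";
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

After encoding, the body contains only ASCII characters, so characters such as & or " inside the XML can no longer break the parameter list.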
From what I've read on some internet pages, SMS in arabic languages (and others too) are encoded with UCS-2 and not UTF-8. Changing the encoding is worth a try.
You are using your platform's default encoding for the request data, which may very well differ from UTF-8. Try specifying UTF-8 in the OutputStreamWriter:
... new OutputStreamWriter(connection.getOutputStream(), "UTF-8") ...
Another issue is of course that your hand-made XML document will fail as soon as any of your parameters contain characters, which have to be escaped in XML, but that's a different story. Why don't you use an XML library instead?
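As a rough sketch of what "use an XML library" could look like with only the JDK, the StAX writer escapes attribute values for you. The element and attribute names are copied from the question, but the document is trimmed (no DOCTYPE, no USER element, fewer attributes), and the class name is made up.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class SmsXmlBuilder {
    // Builds a trimmed-down MESSAGE document; the writer escapes the attribute values.
    static String buildMessage(String from, String to, String text) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartElement("MESSAGE");
            xml.writeAttribute("VER", "1.2");
            xml.writeStartElement("SMS");
            xml.writeAttribute("TEXT", text);
            xml.writeEmptyElement("ADDRESS");
            xml.writeAttribute("FROM", from);
            xml.writeAttribute("TO", to);
            xml.writeEndElement(); // closes SMS
            xml.writeEndElement(); // closes MESSAGE
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not build the SMS request", e);
        }
        return buffer.toString();
    }
}
```

A message text containing a double quote or an ampersand then ends up as &quot; or &amp; in the TEXT attribute instead of producing malformed XML.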
One additional piece of information: the documentation Christoffer points to also explains that the request example you are using is only suitable for text messages with characters in the standard SMS character set. For Unicode character support, you have to use a different request.

how to use XML sent by html form?

I have an HTML form with a textarea into which I paste some XML, for example:
<network ip_addr="10.0.0.0/8" save_ip="true">
<subnet interf_used="200" name="lan1" />
<subnet interf_used="254" name="lan2" />
</network>
When the user submits the form, that data is sent to a Java server, so in the request line I get something like this:
GET /?we=%3Cnetwork+ip_addr%3D%2210.0.0.0%2F8%22+save_ip%3D%22true%22%3E%0D%0A%3Csubnet+interf_used%3D%22200%22+name%3D%22lan1%22+%2F%3E%0D%0A%3Csubnet+interf_used%3D%22254%22+name%3D%22lan2%22+%2F%3E%0D%0A%3C%2Fnetwork%3E HTTP/1.1
How can I use that in my Java application? I need to run some calculations on that data and send back newly generated XML.
This answer shows how to use the URLDecoder/URLEncoder classes to decode and encode URL strings. It should work if you pass the 'GET' string to URLDecoder's decode method.
To answer your follow-up question (comment):
First you need to extract the XML from the URL string. Maybe it's enough to create a substring starting with the first < char.
The string should then be fed into an XML parser to create a DOM document. The last, easy task is walking through that document and copying the values to your internal network model.
Do not think about using RegExp to extract the data. Use a parser.
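Put together, the steps could be sketched as follows. This is a minimal example, assuming the server hands you the raw query string (the part after the ?) and that the parameter is called we as in the request above; the class and method names are made up.

```java
import java.io.StringReader;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FormXmlReader {
    // Finds the named parameter in the raw query string, URL-decodes it and parses it as XML.
    static Document parseParam(String rawQuery, String paramName) {
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(paramName)) {
                try {
                    String xml = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                    return builder.parse(new InputSource(new StringReader(xml)));
                } catch (Exception e) {
                    throw new IllegalArgumentException("parameter is not well-formed XML", e);
                }
            }
        }
        throw new IllegalArgumentException("missing parameter: " + paramName);
    }

    // Walks the DOM and collects the name attribute of every subnet element.
    static List<String> subnetNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList subnets = doc.getElementsByTagName("subnet");
        for (int i = 0; i < subnets.getLength(); i++) {
            names.add(((Element) subnets.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```

From the Document you can read the other attributes (ip_addr, interf_used) the same way and copy them into your own model before generating the new XML.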
