How to parse a JSON string with UTF-8 characters using Java?

I have a JSON string containing a SUBSTITUTE control character (code 26, U+001A). I get a parsing exception when I try to convert the JSON string to a Java object using Jackson. How can I encode and decode such characters?
ObjectMapper mapper = new ObjectMapper();
mapper.readValue(jsonString, MY_DOMAIN_OBJECT.class);
jsonString:
{"studentId":"753253-2274", "information":[{"key":"1","value":"Get alerts on your phone(SUBSTITUTE character is present here. Unable to paste it)To subscribe"}]}
Error:
Illegal unquoted character ((CTRL-CHAR, code 26)): has to be escaped using backslash to be included in string value

Can you try this?
ObjectMapper mapper = new ObjectMapper();
mapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true);
mapper.readValue(jsonString, MY_DOMAIN_OBJECT.class);
I hope it helps. From the Javadoc:
Feature that determines whether parser will allow JSON Strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters) or not. If feature is set false, an exception is thrown if such a character is encountered.
Since JSON specification requires quoting for all control characters, this is a non-standard feature, and as such disabled by default.
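If you would rather keep the parser strict, another option is to escape the offending characters before parsing. The following is only a minimal sketch using the plain JDK; the class and method names are made up for illustration. It assumes that control characters other than tab, line feed and carriage return occur only inside string values. Tabs and line breaks are left untouched because they are legal whitespace between JSON tokens, so a raw tab or newline inside a string value would still be rejected.

```java
final class ControlCharEscaper {
    /**
     * Replaces raw control characters (below 0x20) with JSON unicode escapes.
     * Tab, LF and CR are kept as-is because they are valid JSON whitespace.
     */
    static String escapeControlChars(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            boolean structuralWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (c < 0x20 && !structuralWhitespace) {
                // Four uppercase hex digits after a backslash and the letter u
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // (char) 26 is the SUBSTITUTE character from the question
        String raw = "{\"value\":\"phone" + (char) 26 + "To subscribe\"}";
        System.out.println(escapeControlChars(raw));
        // prints {"value":"phone\u001ATo subscribe"}
    }
}
```

The escaped string can then be handed to mapper.readValue without enabling the non-standard feature.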

Related

How to convert utf8 string to escape string in JSON Java?

I want to convert the non-ASCII characters of a UTF-8 string to \uXXXX escape sequences in the value of a JSON object.
I tried both JSONObject and Gson, but neither worked for me in this case:
JSONObject js = new JSONObject();
js.put("lastReason","nguyễn");
System.out.println(js.toString());
and
Gson gson = new Gson();
String new_js = gson.toJson(js.toString());
System.out.println(new_js);
Output: {"lastReason":"nguyễn"}
But I expect the result to be:
Expected Output: {"lastReason":"nguy\u1EC5n"}
Is there a solution for this case?
You can use the Apache Commons Text library to convert a string to Unicode escape sequences. Use org.apache.commons.text.StringEscapeUtils to translate the text before adding it to the JSONObject.
StringEscapeUtils.escapeJava("nguyễn")
will produce
nguy\u1EC5n
One possible problem with StringEscapeUtils is that it escapes control characters as well. If there is a tab character at the end of the string, it will be translated to \t. For example:
StringEscapeUtils.escapeJava("nguyễn\t")
will produce a string in which the tab is escaped too, which may be unwanted:
nguy\u1EC5n\t
You can use org.apache.commons.text.translate.UnicodeEscaper to get around this, but it will translate every character in the string to a Unicode escape sequence. For example:
UnicodeEscaper ue = new UnicodeEscaper();
ue.translate(rawString);
will produce
\u006E\u0067\u0075\u0079\u1EC5\u006E
or
\u006E\u0067\u0075\u0079\u1EC5\u006E\u0009
Whether it is a problem or not is up to you to decide.
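If neither behaviour fits, a middle ground is to escape only the non-ASCII characters yourself. This is a minimal sketch with the plain JDK, not part of any library; the class and method names are hypothetical. It is applied to the already serialized JSON text, on the assumption that the text is valid JSON, where non-ASCII characters can only occur inside string values.

```java
final class JsonUnicodeEscaper {
    /**
     * Rewrites every char at or above 0x80 as a backslash-u escape with four
     * hex digits. ASCII, including control characters, is left untouched.
     * Supplementary characters come out as two escapes (a surrogate pair).
     */
    static String escapeNonAscii(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c >= 0x80) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The source-level escape below is simply the character U+1EC5,
        // written this way so the file does not depend on its encoding.
        String json = "{\"lastReason\":\"nguy\u1EC5n\"}";
        System.out.println(escapeNonAscii(json));
        // prints {"lastReason":"nguy\u1EC5n"} with a literal backslash
    }
}
```

Because the escaping happens after serialization, the JSON library does not get a chance to escape the backslashes a second time.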

How to replace CCH (CANCEL CHARACTER) in Marshalling with JAXB

I am developing a Java service that converts a class to a JSON string. When any attribute contains the character ” (the curly quote used by Word), the marshalling process returns a CCH character (CANCEL CHARACTER), which is how it appears in Notepad++.
In the following code, objectClassX is the object being transformed:
import com.fasterxml.jackson.databind.ObjectMapper;
ObjectMapper mapper = new ObjectMapper();
byte[] bytesJson = mapper.writeValueAsBytes(objectClassX);
String stringJson = bytesJson != null ? new String(bytesJson, "UTF-8") : "VACIO";
I use bytesJson for some other processing, but in the end it is converted to a String.
I want to replace the CCH character with a blank space, and I would like to do it through the ObjectMapper configuration so that I can avoid calling replace on the String.
Thanks for your help.
Regards.
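No ObjectMapper-based answer is recorded here. As a fallback, the replacement can be done on the String itself. The sketch below uses only the JDK and a hypothetical helper name; it replaces every C1 control character (U+0080 to U+009F, which includes CCH at U+0094) with a space. A curly quote turning into U+0094 may also indicate that windows-1252 input was decoded as ISO-8859-1 somewhere upstream, in which case fixing the decoding would be the cleaner solution.

```java
final class C1ControlCleaner {
    /** Replaces each C1 control character (U+0080..U+009F) with a space. */
    static String replaceC1Controls(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c >= 0x80 && c <= 0x9F ? ' ' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // (char) 0x94 is CCH, the character described in the question
        String json = "{\"name\":\"quoted" + (char) 0x94 + "text\"}";
        System.out.println(replaceC1Controls(json));
        // prints {"name":"quoted text"}
    }
}
```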

HTML in Json format fails to get parsed

I am playing around with the Twitter API, and want to parse the json response from the user_timeline GET call.
You can find an example of such a call here.
For this, I am using Jackson, as follows:
URL url = new File("D:\\01_perso\\like-minded-friends\\test\\resources\\\\tweet_sample.json").toURI().toURL();
ObjectMapper om = new ObjectMapper();
//om.configure(JsonParser.Feature.ALLOW_COMMENTS, true);
JsonNode node = om.readTree(url);
System.out.print(node.path("id").asLong());
The issue is that the parsing fails with the following error:
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('/' (code 47)): maybe a (non-standard) comment? (not recognized as one since Feature 'ALLOW_COMMENTS' not enabled for parser)
at [Source: file:/D:/01_perso/like-minded-friends/test/resources/tweet_sample.json; line: 98, column: 26]
The error comes from the fact that the JSON contains some HTML hrefs, and that they contain slashes.
"source": "YoruFukurou",
I am searching for a way to have jackson ignore those characters.
As you can see above, I tried to use the ALLOW_COMMENTS feature, but it doesn't work since the html is then taken as a comment and that ends up eating the final comma away from the json, leading to another error.
Is there any way to tell jackson to simply accept those HTML lines, or at least ignore them? (I do not actually need them, so the source elements could simply be skipped.)
Thanks

org.codehaus.jackson.JsonParseException: Unexpected character ('�' (code 65533 / 0xfffd))

I have a JSON string in the database, but converting it to a Java object gives the following error:
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('�' (code 65533 / 0xfffd)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
The JSON is: {"crt":"wrd","name":"7|6A TTTM"}
In the Java code I have configured the mapper as a private field (not static final):
objectMapper= new ObjectMapper();
objectMapper.configure(DeserializationConfig.Feature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true);
Note: Sometimes the JSON string is converted to an object, but sometimes the above error occurs. Why does this inconsistent result occur?
Short answer: removing the first occurrence of the extra BOM character with a method such as the following should fix this issue:
public String cleanUpJsonBOM(String json) {
    return json.trim().replaceFirst("\ufeff", "");
}
I had a similar issue which I documented in a blog post.
Hope this helps!
This worked for me:
String formattedString = yourString.trim().replaceAll("\uFFFD", "");
Something is producing an invalid UTF-8 sequence (or there is a mismatch between UTF-8 and a single-byte encoding like ISO-8859-1), and Jackson detects this encoding problem. It has nothing to do with the ACCEPT_SINGLE_VALUE_AS_ARRAY setting, as the exception comes from the low-level JsonParser.
So you need to figure out why the JSON content to parse is corrupt.
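To illustrate the mismatch described above, here is a small JDK-only sketch (the class and method names are invented for this example). It shows how bytes written in ISO-8859-1 turn into U+FFFD when decoded leniently as UTF-8, and how a strict decoder can be used to detect the problem at the point where the bytes are read instead of at parse time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    /** Lenient decoding: malformed bytes silently become U+FFFD. */
    static String decodeLeniently(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Strict decoding: reports malformed input instead of hiding it. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf" + (char) 0xE9; // "cafe" with an acute accent
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // The last char of the leniently decoded Latin-1 bytes is U+FFFD
        String garbled = decodeLeniently(latin1);
        System.out.println((int) garbled.charAt(3) == 0xFFFD); // prints true
        System.out.println(isValidUtf8(latin1)); // prints false
        System.out.println(isValidUtf8(utf8));   // prints true
    }
}
```

Running a check like isValidUtf8 on the raw bytes coming from the database or driver can help narrow down where the corruption is introduced.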

How to replace invalid characters in XML string?

I have a string which was encoded in UTF-16. When parsing it using javax.xml.parsers.DocumentBuilder, I got an error like this:
Character reference "&#x0" is an invalid XML character
Here is the code I used to parse the XML:
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlString));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
org.w3c.dom.Document document = parser.parse(inputSource);
My question is: how can I replace the invalid characters with a space?
You just need to use String.replaceAll and pass the pattern of invalid characters.
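A possible shape for such a pattern, as a sketch only: the character class below follows the Char production of XML 1.0, and the class and method names are hypothetical. Since the error in the question mentions a character reference, the sketch also replaces numeric references such as &#x0; that point at an invalid character, which a pattern over raw characters alone would not catch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlSanitizer {
    // Matches any code point outside the XML 1.0 "Char" production.
    private static final Pattern INVALID_RAW = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Matches numeric character references, hexadecimal or decimal.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces invalid raw characters and invalid numeric references with a space. */
    static String sanitize(String xml) {
        String text = INVALID_RAW.matcher(xml).replaceAll(" ");
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append(refersToValidChar(m) ? m.group() : " ");
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    private static boolean refersToValidChar(Matcher m) {
        try {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            return isValidXmlChar(cp);
        } catch (NumberFormatException tooLarge) {
            return false; // value does not fit an int, so it cannot be valid
        }
    }

    public static void main(String[] args) {
        String broken = "<a>x" + (char) 0 + "y&#x0;z&#65;</a>";
        System.out.println(sanitize(broken)); // prints <a>x y z&#65;</a>
    }
}
```

The sanitized string can then be wrapped in a StringReader and passed to the DocumentBuilder as in the question.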
You are trying to parse an invalid XML entity, and this is what raises the exception. It seems you need not worry about UTF-16 in your situation.
Find some explanation and example here.
As an example, it is not possible to use a bare & character in valid XML; we need to use &amp; instead. Here &amp; is the XML entity.
The above example should be self-explanatory as to what an XML entity is.
As I understand it, some XML entities are not valid. But again, no need to worry: it is possible to declare and add new XML entities. Take a look at the above article for more detail.
EDIT: Assuming there is an & character making the XML invalid.
StringEscapeUtils()
escapeXml
public static void escapeXml(java.io.Writer writer,
java.lang.String str)
throws java.io.IOException
Escapes the characters in a String using XML entities.
For example: "bread" & "butter" => &quot;bread&quot; &amp; &quot;butter&quot;.
Supports only the five basic XML entities (gt, lt, quot, amp, apos).
Does not support DTDs or external entities.
Note that unicode characters greater than 0x7f are currently escaped to their
numerical \\u equivalent. This may change in future releases.
Parameters:
writer - the writer receiving the unescaped string, not null
str - the String to escape, may be null
Throws:
java.lang.IllegalArgumentException - if the writer is null
java.io.IOException - if there is a problem writing
See Also:
unescapeXml(java.lang.String)
