org.codehaus.jackson.JsonParseException: Unexpected character ('�' (code 65533 / 0xfffd)) - java

I have a JSON string stored in the database, but converting it to a Java object fails with the following error:
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('�' (code 65533 / 0xfffd)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
Json is : {"crt":"wrd","name":"7|6A TTTM"}
In the Java code I configure the ObjectMapper as follows; it is a private field (not static final):
objectMapper= new ObjectMapper();
objectMapper.configure(DeserializationConfig.Feature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true);
Note: Sometimes the JSON string converts to an object successfully, and sometimes it fails with the error above. Why is the result inconsistent?

Short answer: removing the first occurrence of the stray BOM (byte order mark) character with a method such as the following should fix the issue:
public String cleanUpJsonBOM(String json) {
return json.trim().replaceFirst("\ufeff", "");
}
I had a similar issue, which I documented in a blog post.
Hope this helps!

This worked for me:
String formattedString = yourString.trim().replaceAll("\uFFFD", "");

Something is producing an invalid UTF-8 sequence (or there is a mismatch between UTF-8 and a single-byte encoding such as ISO-8859-1), and Jackson detects this encoding problem. It has nothing to do with the ACCEPT_SINGLE_VALUE_AS_ARRAY setting, since the exception comes from the low-level JsonParser.
So you need to figure out why the JSON content being parsed is corrupt.
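To see how such a mismatch produces the character from the error message, here is a minimal sketch that uses only the JDK. The class and method names are made up for illustration, and it assumes the text was written as ISO-8859-1 and then read back as UTF-8; the real cause in the question may be different.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {

    // Encodes the text as ISO-8859-1 and then (wrongly) decodes those bytes as UTF-8.
    static String misdecode(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // True if the string contains U+FFFD, the Unicode replacement character.
    static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // The e-acute is a single byte in ISO-8859-1, which is not valid UTF-8 on its own.
        String original = "{\"name\":\"caf\u00e9\"}";
        String broken = misdecode(original);
        System.out.println("original has U+FFFD: " + containsReplacementChar(original));
        System.out.println("misdecoded has U+FFFD: " + containsReplacementChar(broken));
    }
}
```

Pure ASCII content survives this round trip unchanged, which could explain why only some rows fail to parse.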

Related

Illegal base64 character "a" using java.util.Base64 from within Scala

Suppose I have the following Base64 encoded String from a github API call to a file:
LyoKICogQ29weXJpZ2h0IDIwMTkgY29tLmdpdGh1Yi50aGVvcnlkdWRlcwog
KgogKiBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNp
b24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKICogeW91IG1heSBub3QgdXNlIHRo
aXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNl
LgogKiBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQK
ICoKICogICAgIGh0dHA6Ly93d3cuYXBhY2hlLm9yZy9saWNlbnNlcy9MSUNF
TlNFLTIuMAogKgogKiBVbmxlc3MgcmVxdWlyZWQgYnkgYXBwbGljYWJsZSBs
YXcgb3IgYWdyZWVkIHRvIGluIHdyaXRpbmcsIHNvZnR3YXJlCiAqIGRpc3Ry
aWJ1dGVkIHVuZGVyIHRoZSBMaWNlbnNlIGlzIGRpc3RyaWJ1dGVkIG9uIGFu
ICJBUyBJUyIgQkFTSVMsCiAqIFdJVEhPVVQgV0FSUkFOVElFUyBPUiBDT05E
SVRJT05TIE9GIEFOWSBLSU5ELCBlaXRoZXIgZXhwcmVzcyBvciBpbXBsaWVk
LgogKiBTZWUgdGhlIExpY2Vuc2UgZm9yIHRoZSBzcGVjaWZpYyBsYW5ndWFn
ZSBnb3Zlcm5pbmcgcGVybWlzc2lvbnMgYW5kCiAqIGxpbWl0YXRpb25zIHVu
ZGVyIHRoZSBMaWNlbnNlLgogKi8KCnBhY2thZ2UgY29tLmdpdGh1Yi50aGVv
cnlkdWRlcy5tb2RlbAoKaW1wb3J0IGNvbS5naXRodWIudGhlb3J5ZHVkZXMu
dXRpbC5LaXZ5UHJldHR5UHJpbnRlcgppbXBvcnQgb3JnLmJpdGJ1Y2tldC5p
bmt5dG9uaWsua2lhbWEuPT0+CmltcG9ydCBvcmcuYml0YnVja2V0Lmlua3l0
b25pay5raWFtYS5yZXdyaXRpbmcuUmV3cml0ZXIuXwppbXBvcnQgb3JnLmJp
dGJ1Y2tldC5pbmt5dG9uaWsua2lhbWEucmV3cml0aW5nLlN0cmF0ZWd5Cgov
KioKICogQmFzZSBUeXBlIGZvciBhbGwgbm9kZXMgb2YgYSBLaXZ5LUFTVAog
Ki8KdHJhaXQgQVNUTm9kZSBleHRlbmRzIEZvbGRhYmxlQVNUIHsgc2VsZiA9
PgogIC8qKgogICAqIFRyYXZlcnNlcyB0aGUgQVNUTm9kZSBhbmQgYXBwbGll
cyBTdHJhdGVneSBgc2Agb250byBgc2VsZmAgYW5kIGFsbCBjaGlsZHJlbiBv
ZiBzZWxmLgogICAqCiAgICogYHNgIGlzIGhlcmVieSBhcHBsaWVkIGJvdHRv
bSB1cCBpbiBsZWZ0IHRvIHJpZ2h0IG9yZGVyLgogICAqCiAgICogQHNlZSBb
W2h0dHBzOi8vYml0YnVja2V0Lm9yZy9pbmt5dG9uaWsva2lhbWEvc3JjLzAz
MjYzMGZhMjFkZGFkNWNmMzNjYmQ2ZWY5YzJmMDI3ODY2MWE2NzUvd2lraS9S
ZXdyaXRpbmcubWRdXQogICAqIEBwYXJhbSBzIHN0cmF0ZWd5IHRoYXQgaXMg
YXBwbGllZCB0byBgc2VsZmAgYW5kIGFsbCBjaGlsZHJlbi4KICAgKiBAcmV0
dXJuIGEgcmV3cml0dGVuIEFTVE5vZGUgYWNjb3JkaW5nIHRvIHRoZSBzdHJh
dGVneSBgc2AKICAgKi8KICBwcml2YXRlW3RoZW9yeWR1ZGVzXSBkZWYgdHJh
dmVyc2VBbmRBcHBseShzOlN0cmF0ZWd5KTpBU1ROb2RlCgogIC8qKgogICAq
IFJld3JpdGUgdGhlIEFTVE5vZGUgYHNlbGZgIGJ5IHRoZSBzcGVjaWZpY2F0
aW9uIG9mIGEgcGFydGlhbCBmdW5jdGlvbiBgZnBgLgogICAqCiAgICogSWYg
d2Ugd2FudCB0byBjaGFuZ2UgYSBzcGVjaWZpYyBbW21vZGVsLlB5dGhvbl1d
LW5vZGUgaW4gdGhlIEFTVCBmb3IgZXhhbXBsZSB3ZSBjb3VsZAogICAqIGFw
cGx5IHRoZSBmb2xsb3dpbmcgcmV3cml0ZSBzdHJhdGVneToKICAgKnt7ewog
ICAqICAgYXN0LnJld3JpdGUoewogICAqICAgIGNhc2UgUHl0aG9uKCJbMSwy
LDNdIikgPT4gUHl0aG9uKCJbMSwyLDMsNF0iKQogICAqICAgfSkKICAgKn19
fQogICAqCiAgICogUGxlYXNlIG5vdGUsIHRoYXQgQVNUTm9kZXMgY2FuIG5v
dCBiZSByZXdyaXR0ZW4gYXJiaXRyYXJpbHkuIFNpbmNlIGVhY2ggQVNUTm9k
ZSBpbXBsaWVzCiAgICogYSBzcGVjaWZpYyBwYXJhbWV0ZXIgbGlzdC4gQW4g
QVNUIGhhcyB0byBzdGF5IHN0cnVjdHVyZS1jb25zaXN0ZW50IGFmdGVyIGFw
cGx5aW5nIHJld3JpdGluZyBydWxlcy4KICAgKiBBIHJld3JpdGluZyBydWxl
IGFzOgogICAqIHt7ewogICAqICAgewogICAqICAgIGNhc2UgUHl0aG9uKHMp
ID0+IFRvcExldmVsKE5pbCkKICAgKiAgIH0KICAgKiB9fX0KICAgKiBpcyBu
b3QgdmFsaWQgYXMgYSBbW21vZGVsLlRvcExldmVsXV0tbm9kZSBjYW4gbm90
IG9jY3VyIGF0IHBvc2l0aW9ucyB3aGVyZSBhIFtbbW9kZWwuUHl0aG9uXV0t
bm9kZSBjYW4uCiAgICoKICAgKiBAc2VlIFtbaHR0cHM6Ly9iaXRidWNrZXQu
b3JnL2lua3l0b25pay9raWFtYS9zcmMvMDMyNjMwZmEyMWRkYWQ1Y2YzM2Ni
ZDZlZjljMmYwMjc4NjYxYTY3NS93aWtpL1Jld3JpdGluZy5tZF1dCiAgICog
QHBhcmFtIGZwIFBhcnRpYWwgZnVuY3Rpb24gdGhhdCBkZWZpbmVzIGhvdyB0
aGUgYXN0IHNob3VsZCBiZSByZXdyaXR0ZW4uCiAgICogQHJldHVybiBBIHJl
d3JpdHRlbiBBU1QgYWNjb3JkaW5nIHRvIHRoZSBzcGVjaWZpY2F0aW9uIGlu
IGBmcGAgb3IgdGhlIHNhbWUgYXN0IGlmIGBmcGAgY291bGQgbm90IGJlIGFw
cGxpZWQuCiAgICovCiAgZGVmIHJld3JpdGUoZnA6QVNUTm9kZSA9PT4gQVNU
Tm9kZSk6IEFTVE5vZGUgPSBzZWxmLnRyYXZlcnNlQW5kQXBwbHkocnVsZShm
cCkpCgogIC8qKgogICAqIFRyYW5zZm9ybXMgYHNlbGZgIGludG8gYSB3ZWxs
IGZvcm1hdHRlZCBraXZ5IHByb2dyYW0gdGhhdCBjYW4gYmUgd3JpdHRlbgog
ICAqIGludG8gYSBmaWxlLgogICAqCiAgICogVGhlIGZvbGxvd2luZyBBU1RO
b2RlIGZvciBleGFtcGxlOgogICAqIHt7ewogICAqICAgVG9wTGV2ZWwoCiAg
ICogICAgTGlzdCgKICAgKiAgICAgIFJvb3QoCiAgICogICAgICAgIFdpZGdl
dCgKICAgKiAgICAgICAgICBQbG90LAogICAqICAgICAgICAgIExpc3QoCiAg
ICogICAgICAgICAgICBXaWRnZXQoCiAgICogICAgICAgICAgICAgIExpbmVH
cmFwaCwKICAgKiAgICAgICAgICAgICAgTGlzdCgKICAgKiAgICAgICAgICAg
ICAgICBQcm9wZXJ0eShiYWNrZ3JvdW5kX25vcm1hbCxMaXN0KCcnKSksCiAg
ICogICAgICAgICAgICAgICAgUHJvcGVydHkoYmFja2dyb3VuZF9jb2xvcixM
aXN0KFswLDAsMCwxXSkpCiAgICogICApKSkpKSkpCiAgICogfX19CiAgICoK
ICAgKiBpcyBwcmludGVkOgogICAqIHt7ewogICAqIFBsb3Q6CiAgICogIExp
bmVHcmFwaDoKICAgKiAgICBiYWNrZ3JvdW5kX25vcm1hbDogJycKICAgKiAg
ICBiYWNrZ3JvdW5kX2NvbG9yOiBbMCwwLDAsMV0KICAgKiB9fX0KICAgKgog
ICAqIEByZXR1cm4gQSBmb3JtYXR0ZWQgQVNUTm9kZSB0aGF0IGNhbiBiZSBp
bnRlcnByZXRlZCBhcyBhIEtpdnkgZmlsZS4KICAgKi8KICBkZWYgcHJldHR5
OlN0cmluZyA9IEtpdnlQcmV0dHlQcmludGVyLmZvcm1hdChzZWxmKS5sYXlv
dXQKfQ==
As far as I can see, this encoding is correct and contains only characters from the standard Base64 alphabet. If I decode this encoding here, I get a correct translation. However, I tried various approaches to decode it programmatically and have not found a solution yet.
Let contentEncoded be the string containing the encoded file. I tried the following:
java.util.Base64.getDecoder.decode(contentEncoded)
java.util.Base64.getDecoder.decode(contentEncoded.getBytes)
java.util.Base64.getDecoder.decode(contentEncoded.getBytes(StandardCharsets.UTF_8))
java.util.Base64.getUrlDecoder.decode(contentEncoded)
java.util.Base64.getUrlDecoder.decode(contentEncoded.getBytes(StandardCharsets.UTF_8))
java.util.Base64.getMimeDecoder.decode(contentEncoded.replaceAll("\\n", "").replaceAll("\\r", ""))
However, all of them resulted in an error message: java.lang.IllegalArgumentException: Illegal base64 character a.
My question is: Am I not seeing something obvious? Are there some hidden control characters? Has anybody had similar issues and was able to fix them?
Just remove line breaks and it should work.
contentEncoded.replace("\n", "")
The following snippet decodes the encoding correctly:
val decodedWithMime = java.util.Base64.getMimeDecoder.decode(contentEncoded)
val decodedString = new String(decodedWithMime, java.nio.charset.StandardCharsets.UTF_8)
As pointed out in the comments, the "a" in the error message Illegal base64 character a is the hex value (0x0a) of the newline character \n. The MIME decoder ignores line separators, so it can decode the string without removing the newline characters beforehand.
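The difference between the two decoders can be sketched in Java (rather than Scala) as follows. The class name and the short sample string are made up for illustration; only java.util.Base64 from the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64LineBreakDemo {

    // Basic decoder: rejects any character outside the Base64 alphabet, including '\n'.
    static String decodeBasic(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // MIME decoder: skips line separators and other non-alphabet characters.
    static String decodeMime(String encoded) {
        return new String(Base64.getMimeDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "hello" encoded as Base64, wrapped with a newline in the middle.
        String wrapped = "aGVs\nbG8=";
        System.out.println(decodeMime(wrapped));
        System.out.println(decodeBasic(wrapped.replace("\n", "")));
        try {
            decodeBasic(wrapped);
        } catch (IllegalArgumentException e) {
            // The basic decoder reports the newline as an illegal character.
            System.out.println(e.getMessage());
        }
    }
}
```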

How to parse json string with UTF-8 characters using java?

I have a JSON string that contains a SUBSTITUTE (U+001A) character. I get a parsing exception when I try to convert the JSON string to a Java object using Jackson. Can you please tell me how to encode and decode such UTF-8 characters?
ObjectMapper mapper = new ObjectMapper();
mapper.readValue(jsonString, MY_DOMAIN_OBJECT.class);
jsonString:
{"studentId":"753253-2274", "information":[{"key":"1","value":"Get alerts on your phone(SUBSTITUTE character is present here. Unable to paste it)To subscribe"}]}
Error:
Illegal unquoted character ((CTRL-CHAR, code 26)): has to be escaped using backslash to be included in string value
Can you try this?
ObjectMapper mapper = new ObjectMapper();
mapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true);
mapper.readValue(jsonString, MY_DOMAIN_OBJECT.class);
I hope it helps. From the Javadoc:
Feature that determines whether parser will allow JSON Strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters) or not. If feature is set false, an exception is thrown if such a character is encountered.
Since JSON specification requires quoting for all control characters, this is a non-standard feature, and as such disabled by default.
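If you would rather keep the parser strict, an alternative is to escape raw control characters before parsing. The following is a rough sketch under that assumption; it is not part of Jackson, the class and method names are hypothetical, and it uses only the JDK. It rewrites control characters that occur inside string values as six-character Unicode escapes and leaves whitespace between tokens alone.

```java
final class JsonControlCharEscaper {

    // Replaces raw control characters (code < 0x20) that appear inside JSON
    // string values with their backslash-u escape. Text outside of strings,
    // such as newlines between tokens, is copied unchanged.
    static String escapeControlCharsInStrings(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false;   // currently between an opening and closing quote
        boolean escaped = false;    // previous character was an unescaped backslash
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c < 0x20) {
                    out.append(String.format("\\u%04x", (int) c));
                    escaped = false;
                    continue;
                }
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a value containing the SUBSTITUTE character (code 26).
        String raw = "{\"value\":\"Get alerts" + (char) 26 + "To subscribe\"}";
        System.out.println(escapeControlCharsInStrings(raw));
    }
}
```

The escaped output can then be passed to mapper.readValue without enabling the non-standard feature, and the parsed value still contains the original character.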

Getting JsonException while using JSONML.toJSONObject()

I'm using JSONML for converting xml String to JSONObject.
This is my xml String
"<soapenv:Body xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\"><jsonArray><jsonElement><message>entity is deleted<\/message><errorCode>ENTITY_IS_DELETED<\/errorCode><\/jsonElement><jsonElement><message>entity is deleted<\/message><errorCode>ENTITY_IS_DELETED<\/errorCode><\/jsonElement><\/jsonArray><\/soapenv:Body>"
when I try JSONML.toJSONObject() It gives me
Caused by: org.json.JSONException: Bad character in a name at 32 [character 33 line 1]
at org.json.JSONTokener.syntaxError(JSONTokener.java:433)
at org.json.XMLTokener.nextToken(XMLTokener.java:288)
at org.json.JSONML.parse(JSONML.java:173)
at org.json.JSONML.toJSONObject(JSONML.java:286)
at org.json.JSONML.toJSONObject(JSONML.java:304)
at com.thbs.automaton.commonUtils.TestcaseUtils.compareXml(TestcaseUtils.java:144)
... 57 more
It's due to the escape character (\). I tried removing all the \ characters, which solved my problem. However, I don't think it's good practice.
Can anyone suggest a better approach?
The "\" characters show that the original string is not an XML string but an escaped XML string. You should find out why and how the XML string was escaped.
Maybe it was escaped because it was transferred as JSON. In that case, you should first decode the original (JSON) string into its data value, i.e. the plain XML string, with code like this (jsonParser stands for whatever JSON library you use):
String xmlString = jsonParser(originalString, String.class);
After that, call JSONML as before:
JSONML.toJSONObject(xmlString);
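To illustrate what that decoding step does, here is a minimal JDK-only sketch of a JSON string literal decoder. The class and method names are hypothetical, and in real code a JSON library should do this work; the sketch only shows that \/ is a JSON escape for / and that decoding it yields well-formed XML.

```java
final class JsonStringLiteral {

    // Decodes a quoted JSON string literal (including its surrounding quotes)
    // into the plain text it represents.
    static String decode(String literal) {
        int end = literal.length() - 1;
        if (end < 1 || literal.charAt(0) != '"' || literal.charAt(end) != '"') {
            throw new IllegalArgumentException("not a quoted JSON string");
        }
        StringBuilder out = new StringBuilder(literal.length());
        for (int i = 1; i < end; i++) {
            char c = literal.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= end) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = literal.charAt(i);
            switch (e) {
                case '"': case '\\': case '/': out.append(e); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Four hex digits must follow before the closing quote.
                    if (i + 4 >= end) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    out.append((char) Integer.parseInt(literal.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape: " + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "\"<message>entity is deleted<\\/message>\"";
        System.out.println(decode(escaped));
    }
}
```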

Illegal Character in XML are not being replaced

SOLUTION: This was not an XML issue at all. My XML escapes were done properly; the problem was an encoding issue. I would like to share my solution, and I hope you find it useful.
public static String entityEncode(String text) throws UnsupportedEncodingException {
String result = text;
if (result == null) {
return result;
}
byte ptext[] = result.getBytes("ISO-8859-1");
String value = new String(ptext, "UTF-8");
String temp = XMLStringUtil.escapeControlChrs(value);
return temp;
}
EXPLANATION: The XML function above is for XML 1.0. We take the given text and convert it into a byte array, since a String does not have an associated encoding. We then create a new string from those bytes, decoded as "UTF-8". That is also why Java only reported the character reference error with &#: it could not recognize the offending character. Now that the text is re-decoded as UTF-8, there are no issues and the XML escaping proceeds properly!
EDIT: How do I print out all illegal XML characters in the provided string, according to the StringEscapeUtils.escapeXml parameters? The problem is that I don't want to escape everything, because it doesn't decode properly afterwards. So right now I just need to find out which characters in the text are invalid: the ones that are causing issues and need to be encoded.
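The re-decoding step can be tried in isolation with the sketch below. It assumes the text originally held UTF-8 bytes that were misread as ISO-8859-1; the class and method names are illustrative only. Using StandardCharsets instead of charset names also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    // Recovers the original bytes by encoding as ISO-8859-1 (a 1:1 mapping for
    // characters up to U+00FF), then decodes those bytes as UTF-8.
    static String reinterpretLatin1AsUtf8(String text) {
        byte[] rawBytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the fault: UTF-8 bytes of e-acute misread as ISO-8859-1,
        // which yields the two characters U+00C3 and U+00A9.
        String misread = new String("\u00e9".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        String repaired = reinterpretLatin1AsUtf8(misread);
        System.out.println(misread.length() + " chars before, "
                + repaired.length() + " char after");
    }
}
```

This only helps when the text really was misread that way; applying it to text that was decoded correctly will corrupt any non-ASCII characters.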
I have the following error message:
ERROR: 'Character reference "&#'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Character reference "&#'
It does not tell me which character it is, which is a problem.
I first parse the original XML to convert it to an XML document, and after that I sanitize further to remove illegal characters:
String xml10pattern = "[^"
+ "\u0009\r\n"
+ "\u0020-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]";
However, it's not removing them, so I'm not sure how to go about this. Currently I have:
String encoded = entityEncode(temp);
String legal = encoded.replaceAll(xml10pattern, "");
item.setResponseBody(legal);
entityEncode just uses a standard XML class to escape characters, XMLStringUtil.escapeControlChrs, which is based on StringEscapeUtils.escapeXml and only adds escapes; nothing is removed. But something is being missed.
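One way to find out which characters are at fault is to scan the text and report every code point that falls outside the XML 1.0 Char production. The sketch below does that with the JDK only; the class and method names are hypothetical and not part of XMLStringUtil or StringEscapeUtils.

```java
import java.util.ArrayList;
import java.util.List;

final class XmlCharScanner {

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXml10(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns one entry per illegal code point, with its char index in the string.
    static List<String> findIllegalChars(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!isLegalXml10(codePoint)) {
                found.add(String.format("index %d: U+%04X", i, codePoint));
            }
            // Advance by one or two chars, depending on surrogate pairs.
            i += Character.charCount(codePoint);
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "ok" + (char) 0x1A + "x";
        for (String entry : findIllegalChars(sample)) {
            System.out.println(entry);
        }
    }
}
```

Iterating by code point rather than by char means valid surrogate pairs are accepted, while unpaired surrogates are reported as illegal.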

HTML in Json format fails to get parsed

I am playing around with the Twitter API, and want to parse the json response from the user_timeline GET call.
You can find an example of such a call here.
For this, I am using Jackson, like so:
URL url = new File("D:\\01_perso\\like-minded-friends\\test\\resources\\\\tweet_sample.json").toURI().toURL();
ObjectMapper om = new ObjectMapper();
//om.configure(JsonParser.Feature.ALLOW_COMMENTS, true);
JsonNode node = om.readTree(url);
System.out.print(node.path("id").asLong());
The issue is that the parsing fails with the following error :
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('/' (code 47)): maybe a (non-standard) comment? (not recognized as one since Feature 'ALLOW_COMMENTS' not enabled for parser)
at [Source: file:/D:/01_perso/like-minded-friends/test/resources/tweet_sample.json; line: 98, column: 26]
The error comes from the fact that the JSON contains some HTML hrefs, and that they contain slashes.
"source": "YoruFukurou",
I am searching for a way to have jackson ignore those characters.
As you can see above, I tried to use the ALLOW_COMMENTS feature, but it doesn't work since the html is then taken as a comment and that ends up eating the final comma away from the json, leading to another error.
Is there any way to tell jackson to simply accept those HTML lines, or at least ignore them? (I do not actually need them, so the source elements could simply be skipped.)
Thanks
