Inconsistency in deserializing objects with Jackson streaming API - java

I am trying to use the Jackson streaming API to deserialize huge objects from XML. The idea is to combine the streaming API and ObjectMapper to parse XML (or JSON) in small chunks. However, I see some inconsistent behavior with the XML parser.
With this code snippet:
try {
String xml1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>";
String xml2 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
XmlFactory xmlFactory = new XmlFactory();
JsonParser jp = xmlFactory.createParser(new ByteArrayInputStream(xml1.getBytes()));
JsonToken token = jp.nextToken();
while (token != null) {
System.out.println("xml1 token=" + token);
token = jp.nextToken();
}
jp = xmlFactory.createParser(new ByteArrayInputStream(xml2.getBytes()));
token = jp.nextToken();
while (token != null) {
System.out.println("xml2 token=" + token);
token = jp.nextToken();
}
} catch (IOException e) {
e.printStackTrace();
}
I am getting:
xml1 token=START_OBJECT
xml1 token=END_OBJECT
xml2 token=START_OBJECT
xml2 token=FIELD_NAME
xml2 token=VALUE_NULL
xml2 token=END_OBJECT
Why is the FIELD_NAME token missing for xml1? Why is there just one START_OBJECT token for the second XML? Is there any setting that would allow me to see the FIELD_NAME of the outer tag?

The problem is quite simple: the XML module differs from most other Jackson dataformat modules in that direct access via the Streaming API is not supported.
This is mentioned in the project README (along with a note that the "tree model" is similarly not supported).
"Not supported" does not necessarily mean "cannot be used at all", just that its behavior differs from the handling of JSON, so callers really need to know what they are doing above and beyond the API used for JSON content (and Smile, CBOR, YAML; even CSV content is represented in a way that is compatible with JSON access).
While you can try to use XmlFactory and the streaming parser/generator, its behavior is controlled by XmlMapper based on metadata from Java classes, to make things work correctly via the databinding API (that is, XmlMapper).
With that, the reason for the observed tokens is that such translation is necessary to map to the expected Java object structure:
public class Foo {
public Bar bar;
}
which would map to JSON like:
{
"bar" : null
}
as well as XML of
<foo>
<bar></bar>
</foo>
Another way to put this is that the XML and JSON data models are fundamentally different and cannot be trivially translated. Since Jackson's token model is based on JSON, some work is needed to translate XML elements and attributes into the structure that equivalent JSON would have.
The above is not to say that what you are trying to do is impossible. There are 2 ways you might be able to make things work:
Knowing the translation that XmlParser does, call nextToken() expecting the translated token sequence.
Instead of using XmlParser directly, construct an XMLStreamReader (the low-level Stax streaming parser), read "raw" tokens, construct a separate XmlParser (via XmlFactory) at the expected location, and use that for reading.
I hope this helps.
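The second option can be sketched with the JDK's built-in StAX API alone. The class below (RawStaxEvents is a made-up name for illustration) lists the raw element events, which shows that the outer foo tag is visible at this level even though the Jackson token stream folds it into START_OBJECT. The comment marks the spot where a Jackson parser could be attached; XmlFactory.createParser(XMLStreamReader) is assumed to exist in the jackson-dataformat-xml version in use, so verify that before relying on it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the raw StAX element events of a document.
final class RawStaxEvents {

    static List<String> events(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> seen = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // The element name, including the root, is available here.
                    // Assumption: this is where a Jackson parser could be created
                    // from the same reader to bind only the current sub-tree.
                    seen.add("START " + reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    seen.add("END " + reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(events("<foo></foo>"));
        System.out.println(events("<foo><bar></bar></foo>"));
    }
}
```

Reading the raw events first keeps the element names under the caller's control, at the cost of handling the outer structure by hand.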

A kid with a hammer...
I don't know much about Jackson; in fact, I just started using it, thinking of using JSON or YAML instead of XML. But for XML, we have been using XStream with success.
//Consumer side
FileInputStream fis = new FileInputStream(filename);
XStream xs = new XStream();
Object obj = xs.fromXML(fis);
fis.close();
Also, if you are the one producing the serialized data and it comes from Java, you could use Java serialization instead for a lower footprint and faster operation.
//producer side
FileOutputStream fos = new FileOutputStream(filename);
ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(fos));
oos.writeObject(yourVeryComplexObjectStructure); //I am writing a list of ten 1MB objects
oos.flush();
oos.close();
fos.close();
//Consumer side
final FileInputStream fin = new FileInputStream(filename);
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(fin));
@SuppressWarnings("unchecked")
final YourVeryComplexObjectStructureType object = (YourVeryComplexObjectStructureType) ois.readObject();
ois.close();
fin.close();

Related

Overwrite JsonNode using Jackson

I am using the Jackson streaming api to read in a json file like so:
// Go through json model and grab needed resources.
JsonFactory jsonfactory = new JsonFactory();
JsonParser jp = jsonfactory.createParser(fis);
JsonToken current;
current = jp.nextToken();
ObjectMapper mapper = new ObjectMapper();
if (current != JsonToken.START_OBJECT) {
System.out.println("Error: root should be object: quitting.");
return null;
}
while (jp.nextToken() != JsonToken.END_OBJECT) {
String fieldName = jp.getCurrentName();
// move from field name to field value
if ("Field1".equals(fieldName)) {
jp.nextToken();
JsonNode json = mapper.readTree(jp);
//Manipulate JsonNode
/*Want to write back into json file in place of
old object with manipulated node*/
}
else {
jp.skipChildren();
}
}
In the code above I parse the JSON file until I find the field I am looking for, read it into a JsonNode object, and then go through that JsonNode and manipulate some of the data associated with it. My question is: is there a way to delete that node from the JSON file and write a newly created POJO into the file under the same field name in place of the old one? Everything I can find online about this involves reading the whole JSON file into a JsonNode, which I would like to avoid as this file can be quite large.
In-place editing of a file like that is usually pretty complicated; a simpler approach is to create a new temporary file and, for the most part, just copy what you read into it until you hit the condition where the output to the new file should be modified.
Then at the end you can delete the original file and rename the temporary one to "replace" it. Unless disk space is an issue, though, I personally like keeping the original source around (especially in automated systems) for troubleshooting.
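A minimal sketch of the temporary-file approach, using only java.nio.file. TempFileRewrite and StreamCopier are hypothetical names; the copier callback stands in for whatever streaming copy-and-modify logic (for example a JsonParser feeding a JsonGenerator) is used, and the original is kept as a ".bak" sibling rather than deleted.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: rewrites a file through a temporary copy.
final class TempFileRewrite {

    /** Streams the old content in and the new content out. */
    interface StreamCopier {
        void copy(InputStream in, OutputStream out) throws IOException;
    }

    static void rewrite(Path source, StreamCopier copier) throws IOException {
        // Create the temp file next to the source so the final rename
        // stays on the same file system.
        Path dir = source.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, source.getFileName().toString(), ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            copier.copy(in, out);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // leave the original untouched on failure
            throw e;
        }
        // Keep the original around for troubleshooting, then swap in the new file.
        Path backup = source.resolveSibling(source.getFileName() + ".bak");
        Files.move(source, backup, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, source);
    }
}
```

A call would look like TempFileRewrite.rewrite(path, (in, out) -> { ... }), with the lambda copying tokens through unchanged except for the field being replaced.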

Java - how to store a (multi-child) tree in file?

I'm working on an open-source, cross-platform pomodoro timer with statistics support.
For tasks, I have a tree data structure like this:
class Task {
String name;
int minutesWorkedOn;
int uniqueID;
Task parent;
...
ArrayList<Task> childTasks; //Note, not binary, but can have n-children.
}
(which is actually a bit bigger in practice)
I want to store this data structure in a file between sessions.
I was considering JSON or XML, either recursing into childTasks, or writing all tasks out one task per line and piecing things back together by task IDs. But JSON/XML is not a hard requirement; I'm just thinking out loud.
Some S.O. answers mention serialization, but preferably I'd like to be able to see the stored data structure, as is the case with JSON or XML. Those two formats would also make it easier to build reporting tools.
Considering I'm new to Java and haven't worked with file I/O before, can someone give me a tip or advice on which route to take here?
[edit]
The solution below works well. There is an issue with loops, though. I edited the code above: a task has a backwards link to its parent. This causes Gson to crash. I might ignore this field and fix it up again after the data is loaded, or maybe read some more of the tutorial.
The best and easiest way is to use Gson to write/read the object to a file.
Write:
//Get the json serialization of the task object
GsonBuilder builder = new GsonBuilder();
//builder.setPrettyPrinting().serializeNulls(); //optional
Gson gson = builder.create();
String json = gson.toJson(task);
try {
//write json string to a file named "/tmp/task.json"
FileWriter writer = new FileWriter("/tmp/task.json");
writer.write(json);
writer.close();
} catch (IOException e) {e.printStackTrace();}
Read:
Gson gson = new Gson();
try {
BufferedReader br = new BufferedReader(new FileReader("/tmp/task.json"));
//convert the json string from file back to object
Task task = gson.fromJson(br, Task.class);
} catch (IOException e) {
e.printStackTrace();
}
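The parent back-link mentioned in the question's edit can be handled along the lines the asker suggests: skip the field when writing and rebuild it after reading. The sketch below is a cut-down Task with only the fields needed for the illustration; it relies on Gson skipping transient fields by default, and relinkParents is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced version of the Task class from the question, for illustration only.
final class Task {
    String name;
    // 'transient' fields are skipped by Gson by default, so the
    // child -> parent -> child cycle never reaches the serializer.
    transient Task parent;
    List<Task> childTasks = new ArrayList<>();

    Task(String name) {
        this.name = name;
    }

    /** Restores the parent links after loading by walking the tree top-down. */
    static void relinkParents(Task node) {
        if (node.childTasks == null) {
            return; // a deserializer may leave the list unset
        }
        for (Task child : node.childTasks) {
            child.parent = node;
            relinkParents(child);
        }
    }
}
```

After gson.fromJson(...) returns the root task, a single call to Task.relinkParents(root) restores the back-links for the whole tree.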

Streaming a json element

Let's say I have a json that looks like this:
{"body":"abcdef","field":"fgh"}
Now suppose the value of the 'body' element is huge (~100 MB or more). I would like to stream out the value of the body element instead of storing it in a String.
How can I do this? Is there any Java library I could use for this?
This is the line of code that fails with an OutOfMemoryError when a large JSON value comes in:
String inputStreamString = (String) JsonPath.read(textValue.toString(), "$.body");
'textValue' here is a hadoop.io.Text object.
I'm assuming that the OutOfMemory error occurs because we try to do method calls like toString() (which creates a new object), and JsonPath.read(), all of which are done in-memory. I need to know if there is an approach I could take while handling large-sized textValue objects.
Please let me know if you need additional info.
JsonSurfer is good for processing very large JSON data with selective extraction.
Example of how to surf JSON data, collecting matched values in the listeners:
BufferedReader reader = new BufferedReader(new FileReader(jsonFile));
JsonSurfer surfer = new JsonSurfer(GsonParser.INSTANCE, GsonProvider.INSTANCE);
SurfingConfiguration config = surfer.configBuilder().bind("$.store.book[*]", new JsonPathListener() {
@Override
public void onValue(Object value, ParsingContext context) throws Exception {
JsonObject book = (JsonObject) value;
}
}).build();
surfer.surf(reader, config);
Jackson offers a streaming API for generating and processing JSON data.
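To show what streaming a single value means without pulling in a library, here is a small hand-rolled scanner. It is not Jackson or JsonSurfer, only a sketch under stated assumptions: the input is well-formed JSON, the field name is matched at any nesting depth, and only string values are streamed. JsonStringStreamer and streamField are hypothetical names. The value is copied character by character to a Writer, so it is never held in memory as one String.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper: streams one string value out of a JSON document.
final class JsonStringStreamer {

    /** Copies the first string value of {@code field} to {@code out}; false if absent. */
    static boolean streamField(Reader in, String field, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c != '"') {
                continue;                     // not the start of a string
            }
            StringWriter name = new StringWriter();
            copyString(in, name);             // names are short, buffering is fine
            if (skipWhitespace(in) != ':') {
                continue;                     // that string was a value, not a name
            }
            if (skipWhitespace(in) != '"') {
                continue;                     // value is not a string
            }
            if (name.toString().equals(field)) {
                copyString(in, out);          // stream the value, never buffer it
                return true;
            }
            copyString(in, null);             // discard an uninteresting value
        }
        return false;
    }

    // Reads up to the closing quote, decoding escapes; sink may be null.
    private static void copyString(Reader in, Writer sink) throws IOException {
        int c;
        while ((c = in.read()) != '"') {
            if (c == -1) throw new EOFException("unterminated string");
            if (c == '\\') c = unescape(in);
            if (sink != null) sink.write(c);
        }
    }

    private static int unescape(Reader in) throws IOException {
        int e = in.read();
        switch (e) {
            case -1: throw new EOFException("unterminated escape");
            case 'n': return '\n';
            case 't': return '\t';
            case 'r': return '\r';
            case 'b': return '\b';
            case 'f': return '\f';
            case 'u': {
                int code = 0;
                for (int i = 0; i < 4; i++) {
                    int digit = Character.digit(in.read(), 16);
                    if (digit < 0) throw new IOException("bad \\u escape");
                    code = code * 16 + digit;
                }
                return code;
            }
            default: return e;                // covers \" \\ and \/
        }
    }

    private static int skipWhitespace(Reader in) throws IOException {
        int c;
        do {
            c = in.read();
        } while (c == ' ' || c == '\t' || c == '\n' || c == '\r');
        return c;
    }

    public static void main(String[] args) throws IOException {
        StringWriter body = new StringWriter();
        streamField(new StringReader("{\"body\":\"abcdef\",\"field\":\"fgh\"}"), "body", body);
        System.out.println(body);
    }
}
```

In real use the Reader should be buffered and the Writer should point at a file or socket; with an established streaming parser the same idea applies, just with the tokenizing done by the library.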

Java/Android: java.lang.OutOfMemoryError while building a JSON object

I am importing JSON data from a public database URI http://data.seattle.gov/api/views/3k2p-39jp/rows.json and the rows go as far as 445454. Using the following code I am constructing the JSON object of the entire data.
HttpGet get = new HttpGet(uri);
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(get);
BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
StringBuilder builder=new StringBuilder();
for(String line=null;(line = reader.readLine()) != null;){
builder.append(line).append("\n");
}
JSONTokener jsonTokener=new JSONTokener(builder.toString());
JSONObject finalJson=new JSONObject(jsonTokener);
JSONArray data=finalJson.getJSONArray("data");
Because the data is too large, I am getting 03-21 03:41:49.714: E/AndroidRuntime(666): Caused by: java.lang.OutOfMemoryError, pointing to builder.append(line).append("\n") as the source of the error. Is there any way I can handle large datasets without running into memory allocation issues?
That JSON is huge!
You definitely need to use a streaming JSON parser. There are two out there for Android: GSON and Jackson.
GSON Streaming is explained at: https://sites.google.com/site/gson/streaming
I like how GSON explains the problem you're having:
Most applications should use only the object model API. JSON streaming is useful in just a few situations:
When it is impossible or undesirable to load the entire object model into memory. This is most relevant on mobile platforms where memory is limited.
Jackson Streaming is documented at: http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example
If possible only request parts of the data - this also reduces time for network io and thus saves battery.
Otherwise you could try not to keep the incoming data in memory, but to 'stream' it onto the sd-card. Once it is stored there you can iterate over it. Most likely this will mean using your own JSON tokenizer that does not build a full tree, but which is able to (like a SAX parser) look at only a part of the object tree at a time.
You may have a look at Jackson, which has a streaming mode, which may be applicable.
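The "stream it onto the sd-card" step can be sketched with plain java.io. StreamSpooler and spoolToFile are hypothetical names; the point is that only one fixed-size buffer is ever held in memory, in contrast to the StringBuilder in the question, which grows with the whole response.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream to a file in fixed-size chunks.
final class StreamSpooler {

    /** Writes everything from {@code in} to {@code target}; returns the byte count. */
    static long spoolToFile(InputStream in, File target) throws IOException {
        byte[] buffer = new byte[8192];   // the only memory that scales with nothing
        long total = 0;
        try (OutputStream out = new FileOutputStream(target)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

In the question's code, response.getEntity().getContent() would be passed as the input stream, and the resulting file could then be handed to a streaming parser.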
A streaming pull parser is the way to go. I recommend GSON, as it has a small memory footprint (pull parsing alone is about 16K; Jackson is way bigger).
Your code is problematic because you allocate:
a buffer to hold all the string data coming from the service
all the JSON DOM objects
and this is slow and gives you a memory meltdown.
In case you need Java objects out of your JSON data, you may try my small databinding library built on GSON (shameless self-advertising off):
https://github.com/ko5tik/jsonserializer
I did it a bit differently. My JSON code was waiting for the status, which comes towards the end, so I modified the code to return earlier.
// try to get formattedAddress without reading the entire JSON
String formattedAddress;
while ((read = in.read(buff)) != -1) {
jsonResults.append(buff, 0, read);
formattedAddress = ((String) ((JSONObject) new JSONObject(
jsonResults.toString()).getJSONArray("results").get(0))
.get("formatted_address"));
if (formattedAddress != null) {
Log.i("Taxeeta", "Saved memory, returned early from json") ;
return formattedAddress;
}
}
JSONObject statusObj = new JSONObject(jsonResults.toString());
String status = (String) (statusObj.optString("status"));
if (status.toLowerCase().equals("ok")) {
formattedAddress = ((String) ((JSONObject) new JSONObject(
jsonResults.toString()).getJSONArray("results").get(0))
.get("formatted_address"));
if (formattedAddress != null) {
Log.w("Taxeeta", "Did not save memory, returned late from json") ;
return formattedAddress;
}
}

Problems converting from an object to XML in java

What I'm trying to do is convert an object to XML, then transfer it as a String via a Web Service so another platform (.NET in this case) can read the XML and parse it back into the same object. I've been reading this article:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php#start
And I've been able to do everything with no problems until here:
Serializer serializer = new Persister();
PacienteObj pac = new PacienteObj();
pac.idPaciente = "1";
pac.nomPaciente = "Sonia";
File result = new File("example.xml");
serializer.write(pac, result);
I know this will sound silly, but I can't find where Java creates the file from new File("example.xml"), so I can check its contents.
I would also like to know if there is any way to convert that XML into a String instead of a File, because that's exactly what I need. I can't find that information in the article.
Thanks in advance.
Check out the JavaDoc. There is a method that writes to a Writer, so you can hook it up to a StringWriter (which writes into a String):
StringWriter result = new StringWriter(expectedLength);
serializer.write(pac, result);
String s = result.toString();
You can use an instance of StringWriter:
Serializer serializer = new Persister();
PacienteObj pac = new PacienteObj();
pac.idPaciente = "1";
pac.nomPaciente = "Sonia";
StringWriter result = new StringWriter();
serializer.write(pac, result);
String xml = result.toString(); // xml now contains the serialized data
Logging or printing the statement below will tell you where the file is on the file system:
result.getAbsolutePath()
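For context on why the file can be hard to find: a relative name such as "example.xml" is resolved against the JVM's working directory (the user.dir system property), which depends on how the program was started. A small sketch, with WhereIsMyFile as a hypothetical class name:

```java
import java.io.File;

// Hypothetical helper: shows where a relative file name ends up.
final class WhereIsMyFile {

    /** Returns the absolute location a relative file name resolves to. */
    static String locate(String relativeName) {
        return new File(relativeName).getAbsolutePath();
    }

    public static void main(String[] args) {
        // The working directory the JVM was started in.
        System.out.println(System.getProperty("user.dir"));
        // The relative name resolved against that directory.
        System.out.println(locate("example.xml"));
    }
}
```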
