Parsing giant JSON response - Java

First the background info:
I'm using Commons HttpClient to make a GET request to the server.
The server responds with a JSON string.
I parse the string using org.json.
The problem:
Everything actually works, but only for small responses (smaller than 2^31 bytes, the maximum value of an int, which limits getResponseBody and StringBuilder). My response, however, is giant (several GB), and I'm stuck. I tried httpclient's getResponseBodyAsStream, but the response is so big that my system grinds to a halt. I tried a String, a StringBuilder, even saving it to a file.
The question:
First, is this the right approach? If so, what is the best way to handle such a response? If not, how should I proceed?

If you ever have a response that can run to gigabytes, you should parse the JSON as a stream, (almost) character by character, and avoid creating String objects... (this is very important, because Java's stop-the-world garbage collection will freeze your system for seconds at a time if you constantly create lots of garbage).
You can use SAXophone to build the parsing logic.
You'll have to implement methods like onObjectStart, onObjectClose, onObjectKey, etc. It's hard at first, but once you look at the PrettyPrinter implementation in the test packages you'll get the idea...
Once properly implemented, you can handle an infinite stream of data ;)
P.S. It's designed for HFT, so it's all about performance and zero garbage...
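
To make the callback idea concrete, here is a minimal sketch of what such a push-style handler can look like. The interface below is hypothetical, modelled only on the method names mentioned above; SAXophone's real API will differ, so consult its PrettyPrinter test for the actual signatures.

// Hypothetical callback interface, loosely modelled on the handler
// methods named above (onObjectStart, onObjectKey, ...); not SAXophone's real API.
interface JsonHandler {
    void onObjectStart();
    void onObjectKey(CharSequence key); // CharSequence, not String, to avoid allocation
    void onValue(CharSequence value);
    void onObjectClose();
}

// A handler that counts objects without materializing a single String
final class ObjectCounter implements JsonHandler {
    long objects;
    @Override public void onObjectStart() { objects++; }
    @Override public void onObjectKey(CharSequence key) { }
    @Override public void onValue(CharSequence value) { }
    @Override public void onObjectClose() { }
}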

Related

Read REST api response hosting very large data

I am calling a REST API endpoint that hosts a very large amount of data. There is so much data that my Chrome tab crashes as well (it displays the data briefly, then keeps loading more until the tab crashes). Even Postman fails to get the data and only returns a 200 OK code without displaying any response body.
I'm trying to write a Java program to consume the response from the API. Is there a way to consume the response without using a lot of memory?
Please let me know if the question is not clear. Thank you !!
A possibility is to use a JSON streaming parser such as the Jackson Streaming API (https://github.com/FasterXML/jackson-docs/wiki/JacksonStreamingApi); for example code, see https://javarevisited.blogspot.com/2015/03/parsing-large-json-files-using-jackson.html
For JS there is https://github.com/DonutEspresso/big-json
If the data really is that large, it's better to split the task:
Download the full data to disk via an ordinary HTTP client.
Do the bulk processing using a streaming approach, similar to SAX parsing for XML (see the sketch below):
JAVA - Best approach to parse huge (extra large) JSON file
With this split, you won't have to deal with network errors during processing, and you'll keep the data consistent.
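
A rough sketch of that two-step split, assuming Java 11's java.net.http client for the download and Jackson's streaming parser for the processing (the URL and file name are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

// Step 1: stream the response straight to disk; the body is never held in memory
HttpClient http = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/big.json")).build();
Path file = Path.of("big.json");
http.send(request, HttpResponse.BodyHandlers.ofFile(file));

// Step 2: stream-parse the file token by token
try (JsonParser parser = new JsonFactory().createParser(file.toFile())) {
    while (parser.nextToken() != null) { // null signals end of input
        // handle one token at a time; memory use stays flat
    }
}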

How could a Jersey client handle a huge payload without causing a memory issue?

I have to write a Jersey client that handles a huge payload (>1 GB), but the problem is that if I use the Java object model I get a memory error. I am considering the Jackson Streaming API, but I am unsure whether the payload will still be buffered in memory and occupy more than 1 GB. Can someone explain how streaming works on the client side?
The Jackson Streaming API is identical on the server and client side. It can be very efficient, but it is substantially more work than the Databind API, since you have to write much of that logic yourself (see Jackson Performance).
Functionally, you want to leave the input in the stream and parse (and process) it piece by piece. Where you know the structure, or it happens to be an array, you can process each object in the array one by one to avoid reading the entire array before processing (see the array sketch after the snippet below).
JsonFactory factory = new ObjectMapper().getFactory(); // getJsonFactory() is deprecated in Jackson 2.x
try (JsonParser parser = factory.createParser(inputStream)) { // createJsonParser() was the 1.x name
    while (parser.nextToken() != null) { // null signals end of input
        // process tokens, etc. here
    }
}
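
Where the payload is a large array, a common pattern is to bind one element at a time, so only a single element is ever in memory. In this sketch, inputStream is the response stream and MyItem is a placeholder for your own bound type:

ObjectMapper mapper = new ObjectMapper();
try (JsonParser parser = mapper.getFactory().createParser(inputStream)) {
    if (parser.nextToken() == JsonToken.START_ARRAY) {
        while (parser.nextToken() == JsonToken.START_OBJECT) {
            // Bind just the current element; it becomes garbage after this iteration
            MyItem item = mapper.readValue(parser, MyItem.class);
            handle(item); // hypothetical per-element processing
        }
    }
}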

Jackson stream parser position

I am building a tool to parse huge JSON, around 1 GB. In that logic, I create a JsonParser object and keep reading until it reaches the expected JsonToken. Then I create another JsonParser (call it the child), which should start from the previous JsonParser's token position without much overhead. Is there a way to do that with the JsonParser API? I am using skipChildren(), which is also taking time in my scenario.
You can try calling releaseBuffered(...) to get the data that have been read but not yet consumed by the parser, and then prepend that data to the input stream (getInputSource()) to parse the resulting stream somehow (one way to do this might be to use an input stream that supports marks when constructing the parser).
However, since you're already using a stream-based API, you probably won't get better performance than with skipChildren().
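
For reference, skipChildren() is used like this: position the parser on the value's opening token and it fast-forwards past the matching close. The field name below is a made-up example; parser is the question's existing JsonParser.

if ("unwantedField".equals(parser.getCurrentName())) { // hypothetical field name
    parser.nextToken();    // advance to the value (START_OBJECT or START_ARRAY)
    parser.skipChildren(); // skip everything up to the matching END_OBJECT/END_ARRAY
}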

Java- Getting Json by parts

Generally, is there a way to get a big JSON string in parts from a single request?
For example, if I have a JSON string consisting of three big objects, each 1 MB in size, can I somehow get the first 1 MB with a single request and parse it while the other two objects are still downloading, instead of waiting for the full 3 MB string?
If you know how big the parts are, it would be possible to split your request in three using HTTP/1.1 range requests. Assuming your ranges are defined correctly, you should be able to get the JSON objects directly from the server (if the server supports range requests).
Note that this hinges on a) the server's capability to handle range requests, b) the idempotency of your REST operation (it could very well run the call three times, a cache or reverse proxy may help with this) and c) your ability to know the ranges before you call.
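
A minimal sketch of such a range request with HttpURLConnection, assuming a placeholder URL and a fixed 1 MB part size:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

HttpURLConnection conn = (HttpURLConnection) new URL("https://example.com/big.json").openConnection();
conn.setRequestProperty("Range", "bytes=0-1048575"); // first 1 MB
// 206 Partial Content means the server honoured the range; 200 means it ignored it
if (conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL) {
    try (InputStream part = conn.getInputStream()) {
        // parse this 1 MB part while the next range is fetched in parallel
    }
}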

Rest calls-Large amount of data between calls

We are using REST with Jersey. There are a few scenarios where the server (WAS 8.5) sends a large amount of data to the client, which is an RCP application. In some cases the data, in XML format, is so huge (150 MB) that the client gets an OutOfMemoryError.
I have the questions below:
How much does the size increase when a Java object is converted to XML?
How can we send a large Java object to the client and still use REST calls?
1) Tough question to answer without seeing the XML schema; I've seen well-designed schemas that result in tight, lean XML, and others that are a mess and very bloated. To test it, write some code that serializes your Java objects to a byte[] and compare its size to the XML payload you currently produce (see the sketch below).
2) It might be worth looking into a chunking process; 150 MB is pretty large for a single payload. Also, are you already using GZIP compression for this? It may also be worth looking at Fast Infoset, which is basically a binary encoding for XML that generally helps reduce the size of an XML document.
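
A quick, hedged sketch of both size checks; myObject and xmlString are placeholders for your own payload:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Java serialized form vs. raw XML vs. gzipped XML
ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
    oos.writeObject(myObject); // myObject must implement Serializable
}
byte[] xml = xmlString.getBytes(StandardCharsets.UTF_8);
ByteArrayOutputStream gzipped = new ByteArrayOutputStream();
try (GZIPOutputStream gz = new GZIPOutputStream(gzipped)) {
    gz.write(xml);
}
System.out.println("Java serialized: " + javaBytes.size() + " bytes");
System.out.println("XML raw:         " + xml.length + " bytes");
System.out.println("XML gzipped:     " + gzipped.size() + " bytes");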
