Read REST api response hosting very large data - java

I am calling a REST API endpoint that returns a very large amount of data. There is so much data that even my Chrome tab crashes (it displays the data for a short time, then keeps loading more until the tab crashes). Even Postman fails to get the data and only returns a 200 OK code without displaying any response body.
I'm trying to write a Java program to consume the response from the API. Is there a way to consume the response without using a lot of memory?
Please let me know if the question is not clear. Thank you!

One possibility is to use a JSON streaming parser, such as the Jackson Streaming API (https://github.com/FasterXML/jackson-docs/wiki/JacksonStreamingApi). For example code, see https://javarevisited.blogspot.com/2015/03/parsing-large-json-files-using-jackson.html
For JavaScript there is https://github.com/DonutEspresso/big-json
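As a minimal sketch of the Jackson streaming approach: the parser walks the document token by token, so only the current token is held in memory, never the whole payload. This assumes the response is a top-level JSON array of objects (the sample data and the counting logic are illustrative, not from the original question):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamingJsonReader {

    // Counts the objects in a top-level JSON array without ever
    // materializing the array or its elements in memory.
    public static int countRecords(InputStream in) throws IOException {
        JsonFactory factory = new JsonFactory();
        int count = 0;
        try (JsonParser parser = factory.createParser(in)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected a top-level JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                parser.skipChildren(); // advance past this object without building it
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In practice the InputStream would come from the HTTP response body.
        String json = "[{\"id\":1},{\"id\":2},{\"id\":3}]";
        System.out.println(countRecords(new ByteArrayInputStream(json.getBytes())));
    }
}
```

Instead of counting, the loop body could pull out individual fields with `parser.nextFieldName()` and `parser.getValueAsString()` and process each record before moving on.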

If the data really is that large, it's better to split the task:
1) Download the full payload to disk with an ordinary HTTP client.
2) Process it in bulk using a streaming approach, similar to SAX parsing for XML:
JAVA - Best approach to parse huge (extra large) JSON file
With this split you won't have to deal with possible network errors during processing, and you keep the data consistent.
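Step 1 can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+): `BodyHandlers.ofFile` writes the response body to disk as it arrives, so memory use stays flat regardless of payload size. The URL in `download` is a placeholder for the real endpoint, and the `copyToFile` helper shows the same idea for any `InputStream` (e.g. from an older HTTP client):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadToDisk {

    // Streams the HTTP response body straight to a file; nothing beyond the
    // client's transfer buffers is held in memory.
    public static Path download(String url, Path target)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofFile(target)).body();
    }

    // Same idea for any InputStream: copy it to disk in chunks.
    public static long copyToFile(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // download("https://example.com/big.json", ...) would hit the network,
        // so this demo only exercises the stream-to-file helper.
        Path out = Files.createTempFile("payload", ".json");
        long bytes = copyToFile(new ByteArrayInputStream("{\"ok\":true}".getBytes()), out);
        System.out.println("wrote " + bytes + " bytes to " + out);
    }
}
```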

Related

parsing giant json response

First the background info:
I'm using commons httpclient to make a get request to the server.
The server responds with a JSON string.
I parse the string using org.json.
The problem:
Actually everything works, that is, for small responses (smaller than 2^31 bytes, the max value of an integer, which limits getResponseBody and the StringBuilder). I, on the other hand, have a giant response (several GB) and I'm getting stuck. I tried using the "getResponseBodyAsStream" of HttpClient, but the response is so big that my system freezes. I tried using a String, a StringBuilder, and even saving it to a file.
The question:
First, is this the right approach, if so, what is the best way to handle such a response? If not, how should I proceed?
If you ever have a response that can be on the order of gigabytes, you should parse the JSON as a stream, (almost) character by character, and avoid creating any String objects. (This is very important: Java's stop-the-world garbage collection can freeze your system for seconds if you constantly create a lot of garbage.)
You can use SAXophone to create parsing logic.
You'll have to implement all the methods like onObjectStart, onObjectClose, onObjectKey, etc. It's hard at first, but once you take a look at the PrettyPrinter implementation in the test packages you'll get the idea...
Once properly implemented you can handle an infinite stream of data ;)
P.S. This library is designed for HFT, so it's all about performance and producing no garbage...
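To illustrate the character-by-character idea without SAXophone's actual API (the class and method here are invented for the sketch): this scanner reads one character at a time, tracks only nesting depth and string state, and never builds a String for the payload, so memory use is constant even on an unbounded stream:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CharStreamScanner {

    // Scans JSON one character at a time, counting the objects that sit
    // directly inside the top-level array. No payload Strings are created.
    public static int countTopLevelObjects(Reader in) throws IOException {
        int depth = 0, count = 0, c;
        boolean inString = false, escaped = false;
        while ((c = in.read()) != -1) {
            if (inString) {
                // Inside a string literal: only track escapes and the closing quote,
                // so braces inside string values are not miscounted.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 1) count++; // object directly inside the top-level array
                depth++;
            } else if (c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In practice, wrap the HTTP response stream in a BufferedReader.
        String json = "[{\"a\":\"x}\"},{\"b\":2},{\"c\":[{}]}]";
        System.out.println(countTopLevelObjects(new StringReader(json)));
    }
}
```

A real handler would fire callbacks (onObjectStart, onObjectClose, ...) at the points where this sketch adjusts the depth counter.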

Save large JSON to text straight from REST API response stream in Java

What is the best way to save a large JSON payload from a REST API call to a text file? I currently use Postman in Chrome to test the API, and sometimes it makes Chrome unresponsive due to the large JSON output. I would like to write some Java code to save the large JSON to a text file. Is there a way to stream the REST output straight to a file so that it wouldn't take too much memory at run time?
I suggest that you use cURL.
An example usage is found here : Capture Curl Output to a File
It should be more efficient than buffering the entire output.

Rest calls-Large amount of data between calls

We are using REST with Jersey. There are a few scenarios where the server (WAS 8.5) sends a large amount of data to the client, which is an RCP application. In some cases the data is so huge (150 MB of XML) that the client gets an OutOfMemoryError exception.
I have the questions below:
1) How much does the size increase when a Java object is converted to XML?
2) How can we send a large Java object to the client and still use REST calls?
1) Tough question to answer without seeing the XML schema. I've seen well-designed schemas that result in tight, lean XML, and others that are a mess and very bloated. To test it, write some test code that serializes your Java objects to a byte[] and compare its size to the XML payload you currently produce.
2) It might be worth looking into a chunking process; 150 MB is pretty large for a single payload. Are you already using GZIP compression for this? It may also be worth looking at Fast Infoset. Basically, it's a binary encoding for XML that generally helps reduce the size of an XML document.
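The size comparison and the effect of GZIP can both be checked with a few lines of JDK-only code. The sample XML below is made up for the demo; repetitive markup like this typically compresses very well, which is why enabling GZIP on a Jersey endpoint often shrinks the wire size dramatically:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipSizeCheck {

    // Compresses a byte[] in memory with GZIP and returns the compressed bytes.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a repetitive XML payload, similar in shape to serialized POJOs.
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<item id=\"").append(i).append("\">value</item>");
        }
        sb.append("</items>");
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(raw);
        System.out.println(raw.length + " bytes raw -> " + zipped.length + " bytes gzipped");
    }
}
```

On the server side, Jersey can apply this transparently by registering its GZIP encoding support, so the client only needs to send an Accept-Encoding: gzip header.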

How to write more than 30 MB of data in xml?

First of all, sorry if I'm repeating this question, but I haven't found any relevant solutions to my problem.
I'm having difficulty finding a way to solve the issues below.
1) I have a scenario where I have to write 30 MB to 400 MB of data to an XML file. When I use a 'String' object to append the data to the XML, I get an 'OutOfMemoryError' exception.
After spending more time on R&D, I came to understand that using a 'Stream' would resolve this issue, but I'm not sure about that.
2) Once I've constructed the XML, I have to send this data to the DMZ server from Android devices. As I understand it, sending a large amount of data over HTTP is difficult in this situation. In this case,
a) Using FTP will be helpful in this scenario?
b) Splitting the data into chunks of data and sending will be helpful?
Kindly let me know your suggestions. Thanks in advance.
I would consider zipping up the data before FTPing it across. You could use a ZipOutputStream.
For the OutOfMemoryError, you could consider increasing the heap size.
Check this: Increase heap size in Java
Can you post the heap sizes you tried, your code, and some exception traces?
Use StAX or SAX. These can create XML of any size because they write the XML parts they generate to an OutputStream on the fly.
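A minimal StAX sketch of that approach, using the JDK's built-in `javax.xml.stream` package: each element is written to the underlying Writer as soon as it is produced, so memory use stays flat no matter how many records there are. The element names and record count here are made up for the demo; in a real run you would pass a FileWriter or an OutputStreamWriter instead of a StringWriter:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxWriterDemo {

    // Writes <records><record id="...">...</record>...</records> element by
    // element; nothing is buffered beyond the current element.
    public static void writeRecords(java.io.Writer out, int n) throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("records");
        for (int i = 0; i < n; i++) {
            w.writeStartElement("record");
            w.writeAttribute("id", Integer.toString(i));
            w.writeCharacters("value-" + i);
            w.writeEndElement(); // </record> is flushed out; the loop holds no history
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        writeRecords(sw, 2);
        System.out.println(sw);
    }
}
```

The same loop with n in the millions would still only ever hold one record's worth of data in memory, which is exactly why the String-append approach fails where this one doesn't.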
What you should do is:
First, use an XML parser (SAX or DOM) to read and write data in XML format. If the data size is huge, try the CSV format; it will take less space because you don't have to store the XML tags.
Second, when creating the output, make sure it is split into small files.
Third, when sending over the network, make sure everything is zipped.
And for god's sake, don't eat up the user's mobile data cap with this design. Warn the user about the file size and suggest they use a Wi-Fi network.

Where to store the data for a GWT app - GAE data store or XML file

I have a GWT app which is deployed on GAE. One part of my application relies on static data, which is currently stored in an XML file. The application reads this data into a collection of POJOs. The data is sent over to the client using GWT-RPC. Based on the selections made by the user, it applies filters to the collection to get specific objects (the filtering is done on the client side).
The data may contain up to 3000 records, and the total XML file size would be around 1MB. There'll be no updates on this data from the application side (it's read-only), but I may be frequently adding new records or updating/fixing existing records during the initial few months (as the application evolves). The data has no relationship with any other data in the application.
One of my main considerations is fetch performance. I tried using Apache Digester to parse the XML, but noticed that even parsing 600 records and sending them to the client was a bit slow.
Given these requirements which of the following would be better and why - 1. Keep the data in the XML file, or 2. Store the data in the app engine data store?
Thanks.
One alternative would be to load the XML file directly by GWT (without GWT-RPC and server parsing):
Use RequestBuilder to get the data from the server. Example usage: http://www.gwtapps.com/doc/html/com.google.gwt.http.client.html
Then use XMLParser to parse the reply. Example: http://code.google.com/webtoolkit/doc/latest/DevGuideCodingBasicsXML.html#parsing
Here's an example combining both: http://www.roseindia.net/tutorials/gwt/retrieving-xml-data.shtml
The only downside is that you have to manually parse XML via DOM, where GWT-RPC directly produces objects.
Update:
Based on the comments, I'd recommend this:
1. Parse the XML file with JAXB to create your objects: http://jaxb.java.net/guide/_XmlRootElement_and_unmarshalling.html
2. Save those to memcache: http://code.google.com/appengine/docs/java/memcache/overview.html
3. On an RPC request, check memcache; if the data is not there, go to step 1.
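The parse-then-cache flow above can be sketched with JDK-only pieces. This sketch deliberately swaps in substitutes: the JDK's DOM parser stands in for JAXB (which needs annotated classes), and a plain HashMap stands in for the App Engine memcache API; the XML shape and the "record" element name are invented for the demo:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class CachedXmlData {

    // Stand-in for memcache: parsed results keyed by a cache key.
    static final Map<String, List<String>> CACHE = new HashMap<>();

    // Returns the cached records, parsing the XML only on a cache miss
    // (step 3 of the flow: check cache, else parse and store).
    public static List<String> load(String key, String xml) throws Exception {
        List<String> cached = CACHE.get(key);
        if (cached != null) {
            return cached; // cache hit: no parsing at all
        }
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("record");
        List<String> records = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            records.add(nodes.item(i).getTextContent());
        }
        CACHE.put(key, records); // store for subsequent RPC requests
        return records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data><record>a</record><record>b</record></data>";
        System.out.println(load("static-data", xml)); // parses once
        System.out.println(load("static-data", xml)); // served from the cache
    }
}
```

With real memcache, only the CACHE lookups change; the parse-on-miss structure stays the same.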
The way I look at it, there are two bottlenecks, though they are interrelated. One is loading the data (reading the XML, parsing). The other is sending it to the client.
Instead of sending the whole lot at once, you might consider sending it in batches, like pagination.
Usually I prefer storing such files under WEB-INF.
