I am importing JSON data from a public database at http://data.seattle.gov/api/views/3k2p-39jp/rows.json, and there are as many as 445,454 rows. Using the following code, I am constructing a JSON object of the entire dataset.
HttpGet get = new HttpGet(uri);
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(get);
BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
StringBuilder builder=new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    builder.append(line).append("\n");
}
JSONTokener jsonTokener=new JSONTokener(builder.toString());
JSONObject finalJson=new JSONObject(jsonTokener);
JSONArray data=finalJson.getJSONArray("data");
Because the data is too large, I am getting 03-21 03:41:49.714: E/AndroidRuntime(666): Caused by: java.lang.OutOfMemoryError, pointing at builder.append(line).append("\n") as the source of the error. Is there any way I can handle large datasets without running into memory allocation issues?
That JSON is huge!
You definitely need to use a streaming JSON parser. There are two out there for Android: GSON and Jackson.
GSON Streaming is explained at: https://sites.google.com/site/gson/streaming
I like how GSON explains the problem you're having:
Most applications should use only the object model API. JSON streaming is useful in just a few situations:
When it is impossible or undesirable to load the entire object model into memory. This is most relevant on mobile platforms where memory is limited.
Jackson Streaming is documented at: http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example
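For a dataset shaped like this one ({"meta": {...}, "data": [[...], ...]}), a minimal sketch with GSON's streaming JsonReader might look like the following; it reuses the response object from the question, and the per-cell handling is only illustrative:

import com.google.gson.stream.JsonReader;
import java.io.InputStreamReader;

JsonReader reader = new JsonReader(
        new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
reader.beginObject();                      // top level: { "meta": {...}, "data": [...] }
while (reader.hasNext()) {
    if ("data".equals(reader.nextName())) {
        reader.beginArray();               // "data" is an array of row arrays
        while (reader.hasNext()) {
            reader.beginArray();           // one row
            while (reader.hasNext()) {
                reader.skipValue();        // process each cell here instead of skipping it
            }
            reader.endArray();
        }
        reader.endArray();
    } else {
        reader.skipValue();                // skip "meta" without materializing it
    }
}
reader.endObject();
reader.close();

Only one row is ever in memory at a time, so the heap stays small no matter how many rows the service returns.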
If possible, only request parts of the data - this also reduces time for network I/O and thus saves battery.
Otherwise you could try not to keep the incoming data in memory, but to 'stream' it onto the SD card. Once it is stored there, you can iterate over it. Most likely this will mean using your own JSON tokenizer that does not build a full tree, but which is able (like a SAX parser) to look at only part of the object tree at a time; see the sketch below.
You may have a look at Jackson, which has a streaming mode that may be applicable.
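For the 'stream it onto the SD card' step, a rough sketch (the file name and the context variable are placeholders, and response is the HttpResponse from the question):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

InputStream in = response.getEntity().getContent();
File cache = new File(context.getExternalFilesDir(null), "rows.json");
OutputStream out = new BufferedOutputStream(new FileOutputStream(cache));
byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
    out.write(buf, 0, n);                  // copy in fixed-size chunks; memory use stays constant
}
out.close();
in.close();
// Then open a streaming (SAX-style) JSON parser over new FileInputStream(cache)
// and walk the rows one at a time.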
A streaming pull parser is the way to go. I recommend GSON, as it has a small memory footprint (just the pull parsing is about 16K; Jackson is way bigger).
Your code is problematic because you allocate:
a buffer to hold all the string data coming from the service
all the JSON DOM objects
This is slow and gives you a memory meltdown.
In case you need Java objects out of your JSON data, you may try my small databinding library built on GSON (shameless self-advertising off):
https://github.com/ko5tik/jsonserializer
I did it a bit differently. My JSON code was waiting for "status", which comes towards the end, so I modified the code to return earlier.
// try to get formattedAddress without reading the entire JSON
// (read, buff, in, and jsonResults are declared earlier, outside this snippet)
String formattedAddress;
while ((read = in.read(buff)) != -1) {
    jsonResults.append(buff, 0, read);
    try {
        // Attempt a parse on what has arrived so far; this throws until the
        // "results" array is complete, at which point we can return early.
        formattedAddress = new JSONObject(jsonResults.toString())
                .getJSONArray("results").getJSONObject(0)
                .getString("formatted_address");
        Log.i("Taxeeta", "Saved memory, returned early from json");
        return formattedAddress;
    } catch (JSONException e) {
        // Not enough of the document yet; keep reading.
    }
}
JSONObject statusObj = new JSONObject(jsonResults.toString());
String status = statusObj.optString("status");
if (status.equalsIgnoreCase("ok")) {
    formattedAddress = new JSONObject(jsonResults.toString())
            .getJSONArray("results").getJSONObject(0)
            .getString("formatted_address");
    Log.w("Taxeeta", "Did not save memory, returned late from json");
    return formattedAddress;
}
Related
I am trying to use the Jackson streaming API to deserialize huge objects from XML. The idea is to combine the streaming API and ObjectMapper to parse XML (or JSON) in small chunks. However, I see some inconsistent behavior with the XML parser.
With this code snippet:
try {
String xml1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>";
String xml2 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
XmlFactory xmlFactory = new XmlFactory();
JsonParser jp = xmlFactory.createParser(new ByteArrayInputStream(xml1.getBytes()));
JsonToken token = jp.nextToken();
while (token != null) {
System.out.println("xml1 token=" + token);
token = jp.nextToken();
}
jp = xmlFactory.createParser(new ByteArrayInputStream(xml2.getBytes()));
token = jp.nextToken();
while (token != null) {
System.out.println("xml2 token=" + token);
token = jp.nextToken();
}
} catch (IOException e) {
e.printStackTrace();
}
I am getting:
xml1 token=START_OBJECT
xml1 token=END_OBJECT
xml2 token=START_OBJECT
xml2 token=FIELD_NAME
xml2 token=VALUE_NULL
xml2 token=END_OBJECT
Why is the FIELD_NAME token missing for xml1? Why is there just one START_OBJECT token for the second XML? Is there any setting that would allow me to see the FIELD_NAME of the outer tag?
The problem is quite simple: the XML module is different from most other Jackson dataformat modules in that direct access via the Streaming API is not supported.
This is mentioned in the project README (along with a mention that the "tree model" is similarly not supported).
"Not supported" does not necessarily mean "cannot be used at all", just that its behavior differs from the handling of JSON, so callers really need to know what they are doing, above and beyond the API used for JSON content (and Smile, CBOR, YAML; even CSV content is represented in a way that is compatible with JSON access).
While you can try to use XmlFactory and the streaming parser/generator, their behavior is controlled by XmlMapper based on metadata from Java classes, to make things work correctly via the databinding API (that is, XmlMapper).
With that in mind, the reason for the observed tokens is that such a translation is necessary to map to the expected Java object structure:
public class Foo {
public Bar bar;
}
which would map to JSON like:
{
"bar" : null
}
as well as XML of
<foo>
<bar></bar>
</foo>
Another way to put this is that the XML and JSON data models are fundamentally different, and they cannot be trivially translated. Since Jackson's token model is based on JSON, some work is needed to translate XML elements and attributes into the structure that equivalent JSON would have.
The above is not to say that what you are trying to do is impossible. There are 2 ways you might be able to make things work:
Knowing the translation that XmlParser does, call nextToken() expecting that translation
Instead of using XmlParser directly, construct an XMLStreamReader (the Stax low-level streaming parser), read the "raw" tokens, and construct a separate XmlParser (via XmlFactory) at the expected location, using that for reading.
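For the second option, a minimal sketch of reading the raw tokens with the standard Stax API (plain javax.xml.stream, not Jackson) - note that, unlike XmlParser, this surfaces the outer foo element directly:

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
XMLStreamReader sr = XMLInputFactory.newInstance()
        .createXMLStreamReader(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
while (sr.hasNext()) {
    switch (sr.next()) {
    case XMLStreamConstants.START_ELEMENT:
        System.out.println("start: " + sr.getLocalName()); // "foo", then "bar"
        break;
    case XMLStreamConstants.END_ELEMENT:
        System.out.println("end: " + sr.getLocalName());
        break;
    }
}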
I hope this helps.
A kid with a hammer...
I don't know much about Jackson; in fact, I just started using it, thinking of using JSON or YAML instead of XML. But for XML, we have been using XStream with success.
//Consumer side
FileInputStream fis = new FileInputStream(filename);
XStream xs = new XStream();
Object obj = xs.fromXML(fis);
fis.close();
Also, if you are the one originating the serialization, and it originates from Java, you could use Java serialization altogether for a smaller footprint and faster operation.
//producer side
FileOutputStream fos = new FileOutputStream(filename);
ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(fos));
oos.writeObject(yourVeryComplexObjectStructure); //I am writing a list of ten 1MB objects
oos.flush();
oos.close();
fos.close();
//Consumer side
final FileInputStream fin = new FileInputStream(filename);
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(fin));
@SuppressWarnings("unchecked")
final YourVeryComplexObjectStructureType object = (YourVeryComplexObjectStructureType) ois.readObject();
ois.close();
fin.close();
I have been trying to create a JSON string from a large number of documents using the code below, but I get an out-of-range error, or I have to wait up to 5 minutes before the String is created. Any idea how I could optimise the code?
public String getJson() throws NotesException {
...
View view1 = ...;
ViewNavigator nav =view1.createViewNav();
ViewEntry ve = nav.getFirst();
JSONObject jsonMain = new JSONObject();
JSONArray items = new JSONArray();
Document docRoot = null;
while (ve != null) {
docRoot= ve.getDocument();
items.add(getJsonDocAndChildren(docRoot));
ViewEntry veTemp = nav.getNextSibling(ve);
ve.recycle();
ve = veTemp;
}
jsonMain.put("identifier", "name");
jsonMain.put("label", "name");
jsonMain.put("items", items);
return jsonMain.toJSONString();
}
private JSONObject getJsonDocAndChildren(Document doc) throws NotesException {
String name = doc.getItemValueString("Name");
JSONObject jsonDoc = new JSONObject();
jsonDoc.put("name", name);
jsonDoc.put("field", doc.getItemValueString("field"));
DocumentCollection responses = doc.getResponses();
JSONArray children = new JSONArray();
getDocEntry(name, children); // adds all docs whose field has the same value as name to children
if (responses.getCount() > 0) {
Document docResponse = responses.getFirstDocument();
while (docResponse != null) {
children.add(getJsonDocAndChildren(docResponse));
Document docTemp = responses.getNextDocument(docResponse);
docResponse.recycle();
docResponse = docTemp;
}
}
jsonDoc.put("children", children);
return jsonDoc;
}
There are a few things here, ranging from general efficiency to optimizations based on how you want to use the code.
The big one that would likely speed up your processing would be to do view operations only, without cracking open the documents. Since it looks like you want to get responses indiscriminately, you could add the response documents to the original view, with the "Show responses in hierarchy" option turned on. Then, if you have columns for Name and field in the view (and no "Show responses only" columns), a nav.getNext() walk down the view will get them in turn. By storing the entry.getIndentLevel() value for each previous entry and comparing it at the start of the loop, you could "step" up and down the JSON tree: when the indent level increases by one, create a new array and add it to the existing object; when it decreases, step up one. It may be a little conceptually awkward at first, having to track previous states in a flat loop, but it'd be much more efficient.
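A conceptual, untested sketch of that flat walk (assuming Name and field are the view's first two columns, and using the same JSON classes as the question):

ViewNavigator nav = view1.createViewNav();
JSONArray items = new JSONArray();            // the root "items" array
Deque<JSONArray> stack = new ArrayDeque<JSONArray>();
stack.push(items);
JSONArray lastChildren = null;
int prevIndent = 0;
ViewEntry ve = nav.getFirst();
while (ve != null) {
    int indent = ve.getIndentLevel();
    if (indent > prevIndent) {
        stack.push(lastChildren);             // step down into the previous entry's children
    }
    for (int i = indent; i < prevIndent; i++) {
        stack.pop();                          // step back up one level per indent decrease
    }
    JSONObject jsonDoc = new JSONObject();
    jsonDoc.put("name", ve.getColumnValues().get(0));   // the "Name" column
    jsonDoc.put("field", ve.getColumnValues().get(1));  // the "field" column
    lastChildren = new JSONArray();
    jsonDoc.put("children", lastChildren);
    stack.peek().add(jsonDoc);
    prevIndent = indent;
    ViewEntry veTemp = nav.getNext(ve);
    ve.recycle();
    ve = veTemp;
}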
Another option, also having the benefit of not having to crack open each individual document, would be to have a view of the response documents categorized by #Text($REF) and then making your recursive method look more like:
public static void walkTree(final View treeView, final String documentId) {
ViewNavigator nav = treeView.createViewNavFromCategory(documentId);
nav.setBufferMaxEntries(400);
for (ViewEntry entry : nav) {
// Do code here
walkTree(treeView, entry.getUniversalID());
}
}
(That example is using the OpenNTF Domino API, but, if you're not using that, you could down-convert the for loop to the legacy style)
As a minor improvement, any time you traverse ViewNavigators, you can set view.setAutoUpdate(false) and then nav.setBufferMaxEntries(400) to improve the internal caching.
And finally, depending on your needs - say, if you're outputting the JSON directly to an HTTP response's output stream - you could use JsonWriter instead of JsonObject to stream the content out instead of building a huge object in memory. I wrote about it with some simple code here: https://frostillic.us/blog/posts/EF0B875453B3CFC285257D570072F78F
You should first determine where the time is spent in your code. Maybe it is in doc.getResponses() or responses.getNextDocument() which you did not show here.
The obvious optimization which could be done within your code snippet is the following:
Basically you have some data structure called Document and build up a corresponding in memory JSON structure consisting of JSONObjects and JSONArrays. This JSON structure is then serialized to a String and returned.
Instead of building the JSON structure you could directly use a JsonWriter (don't know what JSON library you are using but there must be something like a JsonWriter). This avoids the memory allocations for the temporary JSON structure.
In getJson() you start:
StringWriter stringOut = new StringWriter();
JsonWriter out = new JsonWriter(stringOut);
and end
return stringOut.toString();
Now, everywhere you were creating JSONObjects or JSONArrays, you invoke the corresponding writer methods, e.g.:
private void getJsonDocAndChildren(Document doc, JsonWriter out) throws NotesException {
out.name("name");
out.value(doc.getItemValueString("Name"));
out.name("field");
out.value(doc.getItemValueString("field"));
DocumentCollection responses = doc.getResponses();
if (responses.getCount() > 0) {
Document docResponse = responses.getFirstDocument();
out.startArray();
...
Hope you get the idea.
This is probably a quick answer to a very novice question. I am having trouble wrapping my head around how to get JSON text from a DBpedia extraction server running on localhost. The server is running fine; I followed the official instructions.
I have read the other StackOverflow questions about parsing JSON in java and what I am having trouble understanding is how to parse the JSON when the schema or structure is unknown.
For example, in my code I try to grab the JSON from localhost and put it into a Java object. But all the examples of parsing JSON online use a predesigned Java object, and all the JSON keys are mapped to the object's fields (i.e. an Employee class: name, job, email, id, phone).
String sURL = "http://localhost:9999/server/extraction/en/extract?title=" + wikipage + "&revid=&format=rdf-json&extractors=custom"; //just a string
URL url = new URL(sURL);
HttpURLConnection request = (HttpURLConnection) url.openConnection(); // the connection the streams below read from
request.connect();
Gson g = new Gson();
JsonReader jr = new JsonReader(new InputStreamReader((InputStream) request.getContent()));
jr.setLenient(true);
JsonParser jp = new JsonParser();
JsonElement root = jp.parse(new InputStreamReader((InputStream) request.getContent())); //convert the input stream to a json element
JsonObject rootobj = root.getAsJsonObject(); //may be an array, may be an object.
I now have this "json object" for the film "Blue Velvet", which I can parse/iterate with jr.hasNext() or rootobj.getAsJsonArray().
Am I going about this correctly?
I feel like I am reinventing the wheel. Is there a standard way of parsing DBpedia JSON objects in Java?
At least the Jackson JSON library allows you to parse incoming JSON into a Map. If the keys and values of the JSON can be of any type, you need to use Map<Object, Object>, which is a bit cumbersome, but anyway this should work:
ObjectMapper mapper = new ObjectMapper();
Map<Object, Object> parsedJSON = mapper.readValue(incomingJSON,
mapper.getTypeFactory().constructMapType(
LinkedHashMap.class,
Object.class,
Object.class));
I'm working on an open-source, cross-platform pomodoro timer with statistics support.
For tasks, I have a tree data structure like this:
class Task {
String name;
int minutesWorkedOn;
int uniqueID;
Task parent;
...
ArrayList<Task> childTasks; //Note, not binary, but can have n-children.
}
(which is actually a bit bigger in practice)
I want to store this data structure in a file between sessions.
I was considering JSON or XML, and recursing for childTasks, or writing all tasks out, one task per line, and piecing things back together by task IDs. But JSON/XML is not a hard requirement; I'm just thinking out loud.
Some S.O. answers mention serialization, but preferably I'd like to be able to see the stored data structure, as is the case with JSON or XML. Those two formats would also make it easier to build reporting tools.
Considering I'm new to Java and haven't worked with file I/O before, can someone give me a tip/advice on which route to take here?
[edit]
The solution below works well. There is an issue with loops, though. As shown in the edited code above, a task has a backwards link to its parent; this causes Gson to crash. I might ignore this field and fix it up again after the data is loaded, or read some more of the tutorial.
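Regarding the loop issue in the edit above, one common fix (a sketch, not part of the original answer) is to mark the parent field transient - Gson skips transient fields by default - and restore the links after loading:

class Task {
    String name;
    int minutesWorkedOn;
    int uniqueID;
    transient Task parent;            // skipped by Gson, so the serialization cycle disappears
    ArrayList<Task> childTasks;

    // Call once on the root after gson.fromJson() to restore the back links.
    void relinkChildren() {
        if (childTasks == null) return;
        for (Task child : childTasks) {
            child.parent = this;
            child.relinkChildren();
        }
    }
}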
The best and easiest way is to use Gson to write/read the object to/from a file.
Write:
//Get the json serialization of the task object
GsonBuilder builder = new GsonBuilder();
//builder.setPrettyPrinting().serializeNulls(); //optional
Gson gson = builder.create();
String json = gson.toJson(task);
try {
//write json string to a file named "/tmp/task.json"
FileWriter writer = new FileWriter("/tmp/task.json");
writer.write(json);
writer.close();
} catch (IOException e) {e.printStackTrace();}
Read:
Gson gson = new Gson();
try {
BufferedReader br = new BufferedReader(new FileReader("/tmp/task.json"));
//convert the json string from file back to object
Task task = gson.fromJson(br, Task.class);
} catch (IOException e) {
e.printStackTrace();
}
Let's say I have a json that looks like this:
{"body":"abcdef","field":"fgh"}
Now suppose the value of the 'body' element is huge (~100 MB or more). I would like to stream out the value of the body element instead of storing it in a String.
How can I do this? Is there any Java library I could use for this?
This is the line of code that fails with an OutOfMemoryError when a large JSON value comes in:
String inputStreamString = (String) JsonPath.read(textValue.toString(), "$.body");
'textValue' here is a hadoop.io.Text object.
I'm assuming that the OutOfMemory error occurs because we make method calls like toString() (which creates a new object) and JsonPath.read(), all of which operate in memory. I need to know whether there is an approach I could take for handling large textValue objects.
Please let me know if you need additional info.
JsonSurfer is good for processing very large JSON data with selective extraction.
An example of how to surf JSON data, collecting matched values in listeners:
BufferedReader reader = new BufferedReader(new FileReader(jsonFile));
JsonSurfer surfer = new JsonSurfer(GsonParser.INSTANCE, GsonProvider.INSTANCE);
SurfingConfiguration config = surfer.configBuilder().bind("$.store.book[*]", new JsonPathListener() {
@Override
public void onValue(Object value, ParsingContext context) throws Exception {
JsonObject book = (JsonObject) value;
}
}).build();
surfer.surf(reader, config);
Jackson offers a streaming API for generating and processing JSON data.
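For example, a minimal sketch with Jackson's streaming parser that copies the value of "body" to a Writer instead of building one huge String (JsonParser.getText(Writer) is available in jackson-core 2.8+; on older versions you would fall back to getText()):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.InputStream;
import java.io.Writer;

void streamBody(InputStream json, Writer out) throws Exception {
    JsonParser parser = new JsonFactory().createParser(json);
    while (parser.nextToken() != null) {
        if (parser.getCurrentToken() == JsonToken.FIELD_NAME
                && "body".equals(parser.getCurrentName())) {
            parser.nextToken();        // advance from the field name to its value
            parser.getText(out);       // write the value out in chunks
            break;
        }
    }
    parser.close();
}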