JsonPath seems to be pretty slow for large JSON files.
In my project, I'd like a user to be able to pass an entire query as a string. I used JsonPath because it lets you do an entire query like $.store.book[3].price all at once by doing JsonPath.read(fileOrString, "$.store.book[3].price", new Filter[0]). Is there a faster method to interact with JSON files in Javascript? It would be ideal to be able to pass the entire query as a string, but I'll write a parser if I have to. Any ideas?
Even small optimizations would be helpful. For instance, I'm currently reading from a JSON file every time I query. Would it be better just to copy the entire file into a string at the beginning and query to the string instead?
EDIT: To those of you saying "this is Javascript, not Java", well, it actually is Java. JsonPath is a Javascript-like query language, but the file I am writing is most assuredly Java. Only the query is written in Javascript. Here's some info about JsonPath, and a snippet of code: https://code.google.com/p/json-path/
List toRet;
String query = "$.store.book[3].price";
try {
    // if output is a list, good
    toRet = (List) JsonPath.read(filestring_, query, new Filter[0]);
} catch (ClassCastException cce) {
    // if output isn't a list, put it in a list
    Object outObj = null;
    try {
        outObj = JsonPath.read(filestring_, query, new Filter[0]);
    } catch (Exception e) {
        throw new DataSourceException("Invalid file!\n", e, DataSourceException.UNKNOWN);
    }
    toRet = Collections.singletonList(outObj);
}
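For what it's worth, one small optimization along the lines of the last paragraph is to parse the document once and run every query against the parsed object, instead of re-reading the file on each query. A rough sketch, assuming a recent Jayway JsonPath (2.x), whose API differs slightly from the code.google.com snapshot in the snippet above; the file name is a placeholder:
import java.io.File;
import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

public class CachedQueries {
    public static void main(String[] args) throws Exception {
        // parse once up front; "store.json" stands in for the real file
        DocumentContext doc = JsonPath.parse(new File("store.json"));
        // every subsequent query runs against the in-memory document
        Object price = doc.read("$.store.book[3].price");
        Object titles = doc.read("$.store.book[*].title");
        System.out.println(price + " / " + titles);
    }
}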
I have a StringBuilder that contains the same result as in this link:
https://hadoop.apache.org/docs/current/hadoop-project-dist/
I'm looking to extract the values of the `pathSuffix` fields and return a list of Strings containing all the file names.
My code is:
try {
    HttpURLConnection conHttp = (HttpURLConnection) url.openConnection();
    conHttp.setRequestMethod("GET");
    conHttp.setDoInput(true);
    InputStream in = conHttp.getInputStream();
    int ch;
    StringBuilder sb = new StringBuilder();
    // read the response body into the StringBuilder
    while ((ch = in.read()) != -1) {
        sb.append((char) ch);
    }
    in.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
How can I parse the JSON to take all the values of pathSuffix and return a list of Strings that contains the file names?
Could you please give me a suggestion? Thanks.
That is JSON-formatted data; JSON is not a regular language, therefore trying to parse it with a regular expression is impossible, and trying to parse it out with substring and friends will take you a week and will be very error-prone.
Read up on what JSON is (no worries; it's very simple to understand!), then get a good JSON library (the standard json.org library absolutely sucks, don't get that one), such as Jackson or Gson, and the code to extract what you need will be robust and easy to write and test.
The good option
Do the following steps:
Parse the response string into a JSON object
Get the FileStatus array, e.g. with Gson: jsonObject.get("FileStatuses").getAsJsonObject().get("FileStatus").getAsJsonArray()
Iterate over all objects in the array to get the value you want (see the sketch below)
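A rough sketch of those steps using Gson (assuming the StringBuilder sb from the question holds the WebHDFS LISTSTATUS response, and Gson 2.8.6+ for the static JsonParser.parseString; on older versions use new JsonParser().parse(...) instead):
import java.util.ArrayList;
import java.util.List;
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public static List<String> extractFileNames(String json) {
    List<String> fileNames = new ArrayList<>();
    JsonObject root = JsonParser.parseString(json).getAsJsonObject();
    JsonArray statuses = root.getAsJsonObject("FileStatuses").getAsJsonArray("FileStatus");
    for (JsonElement element : statuses) {
        // pathSuffix holds the file name in each FileStatus entry
        fileNames.add(element.getAsJsonObject().get("pathSuffix").getAsString());
    }
    return fileNames;
}
Calling extractFileNames(sb.toString()) then gives you the list of file names.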
The bad option
Although, as mentioned, it is not recommended: if you want to stay with Strings you can use:
String str_to_find = "\"pathSuffix\":\"";
while (str.indexOf(str_to_find) != -1) {
    str = str.substring(str.indexOf(str_to_find) + str_to_find.length());
    String value = str.substring(0, str.indexOf("\""));
    System.out.println("Value is " + value);
}
I would not recommend building an API binding for Hadoop from scratch.
This binding already exists for the Java language:
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#listLocatedStatus-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.PathFilter-
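For illustration, a minimal sketch of listing a directory's file names with that API (the path and configuration are placeholders to adjust for your cluster):
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static List<String> listFileNames(String directory) throws Exception {
    Path dir = new Path(directory);   // e.g. "hdfs://namenode:8020/some/dir"
    List<String> fileNames = new ArrayList<>();
    try (FileSystem fs = FileSystem.get(dir.toUri(), new Configuration())) {
        for (FileStatus status : fs.listStatus(dir)) {
            fileNames.add(status.getPath().getName());
        }
    }
    return fileNames;
}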
I have this custom parser, made in Java, where I want to export a 3.6 GB JSON file into an Oracle SQL database. The import works fine with a sample JSON of 8 MB, but when I try parsing the whole 3.6 GB JSON, memory problems appear, namely java.lang.OutOfMemoryError.
I have used -Xmx5000m to allocate 5 GB of memory for this. My laptop has plenty of RAM.
As you can see I have memory left. Does this error happen because of the CPU?
UPDATE:
The Json represents the data from Free Code Camp: https://medium.freecodecamp.com/free-code-camp-christmas-special-giving-the-gift-of-data-6ecbf0313d62#.7mjj6abbg
The Data looks like this:
[
{
"name": "Waypoint: Say Hello to HTML Elements",
"completedDate": 1445854025698,
"solution": "Hello World\n"
}
]
As I've said, I have tried this parsing with an 8 MB sample JSON of the same data and it worked. So is the code really the problem here?
Here is some code
public class MainParser {

    public static void main(String[] args) {
        //Date time;
        try {
            BufferedReader br = new BufferedReader(
                    new FileReader("output.json")); //destination to json here
            Gson gson = new Gson();
            Type collectionType = new TypeToken<List<List<Tasks>>>() {
            }.getType();
            List<List<Tasks>> details = gson.fromJson(br, collectionType);

            DBConnect connection = new DBConnect("STUDENT", "student");
            connection.connect();

            for (int person = 0; person < details.size(); person++) {
                for (int task = 0; task < details.get(person).size(); task++) {
                    connection.insert_query(person + 1,
                            task + 1,
                            details.get(person).get(task).getName(),
                            (details.get(person).get(task).getCompletedDate() / 1000),
                            details.get(person).get(task).getSolution());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Here is the insert_query method:
public void insert_query(int person_id, int task_id, String taskName, double date, String solution) throws SQLException {
    Statement stmt = conn.createStatement();
    try {
        String query = "INSERT INTO FreeCodeCamp VALUES(?,?,?,?,?)";
        PreparedStatement ps = conn.prepareStatement(query);
        ps.setInt(1, person_id);
        ps.setInt(2, task_id);
        ps.setString(3, taskName);
        ps.setDate(4, null); // note: null is bound here, so the date parameter is not actually stored
        ps.setString(5, solution);
        /*stmt.executeUpdate("INSERT INTO FreeCodeCamp VALUES("
                + person_id + ","
                + task_id + ","
                + "'" + taskName + "',"
                + "TO_TIMESTAMP(unix_ts_to_date(" + date + "),'YYYY-MM-DD HH24:MI:SS'),"
                + "'" + solution + "')");
        stmt.close();*/
        ps.execute();
        ps.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
Parsing JSON (or anything, for that matter) will not take the same amount of memory as the original file size.
Each block of JSON text that represents an object becomes an object in memory, adding to the already loaded JSON. If you parse it using some kind of stream, you will still allocate memory, but far less (you won't hold the entire 3.6 GB file in memory).
Still, an object takes more memory to represent than its string. If you have an array, which might be parsed into a list, then there is overhead for that list. Multiply that overhead by the number of instances in the JSON (quite a lot, in a 3.6 GB file) and you end up with something taking much more than just 3.6 GB of memory.
But if you want to parse it as a stream, handle each record as it comes and then discard it, you can do that. In either case you'll need a component that parses the JSON and lets you handle each parsed object. If you know the structure, it might even be easier to write one yourself.
Hope it helps.
You need to use an event-based / streaming JSON parser. The idea is that instead of parsing the entire JSON file in one go and holding it in memory, the parser emits "events" at the start and end of each significant syntactic unit. You then write your code to handle these events, extract and assemble the information, and (in your case) insert the corresponding records into your database.
Here are some places to start reading about Oracle's streaming JSON APIs:
http://docs.oracle.com/javaee/7/api/javax/json/stream/JsonParser.html
http://www.oracle.com/technetwork/articles/java/json-1973242.html
and here is a link to the documentation for the GSON equivalent:
https://sites.google.com/site/gson/streaming
See Gson's streaming documentation. It is meant for exactly this case, where the whole model cannot be loaded into memory.
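For the structure in the question, that looks roughly like the sketch below (assuming the same Tasks class and the DBConnect connection from the question; only one task object is held in memory at a time):
import java.io.FileReader;
import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;

Gson gson = new Gson();
try (JsonReader reader = new JsonReader(new FileReader("output.json"))) {
    reader.beginArray();                     // outer array: one entry per person
    int person = 0;
    while (reader.hasNext()) {
        reader.beginArray();                 // inner array: one entry per task
        int task = 0;
        while (reader.hasNext()) {
            Tasks t = gson.fromJson(reader, Tasks.class);   // parse a single task object
            connection.insert_query(person + 1, task + 1,
                    t.getName(), t.getCompletedDate() / 1000, t.getSolution());
            task++;
        }
        reader.endArray();
        person++;
    }
    reader.endArray();
}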
I have been trying to create a JSON string from a large number of documents using the code below, but I either get an out-of-range error or have to wait up to 5 minutes before the string is created. Any idea how I could optimise the code?
public String getJson() throws NotesException {
    ...
    View view1 = ...;
    ViewNavigator nav = view1.createViewNav();
    ViewEntry ve = nav.getFirst();
    JSONObject jsonMain = new JSONObject();
    JSONArray items = new JSONArray();
    Document docRoot = null;
    while (ve != null) {
        docRoot = ve.getDocument();
        items.add(getJsonDocAndChildren(docRoot));
        ViewEntry veTemp = nav.getNextSibling(ve);
        ve.recycle();
        ve = veTemp;
    }
    jsonMain.put("identifier", "name");
    jsonMain.put("label", "name");
    jsonMain.put("items", items);
    return jsonMain.toJSONString();
}
private JSONObject getJsonDocAndChildren(Document doc) throws NotesException {
    String name = doc.getItemValueString("Name");
    JSONObject jsonDoc = new JSONObject();
    jsonDoc.put("name", name);
    jsonDoc.put("field", doc.getItemValueString("field"));
    DocumentCollection responses = doc.getResponses();
    JSONArray children = new JSONArray();
    getDocEntry(name, children); // this adds every doc whose field has the same value as name to children
    if (responses.getCount() > 0) {
        Document docResponse = responses.getFirstDocument();
        while (docResponse != null) {
            children.add(getJsonDocAndChildren(docResponse));
            Document docTemp = responses.getNextDocument(docResponse);
            docResponse.recycle();
            docResponse = docTemp;
        }
    }
    jsonDoc.put("children", children);
    return jsonDoc;
}
There are a few things here, ranging from general efficiency to optimizations based on how you want to use the code.
The big one that would likely speed up your processing would be to do view operations only, without cracking open the documents. Since it looks like you want to get responses indiscriminately, you could add the response documents to the original view, with the "Show responses in hierarchy" option turned on. Then, if you have columns for Name and field in the view (and no "Show responses only" columns), a nav.getNext() walk down the view will get them in turn. By storing the entry.getIndentLevel() value for each previous entry and comparing it at the start of the loop, you can "step" up and down the JSON tree: when the indent level increases by one, create a new array and add it to the existing object; when it decreases, step back up. It may be a little conceptually awkward at first, having to track previous states in a flat loop, but it'd be much more efficient; there's a rough sketch below.
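A sketch of that flat walk, assuming the view's first two columns are Name and field and using the same JSONObject/JSONArray classes as the question (plus java.util.ArrayDeque, Deque and Vector):
public String getJsonFromView(View view1) throws NotesException {
    view1.setAutoUpdate(false);
    ViewNavigator nav = view1.createViewNav();
    nav.setBufferMaxEntries(400);

    JSONArray rootItems = new JSONArray();
    Deque<JSONArray> arrays = new ArrayDeque<JSONArray>();
    arrays.push(rootItems);
    JSONObject lastDoc = null;
    int lastIndent = 0;

    ViewEntry ve = nav.getFirst();
    while (ve != null) {
        int indent = ve.getIndentLevel();
        if (indent > lastIndent && lastDoc != null) {
            // this entry is a response to the previous one: descend into its children
            arrays.push((JSONArray) lastDoc.get("children"));
        } else {
            // the hierarchy stepped back up: pop one level per indent decrease
            for (int i = indent; i < lastIndent; i++) {
                arrays.pop();
            }
        }

        Vector<?> cols = ve.getColumnValues();   // assumed column order: Name, field
        JSONObject jsonDoc = new JSONObject();
        jsonDoc.put("name", cols.get(0));
        jsonDoc.put("field", cols.get(1));
        jsonDoc.put("children", new JSONArray());
        arrays.peek().add(jsonDoc);

        lastDoc = jsonDoc;
        lastIndent = indent;

        ViewEntry next = nav.getNext(ve);
        ve.recycle();
        ve = next;
    }

    JSONObject jsonMain = new JSONObject();
    jsonMain.put("identifier", "name");
    jsonMain.put("label", "name");
    jsonMain.put("items", rootItems);
    return jsonMain.toJSONString();
}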
Another option, also having the benefit of not having to crack open each individual document, would be to have a view of the response documents categorized by @Text($REF) and then make your recursive method look more like:
public static void walkTree(final View treeView, final String documentId) {
    ViewNavigator nav = treeView.createViewNavFromCategory(documentId);
    nav.setBufferMaxEntries(400);
    for (ViewEntry entry : nav) {
        // Do code here
        walkTree(treeView, entry.getUniversalID());
    }
}
(That example is using the OpenNTF Domino API, but, if you're not using that, you could down-convert the for loop to the legacy style)
As a minor improvement any time you traverse through ViewNavigators, you can set view.setAutoUpdate(false) and then nav.setBufferMaxEntries(400) to improve the internal caching.
And finally, depending on your needs - say, if you're outputting the JSON directly to an HTTP response's output stream - you could use JsonWriter instead of JsonObject to stream the content out instead of building a huge object in memory. I wrote about it with some simple code here: https://frostillic.us/blog/posts/EF0B875453B3CFC285257D570072F78F
You should first determine where the time is spent in your code. Maybe it is in doc.getResponses() or responses.getNextDocument() which you did not show here.
The obvious optimization which could be done within your code snippet is the following:
Basically you have some data structure called Document and build up a corresponding in memory JSON structure consisting of JSONObjects and JSONArrays. This JSON structure is then serialized to a String and returned.
Instead of building the JSON structure you could directly use a JsonWriter (don't know what JSON library you are using but there must be something like a JsonWriter). This avoids the memory allocations for the temporary JSON structure.
In getJson() you start:
StringWriter stringOut = new StringWriter();
JsonWriter out = new JsonWriter(stringOut);
and end
return stringOut.toString();
Now, everywhere you create JSONObjects or JSONArrays, you invoke the corresponding writer methods instead, e.g.:
private void getJsonDocAndChildren(Document doc, JsonWriter out) throws NotesException {
    out.name("name");
    out.value(doc.getItemValueString("Name"));
    out.name("field");
    out.value(doc.getItemValueString("field"));
    DocumentCollection responses = doc.getResponses();
    if (responses.getCount() > 0) {
        Document docResponse = responses.getFirstDocument();
        out.startArray();
        ...
Hope you get the idea.
I'm working on an open-source, cross-platform pomodoro timer with statistics support.
For tasks, I have a tree data structure like this:
class Task {
String name;
int minutesWorkedOn;
int uniqueID;
Task parent;
...
ArrayList<Task> childTasks; //Note, not binary, but can have n-children.
}
(which is actually a bit bigger in practice)
I want to store this data structure in a file between sessions.
I was considering JSON or XML, recursing for childTasks, or writing all tasks out, one task per line, and piecing things back together by their task IDs. But JSON/XML is not a hard requirement; I'm just thinking out loud.
Some S.O. answers mention serialization, but preferably I'd like to be able to see the stored data structure, as is the case with JSON or XML. Those two formats would also make it easier to build reporting tools.
Considering I'm new to Java and haven't worked with file I/O before, can someone give me a tip/advice on which route to take here?
[edit]
The solution below works well. There is an issue with loops though. I edited the code above: a task has a backwards link to its parent, which causes Gson to crash. I might ignore this field and fix it up again after the data is loaded, or maybe read some more of the tutorial.
The best and easiest way is to use Gson to write/read the object to/from a file.
Write:
//Get the json serialization of the task object
GsonBuilder builder = new GsonBuilder();
//builder.setPrettyPrinting().serializeNulls(); //optional
Gson gson = builder.create();
String json = gson.toJson(task);
try {
//write json string to a file named "/tmp/task.json"
FileWriter writer = new FileWriter("/tmp/task.json");
writer.write(json);
writer.close();
} catch (IOException e) {e.printStackTrace();}
Read:
Gson gson = new Gson();
try {
BufferedReader br = new BufferedReader(new FileReader("/tmp/task.json"));
//convert the json string from file back to object
Task task = gson.fromJson(br, Task.class);
} catch (IOException e) {
e.printStackTrace();
}
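As for the edit about the parent back-reference: one possible workaround (a sketch, not the only option) is to mark the parent field transient so Gson skips it during serialization, then rebuild the links after reading the file back:
import java.util.ArrayList;

class Task {
    String name;
    int minutesWorkedOn;
    int uniqueID;
    transient Task parent;                        // transient: Gson ignores it, so no cycle
    ArrayList<Task> childTasks = new ArrayList<>();
}

// call this on the root task right after gson.fromJson(br, Task.class)
static void restoreParents(Task task) {
    for (Task child : task.childTasks) {
        child.parent = task;
        restoreParents(child);
    }
}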
Let's say I have a json that looks like this:
{"body":"abcdef","field":"fgh"}
Now suppose the value of the 'body' element is huge (~100 MB or more). I would like to stream out the value of the body element instead of storing it in a String.
How can I do this? Is there any Java library I could use for this?
This is the line of code that fails with an OutOfMemoryException when a large json value comes in:
String inputStreamString = (String) JsonPath.read(textValue.toString(), "$.body");
'textValue' here is a hadoop.io.Text object.
I'm assuming that the OutOfMemory error occurs because we try to do method calls like toString() (which creates a new object), and JsonPath.read(), all of which are done in-memory. I need to know if there is an approach I could take while handling large-sized textValue objects.
Please let me know if you need additional info.
JsonSurfer is good for processing very large JSON data with selective extraction.
Example of how to surf JSON data, collecting matched values in listeners:
BufferedReader reader = new BufferedReader(new FileReader(jsonFile));
JsonSurfer surfer = new JsonSurfer(GsonParser.INSTANCE, GsonProvider.INSTANCE);
SurfingConfiguration config = surfer.configBuilder().bind("$.store.book[*]", new JsonPathListener() {
    @Override
    public void onValue(Object value, ParsingContext context) throws Exception {
        JsonObject book = (JsonObject) value;
    }
}).build();
surfer.surf(reader, config);
Jackson offers a streaming API for generating and processing JSON data.
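For example, a rough sketch with Jackson's streaming JsonParser (assuming Jackson 2.8+, where JsonParser.getText(Writer) can copy a value directly to a Writer; on older versions getText() would still build the whole value as a String):
import java.io.InputStream;
import java.io.Writer;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

// Streams the value of the top-level "body" field to 'out' without
// materializing the whole ~100 MB string in memory.
public static void streamBody(InputStream json, Writer out) throws Exception {
    JsonFactory factory = new JsonFactory();
    try (JsonParser parser = factory.createParser(json)) {
        while (parser.nextToken() != null) {
            if (parser.getCurrentToken() == JsonToken.FIELD_NAME
                    && "body".equals(parser.getCurrentName())) {
                parser.nextToken();      // advance to the field's value
                parser.getText(out);     // copy the value straight to the writer
                return;
            }
        }
    }
}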