The files I have contain large JSON arrays, and I would like to update some values under certain sub-objects. Since the files are large, I was looking into a streaming API to keep the memory footprint low.
What I'd like to achieve is to stream the data in, parse a specific sub-object, update its values, serialize that sub-object again, and efficiently save the updated values back to disk.
I'm not sure how to do that with libraries like Jackson or GSON, because their streaming APIs offer a way to do efficient reading (JsonParser or JsonReader) and a way to do efficient writing (JsonGenerator or JsonWriter), but nothing that lets me do both at the same time:
try (JsonParser jsonParser = mapper.getFactory().createParser(new File("bigFile.json"))) {
    // parsing logic...
    while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
        MyClass obj = mapper.readValue(jsonParser, MyClass.class);
        obj.field1 = "Some new value";
        obj.field2 += 1;
        // how to efficiently write `obj` back on disk?
    }
}
Does anybody know a way to deserialize Avro without using any POJOs or schemas?
The problem:
I have a data stream of different Avro files.
The goal is to group that data depending on the presence of some attributes (e.g. user.role, another.really.deep.attribute.with.specific.value and so on).
Each Avro entry might contain any number of matching attributes (from zero to all of those listed).
So there is no need to do anything with the data, just to peek at some elements.
The question is: is there any way to convert that data to a Map or Node, like I can do with JSON using Jackson or GSON?
I've tried to use GenericDatumReader, but it requires a Schema. So maybe all I need is to read the schema from the Avro data (how?).
Also, I've tried something like this, but this approach doesn't work:
public Map deserialize(byte[] data) {
    DatumReader<LinkedHashMap> reader
            = new SpecificDatumReader<>(LinkedHashMap.class);
    Decoder decoder = null;
    try {
        decoder = DecoderFactory.get().binaryDecoder(data, null);
        return reader.read(null, decoder);
    } catch (IOException e) {
        logger.error("Deserialization error:" + e.getMessage());
    }
    return null; // reached only when deserialization failed
}
Since I have time to 'play' with the problem, I have created a utility class that generates schemas depending on keys. It works, but looks like a big overhead.
A reader schema is required to deserialize any message.
If you have the writer schema available, you can simply use that. Note that Avro container files include the schema they were written with, and you can extract it with the getschema command of avro-tools.jar (java -jar avro-tools.jar getschema <file>).
Without these options, you'll need to figure out the schema on your own (maybe using a hexdump and knowing how Avro data gets encoded).
To start off, I'm pretty new in programming. I have to create an Android Weather App for a school project and I'm stuck with this big ass JSON:
JSON Data
Out of this, how would I read the temperature for every 3-hour interval (for example: 9.00-12.00 temperature: 5°C, 12.00-15.00 temperature: 7°C, etc.)?
So I have an Activity that displays the temperature of the entire day in three-hour intervals. Since I have no experience with JSON, I have no idea what the various indexes mean or when they increment (there are like 8 main: thingies).
DISCLAIMER: I have to use plain JSON, no GSON or other shortcuts; I have to parse and read certain data from this JSON myself. I get this JSON from the OpenWeatherMap API, so it changes every day.
API
Use the Volley library for this. You can easily fetch JSON data with it, and no AsyncTask is needed when you use Volley.
First validate the JSON at http://jsonlint.com/. This will also show you the formatted JSON string.
Next, read up on JSON arrays and objects.
Use an AsyncTask to fetch the JSON into a string, forecastJsonStr.
Then convert forecastJsonStr into a JSONObject, forecastJsonObj.
To get the weather data in "list", do something similar to
JSONArray weatherArray = forecastJsonObj.getJSONArray("list");
Hope this helps
// Needs org.json.JSONArray and org.json.JSONObject; these calls can throw JSONException
JSONObject receivedData = new JSONObject("The string that you get as response from the API");
JSONArray weatherList = receivedData.getJSONArray("list");
for (int i = 0; i < weatherList.length(); i++) {
    // Each entry of "list" is one 3-hour forecast interval
    JSONObject data = weatherList.getJSONObject(i);
    String dateText = data.getString("dt_txt");
    // "main" is a JSON object (not an array) that holds the temperature values
    JSONObject main = data.getJSONObject("main");
    double temp = main.getDouble("temp");
    // Similarly other values like temp_min, temp_max
}
So basically you need to parse the entire thing. To understand the whole structure, use something like http://jsonviewer.stack.hu/ to view the JSON more clearly, so that you know what you need from the data. Simply copy and paste your data in there and click "Format".
JSON is a name-value kind of storage: data is stored like "name":"value". Numeric values don't have the quotes.
Remember that a JSON object is enclosed in { } and objects can be nested inside other objects. In your example, the entire thing is a JSON object. Within it you have a "city" key whose value is within { }, so "city" is a JSONObject.
Similarly "coord" is a JSONObject while "cod" is a String and "cnt" is an integer.
There can also be cases where a name points to an array of JSON objects, like "list" here. JSON arrays are denoted by [ ]. Enclosed within are JSON objects separated by commas.
The above is a very simple sample to get you started, so that you get the gist of what is going on. So play around and try to extract more data from it.
All the best and Happy Coding :)
From my Android application I need to use a RESTful web service that returns a list of objects in JSON format.
This list can be very long (about 1000 to 2000 objects).
What I need to do is search for and retrieve just some of the objects inside the JSON.
Due to the limited memory of mobile devices, I was thinking that object binding (for example with the GSON library) could be dangerous.
What are the alternatives for solving this problem?
If you are using Gson, use Gson streaming.
I've taken the sample from the link and added my comments inside it:
public List<Message> readJsonStream(InputStream in) throws IOException {
    JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
    List<Message> messages = new ArrayList<Message>();
    reader.beginArray();
    while (reader.hasNext()) {
        // gson is assumed to be an existing Gson instance, e.g. new Gson()
        Message message = gson.fromJson(reader, Message.class);
        // TODO: replace someCase with your own filter condition
        if (someCase) {
            messages.add(message);
            // If you want to use less memory, don't add the objects to a list.
            // Write them to disk (e.g. SQLite, shared preferences or a file)
            // and retrieve them when you need them.
        }
    }
    reader.endArray();
    reader.close();
    return messages;
}
For example:
1) Read the list as a stream, handle the single JSON entities on the fly, and save only those that are of interest to you.
2) Read the data into a String object (or several) and then find the JSON entities and handle them one by one instead of everything at the same time. Ways to analyse the String for JSON structures include regular expressions or manual indexOf combined with substring-type analysis.
Option 1 is more efficient but requires a bit more work, as you have to handle the stream at the same time, whereas option 2 is probably simpler but requires you to hold quite big Strings temporarily.
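As a rough illustration of the second option, here is a minimal sketch that cuts the raw text of each top-level object out of a JSON array held in a String. It uses a simple brace-depth scan rather than regular expressions, assumes the input is a well-formed array of objects, and the class and method names (JsonArraySplitter, splitObjects) are made up for this example. Each returned piece can then be parsed on its own and discarded if it is not of interest.

```java
import java.util.ArrayList;
import java.util.List;

final class JsonArraySplitter {

    /** Returns the raw text of every top-level object found in a JSON array of objects. */
    static List<String> splitObjects(String json) {
        List<String> objects = new ArrayList<>();
        int depth = 0;          // current nesting level of { }
        int start = -1;         // index where the current top-level object began
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                    // skip the escaped character
                } else if (c == '"') {
                    inString = false;       // closing quote
                }
            } else if (c == '"') {
                inString = true;            // braces inside strings must be ignored
            } else if (c == '{') {
                if (depth++ == 0) {
                    start = i;
                }
            } else if (c == '}') {
                if (--depth == 0) {
                    objects.add(json.substring(start, i + 1));
                }
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        String json = "[{\"id\":1,\"name\":\"a\"},{\"id\":2,\"name\":\"b\"}]";
        for (String object : splitObjects(json)) {
            System.out.println(object);
        }
    }
}
```

Note that the whole response still has to sit in memory as one String here, which is exactly the drawback of this option compared to streaming.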
I am new to JSON and libGDX, but I have created a simple game and I want to store player names and their scores in a JSON file. Is there a way to do this? I want to create a JSON file in Gdx.files.localStorage if it doesn't exist and, if it does, append new data to it.
I have checked code given at :
1>Using Json.Serializable to parse Json files
2>Parsing Json in libGDX
But I failed to find out how to actually create a JSON file and write multiple unique object values (the name and score of each player) to it. Did I miss something in their code?
This link mentions how to load an existing json but nothing else.
First of all, I have to say that I have never used the libGDX Json API myself, but I'll try to help you out a bit.
I think this tutorial on GitHub should help you.
Basically, the Json API allows you to serialize a whole object to a JSON string. To do that, use:
PlayerScore score = new PlayerScore("Player1", 1537443); // The high score of Player1
Json json = new Json();
String scoreJson = json.toJson(score);
This should then be something like:
{name: Player1, score: 1537443}
Instead of toJson() you can use prettyPrint(), which includes line breaks and tabs.
To write this to a file use:
FileHandle file = Gdx.files.local("scores.json");
file.writeString(scoreJson, true); // true means append, false means overwrite
You can also customize your Json by implementing Json.Serializable or by adding the values by hand, using writeValue.
Reading is similar:
FileHandle file = Gdx.files.local("scores.json");
String scores = file.readString();
Json json = new Json();
PlayerScore score = json.fromJson(PlayerScore.class, scores);
If you have been using a customized version by implementing Json.Serializable, you have implemented the read(Json json, JsonValue jsonMap) method. If you implemented it correctly, the deserialization should work. If you have been adding the values by hand, you need to parse the file contents into a JsonValue, for example JsonValue jsonFile = new JsonReader().parse(scores), where scores is the String read from the file. Now you can iterate through the children of this JsonValue or get its children by name.
One last thing: for high scores and things like that, the libGDX Preferences may be the better choice. Here you can read how to use them.
Hope I could help.
I have more than 10 million JSON documents of the form :
["key": "val2", "key1" : "val", "{\"key\":\"val", \"key2\":\"val2"}"]
in one file.
Importing with the Java driver API took around 3 hours using the following function (importing one BSON document at a time):
public static void importJSONFileToDBUsingJavaDriver(String pathToFile, DB db, String collectionName) {
    // open file
    FileInputStream fstream = null;
    try {
        fstream = new FileInputStream(pathToFile);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.out.println("file not exist, exiting");
        return;
    }
    BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
    // read it line by line
    String strLine;
    DBCollection newColl = db.getCollection(collectionName);
    try {
        while ((strLine = br.readLine()) != null) {
            // convert line by line to BSON
            DBObject bson = (DBObject) JSON.parse(strLine);
            // insert BSONs to database
            try {
                newColl.insert(bson);
            } catch (MongoException e) {
                // duplicate key
                e.printStackTrace();
            }
        }
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Is there a faster way? Might MongoDB settings influence the insertion speed? For example, adding an "_id" key that will serve as the index, so that MongoDB does not have to create an artificial key (and thus an index entry) for each document, or disabling index creation entirely during insertion.
Thanks.
I'm sorry but you're all picking minor performance issues instead of the core one. Separating the logic from reading the file and inserting is a small gain. Loading the file in binary mode (via MMAP) is a small gain. Using mongo's bulk inserts is a big gain, but still no dice.
The whole performance bottleneck is the BSON bson = JSON.parse(line) call. In other words, the problem with the Java driver is that it needs a conversion from JSON to BSON, and this code seems to be awfully slow or badly implemented. A full JSON round trip (encode + decode) via JSON-simple, or especially via JSON-smart, is 100 times faster than the JSON.parse() command.
I know Stack Overflow is telling me right above this box that I should be answering the question, which I'm not, but rest assured that I'm still looking for an answer to this problem. I can't believe all the talk about Mongo's performance and then this simple example code fails so miserably.
I've imported a multi-line JSON file with ~250M records. I just used mongoimport < data.txt and it took 10 hours. Compared to your 3 hours for 10M, I think this is considerably faster.
Also, in my experience, writing your own multi-threaded parser speeds things up drastically. The procedure is simple:
Open the file as BINARY (not TEXT!)
Set markers(offsets) evenly across the file. The count of markers depends on the number of threads you want.
Search for '\n' near the markers, calibrate the markers so they are aligned to lines.
Parse each chunk with a thread.
A reminder:
When you want performance, don't use a stream reader or any built-in line-based read methods. They are slow. Just use a binary buffer and search for '\n' to identify a line, and (preferably) do in-place parsing in the buffer without creating a string. Otherwise the garbage collector won't be happy.
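The marker procedure could be sketched roughly like this, using only the JDK. It is an illustration rather than a tuned implementation: the class and method names (LineChunker, boundaries, processChunk, processInParallel) are invented for the sketch, each chunk is assumed to fit in memory, and the per-line parsing is left as a placeholder that only counts non-empty lines.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LineChunker {

    /** Returns chunk boundaries b[0]=0 .. b[chunks]=size; inner boundaries sit just after a '\n'. */
    static long[] boundaries(Path file, int chunks) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            long[] b = new long[chunks + 1];
            b[chunks] = size;
            for (int i = 1; i < chunks; i++) {
                // Evenly spaced marker, never before the previous boundary
                raf.seek(Math.max(b[i - 1], size * i / chunks));
                int c;
                do {
                    c = raf.read();         // advance to just past the next newline
                } while (c != -1 && c != '\n');
                b[i] = raf.getFilePointer();
            }
            return b;
        }
    }

    /** Scans the bytes in [from, to) and returns the number of non-empty lines. */
    static int processChunk(Path file, long from, long to) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) (to - from)];
            raf.seek(from);
            raf.readFully(buf);
            int lines = 0;
            int start = 0;
            for (int i = 0; i <= buf.length; i++) {
                if (i == buf.length || buf[i] == '\n') {
                    if (i > start) {
                        lines++;            // placeholder: parse buf[start..i) in place here
                    }
                    start = i + 1;
                }
            }
            return lines;
        }
    }

    /** Splits the file into one chunk per thread and sums the per-chunk results. */
    static int processInParallel(Path file, int threads) throws Exception {
        long[] b = boundaries(file, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final long from = b[i];
                final long to = b[i + 1];
                Callable<Integer> task = () -> processChunk(file, from, to);
                parts.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path demo = Files.createTempFile("demo", ".json");
        Files.write(demo, "{\"a\":1}\n{\"a\":2}\n{\"a\":3}\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(processInParallel(demo, 2) + " lines processed");
    }
}
```

In a real importer the placeholder in processChunk would be where each line is parsed and handed to the database, ideally in batches.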
You can parse the entire file at once and then insert the whole JSON into a Mongo document. Avoid multiple loops. You need to separate the logic as follows:
1) Parse the file and retrieve the JSON object.
2) Once the parsing is over, save the JSON object in the Mongo document.
I've got a slightly faster way (I'm also inserting millions at the moment): insert lists of documents instead of single documents, with
insert(List<DBObject> list)
http://api.mongodb.org/java/current/com/mongodb/DBCollection.html#insert(java.util.List)
That said, it's not that much faster. I'm about to experiment with WriteConcerns other than ACKNOWLEDGED (mainly UNACKNOWLEDGED) to see if I can speed it up further. See http://docs.mongodb.org/manual/core/write-concern/ for info.
Another way to improve performance is to create indexes after bulk inserting. However, this is rarely an option except for one-off jobs.
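To make the batching idea concrete, here is a small sketch of the reading side only, written against the plain JDK so it runs without a database. The names BatchingImporter and importInBatches are invented for this example, and the sink is a stand-in: in a real import it would convert each line to a DBObject and pass the resulting list to insert(List<DBObject> list).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchingImporter {

    /** Reads lines and hands them to the sink in lists of at most batchSize; returns the batch count. */
    static int importInBatches(BufferedReader in, int batchSize, Consumer<List<String>> sink)
            throws IOException {
        List<String> batch = new ArrayList<>(batchSize);
        int batches = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;                   // skip blank lines
            }
            batch.add(line);
            if (batch.size() == batchSize) {
                sink.accept(batch);         // one round trip per batch instead of per document
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);             // flush the final, possibly smaller batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("{\"a\":1}\n{\"a\":2}\n{\"a\":3}\n"));
        int batches = importInBatches(in, 2,
                batch -> System.out.println("would insert " + batch.size() + " documents"));
        System.out.println(batches + " batches");
    }
}
```

The batch size is a tuning knob; something in the low thousands is a common starting point, but that is a guess to be measured, not a recommendation from the driver.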
Apologies if this sounds slightly woolly; I'm still testing things myself. Good question.
You can also remove all the indexes (except for the PK index, of course) and rebuild them after the import.
Use bulk insert/upsert operations. Since MongoDB 2.6 you can do bulk updates and upserts. The example below does a bulk update using the C# driver.
MongoCollection<foo> collection = database.GetCollection<foo>(collectionName);
var bulk = collection.InitializeUnorderedBulkOperation();
foreach (FooDoc fooDoc in fooDocsList)
{
    var update = new UpdateDocument { {fooDoc.ToBsonDocument() } };
    bulk.Find(Query.EQ("_id", fooDoc.Id)).Upsert().UpdateOne(update);
}
BulkWriteResult bwr = bulk.Execute();
You can use a bulk insertion
You can read the documentation at mongo website and you can also check this java example on StackOverflow