MongoCollection updateMany, bulkWrite or something else? - java

I am trying to insert / update many records in a MongoCollection. I have a list of Documents to be updated.
List<Document> Documents;
The list contains some new records to be inserted and some existing ones that need to be updated. I was looking at the
updateMany() method in the MongoCollection class,
but its description says it updates one record. I am confused as to which method I should use.
Version : 3.0.0

I believe it is a bug in the Javadoc and updateMany() does update multiple records.
I checked the driver source code to be sure, and it sets the "multi" parameter to true, so everything should work as expected:
public UpdateResult updateMany(final Bson filter, final Bson update, final UpdateOptions updateOptions) {
    return update(filter, update, updateOptions, true); // that true means "multi" is used
}
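Note that updateMany() applies one and the same update to every document matching a filter, so it does not cover a list where each document carries its own new content. For a mixed list of new and existing documents, bulkWrite() with upserting replaces is one option. The following is a minimal sketch, not a definitive implementation. It assumes driver 3.7 or later (where ReplaceOptions exists; the 3.0.0 driver from the question takes UpdateOptions instead) and that every document already carries its _id. The class and method names are hypothetical.

```java
import com.mongodb.bulk.BulkWriteResult;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOneModel;
import com.mongodb.client.model.ReplaceOptions;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes every document has an _id set.
class BulkUpsert {
    /** Builds one upserting replace per document, matched on its _id. */
    static List<WriteModel<Document>> toUpserts(List<Document> documents) {
        List<WriteModel<Document>> models = new ArrayList<>();
        for (Document doc : documents) {
            models.add(new ReplaceOneModel<>(
                    Filters.eq("_id", doc.get("_id")),  // find the stored version, if any
                    doc,                                // full replacement document
                    new ReplaceOptions().upsert(true))); // insert when nothing matches
        }
        return models;
    }

    /** Sends all upserts to the server as one unordered batch. */
    static BulkWriteResult upsertAll(MongoCollection<Document> collection, List<Document> documents) {
        return collection.bulkWrite(toUpserts(documents), new BulkWriteOptions().ordered(false));
    }
}
```

Existing documents are replaced and missing ones are inserted; the returned BulkWriteResult reports how many writes fell into each group.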

Related

MongoDB result set getting modified after execution of a query

In my application there are 2 threads:
crawl the web-sites and insert the data into MongoDB
retrieve the crawled sites and perform business logic
In order to retrieve the crawled sites I use the following query:
Document query = new Document("fetchStatus", new Document("$lte", fetchStatusParam));
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
As a result I get all episodes whose fetchStatus is less than or equal to the given value.
Next, I store the items of the result set in a HashMap<String, TrackedEpisode>, which is an object property, in order to track them:
for (Document document : unfetchedEpisodes) {
    this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}
Then I do some business logic, which:
doesn't modify the unfetchedEpisodes result set.
doesn't remove any object from trackedEpisodes.
Up till now everything is OK.
In the last step, I go over all retrieved documents and mark them as fetched to prevent duplicate fetching in the future.
for (Document document : unfetchedEpisodes) {
    if (this.trackedEpisodes.containsKey(document.get("_id").toString())) {
        // prevent repeated fetching
        document.put("fetchStatus", FetchStatus.IN_PROCESS.getID());
        if (this.trackedEpisodes.get(document.get("_id").toString()).isExpired()) {
            document.put("isExpired", true);
            document.put("fetchStatus", FetchStatus.FETCHED.getID());
        }
    } else {
        System.out.println("BOO! Strange new object detected");
    }
    dbC_Episodes.updateOne(new Document("_id", document.get("_id")), new Document("$set", document));
}
I ran this code for a couple of days and noticed that it sometimes reaches the else branch of the if (this.trackedEpisodes.containsKey()) statement. That seems strange to me: how can unfetchedEpisodes and trackedEpisodes be out of sync and not contain the same items?
I began to investigate and noticed that whenever I reach "BOO! Strange new object detected", the document iterator contains an item that is in the database but should not yet be in unfetchedEpisodes, since I didn't execute a new query against the database.
I checked several times how the retrieved items are stored in trackedEpisodes: every element of unfetchedEpisodes was always added to trackedEpisodes, yet afterwards I still sometimes reach "BOO! Strange new object detected".
My questions:
Why does unfetchedEpisodes get new items after the query has been executed?
Is it possible that unfetchedEpisodes is modified by the MongoDB driver after the query has been executed?
Should I call something like .close() after executing a query against MongoDB?
The used versions:
MongoDB: 3.2.3, x64
MongoDB Java Driver: mongodb-driver-3.2.2, mongodb-driver-core-3.2.2, bson-3.2.2
When you call find here:
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
you are not actually getting all the episodes back. You are getting a lazy handle on the query, which opens a database cursor over the matching documents only when it is iterated.
Then when you call:
for (Document document : unfetchedEpisodes){}
an iterator is created over all of the documents that match the query at that moment.
When you iterate a second time, a new cursor is opened for the same query, and all of the documents that match now are iterated over.
If the collection has changed in between, the results will be different.
If you want to ensure that the contents of unfetchedEpisodes stay unchanged, one option is to pull the entire result set into memory once and iterate over that copy rather than over the database, e.g.
ArrayList<Document> unfetchedEpisodes = dbC_Episodes.find(query).into(new ArrayList<Document>());
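The difference between re-running the query and iterating a snapshot can be illustrated without a database. The sketch below is only an analogy: the in-memory list stands in for the collection, and find() mimics, under that assumption, the way FindIterable runs the query again each time a loop asks for an iterator. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Driver-free analogy: an Iterable that re-reads its source on every iteration.
class LiveQueryDemo {
    /** Stand-in for a collection that another thread keeps inserting into. */
    static final List<String> collection = new ArrayList<>();

    /** Like FindIterable, this runs the "query" each time iterator() is called. */
    static Iterable<String> find() {
        return () -> new ArrayList<>(collection).iterator();
    }

    /** Counts the elements seen in one pass over the given results. */
    static int count(Iterable<String> results) {
        int n = 0;
        for (String ignored : results) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        collection.add("ep1");
        collection.add("ep2");
        Iterable<String> live = find();

        // First pass: copy the results into memory, like into(new ArrayList<>()).
        List<String> snapshot = new ArrayList<>();
        for (String episode : live) {
            snapshot.add(episode);
        }

        collection.add("ep3"); // the crawler thread inserts a new episode

        System.out.println("live=" + count(live) + " snapshot=" + count(snapshot));
    }
}
```

The second pass over the live iterable sees the newly inserted element, while the snapshot still holds only what the first pass returned.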

Java method for MongoDB collection.save()

I'm having a problem with MongoDB using Java when I try adding documents with a customized _id field. When I insert a new document into that collection, I want to skip it if its _id already exists.
In Mongo shell, collection.save() can be used in this case but I cannot find the equivalent method to work with MongoDB java driver.
Just to add an example:
I have a collection of documents containing websites' information
with the URLs as _id field (which is unique)
I want to add some more documents. In those new documents, some might be existing in the current collection. So I want to keep adding all the new documents except for the duplicate ones.
This can be achieved with collection.save() in the Mongo shell, but I can't find the equivalent method in the MongoDB Java driver.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using the BulkWriteOperation object with the initializeUnorderedBulkOperation() method of the DBCollection object (the one that contains your collection). An unordered operation is needed here, because an ordered one stops at the first error. This is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name");
List<DBObject> objectList = new ArrayList<>(); // Fill this list with your objects to insert
BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (DBObject object : objectList) {
    operation.insert(object);
}
BulkWriteResult result;
try {
    result = operation.execute();
} catch (BulkWriteException e) {
    // Thrown when at least one write failed, e.g. on a duplicate _id
    result = e.getWriteResult();
}
With this method, each insert is checked individually, so documents that have a duplicated _id are reported as errors, but the operation still continues with the rest of the documents. In the end, you can use the getInsertedCount() method of the BulkWriteResult object to know how many documents were really inserted.
This can prove to be a bit inefficient if lots of data is inserted this way, though. This is just sample code (found on journaldev.com and edited to fit your situation). You may need to edit it so it fits your current configuration. It is also untested.
I guess save is doing something like this.
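fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc["_id"] == null) {
        doc["_id"] = ObjectId() // generate a new id only when none is set
    }
    // upsert = true inserts the document when no match exists
    // (ReplaceOptions needs driver 3.7+; older 3.x drivers take UpdateOptions here)
    col.replaceOne(Document("_id", doc["_id"]), doc, ReplaceOptions().upsert(true))
}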
Maybe they removed save so you decide how to generate the new id.

Failed to make bulk upsert using mongo

I'm trying to do an upsert using the MongoDB driver. Here is the code:
BulkWriteOperation builder = coll.initializeUnorderedBulkOperation();
DBObject toDBObject;
for (T entity : entities) {
    toDBObject = morphia.toDBObject(entity);
    builder.find(toDBObject).upsert().replaceOne(toDBObject);
}
BulkWriteResult result = builder.execute();
where "entity" is a Morphia object. When I run the code the first time (there are no entities in the DB, so all of the operations should be inserts) it works fine and I see the entities in the database with a generated _id field. On the second run I change some fields and try to save the changed entities, and then I receive the following error from Mongo:
E11000 duplicate key error collection: statistics.counters index: _id_ dup key: { : ObjectId('56adfbf43d801b870e63be29') }
What did I forget to configure in my example?
I don't know the structure of toDBObject, but a bulk upsert needs a query that identifies the existing document in order to work.
Let's say, for example, that the unique identifier is the _id field. A valid query would look like:
builder.find(new BasicDBObject("_id", toDBObject.get("_id"))).upsert().replaceOne(toDBObject);
This way, the engine can (a) find the document to update and then (b) replace it (or insert it if the document wasn't found). The rule is: make sure your find() matches the existing document, then do the update.
As it's written now, the whole entity is used as the query. Once a field has changed, that query no longer matches the stored document, so the upsert attempts an insert with an _id that already exists, which would explain the duplicate key error you are describing.

MongoDB : How to make findAndModify returns the Last Updated Record

I am new to MongoDB and am having trouble because it behaves differently in different environments (Dev, QA and Production).
I am using findAndModify to update the records in my MongoDB.
There is a job that runs daily and updates or inserts data into MongoDB, and I am using findAndModify to update the record.
But I observed that the first record returned by findAndModify is different in the Dev, QA and Production environments, although all three environments have the same data.
According to the MongoDB documentation, findAndModify will modify the first document.
Currently this is my code :
BasicDBObject update = new BasicDBObject();
update.append("$set", new BasicDBObject(dataformed));
coll.findAndModify(query, update);
Please let me know how I can make sure that findAndModify returns the last updated record, rather than depending on unpredictable behaviour.
Edited Part
I am trying to use sort for my code but it is giving me compilation errors
coll.findAndModify(query, sort: { rating: 1 }, update);
I have a field called lastUpdated, which is set using System.currentTimeMillis().
So can I use lastUpdated as shown below to get the last updated record?
coll.findAndModify( query,
new BasicDBObject("sort", new BasicDBObject("lastUpdated ", -1)),
update);
It appears you are using Java, so you have to construct the sort parameter as a DBObject, just like the other parameters. The sort argument is the sort specification itself, without a "sort" wrapper key:
coll.findAndModify(
    query,
    new BasicDBObject("rating", 1),
    update);
As we already explained to you in your other question, you either have to add a field to the document that contains the date it was changed and then sort by that field, or you have to use a capped collection, because capped collections guarantee that insertion order is preserved.
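With the newer MongoCollection API, the same idea can be written with findOneAndUpdate(). This is a sketch under assumptions rather than the code from the question: it assumes a driver version that provides MongoCollection, and it reuses the lastUpdated field mentioned in the question. The class and method names are hypothetical.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.Sorts;
import org.bson.Document;
import org.bson.conversions.Bson;

// Hypothetical helper; "lastUpdated" is the timestamp field from the question.
class LatestFirstUpdate {
    /** Options that pick the most recently updated match and return it after the update. */
    static FindOneAndUpdateOptions latestFirst() {
        return new FindOneAndUpdateOptions()
                .sort(Sorts.descending("lastUpdated"))   // newest timestamp first
                .returnDocument(ReturnDocument.AFTER);   // return the modified version
    }

    /** Applies the update to the newest matching document and returns the result. */
    static Document updateLatest(MongoCollection<Document> coll, Bson query, Bson update) {
        return coll.findOneAndUpdate(query, update, latestFirst());
    }
}
```

Because the sort is explicit, the document that gets modified no longer depends on the storage order in each environment.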

Update multiple rows in MongoDB java Driver

I want to update multiple rows in my collection called "Users". Right now I am updating both rows separately, but I want to do the same in one query.
My current code:
coll.update(new BasicDBObject().append("key", k1), new BasicDBObject().append("$inc", new BasicDBObject().append("balance", 10)));
coll.update(new BasicDBObject().append("key", k2), new BasicDBObject().append("$inc", new BasicDBObject().append("balance", -10)));
How can I make these two separate updates in one statement?
First let me translate your java code to shell script so people can read it :
db.coll.update({key: k1}, {$inc:{balance:10}})
db.coll.update({key: k2}, {$inc:{balance:-10}})
Now, the reason you cannot do this in one update is that there is no way to provide a different update clause per matching document. You could batch your updates so that you can do this (pseudo-code):
set1 = getAllKeysForBalanceIncrease();
set2 = getAllKeysForBalanceDecrease();
db.coll.update({key:{$in:set1}}, {$inc:{balance:10}}, false, true)
db.coll.update({key:{$in:set2}}, {$inc:{balance:-10}}, false, true)
In other words, you can update multiple documents with one update command, but the update operation will be the same for all of them. So grouping all documents that require the same update is your only optimization path.
The $in clause can be composed in Java like this:
ObjectId[] oidArray = getAllKeysEtc();
query = new BasicDBObject("key", new BasicDBObject("$in", oidArray));
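The grouping step itself needs no driver code. The following sketch assumes the pending changes are available as a map from key to balance delta (a hypothetical input shape, not something from the question) and groups the keys so that each distinct delta can be sent as one multi-document update with an $in filter.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: the input map shape is an assumption for illustration.
class UpdateBatcher {
    /**
     * Groups keys by the balance delta they need, so each distinct delta
     * can be applied with a single multi-document update.
     */
    static Map<Integer, List<String>> groupByDelta(Map<String, Integer> deltaByKey) {
        return deltaByKey.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,   // the delta is the group key
                        TreeMap::new,          // deterministic order of groups
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }
}
```

Each entry of the result then maps to one update of the form {key: {$in: keys}}, {$inc: {balance: delta}} with the multi flag set.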
In MongoDB you do not have transactions that span multiple documents. Only writes on a document are atomic.
But you can do updates with:
public WriteResult update(DBObject q,
DBObject o,
boolean upsert,
boolean multi)
But note, this will not be in a transaction.
