In my application there are two threads:
- one crawls web sites and inserts the data into MongoDB
- the other retrieves the crawled sites and performs business logic
In order to retrieve the crawled sites I use the following query:
Document query = new Document("fetchStatus", new Document("$lte", fetchStatusParam));
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
As a result I get all episodes whose fetchStatus is less than or equal to the specified value.
In the next step I store the items of the result set in a HashMap<String, TrackedEpisode>, which is an object property used to track them:
for (Document document : unfetchedEpisodes) {
    this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}
Then I do some business logic, which:
- doesn't modify the unfetchedEpisodes result set,
- doesn't remove any object from trackedEpisodes.
Up till now everything is OK.
In the last step I pass over all retrieved documents and mark them as fetched in order to prevent duplicate fetching in the future.
for (Document document : unfetchedEpisodes) {
    if (this.trackedEpisodes.containsKey(document.get("_id").toString())) {
        // prevent repeated fetching
        document.put("fetchStatus", FetchStatus.IN_PROCESS.getID());
        if (this.trackedEpisodes.get(document.get("_id").toString()).isExpired()) {
            document.put("isExpired", true);
            document.put("fetchStatus", FetchStatus.FETCHED.getID());
        }
    } else {
        System.out.println("BOO! Strange new object detected");
    }
    dbC_Episodes.updateOne(new Document("_id", document.get("_id")), new Document("$set", document));
}
I ran this code for a couple of days and noticed that sometimes it reaches the else branch of the if (this.trackedEpisodes.containsKey()) statement. It's weird to me: how can it be that unfetchedEpisodes and trackedEpisodes are out of sync and don't contain the same items?
I began to investigate and noticed that whenever I reach "BOO! Strange new object detected", the document iterator holds an item which is in the database but should not yet be in unfetchedEpisodes, since I hadn't executed a new query against the database.
I checked a couple of times that the retrieved items were stored into trackedEpisodes, and every time all elements from unfetchedEpisodes had been added to trackedEpisodes, yet afterwards I still sometimes hit "BOO! Strange new object detected".
My questions:
- Why does unfetchedEpisodes get new items after the query has already been executed?
- Is it possible that unfetchedEpisodes is modified by the MongoDB driver after the call to find()?
- Should I maybe call some kind of .close() after executing a query against MongoDB?
Versions used:
MongoDB: 3.2.3, x64
MongoDB Java Driver: mongodb-driver-3.2.2, mongodb-driver-core-3.2.2, bson-3.2.2
When you call find here:
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
you are not actually getting all the episodes back. You are getting a database cursor pointing to the matched documents.
Then when you call:
for (Document document : unfetchedEpisodes) { ... }
an iterator is created over all of the documents that match the query.
When you call it a second time, a new cursor is returned for the same query, and all of the documents that match at that moment are iterated over.
If the collection has changed in between, the results will be different.
If you want to ensure that the contents of unfetchedEpisodes are unchanged, one option is to pull the entire result set into memory up front and iterate over it there rather than on the DB, e.g.
ArrayList<Document> unfetchedEpisodes = dbC_Episodes.find(query).into(new ArrayList<Document>());
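With the result set snapshotted into a list, both passes iterate over the same in-memory objects, so trackedEpisodes and the loop can no longer drift apart. A minimal sketch of the pattern, reusing the names from the question:
List<Document> unfetchedEpisodes = dbC_Episodes.find(query).into(new ArrayList<>());

// first pass: remember everything that was fetched
for (Document document : unfetchedEpisodes) {
    this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}

// ... business logic ...

// second pass: iterates the snapshot, not a fresh cursor, so no new items can appear
for (Document document : unfetchedEpisodes) {
    document.put("fetchStatus", FetchStatus.IN_PROCESS.getID());
    dbC_Episodes.updateOne(new Document("_id", document.get("_id")), new Document("$set", document));
}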
I have many existing indices partitioned by date, e.g. index_190901, index_190902, ...
I also have an API which takes index_name and doc_id as inputs; users want to update some documents' fields given an index_name and a doc_id.
I'm trying to update document using the following code:
updateRequest.index("invalid_daily_index")
.type("type")
.id("id")
.doc(jsonMap)
It works fine if the user passes an existing index, but if the user passes a non-existing index, a new index with no documents is created.
I know that I can set up auto_create_index, but I still want indices to be created automatically when I insert new documents.
Checking whether the index exists with client.indices.exists(request, RequestOptions.DEFAULT) is quite expensive, and I don't want to do that on every request.
How can I make Elasticsearch not create a new index when I use an UpdateRequest?
You can stop Elasticsearch from automatically creating non-existing indices by setting the cluster setting action.auto_create_index to false:
PUT _cluster/settings
{
  "persistent": { "action.auto_create_index": "false" }
}
For details take a look at the reference documentation.
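If you prefer to apply the setting from Java, the high-level REST client can do the same thing. A sketch, assuming an already configured RestHighLevelClient named client:
// apply action.auto_create_index = false as a persistent cluster setting
ClusterUpdateSettingsRequest request = new ClusterUpdateSettingsRequest();
request.persistentSettings(Settings.builder().put("action.auto_create_index", "false"));
client.cluster().putSettings(request, RequestOptions.DEFAULT);
Note that the setting also accepts a comma-separated list of patterns, e.g. "+index_*,-*", so you could keep auto-creation enabled for your daily indices while blocking everything else.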
I'm having a problem with MongoDB in Java when adding documents with a customized _id field: when I insert a new document into the collection, I want it to be ignored if its _id already exists.
In the Mongo shell, collection.save() can be used in this case, but I cannot find the equivalent method in the MongoDB Java driver.
Just to add an example:
I have a collection of documents containing websites' information, with the URL as the _id field (which is unique).
I want to add some more documents, some of which may already exist in the current collection, and I want to keep adding all the new documents except for the duplicate ones.
This can be achieved with collection.save() in the Mongo shell, but with the MongoDB Java driver I can't find the equivalent method.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using a BulkWriteOperation obtained from the initializeUnorderedBulkOperation() method of the DBCollection object (the one wrapping your collection); an unordered bulk operation keeps going after individual errors, which is what you want here. It is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name"); // the collection to insert into

ArrayList<DBObject> objectList; // fill this list with your objects to insert

BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (int i = 0; i < objectList.size(); i++) {
    operation.insert(objectList.get(i));
}
BulkWriteResult result = operation.execute();
With this method every document is inserted with its own error handling, so documents with a duplicated _id will raise an error as usual, but the unordered operation still continues with the rest of the documents. Note that execute() throws a BulkWriteException when any insert fails; catching it and calling getWriteResult().getInsertedCount() on the exception tells you how many documents were really inserted.
This can prove a bit inefficient if lots of data is inserted this way, though. This is just sample code (found on journaldev.com and edited to fit your situation); you may need to adapt it to your configuration, and it is untested.
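If you are on the newer MongoCollection API rather than DBCollection, an unordered insertMany gives the same "skip the duplicates, keep the rest" behaviour. A sketch, assuming a MongoCollection<Document> named websites and a List<Document> named newDocs:
try {
    // ordered(false) keeps inserting after a duplicate-_id error
    websites.insertMany(newDocs, new InsertManyOptions().ordered(false));
} catch (MongoBulkWriteException e) {
    // duplicate-key failures are reported here; all non-duplicates were still inserted
    System.out.println("Inserted: " + e.getWriteResult().getInsertedCount());
}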
I guess save is doing something like this.
fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc.getObjectId("_id") == null) {
        doc.put("_id", ObjectId()) // generate an id only when the document doesn't have one
    }
    // upsert so that new ids insert and existing ids replace, like the shell's save()
    col.replaceOne(Document("_id", doc.getObjectId("_id")), doc, ReplaceOptions().upsert(true))
}
Maybe they removed save so that you can decide for yourself how new ids are generated.
I am trying to insert / update many records in a MongoCollection. I have a list of Documents to be updated.
List<Document> documents;
The list contains some new records to be inserted and other, already existing ones that need to be updated. I was looking at the updateMany() method in the MongoCollection class, but its description says it updates a single document. I am confused as to which method should be used.
Reference
Version: 3.0.0
I believe it is a bug in the Javadoc: updateMany() does update multiple documents.
I've investigated the driver's source code, just in case, and it sets the "multi" parameter to true, so everything should work fine:
public UpdateResult updateMany(final Bson filter, final Bson update, final UpdateOptions updateOptions) {
    return update(filter, update, updateOptions, true); // that true means "multi" is used
}
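For example, a call like the following (the filter and field names are made up for illustration) updates every matching document and reports how many were touched:
// set a flag on all documents matching the filter
UpdateResult result = collection.updateMany(
        Filters.eq("status", "pending"),
        Updates.set("processed", true));
System.out.println("Matched: " + result.getMatchedCount() + ", modified: " + result.getModifiedCount());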
I am new to MongoDB and am having trouble because it behaves differently in different environments (Dev, QA, and Production).
I am using findAndModify to update records in my MongoDB.
There is a job that runs daily which updates/inserts data into MongoDB, and I am using findAndModify to update the records.
What I observed is that the first record returned by findAndModify differs between the Dev, QA, and Production environments, although all three environments hold the same data.
The MongoDB documentation states that findAndModify will modify the first matching document.
Currently this is my code:
BasicDBObject update = new BasicDBObject();
update.append("$set", new BasicDBObject(dataformed));
coll.findAndModify(query, update);
Please let me know how I can make sure that findAndModify returns the last updated record, rather than depending on unpredictable behaviour.
Edit
I am trying to use sort in my code but it gives me compilation errors:
coll.findAndModify(query, sort: { rating: 1 }, update);
I have a field called lastUpdated which is populated using System.currentTimeMillis().
Can I use this lastUpdated field as shown below to get the last updated record?
coll.findAndModify(query,
        new BasicDBObject("sort", new BasicDBObject("lastUpdated", -1)),
        update);
It appears you are using Java, so you have to construct the sort as its own DBObject and pass it as a separate parameter, just like the query and update (note it is the sort specification itself, not wrapped in a "sort" key):
coll.findAndModify(
        query,
        new BasicDBObject("rating", 1), // sort
        update);
As we already explained to you in your other question, you have to add a field to the document containing the date it was changed and then sort by that field, or use a capped collection, because capped collections guarantee that insertion order is preserved.
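For example, stamping every update and then sorting on that stamp could look like this; a sketch built on the code from the question, using the lastUpdated field you already have:
// add the timestamp to the $set document, then take the most recently updated match
BasicDBObject update = new BasicDBObject("$set",
        new BasicDBObject(dataformed).append("lastUpdated", System.currentTimeMillis()));
DBObject latest = coll.findAndModify(
        query,
        new BasicDBObject("lastUpdated", -1), // sort: newest first
        update);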
I want to update multiple documents in my collection called "Users". Right now I am updating both documents separately, but I want to do it in one query.
My current code:
coll.update(new BasicDBObject().append("key", k1), new BasicDBObject().append("$inc", new BasicDBObject().append("balance", 10)));
coll.update(new BasicDBObject().append("key", k2), new BasicDBObject().append("$inc", new BasicDBObject().append("balance", -10)));
How can I make these two separate updates in one statement?
First let me translate your Java code to shell syntax so people can read it:
db.coll.update({key: k1}, {$inc:{balance:10}})
db.coll.update({key: k2}, {$inc:{balance:-10}})
Now, the reason you will never be able to do this in one update is that there is no way to provide a distinct update clause per matching document. You can, however, batch your updates so that each statement covers many documents (pseudo-ish):
set1 = getAllKeysForBalanceIncrease();
set2 = getAllKeysForBalanceDecrease();
db.coll.update({key:{$in:set1}}, {$inc:{balance:10}}, false, true)
db.coll.update({key:{$in:set2}}, {$inc:{balance:-10}}, false, true)
In other words, you can update multiple documents within one atomic write but the update operation will be static for all documents. So aggregating all documents that require the same update is your only optimization path.
The $in clause can be composed in Java like this:
ObjectId[] oidArray = getAllKeysEtc();
query = new BasicDBObject("key", new BasicDBObject("$in", oidArray));
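Putting it together with the legacy Java driver (the key-gathering helper is hypothetical):
// one multi-document update per distinct balance change
ObjectId[] increaseKeys = getAllKeysForBalanceIncrease(); // hypothetical helper
DBObject query = new BasicDBObject("key", new BasicDBObject("$in", increaseKeys));
DBObject update = new BasicDBObject("$inc", new BasicDBObject("balance", 10));
coll.update(query, update, false, true); // upsert = false, multi = true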
In MongoDB you do not have transactions that span multiple documents; only writes to a single document are atomic.
But you can do updates with:
public WriteResult update(DBObject q,
                          DBObject o,
                          boolean upsert,
                          boolean multi)
But note, this will not be in a transaction.