Java method for MongoDB collection.save()

Java method for MongoDB collection.save() - java

I'm having a problem with MongoDB using Java when I try adding documents with customized _id field. And when I insert new document to that collection, I want to ignore the document if it's _id has already existed.
In Mongo shell, collection.save() can be used in this case but I cannot find the equivalent method to work with MongoDB java driver.
Just to add an example:
I have a collection of documents containing websites' information
with the URLs as _id field (which is unique)
I want to add some more documents. In those new documents, some might be existing in the current collection. So I want to keep adding all the new documents except for the duplicate ones.
This can be achieve by collection.save() in Mongo Shell but using MongoDB Java Driver, I can't find the equivalent method.
Hopefully someone can share the solution. Thanks in advance!

In the MongoDB Java driver, you could try using the BulkWriteOperation object with the initializeOrderedBulkOperation() method of the DBCollection object (the one that contains your collection). This is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
ArrayList<DBObject> objectList; // Fill this list with your objects to insert
BulkWriteOperation operation = col.initializeOrderedBulkOperation();
for (int i = 0; i < objectList.size(); i++) {
operation.insert(objectList.get(i));
}
BulkWriteResult result = operation.execute();
With this method, your documents will be inserted one at a time with error handling on each insert, so documents that have a duplicated id will throw an error as usual, but the operation will still continue with the rest of the documents. In the end, you can use the getInsertedCount() method of the BulkWriteResult object to know how many documents were really inserted.
This can prove to be a bit ineffective if lots of data is inserted this way, though. This is just sample code (that was found on journaldev.com and edited to fit your situation.). You may need to edit it so it fits your current configuration. It is also untested.

I guess save is doing something like this.
fun save(doc: Document, col: MongoCollection<Document>) {
if (doc.getObjectId("_id") != null) {
doc.put("_id", ObjectId()) // generate a new id
}
col.replaceOne(Document("_id", doc.getObjectId("_id")), doc)
}
Maybe they removed save so you decide how to generate the new id.

Related

Java Couchbase Querying to find a document's ID?

I'm new to couchbase. I'm using Java for this. I'm trying to remove a document from a bucket by looking up its ID with query parameters(assuming the ID is unknown).
Lets say I have a bucket called test-data. In that bucked I have a document with ID of 555 and Content of {"name":"bob","num":"10"}
I want to be able to remove that document by querying using 'name' and 'num'.
So far I have this (hardcoded):
String statement = "SELECT META(`test-data`).id from `test-data` WHERE name = \"bob\" and num = \"10\"";
N1qlQuery query = N1qlQuery.simple(statement);
N1qlQueryResult result = bucket.query(query);
List<N1qlQueryRow> row = result.allRows();
N1qlQueryRow res1 = row.get(0);
System.out.println(res1);
//output: {"id":"555"}
So I'm getting a json that has the document's ID in it. What would be the best way to extract that ID so that I can then remove the queryed document from the bucket using its ID? Am I doing to many steps? Is there a better way to extract the document's ID?
bucket.remove(docID)
Ideally I'd like to use something like a N1q1QueryResult to get this going but I'm not sure how to set that up.
N1qlQueryResult result = bucket.query(select("META.id").fromCurrentBucket().where((x("num").eq("\""+num+"\"")).and(x("name").eq("\""+name+"\""))));
But that isn't working at the moment.
Any help or direction would be appreciated. Thanks.

There might be a better way which is running this kind of query:
delete from `test-data` use keys '00000874a09e749ab6f199c0622c5cb0' returning raw META(`test-data`).id
or if your fields has index:
delete from `test-data` where name='bob' and num='10' returning raw META(`test-data`).id
This query deletes the specified document with given document key (which is meta.id) and returns document id of deleted document if it deletes any document. Returns empty if no documents deleted.
You can implement this query with couchbase sdk as follows:
Statement statement = deleteFrom("test-data")
.where(x("name").eq(s("bob")).and(x("num").eq(s("10"))))
.returningRaw(meta(i("test-data")).get("id"));
You can make this statement parameterized or just execute like that.

MongoDB result set getting modified after execution of a query

In my application there are 2 threads:
crawl the web-sites and insert the data into MongoDB
retrieve the crawled sites and perform business logic
In order to retrieve the the crawled sites I use the following query:
Document query = new Document("fetchStatus", new Document("$lte", fetchStatusParam));
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
As the result I get all episodes, which its fetchStatusParam is less or equal to the specific value.
The next step, I store the items of the result set in HashMap<String, TrackedEpisode>, which is an object property in order to track them:
for (Document document : unfetchedEpisodes) {
this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}
Then I do some business logic, which:
doesn't modify the unfetchedEpisodes result set.
doesn't remove any object from trackedEpisodes.
Up till now everything is OK.
The last step, I pass over all retrieved documents and mark them as fetched in order to prevent the duplicate fetching in the future.
for (Document document : unfetchedEpisodes) {
if (this.trackedEpisodes.containsKey(document.get("_id").toString())) {
// prevent repeated fetching
document.put("fetchStatus", FetchStatus.IN_PROCESS.getID());
if (this.trackedEpisodes.get(document.get("_id").toString()).isExpired()) {
document.put("isExpired", true);
document.put("fetchStatus", FetchStatus.FETCHED.getID());
}
} else {
System.out.println("BOO! Strange new object detected");
}
dbC_Episodes.updateOne(new Document("_id", document.get("_id")), new Document("$set", document));
}
I run this code for a couple of days and paid attention that sometimes it arrives to the else part of the if (this.trackedEpisodes.containsKey()) statement. It's weird for me, how it can be possible that unfetchedEpisodes and trackedEpisodes are not synchronized and don't contain the same items?
I began to investigate the case and paid attention that the times I arrive to "BOO! Strange new object detected" the document iterator contains the item which is in database but should not yet be in unfetchedEpisodes since I didn't execute a new query to database.
I checked couple of times the matter of storing retrieved items into trackedEpisodes and always all elements from the unfetchedEpisodes have been added to trackedEpisodes but after that sometimes I still arrive to "BOO! Strange new object detected".
My question:
Why unfetchedEpisodes gets new items after execution of a query?
Is it possible that unfetchedEpisodes will be modified by MongoDB driver after execution of Collection#query()?
Maybe should I use kind of .close() after executing a query from the MongoDB?
The used versions:
MongoDB: 3.2.3, x64
MongoDB Java Driver: mongodb-driver-3.2.2, mongodb-driver-core-3.2.2, bson-3.2.2

When you call find here:
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
you are not actually getting all the episodes back. You are getting a database cursor pointing to the matched documents.
Then when you call:
for (Document document : unfetchedEpisodes){}
an iterator is created over all of the documents that match the query.
When you call it a second time, a new cursor is returned, for the same query, and all of the documents that match now are iterated over.
If the collection has changed in between, the results will be different.
If you want to ensure that the contents of unfetchedEpisodes are unchanged then one option is you could pull the entire result set into memory and iterate over it in memory rather than on the DB, e.g.
ArrayList<Document> unfetchedEpisodes = dbC_Episodes.find(query).into(new ArrayList<Document>());

Failed to make bulk upsert using mongo

I'm trying to do upsert using mongodb driver, here is a code:
BulkWriteOperation builder = coll.initializeUnorderedBulkOperation();
DBObject toDBObject;
for (T entity : entities) {
toDBObject = morphia.toDBObject(entity);
builder.find(toDBObject).upsert().replaceOne(toDBObject);
}
BulkWriteResult result = builder.execute();
where "entity" is morphia object. When I'm running the code first time (there are no entities in the DB, so all of the queries should be insert) it works fine and I see the entities in the database with generated _id field. Second run I'm changing some fields and trying to save changed entities and then I receive the folowing error from mongo:
E11000 duplicate key error collection: statistics.counters index: _id_ dup key: { : ObjectId('56adfbf43d801b870e63be29') }
what I forgot to configure in my example?

I don't know the structure of dbObject, but that bulk Upsert needs a valid query in order to work.
Let's say, for example, that you have a unique (_id) property called "id". A valid query would look like:
builder.find({id: toDBObject.id}).upsert().replaceOne(toDBObject);
This way, the engine can (a) find an object to update and then (b) update it (or, insert if the object wasn't found). Of course, you need the Java syntax for find, but same rule applies: make sure your .find will find something, then do an update.
I believe (just a guess) that the way it's written now will find "all" docs and try to update the first one ... but the behavior you are describing suggests it's finding "no doc" and attempting an insert.

MongoCollection updateMany, bulkWrite or something else?

I am trying to insert / update many records in a MongoCollection. I have a list of Documents to be updated.
List<Document> Documents;
The list contains some new records that are to be inserted and others are already existing ones which need to be updated. I was looking at the method
updateMany() in MongoCollection class
but the description says it updates one record. I am confused as to which method should be used.
Reference
Version : 3.0.0

I believe it is a bug in javadoc and updateMany() should update multiple records.
I've investigated source code of Mongo, just in case, and it sets "multi" parameter to true, so everything should work ok:
public UpdateResult updateMany(final Bson filter, final Bson update, final UpdateOptions updateOptions) {
return update(filter, update, updateOptions, true); // that true means "multi" is used
}

save and find json string in mongodb

I'm building a logging application that does the following:
gets JSON strings from many loggers continuously and saves them to a db
serves the collected data as a per logger bulk
my intention is to use a document based NoSQL storage to have the bulk structure right away. After some research I decided to go for MongoDB because of the following features:
- comprehensive functions to insert data into existing structures ($push, (capped) collection)
- automatic sharding with a key I choose (so I can shard on a per logger basis and therefore serve bulk data in no time - all data already being on the same db server)
The JSON I get from the loggers looks like this:
[
{"bdy":{
"cat":{"id":"36494h89h","toc":55,"boc":99},
"dataT":"2013-08-12T13:44:03Z","had":0,
"rng":23,"Iss":[{"id":10,"par":"dim, 10, dak"}]
},"hdr":{
"v":"0.2.7","N":2,"Id":"KBZD348940"}
}
]
The logger can send more than one element in the same array. I this example it is just one.
I started coding in Java with the mongo driver and the first problem I discovered was: I have to parse my with no doubt valid JSON to be able to save it in mongoDB. I learned that this is due to BSON being the native format of MongoDB. I would have liked to forward the JSON string to the db directly to save that extra execution time.
so what I do in a first Java test to save just this JSON string is:
String loggerMessage = "...the above JSON string...";
DBCollection coll = db.getCollection("logData");
DBObject message = (DBObject) JSON.parse(loggerMessage);
coll.insert(message);
the last line of this code causes the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: BasicBSONList can only work with numeric keys, not: [_id]
at org.bson.types.BasicBSONList._getInt(BasicBSONList.java:161)
at org.bson.types.BasicBSONList._getInt(BasicBSONList.java:152)
at org.bson.types.BasicBSONList.get(BasicBSONList.java:104)
at com.mongodb.DBCollection.apply(DBCollection.java:767)
at com.mongodb.DBCollection.apply(DBCollection.java:756)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:220)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:204)
at com.mongodb.DBCollection.insert(DBCollection.java:76)
at com.mongodb.DBCollection.insert(DBCollection.java:60)
at com.mongodb.DBCollection.insert(DBCollection.java:105)
at mongomockup.MongoMockup.main(MongoMockup.java:65)
I tried to save this JSON via the mongo shell and it works perfectly.
How can I get this done in Java?
How could I maybe save the extra parsing?
What structure would you choose to save the data? Array of messages in the same document, collection of messages in single documents, ....

It didn't work because of the array. You need a BasicDBList to be able to save multiple messages. Here is my new solution that works perfectly:
BasicDBList data = (BasicDBList) JSON.parse(loggerMessage);
for(int i=0; i < data.size(); i++){
coll.insert((DBObject) data.get(i));
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java method for MongoDB collection.save() - java

Related

Java Couchbase Querying to find a document's ID?

MongoDB result set getting modified after execution of a query

Failed to make bulk upsert using mongo

MongoCollection updateMany, bulkWrite or something else?

save and find json string in mongodb

Categories

Resources