Java/Grails - MongoDB aggregation 16MB buffer size limit


I am trying to run a MongoDB aggregate query from Java, but the result exceeds the 16MB buffer size. Is there any way to adjust the buffer size, or any other workaround? I do not have the option to create a collection on the MongoDB server side, and I do not have any MongoDB utilities such as mongo.exe or mongoExport.exe on my client system.
Here is a small part of the code:
if (!datasetObject?.isFlat && jsonFor != 'collection-grid'){
//mongoPipeline = new AggregateArgs (Pipeline = pipeline, AllowDiskUse = true, OutputMode = AggregateOutputMode.Cursor)
output= dataSetCollection.aggregate(pipeline)
}else{
output= dataSetCollection.aggregate(project)
}
I have 100K records with 30 fields each. When I query 5 fields for all 100K records, the query succeeds. But when I query all fields for the 100K records, it throws the error below.
The issue is that retrieving all documents in the collection with all of their fields exceeds the 16MB size limit.
Actual Error:
com.mongodb.CommandFailureException: { "serverUsed" : "127.0.0.1:27017" , "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)" , "code" : 16389 , "ok" : 0.0
How can I resolve this issue?
I am using MongoDB 3.0.6.
Note: GridFS is not suitable for my case, because I need to retrieve all documents in one request, not a single document.

When running the aggregation you can tell mongo to return a cursor. With the new APIs in the 3.0 Java driver that would look like this:
// Assuming MongoCollection
dataSetCollection.aggregate(pipeline).useCursor(true)
You might also need to tell it to use disk space on the server rather than doing it all in memory:
// Assuming MongoCollection
dataSetCollection.aggregate(pipeline).useCursor(true).allowDiskUse(true)
If you're using an older driver (or the old API in the new driver) those two options would look like this:
// Assuming DBCollection
dataSetCollection.aggregate(pipeline, AggregationOptions.builder()
    .allowDiskUse(true)
    .outputMode(AggregationOptions.OutputMode.CURSOR)
    .build())
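A rough size budget shows why five fields fit and thirty do not. Without a cursor, the whole aggregation result comes back inside a single response document, which is capped at 16MB. The sketch below only does that arithmetic and ignores the small overhead of the response wrapper; the class and method names are hypothetical, and the 100,000 figure is taken from the question.

```java
public class InlineResultBudget {
    // Maximum BSON document size enforced by the server: 16 MB.
    public static final long MAX_BSON_BYTES = 16L * 1024 * 1024;

    // Average number of bytes available per result document when all
    // results must fit into one inline response document.
    public static long bytesPerDocument(long documentCount) {
        return MAX_BSON_BYTES / documentCount;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerDocument(100_000) + " bytes per document");
    }
}
```

With 100,000 documents the budget is well under 200 bytes each, so a projection of a few short fields may fit while full 30-field documents do not. A cursor removes this ceiling because results are returned in batches.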

There are two options to resolve this issue:
1. Use $out, which writes the result to a new collection that you then read back. This is not a good idea here because the process is time consuming and complex to implement.
public class JavaAggregation {
public static void main(String args[]) throws UnknownHostException {
MongoClient mongo = new MongoClient();
DB db = mongo.getDB("databaseName");
DBCollection coll = db.getCollection("dataset");
/*
MONGO SHELL :
db.dataset.aggregate([
{ "$match": { isFlat : true } },
{ "$out": "datasetTemp" }
])
*/
DBObject match = new BasicDBObject("$match", new BasicDBObject("isFlat", true));
DBObject out = new BasicDBObject("$out", "datasetTemp");
AggregationOutput output = coll.aggregate(match, out);
DBCollection tempColl = db.getCollection("datasetTemp");
DBCursor cursor = tempColl.find();
try {
while(cursor.hasNext()) {
System.out.println(cursor.next());
}
} finally {
cursor.close();
}
}
}
2. Use allowDiskUse(true) together with cursor output mode, which is very simple to implement and not time consuming.
import com.mongodb.*;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class JavaAggregation {
    public static void main(String args[]) throws UnknownHostException {
        MongoClient mongo = new MongoClient();
        DB db = mongo.getDB("databaseName");
        DBCollection coll = db.getCollection("dataset");
        /*
        MONGO SHELL :
        db.dataset.aggregate(
            [ { "$match": { isFlat : true } } ],
            { allowDiskUse: true, cursor: { batchSize: 100 } }
        )
        */
        DBObject match = new BasicDBObject("$match", new BasicDBObject("isFlat", true));
        List<DBObject> flatPipeline = new ArrayList<DBObject>();
        flatPipeline.add(match);
        AggregationOptions aggregationOptions = AggregationOptions.builder()
            .batchSize(100)
            .outputMode(AggregationOptions.OutputMode.CURSOR)
            .allowDiskUse(true)
            .build();
        // With options, aggregate() returns a Cursor that is read in batches
        // instead of a single inline result document.
        Cursor cursor = coll.aggregate(flatPipeline, aggregationOptions);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}

Related

MongoDB: Getting entries with maximum version-id after grouping

I'm rather new to MongoDB and I'm trying to create a query which I thought would be pretty trivial (well, at least with SQL it would be), but I can't get it done.
I have a collection named patients. In this collection a single patient is identified by the id property (NOT MongoDB's _id!). There can be multiple versions of a single patient; the version is determined by the meta.versionId field.
In order to query for all "current versions of patients", I need to get, for every patient with a specific id, the one with the maximum versionId.
So far I've got this:
AggregateIterable<Document> allPatients = db.getCollection("patients").aggregate(Arrays.asList(
new Document("$group", new Document("_id", "$id")
.append("max", new Document("$max", "$meta.versionId")))));
allPatients.forEach(new Block<Document>() {
@Override
public void apply(final Document document) {
System.out.println(document.toJson());
}
});
Which results in the following output (using my very limited test data):
{ "_id" : "2.25.260185450267055504591276882440338245053", "max" : "5" }
{ "_id" : "2.25.260185450267055504591276882441338245099", "max" : "0" }
This seems to work so far, but I need the complete patient documents, not just the id and the version.
Now I only know that for the id 2.25.260185450267055504591276882440338245053 the max version is "5", and so on. Of course I could now run a separate query for every single entry and sequentially fetch each patient document for a specific id/versionId combination from MongoDB, but this seems like a terrible solution! Is there any other way to get it done?

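If you know the fields that you want to retrieve, say the patient's name and address, you can add them to the $group stage. Every field other than _id in $group must use an accumulator, so a bare value of 1 is rejected. $first is used here; without a preceding $sort it returns the values from an arbitrary document of each group.
AggregateIterable<Document> allPatients = db.getCollection("patients").aggregate(Arrays.asList(
new Document("$group", new Document("_id", "$id")
.append("max", new Document("$max", "$meta.versionId"))
.append("name", new Document("$first", "$name"))
.append("address", new Document("$first", "$address")))));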

An approach that could work for you is to first order the documents entering the pipeline by the meta.versionId field using the $sort pipeline operator. Be aware, however, that the $sort stage is limited to 100 megabytes of RAM. By default, $sort produces an error if it exceeds this limit.
To handle large datasets, set the allowDiskUse option to true so that $sort operations can write to temporary files. See the allowDiskUse option of the aggregate() method for details.
After sorting, you can group the ordered documents and use the $first or $last operators (depending on the sort direction) to carry the other fields through the aggregation.
Consider running the following mongo shell pipeline operation as a way of demonstrating this concept:
Mongo shell
pipeline = [
{ "$sort": {"meta.versionId": -1}}, // order the documents by the versionId field descending
{
"$group": {
"_id": "$id",
"max": { "$first": "$meta.versionId" }, // get the maximum versionId
"active": { "$first": "$active" }, // Whether this patient's record is in active use
"name": { "$first": "$name" }, // A name associated with the patient
"telecom": { "$first": "$telecom" }, // A contact detail for the individual
"gender": { "$first": "$gender" }, // male | female | other | unknown
"birthDate": { "$first": "$birthDate" } // The date of birth for the individual
/*
And many other fields
*/
}
}
]
db.patients.aggregate(pipeline);
Java test implementation
public class JavaAggregation {
public static void main(String args[]) throws UnknownHostException {
MongoClient mongo = new MongoClient();
DB db = mongo.getDB("test");
DBCollection coll = db.getCollection("patients");
// create the pipeline operations, first with the $sort
DBObject sort = new BasicDBObject("$sort",
new BasicDBObject("meta.versionId", -1)
);
// build the $group operations
DBObject groupFields = new BasicDBObject( "_id", "$id");
groupFields.put("max", new BasicDBObject( "$first", "$meta.versionId"));
groupFields.put("active", new BasicDBObject( "$first", "$active"));
groupFields.put("name", new BasicDBObject( "$first", "$name"));
groupFields.put("telecom", new BasicDBObject( "$first", "$telecom"));
groupFields.put("gender", new BasicDBObject( "$first", "$gender"));
groupFields.put("birthDate", new BasicDBObject( "$first", "$birthDate"));
// append any other necessary fields
DBObject group = new BasicDBObject("$group", groupFields);
List<DBObject> pipeline = Arrays.asList(sort, group);
AggregationOutput output = coll.aggregate(pipeline);
for (DBObject result : output.results()) {
System.out.println(result);
}
}
}
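One detail worth checking before relying on the sort: the output in the question shows versionId stored as a string ("5", "0"). Strings compare character by character, so "5" sorts after "10", and both $sort and $max would then pick "5" as the latest version. The plain-Java sketch below models the intended "latest version per id" logic with a numeric comparison. It does not use the driver; the Patient class and its fields are hypothetical stand-ins for the real documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestVersionSketch {
    // Hypothetical stand-in for a patient document: only the fields needed here.
    public static final class Patient {
        public final String id;
        public final String versionId; // kept as a string, as in the question's output
        public final String name;

        public Patient(String id, String versionId, String name) {
            this.id = id;
            this.versionId = versionId;
            this.name = name;
        }
    }

    // Keeps, for each id, the patient whose versionId is numerically largest.
    public static List<Patient> latestVersions(List<Patient> patients) {
        Map<String, Patient> latest = new LinkedHashMap<>();
        for (Patient p : patients) {
            Patient current = latest.get(p.id);
            if (current == null
                    || Integer.parseInt(p.versionId) > Integer.parseInt(current.versionId)) {
                latest.put(p.id, p);
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Patient> patients = new ArrayList<>();
        patients.add(new Patient("A", "5", "name at version 5"));
        patients.add(new Patient("A", "10", "name at version 10"));
        patients.add(new Patient("B", "0", "name at version 0"));
        for (Patient p : latestVersions(patients)) {
            System.out.println(p.id + " -> version " + p.versionId);
        }
    }
}
```

If the stored values really are strings, storing versionId as a number is the simplest way to make the pipeline agree with this logic.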

How to write Geospatial query in MongoDB Java

I'm working with MongoDB from Java. I have a collection in which I store location coordinates, and I need to get the nearest location in the list. I followed this site and tried the following:
db.location.find({ loc: { $near : { $geometry: { type: "Point", coordinates: [80.23,13.1112] }, $minDistance: 0, $maxDistance:1000 } } } )
This works well, but I don't know the right syntax for the MongoDB Java driver. I need to do the same in Java code.

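The following code replicates the above mongo shell query in Java. It uses the same GeoJSON $geometry form, so $maxDistance is interpreted in meters (this requires a 2dsphere index on loc):
// GeoJSON coordinates are [longitude, latitude]; db is assumed to be a DB handle
BasicDBObject geometry = new BasicDBObject("type", "Point")
    .append("coordinates", Arrays.asList(80.23, 13.1112));
BasicDBObject near = new BasicDBObject("$geometry", geometry)
    .append("$minDistance", 0)
    .append("$maxDistance", 1000);
BasicDBObject query = new BasicDBObject("loc", new BasicDBObject("$near", near));
List<DBObject> obj = db.getCollection("location").find(query).toArray();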
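When a $near query returns unexpected results, it can help to check a distance by hand. With a GeoJSON point, coordinates are given as [longitude, latitude] and $maxDistance is in meters. The sketch below is a standalone haversine calculation, not part of the driver. The class name, the Earth radius constant, and the second point are assumptions for illustration, and MongoDB's own spherical calculation may differ slightly.

```java
public class GeoDistanceSketch {
    // Mean Earth radius in meters (an approximation).
    public static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Great-circle distance in meters between two points given in degrees.
    public static double haversineMeters(double lon1, double lat1,
                                         double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Query point from the question and a hypothetical stored point
        // 0.005 degrees further east.
        double d = haversineMeters(80.23, 13.1112, 80.235, 13.1112);
        System.out.println(d <= 1000 ? "within 1000 m" : "outside 1000 m");
    }
}
```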

Spring Data MongoDB and Bulk Update

I am using Spring Data MongoDB and would like to perform a Bulk Update just like the one described here: http://docs.mongodb.org/manual/reference/method/Bulk.find.update/#Bulk.find.update
When using regular driver it looks like this:
The following example initializes a Bulk() operations builder for the items collection, and adds various multi update operations to the list of operations.
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { status: "D" } ).update( { $set: { status: "I", points: "0" } } );
bulk.find( { item: null } ).update( { $set: { item: "TBD" } } );
bulk.execute()
Is there any way to achieve similar result with Spring Data MongoDB ?

Bulk updates are supported from spring-data-mongodb 1.9.0.RELEASE. Here is a sample:
BulkOperations ops = template.bulkOps(BulkMode.UNORDERED, Match.class);
for (User user : users) {
Update update = new Update();
...
ops.updateOne(query(where("id").is(user.getId())), update);
}
ops.execute();

You can use this as long as the driver is current and the server you are talking to is at least MongoDB 2.6, which is required for bulk operations. I don't believe there is anything directly in Spring Data right now (and much the same goes for other higher-level driver abstractions), but you can of course access the native driver collection object that implements the Bulk API:
DBCollection collection = mongoOperation.getCollection("collection");
// Unordered, to match the shell example in the question
BulkWriteOperation bulk = collection.initializeUnorderedBulkOperation();
bulk.find(new BasicDBObject("status", "D"))
    .update(new BasicDBObject("$set",
        new BasicDBObject("status", "I").append("points", 0)));
bulk.find(new BasicDBObject("item", null))
    .update(new BasicDBObject("$set", new BasicDBObject("item", "TBD")));
BulkWriteResult writeResult = bulk.execute();
System.out.println(writeResult);
You can either fill in the DBObject types required by defining them, or use the builders supplied in the spring mongo library which should all support "extracting" the DBObject that they build.

public <T> void bulkUpdate(String collectionName, List<T> documents, Class<T> tClass) {
BulkOperations bulkOps = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, tClass, collectionName);
for (T document : documents) {
Document doc = new Document();
mongoTemplate.getConverter().write(document, doc);
org.springframework.data.mongodb.core.query.Query query = new org.springframework
.data.mongodb.core.query.Query(Criteria.where(UNDERSCORE_ID).is(doc.get(UNDERSCORE_ID)));
Document updateDoc = new Document();
updateDoc.append("$set", doc);
Update update = Update.fromDocument(updateDoc, UNDERSCORE_ID);
bulkOps.upsert(query, update);
}
bulkOps.execute();
}
This uses Spring's MongoTemplate to perform the update. UNDERSCORE_ID is assumed to be a constant holding the field name "_id". The code works only if every document in the list has its _id field set.

Use MongoDB db.collection.remove(query) in java

Is there a way in the MongoDB Java driver to call the db.collection.remove(query) method that I see in the MongoDB shell documentation?
That is, I know the exact criteria that I need to find all the documents I want to delete from MongoDB, but I can't find a way to make one call to remove those records in one trip. All that I can figure out is to find the documents and then delete them one by one.
I see this
http://docs.mongodb.org/manual/reference/method/db.collection.remove/
which implies there should be a way to do it, but I can't figure out the Java calls that get me to that call.
Thank you for your help

To remove all documents with an age property of 25:
MongoClient mongo = new MongoClient(new ServerAddress("localhost", 27017));
DB db = mongo.getDB("thedb");
DBCollection collection = db.getCollection("test");
BasicDBObject query = new BasicDBObject();
query.append("age", 25);
collection.remove(query);
DBCollection and BasicDBObject are two of the most important classes in the Java API.

Also, to remove specific fields from your document, you can use the following code with MongoDB Java driver 3.2:
Document docToDelete = new Document("Designation", "SE-1");
objDbCollection.findOneAndUpdate(new Document("Company", "StackOverflow"), new Document("$unset", docToDelete));
The code above first finds the document with Company = StackOverflow and then unsets (removes) the Designation key and its value from that document.

Add and Update Mongo
public class App {
public static void main(String[] args) {
try {
Mongo mongo = new Mongo("localhost", 27017);
DB db = mongo.getDB("yourdb");
// get a single collection
DBCollection collection = db.getCollection("dummyColl");
//insert number 1 to 10 for testing
for (int i=1; i <= 10; i++) {
collection.insert(new BasicDBObject().append("number", i));
}
//remove number = 1
DBObject doc = collection.findOne(); //get first document
collection.remove(doc);
//remove number = 2
BasicDBObject document = new BasicDBObject();
document.put("number", 2);
collection.remove(document);
//remove number = 3
collection.remove(new BasicDBObject().append("number", 3));
//remove number > 9 , means delete number = 10
BasicDBObject query = new BasicDBObject();
query.put("number", new BasicDBObject("$gt", 9));
collection.remove(query);
//remove number = 4 and 5
BasicDBObject query2 = new BasicDBObject();
List<Integer> list = new ArrayList<Integer>();
list.add(4);
list.add(5);
query2.put("number", new BasicDBObject("$in", list));
collection.remove(query2);
//print out the document
DBCursor cursor = collection.find();
while(cursor.hasNext()) {
System.out.println(cursor.next());
}
collection.drop();
System.out.println("Done");
} catch (UnknownHostException e) {
e.printStackTrace();
} catch (MongoException e) {
e.printStackTrace();
}
}
}

How to insert multiple documents at once in MongoDB through Java

I am using MongoDB in my application and need to insert multiple documents into a MongoDB collection.
The version I am using is 1.6.
I saw an example here
http://docs.mongodb.org/manual/core/create/
in the Bulk Insert Multiple Documents section, where the author passes an array to do this.
When I tried the same, it was not allowed. Why is that, and how can I insert multiple documents at once?
package com;
import java.util.Date;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
public class App {
public static void main(String[] args) {
try {
MongoClient mongo = new MongoClient("localhost", 27017);
DB db = mongo.getDB("at");
DBCollection collection = db.getCollection("people");
/*
* BasicDBObject document = new BasicDBObject();
* document.put("name", "mkyong"); document.put("age", 30);
* document.put("createdDate", new Date()); table.insert(document);
*/
String[] myStringArray = new String[] { "a", "b", "c" };
collection.insert(myStringArray); // Compilation error at this line saying that "The method insert(DBObject...) in the type DBCollection is not applicable for the arguments (String[])"
} catch (Exception e) {
e.printStackTrace();
}
}
}
Please let me know what is the way so that I can insert multiple documents at once through java .

DBCollection.insert accepts a parameter of type DBObject, List<DBObject> or an array of DBObjects for inserting multiple documents at once. You are passing in a string array.
You must manually populate the documents (DBObjects), add them to a List<DBObject> or an array of DBObjects, and then insert them.
DBObject document1 = new BasicDBObject();
document1.put("name", "Kiran");
document1.put("age", 20);
DBObject document2 = new BasicDBObject();
document2.put("name", "John");
List<DBObject> documents = new ArrayList<>();
documents.add(document1);
documents.add(document2);
collection.insert(documents);
The above snippet is essentially the same as the command you would issue in the MongoDB shell:
db.people.insert( [ {name: "Kiran", age: 20}, {name: "John"} ]);

Before driver version 3.0, you can use the code below in Java:
DB db = mongoClient.getDB("yourDB");
DBCollection coll = db.getCollection("yourCollection");
BulkWriteOperation builder = coll.initializeUnorderedBulkOperation();
for(DBObject doc :yourList)
{
builder.insert(doc);
}
BulkWriteResult result = builder.execute();
return result.isAcknowledged();
If you are using driver version 3.0 or later, you can use:
MongoDatabase database = mongoClient.getDatabase("yourDB");
MongoCollection<Document> collection = database.getCollection("yourCollection");
collection.insertMany(yourDocumentList);

As of MongoDB 2.6 and version 2.12 of the driver, you can also perform bulk write operations. In Java you can use BulkWriteOperation. An example use of this could be:
DBCollection coll = db.getCollection("user");
BulkWriteOperation bulk = coll.initializeUnorderedBulkOperation();
bulk.find(new BasicDBObject("z", 1)).upsert().update(new BasicDBObject("$inc", new BasicDBObject("y", -1)));
bulk.find(new BasicDBObject("z", 1)).upsert().update(new BasicDBObject("$inc", new BasicDBObject("y", -1)));
bulk.execute();

Creating Documents
There are two principal commands for creating documents in MongoDB:
insertOne()
insertMany()
There are other ways as well, such as update commands with the upsert option. These operations are called upserts. An upsert occurs when no document matches the selector used to identify documents.
Although MongoDB generates an _id on its own, we can also supply custom IDs by specifying the _id field in the insert...() functions.
To insert multiple documents we can use insertMany(), which takes an array of documents as its parameter. When executed, it returns the ids of the documents in the array. To drop the collection, use the drop() command. Sometimes a bulk insert contains duplicate values. Specifically, if we try to insert duplicate _ids, we get a duplicate key error:
db.startup.insertMany(
[
{_id:"id1", name:"Uber"},
{_id:"id2", name:"Airbnb"},
{_id:"id1", name:"Uber"},
]
);
By default MongoDB stops the insert operation at the first error. To continue past errors, we can supply the ordered: false parameter. For example:
db.startup.insertMany(
[
{_id:"id1", name:"Uber"},
{_id:"id2", name:"Airbnb"},
{_id:"id1", name:"Airbnb"},
],
{ordered: false}
);
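The difference between the two modes is easiest to see when the duplicate is not the last element. The sketch below is a plain-Java model of the semantics described above, not driver code: the class, the method, and the use of a map as the "collection" are all assumptions for illustration. The real server also still reports the duplicate key errors in unordered mode; it just continues with the remaining documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderedInsertModel {
    // Simulates inserting {_id, name} pairs into a collection keyed by _id
    // and returns the _ids that were stored. A duplicate _id is an error:
    // ordered mode stops at the first error, unordered mode skips it and continues.
    public static List<String> insertMany(String[][] docs, boolean ordered) {
        Map<String, String> collection = new LinkedHashMap<>();
        for (String[] doc : docs) {
            String id = doc[0];
            if (collection.containsKey(id)) {
                if (ordered) {
                    break;      // remaining documents are never attempted
                }
                continue;       // skip the failing document only
            }
            collection.put(id, doc[1]);
        }
        return new ArrayList<>(collection.keySet());
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"id1", "Uber"},
            {"id1", "Uber"},    // duplicate _id in the middle of the batch
            {"id2", "Airbnb"},
        };
        System.out.println("ordered:   " + insertMany(docs, true));
        System.out.println("unordered: " + insertMany(docs, false));
    }
}
```

In this model the ordered run stores only id1, while the unordered run stores id1 and id2.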

Suppose the records to insert are in MongoDB document format and are retrieved from some source by a query, e.g.
{
"_id" : 1,
"name" : "a"
}
{
"_id" : 2,
"name" : "b"
}
With the MongoDB 3.0 Java driver:
FindIterable<Document> resultList = collection.find(query);
List<Document> docList = new ArrayList<Document>();
for (Document document : resultList) {
    docList.add(document);
}
// insertMany rejects an empty list, so check before inserting
if (!docList.isEmpty()) {
    collectionCube.insertMany(docList);
}
