I am trying to upsert a list of about 50,000 POJOs using the bulk operations API in Spring Data MongoDB.
I am using the following code to upsert the records, but it takes about 15 minutes to update the whole collection.
public void saveHistory(List<History> history) {
    List<Pair<Query, Update>> updates = new ArrayList<>(history.size());
    history.forEach(item -> {
        Query query = Query.query(Criteria.where("Id").is(item.getUserId()));
        Update update = new Update();
        update.set("_id", item.getUserId());
        update.set("auditHistoryMap", item.getAuditHistoryMap());
        updates.add(Pair.of(query, update));
    });
    BulkOperations bulkOperations = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, History.class);
    bulkOperations.upsert(updates);
    bulkOperations.execute();
}
However, if the POJOs are converted to the Document type before the upsert, the whole collection can be upserted in about 3 minutes.
public <T> void bulkUpdate(String collectionName, List<T> documents, Class<T> tClass) {
    BulkOperations bulkOps = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, tClass, collectionName);
    for (T document : documents) {
        Document doc = new Document();
        mongoTemplate.getConverter().write(document, doc);
        Query query = new Query(Criteria.where("_id").is(doc.get("_id")));
        Document updateDoc = new Document();
        updateDoc.append("$set", doc);
        Update update = Update.fromDocument(updateDoc, "_id");
        bulkOps.upsert(query, update);
    }
    bulkOps.execute();
}
What is the reason for such a disparity?
Related
I am trying to insert a document received from an input system into a MongoDB database using the upsert option, but it is failing with the error below. Can anyone help me perform the upsert so that duplicate transactions are avoided based on a field?
Write error: WriteError{code=61, message='Failed to target upsert by query :: could not extract exact shard key', details={}}
@ResponseBody
public boolean createTickets(@Valid @RequestBody Document... bookings) {
    MongoDatabase database = this.mongoClient.getDatabase("database");
    MongoCollection<Document> collection = database.getCollection("Test");
    List<Document> documentList = new ArrayList<>(bookings.length);
    for (Document booking : bookings) {
        documentList.add(Document.parse(booking.toJson()));
    }
    Document doc = (Document) documentList.iterator().next().get("Price");
    String priceResults = (String) doc.get("Ticket");
    UpdateOptions options = new UpdateOptions().upsert(true);
    UpdateResult updateResult = collection.updateOne(Filters.eq("Price.Ticket", priceResults),
            Updates.combine(Updates.set("Price.Ticket", priceResults)), options);
    System.out.println("updateResult:- " + updateResult);
    return updateResult.wasAcknowledged();
}
I have a list of attributes in MongoDB and I am querying a nested field. Here is my code:
public List<Brand> searchBrands(Request request) {
    final MongoCollection<Document> collection = mongoDatabase.getCollection("shop");
    final Document query = new Document();
    final Document projection = new Document();
    final List<Brand> brandList = new ArrayList<>();
    query.append("_id", request.getId());
    query.append("isActive", true);
    if (request.getYear() != null) {
        query.append("attributes.name", "myYear");
        query.append("attributes.value", request.getYear());
    }
    projection.append("brand.code", 1.0);
    projection.append("brand.description", 1.0);
    projection.append("_id", 0.0);
    Block<Document> processBlock = document -> brandList.add(
            Brand.builder()
                    .code(document.get("brand", Document.class).getString("code"))
                    .description(document.get("brand", Document.class).getString("description"))
                    .build());
    collection.find(query).projection(projection).forEach(processBlock);
    return brandList;
}
The above code returns results correctly: 72 items, all with the same brand.code. But I want to fetch distinct results by brand.code. How can I do that?
I'm not sure which MongoDB client library you're using to create your queries, so I'm sharing the query you can run in the MongoDB console to get the results you want. I hope you know how to build this query with your client library:
db.shop.distinct('brand.code', myQuery)
// Replace myQuery with your query, e.g. {isActive: true}
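Since the question's code already uses the Java MongoCollection API, a minimal sketch of the same distinct call with that driver might look like this (the query document is assumed to be the same filter built in searchBrands above):
import com.mongodb.client.DistinctIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// Sketch: "query" is assumed to be the filter document built in searchBrands.
List<String> distinctBrandCodes(MongoCollection<Document> collection, Document query) {
    DistinctIterable<String> codes = collection.distinct("brand.code", query, String.class);
    List<String> result = new ArrayList<>();
    codes.into(result); // MongoIterable.into copies all results into the target list
    return result;
}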
With the MongoDB Java driver, I am looking for a way to return all fields of one collection. For example, I have a collection "people"; how can I get all of its fields? I want output like: 'id', 'name', 'city'...
Thanks a lot, I have finally got the answer.
DBCursor cur = db.getCollection("people").find();
DBObject dbo = cur.next();
Set<String> s = dbo.keySet();
From the manual:
To return all documents in a collection, call the find method without a criteria document. For example, the following operation queries for all documents in the restaurants collection.
FindIterable<Document> iterable = db.getCollection("restaurants").find();
Iterate the results and apply a block to each resulting document.
iterable.forEach(new Block<Document>() {
    @Override
    public void apply(final Document document) {
        System.out.println(document);
    }
});
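Putting the two ideas together, a sketch with the 3.x MongoCollection API that collects the field names of the first document (assuming every document in "people" has the same shape) could be:
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import java.util.Collections;
import java.util.Set;

// Sketch: assumes all documents in "people" share the same fields as the first one.
Set<String> fieldNames(MongoDatabase db) {
    try (MongoCursor<Document> cursor = db.getCollection("people").find().limit(1).iterator()) {
        return cursor.hasNext() ? cursor.next().keySet() : Collections.<String>emptySet();
    }
}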
You'd need to decide on the number of samples that makes sense for you, but this pulls the last 10 submissions:
Document nat = new Document().append("$natural", -1);
FindIterable<Document> lastDocumentsSubmitted = collection.find().limit(10).sort(nat);
ArrayList<String> fieldKeys = new ArrayList<String>();
for (Document doc : lastDocumentsSubmitted) {
    for (String key : doc.keySet()) {
        if (!fieldKeys.contains(key)) {
            fieldKeys.add(key);
        }
    }
}
System.out.println("fieldKeys: " + fieldKeys.toString());
The code below will return all field names from the given collection.
MongoCollection<Document> mongoCollection = mongoDatabase.getCollection("collection_name");
AggregateIterable<Document> output = mongoCollection.aggregate(Arrays.asList(
        new Document("$project", new Document("arrayofkeyvalue", new Document("$objectToArray", "$$ROOT"))),
        new Document("$unwind", "$arrayofkeyvalue"),
        new Document("$group", new Document("_id", null).append("allkeys", new Document("$addToSet", "$arrayofkeyvalue.k")))));
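To actually read the key names out of the single result document, a short usage sketch continuing from the output variable above (the "allkeys" field name comes from the $group stage) might be:
// The pipeline groups everything into one document whose "allkeys" array
// holds the distinct field names across the collection.
Document result = output.first();
if (result != null) {
    @SuppressWarnings("unchecked")
    List<String> allKeys = (List<String>) result.get("allkeys");
    System.out.println(allKeys);
}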
The code below will return all fields of the people collection:
db.people.find()
I am trying to run a MongoDB aggregate query from Java, but the result is exceeding the 16MB buffer size. Is there any way to adjust the buffer size, or any other workaround? I do not have the option to create a collection on the MongoDB server side, and I also do not have any mongo utility such as mongo.exe or mongoexport.exe on my client system.
Here is a small part of the code:
if (!datasetObject?.isFlat && jsonFor != 'collection-grid') {
    //mongoPipeline = new AggregateArgs (Pipeline = pipeline, AllowDiskUse = true, OutputMode = AggregateOutputMode.Cursor)
    output = dataSetCollection.aggregate(pipeline)
} else {
    output = dataSetCollection.aggregate(project)
}
I have 100K records with 30 fields each. When I query 5 fields for all 100K records, I get a result (success). But when I query all 100K records with all fields, it throws the error below.
The issue is that when I try to access all documents from the collection, including all fields of each document, the result exceeds the 16MB size limit.
Actual Error:
com.mongodb.CommandFailureException: { "serverUsed" : "127.0.0.1:27017" , "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)" , "code" : 16389 , "ok" : 0.0 }
How to resolve this issue?
Using MongoDB-3.0.6
Note: GridFS is not suitable for my case, because I need to retrieve all documents in one request, not one document at a time.
When running the aggregation you can tell mongo to return a cursor. With the new APIs in the 3.0 Java driver that would look like this:
// Assuming MongoCollection
dataSetCollection.aggregate(pipeline).useCursor(true)
You might also need to tell it to use disk space on the server rather than doing it all in memory:
// Assuming MongoCollection
dataSetCollection.aggregate(pipeline).useCursor(true).allowDiskUse(true)
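A short usage sketch iterating that cursor-backed result with the 3.x API (this assumes dataSetCollection is a MongoCollection<Document> and pipeline is the same aggregation pipeline as above):
// Iterate the aggregation result; with useCursor(true) the server streams batches
// instead of building a single 16MB response document.
try (MongoCursor<Document> cursor = dataSetCollection.aggregate(pipeline)
        .allowDiskUse(true)
        .useCursor(true)
        .iterator()) {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
}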
If you're using an older driver (or the old API in the new driver) those two options would look like this:
// Assuming DBCollection
dataSetCollection.aggregate(pipeline, AggregationOptions.builder()
        .allowDiskUse(true)
        .outputMode(AggregationOptions.OutputMode.CURSOR)
        .build());
There are two options to resolve this issue:
1) Use $out, which creates a new collection and writes the result into it. This is not a good idea because the process is time consuming and more complex to implement.
public class JavaAggregation {
    public static void main(String args[]) throws UnknownHostException {
        MongoClient mongo = new MongoClient();
        DB db = mongo.getDB("databaseName");
        DBCollection coll = db.getCollection("dataset");

        /*
        MONGO SHELL:
        db.dataset.aggregate([
            { "$match": { isFlat: true } },
            { "$out": "datasetTemp" }
        ])
        */

        DBObject match = new BasicDBObject("$match", new BasicDBObject("isFlat", true));
        DBObject out = new BasicDBObject("$out", "datasetTemp");
        AggregationOutput output = coll.aggregate(match, out);

        DBCollection tempColl = db.getCollection("datasetTemp");
        DBCursor cursor = tempColl.find();
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}
2) Use allowDiskUse(true), which is very simple to implement and not time consuming.
public class JavaAggregation {
    public static void main(String args[]) throws UnknownHostException {
        MongoClient mongo = new MongoClient();
        DB db = mongo.getDB("databaseName");
        DBCollection coll = db.getCollection("dataset");

        /*
        MONGO SHELL:
        db.dataset.aggregate([ { "$match": { isFlat: true } } ], { allowDiskUse: true })
        */

        DBObject match = new BasicDBObject("$match", new BasicDBObject("isFlat", true));
        List<DBObject> flatPipeline = Arrays.asList(match);

        AggregationOptions aggregationOptions = AggregationOptions.builder()
                .batchSize(100)
                .outputMode(AggregationOptions.OutputMode.CURSOR)
                .allowDiskUse(true)
                .build();

        Cursor cursor = coll.aggregate(flatPipeline, aggregationOptions);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}
I am using Spring Data MongoDB and would like to perform a Bulk Update just like the one described here: http://docs.mongodb.org/manual/reference/method/Bulk.find.update/#Bulk.find.update
When using the regular driver it looks like this:
The following example initializes a Bulk() operations builder for the items collection, and adds various multi update operations to the list of operations.
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { status: "D" } ).update( { $set: { status: "I", points: "0" } } );
bulk.find( { item: null } ).update( { $set: { item: "TBD" } } );
bulk.execute()
Is there any way to achieve a similar result with Spring Data MongoDB?
Bulk updates are supported from spring-data-mongodb 1.9.0.RELEASE. Here is a sample:
BulkOperations ops = template.bulkOps(BulkMode.UNORDERED, Match.class);
for (User user : users) {
    Update update = new Update();
    // ... populate the update here
    ops.updateOne(query(where("id").is(user.getId())), update);
}
ops.execute();
You can use this as long as the driver is current and the server you are talking to is at least MongoDB 2.6, which is required for bulk operations. I don't believe there is anything directly in Spring Data right now (and much the same goes for other higher-level driver abstractions), but you can of course access the native driver collection object, which implements access to the Bulk API:
DBCollection collection = mongoOperation.getCollection("collection");

BulkWriteOperation bulk = collection.initializeOrderedBulkOperation();

bulk.find(new BasicDBObject("status", "D"))
    .update(new BasicDBObject(
        "$set", new BasicDBObject("status", "I").append("points", "0")));

bulk.find(new BasicDBObject("item", null))
    .update(new BasicDBObject(
        "$set", new BasicDBObject("item", "TBD")));

BulkWriteResult writeResult = bulk.execute();

System.out.println(writeResult);
You can either fill in the DBObject types required by defining them directly, or use the builders supplied in the Spring Mongo library, which should all support "extracting" the DBObject that they build.
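For illustration, a sketch of the first operation built with Spring's Query and Update builders and then unwrapped for the native bulk API (this assumes the older Spring Data API, where getQueryObject() and getUpdateObject() return DBObject; newer versions return Document instead, so the types would differ):
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

// Build the filter and update with Spring's builders, then extract the raw objects
// to feed them to the same BulkWriteOperation as above. Sketch only.
Query statusQuery = new Query(Criteria.where("status").is("D"));
Update statusUpdate = new Update().set("status", "I").set("points", "0");

bulk.find(statusQuery.getQueryObject())
    .update(statusUpdate.getUpdateObject());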
// UNDERSCORE_ID is a constant holding the literal "_id"
public <T> void bulkUpdate(String collectionName, List<T> documents, Class<T> tClass) {
    BulkOperations bulkOps = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, tClass, collectionName);
    for (T document : documents) {
        Document doc = new Document();
        mongoTemplate.getConverter().write(document, doc);
        Query query = new Query(Criteria.where(UNDERSCORE_ID).is(doc.get(UNDERSCORE_ID)));
        Document updateDoc = new Document();
        updateDoc.append("$set", doc);
        Update update = Update.fromDocument(updateDoc, UNDERSCORE_ID);
        bulkOps.upsert(query, update);
    }
    bulkOps.execute();
}
The Spring Mongo template is used to perform the update. The above code will work as long as each document in the list provides the _id field.
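A short usage sketch, assuming a hypothetical History POJO whose id property maps to _id and a collection named "history" (both are illustrative assumptions, not part of the original answer):
// Hypothetical caller; History, loadHistory() and the "history" collection name
// are assumptions for illustration only.
List<History> historyList = loadHistory();
bulkUpdate("history", historyList, History.class);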