Aggregation Performance Degradation on increasing load - java

I am running a 3-node MongoDB cluster (version 3.0, WiredTiger storage engine) with 10 GB RAM.
I have around 2 million documents, each with 25-30 fields, of which 2 are arrays of simple values.
I am running an aggregation query which takes around 150-170 milliseconds.
When I generate a load of 100 queries/sec, performance starts degrading and the query time reaches up to 2 seconds.
Query:
db.testCollection.aggregate( [
{ $match: { vid: { $in: ["001","002"]} , ss :"N" , spt : { $gte : new Date("2016-06-29")}, spf :{ $lte : new Date("2016-06-27")}}},
{ $match: {$or : [{sc:{$elemMatch :{$eq : "TEST"}}},{sc :{$exists : false}}]}},
{ $match: {$or : [{pt:{$ne : "RATE"}},{rpis :{$exists : true}}]}},
{ $project: { vid: 1, pid: 1, pn: 1, pt: 1, spf: 1, spt: 1, bw: 1, bwe: 1, st: 1, et: 1, ls: 1, dw: 1, at: 1, dt: 1, d1: 1, d2: 1, mldv: 1, aog: 1, nn: 1, mn: 1, rpis: 1, lmp: 1, cid: 1, van: 1, vad: 1, efo: 1, sc: 1, ss: 1, m: 1, pr: 1, obw: 1, osc: 1, m2: 1, crp: 1, sce: 1, dce: 1, cns: 1 }},
{ $group: { _id: null , data: { $push: "$$ROOT" } }
},
{ $project: { _id: 1 , data : 1 } }
]
)
There is a compound index on all the fields, in the same order as used in the query (except "rpis", since a compound index can have only one array field).
Please suggest where I am going wrong.

The last two stages are unnecessary.
The final $group is very heavy because it builds a new array in memory; the application should consume the result at this stage instead (without $group).
It may also be worth removing the preceding $project, since pushing the full documents down to the client could be cheaper. This is worth a try.
The first $match can use the index, but there is a real risk that the 2nd and 3rd $match stages work on the result set of the first stage instead of using the indexes you created. If you can, try to combine the $match stages into a single one and see how the query performs.
A simplified version of the query is below:
db.testCollection.aggregate([{
    $match : {
        vid : { $in : ["001", "002"] },
        ss : "N",
        spt : { $gte : new Date("2016-06-29") },
        spf : { $lte : new Date("2016-06-27") }
    }
}, {
    $match : {
        $or : [
            { sc : { $elemMatch : { $eq : "TEST" } } },
            { sc : { $exists : false } }
        ]
    }
}, {
    $match : {
        $or : [
            { pt : { $ne : "RATE" } },
            { rpis : { $exists : true } }
        ]
    }
}])
Another issue could be business rules that affect scaling the system to a sharded environment. Did you have an estimate of the load before you started working with this document structure?
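The advice above to merge the three $match stages into one could be sketched like this. It is a minimal sketch, not a tuned query: the two $or conditions are wrapped in $and because a single filter document cannot hold two top-level $or keys, and the variable name pipeline is only illustrative. The $project and $group stages are left out, as suggested, so the application reads documents straight from the cursor. Whether the planner uses the compound index for the whole filter is an assumption to verify with explain.

```javascript
// Single $match stage combining the three original filters.
const pipeline = [
  {
    $match: {
      vid: { $in: ["001", "002"] },
      ss: "N",
      spt: { $gte: new Date("2016-06-29") },
      spf: { $lte: new Date("2016-06-27") },
      // Two $or conditions cannot share one key, so combine them with $and.
      $and: [
        { $or: [{ sc: { $elemMatch: { $eq: "TEST" } } }, { sc: { $exists: false } }] },
        { $or: [{ pt: { $ne: "RATE" } }, { rpis: { $exists: true } }] }
      ]
    }
  }
];

// In the mongo shell: db.testCollection.aggregate(pipeline)
```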

Related

How to update different sub documents based on string in mongo

The following is the document I'm trying to update:
{
"_id" : "12",
"cm_AccAmt" : 30,
"cmPerDaySts" : [
{
"cm_accAmt" : 30,
"cm_accTxnCount" : 2,
"cm_cpnCount" : 2,
"cm_accDate" : "2018-02-12"
},
{
"cm_accAmt" : 15,
"cm_accTxnCount" : 1,
"cm_cpnCount" : 1,
"cm_accDate" : "2018-02-13"
}
],
"cpnPerDaySts" : {
"cpnFile" : "path",
"perDayAcc" : [
{
"cm_accAmt" : 0,
"cm_accTxnCount" : 0,
"cm_cpnCount" : 0,
"cm_accDate" : "2018-02-12"
},
{
"cm_accAmt" : 0,
"cm_accTxnCount" : 0,
"cm_cpnCount" : 0,
"cm_accDate" : "2018-02-13"
}
]
}
}
I want to update the two lists cmPerDaySts and cpnPerDaySts based on the string date field cm_accDate, if a match is available.
The code I've tried so far to achieve this is:
ArrayList<BasicDBObject> filter = new ArrayList<>();
filter.add(new BasicDBObject("_id", "12").append("cmPerDaySts.cm_accDate", "2018-02-12"));
filter.add(new BasicDBObject("_id", "12").append("cpnPerDaySts.perDayAcc.cm_accDate", "2018-02-12"));
Document document2 = mongoCollection.findOneAndUpdate(new BasicDBObject("$or", filter),
new BasicDBObject("$inc",
new BasicDBObject("cmPerDaySts.$.cm_accAmt", 15).append("cm_AccAmt", 15).append("cmPerDaySts.$.cm_accTxnCount", 1)
.append("cmPerDaySts.$.cm_cpnCount", 1).append("cpnPerDaySts.perDayAcc.cm_accTxnCount", 1)),
new FindOneAndUpdateOptions().upsert(false).returnDocument(ReturnDocument.AFTER));
System.out.println(document2.toJson());
But it fails with the exception below:
Exception in thread "main" com.mongodb.MongoCommandException: Command failed with error 16837: 'The positional operator did not find the match needed from the query. Unexpanded update:
I want to achieve this in a single update query, not several. Can anyone point me in the right direction or suggest an approach to solve this?

How to get sub-document record in mongo shell

I have the following document:
"Demo" : {
"SI" : {
"Value1" : 40,
"Value2" : [
10,
15,
20
]
} ,
"RS" : {
"Value1" : 4,
"Value2" : [
1,
2,
3,
4
]
}
}
I want to fetch the data for the sub-document 'SI'. I tried the following query:
db.getCollection('input').find({"Demo.SI":"SI"})
but it does not return any record for the 'SI' document. The desired output is:
"SI" : {
"Value1" : 40,
"Value2" : [
10,
15,
20
]
}
Please specify where the query goes wrong.
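The original query matches only documents in which the value of Demo.SI equals the string "SI". Instead, first check whether SI exists using $exists, and then add it to the projection as below: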
db.input.find({"Demo.SI":{"$exists":true}},{"Demo.SI":1,"_id":0}).pretty()
db.collection.find({ "Demo.SI": { $exists: true, $ne: null } },{"Demo.SI":1,"_id":0})
This query will return all the documents which have the SI key.

java mongodb - get array length without downloading all data

I'm using MongoDB to store data for my Java program, and I have a collection with an array field that holds a lot of items, but I only want to get its length, without all the other data.
Right now I'm using this to get it:
((UUID[])document.get("customers")).length
How can I avoid downloading the whole array?
A possible answer is to keep an int that counts the pushes to and pulls from the array, but that is not the cleanest method.
You are looking for the aggregation framework, where you can use the $size operator in your pipeline. It counts and returns the total number of items in an array:
db.collection.aggregate([
    {
        "$project": {
            "_id": 0, "customer_count": { "$size": "$customers" }
        }
    }
]);
The Java equivalent:
DBObject projectFields = new BasicDBObject("_id", 0);
projectFields.put("customer_count", new BasicDBObject( "$size", "$customers" ));
DBObject project = new BasicDBObject("$project", projectFields);
AggregationOutput output = db.getCollection("collectionName").aggregate(project);
System.out.println("\n" + output);
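One caveat with the $size projection above: $size raises an error when its argument is not an array, for example when customers is missing from a document. A minimal sketch of a guarded version follows. The $match on _id and the variable customerId are assumptions added for illustration, on the guess that only one document's count is needed.

```javascript
// Hypothetical id of the document whose array length is wanted.
const customerId = "some-id";

const pipeline = [
  // Restrict to one document so the server does not project every document.
  { $match: { _id: customerId } },
  {
    $project: {
      _id: 0,
      // $ifNull substitutes an empty array when "customers" is missing or null.
      customer_count: { $size: { $ifNull: ["$customers", []] } }
    }
  }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```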
You can use MongoDB's Aggregation Framework to get the size of the array. For example, given the following document structure:
> db.macross.findOne()
{
"_id" : "SDF1",
"crew" : [
"Rick",
"Minmay",
"Roy",
"Max",
"Misa",
"Milia"
]
}
Get the size of the array:
> db.macross.aggregate(
{ $match: { _id: "SDF1" } },
{ $unwind: "$crew" },
{ $group: { _id: "", count: { $sum: 1 } } },
{ $project: { _id: 0, count: 1 } }
)
{ "count" : 6 }
More detailed and interesting examples are available in the docs.

How to return just the matched elements from a mongoDB array

I've been looking into this for a week and I can't understand why it still doesn't work...
I have this object in my MongoDB database:
{
produc: [
{
cod_prod: "0001",
description: "Ordenador",
price: 400,
current_stock: 3,
min_stock: 1,
cod_zone: "08850"
},
{
cod_prod: "0002",
description: "Secador",
price: 30,
current_stock: 10,
min_stock: 2,
cod_zone: "08870"
},
{
cod_prod: "0003",
description: "Portatil",
price: 500,
current_stock: 8,
min_stock: 4,
cod_zone: "08860"
},
{
cod_prod: "0004",
description: "Disco Duro",
price: 100,
current_stock: 20,
min_stock: 5,
cod_zone: "08850"
},
{
cod_prod: "0005",
description: "Monitor",
price: 150,
current_stock: 0,
min_stock: 2,
cod_zone: "08850"
}
]
}
I would like to query for the array elements with a specific cod_zone ("08850", for example).
I found the $elemMatch projection, which supposedly returns just the array elements that match the query, but I don't know why I'm getting the whole object.
This is the query I'm using:
db['Collection_Name'].find(
{
produc: {
$elemMatch: {
cod_zone: "08850"
}
}
}
);
And this is the result I expect:
{ produc: [
    {
        cod_prod: "0001",
        description: "Ordenador",
        price: 400,
        current_stock: 3,
        min_stock: 1,
        cod_zone: "08850"
    },
    {
        cod_prod: "0004",
        description: "Disco Duro",
        price: 100,
        current_stock: 20,
        min_stock: 5,
        cod_zone: "08850"
    },
    {
        cod_prod: "0005",
        description: "Monitor",
        price: 150,
        current_stock: 0,
        min_stock: 2,
        cod_zone: "08850"
    }]
}
I'm writing a Java program using the MongoDB Java Connector, so I really need a query for the Java connector, but I think I will be able to work it out if I know the mongo query.
Thank you so much!
This is possible through the aggregation framework. The pipeline passes all documents in the collection through the following operations:
The $unwind operator deconstructs the produc array field and outputs one document per element.
The $match operator keeps only the documents that match the cod_zone criteria.
The $group operator groups the input documents by the specified identifier expression and applies the accumulator expression $push to each group.
The $project operator then reshapes each document in the stream:
db.collection.aggregate([
    {
        "$unwind": "$produc"
    },
    {
        "$match": {
            "produc.cod_zone": "08850"
        }
    },
    {
        "$group": {
            "_id": null,
            "produc": {
                "$push": {
                    "cod_prod": "$produc.cod_prod",
                    "description": "$produc.description",
                    "price" : "$produc.price",
                    "current_stock" : "$produc.current_stock",
                    "min_stock" : "$produc.min_stock",
                    "cod_zone" : "$produc.cod_zone"
                }
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "produc": 1
        }
    }
])
This will produce:
{
"result" : [
{
"produc" : [
{
"cod_prod" : "0001",
"description" : "Ordenador",
"price" : 400,
"current_stock" : 3,
"min_stock" : 1,
"cod_zone" : "08850"
},
{
"cod_prod" : "0004",
"description" : "Disco Duro",
"price" : 100,
"current_stock" : 20,
"min_stock" : 5,
"cod_zone" : "08850"
},
{
"cod_prod" : "0005",
"description" : "Monitor",
"price" : 150,
"current_stock" : 0,
"min_stock" : 2,
"cod_zone" : "08850"
}
]
}
],
"ok" : 1
}
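Note that grouping with _id: null merges the matching elements of all documents into a single result. On MongoDB 3.2 or later, a $filter expression inside $project is an alternative that keeps one output document per input document and avoids the $unwind and $group steps. The following is a sketch under that version assumption; the variable names zone and pipeline are illustrative.

```javascript
const zone = "08850";

const pipeline = [
  // Keep only documents that contain at least one element in the zone.
  { $match: { "produc.cod_zone": zone } },
  {
    $project: {
      _id: 0,
      // $filter returns just the array elements for which cond is true.
      produc: {
        $filter: {
          input: "$produc",
          as: "p",
          cond: { $eq: ["$$p.cod_zone", zone] }
        }
      }
    }
  }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```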

Get documents and their total count in a single query, including pagination

I'm new to Mongo and use the MongoDB aggregation framework for my queries. I need to retrieve some records which satisfy certain conditions (including pagination and sorting) and also get the total count of records.
Currently, I perform the following steps:
Create the $match operator:
{ "$match" : { "year" : "2012" , "author.authorName" : { "$regex" : "au" , "$options" : "i"}}}
Add sorting and pagination:
{ "$sort" : { "some_field" : -1}} , { "$limit" : 10} , { "$skip" : 0}
After querying I receive the expected result: 10 documents with all fields.
For pagination I need to know the total count of records which satisfy these conditions, in my case 25.
I use the following query to get the count:
{ "$match" : { "year" : "2012" , "author.authorName" : { "$regex" : "au" , "$options" : "i"}}} , { "$group" : { "_id" : "$all" , "reviewsCount" : { "$sum" : 1}}} , { "$sort" : { "some_field" : -1}} , { "$limit" : 10} , { "$skip" : 0}
But I don't want to perform two separate queries: one to retrieve the documents and a second to get the total count of records which satisfy the conditions.
I want to do it in a single query and get the result in the following format:
{
"result" : [
{
"my_documets": [
{
"_id" : ObjectId("512f1f47a411dc06281d98c0"),
"author" : {
"authorName" : "author name1",
"email" : "email1#email.com"
}
},
{
"_id" : ObjectId("512f1f47a411dc06281d98c0"),
"author" : {
"authorName" : "author name2",
"email" : "email2#email.com"
}
}, .......
],
"total" : 25
}
],
"ok" : 1
}
I tried modifying the group operator:
{ "$group" : { "_id" : "$all" , "author" : "$author" , "reviewsCount" : { "$sum" : 1}}}
But in this case I got: "exception: the group aggregate field 'author' must be defined as an expression inside an object". If I add all fields to _id, then reviewsCount is always 1 because all records are different.
Does anybody know how this can be implemented in a single query? Maybe MongoDB has some features or operators for this case? Using two separate queries reduces performance when querying thousands or millions of records. In my application this is a critical performance issue.
I've been working on this all day and haven't been able to find a solution, so I thought I'd turn to the Stack Overflow community.
Thanks.
You can try using $facet in the aggregation pipeline, as follows:
db.name.aggregate([
    { $match: { your match criteria } },
    { $facet: {
        data: [{ $sort: sort }, { $skip: skip }, { $limit: limit }],
        count: [{ $group: { _id: null, count: { $sum: 1 } } }]
    }}
])
In data you'll get your paginated list, and in count, the count field will hold the total number of matched documents.
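To make the $facet outline concrete, here is a minimal sketch of a helper that builds such a pipeline. $facet requires MongoDB 3.4 or later. The function name buildPagedPipeline and its parameters are hypothetical; the match criteria in the usage lines are the ones from the question.

```javascript
// Builds a pipeline that returns one page of documents plus the total match count.
function buildPagedPipeline(match, sort, skip, limit) {
  return [
    { $match: match },
    {
      $facet: {
        // Page of documents: sort first, then skip, then limit.
        data: [{ $sort: sort }, { $skip: skip }, { $limit: limit }],
        // Total number of documents that passed $match.
        count: [{ $group: { _id: null, count: { $sum: 1 } } }]
      }
    }
  ];
}

const pipeline = buildPagedPipeline(
  { year: "2012", "author.authorName": { $regex: "au", $options: "i" } },
  { some_field: -1 },
  0,
  10
);

// In the mongo shell: db.name.aggregate(pipeline)
```

The result is a single document with a data array and a count array. The count array is empty when nothing matches, so the application should treat that case as a total of 0.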
OK, I have one example, but I think it's a really crazy query. I'm posting it only for fun, but if this example is faster than 2 queries, please tell us about it in the comments.
For this question I created a collection called "so" and put 25 documents like this into it:
{
"_id" : ObjectId("512fa86cd99d0adda2a744cd"),
"authorName" : "author name1",
"email" : "email1#email.com",
"c" : 1
}
My query uses the aggregation framework:
db.so.aggregate([
{ $group:
{
_id: 1,
collection: { $push : { "_id": "$_id", "authorName": "$authorName", "email": "$email", "c": "$c" } },
count: { $sum: 1 }
}
},
{ $unwind:
"$collection"
},
{ $project:
{ "_id": "$collection._id", "authorName": "$collection.authorName", "email": "$collection.email", "c": "$collection.c", "count": "$count" }
},
{ $match:
{ c: { $lte: 10 } }
},
{ $sort :
{ c: -1 }
},
{ $skip:
2
},
{ $limit:
3
},
{ $group:
{
_id: "$count",
my_documets: {
$push: {"_id": "$_id", "authorName":"$authorName", "email":"$email", "c":"$c" }
}
}
},
{ $project:
{ "_id": 0, "my_documets": "$my_documets", "total": "$_id" }
}
])
Result of this query:
{
"result" : [
{
"my_documets" : [
{
"_id" : ObjectId("512fa900d99d0adda2a744d4"),
"authorName" : "author name8",
"email" : "email8#email.com",
"c" : 8
},
{
"_id" : ObjectId("512fa900d99d0adda2a744d3"),
"authorName" : "author name7",
"email" : "email7#email.com",
"c" : 7
},
{
"_id" : ObjectId("512fa900d99d0adda2a744d2"),
"authorName" : "author name6",
"email" : "email6#email.com",
"c" : 6
}
],
"total" : 25
}
],
"ok" : 1
}
In the end, I think that for a big collection 2 queries (the first for data, the second for the count) work faster. For example, you can count the total for the collection like this:
db.so.count()
or like this:
db.so.find({},{_id:1}).sort({_id:-1}).count()
I'm not fully sure about the first example, but in the second example only the index is used (see indexOnly in the explain output), which means higher speed:
db.so.find({},{_id:1}).sort({_id:-1}).explain()
{
"cursor" : "BtreeCursor _id_ reverse",
"isMultiKey" : false,
"n" : 25,
"nscannedObjects" : 25,
"nscanned" : 25,
"nscannedObjectsAllPlans" : 25,
"nscannedAllPlans" : 25,
"scanAndOrder" : false,
!!!!!>>> "indexOnly" : true, <<<!!!!!
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
...
}
For completeness (the full discussion was on the MongoDB Google Groups), here is the aggregation you want:
db.docs.aggregate( [
{
"$match" : {
"year" : "2012"
}
},
{
"$group" : {
"_id" : null,
"my_documents" : {
"$push" : {
"_id" : "$_id",
"year" : "$year",
"author" : "$author"
}
},
"reviewsCount" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"my_documents" : 1,
"total" : "$reviewsCount"
}
}
] )
By the way, you don't need the aggregation framework here; you can just use a regular find. You can get count() from a cursor without having to re-query.
