I'm new to MongoDB and use the aggregation framework for my queries. I need to retrieve the records that satisfy certain conditions (with pagination and sorting) and also get the total count of matching records.
Currently I do the following:
Create the $match stage:
{ "$match" : { "year" : "2012" , "author.authorName" : { "$regex" :
"au" , "$options" : "i"}}}
Add sorting and pagination:
{ "$sort" : { "some_field" : -1}} , { "$limit" : 10} , { "$skip" : 0}
After querying I receive the expected result: 10 documents with all fields.
For pagination I need to know the total count of records which satisfy these conditions, in my case 25.
I use the following query to get the count: { "$match" : { "year" : "2012" , "author.authorName" : { "$regex" : "au" , "$options" : "i"}}} , { "$group" : { "_id" : "$all" , "reviewsCount" : { "$sum" : 1}}} , { "$sort" : { "some_field" : -1}} , { "$limit" : 10} , { "$skip" : 0}
But I don't want to run two separate queries: one to retrieve the documents and a second to get the total count of matching records.
I want to do it in a single query and get the result in the following format:
{
"result" : [
{
"my_documets": [
{
"_id" : ObjectId("512f1f47a411dc06281d98c0"),
"author" : {
"authorName" : "author name1",
"email" : "email1#email.com"
}
},
{
"_id" : ObjectId("512f1f47a411dc06281d98c0"),
"author" : {
"authorName" : "author name2",
"email" : "email2#email.com"
}
}, .......
],
"total" : 25
}
],
"ok" : 1
}
I tried modifying the $group stage: { "$group" : { "_id" : "$all" , "author" : "$author" , "reviewsCount" : { "$sum" : 1}}}
But in this case I got: "exception: the group aggregate field 'author' must be defined as an expression inside an object". If I add all the fields to _id, then reviewsCount is always 1, because all the records are different.
Does anybody know how this can be implemented in a single query? Maybe MongoDB has some feature or operator for this case? Running two separate queries hurts performance when querying thousands or millions of records, and performance is critical in my application.
I've been working on this all day and haven't been able to find a solution, so I thought I'd turn to the Stack Overflow community.
Thanks.
You can use $facet (available since MongoDB 3.4) in the aggregation pipeline:
db.name.aggregate([
{$match:{your match criteria}},
{$facet: {
data: [{$sort: sort},{$skip:skip},{$limit: limit}],
count:[{$group: {_id: null, count: {$sum: 1}}}]
}}
])
In data you'll get the paginated list, and count will hold a single document whose count field is the total number of matched documents.
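As a minimal sketch of how the $facet approach could be wrapped for reuse, the plain JavaScript below builds the pipeline and reshapes the single result document. The helper names buildPagedPipeline and unpackPage, and the output field names documents and total, are assumptions made for illustration; they are not part of MongoDB.

```javascript
// Hypothetical helper: builds one pipeline that returns a page and the total count.
function buildPagedPipeline(match, sort, skip, limit) {
  return [
    { $match: match },
    {
      $facet: {
        // $skip goes before $limit; the other order returns short or empty pages
        data: [{ $sort: sort }, { $skip: skip }, { $limit: limit }],
        // counts every document that passed $match, independent of paging
        count: [{ $group: { _id: null, count: { $sum: 1 } } }]
      }
    }
  ];
}

// Hypothetical helper: reshapes the single document that $facet produces.
// When nothing matches, the count facet is an empty array, so default to 0.
function unpackPage(facetDoc) {
  const total = facetDoc.count.length > 0 ? facetDoc.count[0].count : 0;
  return { documents: facetDoc.data, total: total };
}
```

In the shell this might be used as db.name.aggregate(buildPagedPipeline({ year: "2012" }, { some_field: -1 }, 0, 10)), passing the first result document to unpackPage.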
OK, I have one example, but I think it's a really crazy query. I'm posting it only for fun, but if it turns out to be faster than two queries, please tell us about it in the comments.
For this question I created a collection called "so" and put 25 documents like this into it:
{
"_id" : ObjectId("512fa86cd99d0adda2a744cd"),
"authorName" : "author name1",
"email" : "email1#email.com",
"c" : 1
}
My query uses the aggregation framework:
db.so.aggregate([
{ $group:
{
_id: 1,
collection: { $push : { "_id": "$_id", "authorName": "$authorName", "email": "$email", "c": "$c" } },
count: { $sum: 1 }
}
},
{ $unwind:
"$collection"
},
{ $project:
{ "_id": "$collection._id", "authorName": "$collection.authorName", "email": "$collection.email", "c": "$collection.c", "count": "$count" }
},
{ $match:
{ c: { $lte: 10 } }
},
{ $sort :
{ c: -1 }
},
{ $skip:
2
},
{ $limit:
3
},
{ $group:
{
_id: "$count",
my_documets: {
$push: {"_id": "$_id", "authorName":"$authorName", "email":"$email", "c":"$c" }
}
}
},
{ $project:
{ "_id": 0, "my_documets": "$my_documets", "total": "$_id" }
}
])
Result for this query:
{
"result" : [
{
"my_documets" : [
{
"_id" : ObjectId("512fa900d99d0adda2a744d4"),
"authorName" : "author name8",
"email" : "email8#email.com",
"c" : 8
},
{
"_id" : ObjectId("512fa900d99d0adda2a744d3"),
"authorName" : "author name7",
"email" : "email7#email.com",
"c" : 7
},
{
"_id" : ObjectId("512fa900d99d0adda2a744d2"),
"authorName" : "author name6",
"email" : "email6#email.com",
"c" : 6
}
],
"total" : 25
}
],
"ok" : 1
}
In the end, I think that for a big collection two queries (the first for the data, the second for the count) will be faster. For example, you can count the whole collection like this:
db.so.count()
or like this:
db.so.find({},{_id:1}).sort({_id:-1}).count()
I'm not completely sure about the first example, but the second one is served from the _id index alone (indexOnly is true in the explain output), which means higher speed:
db.so.find({},{_id:1}).sort({_id:-1}).explain()
{
"cursor" : "BtreeCursor _id_ reverse",
"isMultiKey" : false,
"n" : 25,
"nscannedObjects" : 25,
"nscanned" : 25,
"nscannedObjectsAllPlans" : 25,
"nscannedAllPlans" : 25,
"scanAndOrder" : false,
!!!!!>>> "indexOnly" : true, <<<!!!!!
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
...
}
For completeness (full discussion was on the MongoDB Google Groups) here is the aggregation you want:
db.docs.aggregate( [
{
"$match" : {
"year" : "2012"
}
},
{
"$group" : {
"_id" : null,
"my_documents" : {
"$push" : {
"_id" : "$_id",
"year" : "$year",
"author" : "$author"
}
},
"reviewsCount" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"my_documents" : 1,
"total" : "$reviewsCount"
}
}
] )
By the way, you don't need the aggregation framework here; a regular find is enough. You can call count() on the cursor without having to re-query, and by default it ignores skip() and limit(), so it returns the total number of matching documents.
MongoDB 4.2.15
I'm trying to use MongoDB updates with an aggregation pipeline.
My request is very big, but here is its core structure:
db.runCommand({
"update": "collectionName",
"updates": [
{
"q": { ... },
"u": [
{ "$project": { ... } },
{ "$set": { ... } },
{ "$set": { ... } },
{ "$project": { ... } }
],
"multi": false,
"upsert": true
}
]
});
After the first execution I receive a result containing the newly created object's _id:
{
"n" : 1,
"nModified" : 0,
"upserted" : [
{
"index" : 0,
"_id" : ObjectId("619997f11501d6eb40c6f64a")
}
],
"opTime" : {
"ts" : Timestamp(1637455857, 61),
"t" : NumberLong(5)
},
"electionId" : ObjectId("7fffffff0000000000000005"),
"ok" : 1.0,
"$clusterTime" : {
"clusterTime" : Timestamp(1637455857, 61),
"signature" : {
"hash" : { "$binary" : "hA0tf5DXMqTNmnVXdMVnpnAKCU0=", "$type" : "00" },
"keyId" : NumberLong(7018546168816730116)
}
},
"operationTime" : Timestamp(1637455857, 61)
}
After the second execution of the same request, the result does not contain the modified object's _id:
{
"n" : 1,
"nModified" : 1,
"opTime" : {
"ts" : Timestamp(1637456057, 19),
"t" : NumberLong(5)
},
"electionId" : ObjectId("7fffffff0000000000000005"),
"ok" : 1.0,
"$clusterTime" : {
"clusterTime" : Timestamp(1637456057, 19),
"signature" : {
"hash" : { "$binary" : "U2yCP6nXUrjBN9ZiLanyl0rgxww=", "$type" : "00" },
"keyId" : NumberLong(7018546168816730116)
}
},
"operationTime" : Timestamp(1637456057, 19)
}
The problem is that my filter conditions do not contain the object's _id, but I need to return it with the response. I see no request option that would help. Is it possible to get the _id in the response when an existing document is updated?
I have a list of objects that are given somewhat arbitrary object keys as a result of using the async Java driver + BSON.
My issue is that jobStatuses is an arbitrary dictionary whose keys I don't know, so I have no idea how to access its sub-values. In the end, I'm trying to build a query that tells me whether ANY of jobStatus.*._id matches one of a given list of ObjectIds.
So I'd pass in a list of IDs and want true back if ANY of the items in jobStatuses has one of those IDs. Any ideas?
Let's try this:
db.yourCollectionName.aggregate([
{
$project: {
_id: 0,
jobStatutses: { $arrayElemAt: [{ $objectToArray: "$jobStatutses" }, 0] }
}
}, {
$match: { 'jobStatutses.v._id': { $in: [ObjectId("5d6d8c3a5a0d22d3c84dd6dc"), ObjectId("5d6d8c3a5a0d22d3c84dd6ed")] } }
}
])
Collection Data :
/* 1 */
{
"_id" : ObjectId("5e06319c400289966eea6a07"),
"jobStatutses" : {
"5d6d8c3a5a0d22d3c84dd6dc" : {
"_id" : ObjectId("5d6d8c3a5a0d22d3c84dd6dc"),
"accepted" : "123",
"completed" : 0
}
},
"something" : 1
}
/* 2 */
{
"_id" : ObjectId("5e0631ad400289966eea6dd1"),
"jobStatutses" : {
"5d6d8c3a5a0d22d3c84dd6ed" : {
"_id" : ObjectId("5d6d8c3a5a0d22d3c84dd6ed"),
"accepted" : "456",
"completed" : 0
}
},
"something" : 2
}
/* 3 */
{
"_id" : ObjectId("5e0631cd400289966eea7542"),
"jobStatutses" : {
"5e06319c400289966eea6a07" : {
"_id" : ObjectId("5e06319c400289966eea6a07"),
"accepted" : "789",
"completed" : 0
}
},
"something" : 3
}
Output :
/* 1 */
{
"jobStatutses" : {
"k" : "5d6d8c3a5a0d22d3c84dd6dc",
"v" : {
"_id" : ObjectId("5d6d8c3a5a0d22d3c84dd6dc"),
"accepted" : "123",
"completed" : 0
}
}
}
/* 2 */
{
"jobStatutses" : {
"k" : "5d6d8c3a5a0d22d3c84dd6ed",
"v" : {
"_id" : ObjectId("5d6d8c3a5a0d22d3c84dd6ed"),
"accepted" : "456",
"completed" : 0
}
}
}
All you need to know is whether at least one document is returned for the given list, so the document structure doesn't matter; just check result.length in your code to see whether anything matched the input list. Note that $arrayElemAt with index 0 only inspects the first key of jobStatutses.
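If a document can hold several keys under jobStatutses, a variant that keeps every key/value pair instead of only the first might look like the sketch below. The helper name buildAnyJobStatusPipeline and the intermediate field name entries are assumptions for illustration; the sketch also assumes jobStatutses is either an object or absent, since $objectToArray rejects other types.

```javascript
// Hypothetical helper: builds a pipeline that matches a document when ANY
// value under jobStatutses has an _id contained in the given list.
function buildAnyJobStatusPipeline(ids) {
  return [
    // turn { key1: {...}, key2: {...} } into [ { k: key1, v: {...} }, ... ]
    { $project: { _id: 0, entries: { $objectToArray: "$jobStatutses" } } },
    // dot notation on an array of sub-documents matches if any element matches
    { $match: { "entries.v._id": { $in: ids } } },
    // one hit is enough to answer "does any document match?"
    { $limit: 1 }
  ];
}
```

In the shell, ids would be an array of ObjectId values, and a non-empty result from db.yourCollectionName.aggregate(buildAnyJobStatusPipeline(ids)) would mean at least one match.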
I have a MongoDB collection of places. A typical place has most of the following fields:
{
"_id" : ObjectId("575014dc6b028f07bef53681"),
"_class" : "domain.model.PlaceV1",
"name" : "Γιασεμί",
"primary_photo_url" : "https://irs0.4sqi.net/img/general/original/34666238_STHSh6CHiC7hpAuB4rztRVg6cFc5ylfi15aRaR7zUuQ.jpg",
"seenDetails" : NumberLong(0),
"foursquare_checkins" : 646,
"foursquare_tips" : 28,
"keywords" : [
""
],
"verified" : 1,
"location" : {
"loc" : {
"type" : "Point",
"coordinates" : [
25.898318,
36.831486
]
},
"formattedAddress" : "Χώρα",
"locality" : "Amorgos",
"first_neighbourhood" : "Katapola",
"greek_locality" : "Αμοργός",
"greek_first_neighbourhood" : "Κατάπολα"
},
"contact" : {
"phone_numbers" : [
"+30 2285 074017"
]
},
"price" : {
"priceVotes" : NumberLong(0),
"price" : 0,
"priceVotesSum" : NumberLong(0)
},
"rating" : {
"rating" : 8,
"ratingVotes" : NumberLong(0),
"ratingVotesSum" : NumberLong(0)
},
"categories" : [
{
"cat_id" : NumberLong(10310061000),
"category" : "Café",
"greek_category" : "Καφετέρια",
"weight" : 4
},
{
"cat_id" : NumberLong(11610021000),
"category" : "Bar",
"greek_category" : "Μπαρ",
"weight" : 4
}
]
}
I want to make queries where the sorting will be based on a score that is a result of some expressions and conditions. From the mongo shell I have tried this:
db.place.aggregate([
{$match:{"location.locality":"Athens"}},
{$project:
{name:1, location:1, score:{
$let: {
vars:{ foursquare: {
$cond: { if: { $gte: [ "$foursquare_checkins", 500 ] }, then: 500, else: "$foursquare_checkins" }
},
rating: {$multiply:["$rating.rating", 100]},
},
in:{$add:["$$foursquare", "$$rating", "$seenDetails"]}
}
}
}
},
{$sort: {score: -1}}]).pretty();
This is a simple example of my queries. The score will contain more complex expressions like the distance from a location. The problem is that I cannot find a way to use the $let and the $cond operator in my Java code with Spring. Could anybody help?
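To make the intended score concrete, the $let/$cond expression from the shell query above can be mirrored in plain JavaScript. This is only a sketch of the arithmetic; the function name placeScore is an assumption, and unlike the aggregation it does not handle missing fields.

```javascript
// Hypothetical helper mirroring the $project score expression:
//   foursquare = checkins capped at 500 (the $cond branch)
//   rating     = rating.rating * 100    (the $multiply)
//   score      = foursquare + rating + seenDetails (the $add)
function placeScore(place) {
  const foursquare = Math.min(place.foursquare_checkins, 500);
  const rating = place.rating.rating * 100;
  return foursquare + rating + place.seenDetails;
}

// The sample place above: 646 check-ins, rating 8, seenDetails 0
const sampleScore = placeScore({
  foursquare_checkins: 646,
  rating: { rating: 8 },
  seenDetails: 0
});
// sampleScore is 500 + 800 + 0 = 1300
```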
You should be able to do this using nested DBObjects and a custom aggregation operation.
For example:
Map<String, Object> operations = new HashMap<>();
operations.put("name", 1);
operations.put("location", 1);
operations.put("score", new BasicDBObject("$let", new BasicDBObject("vars", new BasicDBObject())));
Then you can create a CustomAggregationOperation to add this to your pipeline as a $project stage:
CustomAggregationOperation project = new CustomAggregationOperation(new BasicDBObject("$project", operations));
This will give you the following pipeline:
{ "$project" : { "score" : { "$let" : { "vars" : { }}} , "name" : 1 , "location" : 1}}
Then you can add your other stages:
Aggregation aggregate = Aggregation.newAggregation(match, project, sort);
The CustomAggregationOperation class looks like this:
public class CustomAggregationOperation implements AggregationOperation {
private DBObject operation;
public CustomAggregationOperation (DBObject operation) {
this.operation = operation;
}
@Override
public DBObject toDBObject(AggregationOperationContext context) {
return context.getMappedObject(operation);
}
}
I want to get a specific element of the array through responsaveis.$ (daniela.morais#sofist.com.br), but there is no result. Is there a problem with my syntax?
{
"_id" : ObjectId("54fa059ce4b01b3e086c83e9"),
"agencia" : "Abc",
"instancia" : "dentsuaegis",
"cliente" : "Samsung",
"nomeCampanha" : "Serie A",
"ativa" : true,
"responsaveis" : [
"daniela.morais#sofist.com.br",
"abc#sofist.com.br"
],
"email" : "daniela.morais#sofist.com.br"
}
Syntax 1
mongoCollection.findAndModify("{'responsaveis.$' : #}", oldUser.get("email"))
.with("{$set : {'responsaveis.$' : # }}", newUser.get("email"))
.returnNew().as(BasicDBObject.class);
Syntax 2
db.getCollection('validatag_campanhas').find({"responsaveis.$" : "daniela.morais#sofist.com.br"})
Result
Fetched 0 record(s) in 1ms
The $ positional operator is only used in update(...) calls or in projections; you can't use it in a query filter to return the position within an array.
The correct syntax would be:
Syntax 1
mongoCollection.findAndModify("{'responsaveis' : #}", oldUser.get("email"))
.with("{$set : {'responsaveis.$' : # }}", newUser.get("email"))
.returnNew().as(BasicDBObject.class);
Syntax 2
db.getCollection('validatag_campanhas').find({"responsaveis" : "daniela.morais#sofist.com.br"})
If you just want to project the specific element, you can use the positional operator $ in projection as
{"responsaveis.$":1}
db.getCollection('validatag_campanhas').find({"responsaveis" : "daniela.morais#sofist.com.br"},{"responsaveis.$":1})
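As a rough illustration of what the positional update does, the plain JavaScript below mimics the effect of filtering on the array value and then applying $set to 'responsaveis.$'. The function name replaceFirstMatch and the sample values are assumptions for illustration; the real work happens inside MongoDB.

```javascript
// Hypothetical helper: mimics {responsaveis: oldValue} as the filter and
// {$set: {"responsaveis.$": newValue}} as the update.
function replaceFirstMatch(array, oldValue, newValue) {
  // "$" stands for the index of the first element that matched the filter
  const index = array.indexOf(oldValue);
  if (index === -1) {
    return array.slice(); // the filter matched nothing, so nothing changes
  }
  const copy = array.slice();
  copy[index] = newValue; // only that one element is replaced
  return copy;
}

// Illustrative values, not taken from the collection above
const updated = replaceFirstMatch(["old.user", "other.user"], "old.user", "new.user");
// updated is ["new.user", "other.user"]
```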
Try with this
db.validatag_campanhas.aggregate(
{ $unwind : "$responsaveis" },
{
$match : {
"responsaveis": "daniela.morais#sofist.com.br"
}
},
{ $project : { responsaveis: 1, _id:0 }}
);
That gives you one result per array element that meets the condition:
{
"result" : [
{
"responsaveis" : "daniela.morais#sofist.com.br"
}
],
"ok" : 1
}
If you want the whole document whose responsaveis array contains the element "daniela.morais#sofist.com.br", you can drop the $project stage:
db.validatag_campanhas.aggregate(
{ $unwind : "$responsaveis" },
{
$match : {
"responsaveis": "daniela.morais#sofist.com.br"
}
}
);
And that will give you
{
"result" : [
{
"_id" : ObjectId("54fa059ce4b01b3e086c83e9"),
"agencia" : "Abc",
"instancia" : "dentsuaegis",
"cliente" : "Samsung",
"nomeCampanha" : "Serie A",
"ativa" : true,
"responsaveis" : "daniela.morais#sofist.com.br",
"email" : "daniela.morais#sofist.com.br"
}
],
"ok" : 1
}
Hope it helps
I am having trouble using ElasticSearch in my Java application.
To explain: I have a mapping that looks something like this:
{
"products": {
"properties": {
"id": {
"type": "long",
"ignore_malformed": false
},
"locations": {
"properties": {
"category": {
"type": "long",
"ignore_malformed": false
},
"subCategory": {
"type": "long",
"ignore_malformed": false
},
"order": {
"type": "long",
"ignore_malformed": false
}
}
},
...
So, as you can see, I receive a list of products, each of which has locations. In my model, these locations are all of the product's categories, meaning a product can be in one or more categories. In each of these categories the product has an order, which is the position in which the client wants it shown.
For instance, a diamond product can be in first place in Jewelry but in third place in Woman (my examples are not very logical ^^).
So, when I click on Jewelry, I want to show these products ordered by the field locations.order for that specific category.
For the moment, when I search for all the products in a specific category, the response I receive from ElasticSearch is something like:
{"id":5331880,"locations":[{"category":5322606,"order":1},
{"category":5883712,"subCategory":null,"order":3},
{"category":5322605,"subCategory":6032961,"order":2},.......
Is it possible to sort these products by locations.order for the specific category I am searching for? For instance, if I am querying category 5322606, I want order 1 to be used for this product.
Thank you very much beforehand !
Regards,
Olivier.
First a correction of terminology: in Elasticsearch, "parent/child" refers to completely separate docs, where the child doc points to the parent doc. Parent and children are stored on the same shard, but they can be updated independently.
With your example above, what you are trying to achieve can be done with nested docs.
Currently, your locations field is of type:"object". This means that the values in each location get flattened to look something like this:
{
"locations.category": [5322606, 5883712, 5322605],
"locations.subCategory": [6032961],
"locations.order": [1, 3, 2]
}
In other words, the "sub" fields get flattened into multi-value fields, which is of no use to you, because there is no correlation between category: 5322606 and order: 1.
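The loss of correlation can be reproduced outside Elasticsearch. The sketch below is plain JavaScript that flattens the example locations the same way; the function name flattenObjectField is an assumption, and the code only imitates the effect described above, not Elasticsearch's actual indexing code.

```javascript
// Hypothetical helper: flattens an array of sub-objects into multi-value
// fields, the way a type:"object" field ends up looking in the index.
function flattenObjectField(name, items) {
  const flat = {};
  for (const item of items) {
    for (const [key, value] of Object.entries(item)) {
      if (value === null || value === undefined) continue; // nulls add no value
      const path = name + "." + key;
      if (!flat[path]) flat[path] = [];
      flat[path].push(value);
    }
  }
  return flat;
}

const flat = flattenObjectField("locations", [
  { category: 5322606, order: 1 },
  { category: 5883712, subCategory: null, order: 3 },
  { category: 5322605, subCategory: 6032961, order: 2 }
]);

// No single location has category 5322606 AND order 3, yet both values are
// present in the flattened fields, so a combined check wrongly succeeds.
const falseMatch =
  flat["locations.category"].includes(5322606) &&
  flat["locations.order"].includes(3);
```

Here falseMatch ends up true, which is exactly the cross-object match that the nested type avoids.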
However, if you change locations to be type:"nested" then internally it will index each location as a separate doc, meaning that each location can be queried independently, using the dedicated nested query and filter.
By default, the nested query will return a _score based upon how well each location matches, but in your case you want to return the highest value of the order field from any matching children. To do this, you'll need to use a custom_score query.
So let's start by creating the index with the appropriate mapping:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"products" : {
"properties" : {
"locations" : {
"type" : "nested",
"properties" : {
"order" : {
"type" : "long"
},
"subCategory" : {
"type" : "long"
},
"category" : {
"type" : "long"
}
}
},
"id" : {
"type" : "long"
}
}
}
}
}
'
Then we index your example doc:
curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1' -d '
{
"locations" : [
{
"order" : 1,
"category" : 5322606
},
{
"order" : 3,
"subCategory" : null,
"category" : 5883712
},
{
"order" : 2,
"subCategory" : 6032961,
"category" : 5322605
}
],
"id" : 5331880
}
'
And now we can search for it using the queries we discussed above:
curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1' -d '
{
"query" : {
"nested" : {
"query" : {
"custom_score" : {
"script" : "doc[\u0027locations.order\u0027].value",
"query" : {
"constant_score" : {
"filter" : {
"and" : [
{
"term" : {
"category" : 5322605
}
},
{
"term" : {
"subCategory" : 6032961
}
}
]
}
}
}
}
},
"score_mode" : "max",
"path" : "locations"
}
}
}
'
Note: the single quotes within the script have been escaped as \u0027 to get around shell quoting. The script actually looks like this: "doc['locations.order'].value"
If you look at the _score from the results, you can see that it has used the order value from the matching location:
{
"hits" : {
"hits" : [
{
"_source" : {
"locations" : [
{
"order" : 1,
"category" : 5322606
},
{
"order" : 3,
"subCategory" : null,
"category" : 5883712
},
{
"order" : 2,
"subCategory" : 6032961,
"category" : 5322605
}
],
"id" : 5331880
},
"_score" : 2,
"_index" : "test",
"_id" : "cXTFUHlGTKi0hKAgUJFcBw",
"_type" : "products"
}
],
"max_score" : 2,
"total" : 1
},
"timed_out" : false,
"_shards" : {
"failed" : 0,
"successful" : 5,
"total" : 5
},
"took" : 9
}
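To see why the _score comes out as 2, the scoring can be imitated in plain JavaScript: keep only the nested locations that pass the filter, score each one by its order, and take the maximum, as score_mode max does. The function name nestedMaxOrderScore is an assumption, and this is only a sketch of the behaviour, not how Elasticsearch computes it internally.

```javascript
// Hypothetical helper: imitates the nested query with score_mode "max",
// where each matching nested location is scored by its order value.
function nestedMaxOrderScore(product, filter) {
  const scores = product.locations
    // a location matches when every filter field equals the location's value
    .filter(loc => Object.keys(filter).every(key => loc[key] === filter[key]))
    .map(loc => loc.order);
  // null means no nested location matched, so the product is not a hit
  return scores.length > 0 ? Math.max(...scores) : null;
}

const product = {
  id: 5331880,
  locations: [
    { order: 1, category: 5322606 },
    { order: 3, subCategory: null, category: 5883712 },
    { order: 2, subCategory: 6032961, category: 5322605 }
  ]
};

const score = nestedMaxOrderScore(product, { category: 5322605, subCategory: 6032961 });
// score is 2, matching the _score in the response above
```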
Just adding a more up-to-date version for sorting parents by a child field.
We can query the parent doc type sorted by a child field ('count', for example) along the following lines:
https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16