How to execute a complex MongoDB native query from Java Spring Boot

I have a somewhat complex query that creates a view from 3 collections. The query is written at the native (shell) level. I need this query to be executed from Java; is there any way to execute these types of queries from the Java level? Perhaps a function that takes a MongoDB native query as a string and executes it at the database level.
db.createView('TARGET_COLLECTION', 'SOURCE_COLLECTION_1', [
  {
    $facet: {
      SOURCE_COLLECTION_1: [
        { $match: {} },
        { $project: { "sourceId": { $toString: "$_id" }, "name": 1, "image": "$logo" } }
      ],
      SOURCE_COLLECTION_2: [
        { $limit: 1 },
        {
          $lookup: {
            from: 'SOURCE_COLLECTION_2',
            localField: '__unexistingfield',
            foreignField: '__unexistingfield',
            as: '__col2'
          }
        },
        { $unwind: '$__col2' },
        { $replaceRoot: { newRoot: '$__col2' } },
        { $project: { "sourceId": { $toString: "$_id" }, "name": 1, "image": 1 } }
      ],
      SOURCE_COLLECTION_3: [
        { $limit: 1 },
        {
          $lookup: {
            from: 'SOURCE_COLLECTION_3',
            localField: '__unexistingfield',
            foreignField: '__unexistingfield',
            as: '__col2'
          }
        },
        { $unwind: '$__col2' },
        { $replaceRoot: { newRoot: '$__col2' } },
        { $project: { "sourceId": { $toString: "$_id" }, "name": 1, "image": "$logo" } }
      ]
    }
  },
  { $project: { data: { $concatArrays: ['$SOURCE_COLLECTION_1', '$SOURCE_COLLECTION_2', '$SOURCE_COLLECTION_3'] } } },
  { $unwind: '$data' },
  { $replaceRoot: { newRoot: '$data' } }
])

An example:
Consider a document in a collection:
{ _id: 1234, name: "J. Doe", colors: [ "red", "black" ] }
And the following aggregation from the mongo shell:
db.collection.aggregate( [
{ $project: { _id: 0, colors: 1 } }
] )
This returns: { "colors" : [ "red", "black" ] }
This can also be run with the following command:
db.runCommand( {
aggregate: "collection",
pipeline: [ { $project: { _id: 0, colors: 1 } } ],
cursor: { }
} )
And, its translation using Spring Data's MongoTemplate:
String jsonCommand = "{ aggregate: 'collection', pipeline: [ { $project: { _id: 0, colors: 1 } } ], cursor: { } }";
Document resultDoc = mongoTemplate.executeCommand(jsonCommand);
The output document resultDoc has a format like the following:
{
"cursor" : {
"firstBatch" : [
{
"colors" : [
"red",
"black"
]
}
],
"id" : NumberLong(0),
"ns" : "test.colors"
},
"ok" : 1
}
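If you only need the returned documents, they can be read from the cursor.firstBatch field of that result, for example (a small sketch using the org.bson.Document API):
import java.util.List;

import org.bson.Document;

// Pull the returned documents out of the command result's first batch
Document cursor = resultDoc.get("cursor", Document.class);
List<Document> docs = cursor.getList("firstBatch", Document.class);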
To learn more about the db.runCommand(...) method, see the MongoDB documentation: Database Commands and Database Command Aggregate.
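Applying the same idea to the view creation in the question: db.createView(...) is backed by the create database command (with viewOn and pipeline fields), so it can likely be run through executeCommand as well. A sketch, where facetStageJson is a placeholder for the large $facet stage from the question:
import java.util.Arrays;

import org.bson.Document;

// Sketch: express db.createView('TARGET_COLLECTION', 'SOURCE_COLLECTION_1', [...])
// as the "create" database command and run it with MongoTemplate.
// facetStageJson is assumed to hold the $facet stage JSON from the question.
Document facetStage = Document.parse(facetStageJson);

Document createViewCommand = new Document("create", "TARGET_COLLECTION")
        .append("viewOn", "SOURCE_COLLECTION_1")
        .append("pipeline", Arrays.asList(
                facetStage,
                Document.parse("{ $project: { data: { $concatArrays: "
                        + "['$SOURCE_COLLECTION_1', '$SOURCE_COLLECTION_2', '$SOURCE_COLLECTION_3'] } } }"),
                Document.parse("{ $unwind: '$data' }"),
                Document.parse("{ $replaceRoot: { newRoot: '$data' } }")));

Document result = mongoTemplate.executeCommand(createViewCommand);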

Related

How to get property value direct from mongodb in JAVA

Hi everyone, I have a collection of documents like the one below. I want to directly get "rights" from the roles array for the params _id, groups._id, roles._id using the Java Mongo driver.
{
"_id": 1000002,
"groups": [
{
"_id": 1,
"roles": [
{
"rights": 3,
"_id": 1
},
{
"rights": 7,
"_id": 2
},
{
"rights": 3,
"_id": 3
}
]
}
],
"timestamp": {
"$date": {
"$numberLong": "1675267318028"
}
},
"users": [
{
"accessProviderId": 1,
"rights": 1,
"_id": 4
},
{
"accessProviderId": 1,
"rights": 3,
"_id": 5
}
]
}
I have an AccessListItem class which represents this document, and I have used Bson filters to get it from Mongo, but after fetching I had to extract the information through a Java function. I want to get the int value directly from the Mongo database.
Bson fileFilter = Filters.eq("_id", itemId);
Bson groupFilter = Filters.elemMatch("groups", Document.parse("{_id:"+groupId+"}"));
Bson roleFilter = Filters.elemMatch("groups.roles", Document.parse("{_id:"+role+"}"));
Bson finalFilter = Filters.and(fileFilter, Filters.and(groupFilter,roleFilter));
MongoCollection<AccessListItem> accessListItemMongoCollection = MongoUtils.getAccessCollection(type);
AccessListItem accessListItem = accessListItemMongoCollection.find(finalFilter).first();
The short answer is you can't.
MongoDB is designed for returning documents, that is, objects containing key-value pairs. There is no mechanism for a MongoDB query to return just a value, i.e. it will never return just 3 or [3].
You could use aggregation with a $project stage at the end to give you a simplified object like:
{ rights: 3}
In JavaScript, that might look like:
db.collection.aggregate([
{$match: {
_id: itemId,
"groups._id": groupId,
"groups.roles._id": role
}},
{$project: {
_id: 0,
group: {
$first: {
$filter: {
input: "$groups",
cond: {$eq: ["$$this._id",groupId]}
}
}
}
}},
{$project: {
"roles": {
$first: {
$filter: {
input: "$group.roles",
cond: { $eq: [ "$$this._id",role]}
}
}
}
}},
{$project: {
rights: "$roles.rights"
}}
])
Example: Playground
I'm not familiar with Spring Boot, so I'm not sure what that would look like in Java.
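For reference, a rough (untested) sketch of the same pipeline using the plain MongoDB Java driver, reusing the accessListItemMongoCollection and the itemId/groupId/role variables from the question, might look like this (note that the $first array operator requires MongoDB 4.4+):
import static com.mongodb.client.model.Aggregates.match;
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;

import java.util.Arrays;
import java.util.List;

import org.bson.Document;
import org.bson.conversions.Bson;

// Sketch: the $project stages with $filter/$first are easiest to keep as plain Documents
List<Bson> pipeline = Arrays.asList(
        match(and(
                eq("_id", itemId),
                eq("groups._id", groupId),
                eq("groups.roles._id", role))),
        new Document("$project", new Document("_id", 0)
                .append("group", new Document("$first", new Document("$filter",
                        new Document("input", "$groups")
                                .append("cond", new Document("$eq", Arrays.asList("$$this._id", groupId))))))),
        new Document("$project", new Document("roles",
                new Document("$first", new Document("$filter",
                        new Document("input", "$group.roles")
                                .append("cond", new Document("$eq", Arrays.asList("$$this._id", role))))))),
        new Document("$project", new Document("rights", "$roles.rights")));

Document rightsDoc = accessListItemMongoCollection.aggregate(pipeline, Document.class).first();
Integer rights = rightsDoc == null ? null : rightsDoc.getInteger("rights");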

Is it possible to create a map at runtime that is filled while passing through all documents and returned at the end (ES)?

For ex: I have 2 documents with this body:
{
"id": "doc_one",
"name": "test_name",
"date_creation": "some_date_cr_1",
"date_updation": "some_date_up_1"
}
And the second doc:
{
"id": "doc_two",
"name": "test_name",
"date_creation": "some_date_cr_2",
"date_updation": "some_date_up_2"
}
What I want to do: create two runtime fields, or a Map('date_creation', count_of_docs_where_the_field_is_not_null_AND_the_condition_is_met).
For example: for the 1st doc, date_creation IS NOT NULL and the condition startDate <= date_creation <= endDate is met, so I create a field count = 0 and increment it (count++) each time this case occurs. Once all the docs have been processed, the final count from the map is set as the result, Map('date_creation', final_count), and the same for the other field, in the same map.
I tried to use a script, but it returns a Map for each doc, for example:
{
"_index": "my_index_001",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_doubled_field": [
{
"NEW": 2
}
]
}
},
{
"_index": "my_index_001",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"fields": {
"my_doubled_field": [
{
"NEW": 2
}
]
}
}
I have indexed the below 3 documents, where one document doesn't have the date_creation field:
POST sample/_doc
{
"id": "doc_two",
"name": "test_name",
"date_updation": "some_date_up_2"
}
POST sample/_doc
{
"id": "doc_one",
"name": "test_name",
"date_creation": "some_date_cr_1",
"date_updation": "some_date_up_1"
}
POST sample/_doc
{
"id": "doc_two",
"name": "test_name",
"date_creation": "some_date_cr_2",
"date_updation": "some_date_up_2"
}
Now you can use the filter aggregation from Elasticsearch as shown below:
{
"size": 0,
"aggs": {
"date_creation": {
"filter": {
"range": {
"date_creation": {
"gte": "2020-01-09T10:20:10"
}
}
}
},
"date_updation": {
"filter": {
"range": {
"date_updation": {
"gte": "2020-01-09T10:20:10"
}
}
}
}
}
}
Response:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"date_updation" : {
"meta" : { },
"doc_count" : 3
},
"date_creation" : {
"meta" : { },
"doc_count" : 2
}
}
You can see that the date_updation field is present in 3 docs, so its count is 3, and the date_creation field is present in 2 docs, so its count is 2.

Looking for alternative result projection that works on mongodb 4.2.x

As per my last question (Query document count by multiple ranges returning range start/end with matching element count), I built a query to check for count of documents in multiple, potentially overlapping date ranges.
The query works on MongoDB 4.4 but I need to run it on 4.2 as well.
On MongoDB 4.2, I get the following error:
Mongo Server error (MongoCommandException): Command failed with error 168 (InvalidPipelineOperator): 'Unrecognized expression '$first'' on server localhost:27017.
The full response is:
{
"ok" : 0.0,
"errmsg" : "Unrecognized expression '$first'",
"code" : 168.0,
"codeName" : "InvalidPipelineOperator"
}
How would you write the aggregation projection to achieve the same result structure?
Here is the complete code with the data setup:
db.createCollection("object_location_tracking");
db.getCollection("object_location_tracking").insertMany([
{
_id: "1",
locationId: "locationA",
objectId: "objectA",
timestamp: ISODate("2020-01-01T00:00:00Z")
},
{
_id: "2",
locationId: "locationB",
objectId: "objectA",
timestamp: ISODate("2020-01-01T00:00:00Z")
},
{
_id: "3",
locationId: "locationA",
objectId: "objectB",
timestamp: ISODate("2019-01-01T00:00:00Z")
},
{
_id: "4",
locationId: "locationB",
objectId: "objectB",
timestamp: ISODate("2020-01-01T00:00:00Z")
}
]);
db.getCollection("object_location_tracking").aggregate([
{$facet: {
"first_bucket_id": [
{$match: {"objectId":"objectA",
"locationId":"locationA",
"timestamp": {$gte: new ISODate('2020-01-01'),
$lt: new ISODate('2020-12-31')}
}},
{$count: "N"}
],
"second_bucket_id": [
{$match: {"objectId":"objectA",
"locationId":"locationA",
"timestamp": {$gte: new ISODate('2020-01-01'),
$lt: new ISODate('2022-12-31')}
}},
{$count: "N"}
],
"third_bucket_id": [
{$match: {"objectId":"objectA",
"locationId":"locationB",
"timestamp": {$gte: new ISODate('2022-01-01'),
$lt: new ISODate('2022-12-31')}
}},
{$count: "N"}
]
}},
{
$set: {
first_bucket_id: { $first: "$first_bucket_id.N"},
second_bucket_id: { $first: "$second_bucket_id.N"},
third_bucket_id: { $first: "$third_bucket_id.N"}
}
}
, {
$project: {
first_bucket_id: 1,
second_bucket_id: 1,
third_bucket_id: 1
}
}
]);
You are using $first to take the first array element, which, as the error indicates, is a new operator added in version 4.4.
Luckily this is quite easy to overcome by using $arrayElemAt instead, like so:
{
$set: {
first_bucket_id: {$arrayElemAt: ["$first_bucket_id.N", 0]},
second_bucket_id: {$arrayElemAt: ["$second_bucket_id.N", 0]},
third_bucket_id: {$arrayElemAt: ["$third_bucket_id.N", 0]}
}
}
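If you are running this pipeline from Java (the MongoCommandException suggests the Java driver or Spring Data), the 4.2-compatible stage can be kept as shell-style JSON and parsed, for example (a small sketch; the rest of the pipeline stays as in the question):
import org.bson.Document;

// Sketch: the 4.2-compatible replacement for the $set stage, parsed from shell-style JSON
Document setStage = Document.parse(
        "{ $set: { "
      + "first_bucket_id: { $arrayElemAt: ['$first_bucket_id.N', 0] }, "
      + "second_bucket_id: { $arrayElemAt: ['$second_bucket_id.N', 0] }, "
      + "third_bucket_id: { $arrayElemAt: ['$third_bucket_id.N', 0] } } }");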

Elasticsearch composite group by queries across the documents

We have an Elasticsearch document which has a dimension called city. Each document has only one value for the city field. I have a scenario where I need to query persons based on a city or cities.
Documents in Elasticsearch
{
person_id: "1",
property_value : 25000,
city: "Bangalore"
}
{
person_id: "2",
property_value : 100000,
city: "Bangalore"
}
{
person_id: "1",
property_value : 15000,
city: "Delhi"
}
Note: The aggregation should be performed on property_value and grouped by person_id.
For example:
If I query for Bangalore, it should return documents with person_id 1 and 2.
If I query for both Delhi and Bangalore, it should return this:
{
person_id: "1",
property_value : 40000,
city: ["Bangalore", "Delhi"]
}
Looking at your data, I've come up with a sample mapping, request query and the response.
Mapping:
PUT my_index_city
{
"mappings": {
"properties": {
"person_id":{
"type": "keyword"
},
"city":{
"type":"text",
"fields":{
"keyword":{
"type": "keyword"
}
}
},
"property_value":{
"type": "long"
}
}
}
}
Sample Request:
Note that I've made use of a query_string query to filter the documents having Bangalore and Delhi.
For aggregation I've made use of Terms Aggregation on person_id and Sum Aggregation on the property_value field.
POST my_index_city/_search
{
"size": 0,
"query": {
"query_string": {
"default_field": "city",
"query": "Bangalore Delhi"
}
},
"aggs": {
"my_person": {
"terms": {
"field": "person_id",
"size": 10,
"min_doc_count": 2
},
"aggs": {
"sum_property_value": {
"sum": {
"field": "property_value"
}
}
}
}
}
}
Sample Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_person" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 2,
"sum_property_value" : {
"value" : 40000.0
}
}
]
}
}
}
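If the search is issued from Java, the same request could look roughly like the sketch below with the Elasticsearch high-level REST client; this is only illustrative, and client is assumed to be an existing RestHighLevelClient:
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Sketch: query_string on city, terms aggregation on person_id with a sum sub-aggregation
SearchSourceBuilder source = new SearchSourceBuilder()
        .size(0)
        .query(QueryBuilders.queryStringQuery("Bangalore Delhi").defaultField("city"))
        .aggregation(AggregationBuilders.terms("my_person")
                .field("person_id")
                .size(10)
                .minDocCount(2)
                .subAggregation(AggregationBuilders.sum("sum_property_value")
                        .field("property_value")));

// client.search(...) throws IOException
SearchResponse response = client.search(new SearchRequest("my_index_city").source(source), RequestOptions.DEFAULT);
Terms persons = response.getAggregations().get("my_person");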
Note: This query would only work if the person_id has multiple documents, each with a unique/different city value.
What I mean to say is, if a person_id has multiple documents with the same city, the aggregation would not give the right answer.
Updated Answer:
There is no direct way to achieve what you are looking for unless you modify the mapping. What I've done is make use of the nested datatype and ingest all the documents for a person_id as a single document.
Mapping:
PUT my_sample_city_index
{
"mappings": {
"properties": {
"person_id":{
"type": "keyword"
},
"property_details":{
"type":"nested", <------ Note this
"properties": {
"city":{
"type": "text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"property_value":{
"type": "long"
}
}
}
}
}
}
Sample Documents:
POST my_sample_city_index/_doc/1
{
"person_id": "1",
"property_details":[
{
"property_value" : 25000,
"city": "Bangalore"
},
{
"property_value" : 15000,
"city": "Delhi"
}
]
}
POST my_sample_city_index/_doc/2
{
"person_id": "2",
"property_details":[
{
"property_value" : 100000,
"city": "Bangalore"
}
]
}
Aggregation Query:
POST my_sample_city_index/_search
{
"size": 0,
"query": {
"nested": {
"path": "property_details",
"query": {
"query_string": {
"default_field": "property_details.city",
"query": "bangalore delhi"
}
}
}
},
"aggs": {
"persons": {
"terms": {
"field": "person_id",
"size": 10
},
"aggs": {
"property_sum": {
"nested": { <------ Note this
"path": "property_details"
},
"aggs": {
"total_sum": {
"sum": {
"field": "property_details.property_value"
}
}
}
}
}
}
}
}
Note that I've applied a nested query_string query on the city field, then a terms aggregation on person_id, inside which I've applied a nested aggregation, and further inside that a sum metric aggregation.
This should also work correctly if a person has multiple properties in the same city.
Response:
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"persons" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1,
"property_sum" : {
"doc_count" : 2,
"total_sum" : {
"value" : 40000.0
}
}
},
{
"key" : "2",
"doc_count" : 1,
"property_sum" : {
"doc_count" : 1,
"total_sum" : {
"value" : 100000.0
}
}
}
]
}
}
}
Let me know if this helps!

Get multiple fields count from one query in MongoDB?

I have a collection of events; its structure is as follows:
{
"_id" : ObjectId("537b3ff288f4ca2f471afcae"),
"Name" : "PREMISES MAP DELETED",
"ScreenName" : "AccessPointActivity",
"Timestamp" : NumberLong("1392113758000"),
"EventParams" : "null",
"TracInfo" : {
"ApplicationId" : "fa41f204bfc711e3b9f9c8cbb8c502c4",
"DeviceId" : "2_1VafJVPu4yfdbMWO1XGROjK6iQZhq4hAVCQL837W",
"UserId" : "pawan",
"SessionId" : "a8UHE16mowNwNGyuLXbW",
"WiFiAP" : "null",
"WiFiStrength" : 0,
"BluetoothID" : "null",
"BluetoothStrength" : 0,
"NetworkType" : "null",
"NetworkSubType" : "null",
"NetworkCarrier" : "Idea",
"Age" : 43,
"Gender" : "Female",
"OSVersion" : "16",
"Manufacturer" : "samsung",
"Resolution" : "600*976",
"Platform" : "Android",
"Latitude" : 40.42,
"Longitude" : -74,
"City" : "Monmouth County",
"CityLowerCase" : "monmouth county",
"Country" : "United States",
"CountryLowerCase" : "united states",
"Region" : "New Jersey",
"RegionLowerCase" : "new jersey",
"Time_zone" : "null",
"PinCode" : "07732",
"Locale" : ", Paradise Trailer Park",
"Accuracy" : 0,
"Timestamp" : NumberLong("1392113758000")
}
}
There are many events on different screens.
My expected output is as follows:
{
ApplicationId:"fa41f204bfc711e3b9f9c8cbb8c502c4",
EventName:"PREMISES MAP DELETED",
Eventcount:300,
ScreenviewCount:20,
DeviceCount:10,
UserCount:3
}
EventCount: the count of EventName occurrences
ScreenviewCount: the count of distinct ScreenName values per session
DeviceCount: the count of distinct DeviceId values
UserCount: the count of distinct UserId values
There will be multiple events on multiple screens (ScreenName).
Currently I am using the following approach:
Using aggregation to get each event name and its count, e.g.:
{
_id:
{
ApplicationId:"fa41f204bfc711e3b9f9c8cbb8c502c4",
EventName:"PREMISES MAP DELETED"
}
EventCount:300
}
For each event name from the above aggregation result, I call the following queries in a while loop until the aggregation output has no more documents:
a) A distinct query using the eventName from the aggregation output for the screenview count (on the event collection).
b) A distinct query using the eventName from the aggregation output for the device count (on the event collection).
c) A distinct query using the eventName from the aggregation output for the user count (on the event collection).
The problem is that this is slow, since it issues 3 distinct queries for each result of the aggregation output.
Is there any way to do this in a single aggregation call, or something else?
Thank you in advance!!!
The general point you seem to have missed is that, to get the "distinct" values of various fields in your documents under the "event" totals, you can use the $addToSet operator.
A "set" by definition has all of its values "unique/distinct", so you just want to collect all the possible values in a "set" at your grouping level and then get the "size" of the array produced, which is exactly what the $size operator introduced in MongoDB 2.6 does.
db.collection.aggregate([
{ "$group": {
"_id": {
"ApplicationId": "$TracInfo.ApplicationId",
"EventName": "$Name",
},
"oScreenViewCount": {
"$addToSet": {
"ScreenName": "$ScreenName",
"SessionId": "$TracInfo.SessionId",
}
},
"oDeviceCount": { "$addToSet": "$TracInfo.DeviceId" },
"oUserCount": { "$addToSet": "$TracInfo.UserId" },
"oEventcount": { "$sum": 1 }
}},
{ "$project": {
"_id": 0,
"ApplicationId": "$_id.ApplicationId",
"EventName": "$_id.EventName",
"EventCount": "$oEventCount",
"ScreenViewCount": { "$size": "$oScreenViewCount" },
"DeviceCount": { "$size": "$oDeviceCount" },
"UserCount": { "$size": "$oUserCount" }
}}
])
Versions pre MongoDB 2.6 require a little more work, using $unwind and $group to count the arrays:
db.collection.aggregate([
{ "$group": {
"_id": {
"ApplicationId": "$TracInfo.ApplicationId",
"EventName": "$Name",
},
"oScreenviewCount": {
"$addToSet": {
"ScreenName": "$ScreenName",
"SessionId": "$TracInfo.SessionId",
}
},
"oDeviceCount": { "$addToSet": "$TracInfo.DeviceId" },
"oUserCount": { "$addToSet": "$TracInfo.UserId" },
"oEventcount": { "$sum": 1 }
}},
{ "$unwind": "$oScreeenviewCount" },
{ "$group": {
"_id": "$_id",
"oScreenviewCount": { "$sum": 1 },
"oDeviceCount": { "$first": "$oDeviceCount" },
"oUserCount": { "$first": "$oUserCount" },
"oEventcount": { "$first": "$oEventCount" }
}},
{ "$unwind": "$oDeviceCount" },
{ "$group": {
"_id": "$_id",
"oScreenviewCount": { "$first": "$oScreenViewCount" },
"oDeviceCount": { "$sum": "$oDeviceCount" },
"oUserCount": { "$first": "$oUserCount" },
"oEventcount": { "$first": "$oEventCount" }
}},
{ "$unwind": "$oUserCount" },
{ "$group": {
"_id": "$_id",
"oScreenviewCount": { "$first": "$oScreenViewCount" },
"oDeviceCount": { "$first": "$oDeviceCount" },
"oUserCount": { "$sum": "$oUserCount" },
"oEventcount": { "$first": "$oEventCount" }
}},
{ "$project": {
"_id": 0,
"ApplicationId": "$_id.ApplicationId",
"EventName": "$_id.EventName",
"EventCount": "$oEventCount",
"ScreenViewCount": "$oScreenViewCount",
"DeviceCount": "$oDeviceCount",
"UserCount": "$oUserCount"
}}
])
The final $project in the second listing, and the use of the "o"-prefixed names in general, is really just for prettying up the result at the end and making sure the output field order matches your sample result.
As a general disclaimer, your question lacks the information to determine the exact fields or combinations that are used for these totals, but the principles and approach are sound and should be near enough to the same implementation.
So essentially, you are getting the "distinct" values within the "group" by using $addToSet for whatever the field or combination is and then you are determining the "count" of those "sets" by whatever means is available to you.
Much better than issuing many queries and merging results in client code.
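If you need to run this from Java (as in the other questions on this page), a rough sketch with the plain MongoDB Java driver could look like the following; the connection string, database and collection names are placeholders, and the two stages are simply the ones from the 2.6+ listing above, parsed from shell-style JSON:
import java.util.Arrays;
import java.util.List;

import org.bson.Document;

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class EventCounts {
    public static void main(String[] args) {
        // Connection string, database and collection names are illustrative placeholders
        MongoCollection<Document> events = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("test")
                .getCollection("events");

        // The two stages from the MongoDB 2.6+ listing above, parsed from JSON
        List<Document> pipeline = Arrays.asList(
                Document.parse("{ $group: { "
                        + "_id: { ApplicationId: '$TracInfo.ApplicationId', EventName: '$Name' }, "
                        + "oScreenViewCount: { $addToSet: { ScreenName: '$ScreenName', SessionId: '$TracInfo.SessionId' } }, "
                        + "oDeviceCount: { $addToSet: '$TracInfo.DeviceId' }, "
                        + "oUserCount: { $addToSet: '$TracInfo.UserId' }, "
                        + "oEventCount: { $sum: 1 } } }"),
                Document.parse("{ $project: { _id: 0, "
                        + "ApplicationId: '$_id.ApplicationId', EventName: '$_id.EventName', "
                        + "EventCount: '$oEventCount', "
                        + "ScreenViewCount: { $size: '$oScreenViewCount' }, "
                        + "DeviceCount: { $size: '$oDeviceCount' }, "
                        + "UserCount: { $size: '$oUserCount' } } }"));

        for (Document doc : events.aggregate(pipeline)) {
            System.out.println(doc.toJson());
        }
    }
}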
