Spring Data Elasticsearch sub-aggregation not returning nested (parent/child) buckets in aggregation results - java

I am using Spring Data Elasticsearch and have built a query with a sub-aggregation. When I debug, copy the generated query, and run it from the Sense plugin or Postman, I get the correct result: the aggregation on the second field is nested inside the buckets of the first. The response returned to my program, however, contains only the parent-level field in the aggregation results, without the nested one.
What might I be doing wrong? More details are below.
The documents look like this:
{
...
"gender":"Men",
"category":"Shirts",
.....
} ,
{
...
"gender":"Men",
"category":"Pants",
.....
},
{
...
"gender":"Women",
"category":"Pants",
.....
}
etc.
Expected output in aggregation is like
...
"aggregations": {
"gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Men",
"doc_count": 4,
"category": {
"buckets": [
{
"key": "shirts",
"doc_count": 2
},
{
"key": "pants",
"doc_count": 2
}
]
}
},
{
"key": "Women",
"doc_count": 2,
"category": {
"buckets": [
{
"key": "pants",
"doc_count": 2
}
]
}
}
]
}
}
....
Java code that adds the sub-aggregation (a terms aggregation):
for (Object aggregationField : request.getAggregationFields()) {
    // One top-level terms aggregation per requested field.
    TermsBuilder termBuilder = AggregationBuilders.terms(aggregationField.toString())
            .field(aggregationField.toString()).size(0);
    if (aggregationField.toString().equals("gender")) {
        // Nest a "category" terms aggregation inside each "gender" bucket.
        TermsBuilder platformBuilder = AggregationBuilders.terms("category").field("category").size(0);
        termBuilder.subAggregation(platformBuilder);
    }
    nativeSearchQueryBuilder.addAggregation(termBuilder);
}

Related

How to get a property value directly from MongoDB in Java

Hi everyone, I have a collection of documents like the one below. I want to get "rights" directly from the roles array for the parameters _id, groups._id and roles._id, using the Java MongoDB driver.
{
"_id": 1000002,
"groups": [
{
"_id": 1,
"roles": [
{
"rights": 3,
"_id": 1
},
{
"rights": 7,
"_id": 2
},
{
"rights": 3,
"_id": 3
}
]
}
],
"timestamp": {
"$date": {
"$numberLong": "1675267318028"
}
},
"users": [
{
"accessProviderId": 1,
"rights": 1,
"_id": 4
},
{
"accessProviderId": 1,
"rights": 3,
"_id": 5
}
]
}
I have an AccessListItem class that represents this document, and I use Bson filters to fetch it from MongoDB, but after fetching I still have to extract the value with a Java function. I want to get the int value directly from the database.
Bson fileFilter = Filters.eq("_id", itemId);
Bson groupFilter = Filters.elemMatch("groups", Document.parse("{_id:"+groupId+"}"));
Bson roleFilter = Filters.elemMatch("groups.roles", Document.parse("{_id:"+role+"}"));
Bson finalFilter = Filters.and(fileFilter, Filters.and(groupFilter,roleFilter));
MongoCollection<AccessListItem> accessListItemMongoCollection = MongoUtils.getAccessCollection(type);
AccessListItem accessListItem = accessListItemMongoCollection.find(finalFilter).first();
The short answer is you can't.
MongoDB is designed for returning documents, that is, objects containing key-value pairs. There is no mechanism for a MongoDB query to return just a value, i.e. it will never return just 3 or [3].
You could use aggregation with a $project stage at the end to give you a simplified object like:
{ rights: 3}
In javascript that might look like:
db.collection.aggregate([
{$match: {
_id: itemId,
"groups._id": groupId,
"groups.roles._id": role
}},
{$project: {
_id: 0,
group: {
$first: {
$filter: {
input: "$groups",
cond: {$eq: ["$$this._id",groupId]}
}
}
}
}},
{$project: {
"roles": {
$first: {
$filter: {
input: "$group.roles",
cond: { $eq: [ "$$this._id",role]}
}
}
}
}},
{$project: {
rights: "$roles.rights"
}}
])
Example: Playground
I'm not familiar with spring boot, so I'm not sure what that would look like in Java.

Is it possible to build a map at runtime that is filled while passing through all documents and returned at the end (Elasticsearch)?

For example, I have 2 documents with this body:
{
"id": "doc_one",
"name": "test_name",
"date_creation": "some_date_cr_1",
"date_updation": "some_date_up_1"
}
And the second doc:
{
"id": "doc_two",
"name": "test_name",
"date_creation": "some_date_cr_2",
"date_updation": "some_date_up_2"
}
What I want to do: create two runtime fields, or a Map('data_creation',count_of_doc_where_field_not_null_AND_the_condition_is_met).
For example: when I read the 1st doc, date_creation IS NOT NULL and the condition startDate<=date_creation<=endDate is met, so I create a counter count = 0 and do count++ each time this case occurs. Once all the docs have been processed, I put the final count into the map as the result: Map('data_creation',final_count), and do the same for the other field in the same map.
I tried to use a script, but it returns a Map for each doc, for example:
{
"_index": "my_index_001",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_doubled_field": [
{
"NEW": 2
}
]
}
},
{
"_index": "my_index_001",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"fields": {
"my_doubled_field": [
{
"NEW": 2
}
]
}
}
I indexed the 3 documents below; one of them does not have the date_creation field:
POST sample/_doc
{
"id": "doc_two",
"name": "test_name",
"date_updation": "some_date_up_2"
}
POST sample/_doc
{
"id": "doc_one",
"name": "test_name",
"date_creation": "some_date_cr_1",
"date_updation": "some_date_up_1"
}
POST sample/_doc
{
"id": "doc_two",
"name": "test_name",
"date_creation": "some_date_cr_2",
"date_updation": "some_date_up_2"
}
Now you can use the Elasticsearch filter aggregation as shown below:
{
"size": 0,
"aggs": {
"date_creation": {
"filter": {
"range": {
"date_creation": {
"gte": "2020-01-09T10:20:10"
}
}
}
},
"date_updation": {
"filter": {
"range": {
"date_updation": {
"gte": "2020-01-09T10:20:10"
}
}
}
}
}
}
Response:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"date_updation" : {
"meta" : { },
"doc_count" : 3
},
"date_creation" : {
"meta" : { },
"doc_count" : 2
}
}
You can see that the date_updation field is present in 3 documents, so its count is 3, while the date_creation field is present in 2 documents, so its count is 2.
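For comparison, the counting described in the question (start with count = 0, then count++ for each matching document) can also be sketched on the client side in plain Java. This is only an illustration under stated assumptions: the class DateFieldCounter, the method countInRange, and the ISO-8601 sample dates are hypothetical and not part of the original question, and the documents are assumed to be already loaded into memory as maps. For large indices the filter aggregation above is likely the better fit, since the counting then happens inside Elasticsearch.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DateFieldCounter {

    // Counts, per field, the documents where the field is present
    // and start <= value <= end (both bounds inclusive).
    static Map<String, Long> countInRange(List<Map<String, String>> docs, List<String> fields,
                                          LocalDateTime start, LocalDateTime end) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String field : fields) {
            counts.put(field, 0L); // every requested field starts at zero
        }
        for (Map<String, String> doc : docs) {
            for (String field : fields) {
                String raw = doc.get(field);
                if (raw == null) {
                    continue; // field missing in this document: not counted
                }
                LocalDateTime value = LocalDateTime.parse(raw);
                if (!value.isBefore(start) && !value.isAfter(end)) {
                    counts.merge(field, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Hypothetical helper that builds a document from key/value pairs.
    private static Map<String, String> doc(String... keyValues) {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            result.put(keyValues[i], keyValues[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample dates are made up for illustration.
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("date_updation", "2021-03-01T00:00:00"));
        docs.add(doc("date_creation", "2021-01-15T08:00:00", "date_updation", "2021-02-01T09:30:00"));
        docs.add(doc("date_creation", "2019-05-05T12:00:00", "date_updation", "2021-04-10T10:00:00"));

        Map<String, Long> counts = countInRange(docs,
                Arrays.asList("date_creation", "date_updation"),
                LocalDateTime.parse("2020-01-09T10:20:10"),
                LocalDateTime.parse("2021-12-31T23:59:59"));
        System.out.println(counts); // prints {date_creation=1, date_updation=3}
    }
}
```

The result is a single map for the whole document set rather than one map per document, which is the shape the question asks for.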

How to execute a complex MongoDB native query from Java Spring Boot

I have a fairly complex query that creates a view from 3 collections. The query is written as a native MongoDB (shell) query, and I need to execute it from Java. Is there any way to run this kind of query from Java, for example a function that takes a native MongoDB query as a string and executes it on the database?
db.createView('TARGET_COLLECTION', 'SOURCE_COLLECTION_1', [
{
$facet: {
SOURCE_COLLECTION_1: [
{$match: {}},
{ $project: { "sourceId": {$toString: "$_id"}, "name": 1, "image": "$logo" }}
],
SOURCE_COLLECTION_2: [
{$limit: 1},
{
$lookup: {
from: 'SOURCE_COLLECTION_2',
localField: '__unexistingfield',
foreignField: '__unexistingfield',
as: '__col2'
}
},
{$unwind: '$__col2'},
{$replaceRoot: {newRoot: '$__col2'}},
{ $project: { "sourceId": {$toString: "$_id"}, "name": 1, "image": 1 }}
],
SOURCE_COLLECTION_3: [
{$limit: 1},
{
$lookup: {
from: 'SOURCE_COLLECTION_3',
localField: '__unexistingfield',
foreignField: '__unexistingfield',
as: '__col2'
}
},
{$unwind: '$__col2'},
{$replaceRoot: {newRoot: '$__col2'}},
{ $project: { "sourceId": {$toString: "$_id"}, "name": 1, "image": "$logo" }}
]
},
},
{$project: {data: {$concatArrays: ['$SOURCE_COLLECTION_1', '$SOURCE_COLLECTION_2', '$SOURCE_COLLECTION_3']}}},
{$unwind: '$data'},
{$replaceRoot: {newRoot: '$data'}}
])
An example:
Consider a document in a collection:
{ _id: 1234, name: "J. Doe", colors: [ "red", "black" ] }
And the following aggregation from the mongo shell:
db.collection.aggregate( [
{ $project: { _id: 0, colors: 1 } }
] )
This returns: { "colors" : [ "red", "black" ] }
This can also be run with the following command:
db.runCommand( {
aggregate: "collection",
pipeline: [ { $project: { _id: 0, colors: 1 } } ],
cursor: { }
} )
And, its translation using Spring Data's MongoTemplate:
String jsonCommand = "{ aggregate: 'collection', pipeline: [ { $project: { _id: 0, colors: 1 } } ], cursor: { } }";
Document resultDoc = mongoTemplate.executeCommand(jsonCommand);
The output document resultDoc has a format like the following:
{
"cursor" : {
"firstBatch" : [
{
"colors" : [
"red",
"black"
]
}
],
"id" : NumberLong(0),
"ns" : "test.colors"
},
"ok" : 1
}
To learn more about the db.runCommand(...) method, see the MongoDB documentation: Database Commands and Database Command Aggregate.

Elasticsearch: how to sum values after an aggregation result

I have a lot of documents as below under Elasticsearch index:
{
"_index": "f2016-07-17",
"_type": "trkvjadsreqpxl.gif",
"_id": "AVX2N3dl5siG6SyfyIjb",
"_score": 1,
"_source": {
"time": "1468714676424",
"meta": {
"cb_id": 25681,
"mt_id": 649,
"c_id": 1592,
"revenue": 2.5,
"mt_name": "GMS-INAPP-EN-2.5",
"c_description": "COULL-INAPP-EN-2.5",
"domain": "wv.inner-active.mobi",
"master_domain": "649###wv.inner-active.mobi",
"child_domain": "1592###wv.inner-active.mobi",
"combo_domain": "25681###wv.inner-active.mobi",
"ip": "52.42.87.73"
}
}
}
I want to run a date histogram/range aggregation on multiple fields and store the result in another collection/index, so that I can then sum doc_count with a query/aggregation over a range of hours.
The aggregation is:
{
"aggs": {
"hour":{
"date_histogram": {
"field": "time",
"interval": "hour"
},
"aggs":{
"hourly_M_TAG":{
"terms":{
"field":"meta.mt_id"
}
}
}....
}
}
}
The Result as expected:
"aggregations": {
"hour": {
"buckets": [
{
"key_as_string": "2016-07-17T00:00:00.000Z",
"key": 1468713600000,
"doc_count": 94411750,
"hourly_M_TAG": {
"doc_count_error_upper_bound": 1485,
"sum_other_doc_count": 30731646,
"buckets": [
{
"key": 10,
"doc_count": 10175501
},
{
"key": 649,
"doc_count": 200000
}....
]
}
},
{
"key_as_string": "2016-07-17T01:00:00.000Z",
"key": 1468717200000,
"doc_count": 68738743,
"hourly_M_TAG": {
"doc_count_error_upper_bound": 2115,
"sum_other_doc_count": 22478590,
"buckets": [
{
"key": 559,
"doc_count": 8307018
},
{
"key": 649,
"doc_count" :100000
}...
Let's assume that I parse the response and store the result in another index/collection.
My question:
What is the best way to store the aggregated results so that I can run another query/aggregation that sums the "doc_count" across different hour ranges?
For example, between "2016-07-17T00:00:00.000Z" and "2016-07-17T01:00:00.000Z" I want to see the total doc_count for each key.
EXPECTED RESULT:
{
"range_sum": {
"buckets": [
{
"key": 649,
"doc_count": 300000 // (200000+100000)
},
{
"key": 588,
"doc_count": 2928548 // ... + ...
}....
]
}
}
Thanks!
I might have your end goal wrong, but it seems to me that you want the total doc_count for each value of meta.mt_id over configurable ranges of time.
If that is the case, I don't think you need to store the result of the first aggregation; you just need to change the interval value to reflect the bucket sizes you want. If you want totals for each value of meta.mt_id, it might help to flip the aggregation around so that you aggregate first on terms and then on dates:
{
"size": 0,
"aggs": {
"hourly_M_TAG": {
"terms": {
"field": "meta.mt_id"
},
"aggs": {
"hour": {
"date_histogram": {
"field": "time",
"interval": "2h"
}
}
}
}
}
}
This will give you results for each meta.mt_id. If you want totals for a particular time range, just change the interval to reflect that.
EDIT:
There might be some smart Elasticsearch way of doing this, but I think I would just do it like this:
Do your original aggregation, then:
foreach bucket in buckets:
    index:
    {
        "id" : {meta.id},
        "timestamp" : {key_as_string},
        "count" : {doc_count}
    }
You should then have an index of all meta.id documents with their doc_count at various timestamps. The granularity of the interval depends on what you need.
Then you can just do a term->sum aggregation on the new index with a range filter (assuming Elasticsearch 2.x) for the dates:
{
"size": 0,
"query": {
"bool": {
"filter": {
"range": {
"timestamp": {
"gte": "now-1h",
"lte": "now"
}
}
}
}
},
"aggs": {
"termName": {
"terms": {
"field": "id"
},
"aggs": {
"sumCounts": {
"sum": {
"field": "count"
}
}
}
}
}
}
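If the parsed buckets are already held in application memory, the same per-key total can also be sketched client-side in plain Java. This is a minimal sketch under stated assumptions: the class RangeSum, the nested HourlyCount type, and the method sumPerKey are hypothetical names, and the sample rows are taken from the (truncated) response shown in the question, so only the keys visible there appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RangeSum {

    // One flattened bucket: hour timestamp (epoch millis), meta.mt_id key, doc_count.
    static final class HourlyCount {
        final long hourMillis;
        final int key;
        final long docCount;

        HourlyCount(long hourMillis, int key, long docCount) {
            this.hourMillis = hourMillis;
            this.key = key;
            this.docCount = docCount;
        }
    }

    // Sums doc_count per key over all rows with fromMillis <= hourMillis <= toMillis.
    static Map<Integer, Long> sumPerKey(List<HourlyCount> rows, long fromMillis, long toMillis) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (HourlyCount row : rows) {
            if (row.hourMillis >= fromMillis && row.hourMillis <= toMillis) {
                totals.merge(row.key, row.docCount, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<HourlyCount> rows = new ArrayList<>();
        // 2016-07-17T00:00:00.000Z
        rows.add(new HourlyCount(1468713600000L, 10, 10175501L));
        rows.add(new HourlyCount(1468713600000L, 649, 200000L));
        // 2016-07-17T01:00:00.000Z
        rows.add(new HourlyCount(1468717200000L, 559, 8307018L));
        rows.add(new HourlyCount(1468717200000L, 649, 100000L));

        // Key 649 totals 200000 + 100000 = 300000 across the two hours.
        System.out.println(sumPerKey(rows, 1468713600000L, 1468717200000L));
    }
}
```

Each HourlyCount row corresponds to one of the small documents indexed in the step above, so the loop does in memory what the terms and sum aggregation does in the new index.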
Sorry if that is still not what you are looking for, I think there are many different ways of doing this.

Selectively reading JSON by URL

I want to keep my code as simple as possible and avoid downloading items I don't need.
I want to select only the Pages on which I am an admin on Facebook.
For now I use:
https://graph.facebook.com/v2.0/me/?fields=id,accounts{id,perms}&summary=true&limit=100&access_token=MY_TOKEN
Results:
{
"id": "101506",
"accounts": {
"data": [
{
"id": "6986842335",
"perms": [
"ADMINISTER",
"EDIT_PROFILE",
"CREATE_CONTENT",
"MODERATE_CONTENT",
"CREATE_ADS",
"BASIC_ADMIN"
]
},
{
"id": "1374577066121",
"perms": [
"BASIC_ADMIN"
]
},
{
"id": "997587036984",
"perms": [
"ADMINISTER",
"EDIT_PROFILE",
"CREATE_CONTENT",
"MODERATE_CONTENT",
"CREATE_ADS",
"BASIC_ADMIN"
]
}
]
}
}
Now, how do I change the query to read only items with perms=ADMINISTER?
This is what I need:
{
"id": "101506",
"accounts": {
"data": [
{
"id": "6986842335",
"perms": [
"ADMINISTER",
"EDIT_PROFILE",
"CREATE_CONTENT",
"MODERATE_CONTENT",
"CREATE_ADS",
"BASIC_ADMIN"
]
},
{
"id": "997587036984",
"perms": [
"ADMINISTER",
"EDIT_PROFILE",
"CREATE_CONTENT",
"MODERATE_CONTENT",
"CREATE_ADS",
"BASIC_ADMIN"
]
}
]
}
}
If the query can't be changed to return only what I need, maybe you know how to count the matching pages in Java?
For now I have:
JsonObject user = facebookClient.fetchObject("v2.0/me", JsonObject.class, Parameter.with("summary", true), Parameter.with("fields", "id,name,accounts{id,perms}"), Parameter.with("limit", 100));
Integer userFPCount = user.getJsonObject("accounts").getJsonArray("data").length();
But this code gives me the result 3. How can I count only the entries in accounts.data whose perms contain ADMINISTER? In this example the result should be 2.
THANK YOU FOR YOUR HELP!
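One client-side option, sketched here in plain Java, is to count only the entries whose perms list contains ADMINISTER. The sketch assumes the accounts.data array has already been converted into a list of permission lists; the class AdminPageCounter and the method countPagesWithPerm are hypothetical names and are not part of RestFB or the Graph API.

```java
import java.util.Arrays;
import java.util.List;

final class AdminPageCounter {

    // Counts the pages whose "perms" list contains the given permission.
    static long countPagesWithPerm(List<List<String>> permsPerPage, String perm) {
        return permsPerPage.stream()
                .filter(perms -> perms.contains(perm))
                .count();
    }

    public static void main(String[] args) {
        // Permission lists copied from the three pages in the sample response.
        List<List<String>> permsPerPage = Arrays.asList(
                Arrays.asList("ADMINISTER", "EDIT_PROFILE", "CREATE_CONTENT",
                        "MODERATE_CONTENT", "CREATE_ADS", "BASIC_ADMIN"),
                Arrays.asList("BASIC_ADMIN"),
                Arrays.asList("ADMINISTER", "EDIT_PROFILE", "CREATE_CONTENT",
                        "MODERATE_CONTENT", "CREATE_ADS", "BASIC_ADMIN"));

        System.out.println(countPagesWithPerm(permsPerPage, "ADMINISTER")); // prints 2
    }
}
```

This still downloads all pages and filters afterwards, so it addresses the count but not the wish to avoid fetching unneeded items.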
