How to generate Elasticsearch nested aggregations in Java?

I have the aggregation query below, which I need to translate into Java using the Elasticsearch client RestHighLevelClient.
I have tried several times, but my Java code does not produce this query.
{
"aggs": {
"recommendations": {
"nested": {
"path": "events.recommendationData"
},
"aggs": {
"exception": {
"filter": {
"terms": {
"events.recommendationData.exceptionId": [
"2"
]
}
},
"aggs": {
"exceptionIds": {
"terms": {
"field": "events.recommendationData.exceptionId.keyword",
"size": 10
},
"aggs": {
"recommendations": {
"nested": {
"path": "events.recommendationData.recommendations"
},
"aggs": {
"recommendationType": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10
}
}
}
}
}
}
}
}
}
}
}
}
I am using the code below with RestHighLevelClient:
AggregationBuilder recommendations =
    AggregationBuilders.nested("recommendations", "events.recommendationData");
AggregationBuilder exception = AggregationBuilders
    .filter("exception", QueryBuilders.termsQuery("events.recommendationData.exceptionId", "2"));
AggregationBuilder exceptionIds = AggregationBuilders.terms("exceptionIds")
    .field("events.recommendationData.exceptionId.keyword").size(10);
AggregationBuilder recommendations2 =
    AggregationBuilders.nested("recommendations", "events.recommendationData.recommendations");
AggregationBuilder recommendationType = AggregationBuilders.terms("recommendationType")
    .field("events.recommendationData.recommendations.recommendationType").size(10);
AggregationBuilder build =
    recommendations
        .subAggregation(exception)
        .subAggregation(exceptionIds)
        .subAggregation(recommendations2)
        .subAggregation(recommendationType);
This produces the wrong query, shown below, which does not work:
{
"aggregations": {
"recommendations": {
"nested": {
"path": "events.recommendationData"
},
"aggregations": {
"exception": {
"filter": {
"terms": {
"events.recommendationData.exceptionId": [
"1",
"2"
],
"boost": 1
}
}
},
"exceptionIds": {
"terms": {
"field": "events.recommendationData.exceptionId.keyword",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
},
"recommendations": {
"nested": {
"path": "events.recommendationData.recommendations"
}
},
"recommendationType": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}

In the expected query, every aggregation is a sub-aggregation of the previous one.
recommendationType is a sub-aggregation of recommendations2, these two together are a sub-aggregation of exceptionIds, and so on. Chaining .subAggregation(...) calls on recommendations instead attaches every aggregation directly to recommendations as a sibling, because each call returns the builder it was called on. So only one statement needs to change. Instead of
AggregationBuilder build =
    recommendations
        .subAggregation(exception)
        .subAggregation(exceptionIds)
        .subAggregation(recommendations2)
        .subAggregation(recommendationType);
use this:
recommendations.subAggregation(
    exception.subAggregation(
        exceptionIds.subAggregation(
            recommendations2.subAggregation(recommendationType)
        )
    )
);
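To see why the chained version flattens the tree, here is a toy model in plain Java. ToyAgg is a hypothetical stand-in written for this illustration, not the Elasticsearch AggregationBuilder; the only behavior it copies is that subAggregation returns the builder it was called on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an aggregation builder (not an Elasticsearch class).
class ToyAgg {
    private final String name;
    private final List<ToyAgg> children = new ArrayList<>();

    ToyAgg(String name) {
        this.name = name;
    }

    // Adds a child and returns the parent (this), so chained calls
    // keep adding children to the same parent.
    ToyAgg subAggregation(ToyAgg child) {
        children.add(child);
        return this;
    }

    // Renders the tree as name(child,child(...)).
    String render() {
        if (children.isEmpty()) {
            return name;
        }
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(children.get(i).render());
        }
        return sb.append(')').toString();
    }

    // Chained calls: every aggregation becomes a sibling under the root.
    static String chained() {
        return new ToyAgg("recommendations")
            .subAggregation(new ToyAgg("exception"))
            .subAggregation(new ToyAgg("exceptionIds"))
            .subAggregation(new ToyAgg("recommendations2"))
            .subAggregation(new ToyAgg("recommendationType"))
            .render();
    }

    // Nested calls: each aggregation becomes the child of the previous one.
    static String nested() {
        return new ToyAgg("recommendations").subAggregation(
            new ToyAgg("exception").subAggregation(
                new ToyAgg("exceptionIds").subAggregation(
                    new ToyAgg("recommendations2").subAggregation(
                        new ToyAgg("recommendationType")))))
            .render();
    }

    public static void main(String[] args) {
        System.out.println(chained());
        System.out.println(nested());
    }
}
```

The chained form renders as one parent with four children, which matches the flat query in the question; the nested form renders as a single chain, which matches the expected query.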

Related

MongoDB autocomplete index doesn't return results

I have a collection named 'airport' with an Atlas Search autocomplete index. The JSON config is below.
{
"mappings": {
"dynamic": false,
"fields": {
"name": [
{
"type": "string"
},
{
"foldDiacritics": false,
"maxGrams": 7,
"minGrams": 2,
"type": "autocomplete"
}
]
}
}
}
This is my document record:
{
"_id": {
"$oid": "63de588c7154cc3ee5cbabb2"
},
"name": "Antalya Airport",
"code": "AYT",
"country": "TR",
"createdDate": {
"$date": {
"$numberLong": "1675516044323"
}
},
"updatedDate": {
"$date": {
"$numberLong": "1675516044323"
}
},
"updatedBy": "VISITOR",
"createdBy": "VISITOR"
}
And this is my MongoDB query:
public List<Document> autoCompleteAirports(AutoCompleteRequest autoCompleteRequest) {
    return database.getCollection(AIRPORT).aggregate(
        Arrays.asList(new Document("$search",
            new Document("index", "airportAutoCompleteIndex")
                .append("text",
                    new Document("query", autoCompleteRequest.getKeyword())
                        .append("path", "name")
                )))
    ).into(new ArrayList<>());
}
So, when I type "antalya" or "Antalya", this works. But when I type "Antaly" or "antal", there is no result.
Is there a solution?
I tried changing the minGrams and maxGrams settings on the index.
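One thing worth checking, offered as a hedged suggestion rather than a confirmed diagnosis: the index maps name as an autocomplete field, but the query uses the text operator, which matches whole analyzed terms such as "antalya" and not partial input like "antal". Atlas Search has a separate autocomplete operator for autocomplete-typed fields. The sketch below builds that $search stage from plain java.util maps so it runs without the driver; AutocompleteStageSketch and searchStage are names made up for illustration, while the index name and path are taken from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the MongoDB driver.
class AutocompleteStageSketch {

    // Builds the shape:
    // { "$search": { "index": ..., "autocomplete": { "query": ..., "path": "name" } } }
    static Map<String, Object> searchStage(String keyword) {
        Map<String, Object> autocomplete = new LinkedHashMap<>();
        autocomplete.put("query", keyword);
        autocomplete.put("path", "name");

        Map<String, Object> search = new LinkedHashMap<>();
        search.put("index", "airportAutoCompleteIndex");
        // "autocomplete" takes the place of the "text" operator used in the question.
        search.put("autocomplete", autocomplete);

        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put("$search", search);
        return stage;
    }

    public static void main(String[] args) {
        System.out.println(searchStage("antal"));
    }
}
```

With the driver, the same shape would be written with Document as in the question, swapping .append("text", ...) for .append("autocomplete", ...).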

ElasticSearch sorting isn't sorting by field

I'm trying to perform a field sort on the specified field but to no avail. The query keeps returning the same position when I run the script.
Here is the ElasticSearch script:
{
"from": 0,
"size": 10,
"timeout": "60s",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"query_string": {
"query": "random",
"fields": [],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"reviews.source": {
"query": "TEST",
"operator": "AND",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"path": "reviews",
"ignore_unmapped": false,
"score_mode": "avg",
"boost": 1,
"inner_hits": {
"name": "reviews",
"ignore_unmapped": false,
"from": 0,
"size": 3,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"track_scores": false
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"should": [
{
"match": {
"dataset": {
"query": "QUERY_TEST",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"reviews.openedAt": {
"order": "desc",
"nested": {
"path": "reviews"
}
}
}
]
}
The mapping I'm currently using:
"reviews": {
"type": "nested",
"properties": {
"id": {
"type": "keyword",
"copy_to": "fulltext"
},
"updatedAt": {
"type": "date",
"format": "strict_date_time",
"index": false
},
"openedAt": {
"type": "date",
"format": "strict_date_time"
}
I'm trying to sort the records based on a specific date in the reviews section. If a user inputs ASC, the returning values (reviews) should be in ascending order based on the openedAt date. I believe the sorting function isn't necessarily hitting the appropriate path. What should the sorting function look like?
I have a Java API that I created that calls the request and creates its own set of records:
public SearchResponse(SearchResponse response, SearchRequest searchRequest) {
    this.facets = new ArrayList<>();
    if (searchRequest == null || searchRequest.getRestricted().isEmpty()) {
        this.records =
            Stream.of(response.getHits().getHits()).map(SearchHit::getSourceAsMap).collect(Collectors.toList());
    } else {
        this.records = processRestrictedResults(response, searchRequest);
    }
    if (response.getAggregations() != null) {
        for (Map.Entry<String, Aggregation> entry : response.getAggregations().getAsMap().entrySet()) {
            this.facets.add(Facet.create(entry));
        }
    }
    this.totalRecords = getTotalMatched(response);
}
To answer the original question, the top-level hits are indeed being sorted by the latest reviews.openedAt in descending order: one of the reviews from doc#2 has the value 2021-04-06T08:13:53.552Z, which is greater than the only reviews.openedAt from doc#1 (2021-03-30T08:13:53.552Z), so #2 comes before #1.
What you're missing, though, is sorted inner_hits.
In your particular use case this would mean:
{
"from": 0,
"size": 10,
"timeout": "60s",
"query": {
"bool": {
"must": [
... // your original queries
{
"nested": {
"path": "reviews", <-- we need to enforce the nested context
"query": {
"match_all": {} <-- this could've been `"exists": { "field": "reviews.openedAt" }` too
},
"inner_hits": {
"sort": {
"reviews.openedAt": { <-- sorting the inner hits under the nested context
"order": "desc"
}
}
}
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"reviews.openedAt": { <-- sorting the top-level hits, as you previously were
"order": "desc",
"nested": {
"path": "reviews"
}
}
}
]
}
When you run the above query, each top-level hit will include an inner_hits attribute containing the sorted reviews, which you can then post-process in your Java backend.
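If you work from _source instead of inner_hits, the reviews can also be ordered on the Java side. This is a minimal sketch, assuming each review has already been read into a Map with an ISO-8601 openedAt string like the values quoted above; ReviewSorter and its methods are illustrative names, not part of any Elasticsearch API.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper for ordering review maps by their openedAt value.
class ReviewSorter {

    // Parses the openedAt string (for example 2021-04-06T08:13:53.552Z) into an Instant.
    static Instant openedAt(Map<String, Object> review) {
        return OffsetDateTime.parse((String) review.get("openedAt")).toInstant();
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<Map<String, Object>> sortByOpenedAt(List<Map<String, Object>> reviews, boolean ascending) {
        Comparator<Map<String, Object>> order = Comparator.comparing(ReviewSorter::openedAt);
        List<Map<String, Object>> sorted = new ArrayList<>(reviews);
        sorted.sort(ascending ? order : order.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> reviews = new ArrayList<>();
        reviews.add(Collections.singletonMap("openedAt", (Object) "2021-03-30T08:13:53.552Z"));
        reviews.add(Collections.singletonMap("openedAt", (Object) "2021-04-06T08:13:53.552Z"));
        System.out.println(sortByOpenedAt(reviews, false));
    }
}
```

Passing the user's ASC or DESC choice as the ascending flag keeps the ordering decision in one place.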

Average of difference between the dates

A snippet of my elasticsearch data is like below. Status field is nested.
status: [
{
"updated_at": "2020-08-04 17:18:41",
"created_at": "2020-08-04 17:18:39",
"sub_stage": "Stage1"
},
{
"updated_at": "2020-08-04 17:21:15",
"created_at": "2020-08-04 17:18:41",
"sub_stage": "Stage2"
},
{
"updated_at": "2020-08-04 17:21:15",
"created_at": "2020-08-04 17:21:07",
"sub_stage": "Stage3"
}
]
After aggregating on some field, each bucket contains some documents, and every document has a status field. What I want is the average time difference between Stage1 and Stage3.
For example, suppose the bucket for id = 1 contains 100 documents. For each document I have to find the time difference between Stage1 and Stage3, and then take the average over all of them.
I can get as far as the aggregation but am stuck on computing the average.
With some effort, I came up with the script below, but I have no idea whether it is correct:
Map findEvent(List events, String type) {
return events.find(it -> it.sub_stage == type);
}
return ChronoUnit.DAYS.between(Instant.parse(findEvent(params._source.events, 'Stage1').timestamp), Instant.parse(findEvent(params._source.events, 'Stage3').timestamp));
Is there any way I can perform this in Java, with this script or any other script?
Roughly, the query looks like:
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"should": [
{
"match": {
"status.sub_stage": {
"query": "Stage1",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0
}
},
"path": "status",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0
}
},
"aggregations": {
"id": {
"terms": {
"field": "id.keyword",
"size": 1000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"avg time": {
"avg": {
"script": {
"source": "Map findStage(List events, String type) { return events.find(it -> it.sub_stage == type); } return ChronoUnit.DAYS.between(Instant.parse(findStage(ctx._source.status, 'Stage1').timestamp), Instant.parse(findStage(ctx._source.status, 'Stage3').timestamp));",
"lang": "painless"
}
}
}
}
}
}
}
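Two details in the script are worth noting: it reads a timestamp field that does not appear in the sample data, and Instant.parse expects ISO-8601 input rather than the yyyy-MM-dd HH:mm:ss strings shown. As a plain-Java sketch of the intended calculation (done client-side, not as a Painless script), the code below assumes the difference is measured from Stage1's created_at to Stage3's created_at and that each document's status list has been read into maps. StageAverage and its methods are illustrative names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical client-side helper; not an Elasticsearch API.
class StageAverage {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Convenience factory for one status entry (only the fields used here).
    static Map<String, String> entry(String subStage, String createdAt) {
        Map<String, String> e = new HashMap<>();
        e.put("sub_stage", subStage);
        e.put("created_at", createdAt);
        return e;
    }

    // Returns created_at of the first entry with the given sub_stage, or null if absent.
    static LocalDateTime createdAt(List<Map<String, String>> status, String stage) {
        for (Map<String, String> e : status) {
            if (stage.equals(e.get("sub_stage"))) {
                return LocalDateTime.parse(e.get("created_at"), FORMAT);
            }
        }
        return null;
    }

    // Averages the Stage1 -> Stage3 gap in seconds; documents missing a stage are skipped.
    static OptionalDouble averageSeconds(List<List<Map<String, String>>> documents) {
        long total = 0;
        int count = 0;
        for (List<Map<String, String>> status : documents) {
            LocalDateTime start = createdAt(status, "Stage1");
            LocalDateTime end = createdAt(status, "Stage3");
            if (start != null && end != null) {
                total += Duration.between(start, end).getSeconds();
                count++;
            }
        }
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of((double) total / count);
    }
}
```

Seconds are used here because the sample stages are only minutes apart, so a difference in whole days would come out as 0.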

Why is Elasticsearch not showing the score?

I am using ElasticSearch 2.3.1 on Ubuntu 16.04.
The mapping is:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"presentation": {
"type": "string",
"analyzer": "2gramsto3"
},
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
}
The query is:
{
"size": 20,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
[
{
"match": {
"cv": "folletto"
}
},
{
"match": {
"cv": " psicologia"
}
},
{
"match": {
"cv": " tenacia"
}
}
]
]
}
}
]
}
}
}
It finds 14567 documents, but the score is always "_score": 0.
I read that filters have a score, so why not in this case?
Thank you!
Scores are not calculated for clauses in filter context. If you need scores, put the match clauses in query context (for example, directly under bool.must rather than inside bool.filter).
Take into account the implications pointed out in the documentation below.
Ref doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html

Elasticsearch query not returning anything

I have the following document indexed, but when I run the search it returns nothing. I was wondering if it's an issue with the query. I am trying to search for any of the nested messages that contain the word dogs. Here is the document:
{
"_index": "thread_and_messages",
"_type": "thread",
"_id": "3",
"_score": 1.0,
"_source": {
"thread_id": 3,
"thread_name": "I play the guitar",
"created": "Wed Apr 13 2016",
"thread_view": 2,
"first_nick": "Test User",
"messages": [{
"message_text": " I like dogs",
"message_id": 13,
"message_nick": "Test"
}],
"site_name": "Test Site"
}
}
Here is the query I am running when I run the curl command:
{
"function_score": {
"functions": [{
"field_value_factor": {
"field": "thread_view",
"modifier": "log1p",
"factor": 2
}
}],
{"query": {
"bool": {
"should": [{
"match": {
"thread_name": "dogs"
}
}, {
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [{
"match": {
"messages.message_text": "dogs"
}
}]
}
},
"inner_hits": {}
}
}]
}
}
}
}
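With your mapping and the sample document, a slightly modified query works for me. The function_score now sits under a top-level "query" key, and its inner "query" is a sibling of "functions" rather than a separate object: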
curl -XGET "http://localhost:9200/thread_and_messages/thread/_search" -d'
{
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "thread_view",
"modifier": "log1p",
"factor": 2
}
}
],
"query": {
"bool": {
"should": [
{
"match": {
"thread_name": "dogs"
}
},
{
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [
{
"match": {
"messages.message_text": "dogs"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
}
}'
