How to generate elastic search nested aggregations in java?

I have the aggregation query below, which I need to translate into Java using the Elasticsearch RestHighLevelClient.
I have tried several times, but the Java code I write does not produce this query.
{
"aggs": {
"recommendations": {
"nested": {
"path": "events.recommendationData"
},
"aggs": {
"exception": {
"filter": {
"terms": {
"events.recommendationData.exceptionId": [
"2"
]
}
},
"aggs": {
"exceptionIds": {
"terms": {
"field": "events.recommendationData.exceptionId.keyword",
"size": 10
},
"aggs": {
"recommendations": {
"nested": {
"path": "events.recommendationData.recommendations"
},
"aggs": {
"recommendationType": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10
}
}
}
}
}
}
}
}
}
}
}
}
I am using the code below with RestHighLevelClient:
AggregationBuilder recommendations =
AggregationBuilders.nested("recommendations", "events.recommendationData");
AggregationBuilder exception = AggregationBuilders
.filter("exception", QueryBuilders.termsQuery("events.recommendationData.exceptionId", "2"));
AggregationBuilder exceptionIds = AggregationBuilders.terms("exceptionIds")
.field("events.recommendationData.exceptionId.keyword").size(10);
AggregationBuilder recommendations2 =
AggregationBuilders.nested("recommendations", "events.recommendationData.recommendations");
AggregationBuilder recommendationType = AggregationBuilders.terms("recommendationType")
.field("events.recommendationData.recommendations.recommendationType").size(10);
AggregationBuilder build =
recommendations
.subAggregation(exception)
.subAggregation(exceptionIds)
.subAggregation(recommendations2)
.subAggregation(recommendationType);
It produces the wrong query, shown below, which does not work:
{
"aggregations": {
"recommendations": {
"nested": {
"path": "events.recommendationData"
},
"aggregations": {
"exception": {
"filter": {
"terms": {
"events.recommendationData.exceptionId": [
"1",
"2"
],
"boost": 1
}
}
},
"exceptionIds": {
"terms": {
"field": "events.recommendationData.exceptionId.keyword",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
},
"recommendations": {
"nested": {
"path": "events.recommendationData.recommendations"
}
},
"recommendationType": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}

In the expected query, every aggregation is a sub-aggregation of the previous one: recommendationType is a sub-aggregation of recommendations2, those two together are a sub-aggregation of exceptionIds, and so on.
Chaining subAggregation calls on recommendations instead attaches all four aggregations to it as siblings, which is the flat query you are getting. Only one statement needs to change. Instead of
AggregationBuilder build =
recommendations
.subAggregation(exception)
.subAggregation(exceptionIds)
.subAggregation(recommendations2)
.subAggregation(recommendationType);
use this:
recommendations.subAggregation(
exception.subAggregation(
exceptionIds.subAggregation(
recommendations2.subAggregation(recommendationType)
)
)
);
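The difference between the two shapes can be sketched without an Elasticsearch dependency. The Agg class below is a hypothetical stand-in, not the real AggregationBuilder; the only behaviour it copies is that subAggregation adds a child and returns the parent, which is what makes chained calls produce siblings.

```java
import java.util.ArrayList;
import java.util.List;

public class SubAggregationShape {

    /** Hypothetical stand-in for an aggregation builder; not the Elasticsearch class. */
    static final class Agg {
        final String name;
        final List<Agg> children = new ArrayList<>();

        Agg(String name) {
            this.name = name;
        }

        // Adds a child and returns the parent, so chained calls keep
        // attaching children to the same parent.
        Agg subAggregation(Agg child) {
            children.add(child);
            return this;
        }

        // Renders the tree as name[child, child, ...].
        String render() {
            if (children.isEmpty()) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('[');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(children.get(i).render());
            }
            return sb.append(']').toString();
        }
    }

    // Mirrors the original code: every call attaches to "recommendations".
    static String chained() {
        return new Agg("recommendations")
            .subAggregation(new Agg("exception"))
            .subAggregation(new Agg("exceptionIds"))
            .subAggregation(new Agg("recommendations2"))
            .subAggregation(new Agg("recommendationType"))
            .render();
    }

    // Mirrors the fix: each aggregation is passed to its parent's subAggregation.
    static String nested() {
        return new Agg("recommendations").subAggregation(
            new Agg("exception").subAggregation(
                new Agg("exceptionIds").subAggregation(
                    new Agg("recommendations2").subAggregation(
                        new Agg("recommendationType")))))
            .render();
    }

    public static void main(String[] args) {
        System.out.println(chained());
        System.out.println(nested());
    }
}
```

The first line printed shows one parent with four siblings, the second shows a single chain four levels deep, which matches the expected query.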

Related

MongoDB Autocomplete index doesn't get result

I have a collection named 'airport' with an Atlas Search autocomplete index. The JSON config is below.
{
"mappings": {
"dynamic": false,
"fields": {
"name": [
{
"type": "string"
},
{
"foldDiacritics": false,
"maxGrams": 7,
"minGrams": 2,
"type": "autocomplete"
}
]
}
}
}
This is a sample document:
{
"_id": {
"$oid": "63de588c7154cc3ee5cbabb2"
},
"name": "Antalya Airport",
"code": "AYT",
"country": "TR",
"createdDate": {
"$date": {
"$numberLong": "1675516044323"
}
},
"updatedDate": {
"$date": {
"$numberLong": "1675516044323"
}
},
"updatedBy": "VISITOR",
"createdBy": "VISITOR"
}
And this is my MongoDB query:
public List<Document> autoCompleteAirports(AutoCompleteRequest autoCompleteRequest) {
return database.getCollection(AIRPORT).aggregate(
Arrays.asList(new Document("$search",
new Document("index", "airportAutoCompleteIndex")
.append("text",
new Document("query", autoCompleteRequest.getKeyword())
.append("path", "name")
)))
).into(new ArrayList<>());
}
When I type "antalya" or "Antalya", this works. But when I type "Antaly" or "antal", there is no result.
Is there a solution?
I have tried changing the minGrams and maxGrams settings on the index.
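One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the query above uses the text operator, while Atlas Search provides a separate autocomplete operator for fields indexed with type autocomplete. The sketch below only assembles the $search stage as plain maps so that it runs without the MongoDB driver; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AutocompleteStageSketch {

    // Builds a $search stage that uses the "autocomplete" operator
    // in place of "text". Hypothetical helper, not part of any driver.
    static Map<String, Object> searchStage(String indexName, String path, String keyword) {
        Map<String, Object> operator = new LinkedHashMap<>();
        operator.put("query", keyword);
        operator.put("path", path);

        Map<String, Object> search = new LinkedHashMap<>();
        search.put("index", indexName);
        search.put("autocomplete", operator);

        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put("$search", search);
        return stage;
    }

    public static void main(String[] args) {
        // Index name and path are taken from the question; "Antaly" is one of the failing inputs.
        System.out.println(searchStage("airportAutoCompleteIndex", "name", "Antaly"));
    }
}
```

In the original method this would correspond to replacing .append("text", ...) with .append("autocomplete", ...) and leaving the inner query and path entries unchanged.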

ElasticSearch sorting isn't sorting by field

I'm trying to sort on a specific field, but to no avail. The results come back in the same order every time I run the query.
Here is the Elasticsearch query:
{
"from": 0,
"size": 10,
"timeout": "60s",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"query_string": {
"query": "random",
"fields": [],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"reviews.source": {
"query": "TEST",
"operator": "AND",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"path": "reviews",
"ignore_unmapped": false,
"score_mode": "avg",
"boost": 1,
"inner_hits": {
"name": "reviews",
"ignore_unmapped": false,
"from": 0,
"size": 3,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"track_scores": false
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"should": [
{
"match": {
"dataset": {
"query": "QUERY_TEST",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"reviews.openedAt": {
"order": "desc",
"nested": {
"path": "reviews"
}
}
}
]
}
The mapping I'm currently using:
"reviews": {
"type": "nested",
"properties": {
"id": {
"type": "keyword",
"copy_to": "fulltext"
},
"updatedAt": {
"type": "date",
"format": "strict_date_time",
"index": false
},
"openedAt": {
"type": "date",
"format": "strict_date_time"
}
I'm trying to sort the records based on a specific date in the reviews section. If a user inputs ASC, the returning values (reviews) should be in ascending order based on the openedAt date. I believe the sorting function isn't necessarily hitting the appropriate path. What should the sorting function look like?
I have a Java API that I created that calls the request and creates its own set of records:
public SearchResponse(SearchResponse response, SearchRequest searchRequest) {
this.facets = new ArrayList<>();
if (searchRequest == null || searchRequest.getRestricted().isEmpty()) {
this.records =
Stream.of(response.getHits().getHits()).map(SearchHit::getSourceAsMap).collect(Collectors.toList());
} else {
this.records = processRestrictedResults(response, searchRequest);
}
if (response.getAggregations() != null) {
for (Map.Entry<String, Aggregation> entry : response.getAggregations().getAsMap().entrySet()) {
this.facets.add(Facet.create(entry));
}
}
this.totalRecords = getTotalMatched(response);
}

To answer the original question: the top-level hits are indeed being sorted by their latest reviews.openedAt in descending order. One of the reviews from doc#2 has the value 2021-04-06T08:13:53.552Z, which is greater than the only reviews.openedAt from doc#1 (2021-03-30T08:13:53.552Z), so #2 comes before #1.
What you're missing, though, is sorted inner_hits.
In your particular use case this would mean:
{
"from": 0,
"size": 10,
"timeout": "60s",
"query": {
"bool": {
"must": [
... // your original queries
{
"nested": {
"path": "reviews", <-- we need to enforce the nested context
"query": {
"match_all": {} <-- this could've been `"exists": { "field": "reviews.openedAt" }` too
},
"inner_hits": {
"sort": {
"reviews.openedAt": { <-- sorting the inner hits under the nested context
"order": "desc"
}
}
}
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"reviews.openedAt": { <-- sorting the top-level hits, as you previously were
"order": "desc",
"nested": {
"path": "reviews"
}
}
}
]
}
When you run the above query, each top-level hit will include an inner_hits attribute containing the sorted reviews, which you can then post-process in your Java backend.
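Note that the sort inside inner_hits does not reorder the reviews array in _source, which is what getSourceAsMap returns in the constructor shown in the question. If the backend keeps reading reviews from _source, a client-side sort is one fallback. The sketch below is a plain-Java illustration of that alternative, not the inner_hits approach itself. It assumes each review has already been extracted as a map with an openedAt string in strict_date_time format; the class name and the id values are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ReviewSortSketch {

    // Returns a sorted copy of the reviews, ordered by their openedAt timestamp.
    static List<Map<String, Object>> sortByOpenedAt(List<Map<String, Object>> reviews,
                                                    boolean ascending) {
        Comparator<Map<String, Object>> byOpenedAt =
            Comparator.comparing(r -> Instant.parse((String) r.get("openedAt")));
        List<Map<String, Object>> sorted = new ArrayList<>(reviews);
        sorted.sort(ascending ? byOpenedAt : byOpenedAt.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Timestamps are the two values quoted in the answer; ids are made up.
        List<Map<String, Object>> reviews = List.of(
            Map.<String, Object>of("id", "a", "openedAt", "2021-03-30T08:13:53.552Z"),
            Map.<String, Object>of("id", "b", "openedAt", "2021-04-06T08:13:53.552Z"));

        System.out.println(sortByOpenedAt(reviews, false)); // newest first
        System.out.println(sortByOpenedAt(reviews, true));  // oldest first
    }
}
```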

Average of difference between the dates

A snippet of my Elasticsearch data is shown below. The status field is nested.
status: [
{
"updated_at": "2020-08-04 17:18:41",
"created_at": "2020-08-04 17:18:39",
"sub_stage": "Stage1"
},
{
"updated_at": "2020-08-04 17:21:15",
"created_at": "2020-08-04 17:18:41",
"sub_stage": "Stage2"
},
{
"updated_at": "2020-08-04 17:21:15",
"created_at": "2020-08-04 17:21:07",
"sub_stage": "Stage3"
}
]
After aggregating on some field, each bucket contains a number of documents, and every document has a status field. What I want is the average time difference between Stage1 and Stage3.
For example, suppose the bucket for id = 1 consists of 100 documents. For each document I have to find the time difference between Stage1 and Stage3, and then take the average over all of them.
I can get as far as the aggregation, but I am stuck at computing the average.
With some effort I came up with the script below, but I have no idea whether it is correct:
Map findEvent(List events, String type) {
return events.find(it -> it.sub_stage == type);
}
return ChronoUnit.DAYS.between(Instant.parse(findEvent(params._source.events, 'Stage1').timestamp), Instant.parse(findEvent(params._source.events, 'Stage3').timestamp););
Is there any way I can do this in Java, with this script or any other script?
Roughly, the query looks like this:
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"should": [
{
"match": {
"status.sub_stage": {
"query": "Stage1",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0
}
},
"path": "status",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0
}
},
"aggregations": {
"id": {
"terms": {
"field": "id.keyword",
"size": 1000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"avg time": {
"avg": {
"script": {
"source": "Map findStage(List events, String type) { return events.find(it -> it.sub_stage == type); } return ChronoUnit.DAYS.between(Instant.parse(findStage(ctx._source.status, 'Stage1').timestamp), Instant.parse(findStage(ctx._source.status, 'Stage3').timestamp));",
"lang": "painless"
}
}
}
}
}
}
}
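As a point of comparison for the script, the same average can be computed client-side in plain Java once the status arrays have been fetched. This is a minimal sketch under stated assumptions, not a Painless solution: the gap is measured from Stage1's created_at to Stage3's created_at (the question does not say which timestamps to use), the result is in seconds rather than days, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

public class StageGapAverage {

    // Matches the timestamp format shown in the sample data.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns created_at of the first entry with the given sub_stage, or null if absent.
    static LocalDateTime createdAt(List<Map<String, String>> status, String stage) {
        return status.stream()
            .filter(s -> stage.equals(s.get("sub_stage")))
            .map(s -> LocalDateTime.parse(s.get("created_at"), FORMAT))
            .findFirst()
            .orElse(null);
    }

    // Averages the Stage1 -> Stage3 gap in seconds; documents missing either stage are skipped.
    static OptionalDouble averageGapSeconds(List<List<Map<String, String>>> statuses) {
        return statuses.stream()
            .filter(s -> createdAt(s, "Stage1") != null && createdAt(s, "Stage3") != null)
            .mapToLong(s -> Duration.between(createdAt(s, "Stage1"),
                                             createdAt(s, "Stage3")).getSeconds())
            .average();
    }

    public static void main(String[] args) {
        // The status array from the question, reduced to the fields used here.
        List<Map<String, String>> doc = List.of(
            Map.of("sub_stage", "Stage1", "created_at", "2020-08-04 17:18:39"),
            Map.of("sub_stage", "Stage2", "created_at", "2020-08-04 17:18:41"),
            Map.of("sub_stage", "Stage3", "created_at", "2020-08-04 17:21:07"));

        System.out.println(averageGapSeconds(List.of(doc)));
    }
}
```

For the sample document the gap is 148 seconds. Measuring in whole days, as ChronoUnit.DAYS does in the script, would give 0 for data like this.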

Why is Elasticsearch not showing the score?

I am using ElasticSearch 2.3.1 on Ubuntu 16.04.
The mapping is:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"presentation": {
"type": "string",
"analyzer": "2gramsto3"
},
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
}
The query is:
{
"size": 20,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
[
{
"match": {
"cv": "folletto"
}
},
{
"match": {
"cv": " psicologia"
}
},
{
"match": {
"cv": " tenacia"
}
}
]
]
}
}
]
}
}
}
It found 14567 documents, but the score is always "_score": 0.
I read that filters have a score, so why not in this case?
Thank you!

Scores are not calculated for clauses in filter context. If you need scores, move the clauses into a query-context clause such as must.
Keep in mind the implications described in the documentation below.
Ref doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
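To make the distinction concrete: clauses under bool.filter run in filter context and do not contribute to _score, while the same match clauses under bool.must run in query context and are scored. The sketch below only assembles such a request body as a string. The class and method names are hypothetical, how the body is sent is left out, and the terms are assumed to contain no characters that need JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ScoredQuerySketch {

    // Builds a request body whose match clauses sit under bool.must (query context)
    // so that they contribute to _score. Terms are trimmed but not JSON-escaped.
    static String scoredQuery(String field, List<String> terms) {
        String clauses = terms.stream()
            .map(t -> "{\"match\":{\"" + field + "\":\"" + t.trim() + "\"}}")
            .collect(Collectors.joining(","));
        return "{\"size\":20,\"from\":0,\"query\":{\"bool\":{\"must\":[" + clauses + "]}}}";
    }

    public static void main(String[] args) {
        // Field and terms are taken from the question.
        System.out.println(scoredQuery("cv", List.of("folletto", " psicologia", " tenacia")));
    }
}
```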

Elasticsearch query not returning anything

I have the following document indexed, but when I run the search it returns nothing. I was wondering whether the issue is with the query. I am trying to find any of the nested messages that contain the word dogs. Here is the document:
{
"_index": "thread_and_messages",
"_type": "thread",
"_id": "3",
"_score": 1.0,
"_source": {
"thread_id": 3,
"thread_name": "I play the guitar",
"created": "Wed Apr 13 2016",
"thread_view": 2,
"first_nick": "Test User",
"messages": [{
"message_text": " I like dogs",
"message_id": 13,
"message_nick": "Test"
}],
"site_name": "Test Site"
}
}
Here is the query I am running when I run the curl command:
{
"function_score": {
"functions": [{
"field_value_factor": {
"field": "thread_view",
"modifier": "log1p",
"factor": 2
}
}],
{"query": {
"bool": {
"should": [{
"match": {
"thread_name": "dogs"
}
}, {
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [{
"match": {
"messages.message_text": "dogs"
}
}]
}
},
"inner_hits": {}
}
}]
}
}
}
}

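The mapping you have plus the sample document works for me with a slightly modified query: function_score is wrapped in a top-level query, and the stray { before the inner "query" is removed.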
curl -XGET "http://localhost:9200/thread_and_messages/thread/_search" -d'
{
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "thread_view",
"modifier": "log1p",
"factor": 2
}
}
],
"query": {
"bool": {
"should": [
{
"match": {
"thread_name": "dogs"
}
},
{
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [
{
"match": {
"messages.message_text": "dogs"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
}
}'
