ElasticSearch sorting isn't sorting by field - java

I'm trying to sort on a specific field, but to no avail: the query keeps returning the hits in the same order when I run it.
Here is the ElasticSearch script:
{
"from": 0,
"size": 10,
"timeout": "60s",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"query_string": {
"query": "random",
"fields": [],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"reviews.source": {
"query": "TEST",
"operator": "AND",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"path": "reviews",
"ignore_unmapped": false,
"score_mode": "avg",
"boost": 1,
"inner_hits": {
"name": "reviews",
"ignore_unmapped": false,
"from": 0,
"size": 3,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"track_scores": false
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"should": [
{
"match": {
"dataset": {
"query": "QUERY_TEST",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"reviews.openedAt": {
"order": "desc",
"nested": {
"path": "reviews"
}
}
}
]
}
The mapping I'm currently using:
"reviews": {
"type": "nested",
"properties": {
"id": {
"type": "keyword",
"copy_to": "fulltext"
},
"updatedAt": {
"type": "date",
"format": "strict_date_time",
"index": false
},
"openedAt": {
"type": "date",
"format": "strict_date_time"
}
I'm trying to sort the records based on a specific date in the reviews section. If a user inputs ASC, the returning values (reviews) should be in ascending order based on the openedAt date. I believe the sorting function isn't necessarily hitting the appropriate path. What should the sorting function look like?
I have a Java API that I created that calls the request and creates its own set of records:
public SearchResponse(SearchResponse response, SearchRequest searchRequest) {
this.facets = new ArrayList<>();
if (searchRequest == null || searchRequest.getRestricted().isEmpty()) {
this.records =
Stream.of(response.getHits().getHits()).map(SearchHit::getSourceAsMap).collect(Collectors.toList());
} else {
this.records = processRestrictedResults(response, searchRequest);
}
if (response.getAggregations() != null) {
for (Map.Entry<String, Aggregation> entry : response.getAggregations().getAsMap().entrySet()) {
this.facets.add(Facet.create(entry));
}
}
this.totalRecords = getTotalMatched(response);
}

To answer the original question: the top-level hits are indeed being sorted by the latest reviews.openedAt in descending order. One of the reviews from doc#2 has the value 2021-04-06T08:13:53.552Z, which is greater than the only reviews.openedAt from doc#1 (2021-03-30T08:13:53.552Z), so #2 comes before #1.
What you're missing, though, is sorted inner_hits.
In your particular use case this would mean:
{
"from": 0,
"size": 10,
"timeout": "60s",
"query": {
"bool": {
"must": [
... // your original queries
{
"nested": {
"path": "reviews", <-- we need to enforce the nested context
"query": {
"match_all": {} <-- this could've been `"exists": { "field": "reviews.openedAt" }` too
},
"inner_hits": {
"sort": {
"reviews.openedAt": { <-- sorting the inner hits under the nested context
"order": "desc"
}
}
}
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"reviews.openedAt": { <-- sorting the top-level hits, as you previously were
"order": "desc",
"nested": {
"path": "reviews"
}
}
}
]
}
When you run the above query, each top-level hit will include an inner_hits attribute containing the sorted reviews, which you can then post-process in your Java backend.
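If the reviews are read from `_source` (as the constructor in the question does via `getSourceAsMap`) rather than from `inner_hits`, the same ordering can also be applied on the Java side. The following is only a minimal sketch under stated assumptions: each review is a `Map<String, Object>`, every review has an `openedAt` value, and that value is an ISO-8601 string as produced by `strict_date_time`. The `ReviewSorter` class and its method are hypothetical names, not part of any Elasticsearch API.

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders the reviews of one hit by "openedAt".
final class ReviewSorter {

    /** Returns a new list sorted by openedAt; order is "asc" or "desc" (case-insensitive). */
    static List<Map<String, Object>> sortByOpenedAt(List<Map<String, Object>> reviews, String order) {
        // Assumption: openedAt is always present and ISO-8601 formatted.
        Comparator<Map<String, Object>> byOpenedAt = Comparator.comparing(
                (Map<String, Object> review) ->
                        OffsetDateTime.parse((String) review.get("openedAt")).toInstant());
        if ("desc".equalsIgnoreCase(order)) {
            byOpenedAt = byOpenedAt.reversed();
        }
        List<Map<String, Object>> sorted = new ArrayList<>(reviews); // leave the input untouched
        sorted.sort(byOpenedAt);
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> reviews = List.of(
                Map.<String, Object>of("id", "r1", "openedAt", "2021-03-30T08:13:53.552Z"),
                Map.<String, Object>of("id", "r2", "openedAt", "2021-04-06T08:13:53.552Z"));
        System.out.println(sortByOpenedAt(reviews, "asc"));
        System.out.println(sortByOpenedAt(reviews, "desc"));
    }
}
```

Sorting in the query through `inner_hits` is usually preferable because only the top `size` reviews are returned; the client-side variant is mainly useful when the full `_source` is already being processed.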

Related

MongoDB Autocomplete index doesn't get result

I have a collection called 'airport' with an Atlas autocomplete index; the JSON config is below.
{
"mappings": {
"dynamic": false,
"fields": {
"name": [
{
"type": "string"
},
{
"foldDiacritics": false,
"maxGrams": 7,
"minGrams": 2,
"type": "autocomplete"
}
]
}
}
}
This is a sample document:
{
"_id": {
"$oid": "63de588c7154cc3ee5cbabb2"
},
"name": "Antalya Airport",
"code": "AYT",
"country": "TR",
"createdDate": {
"$date": {
"$numberLong": "1675516044323"
}
},
"updatedDate": {
"$date": {
"$numberLong": "1675516044323"
}
},
"updatedBy": "VISITOR",
"createdBy": "VISITOR",
}
And this is my MongoDB query:
public List<Document> autoCompleteAirports(AutoCompleteRequest autoCompleteRequest) {
return database.getCollection(AIRPORT).aggregate(
Arrays.asList(new Document("$search",
new Document("index", "airportAutoCompleteIndex")
.append("text",
new Document("query", autoCompleteRequest.getKeyword())
.append("path", "name")
)))
).into(new ArrayList<>());
}
When I type "antalya" or "Antalya", this works. But when I type "Antaly" or "antal", there is no result.
Is there a solution?
I tried changing the minGrams and maxGrams settings on the index.
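One likely cause is that the query uses the `text` operator, which matches whole analyzed terms, while a field indexed with type `autocomplete` is meant to be queried with the `autocomplete` operator. The sketch below only builds the `$search` stage as plain maps so that it stays self-contained; the `AirportSearchStage` class and its method name are hypothetical, and the index and path values are taken from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a $search stage using the autocomplete operator.
final class AirportSearchStage {

    static Map<String, Object> autocompleteStage(String indexName, String path, String keyword) {
        // Operator body: same "query" and "path" keys the question already passes to "text".
        Map<String, Object> autocomplete = new LinkedHashMap<>();
        autocomplete.put("query", keyword);
        autocomplete.put("path", path);

        Map<String, Object> search = new LinkedHashMap<>();
        search.put("index", indexName);
        search.put("autocomplete", autocomplete); // instead of "text"

        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put("$search", search);
        return stage;
    }

    public static void main(String[] args) {
        System.out.println(autocompleteStage("airportAutoCompleteIndex", "name", "antal"));
    }
}
```

In the code from the question, the equivalent change would be replacing `.append("text", ...)` with `.append("autocomplete", ...)` and keeping the inner `query` and `path` entries as they are.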

Elasticsearch: combine the result sets of two queries

I have the data structure below in Elasticsearch.
[{
"name": "Kapil",
"age": 32,
"hobbies": ["Cricket", "Football", "Swimming"]
},
{
"name": "John",
"age": 33,
"hobbies": ["Baseball", "Football", "Swimming"]
},
{
"name": "Vick",
"age": 30,
"hobbies": ["Baseball", "Karate", "Swimming"]
}]
I want to get all records in the following order:
1. All users who have Football as a hobby, sorted by age descending.
2. All other users, sorted by age descending.
So I expect the result in the order John, Kapil, Vick.
I used the query below to get the result for #1:
{
"size": 500,
"query": {
"bool": {
"must": [
{
"match_phrase": {
"hobbies.keyword": "Football"
}
}
]
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
and the query below for #2:
{
"size": 500,
"query": {
"bool": {
"must_not": [
{
"match_phrase": {
"hobbies.keyword": "Football"
}
}
]
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
With the above, I am not able to maintain the paging logic, and I have to execute both queries separately. Can someone please help me achieve this?
Try this out:
{
"query": {
"bool": {
"must": {
"match_all": {} <-- retrieve all docs
},
"should": { <-- give higher score to docs that match this clause
"match_phrase": {
"hobbies.keyword": "Football"
}
}
}
},
"sort": [
"_score", <-- sort by doc score first
{
"age": { <-- when score is equal then sort by age
"order": "desc"
}
}
]
}
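To make the intended ordering concrete, here is a plain-Java model of what the combined sort is meant to produce: documents matching the `should` clause first, then age descending within each group. It is only an illustration, not Elasticsearch scoring; it assumes every matching document receives the same score, and the `HobbyOrdering` and `User` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "sort by _score, then by age desc" for the sample data.
final class HobbyOrdering {

    static final class User {
        final String name;
        final int age;
        final List<String> hobbies;

        User(String name, int age, List<String> hobbies) {
            this.name = name;
            this.age = age;
            this.hobbies = hobbies;
        }
    }

    static List<String> orderedNames(List<User> users, String hobby) {
        // Users with the hobby come first (stands in for the higher score).
        Comparator<User> matchesFirst =
                Comparator.comparing((User u) -> u.hobbies.contains(hobby)).reversed();
        // Ties are broken by age, oldest first.
        Comparator<User> oldestFirst = Comparator.comparingInt((User u) -> u.age).reversed();

        List<User> sorted = new ArrayList<>(users);
        sorted.sort(matchesFirst.thenComparing(oldestFirst));

        List<String> names = new ArrayList<>();
        for (User u : sorted) {
            names.add(u.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("Kapil", 32, List.of("Cricket", "Football", "Swimming")),
                new User("John", 33, List.of("Baseball", "Football", "Swimming")),
                new User("Vick", 30, List.of("Baseball", "Karate", "Swimming")));
        System.out.println(orderedNames(users, "Football"));
    }
}
```

Because the ordering is produced by a single query, `from` and `size` paging applies to the combined list rather than to two separate result sets.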

Average of difference between the dates

A snippet of my Elasticsearch data is shown below. The status field is nested.
status: [
{
"updated_at": "2020-08-04 17:18:41",
"created_at": "2020-08-04 17:18:39",
"sub_stage": "Stage1"
},
{
"updated_at": "2020-08-04 17:21:15",
"created_at": "2020-08-04 17:18:41",
"sub_stage": "Stage2"
},
{
"updated_at": "2020-08-04 17:21:15",
"created_at": "2020-08-04 17:21:07",
"sub_stage": "Stage3"
}
]
After aggregating on some field, each bucket contains some documents, and every document has a status field. What I want is the average time difference between Stage1 and Stage3.
For example: suppose the bucket for id = 1 consists of 100 documents. For each document I have to find the time difference between Stage1 and Stage3, and then take the average of those differences.
I can get as far as the aggregation but am stuck on computing the average.
After some effort, I am using the script below, but I have no idea whether it is correct:
Map findEvent(List events, String type) {
return events.find(it -> it.sub_stage == type);
}
return ChronoUnit.DAYS.between(Instant.parse(findEvent(params._source.events, 'Stage1').timestamp), Instant.parse(findEvent(params._source.events, 'Stage3').timestamp));
Is there any way I can do this in Java, with this script or any other script?
Roughly, the query looks like this:
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"should": [
{
"match": {
"status.sub_stage": {
"query": "Stage1",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0
}
},
"path": "status",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
}
],
"adjust_pure_negative": true,
"minimum_should_match": "1",
"boost": 1.0
}
},
"aggregations": {
"id": {
"terms": {
"field": "id.keyword",
"size": 1000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"avg time": {
"avg": {
"script": {
"source": "Map findStage(List events, String type) { return events.find(it -> it.sub_stage == type); } return ChronoUnit.DAYS.between(Instant.parse(findStage(ctx._source.status, 'Stage1').timestamp), Instant.parse(findStage(ctx._source.status, 'Stage3').timestamp));",
"lang": "painless"
}
}
}
}
}
}
}
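As a point of comparison, the averaging can be sketched in plain Java once the status entries of each document are available on the client. This is a sketch under several assumptions that the question does not confirm: the timestamps use the `yyyy-MM-dd HH:mm:ss` layout shown in the snippet (which `Instant.parse` would reject, since it expects ISO-8601), the interval runs from the `created_at` of Stage1 to the `created_at` of Stage3, and each status entry is a `Map<String, String>`. The result is in seconds rather than days, because the sample stages are only seconds apart. `StageDurations` is a hypothetical name.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical helper: average time between two sub-stages across documents.
final class StageDurations {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Returns the created_at of the first entry with the given sub_stage, or null if absent. */
    static LocalDateTime createdAt(List<Map<String, String>> status, String subStage) {
        for (Map<String, String> entry : status) {
            if (subStage.equals(entry.get("sub_stage"))) {
                return LocalDateTime.parse(entry.get("created_at"), FORMAT);
            }
        }
        return null;
    }

    /** Averages (to - from) in seconds over documents that contain both stages. */
    static OptionalDouble averageSeconds(List<List<Map<String, String>>> documents,
                                         String from, String to) {
        return documents.stream()
                .filter(status -> createdAt(status, from) != null && createdAt(status, to) != null)
                .mapToLong(status ->
                        Duration.between(createdAt(status, from), createdAt(status, to)).getSeconds())
                .average();
    }

    public static void main(String[] args) {
        List<Map<String, String>> status = List.of(
                Map.of("sub_stage", "Stage1", "created_at", "2020-08-04 17:18:39"),
                Map.of("sub_stage", "Stage2", "created_at", "2020-08-04 17:18:41"),
                Map.of("sub_stage", "Stage3", "created_at", "2020-08-04 17:21:07"));
        System.out.println(averageSeconds(List.of(status), "Stage1", "Stage3"));
    }
}
```

Documents missing either stage are skipped instead of being counted as zero, so they do not pull the average down.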

How to generate elastic search nested aggregations in java?

I have the aggregation query below, which I need to translate into Java using the Elasticsearch client RestHighLevelClient.
I have tried multiple times, but my Java code does not produce this query.
{
"aggs": {
"recommendations": {
"nested": {
"path": "events.recommendationData"
},
"aggs": {
"exception": {
"filter": {
"terms": {
"events.recommendationData.exceptionId": [
"2"
]
}
},
"aggs": {
"exceptionIds": {
"terms": {
"field": "events.recommendationData.exceptionId.keyword",
"size": 10
},
"aggs": {
"recommendations": {
"nested": {
"path": "events.recommendationData.recommendations"
},
"aggs": {
"recommendationType": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10
}
}
}
}
}
}
}
}
}
}
}
}
I am using the code below with RestHighLevelClient:
AggregationBuilder recommendations =
AggregationBuilders.nested("recommendations", "events.recommendationData");
AggregationBuilder exception = AggregationBuilders
.filter("exception", QueryBuilders.termsQuery("events.recommendationData.exceptionId", "2"));
AggregationBuilder exceptionIds = AggregationBuilders.terms("exceptionIds")
.field("events.recommendationData.exceptionId.keyword").size(10);
AggregationBuilder recommendations2 =
AggregationBuilders.nested("recommendations", "events.recommendationData.recommendations");
AggregationBuilder recommendationType = AggregationBuilders.terms("recommendationType")
.field("events.recommendationData.recommendations.recommendationType").size(10);
AggregationBuilder build =
recommendations
.subAggregation(exception)
.subAggregation(exceptionIds)
.subAggregation(recommendations2)
.subAggregation(recommendationType);
It produces the wrong query, posted below, which does not work:
{
"aggregations": {
"recommendations": {
"nested": {
"path": "events.recommendationData"
},
"aggregations": {
"exception": {
"filter": {
"terms": {
"events.recommendationData.exceptionId": [
"1",
"2"
],
"boost": 1
}
}
},
"exceptionIds": {
"terms": {
"field": "events.recommendationData.exceptionId.keyword",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
},
"recommendations": {
"nested": {
"path": "events.recommendationData.recommendations"
}
},
"recommendationType": {
"terms": {
"field": "events.recommendationData.recommendations.recommendationType",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}
Expected: every aggregation is a sub-aggregation of the previous one.
In the expected query, recommendationType is a sub-aggregation of recommendations2. Together these are a sub-aggregation of exceptionIds, and so on. So only one statement needs to change. Instead of
AggregationBuilder build =
recommendations
.subAggregation(exception)
.subAggregation(exceptionIds)
.subAggregation(recommendations2)
.subAggregation(recommendationType);
use this:
AggregationBuilder build =
recommendations.subAggregation(
exception.subAggregation(
exceptionIds.subAggregation(
recommendations2.subAggregation(recommendationType)
)
)
);
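The reason the chained version flattens the tree is that a fluent `subAggregation` call returns the builder it was called on (the parent), so every chained call attaches another child to the same parent. The toy class below models just that behaviour in plain Java; `AggNode` is a hypothetical stand-in for illustration and not the Elasticsearch `AggregationBuilder`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a fluent aggregation builder.
final class AggNode {
    final String name;
    final List<AggNode> children = new ArrayList<>();

    AggNode(String name) {
        this.name = name;
    }

    /** Adds the child and returns the parent (this), like a fluent builder. */
    AggNode subAggregation(AggNode child) {
        children.add(child);
        return this;
    }

    /** Number of levels in the tree rooted at this node. */
    int depth() {
        int deepest = 0;
        for (AggNode child : children) {
            deepest = Math.max(deepest, child.depth());
        }
        return 1 + deepest;
    }

    public static void main(String[] args) {
        // Chained calls: all four aggregations become siblings under the root.
        AggNode flat = new AggNode("recommendations")
                .subAggregation(new AggNode("exception"))
                .subAggregation(new AggNode("exceptionIds"))
                .subAggregation(new AggNode("recommendations2"))
                .subAggregation(new AggNode("recommendationType"));

        // Nested calls: each aggregation becomes the child of the previous one.
        AggNode nested = new AggNode("recommendations").subAggregation(
                new AggNode("exception").subAggregation(
                        new AggNode("exceptionIds").subAggregation(
                                new AggNode("recommendations2").subAggregation(
                                        new AggNode("recommendationType")))));

        System.out.println("chained depth: " + flat.depth());
        System.out.println("nested depth: " + nested.depth());
    }
}
```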

Why is ElasticSearch not showing the score?

I am using ElasticSearch 2.3.1 on Ubuntu 16.04.
The mapping is:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"presentation": {
"type": "string",
"analyzer": "2gramsto3"
},
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
}
The query is:
{
"size": 20,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
[
{
"match": {
"cv": "folletto"
}
},
{
"match": {
"cv": " psicologia"
}
},
{
"match": {
"cv": " tenacia"
}
}
]
]
}
}
]
}
}
}
It found 14567 documents, but the score is always "_score": 0.
I read that filters have a score, so why not in this case?
Thank you!
The score is not calculated for filters. If you need scores, use a normal query, for example by moving the match clauses out of filter and into must.
Just take into account the implications pointed out in the documentation below.
Ref doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
