Elasticsearch Pagination on composite aggregation gives wrong results

I have a problem defining a composite aggregation to paginate the results.
CompositeAggregationBuilder aggregationBuilder = AggregationBuilders
.composite(aggrField, List.of(new TermsValuesSourceBuilder(aggrField).field(aggrField)))
.aggregateAfter(Map.of("keyword", aggrField))
.size(bucketListInfo.getTopResultsCount());
searchSourceBuilder
.from(2)
.size(4)
.aggregation(aggregationBuilder);
final SearchRequest searchRequest = new SearchRequest(bucketListInfo.getIndexName())
.source(searchSourceBuilder);
try {
final SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
return processBucketsFromResponse(bucketListInfo, response);
} catch (Exception e) {
log.error(e.getMessage(), e);
return null;
}
The problem is this:
The response contains all hits, not just the requested page. For example, if I set the from param to 2 and the size param to 4, I expect results starting from the second page with a page size of 4, but I get all records no matter which pagination params I specify.
When I add this
.aggregateAfter(Map.of("keyword", aggrField))
I get this error:
[type=illegal_argument_exception, reason=Missing value for [after.[my-aggr-field].keyword]]];
Is there a workaround for this? Any suggestions are welcome.
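For context: as far as I know, from and size on the search source only page the hits and have no effect on aggregation buckets. A composite aggregation is paged by sending the key of the last bucket of the previous response (after_key) as the after value, keyed by the source name (aggrField in the code above, not the literal "keyword"). The following is a minimal plain-Java sketch of that paging scheme under those assumptions. It uses no Elasticsearch classes; the class name, the page method, and the bucket data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java illustration of after-key paging; no Elasticsearch classes are used.
class AfterKeyPagingSketch {

    // Hypothetical stand-in for the sorted buckets of a composite aggregation:
    // bucket key -> document count.
    static final NavigableMap<String, Long> BUCKETS = new TreeMap<>(Map.of(
            "apple", 3L, "banana", 5L, "cherry", 1L, "date", 2L, "elder", 7L));

    // Returns up to `size` bucket keys that sort strictly after `afterKey`.
    // Passing null for afterKey requests the first page.
    static List<String> page(String afterKey, int size) {
        NavigableMap<String, Long> rest =
                afterKey == null ? BUCKETS : BUCKETS.tailMap(afterKey, false);
        List<String> keys = new ArrayList<>();
        for (String key : rest.keySet()) {
            if (keys.size() == size) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        String afterKey = null;
        while (true) {
            List<String> keys = page(afterKey, 2);
            if (keys.isEmpty()) {
                break;
            }
            System.out.println(keys);
            // The last key of this page becomes the "after" value of the next request.
            afterKey = keys.get(keys.size() - 1);
        }
    }
}
```

With the real client, the equivalent of afterKey would be the after_key returned with the previous composite aggregation result, passed unchanged to aggregateAfter on the next request; the first request would omit aggregateAfter entirely.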

Related

Elasticsearch wildcard query in Java - find all matching fields and replace

I want to update all path fields starting with "example/one".
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
.setDocTypes(type)
.setScript(new Script(ScriptType.INLINE,
"painless",
"ctx._source.path = ctx._source.path.replace(params.old, params.new);",
parameters))
.setQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"));
client.updateByQuery(request, RequestOptions.DEFAULT);
It's not working (no update, no errors - tried a prefixQuery, same). The following query however is updating the matching documents (Postman).
POST my_index/_update_by_query
{
"script": {
"lang": "painless",
"inline": "ctx._source.path = ctx._source.path.replace(\"example/one\", \"new/example\")"
},
"query": {
"wildcard": {
"path.tree": {
"value": "example/one*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
What am I missing? The Path hierarchy tokenizer is used on the field path.
Your help is much needed.
PS: I can't upgrade to a newer version of elasticsearch.

While testing, I first suspected the custom analyser used on the path field, but I quickly ruled that out because the same query gave the expected result via Postman.
I finally decided to go with a 'two steps' solution (couldn't use the update by query API). First search for all matching documents, then perform a bulk update.
NativeSearchQuery query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"))
.withSourceFilter(new FetchSourceFilter(new String[]{"_id", "path"}, null))
.build();
List<MyClass> result = elasticsearchRestTemplate.queryForList(query, MyClass.class);
if(!CollectionUtils.isEmpty(result)) {
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");
Script script = new Script(ScriptType.INLINE,
"painless",
"ctx._source.path = ctx._source.path.replace(params.old, params.new)",
parameters);
BulkRequest request = new BulkRequest();
for (MyClass myClass : result) {
request.add(new UpdateRequest(index, type, myClass.getId()).script(script));
}
client.bulk(request, RequestOptions.DEFAULT);
}
UPDATE
Turns out setting the type in the request was the problem.
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
.setDocTypes(type) // <-- remove this line
.....

Elasticsearch filter result ignoring search keyword

I get good results with a normal search query.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder queryBuilder = new BoolQueryBuilder();
String keyword = requestBody.getKeyword();
queryBuilder.should(QueryBuilders.matchQuery("fullText", keyword));
searchSourceBuilder.query(queryBuilder);
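// from is a hit offset, not a page index, so scale the 1-based page by the page size
searchSourceBuilder.from((requestBody.getPage() - 1) * BROWSE_PAGE_DATA_LIMIT);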
searchSourceBuilder.size(BROWSE_PAGE_DATA_LIMIT);
searchRequest.source(searchSourceBuilder);
try {
return client.search(searchRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
throw new HttpServerErrorException(HttpStatus.INTERNAL_SERVER_ERROR, "Error in ES search");
}
But when I add a filter to it, the results ignore my search keyword.
queryBuilder.filter(QueryBuilders.termsQuery("authorId", filter.getAuthorIds()));
Here I am trying to replicate Solr's fq. What is wrong with my approach?

Excerpt from the ES docs on minimum_should_match:
If the bool query includes at least one should clause and no must or
filter clauses, the default value is 1. Otherwise, the default value
is 0.
In other words, once a bool query contains a filter and/or must clause, the should clauses become optional and only influence scoring, unless minimum_should_match is set to a suitable value.
Set minimumShouldMatch to 1, e.g.:
queryBuilder.should(QueryBuilders.matchQuery("fullText", keyword)).minimumShouldMatch(1);
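To make the rule concrete, here is a small plain-Java model of how a bool query decides whether a document matches. It is only a sketch of the semantics described in the excerpt, not Elasticsearch code: the Doc class, the predicates, and the method names are hypothetical, and scoring is left out entirely.

```java
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of bool-query matching; no Elasticsearch classes are used.
class BoolMatchSketch {

    // Hypothetical document with just the two fields from the question.
    static final class Doc {
        final String fullText;
        final int authorId;

        Doc(String fullText, int authorId) {
            this.fullText = fullText;
            this.authorId = authorId;
        }
    }

    // Default from the docs excerpt: 1 if there are should clauses and no
    // must/filter clauses, otherwise 0.
    static int defaultMinimumShouldMatch(int shouldCount, int mustOrFilterCount) {
        return (shouldCount > 0 && mustOrFilterCount == 0) ? 1 : 0;
    }

    // A document matches when every filter accepts it and at least
    // minimumShouldMatch of the should clauses accept it.
    static boolean matches(Doc doc, List<Predicate<Doc>> filters,
                           List<Predicate<Doc>> shoulds, int minimumShouldMatch) {
        for (Predicate<Doc> filter : filters) {
            if (!filter.test(doc)) {
                return false;
            }
        }
        long matched = shoulds.stream().filter(s -> s.test(doc)).count();
        return matched >= minimumShouldMatch;
    }

    public static void main(String[] args) {
        // Right author, but the text does not contain the keyword.
        Doc offTopic = new Doc("gardening tips", 7);
        List<Predicate<Doc>> filters = List.of(d -> d.authorId == 7);
        List<Predicate<Doc>> shoulds = List.of(d -> d.fullText.contains("search"));

        int byDefault = defaultMinimumShouldMatch(shoulds.size(), filters.size());
        // With the default of 0 the keyword clause does not restrict the result.
        System.out.println(matches(offTopic, filters, shoulds, byDefault));
        // With minimumShouldMatch = 1 the keyword clause is required.
        System.out.println(matches(offTopic, filters, shoulds, 1));
    }
}
```

The first call accepts the off-topic document because the filter alone decides the match; the second rejects it, which is the behaviour the question was expecting.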

Fetch all record including particular fields

I am working with Elasticsearch 7.3. I want to fetch only two fields from all the documents in my index using the Java API. I am using the following code, but it returns a null object.
RestHighLevelClient client;
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.fetchSource("recipe_ID,recipe_url", null);
sourceBuilder.from(0);
SearchRequest searchRequest = new SearchRequest("recipes");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit searchHit = searchResponse.getHits().getAt(0);
String resultString = searchHit.getSourceAsString();
return resultString;
I need to include only two fields recipe_ID and recipe_url in my result.

You're on the right path, although source filtering requires you to specify your fields in an array like this:
String[] includeFields = new String[] {"recipe_ID", "recipe_url"};
sourceBuilder.fetchSource(includeFields, null);
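The reason the single string fails is that "recipe_ID,recipe_url" is treated as one field name rather than two. The plain-Java sketch below imitates include filtering on a source map to show the difference. It is a simplification that only does exact name matching (real source filtering also accepts wildcard patterns), and the class, method, and sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java imitation of include-based source filtering; exact names only.
class SourceFilterSketch {

    // Keeps only the entries whose key equals one of the include names.
    static Map<String, Object> include(Map<String, Object> source, String... includes) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (String name : includes) {
            if (source.containsKey(name)) {
                filtered.put(name, source.get(name));
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        // Hypothetical document source.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("recipe_ID", 42);
        source.put("recipe_url", "https://example.com/r/42");
        source.put("title", "Pancakes");

        // One comma-joined name matches no field, so nothing is kept.
        System.out.println(include(source, "recipe_ID,recipe_url"));
        // Two separate names keep exactly the two requested fields.
        System.out.println(include(source, "recipe_ID", "recipe_url"));
    }
}
```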

Elasticsearch set limit and offset using RestHighLevelClient

I search in Elasticsearch using MultiSearchRequest by part of the field:
@Override
public Collection<Map<String, Object>> findContractsByIndexAndWord(String index, String type, String word) throws CommonUserException {
MultiSearchRequest request = new MultiSearchRequest();
word = word.toLowerCase();
request.add(formSearchRequestForMultiSearch(index, type, ID_FIELD, word));
request.add(formSearchRequestForMultiSearch(index, type, PROVIDER_ID_FIELD, word));
MultiSearchResponse searchResponse;
try (RestHighLevelClient client = getClient()) {
searchResponse = client.multiSearch(request);
return formContracts(searchResponse);
} catch (IOException e) {
throw new CommonUserException(ELASTIC_EXCEPTION, ELASTIC_EXCEPTION);
}
}
private SearchRequest formSearchRequestForMultiSearch(String index, String type, String field, String word) {
SearchRequest searchRequest = new SearchRequest(index);
searchRequest.types(type);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.wildcardQuery(field, word));
searchRequest.source(searchSourceBuilder);
return searchRequest;
}
private Collection<Map<String, Object>> formContracts(MultiSearchResponse response) {
Collection<Map<String, Object>> contracts = new LinkedList<>();
for (int i = 0; i < response.getResponses().length; i++) {
SearchHit[] hits = response.getResponses()[i].getResponse().getHits().getHits();
for (SearchHit hit : hits) {
if (!contracts.contains(hit.getSourceAsMap())) {
contracts.add(hit.getSourceAsMap());
}
}
}
return contracts;
}
How can I add a limit and an offset to this request?

From the Elasticsearch documentation (latest version):
Here are a few examples of some common options:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); //1
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy")); //2
sourceBuilder.from(0); //3
sourceBuilder.size(5); //4
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); //5
1. Create a SearchSourceBuilder with default options.
2. Set the query. Can be any type of QueryBuilder.
3. Set the from option that determines the result index to start searching from. Defaults to 0.
4. Set the size option that determines the number of search hits to return. Defaults to 10.
5. Set an optional timeout that controls how long the search is allowed to take.
Size and From are what is usually known as offset/limit.
If you page through results intensively, have a look at the Scroll API: offset/limit traversal can be slow under certain circumstances, and scrolling avoids most of that. If you come from a SQL background, as I guess from the limit/offset terminology, think of the Scroll API as the equivalent of keeping a SQL cursor open so you can carry on iterating from where you left off.
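One detail that is easy to get wrong: from is an offset counted in hits, not a page index. If your API exposes 1-based page numbers, a minimal sketch of the conversion looks like this (the class and method names are hypothetical, and the 1-based numbering is an assumption):

```java
// Converts page-style pagination into the offset expected by from().
class PageOffsetSketch {

    // Returns the zero-based hit offset for a 1-based page number.
    static int fromForPage(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // multiplyExact fails loudly on overflow instead of wrapping around.
        return Math.multiplyExact(page - 1, size);
    }

    public static void main(String[] args) {
        int size = 5;
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " -> from " + fromForPage(page, size));
        }
    }
}
```

The result would then be passed as sourceBuilder.from(fromForPage(page, size)) together with sourceBuilder.size(size).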

How to get the elasticsearch json response using aggregations in spring-data-elasticsearch?

I have the following:
I notice that at the end of running the code, if I print out aggregations.asMap().get("subjects"),
I get:
org.elasticsearch.search.aggregations.bucket.terms.StringTerms@6cff59fa
Printing out "aggregations" gives me: org.elasticsearch.search.aggregations.InternalAggregations@65cf321d
What I really want is the entire string/json response that is normally returned if you were to curl on elasticsearch to get aggregations. How do I get to the raw response from the aggregation query? Also, is there a way to iterate and print out what's in those "wrapped up" objects?
https://github.com/spring-projects/spring-data-elasticsearch/blob/ab7e870d5f82f6c0de236048bd7001e8e7d2a680/src/test/java/org/springframework/data/elasticsearch/core/aggregation/ElasticsearchTemplateAggregationTests.java
@Test
public void shouldReturnAggregatedResponseForGivenSearchQuery() {
// given
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(matchAllQuery())
.withSearchType(COUNT)
.withIndices("articles").withTypes("article")
.addAggregation(terms("subjects").field("subject"))
.build();
// when
Aggregations aggregations = elasticsearchTemplate.query(searchQuery, new ResultsExtractor<Aggregations>() {
@Override
public Aggregations extract(SearchResponse response) {
return response.getAggregations();
}
});
// then
System.out.println(aggregations); // gives me some cryptic InternalAggregations object, how do I get to the raw JSON normally returned by elasticsearch?
System.out.println(aggregations.asMap().get("subjects")); // gives me some StringTerms object I have no idea how to iterate over to get results
}

You cannot get the raw JSON response this way, since Spring Data Elasticsearch will take care of parsing it for you, that's the whole point.
If you need to parse those buckets, you can do it like this easily:
...
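// asMap() returns Map<String, Aggregation>, so a cast to the concrete type is needed
StringTerms subjects = (StringTerms) aggregations.asMap().get("subjects");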
for (Terms.Bucket bucket : subjects.getBuckets()) {
String key = bucket.getKey();
long docCount = bucket.getDocCount();
// do something with the key and the doc count
}
If you really want to see the JSON being returned, what you can do is to re-write the parsed Aggregations object into JSON using serialization, but that won't really be helpful:
InternalAggregations aggregations = ...;
XContentBuilder jsonBuilder = JsonXContent.contentBuilder();
aggregations.toXContent(jsonBuilder, ToXContent.EMPTY_PARAMS);
String rawJson = jsonBuilder.string();

1. Set the size of the ES request to zero.
2. Get the ES response as a string with toString().
3. Convert the string to JSON.
4. Get the aggregation field from the JSON.
