Spring Data Elasticsearch aggregations with filters


I am trying to perform aggregations on values filtered by some conditions. I am using the ElasticsearchTemplate.query() method of Spring Data to execute the query and receive the results in a ResultsExtractor.
I am getting the hits correctly (i.e. the filters are applied and only the docs matching those values are retrieved). However, the aggregations are computed over all the docs. I believe aggregations should be applied to the filtered values only. This is the code I am using:
SearchQuery query = //get the query
SearchResponse hits = template.query(query, new ResultsExtractor<SearchResponse>() {
    @Override
    public SearchResponse extract(SearchResponse response) {
        return response;
    }
});
To debug the problem further, I wrote code that executes the query directly instead of going through Spring Data:
SearchRequestBuilder builder = esSetup.client().prepareSearch("document");
builder.setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), query.getFilter()));
builder.addFields(query.getFields().toArray(new String[query.getFields().size()]));
for (AbstractAggregationBuilder aggregation : query.getAggregations()) {
    builder.addAggregation(aggregation);
}
SearchResponse response = builder.get();
To my surprise, this query executed correctly and the filters were applied to the aggregations as well. To analyze further, I went through the code of ElasticsearchTemplate and found that it uses the setPostFilter method to set the filter. I then modified my code to set the filter that way:
SearchRequestBuilder builder = esSetup.client().prepareSearch("document");
// builder.setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), query.getFilter()));
builder.setPostFilter(query.getFilter());
builder.addFields(query.getFields().toArray(new String[query.getFields().size()]));
for (AbstractAggregationBuilder aggregation : query.getAggregations()) {
    builder.addAggregation(aggregation);
}
SearchResponse response = builder.get();
When I executed the code above, it showed the same behavior as Spring Data (i.e. the filters were applied to the hits but not to the aggregations).
Is this a bug in Spring Data Elasticsearch? If not, is there another method I should be using to retrieve the data the way I want?
Thanks in advance.

This behaviour is by design in Elasticsearch.
In simple terms, the input to both the aggregations and the post filter is the set of documents that match the query section of the request body. The post filter then narrows only the returned hits, so the aggregations are not restricted to the filtered documents.
If you do want the aggregations to be computed over the filtered documents, move the filters inside the query section, that is, use a filtered query. The output of the query section is then the filtered set of documents, and the aggregations apply to them as expected.
So for your requirements, use a filtered query instead of a post filter.
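The ordering described above can be illustrated with a small in-memory model. This is only a sketch of the data flow, not Elasticsearch code: the class PostFilterSketch, its Doc and Result types, and the sample documents are all hypothetical names invented for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy in-memory model of one search request. None of these names are Elasticsearch API.
class PostFilterSketch {
    static final class Doc {
        final String category;
        final int price;
        Doc(String category, int price) { this.category = category; this.price = price; }
    }

    static final class Result {
        final List<Doc> hits;
        final Map<String, Long> countsByCategory;
        Result(List<Doc> hits, Map<String, Long> countsByCategory) {
            this.hits = hits;
            this.countsByCategory = countsByCategory;
        }
    }

    // Hypothetical sample data: two books and one toy.
    static List<Doc> sampleIndex() {
        return Arrays.asList(new Doc("book", 5), new Doc("book", 50), new Doc("toy", 7));
    }

    static Result search(List<Doc> index, Predicate<Doc> query, Predicate<Doc> postFilter) {
        // The query section decides which documents reach the aggregations.
        List<Doc> matched = index.stream().filter(query).collect(Collectors.toList());
        Map<String, Long> counts = matched.stream()
                .collect(Collectors.groupingBy(d -> d.category, TreeMap::new, Collectors.counting()));
        // The post filter runs afterwards and narrows only the returned hits.
        List<Doc> hits = matched.stream().filter(postFilter).collect(Collectors.toList());
        return new Result(hits, counts);
    }

    public static void main(String[] args) {
        Predicate<Doc> cheap = d -> d.price < 10;
        Result viaPostFilter = search(sampleIndex(), d -> true, cheap);
        Result viaQuery = search(sampleIndex(), cheap, d -> true);
        System.out.println(viaPostFilter.countsByCategory); // {book=2, toy=1}: counted over all docs
        System.out.println(viaQuery.countsByCategory);      // {book=1, toy=1}: counted over cheap docs only
    }
}
```

Both calls return the same two hits; only the aggregation input differs, which matches the behavior observed in the question.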

Related

Springboot + Hibernate with ElasticSearch : Getting no results

I want to enable search on a number of fields for one of my entities, so I added Hibernate Search to my Spring Boot project. When I load data into the database, I can see that Elasticsearch contains that data in the index as expected by running
curl localhost:9200/myindex/_search?pretty
I can run queries like
curl "localhost:9200/myindex/_search?pretty&q=name:test"
and receive the expected results.
I would like to give consumers of my API the option to run arbitrary queries like "name:test" against the index, so that
curl "localhost:8086/myentity/search/querySearch?query=name:test"
would return the same results as before in the direct query.
Here's what I am trying but whatever I do, I get 0 results:
public List<MyEntity> querySearch(String queryString) {
    QueryParser queryParser = new MultiFieldQueryParser(ALL_FIELDS, new SimpleAnalyzer());
    queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
    org.apache.lucene.search.Query query = queryParser.parse(QueryParser.escape(queryString));
    FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(this.entityManager);
    javax.persistence.Query persistenceQuery =
            fullTextEntityManager.createFullTextQuery(query, MyEntity.class);
    return persistenceQuery.getResultList();
}

By calling QueryParser.escape(queryString), you remove the meaning of operators such as :. So if the user enters name:test, you will end up looking for documents that contain name:test (literally), instead of looking for documents whose name field contains test.
Remove that escape and everything should work as you want.
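To see why escaping defeats the field syntax, here is a simplified stand-in for what an escape step does. It is not Lucene's implementation: the class EscapeSketch and its list of special characters are assumptions, modeled loosely on the classic query parser syntax.

```java
// Simplified stand-in for a query-escaping step. Hypothetical names, not Lucene code.
class EscapeSketch {
    // Characters this sketch treats as query syntax (an assumption for illustration).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every syntax character with a backslash so a parser reads it literally.
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The colon is escaped, so "name" is no longer read as a field name.
        System.out.println(escape("name:test")); // name\:test
    }
}
```

Once the colon is escaped, the parser sees one literal term instead of a field and a value, which is why the search comes back empty.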
By the way, you're essentially using Lucene to parse a query that will then be sent to Elasticsearch. An easier solution would be to send the query to Elasticsearch directly, especially if you do not need to prevent users from accessing some fields.
public List<Training> querySearch(String queryString) {
    FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(this.entityManager);
    QueryDescriptor query = ElasticsearchQueries.fromQueryString(queryString);
    javax.persistence.Query persistenceQuery = fullTextEm.createFullTextQuery(query, Training.class);
    return persistenceQuery.getResultList();
}
See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_queries

How to get document fields when a failure happens in an Elasticsearch bulk operation?

I am currently migrating entities from a DBMS to Elasticsearch using bulk requests, but I want to be able to identify which specific entities failed during the operation (without having to query Elastic after the operation).
I noticed the BulkItemResponse.Failure class has an ID_FIELD, but that seems to be the action, not the document.
Is there any field in the response that I can use to retrieve the fields of the failing documents?

After reading the Elasticsearch forums, it seems it is only possible to retrieve the index (position) of the document in the bulk request, but not the content of the document:
https://discuss.elastic.co/t/way-to-re-index-failed-documents-using-bulkprocessor/33736/3
For the record, this is how I extracted the item IDs of the failing items from the bulk response:
List<Integer> processBulkResponse(BulkResponse bulkResponse) {
    List<Integer> failures = new ArrayList<>();
    for (BulkItemResponse bulkItemResponse : bulkResponse) {
        if (bulkItemResponse.isFailed()) {
            failures.add(bulkItemResponse.getItemId());
        }
    }
    return failures;
}
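Since the item ID is the position of the item in the bulk request, one way to get back to the failing entities is to keep them in a list in the same order they were added to the request. The sketch below assumes exactly one bulk action per entity, added in list order; BulkFailureLookup and failedEntities are hypothetical names, not part of the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps failed bulk item positions back to the submitted entities.
class BulkFailureLookup {
    // submitted: entities in the exact order their actions were added to the bulk request.
    // failedItemIds: positions of the failed items, e.g. the list built by processBulkResponse.
    static <T> List<T> failedEntities(List<T> submitted, List<Integer> failedItemIds) {
        List<T> failed = new ArrayList<>();
        for (int itemId : failedItemIds) {
            failed.add(submitted.get(itemId));
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> submitted = Arrays.asList("entity-a", "entity-b", "entity-c");
        // Pretend the first and third items failed.
        System.out.println(failedEntities(submitted, Arrays.asList(0, 2))); // [entity-a, entity-c]
    }
}
```

This avoids querying Elasticsearch after the operation, at the cost of holding the batch in memory until the response arrives.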

Elasticsearch using Java api

Hi, I am trying to run a query on Elasticsearch that follows the SQL query below, and I want to implement the same logic using the Java API:
select * from log , web where l.loghost = w.webhost and @datetime between '2016-05-20' AND '2016-05-25'
log and web are different types, the indices are set to logstash-log-* and logstash-web*, and the @timestamp format looks like "2016-05-20T17:14:01.037Z".
I have the following Java code, but I don't know how to restrict the search to a range between two dates, so it does not return the expected output:
SearchResponse response = client.prepareSearch("logstash-log-*","logstash-web-*")
.setTypes("log","web")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setFetchSource(new String[]{"*"}, null)
.setQuery(QueryBuilders.queryStringQuery("1.2.3.4").field("*_host"))// Query
.execute()
.actionGet();
Please guide me; I am new to Elasticsearch. Thanks in advance.

You need to combine a range query with your query_string query inside a bool/filter query:
QueryStringQueryBuilder qs = QueryBuilders.queryStringQuery("1.2.3.4").field("*_host");
RangeQueryBuilder range = QueryBuilders.rangeQuery("@timestamp")
.gte("2016-05-20T00:00:00.000Z")
.lte("2016-05-25T00:00:00.000Z");
and then
...
.setQuery(QueryBuilders.boolQuery().filter(qs).filter(range))
...
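If the two dates arrive as plain strings such as '2016-05-20', the ISO-8601 bounds used above can be derived with java.time. This is a sketch under the assumption that the timestamps are stored in UTC; RangeBounds and its methods are hypothetical names. Note that an inclusive upper bound of "2016-05-25T00:00:00.000Z" stops at the first instant of May 25, so the start of the following day is also computed in case the whole last day should be covered with an exclusive upper bound.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that turns yyyy-MM-dd dates into UTC timestamp bounds.
class RangeBounds {
    private static final DateTimeFormatter ISO_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    // First instant of the given day in UTC.
    static String startOfDayUtc(String isoDate) {
        return LocalDate.parse(isoDate).atStartOfDay(ZoneOffset.UTC).format(ISO_MILLIS);
    }

    // First instant of the day after the given day, for use as an exclusive upper bound.
    static String startOfNextDayUtc(String isoDate) {
        return LocalDate.parse(isoDate).plusDays(1).atStartOfDay(ZoneOffset.UTC).format(ISO_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(startOfDayUtc("2016-05-20"));     // 2016-05-20T00:00:00.000Z
        System.out.println(startOfNextDayUtc("2016-05-25")); // 2016-05-26T00:00:00.000Z
    }
}
```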

How to get facet value for all fields using Java API

I'm new to Elasticsearch and tried to query some sample documents. I issued the following query using the Java API. It fetched the correct result: it returned the names of all categories. Now I want the count for each name in a category. Could you explain to me how to do that? I'm sorry for my bad English.
SearchResponse sr = client.prepareSearch()
.addField("Category")
.setQuery(QueryBuilders.matchAllQuery())
.addFacet(FacetBuilders.termsFacet("f")
.field("Category"))
.execute()
.actionGet();

Look at the Count API if you do not want the result set (the matched documents) but only the count.
If you do want the result set, the response to every filter or query request also includes the result size.

ElasticSearch - Using FilterBuilders

I am new to ElasticSearch and Couchbase. I am building a sample Java application to learn more about ElasticSearch and Couchbase.
According to the ElasticSearch Java API documentation, filters are better suited to cases where sorting on score is not necessary, and they can be cached.
I still haven't figured out how to use FilterBuilders and have the following questions:
1. Can FilterBuilders be used alone to search?
2. Or do they always have to be used with a query? (If so, can someone please give an example?)
3. Going through the documentation, if I want to perform a search based on field values and want to use FilterBuilders, how can I accomplish that? (Using AndFilterBuilder, TermFilterBuilder or InFilterBuilder? I am not clear about the differences between them.)
For the 3rd question, I actually tested a search using queries and one using filters, as shown below.
I got an empty result (no rows) when I tried the search using FilterBuilders. I am not sure what I am doing wrong.
Any examples will be helpful. I have had a tough time going through the documentation, which I found sparse, and searching only led to various unreliable user forums.
private void processQuery() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    QueryBuilder qb = QueryBuilders.fieldQuery("doc.address.state", "TX");
    srb.setQuery(qb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
private void searchWithFilters(){
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    srb.setFilter(FilterBuilders.termFilter("doc.address.state", "tx"));
    //AndFilterBuilder andFb = FilterBuilders.andFilter();
    //andFb.add(FilterBuilders.termFilter("doc.address.state", "TX"));
    //srb.setFilter(andFb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
--UPDATE--
As suggested in the answer, changing to lowercase "tx" works. With this question resolved, I still have the following questions:
In what scenario(s) are filters used with a query? What purpose does this serve?
What is the difference between InFilter, TermFilter and MatchAllFilter? Any illustration will help.

Right, you should use filters to exclude documents from even being considered when executing the query. Filters are faster since they don't involve any scoring, and they are cacheable as well.
That said, a filter is always used through the search API, which executes a query and accepts an optional filter. If you only have a filter, you can just use the match_all query together with your filter. A filter can be a simple one, or a compound one that combines multiple filters.
Regarding the Java API, the names used are the names of the filters available, no big difference. Have a look at this search example for instance. In your code I don't see where you do setFilter on your SearchRequestBuilder object. You don't seem to need the and filter either, since you are using a single filter. Furthermore, it might be that you are indexing using the default mappings, so the term "TX" is lowercased at index time. That's why you don't find any match when you search using the term filter. Try searching for "tx" lowercased.
If you want to keep the "TX" term as it is while indexing, you can change your mapping, probably setting the field as not_analyzed if it should only be a single token. Otherwise you can change the filter: you might want to have a look at a query that is analyzed, so that your query will be analyzed the same way the content was indexed.
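The mismatch between an analyzed field and an unanalyzed term lookup can be reproduced with a toy model. This is a sketch only: TermLookupSketch is a hypothetical class, and its tokenizing and lowercasing merely approximate what a default analyzed field does.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of an analyzed field. Hypothetical names, not Elasticsearch API.
class TermLookupSketch {
    private final Set<String> indexedTerms = new HashSet<>();

    // Index time: split on non-alphanumeric characters and lowercase each token.
    void index(String text) {
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                indexedTerms.add(token.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Term-style lookup: compares the given term as-is, with no analysis.
    boolean termMatches(String term) {
        return indexedTerms.contains(term);
    }

    // Analyzed lookup: normalizes the input the same way as at index time.
    boolean analyzedMatches(String text) {
        return indexedTerms.contains(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        TermLookupSketch field = new TermLookupSketch();
        field.index("TX");
        System.out.println(field.termMatches("TX"));     // false: the index holds "tx"
        System.out.println(field.termMatches("tx"));     // true
        System.out.println(field.analyzedMatches("TX")); // true
    }
}
```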
Have a look at the query DSL documentation for more information regarding queries and filters:
MatchAllFilter: matches all your documents, not that useful I'd say
TermFilter: filters documents that have fields that contain a term (not analyzed)
AndFilter: compound filter used to combine two or more filters with AND
I don't know what you mean by InFilterBuilder; I couldn't find any filter with this name.
The query usually contains what the user types into the text search box. Filters are more a way to refine the search, for example by clicking on facet entries. That's why you would still have the query plus one or more filters.

To append to what @javanna said:
A lot of confusion can come from the fact that filters can be defined in several ways:
standalone (with a required query, for instance match_all if all you need is the filters) (http://www.elasticsearch.org/guide/reference/api/search/filter/)
or as part of a filtered query (http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/)
What's the difference, you might ask? Indeed, you can construct exactly the same logic both ways.
The difference is that a query operates on BOTH the resultset and any facets you have defined, whereas a filter (when defined standalone) operates only on the resultset and NOT on any facets you may have defined (explained here: http://www.elasticsearch.org/guide/reference/api/search/filter/).

To add to the other answers, InFilter is only used with FilterBuilders. The definition is, InFilter: A filter for a field based on several terms matching on any of them.
The query Java API uses FilterBuilders, which is a factory for filter builders that can dynamically create a query from Java code. We do this using a form and we build our query based on user selections from it with checkboxes, options, and dropdowns.
Here is some example code for FilterBuilders; the snippet below, taken from that link, uses InFilter:
FilterBuilder filterBuilder;
User user = (User) auth.getPrincipal();
if (user.getGroups() != null && !user.getGroups().isEmpty()) {
    filterBuilder = FilterBuilders.boolFilter()
            .should(FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName())))
            .should(FilterBuilders.nestedFilter("groupRoles", FilterBuilders.inFilter("groupRoles.key", user.getGroups().toArray())));
} else {
    filterBuilder = FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName()));
}
...
