How to get facet value for all fields using Java API

I'm new to Elasticsearch and tried to query some sample documents. I issued the following query using the Java API, and it returned the correct result: the names of all categories. Now I want the count for each category name. Could you explain how to do that? I'm sorry for my bad English.
SearchResponse sr = client.prepareSearch()
.addField("Category")
.setQuery(QueryBuilders.matchAllQuery())
.addFacet(FacetBuilders.termsFacet("f")
.field("Category"))
.execute()
.actionGet();

Look at the Count API if you do not want the result set (the matched documents) but only the count.
If you do want the result set, the response to every query or filter request also includes the total number of hits.
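To make concrete what a per-category count means: a terms facet on Category reports, for each distinct value, how many matching documents carry it. The plain-Java sketch below reproduces that computation in memory. It does not call Elasticsearch, and the class name, method name and sample values are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class TermsFacetSketch {
    // Counts how many documents carry each distinct category value,
    // which is the per-term count a terms facet reports.
    static Map<String, Long> countByCategory(List<String> categories) {
        return categories.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical Category values, one per document.
        List<String> categories = List.of("books", "music", "books", "games", "books");
        countByCategory(categories)
                .forEach((term, count) -> System.out.println(term + ": " + count));
    }
}
```

In the facet response, each entry pairs a term with such a count, so the counts come back together with the category names rather than from a separate request.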

Related

How to count Elastic Search documents based on multiple field values with the Java API?

I am attempting to count the number of documents of a specific type based on the values of two of the document's fields.
So far I can count the number of documents using:
SearchResponse response = elasticClient.getClient().prepareSearch(Properties.get().getSearch().getAliasName())
.setTypes(type)
.setSize(0) // Don't return any documents, we don't need them.
.get();
SearchHits hits = response.getHits();
return (int) hits.getTotalHits();
Which returns the correct number of total documents.
I can also search for a specific document using:
searchResponse = elasticClient.getClient()
.prepareSearch(index)
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(QueryBuilders.prefixQuery(field_name, field_value))
.setFetchSource(true)
.setSize(10000)
.setTypes(type)
.addSort(builder.order(SortOrder.DESC))
.get();
This also returns documents.
I have attempted to count based on 2 field values using:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
queryBuilder.must(QueryBuilders.prefixQuery(field_1, value_1));
queryBuilder.must(QueryBuilders.prefixQuery(field_2, value_2));
SearchRequestBuilder searchBuilder = elasticClient.getClient().prepareSearch(index)
.setQuery(queryBuilder)
.setTypes(type)
.setSize(0);
But this always returns 0.
How do I correctly count documents based on 2 field values using the Java API?
From inspecting the query, it looks as though the two values are not being passed together. In the BoolQueryBuilder versions I know of, though, each must(QueryBuilder) call appends a clause instead of overwriting the previous one, so it is also worth checking that the request is actually executed (the snippet stops at setSize(0) without calling get()) and that both prefixes match the indexed terms.
In the Elasticsearch query DSL, the must object is an array of queries that behaves like an AND in a SQL WHERE clause, so every query in the must list needs to match. To add both queries to the must list explicitly, do the following:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
queryBuilder.must().add(QueryBuilders.prefixQuery(field_1, value_1));
queryBuilder.must().add(QueryBuilders.prefixQuery(field_2, value_2));
As for the count, size = 0 should do the trick. What kind of elasticsearch client are you using?
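As an illustration of the AND behaviour described above, here is a small in-memory sketch. It is plain Java rather than the Elasticsearch client; the class, the field names and the sample documents are hypothetical, and a simple startsWith stands in for a prefix query.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BoolMustSketch {
    // Mimics a prefix query on a single field (missing fields never match).
    static Predicate<Map<String, String>> prefix(String field, String value) {
        return doc -> doc.containsKey(field) && doc.get(field).startsWith(value);
    }

    // Counts the documents that satisfy every clause, like bool/must combined with size = 0.
    static long countMatchingAll(List<Map<String, String>> docs,
                                 List<Predicate<Map<String, String>>> mustClauses) {
        return docs.stream()
                .filter(doc -> mustClauses.stream().allMatch(clause -> clause.test(doc)))
                .count();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("name", "Alice", "city", "London"),
                Map.of("name", "Alan", "city", "Paris"),
                Map.of("name", "Bob", "city", "London"));
        // Both clauses must match, so only the first document is counted.
        System.out.println(countMatchingAll(docs, List.of(prefix("name", "Al"), prefix("city", "Lon"))));
    }
}
```

A count of 0 from the real query therefore means that no single document satisfies both prefixes at once, which can happen even when each prefix matches some documents on its own.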

Elasticsearch Java API - How to get the number of documents without retrieving the documents

I need to get the number of documents in an index: not the documents themselves, just the "how many".
What's the best way to do that?
There is https://www.elastic.co/guide/en/elasticsearch/reference/current/search-count.html. but I'm looking to do this in Java.
There also is https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.4/count.html, but it seems way old.
I can get all the documents in the given index and come up with "how many". But there must be a better way.
Use the search API, but set it to return no documents and retrieve the count of hits from the SearchResponse object it returns.
For example:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
SearchResponse response = client.prepareSearch("your_index_goes_here")
.setTypes("YourTypeGoesHere")
.setQuery(QueryBuilders.termQuery("some_field", "some_value"))
.setSize(0) // Don't return any documents, we don't need them.
.get();
SearchHits hits = response.getHits();
long hitsCount = hits.getTotalHits();
Just an addition to @evanjd's answer:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
SearchResponse response = client.prepareSearch("your_index_goes_here")
.setTypes("YourTypeGoesHere")
.setQuery(QueryBuilders.termQuery("some_field", "some_value"))
.setSize(0) // Don't return any documents, we don't need them.
.get();
SearchHits hits = response.getHits();
long hitsCount = hits.getTotalHits().value;
In Elasticsearch 7.x, getTotalHits() returns a TotalHits object instead of a long, and its string form looks like "6 hits", so we need to add .value to get the count as a long:
long hitsCount = hits.getTotalHits().value;
Elastic - Indices Stats
Indices level stats provide statistics on different operations
happening on an index. The API provides statistics on the index level
scope (though most stats can also be retrieved using node level
scope).
prepareStats(indexName)
client.admin().indices().prepareStats(indexName).get().getTotal().getDocs().getCount();
Breaking change in 7.0: by default, the total hit count is only tracked accurately up to 10,000, so you need to set track_total_hits to true explicitly in the search request to get an exact count.
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#track-total-hits-10000-default
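To show what that default means for a count, here is a small plain-Java model of the reported total. It is only a sketch: the Total and Relation types are hypothetical stand-ins and not the client's classes, and Long.MAX_VALUE is used to represent track_total_hits set to true.

```java
class TotalHitsSketch {
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    // Minimal stand-in for the total reported in a search response.
    static final class Total {
        final long value;
        final Relation relation;

        Total(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    // Models counting that stops being exact once more than trackUpTo documents match.
    static Total report(long actualMatches, long trackUpTo) {
        if (actualMatches > trackUpTo) {
            return new Total(trackUpTo, Relation.GREATER_THAN_OR_EQUAL_TO);
        }
        return new Total(actualMatches, Relation.EQUAL_TO);
    }

    public static void main(String[] args) {
        Total capped = report(25_000, 10_000);        // default threshold since 7.0
        Total exact = report(25_000, Long.MAX_VALUE); // stands in for track_total_hits = true
        System.out.println(capped.value + " " + capped.relation);
        System.out.println(exact.value + " " + exact.relation);
    }
}
```

With the default threshold, a value of 10,000 together with a "greater than or equal to" relation is only a lower bound, so code that reads the value alone would undercount large result sets.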
We can also get the lowLevelClient from the highLevelClient and invoke the "_count" REST API, for example "GET /twitter/_doc/_count?q=user:kimchy".
2021 Solution
I went through the solutions posted and none of them is convincing. You may get the job done by setting the size of the search request to 0, but that's not the correct way. For counting purposes we should use the Count API, because it consumes fewer resources and less bandwidth: it does not need to fetch or score documents, and it benefits from other internal optimisations.
You must use the Count API for Java (link below) to get the document count. The following code should get the job done:
Build the query using QueryBuilder
Pass the query and the list of indexes to the CountRequest constructor
Get the CountResponse object by calling client.count(countReq, RequestOptions.DEFAULT)
Extract and return the value by calling countResp.getCount()
CountRequest countReq = new CountRequest(indexes, query);
CountResponse countResp = client.count(countReq, RequestOptions.DEFAULT);
return countResp.getCount();
Read the second link for more information.
Important Links
Count API vs Search API : Counting number of documents using Elasticsearch
Count API for Java : https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-count.html

Explanation on the "Rank" for Retrieve & Rank service in Java

Has anyone ever used the Retrieve & Rank service with the Java SDK (the Rank service in particular)?
I want to understand how it works, because some points do not seem logical to me:
What is the difference between the Java approach, where we must execute a search query with Apache Solr and then call the rank method, and the curl approach, where we just have to run a single query?
Why does the rank method take a CSV file containing results from the search query, when we apparently cannot get the result of a search query as CSV?
I did not find answers in either this documentation or this example.
Thanks for your time.
I have never used Retrieve and Rank before, but from reading the documentation here are my thoughts.
I do not think that there is any difference between the Java approach and the curl one. From what I understand, search and rank in curl uses this command:
curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/sc1ca23733_faa8_49ce_b3b6_dc3e193264c6/solr/example_collection/fcselect?ranker_id=B2E325-rank-67&q=what%20is%20the%20basic%20mechanism%20of%20the%20transonic%20aileron%20buzz&wt=json&fl=id,title"
while in Java
RetrieveAndRank service = new RetrieveAndRank();
service.setUsernameAndPassword("{username}","{password}");
// getSolrClient is a helper that is not shown here; it is assumed to build an authenticated HttpSolrClient.
HttpSolrClient solrClient = getSolrClient(service.getSolrUrl("scfaaf8903_02c1_4297_84c6_76b79537d849"), "{username}", "{password}");
SolrQuery query = new SolrQuery("what is the basic mechanism of the transonic aileron buzz");
QueryResponse response = solrClient.query("example_collection", query);
Ranking ranking = service.rank("B2E325-rank-67", response);
System.out.println(ranking);
I think that, at the back end, the curl command fires a search in Solr using the specified query and then ranks the returned results.
In Java this is done explicitly: instead of a single queryAndRank method you have two methods, one that runs the search in Solr and returns the results, and one that forwards these results to the ranking system.
The search in Solr can return csv.
The CSVResponseWriter can write the list of documents in a response in
CSV format.
http://wiki.apache.org/solr/CSVResponseWriter

Elasticsearch using Java api

Hi, I am trying to query Elasticsearch following the SQL query below, and I want to implement the same logic using the Java API:
select * from log , web where l.loghost = w.webhost and @datetime between '2016-05-20' AND '2016-05-25'
log and web are different types, the indices are logstash-log-* and logstash-web*, and the @timestamp format looks like "2016-05-20T17:14:01.037Z".
Now I have the following Java code, but I don't know how to restrict it to the range between two dates, so it does not return the expected output:
SearchResponse response = client.prepareSearch("logstash-log-*","logstash-web-*")
.setTypes("log","web")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setFetchSource(new String[]{"*"}, null)
.setQuery(QueryBuilders.queryStringQuery("1.2.3.4").field("*_host"))// Query
.execute()
.actionGet();
Please guide me; I am new to Elasticsearch. Thanks in advance.
You need to combine a range query with your query_string query inside a bool/filter query:
QueryStringQueryBuilder qs = QueryBuilders.queryStringQuery("1.2.3.4").field("*_host");
RangeQueryBuilder range = QueryBuilders.rangeQuery("@timestamp")
.gte("2016-05-20T00:00:00.000Z")
.lte("2016-05-25T00:00:00.000Z");
and then
...
.setQuery(QueryBuilders.boolQuery().filter(qs).filter(range))
...
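The bounds in that range query are inclusive on both ends. The following plain-Java sketch, which uses java.time rather than the Elasticsearch client and a made-up class name, checks two sample timestamps against the same bounds. It also shows one consequence worth noticing: an upper bound of 2016-05-25T00:00:00.000Z leaves out documents logged later on 25 May.

```java
import java.time.Instant;

class TimestampRangeSketch {
    // Inclusive on both ends, like gte/lte in a range query.
    static boolean inRange(Instant timestamp, Instant gte, Instant lte) {
        return !timestamp.isBefore(gte) && !timestamp.isAfter(lte);
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2016-05-20T00:00:00.000Z");
        Instant to = Instant.parse("2016-05-25T00:00:00.000Z");
        // Timestamp in the format quoted in the question.
        System.out.println(inRange(Instant.parse("2016-05-20T17:14:01.037Z"), from, to));
        // Hypothetical timestamp later on the last day, after the upper bound.
        System.out.println(inRange(Instant.parse("2016-05-25T09:30:00.000Z"), from, to));
    }
}
```

If the whole of 25 May should be included, the upper bound would need to be moved later, for example to the end of that day.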

Spring Data Elasticsearch aggregations with filters

I am trying to perform aggregations on values filtered by some conditions. I am using the ElasticSearchTemplate.query() method of Spring Data to execute the query and get the results in a result extractor.
I am getting the hits correctly (i.e. filters are applied and docs matching those values are only retrieved.). However, aggregations are performed on all the docs. I believe aggregations should be applied to filtered values only. Following is the code I am using:
SearchQuery query = //get the query
SearchResponse hits = template.query(query, new ResultsExtractor<SearchResponse>() {
@Override
public SearchResponse extract(SearchResponse response) {
return response;
}
});
To debug the problem further, I wrote the code to execute the query rather than using spring data. Following is the code:
SearchRequestBuilder builder = esSetup.client().prepareSearch("document");
builder.setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), query.getFilter()));
builder.addFields(query.getFields().toArray(new String[query.getFields().size()]));
for(AbstractAggregationBuilder aggregation : query.getAggregations()){
builder.addAggregation(aggregation);
}
SearchResponse response = builder.get();
To my surprise, this query executed correctly and filters were applied on aggregates as well. To analyze further, I went through the code of elasticsearchtemplate and found that it uses setPostFilter method to set the filter. I then modified my code to set the filter that way:
SearchRequestBuilder builder = esSetup.client().prepareSearch("document");
// builder.setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), query.getFilter()));
builder.setPostFilter(query.getFilter());
builder.addFields(query.getFields().toArray(new String[query.getFields().size()]));
for(AbstractAggregationBuilder aggregation : query.getAggregations()){
builder.addAggregation(aggregation);
}
SearchResponse response = builder.get();
When I executed the above code, it showed the same behavior as Spring Data (i.e. filters were applied to the query but not to the aggregates).
Is this a bug of spring data es? If not, then, is there any other method which I should be using to retrieve the data the way I want?
Thanks in advance.
This behaviour is by design in Elasticsearch.
In very simple words, input to aggregations AND post filter is the set of documents that match the query section of the request body. Hence aggregations are not applied over the filtered documents.
However, if you do want the aggregations to be applied over the filtered documents, move the filters inside the query section, that is, use a filtered query (in Elasticsearch 5.0 and later, where the filtered query was removed, use a bool query with a filter clause instead). The output of the query section will then be the filtered set of documents, and the aggregations will apply to them as expected.
So for your requirements, use a filtered query instead of a post filter.
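The difference can be sketched in memory. The code below is plain Java and not the Elasticsearch or Spring Data API; the class, the field names and the sample documents are hypothetical, every sample document is assumed to contain the aggregated field, and the query is assumed to be match_all.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PostFilterSketch {
    // Terms aggregation: bucket key -> number of documents in the input set.
    static Map<String, Long> termsAgg(List<Map<String, String>> docs, String field) {
        return docs.stream()
                .collect(Collectors.groupingBy(doc -> doc.get(field), TreeMap::new, Collectors.counting()));
    }

    // Filter inside the query section: aggregations see only the filtered documents.
    static Map<String, Long> aggsWithQueryFilter(List<Map<String, String>> docs,
                                                 Predicate<Map<String, String>> filter, String field) {
        return termsAgg(docs.stream().filter(filter).collect(Collectors.toList()), field);
    }

    // Post filter with a match_all query: aggregations still see every document.
    static Map<String, Long> aggsWithPostFilter(List<Map<String, String>> docs, String field) {
        return termsAgg(docs, field);
    }

    // The post filter only narrows the hits that are returned.
    static long hitsWithPostFilter(List<Map<String, String>> docs, Predicate<Map<String, String>> filter) {
        return docs.stream().filter(filter).count();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("type", "pdf", "owner", "alice"),
                Map.of("type", "pdf", "owner", "bob"),
                Map.of("type", "doc", "owner", "alice"));
        Predicate<Map<String, String>> ownedByAlice = doc -> "alice".equals(doc.get("owner"));
        System.out.println("query filter aggs: " + aggsWithQueryFilter(docs, ownedByAlice, "type"));
        System.out.println("post filter aggs:  " + aggsWithPostFilter(docs, "type"));
        System.out.println("post filter hits:  " + hitsWithPostFilter(docs, ownedByAlice));
    }
}
```

In both variants the returned hits are the same; only the set of documents that feeds the aggregations differs.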
