Range queries in Elasticsearch Java API

I have two fields in my ES index: min_duration and max_duration. I want to build a query that, for an input duration, finds all documents such that:
min_duration <= duration <= max_duration
For example, if duration is 30 seconds, I should get all docs whose min_duration is less than or equal to 30 and whose max_duration is greater than or equal to 30.
I am using the ES Java API, and a range filter seems like the way to go. I have constructed the range filter as follows:
val filter = FilterBuilders.andFilter( FilterBuilders.rangeFilter("min_duration").lte(duration),FilterBuilders.rangeFilter("max_duration").gte(duration))
However, it still does not seem to work for me. Is this the correct way to build this type of query, or am I missing something?
Thanks.

Try doing it with a bool query. Wrap your two range clauses inside it, like this:
QueryBuilder qb = boolQuery()
    .must(rangeQuery("max_duration").gte(duration))
    .must(rangeQuery("min_duration").lte(duration));
Does this help?
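For reference, here is a dependency-free sketch of the request body that a bool query with two range clauses corresponds to in the query DSL. It only builds the JSON string and does not use the Elasticsearch client; the class and method names are made up for illustration.

```java
import java.util.Locale;

class DurationRangeQuery {

    // Builds the search body for: min_duration <= duration <= max_duration.
    // Both range clauses are wrapped in bool.must, mirroring the answer above.
    static String buildBody(long duration) {
        return String.format(Locale.ROOT,
            "{\"query\":{\"bool\":{\"must\":["
                + "{\"range\":{\"min_duration\":{\"lte\":%d}}},"
                + "{\"range\":{\"max_duration\":{\"gte\":%d}}}"
                + "]}}}",
            duration, duration);
    }

    public static void main(String[] args) {
        // Print the body for a duration of 30 seconds.
        System.out.println(buildBody(30));
    }
}
```

Printing the body this way can help when comparing what the Java builders produce with a query that is known to work over REST.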

Related

How to get the first result from a group (Jooq)

My requirement is to take a list of identifiers, each of which could refer to multiple records, and return the newest record per identifier.
This would seem to be doable with a combination of orderBy(date, desc) and fetchGroups() on the identifier column. I then use values() to get the Result objects.
At this point, I want the first value in each result object. I can do get(0) to get the first value in the list, but that seems like cheating. Is there a better way to get that first result from a Result object?

You're going to write a top-1-per-category query, which is a special case of a top-n-per-category query. Most syntaxes that produce this behaviour in SQL are supported by jOOQ as well. You shouldn't do the grouping in the client, because you'd transfer all the excess rows (every record in each group other than the newest one) from the server to the client.
Some examples:
Standard SQL (when window functions are supported)
// T.ID stands for the identifier column; the question does not name it.
Field<Integer> rn = rowNumber().over().partitionBy(T.ID).orderBy(T.DATE.desc()).as("rn");
var subquery = table(
    select(T.fields())
    .select(rn)
    .from(T)
).as("subquery");
var results =
ctx.select(subquery.fields(T.fields()))
   .from(subquery)
   .where(subquery.field(rn).eq(1))
   .fetch();
Teradata and H2 (we might emulate this soon)
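// T.ID stands for the identifier column; the question does not name it.
var results =
ctx.select(T.fields())
   .from(T)
   .qualify(rowNumber().over().partitionBy(T.ID).orderBy(T.DATE.desc()).eq(1))
   .fetch();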
PostgreSQL
// T.ID stands for the identifier column; the question does not name it.
var results =
ctx.select(T.fields())
   .distinctOn(T.ID)
   .from(T)
   .orderBy(T.ID, T.DATE.desc())
   .fetch();
Oracle
// T.ID stands for the identifier column; the question does not name it.
var results =
ctx.select(
      T.ID,
      max(T.COL1).keepDenseRankFirstOrderBy(T.DATE.desc()).as(T.COL1),
      max(T.COL2).keepDenseRankFirstOrderBy(T.DATE.desc()).as(T.COL2),
      ...
      max(T.COLN).keepDenseRankFirstOrderBy(T.DATE.desc()).as(T.COLN))
   .from(T)
   .groupBy(T.ID)
   .fetch();

Elasticsearch Java API - How to get the number of documents without retrieving the documents

I need to get the number of documents in an index: not the documents themselves, just "how many".
What's the best way to do that?
There is https://www.elastic.co/guide/en/elasticsearch/reference/current/search-count.html, but I'm looking to do this in Java.
There is also https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.4/count.html, but it seems very old.
I could fetch all the documents in the given index and count them, but there must be a better way.

Use the search API, but set it to return no documents and retrieve the count of hits from the SearchResponse object it returns.
For example:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
SearchResponse response = client.prepareSearch("your_index_goes_here")
    .setTypes("YourTypeGoesHere")
    .setQuery(QueryBuilders.termQuery("some_field", "some_value"))
    .setSize(0) // Don't return any documents, we don't need them.
    .get();
SearchHits hits = response.getHits();
long hitsCount = hits.getTotalHits();

Just an addition to @evanjd's answer:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
SearchResponse response = client.prepareSearch("your_index_goes_here")
    .setTypes("YourTypeGoesHere")
    .setQuery(QueryBuilders.termQuery("some_field", "some_value"))
    .setSize(0) // Don't return any documents, we don't need them.
    .get();
SearchHits hits = response.getHits();
long hitsCount = hits.getTotalHits().value;
Since 7.0, getTotalHits() returns a TotalHits object rather than a long (printed, it looks like "6 hits"), so we need to read .value to get the count as a long.

Elastic - Indices Stats
"Indices level stats provide statistics on different operations happening on an index. The API provides statistics on the index level scope (though most stats can also be retrieved using node level scope)."
Using prepareStats(indexName):
client.admin().indices().prepareStats(indexName).get().getTotal().getDocs().getCount();

Breaking change since 7.0: by default, the total hit count is tracked accurately only up to 10,000, so you need to set track_total_hits to true explicitly in the search request to get an exact count.
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#track-total-hits-10000-default

We can also get the lowLevelClient from the highLevelClient and invoke the "_count" REST API, for example "GET /twitter/_doc/_count?q=user:kimchy".
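As a rough illustration of calling the "_count" endpoint without any Elasticsearch client at all, the sketch below uses the JDK's java.net.http.HttpClient (Java 11+). The class name, the base URL passed in, and the regex-based parsing are assumptions made for this example; real code would more likely use the low-level client mentioned above and a proper JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountOverRest {

    // Matches the "count" field of a _count response body.
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    // Builds the _count endpoint URI for one index.
    static URI countUri(String baseUrl, String index) {
        return URI.create(baseUrl + "/" + index + "/_count");
    }

    // Extracts the count from the JSON response body.
    // A regex keeps the sketch dependency-free; prefer a JSON parser in real code.
    static long parseCount(String json) {
        Matcher m = COUNT.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"count\" field in: " + json);
        }
        return Long.parseLong(m.group(1));
    }

    // Sends GET <baseUrl>/<index>/_count and returns the parsed count.
    static long fetchCount(String baseUrl, String index) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(countUri(baseUrl, index)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return parseCount(response.body());
    }

    public static void main(String[] args) {
        // Offline demonstration with a body shaped like a _count response.
        String sample = "{\"count\":6,\"_shards\":{\"total\":1,\"successful\":1,\"skipped\":0,\"failed\":0}}";
        System.out.println(parseCount(sample));
    }
}
```

Against a running cluster, fetchCount("http://localhost:9200", "twitter") would perform the actual request; the host and index here are placeholders.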

2021 Solution
I went through the solutions posted and none of them is convincing. You may get the job done by setting the size of the search request to 0, but that's not the correct way. For counting purposes we should use the Count API, because it consumes fewer resources and less bandwidth: it does not need to fetch documents or score them, and it can apply other internal optimisations.
Use the Count API for Java (link below) to get the count of the documents. The steps are:
1. Build the query using QueryBuilder.
2. Pass the query and the list of indexes to the CountRequest() constructor.
3. Get the CountResponse object by calling client.count(countReq, RequestOptions.DEFAULT).
4. Extract and return the value by calling countResp.getCount().
CountRequest countReq = new CountRequest(indexes, query);
CountResponse countResp = client.count(countReq, RequestOptions.DEFAULT);
return countResp.getCount();
Read the second link for more information.
Important Links
Count API vs Search API : Counting number of documents using Elasticsearch
Count API for Java : https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-count.html

Hadoop MongoConfigUtil query limit

I am using the Java MongoDB Connector to run a Hadoop MapReduce job against MongoDB.
I am setting the input and output URIs with MongoConfigUtil:
MongoConfigUtil.setInputURI( conf, "mongodb://host/db.collection" );
MongoConfigUtil.setOutputURI( conf, "mongodb://host/db.collectionOut" );
The job correctly fetches all the documents in the specified collection.
Is there a way to limit the number of fetched documents?
I want to achieve the equivalent of this query (Mongo style):
db.collection.find().limit(1000)
I know MongoConfigUtil has a setQuery method, but how can I set the limit? Any hints?
I tried adding
MongoConfigUtil.setLimit(conf, 1000)
but I still get all the documents in the collection.

setSplitSize defaults to 8 MB, and this property has higher priority than setLimit (mongo.input.limit).
Example: mongoConfig.setSplitSize(5); // in MB, the default is 8 MB
In the example above I set the value to 5 MB.
The limit applies to each chunk (split) fetched by each Mapper: setLimit(1000) means that each split's query is limited to 1000 documents, not the job as a whole.
I think you want to limit the query for the entire MapReduce process.
setQuery is the query inside find(), and it must be expressed in JSON format as in MongoDB. As far as I know, you can't set a limit inside the Mongo query (find()) itself.
You can find another way to filter the query, such as { fieldName: { $lt: 20 } }, depending on your case. Alternatively, you can create a separate collection based on your limit using projection and then apply MapReduce there.
In short, setQuery is used to filter the collection.
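To make the per-split behaviour described above concrete, here is a small arithmetic sketch in plain Java. It does not use the connector; the split sizes are invented, and it assumes, as this answer states, that the limit is applied to each split independently.

```java
import java.util.List;

class PerSplitLimit {

    // Total documents read when the same limit is applied to every split independently.
    static long totalWithPerSplitLimit(List<Integer> docsPerSplit, int limit) {
        long total = 0;
        for (int docs : docsPerSplit) {
            total += Math.min(docs, limit);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical collection divided into three splits.
        List<Integer> splits = List.of(4000, 2500, 300);
        // 1000 + 1000 + 300 = 2300 documents, not 1000.
        System.out.println(totalWithPerSplitLimit(splits, 1000));
    }
}
```

Under that assumption, a job over several splits can still read far more than the stated limit, which would match the behaviour reported in the question.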

I found the solution using the setLimit method of the MongoInputSplit class, passing the number of documents that you want to fetch.
myMongoInputSplitObj = new MongoInputSplit(*param*)
myMongoInputSplitObj.setLimit(100)
MongoConfigUtil setLimit
Allow users to set the limit on MongoInputSplits (HADOOP-267).

How to get facet value for all fields using Java API

I'm new to Elasticsearch and have been trying to query some sample documents. I issued the following query using the Java API, and it fetched the correct result: it returned the names of all categories. Now I also want the count for each category name. Could you explain how to do that? Sorry for my bad English.
SearchResponse sr = client.prepareSearch()
.addField("Category")
.setQuery(QueryBuilders.matchAllQuery())
.addFacet(FacetBuilders.termsFacet("f")
.field("Category"))
.execute()
.actionGet();

Look at the Count API to count your results if you do not want to get the result set (matched documents) but only the count.
If you do want the result set, the response to every filter or query request also includes the total number of results.
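For the per-category counts the question asks about, a terms facet reports each distinct term together with the number of documents that contain it. The sketch below shows that computation in plain Java over an in-memory list, only to illustrate what the numbers mean; it does not call Elasticsearch, and the sample values are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class CategoryCounts {

    // Counts how many documents carry each distinct category value.
    static Map<String, Long> countByCategory(List<String> categories) {
        return categories.stream()
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical "Category" values, one per document.
        List<String> categories = List.of("books", "music", "books", "games", "books");
        System.out.println(countByCategory(categories)); // {books=3, games=1, music=1}
    }
}
```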

ElasticSearch - Using FilterBuilders

I am new to ElasticSearch and Couchbase. I am building a sample Java application to learn more about both.
From reading the ElasticSearch Java API documentation, I understand that filters are the better choice when sorting on score is not necessary, and that they can be cached.
I still haven't figured out how to use FilterBuilders and have the following questions:
1. Can FilterBuilders be used alone to search?
2. Or do they always have to be used with a query? (If so, can someone please give an example?)
3. Going through the documentation, if I want to perform a search based on field values using FilterBuilders, how can I accomplish that? (Using AndFilterBuilder, TermFilterBuilder, or InFilterBuilder? I am not clear about the differences between them.)
For the 3rd question, I actually tested a search using queries and one using filters, as shown below.
I got an empty result (no rows) when I searched using FilterBuilders. I am not sure what I am doing wrong.
Any examples will be helpful. I have had a tough time going through the documentation, which I found sparse, and searching only led me to various unreliable user forums.
private void processQuery() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    QueryBuilder qb = QueryBuilders.fieldQuery("doc.address.state", "TX");
    srb.setQuery(qb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
private void searchWithFilters() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    srb.setFilter(FilterBuilders.termFilter("doc.address.state", "tx"));
    //AndFilterBuilder andFb = FilterBuilders.andFilter();
    //andFb.add(FilterBuilders.termFilter("doc.address.state", "TX"));
    //srb.setFilter(andFb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
--UPDATE--
As suggested in the answer, changing to lowercase "tx" works. With that resolved, I still have the following questions:
In what scenario(s) are filters used together with a query? What purpose does this serve?
What is the difference between InFilter, TermFilter and MatchAllFilter? Any illustration will help.

Right, you should use filters to exclude documents from even being considered when the query is executed. Filters are faster since they don't involve any scoring, and they are cacheable as well.
That said, you have to use a filter through the search API, which executes a query and accepts an optional filter. If you only have a filter, you can use the match_all query together with it. A filter can be a simple one, or a compound one that combines multiple filters.
Regarding the Java API, the names used are the names of the filters available, no big difference. Have a look at this search example for instance. In your code I don't see where you do setFilter on your SearchRequestBuilder object. You don't seem to need the and filter either, since you are using a single filter. Furthermore, it might be that you are indexing using the default mappings, so the term "TX" is lowercased at index time. That's why you don't find any match when you search using the term filter, which is not analyzed. Try searching for "tx" in lowercase.
Alternatively, you can change your mapping if you want to keep the "TX" term as it is while indexing, probably by setting the field as not_analyzed if it should only be a single token. Otherwise you can change the filter: you might want to use a query that is analyzed, so that your query will be analyzed the same way the content was indexed.
Have a look at the query DSL documentation for more information regarding queries and filters:
MatchAllFilter: matches all your documents; not that useful on its own, I'd say.
TermFilter: filters documents that have fields containing a term (not analyzed).
AndFilter: compound filter that combines two or more filters with a logical AND.
I don't know what you mean by InFilterBuilder; I couldn't find any filter with this name.
The query usually contains what the user types into the text search box. Filters are more a way to refine the search, for example by clicking on facet entries. That's why you would still have the query plus one or more filters.
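The lowercasing issue described in this answer can be sketched without Elasticsearch. The toy code below stands in for analysis with a simple split-and-lowercase step, which is only a rough approximation of the default analyzer, and compares an exact term lookup with a lookup that analyzes its input first. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TermVsAnalyzed {

    // Rough stand-in for default analysis: split on non-alphanumerics and lowercase.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term filter looks up the given term as-is, with no analysis.
    static boolean termMatches(Set<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    // An analyzed query runs its input through the same analysis first.
    static boolean analyzedMatches(Set<String> indexedTokens, String input) {
        return indexedTokens.containsAll(analyze(input));
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("TX"); // stored as "tx"
        System.out.println(termMatches(indexed, "TX"));     // false
        System.out.println(termMatches(indexed, "tx"));     // true
        System.out.println(analyzedMatches(indexed, "TX")); // true
    }
}
```

This mirrors the two remedies mentioned above: either search for the token as it was indexed, or use a query type that analyzes the input.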

To append to what @javanna said:
A lot of confusion can come from the fact that filters can be defined in several ways:
standalone (with a required query, for instance match_all if all you need is the filters) (http://www.elasticsearch.org/guide/reference/api/search/filter/)
or as part of a filtered query (http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/)
What's the difference, you might ask? Indeed, you can construct exactly the same logic both ways.
The difference is that a query operates on BOTH the result set and any facets you have defined, whereas a filter (when defined standalone) operates only on the result set and NOT on any facets you may have defined (explained here: http://www.elasticsearch.org/guide/reference/api/search/filter/).

To add to the other answers, InFilter is only used with FilterBuilders. Its definition is: "InFilter: a filter for a field based on several terms, matching on any of them."
The query Java API uses FilterBuilders, a factory for filter builders that lets you create a query dynamically from Java code. We do this with a form: we build our query from the user's selections in its checkboxes, options, and dropdowns.
Here is some example code for FilterBuilders; the snippet below, taken from it, uses InFilter:
FilterBuilder filterBuilder;
User user = (User) auth.getPrincipal();
if (user.getGroups() != null && !user.getGroups().isEmpty()) {
    filterBuilder = FilterBuilders.boolFilter()
        .should(FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName())))
        .should(FilterBuilders.nestedFilter("groupRoles", FilterBuilders.inFilter("groupRoles.key", user.getGroups().toArray())));
} else {
    filterBuilder = FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName()));
}
...
