Elasticsearch set limit and offset using RestHighLevelClient - java

I search in Elasticsearch using a MultiSearchRequest, matching on part of a field:
@Override
public Collection<Map<String, Object>> findContractsByIndexAndWord(String index, String type, String word) throws CommonUserException {
    MultiSearchRequest request = new MultiSearchRequest();
    word = word.toLowerCase();
    request.add(formSearchRequestForMultiSearch(index, type, ID_FIELD, word));
    request.add(formSearchRequestForMultiSearch(index, type, PROVIDER_ID_FIELD, word));
    MultiSearchResponse searchResponse;
    try (RestHighLevelClient client = getClient()) {
        searchResponse = client.multiSearch(request);
        return formContracts(searchResponse);
    } catch (IOException e) {
        throw new CommonUserException(ELASTIC_EXCEPTION, ELASTIC_EXCEPTION);
    }
}
private SearchRequest formSearchRequestForMultiSearch(String index, String type, String field, String word) {
    SearchRequest searchRequest = new SearchRequest(index);
    searchRequest.types(type);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.wildcardQuery(field, word));
    searchRequest.source(searchSourceBuilder);
    return searchRequest;
}
private Collection<Map<String, Object>> formContracts(MultiSearchResponse response) {
    Collection<Map<String, Object>> contracts = new LinkedList<>();
    for (int i = 0; i < response.getResponses().length; i++) {
        SearchHit[] hits = response.getResponses()[i].getResponse().getHits().getHits();
        for (SearchHit hit : hits) {
            if (!contracts.contains(hit.getSourceAsMap())) {
                contracts.add(hit.getSourceAsMap());
            }
        }
    }
    return contracts;
}
How can I add a limit and an offset to the results of this request?

From the Elasticsearch documentation (latest version), here are a few examples of some common options:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); // 1
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy")); // 2
sourceBuilder.from(0); // 3
sourceBuilder.size(5); // 4
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // 5
1. Create a SearchSourceBuilder with default options.
2. Set the query. It can be any type of QueryBuilder.
3. Set the from option that determines the result index to start searching from. Defaults to 0.
4. Set the size option that determines the number of search hits to return. Defaults to 10.
5. Set an optional timeout that controls how long the search is allowed to take.
size and from are what is usually known as limit/offset.
You may want to have a look at the scroll API if you traverse results intensively: offset/limit-based traversal can get slow under certain circumstances (deep pagination in particular), and the scroll API avoids most of that cost. If you come from a SQL background, as I guess from the limit/offset terminology, think of the scroll API as the equivalent of keeping a SQL cursor open to carry on from where you left off iterating.
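Applied to the code in the question, searchSourceBuilder.from(...) and searchSourceBuilder.size(...) give you the offset and limit. Note that from is a document offset, not a page number, so 1-based page numbers need converting first; a small helper (hypothetical class and method names) might look like this:

```java
public class Paging {
    // Convert a 1-based page number and page size into an Elasticsearch "from" offset.
    // from = (page - 1) * pageSize; "from" counts documents, not pages.
    public static int offsetForPage(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }
}
```

You would then call searchSourceBuilder.from(Paging.offsetForPage(page, pageSize)).size(pageSize) inside formSearchRequestForMultiSearch before adding the request to the MultiSearchRequest.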

Related

Elasticsearch Pagination on composite aggregation gives wrong results

I'm having a problem defining a composite aggregation for paginating results.
CompositeAggregationBuilder aggregationBuilder = AggregationBuilders
        .composite(aggrField, List.of(new TermsValuesSourceBuilder(aggrField).field(aggrField)))
        .aggregateAfter(Map.of("keyword", aggrField))
        .size(bucketListInfo.getTopResultsCount());
searchSourceBuilder
        .from(2)
        .size(4)
        .aggregation(aggregationBuilder);
final SearchRequest searchRequest = new SearchRequest(bucketListInfo.getIndexName())
        .source(searchSourceBuilder);
try {
    final SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    return processBucketsFromResponse(bucketListInfo, response);
} catch (Exception e) {
    log.error(e.getMessage(), e);
    return null;
}
The problem comes this way: when I get the response, it contains all hits, not only the paginated ones. If I specify the from param as 2 and the size param as 4, I expect results starting from the second page with a size of 4, but I get all records, no matter what pagination params I specify.
When I add this:
.aggregateAfter(Map.of("keyword", aggrField))
I get this kind of error:
[type=illegal_argument_exception, reason=Missing value for [after.[my-aggr-field].keyword]]];
Is there any workaround for this? Any suggestions, please.

Elasticsearch filter result ignoring search keyword

I am getting good results with a normal search query.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder queryBuilder = new BoolQueryBuilder();
String keyword = requestBody.getKeyword();
queryBuilder.should(QueryBuilders.matchQuery("fullText", keyword));
searchSourceBuilder.query(queryBuilder);
searchSourceBuilder.from(requestBody.getPage() - 1);
searchSourceBuilder.size(BROWSE_PAGE_DATA_LIMIT);
searchRequest.source(searchSourceBuilder);
try {
    return client.search(searchRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
    throw new HttpServerErrorException(HttpStatus.INTERNAL_SERVER_ERROR, "Error in ES search");
}
But when I add filtering to it, the result ignores my search keyword:
queryBuilder.filter(QueryBuilders.termsQuery("authorId", filter.getAuthorIds()));
Here I am trying to replicate Solr's fq. What's wrong with my approach?
Excerpt from ES Docs
If the bool query includes at least one should clause and no must or
filter clauses, the default value is 1. Otherwise, the default value
is 0.
Basically, if there is a filter and/or must clause within a bool query, the should clause is ignored unless min_should_match is set to a suitable value.
Set minimumShouldMatch to 1, e.g.:
queryBuilder.should(QueryBuilders.matchQuery("fullText", keyword)).minimumShouldMatch(1);
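Putting it together with the code from the question, the fixed query would look roughly like this (a sketch, assuming the same field names and variables as above):

```java
// should + filter: minimumShouldMatch(1) makes the keyword match mandatory again,
// instead of being ignored once a filter clause is present
BoolQueryBuilder queryBuilder = new BoolQueryBuilder()
        .should(QueryBuilders.matchQuery("fullText", keyword))
        .filter(QueryBuilders.termsQuery("authorId", filter.getAuthorIds()))
        .minimumShouldMatch(1);
```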

Lucene LongPoint Range search doesn't work

I am using Lucene 8.2.0 in Java 11.
I am trying to index a Long value so that I can filter by it using a range query, for example like so: +my_range_field:[1 TO 200]. However, any variant of that, even my_range_field:[* TO *], returns 0 results in this minimal example. As soon as I remove the + from it to make it an OR, I get 2 results.
So I am thinking I must be making a mistake in how I index it, but I can't make out what it might be.
From the LongPoint JavaDoc:
An indexed long field for fast range filters. If you also need to store the value, you should add a separate StoredField instance.
Finding all documents within an N-dimensional shape or range at search time is efficient. Multiple values for the same field in one document is allowed.
This is my minimal example:
public static void main(String[] args) {
    Directory index = new RAMDirectory();
    StandardAnalyzer analyzer = new StandardAnalyzer();
    try {
        IndexWriter indexWriter = new IndexWriter(index, new IndexWriterConfig(analyzer));
        Document document1 = new Document();
        Document document2 = new Document();
        document1.add(new LongPoint("my_range_field", 10));
        document1.add(new StoredField("my_range_field", 10));
        document2.add(new LongPoint("my_range_field", 100));
        document2.add(new StoredField("my_range_field", 100));
        document1.add(new TextField("my_text_field", "test content 1", Field.Store.YES));
        document2.add(new TextField("my_text_field", "test content 2", Field.Store.YES));
        indexWriter.deleteAll();
        indexWriter.commit();
        indexWriter.addDocument(document1);
        indexWriter.addDocument(document2);
        indexWriter.commit();
        indexWriter.close();
        QueryParser parser = new QueryParser("text", analyzer);
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(index));
        String luceneQuery = "+my_text_field:test* +my_range_field:[1 TO 200]";
        Query query = parser.parse(luceneQuery);
        System.out.println(indexSearcher.search(query, 10).totalHits.value);
    } catch (IOException | ParseException e) {
        e.printStackTrace();
    }
}
You first need to use StandardQueryParser, and then provide the parser with a PointsConfig map, essentially hinting which fields are to be treated as points. You'll then get 2 results.
// Change this line to the following
StandardQueryParser parser = new StandardQueryParser(analyzer);
IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(index));
/* Added code */
PointsConfig longConfig = new PointsConfig(new DecimalFormat(), Long.class);
Map<String, PointsConfig> pointsConfigMap = new HashMap<>();
pointsConfigMap.put("my_range_field", longConfig);
parser.setPointsConfigMap(pointsConfigMap);
/* End of added code */
String luceneQuery = "+my_text_field:test* +my_range_field:[1 TO 200]";
// Change the query to the following
Query query = parser.parse(luceneQuery, "text");
I found the solution to my problem.
I was under the impression that the query parser could just parse any query string correctly. That doesn't seem to be the case.
Using
Query rangeQuery = LongPoint.newRangeQuery("my_range_field", 1L, 11L);
Query searchQuery = new WildcardQuery(new Term("my_text_field", "test*"));
Query build = new BooleanQuery.Builder()
        .add(searchQuery, BooleanClause.Occur.MUST)
        .add(rangeQuery, BooleanClause.Occur.MUST)
        .build();
returned the correct result.

Fetch all records, including only particular fields

I am working with Elasticsearch 7.3. I want to fetch only two fields of all the documents in my index using the Java API. I am using the following code, but it returns a null object.
RestHighLevelClient client;
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.fetchSource("recipe_ID,recipe_url", null);
sourceBuilder.from(0);
SearchRequest searchRequest = new SearchRequest("recipes");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit searchHit = searchResponse.getHits().getAt(0);
String resultString = searchHit.getSourceAsString();
return resultString;
I need to include only two fields, recipe_ID and recipe_url, in my result.
You're on the right path, but source filtering requires you to specify your fields as an array, like this:
String[] includeFields = new String[] {"recipe_ID", "recipe_url"};
sourceBuilder.fetchSource(includeFields, null);
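If your field list starts out as a single comma-separated string, as in the question's fetchSource call, you can split it into the array form first. A small sketch (the helper class and method names are made up):

```java
public class FieldList {
    // Split a comma-separated field list into the array form fetchSource expects,
    // trimming any whitespace around the commas.
    public static String[] toIncludes(String fields) {
        return fields.trim().split("\\s*,\\s*");
    }
}
```

Usage would then be sourceBuilder.fetchSource(FieldList.toIncludes("recipe_ID,recipe_url"), null).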

Elasticsearch - Java RestHighLevelClient - how to get all documents using scroll api

In my index in Elasticsearch I have saved about 30000 entities. I'd like to get all of their ids using RestHighLevelClient. I've read that the best way to do this is to use the scroll API. However, when I do, I receive only about 10 entities instead of 30k. How can I solve this?
final class ElasticRepo {
    private final RestHighLevelClient restHighLevelClient;

    List<ListingsData> getAllListingsDataIds() {
        val request = new SearchRequest(ELASTICSEARCH_LISTINGS_INDEX);
        request.types(ELASTICSEARCH_TYPE);
        val searchSourceBuilder = new SearchSourceBuilder()
                .query(matchAllQuery())
                .fetchSource(new String[]{"listing_id"}, new String[]{"backoffice_data", "search_and_match_data"});
        request.source(searchSourceBuilder);
        request.scroll(TimeValue.timeValueMinutes(3));
        return executeQuery(request);
    }

    private List<ListingsData> executeQuery(final SearchRequest searchQuery) {
        try {
            val hits = restHighLevelClient.search(searchQuery, RequestOptions.DEFAULT).getHits().getHits();
            return Arrays.stream(hits).map(SearchHit::getSourceAsString).map(ElasticRepo::toListingsData).collect(Collectors.toList());
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException("");
        }
    }
}
And when I run it, executeQuery returns only about 11 entities. How can I solve that and obtain all documents in the index?
Try to follow this example; I am using this code and it works:
String query = "your query here";
QueryBuilder matchQueryBuilder = QueryBuilders.boolQuery().must(new QueryStringQueryBuilder(query));
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
searchSourceBuilder.size(5000); // max is 10000
searchRequest.indices("your index here");
searchRequest.source(searchSourceBuilder);
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));
searchRequest.scroll(scroll);
SearchResponse searchResponse = client.search(searchRequest);
String scrollId = searchResponse.getScrollId();
SearchHit[] allHits = new SearchHit[0];
SearchHit[] searchHits = searchResponse.getHits().getHits();
while (searchHits != null && searchHits.length > 0) {
    allHits = Helper.concatenate(allHits, searchHits); // create a function which concatenates two arrays
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = client.searchScroll(scrollRequest);
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest);
As part of the Search API, at most 10 documents are retrieved by default unless the size field is specified.
The Search Scroll API documentation for the Java REST High Level Client has a nice sample: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-search-scroll.html
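The Helper.concatenate call in the answer above is left to the reader; a generic version (the class name matches the snippet, the implementation is one possible sketch) could be:

```java
import java.util.Arrays;

public class Helper {
    // Concatenate two arrays into a new array of the same component type.
    public static <T> T[] concatenate(T[] first, T[] second) {
        T[] result = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }
}
```

Arrays.copyOf preserves the runtime component type of the first array, so the result can hold SearchHit elements without an unchecked cast at the call site.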
