Elasticsearch - Java RestHighLevelClient - how to get all documents using scroll api - java

I have about 30000 entities stored in an Elasticsearch index, and I'd like to get all of their ids using RestHighLevelClient. I've read that the best way to do this is the scroll API. However, when I use it I receive only about 10 entities instead of 30k. How can I solve this?
final class ElasticRepo {
private final RestHighLevelClient restHighLevelClient;
List<ListingsData> getAllListingsDataIds() {
val request = new SearchRequest(ELASTICSEARCH_LISTINGS_INDEX);
request.types(ELASTICSEARCH_TYPE);
val searchSourceBuilder = new SearchSourceBuilder()
.query(matchAllQuery())
.fetchSource(new String[]{"listing_id"}, new String[]{"backoffice_data", "search_and_match_data"});
request.source(searchSourceBuilder);
request.scroll(TimeValue.timeValueMinutes(3));
return executeQuery(request);
}
private List<ListingsData> executeQuery(final SearchRequest searchQuery) {
try {
val hits = restHighLevelClient.search(searchQuery, RequestOptions.DEFAULT).getHits().getHits();
return Arrays.stream(hits).map(SearchHit::getSourceAsString).map(ElasticRepo::toListingsData).collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException("");
}
}
}
When I run this, executeQuery returns only about 11 entities. How can I obtain all documents in the index?

Try following this example. I am using this code and it works:
String query = "your query here";
QueryBuilder matchQueryBuilder = QueryBuilders.boolQuery().must(new QueryStringQueryBuilder(query));
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
searchSourceBuilder.size(5000); // batch size per scroll page; the default index.max_result_window caps it at 10000
SearchRequest searchRequest = new SearchRequest("your index here");
searchRequest.source(searchSourceBuilder);
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L)); // keeps the scroll context alive between calls
searchRequest.scroll(scroll);
// client is your RestHighLevelClient
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
List<SearchHit> allHits = new ArrayList<>(); // needs java.util.List, ArrayList and Arrays
SearchHit[] searchHits = searchResponse.getHits().getHits();
while (searchHits != null && searchHits.length > 0)
{
    allHits.addAll(Arrays.asList(searchHits)); // accumulate the current batch
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT); // fetch the next batch
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}
// release the scroll context once every batch has been read
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);

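The Search API returns at most 10 hits per request by default unless the size field is specified, which is why the first response contains only about 10 documents. With scroll, that first response is just the first batch: you have to keep calling the scroll API with the returned scroll id until a batch comes back empty.
The Search Scroll API page of the Java High Level REST Client documentation has good sample code: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-search-scroll.html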
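
To make the shape of that loop concrete without depending on a running cluster, here is a minimal sketch in plain Java. Page, collectAll and fakePage are hypothetical names invented for this illustration; they stand in for SearchResponse, the scroll loop and the cluster, and are not part of the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ScrollLoopSketch {

    /** Hypothetical stand-in for one scroll response: a batch of hits plus the id for the next call. */
    static final class Page {
        final List<String> hits;
        final String scrollId;

        Page(List<String> hits, String scrollId) {
            this.hits = hits;
            this.scrollId = scrollId;
        }
    }

    /** Drains a scroll: consume the first page, then keep requesting pages until one comes back empty. */
    static List<String> collectAll(Page first, Function<String, Page> nextPage) {
        List<String> all = new ArrayList<>();
        Page page = first;
        while (page.hits != null && !page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = nextPage.apply(page.scrollId);
        }
        return all;
    }

    /** Fake in-memory "index" of total ids served in batches of size (demonstration only). */
    static Page fakePage(int offset, int size, int total) {
        List<String> hits = new ArrayList<>();
        for (int i = offset; i < Math.min(offset + size, total); i++) {
            hits.add("id-" + i);
        }
        // The fake scroll id simply encodes the offset of the next batch.
        return new Page(hits, String.valueOf(offset + hits.size()));
    }

    public static void main(String[] args) {
        final int size = 10;
        final int total = 35;
        List<String> ids = collectAll(
                fakePage(0, size, total),
                scrollId -> fakePage(Integer.parseInt(scrollId), size, total));
        System.out.println(ids.size() + " ids collected");
    }
}
```

Reading only the first page, as the code in the question does, corresponds to returning first.hits and never entering the loop.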

Related

How do I implement a terms query like the one below in Java? Is there any way to use termsQuery(), or some other way?

How do I implement a terms query like the one below in Java?
Is there any way to use termsQuery(), or is there some other way?
{
"terms": {
"model_id": [
"166168N",
"753547",
"1568357",
"90112",
"1020682",
"3257438"
],
"boost": 1.0E+6
}
}
You can build a query like this with the Java High Level REST Client:
SearchRequest searchRequest = new SearchRequest("userdoc");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
String[] values = new String[] { "166168N", "753547", "1568357", "90112", "1020682", "3257438" };
searchSourceBuilder.query(new TermsQueryBuilder("model_id", values).boost(1.0e6f)); // same boost as in the JSON query above
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

Fetch all records including only particular fields

I am working with Elasticsearch 7.3. I want to fetch only two fields of the documents in my index using the Java API. I am using the following code, but it returns a null object.
RestHighLevelClient client;
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.fetchSource("recipe_ID,recipe_url", null);
sourceBuilder.from(0);
SearchRequest searchRequest = new SearchRequest("recipes");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit searchHit = searchResponse.getHits().getAt(0);
String resultString = searchHit.getSourceAsString();
return resultString;
I need to include only two fields recipe_ID and recipe_url in my result.
You're on the right path, although source filtering requires you to specify your fields in an array like this:
String[] includeFields = new String[] {"recipe_ID", "recipe_url"};
sourceBuilder.fetchSource(includeFields, null);

Fetch the fields from an Elasticsearch 7.3 document with Spring Boot (Java)

I am using the following Java code to query Elasticsearch 7.3 and get one document. I have limited it to the single document with the highest score. It works fine and returns the document correctly.
@Autowired
RestHighLevelClient client;
@RequestMapping(value = "/search", method = RequestMethod.GET)
public @ResponseBody
String getItem(@RequestParam("string") String string) throws IOException {
QueryBuilder matchQueryBuilder = QueryBuilders.simpleQueryStringQuery(string);
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(matchQueryBuilder);
sourceBuilder.from(0);
sourceBuilder.size(1);
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
SearchRequest searchRequest = new SearchRequest("nutrients");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
return searchResponse.toString();
}
The response is as follows. I want to read the values of Calories, Fat, Protein and Carbohydrate in Java and ignore the values of the other nutrients.
I need the values in four variables, for example String caloriesVar = "153 kcal", and similarly for the other three.
[

Elasticsearch set limit and offset using RestHighLevelClient

I search in Elasticsearch using MultiSearchRequest by part of the field:
@Override
public Collection<Map<String, Object>> findContractsByIndexAndWord(String index, String type, String word) throws CommonUserException {
MultiSearchRequest request = new MultiSearchRequest();
word = word.toLowerCase();
request.add(formSearchRequestForMultiSearch(index, type, ID_FIELD, word));
request.add(formSearchRequestForMultiSearch(index, type, PROVIDER_ID_FIELD, word));
MultiSearchResponse searchResponse;
try (RestHighLevelClient client = getClient()) {
searchResponse = client.multiSearch(request);
return formContracts(searchResponse);
} catch (IOException e) {
throw new CommonUserException(ELASTIC_EXCEPTION, ELASTIC_EXCEPTION);
}
}
private SearchRequest formSearchRequestForMultiSearch(String index, String type, String field, String word) {
SearchRequest searchRequest = new SearchRequest(index);
searchRequest.types(type);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.wildcardQuery(field, word));
searchRequest.source(searchSourceBuilder);
return searchRequest;
}
private Collection<Map<String, Object>> formContracts(MultiSearchResponse response) {
Collection<Map<String, Object>> contracts = new LinkedList<>();
for (int i = 0; i < response.getResponses().length; i++) {
SearchHit[] hits = response.getResponses()[i].getResponse().getHits().getHits();
for (SearchHit hit : hits) {
if (!contracts.contains(hit.getSourceAsMap())) {
contracts.add(hit.getSourceAsMap());
}
}
}
return contracts;
}
How can I add a limit and an offset to this request?
From the Elasticsearch documentation (latest version):
Here are a few examples of some common options:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); //1
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy")); //2
sourceBuilder.from(0); //3
sourceBuilder.size(5); //4
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); //5
1. Create a SearchSourceBuilder with default options.
2. Set the query. Can be any type of QueryBuilder.
3. Set the from option that determines the result index to start searching from. Defaults to 0.
4. Set the size option that determines the number of search hits to return. Defaults to 10.
5. Set an optional timeout that controls how long the search is allowed to take.
The from and size options correspond to what is usually called offset and limit, respectively. In your code, set them on the SearchSourceBuilder created in formSearchRequestForMultiSearch.
If you page through large result sets, have a look at the scroll API. Deep from/size paging gets slow because every shard has to collect from + size hits for each request, and by default a request with from + size above 10000 (the index.max_result_window setting) is rejected. If you come from a SQL background, as I guess from the limit/offset terminology, think of the scroll API as the equivalent of keeping a SQL cursor open so that you can carry on from where you left off.
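
As a small illustration of that mapping, the following plain-Java sketch converts a page number into the from value and checks it against the result window. PagingSketch, fromFor and withinWindow are hypothetical helper names for this example, and the 10000 limit assumes index.max_result_window has been left at its default.

```java
public class PagingSketch {

    /** Default of index.max_result_window; assumes the index setting was not changed. */
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Converts a zero-based page number and a page size (the limit) into the "from" offset. */
    static int fromFor(int page, int size) {
        if (page < 0 || size <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and size must be > 0");
        }
        return Math.multiplyExact(page, size);
    }

    /** True when from + size stays inside the window that from/size paging can reach. */
    static boolean withinWindow(int from, int size) {
        return (long) from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int size = 25;
        int from = fromFor(3, size); // fourth page of 25 hits
        System.out.println("from=" + from + " size=" + size + " ok=" + withinWindow(from, size));
        // These two values are what would be passed to sourceBuilder.from(...) and sourceBuilder.size(...).
    }
}
```

When withinWindow returns false, that is the point at which switching to the scroll API is worth considering.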

Search by the field in ElasticSearch in Java?

I have objects indexed in elasticsearch:
{"id":"one","name":"John"}
{"id":"two","name":"Steve"}
I put them into Elasticsearch under the index 'people' and the type 'human', with the document ids 'one' and 'two'.
The task is to search records by 'name' in java using elasticsearch 6.2.4 with rest high level client.
Here is my code example:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("name", name));
SearchRequest searchRequest = new SearchRequest("people");
searchRequest.types("human");
searchRequest.indices("name").source(sourceBuilder);
SearchResponse searchResponse;
RestHighLevelClient client = getClient();
searchResponse = client.search(searchRequest);
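This is not working.
I need help performing the search.
Please try the code below. The call searchRequest.indices("name") in your code replaces the index "people" with an index called "name", so it has been removed.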
SearchRequest searchRequest = new SearchRequest("people");
searchRequest.types("human");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("name", name));
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest);
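Here the value of name must be in lowercase. A term query is not analyzed, whereas, assuming name is a text field with the default (standard) analyzer, the text is lowercased when it is indexed, so "John" is stored as the term "john".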
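
The effect can be imitated in a few lines of plain Java. This is only a toy model of analysis, not the real standard analyzer; TermLookupSketch, analyze and termMatches are hypothetical names made up for the illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TermLookupSketch {

    /** Rough imitation of index-time analysis: split on non-alphanumeric characters and lowercase. */
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** A term lookup compares the given term with the stored tokens exactly, without analysis. */
    static boolean termMatches(Set<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("John");
        System.out.println(termMatches(indexed, "John")); // false: the stored token is lowercase
        System.out.println(termMatches(indexed, "john")); // true
    }
}
```

In the real client the equivalent step is lowercasing the value before passing it to termQuery, for example with name.toLowerCase(Locale.ROOT).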
