I am searching my Elastic index from my Java backend using Elastic's high-level REST client for Java. I notice that it takes 700 to 800 milliseconds to receive the response from Elastic.
I checked the actual query time in Elastic and it is only 7 milliseconds.
I have built filters and aggregations into my query and am also returning many fields.
However, even if I remove all filters and aggregations, limit the result set to a single document and return only a single field, my Java code still takes more than 700 ms to receive the response from Elastic. Why might this be? My server code is running in California and my Elastic index is served from Northern Virginia. Perhaps this distance explains the latency? What else could be the cause?
This is a multisearch containing two search queries.
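One way to narrow this down is to time the whole round trip on the client and compare it with the time the server reports (with the high-level REST client, each SearchResponse exposes getTook()). Below is a minimal sketch under assumptions: `Timing` and `LatencyProbe` are hypothetical names, and the supplier is assumed to run the msearch and return the server-side milliseconds. If only the first call is slow, connection setup is a plausible suspect; if every call shows the same gap, the cost is per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Wall-clock time of one call next to the server-reported "took" time. */
record Timing(long wallMillis, long serverMillis) {
    /** Time spent outside the search itself: network plus client-side work. */
    long overheadMillis() {
        return wallMillis - serverMillis;
    }
}

final class LatencyProbe {
    /**
     * Runs the call several times and records both timings for each run.
     * Comparing the first run with later ones may separate one-off
     * connection setup (DNS, TCP, TLS) from steady per-request latency.
     */
    static List<Timing> probe(Supplier<Long> callReturningServerMillis, int runs) {
        List<Timing> timings = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            long serverMillis = callReturningServerMillis.get(); // hypothetical: performs the msearch
            long wallMillis = (System.nanoTime() - start) / 1_000_000;
            timings.add(new Timing(wallMillis, serverMillis));
        }
        return timings;
    }
}
```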
I have a problem in which I need to query for a subset of records on a large index containing a high volume of records, whilst running a Painless script with the search query to augment the result. The (much smaller) result is to be saved in a secondary index for later use. In a different SO question: Reindex part of Elasticsearch index onto new index via Jest, I mentioned this is possible through the Kibana interface, but there does not seem to be a Java library that can accomplish what I need. Has anyone ever accomplished a query within a _reindex operation outside of Kibana? I am leaning toward using the URLConnection family in Java, but am looking for suggestions and advice at this point.
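The _reindex API accepts a query under `source` and a Painless script, so it can be called from Java with any HTTP client, not only from Kibana. A minimal sketch using the JDK's java.net.http (Java 15+ for the text block); the index names `big-index` and `small-index`, the term query and the script are hypothetical placeholders. Older Elasticsearch versions used `inline` instead of `source` for the script body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ReindexRequests {
    /**
     * Builds a POST /_reindex that copies only the documents matching a query
     * into a second index, running a Painless script on each one on the way.
     * Index names, query and script are hypothetical placeholders.
     */
    static HttpRequest reindexWithQuery(String esBaseUrl) {
        String body = """
            {
              "source": {
                "index": "big-index",
                "query": { "term": { "status": "active" } }
              },
              "dest": { "index": "small-index" },
              "script": {
                "lang": "painless",
                "source": "ctx._source.augmented = true"
              }
            }
            """;
        return HttpRequest.newBuilder(URI.create(esBaseUrl + "/_reindex"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, so no Elasticsearch-specific library or URLConnection plumbing is needed.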
I am new to Elasticsearch. I was reading that we can query Elasticsearch through its REST API.
I was reading the following link:
http://blogs.justenougharchitecture.com/using-jest-as-a-rest-based-java-client-with-elasticsearch/
Is this the right way to do it?
Also, I do not want to limit the number of results my search returns (it can return millions of records).
A Java ResultSet works like this: the table might have millions of rows, but we can iterate one row at a time and process it without storing the rows on the Java heap, so heap space is not a concern. I want to do something similar when querying Elasticsearch: retrieve all the records matching the query, without holding them all in memory while iterating over them.
Is this possible with any Java client via the REST API? If not via the REST API, is there another way to solve this problem?
Thanks
First, if you use Java or another JVM language, you could also use the native client. Jest is a good option if you want to keep your dependencies small (the native Java client is essentially the same as the complete server), or if you want to (or can only) access Elasticsearch via its HTTP interface rather than its binary interface.
Second, what you want to use is the scroll API: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html (I didn't find a quick reference in the Jest documentation, though).
It doesn't work exactly like a ResultSet, but it lets you iterate over all your results in chunks. An example, based on the documentation:
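import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;

QueryBuilder query = ...;
// With SCAN the first response carries only a scroll ID and no hits.
// (SCAN was deprecated in 2.1 and later removed; on newer versions drop it and sort by "_doc".)
SearchResponse scrollResp = client.prepareSearch(index)
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000)) // keep the scroll context alive for 60 s
        .setQuery(query)
        .setSize(100) // with SCAN: hits per shard per scroll round trip
        .execute().actionGet();
// Scroll until no hits are returned
while (true) {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
    // getHits().getHits() is a SearchHit[] array, so check its length
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}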
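To get closer to the ResultSet-style iteration asked about, the scroll loop can be wrapped in a lazy Iterator that holds only the current page in memory. This is a minimal sketch: `ScrollPage`, `ScrollIterator` and the `nextPage` function are hypothetical, and `nextPage` is assumed to wrap whatever call continues the scroll (prepareSearchScroll with the native client, or the equivalent scroll request over HTTP with Jest). An empty first page, as SCAN returns, is skipped rather than treated as the end.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Supplier;

/** One page of a scroll: the hits plus the ID needed to fetch the next page. */
record ScrollPage<T>(List<T> hits, String scrollId) {}

/** Walks a scroll one hit at a time, keeping only the current page in memory. */
final class ScrollIterator<T> implements Iterator<T> {
    private final Function<String, ScrollPage<T>> nextPage; // hypothetical: continues the scroll
    private ScrollPage<T> page;
    private int pos = 0;
    private boolean exhausted = false;

    ScrollIterator(Supplier<ScrollPage<T>> firstPage, Function<String, ScrollPage<T>> nextPage) {
        this.page = firstPage.get(); // the initial search request
        this.nextPage = nextPage;
    }

    @Override
    public boolean hasNext() {
        // Current page used up: fetch the next one. An empty fetched page ends the scroll.
        while (!exhausted && pos >= page.hits().size()) {
            page = nextPage.apply(page.scrollId()); // always use the latest scroll ID
            pos = 0;
            if (page.hits().isEmpty()) {
                exhausted = true;
            }
        }
        return !exhausted;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.hits().get(pos++);
    }
}
```

The caller then writes `while (it.hasNext()) { process(it.next()); }`, much like advancing a ResultSet, and earlier pages become eligible for garbage collection as soon as the next one is fetched.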
I am thinking of setting up a page in an application where each query can return a result set that does not fit in memory, or where fetching all of the results is very expensive. The user will hit "Get More" to get more of those results. I wonder whether I could use a yielder for Java, something like this (http://benjiweber.co.uk/blog/2015/03/21/yield-return-in-java/), and whether I will need WebSockets, e.g. from Spring (http://docs.spring.io/spring/docs/current/spring-framework-reference/html/websocket.html), so that the client can tell the server to push more results. Could you also give an example of the handshake? Would the endpoint URI be based on some session ID as well? Also, when databases like OrientDB/Neo4j return Iterables, does that mean we can keep the connection open and get the next rows minutes later without problems? Thanks!
You are talking about two different concepts.
Pagination
If you have a large result set and you need to return it piece by piece to avoid long query times or high memory requirements, you paginate over the result set.
To do this, the client requests the next piece of the set when the user hits the "Get More" button. Each time, the server receives a request from the client and hits the DB with a paginated query.
Example in SQL (10 results per page, fetching the page at zero-based index 10, i.e. rows 101 to 110):
SELECT * FROM Table LIMIT 10 OFFSET 100
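A "Get More" handler built on that query might look like the following JDBC sketch. The table `items`, its columns `id` and `name`, and the class name are hypothetical; `LIMIT ? OFFSET ?` is the PostgreSQL/MySQL/SQLite form, and other databases use different syntax (SQL Server, for example, uses OFFSET ... ROWS FETCH NEXT ... ROWS ONLY). The ORDER BY matters: without it the database may return rows in a different order on each request, so pages could overlap or skip rows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class PagedQuery {
    /** Number of rows to skip before the page at zero-based index pageIndex. */
    static long offsetFor(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        return (long) pageIndex * pageSize;
    }

    /**
     * Called once per "Get More" click; only one page is ever held in memory.
     * Table and column names are hypothetical.
     */
    static List<String> fetchPage(Connection conn, int pageIndex, int pageSize) throws SQLException {
        String sql = "SELECT name FROM items ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setLong(2, offsetFor(pageIndex, pageSize));
            List<String> page = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(rs.getString("name"));
                }
            }
            return page;
        }
    }
}
```

Each request is independent, so the server keeps no per-user state between clicks beyond the page index the client sends back.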
Websockets / Yielder
You need a WebSocket / yielder when it is the server that sends data on its own initiative: the client doesn't request an update, it just keeps the socket open and receives updates from the server as they arrive.
A messaging service is a typical example, since it avoids constant polling from the client side.
In your case, a WebSocket is unnecessary. See also this related question for an example of what I mean: What's the behavioral difference between HTTP Stay-Alive and Websockets?
However, you can set up a keep-alive connection between your back end and the database to avoid constantly closing and reopening the connection each time the user requests more results.
Finally, regarding your question about Iterable results in Neo4j: Neo4j's result type is an Iterable of Map<String,Object>, where each map holds the key-value pairs of one row. It doesn't keep the connection alive (by default); it only iterates through the results returned by that particular query.
I'm trying to implement an application using the Google Guice framework with a DynamoDB database.
I have implemented an API for finding documents by a range query, i.e. a time period. When I query by month, it returns a limited number of documents (3695). When I search by start time and end time, it also returns the same number of documents, which does not include newly created documents.
How can I implement the API so that it overcomes this limitation, whether it comes from the application or from DynamoDB?
A DynamoDB response is limited to 1 MB per page. When your result set is bigger, you only get the first results, up to a response size of 1 MB.
In the docs:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#Pagination
It describes how to use the response metadata (the actual number of results, the start key and so on) to query the whole result in batches / pages.
An important excerpt from the docs:
If you query or scan for specific attributes that match values that amount to more than 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request, and use that value as the ExclusiveStartKey in the next request. This will let you progressively query or scan for new data in 1 MB increments.
When the entire result set from a Query or Scan has been processed, the LastEvaluatedKey is null. This indicates that the result set is complete (i.e. the operation processed the “last page” of data).
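The LastEvaluatedKey / ExclusiveStartKey loop from the excerpt can be sketched as follows. This is a simplified, hypothetical version: keys are modelled as Map<String, String> instead of DynamoDB AttributeValues, and `PageFetcher` stands in for one Query call. With the AWS SDK for Java v2, `fetch` would set `exclusiveStartKey(...)` on the `QueryRequest` builder and read `lastEvaluatedKey()` from the `QueryResponse`; the v2 SDK returns an empty map rather than null on the last page, which is why the loop checks both.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** One page of results plus the key to resume from (null or empty on the last page). */
record KeyedPage<T>(List<T> items, Map<String, String> lastEvaluatedKey) {}

/** Hypothetical stand-in for a single Query/Scan call that accepts an ExclusiveStartKey. */
interface PageFetcher<T> {
    KeyedPage<T> fetch(Map<String, String> exclusiveStartKey);
}

final class DrainPages {
    /**
     * Follows LastEvaluatedKey until it runs out, handing every item to the consumer.
     * Returns the number of requests made.
     */
    static <T> int forEachItem(PageFetcher<T> fetcher, Consumer<T> consumer) {
        Map<String, String> startKey = null; // first request: no ExclusiveStartKey
        int pages = 0;
        do {
            KeyedPage<T> page = fetcher.fetch(startKey);
            page.items().forEach(consumer);
            pages++;
            startKey = page.lastEvaluatedKey(); // becomes the next ExclusiveStartKey
        } while (startKey != null && !startKey.isEmpty());
        return pages;
    }
}
```

Applied to the question above, the 3695 documents would be only the first 1 MB page; looping like this should return the remaining documents, including newly created ones in the range.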
I'm trying to execute a SOQL query using the Salesforce REST API that will return 2,749 results. However, there seems to be a limit of 2,000 results that can be returned for a given request.
Is there a way to query the remaining 749 results without using the OFFSET keyword? (it's not currently supported in my production environment).
I looked into this and found a queryMore function, but I can't find a way to call it through the REST API.
Part of the result is a nextRecordsUrl property; doing a GET on it returns the next chunk of the results. See the section on query in the REST API docs.
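Following nextRecordsUrl until the response reports `done: true` can be sketched like this. JSON parsing is left out: `QueryPage` is a hypothetical holder for the `records`, `done` and `nextRecordsUrl` fields of each response, and `getByUrl` is assumed to perform the GET (for example with the request built by `nextPageRequest`) and parse the result. Since nextRecordsUrl is a path, it is appended to the instance URL; the instance URL, API version and token are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Parsed fields of one SOQL query response page (hypothetical holder; parsing not shown). */
record QueryPage(List<String> records, boolean done, String nextRecordsUrl) {}

final class SoqlPager {
    /** Builds the GET for a nextRecordsUrl, which is a path relative to the instance URL. */
    static HttpRequest nextPageRequest(String instanceUrl, String nextRecordsUrl, String accessToken) {
        return HttpRequest.newBuilder(URI.create(instanceUrl + nextRecordsUrl))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    /** Collects all records, following nextRecordsUrl until a page reports done. */
    static List<String> fetchAll(QueryPage first, Function<String, QueryPage> getByUrl) {
        List<String> all = new ArrayList<>(first.records());
        QueryPage page = first;
        while (!page.done()) {
            page = getByUrl.apply(page.nextRecordsUrl()); // hypothetical: GET + parse
            all.addAll(page.records());
        }
        return all;
    }
}
```

For the 2,749-record query, this would mean the initial query plus one follow-up GET, with no OFFSET needed.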