ES - Convert legacy Elasticsearch query from REST to High Level Client - Java

I'm wondering what the equivalent of the following query is in Elasticsearch 5 - 7 (the exact version doesn't matter to me).
I know this query syntax has been deprecated, but I'm actually trying to make a legacy 1.7.5 cluster work with the High Level REST Client.
I did some tests and, although the documentation says this isn't supported, most of the simple actions work. What is left is to convert some queries like the one in the example below:
{
  "size" : 3000,
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [ {
            "terms" : {
              "source" : [ "o365mail" ]
            }
          }, {
            "range" : {
              "bckdate" : {
                "from" : "1549360021398l",
                "to" : null,
                "include_lower" : true,
                "include_upper" : true
              }
            }
          } ]
        }
      }
    }
  },
  "fields" : "*"
}
What I've tried so far is with 7.9.3:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.9/java-rest-high.html
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
boolQueryBuilder
        .must(QueryBuilders.termsQuery(IndexFields.SOURCE.getIndexName(),
                Arrays.asList(Source.O365MAIL.toString().toLowerCase())))
        .must(QueryBuilders.rangeQuery("bckdate").gte(1549360021398L).lte(null));
sourceBuilder.query(boolQueryBuilder);
SearchRequest sr = new SearchRequest();
sr.source(sourceBuilder);
SearchResponse searchResponse2 = client.search(sr, RequestOptions.DEFAULT);
The generated query, seen while debugging, is:
{
  "bool" : {
    "must" : [
      {
        "terms" : {
          "source" : [
            "o365mail"
          ],
          "boost" : 1.0
        }
      },
      {
        "range" : {
          "bckdate" : {
            "from" : 1549360021398,
            "to" : null,
            "include_lower" : true,
            "include_upper" : true,
            "boost" : 1.0
          }
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
I'm wondering if this is equivalent to the legacy code with regard to the filters, because the returned data is pretty much the same.
I must not break the logic of all the filters from the legacy query...
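In case the filter context itself matters (the legacy query wraps everything in a filtered/filter block, so nothing is scored), here is a rough sketch of the closest translation I can think of, with the clauses moved from must() into filter() and the field names hard-coded for brevity; the legacy "fields" : "*" part is left out:
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Sketch only: filter() runs the clauses in filter context (no scoring),
// which is what the old filtered/filter wrapper did.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .size(3000)
        .query(QueryBuilders.boolQuery()
                .filter(QueryBuilders.termsQuery("source", "o365mail"))
                .filter(QueryBuilders.rangeQuery("bckdate").gte(1549360021398L)));

SearchRequest sr = new SearchRequest();
sr.source(sourceBuilder);
SearchResponse searchResponse = client.search(sr, RequestOptions.DEFAULT);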
Thanks for the help.

Related

Elasticsearch query from Java REST High level client returns different/undesirable results compared to execution of Query DSL on Kibana

I'm implementing an Elasticsearch search feature using the Java High Level REST Client, which queries an ES index residing on one of the clusters hosted in the cloud. My intended query JSON DSL looks like this:
{
  "query" :{
    "bool": {
      "should" :[
        {
          "query_string":{
            "query":"cla-180",
            "default_field": "product_title",
            "boost" : 3
          }
        },
        {
          "match" : {
            "product_title" : {
              "query" : "cla-180",
              "fuzziness" : "AUTO"
            }
          }
        }
      ]
    }
  }
}
Corresponding to this, I have written the following code, to be executed using the Java High Level REST Client, which should perform the same function as the DSL above.
BoolQueryBuilder boolQueryBuilder = buildBoolQuery();
boolQueryBuilder.should(QueryBuilders.queryStringQuery("cla-180").defaultField("product_title")).boost(3);
boolQueryBuilder.should(QueryBuilders.matchQuery("product_title", "cla-180").fuzziness(Fuzziness.AUTO));
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(boolQueryBuilder);
What I'm noticing is that the search results from the Java code are different from the results when I execute the DSL in Kibana directly. When the query is executed from Java, I find records that have no relation to the search content given above. I consider this weird because, as far as I can tell, I have implemented the Java code to match the JSON query DSL given above.
When I print the generated JSON from the Java side, its output is as below:
{
  "query": {
    "bool" : {
      "should" : [
        {
          "query_string" : {
            "query" : "cla-180",
            "default_field" : "product_title",
            "fields" : [ ],
            "type" : "best_fields",
            "default_operator" : "or",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "escape" : false,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        },
        {
          "match" : {
            "product_title" : {
              "query" : "cla-180",
              "operator" : "OR",
              "fuzziness" : "AUTO",
              "prefix_length" : 0,
              "max_expansions" : 50,
              "fuzzy_transpositions" : true,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "auto_generate_synonyms_phrase_query" : true,
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "minimum_should_match" : "1",
      "boost" : 3.0
    }
  }
}
Am I missing something in my Java code that makes the search results come back in this undesirable fashion? Or what could be the reason for this mismatch between the records returned by these two methods?
Thanks in advance!
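One difference visible in the generated JSON above is that the boost of 3.0 ended up on the bool query rather than on the query_string clause: should(...) returns the BoolQueryBuilder itself, so .boost(3) was applied to the whole bool query. A minimal sketch of attaching the boost to the query_string clause instead (reusing the field and search term from above) would be:
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Boost only the query_string clause, as in the intended DSL.
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .should(QueryBuilders.queryStringQuery("cla-180")
                .defaultField("product_title")
                .boost(3))
        .should(QueryBuilders.matchQuery("product_title", "cla-180")
                .fuzziness(Fuzziness.AUTO));

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().query(boolQueryBuilder);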

How do I write this Elasticsearch query using the Java APIs?

The following query works correctly and returns the results that I need. I am struggling to write it using the Java APIs, though.
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "somepath",
            "query": {
              "bool": {
                "filter": [
                  {
                    "terms": {
                      "somepath.key": ["key1", "key2", "key3"]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
This is what I am using in Java. What am I missing? (commaSeparatedKeyString = "key1, key2, key3")
QueryBuilders.boolQuery().must(QueryBuilders.nestedQuery(
        "somepath",
        QueryBuilders.boolQuery().filter(QueryBuilders.termsQuery("somepath.key", commaSeparatedKeyString)),
        ScoreMode.Total));
For debugging purposes, it can be helpful to check the JSON serialization of the query you are building. Fortunately, the toString() methods in the query builders do that for you, so you can simply use System.out.println to print the query builder to stdout (or log it with a logging framework). Assuming that the variable commaSeparatedKeyString is set to "key1,key2,key3" (it sounds like it is, but you don't tell us), you are actually creating the following query:
{
  "bool" : {
    "must" : [
      {
        "nested" : {
          "query" : {
            "bool" : {
              "filter" : [
                {
                  "terms" : {
                    "somepath.key" : [
                      "key1,key2,key3"
                    ],
                    "boost" : 1.0
                  }
                }
              ],
              "adjust_pure_negative" : true,
              "boost" : 1.0
            }
          },
          "path" : "somepath",
          "ignore_unmapped" : false,
          "score_mode" : "sum",
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
As you can see, there are at least two relevant differences between the query you require and the query you are building:
On the top level, the query you want starts with "bool.filter...", but you are building a query with "bool.must...":
QueryBuilders.boolQuery().must(QueryBuilders.nestedQuery(
The innermost terms query is supposed to contain an array of terms (key1, key2, key3). You can't simply pass one string with comma-separated values to achieve that; you have to pass the terms one by one:
termsQuery("somepath.key", "key1", "key2", "key3")

Spring Data MongoDB addToSet() with complex object

I want to get data from MongoDB through a Java application using Spring Data.
I wrote the following MongoDB query and converted it successfully to Java code:
db.getCollection('financialMessage').aggregate([
  { $match: {
      createdDate: {
        $gte: ISODate("2017-11-03 00:00:00.000Z"),
        $lt: ISODate("2017-11-04 00:00:00")
      }
  } },
  { $group: {
      _id: {
        consolidatedBatchId: "$consolidatedBatchId",
        version: "$version"
      },
      messages: { $addToSet: "$message" }
  } },
  { $sort: {
      "_id.consolidatedBatchId": 1,
      "_id.version": 1
  } }
])
The result looks like:
{
  "_id" : {
    "consolidatedBatchId" : "5f4e1d16-2070-48ef-8369-00004ec3e8ee",
    "version" : 4
  },
  "messages" : [
    "message1",
    "message2",
    "message3"
  ]
}
The Java code for the above query looks like:
Criteria filterCriteria = Criteria.where(CREATED_DATE)
        .gte(startDate)
        .lt(endDate);
Sort sort = new Sort(Sort.Direction.DESC, "consolidatedBatchId", "version");
Aggregation agg = Aggregation.newAggregation(
        Aggregation.match(filterCriteria),
        Aggregation.group("consolidatedBatchId", "version")
                .addToSet("message").as("messages"),
        Aggregation.sort(sort)
);
AggregationResults<FinancialMessageKey> aggregationResults =
        mongoTemplate.aggregate(agg, FinancialMessage.class, FinancialMessageKey.class);
return aggregationResults.getMappedResults();
Now I cannot figure out how to convert the following MongoDB query to Java code:
db.getCollection('financialMessage').aggregate([
  { $match: {
      createdDate: {
        $gte: ISODate("2017-11-03 00:00:00.000Z"),
        $lt: ISODate("2017-11-04 00:00:00")
      }
  } },
  { $group: {
      _id: {
        consolidatedBatchId: "$consolidatedBatchId",
        version: "$version"
      },
      messages: { $addToSet: {
        message: "$message",
        createdDate: "$createdDate",
        sender: "$sender",
        receiver: "$receiver"
      } }
  } },
  { $sort: {
      "_id.consolidatedBatchId": 1,
      "_id.version": 1
  } }
])
With the following output:
{
  "_id" : {
    "consolidatedBatchId" : "5f4e1d16-2070-48ef-8369-00004ec3e8ee",
    "version" : 4
  },
  "messages" : [
    {
      "message" : "message1",
      "createdDate" : ISODate("2017-11-03T07:13:08.074Z"),
      "sender" : "sender",
      "receiver" : "receiver"
    },
    {
      "message" : "message2",
      "createdDate" : ISODate("2017-11-03T07:13:08.111Z"),
      "sender" : "sender",
      "receiver" : "receiver"
    },
    {
      "message" : "message3",
      "createdDate" : ISODate("2017-11-03T07:13:07.986Z"),
      "sender" : "sender",
      "receiver" : "receiver"
    }
  ]
}
How can this addToSet() be written in Java so that I get a List of complex objects instead of a simple List?
After a few minutes of research on Google, I finally found out how to do it.
public List<FinancialMessageKey> findFinancialMessageKeys(FinancialMessageQueryDTO financialMessageQueryDTO) {
    Criteria filterCriteria = Criteria.where(CREATED_DATE)
            .gte(startDate)
            .lt(endDate);
    Sort sort = new Sort(Sort.Direction.DESC, "consolidatedBatchId", "version");
    Aggregation agg = Aggregation.newAggregation(
            Aggregation.match(filterCriteria),
            Aggregation.group(CONSOLIDATED_BATCH_ID, VERSION)
                    .addToSet(new BasicDBObject() {
                        {
                            put(CREATED_DATE, "$" + CREATED_DATE);
                            put(SENDER, "$" + SENDER);
                            put(RECEIVER, "$" + RECEIVER);
                            put(MESSAGE, "$" + MESSAGE);
                        }
                    }).as("messages"),
            Aggregation.sort(sort));
    AggregationResults<FinancialMessageKey> aggregationResults =
            mongoTemplate.aggregate(agg, FinancialMessage.class, FinancialMessageKey.class);
    return aggregationResults.getMappedResults();
}
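For reference, the target type passed to mongoTemplate.aggregate needs to match the shape of the aggregation output. A minimal sketch of what such classes could look like is below; the class layout and field names are assumptions derived from the output shown above, not taken from the original project:
import java.util.Date;
import java.util.List;

// Spring Data maps the "_id" sub-document to a property named "id" by default.
public class FinancialMessageKey {

    private GroupKey id;
    private List<MessageEntry> messages;

    public static class GroupKey {
        private String consolidatedBatchId;
        private int version;
    }

    public static class MessageEntry {
        private String message;
        private Date createdDate;
        private String sender;
        private String receiver;
    }

    // getters and setters omitted for brevity
}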

Java API aggregation with Elasticsearch 1.x

I am new to aggregations with Elasticsearch and am stuck with a very simple example taken from this link.
Basically I am trying to do the Java API version of this very simple working aggregation:
http://localhost:9200/cars/transactions/_search?search_type=count
{
  "aggs" : {
    "colors" : {
      "terms" : {
        "field" : "color"
      }
    }
  }
}
and this is the Java version I am trying to build, but it returns empty buckets:
SearchResponse sr = client
.prepareSearch("cars")
.setTypes("transactions")
.setSearchType(SearchType.COUNT)
.addAggregation(AggregationBuilders.terms("colors").field("color"))
.execute()
.actionGet();
While using the Elasticsearch DSL I get a proper response with buckets grouped by color, but with the Java version I get empty buckets.
Update
It turns out the code is correct; the issue I had was related to using it in a test case. When used against a running cluster, it works.
I suspect your issue is not with your Java version of the request, which is just fine. I tried it on some test data I have and got the expected result. The cluster is running Elasticsearch 1.7.5.
The Java code snippet I used:
final SearchResponse sr = client
.prepareSearch("index")
.setTypes("docType")
.setSearchType(SearchType.COUNT)
.addAggregation(AggregationBuilders.terms("aggName").field("fieldName"))
.execute()
.actionGet();
System.out.println(sr);
The result I got:
{
  "took" : 130,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1927227,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "key1",
        "doc_count" : 757843
      }, {
        "key" : "key2",
        "doc_count" : 620033
      }, {
        "key" : "key3",
        "doc_count" : 549351
      } ]
    }
  }
}
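If you want to consume the buckets programmatically rather than printing the whole response, something along these lines should work with the 1.x Java API (using the aggregation name from the snippet above):
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// Fetch the terms aggregation by the name it was registered under.
Terms terms = sr.getAggregations().get("aggName");
for (Terms.Bucket bucket : terms.getBuckets()) {
    System.out.println(bucket.getKey() + " -> " + bucket.getDocCount());
}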
You are using SearchType.COUNT, which is not correct for aggregations. Please remove the search type and construct the query as below:
SearchResponse sr = client
.prepareSearch("cars")
.setTypes("transactions")
.addAggregation(AggregationBuilders.terms("colors").field("color"))
.execute()
.actionGet();

Elasticsearch - How to perform an IN condition in Elasticsearch

This is how my document looks in Elasticsearch:
{
  "entityId": "CAMP_ID",
  "txnId": "TXN_ID",
  "changeSummary": [{
    "field_name": "status_id",
    "old_val": "1",
    "new_val": "2"
  }, {
    "field_name": "budget",
    "old_val": "100",
    "new_val": "250"
  }]
}
I get a list of txnId values and a list of changeSummary.field_name values, and I have to fetch all matching documents. I initially tried this query:
{
  "query" : {
    "bool" : {
      "should" : [ {
        "terms" : {
          "transactionId" : [ "2915315c03b3420280eae04116fb303f" ]
        }
      }, {
        "terms" : {
          "transactionId" : [ "18faf80b3eb44e4b85be993a6d5fd40b" ]
        }
      } ]
    }
  },
  "post_filter" : {
    "bool" : {
      "must" : [ {
        "match" : {
          "changeSummary.fieldName" : "abc"
        }
      }, {
        "match" : {
          "changeSummary.fieldName" : "xyz"
        }
      } ]
    }
  }
}
This works fine, but my problem is that the list of transactionId values can be really huge, from 10 to 1,000,000 (or even more), and if the size of the transactionId list is more than 1024, I start getting:
too_many_clauses: maxClauseCount is set to 1024
Could someone please help with this?
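One observation that may help, depending on the Elasticsearch version: the error comes from the bool query containing one should clause per transaction ID, and bool queries are capped at 1024 clauses by default, whereas a single terms query accepts the whole list of values in one clause. A hedged sketch of building the request that way with the High Level REST Client query builders (the method and variable names here are made up for illustration) would be:
import java.util.List;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Sketch: one terms query over the whole ID list instead of one should clause per ID.
// Whether this avoids too_many_clauses depends on the version; recent versions do not
// expand the values of a terms query into separate boolean clauses.
public SearchSourceBuilder buildChangeSearch(List<String> transactionIds, List<String> fieldNames) {
    BoolQueryBuilder query = QueryBuilders.boolQuery()
            .filter(QueryBuilders.termsQuery("transactionId", transactionIds));

    BoolQueryBuilder postFilter = QueryBuilders.boolQuery();
    for (String fieldName : fieldNames) {
        postFilter.must(QueryBuilders.matchQuery("changeSummary.fieldName", fieldName));
    }

    return new SearchSourceBuilder()
            .query(query)
            .postFilter(postFilter);
}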
