How to generate the same Elasticsearch query using the Java RestHighLevelClient?

I am trying to generate a query using the Java RestHighLevelClient of Elasticsearch similar to this:
GET /field_search/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"ENTRY_ID": "ttttt"
}
},
{
"match": {
"MODULE_ID": "xxxxx"
}
},
{
"match": {
"COMPANY_ID": "22244"
}
},
{
"match": {
"DELETED": false
}
}
]
}
}
}
This is the code I am using to generate it:
BoolQueryBuilder boolQueryBuilder1 = new BoolQueryBuilder();
boolQueryBuilder1.must().add(QueryBuilders.matchQuery("MODULE_ID", moduleId));
boolQueryBuilder1.must().add(QueryBuilders.matchQuery("COMPANY_ID", companyId));
........
I have omitted part of it for brevity. I use a BoolQueryBuilder, and the query it generates looks like this:
{
"query": {
"bool" : {
"must" : [
{
"match" : {
"MODULE_ID" : {
"query" : "xxxxx",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : false,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : false,
"boost" : 1.0
}
}
},
{
"match" : {
"COMPANY_ID" : {
"query" : "22244",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : false,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : false,
"boost" : 1.0
}
}
},
{
"match" : {
"DELETED" : {
"query" : false,
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : false,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : false,
"boost" : 1.0
}
}
},
{
"match" : {
"ENTRY_ID" : {
"query" : "ttttt",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : false,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : false,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
The builder adds extra parameters to the query. The hand-written query above returns the correct results, but the Java-generated query returns none. How can I build the same query with the Java client?

"adjust_pure_negative" : true
This is your problem, set it to false or delete it.
Read here why this happens.

Related

Opensearch - get inner aggregations from aggregations using opensearch-java client

This OpenSearch query was constructed using opensearch-java:
GET eventsearch/_search
{
"aggregations": {
"WEB": {
"aggregations": {
"eventDate": {
"date_histogram": {
"extended_bounds": {
"max": "2022-12-01T00:00:00Z",
"min": "2022-01-01T00:00:00Z"
},
"field": "eventDate",
"fixed_interval": "1d",
"min_doc_count": 0
}
}
},
"filter": {
"term": {
"channel": {
"value": "WEB",
"case_insensitive": true
}
}
}
}
},
"query": {
"bool": {
"filter": [
{
"range": {
"eventDate": {
"from": "2022-01-01T00:00:00Z",
"to": "2022-12-01T00:00:00Z"
}
}
}
],
"must": [
{
"match_all": {}
}
]
}
},
"size": 0
}
Running the query returns this response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 26,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"WEB" : {
"doc_count" : 25,
"eventDate" : {
"buckets" : [
{
"key_as_string" : "2022-01-01T00:00:00.000Z",
"key" : 1640995200000,
"doc_count" : 0
},
{
"key_as_string" : "2022-01-02T00:00:00.000Z",
"key" : 1641081600000,
"doc_count" : 0
},
{
"key_as_string" : "2022-01-03T00:00:00.000Z",
"key" : 1641168000000,
"doc_count" : 0
},
{
"key_as_string" : "2022-01-04T00:00:00.000Z",
"key" : 1641254400000,
"doc_count" : 0
},
....................
]
}
}
}
}
In Java I need to run this query and read the results from it.
But after calling opensearchclient.search and then the "aggregations" method on the response, I receive this (image attached).
If I get "WEB" from the map, there is no inner "eventDate" aggregation to fetch.
Is there a way to fetch this inner aggregation using the opensearch-java client? I had no luck with the documentation.
opensearch-java 2.1.0
There is currently no such feature. There is an open bug for it; the fix has been merged but not yet released.
https://github.com/opensearch-project/opensearch-java/issues/197

must_not is not giving expected result in Elasticsearch for empty field

These are my sample ES index documents:
"hits" : [
{
"_index" : "project_note",
"_type" : "project_note",
"_id" : "19",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-10-04T13:43:55.330",
"createTimeInMs" : 1633333435330,
"createdBy" : "test",
"editTime" : "2021-10-04T13:43:55.330",
"editTimeInMs" : 1633333435330,
"editedBy" : "test",
"versionId" : 1,
"id" : "19",
"organizationId" : "28",
"accessLevel" : "PUBLIC",
"status" : "ACTIVE",
"projectId" : "95",
"userId" : 129,
"noteType" : "SYSTEM_GENERATED",
"projectDemographicLogId" : "1"
}
},
{
"_index" : "project_note",
"_type" : "project_note",
"_id" : "19",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-10-04T13:43:55.330",
"createTimeInMs" : 1633333435330,
"createdBy" : "test",
"editTime" : "2021-10-04T13:43:55.330",
"editTimeInMs" : 1633333435330,
"editedBy" : "test",
"versionId" : 1,
"id" : "19",
"organizationId" : "28",
"accessLevel" : "PUBLIC",
"status" : "ACTIVE",
"projectId" : "95",
"userId" : 129
}
}
]
The first doc has noteType, but the second does not have that field stored in the DB.
I want to exclude the documents where noteType==null or noteType is absent.
But I am getting only the docs which have noteType="SYSTEM_GENERATED".
My approach:
{
"query":
{
"bool" : {
"must" : [
{
"term" : {
"projectId" : {
"value" : "95",
"boost" : 1.0
}
}
},
{
"range" : {
"createTimeInMs" : {
"from" : null,
"to" : 1633594455000,
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}
}
}
],
"must_not" : [
{
"term" : {
"noteType" : {
"value" : "SYSTEM_GENERATED",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
Equivalent Java code:
BoolQueryBuilder queryBuilder= QueryBuilders.boolQuery();
queryBuilder.must(QueryBuilders.termQuery("projectId", requestInfo.getProjectId()));
queryBuilder.must(rangeQuery("createTimeInMs").lte(requestInfo.getCreateTimeInMs()));
if(!requestInfo.isIncludeLog()) {
queryBuilder.mustNot(QueryBuilders.termQuery("noteType", Defs.SYSTEM_NOTE_TYPE));
}
If only the must_not part of the query is used (leaving out the must part):
{
"query": {
"bool": {
"must_not": [
{
"term": {
"noteType.keyword": {
"value": "SYSTEM_GENERATED",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}
the search result is similar to what you expect to get:
"hits": [
{
"_index": "69477995",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"createTime": "2021-09-26T15:54:08.373",
"createTimeInMs": 1632650048373,
"createdBy": "test",
"editTime": "2021-09-26T15:54:08.373",
"editTimeInMs": 1632650048373,
"editedBy": "test",
"versionId": 1,
"id": "18",
"note": "note-1, simple note ",
"organizationId": "28",
"accessLevel": "PUBLIC",
"status": "ACTIVE",
"taskId": "5",
"userId": 129
}
}
]

Elasticsearch: search all the documents, in order of relevance score

I have a complex match query over a list of news headline documents:
{
"bool" : {
"should" : [
{
"multi_match" : {
"query" : " Reliance gets shareholders, creditors nod for hiving off O2C business into separate unit - The HinduBillionaire Mukesh Ambani's Reliance Industries Ltd on Friday said it has secured approval of its shareholders and creditors for hiving off its oil-to-chemical (O2C) business into a separate unit.",
"fields" : [
"article.description^1.0",
"article.title^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 3.0
}
},
{
"match" : {
"article.author" : {
"query" : " PTI",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"article.source.name" : {
"query" : " The Hindu",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
The problem is that Elasticsearch returns only the most relevant documents. I want as many documents as possible, in decreasing order of relevance score. It is fine to return all the documents, but they should be in that order. I could not find a better way: Elasticsearch returns only 5-10 documents from the news repository, while I have thousands of articles.
By default Elasticsearch returns only 10 documents. If you want more than 10, you need to set the size parameter.
The modified query will be:
{
"size": 1000, // note this
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": " Reliance gets shareholders, creditors nod for hiving off O2C business into separate unit - The HinduBillionaire Mukesh Ambani's Reliance Industries Ltd on Friday said it has secured approval of its shareholders and creditors for hiving off its oil-to-chemical (O2C) business into a separate unit.",
"fields": [
"article.description^1.0",
"article.title^1.0"
],
"type": "best_fields",
"operator": "OR",
"slop": 0,
"prefix_length": 0,
"max_expansions": 50,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 3.0
}
},
{
"match": {
"article.author": {
"query": " PTI",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
},
{
"match": {
"article.source.name": {
"query": " The Hindu",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}
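As a rough illustration of the size advice above, here is a dependency-free Java sketch that wraps an already-rendered query in a search body with an explicit size. The class name SizedSearchBody and the helper withSize are hypothetical and not part of any Elasticsearch client. The guard assumes the default index.max_result_window of 10,000, which a given index may override. With the high-level REST client, the equivalent step would be SearchSourceBuilder.size(int).

```java
final class SizedSearchBody {
    // Assumption: the index keeps the default index.max_result_window (10,000).
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Wraps a rendered query (JSON text) in a search body that requests `size` hits.
    static String withSize(String queryJson, int size) {
        if (size < 0 || size > DEFAULT_MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException(
                    "size must be between 0 and " + DEFAULT_MAX_RESULT_WINDOW + ": " + size);
        }
        return "{\"size\":" + size + ",\"query\":" + queryJson + "}";
    }

    public static void main(String[] args) {
        // Hits still come back sorted by descending _score, the default order.
        System.out.println(withSize("{\"match_all\":{}}", 1000));
    }
}
```

To go beyond the result window, paging approaches such as search_after or the scroll API are the usual route instead of a larger size.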

QueryString to search String with colon

I am trying to achieve the condition below:
orgId = "z2store" and type = "web" and dateTime = "12:17:08"
This is the query I have written:
GET /sample/_search
{
"bool" : {
"must" : [
{
"term" : {
"orgId" : {
"value" : "z2store",
"boost" : 1.0
}
}
},
{
"term" : {
"type" : {
"value" : "web",
"boost" : 1.0
}
}
},
{
"query_string" : {
"query" : "12:17:08",
"default_field" : "dateTime",
"fields" : [ ],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
Below is my Java code:
BoolQueryBuilder boolQuery = new BoolQueryBuilder().must(QueryBuilders.termQuery("orgId", orgId))
.must(QueryBuilders.termQuery("type", "web"));
QueryStringQueryBuilder builder = new QueryStringQueryBuilder("12:17:08");
builder.defaultField("dateTime").queryString();
boolQuery.must(builder);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(builder)
.from((batchNumber - 1) * batchSize).size(batchSize)
.sort("#timestamp", SortOrder.DESC);
The above query is not working. Any help will be appreciated. I am using Elasticsearch 7.4.
You can create your dateTime field with type date and the format hour_minute_second (which expects values as HH:mm:ss). You can read more about the different date formats here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html.
Below is the mapping of the dateTime field:
{
"mappings": {
"properties": {
"dateTime": {
"type" : "date",
"format" : "hour_minute_second"
}
}
}
}
Now, when you search the data with the query below:
{
"query" : {
"bool" : {
"must" : [
{
"term" : {
"orgId" : {
"value" : "z2store",
"boost" : 1.0
}
}
},
{
"term" : {
"type" : {
"value" : "web",
"boost" : 1.0
}
}
},
{
"term" :{
"dateTime":"12:17:08"
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
you will get the required result:
"hits": [
{
"_index": "datetimeindexf",
"_type": "_doc",
"_id": "1",
"_score": 1.5753641,
"_source": {
"dateTime": "12:17:08",
"orgId": "z2store",
"type": "web"
}
}
]
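Since the mapping above only accepts values shaped as HH:mm:ss, it can help to check values on the client side before indexing or querying them. The following is a minimal sketch using only java.time. The class name HourMinuteSecondCheck and the helper matchesFormat are hypothetical, and the pattern is taken from the answer's description of hour_minute_second, not from Elasticsearch's own parser.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class HourMinuteSecondCheck {
    // Pattern the answer gives for the hour_minute_second format.
    private static final DateTimeFormatter HH_MM_SS = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Returns true when the value parses as a time of day in HH:mm:ss form.
    static boolean matchesFormat(String value) {
        try {
            LocalTime.parse(value, HH_MM_SS);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesFormat("12:17:08")); // prints "true"
        System.out.println(matchesFormat("12:17"));    // prints "false"
    }
}
```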

Elasticsearch Java High Level Rest Client constructing a boolean query with multiple match values and OR condition

I am trying to construct a query via the Java high level REST client that takes a list of ids and returns all documents matching any of those ids, akin to a WHERE clause with an OR operator.
For this reason I have been going with a bool query, iterating the list and adding a must match for each value with the operator set to OR:
BoolQueryBuilder bool = QueryBuilders.boolQuery();
ids.forEach(i -> {
bool.must(QueryBuilders.matchQuery("_id", i).operator(Operator.OR));
});
return bool;
// this constructs the DSL this way:
{
"bool" : {
"must" : [
{
"match" : {
"_id" : {
"query" : "0025370c-baea-4dcc-af48-56c4bdb86854",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"_id" : {
"query" : "013fedef-6b04-4520-8458-fca8b0366833",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"_id" : {
"query" : "01c44ce4-0e87-4dc9-8a29-1f24679d335f",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
This is constructed fine, but it doesn't work, I think because the OR is nested too low and doesn't get applied across the multiple matches. So I assumed there needs to be a nested bool and tried this:
BoolQueryBuilder bool = QueryBuilders.boolQuery();
BoolQueryBuilder subBool = QueryBuilders.boolQuery();
ids.forEach(i -> {
subBool.must(QueryBuilders.matchQuery("_id", i).operator(Operator.OR));
});
bool.must(subBool);
return bool;
// it would make more sense to me to place the operator condition on bool instead of subBool, but that is not available, and I am sure I am going about this wrong
{
"bool" : {
"must" : [
{
"bool" : {
"must" : [
{
"match" : {
"_id" : {
"query" : "0025370c-baea-4dcc-af48-56c4bdb86854",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"_id" : {
"query" : "013fedef-6b04-4520-8458-fca8b0366833",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"_id" : {
"query" : "01c44ce4-0e87-4dc9-8a29-1f24679d335f",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
This seems to work if I reduce it to a single value in the nested match (again, 1 id instead of the lot), so I still think I am implementing the OR condition wrong.
Using filter clauses within the bool query instead of must matches yields the same result. I appreciate the help.
The OR operator in a match query means that only one term of that sub-query's query string has to match the document for the sub-query to match, so that is not what you are aiming for. To combine the sub-queries with OR, use should instead of must in your root bool query. must is the Elasticsearch equivalent of the AND operator, while should means OR.
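To make the should-versus-must point concrete, here is a dependency-free Java sketch that renders the bool query body with one should clause per id. The class name ShouldQuerySketch and the helper shouldMatchIds are hypothetical; the explicit minimum_should_match of 1 is an added assumption that spells out "at least one clause must match". With the high-level REST client, the corresponding change would be calling bool.should(...) in the loop in place of bool.must(...).

```java
import java.util.List;
import java.util.stream.Collectors;

final class ShouldQuerySketch {
    // Renders a bool query in which each id becomes one "should" (OR) clause.
    // Note: ids are inserted without JSON escaping, which is fine for UUID-like ids only.
    static String shouldMatchIds(List<String> ids) {
        String clauses = ids.stream()
                .map(id -> "{\"match\":{\"_id\":\"" + id + "\"}}")
                .collect(Collectors.joining(","));
        return "{\"bool\":{\"should\":[" + clauses + "],\"minimum_should_match\":1}}";
    }

    public static void main(String[] args) {
        System.out.println(shouldMatchIds(List.of(
                "0025370c-baea-4dcc-af48-56c4bdb86854",
                "013fedef-6b04-4520-8458-fca8b0366833")));
    }
}
```

A document matches this query as soon as any one of the should clauses matches, which is the WHERE ... OR ... behaviour the question asks for.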
