How do I write this elasticsearch query using java APIs - java

The following query works correctly and returns the results that I need. I am struggling to write it using the Java APIs, though.
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "somepath",
            "query": {
              "bool": {
                "filter": [
                  {
                    "terms": {
                      "somepath.key": ["key1", "key2", "key3"]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
I am using this in Java, with commaSeparatedKeyString = "key1, key2, key3". What am I missing?
QueryBuilders.boolQuery().must(QueryBuilders.nestedQuery(
"somepath",
QueryBuilders.boolQuery().filter(QueryBuilders.termsQuery("somepath.key", commaSeparatedKeyString)),
ScoreMode.Total));

For debugging purposes, it can be helpful to check the JSON serialization of the query you are building. Fortunately, the toString() methods in the query builders do that for you, so you can simply use System.out.println to print the query builder to stdout (or log it with a logging framework). Assuming that the variable commaSeparatedKeyString is set to "key1,key2,key3" (it sounds like it is, but you don't tell us), you are actually creating the following query:
{
  "bool" : {
    "must" : [
      {
        "nested" : {
          "query" : {
            "bool" : {
              "filter" : [
                {
                  "terms" : {
                    "somepath.key" : [
                      "key1,key2,key3"
                    ],
                    "boost" : 1.0
                  }
                }
              ],
              "adjust_pure_negative" : true,
              "boost" : 1.0
            }
          },
          "path" : "somepath",
          "ignore_unmapped" : false,
          "score_mode" : "sum",
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
As you can see, there are at least two relevant differences between the query you require and the query you are building:
At the top level, the query you want starts with "bool.filter...", but you are building a query with "bool.must...":
QueryBuilders.boolQuery().must(QueryBuilders.nestedQuery(
The innermost terms query is supposed to contain an array of terms (key1, key2, key3). You can't simply pass one string of comma-separated values to achieve that; you have to pass the terms one by one:
termsQuery("somepath.key", "key1", "key2", "key3"))

Related

ES - Convert Legacy ElasticSearch from Rest to High level Client

I'm wondering what the equivalent of the following query is for Elasticsearch 5 - 7 (the exact version doesn't matter to me).
I know this query has been deprecated, but I'm trying to make a legacy 1.7.5 cluster work with the High Level REST Client.
I did some tests, and although the documentation says this isn't supported, most of the simple actions work. What is left is to convert some queries like the one in the example below:
{
  "size" : 3000,
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [ {
            "terms" : {
              "source" : [ "o365mail" ]
            }
          }, {
            "range" : {
              "bckdate" : {
                "from" : "1549360021398l",
                "to" : null,
                "include_lower" : true,
                "include_upper" : true
              }
            }
          } ]
        }
      }
    }
  },
  "fields" : "*"
}
What I've tried so far with 7.9.3:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.9/java-rest-high.html
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
boolQueryBuilder
.must(QueryBuilders.termsQuery(IndexFields.SOURCE.getIndexName(),Arrays.asList(Source.O365MAIL.toString().toLowerCase())))
.must(QueryBuilders.rangeQuery("bckdate").gte(1549360021398l).lte(null));
sourceBuilder.query(boolQueryBuilder);
SearchRequest sr = new SearchRequest();
sr.source(sourceBuilder);
SearchResponse searchResponse2 = client.search(sr, RequestOptions.DEFAULT);
The query from debugging is:
{
  "bool" : {
    "must" : [
      {
        "terms" : {
          "source" : [
            "o365mail"
          ],
          "boost" : 1.0
        }
      },
      {
        "range" : {
          "bckdate" : {
            "from" : 1549360021398,
            "to" : null,
            "include_lower" : true,
            "include_upper" : true,
            "boost" : 1.0
          }
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
I'm wondering whether this is equivalent to the filters in the legacy query, because the returned data is pretty much the same.
I need to make sure I don't break the logic of all the filters from the legacy query...
Thanks for the help.
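For what it's worth, here is a minimal sketch of how the legacy filtered query might be expressed with the current QueryBuilders API, keeping the clauses in a non-scoring filter context instead of must (field names and values are taken from the query above; verify the behaviour against your own data):
// Sketch only: the legacy "filtered" query has no direct equivalent, but a bool
// query with filter clauses reproduces its non-scoring filtering behaviour.
BoolQueryBuilder filtered = QueryBuilders.boolQuery()
    .filter(QueryBuilders.termsQuery("source", "o365mail"))
    .filter(QueryBuilders.rangeQuery("bckdate").gte(1549360021398L)); // no upper bound

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
    .size(3000)
    .query(filtered);

SearchRequest searchRequest = new SearchRequest().source(sourceBuilder);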

Elasticsearch query from Java REST High level client returns different/undesirable results compared to execution of Query DSL on Kibana

I'm implementing an Elasticsearch search feature using the Java High Level REST Client, which queries an ES index residing on a cluster hosted in the cloud. My intended query JSON DSL looks like this:
{
  "query" : {
    "bool" : {
      "should" : [
        {
          "query_string" : {
            "query" : "cla-180",
            "default_field" : "product_title",
            "boost" : 3
          }
        },
        {
          "match" : {
            "product_title" : {
              "query" : "cla-180",
              "fuzziness" : "AUTO"
            }
          }
        }
      ]
    }
  }
}
Corresponding to this, I have written the following code to be executed with the Java High Level REST Client, which should perform the same function as the DSL above.
BoolQueryBuilder boolQueryBuilder = buildBoolQuery();
boolQueryBuilder.should(QueryBuilders.queryStringQuery("cla-180").defaultField("product_title")).boost(3);
boolQueryBuilder.should(QueryBuilders.matchQuery("product_title", "cla-180").fuzziness(Fuzziness.AUTO));
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(boolQueryBuilder);
What I'm noticing is that the search results from the Java code are different from the results when I execute the DSL directly in Kibana. When executed from Java, I get records that have no relation to the search terms given above. This seems odd, because as far as I can tell the Java code matches the JSON query DSL given above.
When I print the generated JSON from the Java side, the output looks like this:
{
  "query": {
    "bool" : {
      "should" : [
        {
          "query_string" : {
            "query" : "cla-180",
            "default_field" : "product_title",
            "fields" : [ ],
            "type" : "best_fields",
            "default_operator" : "or",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "escape" : false,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        },
        {
          "match" : {
            "product_title" : {
              "query" : "cla-180",
              "operator" : "OR",
              "fuzziness" : "AUTO",
              "prefix_length" : 0,
              "max_expansions" : 50,
              "fuzzy_transpositions" : true,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "auto_generate_synonyms_phrase_query" : true,
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "minimum_should_match" : "1",
      "boost" : 3.0
    }
  }
}
Am I missing something in my Java code that causes the search results to be returned in this undesirable fashion? Or what else could be the reason for the mismatch between the records returned by these two approaches?
Thanks in advance!
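One difference that stands out in the generated JSON above is that the boost of 3.0 ends up on the outer bool query rather than on the query_string clause, because .boost(3) is chained onto the BoolQueryBuilder returned by should(). A minimal sketch that attaches the boost to the query_string clause itself, using the same QueryBuilders calls as in the question (whether this explains the whole mismatch depends on your data):
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// Boost the query_string clause itself so the serialized JSON carries "boost" : 3.0 there.
boolQueryBuilder.should(QueryBuilders.queryStringQuery("cla-180")
    .defaultField("product_title")
    .boost(3));
boolQueryBuilder.should(QueryBuilders.matchQuery("product_title", "cla-180")
    .fuzziness(Fuzziness.AUTO));

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().query(boolQueryBuilder);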

Elasticsearch matching string with like operator

I would like to query Elasticsearch to retrieve all documents whose field value is like a given string.
For example, field LIKE "abc" should return
"abc"
"abcdef"
"abcd"
"abc1"
So, all documents whose field contains the string "abc".
I tried this query, but it returns only the documents where field = "abc":
{"query":{"more_like_this":{"fields":["FIELD"],"like_text":"abc","min_term_freq" : 1,"max_query_terms" : 12}}}
What is the correct query?
Thanks
If you're trying to do a Prefix Query, then you can use this.
{ "query": {
"prefix" : { "field" : "abc" }
}
See ElasticSearch Prefix Query.
Although your question is incomplete, I will try to give you several ideas.
One way is surely a prefix query, but a much more efficient approach is to build an edge n-gram analyzer. That way your data is prepared at index time and queries will be much faster. The edge n-gram is also the most flexible way to implement your functionality, because you can autocomplete words that appear in any order. If you don't need that, but only need "search as you type" queries, then the best option is the completion suggester. If you need to find strings that appear in the middle of words, look at the n-gram analyzer.
Here is how I set up an edge n-gram analyzer in my code:
"settings": {
"analysis": {
"filter" : {
"edge_filter" : {
"type" : "edge_ngram",
"min_gram": 1,
"max_gram": 256
}
},
"analyzer": {
"edge_analyzer" : {
"type" : "custom",
"tokenizer": "whitespace",
"filter" : ["lowercase", "edge_filter"]
},
"lowercase_whitespace": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [ "lowercase" ]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "keyword",
"fields": {
"suggest": {
"type": "text",
"analyzer" : "edge_analyzer",
"search_analyzer": "lowercase_whitespace"
}
}
}
}
}
}
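With a mapping like this in place, searches would typically target the name.suggest subfield. A hypothetical example with the Java High Level REST Client used elsewhere in this thread (the index name my_index and the client variable are assumptions):
// Query the edge-ngram analyzed subfield defined in the mapping above.
SearchSourceBuilder source = new SearchSourceBuilder()
    .query(QueryBuilders.matchQuery("name.suggest", "abc"));
SearchRequest request = new SearchRequest("my_index").source(source);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);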
You should be able to perform a wildcard query as described here.
Elasticsearch like query
{
"query": {
"wildcard": {
"<<FIELD NAME>>": "*<<QUERY TEXT>>*"
}
}
}
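For reference, the same wildcard query can also be built with the Java API discussed elsewhere in this thread, for example:
// Wildcard query equivalent; note that leading wildcards can be expensive on large indices.
QueryBuilder wildcard = QueryBuilders.wildcardQuery("FIELD", "*abc*");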

Function score query with field_value_factor on not (yet) existing field

I've been messing around with this problem for quite some time now and can't manage to fix it.
Take the following case:
I have 2 employees in my company, each of whom has their own blog page:
POST blog/page/1
{
"author": "Byron",
"author-title": "Junior Software Developer",
"content" : "My amazing bio"
}
and
POST blog/page/2
{
"author": "Jason",
"author-title": "Senior Software Developer",
"content" : "My amazing bio is better"
}
After they created their blog posts, we would like to keep track of the 'views' of their blogs and boost search results based on their 'views'.
This can be done by using the function score query:
GET blog/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "author-title": "developer"
        }
      },
      "functions": [
        {
          "filter": {
            "range": {
              "views": {
                "from": 1
              }
            }
          },
          "field_value_factor": {
            "field": "views"
          }
        }
      ]
    }
  }
}
I use the range filter to make sure the field_value_factor doesn't affect the score when the number of views is 0 (the score would then also be 0).
Now when I try to run this query, I get the following exception:
nested: ElasticsearchException[Unable to find a field mapper for field [views]]; }]
Which makes sense, because the field doesn't exist anywhere in the index.
If I were to add views = 0 at index time, I wouldn't have the above issue, since the field would then be known within the index. But in my use case I'm unable to add this either at index time or to a mapping.
Based on the ability to use a range filter within the function score query, I thought I would be able to use an exists filter to make sure that the field_value_factor part is only executed when the field is actually present in the index, but no such luck:
GET blog/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "author-title": "developer"
        }
      },
      "functions": [
        {
          "filter": {
            "bool": {
              "must": [
                {
                  "exists": {
                    "field": "views"
                  }
                },
                {
                  "range": {
                    "views": {
                      "from": 1
                    }
                  }
                }
              ]
            }
          },
          "field_value_factor": {
            "field": "views"
          }
        }
      ]
    }
  }
}
Still gives:
nested: ElasticsearchException[Unable to find a field mapper for field [views]]; }]
I'd expect Elasticsearch to apply the filter first, before parsing the field_value_factor.
Any thoughts on how to fix this issue without using mapping files, fixing it at index time, or scripts?
The error you're seeing occurs at query parsing time, i.e. nothing has been executed yet. At that time, the FieldValueFactorFunctionParser builds the field_value_factor function to be executed later, but it notices that the views field doesn't exist in the mapping type.
Note that the filter has not been executed yet either; just like the field_value_factor function, it has only been parsed by FunctionScoreQueryParser.
I'm wondering why you can't simply add a field to your mapping type; it's as easy as running this:
curl -XPUT 'http://localhost:9200/blog/_mapping/page' -d '{
"page" : {
"properties" : {
"views" : {"type" : "integer"}
}
}
}'
If this is REALLY not an option, another possibility would be to use script_score instead, like this:
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "author-title": "developer"
        }
      },
      "functions": [
        {
          "filter": {
            "range": {
              "views": {
                "from": 1
              }
            }
          },
          "script_score": {
            "script": "_score * doc.views.value"
          }
        }
      ]
    }
  }
}
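If you end up building this from Java with the newer QueryBuilders API discussed at the top of this thread (the original question predates it), a rough sketch of the same function_score query might look like this; the script string is copied from the JSON above, so adapt it to your scripting language:
// function_score query with a range filter and a script_score function.
QueryBuilder functionScore = QueryBuilders.functionScoreQuery(
    QueryBuilders.matchQuery("author-title", "developer"),
    new FunctionScoreQueryBuilder.FilterFunctionBuilder[] {
        new FunctionScoreQueryBuilder.FilterFunctionBuilder(
            QueryBuilders.rangeQuery("views").gte(1),
            ScoreFunctionBuilders.scriptFunction("_score * doc.views.value"))
    });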

Why does the below query in Elasticsearch not work?

I have test data shown below.
{
"SequenceLocation":{
"Assembly":"GPR7",
"Chr": "10",
"start": 1111
}
}
Whenever I fire a query like the one below, it returns the proper values.
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "term" : {
            "SequenceLocation.Chr": "10"
          }
        }
      ]
    }
  }
}
But when I change the query to
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "term" : {
            "SequenceLocation.Assembly": "GPR7"
          }
        }
      ]
    }
  }
}
It does not return any hits from Elasticsearch. Could you please explain what I am doing wrong?
I think you have the wrong mapping for SequenceLocation.Assembly. The default analyzer splits GPR7.p10 into two tokens, gpr7 and p10.
According to the documentation, a term query doesn't analyze your query, so you are asking Elasticsearch for GPR7.p10, but it is indexed as the tokens gpr7 and p10, so it can't match.
You should recreate the index with the mapping set to "index" : "not_analyzed" for the SequenceLocation.Assembly field.
