Elasticsearch exact match query issue - Java

I am having a problem querying Elasticsearch. Below is my query:
GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "SomeName"
          }
        },
        {
          "match": {
            "type": "SomeType"
          }
        },
        {
          "match": {
            "productId": "ff134be8-10fc-4461-b620-79s51199c7qb"
          }
        },
        {
          "range": {
            "request_date": {
              "from": "2018-08-22T12:16:37,392",
              "to": "2018-08-28T12:17:41,137",
              "format": "YYYY-MM-dd'T'HH:mm:ss,SSS"
            }
          }
        }
      ]
    }
  }
}
I am using three match queries and a range query in the bool query. My intention is to get documents with these exact values and within this date range. If I change the name or type value, I don't get any results. But for productId, if I put just ff134be8, I still get results. Does anyone know why? The exact match works on name and type but not on productId.

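You need to map productId as keyword to avoid tokenization. With the standard tokenizer, "ff134be8-10fc-4461-b620-79s51199c7qb" is split into the tokens ["ff134be8", "10fc", "4461", "b620", "79s51199c7qb"]. A match query analyzes your search text the same way and, by default, combines the resulting tokens with OR, so searching for just ff134be8 still matches the document.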
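To see why a partial id still matches, here is a minimal, stdlib-only Java sketch. It is an approximation, not Elasticsearch's analyzer: it assumes that splitting on non-alphanumeric characters and lowercasing behaves like the standard tokenizer for this UUID (the real tokenizer implements Unicode UAX #29 word boundaries), and it models the match query's default OR operator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough approximation of the standard analyzer plus match-query OR semantics.
// Assumption: split on runs of non-alphanumeric chars + lowercase is close enough here.
class MatchVsTokensSketch {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A match query (default operator OR) hits if ANY analyzed query token
    // is among the tokens indexed for the field.
    static boolean matchQueryHits(String indexedValue, String queryText) {
        Set<String> indexed = new LinkedHashSet<>(analyze(indexedValue));
        for (String token : analyze(queryText)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String stored = "ff134be8-10fc-4461-b620-79s51199c7qb";
        System.out.println(analyze(stored));                    // [ff134be8, 10fc, 4461, b620, 79s51199c7qb]
        System.out.println(matchQueryHits(stored, "ff134be8")); // true
        System.out.println(matchQueryHits(stored, "ff134be8-0000-0000")); // true: shares a token
    }
}
```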
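You have a few options:
1/ Use a term query, which does not analyze the search value. Note that it only finds the full UUID once productId is mapped as keyword; on a text field the index holds only the individual tokens listed above.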
...
{
  "term": {
    "productId": "ff134be8-10fc-4461-b620-79s51199c7qb"
  }
},
...
2/ If you are on Elasticsearch 6.x, you can change your request to query the keyword subfield:
...
{
  "match": {
    "productId.keyword": "ff134be8-10fc-4461-b620-79s51199c7qb"
  }
},
...
This works because Elasticsearch's dynamic mapping creates a subfield named keyword, of type keyword, for every string field.
The first option is, of course, the better one. Always use a term query (against a keyword field) if you are trying to match the exact content.
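The following toy model (hypothetical class, same rough analyzer assumption as before) indexes the id into a text field and a keyword subfield, as 6.x dynamic mapping does, and runs unanalyzed term lookups against each. It illustrates why a term query only finds the full id on the keyword field.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy per-document "inverted index" showing why a term query needs a keyword field.
// Assumptions: the text field uses a split-on-punctuation + lowercase approximation of the
// standard analyzer; the ".keyword" subfield stores the raw value as a single term.
class TermVsKeywordSketch {

    private final Map<String, Set<String>> termsByField = new HashMap<>();

    // Index a string like 6.x dynamic mapping would: text field + keyword subfield.
    void indexString(String field, String value) {
        Set<String> textTerms = new HashSet<>();
        for (String part : value.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                textTerms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        termsByField.put(field, textTerms);
        termsByField.put(field + ".keyword", Set.of(value)); // whole value, untouched
    }

    // A term query is not analyzed: the value must equal one indexed term exactly.
    boolean termQueryHits(String field, String value) {
        return termsByField.getOrDefault(field, Set.of()).contains(value);
    }

    public static void main(String[] args) {
        TermVsKeywordSketch doc = new TermVsKeywordSketch();
        String id = "ff134be8-10fc-4461-b620-79s51199c7qb";
        doc.indexString("productId", id);
        System.out.println(doc.termQueryHits("productId", id));                 // false
        System.out.println(doc.termQueryHits("productId.keyword", id));         // true
        System.out.println(doc.termQueryHits("productId.keyword", "ff134be8")); // false
    }
}
```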

Related

Querying on a new field in OpenSearch isn't retrieving results

I am using OpenSearch 2.4. I created an index with some fields, and later I started saving a new field to it. Now, when I query on the newly created field, I don't get any results.
Example, query 1:
POST abc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "name": [
              "john"
            ]
          }
        }
      ]
    }
  }
}
The above works fine because the name field has existed since the index was created.
Query 2:
POST abc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "lastname": [
              "William"
            ]
          }
        }
      ]
    }
  }
}
The above query doesn't work, even though I have some documents with lastname William.
When you index a new field without declaring it in the mapping first, OpenSearch/Elasticsearch dynamic mapping creates it as a text field with a keyword subfield (lastname and lastname.keyword).
There are two ways to get results with a terms query. First, remember that term-level queries only match exact terms.
The first option is to use the keyword field, which stores the value unchanged:
{
  "terms": {
    "lastname.keyword": [
      "William"
    ]
  }
}
The second option is to search the text field, but remember that at index time the default (standard) analyzer is applied, and its lowercase filter stores the token as william.
In this case, the query should be:
{
  "terms": {
    "lastname": [
      "william"
    ]
  }
}
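A small Java sketch of the casing issue. It assumes the text field's analyzer can be reduced to whitespace splitting plus lowercasing (enough for a single name like William) and models the terms query as "any listed value equals an indexed term". The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Two fields from dynamic mapping: "lastname" (text, lowercased tokens)
// and "lastname.keyword" (original value as one term).
class TermsCasingSketch {

    // Assumed simplification of the standard analyzer: whitespace split + lowercase.
    static Set<String> textTerms(String value) {
        Set<String> terms = new HashSet<>();
        for (String token : value.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A terms query matches when any listed value equals an indexed term exactly.
    static boolean termsQueryHits(Set<String> indexedTerms, List<String> values) {
        for (String v : values) {
            if (indexedTerms.contains(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String lastname = "William";
        Set<String> text = textTerms(lastname); // [william]
        Set<String> keyword = Set.of(lastname); // [William]
        System.out.println(termsQueryHits(text, List.of("William")));    // false
        System.out.println(termsQueryHits(text, List.of("william")));    // true
        System.out.println(termsQueryHits(keyword, List.of("William"))); // true
    }
}
```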
When you use "terms", there must be an exact match (including casing).
So make sure your document contains William and not william or Williams.
If you want more tolerance, you can explore the match query:
https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#match

Elasticsearch - fetch matching words case-insensitively

I am fetching documents from Elasticsearch indexes, using a whitespace tokenizer with a stemmer.
Here is my mapping:
PUT stemmer_lower_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "value_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": ["lowercase", "asciifolding", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "minimal_english"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "product_attr_value": {
          "type": "text",
          "analyzer": "value_analyzer"
        },
        "product_id": {
          "type": "long"
        },
        "product_name": {
          "type": "text"
        }
      }
    }
  }
}
Here is the fuzzy query I am using:
QueryBuilder qb1 = QueryBuilders.boolQuery()
        .must(QueryBuilders.fuzzyQuery("product_attr_value", keyword)
                .boost(0.0f)
                .prefixLength(3)
                .fuzziness(Fuzziness.AUTO)
                .transpositions(true));
If I search for value (in lowercase), I get a count of around 1555. If I search for Value (only the first character in uppercase), I get a count of 8979.
I expect both counts to be the same, i.e. I want the search to be case-insensitive.
The fuzzy query is a term-level query, so Elasticsearch won't apply any analyzer to your search term. You have to normalize it yourself (for example, lowercase it) before submitting your search to ES. The same goes for the other term-level query types.
While the full-text queries analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index.
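One way to do that normalization client-side is sketched below. It is an assumption-laden approximation of the index analyzer: lowercasing plus stripping accents after Unicode NFD decomposition stands in for the lowercase and asciifolding filters, and the minimal_english stemmer is not reproduced at all, so stemmed forms can still differ. The class name TermLevelNormalizer is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Client-side normalization for term-level queries (fuzzy, term, prefix, wildcard, ...).
// Approximates "lowercase" + "asciifolding" only; does NOT apply the stemmer.
class TermLevelNormalizer {

    static String normalize(String userInput) {
        // Decompose accented chars (e.g. e + combining acute), then drop the combining marks.
        String decomposed = Normalizer.normalize(userInput, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Value"));      // value
        System.out.println(normalize("Caf\u00e9"));  // cafe
    }
}
```

You would then pass the normalized value into the existing builder, e.g. QueryBuilders.fuzzyQuery("product_attr_value", TermLevelNormalizer.normalize(keyword)), leaving the rest of the chain unchanged.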
See https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html

Elasticsearch matching string with LIKE operator

I would like to query Elasticsearch to retrieve all documents whose field value is like a given string.
For example, field LIKE "abc" should return:
"abc"
"abcdef"
"abcd"
"abc1"
That is, every value that contains the string "abc".
I tried this query, but it returns only the documents with field = "abc":
{"query":{"more_like_this":{"fields":["FIELD"],"like_text":"abc","min_term_freq" : 1,"max_query_terms" : 12}}}
What is the correct query?
Thanks
If you're trying to do a prefix query, you can use this (where field is the name of your field):
{
  "query": {
    "prefix": { "field": "abc" }
  }
}
See the Elasticsearch Prefix Query documentation.
Although your question is incomplete, I will try to give you several ideas.
One way is certainly a prefix query, but a much more efficient one is to build an edge n-gram analyzer. That way your data is prepared at index time and queries will be much faster. Edge n-grams are also the most flexible way to implement this functionality, because you can autocomplete words that appear in any order. If you don't need that and only need "search as you type" queries, then the best way is to use the completion suggester. If you need to find strings that appear in the middle of words, then you can check the ngram analyzer.
Here is how I set up an edge n-gram analyzer in my code:
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 256
        }
      },
      "analyzer": {
        "edge_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "edge_filter"]
        },
        "lowercase_whitespace": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "text",
              "analyzer": "edge_analyzer",
              "search_analyzer": "lowercase_whitespace"
            }
          }
        }
      }
    }
  }
}
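To make concrete what this analyzer stores, here is a plain-Java sketch (hypothetical class, not Elasticsearch code). It mimics edge_analyzer at index time with the same min_gram/max_gram values and lowercase_whitespace at search time, modelling a match query on name.suggest with its default OR operator. It only shows token generation; relevance scoring is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the "edge_analyzer" / "lowercase_whitespace" pair above.
class EdgeNgramSketch {
    static final int MIN_GRAM = 1;   // from "edge_filter"
    static final int MAX_GRAM = 256; // from "edge_filter"

    // Index time: whitespace tokenizer -> lowercase -> edge_ngram.
    static List<String> indexTokens(String text) {
        List<String> grams = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String lower = word.toLowerCase(Locale.ROOT);
            int maxLen = Math.min(MAX_GRAM, lower.length());
            for (int len = MIN_GRAM; len <= maxLen; len++) {
                grams.add(lower.substring(0, len)); // prefixes anchored at the word start
            }
        }
        return grams;
    }

    // Search time: whitespace tokenizer -> lowercase, then OR over the query tokens.
    static boolean matches(String indexedText, String query) {
        List<String> grams = indexTokens(indexedText);
        for (String token : query.trim().split("\\s+")) {
            if (!token.isEmpty() && grams.contains(token.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(indexTokens("abc1"));      // [a, ab, abc, abc1]
        System.out.println(matches("abcdef", "ABC")); // true
        System.out.println(matches("xabc", "abc"));   // false: "abc" is not a prefix of "xabc"
    }
}
```

The last case is why the answer points to the plain ngram analyzer when the substring can appear in the middle of a word.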
You should be able to perform a wildcard query as described here.
Elasticsearch like query
{
  "query": {
    "wildcard": {
      "<<FIELD NAME>>": "*<<QUERY TEXT>>*"
    }
  }
}
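The sketch below illustrates wildcard pattern semantics locally: * matches any sequence of characters (including none) and ? exactly one. It is an assumption-based illustration, not Elasticsearch's implementation: it ignores backslash escaping and treats the value as a single term, which is how a keyword field behaves; on a text field the pattern is checked against each analyzed token instead. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: converts a wildcard pattern into an equivalent java.util.regex Pattern.
class WildcardSketch {

    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                flush(regex, literal);
                regex.append(c == '*' ? ".*" : "."); // any run / exactly one char
            } else {
                literal.append(c);
            }
        }
        flush(regex, literal);
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Quote accumulated literal text so regex metacharacters in it are not interpreted.
    private static void flush(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // The whole term must match the pattern, as with a keyword field.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"abc", "abcdef", "abcd", "abc1", "xabcx", "ab"}) {
            System.out.println(value + " -> " + matches("*abc*", value));
        }
    }
}
```

Keep in mind that patterns beginning with * or ? force Elasticsearch to examine a large number of terms, so this query can be slow on big indices; the n-gram approach described above avoids that cost.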

Elasticsearch mapping for dynamic keys when indexing a map

I have a sample JSON document which I want to index into Elasticsearch.
Sample JSON indexed:
PUT test/names/1
{
  "1": {
    "name": "abc"
  },
  "2": {
    "name": "def"
  },
  "3": {
    "name": "xyz"
  }
}
where:
index name: test
type name: names
id: 1
The default mapping generated by Elasticsearch is now:
{
  "test": {
    "mappings": {
      "names": {
        "properties": {
          "1": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          },
          "2": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          },
          "3": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          },
          "metadataFieldDefinition": {
            "properties": {
              "name": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
If the map grows from 3 entries (currently) to, say, a thousand or a million, Elasticsearch will create a mapping for each key, which may cause performance issues because the mapping will be huge.
I tried creating a mapping by setting:
"dynamic": false,
"type": "object"
but it was overridden by ES, since it didn't match the indexed data.
Please let me know how I can define a mapping so that ES does not create one like the above.
I think there might be a little confusion here in terms of how we index documents.
PUT test/names/1
{
  ...document...
}
This says: the following document belongs to index test, is of type names, and has id 1. The entire document is treated as a single document of type names. Using the PUT API as you currently are, you cannot index multiple documents at once. ES interprets 1, 2, and 3 as properties of type object, each containing a property name of type string.
Effectively, ES thinks you are trying to index ONE document instead of three.
To get many documents into index test with a type of names, you could do this, using curl syntax:
curl -XPUT "http://your-es-server:9200/test/names/1" -H 'Content-Type: application/json' -d'
{
  "name": "abc"
}'
curl -XPUT "http://your-es-server:9200/test/names/2" -H 'Content-Type: application/json' -d'
{
  "name": "ghi"
}'
curl -XPUT "http://your-es-server:9200/test/names/3" -H 'Content-Type: application/json' -d'
{
  "name": "xyz"
}'
This specifies the document ID in the endpoint you are indexing to. Your mapping will then look like this:
{
  "test": {
    "mappings": {
      "names": {
        "properties": {
          "name": {
            "type": "string"
          }
        }
      }
    }
  }
}
Final word: split your indexing into discrete operations, or check out the Bulk API for the syntax to POST multiple operations in a single request.
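For the Bulk API route, here is a minimal sketch that turns the question's map into a newline-delimited bulk body, one index action per entry, using the map key as _id. The index test and type names come from the question; the helper name and its naive escaping (only quotes and backslashes) are assumptions. The _type metadata only applies to older clusters like the one here; it is deprecated in 7.x and removed in 8.x.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an NDJSON _bulk body that indexes each map entry as its own
// document. Escaping is naive (quotes and backslashes only); real code should use a JSON library.
class BulkBodySketch {

    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String buildBulkBody(String index, String type, Map<String, String> namesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : namesById.entrySet()) {
            // Action/metadata line: which index, type and id the next line goes to.
            body.append("{\"index\":{\"_index\":\"").append(escape(index))
                .append("\",\"_type\":\"").append(escape(type))
                .append("\",\"_id\":\"").append(escape(entry.getKey())).append("\"}}\n");
            // Source line: the document itself.
            body.append("{\"name\":\"").append(escape(entry.getValue())).append("\"}\n");
        }
        return body.toString(); // ends with '\n', which the bulk API requires
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("1", "abc");
        names.put("2", "def");
        names.put("3", "xyz");
        System.out.print(buildBulkBody("test", "names", names));
    }
}
```

The resulting string would be sent as the body of POST /_bulk with Content-Type: application/x-ndjson.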

Function score query with field_value_factor on not (yet) existing field

I've been struggling with this problem for quite some time now and can't manage to fix it.
Take the following case:
I have 2 employees in my company, who each have their own blog page:
POST blog/page/1
{
  "author": "Byron",
  "author-title": "Junior Software Developer",
  "content": "My amazing bio"
}
and
POST blog/page/2
{
  "author": "Jason",
  "author-title": "Senior Software Developer",
  "content": "My amazing bio is better"
}
After they created their blog posts, we would like to keep track of the 'views' of their blogs and boost search results based on their 'views'.
This can be done by using the function score query:
GET blog/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "author-title": "developer"
        }
      },
      "functions": [
        {
          "filter": {
            "range": {
              "views": {
                "from": 1
              }
            }
          },
          "field_value_factor": {
            "field": "views"
          }
        }
      ]
    }
  }
}
I use the range filter to make sure the field_value_factor doesn't affect the score when the number of views is 0 (the score would then also be 0).
Now when I try to run this query, I get the following exception:
nested: ElasticsearchException[Unable to find a field mapper for field [views]]; }]
This makes sense, because the field doesn't exist anywhere in the index.
If I were to add views = 0 on index-time, I wouldn't have the above issue as the field is known within the index. But in my use-case I'm unable to add this either on index-time or to a mapping.
Based on the ability to use a range filter within the function score query, I thought I would be able to use an exists filter to make sure that the field_value_factor part would only be executed when the field is actually present in the index, but no such luck:
GET blog/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "author-title": "developer"
        }
      },
      "functions": [
        {
          "filter": {
            "bool": {
              "must": [
                {
                  "exists": {
                    "field": "views"
                  }
                },
                {
                  "range": {
                    "views": {
                      "from": 1
                    }
                  }
                }
              ]
            }
          },
          "field_value_factor": {
            "field": "views"
          }
        }
      ]
    }
  }
}
Still gives:
nested: ElasticsearchException[Unable to find a field mapper for field [views]]; }]
I'd expect Elasticsearch to apply the filter first, before parsing the field_value_factor.
Any thoughts on how to fix this issue without using mapping files, fixing it at index time, or using scripts?
The error you're seeing occurs at query parsing time, i.e. nothing has been executed yet. At that point, the FieldValueFactorFunctionParser builds the field_value_factor function to be executed later, but it notices that the views field doesn't exist in the mapping type.
Note that the filter has not been executed yet either; just like the field_value_factor function, it has only been parsed by the FunctionScoreQueryParser.
I'm wondering why you can't simply add the field to your mapping type; it's as easy as running this:
curl -XPUT 'http://localhost:9200/blog/_mapping/page' -H 'Content-Type: application/json' -d '{
  "page": {
    "properties": {
      "views": {"type": "integer"}
    }
  }
}'
If this is REALLY not an option, another possibility would be to use script_score instead. Because function_score multiplies the function's result by the query score by default (boost_mode: multiply), the script only needs to return the views value:
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "author-title": "developer"
        }
      },
      "functions": [
        {
          "filter": {
            "range": {
              "views": {
                "from": 1
              }
            }
          },
          "script_score": {
            "script": "doc['views'].value"
          }
        }
      ]
    }
  }
}
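As a rough arithmetic check of how function_score combines scores, here is a small Java sketch of the default boost_mode (multiply). It assumes a single function filtered on views >= 1 and treats documents the filter rejects as contributing a neutral factor of 1, so their query score is kept. It also shows that a script which already multiplies by _score ends up counting the query score twice. The class and method names are hypothetical.

```java
// Arithmetic sketch of function_score's default boost_mode ("multiply").
// Assumption: one function filtered on views >= 1; documents the filter rejects
// get a neutral function factor of 1.
class FunctionScoreArithmetic {

    // field_value_factor on "views" (factor 1, no modifier): function value = views.
    static double fieldValueFactorScore(double queryScore, long views) {
        double functionValue = views >= 1 ? views : 1.0;
        return queryScore * functionValue; // boost_mode: multiply
    }

    // A script that itself returns _score * views, combined with the same boost_mode.
    static double scriptIncludingScore(double queryScore, long views) {
        double functionValue = views >= 1 ? queryScore * views : 1.0;
        return queryScore * functionValue; // _score ends up counted twice
    }

    public static void main(String[] args) {
        System.out.println(fieldValueFactorScore(2.0, 5)); // 10.0
        System.out.println(fieldValueFactorScore(2.0, 0)); // 2.0
        System.out.println(scriptIncludingScore(2.0, 5));  // 20.0
    }
}
```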
