Elasticsearch matching string with like operator - java

I want to query Elasticsearch to retrieve all documents whose field value is like a given string.
For example, field LIKE "abc" should return:
"abc"
"abcdef"
"abcd"
"abc1"
So, every document whose field contains the string "abc".
I tried this query, but it returns only the documents where field = "abc":
{"query":{"more_like_this":{"fields":["FIELD"],"like_text":"abc","min_term_freq" : 1,"max_query_terms" : 12}}}
What is the correct query?
Thanks

If you're trying to do a Prefix Query, then you can use this.
{
  "query": {
    "prefix" : { "field" : "abc" }
  }
}
See the Elasticsearch Prefix Query documentation.

Your question is incomplete, but I will try to give you several ideas.
One way is certainly a prefix query, but it is much more efficient to build an edge ngram analyzer. That way your data is prepared at index time and queries will be much faster. Edge ngrams are also the most flexible way to get this functionality, because you can autocomplete words that appear in any order. If you don't need that and only need "search as you type" queries, then the best way is to use the completion suggester. If you need to find strings that appear in the middle of words, then check out the ngram analyzer.
Here is how I set up an edge ngram analyzer in my code.
"settings": {
"analysis": {
"filter" : {
"edge_filter" : {
"type" : "edge_ngram",
"min_gram": 1,
"max_gram": 256
}
},
"analyzer": {
"edge_analyzer" : {
"type" : "custom",
"tokenizer": "whitespace",
"filter" : ["lowercase", "edge_filter"]
},
"lowercase_whitespace": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [ "lowercase" ]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "keyword",
"fields": {
"suggest": {
"type": "text",
"analyzer" : "edge_analyzer",
"search_analyzer": "lowercase_whitespace"
}
}
}
}
}
}
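To make the edge ngram idea concrete, here is a minimal Python sketch that imitates the two analyzers above in plain code. It does not talk to Elasticsearch; the helper names (edge_ngrams, index_terms, search_terms, matches) are made up for illustration, and the matching rule assumes a match query with its default OR behaviour.

```python
def edge_ngrams(token, min_gram=1, max_gram=256):
    """Return the leading substrings of token, from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Imitate edge_analyzer: whitespace tokenizer, lowercase, then edge ngrams."""
    terms = set()
    for token in text.split():
        terms.update(edge_ngrams(token.lower()))
    return terms


def search_terms(text):
    """Imitate lowercase_whitespace: whitespace tokenizer and lowercase only."""
    return [token.lower() for token in text.split()]


def matches(indexed_text, query_text):
    """A document matches when any query term equals one of its indexed terms."""
    indexed = index_terms(indexed_text)
    return any(term in indexed for term in search_terms(query_text))


if __name__ == "__main__":
    print(edge_ngrams("abcd"))
    for value in ["abc", "abcdef", "abcd", "abc1", "xabc"]:
        print(value, matches(value, "abc"))
```

Because the prefixes are produced at index time, the query side only has to look up the term "abc" as-is, which is why the search analyzer skips the ngram filter. Note that "xabc" does not match: edge ngrams only cover the start of each word.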

You should be able to perform a wildcard query, as described in "Elasticsearch like query":
{
"query": {
"wildcard": {
"<<FIELD NAME>>": "*<<QUERY TEXT>>*"
}
}
}
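As a rough illustration of what the pattern means, the Python sketch below builds the same request body and emulates the wildcard semantics (* for any run of characters, ? for a single character) with a regular expression. The helper names are hypothetical and nothing here contacts a cluster. Keep in mind that wildcard is a term-level query, so on an analyzed field the pattern is compared with the individual indexed terms, and patterns that start with * tend to be slow on large indexes.

```python
import re


def like_query(field, text):
    """Build a wildcard request body equivalent to SQL's LIKE '%text%'."""
    return {"query": {"wildcard": {field: "*" + text + "*"}}}


def wildcard_to_regex(pattern):
    """Translate * and ? into regex syntax and escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(term, pattern):
    """True when the whole term fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


if __name__ == "__main__":
    print(like_query("FIELD", "abc"))
    for term in ["abc", "abcdef", "xxabc", "ab"]:
        print(term, wildcard_matches(term, "*abc*"))
```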

Related

Elastic search exact match query issue

I am having a problem while querying elastic search. The below is my query
GET _search {
"query": {
"bool": {
"must": [{
"match": {
"name": "SomeName"
}
},
{
"match": {
"type": "SomeType"
}
},
{
"match": {
"productId": "ff134be8-10fc-4461-b620-79s51199c7qb"
}
},
{
"range": {
"request_date": {
"from": "2018-08-22T12:16:37,392",
"to": "2018-08-28T12:17:41,137",
"format": "YYYY-MM-dd'T'HH:mm:ss,SSS"
}
}
}
]
}
}
}
I am using three match queries and a range query inside the bool query. My intention is to get documents with these exact matches and within this date range. If I change the name or type value, I get no results, as expected. But for productId, if I put just ff134be8, I still get results. Does anyone know why? The exact match works on name and type but not on productId.
You need to set the mapping of your productId to keyword to avoid the tokenization. With the standard tokenizer "ff134be8-10fc-4461-b620-79s51199c7qb" will create ["ff134be8", "10fc", "4461", "b620", "79s51199c7qb"] as tokens.
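The effect can be reproduced with a small Python sketch. The tokenizer below is only a crude approximation of the standard analyzer (split on anything that is not a letter or digit, then lowercase), and the function names are invented for this example.

```python
import re


def approx_standard_tokens(text):
    """Crude stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def match_hits(indexed_value, query_text):
    """match on an analyzed field: any shared token is enough (default OR)."""
    indexed = set(approx_standard_tokens(indexed_value))
    return any(t in indexed for t in approx_standard_tokens(query_text))


def term_hits_keyword(indexed_value, query_text):
    """term on a keyword field: the whole value must be identical."""
    return indexed_value == query_text


if __name__ == "__main__":
    product_id = "ff134be8-10fc-4461-b620-79s51199c7qb"
    print(approx_standard_tokens(product_id))
    print(match_hits(product_id, "ff134be8"))
    print(term_hits_keyword(product_id, "ff134be8"))
    print(term_hits_keyword(product_id, product_id))
```

A value like "SomeName" yields a single token, which is why name and type already behave like exact matches in the original query.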
You have different options:
1/ use a term query, which does not analyze the search term (the productId field itself must be mapped as keyword for the full value to match)
...
{
"term": {
"productId": "ff134be8-10fc-4461-b620-79s51199c7qb"
}
},
...
2/ if you are on Elasticsearch 6.x, you could change your request to
...
{
"match": {
"productId.keyword": "ff134be8-10fc-4461-b620-79s51199c7qb"
}
},
...
This works because, with the default dynamic mapping, Elasticsearch creates a keyword sub-field of type keyword for every string field.
The best option is, of course, the first one. Always use a term query if you are trying to match the exact content.

Elasticsearch - fetch matching words based on case insensitive

I am fetching documents from Elasticsearch indexes, using a whitespace tokenizer with a stemmer.
Please find my mapping below.
PUT stemmer_lower_test
{
"settings": {
"analysis": {
"analyzer": {
"value_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"html_strip"
],
"filter": ["lowercase", "asciifolding", "my_stemmer"]
}
},
"filter" : {
"my_stemmer" : {
"type" : "stemmer",
"name" : "minimal_english"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"product_attr_value": {
"type": "text",
"analyzer": "value_analyzer"
},
"product_id": {
"type": "long"
},
"product_name":{
"type": "text"
}
}
}
}
}
Please find the fuzzy query I am using:
QueryBuilder qb1 = QueryBuilders.boolQuery()
    .must(QueryBuilders.fuzzyQuery("product_attr_value", keyword)
        .boost(0.0f)
        .prefixLength(3)
        .fuzziness(Fuzziness.AUTO)
        .transpositions(true));
If I search for value (all lowercase), I get a count of around 1555. If I search for Value (only the first character in uppercase), I get a count of 8979.
I expect both counts to be the same, i.e. I want the search to be case insensitive.
Fuzzy Query is a term-level query, so Elasticsearch won't apply any analyzer to your search term. You have to normalize it yourself before submitting your search to ES. The same applies to several other query types.
As the documentation puts it: "While the full text queries will analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index."
See https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
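A minimal sketch of that client-side normalization, written in Python for brevity (the same steps carry over to Java). It mirrors only the lowercase and asciifolding filters of value_analyzer; the stemmer is not reproduced, and normalize_term and fuzzy_query are hypothetical helper names.

```python
import unicodedata


def normalize_term(keyword):
    """Lowercase and strip diacritics, roughly what lowercase + asciifolding do at index time."""
    lowered = keyword.lower()
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_query(field, keyword):
    """Build a fuzzy request body whose value is normalized before it is sent."""
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": normalize_term(keyword),
                    "fuzziness": "AUTO",
                    "prefix_length": 3,
                    "transpositions": True,
                }
            }
        }
    }


if __name__ == "__main__":
    print(fuzzy_query("product_attr_value", "Value"))
    print(fuzzy_query("product_attr_value", "value"))
```

With this in place, "Value" and "value" produce the same request body, so both searches should return the same count.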

What filter should i use to match the exact term/s inside a string?

I am using ElasticSearch 2.3.1.
I need to create a query that checks whether specific terms (a word or a list of words) are present in a text. Basically something like the LIKE operator.
If I use the bool/must/match filter I can order the documents by score, but I must exclude the documents that do not contain all the terms I am searching for.
At the moment I am using 2-grams; this is the mapping:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
}
But, as I wrote above, all the terms must be inside the text, so matching just a bigram is not enough.
If you need to match a specific sequence of words, then bool/must/match_phrase may be more appropriate than bool/must/match.
A quick reference can be found in the Getting Started section of the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/_executing_searches.html
From the above source:
This example is a variant of match (match_phrase) that returns all accounts containing the phrase "mill lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_phrase": { "address": "mill lane" } }
}'
More details from ES documentation can be found at this location: https://www.elastic.co/guide/en/elasticsearch/reference/current/_executing_searches.html
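The difference between the two behaviours can be sketched in a few lines of Python. This is a simplification that tokenizes on whitespace and ignores scoring; match_all_terms stands for "every term must be present, in any order" (which a match query gives when its operator is set to and), while match_phrase requires the terms to be adjacent and in order. Both function names are illustrative only.

```python
def tokens(text):
    """Very simple analysis: lowercase and split on whitespace."""
    return text.lower().split()


def match_all_terms(doc_text, query_text):
    """True when every query term occurs somewhere in the document."""
    doc = set(tokens(doc_text))
    return all(term in doc for term in tokens(query_text))


def match_phrase(doc_text, query_text):
    """True when the query terms occur consecutively and in the same order."""
    doc = tokens(doc_text)
    query = tokens(query_text)
    if not query:
        return False
    return any(doc[i:i + len(query)] == query
               for i in range(len(doc) - len(query) + 1))


if __name__ == "__main__":
    for address in ["198 Mill Lane", "Lane of the Mill", "Mill Road"]:
        print(address,
              match_all_terms(address, "mill lane"),
              match_phrase(address, "mill lane"))
```

If word order does not matter and only the presence of all terms counts, the first behaviour is the closer fit; if the words must appear together as written, the phrase variant is.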

ElasticSearch mapping for dynamic keys for indexing a map

I have a sample JSON document which I want to index into Elasticsearch.
Sample JSON indexed:
put test/names/1
{
"1" : {
"name":"abc"
},
"2" : {
"name":"def"
},
"3" : {
"name":"xyz"
}
}
where
index name: test,
type name: names,
id: 1
Now the default mapping generated by Elasticsearch is:
{
"test": {
"mappings": {
"names": {
"properties": {
"1": {
"properties": {
"name": {
"type": "string"
}
}
},
"2": {
"properties": {
"name": {
"type": "string"
}
}
},
"3": {
"properties": {
"name": {
"type": "string"
}
}
},
"metadataFieldDefinition": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
}
}
}
If the map size increases from 3 (currently) to, say, thousands or millions, then Elasticsearch will create a mapping entry for each key, which may cause performance issues as the mapping will be huge.
I tried creating a mapping by setting:
"dynamic": false,
"type": "object"
but it was overridden by ES, since it didn't match the indexed data.
Please let me know how I can define a mapping so that ES does not create one like the above.
I think there might be a little confusion here in terms of how we index documents.
put test/names/1
{...
document
...}
This says: the following document belongs to index test and is of type names, with id 1. The entire document is treated as one document of type names. Using the PUT API as you currently are, you cannot index multiple documents at once. ES interprets 1, 2, and 3 as properties of type object, each containing a property name of type string.
Effectively, ES thinks you are trying to index ONE document instead of three.
To get many documents into index test with a type of names, you could do this, using curl syntax:
curl -XPUT "http://your-es-server:9200/test/names/1" -d '
{
  "name": "abc"
}'
curl -XPUT "http://your-es-server:9200/test/names/2" -d '
{
  "name": "def"
}'
curl -XPUT "http://your-es-server:9200/test/names/3" -d '
{
  "name": "xyz"
}'
This specifies the document ID in the endpoint you are indexing to. Your mapping will then look like this:
"test": {
"mappings": {
"names": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
Final Word: Split your indexing up into discrete operations, or check out the Bulk API to see the syntax on how to POST multiple operations in a single request.
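For the Bulk API route, a small Python sketch can turn the original map into the newline-delimited body that the _bulk endpoint expects: one action line followed by one source line per document. The function name to_bulk_body is made up for this example, and the "_type" entry assumes an Elasticsearch version that still has mapping types, as in the question.

```python
import json


def to_bulk_body(index, doc_type, keyed_docs):
    """Turn {"1": {...}, "2": {...}} into an NDJSON body for the _bulk endpoint."""
    lines = []
    for doc_id, source in keyed_docs.items():
        # Action line: which index, type and id the next line belongs to.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {"1": {"name": "abc"}, "2": {"name": "def"}, "3": {"name": "xyz"}}
    print(to_bulk_body("test", "names", sample), end="")
```

The resulting text can be posted to the _bulk endpoint in one request; every document then shares the single name property in the mapping, no matter how many keys the original map had.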

Function score query with field_value_factor on not (yet) existing field

I've been messing around with this problem for quite some time now and can't get round to fixing this.
Take the following case:
I have 2 employees in my company which have their own blog page:
POST blog/page/1
{
"author": "Byron",
"author-title": "Junior Software Developer",
"content" : "My amazing bio"
}
and
POST blog/page/2
{
"author": "Jason",
"author-title": "Senior Software Developer",
"content" : "My amazing bio is better"
}
After they created their blog posts, we would like to keep track of the 'views' of their blogs and boost search results based on their 'views'.
This can be done by using the function score query:
GET blog/_search
{
"query": {
"function_score": {
"query": {
"match": {
"author-title": "developer"
}
},
"functions": [
{
"filter": {
"range": {
"views": {
"from": 1
}
}
},
"field_value_factor": {
"field": "views"
}
}
]
}
}
}
I use the range filter to make sure the field_value_factor doesn't affect the score when the amount of views is 0 (score would be also 0).
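The scoring this query is meant to produce can be sketched in Python as follows. This is only a model under stated assumptions: a single function, field_value_factor with its default factor and no modifier, the function result multiplied into the query score, and a document that fails the filter keeping its original score. The name boosted_score is hypothetical.

```python
def boosted_score(query_score, doc):
    """Model of the intended result: multiply by views only when views >= 1."""
    views = doc.get("views", 0)
    if views >= 1:
        # Range filter matched, so field_value_factor contributes the views value.
        return query_score * views
    # Filter did not match, so the query score is left untouched.
    return query_score


if __name__ == "__main__":
    print(boosted_score(2.0, {"author": "Jason", "views": 5}))
    print(boosted_score(2.0, {"author": "Byron", "views": 0}))
    print(boosted_score(2.0, {"author": "Byron"}))
```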
Now when I try to run this query, I will get the following exception:
nested: ElasticsearchException[Unable to find a field mapper for field [views]]; }]
Which makes sense, because the field doesn't exist anywhere in the index.
If I were to add views = 0 at index time, I wouldn't have the above issue, as the field would be known within the index. But in my use case I'm unable to add this either at index time or to a mapping.
Based on the ability to use a range filter within the function score query, I thought I would be able to use an exists filter to make sure that the field_value_factor part would only be executed when the field is actually present in the index, but no such luck:
GET blog/_search
{
"query": {
"function_score": {
"query": {
"match": {
"author-title": "developer"
}
},
"functions": [
{
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "views"
}
},
{
"range": {
"views": {
"from": 1
}
}
}
]
}
},
"field_value_factor": {
"field": "views"
}
}
]
}
}
}
Still gives:
nested: ElasticsearchException[Unable to find a field mapper for field [views]]; }]
Where I'd expect Elasticsearch to apply the filter first, before parsing the field_value_factor.
Any thoughts on how to fix this issue, without the use of mapping files or fixing during index-time or scripts??
The error you're seeing occurs at query parsing time, i.e. nothing has been executed yet. At that time, the FieldValueFactorFunctionParser builds the field_value_factor function to be executed later, but it notices that the views field doesn't exist in the mapping type.
Note that the filter has not been executed yet. Just like the field_value_factor function, it has only been parsed by FunctionScoreQueryParser.
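The ordering problem can be pictured with a toy Python model. It is not Elasticsearch code: parse_function, search and MissingFieldMapper are invented names, and the filter is hard-coded to "views >= 1". The point is only that the field lookup happens in the parse step, before any document or filter is evaluated.

```python
class MissingFieldMapper(Exception):
    """Stands in for the 'Unable to find a field mapper' error."""


def parse_function(function, mapped_fields):
    """Parse step: resolve the field against the mapping; no filter runs here."""
    field = function["field_value_factor"]["field"]
    if field not in mapped_fields:
        raise MissingFieldMapper(
            "Unable to find a field mapper for field [%s]" % field)
    return function


def search(function, mapped_fields, docs):
    """Parse first, then execute; execution is never reached if parsing fails."""
    parsed = parse_function(function, mapped_fields)
    field = parsed["field_value_factor"]["field"]
    # Execution step: apply the (hard-coded) range filter to each document.
    return [doc for doc in docs if doc.get(field, 0) >= 1]


if __name__ == "__main__":
    fn = {"field_value_factor": {"field": "views"}}
    docs = [{"author": "Byron"}, {"author": "Jason", "views": 3}]
    try:
        search(fn, {"author", "author-title", "content"}, docs)
    except MissingFieldMapper as err:
        print("parse failed:", err)
    print(search(fn, {"author", "views"}, docs))
```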
I'm wondering why you can't simply add a field to your mapping type. It's as easy as running this:
curl -XPUT 'http://localhost:9200/blog/_mapping/page' -d '{
"page" : {
"properties" : {
"views" : {"type" : "integer"}
}
}
}'
If this is REALLY not an option, another possibility would be to use script_score instead, like this:
{
"query": {
"function_score": {
"query": {
"match": {
"author-title": "developer"
}
},
"functions": [
{
"filter": {
"range": {
"views": {
"from": 1
}
}
},
"script_score": {
"script": "_score * doc.views.value"
}
}
]
}
}
}
