I am fetching documents from Elasticsearch indexes, using a whitespace tokenizer with a stemmer.
My mapping is below.
PUT stemmer_lower_test
{
"settings": {
"analysis": {
"analyzer": {
"value_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"html_strip"
],
"filter": ["lowercase", "asciifolding", "my_stemmer"]
}
},
"filter" : {
"my_stemmer" : {
"type" : "stemmer",
"name" : "minimal_english"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"product_attr_value": {
"type": "text",
"analyzer": "value_analyzer"
},
"product_id": {
"type": "long"
},
"product_name":{
"type": "text"
}
}
}
}
}
This is the fuzzy query I am using:
QueryBuilder qb1 = QueryBuilders.boolQuery()
.must(QueryBuilders.fuzzyQuery("product_attr_value", keyword).boost(0.0f).prefixLength(3).fuzziness(Fuzziness.AUTO).transpositions(true));
If I search for value (all lowercase), I get a count of around 1555. If I search for Value (only the first character in uppercase), I get a count of 8979.
I expect both counts to be the same, because I want the search to be case insensitive.
The fuzzy query is a term-level query, so Elasticsearch does not apply any analyzer to your search term. You have to normalize the term yourself (here, at least lowercase it) before submitting the search to ES. The same is true of several other query types.
While the full text queries will analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
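As a minimal sketch of that client-side normalization, the Python below lowercases the keyword and strips accents before building a fuzzy query body. The helper names `normalize_keyword` and `build_fuzzy_query` are hypothetical, the accent stripping only approximates the `asciifolding` filter, the stemmer is not reproduced, and the boost from the Java snippet is left out.

```python
import unicodedata


def normalize_keyword(keyword: str) -> str:
    """Approximate the lowercase + asciifolding filters on the client side."""
    lowered = keyword.strip().lower()
    # Decompose accented characters and drop the combining marks (rough asciifolding).
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_fuzzy_query(keyword: str) -> dict:
    """Build a fuzzy query body with the options used in the question."""
    return {
        "query": {
            "fuzzy": {
                "product_attr_value": {
                    "value": normalize_keyword(keyword),
                    "fuzziness": "AUTO",
                    "prefix_length": 3,
                    "transpositions": True,
                }
            }
        }
    }


# "Value" and "value" now produce the same request body.
print(build_fuzzy_query("Value") == build_fuzzy_query("value"))
```

If the full analyzer chain (including stemming) has to be applied to the search term, one option is to run the term through the index's `_analyze` API first and use the returned token.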
I am having a problem querying Elasticsearch. Below is my query:
GET _search
{
"query": {
"bool": {
"must": [{
"match": {
"name": "SomeName"
}
},
{
"match": {
"type": "SomeType"
}
},
{
"match": {
"productId": "ff134be8-10fc-4461-b620-79s51199c7qb"
}
},
{
"range": {
"request_date": {
"from": "2018-08-22T12:16:37,392",
"to": "2018-08-28T12:17:41,137",
"format": "YYYY-MM-dd'T'HH:mm:ss,SSS"
}
}
}
]
}
}
}
I am using three match queries and a range query inside the bool query. My intention is to get documents that match these values exactly and fall within this date range. If I change the name or type value, I get no results, as expected. But for productId, if I put just ff134be8, I still get results. Does anyone know why? The exact match works on name and type but not on productId.
You need to map productId as keyword to avoid tokenization. With the standard tokenizer, "ff134be8-10fc-4461-b620-79s51199c7qb" produces the tokens ["ff134be8", "10fc", "4461", "b620", "79s51199c7qb"]. A match query for just ff134be8 therefore matches one of those tokens.
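To make the tokenization concrete, here is a rough Python sketch. The regex split is only a stand-in for the standard tokenizer (an assumption that holds for this ID but not for arbitrary text), and the function names are hypothetical.

```python
import re


def approx_standard_tokens(text: str) -> list:
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def match_hits(indexed_value: str, query_text: str) -> bool:
    """A match query (default OR operator) hits if any query token is an indexed token."""
    indexed = set(approx_standard_tokens(indexed_value))
    return any(tok in indexed for tok in approx_standard_tokens(query_text))


product_id = "ff134be8-10fc-4461-b620-79s51199c7qb"
print(approx_standard_tokens(product_id))
# The partial ID shares a token with the indexed value, so match finds it.
print(match_hits(product_id, "ff134be8"))
# A keyword field stores the whole string as one term, so only the full ID is equal.
print(product_id == "ff134be8")
```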
You have several options:
1/ Use a term query, which compares against the indexed terms without analyzing the search value (the field must be mapped as keyword for the full ID to match):
...
{
"term": {
"productId": "ff134be8-10fc-4461-b620-79s51199c7qb"
}
},
...
2/ If you are on Elasticsearch 6.x and rely on the default dynamic mapping, you could change your request to:
...
{
"match": {
"productId.keyword": "ff134be8-10fc-4461-b620-79s51199c7qb"
}
},
...
This works because Elasticsearch's default dynamic mapping creates a subfield named keyword, of type keyword, for every string field.
The best option is, of course, the first one. Always use a term query if you are trying to match the exact content.
I would like to query Elasticsearch to retrieve all documents that have a field value like a given string.
For example, field LIKE "abc" has to return:
"abc"
"abcdef"
"abcd"
"abc1"
So every document whose field contains the string "abc".
I tried this query, but it returns only the documents with field = "abc":
{"query":{"more_like_this":{"fields":["FIELD"],"like_text":"abc","min_term_freq" : 1,"max_query_terms" : 12}}}
What is the correct query?
Thanks
If you're trying to do a prefix query, then you can use this:
{ "query": {
"prefix" : { "field" : "abc" }
}
}
See the Elasticsearch Prefix Query documentation.
Although your question is incomplete, I will try to give you several ideas.
One way is certainly a prefix query, but a much more efficient approach is to build an edge ngram analyzer. That way your data is prepared at index time and queries are much faster. Edge ngrams are also the most flexible way to implement this, because you can autocomplete words that appear in any order. If you don't need that and only need "search as you type" queries, the best way is to use the completion suggester. If you need to find strings that appear in the middle of words, then look at the ngram analyzer.
Here is how I set up an edge ngram analyzer in my code.
"settings": {
"analysis": {
"filter" : {
"edge_filter" : {
"type" : "edge_ngram",
"min_gram": 1,
"max_gram": 256
}
},
"analyzer": {
"edge_analyzer" : {
"type" : "custom",
"tokenizer": "whitespace",
"filter" : ["lowercase", "edge_filter"]
},
"lowercase_whitespace": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [ "lowercase" ]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "keyword",
"fields": {
"suggest": {
"type": "text",
"analyzer" : "edge_analyzer",
"search_analyzer": "lowercase_whitespace"
}
}
}
}
}
}
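To illustrate what such an analyzer pair does, here is a small Python simulation, not Elasticsearch itself. It assumes whitespace tokenization and lowercasing as in the settings above, and the function names are hypothetical. Index-time terms are every leading substring of each token, while the search text is only split and lowercased.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 256) -> list:
    """Return the leading substrings of a token, like an edge_ngram filter."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text: str) -> set:
    """Index-time analysis: whitespace split, lowercase, then edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def search_terms(text: str) -> list:
    """Search-time analysis: whitespace split and lowercase only."""
    return text.lower().split()


values = ["abc", "abcdef", "abcd", "abc1", "xabc"]
# A match query (default OR operator) hits when any search term is an indexed term.
hits = [v for v in values if any(t in index_terms(v) for t in search_terms("ABC"))]
print(hits)
```

Because only prefixes are indexed, "xabc" is not found. Matching in the middle of a word would need the ngram variant mentioned above.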
You should be able to perform a wildcard query, as described in "Elasticsearch like query":
{
"query": {
"wildcard": {
"<<FIELD NAME>>": "*<<QUERY TEXT>>*"
}
}
}
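As a rough illustration of the matching semantics, the sketch below uses Python's `fnmatchcase` as a stand-in for the wildcard pattern. This is an approximation: `fnmatch` also gives special meaning to `[` and `]`, which the Elasticsearch wildcard syntax does not. The `wildcard_hits` helper and the sample terms are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_hits(terms, query_text):
    """Return the terms that contain query_text, using a *text* pattern."""
    pattern = f"*{query_text}*"
    # fnmatchcase compares case-sensitively, like a comparison against stored terms.
    return [t for t in terms if fnmatchcase(t, pattern)]


terms = ["abc", "abcdef", "xabc1", "ab", "ABC"]
print(wildcard_hits(terms, "abc"))
```

The pattern is compared with the stored terms as they are, so if the field is lowercased at index time, the query text should be lowercased as well.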
I am using Elasticsearch 2.3.1.
I need to create a query that checks whether specific terms (a word or a list of words) are present in a text. Basically something like the LIKE operator.
If I use a bool/must/match filter I can order the documents by score, but I must exclude the documents that do not contain all the terms I am searching for.
At the moment I am using 2-grams; this is the mapping:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
}
but as I wrote above, all the terms must be present in the text, not just a single bigram.
If you need to match a specific sequence of words, then bool/must/match_phrase may be more appropriate than bool/must/match.
A quick reference for this can be found in the Getting Started section of the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/_executing_searches.html
From the above source:
This example is a variant of match (match_phrase) that returns all accounts containing the phrase "mill lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_phrase": { "address": "mill lane" } }
}'
More details from ES documentation can be found at this location: https://www.elastic.co/guide/en/elasticsearch/reference/current/_executing_searches.html
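The difference between "all terms present" and "terms present as a phrase" can be sketched in plain Python. This is a simplified simulation with whitespace tokenization, not Elasticsearch's actual scoring or analysis, and the function names and sample addresses are made up for illustration. The first check corresponds roughly to a match query with "operator": "and", the second to match_phrase.

```python
def tokens(text: str) -> list:
    """Simplified analysis: lowercase and split on whitespace."""
    return text.lower().split()


def contains_all_terms(text: str, query: str) -> bool:
    """True if every query term occurs somewhere in the text, in any order."""
    doc = set(tokens(text))
    return all(term in doc for term in tokens(query))


def contains_phrase(text: str, query: str) -> bool:
    """True if the query terms occur next to each other, in the same order."""
    doc, wanted = tokens(text), tokens(query)
    return any(doc[i:i + len(wanted)] == wanted
               for i in range(len(doc) - len(wanted) + 1))


print(contains_all_terms("990 Mill Road near Lane 4", "mill lane"))
print(contains_phrase("990 Mill Road near Lane 4", "mill lane"))
print(contains_phrase("198 Mill Lane", "mill lane"))
```

If the words only need to be present, in any order, the first behavior is usually what is wanted. If they must appear together, the phrase form fits better.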
I have a sample json which I want to index into elasticsearch.
Sample Json Indexed:
put test/names/1
{
"1" : {
"name":"abc"
},
"2" : {
"name":"def"
},
"3" : {
"name":"xyz"
}
}
where:
index name: test
type name: names
id: 1
The default mapping generated by Elasticsearch is:
{
"test": {
"mappings": {
"names": {
"properties": {
"1": {
"properties": {
"name": {
"type": "string"
}
}
},
"2": {
"properties": {
"name": {
"type": "string"
}
}
},
"3": {
"properties": {
"name": {
"type": "string"
}
}
},
"metadataFieldDefinition": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
}
}
}
If the map grows from 3 entries (currently) to, say, thousands or millions, Elasticsearch will create a mapping for each key, which may cause performance issues because the mapping will be huge.
I tried creating a mapping by setting:
"dynamic": false,
"type": "object"
but it was overridden by ES, since it didn't match the indexed data.
Please let me know how I can define a mapping so that ES does not create one like the above.
I think there might be a little confusion here in terms of how we index documents.
put test/names/1
{...
document
...}
This says: the following document belongs to index test and is of type names with id 1. The entire document is treated as type names. Using the PUT API as you currently are, you cannot index multiple documents at once. ES immediately interprets 1, 2, and 3 as properties of type object, each containing a property name of type string.
Effectively, ES thinks you are trying to index ONE document instead of three.
To get many documents into index test with a type of names, you could do this, using curl syntax:
curl -XPUT "http://your-es-server:9200/test/names/1" -d '
{
"name": "abc"
}'
curl -XPUT "http://your-es-server:9200/test/names/2" -d '
{
"name": "def"
}'
curl -XPUT "http://your-es-server:9200/test/names/3" -d '
{
"name": "xyz"
}'
This specifies the document ID in the endpoint you are indexing to. Your mapping will then look like this:
"test": {
"mappings": {
"names": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
Final Word: Split your indexing up into discrete operations, or check out the Bulk API to see the syntax on how to POST multiple operations in a single request.
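As a sketch of the Bulk API route, the Python below turns the nested map from the question into a newline-delimited bulk body: one action line followed by one source line per document. The function name `build_bulk_payload` is hypothetical, and the `_type` entry in the action line assumes an older Elasticsearch version that still uses mapping types, as in the question.

```python
import json


def build_bulk_payload(index: str, doc_type: str, docs: dict) -> str:
    """Build a newline-delimited body for the _bulk endpoint."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: tells ES where this document goes and which ID it gets.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, with a flat "name" field.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


nested = {"1": {"name": "abc"}, "2": {"name": "def"}, "3": {"name": "xyz"}}
payload = build_bulk_payload("test", "names", nested)
print(payload)
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint. Each map entry becomes its own document, so the mapping contains only the single name property.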
I am storing some documents in elastic search with 2 columns:
id    | data
----- | -----------
(int) | json string
What mapping do I use to ask elastic search to just store the json string without doing any processing on it? For efficiency, I do not want that column to be indexed or searchable - just want ES to store it as bits.
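Use "index": "no" in your mapping for that field. The raw value is still kept in _source and returned with the document; it just cannot be searched. (On Elasticsearch 5.x and later, the string type is replaced by text/keyword and the setting is written "index": false.)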
So maybe something like:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"doc": {
"properties": {
"id": {
"type": "integer"
},
"json_string": {
"type": "string",
"index": "no"
}
}
}
}
}