Elasticsearch Multimatch substring not working - java

I have a record with the following field:
"fullName" : "Virat Kohli"
I have written the following multi_match query that should fetch this record :
GET _search
{
"query": {
"multi_match": {
"query": "*kohli*",
"fields": [
"fullName^1.0",
"team^1.0"
],
"type": "phrase_prefix",
"operator": "OR",
"slop": 0,
"prefix_length": 0,
"max_expansions": 50,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
}
}
This works fine.
But when I remove the letter 'k' from query and change it to :
"query": "*ohli*"
It doesn't fetch any record.
Any reason why this is happening? How can I modify the query to get the record returned with the above modification?

First, let me explain why your existing query didn't work, and then the solution.
Problem: you are using the multi_match query with type phrase_prefix. As explained in the documentation, it performs a prefix query on the last search term, and since you have only one search term, Elasticsearch runs the prefix query on that term.
A prefix query works on the exact tokens in the index. You are most likely using the standard analyzer, the default for text fields, so the fullName field is indexed as the tokens virat and kohli. Your search term *kohli* is analyzed too: the standard analyzer drops the asterisks and lowercases the tokens, producing kohli (note the lowercase k), which matches. With *ohli* the term becomes ohli, which is not a prefix of any indexed token, so nothing is returned. You can check this with the explain API output of your first request, shown below.
"_explanation": {
"value": 0.2876821,
"description": "max of:",
"details": [
{
"value": 0.2876821,
"description": "weight(fullName:kohli in 0) [PerFieldSimilarity], result of:",
"details": [
{
(note the search term in the weight)
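To make the token/prefix behaviour concrete, here is a rough plain-Java simulation. It is not Elasticsearch code: the class and method names are made up for illustration, and analyze() only approximates the standard analyzer (split on non-alphanumeric characters, then lowercase).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough simulation only; PhrasePrefixSketch and its methods are hypothetical names.
final class PhrasePrefixSketch {

    // Approximates the standard analyzer: split on non-alphanumerics, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True if the single analyzed query term is a prefix of any indexed token.
    static boolean prefixMatches(String fieldValue, String query) {
        List<String> queryTerms = analyze(query);
        if (queryTerms.size() != 1) {
            throw new IllegalArgumentException("sketch handles a single term only");
        }
        String prefix = queryTerms.get(0);
        for (String token : analyze(fieldValue)) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyze("*kohli*"));                       // prints [kohli]
        System.out.println(prefixMatches("Virat Kohli", "*kohli*"));  // prints true
        System.out.println(prefixMatches("Virat Kohli", "*ohli*"));   // prints false
    }
}
```

The asterisks never reach the matching step, so the query behaves like a prefix search for kohli or ohli against the tokens virat and kohli.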
Solution
Since you are trying to use a wildcard in your query, the best solution is to run a wildcard query against your field, as shown below, to get results in both cases.
{
"query": {
"wildcard": {
"fullName": {
"value": "*ohli",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
And the search result:
"hits": [
{
"_shard": "[match_query][0]",
"_node": "BKVyHFTiSCeq4zzD-ZqMbA",
"_index": "match_query",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"id": 2,
"fullName": "Virat Kohli",
"team": [
"Royal Challengers Bangalore",
"India"
]
},
"_explanation": {
"value": 1.0,
"description": "fullName:*ohli",
"details": []
}
}
]
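For intuition on why the wildcard query finds the document, here is a small plain-Java sketch of wildcard matching against whole tokens. It is only an illustration under assumptions: the names are hypothetical, the field is tokenized by lowercasing and splitting on whitespace, and any normalization Elasticsearch may apply to the pattern itself is ignored.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustration only; WildcardSketch and its methods are hypothetical names.
final class WildcardSketch {

    // Translate a wildcard pattern (* = any run of chars, ? = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // The pattern must match an entire indexed token, not the original text.
    static boolean wildcardMatches(String fieldValue, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        for (String token : fieldValue.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("Virat Kohli", "*ohli")); // prints true
        System.out.println(wildcardMatches("Virat Kohli", "ohli"));  // prints false
    }
}
```

Keep in mind that a leading wildcard forces a scan over many terms, so this kind of query can be slow on large indexes.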

Related

Elasticsearch - fetch matching words based on case insensitive

I am fetching documents from Elasticsearch indexes, using a whitespace tokenizer with a stemmer.
Please find my mapping below.
PUT stemmer_lower_test
{
"settings": {
"analysis": {
"analyzer": {
"value_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"html_strip"
],
"filter": ["lowercase", "asciifolding", "my_stemmer"]
}
},
"filter" : {
"my_stemmer" : {
"type" : "stemmer",
"name" : "minimal_english"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"product_attr_value": {
"type": "text",
"analyzer": "value_analyzer"
},
"product_id": {
"type": "long"
},
"product_name":{
"type": "text"
}
}
}
}
}
Here is the fuzzy query I am using:
QueryBuilder qb1 = QueryBuilders.boolQuery()
.must(QueryBuilders.fuzzyQuery("product_attr_value", keyword).boost(0.0f).prefixLength(3).fuzziness(Fuzziness.AUTO).transpositions(true));
If I search for value (all lowercase), I get a count of around 1555. If I search for Value (only the first character in uppercase), I get a count of 8979.
I expect both counts to be the same, i.e. I want the search to be case insensitive.
The fuzzy query is a term-level query, so Elasticsearch does not apply any analyzer to your search term. You have to normalize it yourself before submitting the search to ES. The same holds for several other query types.
While the full text queries will analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
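A minimal sketch of such a normalization step in plain Java, assuming it is enough to mimic the lowercase and asciifolding filters from the mapping above. The class and method names are hypothetical, the folding is only an approximation of asciifolding, and the stemmer is not reproduced, so stemmed forms would still need separate handling.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper; approximates the "lowercase" and "asciifolding" filters only.
final class KeywordNormalizer {

    static String normalize(String keyword) {
        // Decompose accented characters, then strip the combining marks.
        String decomposed = Normalizer.normalize(keyword, Normalizer.Form.NFD);
        String folded = decomposed.replaceAll("\\p{M}+", "");
        return folded.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Value"));      // prints value
        System.out.println(normalize("Caf\u00e9"));  // prints cafe
    }
}
```

The normalized string would then be passed to QueryBuilders.fuzzyQuery("product_attr_value", ...) in place of the raw keyword, so that Value and value produce the same query.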

Elastic Search: Query the given index to find out values fields

Can anyone help me write a query for the following document?
"_index": "demodata",
"_type": "sarthak",
"_id": "AVyLnJgOVAC1tB7BveyG",
"_score": 1,
"_source": {
"values": """[{"label":"Male","value":"m","selected":true},{"label":"Female","value":"F"},{"label":"Other","value":"O"}]""",
"name": "select-1496990862221",
"className": "form-control",
"label": "Select",
"type": "select",
"required": true
I want to get inside values and retrieve the label, value and selected entries. Also, the values are not fixed; they will change. So I want a query which works on other types too. Thanks for the help.
It's a nested field; something like this should work:
"must": [
{
"nested": {
"path": "values",
"filter": {
"terms": {
"values.label": [
"male"
]
}
}
}
}
]
This query works for ES 1.7.*, you might need a little modification if you work with other versions.

Term not found in Search but present in a term vector in Elasticsearch

I have a term in my dataset which does not give any search results but is present in a document.
If I request a term vector:
GET index_5589b14f3004fb6be70e4724/document_set/382.txt/_termvector
{
"fields" : ["plain_text", "pdf_text"],
"term_statistics" : true,
"field_statistics" : true
}
The term vector has this word:
...
"advis": { //porter stemmed version of the word "advising"
"doc_freq": 1,
"ttf": 1,
"term_freq": 1,
"tokens": [
{
"position": 81,
"start_offset": 412,
"end_offset": 420
}
]
},...
"air": {
But when I search this word to retrieve all the documents where it has occurred I get zero hits:
GET index_5589b14f3004fb6be70e4724/document_set/_search
{
"query": {
"multi_match": {
"query": "advis",
"fields": ["plain_text", "pdf_text"]
}
},
"explain": true
}
Why is this happening?
This most likely happens because the search term is analyzed at query time: in the example above, advis is probably being stemmed again, to advi, which is not in the index.
You can explicitly specify the keyword analyzer in the query, and you should then get the results:
GET index_5589b14f3004fb6be70e4724/document_set/_search
{
"query": {
"multi_match": {
"query": "advis",
"fields": ["plain_text", "pdf_text"],
"analyzer" : "keyword"
}
},
"explain": true
}

Elasticsearch query doesn't produce expected result

I'm having trouble creating a query which should search for any documents with a certain search term in the fields title and text, and should match a state field which could have zero or more values, of which at least one must match.
Given the following query:
"bool" : {
"must" : {
"multi_match" : {
"query" : "test",
"fields" : [ "title", "text" ]
}
},
"should" : {
"terms" : {
"state" : [ "NEW" ]
}
},
"minimum_should_match" : "1"
}
Should not the following data be returned as a result?
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "JXnEkYFDQp2feATMzp2LTA",
"_index": "tips",
"_score": 1.0,
"_source": {
"state": "NEW",
"text": "This is a test",
"title": "Test"
},
"_type": "tip"
}
],
"max_score": 1.0,
"total": 1
},
"timed_out": false,
"took": 1
}
In my test this is not the case. What am I doing wrong?
The following is the Java code producing the query above.
SearchRequestBuilder builder = client.prepareSearch("tips").setTypes("tip");
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
if(searchTermIsNotEmpty(searchTerm)){
MultiMatchQueryBuilder qb = QueryBuilders.multiMatchQuery(
searchTerm,
"title", "text"
);
boolQuery.must(qb);
}
if(filters.size() > 0){
boolQuery.should(QueryBuilders.termsQuery("state",filters));
boolQuery.minimumNumberShouldMatch(1);
}
if(boolQuery.hasClauses()){
builder.setQuery(boolQuery);
}
logger.info(boolQuery.toString());
SearchResponse result = builder.execute().actionGet();
return result.toString();
Any help on this is greatly appreciated!
It seems I found the issue: for some reason I was unable to fetch results when using the filter enum in its original form. I had to convert the enum to a string and lowercase it.
I then added the following query:
boolQuery.must(QueryBuilders.termsQuery("state", getLowerCaseEnumCollection(filters)).minimumMatch(1));
I'm new to Elasticsearch, so I don't know if this is a bug or a feature. I'm just glad I figured it out.
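The helper getLowerCaseEnumCollection is not shown in the post. Below is a minimal sketch of what it could look like, with a hypothetical State enum standing in for the real filter type. The lowercasing is probably needed because a terms query is not analyzed, whereas a text field indexed with the default standard analyzer stores new rather than NEW.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Sketch only; the enum and the helper body are assumptions, not the poster's code.
final class EnumFilterSketch {

    // Hypothetical enum standing in for the poster's filter type.
    enum State { NEW, APPROVED, REJECTED }

    // Convert each enum constant to its lowercase name, preserving order.
    static List<String> getLowerCaseEnumCollection(Collection<? extends Enum<?>> filters) {
        List<String> result = new ArrayList<>();
        for (Enum<?> filter : filters) {
            result.add(filter.name().toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<State> filters = Arrays.asList(State.NEW, State.REJECTED);
        System.out.println(getLowerCaseEnumCollection(filters)); // prints [new, rejected]
    }
}
```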

Querying an embedded list in OrientDB

I have a document in my OrientDB database (version 1.0.1), with a structure largely like this:
{
"timestamp": "...",
"duration": 665,
"testcases": [
{
"testName": "test01",
"type": "ignore",
"filename": "tests/test1.js"
},
{
"iterations": 1,
"runningTime": 45,
"testName": "test02",
"type": "pass",
"filename": "tests/test1.js"
},
...
{
"testName": "test05",
"type": "ignore",
"filename": "tests/test1.js"
}
]
}
How can I query across the entire list, eg. if I want to find all documents that contain a testcase with the type "ignore"?
I've attempted the following query
select from testresult where testcases['type'] = 'ignore'
but this results in a NumberFormatException.
select from testresult where testcases[0]['type'] = 'ignore'
works, but obviously only looks at the first list element of each document.
select from testresult where testcases contains(type = 'ignore')
Doesn't provide any results, but the query is accepted as valid.
Update:
The following query works as intended, if the testcases are stored as separate documents instead of as an embedded list.
select from testresult where testcases contains (type = 'ignore')
I know it's an old question, but I had the same problem and just stumbled upon an answer here:
https://www.mail-archive.com/orient-database@googlegroups.com/msg00662.html
The following should work. It does in my very similar use case.
select from testresult where 'ignore' in testcases.type
I had a similar problem and ended up with:
select * from testresult
let $tmp = (select from
(select expand(testcases) from testresult )
where
value.type = 'ignore')
where
testcases in $tmp.value
This returns all the testresult documents that contain at least one testcase whose type is ignore. The query works on embedded lists. Note that the expand function is only available in OrientDB >= 1.4.0.
The inner query :
select from (select expand(testcases) from testresult) where value.type='ignore'
selects only the testcases with type = 'ignore'. The results are testcases; to get the whole documents, we match those testcases against the ones contained in each document (testcases in $tmp.value).
I don't know if there is a simpler way to query embedded list...
Try
select from testresult where testcases traverse ( type = 'ignore' )
Check the traverse operator ( https://groups.google.com/forum/?fromgroups#!topic/orient-database/zoBGmIg85o4 ) to see how to use the fetchplan, or put any() instead of testcases just after "where".
For example, we have a class called Country which has an embeddedlist property holding some of its isoCodes. If we run the following query:
select name,description,isoCodes,status from Country where isoCodes traverse ( value = 'GB' OR value = 'IT' )
the OrientDB REST interface returns:
{
"result": [{
"@type": "d", "@version": 0,
"name": "Italy",
"isoCodes": [
{
"@type": "d", "@version": 0,
"code": "iso3166-A2",
"value": "IT"
},
{
"@type": "d", "@version": 0,
"code": "iso3166-A3",
"value": "ITA"
}],
"status": [
{
"@type": "d", "@version": 0,
"status": "1",
"startingDate": "2012-04-24"
}]
}, {
"@type": "d", "@version": 0,
"name": "United Kingdom",
"isoCodes": [
{
"@type": "d", "@version": 0,
"code": "iso3166-A2",
"value": "GB"
},
{
"@type": "d", "@version": 0,
"code": "iso3166-A3",
"value": "GBR"
}],
"status": [
{
"@type": "d", "@version": 0,
"status": "1",
"startingDate": "2012-04-24"
}]
}
]
}
Hope it helps!!.
Regards.
