ElasticSearch - matchPhraseQuery API to search with multiple fields

ElasticSearch - matchPhraseQuery API to search with multiple fields - java

Am searching for specific field which is having ngram tokenizer and am querying that field (code) using matchPhraseQuery and it is working fine.
Now, i want to search with 3 field. How can we do this.?
Please find my java code which am searching for only one field (code).
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder qb = QueryBuilders.matchPhraseQuery("code", code);
searchSourceBuilder.query(qb);
searchSourceBuilder.size(10);
searchRequest.source(searchSourceBuilder);
Please find my mappings details below :
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "ngram",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
},
"attribute" : {
"type" : "text",
"analyzer" : "custom_analyzer"
},
"term" : {
"type" : "text",
"analyzer" : "custom_analyzer"
}
}
}
}
}
Now, i want to make a search query for 3 fields code, attribute, term
I have tried the below java code, which is not working as expected :
BoolQueryBuilder orQuery = QueryBuilders.boolQuery();
QueryBuilder qb1 = QueryBuilders.matchPhraseQuery("catalog_keywords", keyword);
QueryBuilder qb2 = QueryBuilders.matchPhraseQuery("product_keywords", keyword);
orQuery.should(qb1);
orQuery.should(qb2);
orQuery.minimumShouldMatch(1);
searchSourceBuilder.query(orQuery);
searchSourceBuilder.size(10);
searchRequest.source(searchSourceBuilder);
My Input Query :
Logi
Output am getting like :
"Materiały, programy doborowe | Marketing | Katalogi, broszury"
Which is totally irrelavant to my query. Expected results is, Logiciels
And my field having value with delimiters |, so i just want only the exact match of the word/character. It should not print with all the delimiters and all.

Use bool query :
GET products/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"FIELD": "PHRASE"
}
},
{
"match_phrase": {
"FIELD": "PHRASE"
}
},
{
"match_phrase": {
"FIELD": "PHRASE"
}
}
]
}
}
}

Related

Remove hyphens while search time in ElasticSearch

I want to create a search for books with ElasticSearch and SpringData.
I index my books with ISBN/EAN without hyphens and save it in my database. This data I index with ElasticSearch.
Indexed data: 1113333444444
If I'm search for a ISBN/EAN with hyphen: 111-3333-444444
There is no result. If I'm searching without hyphen, my book will be found as expected.
My settings are like this:
{
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
}
I index my fields like this:
#Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String isbn;
#Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String ean;
If I test my analyzer:
GET indexname/_analyze
{
"analyzer" : "isbn_search_analyzer",
"text" : "111-3333-444444"
}
I get following result:
{
"tokens" : [
{
"token" : "1113333444444",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
}
]
}
If I'm search like this:
GET indexname/_search
{
"query": {
"query_string": {
"fields": [ "isbn", "ean" ],
"query": "111-3333-444444"
}
}
}
I don't get any result. Have someone of you an idea?

As mentioned by #P.J.Meisch, you have done everything correct, but missed defining your field data type to text, when you define them as keyword, even though you are explicitly telling ElasticSearch to use your custom-analyzer isbn_search_analyzer, it will be ignored.
Working example on your sample data when field is defined as text.
Index mapping
{
"settings": {
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
},
"mappings": {
"properties": {
"isbn": {
"type": "text",
"analyzer": "isbn_search_analyzer"
},
"ean": {
"type": "text",
"analyzer": "isbn_search_analyzer"
}
}
}
}
Index Sample records
{
"isbn" : "111-3333-444444"
}
{
"isbn" : "111-3333-2222"
}
Search query
{
"query": {
"query_string": {
"fields": [
"isbn",
"ean"
],
"query": "111-3333-444444"
}
}
}
And search response
"hits": [
{
"_index": "65780647",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"isbn": "111-3333-444444"
}
}
]

Elasticsearch does not analyze fields of type keyword. You need to set the type to text.

Why one of the field of response has "_id" field's value?

I'm working on Elasticsearch Java API, and I faced weird problem.
below is the data stored:
"_index": "my_index",
"_type": "SEC",
"_id": "1111111111111111",
"_score": 0,
"_source": {
"LOG_NO": 2222222222222222
}
And I call the request like below:
QueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(query);
request.setQuery(queryBuilder);
SearchResponse searchResponse = request.get();
This is the search response:
"_index":"my_index",
"_type":"SEC",
"_id":"1111111111111111",
"_score":null,
"_source": {
"LOG_NO":1111111111111111,
}
As you can see, the "LOG_NO" of the response should be '2222222222222222' not '1111111111111111'.
the parameter 'query' is QueryBuilder and value is like below:
{
"bool": {
"must": [
{
"range": {
"LOG_GEN_TIME": {
"from": "2018-11-01 00:00:00+09:00",
"to": "2018-11-01 23:59:59+09:00",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
},
{
"bool": {
"must": [
{
"term": {
"ASSET_IP": "xx.xxx.xxx.xxx"
}
},
{
"term": {
"DST_PORT": "xx"
}
}
]
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
I don't understand what is the problem.
Any comments would be appreciated, Thanks.
-- Edited for #Val
"_index": "my_index",
"_type": "SEC",
"_id": "9197340043548295192",
"_score": null,
"_source": {
"ASSET_IP": "xx.xxx.xx.xxx",
"LOG_NO": 9197340043548295200,
"LOG_GEN_TIME": "2018-11-01 23:10:53+09:00",
"SRC_IP": "xx.xxx.xx.xxx",
"SRC_PORT": xx,
"DST_IP": "xx.xxx.xx.xxx",
"DST_PORT": xx,
"DESCRIPTION": "log",
"DST_NATION_CD": "USA",
}
}
this is the full document that I expect to be returned and the only "LOG_NO" field makes problem.

I found that the problem is javascript.
Because JS can't express the parameter that exceed the range of number, it was transformed.
My ES setting was the integer field "LOG_NO" is set as a index and the "_id" is string expression of "LOG_NO".
So there was no problem, except the integer number looks different on web.
Hope others would not suffer the same problem.

ElasticSearch - JavaApi searching by each character instead of term (word)

Am fetching documents from elastic search using java api, i have the following code in my elastic search documents and am trying to search it with the following pattern.
code : MS-VMA1615-0D
Input : MS-VMA1615-0D -- Am getting the results (MS-VMA1615-0D).
Input : VMA1615 -- Am getting the results (MS-VMA1615-0D) .
Input : VMA -- Am getting the results (MS-VMA1615-0D) .
But, if i give input like below, am not getting results.
Input : V -- Am not getting the results.
INPUT : MS -- Am not getting the results.
INPUT : -V -- Am not getting the results.
INPUT : 615 -- Am not getting the results.
Am expecting to return the code MS-VMA1615-0D. In simple, am trying to search character by character instead of term (word).
It should not return the code MS-VMA1615-0D for the following cases, Because its not matching with my code.
Input : VK -- should not return the results.
INPUT : MS3 -- should not return the results.
Please find my below java code that am using
private final String INDEX = "products";
private final String TYPE = "doc";
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = null;
try {
searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
e.getLocalizedMessage();
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();
Please find my mapping details :
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "my_pattern_tokenizer",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
},
"tokenizer": {
"my_pattern_tokenizer": {
"type": "pattern",
"pattern": "-|\\d"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
After Update with new Answer :
This is my request via Java API
'SearchRequest{searchType=QUERY_THEN_FETCH, indices=[products], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[doc], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={"size":50,"query":{"match_phrase":{"code":{"query":"1615","slop":0,"boost":1.0}}}}}
' . But am getting response as null

Follow up: ElasticSearch - JavaApi searching not happening without (*) in my input query
Your mapping should look like:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "ngram",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
And you should be using a match_phrase query.
In Kibana:
GET products/_search
{
"query": {
"match_phrase": {
"code": "V"
}
}
}
will return the result:
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "EoGtdGQBqdof7JidJkM_",
"_score": 0.2876821,
"_source": {
"code": "MS-VMA1615-0D"
}
}
]
But this:
GET products/_search
{
"query": {
"match_phrase": {
"code": "VK"
}
}
}
wont:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Based on your comment:
Instead of using a Query string:
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
Use a match phrase query:
QueryBuilder query = QueryBuilders.matchPhraseQuery("code", code);
searchSourceBuilder.query(query);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);

elasticsearch aggs groupby field and orderby score

currently I implemented it from postman but I can't implement it from JAVA code. Below is the post json body.
I only want to have the full-text search for Innovation. And groupby email and orderby score. But seems the sum of the score still didn't work. Who can help me out? It worked now.
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Innovation"
}
}
]
}
},
"highlight": {
"require_field_match": false,
"pre_tags" : [ "<b>" ],
"post_tags" : [ "</b>" ],
"order" : "score",
"highlight_filter" : false,
"fields": {
"*": {}
}
},
"aggs": {
"group_by_emails": {
"terms": { "field": "email" }
}
}
}
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.from(0);
searchSourceBuilder.size(10);
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
QueryStringQueryBuilder queryString = new QueryStringQueryBuilder(keyword);
searchSourceBuilder.query(queryString);
Script script = new Script("_score;");
AggregationBuilder aggregation = AggregationBuilders
.terms("agg")
.field("email")
.order(BucketOrder.aggregation("sum_score", false))
.subAggregation(AggregationBuilders.sum("sum_score").script(script))
;
searchSourceBuilder.aggregation(aggregation);
System.out.println(searchSourceBuilder.toString());
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.source(searchSourceBuilder);

ElasticSearch stored_fields java API

I am trying to build ElasticSearch query using java API. This query uses stored_fields, can anyone please help me how to build stored_field query from java code.
{
"from": 0,
"size": 10,
"stored_fields": [
"f1",
"f2",
"f3",
"f4"
],
"query": {
"bool": {
"must": {
"match": {
"compositeField1": {
"query": "test123",
"type": "boolean",
"operator": "AND"
}
}
}
},
"sort": [
{
"_score": {}
}
]
}

Code following
SearchRequestBuilder srb = ....
srb.setFrom(0).setSize(10).storedFields("f1", "f2", "f3", "f4");
srb.addSort(SortBuilders.scoreSort());
BoolQueryBuilder bqb = new BoolQueryBuilder();
bqb.must(QueryBuilders.matchQuery("compositeField1", "test123")
.operator(Operator.AND).type(MatchQuery.Type.BOOLEAN));
srb.setQuery(bqb);
Note: ES set some default required parameters.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

ElasticSearch - matchPhraseQuery API to search with multiple fields - java

Use bool query : GET products/_search { "query": { "bool": { "must": [ { "match_phrase": { "FIELD": "PHRASE" } }, { "match_phrase": { "FIELD": "PHRASE" } }, { "match_phrase": { "FIELD": "PHRASE" } } ] } } }

Related

Remove hyphens while search time in ElasticSearch

Why one of the field of response has "_id" field's value?

ElasticSearch - JavaApi searching by each character instead of term (word)

elasticsearch aggs groupby field and orderby score

ElasticSearch stored_fields java API

Categories

Resources