I am fetching documents from Elasticsearch using the Java API. My documents contain the following code, and I am trying to search it with the inputs below.
Code: MS-VMA1615-0D
Input: MS-VMA1615-0D -- I get the result (MS-VMA1615-0D).
Input: VMA1615 -- I get the result (MS-VMA1615-0D).
Input: VMA -- I get the result (MS-VMA1615-0D).
But for the inputs below, I am not getting any results.
Input: V -- no results.
Input: MS -- no results.
Input: -V -- no results.
Input: 615 -- no results.
I expect these to return the code MS-VMA1615-0D. In short, I am trying to search character by character instead of by term (word).
It should not return the code MS-VMA1615-0D for the following cases, because they do not match my code.
Input: VK -- should not return results.
Input: MS3 -- should not return results.
Please find below the Java code I am using:
// Index and type of the documents being searched
private final String INDEX = "products";
private final String TYPE = "doc";

SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);

// query_string query against the "code" field; 'code' holds the user input, e.g. "MS" or "615"
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = null;
try {
    searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
    e.getLocalizedMessage(); // note: the exception is swallowed here, so searchResponse can remain null
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();
Please find my mapping details:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
Update after trying the new answer:
This is my request via the Java API:
SearchRequest{searchType=QUERY_THEN_FETCH, indices=[products], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[doc], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={"size":50,"query":{"match_phrase":{"code":{"query":"1615","slop":0,"boost":1.0}}}}}
But I am getting the response as null.
Follow up: ElasticSearch - JavaApi searching not happening without (*) in my input query
Your mapping should look like:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "ngram",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
And you should be using a match_phrase query.
In Kibana:
GET products/_search
{
  "query": {
    "match_phrase": {
      "code": "V"
    }
  }
}
will return the result:
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "EoGtdGQBqdof7JidJkM_",
"_score": 0.2876821,
"_source": {
"code": "MS-VMA1615-0D"
}
}
]
But this:
GET products/_search
{
  "query": {
    "match_phrase": {
      "code": "VK"
    }
  }
}
won't:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
Based on your comment:
Instead of using a Query string:
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
Use a match phrase query:
QueryBuilder query = QueryBuilders.matchPhraseQuery("code", code);
searchSourceBuilder.query(query);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
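For completeness, here is a minimal end-to-end sketch of the same match_phrase search with the high-level REST client (a sketch assuming a single local node on localhost:9200 and the products/doc index from the mapping above; the SearchEngineClient wrapper from the question is replaced by a plain RestHighLevelClient):
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class CodeSearchExample {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchPhraseQuery("code", "V")) // partial input, e.g. "V" or "615"
                    .size(50);

            SearchRequest request = new SearchRequest("products")
                    .types("doc")
                    .source(source);

            SearchResponse response = client.search(request);
            for (SearchHit hit : response.getHits().getHits()) {
                System.out.println(hit.getSourceAsString()); // e.g. {"code":"MS-VMA1615-0D"}
            }
        }
    }
}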
Related
I want to create a search for books with Elasticsearch and Spring Data.
I store my books' ISBN/EAN without hyphens in my database and index this data with Elasticsearch.
Indexed data: 1113333444444
If I search for an ISBN/EAN with hyphens: 111-3333-444444
there is no result. If I search without hyphens, my book is found as expected.
My settings are like this:
{
  "analysis": {
    "filter": {
      "clean_special": {
        "type": "pattern_replace",
        "pattern": "[^a-zA-Z0-9]",
        "replacement": ""
      }
    },
    "analyzer": {
      "isbn_search_analyzer": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": [
          "clean_special"
        ]
      }
    }
  }
}
I index my fields like this:
@Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String isbn;
@Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String ean;
If I test my analyzer:
GET indexname/_analyze
{
"analyzer" : "isbn_search_analyzer",
"text" : "111-3333-444444"
}
I get the following result:
{
"tokens" : [
{
"token" : "1113333444444",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
}
]
}
If I search like this:
GET indexname/_search
{
  "query": {
    "query_string": {
      "fields": [ "isbn", "ean" ],
      "query": "111-3333-444444"
    }
  }
}
I don't get any result. Does anyone have an idea?
As mentioned by @P.J.Meisch, you have done everything correctly but missed defining your field data type as text. When you define the fields as keyword, your custom analyzer isbn_search_analyzer is ignored, even though you explicitly tell Elasticsearch to use it.
Here is a working example on your sample data with the fields defined as text.
Index mapping
{
  "settings": {
    "analysis": {
      "filter": {
        "clean_special": {
          "type": "pattern_replace",
          "pattern": "[^a-zA-Z0-9]",
          "replacement": ""
        }
      },
      "analyzer": {
        "isbn_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "clean_special"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "isbn": {
        "type": "text",
        "analyzer": "isbn_search_analyzer"
      },
      "ean": {
        "type": "text",
        "analyzer": "isbn_search_analyzer"
      }
    }
  }
}
Index Sample records
{
"isbn" : "111-3333-444444"
}
{
"isbn" : "111-3333-2222"
}
Search query
{
  "query": {
    "query_string": {
      "fields": [
        "isbn",
        "ean"
      ],
      "query": "111-3333-444444"
    }
  }
}
And the search response:
"hits": [
{
"_index": "65780647",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"isbn": "111-3333-444444"
}
}
]
Elasticsearch does not analyze fields of type keyword. You need to set the type to text.
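In Spring Data Elasticsearch terms, the fix amounts to declaring the fields as FieldType.Text and attaching the analyzer explicitly. A sketch under that assumption (the Book entity, index name and settings file path are illustrative, not taken from the question):
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

// Hypothetical entity illustrating the corrected mapping; "books" and the settings
// file location are assumptions, the settings JSON itself is the one from the question.
@Document(indexName = "books")
@Setting(settingPath = "/elasticsearch/isbn-settings.json")
public class Book {

    @Id
    private String id;

    // text (not keyword), so the custom analyzer is applied at index and search time
    @Field(type = FieldType.Text, analyzer = "isbn_search_analyzer", searchAnalyzer = "isbn_search_analyzer")
    private String isbn;

    @Field(type = FieldType.Text, analyzer = "isbn_search_analyzer", searchAnalyzer = "isbn_search_analyzer")
    private String ean;

    // getters and setters omitted
}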
I'm working with the Elasticsearch Java API, and I am facing a weird problem.
Below is the stored data:
"_index": "my_index",
"_type": "SEC",
"_id": "1111111111111111",
"_score": 0,
"_source": {
"LOG_NO": 2222222222222222
}
And I build and execute the request like below:
QueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(query);
request.setQuery(queryBuilder);
SearchResponse searchResponse = request.get();
This is the search response:
"_index":"my_index",
"_type":"SEC",
"_id":"1111111111111111",
"_score":null,
"_source": {
"LOG_NO":1111111111111111,
}
As you can see, the "LOG_NO" in the response should be '2222222222222222', not '1111111111111111'.
The parameter 'query' is a QueryBuilder whose value is like below:
{
  "bool": {
    "must": [
      {
        "range": {
          "LOG_GEN_TIME": {
            "from": "2018-11-01 00:00:00+09:00",
            "to": "2018-11-01 23:59:59+09:00",
            "include_lower": true,
            "include_upper": true,
            "boost": 1
          }
        }
      },
      {
        "bool": {
          "must": [
            {
              "term": {
                "ASSET_IP": "xx.xxx.xxx.xxx"
              }
            },
            {
              "term": {
                "DST_PORT": "xx"
              }
            }
          ]
        }
      }
    ],
    "adjust_pure_negative": true,
    "boost": 1
  }
}
I don't understand what the problem is.
Any comments would be appreciated. Thanks.
-- Edited for @Val
"_index": "my_index",
"_type": "SEC",
"_id": "9197340043548295192",
"_score": null,
"_source": {
"ASSET_IP": "xx.xxx.xx.xxx",
"LOG_NO": 9197340043548295200,
"LOG_GEN_TIME": "2018-11-01 23:10:53+09:00",
"SRC_IP": "xx.xxx.xx.xxx",
"SRC_PORT": xx,
"DST_IP": "xx.xxx.xx.xxx",
"DST_PORT": xx,
"DESCRIPTION": "log",
"DST_NATION_CD": "USA",
}
}
This is the full document that I expect to be returned; only the "LOG_NO" field causes the problem.
I found that the problem is JavaScript.
Because JS cannot represent numbers that exceed its safe integer range, the value was transformed.
In my ES setup, the integer field "LOG_NO" is indexed and the "_id" is the string representation of "LOG_NO".
So there was no real problem, except that the integer looks different on the web.
I hope others will not run into the same problem.
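For anyone who wants to see the effect outside the browser, here is a small Java sketch of the same precision loss: a JavaScript Number is an IEEE-754 double with a 53-bit mantissa, so a long of this size cannot survive the round trip.
public class LogNoPrecision {
    public static void main(String[] args) {
        long logNo = 9197340043548295192L;               // the original LOG_NO / _id value
        double asJsNumber = (double) logNo;              // what a JavaScript Number would store
        System.out.println(logNo);                       // 9197340043548295192
        System.out.println((long) asJsNumber);           // rounded to the nearest representable double
        System.out.println(logNo == (long) asJsNumber);  // false -- precision was lost
    }
}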
I am searching a specific field (code) that uses an ngram tokenizer, querying it with matchPhraseQuery, and it is working fine.
Now I want to search across 3 fields. How can I do this?
Please find below my Java code, which searches only one field (code):
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder qb = QueryBuilders.matchPhraseQuery("code", code);
searchSourceBuilder.query(qb);
searchSourceBuilder.size(10);
searchRequest.source(searchSourceBuilder);
Please find my mapping details below:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "ngram",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        },
        "attribute": {
          "type": "text",
          "analyzer": "custom_analyzer"
        },
        "term": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
Now I want to make a search query across 3 fields: code, attribute, and term.
I have tried the Java code below, which is not working as expected:
BoolQueryBuilder orQuery = QueryBuilders.boolQuery();
QueryBuilder qb1 = QueryBuilders.matchPhraseQuery("catalog_keywords", keyword);
QueryBuilder qb2 = QueryBuilders.matchPhraseQuery("product_keywords", keyword);
orQuery.should(qb1);
orQuery.should(qb2);
orQuery.minimumShouldMatch(1);
searchSourceBuilder.query(orQuery);
searchSourceBuilder.size(10);
searchRequest.source(searchSourceBuilder);
My input query:
Logi
The output I am getting:
"Materiały, programy doborowe | Marketing | Katalogi, broszury"
which is totally irrelevant to my query. The expected result is Logiciels.
My field values contain the delimiter |, and I want only the exact match of the word/characters; it should not return everything along with all the delimiters.
Use a bool query:
GET products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "FIELD": "PHRASE"
          }
        },
        {
          "match_phrase": {
            "FIELD": "PHRASE"
          }
        },
        {
          "match_phrase": {
            "FIELD": "PHRASE"
          }
        }
      ]
    }
  }
}
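In the Java API this translates to a bool query with one matchPhraseQuery per field. A sketch using the field names from your mapping (switch must to should plus minimumShouldMatch(1) if a match in any one of the three fields should be enough):
// uses org.elasticsearch.index.query.QueryBuilders / BoolQueryBuilder
String keyword = "Logi"; // the user's input

BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
        .must(QueryBuilders.matchPhraseQuery("code", keyword))
        .must(QueryBuilders.matchPhraseQuery("attribute", keyword))
        .must(QueryBuilders.matchPhraseQuery("term", keyword));

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
        .query(boolQuery)
        .size(10);

SearchRequest searchRequest = new SearchRequest("products")
        .types("doc")
        .source(searchSourceBuilder);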
I am fetching indexed documents from Elasticsearch using the Java API, but I get null as the response when the index holds a larger number of documents (2k+).
If the index has fewer documents (around 500 or so), the Java API code below works properly.
The larger number of documents in the index is what creates the issue. (Is this some kind of performance issue while fetching?)
I used the ingest-attachment processor plugin; my documents have PDFs attached.
But if I run the same query in Kibana (curl script), I get a response and can see the results.
Please find my Java code below:
private final static String ATTACHMENT = "document_attachment";
private final static String TYPE = "doc";

public static void main(String args[]) {
    RestHighLevelClient restHighLevelClient = null;
    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    // query_string query on the text extracted by the ingest-attachment processor
    SearchRequest contentSearchRequest = new SearchRequest(ATTACHMENT);
    SearchSourceBuilder contentSearchSourceBuilder = new SearchSourceBuilder();
    contentSearchRequest.types(TYPE);
    QueryStringQueryBuilder attachmentQB = new QueryStringQueryBuilder("Activa");
    attachmentQB.defaultField("attachment.content");
    contentSearchSourceBuilder.query(attachmentQB);
    contentSearchSourceBuilder.size(50);
    contentSearchRequest.source(contentSearchSourceBuilder);

    SearchResponse contentSearchResponse = null;
    try {
        contentSearchResponse = restHighLevelClient.search(contentSearchRequest); // returning null response
    } catch (IOException e) {
        e.getLocalizedMessage(); // the exception is swallowed here, so contentSearchResponse stays null
    }

    System.out.println("Request --->" + contentSearchRequest.toString());
    System.out.println("Response --->" + contentSearchResponse.toString());
    SearchHit[] contentSearchHits = contentSearchResponse.getHits().getHits();
    long contenttotalHits = contentSearchResponse.getHits().totalHits;
    System.out.println("condition Total Hits --->" + contenttotalHits);
}
Please find the script I am using in Kibana; I am getting a response for the script below.
GET document_attachment/_search?pretty
{
  "query": {
    "match": { "attachment.content": "Activa" }
  }
}
Please find below the search request from the Java API:
SearchRequest{searchType=QUERY_THEN_FETCH, indices=[document_attachment], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[doc], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={"size":50,"query":{"match":{"attachment.content":{"query":"Activa","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}}
Please find my mapping details:
{
  "document_attachment": {
    "mappings": {
      "doc": {
        "properties": {
          "app_language": {
            "type": "text"
          },
          "attachment": {
            "properties": {
              "author": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "content": {
                "type": "text",
                "analyzer": "custom_analyzer"
              },
              "content_length": {
                "type": "long"
              },
              "content_type": {
                "type": "text"
              },
              "date": {
                "type": "date"
              },
              "language": {
                "type": "text"
              },
              "title": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "catalog_description": {
            "type": "text"
          },
          "fileContent": {
            "type": "text"
          }
        }
      }
    }
  }
}
Please find my ingest pipeline details:
PUT _ingest/pipeline/document_attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "fileContent"
      }
    }
  ]
}
I get this error only when I search on attachment.content; if I search on some other field I am able to get results.
I am using Elasticsearch version 6.2.3.
Please find the error below.
org.apache.http.ContentTooLongException: entity content is too long [105539255] for the configured buffer limit [104857600]
at org.elasticsearch.client.HeapBufferedAsyncResponseConsumer.onEntityEnclosed(HeapBufferedAsyncResponseConsumer.java:76)
at org.apache.http.nio.protocol.AbstractAsyncResponseConsumer.responseReceived(AbstractAsyncResponseConsumer.java:131)
at org.apache.http.impl.nio.client.MainClientExec.responseReceived(MainClientExec.java:315)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseReceived(DefaultClientExchangeHandlerImpl.java:147)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:303)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
at com.es.utility.DocumentSearch.main(DocumentSearch.java:88)
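Not a confirmed fix, but a sketch under the assumption that the oversized response comes from every hit carrying the full extracted PDF text in _source: excluding attachment.content (and the raw fileContent) from the returned source keeps the response under the low-level client's default 100 MB buffer while still searching on that field.
// Search on attachment.content but do not ship the huge extracted text back in _source.
// Field names come from the mapping above; whether this alone is sufficient for your
// data volume is an assumption.
SearchSourceBuilder contentSearchSourceBuilder = new SearchSourceBuilder();
contentSearchSourceBuilder.query(new QueryStringQueryBuilder("Activa").defaultField("attachment.content"));
contentSearchSourceBuilder.size(50);
contentSearchSourceBuilder.fetchSource(
        null,                                                    // includes: keep everything else
        new String[] { "attachment.content", "fileContent" });   // excludes: the large extracted text
contentSearchRequest.source(contentSearchSourceBuilder);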
Currently I have implemented this from Postman, but I can't implement it from Java code. Below is the POST JSON body.
I only want a full-text search for Innovation, grouped by email and ordered by score, but the sum of the scores didn't seem to work at first. Who can help me out? Update: it works now with the Java code below.
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "Innovation"
          }
        }
      ]
    }
  },
  "highlight": {
    "require_field_match": false,
    "pre_tags": [ "<b>" ],
    "post_tags": [ "</b>" ],
    "order": "score",
    "highlight_filter": false,
    "fields": {
      "*": {}
    }
  },
  "aggs": {
    "group_by_emails": {
      "terms": { "field": "email" }
    }
  }
}
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.from(0);
searchSourceBuilder.size(10);
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

// full-text query, e.g. keyword = "Innovation"
QueryStringQueryBuilder queryString = new QueryStringQueryBuilder(keyword);
searchSourceBuilder.query(queryString);

// terms aggregation on email, ordered by the summed document scores
Script script = new Script("_score;");
AggregationBuilder aggregation = AggregationBuilders
        .terms("agg")
        .field("email")
        .order(BucketOrder.aggregation("sum_score", false))
        .subAggregation(AggregationBuilders.sum("sum_score").script(script));
searchSourceBuilder.aggregation(aggregation);

System.out.println(searchSourceBuilder.toString());
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.source(searchSourceBuilder);
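The only part of the Postman body not reproduced above is the highlight section; if you need it, it can be added to the same SearchSourceBuilder roughly like this (a sketch based on the JSON body; the restHighLevelClient variable is assumed, its construction is not shown in the question):
// org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder
HighlightBuilder highlightBuilder = new HighlightBuilder()
        .requireFieldMatch(false)
        .preTags("<b>")
        .postTags("</b>")
        .order("score")
        .field("*"); // highlight every matching field, as in the JSON body
searchSourceBuilder.highlighter(highlightBuilder);

SearchResponse searchResponse = restHighLevelClient.search(searchRequest);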