Why one of the field of response has "_id" field's value? - java

I'm working on Elasticsearch Java API, and I faced weird problem.
below is the data stored:
"_index": "my_index",
"_type": "SEC",
"_id": "1111111111111111",
"_score": 0,
"_source": {
"LOG_NO": 2222222222222222
}
And I call the request like below:
QueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(query);
request.setQuery(queryBuilder);
SearchResponse searchResponse = request.get();
This is the search response:
"_index":"my_index",
"_type":"SEC",
"_id":"1111111111111111",
"_score":null,
"_source": {
"LOG_NO":1111111111111111,
}
As you can see, the "LOG_NO" of the response should be '2222222222222222' not '1111111111111111'.
the parameter 'query' is QueryBuilder and value is like below:
{
"bool": {
"must": [
{
"range": {
"LOG_GEN_TIME": {
"from": "2018-11-01 00:00:00+09:00",
"to": "2018-11-01 23:59:59+09:00",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
},
{
"bool": {
"must": [
{
"term": {
"ASSET_IP": "xx.xxx.xxx.xxx"
}
},
{
"term": {
"DST_PORT": "xx"
}
}
]
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
I don't understand what is the problem.
Any comments would be appreciated, Thanks.
-- Edited for #Val
"_index": "my_index",
"_type": "SEC",
"_id": "9197340043548295192",
"_score": null,
"_source": {
"ASSET_IP": "xx.xxx.xx.xxx",
"LOG_NO": 9197340043548295200,
"LOG_GEN_TIME": "2018-11-01 23:10:53+09:00",
"SRC_IP": "xx.xxx.xx.xxx",
"SRC_PORT": xx,
"DST_IP": "xx.xxx.xx.xxx",
"DST_PORT": xx,
"DESCRIPTION": "log",
"DST_NATION_CD": "USA",
}
}
this is the full document that I expect to be returned and the only "LOG_NO" field makes problem.

I found that the problem is javascript.
Because JS can't express the parameter that exceed the range of number, it was transformed.
My ES setting was the integer field "LOG_NO" is set as a index and the "_id" is string expression of "LOG_NO".
So there was no problem, except the integer number looks different on web.
Hope others would not suffer the same problem.

Related

Remove hyphens while search time in ElasticSearch

I want to create a search for books with ElasticSearch and SpringData.
I index my books with ISBN/EAN without hyphens and save it in my database. This data I index with ElasticSearch.
Indexed data: 1113333444444
If I'm search for a ISBN/EAN with hyphen: 111-3333-444444
There is no result. If I'm searching without hyphen, my book will be found as expected.
My settings are like this:
{
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
}
I index my fields like this:
#Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String isbn;
#Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String ean;
If I test my analyzer:
GET indexname/_analyze
{
"analyzer" : "isbn_search_analyzer",
"text" : "111-3333-444444"
}
I get following result:
{
"tokens" : [
{
"token" : "1113333444444",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
}
]
}
If I'm search like this:
GET indexname/_search
{
"query": {
"query_string": {
"fields": [ "isbn", "ean" ],
"query": "111-3333-444444"
}
}
}
I don't get any result. Have someone of you an idea?
As mentioned by #P.J.Meisch, you have done everything correct, but missed defining your field data type to text, when you define them as keyword, even though you are explicitly telling ElasticSearch to use your custom-analyzer isbn_search_analyzer, it will be ignored.
Working example on your sample data when field is defined as text.
Index mapping
{
"settings": {
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
},
"mappings": {
"properties": {
"isbn": {
"type": "text",
"analyzer": "isbn_search_analyzer"
},
"ean": {
"type": "text",
"analyzer": "isbn_search_analyzer"
}
}
}
}
Index Sample records
{
"isbn" : "111-3333-444444"
}
{
"isbn" : "111-3333-2222"
}
Search query
{
"query": {
"query_string": {
"fields": [
"isbn",
"ean"
],
"query": "111-3333-444444"
}
}
}
And search response
"hits": [
{
"_index": "65780647",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"isbn": "111-3333-444444"
}
}
]
Elasticsearch does not analyze fields of type keyword. You need to set the type to text.

ElasticSearch - JavaApi searching by each character instead of term (word)

Am fetching documents from elastic search using java api, i have the following code in my elastic search documents and am trying to search it with the following pattern.
code : MS-VMA1615-0D
Input : MS-VMA1615-0D -- Am getting the results (MS-VMA1615-0D).
Input : VMA1615 -- Am getting the results (MS-VMA1615-0D) .
Input : VMA -- Am getting the results (MS-VMA1615-0D) .
But, if i give input like below, am not getting results.
Input : V -- Am not getting the results.
INPUT : MS -- Am not getting the results.
INPUT : -V -- Am not getting the results.
INPUT : 615 -- Am not getting the results.
Am expecting to return the code MS-VMA1615-0D. In simple, am trying to search character by character instead of term (word).
It should not return the code MS-VMA1615-0D for the following cases, Because its not matching with my code.
Input : VK -- should not return the results.
INPUT : MS3 -- should not return the results.
Please find my below java code that am using
private final String INDEX = "products";
private final String TYPE = "doc";
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = null;
try {
searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
e.getLocalizedMessage();
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();
Please find my mapping details :
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "my_pattern_tokenizer",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
},
"tokenizer": {
"my_pattern_tokenizer": {
"type": "pattern",
"pattern": "-|\\d"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
After Update with new Answer :
This is my request via Java API
'SearchRequest{searchType=QUERY_THEN_FETCH, indices=[products], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[doc], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={"size":50,"query":{"match_phrase":{"code":{"query":"1615","slop":0,"boost":1.0}}}}}
' . But am getting response as null
Follow up: ElasticSearch - JavaApi searching not happening without (*) in my input query
Your mapping should look like:
PUT products
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "ngram",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
And you should be using a match_phrase query.
In Kibana:
GET products/_search
{
"query": {
"match_phrase": {
"code": "V"
}
}
}
will return the result:
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "EoGtdGQBqdof7JidJkM_",
"_score": 0.2876821,
"_source": {
"code": "MS-VMA1615-0D"
}
}
]
But this:
GET products/_search
{
"query": {
"match_phrase": {
"code": "VK"
}
}
}
wont:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Based on your comment:
Instead of using a Query string:
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
Use a match phrase query:
QueryBuilder query = QueryBuilders.matchPhraseQuery("code", code);
searchSourceBuilder.query(query);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);

elasticsearch normalizer is not working for the lowercase

I have created the below normalizer for the field code to query the elasticsearch based on both upper and lower case.
PUT my_index12
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
And am trying to search using wildcard query, am not getting any results, without normalizer am able to get with the uppercase, exactly how it is there in elasticsearch
get my_index12/_search
{
"query": {
"wildcard": {
"code.keyword": {
"value": "*AB-7000-5000-Wk-21*"
}
}
}
}
Please find my index below
{
"_index": "my_index12",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"code": "ABCq123S"
}
},
{
"_index": "my_index12",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"code": "AB-7000-5000-Wk-21"
}
}
If i try to do the mapping for code.keyword
"mappings": {
"doc": {
"properties": {
"code.keyword": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
am getting the below error while inserting documents into the index
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "object mapping for [code] tried to parse field [code] as object, but found a concrete value"
}
],
"type": "mapper_parsing_exception",
"reason": "object mapping for [code] tried to parse field [code] as object, but found a concrete value"
},
"status": 400
}

ElasticSearch hits have no fields

I am having a problem where when I run a search on elastic using the java api I get back results... but when I try and extract values from the results there are no fields.
ElasticSearch v5.3.1
Elastic API: org.elasticsearch.client:transport v5.3.0
My code:
SearchRequestBuilder srb = client.prepareSearch("documents").setTypes("documents").setSearchType(SearchType.QUERY_THEN_FETCH).setQuery(qb).setFrom(0).setSize((10)).setExplain(false);
srb.addDocValueField("title.raw");
SearchResponse response = srb.get();
response.getHits().forEach(new Consumer<SearchHit>() {
#Override
public void accept(SearchHit hit) {
System.out.println(hit);
Map<String, SearchHitField> fields = hit.getFields();
Object obj = fields.get("title.raw").getValue();
}
});
When the forEach runs the obj is always coming back null. Fields has an item in it with the key of title.raw and it has a SearchHitField.
Fields is only used when you are trying to obtain stored fields. By default you should obtain fields using the source. With the following code samples I'll try to explain it.
PUT documents
{
"settings": {
"number_of_replicas": 0,
"number_of_shards": 1
},
"mappings": {
"document": {
"properties": {
"title": {
"type": "text",
"store": true
},
"description": {
"type": "text"
}
}
}
}
}
PUT documents/document/_bulk
{"index": {}}
{"title": "help me", "description": "description of help me"}
{"index": {}}
{"title": "help me two", "description": "another description of help me"}
GET documents/_search
{
"query": {
"match": {
"title": "help"
}
},
"stored_fields": ["title"],
"_source": {
"includes": ["description"]
}
}
Next the response, notice the difference between the stored field Title and the normal field description.
{
"_index": "documents",
"_type": "document",
"_id": "AVuciB12YLj8D0X3N5We",
"_score": 0.14638957,
"_source": {
"description": "another description of help me"
},
"fields": {
"title": [
"help me two"
]
}
}

How to build an Elasticsearch-Query with startsWith-functionality and special characters

I have JsonObjects that i search with Elasticsearch from a Java Application, using the Java API to build searchQueries. The objects contain a field called "such" that contains a searchString with which the JsonObject should be found, for example a simple searchString would be "STVBBM160A". Besides the usual characters a-Z 0-9 the searchString could also look like the following examples:
"STV-157ABR", "F-G/42-W3" or "DDM000.074.6652"
The search should return results already when only the first characters are put into a searchfield, which it does for a search like "F-G/42"
My Problem: The search sometimes doesn't return results at all, but when the last character is typed it finds the right document.
What i tried: First I wanted to use a WildcardQuery where the query would be "typedStuff*", but the WildcardQuery didn't return any results at all, as soon as I typed anything but * (It used to work for other searchFields with other values)
Now I am using a QueryStringQuery, which also takes the input and puts a * character to the end. By escaping the QueryString, I am able to search for Strings like "F-G/42" and so on, but the search for "DDM000.074.6652" doesn't return any results until elasticsearch has the whole String to search. Also, when i type "STV" all results with "STV-xxxxx" (containing the "-" after STV) are returned, but not the object with "STVBBM160A", again until the whole String is given for the search (without showing any results inbetween as soon as the searchString is "STVB")
This is the query I'm using right now:
{
"size": 1000,
"min_score": 1,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "MY_DATA_TYPE",
"fields": [
"doc.db_doc_type"
]
}
},
{
"query_string": {
"query": "MY_SPECIFIC_TYPE",
"fields": [
"doc.db_doc_specific"
]
}
}
],
"should": {
"query_string": {
"query": "STV*",
"fields": [
"doc.such"
],
"boost": 3,
"escape": true
}
}
}
}
}
This is the old Query with the WildCardQuery, which doesn't return any results at all unless there is no queryString but *:
{
"size": 50,
"min_score": 1,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "MY_DATA_TYPE",
"fields": [
"doc.db_doc_type"
]
}
},
{
"query_string": {
"query": "MY_SPECIFIC_TYPE",
"fields": [
"doc.db_doc_specific"
]
}
}
],
"should": {
"wildcard": {
"doc.such": {
"wildcard": "STV*",
"boost": 3
}
}
}
}
}
}
When using a PrefixQuery, the search also doesn't return any results at all (with and without the *):
{
"size": 50,
"min_score": 1,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "MY_DATA_TYPE",
"fields": [
"doc.db_doc_type"
]
}
},
{
"query_string": {
"query": "MY_SPECIFIC_TYPE",
"fields": [
"doc.db_doc_specific"
]
}
}
],
"should": {
"prefix": {
"doc.such": {
"prefix": "HSTKV*",
"boost": 3
}
}
}
}
}
}
How can this query be changed to achieve the goal of getting all results starting with the specified String, no matter if the field doc.such also contains Numbers or special chars like "_" or "." or "/" ?
Thanks in advance
As soon as you want to query prefixes, suffixes or substring in a serious way, you need to leverage nGrams. In your case, since you're only after prefixes, an edgeNGram tokenizer would be in order. You need to change the settings of your index to be like this one:
PUT your_index
{
"settings": {
"analysis": {
"analyzer": {
"prefix_analyzer": {
"tokenizer": "prefix_tokenizer",
"filter": [
"lowercase"
]
},
"search_prefix_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"prefix_tokenizer": {
"type": "edgeNGram",
"min_gram": "1",
"max_gram": "25"
}
}
}
},
"mappings": {
"your_type": {
"properties": {
"doc": {
"properties": {
"such": {
"type": "string",
"fields": {
"starts_with": {
"type": "string",
"analyzer": "prefix_analyzer",
"search_analyzer": "search_prefix_analyzer"
}
}
}
}
}
}
}
}
}
What will happen with this analyzer is that when indexing F-G/42-W3 the following tokens will be indexed: f, f-, f-g, f-g/, f-g/4, f-g/42, f-g/42-, f-g/42-w, f-g/42-w3.
At search time, we'll simply lowercase the user input and the prefix will be matched against the indexed tokens.
Then your query can simply be transformed to a match query:
{
"size": 1000,
"min_score": 1,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "MY_DATA_TYPE",
"fields": [
"doc.db_doc_type"
]
}
},
{
"query_string": {
"query": "MY_SPECIFIC_TYPE",
"fields": [
"doc.db_doc_specific"
]
}
}
],
"should": {
"match": {
"doc.such": {
"query": "F-G/4"
}
}
}
}
}
}

Categories

Resources