I have created the normalizer below for the field code so that Elasticsearch can be queried in both upper and lower case.
PUT my_index12
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
I am trying to search using a wildcard query, but I am not getting any results. Without the normalizer I am able to get results with the uppercase value, exactly as it is stored in Elasticsearch:
GET my_index12/_search
{
"query": {
"wildcard": {
"code.keyword": {
"value": "*AB-7000-5000-Wk-21*"
}
}
}
}
Please find my indexed documents below:
{
"_index": "my_index12",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"code": "ABCq123S"
}
},
{
"_index": "my_index12",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"code": "AB-7000-5000-Wk-21"
}
}
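Note that the mapping above defines code itself as the normalized keyword field; no code.keyword subfield exists, so the wildcard on code.keyword has nothing to match. A sketch of the same query against the field that does exist, with the pattern lowercased by hand because, depending on the Elasticsearch version, wildcard patterns may not be run through the normalizer:
GET my_index12/_search
{
"query": {
"wildcard": {
"code": {
"value": "*ab-7000-5000-wk-21*"
}
}
}
}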
If I try to define the mapping for code.keyword instead:
"mappings": {
"doc": {
"properties": {
"code.keyword": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
I get the below error while inserting documents into the index:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "object mapping for [code] tried to parse field [code] as object, but found a concrete value"
}
],
"type": "mapper_parsing_exception",
"reason": "object mapping for [code] tried to parse field [code] as object, but found a concrete value"
},
"status": 400
}
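The error itself occurs because a dot in a property name is treated as an object path: declaring code.keyword under properties turns code into an object mapping, which then conflicts with the concrete string values being indexed. If a normalized code.keyword subfield is really wanted, the usual route is a multi-field, sketched here with the same doc type and normalizer as above:
"mappings": {
"doc": {
"properties": {
"code": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
}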
Related
I want to create a search for books with Elasticsearch and Spring Data.
I store my books' ISBN/EAN without hyphens in my database, and this data is what I index with Elasticsearch.
Indexed data: 1113333444444
If I search for an ISBN/EAN with hyphens, e.g. 111-3333-444444, there is no result. If I search without hyphens, my book is found as expected.
My settings are like this:
{
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
}
I index my fields like this:
@Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String isbn;
@Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String ean;
If I test my analyzer:
GET indexname/_analyze
{
"analyzer" : "isbn_search_analyzer",
"text" : "111-3333-444444"
}
I get the following result:
{
"tokens" : [
{
"token" : "1113333444444",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
}
]
}
If I search like this:
GET indexname/_search
{
"query": {
"query_string": {
"fields": [ "isbn", "ean" ],
"query": "111-3333-444444"
}
}
}
I don't get any result. Does anyone have an idea?
As mentioned by @P.J.Meisch, you have done everything correctly but missed defining your field data type as text. When you define the fields as keyword, your custom analyzer isbn_search_analyzer will be ignored, even though you explicitly tell Elasticsearch to use it.
Here is a working example on your sample data with the fields defined as text.
Index mapping
{
"settings": {
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
},
"mappings": {
"properties": {
"isbn": {
"type": "text",
"analyzer": "isbn_search_analyzer"
},
"ean": {
"type": "text",
"analyzer": "isbn_search_analyzer"
}
}
}
}
Index Sample records
{
"isbn" : "111-3333-444444"
}
{
"isbn" : "111-3333-2222"
}
Search query
{
"query": {
"query_string": {
"fields": [
"isbn",
"ean"
],
"query": "111-3333-444444"
}
}
}
And the search response:
"hits": [
{
"_index": "65780647",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"isbn": "111-3333-444444"
}
}
]
Elasticsearch does not analyze fields of type keyword. You need to set the type to text.
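Alternatively, if the fields must stay keyword, the same cleanup can be done with a normalizer, since keyword fields accept normalizers (though not analyzers). A sketch, with index and field names assumed from the question:
PUT indexname
{
"settings": {
"analysis": {
"char_filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"normalizer": {
"isbn_normalizer": {
"type": "custom",
"char_filter": ["clean_special"]
}
}
}
},
"mappings": {
"properties": {
"isbn": {
"type": "keyword",
"normalizer": "isbn_normalizer"
},
"ean": {
"type": "keyword",
"normalizer": "isbn_normalizer"
}
}
}
}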
I am using ES 7.2.1 to store a large amount of location-based data and query for nearby locations.
For location coordinates, I am using GeoPoint fields from my java codebase.
ES: 7.2.1
Spring Data Elasticsearch: 4.0.0.DATAES-690-SNAPSHOT
MVN org.elasticsearch: 7.2.1
Template:
curl -X PUT "localhost:9200/_template/store_locator_template?pretty"
{
"order": 1,
"index_patterns": [
"store_locator_*"
],
"settings": {
},
"mappings": {
"properties": {
"esId": {
"type": "keyword"
},
"geoPoint": {
"type": "geo_point"
},
"storeName": {
"type": "keyword"
}
}
}
}
When trying to insert data via bulkIndex(), I am getting this error:
org.springframework.data.elasticsearch.ElasticsearchException:
Bulk indexing has failures. Use ElasticsearchException.getFailedDocuments()
for detailed messages [{QObQeXEBqxAg6uMFyeNZ=ElasticsearchException
[Elasticsearch exception
[type=illegal_argument_exception, reason=
mapper [geoPoint] of different type,
current_type [geo_point], merged_type [ObjectMapper]]]}]
Entity:
@Getter
@Setter
@ToString
@EqualsAndHashCode(of = "esId", callSuper = false)
@NoArgsConstructor
@Document(indexName = "store_locator_index", replicas = 0, createIndex = false)
public class EsEntity {
@Id
@Field(type = FieldType.Text)
private String esId;
@GeoPointField
private GeoPoint geoPoint;
@Field(type = FieldType.Text)
private String storeName;
}
UPDATE:
If I use the below code, it works fine. It puts the mapping in place as required and Spring Data ES does not complain!
//clazz -> entity class with #Document annotation
boolean indexCreated = false;
if (!elasticsearchOperations.indexExists(clazz)) {
indexCreated = elasticsearchOperations.createIndex(clazz);
}
if (indexCreated) {
elasticsearchOperations.refresh(clazz);
elasticsearchOperations.putMapping(clazz); // --> Does the MAGIC
}
... And the mapping generated from the above code is:
{
"esentity":{ ---> Why is this here??
"properties":{
"esId":{
"type":"keyword",
"index":true
},
"geoPoint":{
"type":"geo_point"
}
}
}
}
It is adding a type, by the name of my entity class, to the mapping!
====================
Also.....
Everything seems to be working for:
ES: 6.4.3
Spring Data Elasticsearch: 3.1.X
I am able to (via template) insert document with GeoPoint field.
The index is generated automatically when doc is inserted via code.
The same set of code works fine with no error!!!!
Here's my template:
curl -X PUT "localhost:9200/_template/store_locator_template?pretty"
{
"order": 1,
"index_patterns": [
"store_locator_*"
],
"settings": {
},
"mappings": {
"store_locator_index": {
"properties": {
"esId": {
"type": "keyword"
},
"geoPoint": {
"type": "geo_point"
},
"storeName": {
"type": "keyword"
}
}
}
}
}
Here's the mapping:
{
"mapping": {
"properties": {
"esId": {
"type": "keyword"
},
"geoPoint": {
"type": "geo_point"
}
}
}
}
There are some things that don't match in the code you show:
In the first template you show, you define the storeName to be of type keyword, but on the entity you have it as type text.
A field annotated with @Id always has the type keyword; the @Field annotation defining it as type text is ignored.
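For completeness, a sketch of the entity with the annotations aligned to the first template (storeName as Keyword; the esId field will be mapped as keyword because of @Id no matter what @Field says):
@Getter
@Setter
@NoArgsConstructor
@Document(indexName = "store_locator_index", replicas = 0, createIndex = false)
public class EsEntity {
@Id // mapped as keyword regardless of the declared field type
private String esId;
@GeoPointField // matches "geoPoint": { "type": "geo_point" } in the template
private GeoPoint geoPoint;
@Field(type = FieldType.Keyword) // matches "storeName": { "type": "keyword" }
private String storeName;
}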
I used the following versions: ES 7.3.0 (don't have 7.2.1 on my machine), Spring Data 4.0 current master, client libs set to 7.3.0.
When I don't have the template defined, but create the index with the code you showed:
boolean indexCreated = false;
Class<EsEntity> clazz = EsEntity.class;
if (!elasticsearchOperations.indexExists(clazz)) {
indexCreated = elasticsearchOperations.createIndex(clazz);
}
if (indexCreated) {
elasticsearchOperations.refresh(clazz);
elasticsearchOperations.putMapping(clazz);
}
I get the following index:
{
"store_locator_index": {
"aliases": {},
"mappings": {
"properties": {
"esId": {
"type": "keyword"
},
"geoPoint": {
"type": "geo_point"
},
"storeName": {
"type": "text"
}
}
},
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "1",
"provided_name": "store_locator_index",
"creation_date": "1587073075464",
"store": {
"type": "fs"
},
"number_of_replicas": "0",
"uuid": "72aZqWDtS7KLDMwdkgVtag",
"version": {
"created": "7030099"
}
}
}
}
}
The mapping looks like it should; there is no type info in the mapping (this was written by Spring Data Elasticsearch 3.2 when using ES 6, but is not used anymore).
When I add the template you showed and then do a bulk insert with the following code:
EsEntity es1 = new EsEntity();
es1.setEsId("1");
es1.setGeoPoint(new GeoPoint(12, 34));
es1.setStoreName("s1");
IndexQuery query1 = new IndexQueryBuilder().withId("1").withObject(es1).build();
EsEntity es2 = new EsEntity();
es2.setEsId("2");
es2.setGeoPoint(new GeoPoint(56, 78));
es2.setStoreName("s2");
IndexQuery query2 = new IndexQueryBuilder().withId("2").withObject(es2).build();
elasticsearchOperations.bulkIndex(Arrays.asList(query1, query2), IndexCoordinates.of("store_locator_index"));
then the following index is created (note that storeName is of type keyword now, coming from the template):
{
"store_locator_index": {
"aliases": {},
"mappings": {
"properties": {
"_class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"esId": {
"type": "keyword"
},
"geoPoint": {
"type": "geo_point"
},
"storeName": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"creation_date": "1587073540386",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "LqzXMC5uRmKmImIzblFBOQ",
"version": {
"created": "7030099"
},
"provided_name": "store_locator_index"
}
}
}
}
and the two documents are inserted as they should:
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "store_locator_index",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"_class": "com.sothawo.springdataelastictest.EsEntity",
"esId": "1",
"geoPoint": {
"lat": 12.0,
"lon": 34.0
},
"storeName": "s1"
}
},
{
"_index": "store_locator_index",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"_class": "com.sothawo.springdataelastictest.EsEntity",
"esId": "2",
"geoPoint": {
"lat": 56.0,
"lon": 78.0
},
"storeName": "s2"
}
}
]
}
}
So I cannot find an error in the code, but you should check your templates and existing indices for conflicting entries.
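For instance, the stored template and the mapping of an existing index can be inspected like this (names taken from the question):
GET _template/store_locator_template
GET store_locator_index/_mapping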
I am trying to configure Elasticsearch with synonyms.
These are my settings:
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
Mappings config:
"category": {
"properties": {
"name": {
"type":"string",
"search_analyzer" : "category_synonym",
"index_analyzer" : "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And here is the list of my synonyms:
film => video,
ooh => panels , poster,
commercial => advertisement,
print => magazine
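To check what these rules produce, the synonym filter can be tested directly with the _analyze API (a sketch, assuming the settings above live in an index called indexname):
GET indexname/_analyze
{
"analyzer": "category_synonym",
"text": "film"
}
With the rule film => video, the only token returned should be video; fragments like vid or fil are never produced by the synonym filter alone.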
I should say that I am using the Elasticsearch Java API. I am using QueryBuilders.queryStringQuery because it is the only way I can set an analyzer on my request.
So, when I run:
QueryBuilders.queryStringQuery("name:film").analyzer(analyzer)
It returns me
[
{
"id": 71,
"name": "Pitch video",
"description": "... ",
"parent": null
},
{
"id": 25,
"name": "Video",
"description": "... ",
"parent": null
}
]
That is perfect for me, but when I call something like this:
QueryBuilders.queryStringQuery("name:vid").analyzer(analyzer)
I expect it to return the same objects, but there is nothing: []
So, I added an asterisk to the queryStringQuery:
QueryBuilders.queryStringQuery("name:vid*").analyzer(analyzer)
Works well, but now
QueryBuilders.queryStringQuery("name:film*").analyzer(analyzer)
returns [].
So, how can I configure Elasticsearch so that it returns the same objects when I search for video, vid, film, and fil?
Thanks in advance!
Hm, I don't think Elasticsearch will know to "translate" fil into vid :-). So, I think you need edgeNGrams for this, both at indexing and search time.
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter",
"my_edgeNGram_filter"
]
},
"standard_edgeNGram": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter",
"my_edgeNGram_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"my_edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "string",
"analyzer": "category_synonym",
"index_analyzer": "standard_edgeNGram",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST test/test/1
{"name": "Pitch video"}
POST test/test/2
{"name": "Video"}
GET /test/test/_search
{
"query": {
"query_string": {
"query": "name:fil"
}
}
}
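Analyzing film with the category_synonym analyzer from this index should show why the fragments now match: the synonym filter first rewrites film to video, and the edge-ngram filter then emits the grams vi, vid, vide, and video:
GET test/_analyze
{
"analyzer": "category_synonym",
"text": "film"
}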
I have the below mapping structure for my Elasticsearch index.
{
"users": {
"mappings": {
"user-type": {
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"details": {
"type": "nested",
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"views": {
"type": "nested",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"properties": {
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
}
}
}
}
}
}
Basically I want to retrieve ONLY the view object inside details, based on the index id and view id (details.views.id).
I have tried the below Java code, but it does not seem to work.
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(QueryBuilders.termQuery("_id", sid))
.setPostFilter(FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id)));
Below is the query structure generated by this Java code:
{
"query": {
"term": {
"_id": "123"
}
},
"post_filter": {
"nested": {
"filter": {
"term": {
"details.views.id": "def"
}
},
"path": "details.views"
}
}
}
Since details is nested and views is nested inside details, you basically need two nested filters as well (one for each level). The constraint on the _id field is best done with an ids query. The query DSL would look like this:
{
"query": {
"ids": {
"values": [
"123"
]
}
},
"post_filter": {
"nested": {
"filter": {
"nested": {
"path": "details.view",
"filter": {
"term": {
"details.views.id": "def"
}
}
}
},
"path": "details"
}
}
}
Translating this into Java code yields:
// 2nd-level nested filter
FilterBuilder detailsView = FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id));
// 1st-level nested filter
FilterBuilder details = FilterBuilders.nestedFilter("details", detailsView);
// ids constraint
IdsQueryBuilder ids = QueryBuilders.idsQuery(this.type).addIds("123");
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(ids)
.setPostFilter(details);
PS: I second what @Paul said: always play around with the query DSL first, and once you have zeroed in on the exact query you need, translate it into the Java form.
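Note also that a post_filter still returns the whole document as _source, not only the matching view. If you really need just the view object and your Elasticsearch version is 1.5 or later, inner_hits on nested queries returns only the matching nested objects; a sketch of the idea:
{
"query": {
"nested": {
"path": "details",
"query": {
"nested": {
"path": "details.views",
"query": {
"term": {
"details.views.id": "def"
}
},
"inner_hits": {}
}
}
}
}
}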
Problem: How to create an index from a JSON file using the Java API?
The JSON file contains a definition for the index de_brochures. It also defines an analyzer de_analyzer with custom filters that are used by the respective index.
As the JSON works with curl and Sense, I assume I have to adapt its syntax to work with the Java API.
I don't want to use XContentFactory.jsonBuilder() as the JSON comes from a file!
I have the following JSON file to create my mapping from and to set the settings. Using Sense with PUT /indexname, it does create an index from this:
{
"mappings": {
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
}
As the above did not work with addMapping() alone, I tried to split it into two separate files (I realized that I had to remove the "mappings": and "settings": parts):
------ Mapping json ------
{
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
------- Settings json --------
{
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
This is my Java code to load and apply the JSON:
CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// CREATE SETTINGS (assuming brochures_settings_path points to the settings JSON above)
String settings_json = new String(Files.readAllBytes(brochures_settings_path));
createIndexRequestBuilder.setSettings(settings_json);
// CREATE MAPPING
String mapping_json = new String(Files.readAllBytes(brochures_mapping_path));
createIndexRequestBuilder.addMapping("de_brochures", mapping_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
Elasticsearch no longer complains about the mapping file's structure, but it now fails with this error:
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Analyzer [de_analyzer] not found for field [text]
Solution:
I managed to do it with my original JSON file using createIndexRequestBuilder.setSource(settings_json);
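In code, that approach looks roughly like this (a sketch; brochures_index_path is an assumed variable pointing at the combined settings-plus-mappings file shown above):
// read the original combined JSON (settings + mappings) from the file
String index_json = new String(Files.readAllBytes(brochures_index_path));
CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// setSource() accepts the complete index definition, unlike setSettings()/addMapping()
createIndexRequestBuilder.setSource(index_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();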
I think the problem is with the structure of your mapping file. Here is a sample example:
mapping.json
{
"en_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "en_analyzer",
"term_vector": "yes"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
String mapping = new String(Files.readAllBytes(Paths.get("mapping.json")));
createIndexRequestBuilder.addMapping("en_brochures", mapping);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
This works for me; you can try it.