Elasticsearch Query String Query not working with synonym analyzer

Elasticsearch Query String Query not working with synonym analyzer - java

I am trying to configure elastic search with synonyms.
These are my settings:
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
Mappings config:
"category": {
"properties": {
"name": {
"type":"string",
"search_analyzer" : "category_synonym",
"index_analyzer" : "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And the list of my synonyms
film => video,
ooh => panels , poster,
commercial => advertisement,
print => magazine
I must say that I am using Elasticsearch Java API.
I am using QueryBuilders.queryStringQuery because this is the only way how I set analyzers to my request.
So, when I am making:
QueryBuilders.queryStringQuery("name:film").analyzer(analyzer)
It returns me
[
{
"id": 71,
"name": "Pitch video",
"description": "... ",
"parent": null
},
{
"id": 25,
"name": "Video",
"description": "... ",
"parent": null
}
]
That is perfect for me, but when I am calling something like this
QueryBuilders.queryStringQuery("name:vid").analyzer(analyzer)
I expect that it should return same objects, but there is nothing: []
So, I added asterisk to queryStringQuery:
QueryBuilders.queryStringQuery("name:vid*").analyzer(analyzer)
Works well, but now
QueryBuilders.queryStringQuery("name:film*").analyzer(analyzer)
returns me []
So, how can I configure my elastic search that it will return same objects when I am searching video, vid, film and fil?
Thanks in advance!

Hm, I don't think Elasticsearch will know to "translate" fil into vid :-). So, I think you need edgeNGrams for this, both at indexing and search time.
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter",
"my_edgeNGram_filter"
]
},
"standard_edgeNGram": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter",
"my_edgeNGram_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"my_edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "string",
"analyzer": "category_synonym",
"index_analyzer": "standard_edgeNGram",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST test/test/1
{"name": "Pitch video"}
POST test/test/2
{"name": "Video"}
GET /test/test/_search
{
"query": {
"query_string": {
"query": "name:fil"
}
}
}

Related

Spring Data for ElasticSearch save method increments count (incorrect) even for update

I'm having an issue in which the count keeps incrementing by 1 whenever I do an update to a document using save and the count's supposed to remain the same. If I create a document with save then the count is incremented by 2. Am I setting something wrong?
This is my settings for the ElasticSearch index:
{
"aliases": {
"case": {}
},
"mappings": {
"_doc": {
"dynamic": false,
"properties": {
"created": {
"index": true,
"type": "date"
},
"modified": {
"index": true,
"type": "date"
},
"type": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
},
"states": {
"type": "nested",
"properties": {
"from": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
},
"to": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
},
"event": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
}
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
},
"analysis": {
"normalizer": {
"lower_case_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
}
}
This is how I insert a document into ES:
public Case createCase(final Case case) throws UnableToGenerateUUIDException {
final UUID caseId = uuidService.getNowTimeUUID();
final Instant now = Instant.now();
case.setCreated(now);
case.setModified(now);
case.setId(caseId.toString());
return caseRepository.save(case);
}

This is the bug from the Chrome extension I used, not from Spring Data. I apologize for the mistake. I have verified that the count() from Spring Data reflects the correct number.

Why does ElasticSearch is not showing the score?

I am using ElasticSearch 2.3.1 on Ubuntu 16.04.
The mapping is:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"presentation": {
"type": "string",
"analyzer": "2gramsto3"
},
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
The query is:
{
"size": 20,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
[
{
"match": {
"cv": "folletto"
}
},
{
"match": {
"cv": " psicologia"
}
},
{
"match": {
"cv": " tenacia"
}
}
]
]
}
}
]
}
}
}
It found 14567 documents but the score is always "_score": 0
I read the filters have the score, so, why not in this case?
Thank you!

The score is not calculated for filters. You need to use a normal query if you need scores.
Just take into account implications pointed out at the documentation below.
Ref doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html

How to implement an autocomplete search field (suggestor) with an existing ElasticSearch index?

The ES index consists of 2 types that are implicitly mapped (default mapping). One type is "person" or an author, the 2nd type is "document".
The index has some 500k entries.
What I have to do is: implement an autocomplete (suggestions) functionality where only the fields "title", "classification" (document) and "name" (author) are relevant for the suggestions shown to the user.
Could it be done without changing the 500k docs in the index?
I found some tutorials that suggest preparing a specific mapping and also altering the documents (this I want to avoid if possible) and so on but I am new to this and I am not sure how to go about the this problem?
Below is the JSON for the index, and how the documents look:
//a Document
{
"rawsource": "Phys.Rev. D67 (2003) 084031",
"pubyear": 2003,
"citedFrom": 19,
"topics": [
{
"name": "General Relativity and Quantum Cosmology"
}
],
"cited": [
{
"ref": 0,
"id": "PN132433"
},
{
"ref": 1,
"id": "PN206900"
}
],
"id": "PN120001",
"collection": "PN",
"source": "Phys Rev D",
"classification": "Physics",
"title": "Observables in causal set cosmology",
"url": "http://arxiv.org/abs/gr-qc/0210061",
"authors": [
{
"name": "Brightwell, Graham"
},
{
"name": "Dowker, H. Fay"
},
{
"name": "Garcia, Raquel S."
},
{
"name": "Henson, Joe"
},
{
"name": "Sorkin, Rafael D."
}
]
}
//a Person (author)
{
"name": "Terasawa, M.",
"documents": [
{
"citedFrom": 0,
"id": "PN039187"
}
],
"coAuthors": [
{
"name": "Famiano, M. A.",
"count": "1"
},
{
"name": "Boyd, R. N.",
"count": "1"
}
],
"topics": [
{
"name": "Astrophysics",
"count": "1"
}
]
}
//the mapping (implicit/default)
{
"dlsnew": {
"aliases": {
},
"mappings": {
"person": {
"properties": {
"coAuthors": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
},
"documents": {
"properties": {
"citedFrom": {
"type": "long"
},
"id": {
"type": "string"
}
}
},
"name": {
"type": "string"
},
"referenced": {
"properties": {
"count": {
"type": "string"
},
"id": {
"type": "string"
}
}
},
"topics": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
},
"document": {
"properties": {
"abstract": {
"type": "string"
},
"authors": {
"properties": {
"name": {
"type": "string"
}
}
},
"cited": {
"properties": {
"id": {
"type": "string"
},
"ref": {
"type": "long"
}
}
},
"citedFrom": {
"type": "long"
},
"classification": {
"type": "string"
},
"collection": {
"type": "string"
},
"id": {
"type": "string"
},
"pubyear": {
"type": "long"
},
"rawsource": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string"
},
"topics": {
"properties": {
"name": {
"type": "string"
}
}
},
"url": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1454247029258",
"number_of_shards": "5",
"uuid": "k_CyQaxwSAaae67wW98HyQ",
"version": {
"created": "1050299"
},
"number_of_replicas": "1"
}
},
"warmers": {
}
}
}
The implementation is to be done using JAVA and the Vaadin Framework (this is not relevant at this point, but examples in Java/Vaadin will be most welcomed).
Thanks.

So, I think I solved my problem on the Elasticsearch side or at least to a good enough extend for me and the task at hand. I followed this ruby example.
I had to re-index all documents to accommodate the new settings for my index and to change my mapping explicitly.
They key is in defining proper analyzers and an edgeNGram filter in this case, like so:
"settings": {
"index": {
"analysis": {
"filter": {
"def_ngram_filter": {
"min_gram": "1",
"side": "front",
"type": "edgeNGram",
"max_gram": "16"
}
},
"analyzer": {
"def_search_analyzer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "def_tokenizer"
},
"def_ngram_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"def_ngram_filter"
],
"type": "custom",
"tokenizer": "def_tokenizer"
},
"def_shingle_analyzer": {
"filter": [
"shingle",
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "def_tokenizer"
},
"def_default_analyzer": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "def_tokenizer"
}
},
"tokenizer": {
"def_tokenizer": {
"type": "whitespace"
}
}
}
}
}
and the use these in the mapping for the fields to be searched, like so:
"mappings": {
"person": {
"properties": {
"coAuthors": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
},
"documents": {
"properties": {
"citedFrom": {
"type": "long"
},
"id": {
"type": "string"
}
}
},
"name": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
},
"referenced": {
"properties": {
"count": {
"type": "string"
},
"id": {
"type": "string"
}
}
},
"topics": {
"properties": {
"count": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
},
"document": {
"properties": {
"abstract": {
"type": "string"
},
"authors": {
"properties": {
"name": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
}
}
},
"cited": {
"properties": {
"id": {
"type": "string"
},
"ref": {
"type": "long"
}
}
},
"citedFrom": {
"type": "long"
},
"classification": {
"type": "string"
},
"collection": {
"type": "string"
},
"id": {
"type": "string"
},
"pubyear": {
"type": "long"
},
"rawsource": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
},
"topics": {
"properties": {
"name": {
"type": "string",
"analyzer": "def_default_analyzer",
"fields": {
"ngrams": {
"type": "string",
"index_analyzer": "def_ngram_analyzer",
"search_analyzer": "def_search_analyzer"
},
"shingles": {
"type": "string",
"analyzer": "def_shingle_analyzer"
},
"stemmed": {
"type": "string",
"analyzer": "def_snowball_analyzer"
}
}
}
}
},
"url": {
"type": "string"
}
}
}
}
then querying the index with the following works as expected:
curl -XGET "http://localhost:9200/_search " -d'
{
"size": 5,
"query": {
"multi_match": {
"query": "physics",
"type": "most_fields",
"fields": [
"document.title^10",
"document.title.shingles^2",
"document.title.ngrams",
"person.name^10",
"person.name.shingles^2",
"person.name.ngrams",
"document.topics.name^10",
"document.topics.name.shingles^2",
"document.topics.name.ngrams"
],
"operator": "and"
}
}
}'
Hope this will help someone, it is probably not the best example as I am a complete noob to this, but it worked for me.

There exist different Autocomplete components for Vaadin.
Have a look at this link.
Depending on which Add-On you choose, the databinding is done differently, but you have to "connect" it to your index.

Elasticsearch : How to retrieve only the desired nested objects

I have the below mapping structure for my Elasticsearch index.
{
"users": {
"mappings": {
"user-type": {
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"details": {
"type": "nested",
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"views": {
"type": "nested",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"properties": {
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
}
}
}
}
}
}
Basically I want to retrieve ONLY the view object inside details based on index id & view id(details.views.id).
I have tried with the below java code.But seems to be not working.
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(QueryBuilders.termQuery("_id", sid))
.setPostFilter(FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id)));
Below is the query structure for this java code.
{
"query": {
"term": {
"_id": "123"
}
},
"post_filter": {
"nested": {
"filter": {
"term": {
"details.views.id": "def"
}
},
"path": "details.views"
}
}
}

Since details is nested and view is nested inside details, you basically need two nested filters as well (one for each level) + the constraint on the _id field is best done with the ids query. The query DSL would look like this:
{
"query": {
"ids": {
"values": [
"123"
]
}
},
"post_filter": {
"nested": {
"filter": {
"nested": {
"path": "details.view",
"filter": {
"term": {
"details.views.id": "def"
}
}
}
},
"path": "details"
}
}
}
Translating this into Java code yields:
// 2nd-level nested filter
FilterBuilder detailsView = FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id));
// 1st-level nested filter
FilterBuilder details = FilterBuilders.nestedFilter("details", detailsView);
// ids constraint
IdsQueryBuilder ids = QueryBuilders.idsQuery(this.type).addIds("123");
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(ids)
.setPostFilter(details);
PS: I second what #Paul said, i.e. always play around with the query DSL first and when you know you have zeroed in on the exact query you need, then you can translate it to the Java form.

Elasticsearch Java API addMapping() and setSettings() usage

Problem: How to create an index from a json file using
The json file contains a definition for the index de_brochures. It also defines an analyzer de_analyzerwith custom filters that are used by the respective index.
As the json works with curl and Sense I assume I have to adapt the syntax of it to work with the java API.
I don't want to use XContentFactory.jsonBuilder() as the json comes from a file!
I have the following json file to create my mapping from and to set settings:
Using Sense with PUT /indexname it does create an index from this.
{
"mappings": {
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
"settings": {
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
}
As the above did not work with addMapping() alone I tried to split it into two seperate files (I realized that I had to remove the "mappings": and "settings": part):
------ Mapping json ------
{
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
------- Settings json --------
{
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
This is my java code to load and add/set the json.
CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// CREATE SETTINGS
String settings_json = new String(Files.readAllBytes(brochures_mapping_path));
createIndexRequestBuilder.setSettings(settings_json);
// CREATE MAPPING
String mapping_json = new String(Files.readAllBytes(brochures_mapping_path));
createIndexRequestBuilder.addMapping("de_brochures", mapping_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
There is no more complaint about the mapping file's structure but it now fails with the error:
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Analyzer [de_analyzer] not found for field [text]

Solution:
I managed to do it with my original json file using createIndexRequestBuilder.setSource(settings_json);

I think the problem is with structure of your mapping file.
Here is a sample example.
mapping.json
{
"en_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "en_analyzer",
"term_vector": "yes"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
String mapping = new String(Files.readAllBytes(Paths.get("mapping.json")));
createIndexRequestBuilder.addMapping('en_brochures', mapping);
CreateIndexResponse indexResponse =createIndexRequestBuilder.execute().actionGet();
This works in mine, you can try.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Elasticsearch Query String Query not working with synonym analyzer - java

Related

Spring Data for ElasticSearch save method increments count (incorrect) even for update

Why does ElasticSearch is not showing the score?

How to implement an autocomplete search field (suggestor) with an existing ElasticSearch index?

Elasticsearch : How to retrieve only the desired nested objects

Elasticsearch Java API addMapping() and setSettings() usage

Categories

Resources