ElasticSearch edge-ngram not working - java

I have configured my index with the following settings, and a match_all query returns documents that have the value "trial" in the field IPRANGE.
The settings:
{
"settings" : {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 5
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings" : {
"users" : {
"properties" : {
"IPRANGE" : {
"type" : "string",
"analyzer" : "autocomplete"
}
}
}
},
refresh_interval: "1000"
}
But when I search with the following payload, it returns no results (0 hits).
URL:
http://xxxxxx:9200/db2/users/_search
Payload:
{
"query": {
"match": {
"IPRANGE": "tr"
}
}
}
What could be the issue?

How have you indexed the document? Here is an example that works.
I changed the mapping so that the autocomplete analyzer is used to index the IPRANGE field, while the default analyzer is used when searching against the field (you don't want to split the search term in the same way).
POST http://localhost:9200/test
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 5
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"users": {
"properties": {
"IPRANGE": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
}
Index the document:
POST http://localhost:9200/test/users/1/
{
"IPRANGE":"trial"
}
Search request:
POST http://localhost:9200/test/users/_search
{
"query": {
"match": {
"IPRANGE": "tr"
}
}
}
Returns the following result:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
"_index": "test",
"_type": "users",
"_id": "1",
"_score": 0.30685282,
"_source": {
"IPRANGE": "trial"
}
}
]
}
}
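To see why a document containing "trial" can be found with the search term "tr" when an edge_ngram filter with min_gram 1 and max_gram 5 runs at index time, here is a minimal sketch in plain Java. It only imitates the idea of the filter and is not Elasticsearch's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class: imitates the idea of lowercase + edge_ngram in plain Java.
class EdgeNgramSketch {

    // Returns the prefixes of `term` whose length is between minGram and maxGram.
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        String lower = term.toLowerCase(Locale.ROOT); // stands in for the "lowercase" filter
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNgrams("trial", 1, 5);
        System.out.println(indexed);                // [t, tr, tri, tria, trial]
        System.out.println(indexed.contains("tr")); // true
    }
}
```

Because the prefix "tr" is stored as its own token, a search term that is left unsplit can match it directly.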

Related

Remove hyphens while search time in ElasticSearch

I want to create a search for books with ElasticSearch and SpringData.
I save the ISBN/EAN of my books without hyphens in my database, and I index this data with ElasticSearch.
Indexed data: 1113333444444
If I search for an ISBN/EAN with hyphens: 111-3333-444444
there is no result. If I search without hyphens, my book is found as expected.
My settings are like this:
{
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
}
I index my fields like this:
@Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String isbn;
@Field(type = FieldType.Keyword, searchAnalyzer = "isbn_search_analyzer")
private String ean;
If I test my analyzer:
GET indexname/_analyze
{
"analyzer" : "isbn_search_analyzer",
"text" : "111-3333-444444"
}
I get following result:
{
"tokens" : [
{
"token" : "1113333444444",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
}
]
}
If I search like this:
GET indexname/_search
{
"query": {
"query_string": {
"fields": [ "isbn", "ean" ],
"query": "111-3333-444444"
}
}
}
I don't get any result. Does anyone have an idea?
As mentioned by @P.J.Meisch, you have done everything correctly, except that you did not define your fields' data type as text. When fields are defined as keyword, your custom analyzer isbn_search_analyzer is ignored, even though you explicitly tell ElasticSearch to use it.
Here is a working example on your sample data, with the fields defined as text.
Index mapping
{
"settings": {
"analysis": {
"filter": {
"clean_special": {
"type": "pattern_replace",
"pattern": "[^a-zA-Z0-9]",
"replacement": ""
}
},
"analyzer": {
"isbn_search_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"clean_special"
]
}
}
}
},
"mappings": {
"properties": {
"isbn": {
"type": "text",
"analyzer": "isbn_search_analyzer"
},
"ean": {
"type": "text",
"analyzer": "isbn_search_analyzer"
}
}
}
}
Index Sample records
{
"isbn" : "111-3333-444444"
}
{
"isbn" : "111-3333-2222"
}
Search query
{
"query": {
"query_string": {
"fields": [
"isbn",
"ean"
],
"query": "111-3333-444444"
}
}
}
And search response
"hits": [
{
"_index": "65780647",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"isbn": "111-3333-444444"
}
}
]
Elasticsearch does not analyze fields of type keyword. You need to set the type to text.
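As an illustration of what the clean_special filter does once it is actually applied, here is a minimal sketch in plain Java that uses the same regular expression as the settings above. It is not Elasticsearch code; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: applies the clean_special pattern outside of Elasticsearch.
class IsbnNormalizeSketch {

    // Same regular expression as the pattern_replace filter in the settings above.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Removes every character that is not a letter or a digit.
    static String normalize(String raw) {
        return NON_ALPHANUMERIC.matcher(raw).replaceAll("");
    }

    public static void main(String[] args) {
        String indexed = normalize("1113333444444");
        String searched = normalize("111-3333-444444");
        System.out.println(searched);                 // 1113333444444
        System.out.println(indexed.equals(searched)); // true
    }
}
```

Once both the stored value and the search term pass through the same normalization, the hyphenated and the plain form end up as the same token.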

Passing array as parameter in filter function with spring-data-mongodb

I have this collection of documents:
[
{
"name": "name1",
"data": [
{
"numbers": ["1","2","3"]
}
]
},
{
"name": "name2",
"data": [
{
"numbers": ["2","5","3"]
}
]
},
{
"name": "name3",
"data": [
{
"numbers": ["1","5","2"]
}
]
},
{
"name": "name4",
"data": [
{
"numbers": ["1","4","3"]
}
]
},
{
"name": "name5",
"data": [
{
"numbers": ["1","2"]
}
]
}
]
I want to get all documents of this collection when an array passed as a parameter is a subset of data.numbers.
This is the aggregation that I'm using.
db.testing.aggregate(
[
{ "$match" : { "data.numbers" : { "$exists" : true } } },
{ "$project" : { "is_subset" : { "$filter" : { "input" : "$data", "as" : "d", "cond" : { "$setIsSubset" :[ ["1"],"$$d.numbers"] } } } } },
{ "$match" : { "is_subset.0" : { "$exists" : true } } }]
);
I'm trying to reproduce the above aggregation in Spring Data MongoDB.
How do I pass an array as a parameter to the $filter and $setIsSubset functions?
operations.aggregate(
newAggregation(Testing.class,
match(where("data.numbers").exists(true)),
project().and(
filter("data")
.as("d")
.by(???))
.as("is_subset"),
match(where("is_subset.0").exists(true))
), Testing.class);
I solved my issue:
operations.aggregate(
newAggregation(Testing.class,
match(where("data.numbers").exists(true)),
project("id", "name").and(
filter("data")
.as("d")
.by(context -> new Document("$setIsSubset", Arrays.asList(numbers, "$$d.numbers"))))
.as("is_subset"),
match(where("is_subset.0").exists(true))
), Testing.class);
I created a Document with the content that I needed in the $filter condition.
new Document("$setIsSubset", Arrays.asList(numbers, "$$d.numbers"))
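To check which of the sample documents such a condition would keep, here is a minimal in-memory sketch of the subset test in plain Java. It assumes each document has a single data entry, as in the sample collection, and models only name and numbers; the class and method names are hypothetical and no MongoDB classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: in-memory model of the $setIsSubset condition.
class SetIsSubsetSketch {

    // Mirrors { $setIsSubset: [ wanted, numbers ] }: every wanted value must occur in numbers.
    static boolean isSubset(List<String> wanted, List<String> numbers) {
        return new HashSet<>(numbers).containsAll(wanted);
    }

    // Names of the documents whose numbers contain all wanted values.
    static List<String> matchingNames(Map<String, List<String>> docs, List<String> wanted) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            if (isSubset(wanted, doc.getValue())) {
                names.add(doc.getKey());
            }
        }
        return names;
    }

    // The sample collection from the question, reduced to name -> numbers.
    static Map<String, List<String>> sampleDocs() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("name1", Arrays.asList("1", "2", "3"));
        docs.put("name2", Arrays.asList("2", "5", "3"));
        docs.put("name3", Arrays.asList("1", "5", "2"));
        docs.put("name4", Arrays.asList("1", "4", "3"));
        docs.put("name5", Arrays.asList("1", "2"));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(matchingNames(sampleDocs(), Arrays.asList("1")));      // [name1, name3, name4, name5]
        System.out.println(matchingNames(sampleDocs(), Arrays.asList("1", "2"))); // [name1, name3, name5]
    }
}
```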

Mongo db java driver query convert

I have the following data structure
[{
"id": "1c7bbebd-bc3d-4352-9ac0-98c01d13189d",
"version": 0,
"groups": [
{
"internalName": "Admin group",
"fields": [
{
"internalName": "Is verified",
"uiProperties": {
"isShow": true
}
},
{
"internalName": "Hide",
"uiProperties": {
"isHide": false
}
},
...
]
},
...
]
},
{
"id": "2b7bbebd-bc3d-4352-9ac0-98c01d13189d",
"version": 0,
"groups": [
{
"internalName": "User group",
"fields": [
{
"internalName": "Is verified",
"uiProperties": {
"isShow": true
}
},
{
"internalName": "Blocked",
"uiProperties": {
"isBlocked": true
}
},
...
]
},
...
]
},
...
]
Internal names of the fields can be repeated. I want to group by groups.fields.internalName, slice the resulting array (for pagination), and get output like:
{
"totalCount": 3,
"items": [
{
"internalName": "Blocked"
},
{
"internalName": "Hide"
},
{
"internalName": "Is verified"
}
]}
I wrote a query that works,
db.layouts.aggregate(
{
$unwind : "$groups"
},
{
$unwind : "$groups.fields"
},
{
$group: {
"_id" : {
"internalName" : "$groups.fields.internalName",
},
"internalName" : {
$first : "$groups.fields.internalName"
}
}
},
{
$group: {
"_id" : null,
"items" : {
$push : "$$ROOT"
},
"totalCount" : {
$sum : 1
}
}
},
{
$project: {
"items" : {
$slice : [ "$items", 0, 20 ]
},
"totalCount": 1
}
})
but I have a problem translating it to the Java API. Note that I need to use the mongoTemplate approach. Here is what I have and where I'm stuck:
final List<AggregationOperation> aggregationOperations = new ArrayList<>();
aggregationOperations.add(unwind("groups"));
aggregationOperations.add(unwind("groups.fields"));
aggregationOperations.add(
group("groups.fields.internalName")
.first("groups.fields.internalName").as("internalName")
);
aggregationOperations.add(
group()
.push("$$ROOT").as("fields")
.sum("1").as("totalCount") // ERROR only string ref can be placed, but i need a number?
);
aggregationOperations.add(
project()
.andInclude("totalCount")
.and("fields").slice(size, page * size)
);
final Aggregation aggregation = newAggregation(aggregationOperations);
mongoTemplate.aggregate(aggregation, LAYOUTS, FieldLites.class).getMappedResults()
With this query I have a problem with sum(), because the API only lets me pass a String reference (but I need a number), and with the project operation, which throws an exception:
java.lang.IllegalArgumentException: Invalid reference 'totalCount'!] with root cause
Can you help me with this query translation?
You can use count() instead of sum(); it adds 1 for every grouped document:
group()
.push("$$ROOT").as("fields")
.count().as("totalCount")
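As a sanity check on what the pipeline is meant to return, here is a minimal in-memory sketch in plain Java of "distinct names, total count, one page". It does not use MongoDB or Spring Data. The class and method names are hypothetical, and sorting the names alphabetically is an assumption made to get a stable order like the expected output in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class: in-memory model of group + count + slice.
class DistinctPageSketch {

    // Holds the two values the final $project stage exposes.
    static final class Page {
        final int totalCount;
        final List<String> items;

        Page(int totalCount, List<String> items) {
            this.totalCount = totalCount;
            this.items = items;
        }
    }

    // Deduplicates the names, counts them, and cuts out `size` items starting at `offset`.
    static Page page(List<String> internalNames, int offset, int size) {
        List<String> distinct = new ArrayList<>(new TreeSet<>(internalNames)); // sorted, no duplicates
        int from = Math.min(offset, distinct.size());
        int to = Math.min(from + size, distinct.size());
        return new Page(distinct.size(), new ArrayList<>(distinct.subList(from, to)));
    }

    public static void main(String[] args) {
        // Field names from the two sample layouts after unwinding groups and fields.
        List<String> names = Arrays.asList("Is verified", "Hide", "Is verified", "Blocked");
        Page first = page(names, 0, 20);
        System.out.println(first.totalCount); // 3
        System.out.println(first.items);      // [Blocked, Hide, Is verified]
    }
}
```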

Why is ElasticSearch not showing the score?

I am using ElasticSearch 2.3.1 on Ubuntu 16.04.
The mapping is:
{
"settings": {
"analysis": {
"filter": {
"2gramsto3_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3
}
},
"analyzer": {
"2gramsto3": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"2gramsto3_filter"
]
}
}
}
},
"mappings": {
"agents": {
"properties": {
"presentation": {
"type": "string",
"analyzer": "2gramsto3"
},
"cv": {
"type": "string",
"analyzer": "2gramsto3"
}
}
}
}
}
The query is:
{
"size": 20,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
[
{
"match": {
"cv": "folletto"
}
},
{
"match": {
"cv": " psicologia"
}
},
{
"match": {
"cv": " tenacia"
}
}
]
]
}
}
]
}
}
}
It finds 14567 documents, but the score is always "_score": 0.
I read that filters have a score, so why not in this case?
Thank you!
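Scores are not calculated for clauses in a filter context, and every match clause in your query sits inside the filter section of the bool query. If you need scores, run those clauses in a query context instead, for example in the must section of the bool query.
Take into account the implications pointed out in the documentation below.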
Ref doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html

ElasticSearch Boolean Query Result Mismatch

This is my index mapping:
{
"index":{
"mappings":{
"patient":{
"properties":{
"LastName":{
"type":"string"
},
"accountType":{
"type":"string"
},
"civilStatus":{
"type":"string"
},
"consultations":{
"type":"nested",
"properties":{
"deleted":{
"type":"boolean"
},
"diagnosis":{
"type":"string",
"index":"not_analyzed"
},
"documentDate":{
"type":"date",
"format":"dateOptionalTime"
},
"firstName":{
"type":"string"
},
"lastName":{
"type":"string"
},
"middleName":{
"type":"string"
},
"prescriptions":{
"type":"string"
}
}
},
"firstName":{
"type":"string"
},
"gender":{
"type":"string"
},
"id":{
"type":"string",
"index":"not_analyzed"
},
"lastName":{
"type":"string"
},
"middleName":{
"type":"string"
},
"occupation":{
"type":"string"
},
"owner":{
"type":"string",
"index":"not_analyzed"
},
"patientPin":{
"type":"string"
}
}
}
}
}
}
Here's the only saved data on ElasticSearch
{
"_index":"index",
"_type":"patient",
"_id":"TENANT1100066",
"_score":1.0,
"_source":{
"id":"100066",
"firstName":"Johnny",
"patientPin":"201408000001",
"middleName":"John ",
"consultations":[
{
"id":null,
"prescriptions":[
],
"diagnosis":[
"headache of unknown origin"
],
"documentDate":"2014-08-05T10:10:00.000+08:00",
"deleted":false,
"lastName":"David",
"firstName":"Johnny ",
"middleName":"John "
}
],
"owner":"TENANT1",
"gender":"MALE",
"occupation":"Unspecified",
"accountType":"INDIVIDUAL",
"civilStatus":"SINGLE",
"lastName":"David"
}
}
And here's the sample query I built to check how boolean query works.
{
"nested" : {
"query" : {
"bool" : {
"must" : [ {
"match" : {
"consultations.diagnosis" : {
"query" : "Kawasaki's Disease",
"type" : "phrase"
}
}
}, {
"match" : {
"consultations.diagnosis" : {
"query" : "Alcohol Intoxication",
"type" : "phrase"
}
}
} ],
"must_not" : {
"match" : {
"consultations.deleted" : {
"query" : "true",
"type" : "boolean"
}
}
},
"should" : {
"match" : {
"consultations.diagnosis" : {
"query" : "headache of unknown origin",
"type" : "phrase"
}
}
}
}
},
"path" : "consultations"
}
}
Kawasaki's Disease and Fibriasis do not exist in the index, but headache of unknown origin does, yet no results are returned (I expected Johnny John David). The operation I had in mind was
(Kawasaki's Disease AND Fibriasis) OR headache of unknown origin.
In other words, if there are no patients with Kawasaki's Disease AND Fibriasis, search for patients with "headache of unknown origin". We clearly have one, but my query returns 0 results. What am I missing here?
In your query, matching documents must have both diseases (Kawasaki's Disease AND Fibriasis), because you put these two conditions in the must clause.
Your document only matches the should clause, so it doesn't appear in the search results.
To achieve what you want:
(Kawasaki's Disease AND Fibriasis) OR headache of unknown origin
you can wrap the two diseases in another bool query and add that query to the should section of the root query, like this:
{
"query": {
"nested": {
"path": "consultations",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [{
"match_phrase": {
"consultations.diagnosis": "Kawasaki's Disease"
}
},
{
"match_phrase": {
"consultations.diagnosis": "Alcohol Intoxication"
}
}
]
}
},
{
"match_phrase": {
"consultations.diagnosis": "headache of unknown origin"
}
}
],
"minimum_number_should_match": 1
}
}
}
}
}
This returns the previously indexed patient:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.3007646,
"hits": [
{
"_index": "test",
"_type": "patient",
"_id": "TENANT1100066",
"_score": 0.3007646,
"_source": {
"id": "100066",
"firstName": "Johnny",
"patientPin": "201408000001",
"middleName": "John ",
"consultations": [
{
"id": null,
"prescriptions": [],
"diagnosis": [
"headache of unknown origin"
],
"documentDate": "2014-08-05T10:10:00.000+08:00",
"deleted": false,
"lastName": "David",
"firstName": "Johnny ",
"middleName": "John "
}
],
"owner": "TENANT1",
"gender": "MALE",
"occupation": "Unspecified",
"accountType": "INDIVIDUAL",
"civilStatus": "SINGLE",
"lastName": "David"
}
}
]
}
}
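The difference between the two query shapes comes down to boolean logic, which can be sketched in plain Java. This is only a model of the matching rules, not Elasticsearch code: it compares diagnoses as exact strings (the field is not_analyzed), ignores the must_not clause on deleted, and uses hypothetical class and method names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: models which consultations each query shape accepts.
class BoolQuerySketch {

    private static boolean hasBothDiseases(List<String> diagnoses) {
        return diagnoses.contains("Kawasaki's Disease")
                && diagnoses.contains("Alcohol Intoxication");
    }

    // Original shape: both must clauses are required. When must clauses are present,
    // the should clause only influences the score, so it is left out of the match test.
    static boolean originalQuery(List<String> diagnoses) {
        return hasBothDiseases(diagnoses);
    }

    // Rewritten shape: (A AND B) OR C, with at least one should clause required.
    static boolean rewrittenQuery(List<String> diagnoses) {
        return hasBothDiseases(diagnoses)
                || diagnoses.contains("headache of unknown origin");
    }

    public static void main(String[] args) {
        List<String> patient = Arrays.asList("headache of unknown origin");
        System.out.println(originalQuery(patient));  // false
        System.out.println(rewrittenQuery(patient)); // true
    }
}
```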
