For example, I have 2 documents with this body:
{
"id": "doc_one",
"name": "test_name",
"date_creation": "some_date_cr_1",
"date_updation": "some_date_up_1"
}
And the second doc:
{
"id": "doc_two",
"name": "test_name",
"date_creation": "some_date_cr_2",
"date_updation": "some_date_up_2"
}
What I want to do: create two runtime fields, or a Map('date_creation', count_of_docs_where_field_not_null_AND_the_condition_is_met).
For example: for the 1st doc, date_creation IS NOT NULL and the condition startDate <= date_creation <= endDate is met, so I start with a counter count = 0 and do count++ each time this case occurs. Once all the docs have been processed, I set the final count value in the map as the result: Map('date_creation', final_count), and do the same for the other field in the same map.
I tried to use a script, but it returns a Map for each doc, for example:
{
"_index": "my_index_001",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_doubled_field": [
{
"NEW": 2
}
]
}
},
{
"_index": "my_index_001",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"fields": {
"my_doubled_field": [
{
"NEW": 2
}
]
}
}
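To pin down the expected result, the counting described in the question can be sketched client-side in Python. This is only an illustration of the desired semantics, not an Elasticsearch feature: it assumes the date fields hold ISO-8601 strings (the values some_date_cr_1 etc. are placeholders), that startDate and endDate are inclusive, and the function names and sample dates are hypothetical.

```python
from datetime import datetime


def count_in_range(docs: list[dict], field: str, start: datetime, end: datetime) -> int:
    """Count docs where `field` is present, not null, and start <= value <= end."""
    count = 0
    for doc in docs:
        raw = doc.get(field)
        if raw is None:  # field missing or explicitly null: not counted
            continue
        value = datetime.fromisoformat(raw)  # assumes ISO-8601 strings
        if start <= value <= end:
            count += 1
    return count


def build_count_map(docs: list[dict], fields: list[str], start: datetime, end: datetime) -> dict[str, int]:
    """One entry per field, e.g. {'date_creation': n1, 'date_updation': n2}."""
    return {field: count_in_range(docs, field, start, end) for field in fields}


# Hypothetical documents: the second one has no date_creation field.
docs = [
    {"id": "doc_one", "date_creation": "2021-01-05T00:00:00", "date_updation": "2021-02-01T00:00:00"},
    {"id": "doc_two", "date_updation": "2021-03-01T00:00:00"},
]
print(build_count_map(docs, ["date_creation", "date_updation"], datetime(2020, 1, 1), datetime(2021, 12, 31)))
# {'date_creation': 1, 'date_updation': 2}
```

The point of the sketch is that the result is a single map for the whole result set, not one map per document, which is why a per-document script field cannot produce it.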
I have indexed the 3 documents below, where one document doesn't have the date_creation field:
POST sample/_doc
{
"id": "doc_two",
"name": "test_name",
"date_updation": "some_date_up_2"
}
POST sample/_doc
{
"id": "doc_one",
"name": "test_name",
"date_creation": "some_date_cr_1",
"date_updation": "some_date_up_1"
}
POST sample/_doc
{
"id": "doc_two",
"name": "test_name",
"date_creation": "some_date_cr_2",
"date_updation": "some_date_up_2"
}
Now you can use a filter aggregation in Elasticsearch, as shown below:
POST sample/_search
{
"size": 0,
"aggs": {
"date_creation": {
"filter": {
"range": {
"date_creation": {
"gte": "2020-01-09T10:20:10"
}
}
}
},
"date_updation": {
"filter": {
"range": {
"date_updation": {
"gte": "2020-01-09T10:20:10"
}
}
}
}
}
}
Response:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"date_updation" : {
"meta" : { },
"doc_count" : 3
},
"date_creation" : {
"meta" : { },
"doc_count" : 2
}
}
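You can see that the date_updation field is present in all 3 documents, so its count is 3, and the date_creation field is present in 2 documents, so its count is 2. To apply the full startDate <= date <= endDate condition from the question, add an "lte" bound next to "gte" in each range filter.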
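If you want the counts as a single map on the client side, as described in the question, a small post-processing step over the aggregations is enough. The Python sketch below assumes every top-level aggregation in the response is a filter aggregation (so each one has a doc_count); the function name is hypothetical.

```python
import json


def counts_from_response(response: dict) -> dict[str, int]:
    """Map each top-level filter aggregation name to its doc_count."""
    return {name: agg["doc_count"] for name, agg in response["aggregations"].items()}


# The aggregations part of the response shown above, wrapped in an object.
response = json.loads("""
{
  "aggregations": {
    "date_updation": {"meta": {}, "doc_count": 3},
    "date_creation": {"meta": {}, "doc_count": 2}
  }
}
""")
print(counts_from_response(response))
# {'date_updation': 3, 'date_creation': 2}
```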
I need to do sortByCount and return all the fields instead of just _id and count.
sortByCount returns:
{ "_id" : "1", "count" : 4 }
{ "_id" : "2", "count" : 3 }
{ "_id" : "3", "count" : 2 }
{ "_id" : "4", "count" : 2 }
{ "_id" : "5", "count" : 1 }
But I need complete documents like the ones below:
{
"_id": 1,
"title": "The Pillars of Society",
"artist": "Grosz",
"year": 1926,
"tags": ["painting", "satire", "Expressionism", "caricature"]
} {
"_id": 2,
"title": "Melancholy III",
"artist": "Munch",
"year": 1902,
"tags": ["woodcut", "Expressionism"]
} {
"_id": 3,
"title": "Dancer",
"artist": "Miro",
"year": 1925,
"tags": ["oil", "Surrealism", "painting"]
} {
"_id": 4,
"title": "The Great Wave off Kanagawa",
"artist": "Hokusai",
"tags": ["woodblock", "ukiyo-e"]
} {
"_id": 5,
"title": "The Persistence of Memory",
"artist": "Dali",
"year": 1931,
"tags": ["Surrealism", "painting", "oil"]
}
Is there any way to replace the root ($replaceRoot) after $sortByCount? In Java, I don't see any push method after sortByCount.
$sortByCount is essentially a $group stage followed by a $sort on the count field produced by that $group stage. If you really want the entire documents, you can try this:
db.collection.aggregate([
{
"$group": {
"_id": {
"id": "$_id"
},
"count": {
$sum: 1
},
"reqItems": {
$push: {
"title": "$title",
"artist": "$artist"
}
}
}
},
{
$sort: {
count: -1
}
}
])
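To illustrate the $group + $sort equivalence, here is a plain-Python sketch (not pymongo or the Java driver) of what such a pipeline computes when the group stage pushes the whole document. In MongoDB, pushing "$$ROOT" instead of selected fields like title and artist is what keeps the complete document in each group. The grouping key is a parameter because the question does not say which field sortByCount was applied to; the function name is hypothetical.

```python
from collections import defaultdict


def group_push_sort(docs: list[dict], key: str) -> list[dict]:
    """Plain-Python analogue of:
    [{"$group": {"_id": "$<key>", "count": {"$sum": 1}, "docs": {"$push": "$$ROOT"}}},
     {"$sort": {"count": -1}}]
    """
    groups: dict = defaultdict(list)
    for doc in docs:
        # A missing key groups under None, similar to MongoDB's null group.
        groups[doc.get(key)].append(doc)
    result = [{"_id": k, "count": len(v), "docs": v} for k, v in groups.items()]
    # Python's sort is stable, so ties keep first-seen order; MongoDB makes no such promise.
    result.sort(key=lambda group: group["count"], reverse=True)
    return result
```

Each result entry carries the count and the full documents, so nothing beyond _id and count is lost; unwinding "docs" afterwards would give back one complete document per row.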
Hi, I have written a query for getting the avg of values at a position in Elasticsearch.
Elasticsearch payload: "userData": [ { "sub":1234, "value":678,"condition" :"A" },{ "sub":1234, "value":678,"condition" :"B" }]
{
"aggs": {
"student_data": {
"date_histogram": {
"field":"@timestamp",
"calendar_interval":"minute"
},
"aggs": {
"user_avg": {
"avg": {
"field":"value"
}
}
}
}
}
}
What I want is to get the array of values from which the average is calculated.
For example, if the avg of values for condition 'A' is 42, computed from the values {20,10,40,60,80},
I need a field in the output that provides the array [20,10,40,60,80].
I don't think you can obtain an array formatted like [20, 10, 40, 60, 80] in the response of a query. I can't think of a way to obtain it by using aggregations or scripted fields. Nevertheless, you can easily (1) get that information from the same query that specifies the aggregations and the filter logic, and then (2) post-process the query response to collect all the values of the `value` field used to calculate the average, formatting them the way you prefer. How you post-process the response depends on the client or script you use to send queries to Elasticsearch.
For example, you can output the values used to calculate the average as query hits. Adjust `size` to an upper limit that fits your use case; `_source` restricts each hit to the `value` field.
{
"size": 100,
"_source": "value",
"query": {
"match": {
"condition": "A"
}
},
"aggs": {
"user_avg": {
"avg": {
"field": "value"
}
}
}
}
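With the hits approach, the post-processing is a single pass over the hits. This Python sketch assumes each hit has a top-level `value` in its `_source`, as the query above implies; the sample response is hypothetical and trimmed, using the example values from the question. Note that only up to `size` hits are returned, so the list can be incomplete if more documents match.

```python
def values_from_hits(response: dict) -> list:
    """Collect _source.value from each hit, in the order Elasticsearch returned them."""
    values = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        if "value" in source:  # source filtering yields {} when the field is absent
            values.append(source["value"])
    return values


# Hypothetical, trimmed response.
sample = {"hits": {"hits": [{"_source": {"value": v}} for v in (20, 10, 40, 60, 80)]}}
print(values_from_hits(sample))  # [20, 10, 40, 60, 80]
```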
Or you can output the values used to calculate the average in a more compact way, by using a terms aggregation. Adjust the terms `size` to an upper limit that fits your use case.
{
"size": 0,
"_source": "value",
"query": {
"match": {
"condition": "A"
}
},
"aggs": {
"array_of_values": {
"terms": {
"field": "value",
"size": 100
}
},
"user_avg": {
"avg": {
"field": "value"
}
}
}
}
The result of the latter will be something like:
"aggregations" : {
"array_of_values" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 50,
"doc_count" : 2
},
{
"key" : 60,
"doc_count" : 1
},
{
"key" : 100,
"doc_count" : 1
}
]
},
"user_avg" : {
"value" : 65.0
}
}
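The buckets can then be expanded back into a flat list on the client, repeating each key doc_count times. This Python sketch assumes one `value` per document (doc_count counts documents, not values) and that sum_other_doc_count is 0, i.e. no buckets were cut off by `size`. The list comes out in bucket order, not in the original document order.

```python
def values_from_buckets(buckets: list[dict]) -> list:
    """Expand terms buckets into a flat list: each key repeated doc_count times."""
    values = []
    for bucket in buckets:
        values.extend([bucket["key"]] * bucket["doc_count"])
    return values


# Buckets from the response above.
buckets = [
    {"key": 50, "doc_count": 2},
    {"key": 60, "doc_count": 1},
    {"key": 100, "doc_count": 1},
]
values = values_from_buckets(buckets)
print(values)                     # [50, 50, 60, 100]
print(sum(values) / len(values))  # 65.0, matching user_avg
```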
[
{
"_class": "com.netas.netmetriks.common.model.entity.WorkOrder",
"failCount": 0,
"id": "1",
"messageType": "RESET_DCU",
"ongoingWorks": [
1
],
"status": "IN_PROGRESS",
"successCount": 0,
"type": "workorder",
"workOrderDetailMap": {
"1": {
"data": {
"_class": "com.netas.netmetriks.common.model.converted.DeviceId",
"manufacturerFlag": "DSM",
"serialNumber": "87654321"
},
"dcuId": {
"manufacturerFlag": "DSM",
"serialNumber": "87654321"
},
"id": 1,
"requestDate": "20160818114933",
"resultDocuments": [],
"status": "IN_PROGRESS"
},
"2": {
"data": {
"_class": "com.netas.netmetriks.common.model.converted.DeviceId",
"manufacturerFlag": "DSM",
"serialNumber": "87654322"
},
"dcuId": {
"manufacturerFlag": "DSM",
"serialNumber": "87654322"
},
"id": 2,
"requestDate": "20160818114934",
"resultDocuments": [],
"status": "IN_PROGRESS"
}
}
}
]
Simply put, I want to obtain the contents of the "1" and "2" objects.
I am trying to obtain data, dcuId, id, requestDate, resultDocuments, and status.
SELECT wd.* FROM netmetriks n
UNNEST workOrderDetailMap wd
WHERE n.type = 'workorder' and n.id = '1' ORDER BY n.documentId ASC LIMIT 10 OFFSET 0
I wrote a query, but I could not get rid of the "1" and "2" keys.
A HashMap is used in the entity when storing data, so the keys in the result are 1, 2, 3, 4, and so on.
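The "1" and "2" wrappers are the map keys, so the target shape is simply the map's values. As an illustration only, this Python sketch drops the keys and keeps the inner objects; it assumes the keys are numeric strings (as produced by the HashMap) and the function name is hypothetical. On the N1QL side, Couchbase has an OBJECT_VALUES() function that returns an object's values as an array, so something like UNNEST OBJECT_VALUES(n.workOrderDetailMap) AS wd may work, but that is untested against the query above.

```python
def detail_rows(work_order: dict) -> list[dict]:
    """Return the inner objects of workOrderDetailMap without the "1", "2", ... keys."""
    details = work_order.get("workOrderDetailMap", {})
    # Sort numerically so "10" comes after "2"; assumes every key is a numeric string.
    return [details[k] for k in sorted(details, key=int)]


order = {"id": "1", "workOrderDetailMap": {"2": {"id": 2}, "1": {"id": 1}}}
print(detail_rows(order))  # [{'id': 1}, {'id': 2}]
```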
I have 999 documents that I am using to experiment with Elasticsearch.
There is a field f4 in my type mapping which is analyzed and has the following analyzer settings:
"myNGramAnalyzer" => [
"type" => "custom",
"char_filter" => ["html_strip"],
"tokenizer" => "standard",
"filter" => ["lowercase","standard","asciifolding","stop","snowball","ngram_filter"]
]
My filter is as follows:
"filter" => [
"ngram_filter" => [
"type" => "edgeNGram",
"min_gram" => "2",
"max_gram" => "20"
]
]
The values of field f4 are "Proj1", "Proj2", "Proj3", and so on.
Now, when I search for the string "proj1" using cross_fields, I expect the document with "Proj1" to be returned at the top of the response with the max score. But it isn't. The rest of the data is almost the same in content.
Also, I don't understand why it matches all 999 documents.
Following is my search:
{
"index": "myindex",
"type": "mytype",
"body": {
"query": {
"multi_match": {
"query": "proj1",
"type": "cross_fields",
"operator": "and",
"fields": "f*"
}
},
"filter": {
"term": {
"deleted": "0"
}
}
}
}
My search response is :
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 999,
"max_score": 1,
"hits": [{
"_index": "myindex",
"_type": "mytype",
"_id": "42",
"_score": 1,
"_source": {
"f1": "396","f2": "125650","f3": "BH.1511AI.001",
"f4": "Proj42",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
}, {
"_index": "myindex",
"_type": "mytype",
"_id": "47",
"_score": 1,
"_source": {
"f1": "396","f2": "137946","f3": "BH.152096.001",
"f4": "Proj47",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
},
//.......
//.......
//MANY RECORDS IN BETWEEN HERE
//.......
//.......
{
"_index": "myindex",
"_type": "mytype",
"_id": "1",
"_score": 1,
"_source": {
"f1": "396","f2": "142095","f3": "BH.705215.001",
"f4": "Proj1",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
//.......
//.......
//MANY RECORDS IN BETWEEN HERE
//.......
//.......
}]
}
}
Is there anything I am doing wrong or missing? (Apologies for the lengthy question, but I wanted to give all relevant information while leaving out unnecessary code.)
EDITED:
Term vector response:
{
"_index": "myindex",
"_type": "mytype",
"_id": "10",
"_version": 1,
"found": true,
"took": 9,
"term_vectors": {
"f4": {
"field_statistics": {
"sum_doc_freq": 5886,
"doc_count": 999,
"sum_ttf": 5886
},
"terms": {
"pr": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"pro": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj1": {
"doc_freq": 111,
"ttf": 111,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj10": {
"doc_freq": 11,
"ttf": 11,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
}
}
}
}
}
EDITED 2
Mappings for field f4
"f4" : {
"type" : "string",
"index_analyzer" : "myNGramAnalyzer",
"search_analyzer" : "standard"
}
I have updated the mapping to use the standard analyzer at query time, which has improved the results, but they are still not what I expected.
Instead of 999 (all documents), it now returns 111 documents such as "Proj1", "Proj11", "Proj111", ..., "Proj181", etc.
Still, "Proj1" appears in the middle of the results and not at the top.
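The counts above can be reproduced with a small simulation of the edge n-gram filter (min_gram 2, max_gram 20). The Python sketch assumes the f4 values are exactly Proj1 to Proj999 and applies only lowercasing, ignoring the other filters (the term vector output suggests they leave these tokens unchanged); the function names are hypothetical. It is consistent with the term vectors: pr, pro and proj occur in all 999 documents, so analyzing the query with the n-gram analyzer can match everything, while the single term proj1 (with the standard search analyzer) occurs in 111 documents. Since each of those documents contains proj1 exactly once (term_freq 1), their scores are likely very close, which is a plausible reason Proj1 is not ranked first.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    """Prefixes of `token` with lengths min_gram..max_gram, like an edge n-gram filter."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def doc_freq(term: str, values: list[str]) -> int:
    """Number of values whose lowercased edge n-grams contain `term`."""
    return sum(term in edge_ngrams(v.lower()) for v in values)


f4_values = [f"Proj{i}" for i in range(1, 1000)]  # Proj1 .. Proj999

print(edge_ngrams("proj1"))           # ['pr', 'pro', 'proj', 'proj1']
print(doc_freq("pr", f4_values))      # 999
print(doc_freq("proj1", f4_values))   # 111
print(doc_freq("proj10", f4_values))  # 11
```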
There is no index_analyzer (at least not after Elasticsearch version 1.7). For mapping parameters, you can use analyzer and search_analyzer.
Try the following steps in order to make it work.
Create myindex with analyzer settings:
PUT /myindex
{
"settings": {
"analysis": {
"filter": {
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"myNGramAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": "html_strip",
"filter": [
"lowercase",
"standard",
"asciifolding",
"stop",
"snowball",
"ngram_filter"
]
}
}
}
}
}
Add mappings to mytype (for brevity, I mapped only the relevant fields):
PUT /myindex/_mapping/mytype
{
"properties": {
"f1": {
"type": "string"
},
"f4": {
"type": "string",
"analyzer": "myNGramAnalyzer",
"search_analyzer": "standard"
},
"deleted": {
"type": "string"
}
}
}
Index some data:
PUT myindex/mytype/1
{
"f1":"396",
"f4":"Proj12" ,
"deleted": "0"
}
PUT myindex/mytype/2
{
"f1":"42",
"f4":"Proj22" ,
"deleted": "1"
}
Now try your query:
GET myindex/mytype/_search
{
"query": {
"multi_match": {
"query": "proj1",
"type": "cross_fields",
"operator": "and",
"fields": "f*"
}
},
"filter": {
"term": {
"deleted": "0"
}
}
}
It should return document #1. It worked for me in Sense. I am using Elasticsearch 2.x.
Hope I have managed to help :)
After spending hours looking for a solution, I finally made it work.
I kept everything the same as in my question, using the n-gram analyzer while indexing data. The only thing I had to change was to use the `_all` field in my search query, combining it with my existing multi_match query in a bool query.
Now a search for Proj1 returns results in an order such as Proj1, Proj121, Proj11, etc.
Although this is not exactly the order Proj1, Proj11, Proj121, etc., it closely resembles the result that I wanted.
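The final query is not shown, so here is one plausible shape of it, built as a Python dict. This is a hypothetical reconstruction: it assumes both clauses go in a bool should (the answer does not say should or must) and keeps the top-level filter from the question. The `_all` field exists in Elasticsearch 1.x and 2.x; it was deprecated in 6.0 and removed in 7.0.

```python
import json


def build_search_body(text: str, deleted: str = "0") -> dict:
    """Hypothetical: existing multi_match plus a match on _all, combined in a bool query."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": text, "type": "cross_fields",
                                     "operator": "and", "fields": "f*"}},
                    {"match": {"_all": text}},
                ]
            }
        },
        "filter": {"term": {"deleted": deleted}},  # same top-level filter as in the question
    }


print(json.dumps(build_search_body("Proj1"), indent=2))
```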