I've indexed documents like the one below in Elasticsearch.
{
"category": "clothing (f)",
"description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name": "Women's Unstoppable Graphic T-Shirt",
"price": "$34.99"
}
There are categories such as clothing (m), clothing (f), etc. I am trying to exclude items in the clothing (m) category when the search is for female items. The query I am trying is:
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "women's black shirt"
}
}
],
"must_not": [
{
"term": {
"category": "clothing (m)"
}
}
]
}
},
"from": 0,
"size": 50
}
But this is not working as expected. The results always include a few clothing (m) documents alongside the others. How can I exclude documents that have a particular category?
To exclude documents by a specific term (exact match), you have to query a field of the keyword datatype.
Keyword datatypes are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value.
Keyword Datatype
Your current query still returns clothing (m) documents because, when you indexed them, the category field was analyzed with Elasticsearch's standard analyzer, which splits clothing (m) into the tokens clothing and m. The unanalyzed term clothing (m) in your must_not clause matches neither token, so nothing is excluded.
In your query, category is searched as a text datatype field.
Text datatype fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed.
Run this command:
POST my_index/_analyze
{
"text": ["clothing (m)"]
}
Results:
{
"tokens" : [
{
"token" : "clothing",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "m",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
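To make the mismatch concrete, here is a minimal plain-Java sketch that uses no Elasticsearch client. The analyze method is only a crude approximation of the standard analyzer (lowercase, split on non-alphanumerics), and the class and method names are made up for illustration. It shows why a term query for the whole string never matches a text field, while the same value stored as a keyword does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzerSketch {
    // Crude stand-in for the standard analyzer (an assumption, not the real
    // implementation): lowercase the text and split on non-alphanumeric runs.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A term query is not analyzed: the query string is compared as-is
    // against each token stored for the text field.
    static boolean termMatchesTextField(String indexedValue, String term) {
        return analyze(indexedValue).contains(term);
    }

    // A keyword field keeps the whole value as a single token.
    static boolean termMatchesKeywordField(String indexedValue, String term) {
        return indexedValue.equals(term);
    }

    public static void main(String[] args) {
        System.out.println(analyze("clothing (m)"));                                 // [clothing, m]
        System.out.println(termMatchesTextField("clothing (m)", "clothing (m)"));    // false
        System.out.println(termMatchesKeywordField("clothing (m)", "clothing (m)")); // true
    }
}
```

Because the text-field comparison is false, a must_not clause built on it has nothing to exclude, which is the behaviour described in the question.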
A working example:
Assuming your mappings look like this:
{
"my_index" : {
"mappings" : {
"properties" : {
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"description" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"price" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Let's post a few documents:
POST my_index/_doc/1
{
"category": "clothing (f)",
"description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name": "Women's Unstoppable Graphic T-Shirt",
"price": "$34.99"
}
POST my_index/_doc/2
{
"category": "clothing (m)",
"description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name": "Women's Unstoppable Graphic T-Shirt",
"price": "$34.99"
}
Now our query should look like this:
GET my_index/_search
{
"query": {
"bool": {
"must": {
"match": {
"description": "women's black shirt"
}
},
"filter": {
"bool": {
"must_not": {
"term": {
"category.keyword": "clothing (m)"
}
}
}
}
}
},
"from": 0,
"size": 50
}
The results:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.43301374,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.43301374,
"_source" : {
"category" : "clothing (f)",
"description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name" : "Women's Unstoppable Graphic T-Shirt",
"price" : "$34.99"
}
}
]
}
}
Results without using keyword:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.43301374,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.43301374,
"_source" : {
"category" : "clothing (f)",
"description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name" : "Women's Unstoppable Graphic T-Shirt",
"price" : "$34.99"
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.43301374,
"_source" : {
"category" : "clothing (m)",
"description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name" : "Women's Unstoppable Graphic T-Shirt",
"price" : "$34.99"
}
}
]
}
}
As you can see from the last results, we also got clothing (m).
BTW, don't use term on a text datatype field; use match.
Hope this helps.
Here is an OpenSearch query constructed using opensearch-java:
GET eventsearch/_search
{
"aggregations": {
"WEB": {
"aggregations": {
"eventDate": {
"date_histogram": {
"extended_bounds": {
"max": "2022-12-01T00:00:00Z",
"min": "2022-01-01T00:00:00Z"
},
"field": "eventDate",
"fixed_interval": "1d",
"min_doc_count": 0
}
}
},
"filter": {
"term": {
"channel": {
"value": "WEB",
"case_insensitive": true
}
}
}
}
},
"query": {
"bool": {
"filter": [
{
"range": {
"eventDate": {
"from": "2022-01-01T00:00:00Z",
"to": "2022-12-01T00:00:00Z"
}
}
}
],
"must": [
{
"match_all": {}
}
]
}
},
"size": 0
}
Running the query gives this response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 26,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"WEB" : {
"doc_count" : 25,
"eventDate" : {
"buckets" : [
{
"key_as_string" : "2022-01-01T00:00:00.000Z",
"key" : 1640995200000,
"doc_count" : 0
},
{
"key_as_string" : "2022-01-02T00:00:00.000Z",
"key" : 1641081600000,
"doc_count" : 0
},
{
"key_as_string" : "2022-01-03T00:00:00.000Z",
"key" : 1641168000000,
"doc_count" : 0
},
{
"key_as_string" : "2022-01-04T00:00:00.000Z",
"key" : 1641254400000,
"doc_count" : 0
},
....................
]
}
}
}
}
In Java, I need to perform this query and read the results from the response.
But after calling opensearchclient.search and then reading the "aggregations" map from the response, I receive this (image attached).
If I try to get "WEB" from the Map, there is no inner "eventDate" aggregation to fetch.
Is there a way to fetch this inner aggregation using the opensearch-java client? I had no luck with the documentation.
I am using opensearch-java 2.1.0.
There is currently no way to do this. There is an open bug for it; the fix has been merged but not yet released:
https://github.com/opensearch-project/opensearch-java/issues/197
This is my sample es index document:
"hits" : [
{
"_index" : "project_note",
"_type" : "project_note",
"_id" : "19",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-10-04T13:43:55.330",
"createTimeInMs" : 1633333435330,
"createdBy" : "test",
"editTime" : "2021-10-04T13:43:55.330",
"editTimeInMs" : 1633333435330,
"editedBy" : "test",
"versionId" : 1,
"id" : "19",
"organizationId" : "28",
"accessLevel" : "PUBLIC",
"status" : "ACTIVE",
"projectId" : "95",
"userId" : 129,
"noteType" : "SYSTEM_GENERATED",
"projectDemographicLogId" : "1"
}
},
{
"_index" : "project_note",
"_type" : "project_note",
"_id" : "19",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-10-04T13:43:55.330",
"createTimeInMs" : 1633333435330,
"createdBy" : "test",
"editTime" : "2021-10-04T13:43:55.330",
"editTimeInMs" : 1633333435330,
"editedBy" : "test",
"versionId" : 1,
"id" : "19",
"organizationId" : "28",
"accessLevel" : "PUBLIC",
"status" : "ACTIVE",
"projectId" : "95",
"userId" : 129
}
}
]
The first doc has noteType, but for the second that field is not stored in the DB.
I want to exclude the documents where noteType==null or noteType is absent.
But I am getting only the docs which have noteType="SYSTEM_GENERATED".
My approach:
{
"query":
{
"bool" : {
"must" : [
{
"term" : {
"projectId" : {
"value" : "95",
"boost" : 1.0
}
}
},
{
"range" : {
"createTimeInMs" : {
"from" : null,
"to" : 1633594455000,
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}
}
}
],
"must_not" : [
{
"term" : {
"noteType" : {
"value" : "SYSTEM_GENERATED",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
Equivalent java code:
BoolQueryBuilder queryBuilder= QueryBuilders.boolQuery();
queryBuilder.must(QueryBuilders.termQuery("projectId", requestInfo.getProjectId()));
queryBuilder.must(QueryBuilders.rangeQuery("createTimeInMs").lte(requestInfo.getCreateTimeInMs()));
if(!requestInfo.isIncludeLog()) {
queryBuilder.mustNot(QueryBuilders.termQuery("noteType", Defs.SYSTEM_NOTE_TYPE));
}
If only the must_not part of the query is used (excluding the must part):
{
"query": {
"bool": {
"must_not": [
{
"term": {
"noteType.keyword": {
"value": "SYSTEM_GENERATED",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}
The search result is similar to what you expect to get:
"hits": [
{
"_index": "69477995",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"createTime": "2021-09-26T15:54:08.373",
"createTimeInMs": 1632650048373,
"createdBy": "test",
"editTime": "2021-09-26T15:54:08.373",
"editTimeInMs": 1632650048373,
"editedBy": "test",
"versionId": 1,
"id": "18",
"note": "note-1, simple note ",
"organizationId": "28",
"accessLevel": "PUBLIC",
"status": "ACTIVE",
"taskId": "5",
"userId": 129
}
}
]
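The result above follows from how must_not behaves: a document that has no noteType at all cannot match the term clause, so it is never excluded. The plain-Java sketch below mimics that filtering on in-memory maps; it does not use the Elasticsearch client, and the class, method names and sample ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MustNotSketch {
    // Keep every document that does NOT match the term. A missing field
    // yields null from get(), which never equals the value, so it is kept.
    static List<Map<String, Object>> mustNotTerm(
            List<Map<String, Object>> docs, String field, Object value) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (!value.equals(doc.get(field))) {
                kept.add(doc);
            }
        }
        return kept;
    }

    // Hypothetical helper: builds a document, omitting noteType when null.
    static Map<String, Object> doc(String id, String noteType) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("id", id);
        if (noteType != null) {
            d.put("noteType", noteType);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                doc("19", "SYSTEM_GENERATED"),
                doc("18", null));
        // Only the document without noteType survives the must_not clause.
        System.out.println(mustNotTerm(docs, "noteType", "SYSTEM_GENERATED")); // [{id=18}]
    }
}
```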
We have Elasticsearch documents with a field called city. Each document has only one value for the city field. I have a scenario where I need to query persons by city or cities.
Documents in Elasticsearch:
{
person_id: "1",
property_value : 25000,
city: "Bangalore"
}
{
person_id: "2",
property_value : 100000,
city: "Bangalore"
}
{
person_id: "1",
property_value : 15000,
city: "Delhi"
}
Note: The aggregation should be performed on property_value, grouped by person_id.
For example:
If I query for Bangalore, it should return the documents with person_id 1 and 2.
If I query for both Delhi and Bangalore, it should return this:
{
person_id: "1",
property_value : 40000,
city: ["Bangalore", "Delhi"]
}
Looking at your data, I've come up with a sample mapping, request query and the response.
Mapping:
PUT my_index_city
{
"mappings": {
"properties": {
"person_id":{
"type": "keyword"
},
"city":{
"type":"text",
"fields":{
"keyword":{
"type": "keyword"
}
}
},
"property_value":{
"type": "long"
}
}
}
}
Sample Request:
Note that I've used a Query String query to match the documents whose city is Bangalore or Delhi.
For the aggregation, I've used a Terms Aggregation on person_id with a Sum Aggregation on the property_value field. The min_doc_count of 2 keeps only the persons who have two matching documents.
POST my_index_city/_search
{
"size": 0,
"query": {
"query_string": {
"default_field": "city",
"query": "Bangalore Delhi"
}
},
"aggs": {
"my_person": {
"terms": {
"field": "person_id",
"size": 10,
"min_doc_count": 2
},
"aggs": {
"sum_property_value": {
"sum": {
"field": "property_value"
}
}
}
}
}
}
Sample Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_person" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 2,
"sum_property_value" : {
"value" : 40000.0
}
}
]
}
}
}
Note: This query only works if a person_id has multiple documents, each with a different city value.
In other words, if a person_id has multiple documents with the same city, the aggregation will not give the right answer.
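The limitation is easier to see with a small in-memory sketch. The Java below is not Elasticsearch code; it only mimics a terms bucket per person_id with a sum sub-aggregation and a min_doc_count threshold, and PropertyDoc and sumPerPerson are names invented for this illustration. With the three sample documents only person 1 is returned, but as soon as person 2 has two Bangalore documents that person also passes the threshold without owning anything in Delhi.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TermsSumSketch {
    // Hypothetical in-memory stand-in for one indexed document.
    static final class PropertyDoc {
        final String personId;
        final long propertyValue;
        final String city;

        PropertyDoc(String personId, long propertyValue, String city) {
            this.personId = personId;
            this.propertyValue = propertyValue;
            this.city = city;
        }
    }

    // Mimics terms(person_id, min_doc_count) with a sum(property_value)
    // sub-aggregation over the documents that matched the city query.
    static Map<String, Long> sumPerPerson(List<PropertyDoc> matched, int minDocCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Map<String, Long> sums = new LinkedHashMap<>();
        for (PropertyDoc doc : matched) {
            counts.merge(doc.personId, 1, Integer::sum);
            sums.merge(doc.personId, doc.propertyValue, Long::sum);
        }
        // Drop buckets with fewer documents than min_doc_count.
        sums.keySet().removeIf(person -> counts.get(person) < minDocCount);
        return sums;
    }

    public static void main(String[] args) {
        List<PropertyDoc> docs = new ArrayList<>(List.of(
                new PropertyDoc("1", 25000, "Bangalore"),
                new PropertyDoc("2", 100000, "Bangalore"),
                new PropertyDoc("1", 15000, "Delhi")));
        System.out.println(sumPerPerson(docs, 2)); // {1=40000}

        // Person 2 gets a second Bangalore property: the bucket now reaches
        // min_doc_count even though no Delhi document exists for that person.
        docs.add(new PropertyDoc("2", 50000, "Bangalore"));
        System.out.println(sumPerPerson(docs, 2)); // {1=40000, 2=150000}
    }
}
```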
Updated Answer:
There is no direct way to achieve what you are looking for unless you modify the mapping. What I've done is use the nested datatype and ingest all the documents for a person_id as a single document.
Mapping:
PUT my_sample_city_index
{
"mappings": {
"properties": {
"person_id":{
"type": "keyword"
},
"property_details":{
"type":"nested", <------ Note this
"properties": {
"city":{
"type": "text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"property_value":{
"type": "long"
}
}
}
}
}
}
Sample Documents:
POST my_sample_city_index/_doc/1
{
"person_id": "1",
"property_details":[
{
"property_value" : 25000,
"city": "Bangalore"
},
{
"property_value" : 15000,
"city": "Delhi"
}
]
}
POST my_sample_city_index/_doc/2
{
"person_id": "2",
"property_details":[
{
"property_value" : 100000,
"city": "Bangalore"
}
]
}
Aggregation Query:
POST my_sample_city_index/_search
{
"size": 0,
"query": {
"nested": {
"path": "property_details",
"query": {
"query_string": {
"default_field": "property_details.city",
"query": "bangalore delhi"
}
}
}
},
"aggs": {
"persons": {
"terms": {
"field": "person_id",
"size": 10
},
"aggs": {
"property_sum": {
"nested": { <------ Note this
"path": "property_details"
},
"aggs": {
"total_sum": {
"sum": {
"field": "property_details.property_value"
}
}
}
}
}
}
}
}
Note that I've first applied a Terms Aggregation on person_id, then a Nested Aggregation inside it, and finally a Sum metric aggregation on the nested property_value field.
This should also work correctly if a person has multiple properties in the same city.
Response:
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"persons" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1,
"property_sum" : {
"doc_count" : 2,
"total_sum" : {
"value" : 40000.0
}
}
},
{
"key" : "2",
"doc_count" : 1,
"property_sum" : {
"doc_count" : 1,
"total_sum" : {
"value" : 100000.0
}
}
}
]
}
}
}
Let me know if this helps!
I'm working with documents that contain music playlists.
Each document has this structure:
{
"user_id": "5858",
"playlists": [
{
"name": "My Playlist",
"guild_ids": ["7575"],
"items": [
{
"title": "title",
"url": "url",
"duration": 200000
}
]
}
]
}
I would like to extract all playlists from the same guild.
However, I'd like the results to be returned in a single document: one document with a list of playlists.
The expected result for guild_id=5656 would be like this:
{
"playlists": [
{
"name": "My Playlist",
"guild_ids": ["5656"],
"items": [
{
"title": "title",
"url": "url",
"duration": 200000
}
]
},
// other playlists where guild_ids contains "5656"
]
}
I tried to use aggregation, but I always get the same number of documents as the number of unique user_ids; the playlists come back grouped by user_id.
The following query can get us the expected output:
db.collection.aggregate([
{
$unwind:"$playlists"
},
{
$match:{
"playlists.guild_ids":{
$in:["7575"]
}
}
},
{
$group:{
"_id":null,
"playlists":{
$push: "$playlists"
}
}
},
{
$project:{
"_id":0
}
}
]).pretty()
Data set:
{
"_id" : ObjectId("5d88225e38db7cf8d3f75cd6"),
"user_id" : "5858",
"playlists" : [
{
"name" : "My Playlist",
"guild_ids" : [
"7575"
],
"items" : [
{
"title" : "title",
"url" : "url",
"duration" : 200000
}
]
}
]
}
{
"_id" : ObjectId("5d88225e38db7cf8d3f75cd7"),
"user_id" : "5858",
"playlists" : [
{
"name" : "My Playlist 2",
"guild_ids" : [
"1234"
],
"items" : [
{
"title" : "title",
"url" : "url",
"duration" : 200000
}
]
}
]
}
{
"_id" : ObjectId("5d88225e38db7cf8d3f75cd8"),
"user_id" : "5858",
"playlists" : [
{
"name" : "My Playlist 3",
"guild_ids" : [
"7575"
],
"items" : [
{
"title" : "title",
"url" : "url",
"duration" : 200000
}
]
}
]
}
Output:
{
"playlists" : [
{
"name" : "My Playlist",
"guild_ids" : [
"7575"
],
"items" : [
{
"title" : "title",
"url" : "url",
"duration" : 200000
}
]
},
{
"name" : "My Playlist 3",
"guild_ids" : [
"7575"
],
"items" : [
{
"title" : "title",
"url" : "url",
"duration" : 200000
}
]
}
]
}
Query analysis: We unwind the playlists, keep only those whose guild_ids contain 7575, and then group them all back into a single document.
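For readers who think in Java, the same three steps can be sketched on in-memory objects with streams. This is only an analogy, not MongoDB driver code: UserDoc, Playlist and playlistsForGuild are hypothetical names, and the items array is left out to keep the example short.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PlaylistPipelineSketch {
    // Hypothetical in-memory model of one playlist (items omitted).
    static final class Playlist {
        final String name;
        final List<String> guildIds;

        Playlist(String name, List<String> guildIds) {
            this.name = name;
            this.guildIds = guildIds;
        }
    }

    // Hypothetical in-memory model of one stored document.
    static final class UserDoc {
        final String userId;
        final List<Playlist> playlists;

        UserDoc(String userId, List<Playlist> playlists) {
            this.userId = userId;
            this.playlists = playlists;
        }
    }

    static List<Playlist> playlistsForGuild(List<UserDoc> docs, String guildId) {
        return docs.stream()
                .flatMap(doc -> doc.playlists.stream())                  // like $unwind
                .filter(playlist -> playlist.guildIds.contains(guildId)) // like $match
                .collect(Collectors.toList());                           // like $group + $push
    }

    public static void main(String[] args) {
        List<UserDoc> docs = List.of(
                new UserDoc("5858", List.of(new Playlist("My Playlist", List.of("7575")))),
                new UserDoc("5858", List.of(new Playlist("My Playlist 2", List.of("1234")))),
                new UserDoc("5858", List.of(new Playlist("My Playlist 3", List.of("7575")))));
        for (Playlist playlist : playlistsForGuild(docs, "7575")) {
            System.out.println(playlist.name); // My Playlist, then My Playlist 3
        }
    }
}
```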
I want to update only the requested fields in an array using Java. This is my existing document in MongoDB:
{
"_id": "6691e5068dwe335w42cb0a699650f",
"Opportunity_Owner": "Self",
"Account_Name": "IC",
"Lead_Source": "Callbox",
"Opportunity_Name": "name1 ",
"Stage": "Proposal",
"Stage_Status": "A",
"1555570551211": [],
"1555556165153": [],
"1555556059584": [{
"id": "1557389940585",
"Notes": "Note 1"
},
{
"id": "1557389945398",
"Notes": "Hi Bobby "
},
{
"id": "1557389978181",
"Notes": "Spoken to Bobby."
},
{
"id": "1557389990159",
"Notes": "plan to call on 29/Apr"
}
],
"createdBy": "2c18b8dbb7d74a41a66f53a90117480a",
"createdDate": "1562911250917"
}
Request payload:
{
"_id" : "6691e5068dwe335w42cb0a699650f",
"Stage_Status" : "I",
"1555556059584" : [
{
"id" : "1557389940585",
"Notes" : "updated note 123"
}
]
}
I am trying to update "Stage_Status" and "1555556059584.Notes" at the same time using $set. I am able to update "Stage_Status", but the "1555556059584" array is reset so that it contains only the element I last updated.
Expected output:
{
"_id" : "6691e5068dwe335w42cb0a699650f",
"Opportunity_Owner" : "Self",
"Account_Name" : "IC",
"Lead_Source" : "Callbox",
"Opportunity_Name" : "name1 ",
"Stage" : "Proposal",
"Stage_Status" : "I",
"1555570551211" : [],
"1555556165153" : [],
"1555556059584" : [
{
"id" : "1557389940585",
"Notes" : "updated note 123"
},
{
"id" : "1557389945398",
"Notes" : "Hi Bobby "
},
{
"id" : "1557389978181",
"Notes" : "Spoken to Bobby."
},
{
"id" : "1557389990159",
"Notes" : "plan to call on 29/Apr"
}
],
"createdBy" : "2c18b8dbb7d74a41a66f53a90117480a",
"createdDate" : "1562911250917"
}
Can anyone please help me figure this out in Java?
I guess you want to update Stage_Status and 1555556059584.Notes at once.
Here is a demo of it:
> db.student.find()
{ "_id" : ObjectId("5d2c09ea8ed60ae70d3dd76b"), "name" : "bigbang", "courses" : [ { "name" : "en", "classRoom" : "9001" }, { "name" : "math", "classRoom" : "1001" } ] }
> db.student.update({name:'bigbang','courses.name':'en'},{ $set: {'courses.$.classRoom':'1009',name :"course"} })
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.student.find()
{ "_id" : ObjectId("5d2c09ea8ed60ae70d3dd76b"), "name" : "course", "courses" : [ { "name" : "en", "classRoom" : "1009" }, { "name" : "math", "classRoom" : "1001" } ] }
The Java version looks like this:
collection.updateOne(and(eq("Stage_Status","A"),eq("1555556059584.id","1557389940585")),new Document("$set" ,new Document("Stage_Status","YOUR_NEW_VALUE").append("1555556059584.$.Notes","YOUR_NEW_VALUE")));
You must include 1555556059584.id in the filter so the driver knows which element to update.
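As a rough mental model of what the positional $ does, the plain-Java sketch below changes one field of the first array element whose id matches and leaves every other element untouched. It works on in-memory maps rather than the MongoDB driver, and PositionalUpdateSketch, note and setNoteById are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PositionalUpdateSketch {
    // Hypothetical helper: builds one mutable array element.
    static Map<String, String> note(String id, String text) {
        Map<String, String> n = new LinkedHashMap<>();
        n.put("id", id);
        n.put("Notes", text);
        return n;
    }

    // Mimics a filter on "<array>.id" combined with $set on "<array>.$.Notes":
    // only the first element whose id matches is modified.
    static boolean setNoteById(List<Map<String, String>> notes, String id, String newNote) {
        for (Map<String, String> n : notes) {
            if (id.equals(n.get("id"))) {
                n.put("Notes", newNote);
                return true;
            }
        }
        return false; // no element matched, nothing changed
    }

    public static void main(String[] args) {
        List<Map<String, String>> notes = new ArrayList<>();
        notes.add(note("1557389940585", "Note 1"));
        notes.add(note("1557389945398", "Hi Bobby "));

        setNoteById(notes, "1557389940585", "updated note 123");
        // The array keeps both elements; only the first one has new text.
        System.out.println(notes.size());              // 2
        System.out.println(notes.get(0).get("Notes")); // updated note 123
    }
}
```

Setting the whole array in $set, by contrast, replaces every element, which matches the reset described in the question.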