Platform: MongoDB, Spring, SpringDataMongoDB
I have a collection called "Encounter" with the structure below:
Encounter:
{ "_id" : "49a0515b-e020-4e0d-aa6c-6f96bb867288",
"_class" : "com.keype.hawk.health.emr.api.transaction.model.Encounter",
"encounterTypeId" : "c4f657f0-015d-4b02-a216-f3beba2c64be",
"visitId" : "8b4c48c6-d969-4926-8b8f-05d2f58491ae",
"status" : "ACTIVE",
"form" :
{
"_id" : "be3cddc5-4cec-4ce5-8592-72f1d7a0f093",
"formCode" : "CBC",
"fields" : {
"dc" : {
"label" : "DC",
"name" : "tc",
},
"tc" : {
"label" : "TC",
"name" : "tc",
},
"notes" : {
"label" : "Notes",
"name" : "notes",
}
},
"notes" : "Blood Test",
"dateCreated" : NumberLong("1376916746564"),
"dateModified" : NumberLong("1376916746564"),
"staffCreated" : 10013,
"staffModified" : 10013
},
}
The element "fields" is represented in Java as a LinkedHashMap:
protected LinkedHashMap<String, Field> fields;
The keys of the map are not fixed; they are generated at run time.
How do I query to get all documents in the collection where "label" = "TC"?
It's not possible to query like db.encounter.find({'form.fields.dc.label':'TC'}) because the element name 'dc' is NOT known in advance. I want to skip that position and then execute the query, something like:
db.encounter.find({'form.fields.*.label':'TC'});
Any ideas?
Also, how do I best use indexes in this scenario?
You could make fields an array and store the key inside each sub-document instead:
"fields" : [
{ "key" : "dc",
"label" : "DC",
"name" : "dc"
},
{ "key" : "tc",
"label" : "TC",
"name" : "tc"
}
]
In this case, you could simply query for any sub-element inside the array:
db.coll.find({"form.fields.label":"TC"})
Not sure how you would integrate that with Spring, but perhaps the idea helps? As far as indexes are concerned, you can index into the array, which gives you a multi-key index. Basically, the index will have a separate entry pointing to the document for each array value.
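On the Java side, the restructuring could look roughly like the sketch below. It is plain Java with no MongoDB or Spring calls. The Field class is a hypothetical stand-in for the one in the question, and toArray and anyLabelMatches are names made up for illustration. The second method only mimics in memory what a query on "form.fields.label" does against an array: it matches if any element has that label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldsAsArray {

    // Hypothetical stand-in for the Field class from the question.
    static class Field {
        final String label;
        final String name;

        Field(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    // Turn the map into a list of sub-documents, moving each map key into a "key" entry.
    static List<Map<String, Object>> toArray(Map<String, Field> fields) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, Field> e : fields.entrySet()) {
            Map<String, Object> sub = new LinkedHashMap<>();
            sub.put("key", e.getKey());
            sub.put("label", e.getValue().label);
            sub.put("name", e.getValue().name);
            out.add(sub);
        }
        return out;
    }

    // In-memory equivalent of matching {"form.fields.label": label}:
    // true if any element of the array has that label.
    static boolean anyLabelMatches(List<Map<String, Object>> fields, String label) {
        for (Map<String, Object> sub : fields) {
            if (label.equals(sub.get("label"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Field> fields = new LinkedHashMap<>();
        fields.put("dc", new Field("DC", "dc"));
        fields.put("tc", new Field("TC", "tc"));

        List<Map<String, Object>> array = toArray(fields);
        System.out.println(array);
        System.out.println(anyLabelMatches(array, "TC")); // true
    }
}
```

Because the run-time key is now a value rather than a field name, the query path is the same for every document, which is also what makes a single index on the label usable.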
Related
Hi, I'm reading data from MongoDB into a Spark application.
My MongoDB database contains 2 collections.
One is profile_data (the actual data with field names),
which holds all the input data, including some unique fields:
{
"MessageStatus" : 2,
"Origin" : 1,
"_id" : ObjectId("596340fe8b0fa35d2880db1a"),
"accerlation" : 19.4,
"cylinders" : 4,
"displacement" : 119,
"file_id" : ObjectId("59633e48b760e7c8071a6c1c"),
"horsepower" : 82,
"modelyear" : 82,
"modified_date" : ISODate("2017-07-10T08:47:01.641Z"),
"mpg" : 31,
"snet_id" : "new_project",
"unique_id" : "784",
"username" : "chevy s-10",
"weight" : 2720
}
The other collection is predictive_model_details, which holds the ML model details such as the model name, feature fields and prediction field (i.e. metadata):
{
"_id" : ObjectId("56b4351be4b064bb19a90324"),
"algorithm_id" : "55d717a53d9e22022ff2a1e9",
"algorithm_name" : "K- Nearest Neighbours (IBK)",
"client_id" : "562e1d51b760d0e408151b91",
"feature_fields" : [
{
"name" : "Origin",
"type" : "int"
},
{
"name" : "accerlation",
"type" : "Double"
},
{
"name" : "displacement",
"type" : "Int"
},
{
"name" : "horsepower",
"type" : "Int"
},
{
"name" : "modelyear",
"type" : "Int"
}
],
"makeActiveStatus" : "0",
"model_name" : "test1",
"parameter_type" : "system_defined",
"parameters" : [
{
"symbol" : "-K",
"value" : "1"
}
],
"predictor" : {
"name" : "mpg",
"type" : "Int"
},
"result_exists" : true,
"snet_id" : "new_project"
}
So I've created 2 Datasets in Spark, one for each MongoDB collection. Now I want to join these 2 Datasets so that all the feature fields and the prediction field are mapped together.
The common field in the 2 Datasets is snet_id.
Could anyone please help?
I have a unique index with a partialFilterExpression on a collection, but duplicate data is sometimes inserted.
Index creation
getCollection().createIndex(new BasicDBObject(userId, 1)
, new BasicDBObject("name", "uidx-something-user")
.append("partialFilterExpression", new BasicDBObject(Properties.someting, new BasicDBObject("$eq", true)))
.append("unique", true));
The index as reported by the getIndexes command
{
"v" : 1,
"unique" : true,
"key" : {
"userId" : 1
},
"name" : "uidx-something-user",
"ns" : "somewhere.something",
"partialFilterExpression" : {
"something" : {
"$eq" : true
}
}
}
The duplicated documents
{
"_id" : "08a8506c-bcbc-4ed6-9972-67fd7c37b4bc",
"userId" : "1068",
"express" : false,
"something" : true,
"items" : [ ],
"recipient" : {
"_id" : "efbd8618-c480-4194-964e-f5a821edf695"
}
}
{
"_id" : "b6695c6a-f29d-4531-96ac-795f14c72547",
"userId" : "1068",
"express" : false,
"something" : true,
"items" : [ ],
"recipient" : {
"_id" : "4f93fe38-edb2-4cb7-a1b3-c2c51ac8ded1"
}
}
MongoDB version: 3.2.7; it also seems to happen with 3.2.12.
A side note: when dumping the collection and restoring it, a duplicate key error is thrown.
Why is it sometimes possible to insert duplicate data, and how can I avoid that?
UPDATE
I created a MongoDB issue: https://jira.mongodb.org/browse/SERVER-28153
This was fixed in 3.2.13, 3.4.4 and 3.5.6.
You can read more in the MongoDB Jira issue.
I have data like the below, and I want to group it by type. I'm using spring-data-mongodb.
[
{
"_id" : ObjectId("58a5518aace6132a88309d98"),
"type" : "SMS",
},
{
"_id" : ObjectId("58a5518bace6132a88309d99"),
"type" : "PUSH_NOTIFICATION",
},
{
"_id" : ObjectId("58a5519aace6132a0094d7df"),
"type" : "SMS",
},
{
"_id" : ObjectId("58a5519aace6132a0094d7e0"),
"type" : "PUSH_NOTIFICATION",
}
]
I'm using this method, but it doesn't work:
GroupByResults<Queuing> results = mongoTemplate.group("queuing",
GroupBy.key("type"), Queuing.class);
Does anyone know the best and clearest way to do this grouping using spring-data-mongodb?
Thanks.
This is the correct syntax for the group operation:
GroupByResults<Queuing> results = mongoTemplate.group("queuing",
GroupBy.key("type").initialDocument("{}").reduceFunction("function(doc, prev) {}"),
Queuing.class);
More information here: http://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mongo.group.example
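Note that with an empty initial document and an empty reduce function, the group only yields one entry per distinct type and aggregates nothing. To make the roles of the two pieces concrete, here is a plain-Java emulation (not the Spring Data API) of a group keyed on "type" whose reduce step counts documents. That corresponds to an initial document of "{ count: 0 }" and a reduce function of "function(doc, prev) { prev.count += 1 }". The class and method names are made up for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByType {

    // Emulates a group on the "type" key with a counting reduce step:
    // the result holds one entry per distinct type.
    static Map<String, Integer> countByType(List<Map<String, String>> docs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            // A missing key starts at the "initial document" value (0 before this
            // increment); Integer::sum plays the role of the reduce function.
            counts.merge(doc.get("type"), 1, Integer::sum);
        }
        return counts;
    }

    // The four documents from the question, reduced to their "type" field.
    static List<Map<String, String>> exampleDocs() {
        return Arrays.asList(
                Collections.singletonMap("type", "SMS"),
                Collections.singletonMap("type", "PUSH_NOTIFICATION"),
                Collections.singletonMap("type", "SMS"),
                Collections.singletonMap("type", "PUSH_NOTIFICATION"));
    }

    public static void main(String[] args) {
        System.out.println(countByType(exampleDocs())); // {SMS=2, PUSH_NOTIFICATION=2}
    }
}
```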
I am having trouble using ElasticSearch in my Java application.
To explain, I have a mapping which looks something like this:
{
"products": {
"properties": {
"id": {
"type": "long",
"ignore_malformed": false
},
"locations": {
"properties": {
"category": {
"type": "long",
"ignore_malformed": false
},
"subCategory": {
"type": "long",
"ignore_malformed": false
},
"order": {
"type": "long",
"ignore_malformed": false
}
}
},
...
So, as you can see, I receive a list of products, each of which has a list of locations. In my model, these locations are all the categories the product belongs to. This means that a product can be in 1 or more categories. In each of these categories, the product has an order, which is the position in which the client wants it to be shown.
For instance, a diamond product can have first place in Jewelry, but third place in Woman (my examples are not very logical ^^).
So, when I click on Jewelry, I want to show these products, ordered by the field locations.order in this specific category.
For the moment, when I search for all the products in a specific category, the response that I receive from ElasticSearch is something like:
{"id":5331880,"locations":[{"category":5322606,"order":1},
{"category":5883712,"subCategory":null,"order":3},
{"category":5322605,"subCategory":6032961,"order":2},.......
Is it possible to sort these products by the locations.order element for the specific category I am searching for? For instance, if I am querying the category 5322606, I want order 1 to be used for this product.
Thank you very much in advance!
Regards,
Olivier.
First a correction of terminology: in Elasticsearch, "parent/child" refers to completely separate docs, where the child doc points to the parent doc. Parent and children are stored on the same shard, but they can be updated independently.
With your example above, what you are trying to achieve can be done with nested docs.
Currently, your locations field is of type:"object". This means that the values in each location get flattened to look something like this:
{
"locations.category": [5322606, 5883712, 5322605],
"locations.subCategory": [6032961],
"locations.order": [1, 3, 2]
}
In other words, the "sub" fields get flattened into multi-value fields, which is of no use to you, because there is no correlation between category: 5322606 and order: 1.
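The loss of correlation can be shown with a small plain-Java sketch. It does not call Elasticsearch; it only imitates the two matching behaviours in memory, and the class and method names are made up for illustration. With the flattened representation, a search for category 5322606 together with order 3 matches the example product even though no single location has both values.

```java
import java.util.Arrays;
import java.util.List;

class FlattenedVsNested {

    // One entry of the "locations" array (subCategory omitted for brevity).
    static class Location {
        final long category;
        final long order;

        Location(long category, long order) {
            this.category = category;
            this.order = order;
        }
    }

    // type:"object" behaviour: each condition is checked against an independent
    // multi-value field, so the two may be satisfied by different locations.
    static boolean flattenedMatch(List<Location> locations, long category, long order) {
        boolean categoryFound = false;
        boolean orderFound = false;
        for (Location l : locations) {
            if (l.category == category) categoryFound = true;
            if (l.order == order) orderFound = true;
        }
        return categoryFound && orderFound;
    }

    // type:"nested" behaviour: both conditions must hold within the same location.
    static boolean nestedMatch(List<Location> locations, long category, long order) {
        for (Location l : locations) {
            if (l.category == category && l.order == order) {
                return true;
            }
        }
        return false;
    }

    // The three locations of the example product.
    static List<Location> exampleLocations() {
        return Arrays.asList(
                new Location(5322606L, 1),
                new Location(5883712L, 3),
                new Location(5322605L, 2));
    }

    public static void main(String[] args) {
        List<Location> locs = exampleLocations();
        // Category 5322606 really has order 1, not 3.
        System.out.println(flattenedMatch(locs, 5322606L, 3)); // true (false positive)
        System.out.println(nestedMatch(locs, 5322606L, 3));    // false
    }
}
```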
However, if you change locations to be type:"nested" then internally it will index each location as a separate doc, meaning that each location can be queried independently, using the dedicated nested query and filter.
By default, the nested query will return a _score based upon how well each location matches, but in your case you want to return the highest value of the order field from any matching children. To do this, you'll need to use a custom_score query.
So let's start by creating the index with the appropriate mapping:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"products" : {
"properties" : {
"locations" : {
"type" : "nested",
"properties" : {
"order" : {
"type" : "long"
},
"subCategory" : {
"type" : "long"
},
"category" : {
"type" : "long"
}
}
},
"id" : {
"type" : "long"
}
}
}
}
}
'
Then we index your example doc:
curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1' -d '
{
"locations" : [
{
"order" : 1,
"category" : 5322606
},
{
"order" : 3,
"subCategory" : null,
"category" : 5883712
},
{
"order" : 2,
"subCategory" : 6032961,
"category" : 5322605
}
],
"id" : 5331880
}
'
And now we can search for it using the queries we discussed above:
curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1' -d '
{
"query" : {
"nested" : {
"query" : {
"custom_score" : {
"script" : "doc[\u0027locations.order\u0027].value",
"query" : {
"constant_score" : {
"filter" : {
"and" : [
{
"term" : {
"category" : 5322605
}
},
{
"term" : {
"subCategory" : 6032961
}
}
]
}
}
}
}
},
"score_mode" : "max",
"path" : "locations"
}
}
}
'
Note: the single quotes within the script have been escaped as \u0027 to get around shell quoting. The script actually looks like this: "doc['locations.order'].value"
If you look at the _score from the results, you can see that it has used the order value from the matching location:
{
"hits" : {
"hits" : [
{
"_source" : {
"locations" : [
{
"order" : 1,
"category" : 5322606
},
{
"order" : 3,
"subCategory" : null,
"category" : 5883712
},
{
"order" : 2,
"subCategory" : 6032961,
"category" : 5322605
}
],
"id" : 5331880
},
"_score" : 2,
"_index" : "test",
"_id" : "cXTFUHlGTKi0hKAgUJFcBw",
"_type" : "products"
}
],
"max_score" : 2,
"total" : 1
},
"timed_out" : false,
"_shards" : {
"failed" : 0,
"successful" : 5,
"total" : 5
},
"took" : 9
}
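To see why the _score comes out as 2, here is a plain-Java imitation of what the query above computes for a single product: filter the locations by category and subCategory, take the order of each match as its score, and keep the maximum (score_mode "max"). It is a sketch of the scoring logic only and uses no Elasticsearch API; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

class NestedMaxScore {

    // One entry of the "locations" array; subCategory may be null.
    static class Location {
        final long category;
        final Long subCategory;
        final long order;

        Location(long category, Long subCategory, long order) {
            this.category = category;
            this.subCategory = subCategory;
            this.order = order;
        }
    }

    // Score of one product: the highest "order" among the locations that match
    // both terms, or empty if no location matches (the product is not a hit).
    static OptionalDouble score(List<Location> locations, long category, Long subCategory) {
        return locations.stream()
                .filter(l -> l.category == category
                        && Objects.equals(l.subCategory, subCategory))
                .mapToDouble(l -> l.order)
                .max();
    }

    // The three locations of the example product.
    static List<Location> exampleLocations() {
        return Arrays.asList(
                new Location(5322606L, null, 1),
                new Location(5883712L, null, 3),
                new Location(5322605L, 6032961L, 2));
    }

    public static void main(String[] args) {
        // Same terms as the query above: category 5322605, subCategory 6032961.
        System.out.println(score(exampleLocations(), 5322605L, 6032961L)); // OptionalDouble[2.0]
    }
}
```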
Just adding a more up-to-date version related to sorting a parent by a child field.
We can query the parent doc type sorted by a child field (e.g. 'count') along the following lines:
https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16
I have a mongo document as below:
{
"_id" : ObjectId("506e9e54a4e8f51423679428"),
"description" : "ffffffffffffffff",
"menus" : [
{
"_id" : ObjectId("506e9e5aa4e8f51423679429"),
"description" : "ffffffffffffffffffff",
"items" : [
{
"name" : "xcvxc",
"description" : "vxvxcvxc",
"text" : "vxcvxcvx",
"menuKey" : "0",
"onSelect" : "1",
"_id" : ObjectId("506e9f07a4e8f5142367942f")
} ,
{
"name" : "abcd",
"description" : "qqq",
"text" : "qqq",
"menuKey" : "0",
"onSelect" : "3",
"_id" : ObjectId("507e9f07a4e8f5142367942f")
}
]
}
]
}
Now I want to change this to:
{
"_id" : ObjectId("506e9e54a4e8f51423679428"),
"description" : "ffffffffffffffff",
"menus" : [
{
"_id" : ObjectId("506e9e5aa4e8f51423679429"),
"description" : "ffffffffffffffffffff",
"items" : {
{
"name" : "xcvxc",
"description" : "vxvxcvxc",
"text" : "vxcvxcvx",
"menuKey" : "0",
"onSelect" : "1",
"_id" : ObjectId("506e9f07a4e8f5142367942f")
} ,
{
"name" : "abcd",
"description" : "qqq",
"text" : "qqq",
"menuKey" : "0",
"onSelect" : "3",
"_id" : ObjectId("507e9f07a4e8f5142367942f")
}
}
}
]
}
Is this possible in mongo? With the first schema, an atomic update is not possible because we can't use two "$" positional operators when updating a deeply nested layer. So I thought of changing the schema to the second one. How can I achieve that?
For the first one I have used "$push" for adding items inside menus.
Any help would be great.
Your update is changing a 'menu' object, so I would suggest changing the schema so that the menu is the top-level document, rather than an element of an array in another document.
Menu could either have a field referencing the top level object (in another collection) that it belongs to, or you can denormalize the fields of the top level object into each menu document.
Without knowing complete requirements of the application, it's difficult to know when the schema is "good enough".
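As a rough illustration of the first option, the sketch below splits one parent document into one document per menu, each carrying a reference back to its parent. It is plain Java over maps, with no MongoDB driver calls. The field name "parentId" and the method name splitMenus are assumptions made for this sketch, and the ids are shortened placeholders. After such a split, "items" is a first-level array in each menu document, so a single positional "$" is enough to address one item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MenusAsTopLevel {

    // Produce one top-level document per menu, keeping a reference to the parent.
    static List<Map<String, Object>> splitMenus(Map<String, Object> parent) {
        @SuppressWarnings("unchecked")
        List<Map<String, Object>> menus = (List<Map<String, Object>>) parent.get("menus");

        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> menu : menus) {
            Map<String, Object> doc = new LinkedHashMap<>(menu); // copy _id, description, items
            doc.put("parentId", parent.get("_id"));              // "parentId" is a hypothetical name
            out.add(doc);
        }
        return out;
    }

    // A cut-down version of the document from the question, with placeholder ids.
    static Map<String, Object> exampleParent() {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("_id", "item-1");
        item.put("name", "xcvxc");
        item.put("onSelect", "1");

        Map<String, Object> menu = new LinkedHashMap<>();
        menu.put("_id", "menu-1");
        menu.put("description", "ffffffffffffffffffff");
        menu.put("items", Arrays.asList(item));

        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("_id", "parent-1");
        parent.put("description", "ffffffffffffffff");
        parent.put("menus", Arrays.asList(menu));
        return parent;
    }

    public static void main(String[] args) {
        // Each printed map would become its own document in a menus collection.
        for (Map<String, Object> doc : splitMenus(exampleParent())) {
            System.out.println(doc);
        }
    }
}
```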