Hi, I'm reading data from MongoDB into a Spark application.
My MongoDB database contains two collections.
The first is profile_data, which holds all the input data (the actual data, with field names), including some unique fields:
{
"MessageStatus" : 2,
"Origin" : 1,
"_id" : ObjectId("596340fe8b0fa35d2880db1a"),
"accerlation" : 19.4,
"cylinders" : 4,
"displacement" : 119,
"file_id" : ObjectId("59633e48b760e7c8071a6c1c"),
"horsepower" : 82,
"modelyear" : 82,
"modified_date" : ISODate("2017-07-10T08:47:01.641Z"),
"mpg" : 31,
"snet_id" : "new_project",
"unique_id" : "784",
"username" : "chevy s-10",
"weight" : 2720
}
The second collection is predictive_model_details, which holds the ML model details such as the model name, feature fields and prediction field, much like metadata:
{
"_id" : ObjectId("56b4351be4b064bb19a90324"),
"algorithm_id" : "55d717a53d9e22022ff2a1e9",
"algorithm_name" : "K- Nearest Neighbours (IBK)",
"client_id" : "562e1d51b760d0e408151b91",
"feature_fields" : [
{
"name" : "Origin",
"type" : "int"
},
{
"name" : "accerlation",
"type" : "Double"
},
{
"name" : "displacement",
"type" : "Int"
},
{
"name" : "horsepower",
"type" : "Int"
},
{
"name" : "modelyear",
"type" : "Int"
}
],
"makeActiveStatus" : "0",
"model_name" : "test1",
"parameter_type" : "system_defined",
"parameters" : [
{
"symbol" : "-K",
"value" : "1"
}
],
"predictor" : {
"name" : "mpg",
"type" : "Int"
},
"result_exists" : true,
"snet_id" : "new_project"
}
So I've created two Datasets in Spark, one for each MongoDB collection. Now I want to combine these two Datasets so that, for each profile, all of the model's feature fields are gathered together, along with its prediction field.
The field common to both Datasets is snet_id.
Could anyone please help?
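The core of the mapping, sketched in plain Java collections rather than Spark so it runs on its own: keep the profile rows whose snet_id matches the model's snet_id, then read the model's feature_fields (in metadata order) into a feature vector and its predictor field into a label. In Spark the same idea would presumably be a filter (or a join on snet_id) followed by selecting the columns named in feature_fields. The class and method names below are hypothetical, and every feature and predictor value is assumed to be numeric.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical training row: the model's features in metadata order plus its label.
final class LabeledRow {
    final String uniqueId;
    final double[] features;
    final double label;

    LabeledRow(String uniqueId, double[] features, double label) {
        this.uniqueId = uniqueId;
        this.features = features;
        this.label = label;
    }
}

final class FeatureMapping {
    // Keeps only profiles whose snet_id equals the model's snet_id, then maps the
    // model's feature_fields to a feature vector and its predictor to the label.
    static List<LabeledRow> buildRows(List<Map<String, Object>> profiles,
                                      String modelSnetId,
                                      List<String> featureFields,
                                      String predictor) {
        List<LabeledRow> rows = new ArrayList<>();
        for (Map<String, Object> profile : profiles) {
            if (!modelSnetId.equals(profile.get("snet_id"))) {
                continue; // profile belongs to a different project
            }
            double[] features = new double[featureFields.size()];
            for (int i = 0; i < featureFields.size(); i++) {
                // Assumes every feature value is numeric (Integer or Double).
                features[i] = ((Number) profile.get(featureFields.get(i))).doubleValue();
            }
            double label = ((Number) profile.get(predictor)).doubleValue();
            rows.add(new LabeledRow((String) profile.get("unique_id"), features, label));
        }
        return rows;
    }
}
```

Because the model metadata is tiny compared with the profile data, reading feature_fields and predictor once (for example on the driver) and using them to pick columns is usually simpler than carrying the metadata through every row.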
I have the following MongoDB document structure:
db.menus.findOne()
{
"_id" : ObjectId("5cf25412326c3f4f26df039b"),
"restaurantId" : "301728",
"items" : [
{
"itemId" : "CEBM4H41JR",
"name" : "Crun Chicken",
"imageUrl" : "",
"price" : 572,
"attributes" : [
"Tasty",
"Spicy"
]
},
{
"itemId" : "53Q0XS3HPR",
"name" : "Devils Chicken",
"imageUrl" : "",
"price" : 595,
"attributes" : [
"Gravy",
"Salty"
]
}
]
}
I am trying to write a query that returns all the menus based on the "attributes" field of the "items" in the document.
I have done the following to get the menus when the "name" of an item is given, and it returns a result:
db.menus.find({ 'items' : {$elemMatch : {'name' : {$regex : "Chicken Thali", $options: 'i' }}}}).pretty()
I have tried the following to get results by attribute, but it is not working:
db.menus.find({'items' : {$elemMatch : {'attributes' : {$all : [{$regex : "Tasty", $options: 'i' }]}}}})
How do I get the list? I would also like to write this query as a MongoRepository method in a Spring Boot application.
Further, based on the restaurantIds obtained, I have to query the restaurant collection to find all the matching restaurants, which have the following structure:
{
"_id" : ObjectId("5cf2540e326c3f4f26de93dd"),
"restaurantId" : "301728",
"name" : "Desire Foods",
"imageUrl" : "https://b.zmtcdn.com/data/pictures/8/301728/d690ccb500d746530f56e1d637949da2_featured_v2.jpg",
"latitude" : 28.4900591,
"longitude" : 77.3066401,
"attributes" : [
"Chinese",
" Fast Food",
" Bakery"
],
"opensAt" : "09:30",
"closesAt" : "22:30"
}
Is the whole operation possible in a single query?
I think you can modify your query to use $in instead of $all.
To achieve your intended result, you can try:
db.menus.aggregate([
{
"$match": {
"items": {
"$elemMatch": {
"attributes": {
"$in": [
"Tasty"
]
}
}
}
}
},
{
"$lookup": {
"from": "restaurant",
"localField": "restaurantId",
"foreignField": "restaurantId",
"as": "restaurants"
}
},
{
"$unwind": "$restaurants"
},
{
"$replaceRoot": { "newRoot": "$restaurants" }
}
])
Use $match at the appropriate stages to limit the number of documents pulled into memory.
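To make the stages concrete, here is a rough in-memory Java equivalent of that pipeline: $match keeps menus with at least one item listing the attribute, $lookup joins restaurants on restaurantId, and $unwind plus $replaceRoot turn each joined restaurant into its own result. The document classes are hypothetical, trimmed to the fields the pipeline touches. The comparison is exact and case-sensitive, like $in with a plain string; for a case-insensitive match, $in also accepts regular expressions, e.g. "$in": [/tasty/i].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down document classes with only the fields the pipeline uses.
final class MenuItemDoc {
    final String name;
    final List<String> attributes;

    MenuItemDoc(String name, List<String> attributes) {
        this.name = name;
        this.attributes = attributes;
    }
}

final class MenuDoc {
    final String restaurantId;
    final List<MenuItemDoc> items;

    MenuDoc(String restaurantId, List<MenuItemDoc> items) {
        this.restaurantId = restaurantId;
        this.items = items;
    }
}

final class RestaurantDoc {
    final String restaurantId;
    final String name;

    RestaurantDoc(String restaurantId, String name) {
        this.restaurantId = restaurantId;
        this.name = name;
    }
}

final class MenuPipeline {
    static List<RestaurantDoc> restaurantsWithAttribute(List<MenuDoc> menus,
                                                        List<RestaurantDoc> restaurants,
                                                        String attribute) {
        List<RestaurantDoc> result = new ArrayList<>();
        for (MenuDoc menu : menus) {
            // $match: keep the menu if at least one item lists the attribute.
            boolean matches = menu.items.stream()
                    .anyMatch(item -> item.attributes.contains(attribute));
            if (!matches) {
                continue;
            }
            // $lookup on restaurantId, then $unwind + $replaceRoot: emit each joined
            // restaurant as its own result. A menu with no matching restaurant
            // produces nothing, since $unwind drops empty arrays by default.
            for (RestaurantDoc restaurant : restaurants) {
                if (restaurant.restaurantId.equals(menu.restaurantId)) {
                    result.add(restaurant);
                }
            }
        }
        return result;
    }
}
```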
OK, so I am making API requests to retrieve things like movies and songs, or to ping the server. However, all of these responses are wrapped in the same "response" JSON object, whose fields vary depending on the request. Below are three examples.
ping
{
"response" : {
"status" : "ok",
"version" : "0.9.1"
}
}
getIndexes
{
"response" : {
"status" : "ok",
"version" : "0.9.1",
"indexes" : {
"index" : [ {
"name" : "A",
"movie" : [ {
"id" : "150",
"name" : "A Movie"
}, {
"id" : "2400",
"name" : "Another Movie"
} ]
}, {
"name" : "S",
"movie" : [ {
"id" : "439",
"name" : "Some Movie"
}, {
"id" : "209",
"name" : "Some Movie Part 2"
} ]
} ]
}
}
}
getRandomSongs
{
"response" : {
"status" : "ok",
"version" : "0.9.1",
"randomSongs" : {
"song": [ {
"id" : "72",
"parent" : "58",
"isDir" : false,
"title" : "Letter From Yokosuka",
"album" : "Metaphorical Music",
"artist" : "Nujabes",
"track" : 7,
"year" : 2003,
"genre" : "Hip-Hop",
"coverArt" : "58",
"size" : 20407325,
"contentType" : "audio/flac",
"suffix" : "flac",
"transcodedContentType" : "audio/mpeg",
"transcodedSuffix" : "mp3",
"duration" : 190,
"bitRate" : 858,
"path" : "Nujabes/Metaphorical Music/07 - Letter From Yokosuka.flac",
"isVideo" : false,
"created" : "2015-06-06T01:18:05.000Z",
"albumId" : "2",
"artistId" : "0",
"type" : "music"
}, {
"id" : "3135",
"parent" : "3109",
"isDir" : false,
"title" : "Forty One Mosquitoes Flying In Formation",
"album" : "Tame Impala",
"artist" : "Tame Impala",
"track" : 4,
"year" : 2008,
"genre" : "Rock",
"coverArt" : "3109",
"size" : 10359844,
"contentType" : "audio/mpeg",
"suffix" : "mp3",
"duration" : 258,
"bitRate" : 320,
"path" : "Tame Impala/Tame Impala/04 - Forty One Mosquitoes Flying In Formation.mp3",
"isVideo" : false,
"created" : "2015-06-29T21:50:16.000Z",
"albumId" : "101",
"artistId" : "30",
"type" : "music"
} ]
}
}
}
So basically my question is: how should I structure my model classes for parsing these responses? At the moment I have an abstract response class that only contains fields for the status and version. However, with this approach I need a response class extending it for every request I make (e.g. AbstractResponse, IndexesResponse, RandomSongsResponse). Also, some models with the same name may have different fields depending on the API request made. I would prefer to avoid making a model class for every possible scenario.
As an extra note, I am using Gson for JSON serialization/deserialization and Retrofit to communicate with the API.
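One possible layout, sketched with plain Java classes whose names are hypothetical: a single envelope class for the outer "response" object, and one response class holding the shared status/version plus every optional payload. Gson leaves fields that are missing from the JSON at their default values (null for object fields), so the same class could parse ping, getIndexes and getRandomSongs. The trade-off is that callers must null-check the payload they expect.

```java
import java.util.List;

// Outer wrapper: every payload arrives as {"response": {...}}.
final class ApiEnvelope {
    ApiResponse response;
}

// One class for all calls. Fields absent from a given response stay null,
// so a ping simply leaves indexes and randomSongs unset.
final class ApiResponse {
    String status;
    String version;
    Indexes indexes;         // only present for getIndexes
    RandomSongs randomSongs; // only present for getRandomSongs

    boolean isOk() {
        return "ok".equals(status);
    }
}

final class Indexes {
    List<IndexGroup> index;
}

final class IndexGroup {
    String name;
    List<Movie> movie;
}

final class Movie {
    String id;
    String name;
}

final class RandomSongs {
    List<Song> song;
}

// A subset of the song fields; a superset class like this can also absorb
// "same name, different fields" variants, since unused fields stay null.
final class Song {
    String id;
    String title;
    String artist;
    String album;
    Integer year;     // boxed so a missing value stays null rather than 0
    Integer duration;
}
```

With Retrofit, each endpoint could then be declared as returning Call<ApiEnvelope>. If you would rather keep per-call type safety, a generic envelope with one small payload class per endpoint is a common alternative.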
I'm aware of the writerWithDefaultPrettyPrinter option in Jackson, but is there any way to customize it? See examples below.
If this isn't possible in Jackson (i.e., the pretty-print options can't be changed), is there another popular JSON library that can do it?
Summary of options to change:
Don't open multiple containers on the same line
Don't close and open containers on the same line
Use 4 spaces as indents instead of 2
(another option, though I wouldn't use it) Open containers on a new line so that they line up vertically with their closing marker
Example of what it outputs now:
[ {
"id" : "12",
"payload" : [ {
"name" : "url",
"value" : [ {
"name" : "url",
"value" : "http://foobar.com"
} ]
}, {
"name" : "tags",
"value" : [ {
"name" : "tags",
"value" : "red"
}, {
"name" : "tags",
"value" : "green"
}, {
"name" : "tags",
"value" : "blue"
}, {
...
Example of what I'd like to get:
[
{
"id" : "12",
"payload" : [
{
"name" : "url",
"value" : [
{
"name" : "url",
"value" : "http://foobar.com"
}
]
},
{
"name" : "tags",
"value" : [
{
"name" : "tags",
"value" : "red"
},
{
"name" : "tags",
"value" : "green"
},
{
"name" : "tags",
"value" : "blue"
},
{
...
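Before writing a custom printer, it may be enough to configure Jackson's own DefaultPrettyPrinter: in Jackson 2.x it has indentArraysWith(...) and indentObjectsWith(...), which accept an indenter such as new DefaultIndenter("    ", DefaultIndenter.SYS_LF), and the configured printer can be passed to mapper.writer(printer). Check this against the Jackson version you use. If that falls short, here is a minimal hand-rolled sketch in plain Java (no Jackson; the class name is hypothetical) that produces the requested layout for trees of Map, List, String, numbers, booleans and null.

```java
import java.util.List;
import java.util.Map;

// Hand-rolled printer producing the requested layout: 4-space indents, every
// array element and object field on its own line, so "[ {" and "}, {" never
// share a line. Use a LinkedHashMap to keep key order stable.
final class CustomPrettyPrinter {
    private static final String INDENT = "    ";

    static String print(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, 0, out);
        return out.toString();
    }

    private static void write(Object value, int depth, StringBuilder out) {
        if (value instanceof Map) {
            writeObject((Map<?, ?>) value, depth, out);
        } else if (value instanceof List) {
            writeArray((List<?>) value, depth, out);
        } else if (value instanceof String) {
            out.append(quote((String) value));
        } else {
            out.append(value); // numbers, booleans and null
        }
    }

    private static void writeObject(Map<?, ?> map, int depth, StringBuilder out) {
        if (map.isEmpty()) {
            out.append("{}");
            return;
        }
        out.append("{\n");
        int remaining = map.size();
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            indent(depth + 1, out);
            out.append(quote(String.valueOf(entry.getKey()))).append(" : ");
            write(entry.getValue(), depth + 1, out);
            out.append(--remaining > 0 ? ",\n" : "\n");
        }
        indent(depth, out);
        out.append('}');
    }

    private static void writeArray(List<?> list, int depth, StringBuilder out) {
        if (list.isEmpty()) {
            out.append("[]");
            return;
        }
        out.append("[\n");
        for (int i = 0; i < list.size(); i++) {
            indent(depth + 1, out);
            write(list.get(i), depth + 1, out);
            out.append(i < list.size() - 1 ? ",\n" : "\n");
        }
        indent(depth, out);
        out.append(']');
    }

    private static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append(INDENT);
        }
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```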
Platform: MongoDB, Spring, Spring Data MongoDB
I have a collection called "Encounter" with the following structure:
Encounter:
{ "_id" : "49a0515b-e020-4e0d-aa6c-6f96bb867288",
"_class" : "com.keype.hawk.health.emr.api.transaction.model.Encounter",
"encounterTypeId" : "c4f657f0-015d-4b02-a216-f3beba2c64be",
"visitId" : "8b4c48c6-d969-4926-8b8f-05d2f58491ae",
"status" : "ACTIVE",
"form" :
{
"_id" : "be3cddc5-4cec-4ce5-8592-72f1d7a0f093",
"formCode" : "CBC",
"fields" : {
"dc" : {
"label" : "DC",
"name" : "tc"
},
"tc" : {
"label" : "TC",
"name" : "tc"
},
"notes" : {
"label" : "Notes",
"name" : "notes"
}
},
"notes" : "Blood Test",
"dateCreated" : NumberLong("1376916746564"),
"dateModified" : NumberLong("1376916746564"),
"staffCreated" : 10013,
"staffModified" : 10013
}
}
The element "fields" is represented in Java as a LinkedHashMap:
protected LinkedHashMap<String, Field> fields;
The keys of the map are not fixed; they are generated at run time.
How do I query to get all documents in the collection where "label" = "TC"?
It's not possible to query like db.encounter.find({'form.fields.dc.label':'TC'}) because the element name 'dc' is NOT known in advance. I want to skip that position and then execute the query, something like:
db.encounter.find({'form.fields.*.label':'TC'});
Any ideas?
Also, how do I best use indexes in this scenario?
If fields were an array, with your key stored as part of each sub-document instead:
"fields" : [
{ "key" : "dc",
"label" : "DC",
"name" : "dc"
},
{ "key" : "tc",
"label" : "TC",
"name" : "tc"
}
]
In this case, you could simply query for any sub-element inside the array:
db.coll.find({"form.fields.label":"TC"})
I'm not sure how you would integrate that with Spring, but perhaps the idea helps. As far as indexes are concerned, you can index a field inside the array, which gives you a multikey index: the index holds a separate entry, pointing to the document, for each array value.
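A small Java sketch of the conversion this answer suggests, using hypothetical stand-in classes (FormField for the original Field type): each map entry becomes an array element that carries its former key, so every label sits at the same path, form.fields.label. On the Spring Data side, the filter would then presumably be new Query(Criteria.where("form.fields.label").is("TC")) passed to MongoTemplate.find, and an index such as db.encounter.createIndex({"form.fields.label": 1}) would be multikey.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the original Field value type (label + name).
final class FormField {
    final String label;
    final String name;

    FormField(String label, String name) {
        this.label = label;
        this.name = name;
    }
}

// One element of the suggested "fields" array: the former map key becomes a
// regular field, so every label lives at the same path.
final class KeyedField {
    final String key;
    final String label;
    final String name;

    KeyedField(String key, String label, String name) {
        this.key = key;
        this.label = label;
        this.name = name;
    }
}

final class FieldsConverter {
    // {"dc": {...}, "tc": {...}} -> [{"key": "dc", ...}, {"key": "tc", ...}],
    // keeping the LinkedHashMap's insertion order.
    static List<KeyedField> toArray(LinkedHashMap<String, FormField> fields) {
        List<KeyedField> result = new ArrayList<>();
        for (Map.Entry<String, FormField> e : fields.entrySet()) {
            result.add(new KeyedField(e.getKey(), e.getValue().label, e.getValue().name));
        }
        return result;
    }

    // In-memory equivalent of {"form.fields.label": "TC"}: true if any element
    // carries the label, whatever its key.
    static boolean hasLabel(List<KeyedField> fields, String label) {
        return fields.stream().anyMatch(f -> label.equals(f.label));
    }
}
```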
I have a Mongo document as below:
{
"_id" : ObjectId("506e9e54a4e8f51423679428"),
"description" : "ffffffffffffffff",
"menus" : [
{
"_id" : ObjectId("506e9e5aa4e8f51423679429"),
"description" : "ffffffffffffffffffff",
"items" : [
{
"name" : "xcvxc",
"description" : "vxvxcvxc",
"text" : "vxcvxcvx",
"menuKey" : "0",
"onSelect" : "1",
"_id" : ObjectId("506e9f07a4e8f5142367942f")
} ,
{
"name" : "abcd",
"description" : "qqq",
"text" : "qqq",
"menuKey" : "0",
"onSelect" : "3",
"_id" : ObjectId("507e9f07a4e8f5142367942f")
}
]
}
]
}
Now I want to change this to:
{
"_id" : ObjectId("506e9e54a4e8f51423679428"),
"description" : "ffffffffffffffff",
"menus" : [
{
"_id" : ObjectId("506e9e5aa4e8f51423679429"),
"description" : "ffffffffffffffffffff",
"items" : {
{
"name" : "xcvxc",
"description" : "vxvxcvxc",
"text" : "vxcvxcvx",
"menuKey" : "0",
"onSelect" : "1",
"_id" : ObjectId("506e9f07a4e8f5142367942f")
} ,
{
"name" : "abcd",
"description" : "qqq",
"text" : "qqq",
"menuKey" : "0",
"onSelect" : "3",
"_id" : ObjectId("507e9f07a4e8f5142367942f")
}
}
}
]
}
Is this possible in Mongo? With the first schema, atomic updates are not possible because the positional "$" operator can't be used twice when updating a deeply nested element, so I thought of changing the schema to the second one. How can I achieve that?
For the first schema I have used "$push" to add items inside menus.
Any help would be appreciated.
Your update changes a 'menu' object, so I would suggest changing the schema so that each menu is a top-level document rather than an element of an array in another document.
Each menu could either have a field referencing the top-level object (in another collection) that it belongs to, or you could denormalize the fields of the top-level object into each menu document.
Without knowing the complete requirements of the application, it's difficult to know when the schema is "good enough".
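A minimal Java sketch of the suggested schema, with hypothetical class names and a hypothetical "menus" collection: each menu becomes its own document that stores its parent's _id, leaving only one array level (items). The update then needs a single positional "$", shown in the comment as an illustrative shell command, and the Java method mirrors its effect in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical item inside a top-level menu document.
final class MenuEntry {
    final String id;
    final String name;
    String text;

    MenuEntry(String id, String name, String text) {
        this.id = id;
        this.name = name;
        this.text = text;
    }
}

// Hypothetical top-level menu document: instead of being embedded in the parent's
// "menus" array, it stores the parent's _id in parentId.
final class MenuDocument {
    final String id;
    final String parentId;
    final String description;
    final List<MenuEntry> items = new ArrayList<>();

    MenuDocument(String id, String parentId, String description) {
        this.id = id;
        this.parentId = parentId;
        this.description = description;
    }
}

final class MenuStore {
    // With menus at the top level, only one array level remains, so a single
    // positional "$" is enough. In the shell that would look roughly like:
    //   db.menus.update({_id: menuId, "items._id": itemId},
    //                   {$set: {"items.$.text": newText}})
    // This in-memory version mirrors it: pick the menu, then the matching item.
    static boolean updateItemText(List<MenuDocument> menus, String menuId,
                                  String itemId, String newText) {
        for (MenuDocument menu : menus) {
            if (!menu.id.equals(menuId)) {
                continue;
            }
            for (MenuEntry item : menu.items) {
                if (item.id.equals(itemId)) {
                    item.text = newText;
                    return true;
                }
            }
        }
        return false;
    }
}
```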