I have the following gridfs in a mongodb database:
db.outputFs.files.find()
{ "_id" : ObjectId("000000000000000000000001"), "chunkSize" : 261120, "length" : 232, "md5" : "42290309186cc5420acff293b92ae21d", "filename" : "/tmp/outputFs-01.tmp", "contentType" : null, "uploadDate" : ISODate("2015-04-13T13:50:48.259Z"), "aliases" : null, "metadata" : { "estado" : "FICHERO_PDTE_ENVIO", "dataDate" : "20141122", "inputOutputFs" : "output", "gridFileCompression" : "bzip2", "fileType" : "OUTPUT-TEST1", "filePath" : "/tmp/outputFs-01.tmp", "sourceParticipant" : "0100", "destinationParticipant" : "REE", "exportFileName" : "F1_0100_20141122_20150219.0", "processed" : "false", "fileMD5" : "4276e61a4b63d3d1d1b77e27e792bd13", "version" : 0 } }
{ "_id" : ObjectId("000000000000000000000002"), "chunkSize" : 261120, "length" : 232, "md5" : "42290309186cc5420acff293b92ae21d", "filename" : "/tmp/outputFs-02.tmp", "contentType" : null, "uploadDate" : ISODate("2015-04-13T13:50:48.259Z"), "aliases" : null, "metadata" : { "estado" : "FICHERO_ENVIADO_OK", "fechaEnvio" : ISODate("2015-04-13T13:50:48.259Z"), "dataDate" : "20141123", "inputOutputFs" : "output", "gridFileCompression" : "bzip2", "fileType" : "OUTPUT-TEST2", "filePath" : "/tmp/outputFs-02.tmp", "sourceParticipant" : "0100", "destinationParticipant" : "REE", "exportFileName" : "F1_0100_20141123_20150220.0", "processed" : "false", "fileMD5" : "4276e61a4b63d3d1d1b77e27e792bd13", "version" : 0 } }
db.outputFs.chunks.find()
{ "_id" : ObjectId("000000000000000000000001"), "files_id" : ObjectId("000000000000000000000001"), "n" : 0, "data" : { "$type" : 0, "$binary" : "QlpoOTFBWSZTWZSBQ/YABX5cAAAYQAH/+CAAMAFWA0NqptIb9UgAZ6qowAAFKSaJpGhk8ssmVlk7AAAALZtZZOf0vr859OqflcIs461Dm1skcSOpGpHMuu5HcsJG0j5I9PiR4kaRvvjWskfsVMkZVLxI3uRy/pGTqRj7VmMyTOBfUtb561rwkf0j09+Zbkd+cs1I77861xI7pypvfOt1v5DmR1I51nW7XGdaluRnGZjMzJMzOZGpHnrfGM56+/fnGPVVVVVqqpVVWxCSxCTSAEMkZI3IyqXuqXyRuR/i7kinChISkCh+wA==" } }
{ "_id" : ObjectId("000000000000000000000002"), "files_id" : ObjectId("000000000000000000000002"), "n" : 0, "data" : { "$type" : 0, "$binary" : "QlpoOTFBWSZTWZSBQ/YABX5cAAAYQAH/+CAAMAFWA0NqptIb9UgAZ6qowAAFKSaJpGhk8ssmVlk7AAAALZtZZOf0vr859OqflcIs461Dm1skcSOpGpHMuu5HcsJG0j5I9PiR4kaRvvjWskfsVMkZVLxI3uRy/pGTqRj7VmMyTOBfUtb561rwkf0j09+Zbkd+cs1I77861xI7pypvfOt1v5DmR1I51nW7XGdaluRnGZjMzJMzOZGpHnrfGM56+/fnGPVVVVVqqpVVWxCSxCTSAEMkZI3IyqXuqXyRuR/i7kinChISkCh+wA==" } }
When I try to retrieve it with Spring Data or even MongoChef (both Java clients) as a file, I receive the following error:
com.mongodb.BasicDBObject cannot be cast to [B
The collections were imported manually as they are, not using MongoChef or mongofiles, and I have no idea where this error comes from.
The GridFS specification expects a Binary field called data in the chunks collection; your outputFs.chunks does not meet this criterion.
The data field here is not a Binary BSON value but a regular embedded document that happens to have two fields called $type and $binary.
mongoimport will create a Binary field only for JSON entries in the following format. Please note that the order of the fields matters for mongoimport.
{
    "$binary" : ...,
    "$type" : ...
}
Your example has the $type and $binary fields swapped.
Update your JSON file and import your outputFs.chunks again. mongoimport will create valid Binary fields and you'll be able to work with GridFS using your other MongoDB tools.
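As a sketch, the first chunk from above rewritten in the order mongoimport expects would look something like this (binary payload abbreviated; note that in this extended JSON form the ObjectIds use $oid and the $type value is the hex string "00" for generic binary, not the number 0):

```json
{
    "_id" : { "$oid" : "000000000000000000000001" },
    "files_id" : { "$oid" : "000000000000000000000001" },
    "n" : 0,
    "data" : { "$binary" : "QlpoOTFBWSZTWZSBQ/YABX5c...", "$type" : "00" }
}
```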
I have the following query:
{'carCollectionStatuses': 'BELOW_ONE_HUNDRED', 'carCollection.carID':'e711c3aa-e073-0cd4-29e0-db2e503b61a0'}
Which returns:
{
"_id" : ObjectId("5a4bf83ae261b4cc9045d36d"),
"name" : "Colecao cabrita",
"title" : "Colecao selecionada a dedo",
"subtitle" : "Só carros que vão parar de funcionar",
"pictureUrl" : "http://www.google.com",
"carCollection" : [
{
"partnerName" : "-------",
"photo" : "https://--------",
"_id" : ObjectId("59d7cac33fde150006d24856"),
"location" : [
11123.21729,
1146.4362
],
"carModel" : "LOGAN Expression Hi-Flex 1.6 8V 4p",
"carBrand" : "Renault",
"carID" : "e711c3aa-e073-0cd4-29e0-db2e222b61a0"
},
{
"partnerName" : "-------",
"photo" : "---------",
"_id" : ObjectId("59dbc9701755db00069a4157"),
"location" : [
11123.21729,
1146.4362
],
"carModel" : "Megane Grand Tour Dynam. Hi-Flex 1.6 16V",
"carBrand" : "Renault",
"carID" : "e71125ad-70fc-f563-0000-db2e503b61a0"
}
],
"__v" : NumberInt(0),
"carCollectionStatuses" : [
"BELOW_ONE_HUNDRED"
]
}
Using spring-data-mongodb:
@Query("{'carCollectionStatuses': ?0, 'carCollection.carID': ?1}")
CarCollectionHolderEntity findBycarCollectionStatusesAndCarCollectionCarID(CarCollectionStatus status, String carID);
My problem is that running the query directly in MQL gives a match, but through spring-data-mongodb the query never returns the data. What am I doing wrong?
Thanks
Hi, I'm reading data from MongoDB into a Spark application.
My MongoDB database contains 2 collections.
One is profile_data (the actual data, with field names), which holds all the input data, including some unique fields:
{
"MessageStatus" : 2,
"Origin" : 1,
"_id" : ObjectId("596340fe8b0fa35d2880db1a"),
"accerlation" : 19.4,
"cylinders" : 4,
"displacement" : 119,
"file_id" : ObjectId("59633e48b760e7c8071a6c1c"),
"horsepower" : 82,
"modelyear" : 82,
"modified_date" : ISODate("2017-07-10T08:47:01.641Z"),
"mpg" : 31,
"snet_id" : "new_project",
"unique_id" : "784",
"username" : "chevy s-10",
"weight" : 2720
}
And the other collection is predictive_model_details, which holds the ML model details such as model name, feature fields, and prediction field, just like metadata:
{
"_id" : ObjectId("56b4351be4b064bb19a90324"),
"algorithm_id" : "55d717a53d9e22022ff2a1e9",
"algorithm_name" : "K- Nearest Neighbours (IBK)",
"client_id" : "562e1d51b760d0e408151b91",
"feature_fields" : [
{
"name" : "Origin",
"type" : "int"
},
{
"name" : "accerlation",
"type" : "Double"
},
{
"name" : "displacement",
"type" : "Int"
},
{
"name" : "horsepower",
"type" : "Int"
},
{
"name" : "modelyear",
"type" : "Int"
}
],
"makeActiveStatus" : "0",
"model_name" : "test1",
"parameter_type" : "system_defined",
"parameters" : [
{
"symbol" : "-K",
"value" : "1"
}
],
"predictor" : {
"name" : "mpg"
"type" : "Int"
},
"result_exists" : true,
"snet_id" : "new_project"
}
So I've created 2 Datasets in Spark, one for each MongoDB collection. Now I want to join these 2 Datasets so that all the feature fields and the prediction field come together.
The common field between the 2 Datasets is snet_id.
Could anyone please help?
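The join itself could be sketched like this in Spark's Java API; profileDs and modelDs are hypothetical variable names for the two Datasets, assumed to have been loaded already (for example with the MongoDB Spark connector):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// profileDs = Dataset read from profile_data
// modelDs   = Dataset read from predictive_model_details
// Both are assumed loaded already, e.g. via MongoSpark.load(...).toDF().

// Inner join on the common field; rows without a matching snet_id drop out.
Dataset<Row> joined = profileDs.join(modelDs, "snet_id");

// The model's feature_fields and predictor columns now sit next to the
// profile columns and can be selected together.
joined.select("unique_id", "feature_fields", "predictor").show();
```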
I have a unique index with a partialFilterExpression on a collection, but duplicate data is sometimes inserted.
Index creation
getCollection().createIndex(new BasicDBObject(userId, 1),
        new BasicDBObject("name", "uidx-something-user")
                .append("partialFilterExpression", new BasicDBObject(Properties.someting, new BasicDBObject("$eq", true)))
                .append("unique", true));
The index, as shown by the getIndexes command:
{
"v" : 1,
"unique" : true,
"key" : {
"userId" : 1
},
"name" : "uidx-something-user",
"ns" : "somewhere.something",
"partialFilterExpression" : {
"something" : {
"$eq" : true
}
}
}
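For comparison, an equivalent index creation in the mongo shell would look roughly like this (collection name taken from the ns field above):

```javascript
db.something.createIndex(
    { userId: 1 },
    {
        name: "uidx-something-user",
        partialFilterExpression: { something: { $eq: true } },
        unique: true
    }
)
```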
The duplicated documents
{
"_id" : "08a8506c-bcbc-4ed6-9972-67fd7c37b4bc",
"userId" : "1068",
"express" : false,
"something" : true,
"items" : [ ],
"recipient" : {
"_id" : "efbd8618-c480-4194-964e-f5a821edf695"
}
}
{
"_id" : "b6695c6a-f29d-4531-96ac-795f14c72547",
"userId" : "1068",
"express" : false,
"something" : true,
"items" : [ ],
"recipient" : {
"_id" : "4f93fe38-edb2-4cb7-a1b3-c2c51ac8ded1"
}
}
MongoDB version: 3.2.7; it also seems to happen with 3.2.12.
A side note: when dumping the collection and restoring it, a duplicate key error is thrown.
Why is it sometimes possible to insert duplicate data, and how can I avoid that?
UPDATE
I created a MongoDB issue: https://jira.mongodb.org/browse/SERVER-28153
It was fixed in 3.2.13, 3.4.4, and 3.5.6.
You can read more in the MongoDB Jira.
I have these 2 documents in my collection:
{
"_id" : ObjectId("5722042f8648ba1d04c65dad"),
"companyId" : ObjectId("570269639caabe24e4e4043e"),
"applicationId" : ObjectId("5710e3994df37620e84808a8"),
"steps" : [
{
"id" : NumberLong(0),
"responsiveUser" : "57206f9362d0260fd0af59b6",
"stepOnRejection" : NumberLong(0),
"notification" : "test"
},
{
"id" : NumberLong(1),
"responsiveUser" : "57206fd562d0261034075f70",
"stepOnRejection" : NumberLong(1),
"notification" : "test1"
}
]
}
{
"_id" : ObjectId("5728f317a8f9ba14187b84f8"),
"companyId" : ObjectId("570269639caabe24e4e4043e"),
"applicationId" : ObjectId("5710e3994df37620e84808a8"),
"steps" : [
{
"id" : NumberLong(0),
"responsiveUser" : "57206f9362d0260fd0af59b6",
"stepOnRejection" : NumberLong(0),
"notification" : "erter"
},
{
"id" : NumberLong(1),
"responsiveUser" : "57206f9362d0260fd0af59b6",
"stepOnRejection" : NumberLong(1),
"notification" : "3232"
}
]
}
Now I'm trying to get the document with the max _id that has an element with id equal to 0 inside its steps array. I also have a projection that is supposed to show only the id of the matched element and nothing else.
Here is my query:
collection
.find(new Document("companyId", companyId)
.append("applicationId", applicationId)
.append("steps",
new Document("$elemMatch",
new Document("id", 0))))
.sort(new Document("_id", 1))
.limit(1)
.projection(new Document("steps.id", 1)
.append("_id", 0));
And it returns:
Document{{steps=[Document{{id=0}}, Document{{id=1}}]}}
Why is it returning both steps elements instead of just the one that matched?
The result should be looking like:
Document{{id=0}}
What am I missing here? I know that is something basic, but I really can't spot my mistake here.
Your query document tells Mongo to return those documents where the steps array contains an element with id: 0. You are NOT telling Mongo to return ONLY that element. You can use $elemMatch inside the projection document to get what you want (I'm writing this in Mongo shell syntax because I'm not too familiar with the Java syntax):
{ steps: { $elemMatch: { id: 0 } },
'steps.id': 1,
_id: 0
}
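A rough Java-driver equivalent, using the com.mongodb.client.model.Projections helpers (untested sketch):

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.*;

import org.bson.Document;

collection
    .find(new Document("companyId", companyId)
        .append("applicationId", applicationId))
    .sort(new Document("_id", 1))
    .limit(1)
    // $elemMatch in the projection keeps only the first array element
    // that matches, instead of every element of steps.
    .projection(fields(
        elemMatch("steps", eq("id", 0)),
        excludeId()));
```

Note that $elemMatch returns the whole matched element; combine it with 'steps.id': 1 as in the shell projection if you only want the id field.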
Platform: MongoDB, Spring, SpringDataMongoDB
I have a collection called "Encounter" with below structure
Encounter:
{ "_id" : "49a0515b-e020-4e0d-aa6c-6f96bb867288",
"_class" : "com.keype.hawk.health.emr.api.transaction.model.Encounter",
"encounterTypeId" : "c4f657f0-015d-4b02-a216-f3beba2c64be",
"visitId" : "8b4c48c6-d969-4926-8b8f-05d2f58491ae",
"status" : "ACTIVE",
"form" :
{
"_id" : "be3cddc5-4cec-4ce5-8592-72f1d7a0f093",
"formCode" : "CBC",
"fields" : {
"dc" : {
"label" : "DC",
"name" : "tc",
},
"tc" : {
"label" : "TC",
"name" : "tc",
},
"notes" : {
"label" : "Notes",
"name" : "notes",
}
},
"notes" : "Blood Test",
"dateCreated" : NumberLong("1376916746564"),
"dateModified" : NumberLong("1376916746564"),
"staffCreated" : 10013,
"staffModified" : 10013
}
}
The element "fields" is represented using a Java Hashmap as:
protected LinkedHashMap<String, Field> fields;
The key to the hashmap is not fixed; it is generated at run time.
How do I query to get all documents in the collection where "label" = "TC"?
It's not possible to query like db.encounter.find({'form.fields.dc.label':'TC'}) because the element name 'dc' is not known in advance. I want to skip that position and execute a query something like:
db.encounter.find({'form.fields.*.label':'TC'});
Any ideas?
Also, how do I best use indexes in this scenario?
If fields were an array and your key a part of the sub-document instead:
"fields" : [
{ "key" : "dc",
"label" : "DC",
"name" : "dc"
},
{ "key" : "tc",
"label" : "TC",
"name" : "tc"
}
]
In this case, you could simply query for any sub-element inside the array:
db.coll.find({"form.fields.label":"TC"})
Not sure how you would integrate that with Spring, but perhaps the idea helps? As far as indexes are concerned, you can index into the array, which gives you a multikey index. Basically, the index will have a separate entry pointing to the document for each array value.
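For example, a single index on the array layout above covers every element of the array (the collection name coll is hypothetical):

```javascript
// One index entry is created per element of form.fields
db.coll.createIndex({ "form.fields.label": 1 })

// This query can then use the multikey index
db.coll.find({ "form.fields.label": "TC" })
```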