Duplicate data is sometimes inserted with unique index - java

I have a unique index with a partialFilterExpression on a collection, but duplicate data is sometimes inserted.
Index creation
getCollection().createIndex(new BasicDBObject("userId", 1),
        new BasicDBObject("name", "uidx-something-user")
                .append("partialFilterExpression",
                        new BasicDBObject("something", new BasicDBObject("$eq", true)))
                .append("unique", true));
The index from the getIndexes() command:
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "userId" : 1
    },
    "name" : "uidx-something-user",
    "ns" : "somewhere.something",
    "partialFilterExpression" : {
        "something" : {
            "$eq" : true
        }
    }
}
The duplicated documents:
{
    "_id" : "08a8506c-bcbc-4ed6-9972-67fd7c37b4bc",
    "userId" : "1068",
    "express" : false,
    "something" : true,
    "items" : [ ],
    "recipient" : {
        "_id" : "efbd8618-c480-4194-964e-f5a821edf695"
    }
}
{
    "_id" : "b6695c6a-f29d-4531-96ac-795f14c72547",
    "userId" : "1068",
    "express" : false,
    "something" : true,
    "items" : [ ],
    "recipient" : {
        "_id" : "4f93fe38-edb2-4cb7-a1b3-c2c51ac8ded1"
    }
}
MongoDB version: 3.2.7; it also seems to happen with 3.2.12.
A side note: when dumping the collection and restoring it, a duplicate key error is thrown.
Why is it sometimes possible to insert duplicate data, and how can I avoid that?
UPDATE
I created a MongoDB issue: https://jira.mongodb.org/browse/SERVER-28153

This was fixed in 3.2.13, 3.4.4 and 3.5.6.
You can read more in the MongoDB JIRA.

Related

How to map two datasets in Spark Java

Hi, I'm reading data from MongoDB into a Spark application.
My MongoDB database contains two collections.
One is profile_data (the actual data with field names), which holds all the input data, including some unique fields:
{
    "MessageStatus" : 2,
    "Origin" : 1,
    "_id" : ObjectId("596340fe8b0fa35d2880db1a"),
    "accerlation" : 19.4,
    "cylinders" : 4,
    "displacement" : 119,
    "file_id" : ObjectId("59633e48b760e7c8071a6c1c"),
    "horsepower" : 82,
    "modelyear" : 82,
    "modified_date" : ISODate("2017-07-10T08:47:01.641Z"),
    "mpg" : 31,
    "snet_id" : "new_project",
    "unique_id" : "784",
    "username" : "chevy s-10",
    "weight" : 2720
}
The other collection is predictive_model_details, which holds the ML model details (model name, feature fields and prediction field), just like metadata:
{
    "_id" : ObjectId("56b4351be4b064bb19a90324"),
    "algorithm_id" : "55d717a53d9e22022ff2a1e9",
    "algorithm_name" : "K- Nearest Neighbours (IBK)",
    "client_id" : "562e1d51b760d0e408151b91",
    "feature_fields" : [
        {
            "name" : "Origin",
            "type" : "int"
        },
        {
            "name" : "accerlation",
            "type" : "Double"
        },
        {
            "name" : "displacement",
            "type" : "Int"
        },
        {
            "name" : "horsepower",
            "type" : "Int"
        },
        {
            "name" : "modelyear",
            "type" : "Int"
        }
    ],
    "makeActiveStatus" : "0",
    "model_name" : "test1",
    "parameter_type" : "system_defined",
    "parameters" : [
        {
            "symbol" : "-K",
            "value" : "1"
        }
    ],
    "predictor" : {
        "name" : "mpg",
        "type" : "Int"
    },
    "result_exists" : true,
    "snet_id" : "new_project"
}
So I've created two Datasets in Spark, one per MongoDB collection. Now I want to join these two Datasets so that the feature fields and the prediction field come together.
The common field in the two Datasets is snet_id.
Could anyone please help?
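A minimal sketch of one way to do this with the Spark Java API, assuming the two collections have already been loaded as profileData and modelDetails (these variable names are illustrative, not from the question):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Join the raw data with the model metadata on the shared snet_id field
Dataset<Row> joined = profileData.join(
        modelDetails,
        profileData.col("snet_id").equalTo(modelDetails.col("snet_id")));

// The columns named in feature_fields and predictor can then be selected
// from the joined rows; extracting those names from the metadata is left out here.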

Why am I getting documents not included in the projection?

I have these 2 documents in my collection:
{
    "_id" : ObjectId("5722042f8648ba1d04c65dad"),
    "companyId" : ObjectId("570269639caabe24e4e4043e"),
    "applicationId" : ObjectId("5710e3994df37620e84808a8"),
    "steps" : [
        {
            "id" : NumberLong(0),
            "responsiveUser" : "57206f9362d0260fd0af59b6",
            "stepOnRejection" : NumberLong(0),
            "notification" : "test"
        },
        {
            "id" : NumberLong(1),
            "responsiveUser" : "57206fd562d0261034075f70",
            "stepOnRejection" : NumberLong(1),
            "notification" : "test1"
        }
    ]
}
{
    "_id" : ObjectId("5728f317a8f9ba14187b84f8"),
    "companyId" : ObjectId("570269639caabe24e4e4043e"),
    "applicationId" : ObjectId("5710e3994df37620e84808a8"),
    "steps" : [
        {
            "id" : NumberLong(0),
            "responsiveUser" : "57206f9362d0260fd0af59b6",
            "stepOnRejection" : NumberLong(0),
            "notification" : "erter"
        },
        {
            "id" : NumberLong(1),
            "responsiveUser" : "57206f9362d0260fd0af59b6",
            "stepOnRejection" : NumberLong(1),
            "notification" : "3232"
        }
    ]
}
Now I'm trying to get the document with the max _id and, from inside its steps array, the element whose id equals 0. I also have a projection that is supposed to show only the id of the matched element and nothing else.
Here is my query:
collection
    .find(new Document("companyId", companyId)
        .append("applicationId", applicationId)
        .append("steps",
            new Document("$elemMatch",
                new Document("id", 0))))
    .sort(new Document("_id", 1))
    .limit(1)
    .projection(new Document("steps.id", 1)
        .append("_id", 0));
And it returns:
Document{{steps=[Document{{id=0}}, Document{{id=1}}]}}
Why is it returning 2 documents instead of 1?
The result should be looking like:
Document{{id=0}}
What am I missing here? I know it's something basic, but I really can't spot my mistake.
Your query document tells Mongo to return the documents whose steps array contains an element with id: 0; it does NOT tell Mongo to return only that element. You can use $elemMatch inside the projection document to get what you want (I'm writing this in Mongo shell syntax because I'm not too familiar with the Java syntax):
{
    steps: { $elemMatch: { id: 0 } },
    'steps.id': 1,
    _id: 0
}
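A rough Java translation of that projection with the same driver API as in the question (a sketch; the 0L assumes the ids are stored as NumberLong, though a plain 0 would also match numerically):

import org.bson.Document;

// Query unchanged; the $elemMatch moves from the filter into the projection
collection
    .find(new Document("companyId", companyId)
        .append("applicationId", applicationId))
    .sort(new Document("_id", 1))
    .limit(1)
    .projection(new Document("steps",
            new Document("$elemMatch", new Document("id", 0L)))
        .append("steps.id", 1)
        .append("_id", 0));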

Cannot cast BasicDBObject to [B

I have the following GridFS data in a MongoDB database:
db.outputFs.files.find()
{ "_id" : ObjectId("000000000000000000000001"), "chunkSize" : 261120, "length" : 232, "md5" : "42290309186cc5420acff293b92ae21d", "filename" : "/tmp/outputFs-01.tmp", "contentType" : null, "uploadDate" : ISODate("2015-04-13T13:50:48.259Z"), "aliases" : null, "metadata" : { "estado" : "FICHERO_PDTE_ENVIO", "dataDate" : "20141122", "inputOutputFs" : "output", "gridFileCompression" : "bzip2", "fileType" : "OUTPUT-TEST1", "filePath" : "/tmp/outputFs-01.tmp", "sourceParticipant" : "0100", "destinationParticipant" : "REE", "exportFileName" : "F1_0100_20141122_20150219.0", "processed" : "false", "fileMD5" : "4276e61a4b63d3d1d1b77e27e792bd13", "version" : 0 } }
{ "_id" : ObjectId("000000000000000000000002"), "chunkSize" : 261120, "length" : 232, "md5" : "42290309186cc5420acff293b92ae21d", "filename" : "/tmp/outputFs-02.tmp", "contentType" : null, "uploadDate" : ISODate("2015-04-13T13:50:48.259Z"), "aliases" : null, "metadata" : { "estado" : "FICHERO_ENVIADO_OK", "fechaEnvio" : ISODate("2015-04-13T13:50:48.259Z"), "dataDate" : "20141123", "inputOutputFs" : "output", "gridFileCompression" : "bzip2", "fileType" : "OUTPUT-TEST2", "filePath" : "/tmp/outputFs-02.tmp", "sourceParticipant" : "0100", "destinationParticipant" : "REE", "exportFileName" : "F1_0100_20141123_20150220.0", "processed" : "false", "fileMD5" : "4276e61a4b63d3d1d1b77e27e792bd13", "version" : 0 } }
db.outputFs.chunks.find()
{ "_id" : ObjectId("000000000000000000000001"), "files_id" : ObjectId("000000000000000000000001"), "n" : 0, "data" : { "$type" : 0, "$binary" : "QlpoOTFBWSZTWZSBQ/YABX5cAAAYQAH/+CAAMAFWA0NqptIb9UgAZ6qowAAFKSaJpGhk8ssmVlk7AAAALZtZZOf0vr859OqflcIs461Dm1skcSOpGpHMuu5HcsJG0j5I9PiR4kaRvvjWskfsVMkZVLxI3uRy/pGTqRj7VmMyTOBfUtb561rwkf0j09+Zbkd+cs1I77861xI7pypvfOt1v5DmR1I51nW7XGdaluRnGZjMzJMzOZGpHnrfGM56+/fnGPVVVVVqqpVVWxCSxCTSAEMkZI3IyqXuqXyRuR/i7kinChISkCh+wA==" } }
{ "_id" : ObjectId("000000000000000000000002"), "files_id" : ObjectId("000000000000000000000002"), "n" : 0, "data" : { "$type" : 0, "$binary" : "QlpoOTFBWSZTWZSBQ/YABX5cAAAYQAH/+CAAMAFWA0NqptIb9UgAZ6qowAAFKSaJpGhk8ssmVlk7AAAALZtZZOf0vr859OqflcIs461Dm1skcSOpGpHMuu5HcsJG0j5I9PiR4kaRvvjWskfsVMkZVLxI3uRy/pGTqRj7VmMyTOBfUtb561rwkf0j09+Zbkd+cs1I77861xI7pypvfOt1v5DmR1I51nW7XGdaluRnGZjMzJMzOZGpHnrfGM56+/fnGPVVVVVqqpVVWxCSxCTSAEMkZI3IyqXuqXyRuR/i7kinChISkCh+wA==" } }
When I try to retrieve it as a file with Spring Data or even MongoChef (both Java clients), I receive the following error:
com.mongodb.BasicDBObject cannot be cast to [B
The collections were manually imported as they are, not using MongoChef or mongofiles, and I have no idea where this error comes from.
The GridFS specification expects a Binary field called data in the chunks collection; your outputFs.chunks does not meet this criterion.
The data field here is not a Binary BSON value but a regular document that happens to have two fields called $type and $binary.
mongoimport will create a Binary field only for JSON entries in the following format. Note that the order of the fields matters to mongoimport:
{
    "$binary" : ...,
    "$type" : ...
}
Your example has the $type and $binary fields swapped.
Update your JSON file and import your outputFs.chunks again. mongoimport will create valid Binary fields and you'll be able to work with GridFS using your other MongoDB tools.
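Once the chunks hold valid Binary data, a read like the following should work again. This is a minimal sketch with the legacy driver's GridFS API; the database name and output path are assumptions, while the bucket and file names come from the question:

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;

// Open the outputFs bucket and stream one stored file back to disk
MongoClient client = new MongoClient();
DB db = client.getDB("somedb");                        // database name is an assumption
GridFS gridFs = new GridFS(db, "outputFs");            // bucket name from the question
GridFSDBFile file = gridFs.findOne("/tmp/outputFs-01.tmp");
file.writeTo("/tmp/outputFs-01.restored");             // output path is an assumption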

How do I get a specific element of the array in MongoDB?

I want to get a specific element of the array through responsaveis.$ (daniela.morais#sofist.com.br), but there is no result. Is there a problem in my syntax?
{
    "_id" : ObjectId("54fa059ce4b01b3e086c83e9"),
    "agencia" : "Abc",
    "instancia" : "dentsuaegis",
    "cliente" : "Samsung",
    "nomeCampanha" : "Serie A",
    "ativa" : true,
    "responsaveis" : [
        "daniela.morais#sofist.com.br",
        "abc#sofist.com.br"
    ],
    "email" : "daniela.morais#sofist.com.br"
}
Syntax 1
mongoCollection.findAndModify("{'responsaveis.$' : #}", oldUser.get("email"))
    .with("{$set : {'responsaveis.$' : # }}", newUser.get("email"))
    .returnNew().as(BasicDBObject.class);
Syntax 2
db.getCollection('validatag_campanhas').find({"responsaveis.$" : "daniela.morais#sofist.com.br"})
Result
Fetched 0 record(s) in 1ms
The $ positional operator can only be used in update(...) or projection calls; you can't use it in the query document to match a position within an array.
The correct syntax would be:
Syntax 1
mongoCollection.findAndModify("{'responsaveis' : #}", oldUser.get("email"))
    .with("{$set : {'responsaveis.$' : # }}", newUser.get("email"))
    .returnNew().as(BasicDBObject.class);
Syntax 2
db.getCollection('validatag_campanhas').find({"responsaveis" : "daniela.morais#sofist.com.br"})
If you just want to project the specific element, you can use the positional operator $ in the projection, as in
{"responsaveis.$":1}
db.getCollection('validatag_campanhas').find({"responsaveis" : "daniela.morais#sofist.com.br"},{"responsaveis.$":1})
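In the question's Jongo syntax, that read would look roughly like this (a sketch; Jongo's find/projection chaining is assumed to be available in your version):

// Match on the array value, then project only the matched element
mongoCollection.find("{'responsaveis' : #}", oldUser.get("email"))
    .projection("{'responsaveis.$' : 1}")
    .as(BasicDBObject.class);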
Try this:
db.validatag_campanhas.aggregate(
    { $unwind : "$responsaveis" },
    {
        $match : {
            "responsaveis" : "daniela.morais#sofist.com.br"
        }
    },
    { $project : { responsaveis : 1, _id : 0 } }
);
That would give you all documents which meet that condition:
{
    "result" : [
        {
            "responsaveis" : "daniela.morais#sofist.com.br"
        }
    ],
    "ok" : 1
}
If you want the whole document whose responsaveis array contains the element "daniela.morais#sofist.com.br", you can drop the $project stage:
db.validatag_campanhas.aggregate(
    { $unwind : "$responsaveis" },
    {
        $match : {
            "responsaveis" : "daniela.morais#sofist.com.br"
        }
    }
);
And that will give you
{
    "result" : [
        {
            "_id" : ObjectId("54fa059ce4b01b3e086c83e9"),
            "agencia" : "Abc",
            "instancia" : "dentsuaegis",
            "cliente" : "Samsung",
            "nomeCampanha" : "Serie A",
            "ativa" : true,
            "responsaveis" : "daniela.morais#sofist.com.br",
            "email" : "daniela.morais#sofist.com.br"
        }
    ],
    "ok" : 1
}
Hope it helps

Mongo query inside Hashmap with unknown hash key

Platform: MongoDB, Spring, SpringDataMongoDB
I have a collection called "Encounter" with the structure below:
Encounter:
{ "_id" : "49a0515b-e020-4e0d-aa6c-6f96bb867288",
"_class" : "com.keype.hawk.health.emr.api.transaction.model.Encounter",
"encounterTypeId" : "c4f657f0-015d-4b02-a216-f3beba2c64be",
"visitId" : "8b4c48c6-d969-4926-8b8f-05d2f58491ae",
"status" : "ACTIVE",
"form" :
{
"_id" : "be3cddc5-4cec-4ce5-8592-72f1d7a0f093",
"formCode" : "CBC",
"fields" : {
"dc" : {
"label" : "DC",
"name" : "tc",
},
"tc" : {
"label" : "TC",
"name" : "tc",
},
"notes" : {
"label" : "Notes",
"name" : "notes",
}
},
"notes" : "Blood Test",
"dateCreated" : NumberLong("1376916746564"),
"dateModified" : NumberLong("1376916746564"),
"staffCreated" : 10013,
"staffModified" : 10013
},
}
The element "fields" is represented using a Java Hashmap as:
protected LinkedHashMap<String, Field> fields;
The Key to the hashmap () is not fixed, but generated at run time.
How do I query to get all documents in the collection where "label" = "TC"?
It's not possible to query like db.encounter.find({'form.fields.dc.label':'TC'}) because the element name 'dc' is NOT known. I want to skip that postion and the execute query, something like:
db.encounter.find({'form.fields.*.label':'TC'});
Any ideas?
Also, how do I best use indexes in this scenario?
If fields were an array and your key a part of the sub-document instead:
"fields" : [
{ "key" : "dc",
"label" : "DC",
"name" : "dc"
},
{ "key" : "tc",
"label" : "TC",
"name" : "tc"
}
]
In this case, you could simply query for any sub-element inside the array:
db.coll.find({"form.fields.label":"TC"})
Not sure how you would integrate that with Spring, but perhaps the idea helps? As far as indexes are concerned, you can index into the array, which gives you a multi-key index. Basically, the index will have a separate entry pointing to the document for each array value.
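On the Spring side, a minimal sketch with Spring Data's MongoTemplate, assuming the remodeled array layout above and an Encounter entity class mapped to the collection (both assumptions, as is the injected mongoTemplate):

import java.util.List;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// With fields as an array, one dotted path reaches every element's label
Query query = new Query(Criteria.where("form.fields.label").is("TC"));
List<Encounter> encounters = mongoTemplate.find(query, Encounter.class);

A multi-key index on form.fields.label would then back this query.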
