Issue when trying to query MongoDB data - java

I have a document in a collection that has the following attributes:
nodeid : long
type: string
bagid: long
So, nodes can be on a bag, and be of different types.
I need to find,
all nodes of type A, or nodes of type B in a given list of nodes, or, nodes of type C in a given bag.
How can I design that query in MongoDB? I used IN clauses for everything, but that is the worst way to go performance-wise. Could you please point me in the right direction? I could not find an aggregation or reduce that would help me make this simpler.
I also tried doing a text search using the three elements, but the OR in the text search, for instance "type: A \"type: B node:X\" \"type: B node: Y\"" and so on, does not work.
Thanks
Edit, adding samples:
{ "_id" : BinData(3,"NJUuYHEBAAAdCda3V+kXvg=="),
"type" : "question", "bagid" : NumberLong(1067),
"topics" : [ NumberLong(33), NumberLong(67), NumberLong(203), NumberLong(217) ],
"nodeid" : NumberLong(15855),
"creationDate" : ISODate("2020-04-09T18:23:17.812Z"),
"_class" : "com.test.NodeEvent" }
{ "_id" : BinData(3,"NJUuYHEBAAAdCda3V+kXvg=="),
"type" : "comment", "bagid" : NumberLong(1067),
"topics" : [ NumberLong(33), NumberLong(67), NumberLong(203), NumberLong(217) ],
"nodeid" : NumberLong(15857),
"creationDate" : ISODate("2020-04-09T18:23:17.812Z"),
"_class" : "com.test.NodeEvent" }
{ "_id" : BinData(3,"NJUuYHEBAAAdCda3V+kXvg=="),
"type" : "question", "bagid" : NumberLong(1069),
"topics" : [ NumberLong(33), NumberLong(67) ],
"nodeid" : NumberLong(15859), "creationDate" : ISODate("2020-04-09T18:23:17.812Z"),
"_class" : "com.test.NodeEvent" }

You can build your queries using the MongoDB Query Language. The query is written using the db.collection.find method, with query operators such as $or, $in, and $and.
I need to find, all nodes of type A, or nodes of type B in a given
list of nodes, or, nodes of type C in a given bag. How can I design
that query in mongo?
db.collection.find( { $or: [
    { type: "A" },
    { type: "B", nodeid: { $in: [ 15855, 15857 ] } },
    { type: "C", bagid: 130 }
] } );
In the above query the condition { type: "C", bagid: 130 } is equivalent to { $and: [ { type: "C" }, { bagid: 130 } ] }. The same holds for the condition { type: "B", nodeid: { $in: [ 15855, 15857 ] } }. Using $and is optional in this case (see the documentation for details). The field names nodeid and bagid follow the sample documents in the question; the node IDs and the bag number are placeholder values.
The query would be something like, given bag 1067 give me all nodes
with topics: 33 or 203
db.collection.find( { bagid: 1067, topics: { $in: [ 33, 203 ] } } )
The output depends upon the data in the collection. The find method returns a cursor, and you can apply various cursor methods on the returned documents (for example, you can sort them by a field).
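To make the logic of the $or filter concrete, here is a minimal plain-Java sketch that applies the same three conditions to in-memory objects instead of a collection. The Node class, the matches helper, and the sample values are hypothetical stand-ins for illustration, not part of any MongoDB driver API; the field names follow the sample documents in the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for one document of the collection.
class Node {
    final long nodeid;
    final String type;
    final long bagid;

    Node(long nodeid, String type, long bagid) {
        this.nodeid = nodeid;
        this.type = type;
        this.bagid = bagid;
    }
}

class NodeFilterSketch {
    // Mirrors the three $or branches: type A, type B with nodeid in a list,
    // type C with a given bagid. Fields inside one branch are ANDed.
    static boolean matches(Node n, Set<Long> ids, long bag) {
        boolean typeA = "A".equals(n.type);
        boolean typeBInList = "B".equals(n.type) && ids.contains(n.nodeid);
        boolean typeCInBag = "C".equals(n.type) && n.bagid == bag;
        return typeA || typeBInList || typeCInBag;
    }

    public static void main(String[] args) {
        Set<Long> ids = new HashSet<>(Arrays.asList(15855L, 15857L));
        System.out.println(matches(new Node(1L, "A", 5L), ids, 130L));     // type A, any node or bag
        System.out.println(matches(new Node(15855L, "B", 5L), ids, 130L)); // type B, id in the list
        System.out.println(matches(new Node(2L, "B", 130L), ids, 130L));   // type B, id not in the list
        System.out.println(matches(new Node(3L, "C", 130L), ids, 130L));   // type C in bag 130
    }
}
```

Only the third call fails to match, because a type B node has to be in the given list regardless of its bag.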

Related

Vespa.ai: How to make an array of floats handle null values?

I am trying to make a simple Vespa application, where one of my data fields is an array. However, the array contains some null values. For instance, an array like: [2.0,1.4,null,5.6,...].
What can I use instead of float to represent elements in the array?
It seems you want to use a sparse tensor field instead, since some addresses do not have a value. x{} denotes a sparse tensor, while x[128] is an example of a dense tensor. See https://docs.vespa.ai/documentation/tensor-user-guide.html for an intro to Vespa tensor fields.
field stuff type tensor<float>(x{}) {
    indexing: summary | attribute
}
[
    { "put": "id:example:example::0", "fields": {
        "stuff" : {
            "cells": [
                { "address" : { "x" : "0" }, "value": 2.0 },
                { "address" : { "x" : "1" }, "value": 1.4 },
                { "address" : { "x" : "3" }, "value": 5.6 }
            ]
        }
    }
    }
]
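If the source data arrives as an array with nulls, the cells can be generated by skipping the null positions and using each remaining index as the address. Here is a minimal Java sketch, assuming the dimension is named x as in the field above; the toCells helper is hypothetical and only builds the JSON text, it does not talk to Vespa.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SparseCellsSketch {
    // Builds the "cells" JSON array, emitting one cell per non-null element.
    // The element's position in the input becomes the label of dimension x.
    static String toCells(List<Double> values) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            Double v = values.get(i);
            if (v == null) {
                continue; // a null position simply gets no cell in the sparse tensor
            }
            cells.add("{ \"address\": { \"x\": \"" + i + "\" }, \"value\": " + v + " }");
        }
        return "[" + String.join(", ", cells) + "]";
    }

    public static void main(String[] args) {
        // The array from the question: index 2 is null and is left out.
        System.out.println(toCells(Arrays.asList(2.0, 1.4, null, 5.6)));
    }
}
```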

Data extraction from the NVD data feed and cpe_match meaning

My IT product has CPE defined, for example:
cpe:/o:microsoft:windows_vista:6.0:sp1:~-~home_premium~-~x64~-
I am using NVD Data Feed to get all publicly known vulnerabilities.
CVEs are given in .json file and under each CVE item there is a configurations node.
If I want to check whether my CPE exists in the current CVE item, I guess I have to check the configurations node, but I am not sure what the purpose of "operator" : "OR" and "vulnerable" : false is.
Can I just compare my CPE with cpe23Uri, or do I have to somehow consider the operator and vulnerable nodes as well?
"configurations" : {
"CVE_data_version" : "4.0",
"nodes" : [ {
"operator" : "AND",
"children" : [ {
"operator" : "OR",
"cpe_match" : [ {
"vulnerable" : true,
"cpe23Uri" : "cpe:2.3:a:adobe:flash_player:*:*:*:*:*:*:*:*",
"versionStartIncluding" : "10.3",
"versionEndExcluding" : "10.3.183.19"
}, {
"vulnerable" : true,
"cpe23Uri" : "cpe:2.3:a:adobe:flash_player:*:*:*:*:*:*:*:*",
"versionStartIncluding" : "11.2",
"versionEndIncluding" : "11.2.202.233"
} ]
}, {
"operator" : "OR",
"cpe_match" : [ {
"vulnerable" : false,
"cpe23Uri" : "cpe:2.3:o:apple:mac_os_x:-:*:*:*:*:*:*:*"
}, {
"vulnerable" : false,
"cpe23Uri" : "cpe:2.3:o:linux:linux_kernel:-:*:*:*:*:*:*:*"
}, {
"vulnerable" : false,
"cpe23Uri" : "cpe:2.3:o:microsoft:windows:-:*:*:*:*:*:*:*"
} ]
} ]
}]
It depends what information you're trying to determine. Notice that the "OR" operator in the lower half of the node only applies between those three items, which together are "AND"ed with the top half. The skeletal structure of the node in question is:
"operator" : "AND",
    "children" : [ {
        "operator" : "OR",
            "cpe_match" : [ {
                ...
            } ]
    }, {
        "operator" : "OR",
            "cpe_match" : [ {
                ...
            } ]
    } ]
(I have reindented because the "children" node is logically within "AND" even though they are structurally on the same level.)
In other words, two cpe23Uris need to be matched to meet the condition described by this node: any one from the top half AND any one from the bottom half. Your Windows example would appear to match only the latter, and it would not be matching the vulnerable component of your system. To determine whether your system is vulnerable you would need to look for a component that matches a vulnerable item as well.
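The AND-of-ORs rule can be sketched in a few lines of Java. This is a simplification under stated assumptions: each OR child is reduced to the list of its cpe23Uri strings, and matching is an exact string comparison, whereas real CPE matching also has to handle wildcards and the versionStart/versionEnd ranges. The andNodeMatches helper is a hypothetical name, not part of any NVD tooling.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CpeConfigSketch {
    // An "AND" node is satisfied only if every "OR" child contains at least
    // one URI that is present among the CPEs of the system being checked.
    static boolean andNodeMatches(List<List<String>> orChildren, Set<String> systemCpes) {
        for (List<String> child : orChildren) {
            boolean anyMatch = false;
            for (String uri : child) {
                if (systemCpes.contains(uri)) { // simplification: exact string comparison
                    anyMatch = true;
                    break;
                }
            }
            if (!anyMatch) {
                return false; // one unmatched child fails the whole AND
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String flash = "cpe:2.3:a:adobe:flash_player:*:*:*:*:*:*:*:*";
        String windows = "cpe:2.3:o:microsoft:windows:-:*:*:*:*:*:*:*";
        List<List<String>> node = Arrays.asList(
                Collections.singletonList(flash),
                Arrays.asList(
                        "cpe:2.3:o:apple:mac_os_x:-:*:*:*:*:*:*:*",
                        "cpe:2.3:o:linux:linux_kernel:-:*:*:*:*:*:*:*",
                        windows));
        Set<String> windowsOnly = new HashSet<>(Collections.singletonList(windows));
        Set<String> windowsWithFlash = new HashSet<>(Arrays.asList(windows, flash));
        System.out.println(andNodeMatches(node, windowsOnly));      // only the bottom half matches
        System.out.println(andNodeMatches(node, windowsWithFlash)); // both halves match
    }
}
```

With only the Windows CPE the node is not satisfied; adding a matching Flash Player entry satisfies both halves.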

Effective use of Array field in Mongo Collection

We are using MongoDB in our application, and in our collection we store arrays as fields, e.g.:
{
"_id" : ObjectId("54ef67573848ec32b156b053"),
"articleId" : "46384262",
"host" : "example.com",
"url" : "http://example.com/articleshow/46384262.cms",
"publishTime" : NumberLong("1424954100000"),
"tags" : [
"wind power",
"mytrah",
"make in india",
"government",
"andhra pradesh"
],
"catIds" : [
"2147477890",
"13352306",
"13358350",
"13358361"
]
}
Now I need to create indexes on the tags and catIds arrays, as they are search fields.
But creating an index on an array field increases the size of the indexes tremendously.
Could you please suggest a better way of achieving this?
You can restructure your collection in this way. Now you will have 3 collections:
Coll1, documents look like this:
{
"_id" : ObjectId("54ef67573848ec32b156b053"),
"articleId" : "46384262",
... your other stuff
}
Tags, documents look like this:
{
'_id': 1,
'name': 'wind power'
}
{
'_id': 2,
'name': 'mytrah'
}
....
and a collection that links coll1 to tags:
{
"collID" : ObjectId("54ef67573848ec32b156b053"),
"tagID": 1
}
{
"collID" : ObjectId("54ef67573848ec32b156b053"),
"tagID": 2
}
Plain MongoDB queries do not join collections, so you need to do the joins in the application layer, which takes three queries. The size of the indexes should be smaller, but test it before making significant changes.
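The three lookups can be sketched as follows. This is a minimal plain-Java illustration, with in-memory maps and lists standing in for the three collections; the Link class, the urlsForTag helper, and the choice of url as the returned field are all hypothetical. In a real application each commented step would be one query against the corresponding collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TagJoinSketch {
    // Hypothetical stand-in for one document of the linking collection.
    static class Link {
        final String collId;
        final int tagId;

        Link(String collId, int tagId) {
            this.collId = collId;
            this.tagId = tagId;
        }
    }

    static List<String> urlsForTag(String tagName,
                                   Map<String, Integer> tagIdByName,
                                   List<Link> links,
                                   Map<String, String> urlByCollId) {
        List<String> urls = new ArrayList<>();
        // Query 1: find the tag document by name.
        Integer tagId = tagIdByName.get(tagName);
        if (tagId == null) {
            return urls; // unknown tag, nothing to join
        }
        // Query 2: find the link documents for that tag and keep their collID values.
        List<String> collIds = new ArrayList<>();
        for (Link link : links) {
            if (link.tagId == tagId) {
                collIds.add(link.collId);
            }
        }
        // Query 3: fetch the matching Coll1 documents by _id.
        for (String collId : collIds) {
            String url = urlByCollId.get(collId);
            if (url != null) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        Map<String, Integer> tags = new HashMap<>();
        tags.put("wind power", 1);
        Map<String, String> urls = new HashMap<>();
        urls.put("54ef67573848ec32b156b053", "http://example.com/articleshow/46384262.cms");
        List<Link> links = Arrays.asList(new Link("54ef67573848ec32b156b053", 1));
        System.out.println(urlsForTag("wind power", tags, links, urls));
    }
}
```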

MongoDB search in embedded array

I am trying to store all my written Java files in MongoDB, and so far I've applied a schema like this (incomplete entry):
{
"_id" : ObjectId("52b861c230044fd08d6c27c4"),
"interfaces" : [
{
"methodInterfaces" : [
{
"name" : "add",
"name_lc" : "add",
"returnType" : "Integer",
"returnType_lc" : "integer",
"parameterTypes" : [
"Integer",
"Integer"
],
"parameterTypes_lc" : [
"integer",
"integer"
]
},
{
"name" : "isValid",
"name_lc" : "isvalid",
"returnType" : "Boolean",
"returnType_lc" : "boolean",
"parameterTypes" : [
"Integer",
"Double"
],
"parameterTypes_lc" : [
"integer",
"double"
]
}
],
"name" : "Calculator",
"name_lc" : "calculator",
"filename" : "Calculator.java",
"filename_lc" : "calculator.java"
}
],
"name" : "Calculator",
"name_lc" : "calculator",
"filename" : "Calculator",
"filename_lc" : "calculator",
"path" : "/xyz/Calculator.java",
"md5" : "6dec7e62c5e4f9060c7612c252cd741",
"lastModification" : ""
}
So far I am able to query for a class that contains a given method name, but I am not able to query for a class with a certain name (say interfaces.name_lc="calculator") that must contain two methods with particular names (say "add" and "divide"), which take two integers and an integer and a double as parameters, respectively, and both return an integer (don't question whether this is reasonable; it is just for illustration).
This is just one example; it can be more complex, of course.
I don't know how to query for a particular class with a given method and specified parameters. I need to specify it precisely and want precise results.
I am not able to construct a query that would only return files like Calculator ( add(integer,integer):integer; divide(integer,double):integer; ). I get, e.g., OtherClass ( add():void; method(integer):integer; ), which is not what I want. I have been trying this for days now; maybe someone can enlighten me on how to solve this in MongoDB. Thanks a lot in advance!
I'm not sure you'll be able to do this in MongoDB with your document structure. The issue I ran into is around the parameters. I'm assuming you care about the order of the parameters (i.e. doSomething(String, int) is not the same as doSomething(int, String)). The query operator I used to check all the values in an array, $all, treats the array as a set, so it is a) order-agnostic and b) blind to duplicates (so doSomething(String, String) matches doSomething(String)); see the documentation, especially the note at the bottom.
I managed to get a large part of the query you wanted, however, which might point you in the right direction.
{ "$and" : [ //putting these in an "and" query means all parts have to match
    { "interfaces.methodInterfaces" :
        { "$elemMatch" : { "name" : "add"}} //this bit finds documents where the method name is "add"
    } ,
    { "interfaces.methodInterfaces" :
        { "$elemMatch" : { "returnType" : "Integer"}} // This bit matches the return type
    } ,
    { "interfaces.methodInterfaces.parameterTypes" :
        { "$all" : [ "Integer" , "Integer"]} //This *should* find you all documents where the parameter types matches this array. But it doesn't, as it treats it as a set
    }
]}
If you're querying via the Java driver, this looks like:
import static java.util.Arrays.asList;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;

// collection is assumed to be a com.mongodb.DBCollection obtained elsewhere
BasicDBObject findMethodByName = new BasicDBObject("interfaces.methodInterfaces",
    new BasicDBObject("$elemMatch", new BasicDBObject("name", "add")));
BasicDBObject findMethodByReturn = new BasicDBObject("interfaces.methodInterfaces",
    new BasicDBObject("$elemMatch", new BasicDBObject("returnType", "Integer")));
BasicDBObject findMethodByParams = new BasicDBObject("interfaces.methodInterfaces.parameterTypes",
    new BasicDBObject("$all", asList("Integer", "Integer")));
BasicDBObject query = new BasicDBObject("$and", asList(findMethodByName, findMethodByReturn, findMethodByParams));
DBCursor found = collection.find(query);
I haven't included matching the class name, as this didn't seem to be the tricky part; just build up another simple query for that and add it into the "$and".
Since the arrays to store parameter types are not giving what you want, I suggest you think about something a little more structured, although it's a bit unwieldy. Instead of
"parameterTypes" : [
"Integer",
"Integer"
]
Consider something like
"parameterTypes" : {
"param1" : "Integer",
"param2" : "Integer"
}
Then you won't be doing set operations, you can query each parameter individually. This means you'll also get them in the correct order.
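The difference between set-style matching and position-by-position matching can be illustrated in plain Java, outside of MongoDB. This is only a sketch of the two semantics discussed above, not of how the server implements $all; matchesAsSet and matchesInOrder are hypothetical helper names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class ParamMatchSketch {
    // Set-style check, similar in spirit to $all: every wanted type must occur
    // somewhere in the stored array; order and repetition are ignored.
    static boolean matchesAsSet(List<String> stored, List<String> wanted) {
        return new HashSet<>(stored).containsAll(wanted);
    }

    // Position-by-position check: same length and the same type at every index,
    // which is what querying param1, param2, ... individually gives you.
    static boolean matchesInOrder(List<String> stored, List<String> wanted) {
        return stored.equals(wanted);
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("Integer", "Integer");
        List<String> oneParam = Arrays.asList("Integer"); // a method with a single Integer parameter
        System.out.println(matchesAsSet(oneParam, wanted));   // duplicates are not counted
        System.out.println(matchesInOrder(oneParam, wanted)); // lengths differ

        List<String> swapped = Arrays.asList("Double", "Integer");
        List<String> wantedOrder = Arrays.asList("Integer", "Double");
        System.out.println(matchesAsSet(swapped, wantedOrder));   // order is ignored
        System.out.println(matchesInOrder(swapped, wantedOrder)); // order matters
    }
}
```

In both cases the set-style check accepts a method signature that the ordered check rejects, which is the false-positive behaviour described in the question.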

Morphia Query on Array of Subdocuments using elem

I have the following document structure in mongodb collection "Contact". There is an array of subdocuments called "numbers":
{
"name" : "Bill",
"numbers" : [
{
"type" : "home",
"number" : "01234"
},
{
"type" : "business",
"number" : "99099"
},
{
"type" : "fax",
"number" : "77777"
}
]
}
When I want to query only for "home" and "business" numbers, I can do something like this in mongodb-shell:
db.Contact.find({ numbers: { $elemMatch: {
type : { $in : ["home", "business"]},
number: { $regex : "^012" }
}}});
But how to do this in morphia? Is there any way?
I understand "$elemMatch" is supported in morphia. So I could do something like:
query.filter("numbers elem", ???);
But how exactly do I add a combined query for the subdocument?
It is late, but maybe others will find it handy.
I found this solution: https://groups.google.com/forum/#!topic/morphia/FlEjBoSqkhg
query.filter("numbers elem",
BasicDBObjectBuilder.start()
.push("type").add("$in", new String[]{"home", "business"}).pop()
.push("number").add("$regex", "^012").pop().get());
Instead of using morphia, consider using jongo. It lets you query MongoDB as if you were using the MongoDB shell. Furthermore, it will give you more freedom when mapping your array elements. Here is how your example will look with jongo:
contacts_collection.find(
    "{numbers: {$elemMatch: {type: {$in: #}, number: {$regex: #}}}}",
    new String[]{"home", "business"}, "^012")
    .as(Contact.class);
Note that if you only need a single number object (or several) from the array, you can use a custom result mapper/handler. You just have to replace .as(Contact.class) with:
.map(new ResultHandler<Number>() {...})
For a full example take a look at my blog post or at my GitHub repository
