I have 300,000 documents in this specific collection.
Each document is considered as one taxi trip.
Each document contains a TaxiStation number and a License number.
My goal is to figure out the number of trips per TaxiLicense per TaxiStation.
For example:
TaxiStation A License X had 5 trips.
TaxiStation A License Y had 9 trips. And so on.
How can I optimize my query? It takes upwards of 30 minutes to complete!
List /*of*/ taxistationOfCollection, taxiLicenseOfTaxistation;
//Here I get all the distinct TaxiStation numbers in the collection
taxistationOfCollection = coll.distinct("TaxiStation");
BasicDBObject query, tripquery;
int tripcount;
//Now I have to loop through each Taxi Station
for(int i = 0; i<taxistationOfCollection.size(); i++)
{
query = new BasicDBObject("TaxiStation", taxistationOfCollection.get(i));
//Here, I make a list of each distinct Taxi License in the current Taxi station
taxiLicenseOfTaxistation = coll.distinct("TaxiLicense", query);
//Now I make a loop to process each Taxi License within the current Taxi station
for(int k = 0; k<taxiLicenseOfTaxistation.size();k++)
{
tripcount=0;
if(taxiLicenseOfTaxistation.get(k) !=null)
{
//I'm looking for each Taxi Station with this Taxi License
tripquery= new BasicDBObject("TaxiStation", taxistationOfCollection.get(i)).append("TaxiLicense", taxiLicenseOfTaxistation.get(k));
DBCursor cursor = coll.find(tripquery);
try {
while(cursor.hasNext()) {
//Increasing my counter everytime I find a match
tripcount++;
cursor.next();
}
} finally {
//Finally printing the results
System.out.println("Station: " + taxistationOfCollection.get(i) + " License:" + taxiLicenseOfTaxistation.get(k)
+ " Trips: " + tripcount);
}
}
}
}
Sample Document :
{
"_id" : ObjectId("53df46ed9b2ed78fb7ca4f23"),
"Version" : "2",
"Display" : [],
"Generated" : "2014-08-04,16:40:05",
"GetOff" : "2014-08-04,16:40:05",
"GetOffCellInfo" : "46001,43027,11237298",
"Undisplay" : [],
"TaxiStation" : "0000",
"GetOn" : "2014-08-04,16:40:03",
"GetOnCellInfo" : "46001,43027,11237298",
"TaxiLicense" : "000000",
"TUID" : "26921876-3bd5-432e-a014-df0fb26c0e6c",
"IMSI" : "460018571356892",
"MCU" : "CM8001MA121225V1",
"System_ID" : "000",
"MeterGetOffTime" : "",
"MeterGetOnTime" : "",
"Setup" : [],
"MeterSID" : "",
"MeterWaitTime" : "",
"OS" : "4.2",
"PackageVersion" : "201407300888",
"PublishVersion" : "201312060943",
"SWVersion" : "rel_touchbox_20101010",
"MeterMile" : 0,
"MeterCharged" : 0,
"GetOnLongitude" : 0,
"GetOnLatitude" : 0,
"GetOffLongitude" : 0,
"TripLength" : 2,
"GetOffLatitude" : 0,
"Clicks" : 0,
"updateTime" : "2014-08-04 16:40:10"
}
Aggregation is probably what you are looking for. With an aggregation operation, all of the work runs on the database and can be expressed in a few lines. Performance should also be much better, since the database does the counting in one pass instead of answering one query per station and license and sending every matching document to the client.
From what you posted, this boils down to a simple $group operation. In the shell this would look like:
// "collection" stands for the name of your collection
db.collection.aggregate([
  {$group:
    { _id:
      {station: "$TaxiStation",
       licence: "$TaxiLicense"},
      count : {$sum : 1}
    }
  }
])
This will give you documents of the form
{_id : {station: stationid, licence: licence_number}, count: number_of_documents}
For Java it would look like this:
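import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Arrays;

DBObject taxigroup = new BasicDBObject("$group",
    new BasicDBObject("_id",
        new BasicDBObject("station","$TaxiStation")
            .append("licence","$TaxiLicense"))
    .append("count", new BasicDBObject("$sum",1)));
// coll is the DBCollection from the question
AggregationOutput aggout = coll.aggregate(
    Arrays.asList(taxigroup));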
Please note that the code snippets are not tested.
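To make concrete what the $group stage computes, here is a minimal plain-Java sketch with no MongoDB dependency. The class name TripCounts, the "/" separator, and the sample trips are made up for illustration; the point is only that a single pass with a compound key replaces the nested distinct and find loops from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the MongoDB driver.
class TripCounts {

    // Counts trips per (station, license) pair in a single pass.
    // Each trip is a two-element array: {station, license}.
    static Map<String, Long> countByStationAndLicense(List<String[]> trips) {
        return trips.stream()
                .collect(Collectors.groupingBy(
                        t -> t[0] + "/" + t[1],   // compound key, like the _id sub-document
                        TreeMap::new,
                        Collectors.counting()));  // like {$sum : 1}
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String[]> trips = Arrays.asList(
                new String[] {"A", "X"},
                new String[] {"A", "Y"},
                new String[] {"A", "X"},
                new String[] {"B", "X"});
        countByStationAndLicense(trips).forEach((key, count) ->
                System.out.println(key + " -> " + count + " trips"));
    }
}
```

In the real pipeline the database does this grouping itself, so only one small result document per station and license pair travels to the client.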
I want to check if a specific value of a key exists in the Realtime Database or not and perform some action based on it.
I have the following data:
"Orders" : {
"03403426747" : {
"17" : {
"State" : "(4) Canceled",
"address" : "yubt",
"date" : "Feb 28, 2022",
"discount" : "0",
"name" : "uk",
"phone" : "0311111111",
"time" : "15:33:58 PM",
"totalAmount" : "3778"
},
"18" : {
"State" : "(1) Approved",
"address" : "yubt",
"date" : "Feb 28, 2022",
"discount" : "120",
"name" : "uk",
"phone" : "03111111111",
"time" : "16:01:58 PM",
"totalAmount" : "7703"
}
}
}
I want to check whether any of these orders has a "State" value other than "(3) Completed" and "(4) Canceled".
If any of a user's orders has a value other than these, I want to remove that user from the list which contains users with pending orders.
And if at any time that user places a new order, or the State of an older order changes, I want to add that user to the list again.
I want to check whether any of these orders has a "State" value other than "(3) Completed" and "(4) Canceled".
There is no way you can query the Realtime Database using a negation. What you can do instead is to create a separate query for each "other" state and join the results on the client.
However, if you consider at some point in time to try using Cloud Firestore, then you should consider using not equal (!=) query or if it fits your needs the not-in query.
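If a user's orders are few enough to read in one snapshot, another option is to do the negation entirely on the client. The sketch below is plain Java under that assumption; the class name OpenOrderCheck is hypothetical, the map stands in for the order id to "State" values you would read from the snapshot, and the two closed state strings are copied from the sample data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; no Firebase classes are involved.
class OpenOrderCheck {

    // States that count as "closed", taken from the sample data.
    static final Set<String> CLOSED_STATES = new HashSet<>(
            Arrays.asList("(3) Completed", "(4) Canceled"));

    // Returns true if at least one order is in a state other than the closed ones.
    // The map goes from order id to its "State" value.
    static boolean hasOpenOrder(Map<String, String> stateByOrderId) {
        return stateByOrderId.values().stream()
                .anyMatch(state -> !CLOSED_STATES.contains(state));
    }

    public static void main(String[] args) {
        Map<String, String> orders = new LinkedHashMap<>();
        orders.put("17", "(4) Canceled");
        orders.put("18", "(1) Approved");
        System.out.println("has open order: " + hasOpenOrder(orders));
    }
}
```

Because the check is recomputed from the current data each time, it does not depend on counters staying in sync with the list.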
After a long time, my brain light burned and I came up with a solution.
Create a data class for OrderState
public class OrderState {
public static int ordersCount = 0;
public static boolean state = false;
public static void update(boolean state){
if (state){
ordersCount = ordersCount + 1;
OrderState.state = true;
}else
if (!state && ordersCount > 0){
ordersCount = ordersCount - 1;
if (ordersCount < 1) OrderState.state = false;
}
}
public static void reset(){
ordersCount = 0;
state = false;
}
}
On FirebaseRecyclerAdapter -> onBindViewHolder
//if order not canceled or completed. it will update OrderState,
//ordersCount + 1 and state to true
if (adminOrders.getState().equals(Prevalent.orderStateNew) ||
adminOrders.getState().equals(Prevalent.orderStateApproved) ||
adminOrders.getState().equals(Prevalent.orderStateShipped)){
OrderState.update(true);
}
changeUserWithOrderState();
On changing state of order by admin
//if the order is not already cancelled or completed, reduce one order from OrderState as it will be readded automatically upon Recycler refresh.
if (!adminOrders.getState().equals(Prevalent.orderStateCanceled) &&
!adminOrders.getState().equals(Prevalent.orderStateCompleted)) OrderState.update(false);
At last, if the user does not have any order with the states New, Approved, or Shipped:
OrderState.ordersCount = 0;
OrderState.state = false;
and upon updating the database it will set the state to false.
private void changeUserWithOrderState() {
DatabaseReference userWithOrder = FirebaseDatabase.getInstance().getReference()
.child(Prevalent.usersWithOrders)
.child(userPhoneKey);
HashMap<String, Object> map = new HashMap<>();
map.put(Prevalent.orderState, String.valueOf(OrderState.state));
userWithOrder.updateChildren(map).addOnCompleteListener(task -> {
//Changed state based upon OrderState.state value...
});
}
Data in MongoDB: [image not included]
db.test2.aggregate([
{
"$project" : {
"contents" : 1,
"comments" : {
"$filter" : {
"input" : "$comments",
"as" : "item",
"cond" : {"$gt" : ['$$item.score', 2]}
},
},
"comments2" : {
"$filter" : {
"input" : "$comments2",
"as" : "item",
"cond" : {"$gt" : ["$$item.score", 5]}
}
}
}
},
{
"$project" : {
"content" : 1,
"commentsTotal" : {
"$reduce" : {
"input" : "$comments",
"initialValue" : 0,
"in" : {"$add" : ["$$value", "$$this.score"]}
}
},
"comments2Total" : {
"$reduce" : {
"input" : "$comments2",
"initialValue" : 0,
"in" : {"$add" : ["$$value", "$$this.score"]}
}
}
}
},
{$skip : 0},
{$limit: 3}
]);
As you can see, this does the following:
1. Filters comments and comments2, keeping the elements whose score is greater than 2 and 5, respectively.
2. Sums the scores of the elements that remain in each array.
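The two steps can be sketched in plain Java to show what the pipeline is expected to return for one array. This is only an in-memory analogy, not Spring Data code; the class name ScoreTotals and the sample scores are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; mirrors $filter followed by $reduce.
class ScoreTotals {

    // Keeps the scores strictly greater than the threshold ($filter with $gt),
    // then adds them up starting from 0 ($reduce with initialValue 0 and $add).
    static int totalAbove(List<Integer> scores, int threshold) {
        return scores.stream()
                .filter(score -> score > threshold)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(1, 3, 5, 7); // made-up sample scores
        System.out.println("commentsTotal  = " + totalAbove(scores, 2));
        System.out.println("comments2Total = " + totalAbove(scores, 5));
    }
}
```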
I wrote the aggregation query in Spring like this:
AggregationExpression reduce = ArithmeticOperators.Add.valueOf("$$value").add("$$this.score");
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.project().andExclude("_id")
.andInclude("content")
.and("comments").filter("item", ComparisonOperators.Gt.valueOf("item.score").greaterThanValue(3)).as("comments")
.and("comments2").filter("item", ComparisonOperators.Gt.valueOf("item.score").greaterThanValue(3)).as("comments2"),
Aggregation.project("comments", "comments2")
.and(ArrayOperators.Reduce.arrayOf("comments").withInitialValue("0").reduce(reduce)).as("commentsTotal")
);
When I run it as shown above, it throws an exception:
java.lang.IllegalArgumentException: Invalid reference '$$value'!
You can try below aggregation by wrapping $filter inside the $reduce operation.
Something like below
AggregationExpression reduce1 = new AggregationExpression() {
    @Override
    public DBObject toDbObject(AggregationOperationContext aggregationOperationContext) {
        // $filter: keep only the comments whose score is greater than 2
        DBObject filter = new BasicDBObject("$filter", new BasicDBObject("input", "$comments").append("as", "item").append("cond",
                new BasicDBObject("$gt", Arrays.<Object>asList("$$item.score", 2))));
        // $reduce: sum the scores of the filtered comments, starting from 0
        DBObject reduce = new BasicDBObject("input", filter).append("initialValue", 0).append("in", new BasicDBObject("$add", Arrays.asList("$$value", "$$this.score")));
        return new BasicDBObject("$reduce", reduce);
    }
};
Aggregation aggregation = newAggregation(
Aggregation.project().andExclude("_id")
.andInclude("content")
.and(reduce1).as("commentsTotal")
);
This is an old question, but in case someone winds up here like me, here's how I was able to solve it.
You cannot access the "$$this" and "$$value" variables directly like this in Spring:
AggregationExpression reduce = ArithmeticOperators.Add.valueOf("$$value").add("$$this.score");
To do this we have to use reduce variable enum, like this:
AggregationExpression reduce = ArithmeticOperators.Add.valueOf(ArrayOperators.Reduce.Variable.VALUE.getTarget()).add(ArrayOperators.Reduce.Variable.THIS.referringTo("score").getTarget());
Hope this helps!
I had to solve the following task and couldn't find any solutions, so I hope my answer helps somebody.
A user with roles: the user has a list of rights plus a list of roles, each role has its own list of rights, and I needed to find the user's full list of rights.
[Image: user structure]
[Image: role structure]
First, I look up the roles into roleDto (for example), then I collect the rights from the roles into one list:
ArrayOperators.Reduce reduce = ArrayOperators.Reduce.arrayOf("$roleDto.rights")
.withInitialValue(new ArrayList<>())
.reduce(ArrayOperators.ConcatArrays.arrayOf("$$value").concat("$$this"));
As a result, reduce holds this single list of rights collected from the roles.
After that I do:
SetOperators.SetUnion.arrayAsSet(reduce).union("$rights")
using the previous result. The result type is AggregationExpression because AbstractAggregationExpression implements AggregationExpression.
So, finally I get something like this (sorry for the messy code):
private static AggregationExpression getAllRightsForUser() {
// concat rights from list of roles (each role have list of rights) - list of list to list
ArrayOperators.Reduce reduce = ArrayOperators.Reduce.arrayOf("$roleDto.rights")
.withInitialValue(new ArrayList<>())
.reduce(ArrayOperators.ConcatArrays.arrayOf("$$value").concat("$$this"));
// union result with user.rights
return SetOperators.SetUnion.arrayAsSet(reduce).union("$rights");
}
Result of this operation can be finally used somewhere like here ;) :
public static AggregationOperation addFieldOperation(AggregationExpression aggregationExpression, String fieldName) {
return aoc -> new Document("$addFields", new Document(fieldName, aggregationExpression.toDocument(aoc)));
}
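For readers who want to check what the combined expression should produce, here is a plain-Java sketch of the same two steps: concatenating the roles' rights lists into one list, then taking the union with the user's own rights. It is an in-memory analogy only; the class name RightsUnion and the sample rights are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; no Spring Data classes are involved.
class RightsUnion {

    static Set<String> allRights(List<String> userRights, List<List<String>> roleRights) {
        // Like $reduce with an empty initial list and $concatArrays:
        // flatten the list of lists into one list.
        List<String> fromRoles = new ArrayList<>();
        for (List<String> rights : roleRights) {
            fromRoles.addAll(rights);
        }
        // Like $setUnion: merge with the user's own rights and drop duplicates.
        Set<String> result = new LinkedHashSet<>(fromRoles);
        result.addAll(userRights);
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String> userRights = Arrays.asList("read", "admin");
        List<List<String>> roleRights = Arrays.asList(
                Arrays.asList("read", "write"),
                Arrays.asList("write", "delete"));
        System.out.println(allRights(userRights, roleRights));
    }
}
```

Note that $setUnion does not guarantee any particular element order, so only the set of values should be compared.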
I had the same issue. One solution is to create a custom reduce expression; here's a union example:
public class SetUnionReduceExpression implements AggregationExpression {
@Override
public Document toDocument(AggregationOperationContext context) {
return new Document("$setUnion", ImmutableList.of("$$value", "$$this"));
}
}
I have a collection in mongodb - "text_failed" which has all the numbers on which I failed to send an SMS, the time they failed and some other information.
A document in this collection looks like this:
{
_id(ObjectId): xxxxxx2af8....
failTime(String): 2015-05-15 01:15:48
telNum(String): 95634xxxxx
//some other information
}
I need to fetch the top 500 numbers that failed the most over a one-month period. A number can occur any number of times during this month (e.g. one number failed 143 times, another 46, etc.).
The problem is that more than 7M failures were recorded during this period. It's difficult to process this much data with the following code, which doesn't use aggregation:
DBCollection collection = mongoDB.getCollection("text_failed");
BasicDBObject query = new BasicDBObject();
query.put("failTime", new BasicDBObject("$gt", "2015-05-15 00:00:00").append("$lt", "2015-06-15 00:00:00"));
BasicDBObject field = new BasicDBObject();
field.put("telNum", 1);
DBCursor cursor = collection.find(query, field);
HashMap<String, Integer> hm = new HashMap<String, Integer>();
//int count = 1;
System.out.println(cursor);
while(cursor.hasNext()) {
//System.out.println(count);
//count++;
DBObject object = cursor.next();
if(hm.containsKey(object.get("telNum").toString())) {
hm.put(object.get("telNum").toString(), hm.get(object.get("telNum").toString()) + 1);
}
else {
hm.put(object.get("telNum").toString(), 1);
}
}
This fetches 7M+ documents for me. I need only the top 500 numbers. The result should look something like this:
{
telNum: xxxxx54654 //the number which failed
count: 129 //number of times it failed
}
I used aggregation myself but didn't get the desired results. Can this be accomplished by aggregation? Or is there any other way more efficient in which I can do this?
You could try the following aggregation pipeline:
db.getCollection("text_failed").aggregate([
{
"$match": {
"failTime": { "$gt": "2015-05-01 00:00:00", "$lt": "2015-06-01 00:00:00" }
}
},
{
"$group": {
"_id": "$telNum",
"count": { "$sum": 1 }
}
},
{
"$sort": { "count": -1 }
},
{
"$limit": 500
}
])
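The four stages map onto a familiar filter, group, sort, and limit sequence. The plain-Java sketch below mirrors them in memory so the expected result shape is easy to check; the class name TopFailures and the sample records are made up. It also relies on the same property as the $match stage: because failTime is stored as a "yyyy-MM-dd HH:mm:ss" string, lexicographic comparison orders the values chronologically.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; each failure is {failTime, telNum}.
class TopFailures {

    static List<Map.Entry<String, Long>> topN(List<String[]> failures,
                                              String from, String to, int n) {
        return failures.stream()
                // $match: failTime strictly between from and to (string comparison)
                .filter(f -> f[0].compareTo(from) > 0 && f[0].compareTo(to) < 0)
                // $group: one counter per telNum
                .collect(Collectors.groupingBy(f -> f[1], Collectors.counting()))
                .entrySet().stream()
                // $sort: highest count first
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                // $limit
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String[]> failures = Arrays.asList(
                new String[] {"2015-05-20 10:00:00", "111"},
                new String[] {"2015-05-21 10:00:00", "111"},
                new String[] {"2015-05-22 10:00:00", "222"},
                new String[] {"2015-07-01 10:00:00", "333"});
        for (Map.Entry<String, Long> e :
                topN(failures, "2015-05-15 00:00:00", "2015-06-15 00:00:00", 500)) {
            System.out.println("telNum: " + e.getKey() + ", count: " + e.getValue());
        }
    }
}
```

With the real pipeline, only the 500 result documents leave the database, instead of the 7M+ documents the cursor loop pulls to the client.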
I have 3 different collections, with different content in my script:
image, audio and video.
In each element I put in the database, I add a tag.
When I try to search for the tags (of the files I added to each collection), I can only find the tags for the image collection:
-------------------------------CODE---------------------------------------------------
protected void search(String term) {
tagCounter = 0;
DBCollection image = db.getCollection("p");
DBCollection audio = db.getCollection("a");
DBCollection video = db.getCollection("video");
String search = searchField.getText();
search.trim().toLowerCase();
BasicDBObject tagQuery= new BasicDBObject();
tagQuery.put("tags", search);
DBCursor cursor = collection.find(tagQuery);
tagQuery.put("tags", search);
cursor = image.find(tagQuery);
while(cursor.hasNext()) {
results.addElement( cursor.next().toString());
tagCounter++;
searchField.setText(null);
}
cursor = audio.find(tagQuery);
while(cursor.hasNext()) {
results.addElement(cursor.next());
tagCounter++;
searchField.setText(null);
}
cursor = video.find(tagQuery);
while(cursor.hasNext()) {
results.addElement( cursor.next().toString()) ;
tagCounter++;
searchField.setText(null);
}
JOptionPane counter = new JOptionPane();
counter.showMessageDialog(resultList, "Search gave " + tagCounter + " files");
}
Can anyone help a newbie out? :)
The code works perfectly for me, apart from the many references to things that are not declared or defined, and the missing .toString() in the audio loop.
In a nutshell, the data is fetched the same way from all the collections. What you need to check in your code is what the searchField.setText(null); line does: since you get results for the first collection but not for the next two, you are likely clearing something that the code still needs.
The best thing to do is to use lots of "debugging" statements throughout, not just at the end. Here is my simplified version of your code (I put one matching document in each collection):
int tagCounter = 0;
DBCollection image = db.getCollection("p");
DBCollection audio = db.getCollection("a");
DBCollection video = db.getCollection("video");
String search = "tag1";
search = search.trim().toLowerCase();
BasicDBObject tagQuery= new BasicDBObject();
tagQuery.put("tags", search);
DBCursor cursor = null;
cursor = image.find(tagQuery);
while(cursor.hasNext()) {
System.out.println( cursor.next().toString());
tagCounter++;
}
System.out.println(tagCounter + " matches found in image");
cursor = audio.find(tagQuery);
tagCounter = 0;
while(cursor.hasNext()) {
System.out.println( cursor.next().toString());
tagCounter++;
}
System.out.println(tagCounter + " matches found in audio");
cursor = video.find(tagQuery);
tagCounter = 0;
while(cursor.hasNext()) {
System.out.println( cursor.next().toString());
tagCounter++;
}
System.out.println(tagCounter + " matches found in video");
And my output is:
{ "_id" : { "$oid" : "5186a59151058e0786e90eee"} , "tags" : [ "tag1" , "tag2"]}
1 matches found in image
{ "_id" : { "$oid" : "5186a59851058e0786e90eef"} , "tags" : [ "tag1" , "tag2"]}
1 matches found in audio
{ "_id" : { "$oid" : "5186a5a851058e0786e90ef0"} , "tags" : [ "tag1" , "tag2"]}
1 matches found in video
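One more detail worth checking while debugging: in the question's code, the statement search.trim().toLowerCase(); has no effect, because Java strings are immutable and the result is discarded. If the tags are stored in lower case, a search term typed with capitals or surrounding spaces would then not match with a plain equality query. A minimal sketch of the fix, where the class name SearchTerm and the sample input are made up:

```java
import java.util.Locale;

// Hypothetical helper for illustration.
class SearchTerm {

    // Returns the trimmed, lower-cased term; the caller must use the return value.
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String search = "  Tag1 ";
        search.trim().toLowerCase();            // result discarded, search is unchanged
        String normalized = normalize(search);  // result kept
        System.out.println("[" + search + "] [" + normalized + "]");
    }
}
```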
I have the following MongoDB collection "Games":
{
"_id" : ObjectId("515461d3c6c18efd4a811fd3"),
"gameid" : NumberLong("86982207656"),
"tableName" : "Hydra Zoom 40-100 bb",
"nplayers" : 6,
"playersList" : [
{ "exist" : true,
"suspended" : false,
"grade" : 0,
"clusterId" : -1,
"playerid" : "DoomY9999",
"playsWithFriends" : 0,
"squeezePlay" : 0,
"weakShowdown" : 0,
"numberOfPlays" : 1
},
{
"exist": true,
"suspended" : false,
I would like to map the following MySQL query to MongoDB:
String query = "SELECT idplayer, COUNT(idplayer) AS countplayer "
+ "FROM (SELECT b.idgame, b.idplayer "
+ "FROM associations a, associations b "
+ "WHERE a.idplayer=? "
+ "AND b.idgame=a.idgame "
+ "AND b.idplayer <> ? "
+ "ORDER BY b.idgame DESC LIMIT 1000) as c"
+ " GROUP BY idplayer "
+ "ORDER BY countplayer DESC LIMIT 5;";
Description of the query: this SQL query finds the players who most frequently appear in the same games as player 'X'. The result is the names of those players and the number of times they played together.
A short note on LIMIT: the first "LIMIT 1000" is the limit on the games we want to examine. Since the database could be really large, we only analyse the last 1000 games in DESC order (the most recent have a higher "gameid").
The second limit, "LIMIT 5", is for the 'top 5' friends. We will sum their numbers.
So far I have accomplished almost everything with the Aggregation Framework, except for the "ORDER BY b.idgame DESC LIMIT 1000) as c" part. This is important for me because the number of games it goes through might be really high.
Here is my query in MongoDB (Java driver):
//build the query
DBObject match1 = new BasicDBObject("$match", new BasicDBObject("playersList.playerid",_playerid));
DBObject unwind = new BasicDBObject("$unwind", "$playersList");
DBObject match2 = new BasicDBObject("$match", new BasicDBObject("playersList.playerid",new BasicDBObject("$ne",_playerid)));
DBObject groupFields = new BasicDBObject("_id","$playersList.playerid");
groupFields.put("times", new BasicDBObject("$sum",1));
DBObject group = new BasicDBObject("$group", groupFields);
DBObject sort = new BasicDBObject("$sort", new BasicDBObject("times",-1) );
DBObject limit = new BasicDBObject("$limit", 5 );
DBObject group2 = new BasicDBObject("$group", "gameid");
DBObject sort2 = new BasicDBObject("$sort", new BasicDBObject("gameid",-1) );
DBObject limit2 = new BasicDBObject("$limit", 1000 );
DB db = mongoDb;
DBCollection coll = db.getCollection("games");
//aggregation query
//THIS WORKS
AggregationOutput output = coll.aggregate( match1, unwind, match2, group, sort, limit);
//THIS DOESN'T WORK!
AggregationOutput output = coll.aggregate( match1, unwind, match2, group, sort, limit, group2, sort2, limit2);
Please help me to fix this query. Thanks!
The field gameid isn't in the result after the first $group operation, so the later stages that are based on gameid can't work.
For a more efficient query, you should reorder your aggregation operations to reduce the data as early as possible. I moved the sorting and limiting of the games before unwinding the playersList, and the second group is not necessary.
The aggregation operation looks like this in the mongo shell:
// playerId to search for coplayers
var playerId = "DoomY9999"
db.games.aggregate([
// First $match can take advantage of suitable index if available
// Find all games that playerid X has played
{ $match : { "playersList.playerid" : playerId } },
// Sort by most recent games (gameid descending)
{ $sort : { "gameid" : -1 } },
// Limit number of games to examine
{ $limit : 1000 },
// Create a stream of documents from the playersList array
{ $unwind : "$playersList" },
// Match players except for playerid X
{ $match : { "playersList.playerid" : {$ne : playerId }} },
// Count how many of those games each co-player appears in
{ $group : {
_id : "$playersList.playerid",
count : { $sum : 1 }
}},
// Sort by most frequent players (count descending)
{ $sort : { "count" : -1 } },
// Limit results to 5 players
{ $limit : 5 },
// Rename the result fields
{ $project : {
_id : 0,
coplayer : "$_id",
count : 1
}}
])
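To see why the order of the stages matters, the pipeline can be mirrored step by step in plain Java. This is an in-memory sketch, not driver code; the class name CoPlayers, the map from gameid to player ids, and the sample games are all made up. The limit on games is applied before the player lists are flattened, which is the reordering described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; games maps gameid to the ids of its players.
class CoPlayers {

    static List<Map.Entry<String, Long>> topCoPlayers(Map<Long, List<String>> games,
                                                      String playerId,
                                                      int maxGames, int topN) {
        return games.entrySet().stream()
                // first $match: games that playerId took part in
                .filter(g -> g.getValue().contains(playerId))
                // $sort by gameid descending, then $limit to the most recent games
                .sorted(Map.Entry.<Long, List<String>>comparingByKey().reversed())
                .limit(maxGames)
                // $unwind: one element per player per game
                .flatMap(g -> g.getValue().stream())
                // second $match: everyone except playerId
                .filter(p -> !p.equals(playerId))
                // $group with {$sum : 1}
                .collect(Collectors.groupingBy(p -> p, Collectors.counting()))
                .entrySet().stream()
                // $sort by count descending, then $limit to the top players
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data
        Map<Long, List<String>> games = new HashMap<>();
        games.put(1L, Arrays.asList("X", "a", "b"));
        games.put(2L, Arrays.asList("X", "a"));
        games.put(3L, Arrays.asList("X", "c"));
        games.put(4L, Arrays.asList("d", "e"));
        for (Map.Entry<String, Long> e : topCoPlayers(games, "X", 1000, 5)) {
            System.out.println("coplayer: " + e.getKey() + ", count: " + e.getValue());
        }
    }
}
```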