How to use $toLower and $trim aggregation operators using MongoDB Java?

I have a collection called Users, which contains an array called skills.
I use the following code to unwind the skills array and count the number of documents associated with each skill:
Bson uw = unwind("$skills");
Bson sbc = sortByCount("$skills");
Bson limit = limit(10);
coll.aggregate(Arrays.asList(uw, sbc, limit)).forEach(printDocuments());
Now I want to add $trim and $toLower to the above aggregation, because in the database some skills are saved in different ways (e.g., "CSS", "CSS ", and "css").
I'm able to do this in the mongo shell with the following aggregation:
db.users.aggregate([{$unwind:"$skills"} , {$sortByCount:{$toLower:{$trim:{input:"$skills"}}}}])
But I'm having trouble implementing it in Java.
Do you have any idea?

I managed to find a way to do this by changing the sortByCount stage to the following:
Bson sbc = sortByCount(eq("$toLower", eq("$trim", eq("input", "$skills"))));
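
If the nested eq() calls read oddly (they are only being used here to build BSON documents, not filters), the same $sortByCount expression can be spelled out with plain org.bson.Document objects. A minimal sketch, reusing coll and the printDocuments() helper from the question:

import static com.mongodb.client.model.Aggregates.limit;
import static com.mongodb.client.model.Aggregates.sortByCount;
import static com.mongodb.client.model.Aggregates.unwind;
import java.util.Arrays;
import org.bson.Document;
import org.bson.conversions.Bson;

// Build {$toLower: {$trim: {input: "$skills"}}} explicitly as nested Documents
Bson uw = unwind("$skills");
Bson sbc = sortByCount(
        new Document("$toLower",
                new Document("$trim", new Document("input", "$skills"))));
Bson lim = limit(10);

coll.aggregate(Arrays.asList(uw, sbc, lim)).forEach(printDocuments());

Note that the $trim operator requires MongoDB 4.0 or later.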

Related

How to sum a mongodb inner field and push it during grouping using MongoTemplate

I can use the sum inside the push operation in the MongoDB console. However, I can't figure out how to do the same using MongoTemplate.
$group: {
    _id: "$some_id",
    my_field: { $push: { $sum: "$my_field" } }
}
The code I am using for this is something like:
Aggregation aggregation =
    Aggregation.newAggregation(
        match(matchingCriteria),
        group("some_id")
            .count()
            .as("count")
            .push("my_field")
            .as("my_field"),
        project("some_id", "count", "my_field"));
AggregationResults<MyModel> result =
mongoTemplate.aggregate(aggregation, "my_collection", MyModel.class);
The thing is, I want the sum of my_field, but here it comes back as an array of my_field values (since I am pushing it directly). I can get the desired result with the sum-inside-push shown above in the console, but I can't express that with MongoTemplate. My app is in Spring Boot. I have also looked into the docs for these methods but couldn't find much.
I also tried using .sum() directly on the field (without the push), but that does not work for me because my_field is an inner object, and after the grouping it is not a number but an array of numbers. That is why I need the push and sum combination.
Any help regarding this is appreciated. Thanks in advance.
I was able to get this to work using the below code:
Aggregation aggregation =
    Aggregation.newAggregation(
        match(allTestMatchingCriteria),
        project("some_id")
            .and(AccumulatorOperators.Sum.sumOf("my_field"))
            .as("my_field_sum"),
        group("some_id")
            .count()
            .as("count")
            .push("my_field_sum")
            .as("my_field_sum"),
        project("some_id", "count", "my_field_sum"));
AggregationResults<MyModel> result =
mongoTemplate.aggregate(aggregation, "my_collection", MyModel.class);
I used AccumulatorOperators.Sum in the projection stage itself to sum the inner field and get the desired output. I then passed this to the grouping stage, where I also did the count aggregation since I needed that data as well, and finally projected all of the generated fields so they are collected in the output.

Apache Beam Group by Aggregate Fields

I have a PCollection reading data from AvroIO. I want to apply an aggregation such that, after grouping by a specific key, I count the distinct values of some fields within each group.
In Pig or plain SQL this is just a GROUP BY with a distinct count, but I can't work out how to do the same in Beam.
So far I have been able to write this:
Schema schema = new Schema.Parser().parse(new File(options.getInputSchema()));
Pipeline pipeline = Pipeline.create(options);

PCollection<GenericRecord> inputData =
    pipeline.apply(AvroIO.readGenericRecords(schema).from(options.getInput()));
PCollection<Row> filteredData =
    inputData.apply(Select.fieldNames("user_id", "field1", "field2"));
PCollection<Row> groupedData =
    filteredData.apply(Group.byFieldNames("user_id")
        .aggregateField("field1", Count.perElement(), "out_field1")
        .aggregateField("field2", Count.perElement(), "out_field2"));
But aggregateField does not accept these arguments.
Can someone show me the correct way to do this?
Thanks!
You can replace Count.perElement() with CountCombineFn(), which is a subclass of CombineFn (see the Beam javadocs):
filteredData.apply(Group.byFieldNames("user_id")
.aggregateField("field1", CountCombineFn(), "out_field1")
.aggregateField("field2", CountCombineFn(), "out_field2"));

Java method for MongoDB collection.save()

I'm having a problem with MongoDB in Java when I try to add documents with a customized _id field. When I insert a new document into that collection, I want it to be ignored if its _id already exists.
In the Mongo shell, collection.save() can be used in this case, but I cannot find the equivalent method in the MongoDB Java driver.
Just to add an example:
I have a collection of documents containing websites' information, with the URL as the _id field (which is unique).
I want to add some more documents. Some of the new documents might already exist in the collection, so I want to add all of the new ones except the duplicates.
This can be achieved with collection.save() in the Mongo shell, but I can't find the equivalent method in the MongoDB Java driver.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using a BulkWriteOperation obtained from the initializeUnorderedBulkOperation() method of the DBCollection object (the one wrapping your collection). An unordered bulk write keeps going after individual failures such as duplicate keys, whereas an ordered one stops at the first error. It is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name"); // the collection you are inserting into
ArrayList<DBObject> objectList; // Fill this list with the objects to insert

// Unordered, so a duplicate-key error on one document does not stop the rest
BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (int i = 0; i < objectList.size(); i++) {
    operation.insert(objectList.get(i));
}
BulkWriteResult result = operation.execute();
With this approach each document gets its own error handling, so a document with a duplicate _id raises an error as usual, but the operation still continues with the rest of the documents. Afterwards you can call getInsertedCount() on the BulkWriteResult to find out how many documents were actually inserted.
This can be a bit inefficient if a lot of data is inserted this way, though. This is just sample code (originally found on journaldev.com and edited to fit your situation); you may need to adapt it to your configuration, and it is untested.
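
If you are on the newer MongoCollection API rather than the legacy DBCollection one, a similar effect (insert everything, skip the duplicates) can be sketched with an unordered insertMany. The connection string, database, collection, and documents below are only illustrative:

import java.util.List;
import org.bson.Document;
import com.mongodb.MongoBulkWriteException;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.InsertManyOptions;

MongoCollection<Document> websites = MongoClients.create("mongodb://localhost:27017")
        .getDatabase("db_name").getCollection("websites");

List<Document> docs = List.of(
        new Document("_id", "https://example.com").append("title", "Example"),
        new Document("_id", "https://example.org").append("title", "Example Org"));

try {
    // ordered(false): a duplicate _id fails that one insert, but the rest still go through
    websites.insertMany(docs, new InsertManyOptions().ordered(false));
} catch (MongoBulkWriteException e) {
    // Duplicate keys surface here; the non-duplicates have already been inserted
    System.out.println("Inserted " + e.getWriteResult().getInsertedCount() + " new documents");
}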
I guess save() does something like this:
fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc.getObjectId("_id") == null) {
        doc.put("_id", ObjectId()) // generate a new id only if the document doesn't have one yet
    }
    // upsert so the document is inserted when its _id doesn't exist yet
    col.replaceOne(Document("_id", doc.getObjectId("_id")), doc, ReplaceOptions().upsert(true))
}
Maybe they removed save so you decide how to generate the new id.
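
In Java, with the current driver, the same save()-style behaviour can be sketched with replaceOne and upsert(true); the saveDocument helper name below is just for illustration:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

// Insert the document if its _id is new, otherwise overwrite the existing one
static void saveDocument(Document doc, MongoCollection<Document> col) {
    Object id = doc.get("_id");
    if (id == null) {
        col.insertOne(doc); // no _id yet: plain insert, the driver generates one
        return;
    }
    // upsert(true) inserts when nothing matches the _id filter
    col.replaceOne(Filters.eq("_id", id), doc, new ReplaceOptions().upsert(true));
}

Note that this overwrites an existing document with the same _id; if duplicates should simply be skipped, the unordered insertMany shown above is closer to what the question asks for.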

Using $addToSet with Java Morphia aggregation

I have a MongoDB aggregation query that works perfectly in the shell.
How can I rewrite this query to use it with Morphia?
org.mongodb.morphia.aggregation.Group.addToSet(String field) accepts only one field name, but I need to add an object to the set.
Query:
......aggregate([
    {$group: {
        "_id": "$subjectHash",
        "authors": {$addToSet: "$fromAddress.address"},
---->>  "messageDataSet": {$addToSet: {"sentDate": "$sentDate", "messageId": "$_id"}},
        "messageCount": {$sum: 1}
    }},
    {$sort: {....}},
    {$limit: 10},
    {$skip: 0}
])
Java code:
AggregationPipeline aggregationPipeline = myDatastore.createAggregation(Message.class)
.group("subjectHash",
grouping("authors", addToSet("fromAddress.address")),
--------??????------>> grouping("messageDataSet", ???????),
grouping("messageCount", new Accumulator("$sum", 1))
).sort(...)).limit(...).skip(...);
That's currently not supported but if you'll file an issue I'd be happy to include that in an upcoming release.
Thanks for your answer; I guessed as much from the source code. :(
I don't want to use spring-data or the java-driver directly for this project, so I changed my document representation instead.
I added a messageDataSet object that contains sentDate and messageId (and some other nested objects); these values are now duplicated in the document, which is bad design.
The aggregation becomes: "messageDataSet": {$addToSet: "$messageDataSet"},
and the Java code is: grouping("messageDataSet", addToSet("messageDataSet")),
This works with Morphia. Thanks.
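
For comparison, the sub-document form of $addToSet from the original query can be written with the plain Java driver's aggregation builders (not Morphia). A sketch, assuming a MongoCollection<Document> named messages; the sort key is only an example because the real keys were elided in the question:

import static com.mongodb.client.model.Accumulators.addToSet;
import static com.mongodb.client.model.Accumulators.sum;
import static com.mongodb.client.model.Aggregates.group;
import static com.mongodb.client.model.Aggregates.limit;
import static com.mongodb.client.model.Aggregates.skip;
import static com.mongodb.client.model.Aggregates.sort;
import static java.util.Arrays.asList;
import org.bson.Document;

for (Document doc : messages.aggregate(asList(
        group("$subjectHash",
                addToSet("authors", "$fromAddress.address"),
                // the sub-document {sentDate, messageId} goes into the set as one value
                addToSet("messageDataSet",
                        new Document("sentDate", "$sentDate").append("messageId", "$_id")),
                sum("messageCount", 1)),
        sort(new Document("messageCount", -1)), // example sort; the real keys were elided
        limit(10),
        skip(0)))) {
    System.out.println(doc.toJson());
}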

MongoDB Text Index using Java Driver

Using the MongoDB Java API, I have not been able to successfully locate a full example using text search. The code I am using is this:
DBCollection coll;
String searchString = "Test String";
coll.createIndex(new BasicDBObject ("blogcomments", "text"));
DBObject q = start("blogcomments").text(searchString).get();
The name of the collection I am searching is blogcomments. createIndex() is the replacement for the deprecated ensureIndex() method. I have seen examples of how to use createIndex(), but not of how to execute actual searches with the Java API. Is this the correct way to go about it?
That's not quite right. Queries that use indexes of type "text" cannot specify a field name at query time; instead, the field names to include in the index are specified when the index is created. See the documentation for examples. Your query will look like this:
DBObject q = QueryBuilder.start().text(searchString).get();
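
Putting both pieces together with the legacy driver from the question (assuming a DB handle named db and that the documents have a text field called blogcomments, as in the question's createIndex call):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.QueryBuilder;

DBCollection coll = db.getCollection("blogcomments");

// Field names go into the text index definition...
coll.createIndex(new BasicDBObject("blogcomments", "text"));

// ...while the query itself only supplies the search string
DBObject q = QueryBuilder.start().text("Test String").get();
try (DBCursor cursor = coll.find(q)) {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
}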
