Select 1 item per attribute value in Spring Data MongoRepository - java

I have a collection of objects in MongoDB and am using Spring Data MongoDB.
My collection of entities look something like this:
--------------------------------------------
| id | snapshot | name |
--------------------------------------------
| 2 | somedate | bla |
| 2 | somedate | foo |
| 3 | somedate | bar |
| 3 | somedate | cheese |
| 6 | somedate | milk |
| 6 | somedate | lorum |
| 6 | somedate | ipsum |
| 9 | somedate | do |
| 10 | somedate | re |
| 10 | somedate | mi |
| 15 | somedate | fa |
--------------------------------------------
I want to get a list of objects where I want to have only one object of each distinct id, the object for that id should be the one with the latest date.
My result should be something like this:
--------------------------------------------
| id | snapshot | name |
--------------------------------------------
| 2 | somedate | bla |
| 3 | somedate | bar |
| 6 | somedate | milk |
| 9 | somedate | do |
| 10 | somedate | mi |
| 15 | somedate | fa |
--------------------------------------------
Is this possible in using a MongoRepository query?
I'd appreciate any help.

With the aggregation framework it's possible. Run the following aggregation operation to get the desired result:
db.collection.aggregate([
{ "$sort": { "snapshot": -1 } },
{
"$group": {
"_id": "$id",
"snapshot": { "$first": "$snapshot" },
"name": { "$first": "$name" }
}
}
])
The above native aggregation operation can then be translated to Spring Data MongoDB aggregation as:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
TypedAggregation<Entity> aggregation = newAggregation(Entity.class,
sort(DESC, "snapshot"),
group("id")
.first("snapshot").as("snapshot")
.first("name").as("name")
);
AggregationResults<EntityStats> result = mongoTemplate.aggregate(aggregation, EntityStats.class);

Related

Extract elements from ObservableList

My Observable list looks like this : [AR | Argentina | 2 |
AU | Australia | 3 |
BE | Belgium | 1 |
BR | Brazil | 2 |
CA | Canada | 2 |
CH | Switzerland | 1 |
CN | China | 3 |
DE | Germany | 1 |
DK | Denmark | 1 |
EG | Egypt | 4 |
FR | France | 1 |
IL | Israel | 4 |
IN | India | 3 |
IT | Italy | 1 |
JP | Japan | 3 |
KW | Kuwait | 4 |
ML | Malaysia | 3 |
MX | Mexico | 2 |
NG | Nigeria | 4 |
NL | Netherlands | 1 |
SG | Singapore | 3 |
UK | United Kingdom | 1 |
US | United States of America | 2 |
ZM | Zambia | 4 |
ZW | Zimbabwe | 4 |
]
I would like to extract each these words and insert them in tableColumn so it would look like this
https://imgur.com/CXFW68K

Count distinct while aggregating others?

This is how my dataset looks like:
+---------+------------+-----------------+
| name |request_type| request_group_id|
+---------+------------+-----------------+
|Michael | X | 1020 |
|Michael | X | 1018 |
|Joe | Y | 1018 |
|Sam | X | 1018 |
|Michael | Y | 1021 |
|Sam | X | 1030 |
|Elizabeth| Y | 1035 |
+---------+------------+-----------------+
I want to calculate the amount of request_type's per person and count unique request_group_id's
Result should be following:
+---------+--------------------+---------------------+--------------------------------+
| name |cnt(request_type(X))| cnt(request_type(Y))| cnt(distinct(request_group_id))|
+---------+--------------------+---------------------+--------------------------------+
|Michael | 2 | 1 | 3 |
|Joe | 0 | 1 | 1 |
|Sam | 2 | 0 | 2 |
|John | 1 | 0 | 1 |
|Elizabeth| 0 | 1 | 1 |
+---------+--------------------+---------------------+--------------------------------+
What I've done so far: (helps to derive first two columns)
msgDataFrame.select(NAME, REQUEST_TYPE)
.groupBy(NAME)
.pivot(REQUEST_TYPE, Lists.newArrayList(X, Y))
.agg(functions.count(REQUEST_TYPE))
.show();
How to count distinct request_group_id's in this select? Is it possible to do within it?
I think it's possible only via two datasets join (my current result + separate aggregation by distinct request_group_id)
Example with "countDistinct" ("countDistinct" is not worked over window, replaced with "size","collect_set"):
val groupIdWindow = Window.partitionBy("name")
df.select($"name", $"request_type",
size(collect_set("request_group_id").over(groupIdWindow)).alias("countDistinct"))
.groupBy("name", "countDistinct")
.pivot($"request_type", Seq("X", "Y"))
.agg(count("request_type"))
.show(false)

Restriction on #OneToMany mapping

I have a product class where there is an #OneToMany association for a list of buyers. What I want to do is that buyer search performed by the association when I search for a product, use a null constraint for the end date column of the Buyer table. How to do this in a list mapping like this, below.
// it would be something I needed cri.createCriteria("listaBuyer", "buyer).add(Restriction.isNull("finalDate"));
Example
Registered data
product code | initial date | final date |
-------------------------------------------------------
1 | 2016-28-07 | 2017-28-07 |
------------------------------------------------------
2 | 2016-10-08 | 2017-28-07 |
------------------------------------------------------
3 | 2017-28-08 | |
-----------------------------------------------------
4 | 2017-30-08 | |
Product Class
public class Product {
#OneToMany(targetEntity=Buyer.class, orphanRemoval=true, cascade={CascadeType.PERSIST,CascadeType.MERGE}, mappedBy="product")
#LazyCollection(LazyCollectionOption.FALSE)
public List<Buyer> getListaBuyer() {
if (listaBuyer == null) {
listaBuyer = new ArrayList<Buyer>();
}
return listaBuyer;
}
built criterion
Criteria cri = getSession().createCriteria(Product.class);
cri.createCriteria("status", "sta");
cri.add(Restrictions.eq("id", Product.getId()));
return cri.list();
Expected outcome
product code | initial date | final date |
-------------------------------------------------------
3 | 2017-28-08 | |
-----------------------------------------------------
4 | 2017-30-08 | |
Returned result
product code | initial date | final date |
-------------------------------------------------------
1 | 2016-28-07 | 2017-28-07 |
------------------------------------------------------
2 | 2016-10-08 | 2017-28-07 |
------------------------------------------------------
3 | 2017-28-08 | |
-----------------------------------------------------
4 | 2017-30-08 | |

How to perform a query using a field that is a merge of 2 columns?

I'm building up a series of distribution analysis using Java Spark library. This is the actual code I'm using to fetch the data from a JSON file and save the output.
Dataset<Row> dataset = spark.read().json("local/foods.json");
dataset.createOrReplaceTempView("cs_food");
List<GenericAnalyticsEntry> menu_distribution= spark
.sql(" ****REQUESTED QUERY ****")
.toJavaRDD()
.map(row -> Triple.of( row.getString(0), BigDecimal.valueOf(row.getLong(1)), BigDecimal.valueOf(row.getLong(2))))
.map(GenericAnalyticsEntry::of)
.collect();
writeObjectAsJsonToHDFS(fs, "/local/output/menu_distribution_new.json", menu_distribution);
The query I'm looking for is based on this structure:
+------------+-------------+------------+------------+
| FIRST_FOOD | SECOND_FOOD | DATE | IS_SPECIAL |
+------------+-------------+------------+------------+
| Pizza | Spaghetti | 11/02/2017 | TRUE |
+------------+-------------+------------+------------+
| Lasagna | Pizza | 12/02/2017 | TRUE |
+------------+-------------+------------+------------+
| Spaghetti | Spaghetti | 13/02/2017 | FALSE |
+------------+-------------+------------+------------+
| Pizza | Spaghetti | 14/02/2017 | TRUE |
+------------+-------------+------------+------------+
| Spaghetti | Lasagna | 15/02/2017 | FALSE |
+------------+-------------+------------+------------+
| Pork | Mozzarella | 16/02/2017 | FALSE |
+------------+-------------+------------+------------+
| Lasagna | Mozzarella | 17/02/2017 | FALSE |
+------------+-------------+------------+------------+
How can I achieve this (written below) output from the code written above?
+------------+--------------------+----------------------+
| FOODS | occurrences(First) | occurrences (Second) |
+------------+--------------------+----------------------+
| Pizza | 2 | 1 |
+------------+--------------------+----------------------+
| Lasagna | 2 | 1 |
+------------+--------------------+----------------------+
| Spaghetti | 2 | 3 |
+------------+--------------------+----------------------+
| Mozzarella | 0 | 2 |
+------------+--------------------+----------------------+
| Pork | 1 | 0 |
+------------+--------------------+----------------------+
I've of course tried to figure out a solution by myself but had no luck with the my tries, I may be wrong, but I need something like this:
"SELECT (first_food + second_food) as menu, COUNT(first_food), COUNT(second_food) from cs_food GROUP BY menu"
From the example data, this looks like it will produce the output you want:
select
foods,
first_count,
second_count
from
(select first_food as food from menus
union select second_food from menus) as f
left join (
select first_food, count(*) as first_count from menus
group by first_food
) as ff on ff.first_food=f.food
left join (
select second_food, count(*) as second_count from menus
group by second_food
) as sf on sf.second_food=f.food
;
Simple combination of flatMap and groupBy should do the job like this (sorry, can't check if it 100% correct right now):
import spark.sqlContext.implicits._
val df = Seq(("Pizza", "Pasta"), ("Pizza", "Soup")).toDF("first", "second")
df.flatMap({case Row(first: String, second: String) => Seq((first, 1, 0), (second, 0, 1))})
.groupBy("_1")

How to resolve Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2?

Can somebody help me solving this type of error
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
I am searching for data in linked list but when I want to insert the data into an array, it turn up to be like this:
matric | nama | sem | cc | ch | fm
32255 | izzat | 1 | ccs2 | 3 | 45.0
| | 2 | ccs3 | 3 | 56.0
32345 | khai] | 3 | ccs4 | 3 | 45.0
| | 2 | ccs5 | 3 | 2.0
32246 | fifi | 1 | cc1 | 3 | 60.0
| | 1 | ccs3 | 4 | 34.0
34567 | dudu | 2 | ccs2 | 2 | 24.0
| | 2 | ccs4 | 6 | 79.0
first-->34567-->32246-->32345-->32255-->null
first-->6-->2-->4-->3-->3-->3-->3-->3-->null
first-->2-->2-->1-->1-->2-->3-->2-->1-->null
first-->dudu-->fifi-->khai]-->izzat-->null
first-->ccs4-->ccs2-->ccs3-->cc1-->ccs5-->ccs4-->ccs3-->ccs2-->null
first-->79.0-->24.0-->34.0-->60.0-->2.0-->45.0-->56.0-->45.0-->null
42insert matric= 032345
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
2
khai]
2
3
at inputoutput.LinkedList.getcc(LinkedList.java:141)
at inputoutput.baca.getcc(baca.java:84)
at inputoutput.Inputoutput.main(Inputoutput.java:75)
Java Result: 1
BUILD SUCCESSFUL (total time: 7 seconds)
the code:
String[] getcc(int mat,int sub) {
ListObject2 current = first2;
int count=0;
String b[]=new String[2] ;//2 is the subject number==sub
int x=0;
while (current!=null ) {
if(count==((mat*sub)+x) && ((mat*sub)+0)<((mat*sub)+x)<<((mat*sub)+sub)){
b[x]=current.data2;
x++;
}
current=current.next;
count++;
}
return b;
}
but I will get the input if search for last data in the linked list which is 032255
this is the output:
matric | nama | sem | cc | ch | fm
32255 | izzat | 1 | ccs2 | 3 | 45.0
| | 2 | ccs3 | 3 | 56.0
32345 | khai] | 3 | ccs4 | 3 | 45.0
| | 2 | ccs5 | 3 | 2.0
32246 | fifi | 1 | cc1 | 3 | 60.0
| | 1 | ccs3 | 4 | 34.0
34567 | dudu | 2 | ccs2 | 2 | 24.0
| | 2 | ccs4 | 6 | 79.0
first-->34567-->32246-->32345-->32255-->null
first-->6-->2-->4-->3-->3-->3-->3-->3-->null
first-->2-->2-->1-->1-->2-->3-->2-->1-->null
first-->dudu-->fifi-->khai]-->izzat-->null
first-->ccs4-->ccs2-->ccs3-->cc1-->ccs5-->ccs4-->ccs3-->ccs2-->null
first-->79.0-->24.0-->34.0-->60.0-->2.0-->45.0-->56.0-->45.0-->null
42insert matric= 032255
3
izzat
2
1
ccs3//the data i want to search
ccs2//
You're going into the if statement more than twice while walking the list. If you do that, you'll go past the bounds of the b array (which can only hold two values). You should use an ArrayList instead so you can add as many items as you need.

Categories

Resources