Stream groupingBy by one field then merge all others

I'm having trouble with stream groupingBy.
List<FAR> listFar = farList.stream().filter(f -> !f.getStatus().equals(ENUM.STATUS.DELETED))
.collect(Collectors.toList());
List<HAUL> haulList = listFar.stream().map(f -> f.getHaul()).flatMap(f -> f.stream())
.collect(Collectors.toList());
The following groups by specie and works fine, but HAUL has other attributes as well.
Map<Specie, List<HAUL>> collect = haulList.stream().collect(Collectors.groupingBy(HAUL::getSpecie));
Attributes:
haul.getFishCount(); (Integer)
haul.getFishWeight(); (BigDecimal)
Is it possible to group by HAUL::getSpecie, but also "merge" those two extra fields so that I get totals?
For example, I have 3 HAUL elements where fish specie A weighs 50, 30 and 10 kg.
Can I group them by specie and get the total weight?

If I understood correctly:
Map<Specie, AbstractMap.SimpleEntry<Integer, BigDecimal>> totals = haulList
.stream()
.collect(Collectors.groupingBy(HAUL::getSpecie,
Collectors.collectingAndThen(Collectors.toList(),
list -> {
// key: total fish count, value: total fish weight of this specie
int left = list.stream().mapToInt(HAUL::getFishCount).sum();
BigDecimal right = list.stream().map(HAUL::getFishWeight).reduce(BigDecimal.ZERO, BigDecimal::add);
return new AbstractMap.SimpleEntry<>(left, right);
})));
Each total can also be computed on its own:
haulList.stream()
.collect(Collectors.groupingBy(HAUL::getSpecie,
Collectors.summingInt(HAUL::getFishCount)));
or
haulList.stream()
.collect(Collectors.groupingBy(HAUL::getSpecie,
Collectors.mapping(HAUL::getFishWeight, Collectors.reducing((x, y) -> x.add(y)))));
But written this way, the two can't run at the same time.
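On Java 12 or later, Collectors.teeing is one way to feed the same group to two downstream collectors in a single pass and merge their results. A minimal sketch, assuming a simplified stand-in for HAUL (the Haul and Totals classes are hypothetical, and the specie is a plain String):

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TeeingTotals {
    // Hypothetical stand-in for HAUL: specie, fish count and fish weight
    static class Haul {
        final String specie;
        final int fishCount;
        final BigDecimal fishWeight;
        Haul(String specie, int fishCount, BigDecimal fishWeight) {
            this.specie = specie;
            this.fishCount = fishCount;
            this.fishWeight = fishWeight;
        }
        String getSpecie() { return specie; }
        int getFishCount() { return fishCount; }
        BigDecimal getFishWeight() { return fishWeight; }
    }

    // Hypothetical holder for the two totals of one specie
    static class Totals {
        final int count;
        final BigDecimal weight;
        Totals(int count, BigDecimal weight) { this.count = count; this.weight = weight; }
        @Override public String toString() { return "count=" + count + ", weight=" + weight; }
    }

    static Map<String, Totals> totalsBySpecie(List<Haul> hauls) {
        return hauls.stream().collect(Collectors.groupingBy(
                Haul::getSpecie,
                // teeing passes every element of the group to both collectors, then merges the two results
                Collectors.teeing(
                        Collectors.summingInt(Haul::getFishCount),
                        Collectors.reducing(BigDecimal.ZERO, Haul::getFishWeight, BigDecimal::add),
                        Totals::new)));
    }

    public static void main(String[] args) {
        List<Haul> hauls = List.of(
                new Haul("A", 5, new BigDecimal("50")),
                new Haul("A", 3, new BigDecimal("30")),
                new Haul("A", 1, new BigDecimal("10")),
                new Haul("B", 2, new BigDecimal("7.5")));
        System.out.println(totalsBySpecie(hauls));
    }
}
```

Each group is traversed once; the merger (here the Totals constructor) only receives the two finished results.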

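You could use mapping and reducing, for example:
class Foo {
int count; BigDecimal weight; String specie;
Foo(int count, BigDecimal weight, String specie) {
this.count = count; this.weight = weight; this.specie = specie;
}
BigDecimal getWeight() { return weight; }
String getSpecie() { return specie; }
}
List<Foo> fooList = Arrays.asList(
new Foo(1, new BigDecimal(10), "a"),
new Foo(2, new BigDecimal(38), "a"),
new Foo(5, new BigDecimal(2), "b"),
new Foo(4, new BigDecimal(8), "b"));
// assumes static imports of Collectors.groupingBy and Collectors.mapping
Map<String, Optional<BigDecimal>> specieWithTotalWeight = fooList.stream().
collect(
groupingBy(
Foo::getSpecie,
mapping(
Foo::getWeight,
Collectors.reducing(BigDecimal::add)
)
)
);
System.out.println(specieWithTotalWeight); // {a=Optional[48], b=Optional[10]}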
I hope this helps.

If I'm getting your question correctly, you want the total sum of count * weight for each specie.
You can do this by using Collectors.groupingBy with a downstream collector that reduces the list of HAUL of each specie to the sum of haul.getFishCount() * haul.getFishWeight():
Map<Specie, BigDecimal> result = haulList.stream()
.collect(Collectors.groupingBy(haul -> haul.getSpecie(),
Collectors.mapping(haul ->
new BigDecimal(haul.getFishCount()).multiply(haul.getFishWeight()),
Collectors.reducing(BigDecimal.ZERO, BigDecimal::add))));
This gives the total of count * weight for each specie. If you can add the following method to your HAUL class:
public BigDecimal getTotalWeight() {
return new BigDecimal(getFishCount()).multiply(getFishWeight());
}
Then, collecting the stream would be easier and more readable:
Map<Specie, BigDecimal> result = haulList.stream()
.collect(Collectors.groupingBy(haul -> haul.getSpecie(),
Collectors.mapping(haul -> haul.getTotalWeight(),
Collectors.reducing(BigDecimal.ZERO, BigDecimal::add))));
EDIT: After all, it seems that you want separate sums for each field...
I would use Collectors.toMap with a merge function for this. Here's the code:
Map<Specie, List<BigDecimal>> result = haulList.stream()
.collect(Collectors.toMap(
haul -> haul.getSpecie(),
// index 0: fish count, index 1: fish weight
haul -> Arrays.asList(
new BigDecimal(haul.getFishCount()),
haul.getFishWeight()),
(list1, list2) -> {
// Arrays.asList is fixed-size, but set() is supported
list1.set(0, list1.get(0).add(list2.get(0)));
list1.set(1, list1.get(1).add(list2.get(1)));
return list1;
}));
This uses a list of 2 elements to store the fish count at index 0 and the fish weight at index 1, for every specie.
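The index-based list works, but a small immutable holder can make the merge function easier to read. A minimal sketch of the same toMap-with-merge idea, assuming hypothetical Haul and Totals classes in place of HAUL and the two-element list:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MergeTotals {
    // Hypothetical stand-in for HAUL
    static final class Haul {
        final String specie;
        final int fishCount;
        final BigDecimal fishWeight;
        Haul(String specie, int fishCount, BigDecimal fishWeight) {
            this.specie = specie;
            this.fishCount = fishCount;
            this.fishWeight = fishWeight;
        }
    }

    // Hypothetical immutable holder replacing the 2-element list
    static final class Totals {
        final BigDecimal count;
        final BigDecimal weight;
        Totals(BigDecimal count, BigDecimal weight) {
            this.count = count;
            this.weight = weight;
        }
        // Returns a new Totals instead of mutating either argument
        Totals plus(Totals other) {
            return new Totals(count.add(other.count), weight.add(other.weight));
        }
    }

    static Map<String, Totals> totalsBySpecie(List<Haul> hauls) {
        return hauls.stream().collect(Collectors.toMap(
                h -> h.specie,                                                  // key: specie
                h -> new Totals(BigDecimal.valueOf(h.fishCount), h.fishWeight), // value for a single haul
                Totals::plus));                                                 // combines values of the same specie
    }

    public static void main(String[] args) {
        List<Haul> hauls = List.of(
                new Haul("A", 5, new BigDecimal("50")),
                new Haul("A", 3, new BigDecimal("30")),
                new Haul("A", 1, new BigDecimal("10")));
        Totals a = totalsBySpecie(hauls).get("A");
        System.out.println(a.count + " fish, " + a.weight + " kg"); // prints "9 fish, 90 kg"
    }
}
```

Because plus always returns a fresh object, the merge never depends on the value container being mutable.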

Related

Java streams average

I need to create two methods using streams: one that returns the average score of each task,
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {}
and one that returns the task with the highest average score:
public String easiestTask(Stream<CourseResult> results) {}
I can only modify those 2 methods.
Here is the CourseResult class:
public class CourseResult {
private final Person person;
private final Map<String, Integer> taskResults;
public CourseResult(final Person person, final Map<String, Integer> taskResults) {
this.person = person;
this.taskResults = taskResults;
}
public Person getPerson() {
return person;
}
public Map<String, Integer> getTaskResults() {
return taskResults;
}
}
And here are the methods that create CourseResult objects:
private final String[] programTasks = {"Lab 1. Figures", "Lab 2. War and Peace", "Lab 3. File Tree"};
private final String[] practicalHistoryTasks = {"Shieldwalling", "Phalanxing", "Wedging", "Tercioing"};
private Stream<CourseResult> programmingResults(final Random random) {
int n = random.nextInt(names.length);
int l = random.nextInt(lastNames.length);
return IntStream.iterate(0, i -> i + 1)
.limit(3)
.mapToObj(i -> new Person(
names[(n + i) % names.length],
lastNames[(l + i) % lastNames.length],
18 + random.nextInt(20)))
.map(p -> new CourseResult(p, Arrays.stream(programTasks).collect(toMap(
task -> task,
task -> random.nextInt(51) + 50))));
}
private Stream<CourseResult> historyResults(final Random random) {
int n = random.nextInt(names.length);
int l = random.nextInt(lastNames.length);
AtomicInteger t = new AtomicInteger(practicalHistoryTasks.length);
return IntStream.iterate(0, i -> i + 1)
.limit(3)
.mapToObj(i -> new Person(
names[(n + i) % names.length],
lastNames[(l + i) % lastNames.length],
18 + random.nextInt(20)))
.map(p -> new CourseResult(p,
IntStream.iterate(t.getAndIncrement(), i -> t.getAndIncrement())
.map(i -> i % practicalHistoryTasks.length)
.mapToObj(i -> practicalHistoryTasks[i])
.limit(3)
.collect(toMap(
task -> task,
task -> random.nextInt(51) + 50))));
}
Based on these methods, I can calculate the average of each task by dividing the sum of its scores by 3, because there are only 3 Persons. I could also make it divide by the number of CourseResult objects in the stream, in case the .limit(3) in these methods is changed.
I don't know how to access the keys of the taskResults map. I think I need them to return a map of unique keys, where the value for each unique key is the average of the values assigned to that key in the taskResults maps.

For your first question: map each CourseResult to its taskResults, flatMap to get all entries of every taskResults map from all CourseResults, group by map key (task name), and collect by averaging the values of the same key:
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
return results.map(CourseResult::getTaskResults)
.flatMap(m -> m.entrySet().stream())
.collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.averagingInt(Map.Entry::getValue)));
}
You can use the same approach for your second question to calculate the average of each task, and finally stream over the entries of the resulting map to find the task with the highest average:
public String easiestTask(Stream<CourseResult> results) {
return results.map(CourseResult::getTaskResults)
.flatMap(m -> m.entrySet().stream())
.collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.averagingInt(Map.Entry::getValue)))
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey)
.orElse("No easy task found");
}
To avoid code duplication you can call the first method within the second:
public String easiestTask(Stream<CourseResult> results) {
return averageScoresPerTask(results).entrySet()
.stream()
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey)
.orElse("No easy task found");
}
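To try both methods on fixed data instead of random results, here is a self-contained sketch. It assumes a simplified, hypothetical CourseResult that only holds taskResults (the Person field is left out):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskAverages {
    // Hypothetical simplified CourseResult: the Person field is omitted
    static class CourseResult {
        private final Map<String, Integer> taskResults;
        CourseResult(Map<String, Integer> taskResults) { this.taskResults = taskResults; }
        Map<String, Integer> getTaskResults() { return taskResults; }
    }

    static Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
        return results.map(CourseResult::getTaskResults)
                .flatMap(m -> m.entrySet().stream())   // one entry per (task, score) pair
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.averagingInt(Map.Entry::getValue)));
    }

    static String easiestTask(Stream<CourseResult> results) {
        // reuses the first method, then picks the entry with the highest average
        return averageScoresPerTask(results).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("No easy task found");
    }

    public static void main(String[] args) {
        List<CourseResult> data = List.of(
                new CourseResult(Map.of("Lab 1. Figures", 60, "Lab 2. War and Peace", 90)),
                new CourseResult(Map.of("Lab 1. Figures", 80, "Lab 2. War and Peace", 70)),
                new CourseResult(Map.of("Lab 1. Figures", 70)));
        // a stream can be consumed only once, so each call gets a fresh one
        System.out.println(averageScoresPerTask(data.stream()));
        System.out.println(easiestTask(data.stream())); // prints Lab 2. War and Peace
    }
}
```

Note that averagingInt divides by the number of scores actually present for each task (3 for "Lab 1. Figures", 2 for "Lab 2. War and Peace" here), not by a fixed number of persons.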
EDIT
To customize the calculation of the average regardless of how many entries your maps contain, don't use built-in collectors like Collectors.averagingInt or Collectors.averagingDouble. Instead, wrap your collector in collectingAndThen, sum the scores with Collectors.summingInt, and after collecting, divide by a divisor chosen according to whether the task name starts with "Lab" or not:
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
return results.map(CourseResult::getTaskResults)
.flatMap(m -> m.entrySet().stream())
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(Map.Entry::getKey, Collectors.summingInt(Map.Entry::getValue)),
map -> map.entrySet()
.stream()
.collect(Collectors.toMap(
Map.Entry::getKey,
e -> e.getKey().startsWith("Lab") ? e.getValue() / 3. : e.getValue() / 4.))
));
}

To create a map containing the average score of each task, you need to flatten the taskResults map of every CourseResult object in the stream and group the data by key (i.e. by task name).
For that you can use the groupingBy() collector with averagingDouble() as its downstream collector, which calculates the average of the scores mapped to the same task.
This is how it might look:
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
return results
.map(CourseResult::getTaskResults) // Stream<Map<String, Integer>> - stream of maps
.flatMap(map -> map.entrySet().stream()) // Stream<Map.Entry<String, Integer>> - stream of entries
.collect(Collectors.groupingBy(
Map.Entry::getKey,
Collectors.averagingDouble(Map.Entry::getValue)
));
}
To find the easiest task, you can pass this map instead of a stream, because otherwise the method has to apply the same operations again. This makes sense in a real-life scenario where you retrieve data stored somewhere, since it avoids processing it twice. Moreover, in your case you can't generate the stream from the source twice and pass it into these two methods, because the stream data is random. Passing the same stream into both methods is not an option either: a stream pipeline can be executed only once. Once it hits the terminal operation it's done and can't be used anymore, so you can't pass the same stream of random data to both methods.
public String easiestTask(Map<String, Double> averageByTask) {
return averageByTask.entrySet().stream()
.max(Map.Entry.comparingByValue()) // produces a result of type Optional<Map.Entry<String, Double>>
.map(Map.Entry::getKey) // transforming into Optional<String>
.orElse("no data"); // or orElseThrow() if data is always expected to be present depending on your needs
}
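The single-use rule for streams can be checked directly: a second terminal operation on an already consumed stream throws IllegalStateException. A minimal sketch using sample task names:

```java
import java.util.stream.Stream;

public class StreamReuse {
    // Returns true if a second terminal operation on the same stream is rejected
    static boolean secondUseFails() {
        Stream<String> tasks = Stream.of("Shieldwalling", "Phalanxing", "Wedging");
        tasks.count();             // first terminal operation: the pipeline is now consumed
        try {
            tasks.count();         // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            // message: "stream has already been operated upon or closed"
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails()); // prints true
    }
}
```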

Java streams adding multiple values conditionally

I have a List of objects like this, where amount can be negative or positive:
class Sale {
String country;
BigDecimal amount;
}
I would like to end up with a pair of sums per country: the sum of all negative values and the sum of all positive values.
With these values:
country | amount
nl | 9
nl | -3
be | 7.9
be | -7
Is there a way to end up with Map<String, Pair<BigDecimal, BigDecimal>> using a single stream?
It's easy to do this with two separate streams, but I can't figure it out with just one.

This can be done with Collectors.toMap and a merge function that sums the pairs.
Assuming that a Pair is immutable and has only getters for the first and second elements, the code may look like this:
static Map<String, Pair<BigDecimal, BigDecimal>> sumUp(List<Sale> list) {
return list.stream()
.collect(Collectors.toMap(
Sale::getCountry,
sale -> sale.getAmount().signum() >= 0
? new Pair<>(sale.getAmount(), BigDecimal.ZERO)
: new Pair<>(BigDecimal.ZERO, sale.getAmount()),
(pair1, pair2) -> new Pair<>(
pair1.getFirst().add(pair2.getFirst()),
pair1.getSecond().add(pair2.getSecond())
)
// , LinkedHashMap::new // optional parameter to keep insertion order
));
}
Test
List<Sale> list = Arrays.asList(
new Sale("us", new BigDecimal(100)),
new Sale("uk", new BigDecimal(-10)),
new Sale("us", new BigDecimal(-50)),
new Sale("us", new BigDecimal(200)),
new Sale("uk", new BigDecimal(333)),
new Sale("uk", new BigDecimal(-70))
);
Map<String, Pair<BigDecimal, BigDecimal>> map = sumUp(list);
map.forEach((country, pair) ->
System.out.printf("%-4s|%s%n%-4s|%s%n",
country, pair.getFirst(), country, pair.getSecond()
));
Output
uk |333
uk |-80
us |300
us |-50

A solution close to Alex Rudenko's, but using groupingBy and a downstream collector (this one assumes a Pair with getKey() and getValue(), such as javafx.util.Pair):
Map<String, Pair<BigDecimal, BigDecimal>> map =
list.stream()
.collect(Collectors.groupingBy(Sale::getCountry,
Collectors.mapping(s ->
s.getAmount().signum() >= 0
? new Pair<>(s.getAmount(), BigDecimal.ZERO)
: new Pair<>(BigDecimal.ZERO, s.getAmount()),
Collectors.reducing(new Pair<>(BigDecimal.ZERO, BigDecimal.ZERO),
(p1, p2) -> new Pair<>(p1.getKey().add(p2.getKey()),
p1.getValue().add(p2.getValue()))))
));
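Another single-pass option, not taken from the answers above, nests Collectors.partitioningBy inside groupingBy: each country's sales are split by sign and each part is reduced with BigDecimal::add. The result is a Map<String, Map<Boolean, BigDecimal>> rather than a Pair, with true holding the non-negative sum. A sketch with a stand-in Sale class:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SalesBySign {
    // Stand-in for the question's Sale class, with a constructor and getters added
    static class Sale {
        final String country;
        final BigDecimal amount;
        Sale(String country, BigDecimal amount) { this.country = country; this.amount = amount; }
        String getCountry() { return country; }
        BigDecimal getAmount() { return amount; }
    }

    // country -> {true = sum of non-negative amounts, false = sum of negative amounts}
    static Map<String, Map<Boolean, BigDecimal>> sumUp(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                Sale::getCountry,
                Collectors.partitioningBy(
                        s -> s.getAmount().signum() >= 0,
                        Collectors.reducing(BigDecimal.ZERO, Sale::getAmount, BigDecimal::add))));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("nl", new BigDecimal("9")),
                new Sale("nl", new BigDecimal("-3")),
                new Sale("be", new BigDecimal("7.9")),
                new Sale("be", new BigDecimal("-7")));
        System.out.println(sumUp(sales));
    }
}
```

partitioningBy always produces both the true and the false key, so a country with only positive sales still gets BigDecimal.ZERO as its negative sum.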

Calculate the percentage of value using Collection framework

I have a List of TrainingRequest, where each element has a List of Feedback.
@Data
class TrainingRequest {
@Transient
List<Feedback> feedback;
}
@Data
class Feedback {
String Q1;
String Q2;
}
I need to get all the given answers for Q1 and Q2 and calculate the percentage of each value.
List<TrainingRequest> trainingList = Optional.ofNullable(trainingRequestList).orElseGet(Collections::emptyList)
.stream().map(m -> {
List<Feedback> feedback = findByTrainingRequestId(m.getId());
m.setFeedback(feedback); // assigning Feedback to TrainingRequest
return m;
}).collect(Collectors.toList());
To flatten all the feedback:
List<Feedback> flatMap = trainingList.stream().flatMap(f -> f.getFeedback().stream()).collect(Collectors.toList());
To count each value of Q1 and Q2, I'm grouping and counting. I need the percentage of each Q1 and Q2 value instead of the count.
Map<String, Map<String, Long>> map = new TreeMap<>();
map.put("Q1", flatMap.stream().collect(Collectors.groupingBy(Feedback::getQ1, Collectors.counting())));
map.put("Q2", flatMap.stream().collect(Collectors.groupingBy(Feedback::getQ2, Collectors.counting())));
With Collectors.counting(), I get the following output:
{
"Q1": {
"unsatisfied": 2,
"Satisfied": 1,
"satisfied": 1
},
"Q2": {
"Yes": 4
}
}
But I need percentages, like this:
{
"Q1": {
"unsatisfied": 50 %,
"Satisfied": 25 %,
"satisfied": 25 %
},
"Q2": {
"Yes": 100 %
}
}
How can I do this efficiently? Do I need to optimize the code above?

Your question was a bit unclear, so I simplified the logic for myself. I came up with a snippet that calculates the percentage of even/odd integers in an IntStream (which is not so different from what you're trying to do):
int total = 101; // 0..100 inclusive
IntStream.range(0, total).boxed()
.collect(Collectors.groupingBy(integer -> (integer % 2) == 0 ? "even" : "odd",
Collectors.collectingAndThen(Collectors.counting(), count -> count * 100.0 / total + " %")));
Notice the use of collectingAndThen(): it lets us first collect the values and then map the result into another value using a finisher function.
In your case, this would translate into something like:
// the values are now strings such as "50.0 %", so the map type changes accordingly
Map<String, Map<String, String>> map = new TreeMap<>();
map.put("Q1", flatMap.stream().collect(Collectors.groupingBy(Feedback::getQ1,
Collectors.collectingAndThen(Collectors.counting(), count -> count * 100.0 / flatMap.size() + " %"))));
map.put("Q2", flatMap.stream().collect(Collectors.groupingBy(Feedback::getQ2,
Collectors.collectingAndThen(Collectors.counting(), count -> count * 100.0 / flatMap.size() + " %"))));
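The important detail in the finisher is the order of operations: multiplying by 100.0 before dividing keeps the calculation in floating point, whereas count / size with two integer types truncates to 0 for any share below 100 %. A minimal sketch that assumes the answers to one question are available as a plain List<String> (the class and method names are hypothetical); it returns numbers and leaves the " %" formatting to the caller:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AnswerPercentages {
    // Maps each distinct answer to its share of all answers, in percent
    static Map<String, Double> percentages(List<String> answers) {
        int total = answers.size();
        return answers.stream().collect(Collectors.groupingBy(
                Function.identity(),
                Collectors.collectingAndThen(
                        Collectors.counting(),
                        // multiply by 100.0 first so the division happens in floating point
                        count -> count * 100.0 / total)));
    }

    public static void main(String[] args) {
        // the Q1 answers from the question: 2 x unsatisfied, 1 x Satisfied, 1 x satisfied
        List<String> q1 = List.of("unsatisfied", "unsatisfied", "Satisfied", "satisfied");
        System.out.println(percentages(q1));
    }
}
```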
UPDATE
Since you specifically asked about optimization, here are a couple of points:
1. Don't create a new collection when you can reuse the existing one
// this code is unnecessarily creating a new collection
List<TrainingRequest> trainingList = Optional.ofNullable(trainingRequestList).orElseGet(Collections::emptyList)
.stream().map(m -> {
List<Feedback> feedback = findByTrainingRequestId(m.getId());
m.setFeedback(feedback); // assigning Feedback to TrainingRequest
return m;
}).collect(Collectors.toList());
It could be simplified to this:
// to avoid NullPointerExceptions
trainingRequestList = trainingRequestList == null ? Collections.emptyList() : trainingRequestList;
// the list holds references to the TrainingRequest objects, so they can be updated in place
trainingRequestList.forEach(m -> m.setFeedback(findByTrainingRequestId(m.getId())));
2. Don't collect if you are going to stream the collection again
// to hold, per question, the count of each answer value
final Map<String, Map<String, Integer>> count = new TreeMap<>();
// Order(n), n = trainingRequests count
trainingRequestList.forEach(trainingRequest -> {
List<Feedback> feedbacks = findByTrainingRequestId(trainingRequest.getId());
// Order(m), m = feedbacks count
feedbacks.forEach(f -> {
count.computeIfAbsent("Q1", q -> new HashMap<>()).merge(f.getQ1(), 1, Integer::sum);
count.computeIfAbsent("Q2", q -> new HashMap<>()).merge(f.getQ2(), 1, Integer::sum);
});
trainingRequest.setFeedback(feedbacks);
});
// finally we can compute the percentages
// Order(k), k = number of distinct answer values
final Map<String, Map<String, String>> result = new TreeMap<>();
count.forEach((question, answers) -> {
// total number of answers given for this question
int total = answers.values().stream().mapToInt(Integer::intValue).sum();
Map<String, String> percentages = new HashMap<>();
answers.forEach((answer, c) -> percentages.put(answer, 100.0 * c / total + " %"));
result.put(question, percentages);
});
Notice that these optimizations don't change the fact that your logic is currently Order(n * m); it would be difficult to give you further hints without actually looking at the code.

This might not be an optimized answer, but it gets you the result.
Create a map holding the total count for each question, and then use it to calculate the percentages:
Map<String, Long> totalCountMap = map.entrySet().stream()
.collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().values().stream().reduce(Long::sum).orElse(0L)));
Map<String, Map<String, Long>> result = map.entrySet().stream()
.collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().entrySet().stream()
.collect(Collectors.toMap(Map.Entry::getKey, e1 -> (e1.getValue() * 100 / totalCountMap.get(e.getKey()))))));
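If fractional percentages are wanted, the same idea works with floating-point division. A sketch (class and method names are hypothetical) that converts counts in the question's Map<String, Map<String, Long>> shape into percentages per question, fed with the counts shown in the question:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CountsToPercentages {
    // Converts per-question answer counts into per-question percentages
    static Map<String, Map<String, Double>> toPercentages(Map<String, Map<String, Long>> counts) {
        Map<String, Map<String, Double>> result = new TreeMap<>();
        counts.forEach((question, answerCounts) -> {
            // total number of answers given for this question
            long total = answerCounts.values().stream().mapToLong(Long::longValue).sum();
            result.put(question, answerCounts.entrySet().stream().collect(Collectors.toMap(
                    Map.Entry::getKey,
                    e -> e.getValue() * 100.0 / total)));
        });
        return result;
    }

    public static void main(String[] args) {
        // the counts shown in the question
        Map<String, Map<String, Long>> counts = Map.of(
                "Q1", Map.of("unsatisfied", 2L, "Satisfied", 1L, "satisfied", 1L),
                "Q2", Map.of("Yes", 4L));
        System.out.println(toPercentages(counts));
    }
}
```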

Java stream API - avoid using same predicate twice to calculate average

class A {
double value;
String key;
}
class B {
List<A> aList;
}
Given List<B> bList, I want to calculate the average of the values for a specific key k. Assume each B holds at most one instance of A with key.equals(k).
I figured I could first filter aList, and later extract the value using mapToDouble:
double average = bList.stream().filter(
b -> b.aList.stream().anyMatch(
a -> a.key.equals(k)
)
).mapToDouble(
b -> b.aList.stream().filter(
a -> a.key.equals(k)
).findFirst().get().value
).average().orElse(0);
But there is clearly a redundancy here, since I am filtering the same list by the same predicate twice (a -> a.key.equals(k)).
Is there a way to mapToDouble while omitting the elements without a matching key, in a single step?
Edit:
Here is a more concrete example; hopefully it makes this easier to understand:
String courseName;
...
double average = students.stream().filter(
student -> student.getGrades().stream().anyMatch(
grade -> grade.getCourseName().equals(courseName)
)
).mapToDouble(
student -> student.getGrades().stream().filter(
grade -> grade.getCourseName().equals(courseName)
).findFirst().get().getValue()
).average().orElse(0);
System.out.println(courseName + " average: " + average);

Try this:
double average = bList.stream()
.flatMap(b -> b.aList.stream())
.filter(a -> a.key.equals(k))
.mapToDouble(a -> a.value)
.average()
.orElse(Double.NaN);
If your objects have private fields and getters, which they really should, it would look like this:
double average = bList.stream()
.map(B::getaList)
.flatMap(List::stream)
.filter(a -> a.getKey().equals(k))
.mapToDouble(A::getValue)
.average()
.orElse(Double.NaN);

Try this.
double average = bList.stream()
.map(b -> b.aList.stream()
.filter(a -> a.key.equals(k))
.findFirst())
.filter(a -> a.isPresent())
.mapToDouble(a -> a.get().value)
.average().orElse(0);
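On Java 9 or later, the isPresent()/get() pair can be replaced by flatMap(Optional::stream), which drops the elements without a match in the same step. A minimal sketch, using stand-in A and B classes with constructors added for the example:

```java
import java.util.List;
import java.util.Optional;

public class AverageForKey {
    // Stand-ins for A and B from the question, with constructors added
    static class A {
        final double value;
        final String key;
        A(String key, double value) { this.key = key; this.value = value; }
    }

    static class B {
        final List<A> aList;
        B(List<A> aList) { this.aList = aList; }
    }

    static double averageFor(List<B> bList, String k) {
        return bList.stream()
                // each B holds at most one matching A, so findFirst() is enough
                .map(b -> b.aList.stream().filter(a -> a.key.equals(k)).findFirst())
                // Optional::stream (Java 9+) drops empty results and unwraps present ones
                .flatMap(Optional::stream)
                .mapToDouble(a -> a.value)
                .average()
                .orElse(0);
    }

    public static void main(String[] args) {
        List<B> bList = List.of(
                new B(List.of(new A("x", 2), new A("y", 10))),
                new B(List.of(new A("x", 4))),
                new B(List.of(new A("y", 1))));
        System.out.println(averageFor(bList, "x")); // prints 3.0
    }
}
```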

Use of stream, filter and average on list and jdk8

I have a list of data that looks like this:
{id, datastring}
{1,"a:1|b:2|d:3"}
{2,"a:2|c:2|c:4"}
{3,"a:2|bb:2|a:3"}
{4,"a:3|e:2|ff:3"}
I need to perform operations such as computing averages, or finding all ids for which an element in the string is less than a certain value.
Here are some examples:
Averages
{a,2}{b,2}{bb,2}{c,3}{d,3}{e,2}{ff,3}
Find all id's where c<4
{2}
Find all id's where a<3
{1,2,3}
Would this be a good use of stream() and filter()?

Yes, you can use stream operations to achieve that, but I would suggest creating a class for this data so that each row corresponds to one instance. That will make your life easier, IMO.
class Data {
private int id;
private Map<String, List<Integer>> map;
....
}
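To fill such a Data class from the raw strings, one possible parsing step is sketched below. It assumes the "key:value|key:value" format shown in the question, and the parse method is hypothetical; repeated keys, such as c in "a:2|c:2|c:4", end up in the same list:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DataParser {
    // Parses a string such as "a:2|c:2|c:4" into a=[2], c=[2, 4]
    static Map<String, List<Integer>> parse(String dataString) {
        return Arrays.stream(dataString.split("\\|"))   // "|" is a regex metacharacter, so it is escaped
                .map(pair -> pair.split(":"))            // e.g. ["c", "2"]
                .collect(Collectors.groupingBy(
                        parts -> parts[0],
                        Collectors.mapping(parts -> Integer.parseInt(parts[1]), Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(parse("a:2|c:2|c:4"));
    }
}
```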
That said, let's look at how you could implement this. First, the "find all" implementation:
public static Set<Integer> ids(List<Data> list, String value, Predicate<Integer> boundPredicate) {
return list.stream()
.filter(d -> d.getMap().containsKey(value))
.filter(d -> d.getMap().get(value).stream().anyMatch(boundPredicate))
.map(d -> d.getId())
.collect(toSet());
}
This one is simple to read. You get a Stream<Data> from the list. Then you apply filters so that you only keep instances whose map contains the given key and has a value that satisfies the given predicate. Then you map each instance to its id and collect the resulting stream into a Set.
Example of call:
Set<Integer> set = ids(list, "a", value -> value < 3);
which outputs:
[1, 2, 3]
The average request was a bit trickier. With my implementation you end up with a Map<String, IntSummaryStatistics>, which contains the average but also other information.
Map<String, IntSummaryStatistics> stats = list.stream()
.flatMap(d -> d.getMap().entrySet().stream())
.collect(toMap(Map.Entry::getKey,
e -> e.getValue().stream().mapToInt(i -> i).summaryStatistics(),
(i1, i2) -> {i1.combine(i2); return i1;}));
You first get a Stream<Data>, then flatMap the entry set of each map to get a Stream<Entry<String, List<Integer>>>. You then collect this stream into a map where each key is the entry's key and each List<Integer> is mapped to its IntSummaryStatistics. If two entries have the same key, their IntSummaryStatistics values are combined.
Given your data set, you get the following Map<String, IntSummaryStatistics>:
ff => IntSummaryStatistics{count=1, sum=3, min=3, average=3.000000, max=3}
bb => IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2}
a => IntSummaryStatistics{count=5, sum=11, min=1, average=2.200000, max=3}
b => IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2}
c => IntSummaryStatistics{count=2, sum=6, min=2, average=3.000000, max=4}
d => IntSummaryStatistics{count=1, sum=3, min=3, average=3.000000, max=3}
e => IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2}
from which you can easily grab the average.
Here's a full working example, the implementation can certainly be improved though.
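A variant of the averaging step: assuming each row has already been parsed into a Map<String, List<Integer>> like the map field of the Data class, Collectors.summarizingInt builds the IntSummaryStatistics per key without a hand-written merge function. Map.entry requires Java 9 or later, and the class and method names are hypothetical:

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyStats {
    // Each map is one parsed row, e.g. a=[2], c=[2, 4] for "a:2|c:2|c:4"
    static Map<String, IntSummaryStatistics> statsByKey(List<Map<String, List<Integer>>> rows) {
        return rows.stream()
                .flatMap(row -> row.entrySet().stream())
                // turn (c, [2, 4]) into (c, 2) and (c, 4)
                .flatMap(e -> e.getValue().stream().map(v -> Map.entry(e.getKey(), v)))
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.summarizingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // the four rows from the question, already parsed
        List<Map<String, List<Integer>>> rows = List.of(
                Map.of("a", List.of(1), "b", List.of(2), "d", List.of(3)),
                Map.of("a", List.of(2), "c", List.of(2, 4)),
                Map.of("a", List.of(2, 3), "bb", List.of(2)),
                Map.of("a", List.of(3), "e", List.of(2), "ff", List.of(3)));
        statsByKey(rows).forEach((key, stats) -> System.out.println(key + " -> " + stats.getAverage()));
    }
}
```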

I know that you already have your answer, but here are my versions too:
Map<String, Double> result = list.stream()
.map(Data::getElements)
.flatMap((Multimap<String, Integer> map) -> {
return map.entries().stream();
})
.collect(Collectors.groupingBy(Map.Entry::getKey,
Collectors.averagingInt((Entry<String, Integer> token) -> {
return token.getValue();
})));
System.out.println(result);
List<Integer> result2 = list.stream()
.filter((Data data) -> {
return data.getElements().get("c").stream().anyMatch(i -> i < 4);
})
.map(Data::getId)
.collect(Collectors.toList());
System.out.println(result2);
