Select sum(paidAmount), count(paidAmount), classificationName
From tableA
Group by classificationName;
How can I do this in Java 8 using streams and collectors?
Java 8:
lineItemList.stream()
    .collect(Collectors.groupingBy(Bucket::getBucketName,
        Collectors.reducing(BigDecimal.ZERO,
            Bucket::getPaidAmount,
            BigDecimal::add)))
This gives me sum and group by. But how can I also get count on the group name ?
Expectation is :
100, 2, classname1
50, 1, classname2
150, 3, classname3
Using an extended version of the Statistics class of this answer,
class Statistics {
    int count;
    BigDecimal sum;

    Statistics(Bucket bucket) {
        count = 1;
        sum = bucket.getPaidAmount();
    }

    Statistics() {
        count = 0;
        sum = BigDecimal.ZERO;
    }

    void add(Bucket b) {
        count++;
        sum = sum.add(b.getPaidAmount());
    }

    Statistics merge(Statistics another) {
        count += another.count;
        sum = sum.add(another.sum);
        return this;
    }
}
you can use it in a Stream operation like
Map<String, Statistics> map = lineItemList.stream()
    .collect(Collectors.groupingBy(Bucket::getBucketName,
        Collector.of(Statistics::new, Statistics::add, Statistics::merge)));
this may have a small performance advantage, as it only creates one Statistics instance per group for a sequential evaluation. It even supports parallel evaluation, but you’d need a very large list with sufficiently large groups to get a benefit from parallel evaluation.
For a sequential evaluation, the operation is equivalent to
lineItemList.forEach(b ->
map.computeIfAbsent(b.getBucketName(), x -> new Statistics()).add(b));
whereas merging partial results after a parallel evaluation works closer to the example already given in the linked answer, i.e.
secondMap.forEach((key, value) -> firstMap.merge(key, value, Statistics::merge));
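For completeness, here is a self-contained sketch of the custom collector in action. The Bucket record below is a hypothetical stand-in, since the real class isn't shown; only getBucketName() and getPaidAmount() are assumed.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class StatisticsDemo {
    // Hypothetical stand-in for the real Bucket class
    record Bucket(String bucketName, BigDecimal paidAmount) {
        String getBucketName() { return bucketName; }
        BigDecimal getPaidAmount() { return paidAmount; }
    }

    static class Statistics {
        int count;
        BigDecimal sum = BigDecimal.ZERO;

        void add(Bucket b) {
            count++;
            sum = sum.add(b.getPaidAmount());
        }

        Statistics merge(Statistics another) {
            count += another.count;
            sum = sum.add(another.sum);
            return this;
        }
    }

    public static void main(String[] args) {
        List<Bucket> lineItemList = List.of(
                new Bucket("classname1", new BigDecimal("60")),
                new Bucket("classname1", new BigDecimal("40")),
                new Bucket("classname2", new BigDecimal("50")));

        Map<String, Statistics> map = lineItemList.stream()
                .collect(Collectors.groupingBy(Bucket::getBucketName,
                        Collector.of(Statistics::new, Statistics::add, Statistics::merge)));

        // prints the "sum, count, name" triples from the question's expectation
        map.forEach((name, s) -> System.out.println(s.sum + ", " + s.count + ", " + name));
    }
}
```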
As you're using BigDecimal for the amounts (which is the correct approach, IMO), you can't make use of Collectors.summarizingDouble, which summarizes count, sum, average, min and max in one pass.
Alexis C. has already shown in his answer one way to do it with streams. Another way would be to write your own collector, as shown in Holger's answer.
Here I'll show another way. First let's create a container class with a helper method. Then, instead of using streams, I'll use common Map operations.
class Statistics {
    int count;
    BigDecimal sum;

    Statistics(Bucket bucket) {
        count = 1;
        sum = bucket.getPaidAmount();
    }

    Statistics merge(Statistics another) {
        count += another.count;
        sum = sum.add(another.sum);
        return this;
    }
}
Now, you can make the grouping as follows:
Map<String, Statistics> result = new HashMap<>();
lineItemList.forEach(b ->
result.merge(b.getBucketName(), new Statistics(b), Statistics::merge));
This works by using the Map.merge method, whose docs say:
If the specified key is not already associated with a value or is associated with null, associates it with the given non-null value. Otherwise, replaces the associated value with the results of the given remapping function.
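As a tiny, self-contained illustration of those Map.merge semantics (here just counting occurrences, with Integer::sum as the remapping function):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        // key absent: the given value (1) is simply stored
        counts.merge("classname1", 1, Integer::sum);
        // key present: the remapping function combines the old and new values (1 + 1)
        counts.merge("classname1", 1, Integer::sum);
        counts.merge("classname2", 1, Integer::sum);
        System.out.println(counts.get("classname1")); // 2
        System.out.println(counts.get("classname2")); // 1
    }
}
```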
You could reduce pairs where the keys would hold the sum and the values would hold the count:
Map<String, SimpleEntry<BigDecimal, Long>> map =
    lineItemList.stream()
        .collect(groupingBy(Bucket::getBucketName,
            reducing(new SimpleEntry<>(BigDecimal.ZERO, 0L),
                b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
                (v1, v2) -> new SimpleEntry<>(v1.getKey().add(v2.getKey()),
                    v1.getValue() + v2.getValue()))));
although Collectors.toMap looks cleaner:
Map<String, SimpleEntry<BigDecimal, Long>> map =
    lineItemList.stream()
        .collect(toMap(Bucket::getBucketName,
            b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
            (v1, v2) -> new SimpleEntry<>(v1.getKey().add(v2.getKey()),
                v1.getValue() + v2.getValue())));
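Here is a runnable sketch of the toMap variant, again using a hypothetical Bucket record as a stand-in for the real class:

```java
import java.math.BigDecimal;
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.toMap;

public class ToMapDemo {
    // Hypothetical stand-in for the real Bucket class
    record Bucket(String bucketName, BigDecimal paidAmount) {
        String getBucketName() { return bucketName; }
        BigDecimal getPaidAmount() { return paidAmount; }
    }

    public static void main(String[] args) {
        List<Bucket> lineItemList = List.of(
                new Bucket("classname1", new BigDecimal("60")),
                new Bucket("classname1", new BigDecimal("40")),
                new Bucket("classname2", new BigDecimal("50")));

        // entry key = running sum, entry value = running count
        Map<String, SimpleEntry<BigDecimal, Long>> map = lineItemList.stream()
                .collect(toMap(Bucket::getBucketName,
                        b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
                        (v1, v2) -> new SimpleEntry<>(
                                v1.getKey().add(v2.getKey()),
                                v1.getValue() + v2.getValue())));

        System.out.println(map.get("classname1")); // 100=2
        System.out.println(map.get("classname2")); // 50=1
    }
}
```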
There are the following entity classes:
@NoArgsConstructor
@AllArgsConstructor
@Data
class Log {
    private String platform;
    private LocalDateTime gmtCreate;
    private Integer enlistCount;
    private Integer dispatcherCount;
    private Integer callbackCount;
}
Now, I have a list of 10,000 Log entities. I want to group the data by each hour of the gmtCreate field, then within each hour group by the platform field, and finally sum the individual values (enlistCount, dispatcherCount, callbackCount) in each group. The result looks like this:
Map<Integer, Map<String, Map<String, Integer>>> result = new HashMap<>();
/*
{
"23": {
"platform1": {
"callbackTotal": 66,
"dispatcherTotal": 77,
"enlistTotal": 33
},
"platform2": {
"callbackTotal": 13,
"dispatcherTotal": 5,
"enlistTotal": 64
}
},
"24": {
"platform2": {
"callbackTotal": 64,
"dispatcherTotal": 47,
"enlistTotal": 98
},
"platform7": {
"callbackTotal": 0,
"dispatcherTotal": 3,
"enlistTotal": 21
}
}
}
*/
The way I can think of is to use the stream to traverse and group multiple times, but I am worried that the efficiency is very low. Is there any efficient way to do it?
You can do it all in one stream by using groupingBy with a downstream collector and a mutable reduction with collect. You need a helper class to sum the values:
public class Total {
    private int enlistCount = 0;
    private int dispatcherCount = 0;
    private int callbackCount = 0;

    public Total addLog(Log log) {
        this.enlistCount += log.getEnlistCount();
        this.dispatcherCount += log.getDispatcherCount();
        this.callbackCount += log.getCallbackCount();
        return this;
    }

    public Total add(Total that) {
        this.enlistCount += that.enlistCount;
        this.dispatcherCount += that.dispatcherCount;
        this.callbackCount += that.callbackCount;
        return this;
    }

    public Map<String, Integer> toMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("enlistTotal", enlistCount);
        map.put("dispatcherTotal", dispatcherCount);
        map.put("callbackTotal", callbackCount);
        return map;
    }

    @Override
    public String toString() {
        return String.format("%d %d %d", enlistCount, dispatcherCount, callbackCount);
    }
}
Then the stream, grouping, and collection looks like this:
logs.stream().collect(Collectors.groupingBy(log -> log.getGmtCreate().getHour(),
    Collectors.groupingBy(Log::getPlatform,
        Collector.of(Total::new, Total::addLog, Total::add, Total::toMap))))
To break that down, there's a groupingBy on the hour of the day, and inside that a groupingBy on the platform, matching the Map<Integer, Map<String, Map<String, Integer>>> shape you want. Then all the log entries are summed by a collector which does a mutable reduction:
Collector.of(Total::new, Total::addLog, Total::add, Total::toMap)
This collector uses a supplier that provides a new Total with zeros for the counts, an accumulator that adds each Log's counts to the Total, a combiner that knows how to sum two Totals (this would only be used in a parallel scenario), and finally a finishing function that transforms the Total object to a Map<String, Integer>.
I would use a loop and make use of Map.computeIfAbsent and Map.merge. computeIfAbsent checks whether an entry exists for the given key: if so, it returns that entry's value; otherwise it creates an entry from the supplied key and value and returns the new value. In this case the value is another map, so the first call adds the hour and returns the map just created for it; a second computeIfAbsent can then be chained to repeat the process and get the platform map.
Map.merge is similar, except that it takes a key, a value, and a merge function. If no value is present for the key, the supplied value is used; otherwise the merge function combines the new value with the existing one. Here the desired behavior is addition, so Integer::sum is used.
I ran this for 100_000 copies of the supplied test data and it took about a second to run.
Here is some data
LocalDateTime ldt = LocalDateTime.now();
List<Log> list = new ArrayList<>(List.of(
new Log("platform1", ldt.plusHours(1), 66, 77, 33),
new Log("platform1", ldt.plusHours(1), 66, 77, 33),
new Log("platform2", ldt.plusHours(1), 13, 5, 64),
new Log("platform2", ldt.plusHours(2), 64, 47, 98),
new Log("platform7", ldt.plusHours(2), 0, 3, 21),
new Log("platform7", ldt.plusHours(2), 10, 15, 44)));
And here is the process. First, create a Map to hold the results:
Map<Integer, Map<String, Map<String, Integer>>> result =
new HashMap<>();
Now loop through the list of logs, creating the maps as you need them and summing the values as you go.
for (Log log : list) {
    Map<String, Integer> platformMap = result
            .computeIfAbsent(log.getGmtCreate().getHour(), v -> new HashMap<>())
            .computeIfAbsent(log.getPlatform(), v -> new HashMap<>());
    platformMap.merge("callbackTotal", log.getCallbackCount(), Integer::sum);
    platformMap.merge("dispatcherTotal", log.getDispatcherCount(), Integer::sum);
    platformMap.merge("enlistTotal", log.getEnlistCount(), Integer::sum);
}
Print the results.
result.forEach((k, v) -> {
System.out.println(k);
v.forEach((kk, vv) -> {
System.out.println(" " + kk);
vv.entrySet().forEach(
e -> System.out.println(" " + e));
});
});
prints
12
platform2
callbackTotal=64
enlistTotal=13
dispatcherTotal=5
platform1
callbackTotal=66
enlistTotal=132
dispatcherTotal=154
13
platform2
callbackTotal=98
enlistTotal=64
dispatcherTotal=47
platform7
callbackTotal=65
enlistTotal=10
dispatcherTotal=18
I have 2 lists: one holds values I get from my database, and the other I get as input.
I have to multiply the productQuantity value (HerdList) by the proporcionValue (HerdCompositionList) that matches on the primary key (productCode, ageRangeCode), and sum the results.
Below is the code with the for loop. How do I do it with streams?
Herd (Input)
int productCode;
int ageRangeCode;
int productQuantity;
HerdComposition (DB)
Integer productCode;
Integer ageRangeCode;
BigDecimal proporcionValue;
for (Herd informedItem : prodution.getHerdList()) {
    for (HerdComposition herdItem : herdCompositionList) {
        if (herdItem.getProductCode() == informedItem.getProductCode()
                && herdItem.getAgeRangeCode() == informedItem.getAgeRangeCode()) {
            totalHerd = totalHerd.add(herdItem.getProporcionValue()
                    .multiply(new BigDecimal(informedItem.getProductQuantity())));
        }
    }
}
Since you've mentioned the primary key as (productCode, ageRangeCode), you can create a quantity lookup map from the input list:
Map<List<?>, Integer> productQuantityLookUp = herdList.stream()
.collect(Collectors.toMap(
h -> Arrays.asList(h.getProductCode(), h.getAgeRangeCode()),
Herd::getProductQuantity));
further, iterate over the database list and reduce it to get the total
BigDecimal totalHerd = herdCompositionList.stream()
.map(hc -> new BigDecimal(productQuantityLookUp
.getOrDefault(Arrays.asList(hc.getProductCode(), hc.getAgeRangeCode()), 0))
.multiply(hc.getProporcionValue()))
.reduce(BigDecimal.ZERO, BigDecimal::add);
One of the benefits of this approach (not specifically streams) is that the complexity drops from O(N*M) to O(N+M): building the intermediate map is O(N), and each of the M database rows then does an O(1) lookup.
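Here is a self-contained sketch of the two steps, with minimal stand-in records for Herd and HerdComposition (the real classes and field names may differ):

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HerdTotalDemo {
    // Hypothetical stand-ins for the real entity classes
    record Herd(int productCode, int ageRangeCode, int productQuantity) {}
    record HerdComposition(Integer productCode, Integer ageRangeCode, BigDecimal proporcionValue) {}

    public static void main(String[] args) {
        List<Herd> herdList = List.of(new Herd(1, 1, 10), new Herd(2, 1, 5));
        List<HerdComposition> herdCompositionList = List.of(
                new HerdComposition(1, 1, new BigDecimal("2")),
                new HerdComposition(2, 1, new BigDecimal("3")),
                new HerdComposition(9, 9, new BigDecimal("100"))); // no match, contributes 0

        // O(N): quantity lookup map keyed by the (productCode, ageRangeCode) pair
        Map<List<Integer>, Integer> quantityLookup = herdList.stream()
                .collect(Collectors.toMap(
                        h -> List.of(h.productCode(), h.ageRangeCode()),
                        Herd::productQuantity));

        // O(M): one pass over the DB list with O(1) lookups
        BigDecimal totalHerd = herdCompositionList.stream()
                .map(hc -> new BigDecimal(quantityLookup.getOrDefault(
                        List.of(hc.productCode(), hc.ageRangeCode()), 0))
                        .multiply(hc.proporcionValue()))
                .reduce(BigDecimal.ZERO, BigDecimal::add);

        System.out.println(totalHerd); // 10*2 + 5*3 = 35
    }
}
```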
Say I have the following items (unsorted):
A, with A.amount = 10
B, with B.amount = 100
C, with C.amount = 50
D, with D.amount = 50
Now for every unique amount boundary AB in items, find the items whose range include the value and calculate cumulative bracket. So:
AB=10 results in { A, B, C, D } -> cumulative bracket 210
AB=50 results in { B, C, D } -> cumulative bracket 200
AB=100 results in { B } -> cumulative bracket 100
It would be used like so:
for (int AB : collectAmountBoundaries(items)) {
Collection<Item> itemsInBracket = findItemsForAB(items, AB);
// execute logic, calculations etc with cumulative bracket value for AB
}
Now I can code all this using vanilla Java, by first manually transforming the collection of items into a map of AB→cumulativeBracketValue or something. However, since I'm working with ranges and overlap-logic I feel somehow a clean solution involving NavigableMap, Range logic or something clever should be possible (it feels like a common pattern). Or perhaps using streams to do a collect groupingBy?
I'm not seeing it right now. Any ideas on how to tackle this cleanly?
I think doing a simple filter, then adding the filtered items to a list and their amounts to a running total, is sufficient.
static ListAndCumulativeAmount getCR(List<Item> items, double amount) {
    ListAndCumulativeAmount result = new ListAndCumulativeAmount();
    items.stream().filter(item -> item.amount >= amount).forEach(i -> {
        result.getItems().add(i.name);
        result.add(i.amount);
    });
    return result;
}

static class ListAndCumulativeAmount {
    private final List<String> items = new ArrayList<>();
    private double amount = 0.0;

    public List<String> getItems() {
        return items;
    }

    public void add(double value) {
        amount = amount + value;
    }

    public double getAmount() {
        return amount;
    }
}
This is a way to do it with streams and groupingBy:
Map<Integer, SimpleEntry<List<Item>, Double>> groupedByBracketBoundary = items.stream()
    .collect(groupingBy(o -> o.getAmount())).entrySet().stream()
    // map map-values to map-entries of original value and sum, keeping the key the same
    .collect(toMap(Entry::getKey, e -> new SimpleEntry<>(e.getValue(),
        e.getValue().stream()
            .map(o -> (double) o.getAmount())
            .reduce(0d, (amount1, amount2) -> amount1 + amount2))));

LinkedHashSet<Integer> sortedUniqueAmountBoundaries = items.stream()
    .map(o -> o.getAmount())
    .sorted()
    .collect(Collectors.toCollection(LinkedHashSet::new));

for (int ab : sortedUniqueAmountBoundaries) {
    List<Item> itemsInBracket = groupedByBracketBoundary.get(ab).getKey();
    double cumulativeAmountForBracket = groupedByBracketBoundary.get(ab).getValue();
    // execute logic, calculations etc. with the cumulative bracket value for AB
}
Somehow this feels succinct and verbose at the same time, it's rather dense. Isn't there a JDK api or 3rd party library that does this kind of thing?
I have to create a method that gives the 10 Taxpayers that spent the most in the entire system.
There's a lot of classes already created and code that would have to be in between but what I need is something like:
public TreeSet<Taxpayer> getTenTaxpayers() {
    TreeSet<Taxpayer> taxp = new TreeSet<Taxpayer>();
    ...
    for (Taxpayer t : this.taxpayers.values()) { // going through the Map<String, Taxpayer>
        for (Invoice i : this.invoices.values()) { // going through the Map<String, Invoice>
            if (taxp.size() <= 10) {
                if (t.getTIN().equals(i.getTIN())) { // if the TIN on the taxpayer matches the one on the Invoice
                    ...
                }
            }
        }
    }
    return taxp;
}
To sum it up, I have to go through a Map<String, Taxpayer> which has for example 100 Taxpayers, then go through a Map<String, Invoice> for each respective invoice and return a new Collection holding the 10 Taxpayers that spent the most on the entire system based on 1 attribute on the Invoice Class. My problem is how do I get those 10, and how do I keep it sorted. My first look at it was to use a TreeSet with a Comparator but the problem is the TreeSet would be with the class Taxpayer while what we need to compare is an attribute on the class Invoice.
Is this a classic Top-K problem? Maybe you can use java.util.PriorityQueue to build a min-heap and get the top 10 Taxpayers.
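A sketch of that idea, assuming the per-taxpayer totals have already been aggregated into a map (TIN -> total spent); the min-heap keeps at most k candidates, evicting the smallest each time it overflows:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKDemo {
    // Hypothetical: totals already summed per taxpayer TIN
    public static List<String> topK(Map<String, Double> totalByTin, int k) {
        // min-heap ordered by amount spent; the root is always the
        // smallest of the current top-k candidates
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Double> e : totalByTin.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the smallest candidate
            }
        }
        // drain the heap (smallest first), then reverse so the biggest spender comes first
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> totals = Map.of(
                "TIN-A", 300.0, "TIN-B", 50.0, "TIN-C", 700.0, "TIN-D", 120.0);
        System.out.println(topK(totals, 2)); // [TIN-C, TIN-A]
    }
}
```

This runs in O(N log k) rather than the O(N log N) of a full sort, which matters mostly when N is large and k is small.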
This can be broken down into 3 steps:
Extract distinct TaxPayers
Extract Invoices for each payer and then sum amount
Sort by the paid amount and limit to the first 10
If you are using java-8 you can do something like:
final Map<TaxPayer, Double> topTenMap = payersMap.values() // get values from the map
.stream() // create java.util.Stream
.distinct() // do not process duplicates (TaxPayer must provide a standard-compliant equals method)
.map(taxPayer -> {
final double totalAmount = invoicesMap
.values() // get values from the invoices map
.stream() // create Stream
.filter(invoice -> invoice.getTIN().equals(taxPayer.getTIN())) // get only those for the current TaxPayer
.mapToDouble(Invoice::getAmount) // get amount
.sum(); // sum amount
return new AbstractMap.SimpleEntry<>(taxPayer, totalAmount); // create Map.Entry
})
.sorted((entry1, entry2) -> // sort by total amount, descending, so the biggest spenders come first
    Double.compare(entry2.getValue(), entry1.getValue()))
.limit(10) // keep only the top ten payers
.collect(Collectors.toMap( // save to a map that preserves encounter order
    AbstractMap.SimpleEntry::getKey,
    AbstractMap.SimpleEntry::getValue,
    (a, b) -> a, // no key collisions expected; keep the first
    LinkedHashMap::new));
Surely there is a more elegant solution. Also, I haven't tested it because I don't have much time now.
I want to use a Java 8 lambda expression in the following scenario, but I get Local variable fooCount defined in an enclosing scope must be final or effectively final. I understand what the error message says, but I need to calculate a percentage here, so I need to increment fooCount and barCount and then compute the percentage. What's the way to achieve this?
// key is a String with values like "FOO;SomethingElse" and value is Long
final Map<String, Long> map = null;
....
private int calculateFooPercentage() {
    long fooCount = 0L;
    long barCount = 0L;
    map.forEach((k, v) -> {
        if (k.contains("FOO")) {
            fooCount++; // compile error: fooCount must be effectively final
        } else {
            barCount++;
        }
    });
    final int fooPercentage = 0;
    // Rest of the logic to calculate the percentage
    ....
    return fooPercentage;
}
One option is to use AtomicLong here instead of long, but I would like to avoid it, because I may later want to use a parallel stream here if possible.
There is a count method on Stream that does the counting for you.
long fooCount = map.keySet().stream().filter(k -> k.contains("FOO")).count();
long barCount = map.size() - fooCount;
If you want parallelisation, change .stream() to .parallelStream().
Alternatively, if you were trying to increment a variable manually, and use stream parallelisation, then you would want to use something like AtomicLong for thread safety. A simple variable, even if the compiler allowed it, would not be thread-safe.
To get both numbers, matching and non-matching elements, you can use
Map<Boolean, Long> result = map.keySet().stream()
.collect(Collectors.partitioningBy(k -> k.contains("FOO"), Collectors.counting()));
long fooCount = result.get(true);
long barCount = result.get(false);
But since your source is a Map, which knows its total size, and you want to calculate a percentage, for which barCount is not needed, this specific task can be solved as
private int calculateFooPercentage() {
return (int)(map.keySet().stream().filter(k -> k.contains("FOO")).count()
*100/map.size());
}
Both variants are thread safe, i.e. changing stream() to parallelStream() will perform the operation in parallel, however, it’s unlikely that this operation will benefit from parallel processing. You would need humongous key strings or maps to get a benefit…
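Putting both variants together in a runnable sketch with a small sample map (keys shaped like the question's "FOO;SomethingElse"):

```java
import java.util.Map;
import java.util.stream.Collectors;

public class FooPercentageDemo {
    public static void main(String[] args) {
        Map<String, Long> map = Map.of(
                "FOO;a", 1L, "FOO;b", 2L, "BAR;c", 3L, "BAR;d", 4L);

        // partitioningBy gives both counts in one pass over the keys
        Map<Boolean, Long> result = map.keySet().stream()
                .collect(Collectors.partitioningBy(k -> k.contains("FOO"),
                        Collectors.counting()));
        long fooCount = result.get(true);   // 2
        long barCount = result.get(false);  // 2

        // the percentage only needs fooCount and the map's size
        int fooPercentage = (int) (fooCount * 100 / map.size()); // 50
        System.out.println(fooCount + " " + barCount + " " + fooPercentage);
    }
}
```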
I agree with the other answers indicating you should use count or partitioningBy.
Just to explain the atomicity problem with an example, consider the following code:
private static AtomicInteger i1 = new AtomicInteger(0);
private static int i2 = 0;

public static void main(String[] args) {
    IntStream.range(0, 100000).parallel().forEach(n -> i1.incrementAndGet());
    System.out.println(i1);
    IntStream.range(0, 100000).parallel().forEach(n -> i2++);
    System.out.println(i2);
}
This prints the expected result of 100000 for i1, but an indeterminate number less than that (between 50000 and 80000 in my test runs) for i2. The reason should be pretty obvious: i2++ is not atomic, so concurrent read-modify-write updates are lost.