I have a stream of data as shown below and I wish to collect the data based on a condition.
Stream of data:
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L120;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L121;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L126;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L122;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
I wish to collect the data based on index 2 (L100, L121, ...) and store it in different lists for L100, L120, L121, L122, etc., using Java 8 streams. Any suggestions?
Note: the splittedLine array below is my stream of data.
For instance: I have tried the following but I think there's a shorter way:
List<String> L100_ENTITY_NAMES = Arrays.asList("L100", "L120", "L121", "L122", "L126");
List<List<String>> list = L100_ENTITY_NAMES.stream()
        .map(entity -> Arrays.stream(splittedLine)
                .filter(line -> {
                    String[] values = line.split(String.valueOf(DELIMITER));
                    if (values.length > 2) {
                        return entity.equals(values[2]);
                    } else {
                        return false;
                    }
                })
                .collect(Collectors.toList()))
        .collect(Collectors.toList());
I'd rather change the order and also collect the data into a Map<String, List<String>> where the key would be the entity name.
Assuming splittedLine is the array of lines, I'd probably do something like this:
Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);
Map<String, List<String>> result =
        Arrays.stream(splittedLine)
              .map(line -> {
                  String[] values = line.split(delimiter);
                  if (values.length < 3) {
                      return null;
                  }
                  return new AbstractMap.SimpleEntry<>(values[2], line);
              })
              .filter(Objects::nonNull)
              .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
              .collect(Collectors.groupingBy(Map.Entry::getKey,
                      Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
Note that this isn't necessarily shorter but has a couple of other advantages:
It's O(n) rather than O(n*m), since Set.contains runs in (amortized) constant time, so it should be faster for non-trivial stream sizes
You get an entity name for each list rather than having to rely on the indices in both lists
It's easier to understand because you use distinct steps:
split and map the line
filter null values, i.e. lines that aren't valid in the first place
filter lines that don't have any of the L100 entity names
collect the filtered lines by entity name so you can easily access the sub lists
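For example, once the map is built, each sub-list is reachable by entity name (a small usage sketch, assuming the result map from above):
// Access one group directly, or iterate over all of them.
List<String> l120Lines = result.getOrDefault("L120", List.of());
result.forEach((entity, lines) ->
        System.out.println(entity + " -> " + lines.size() + " line(s)"));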
I would convert the semicolon-delimited lines to objects as soon as possible, instead of keeping them around as a serialized bunch of data.
First, I would create a record modelling our data:
public record LBasedEntity(long id, int zero, String lcode, …) { }
Then, create a method to parse the line. This could just as well be an external parsing library, since this looks like CSV with a semicolon as the delimiter.
private static LBasedEntity parse(String line) {
String[] parts = line.split(";");
if (parts.length < 3) {
return null;
}
long id = Long.parseLong(parts[0]);
int zero = Integer.parseInt(parts[1]);
String lcode = parts[2];
…
return new LBasedEntity(id, zero, lcode, …);
}
Then the mapping is trivial:
Map<String, List<LBasedEntity>> result = Arrays.stream(lines)
.map(line -> parse(line))
.filter(Objects::nonNull)
.filter(lBasedEntity -> L100_ENTITY_NAMES.contains(lBasedEntity.lcode()))
.collect(Collectors.groupingBy(LBasedEntity::lcode));
map(line -> parse(line)) parses the line into an LBasedEntity object (or whatever you call it);
filter(Objects::nonNull) filters out all null values produced by the parse method;
The next filter keeps all entities whose lcode property is contained in the L100_ENTITY_NAMES list (I would turn this into a Set, to speed things up; see the sketch below);
Finally, a Map is built with key-value pairs of L100_ENTITY_NAME → List<LBasedEntity>.
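A minimal sketch of the Set conversion suggested above, assuming L100_ENTITY_NAMES currently is a List<String>:
// Set.copyOf yields an immutable set with O(1) contains(),
// versus O(n) contains() on a List.
Set<String> entityNames = Set.copyOf(L100_ENTITY_NAMES);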
You're effectively asking for what languages like Scala provide on collections: groupBy. In Scala you could write:
splitLines.groupBy(_(2)) // Map[String, List[Array[String]]]
Of course, you want this in Java, and in my opinion, not using streams here makes sense due to Java's lack of a fold or groupBy function.
HashMap<String, List<String[]>> map = new HashMap<>();
for (String[] line : splitLines) {
    if (line.length < 3) continue;
    List<String[]> xs = map.getOrDefault(line[2], new ArrayList<>());
    xs.add(line);
    map.put(line[2], xs);
}
As you can see, it's very easy to understand, and actually shorter than the stream based solution.
I'm leveraging two key methods on a HashMap.
The first is getOrDefault; basically, if the value associated with our key doesn't exist, we can provide a default. In our case, an empty ArrayList.
The second is put, which actually acts like a putOrReplace because it lets us override the previous value associated with the key.
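Both steps can also be collapsed into a single call with computeIfAbsent, which creates and inserts the empty list only when the key is missing, and returns the stored list either way. A sketch equivalent to the loop above:
// Equivalent loop body using computeIfAbsent.
for (String[] line : splitLines) {
    if (line.length < 3) continue;
    map.computeIfAbsent(line[2], k -> new ArrayList<>()).add(line);
}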
I hope that was helpful. :)
You're asking for a shorter way to achieve the same result; actually, your code is good. I guess the only part that makes it look lengthy is the if/else check in the stream.
if (values.length > 2) {
    return entity.equals(values[2]);
} else {
    return false;
}
I would suggest introducing two tiny private methods to improve the readability, like this:
List<List<String>> list = L100_ENTITY_NAMES.stream()
        .map(entity -> getLinesByEntity(splittedLine, entity))
        .collect(Collectors.toList());

private List<String> getLinesByEntity(String[] splittedLine, String entity) {
    return Arrays.stream(splittedLine)
            .filter(line -> isLineMatched(entity, line))
            .collect(Collectors.toList());
}

private boolean isLineMatched(String entity, String line) {
    String[] values = line.split(String.valueOf(DELIMITER));
    return values.length > 2 && entity.equals(values[2]);
}
I am trying to remove duplicates from a list of Student objects based on multiple properties while preserving the order. As shown below, I have a list of Student objects where multiple students share the same name with varying attendence. I need to remove the duplicate students that have the same name and a studentAttendence of 100, while preserving the order.
Student{studentId=1, studentName='Sam', studentAttendence=100, studentAddress='New York'}
Student{studentId=2, studentName='Sam', studentAttendence=50, studentAddress='New York'}
Student{studentId=3, studentName='Sam', studentAttendence=60, studentAddress='New York'}
Student{studentId=4, studentName='Nathan', studentAttendence=40, studentAddress='LA'}
Student{studentId=5, studentName='Ronan', studentAttendence=100, studentAddress='Atlanta'}
Student{studentId=6, studentName='Nathan', studentAttendence=100, studentAddress='LA'}
Desired output after removing the duplicates:
Student{studentId=2, studentName='Sam', studentAttendence=50, studentAddress='New York'}
Student{studentId=3, studentName='Sam', studentAttendence=60, studentAddress='New York'}
Student{studentId=4, studentName='Nathan', studentAttendence=40, studentAddress='LA'}
Student{studentId=5, studentName='Ronan', studentAttendence=100, studentAddress='Atlanta'}
What I have right now only removes duplicates based on the name, does not consider the percentage (100), and does not preserve the order. Any help is greatly appreciated. (studentsSupplier is a simple supplier of a list of students.)
studentsSupplier.get().stream()
.sorted(Comparator.comparing(Student::getStudentName))
.collect(Collectors.collectingAndThen(
Collectors.toCollection(
() -> new TreeSet<>(Comparator.comparing(Student::getStudentName))), ArrayList::new));
Note: Only duplicate records with a matching studentName and a percentage of 100 must be removed. (The record for Ronan has percentage 100, but there is no duplicate with the same studentName, so it must not be removed.)
If you want to preserve order, obviously don't call .sorted, which messes with order.
More generally, using streams here is complicated. Streams work best if the operations you perform on each element are independent (they do not need to look at anything except the one element being considered, i.e. no need to look at neighbours). That's not the case here.
If it is correct to remove any student with an attendence of 100 (by the way, that's a typo, the proper word is attendance), then all this stuff about 'duplicates' is a red herring, and all you need is:
list.removeIf(s -> s.getStudentAttendence() >= 100);
But if the idea is: Remove a record only if its attendence is 100+, and there is at least one other record in the list with the same name, it gets more complicated.
The primary issue is that your data storage mechanism is not in an appropriate form for this job; if you stop trying to do it all with lambdas, this is not hard. It helps to think of your list as consisting of 100 million entries. It is obviously not feasible for the entire stream op to keep the names of 100 million entries 'in memory'; you don't have that much memory. The data structure (a List) also does not offer any fast lookups: there is no way to answer the question 'how many records with studentName Sam are in this list?' without looping through all 100 million entries, which is a non-trivial job.
Thus, given the limits of:
The input data is in List form.
The input data is not already sorted.
The output must keep the same order as the input.
Then the job is impossible on its face!
So, instead you need to accept that it isn't an easy one-liner, and that you need to first make alternative versions of the same data store that do store what you need.
Then there are additional concerns. In particular, what happens if you have 3 Sam students and each record has studentAttendence = 100? Should they ALL be deleted? Should none be deleted? Delete 2 arbitrary ones?
Often if you're having trouble writing an algorithm, the actual problem is that you haven't fully specified the behaviour you want, and thus your floundering is mostly down to you not fully understanding the problem, more than it is a coding issue.
Let's say the rule is simply: Delete all students with attendence = 100, but only if there is a record with the same name with an attendence below 100. If all records have an attendence of 100, keep them all, then:
List<Student> students = ...;
Set<String> dupeNames = students.stream()
    .filter(s -> s.getStudentAttendence() < 100)
    .map(Student::getStudentName)
    .collect(Collectors.toSet());
students.removeIf(s -> s.getStudentAttendence() >= 100 && dupeNames.contains(s.getStudentName()));
Will do the job, and will do it quickly: on the sample data, it removes the records with ids 1 and 6, leaving 2, 3, 4 and 5 as desired. It is also O(n), to be algorithmically specific. Building the set of names requires constant-time work per student record, so O(n), and the removeIf call similarly checks each student once with constant-time work per step, because .contains() on a HashSet is constant time (assuming good hash distribution, which Strings usually have). A constant number of O(n) passes means the whole operation is O(n): the time it takes grows linearly with the number of students in your input list, versus solutions that scan the whole list for every single entry, which grow with the square of the input size.
Assuming studentId is unique, as given in the example, you could use the method List.removeIf with a BiPredicate accepting a student and the list of students.
BiPredicate<Student, List<Student>> pred = (stud, list) ->
        stud.getStudentAttendence() == 100
        && list.stream()
               .filter(s -> s.getStudentId() != stud.getStudentId())
               .anyMatch(s -> s.getStudentName().equals(stud.getStudentName()));

students.removeIf(stud -> pred.test(stud, students));
You can filter the records like this:
studentList.stream().filter(s -> s.getStudentAttendence() != 100)
.filter(distinctByKeys(Student::getStudentName, Student::getStudentAttendence))
.collect(Collectors.toList());
distinctByKeys method:
private static <T> Predicate<T> distinctByKeys(Function<? super T, ?>... keyExtractors) {
final Map<List<?>, Boolean> seen = new ConcurrentHashMap<>();
return t -> {
final List<?> keys = Arrays.stream(keyExtractors).map(ke -> ke.apply(t)).collect(Collectors.toList());
return seen.putIfAbsent(keys, Boolean.TRUE) == null;
};
}
After filtering
[Student [studentId=2, studentName=Sam, studentAttendence=50, studentAddress=New York],
 Student [studentId=3, studentName=Sam, studentAttendence=60, studentAddress=New York],
 Student [studentId=4, studentName=Nathan, studentAttendence=40, studentAddress=LA]]
Try this:
Map<String, Integer> checkList = new HashMap<>();

Student[] buffer = studentsSupplier.get()
        .stream()
        .map(student -> {
            // count how many records exist per student name
            checkList.compute(student.getStudentName(), (k, v) -> (v == null) ? 1 : v + 1);
            return student;
        })
        .toArray(Student[]::new);

List<Student> result = Arrays.stream(buffer)
        .filter(student -> checkList.get(student.getStudentName()) == 1
                || student.getStudentAttendence() != 100)
        .collect(Collectors.toList());
This assumes that a record with an attendance of 100 remains in the list if it is the only one for that student.
The nasty version of the same code would look like this, but it is even less understandable:
Map<String, Integer> checkList = new HashMap<>();

List<Student> result = studentsSupplier.get()
        .stream()
        .peek(student -> checkList.compute(student.getStudentName(), (k, v) -> (v == null) ? 1 : v + 1))
        .collect(Collectors.toList())
        .stream()
        .filter(student -> checkList.get(student.getStudentName()) == 1
                || student.getStudentAttendence() != 100)
        .collect(Collectors.toList());
There is a List of objects like:
ID  Employee  IN_COUNT  OUT_COUNT  Date
1   ABC       5         7          2020-06-11
2   ABC       12        5          2020-06-12
3   ABC       9         6          2020-06-13
This is employee data for three dates, which I get from a query into a List object.
Now I want the total IN_COUNT and OUT_COUNT across the three dates. This could be achieved by iterating the stream once for only IN_COUNT and calling sum(), and then summing OUT_COUNT in a second iteration. But I don't want to iterate the list two times.
How is this possible in functional programming, using streams or any other option?
What you are trying to do is called a 'fold' operation in functional programming. Java streams call this 'reduce' and 'sum', 'count', etc. are just specialized reduces/folds. You just have to provide a binary accumulation function. I'm assuming Java Bean style getters and setters and an all args constructor. We just ignore the other fields of the object in our accumulation:
List<MyObj> data = fetchData();
Date d = new Date();
MyObj res = data.stream()
.reduce((a, b) -> {
return new MyObj(0, a.getEmployee(),
a.getInCount() + b.getInCount(), // Accumulate IN_COUNT
a.getOutCount() + b.getOutCount(), // Accumulate OUT_COUNT
d);
})
.orElseThrow();
This is simplified and assumes that you only have one employee in the list, but you can use standard stream operations to partition and group your stream (groupingBy); a sketch follows.
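For instance, if the list can contain several employees, a hedged sketch of grouping per employee with the same accumulator might look like this (reusing the MyObj constructor and the Date d from above):
// One accumulated MyObj per employee.
// Collectors.reducing with only a BinaryOperator yields an Optional per group.
Map<String, Optional<MyObj>> perEmployee = data.stream()
        .collect(Collectors.groupingBy(MyObj::getEmployee,
                Collectors.reducing((a, b) -> new MyObj(0, a.getEmployee(),
                        a.getInCount() + b.getInCount(),
                        a.getOutCount() + b.getOutCount(),
                        d))));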
If you don't want to or can't create a MyObj, you can use a different type as accumulator. I'll use Map.entry, because Java lacks a Pair/Tuple type:
Map.Entry<Integer, Integer> res = l.stream().reduce(
Map.entry(0, 0), // Identity
(sum, x) -> Map.entry(sum.getKey() + x.getInCount(), sum.getValue() + x.getOutCount()), // accumulate
(s1, s2) -> Map.entry(s1.getKey() + s2.getKey(), s1.getValue() + s2.getValue()) // combine
);
What's happening here? We now have a reduce function of the shape (Pair accum, MyObj next) -> Pair. The 'identity' is our start value, the accumulator function adds the next MyObj to the current result, and the last function is only used to combine intermediate results (e.g., if done in parallel).
Too complicated? We can split the steps of extracting interesting properties and accumulating them:
Map.Entry<Integer, Integer> res = l.stream()
.map(x -> Map.entry(x.getInCount(), x.getOutCount()))
.reduce((x, y) -> Map.entry(x.getKey() + y.getKey(), x.getValue() + y.getValue()))
.orElseGet(() -> Map.entry(0, 0));
You can use reduce to do this:
public class Counts{
private int inCount;
private int outCount;
//constructor, getters, setters
}
public static void main(String[] args){
List<Counts> list = new ArrayList<>();
list.add(new Counts(5, 7));
list.add(new Counts(12, 5));
list.add(new Counts(9, 6));
Counts total = list.stream().reduce(
        // the start value, like sum = 0;
        // needed so the objects from the list aren't modified
        new Counts(0, 0),
        (sum, e) -> {
            sum.setInCount(sum.getInCount() + e.getInCount());
            sum.setOutCount(sum.getOutCount() + e.getOutCount());
            return sum;
        }
);
System.out.println(total.getInCount() + " - " + total.getOutCount());
}
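One caveat: mutating the identity object technically violates reduce's contract, which matters if the stream ever runs in parallel. A non-mutating sketch with an explicit combiner, using the same Counts class:
Counts total = list.stream().reduce(
        new Counts(0, 0),
        // accumulate into a fresh object instead of mutating
        (sum, e) -> new Counts(sum.getInCount() + e.getInCount(),
                sum.getOutCount() + e.getOutCount()),
        // combine partial results from parallel execution
        (a, b) -> new Counts(a.getInCount() + b.getInCount(),
                a.getOutCount() + b.getOutCount()));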
SELECT SUM(paidAmount), COUNT(paidAmount), classificationName
FROM tableA
GROUP BY classificationName;
How can I do this in Java 8 using streams and collectors?
Java8:
lineItemList.stream()
.collect(Collectors.groupingBy(Bucket::getBucketName,
Collectors.reducing(BigDecimal.ZERO,
Bucket::getPaidAmount,
BigDecimal::add)))
This gives me the sum and the grouping. But how can I also get the count for each group name?
Expectation is :
100, 2, classname1
50, 1, classname2
150, 3, classname3
Using an extended version of the Statistics class of this answer,
class Statistics {
int count;
BigDecimal sum;
Statistics(Bucket bucket) {
count = 1;
sum = bucket.getPaidAmount();
}
Statistics() {
count = 0;
sum = BigDecimal.ZERO;
}
void add(Bucket b) {
count++;
sum = sum.add(b.getPaidAmount());
}
Statistics merge(Statistics another) {
count += another.count;
sum = sum.add(another.sum);
return this;
}
}
you can use it in a Stream operation like
Map<String, Statistics> map = lineItemList.stream()
.collect(Collectors.groupingBy(Bucket::getBucketName,
Collector.of(Statistics::new, Statistics::add, Statistics::merge)));
this may have a small performance advantage, as it only creates one Statistics instance per group for a sequential evaluation. It even supports parallel evaluation, but you’d need a very large list with sufficiently large groups to get a benefit from parallel evaluation.
For a sequential evaluation, the operation is equivalent to
lineItemList.forEach(b ->
map.computeIfAbsent(b.getBucketName(), x -> new Statistics()).add(b));
whereas merging partial results after a parallel evaluation works closer to the example already given in the linked answer, i.e.
secondMap.forEach((key, value) -> firstMap.merge(key, value, Statistics::merge));
As you're using BigDecimal for the amounts (which is the correct approach, IMO), you can't make use of Collectors.summarizingDouble, which summarizes count, sum, average, min and max in one pass.
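For illustration only, here is what that one-pass summary would look like if you accepted the precision loss of converting BigDecimal to double (not recommended for money):
// Lossy sketch: DoubleSummaryStatistics carries count, sum, average, min, max.
Map<String, DoubleSummaryStatistics> stats = lineItemList.stream()
        .collect(Collectors.groupingBy(Bucket::getBucketName,
                Collectors.summarizingDouble(b -> b.getPaidAmount().doubleValue())));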
Alexis C. has already shown in his answer one way to do it with streams. Another way would be to write your own collector, as shown in Holger's answer.
Here I'll show another way. First let's create a container class with a helper method. Then, instead of using streams, I'll use common Map operations.
class Statistics {
int count;
BigDecimal sum;
Statistics(Bucket bucket) {
count = 1;
sum = bucket.getPaidAmount();
}
Statistics merge(Statistics another) {
count += another.count;
sum = sum.add(another.sum);
return this;
}
}
Now, you can make the grouping as follows:
Map<String, Statistics> result = new HashMap<>();
lineItemList.forEach(b ->
result.merge(b.getBucketName(), new Statistics(b), Statistics::merge));
This works by using the Map.merge method, whose docs say:
If the specified key is not already associated with a value or is associated with null, associates it with the given non-null value. Otherwise, replaces the associated value with the results of the given remapping function
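Concretely, for two hypothetical buckets b1 and b2 sharing a bucket name, the two calls behave differently:
// First call: key absent, so new Statistics(b1) is stored as-is.
result.merge(b1.getBucketName(), new Statistics(b1), Statistics::merge);
// Second call: key present, so the stored value and the new Statistics(b2)
// are combined via Statistics::merge (counts and sums are added).
result.merge(b2.getBucketName(), new Statistics(b2), Statistics::merge);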
You could reduce pairs where the keys would hold the sum and the values would hold the count:
Map<String, SimpleEntry<BigDecimal, Long>> map =
lineItemList.stream()
.collect(groupingBy(Bucket::getBucketName,
reducing(new SimpleEntry<>(BigDecimal.ZERO, 0L),
b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
(v1, v2) -> new SimpleEntry<>(v1.getKey().add(v2.getKey()), v1.getValue() + v2.getValue()))));
although Collectors.toMap looks cleaner:
Map<String, SimpleEntry<BigDecimal, Long>> map =
lineItemList.stream()
.collect(toMap(Bucket::getBucketName,
b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
(v1, v2) -> new SimpleEntry<>(v1.getKey().add(v2.getKey()), v1.getValue() + v2.getValue())));
I have two lists. One shows the number of successful attempts for each individual in a group of people for some game.
public class SuccessfulAttempts{
String name;
int successCount;
}
List<SuccessfulAttempts> success;
And the other shows the total number of attempts for each individual.
public class TotalAttempts{
String name;
int totalCount;
}
List<TotalAttempts> total;
And I want to show the percentage success for each person in the group.
public class PercentageSuccess{
String name;
float percentage;
}
List<PercentageSuccess> percentage;
And assume I have populated the first two lists like this:
success.add(new SuccessfulAttempts("Alice", 4));
success.add(new SuccessfulAttempts("Bob", 7));
total.add(new TotalAttempts("Alice", 5));
total.add(new TotalAttempts("Bob", 10));
Now I want to calculate the percentage success for each person using Java Streams. So I actually need this kind of a result for the list List<PercentageSuccess> percentage.
new PercentageSuccess("Alice", 80);
new PercentageSuccess("Bob", 70);
And I want to calculate them (Alice's percentage and Bob's percentage) in parallel (I know how to do it sequentially using a loop). How can I achieve this with Java Streams (or any other simple way)?
I would suggest converting one of your lists to a Map for easier access to the counts. Otherwise, for each value of one list you have to loop over the other list, which is O(n^2) complexity.
List<SuccessfulAttempts> success = new ArrayList<>();
List<TotalAttempts> total = new ArrayList<>();
success.add(new SuccessfulAttempts("Alice", 4));
success.add(new SuccessfulAttempts("Bob", 7));
total.add(new TotalAttempts("Alice", 5));
total.add(new TotalAttempts("Bob", 10));
// First create a Map
Map<String, Integer> attemptsMap = success.parallelStream()
.collect(Collectors.toMap(SuccessfulAttempts::getName, SuccessfulAttempts::getSuccessCount));
// Loop through the list of players and calculate percentage.
List<PercentageSuccess> percentage =
total.parallelStream()
// Remove players who have not participated from List 'total'. ('attempt' refers to single element in List 'total').
.filter(attempt -> attemptsMap.containsKey(attempt.getName()))
// Calculate percentage and create the required object
.map(attempt -> new PercentageSuccess(attempt.getName(),
        attemptsMap.get(attempt.getName()) * 100f / attempt.getTotalCount()))
// Collect it back to list
.collect(Collectors.toList());
percentage.forEach(System.out::println);
If the lists are of the same size and correctly ordered, you can use integer indexes to access the original list elements.
List<PercentageSuccess> result = IntStream.range(0, size)
        .parallel()
        .mapToObj(index -> /* get the elements and construct percentage success for the person at the given index */)
        .collect(Collectors.toList());
This means you have to create a method or constructor for PercentageSuccess which constructs a percentage from a given SuccessfulAttempts and TotalAttempts.
PercentageSuccess(SuccessfulAttempts success, TotalAttempts total) {
    this.name = success.name;
    this.percentage = 100f * success.successCount / total.totalCount;
}
Then you construct a stream of integers from 0 to size which is parallel:
IntStream.range(0, size).parallel()
This is effectively a parallel for loop. Then turn each integer into the PercentageSuccess of the index'th person (note that you must ensure that the lists are of the same size and not shuffled, otherwise my code is not correct).
.mapToObj(index -> new PercentageSuccess(success.get(index), total.get(index)))
and finally turn the Stream into a List with
.collect(Collectors.toList())
Also, this approach is not optimal if success or total is a LinkedList or another list implementation with O(n) cost for accessing an element by index.
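Assembled from the pieces above, the whole pipeline might read (assuming both lists are aligned and equally sized):
// size is the common length of the two aligned lists.
int size = success.size();
List<PercentageSuccess> result = IntStream.range(0, size)
        .parallel()
        .mapToObj(i -> new PercentageSuccess(success.get(i), total.get(i)))
        .collect(Collectors.toList());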
private static List<PercentageSuccess> percentage(List<SuccessfulAttempts> success, List<TotalAttempts> total) {
    Map<String, Integer> successMap = success.parallelStream()
            .collect(Collectors.toMap(SuccessfulAttempts::getName, SuccessfulAttempts::getSuccessCount, (a, b) -> a + b));
    Map<String, Integer> totalMap = total.parallelStream()
            .collect(Collectors.toMap(TotalAttempts::getName, TotalAttempts::getTotalCount));
    return successMap.entrySet().parallelStream()
            .map(entry -> new PercentageSuccess(entry.getKey(),
                    entry.getValue() * 1.0f / totalMap.get(entry.getKey()) * 100))
            .collect(Collectors.toList());
}