How can I add objects from one stream to two different lists simultaneously?
Currently I am doing:
body.getSurroundings().parallelStream()
.filter(o -> o.getClass().equals(ResourcePoint.class))
.map(o -> (ResourcePoint)o)
.filter(o -> !resourceMemory.contains(o))
.forEach(resourceMemory::add);
to add objects from my stream into a LinkedList, "resourceMemory". I also want to add the same objects to another list at the same time, but I can't find the syntax for it. Is it possible, or do I need two copies of this code, one for each list?
There are several fundamental errors you should understand first, before trying to expand your code.
First of all, forEach does not guarantee a particular order of element processing, so it’s likely the wrong tool for adding to a List, even for sequential streams. With a parallel stream, however, it is completely wrong to use it to add to a collection like LinkedList, which is not thread-safe, as the action will be performed concurrently.
But even if resourceMemory were a thread-safe collection, your code would still be broken, as there is interference between your filter condition and the terminal action. .filter(o -> !resourceMemory.contains(o)) queries the same list that you are modifying in the terminal action, and it shouldn’t be hard to understand how this can break even with thread-safe collections:
Two or more threads may process the filter and find that the element is not contained in the list, then all of them will add the element, contradicting your obvious intention of not having duplicates.
You could resort to forEachOrdered which will perform the action in order and non-concurrently:
body.getSurroundings().parallelStream()
.filter(o -> o instanceof ResourcePoint)
.map(o -> (ResourcePoint)o)
.forEachOrdered(o -> {// not recommended, just for explanation
if(!resourceMemory.contains(o))
resourceMemory.add(o);
});
This will work, and it’s obvious how you could add to another list within that action, but it’s far from recommended coding style. Also, the fact that this terminal action synchronizes with all processing threads will destroy any potential benefit of parallel processing, especially as the most expensive operation of this stream pipeline, invoking contains on a LinkedList, will (must) happen single-threaded.
The correct way to collect stream elements into a list is via, as the name suggests, collect:
List<ResourcePoint> resourceMemory
=body.getSurroundings().parallelStream()
.filter(o -> o instanceof ResourcePoint)
.map(o -> (ResourcePoint)o)
.distinct() // no duplicates
.collect(Collectors.toList()); // collect into a list
This doesn’t return a LinkedList, but you should rethink carefully whether you really need a LinkedList. In 99% of all cases, you don’t. If you really need a LinkedList, you can replace Collectors.toList() with Collectors.toCollection(LinkedList::new).
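A minimal sketch of the LinkedList variant, assuming String stands in for ResourcePoint and a hypothetical mixed list stands in for body.getSurroundings():

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;

class DistinctToLinkedList {
    // Hypothetical: String plays the role of ResourcePoint.
    static LinkedList<String> collect(List<Object> surroundings) {
        return surroundings.parallelStream()
                .filter(o -> o instanceof String)   // keep only the wanted type
                .map(o -> (String) o)               // safe cast after the type check
                .distinct()                         // drops duplicates, keeps encounter order
                .collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        List<Object> surroundings = Arrays.asList("a", 1, "b", "a", 2.0, "c");
        System.out.println(collect(surroundings)); // [a, b, c]
    }
}
```

Because the collector builds the list and no lambda touches shared state, the pipeline stays correct even with parallelStream().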
Now if you really must add to an existing list created outside of your control, which might already contain elements, you should consider the fact mentioned above, that you have to ensure single-threaded access to a non-thread-safe list anyway, so there’s no benefit from doing it from within the parallel stream at all. In most cases, it’s more efficient to let the stream work independently of that list and add the result in a single-threaded step afterwards:
Set<ResourcePoint> newElements=
body.getSurroundings().parallelStream()
.filter(o -> o instanceof ResourcePoint)
.map(o -> (ResourcePoint)o)
.collect(Collectors.toCollection(LinkedHashSet::new));
newElements.removeAll(resourceMemory);
resourceMemory.addAll(newElements);
Here, we collect into a LinkedHashSet which implies maintenance of the encounter order and sorting out duplicates within the new elements, then use removeAll on the new elements to remove existing elements of the target list (here we benefit from the hash set nature of the temporary collection), finally, the new elements are added to the target list, which, as explained, must happen single-threaded anyway for a target collection which isn’t thread safe.
It’s easy to add the newElements to another target collection with this solution, much easier than writing a custom collector that produces two lists during stream processing. But note that the stream operations as written above are far too cheap to expect any benefit from parallel processing. You would need a very large number of elements to compensate for the initial multi-threading overhead. It’s even possible that there is no number for which it ever pays off.
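A runnable sketch of the collect-then-add approach feeding two targets, assuming String stands in for ResourcePoint and otherList is a hypothetical second target next to resourceMemory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CollectThenAddToTwoLists {
    // Hypothetical: String stands in for ResourcePoint.
    static void addNew(List<Object> surroundings,
                       List<String> resourceMemory,
                       List<String> otherList) {
        // Parallel part: works only on its own data, touches no shared list.
        Set<String> newElements = surroundings.parallelStream()
                .filter(o -> o instanceof String)
                .map(o -> (String) o)
                .collect(Collectors.toCollection(LinkedHashSet::new));
        // Single-threaded part: fine for non-thread-safe targets.
        newElements.removeAll(resourceMemory); // skip what is already known
        resourceMemory.addAll(newElements);
        otherList.addAll(newElements);
    }

    public static void main(String[] args) {
        List<String> resourceMemory = new LinkedList<>(Arrays.asList("a"));
        List<String> otherList = new ArrayList<>();
        addNew(Arrays.asList("a", 1, "b", "b", "c"), resourceMemory, otherList);
        System.out.println(resourceMemory); // [a, b, c]
        System.out.println(otherList);      // [b, c]
    }
}
```

Adding a third target is one more addAll line; the stream itself does not change.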
Instead of
.forEach(resourceMemory::add)
You could invoke
.forEach(o -> {
resourceMemory.add(o);
otherResource.add(o);
})
or put the add operations in a separate method so you could provide a method reference
.forEach(this::add)
void add(ResourcePoint p) {
resourceMemory.add(p);
otherResource.add(p);
}
But bear in mind that the order of insertion may be different with each run, as you use a parallel stream, and that both lists are then modified from several threads, so they would have to be thread-safe.
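If the forEach approach is kept, one way to make the insertion order deterministic and avoid concurrent writes is to use a sequential stream. A minimal sketch, assuming String stands in for ResourcePoint and a hypothetical helper class holds the two lists:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class ForEachIntoTwoLists {
    final List<String> resourceMemory = new LinkedList<>();
    final List<String> otherResource = new ArrayList<>();

    // One call feeds both lists, so a method reference can be used.
    void add(String p) {
        resourceMemory.add(p);
        otherResource.add(p);
    }

    void process(List<Object> surroundings) {
        surroundings.stream()                  // sequential: one thread, no data race
                .filter(o -> o instanceof String)
                .map(o -> (String) o)
                .distinct()                    // drops duplicates within this stream only
                .forEachOrdered(this::add);    // encounter order is guaranteed
    }

    public static void main(String[] args) {
        ForEachIntoTwoLists demo = new ForEachIntoTwoLists();
        demo.process(Arrays.asList("x", 1, "y", "x"));
        System.out.println(demo.resourceMemory); // [x, y]
        System.out.println(demo.otherResource);  // [x, y]
    }
}
```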
Related
I'm interested in sorting a list of objects based on a date attribute of those objects. I can either use the List.sort method:
list.sort( (a, b) -> a.getDate().compareTo(b.getDate()) );
Or I can use the Stream.sorted method:
List<E> l = list.stream()
.sorted( (a, b) -> a.getDate().compareTo(b.getDate()))
.collect(Collectors.toList());
Of the two options above, which should we use and why?
I know the former will update my original list, while the latter will not update the original but instead give me a fresh new list object.
I don't care whether my original list gets updated or not. So which one is the better option and why?
If you only need to sort your List, and don't need any other stream operations (such as filtering, mapping, etc...), there's no point in adding the overhead of creating a Stream and then creating a new List. It would be more efficient to just sort the original List.
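A small sketch contrasting the two options, assuming a hypothetical Item class with a LocalDate attribute; both give the same order, but only the stream version leaves its source untouched:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SortInPlaceVsStream {
    // Hypothetical element type with a date attribute.
    static final class Item {
        final String name;
        final LocalDate date;
        Item(String name, LocalDate date) { this.name = name; this.date = date; }
        LocalDate getDate() { return date; }
    }

    static List<Item> sample() {
        return new ArrayList<>(Arrays.asList(
                new Item("a", LocalDate.of(2024, 3, 1)),
                new Item("b", LocalDate.of(2024, 1, 15)),
                new Item("c", LocalDate.of(2024, 2, 10))));
    }

    static List<String> names(List<Item> items) {
        return items.stream().map(i -> i.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Option 1: reorders the existing list, no new list is created.
        List<Item> list = sample();
        list.sort((a, b) -> a.getDate().compareTo(b.getDate()));

        // Option 2: leaves the source alone and builds a new list.
        List<Item> source = sample();
        List<Item> copy = source.stream()
                .sorted((a, b) -> a.getDate().compareTo(b.getDate()))
                .collect(Collectors.toList());

        System.out.println(names(list));   // [b, c, a]
        System.out.println(names(copy));   // [b, c, a]
        System.out.println(names(source)); // [a, b, c]
    }
}
```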
If you wish to know which is best, your best option is to benchmark it: you may reuse the JMH test from my answer.
It should be noted that:
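List::sort uses Arrays::sort. Its default implementation copies the elements into an array before sorting (ArrayList overrides it to sort its backing array directly). It does not exist for other Collection types.
Stream::sorted is a stateful intermediate operation. This means the Stream needs to remember its state: it has to see all elements before it can emit the first one.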
Without benchmarking, I'd say that:
You should use list.sort(). It is easier to read: collection.stream().sorted().collect(toList()) is way too long to read, and unless you format your code well, you might get a headache (I exaggerate) before understanding that this line is simply sorting.
sorted() on a Stream should be used:
if you filter many elements, making the Stream effectively smaller than the collection (sorting N items and then filtering N items is not the same as filtering N items and then sorting K items, with K <= N).
if you have a map transformation after the sort and would otherwise lose a way to sort using the original key.
If you use your stream with other intermediate operations, then sorting in the stream might be required or useful:
collection.stream() // Stream<U> #0
.filter(...) // Stream<U> #1
.sorted() // Stream<U> #2
.map(...) // Stream<V> #3
.collect(toList()) // List<V> sorted by U.
;
In that example, the filter applies before the sort: stream #1 is smaller than #0, so the cost of sorting with the stream might be less than that of Collections.sort().
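A concrete instance of the filter, sorted, map shape above, using hypothetical word data: only the words that survive the filter are sorted, and after map the sort key (the word itself) is no longer available:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterSortMap {
    static List<Integer> lengthsOfLongWordsAlphabetically(List<String> words) {
        return words.stream()                  // Stream<String> #0, N elements
                .filter(w -> w.length() > 3)   // Stream<String> #1, K <= N elements
                .sorted()                      // only K elements are sorted
                .map(String::length)           // Stream<Integer>: the words are gone
                .collect(Collectors.toList()); // List<Integer>, ordered by word
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi", "apple");
        System.out.println(lengthsOfLongWordsAlphabetically(words)); // [5, 6, 4, 4]
    }
}
```

Sorting the resulting List<Integer> afterwards could not reproduce this order, which is the second reason for sorting inside the stream.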
If all you do is filtering, you may also use a TreeSet (note that it keeps only one of any elements that compare equal) or a collectingAndThen operation:
collection.stream() // Stream<U> #0
.filter(...) // Stream<U> #1
.collect(toCollection(TreeSet::new))
;
Or:
collection.stream() // Stream<U>
.filter(...) // Stream<U>
.collect(collectingAndThen(toCollection(ArrayList::new), list -> {
list.sort(null); // null comparator means natural ordering
return list;
})); // List<U>
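A runnable sketch of both variants on hypothetical string data. It uses toCollection(ArrayList::new) because Collectors.toList() makes no guarantee that the returned list is mutable; the TreeSet variant shows that duplicates disappear:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortWhileCollecting {
    // Keeps duplicates: collects into an ArrayList, then sorts it once.
    static List<String> viaCollectingAndThen(List<String> in) {
        return in.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.collectingAndThen(
                        Collectors.toCollection(ArrayList::new), // guaranteed mutable
                        list -> {
                            list.sort(null); // null comparator = natural ordering
                            return list;
                        }));
    }

    // Drops duplicates: a TreeSet keeps only one of any equal elements.
    static TreeSet<String> viaTreeSet(List<String> in) {
        return in.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("b", "", "a", "b");
        System.out.println(viaCollectingAndThen(in)); // [a, b, b]
        System.out.println(viaTreeSet(in));           // [a, b]
    }
}
```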
Streams have some overhead because they create many new objects, like a concrete Stream, a Collector, and a new List. So if you just want to sort a list and don't care whether the original gets changed or not, use List.sort.
There is also Collections.sort, which is an older API. The difference between it and List.sort can be found here.
Stream.sorted is useful when you are doing other stream operations alongside sorting.
Your code can also be rewritten with Comparator:
list.sort(Comparator.comparing(YourClass::getDate));
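A self-contained sketch of the Comparator.comparing form, assuming a hypothetical Event class in place of YourClass; reversed() gives newest-first order without writing a second lambda:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ComparatorComparingDemo {
    // Hypothetical class standing in for YourClass.
    static final class Event {
        final String name;
        final LocalDate date;
        Event(String name, LocalDate date) { this.name = name; this.date = date; }
        LocalDate getDate() { return date; }
    }

    static List<String> sortedNames(boolean newestFirst) {
        List<Event> list = new ArrayList<>(Arrays.asList(
                new Event("launch", LocalDate.of(2024, 5, 1)),
                new Event("kickoff", LocalDate.of(2024, 1, 8)),
                new Event("review", LocalDate.of(2024, 3, 20))));
        // comparing() extracts the key and uses its natural ordering.
        Comparator<Event> byDate = Comparator.comparing(Event::getDate);
        list.sort(newestFirst ? byDate.reversed() : byDate);
        return list.stream().map(e -> e.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(false)); // [kickoff, review, launch]
        System.out.println(sortedNames(true));  // [launch, review, kickoff]
    }
}
```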
The first one is better in terms of performance. In the first one, the sort method just compares the elements of the list and orders them. The second one will create a stream from your list, sort it, and create a new list from that stream.
In your case, since you can update the first list, the first approach is the better one, both in terms of performance and memory consumption. The second one is convenient if you need to continue with a stream, or if you already have a stream and want to end up with a sorted list.
Use the first method:
list.sort((a, b) -> a.getDate().compareTo(b.getDate()));
It's generally faster than the second one, and it doesn't create a new intermediate object. You could use the second method when you want to do some additional stream operations (e.g. filtering, mapping).
I'm iterating over the list below and adding to another shared mutable list via Java 8 streams:
List<String> list1 = Arrays.asList("A1","A2","A3","A4","A5","A6","A7","A8","B1","B2","B3");
List<String> list2 = new ArrayList<>();
Consumer<String> c = t -> list2.add(t.startsWith("A") ? t : "EMPTY");
list1.stream().forEach(c);
list1.parallelStream().forEach(c);
list1.forEach(c);
What is the difference between the three iterations above, and which one should we use? Are there any considerations?
Regardless of whether you use parallel or sequential Stream, you shouldn't use forEach when your goal is to generate a List. Use map with collect:
List<String> list2 =
list1.stream()
.map(item -> item.startsWith("A") ? item : "EMPTY")
.collect(Collectors.toList());
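A runnable version of the map-and-collect approach using the question's list1. Because no lambda writes to shared state, switching to parallelStream() yields the same list in the same order:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MapInsteadOfForEach {
    static List<String> transform(List<String> list1, boolean parallel) {
        Stream<String> source = parallel ? list1.parallelStream() : list1.stream();
        return source
                .map(item -> item.startsWith("A") ? item : "EMPTY")
                .collect(Collectors.toList()); // keeps encounter order, no shared state
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "B1", "B2", "B3");
        System.out.println(transform(list1, false));
        System.out.println(transform(list1, false).equals(transform(list1, true))); // true
    }
}
```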
Functionally speaking, for simple cases they are almost the same, but generally speaking, there are some hidden differences:
Let's start by quoting the Javadoc of Iterable.forEach, which states that it:
performs the given action for each element of the Iterable until all elements have been processed or the action throws an exception.
We can also iterate over a collection and perform a given action on each element just by passing an object that implements the Consumer interface:
void forEach(Consumer<? super T> action)
https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html#forEach-java.util.function.Consumer-
The order of Stream.forEach is unspecified (and, for parallel streams, can vary between runs), while Iterable.forEach is always executed in the iteration order of the Iterable.
If Iterable.forEach is iterating over a synchronized collection, Iterable.forEach takes the collection's lock once and holds it across all the calls to the action method. The Stream.forEach call uses the collection's spliterator, which does not lock.
The action specified in Stream.forEach is required to be non-interfering while Iterable.forEach is allowed to set values in the underlying ArrayList without problems.
In Java, Iterators returned by Collection classes, e.g. ArrayList, HashSet, Vector, etc., are fail-fast. This means that if you try to add() or remove() from the underlying data structure while iterating over it, you get a ConcurrentModificationException.
https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html#fail-fast
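A small sketch of the difference between structural and non-structural changes during Iterable.forEach on an ArrayList. The int[] counter is a hypothetical device for the illustration only, and fail-fast detection is documented as best-effort, so this shows observed ArrayList behavior rather than a contract to rely on:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Structural change (add) while ArrayList.forEach runs: the fail-fast check fires.
    static boolean addDuringForEachThrows() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        try {
            list.forEach(s -> list.add(s + "!"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Non-structural change (set) while ArrayList.forEach runs: allowed.
    static List<String> setDuringForEach() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        int[] index = {0}; // hypothetical position counter for the illustration
        list.forEach(s -> list.set(index[0]++, s.toUpperCase()));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(addDuringForEachThrows()); // true
        System.out.println(setDuringForEach());       // [A, B]
    }
}
```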
More Info:
What is the difference between .foreach and .stream().foreach?
What is difference between Collection.stream().forEach() and Collection.forEach()?
When working with streams, you should write your code in a way that if you switch to parallel streams, it does not produce the wrong results.
Imagine if in your code you were doing reading and writing on the same shared memory (list2) and you distribute your process into several threads (using parallel streams). Then you are DOOMED. Therefore you have several options.
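make your shared memory (list2) thread-safe, for example by wrapping it with Collections.synchronizedList (an AtomicReference does not help here: getAndUpdate may re-run its update function, and mutating the same ArrayList inside it is still a race):
List<String> list2 = Collections.synchronizedList(new ArrayList<>());
list1.parallelStream().forEach(t -> list2.add(t.startsWith("A") ? t : "EMPTY"));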
or you can go with the purely functional approach (code with no side effects), like Eran's solution.
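A sketch comparing both options on the question's list1: the synchronized list receives every element but in no guaranteed order, while the side-effect-free collect keeps the encounter order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SharedListVsCollect {
    static final List<String> LIST1 =
            Arrays.asList("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "B1", "B2", "B3");

    // Option 1: thread-safe shared list; all elements arrive, order may vary.
    static List<String> viaSynchronizedList() {
        List<String> list2 = Collections.synchronizedList(new ArrayList<>());
        LIST1.parallelStream().forEach(t -> list2.add(t.startsWith("A") ? t : "EMPTY"));
        return list2;
    }

    // Option 2: no shared state; the collector keeps encounter order.
    static List<String> viaCollect() {
        return LIST1.parallelStream()
                .map(t -> t.startsWith("A") ? t : "EMPTY")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a = viaSynchronizedList();
        List<String> b = viaCollect();
        System.out.println(a.size()); // 11
        System.out.println(b);        // same order as LIST1
        List<String> sortedA = new ArrayList<>(a);
        List<String> sortedB = new ArrayList<>(b);
        Collections.sort(sortedA);
        Collections.sort(sortedB);
        System.out.println(sortedA.equals(sortedB)); // true: same elements
    }
}
```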
Is the following statement true?
The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.
(Source and source - they seem to copy from each other or come from the same source.)
Disclaimer: I am aware the following snippets are not legit usages of Java Stream API. Don't use in the production code.
I tested Stream::sorted with a snippet based on the sources above:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.sorted()
.forEach(list::remove);
System.out.println(list); // Prints [0, 1, 2, 3, 4, 5]
It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:
final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());
list.stream()
.filter(i -> i > 5)
.distinct()
.forEach(list::remove); // Throws NullPointerException
To my surprise, the NullPointerException is thrown.
All the tested methods have the characteristics of a stateful intermediate operation. Yet this unique behavior of Stream::sorted is not documented, nor does the Stream operations and pipelines section explain whether stateful intermediate operations really guarantee a new source collection.
Where does my confusion come from, and what is the explanation for the behavior above?
The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.
Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.
To show a counter-example:
Set<Integer> set = IntStream.range(0, 10).boxed()
.collect(Collectors.toCollection(TreeSet::new));
set.stream()
.filter(i -> i > 5)
.sorted()
.forEach(set::remove);
throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.
There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.
So the bottom line is: when relying on unspecified behavior, what happens to work today may break tomorrow, when new optimizations are added.
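For comparison, a sketch of two ways to get the intended result without relying on unspecified behavior: let the collection remove matching elements itself, or finish the stream into a separate list before touching the source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RemoveWithoutInterference {
    // Let the collection remove matching elements itself.
    static List<Integer> viaRemoveIf() {
        List<Integer> list = IntStream.range(0, 10).boxed()
                .collect(Collectors.toCollection(ArrayList::new)); // explicitly mutable
        list.removeIf(i -> i > 5);
        return list;
    }

    // Or: finish the stream first (into a separate list), then modify the source.
    static Set<Integer> viaSnapshot() {
        Set<Integer> set = IntStream.range(0, 10).boxed()
                .collect(Collectors.toCollection(TreeSet::new));
        List<Integer> toRemove = set.stream()
                .filter(i -> i > 5)
                .collect(Collectors.toList());
        set.removeAll(toRemove); // the stream over set has already completed
        return set;
    }

    public static void main(String[] args) {
        System.out.println(viaRemoveIf()); // [0, 1, 2, 3, 4, 5]
        System.out.println(viaSnapshot()); // [0, 1, 2, 3, 4, 5]
    }
}
```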
Well, sorted has to be a full copying barrier for the stream pipeline (after all, your source might not be sorted), but this is not documented as such, so do not rely on it.
This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:
List<Integer> sortedList = IntStream.range(0, 10)
.boxed()
.collect(Collectors.toList());
StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
.sorted()
.forEach(sortedList::remove); // fails with CME, thus no copying occurred
Of course, sorted needs to be a full barrier and stop to perform an entire sort, unless, of course, it can be skipped. That is why the documentation makes no such promises, so that we don't run into weird surprises.
distinct, on the other hand, does not have to be a full barrier: all distinct does is check, one element at a time, whether that element is unique. As soon as a single element is checked (and found unique), it is passed on to the next stage, without a full barrier. Either way, this is not documented either...
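The barrier difference can be observed with peek on a sequential stream. This sketch records when elements enter and leave the stateful operation; the logged order reflects the current OpenJDK implementation and is not guaranteed by the specification, which is exactly why it should not be relied on:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class BarrierTrace {
    // Records when each element enters and leaves the stateful operation.
    static List<String> trace(boolean useSorted) {
        List<String> log = new ArrayList<>();
        Stream<Integer> s = Stream.of(3, 1, 2)
                .peek(i -> log.add("in:" + i)); // peek used for debugging only
        s = useSorted ? s.sorted() : s.distinct();
        s.forEach(i -> log.add("out:" + i));   // sequential stream, single thread
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(true));  // [in:3, in:1, in:2, out:1, out:2, out:3]
        System.out.println(trace(false)); // [in:3, out:3, in:1, out:1, in:2, out:2]
    }
}
```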
You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.
It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.
I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.
Let's say you have a collection with some strings and you want to return the first two characters of each string (or some other manipulation...).
In Java 8 for this case you can use either the map or the forEach methods on the stream() which you get from the collection (maybe something else but that is not important right now).
Personally I would use the map primarily because I associate forEach with mutating the collection and I want to avoid this. I also created a really small test regarding the performance but could not see any improvements when using forEach (I perfectly understand that small tests cannot give reliable results but still).
So what are the use-cases where one should choose forEach?
map is the better choice for this, because you're not trying to do anything with the strings yet, just map them to different strings.
forEach is designed to be the "final operation." As such, it doesn't return anything, and is all about mutating some state -- though not necessarily that of the original collection. For instance, you might use it to write elements to a file, having used other constructs (including map) to get those elements.
forEach terminates the stream and is executed for the side effect of the called Consumer. It does not necessarily mutate the stream members.
map maps each stream element to a different value/object using a provided Function. A Stream<R> is returned, on which more steps can act.
The forEach terminal operation might be useful in several cases: when you want to collect into some older class for which you don't have a proper collector, or when you don't want to collect at all but send your data somewhere outside (write into a database, print to an OutputStream, etc.). There are many cases when the best way is to use both map (as an intermediate operation) and forEach (as the terminal operation).
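A sketch of the question's "first two characters" task both ways: map plus collect when a result list is wanted, and map plus a side-effecting terminal step when the data is sent elsewhere. The StringBuilder is a hypothetical stand-in for a file, database or OutputStream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapThenForEach {
    // Pure transformation: map is an intermediate step, collect builds the result.
    static List<String> firstTwoChars(List<String> words) {
        return words.stream()
                .map(w -> w.substring(0, Math.min(2, w.length())))
                .collect(Collectors.toList());
    }

    // Same map, but the terminal step only performs a side effect.
    static String writeFirstTwoChars(List<String> words) {
        StringBuilder out = new StringBuilder(); // hypothetical output sink
        words.stream()
                .map(w -> w.substring(0, Math.min(2, w.length())))
                .forEachOrdered(w -> out.append(w).append('\n')); // forEach with guaranteed order
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "stream", "x");
        System.out.println(firstTwoChars(words)); // [ja, st, x]
        System.out.print(writeFirstTwoChars(words));
    }
}
```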
I was wondering, whether there is a preferred way to get from a stream of lists to a collection containing the elements of all the lists in the stream.
I can think of two ways to get there:
final Stream<List<Integer>> stream = Stream.empty();
// option one
final List<Integer> one = stream.collect(ArrayList::new, ArrayList::addAll, ArrayList::addAll);
// option two (in real code it needs its own stream: a Stream can be consumed only once)
final List<Integer> two = stream.flatMap(List::stream).collect(Collectors.toList());
The second option looks much nicer to me, but I guess the first one is more efficient in parallel streams.
Are there further arguments for or against one of the two methods?
The main difference is that flatMap is an intermediate operation, while collect is a terminal operation.
So flatMap is the only way to process the flattened stream items if you want to do other operations than collecting immediately.
Further collect(ArrayList::new, ArrayList::addAll, ArrayList::addAll) is very hard to read given the fact that you have two identical method references ArrayList::addAll with completely different semantics.
Regarding parallel processing, your guess is wrong. The first one has less capability for parallel processing, as it relies on ArrayList.addAll applied to the stream items (sub-lists), which can’t be broken into parallel sub-steps. In contrast, Collectors.toList() applied to a flatMap can do parallel processing of sub-list items if the particular Lists encountered in the stream support it. But this will be relevant only if you have a rather small stream of rather big sub-lists.
The only drawback of flatMap is the intermediate stream creation which adds an overhead in the case that you have a lot of very small sub-lists.
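A runnable sketch of both options on a hypothetical non-empty input. The explicit type witness on collect is added here only to pin down the container type; each option gets a fresh stream, since a Stream can be consumed only once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlattenLists {
    // Option one: three-argument collect.
    static List<Integer> viaCollect(Stream<List<Integer>> stream) {
        return stream.<ArrayList<Integer>>collect(
                ArrayList::new,      // supplier: a fresh container
                ArrayList::addAll,   // accumulator: add one sub-list to a container
                ArrayList::addAll);  // combiner: merge two containers (parallel case)
    }

    // Option two: flatten first, then collect element by element.
    static List<Integer> viaFlatMap(Stream<List<Integer>> stream) {
        return stream.flatMap(List::stream).collect(Collectors.toList());
    }

    static Stream<List<Integer>> sample() {
        return Stream.of(Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
    }

    public static void main(String[] args) {
        System.out.println(viaCollect(sample())); // [1, 2, 3, 4, 5]
        System.out.println(viaFlatMap(sample())); // [1, 2, 3, 4, 5]
    }
}
```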
But in your example, the stream is empty so it doesn’t matter (scnr).
I think the intent of option two is much clearer than that of option one. It took me a few seconds to work out what was happening with the first one, it doesn't look "right" - although it seems valid. Option two was more obvious to me.
Essentially, the intent of what you are doing is a flatMap. If that's the case, I'd expect to see flatMap used rather than addAll().