Should I use shared mutable variable update in Java 8 Streams?

I am iterating over the list below and adding its elements into another shared mutable list via Java 8 streams.
List<String> list1 = Arrays.asList("A1","A2","A3","A4","A5","A6","A7","A8","B1","B2","B3");
List<String> list2 = new ArrayList<>();
Consumer<String> c = t -> list2.add(t.startsWith("A") ? t : "EMPTY");
list1.stream().forEach(c);
list1.parallelStream().forEach(c);
list1.forEach(c);
What is the difference between above three iteration & which one we need to use. Are there any considerations?

Regardless of whether you use parallel or sequential Stream, you shouldn't use forEach when your goal is to generate a List. Use map with collect:
List<String> list2 =
    list1.stream()
         .map(item -> item.startsWith("A") ? item : "EMPTY")
         .collect(Collectors.toList());

Functionally speaking, for simple cases they are almost the same, but there are some hidden differences:
Let's start by quoting the Javadoc of Iterable.forEach:
performs the given action for each element of the Iterable until all
elements have been processed or the action throws an exception.
In other words, we can iterate over a collection and perform a given action on each element by passing an implementation of the Consumer interface:
void forEach(Consumer<? super T> action)
https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html#forEach-java.util.function.Consumer-
Stream.forEach makes no guarantee about processing order (with a parallel stream it is effectively non-deterministic), while Iterable.forEach always executes in the iteration order of the Iterable.
If Iterable.forEach is iterating over a synchronized collection, it takes the collection's lock once and holds it across all calls to the action. A Stream.forEach call uses the collection's spliterator, which does not lock.
The action passed to Stream.forEach is required to be non-interfering, whereas Iterable.forEach is allowed to set values in the underlying ArrayList without problems.
In Java, the Iterators returned by the Collection classes (e.g. ArrayList, HashSet, Vector) are fail-fast: if you add() or remove() on the underlying data structure while iterating it, you get a ConcurrentModificationException (see the sketch below).
https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html#fail-fast
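A minimal sketch of that fail-fast behaviour, modifying the list directly (not through its Iterator) while a for-each loop is running:
List<String> names = new ArrayList<>(Arrays.asList("A1", "A2", "B1"));
for (String name : names) {      // uses the ArrayList's fail-fast Iterator under the hood
    if (name.startsWith("A")) {
        names.remove(name);      // structural modification during iteration
    }
}
// typically throws ConcurrentModificationException on the next iteration step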
More Info:
What is the difference between .foreach and .stream().foreach?
What is difference between Collection.stream().forEach() and Collection.forEach()?

When working with streams, you should write your code in such a way that switching to parallel streams does not produce wrong results.
Imagine that in your code you are reading and writing the same shared memory (list2) while distributing the work across several threads (using parallel streams). Then you are DOOMED. Therefore you have several options:
Make your shared memory (list2) thread-safe, for example by using an AtomicReference:
List<String> list2 = new ArrayList<>();
AtomicReference<List<String>> listSafe = new AtomicReference<>(list2);
listSafe.getAndUpdate(strings -> { strings.add("newvalue"); return strings; });
Or you can go with the purely functional approach (code with no side effects), like @Eran's solution.
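For completeness, a minimal sketch of that side-effect-free option (using the list1 from the question): because the merging of per-thread results is handled by the collector, it stays correct even with a parallel stream.
List<String> list2 = list1.parallelStream()
        .map(t -> t.startsWith("A") ? t : "EMPTY")
        .collect(Collectors.toList());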

Related

What's the difference between the List interface's sort method and the Stream interface's sorted method?

I'm interested in sorting a list of objects based on a date attribute of those objects. I can either use the List.sort method:
list.sort( (a, b) -> a.getDate().compareTo(b.getDate()) );
Or I can use the Stream.sorted method:
List<E> l = list.stream()
.sorted( (a, b) -> a.getDate().compareTo(b.getDate()))
.collect(Collectors.toList());
Of the two options above, which should we use and why?
I know the former will sort my original list in place, while the latter will leave the original untouched and give me a fresh new list object.
I don't care whether my original list gets updated, so which one is the better option and why?
If you only need to sort your List, and don't need any other stream operations (such as filtering, mapping, etc...), there's no point in adding the overhead of creating a Stream and then creating a new List. It would be more efficient to just sort the original List.
If you wish to know which is best, your best option is to benchmark it: you may reuse the JMH test from my other answer.
It should be noted that:
List::sort uses Arrays::sort: it copies the elements into an array before sorting. Such a specialized sort does not exist for other Collections.
Stream::sorted is a stateful intermediate operation, which means the Stream needs to remember its state and buffer all elements before emitting the first one (see the sketch below).
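A small sketch (not from the original answer) that makes this statefulness visible with peek(): every element passes the first peek before any element reaches the second one.
Stream.of(3, 1, 2)
      .peek(i -> System.out.println("before sorted: " + i))
      .sorted()
      .peek(i -> System.out.println("after sorted: " + i))
      .forEach(i -> {});
// prints the three "before sorted" lines first, then the "after sorted" lines in ascending order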
Without benchmarking, I'd say that:
You should use collection.sort(). It is easier to read: collection.stream().sorted().collect(toList()) is far too long, and unless you format your code well you might get a headache (I exaggerate) before understanding that the line is simply sorting.
sorted() on a Stream should be called:
if you filter many elements, making the Stream effectively smaller in size than the collection (sorting N items and then filtering them is not the same as filtering N items and then sorting K items, with K <= N);
if you have a map transformation after the sort and would otherwise lose the ability to sort by the original key.
If you use your stream with other intermediate operations, then sorted might be required/useful:
collection.stream() // Stream<U> #0
.filter(...) // Stream<U> #1
.sorted() // Stream<U> #2
.map(...) // Stream<V> #3
.collect(toList()) // List<V> sorted by U.
;
In that example, the filter applies before the sort: stream #1 is smaller than stream #0, so the cost of sorting within the stream might be less than that of Collections.sort() on the whole collection.
If all that you do is simply filtering, you may also use a TreeSet or a collectingAndThen operation:
collection.stream() // Stream<U> #0
.filter(...) // Stream<U> #1
.collect(toCollection(TreeSet::new))
;
Or:
collection.stream() // Stream<U>
.filter(...) // Stream<U>
.collect(collectingAndThen(toList(), list -> {
    list.sort(Comparator.naturalOrder()); // List.sort requires a Comparator (null would also mean natural order)
    return list;
})); // sorted List<U>
Streams have some overhead because they create many new objects: a concrete Stream, a Collector, and a new List. So if you just want to sort a list and don't care whether the original gets changed, use List.sort.
There is also Collections.sort, which is an older API. The difference between it and List.sort can be found here.
Stream.sorted is useful when you are doing other stream operations alongside sorting.
Your code can also be rewritten with Comparator:
list.sort(Comparator.comparing(YourClass::getDate));
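Putting the two remarks together, a sketch of sorting as one step in a longer pipeline, again using Comparator.comparing instead of a hand-written lambda (the filter condition is just a hypothetical example):
List<YourClass> result = list.stream()
        .filter(x -> x.getDate() != null)                  // hypothetical pre-filter
        .sorted(Comparator.comparing(YourClass::getDate))
        .collect(Collectors.toList());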
The first one is better in terms of performance. The sort method just compares the elements of the list and orders them in place. The second one creates a stream from your list, sorts it, and creates a new list from that stream.
In your case, since you are allowed to update the original list, the first approach is better, both in terms of performance and of memory consumption. The second one is convenient if you need to end up with a stream anyway, or if you already have a stream and want to end up with a sorted list.
Use the first method:
list.sort((a, b) -> a.getDate().compareTo(b.getDate()));
It's much faster than the second one and it doesn't create new intermediate objects. Use the second method when you want to perform additional stream operations (e.g. filtering, mapping).

Side effects of lambda expression in java 8

Please consider the below code snippet.
List<String> list = new ArrayList<String>();
list.add("A");
list.add("B");
list.add("C");
List<String> copyList = new ArrayList<String>();
Consumer<String> consumer = s -> copyList.add(s);
list.stream().forEach(consumer);
Since we are using a lambda expression, as per functional programming (pure functions) it should only compute over its input and produce a corresponding output.
But here in the example it is adding elements to a list that is neither its input nor declared inside the lambda's scope.
Is this good practice? I mean, doesn't it lead to side effects?
forEach would be useless if it didn't produce side-effects, since it has no return value. Hence, whenever you use forEach you should be expecting side-effects to take place. Therefore there's nothing wrong with your example.
A Consumer<String> can print the String, or insert it into some database, or write it into some output file, or store it in some Collection (as in your example), etc...
From the Stream Javadoc:
A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as Stream.filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as Stream.count() or Stream.forEach(Consumer)).
Besides, if you look at the Javadoc of Consumer, you'll see that it's expected to have side-effects:
java.util.function.Consumer
Represents an operation that accepts a single input argument and returns no result. Unlike most other functional interfaces, Consumer is expected to operate via side-effects.
I guess this means Java Streams and functional interfaces were not designed to be used only for "purely" functional programming.
For a forEach or even a stream().forEach it is pretty straightforward; your example works fine.
Be aware, though, that if you do this with other stream methods, you can get some surprises: e.g. the following code prints absolutely nothing.
List<String> lst = Arrays.asList("a", "b", "c");
lst.stream().map(s -> {
    System.out.println(s);
    return "-";
});
In this case, the stream acts more like a builder which prepares a process but does not execute it yet. It's only when a collect, count or find... method is called that the lambda is executed.
An easy way to spot this is by looking at the return type of the map method, which in turn is again a Stream.
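To make the same pipeline actually run, it is enough to end it with a terminal operation; a small sketch:
List<String> lst = Arrays.asList("a", "b", "c");
List<String> dashes = lst.stream()
        .map(s -> {
            System.out.println(s);   // now executes, because collect() below is a terminal operation
            return "-";
        })
        .collect(Collectors.toList()); // triggers the pipeline: prints a, b, c and yields ["-", "-", "-"]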
Having said that, I think for your specific example, there are easier alternatives.
List<String> list = Arrays.asList("A", "B", "C");
// this is the base pattern for transforming one list to another.
// the map applies a transformation.
List<String> copyList1 = list.stream().map(e -> e).collect(Collectors.toList());
// if it's a 1-to-1 mapping, then you don't really need the map.
List<String> copyList2 = list.stream().collect(Collectors.toList());
// in essence, you could of course just clone the list without streaming.
List<String> copyList3 = new ArrayList<>(list);

Collection to stream to a new collection

I'm looking for the most pain free way to filter a collection. I'm thinking something like
Collection<?> foo = existingCollection.stream().filter( ... ). ...
But I'm not sure how best to go from the filter to returning or populating another collection. Most examples seem to be like "and here you can print". Possibly there's a constructor or output method that I'm missing.
There’s a reason why most examples avoid storing the result in a Collection: it’s not the recommended way of programming. You already have a Collection, the one providing the source data, and collections are of no use on their own. You want to perform certain operations on the data, so ideally you perform the operation using the stream and skip storing it in an intermediate Collection. This is what most examples try to suggest.
Of course, there are a lot of existing APIs working with Collections and there always will be. So the Stream API offers different ways to handle the demand for a Collection.
Get an unmodifiable List implementation containing all elements (JDK 16):
List<T> results = l.stream().filter(…).toList();
Get an arbitrary List implementation holding the result:
List<T> results = l.stream().filter(…).collect(Collectors.toList());
Get an unmodifiable List forbidding null like List.of(…) (JDK 10):
List<T> results = l.stream().filter(…).collect(Collectors.toUnmodifiableList());
Get an arbitrary Set implementation holding the result:
Set<T> results = l.stream().filter(…).collect(Collectors.toSet());
Get a specific Collection:
ArrayList<T> results =
l.stream().filter(…).collect(Collectors.toCollection(ArrayList::new));
Add to an existing Collection:
l.stream().filter(…).forEach(existing::add);
Create an array:
String[] array=l.stream().filter(…).toArray(String[]::new);
Use the array to create a list with specific behavior (mutable, fixed size):
List<String> al=Arrays.asList(l.stream().filter(…).toArray(String[]::new));
Allow a parallel capable stream to add to temporary local lists and join them afterward:
List<T> results
= l.stream().filter(…).collect(ArrayList::new, List::add, List::addAll);
(Note: this is closely related to how Collectors.toList() is currently implemented, but that’s an implementation detail, i.e. there is no guarantee that future implementations of the toList() collectors will still return an ArrayList)
An example from java.util.stream's documentation:
List<String> results =
stream.filter(s -> pattern.matcher(s).matches())
.collect(Collectors.toList());
Collectors has a toCollection() method, I'd suggest looking this way.
As an example that is more in line with Java 8 style of functional programming:
Collection<String> a = Collections.emptyList();
List<String> result = a.stream().
filter(s -> s.length() > 0).
collect(Collectors.toList());
You would possibly want to use toList or toSet or toMap methods from Collectors class.
However, to get more control, the toCollection method can be used. Here is a simple example:
Collection<String> c1 = new ArrayList<>();
c1.add("aa");
c1.add("ab");
c1.add("ca");
Collection<String> c2 = c1.stream().filter(s -> s.startsWith("a")).collect(Collectors.toCollection(ArrayList::new));
Collection<String> c3 = c1.stream().filter(s -> s.startsWith("a")).collect(Collectors.toList());
c2.forEach(System.out::println); // prints-> aa ab
c3.forEach(System.out::println); // prints-> aa ab

Understanding Spliterator, Collector and Stream in Java 8

I am having trouble understanding the Stream interface in Java 8, especially where it has to do with the Spliterator and Collector interfaces. My problem is that I simply can't understand Spliterator and the Collector interfaces yet, and as a result, the Stream interface is still somewhat obscure to me.
What exactly is a Spliterator and a Collector, and how can I use them? If I am willing to write my own Spliterator or Collector (and probably my own Stream in that process), what should I do and not do?
I read some examples scattered around the web, but since everything here is still new and subject to changes, examples and tutorials are still very sparse.
You should almost certainly never have to deal with Spliterator as a user; it should only be necessary if you're writing Collection types yourself and also intending to optimize parallelized operations on them.
For what it's worth, a Spliterator is a way of operating over the elements of a collection in a way that it's easy to split off part of the collection, e.g. because you're parallelizing and want one thread to work on one part of the collection, one thread to work on another part, etc.
You should essentially never be saving values of type Stream to a variable, either. Stream is sort of like an Iterator, in that it's a one-time-use object that you'll almost always use in a fluent chain, as in the Javadoc example:
int sum = widgets.stream()
.filter(w -> w.getColor() == RED)
.mapToInt(w -> w.getWeight())
.sum();
Collector is the most generalized, abstract possible version of a "reduce" operation a la map/reduce; in particular, it needs to support parallelization and finalization steps. Examples of Collectors include:
summing, e.g. Collectors.reducing(0, (x, y) -> x + y)
StringBuilder appending, e.g. Collector.of(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString)
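As an illustration (a sketch, not the JDK's own implementation), here is a joining collector hand-built with Collector.of, showing the four pieces a Collector is made of:
Collector<CharSequence, StringJoiner, String> joining =
    Collector.of(
        () -> new StringJoiner(", "),  // supplier: creates the mutable result container
        StringJoiner::add,             // accumulator: folds one element into the container
        StringJoiner::merge,           // combiner: merges two containers (needed for parallel streams)
        StringJoiner::toString);       // finisher: converts the container into the final result
String joined = Stream.of("a", "b", "c").collect(joining); // "a, b, c"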
Spliterator basically means "splittable Iterator".
A single thread can traverse/process the entire Spliterator itself, but the Spliterator also has a method trySplit() which "splits off" a section for someone else (typically, another thread) to process, leaving the current Spliterator with less work.
Collector combines the specification of a reduce function (of map-reduce fame) with an initial value and a function to combine two results (thus enabling results from split streams of work to be combined).
For example, the most basic Collector would have an initial value of 0, would add an integer onto an existing result, and would 'combine' two results by adding them, thus summing a split stream of integers.
See:
Spliterator.trySplit()
Collector<T,A,R>
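A minimal sketch of trySplit() in action (the exact split point is up to the implementation, but an array-backed list typically splits roughly in half):
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
Spliterator<Integer> rest = numbers.spliterator();
Spliterator<Integer> splitOff = rest.trySplit();   // "splits off" a prefix for someone else to process
splitOff.forEachRemaining(System.out::print);      // typically prints 1234
rest.forEachRemaining(System.out::print);          // typically prints 5678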
The following are examples of using the predefined collectors to perform common mutable reduction tasks:
// Accumulate names into a List
List<String> list = people.stream().map(Person::getName).collect(Collectors.toList());
// Accumulate names into a TreeSet
Set<String> set = people.stream().map(Person::getName).collect(Collectors.toCollection(TreeSet::new));
// Convert elements to strings and concatenate them, separated by commas
String joined = things.stream()
.map(Object::toString)
.collect(Collectors.joining(", "));
// Compute sum of salaries of employees
int total = employees.stream()
.collect(Collectors.summingInt(Employee::getSalary));
// Group employees by department
Map<Department, List<Employee>> byDept
= employees.stream()
.collect(Collectors.groupingBy(Employee::getDepartment));
// Compute sum of salaries by department
Map<Department, Integer> totalByDept
= employees.stream()
.collect(Collectors.groupingBy(Employee::getDepartment,
Collectors.summingInt(Employee::getSalary)));
// Partition students into passing and failing
Map<Boolean, List<Student>> passingFailing =
students.stream()
.collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD));
The Spliterator interface is a core feature of Streams.
The stream() and parallelStream() default methods are provided in the Collection interface. These methods use the Spliterator through a call to spliterator():
...
default Stream<E> stream() {
return StreamSupport.stream(spliterator(), false);
}
default Stream<E> parallelStream() {
return StreamSupport.stream(spliterator(), true);
}
...
Spliterator is an internal iterator that breaks the stream into smaller parts. These smaller parts can be processed in parallel.
Among its other methods, two are the most important for understanding the Spliterator:
boolean tryAdvance(Consumer<? super T> action)
Unlike an Iterator, it tries to perform the given action on the next element.
If the operation executes successfully, the method returns true; otherwise it returns false, which means there is no remaining element (the end of the stream has been reached).
Spliterator<T> trySplit()
This method allows splitting a set of data into many smaller sets according to some criterion (file size, number of lines, etc.).
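A tiny sketch of tryAdvance, which combines Iterator's hasNext()/next() pair into a single call:
Spliterator<String> sp = Arrays.asList("a", "b", "c").spliterator();
while (sp.tryAdvance(System.out::println)) {
    // the loop body is empty: tryAdvance already performed the action,
    // and it returns false once there are no elements left
}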

Add objects from stream to two different lists simultaneously

How can I add objects from one stream to two different lists simultaneously
Currently I am doing
body.getSurroundings().parallelStream()
.filter(o -> o.getClass().equals(ResourcePoint.class))
.map(o -> (ResourcePoint)o)
.filter(o -> !resourceMemory.contains(o))
.forEach(resourceMemory::add);
to add objects from my stream into a LinkedList "resourceMemory", but I also want to add the same objects to another list simultaneously, and I can't find the syntax for it. Is it possible, or do I need two copies of this code, one for each list?
There are several fundamental errors you should understand first, before trying to expand your code.
First of all, forEach does not guarantee a particular order of element processing, so it is likely the wrong tool for adding to a List even for sequential streams. It is, however, completely wrong to use it with a parallel stream to add to a collection like LinkedList, which is not thread-safe, as the action will be performed concurrently.
But even if resourceMemory were a thread-safe collection, your code would still be broken, as there is interference between your filter condition and the terminal action: .filter(o -> !resourceMemory.contains(o)) queries the same list which you are modifying in the terminal action, and it shouldn’t be hard to understand how this can break even with thread-safe collections:
Two or more threads may process the filter and find that the element is not contained in the list, then all of them will add the element, contradicting your obvious intention of not having duplicates.
You could resort to forEachOrdered which will perform the action in order and non-concurrently:
body.getSurroundings().parallelStream()
.filter(o -> o instanceof ResourcePoint)
.map(o -> (ResourcePoint)o)
.forEachOrdered(o -> {// not recommended, just for explanation
if(!resourceMemory.contains(o))
resourceMemory.add(o);
});
This will work, and it’s obvious how you could add to another list within that action, but it’s far from recommended coding style. Also, the fact that this terminal action synchronizes with all processing threads will destroy any potential benefit of parallel processing, especially as the most expensive operation of this stream pipeline is invoking contains on a LinkedList, which will (must) happen single-threaded.
The correct way to collect stream elements into a list is via, as the name suggests, collect:
List<ResourcePoint> resourceMemory
=body.getSurroundings().parallelStream()
.filter(o -> o instanceof ResourcePoint)
.map(o -> (ResourcePoint)o)
.distinct() // no duplicates
.collect(Collectors.toList()); // collect into a list
This doesn’t return a LinkedList, but you should rethink carefully whether you really need a LinkedList. In 99% of all cases, you don’t. If you really need a LinkedList, you can replace Collectors.toList() with Collectors.toCollection(LinkedList::new).
Now if you really must add to an existing list created outside of your control, which might already contain elements, you should consider the fact mentioned above, that you have to ensure single-threaded access to a non-thread-safe list anyway, so there’s no benefit from doing it from within the parallel stream at all. In most cases, it’s more efficient to let the stream work independently from that list and add the result in a single threaded step afterwards:
Set<ResourcePoint> newElements=
body.getSurroundings().parallelStream()
.filter(o -> o instanceof ResourcePoint)
.map(o -> (ResourcePoint)o)
.collect(Collectors.toCollection(LinkedHashSet::new));
newElements.removeAll(resourceMemory);
resourceMemory.addAll(newElements);
Here, we collect into a LinkedHashSet which implies maintenance of the encounter order and sorting out duplicates within the new elements, then use removeAll on the new elements to remove existing elements of the target list (here we benefit from the hash set nature of the temporary collection), finally, the new elements are added to the target list, which, as explained, must happen single-threaded anyway for a target collection which isn’t thread safe.
It’s easy to add the newElements to another target collection with this solution (see the sketch below), much easier than writing a custom collector that produces two lists during the stream processing. But note that the stream operations as written above are way too cheap to expect any benefit from parallel processing. You would need a very large number of elements to compensate for the initial multi-threading overhead. It’s even possible that there is no number for which it ever pays off.
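To make that last point concrete, a short sketch continuing from the snippet above (otherMemory is a hypothetical second target list, not part of the question):
// newElements is the LinkedHashSet collected above
newElements.removeAll(resourceMemory);
resourceMemory.addAll(newElements); // single-threaded, after the stream has finished
otherMemory.addAll(newElements);    // the second target list gets the same new elements for free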
Instead of
.forEach(resourceMemory::add)
You could invoke
.forEach(o -> {
    resourceMemory.add(o);
    otherResource.add(o);
})
or put the add operations in a separate method so you can provide a method reference:
.forEach(this::add)

void add(ResourcePoint p) {
    resourceMemory.add(p);
    otherResource.add(p);
}
But bear in mind that the order of insertion may be different with each run, as you are using a parallel stream.
