Java Streams - Buffering huge streams

Java Streams - Buffering huge streams - java

I'm trying to collapse several streams backed by huge amounts of data into one, then buffer them. I'm able to collapse these streams into one stream of items with no problem. When I attempt to buffer/chunk the streams, though, it attempts to fully buffer the first stream, which instantly fills up my memory.
It took me a while to narrow down the issue to a minimum test case, but there's some code below.
I can refactor things such that I don't run into this issue, but without understanding why exactly this blows up, I feel like using streams is just a ticking time bomb.
I took inspiration from Buffer Operator on Java 8 Streams for the buffering.
import java.util.*;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public class BreakStreams
{
//#see https://stackoverflow.com/questions/47842871/buffer-operator-on-java-8-streams
/**
* Batch a stream into chunks
*/
public static <T> Stream<List<T>> buffer(Stream<T> stream, final long count)
{
final Iterator<T> streamIterator = stream.iterator();
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<List<T>>()
{
#Override public boolean hasNext()
{
return streamIterator.hasNext();
}
#Override public List<T> next()
{
List<T> intermediate = new ArrayList<>();
for (long v = 0; v < count && hasNext(); v++)
{
intermediate.add(streamIterator.next());
}
return intermediate;
}
}, 0), false);
}
public static void main(String[] args)
{
//create streams from huge datasets
Stream<Long> streams = Stream.of(LongStream.range(0, Integer.MAX_VALUE).boxed(),
LongStream.range(0, Integer.MAX_VALUE).boxed())
//collapse into one stream
.flatMap(x -> x);
//iterating over the stream one item at a time is OK..
// streams.forEach(x -> {
//buffering the stream is NOT ok, you will go OOM
buffer(streams, 25).forEach(x -> {
try
{
Thread.sleep(2500);
}
catch (InterruptedException ignore)
{
}
System.out.println(x);
});
}
}

This seems to be connected to the older issue “Why filter() after flatMap() is "not completely" lazy in Java streams?”. While that issue has been fixed for the Stream’s builtin operations, it seems to still exist when we try to iterate over a flatmapped stream externally.
We can simplify the code to reproduce the problem to
Stream.of(LongStream.range(0, Integer.MAX_VALUE))
.flatMapToLong(x -> x)
.iterator().hasNext();
Note that using Spliterator is affected as well
Stream.of(LongStream.range(0, Integer.MAX_VALUE))
.flatMapToLong(x -> x)
.spliterator()
.tryAdvance((long l) -> System.out.println("first item: "+l));
Both try to buffer elements until ultimately bailing out with an OutOfMemoryError.
Since spliterator().forEachRemaining(…) seems not to be affected, you could implement a solution which works for your use case of forEach, but it would be fragile, as it would still exhibit the problem for short-circuiting stream operations.
public static <T> Stream<List<T>> buffer(Stream<T> stream, final int count) {
boolean parallel = stream.isParallel();
Spliterator<T> source = stream.spliterator();
return StreamSupport.stream(
new Spliterators.AbstractSpliterator<List<T>>(
(source.estimateSize()+count-1)/count, source.characteristics()
&(Spliterator.SIZED|Spliterator.DISTINCT|Spliterator.ORDERED)
| Spliterator.NONNULL) {
List<T> list;
Consumer<T> c = t -> list.add(t);
#Override
public boolean tryAdvance(Consumer<? super List<T>> action) {
if(list == null) list = new ArrayList<>(count);
if(!source.tryAdvance(c)) return false;
do {} while(list.size() < count && source.tryAdvance(c));
action.accept(list);
list = null;
return true;
}
#Override
public void forEachRemaining(Consumer<? super List<T>> action) {
source.forEachRemaining(t -> {
if(list == null) list = new ArrayList<>(count);
list.add(t);
if(list.size() == count) {
action.accept(list);
list = null;
}
});
if(list != null) {
action.accept(list);
list = null;
}
}
}, parallel);
}
But note that Spliterator based solutions are preferable in general, as they support carrying additional information enabling optimizations and have lower iteration costs in a lot of use cases. So this is the way to go once this issue has been fixed in the JDK code.
As a workaround, you can use Stream.concat(…) to combine streams, but it has an explicit warning about not to combine too many streams at once in its documentation:
Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowException [sic].
The throwable’s name has been corrected to StackOverflowError in Java 9’s documentation

Related

How to remove all elements that match a certain condition except for N greatest of them with Stream API

My question is: is there a better way to implement this task?
I have a list of orderable elements (in this example by age, the youngest first).
And I want to delete all elements that fulfill a condition (in this example red elements) but keep the first 2 of them.
Stream<ElementsVO> stream = allElements.stream();
Stream<ElementsVO> redStream = stream.filter(elem->elem.getColor()==RED).sorted((c1, c2) -> { return c1.getAge() - c2.getAge();
}).limit(2);
Stream<ElementsVO> nonRedStream=stream.filter(elem->elem.getColor()!=RED);
List<ElementsVO> resultList = Stream.concat(redStream,nonRedStream).sorted((c1, c2) -> { return c1.getAge() - c2.getAge();
}).collect(Collectors.toList());
Any idea to improve this? Any way to implement an accumulator function or something like that with streams?

You can technically do this with a stateful predicate:
Predicate<ElementsV0> statefulPredicate = new Predicate<ElementsV0>() {
private int reds = 0;
#Override public boolean test(ElementsV0 e) {
if (elem.getColor() == RED) {
reds++;
return reds < 2;
}
return true;
}
};
Then:
List<ElementsVO> resultList =
allElements.stream()
.sorted(comparingInt(ElementsV0::getAge))
.filter(statefulPredicate)
.collect(toList());
This might work, but it is a violation of the Stream API: the documentation for Stream.filter says that the predicate should be stateless, which in general allows the stream implementation to apply the filter in any order. For small input lists, streamed sequentially, this will almost certainly be the appearance order in the list, but it's not guaranteed.
Caveat emptor. Your current way works, although you could do the partitioning of the list more efficiently using Collectors.partitioningBy to avoid iterating it twice.

You can implement a custom collector that will maintain two separate collections of RED and non-RED element.
And since you need only two red elements having the greatest age to improve performance, you can introduce a partial sorting. I.e. collection of non-red element needs to maintain an order and always must be of size 2 at most, with that overhead of sorting will be far less significant in comparison to sorting of elements having the property of RED in order to pick only two of them.
In order to create a custom collector, you might make use of the static method Collector.of() which expects the following arguments:
Supplier Supplier<A> is meant to provide a mutable container which store elements of the stream. Because we need to separate elements by color into two groups as a container, we can use a map that will contain only 2 keys (true and false), denoting whether elements mapped to this key are red. In order to store red-elements and perform a partial sorting, we need a collection that is capable of maintaining the order. PriorityQueue is a good choice for that purpose. To store all other elements, I've used ArrayDeque, which doesn't maintain the order and as fast as ArrayList.
Accumulator BiConsumer<A,T> defines how to add elements into the mutable container provided by the supplier. For this task, the accumulator needs to guarantee that the queue, containing red-elements will not exceed the given size by rejecting values that are smaller than the lowest value previously added to the queue and by removing the lowest value if the size has reached the limit and a new value needs to be added. This functionality extracted into a separate method tryAdd()
Combiner BinaryOperator<A> combiner() establishes a rule on how to merge two containers obtained while executing stream in parallel. Here, combiner rely on the same logic that was described for accumulator.
Finisher Function<A,R> is meant to produce the final result by transforming the mutable container. In the code below, finisher dumps the contents of both queues into a stream, sorts them and collects into an immutable list.
Characteristics allow fine-tuning the collector by providing additional information on how it should function. Here a characteristic Collector.Characteristics.UNORDERED is being applied. Which indicates that the order in which partial results of the reduction produced in parallel is not significant, that can improve performance of this collector with parallel streams.
The code might look like this:
public static void main(String[] args) {
List<ElementsVO> allElements =
List.of(new ElementsVO(Color.RED, 25), new ElementsVO(Color.RED, 23), new ElementsVO(Color.RED, 27),
new ElementsVO(Color.BLACK, 19), new ElementsVO(Color.GREEN, 23), new ElementsVO(Color.GREEN, 29));
Comparator<ElementsVO> byAge = Comparator.comparing(ElementsVO::getAge);
List<ElementsVO> resultList = allElements.stream()
.collect(getNFiltered(byAge, element -> element.getColor() != Color.RED, 2));
resultList.forEach(System.out::println);
}
The method below is responsible for creating of a collector that partition the elements based on the given predicate and will sort them in accordance with the provided comparator.
public static <T> Collector<T, ?, List<T>> getNFiltered(Comparator<T> comparator,
Predicate<T> condition,
int limit) {
return Collector.of(
() -> Map.of(true, new PriorityQueue<>(comparator),
false, new ArrayDeque<>()),
(Map<Boolean, Queue<T>> isRed, T next) -> {
if (condition.test(next)) isRed.get(false).add(next);
else tryAdd(isRed.get(true), next, comparator, limit);
},
(Map<Boolean, Queue<T>> left, Map<Boolean, Queue<T>> right) -> {
left.get(false).addAll(right.get(false));
left.get(true).forEach(next -> tryAdd(left.get(true), next, comparator, limit));
return left;
},
(Map<Boolean, Queue<T>> isRed) -> isRed.values().stream()
.flatMap(Queue::stream).sorted(comparator).toList(),
Collector.Characteristics.UNORDERED
);
}
This method is responsible for adding the next red-element into the priority queue. It expects a comparator in order to be able to determine whether the next element should be added or discarded, and a value of the maximum size of the queue (2), to check if it was exceeded.
public static <T> void tryAdd(Queue<T> queue, T next, Comparator<T> comparator, int size) {
if (queue.size() == size && comparator.compare(queue.element(), next) < 0)
queue.remove(); // if the next element is greater than the smallest element in the queue and max size has been exceeded, the smallest element needs to be removed from the queue
if (queue.size() < size) queue.add(next);
}
Output
lementsVO{color=BLACK, age=19}
ElementsVO{color=GREEN, age=23}
ElementsVO{color=RED, age=25}
ElementsVO{color=RED, age=27}
ElementsVO{color=GREEN, age=29}

I wrote a generic Collector with a predicate and a limit of elements to add which match the predicate:
public class LimitedMatchCollector<T> implements Collector<T, List<T>, List<T>> {
private Predicate<T> filter;
private int limit;
public LimitedMatchCollector(Predicate<T> filter, int limit)
{
super();
this.filter = filter;
this.limit = limit;
}
private int count = 0;
#Override
public Supplier<List<T>> supplier() {
return () -> new ArrayList<T>();
}
#Override
public BiConsumer<List<T>, T> accumulator() {
return this::accumulator;
}
#Override
public BinaryOperator<List<T>> combiner() {
return this::combiner;
}
#Override
public Set<Characteristics> characteristics() {
return Stream.of(Characteristics.IDENTITY_FINISH)
.collect(Collectors.toCollection(HashSet::new));
}
public List<T> accumulator(List<T> list , T e) {
if (filter.test(e)) {
if (count >= limit) {
return list;
}
count++;
}
list.add(e);
return list;
}
public List<T> combiner(List<T> left , List<T> right) {
right.forEach( e -> {
if (filter.test(e)) {
if (count < limit) {
left.add(e);
count++;
}
}
});
return left;
}
#Override
public Function<List<T>, List<T>> finisher()
{
return Function.identity();
}
}
Usage:
List<ElementsVO> list = Arrays.asList(new ElementsVO("BLUE", 1)
,new ElementsVO("BLUE", 2) // made color a String
,new ElementsVO("RED", 3)
,new ElementsVO("RED", 4)
,new ElementsVO("GREEN", 5)
,new ElementsVO("RED", 6)
,new ElementsVO("YELLOW", 7)
);
System.out.println(list.stream().collect(new LimitedMatchCollector<ElementsVO>( (e) -> "RED".equals(e.getColor()),2)));

How can I prevent mutation of a list of iterators?

I would like to avoid the mutation of the input list of iterators tests by others. I only want others to run on a deep copy of tests.
How can this be achieved in Java?
Here is an example showing the effect of the mutation on tests. Both of the two parts are sorting the input. But the second part has nothing to be sorted since the mutation from the first part iterated the iterators to the end.
You can run the following example online here:
https://onlinegdb.com/NC4WzLzmt
import java.util.*;
public class ImmutableExample {
public static void main(String[] args) {
System.out.println("sort on demand");
List<Iterator<Integer>> mutableTests = Arrays.asList(
Arrays.asList(1, 2).iterator(),
Arrays.asList(0).iterator(),
Collections.emptyIterator()
);
List<Iterator<Integer>> tests = Collections.unmodifiableList(mutableTests);
MergingIterator mergingIterator = new MergingIterator(tests);
while (mergingIterator.hasNext()) {
System.out.println(mergingIterator.next());
}
System.out.println("sort all at once");
/* uncomment the following will see the same result:*/
// tests = Arrays.asList(
// Arrays.asList(1, 2).iterator(),
// Arrays.asList(0).iterator(),
// Collections.emptyIterator()
// );
MergeKSortedIterators sol = new MergeKSortedIterators();
Iterable<Integer> result = sol.mergeKSortedIterators(tests);
for (Integer num : result) {
System.out.println(num);
}
}
}
class PeekingIterator implements Iterator<Integer>, Comparable<PeekingIterator> {
Iterator<Integer> iterator;
Integer peekedElement;
boolean hasPeeked;
public PeekingIterator(Iterator<Integer> iterator) {
this.iterator = iterator;
}
public boolean hasNext() {
return hasPeeked || iterator.hasNext();
}
public Integer next() {
int nextElem = hasPeeked ? peekedElement : iterator.next();
hasPeeked = false;
return nextElem;
}
public Integer peek() {
peekedElement = hasPeeked ? peekedElement : iterator.next();
hasPeeked = true;
return peekedElement;
}
#Override
public int compareTo(PeekingIterator that) {
return this.peek() - that.peek();
}
}
class MergingIterator implements Iterator<Integer> {
Queue<PeekingIterator> minHeap;
public MergingIterator(List<Iterator<Integer>> iterators) {
// minHeap = new PriorityQueue<>((x, y) -> x.peek().compareTo(y.peek()));
minHeap = new PriorityQueue<>();
for (Iterator<Integer> iterator : iterators) {
if (iterator.hasNext()) {
minHeap.offer(new PeekingIterator(iterator));
}
}
}
public boolean hasNext() {
return !minHeap.isEmpty();
}
public Integer next() {
PeekingIterator nextIter = minHeap.poll();
Integer next = nextIter.next();
if (nextIter.hasNext()) {
minHeap.offer(nextIter);
}
return next;
}
}
class MergeKSortedIterators {
public Iterable<Integer> mergeKSortedIterators(List<Iterator<Integer>> iteratorList) {
List<Integer> result = new ArrayList<>();
if (iteratorList.isEmpty()) {
return result;
}
PriorityQueue<PeekingIterator> pq = new PriorityQueue<>();
for (Iterator<Integer> iterator : iteratorList) {
if (iterator.hasNext()) {
pq.add(new PeekingIterator(iterator));
}
}
while (!pq.isEmpty()) {
PeekingIterator curr = pq.poll();
// result.add(curr.peek());
// cannot use this one as hasNext() checks on `hasPeeked`
result.add(curr.next());
if (curr.hasNext()) {
pq.add(curr);
}
}
return result;
}
}

This question seems to be based on a misunderstanding ... or two.
How can I prevent mutation of a list of iterators?
You need to distinguish between the mutability of a list, and the mutability of the items in the list. I think you are actually asking about the latter. (And as such, the list is not really relevant to the question. As we shall see.)
I would like to avoid the mutation of the input list of iterators tests by others.
Again, you appear to be asking about the list, but I think you actually mean to ask about the iterators.
I only want others to run on a deep copy of tests.
This implies you want the iterators to be immutable.
Here's the problem:
An Iterator is an inherently stateful / mutable object. Indeed, there is no way to implement next() without mutating the iterator object.
Iterator objects are typically not deep copyable. They typically don't support clone() or public constructors, and they typically do not implement Serializable. (Indeed, if they were serializable, the semantics of serialize / deserialize would be problematic.)
So basically, your idea of a list of immutable iterators or a list that (somehow) produces deep copies of iterators is not practical.
You commented:
So List<Iterator<Integer>> tests = Collections.unmodifiableList(mutableTests); cannot produce an unmodifiable list for List<Iterator<Integer>>?
Well, yes it can. But that doesn't solve the problem. You need a list of unmodifiable iterators rather than an unmodifiable list of iterators.
Possible solutions:
You could just recreate the list of iterators from their base collections for each test run.
Use Iterable instead of Iterator. The collection types you are using all implement Iterable, and the third iterator could be created from an empty list.
List<Iterable<Integer>> tests = Arrays.asList(
Arrays.asList(1, 2),
Arrays.asList(0),
Collections.emptyList()
);
// to use them ...
for (Iterable<Integer> iterable : tests) {
Iterator<Integer> iterator = iterable.iterator();
// etc ...
}
If your iterators could not be recreated (for example, if you were iterating a source that couldn't be created or "rewound"), you could conceivably implement a caching iterator wrapper that remembered all of the elements in the iteration sequence and could either reset to the start of the sequence, or generate a new iterator to replay the sequence. (But that would be overkill here.)

Java 8 One Stream To Multiple Map

Lets say I have huge webserver log file that does not fit in memory. I need to stream this file to a mapreduce method and save to database. I do this using Java 8 stream api. For example, I get a list after the mapreduce process such as, consumption by client, consumption by ip, consumption by content. But, my needs are not that like that given in my example. Since I cannot share code, I just want to give basic example.
By Java 8 Stream Api, I want to read file exactly once, get 3 lists at the same time, while I am streaming file, parallel or sequential. But parallel would be good. Is there any way to do that?

Generally collecting to anything other than standard API's gives you is pretty easy via a custom Collector. In your case collecting to 3 lists at a time (just a small example that compiles, since you can't share your code either):
private static <T> Collector<T, ?, List<List<T>>> to3Lists() {
class Acc {
List<T> left = new ArrayList<>();
List<T> middle = new ArrayList<>();
List<T> right = new ArrayList<>();
List<List<T>> list = Arrays.asList(left, middle, right);
void add(T elem) {
// obviously do whatever you want here
left.add(elem);
middle.add(elem);
right.add(elem);
}
Acc merge(Acc other) {
left.addAll(other.left);
middle.addAll(other.middle);
right.addAll(other.right);
return this;
}
public List<List<T>> finisher() {
return list;
}
}
return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}
And using it via:
Stream.of(1, 2, 3)
.collect(to3Lists());
Obviously this custom collector does not do anything useful, but just an example of how you could work with it.

I have adapted the answer to this question to your case. The custom Spliterator will "split" the stream into multiple streams that collect by different properties:
#SafeVarargs
public static <T> long streamForked(Stream<T> source, Consumer<Stream<T>>... consumers)
{
return StreamSupport.stream(new ForkingSpliterator<>(source, consumers), false).count();
}
public static class ForkingSpliterator<T>
extends AbstractSpliterator<T>
{
private Spliterator<T> sourceSpliterator;
private List<BlockingQueue<T>> queues = new ArrayList<>();
private boolean sourceDone;
#SafeVarargs
private ForkingSpliterator(Stream<T> source, Consumer<Stream<T>>... consumers)
{
super(Long.MAX_VALUE, 0);
sourceSpliterator = source.spliterator();
for (Consumer<Stream<T>> fork : consumers)
{
LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();
queues.add(queue);
new Thread(() -> fork.accept(StreamSupport.stream(new ForkedConsumer(queue), false))).start();
}
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
sourceDone = !sourceSpliterator.tryAdvance(t -> queues.forEach(queue -> queue.offer(t)));
return !sourceDone;
}
private class ForkedConsumer
extends AbstractSpliterator<T>
{
private BlockingQueue<T> queue;
private ForkedConsumer(BlockingQueue<T> queue)
{
super(Long.MAX_VALUE, 0);
this.queue = queue;
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
while (queue.peek() == null)
{
if (sourceDone)
{
// element is null, and there won't be no more, so "terminate" this sub stream
return false;
}
}
// push to consumer pipeline
action.accept(queue.poll());
return true;
}
}
}
You can use it as follows:
streamForked(Stream.of(new Row("content1", "client1", "location1", 1),
new Row("content2", "client1", "location1", 2),
new Row("content1", "client1", "location2", 3),
new Row("content2", "client2", "location2", 4),
new Row("content1", "client2", "location2", 5)),
rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getClient,
Collectors.groupingBy(Row::getContent,
Collectors.summingInt(Row::getConsumption))))),
rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getClient,
Collectors.groupingBy(Row::getLocation,
Collectors.summingInt(Row::getConsumption))))),
rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getContent,
Collectors.groupingBy(Row::getLocation,
Collectors.summingInt(Row::getConsumption))))));
// Output
// {client2={location2=9}, client1={location1=3, location2=3}}
// {client2={content2=4, content1=5}, client1={content2=2, content1=4}}
// {content2={location1=2, location2=4}, content1={location1=1, location2=8}}
Note that you can do pretty much anything you want with your the copies of the stream. As per your example, I used a stacked groupingBy collector to group the rows by two properties and then summed up the int property. So the result will be a Map<String, Map<String, Integer>>. But you could also use it for other scenarios:
rows -> System.out.println(rows.count())
rows -> rows.forEach(row -> System.out.println(row))
rows -> System.out.println(rows.anyMatch(row -> row.getConsumption() > 3))

Iterate over ConcurrentHashMap while deleting entries

I want to periodically iterate over a ConcurrentHashMap while removing entries, like this:
for (Iterator<Entry<Integer, Integer>> iter = map.entrySet().iterator(); iter.hasNext(); ) {
Entry<Integer, Integer> entry = iter.next();
// do something
iter.remove();
}
The problem is that another thread may be updating or modifying values while I'm iterating. If that happens, those updates can be lost forever, because my thread only sees stale values while iterating, but the remove() will delete the live entry.
After some consideration, I came up with this workaround:
map.forEach((key, value) -> {
// delete if value is up to date, otherwise leave for next round
if (map.remove(key, value)) {
// do something
}
});
One problem with this is that it won't catch modifications to mutable values that don't implement equals() (such as AtomicInteger). Is there a better way to safely remove with concurrent modifications?

Your workaround works but there is one potential scenario. If certain entries have constant updates map.remove(key,value) may never return true until updates are over.
If you use JDK8 here is my solution
for (Iterator<Entry<Integer, Integer>> iter = map.entrySet().iterator(); iter.hasNext(); ) {
Entry<Integer, Integer> entry = iter.next();
Map.compute(entry.getKey(), (k, v) -> f(v));
//do something for prevValue
}
....
private Integer prevValue;
private Integer f(Integer v){
prevValue = v;
return null;
}
compute() will apply f(v) to the value and in our case assign the value to the global variable and remove the entry.
According to Javadoc it is atomic.
Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

Your workaround is actually pretty good. There are other facilities on top of which you can build a somewhat similar solution (e.g. using computeIfPresent() and tombstone values), but they have their own caveats and I have used them in slightly different use-cases.
As for using a type that doesn't implement equals() for the map values, you can use your own wrapper on top of the corresponding type. That's the most straightforward way to inject custom semantics for object equality into the atomic replace/remove operations provided by ConcurrentMap.
Update
Here's a sketch that shows how you can build on top of the ConcurrentMap.remove(Object key, Object value) API:
Define a wrapper type on top of the mutable type you use for the values, also defining your custom equals() method building on top of the current mutable value.
In your BiConsumer (the lambda you're passing to forEach), create a deep copy of the value (which is of type your new wrapper type) and perform your logic determining whether the value needs to be removed on the copy.
If the value needs to be removed, call remove(myKey, myValueCopy).
If there have been some concurrent changes while you were calculating whether the value needs to be removed, remove(myKey, myValueCopy) will return false (barring ABA problems, which are a separate topic).
Here's some code illustrating this:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
public class Playground {
private static class AtomicIntegerWrapper {
private final AtomicInteger value;
AtomicIntegerWrapper(int value) {
this.value = new AtomicInteger(value);
}
public void set(int value) {
this.value.set(value);
}
public int get() {
return this.value.get();
}
#Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!(obj instanceof AtomicIntegerWrapper)) {
return false;
}
AtomicIntegerWrapper other = (AtomicIntegerWrapper) obj;
if (other.value.get() == this.value.get()) {
return true;
}
return false;
}
public static AtomicIntegerWrapper deepCopy(AtomicIntegerWrapper wrapper) {
int wrapped = wrapper.get();
return new AtomicIntegerWrapper(wrapped);
}
}
private static final ConcurrentMap<Integer, AtomicIntegerWrapper> MAP
= new ConcurrentHashMap<>();
private static final int NUM_THREADS = 3;
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < 10; ++i) {
MAP.put(i, new AtomicIntegerWrapper(1));
}
Thread.sleep(1);
for (int i = 0; i < NUM_THREADS; ++i) {
new Thread(() -> {
Random rnd = new Random();
while (!MAP.isEmpty()) {
MAP.forEach((key, value) -> {
AtomicIntegerWrapper elem = MAP.get(key);
if (elem == null) {
System.out.println("Oops...");
} else if (elem.get() == 1986) {
elem.set(1);
} else if ((rnd.nextInt() & 128) == 0) {
elem.set(1986);
}
});
}
}).start();
}
Thread.sleep(1);
new Thread(() -> {
Random rnd = new Random();
while (!MAP.isEmpty()) {
MAP.forEach((key, value) -> {
AtomicIntegerWrapper elem =
AtomicIntegerWrapper.deepCopy(MAP.get(key));
if (elem.get() == 1986) {
try {
Thread.sleep(10);
} catch (Exception e) {}
boolean replaced = MAP.remove(key, elem);
if (!replaced) {
System.out.println("Bailed out!");
} else {
System.out.println("Replaced!");
}
}
});
}
}).start();
}
}
You'll see printouts of "Bailed out!", intermixed with "Replaced!" (removal was successful, as there were no concurrent updates that you care about) and the calculation will stop at some point.
If you remove the custom equals() method and continue to use a copy, you'll see an endless stream of "Bailed out!", because the copy is never considered equal to the value in the map.
If you don't use a copy, you won't see "Bailed out!" printed out, and you'll hit the problem you're explaining - values are removed regardless of concurrent changes.

Let us consider what options you have.
Create your own Container-class with isUpdated() operation and use your own workaround.
If your map contains just a few elements and you are iterating over the map very frequently compared against put/delete operation. It could be a good choice to use CopyOnWriteArrayList
CopyOnWriteArrayList<Entry<Integer, Integer>> lookupArray = ...;
The other option is to implement your own CopyOnWriteMap
public class CopyOnWriteMap<K, V> implements Map<K, V>{
private volatile Map<K, V> currentMap;
public V put(K key, V value) {
synchronized (this) {
Map<K, V> newOne = new HashMap<K, V>(this.currentMap);
V val = newOne.put(key, value);
this.currentMap = newOne; // atomic operation
return val;
}
}
public V remove(Object key) {
synchronized (this) {
Map<K, V> newOne = new HashMap<K, V>(this.currentMap);
V val = newOne.remove(key);
this.currentMap = newOne; // atomic operation
return val;
}
}
[...]
}
There is a negative side effect. If you are using copy-on-write Collections your updates will be never lost, but you can see some former deleted entry again.
Worst case: deleted entry will be restored every time if map get copied.

How to pop the first element in a java stream? [duplicate]

Java 8 introduced a Stream class that resembles Scala's Stream, a powerful lazy construct using which it is possible to do something like this very concisely:
def from(n: Int): Stream[Int] = n #:: from(n+1)
def sieve(s: Stream[Int]): Stream[Int] = {
s.head #:: sieve(s.tail filter (_ % s.head != 0))
}
val primes = sieve(from(2))
primes takeWhile(_ < 1000) print // prints all primes less than 1000
I wondered if it is possible to do this in Java 8, so I wrote something like this:
IntStream from(int n) {
return IntStream.iterate(n, m -> m + 1);
}
IntStream sieve(IntStream s) {
int head = s.findFirst().getAsInt();
return IntStream.concat(IntStream.of(head), sieve(s.skip(1).filter(n -> n % head != 0)));
}
IntStream primes = sieve(from(2));
Fairly simple, but it produces java.lang.IllegalStateException: stream has already been operated upon or closed because both findFirst() and skip() are terminal operations on Stream which can be done only once.
I don't really have to use up the stream twice since all I need is the first number in the stream and the rest as another stream, i.e. equivalent of Scala's Stream.head and Stream.tail. Is there a method in Java 8 Stream that I can use to achieve this?
Thanks.

Even if you hadn’t the problem that you can’t split an IntStream, you code didn’t work because you are invoking your sieve method recursively instead of lazily. So you had an infinity recursion before you could query your resulting stream for the first value.
Splitting an IntStream s into a head and a tail IntStream (which has not yet consumed) is possible:
PrimitiveIterator.OfInt it = s.iterator();
int head = it.nextInt();
IntStream tail = IntStream.generate(it::next).filter(i -> i % head != 0);
At this place you need a construct of invoking sieve on the tail lazily. Stream does not provide that; concat expects existing stream instances as arguments and you can’t construct a stream invoking sieve lazily with a lambda expression as lazy creation works with mutable state only which lambda expressions do not support. If you don’t have a library implementation hiding the mutable state you have to use a mutable object. But once you accept the requirement of mutable state, the solution can be even easier than your first approach:
IntStream primes = from(2).filter(i -> p.test(i)).peek(i -> p = p.and(v -> v % i != 0));
IntPredicate p = x -> true;
IntStream from(int n)
{
return IntStream.iterate(n, m -> m + 1);
}
This will recursively create a filter but in the end it doesn’t matter whether you create a tree of IntPredicates or a tree of IntStreams (like with your IntStream.concat approach if it did work). If you don’t like the mutable instance field for the filter you can hide it in an inner class (but not in a lambda expression…).

My StreamEx library has now headTail() operation which solves the problem:
public static StreamEx<Integer> sieve(StreamEx<Integer> input) {
return input.headTail((head, tail) ->
sieve(tail.filter(n -> n % head != 0)).prepend(head));
}
The headTail method takes a BiFunction which will be executed at most once during the stream terminal operation execution. So this implementation is lazy: it does not compute anything until traversal starts and computes only as much prime numbers as requested. The BiFunction receives a first stream element head and the stream of the rest elements tail and can modify the tail in any way it wants. You may use it with predefined input:
sieve(IntStreamEx.range(2, 1000).boxed()).forEach(System.out::println);
But infinite stream work as well
sieve(StreamEx.iterate(2, x -> x+1)).takeWhile(x -> x < 1000)
.forEach(System.out::println);
// Not the primes till 1000, but 1000 first primes
sieve(StreamEx.iterate(2, x -> x+1)).limit(1000).forEach(System.out::println);
There's also alternative solution using headTail and predicate concatenation:
public static StreamEx<Integer> sieve(StreamEx<Integer> input, IntPredicate isPrime) {
return input.headTail((head, tail) -> isPrime.test(head)
? sieve(tail, isPrime.and(n -> n % head != 0)).prepend(head)
: sieve(tail, isPrime));
}
sieve(StreamEx.iterate(2, x -> x+1), i -> true).limit(1000).forEach(System.out::println);
It interesting to compare recursive solutions: how many primes they capable to generate.
#John McClean solution (StreamUtils)
John McClean solutions are not lazy: you cannot feed them with infinite stream. So I just found by trial-and-error the maximal allowed upper bound (17793) (after that StackOverflowError occurs):
public void sieveTest(){
sieve(IntStream.range(2, 17793).boxed()).forEach(System.out::println);
}
#John McClean solution (Streamable)
public void sieveTest2(){
sieve(Streamable.range(2, 39990)).forEach(System.out::println);
}
Increasing upper limit above 39990 results in StackOverflowError.
#frhack solution (LazySeq)
LazySeq<Integer> ints = integers(2);
LazySeq primes = sieve(ints); // sieve method from #frhack answer
primes.forEach(p -> System.out.println(p));
Result: stuck after prime number = 53327 with enormous heap allocation and garbage collection taking more than 90%. It took several minutes to advance from 53323 to 53327, so waiting more seems impractical.
#vidi solution
Prime.stream().forEach(System.out::println);
Result: StackOverflowError after prime number = 134417.
My solution (StreamEx)
sieve(StreamEx.iterate(2, x -> x+1)).forEach(System.out::println);
Result: StackOverflowError after prime number = 236167.
#frhack solution (rxjava)
Observable<Integer> primes = Observable.from(()->primesStream.iterator());
primes.forEach((x) -> System.out.println(x.toString()));
Result: StackOverflowError after prime number = 367663.
#Holger solution
IntStream primes=from(2).filter(i->p.test(i)).peek(i->p=p.and(v->v%i!=0));
primes.forEach(System.out::println);
Result: StackOverflowError after prime number = 368089.
My solution (StreamEx with predicate concatenation)
sieve(StreamEx.iterate(2, x -> x+1), i -> true).forEach(System.out::println);
Result: StackOverflowError after prime number = 368287.
So three solutions involving predicate concatenation win, because each new condition adds only 2 more stack frames. I think, the difference between them is marginal and should not be considered to define a winner. However I like my first StreamEx solution more as it more similar to Scala code.

The solution below does not do state mutations, except for the head/tail deconstruction of the stream.
The lazyness is obtained using IntStream.iterate. The class Prime is used to keep the generator state
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;
public class Prime {
private final IntStream candidates;
private final int current;
private Prime(int current, IntStream candidates)
{
this.current = current;
this.candidates = candidates;
}
private Prime next()
{
PrimitiveIterator.OfInt it = candidates.filter(n -> n % current != 0).iterator();
int head = it.next();
IntStream tail = IntStream.generate(it::next);
return new Prime(head, tail);
}
public static Stream<Integer> stream() {
IntStream possiblePrimes = IntStream.iterate(3, i -> i + 1);
return Stream.iterate(new Prime(2, possiblePrimes), Prime::next)
.map(p -> p.current);
}
}
The usage would be this:
Stream<Integer> first10Primes = Prime.stream().limit(10)

You can essentially implement it like this:
static <T> Tuple2<Optional<T>, Seq<T>> splitAtHead(Stream<T> stream) {
Iterator<T> it = stream.iterator();
return tuple(it.hasNext() ? Optional.of(it.next()) : Optional.empty(), seq(it));
}
In the above example, Tuple2 and Seq are types borrowed from jOOλ, a library that we developed for jOOQ integration tests. If you don't want any additional dependencies, you might as well implement them yourself:
class Tuple2<T1, T2> {
final T1 v1;
final T2 v2;
Tuple2(T1 v1, T2 v2) {
this.v1 = v1;
this.v2 = v2;
}
static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
return new Tuple<>(v1, v2);
}
}
static <T> Tuple2<Optional<T>, Stream<T>> splitAtHead(Stream<T> stream) {
Iterator<T> it = stream.iterator();
return tuple(
it.hasNext() ? Optional.of(it.next()) : Optional.empty,
StreamSupport.stream(Spliterators.spliteratorUnknownSize(
it, Spliterator.ORDERED
), false)
);
}

If you don't mind using 3rd party libraries cyclops-streams, I library I wrote has a number of potential solutions.
The StreamUtils class has large number of static methods for working directly with java.util.stream.Streams including headAndTail.
HeadAndTail<Integer> headAndTail = StreamUtils.headAndTail(Stream.of(1,2,3,4));
int head = headAndTail.head(); //1
Stream<Integer> tail = headAndTail.tail(); //Stream[2,3,4]
The Streamable class represents a replayable Stream and works by building a lazy, caching intermediate data-structure. Because it is caching and repayable - head and tail can be implemented directly and separately.
Streamable<Integer> replayable= Streamable.fromStream(Stream.of(1,2,3,4));
int head = repayable.head(); //1
Stream<Integer> tail = replayable.tail(); //Stream[2,3,4]
cyclops-streams also provides a sequential Stream extension that in turn extends jOOλ and has both Tuple based (from jOOλ) and domain object (HeadAndTail) solutions for head and tail extraction.
SequenceM.of(1,2,3,4)
.splitAtHead(); //Tuple[1,SequenceM[2,3,4]
SequenceM.of(1,2,3,4)
.headAndTail();
Update per Tagir's request -> A Java version of the Scala sieve using SequenceM
public void sieveTest(){
sieve(SequenceM.range(2, 1_000)).forEach(System.out::println);
}
SequenceM<Integer> sieve(SequenceM<Integer> s){
return s.headAndTailOptional().map(ht ->SequenceM.of(ht.head())
.appendStream(sieve(ht.tail().filter(n -> n % ht.head() != 0))))
.orElse(SequenceM.of());
}
And another version via Streamable
public void sieveTest2(){
sieve(Streamable.range(2, 1_000)).forEach(System.out::println);
}
Streamable<Integer> sieve(Streamable<Integer> s){
return s.size()==0? Streamable.of() : Streamable.of(s.head())
.appendStreamable(sieve(s.tail()
.filter(n -> n % s.head() != 0)));
}
Note - neither Streamable of SequenceM have an Empty implementation - hence the size check for Streamable and the use of headAndTailOptional.
Finally a version using plain java.util.stream.Stream
import static com.aol.cyclops.streams.StreamUtils.headAndTailOptional;
public void sieveTest(){
sieve(IntStream.range(2, 1_000).boxed()).forEach(System.out::println);
}
Stream<Integer> sieve(Stream<Integer> s){
return headAndTailOptional(s).map(ht ->Stream.concat(Stream.of(ht.head())
,sieve(ht.tail().filter(n -> n % ht.head() != 0))))
.orElse(Stream.of());
}
Another update - a lazy iterative based on #Holger's version using objects rather than primitives (note a primitive version is also possible)
final Mutable<Predicate<Integer>> predicate = Mutable.of(x->true);
SequenceM.iterate(2, n->n+1)
.filter(i->predicate.get().test(i))
.peek(i->predicate.mutate(p-> p.and(v -> v%i!=0)))
.limit(100000)
.forEach(System.out::println);

There are many interesting suggestions provided here, but if someone needs a solution without dependencies to third party libraries I came up with this:
import java.util.AbstractMap;
import java.util.Optional;
import java.util.Spliterators;
import java.util.stream.StreamSupport;
/**
* Splits a stream in the head element and a tail stream.
* Parallel streams are not supported.
*
* #param stream Stream to split.
* #param <T> Type of the input stream.
* #return A map entry where {#link Map.Entry#getKey()} contains an
* optional with the first element (head) of the original stream
* and {#link Map.Entry#getValue()} the tail of the original stream.
* #throws IllegalArgumentException for parallel streams.
*/
public static <T> Map.Entry<Optional<T>, Stream<T>> headAndTail(final Stream<T> stream) {
if (stream.isParallel()) {
throw new IllegalArgumentException("parallel streams are not supported");
}
final Iterator<T> iterator = stream.iterator();
return new AbstractMap.SimpleImmutableEntry<>(
iterator.hasNext() ? Optional.of(iterator.next()) : Optional.empty(),
StreamSupport.stream(Spliterators.spliteratorUnknownSize(iterator, 0), false)
);
}

To get head and tail you need a Lazy Stream implementation. Java 8 stream or RxJava are not suitable.
You can use for example LazySeq as follows.
Lazy sequence is always traversed from the beginning using very cheap
first/rest decomposition (head() and tail())
LazySeq implements java.util.List interface, thus can be used in
variety of places. Moreover it also implements Java 8 enhancements to
collections, namely streams and collectors
package com.company;
import com.nurkiewicz.lazyseq.LazySeq;
public class Main {
public static void main(String[] args) {
LazySeq<Integer> ints = integers(2);
LazySeq primes = sieve(ints);
primes.take(10).forEach(p -> System.out.println(p));
}
private static LazySeq<Integer> sieve(LazySeq<Integer> s) {
return LazySeq.cons(s.head(), () -> sieve(s.filter(x -> x % s.head() != 0)));
}
private static LazySeq<Integer> integers(int from) {
return LazySeq.cons(from, () -> integers(from + 1));
}
}

Here is another recipe using the way suggested by Holger.
It use RxJava just to add the possibility to use the take(int) method and many others.
package com.company;
import rx.Observable;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;
public class Main {
public static void main(String[] args) {
final IntPredicate[] p={(x)->true};
IntStream primesStream=IntStream.iterate(2,n->n+1).filter(i -> p[0].test(i)).peek(i->p[0]=p[0].and(v->v%i!=0) );
Observable primes = Observable.from(()->primesStream.iterator());
primes.take(10).forEach((x) -> System.out.println(x.toString()));
}
}

This should work with parallel streams as well:
public static <T> Map.Entry<Optional<T>, Stream<T>> headAndTail(final Stream<T> stream) {
final AtomicReference<Optional<T>> head = new AtomicReference<>(Optional.empty());
final var spliterator = stream.spliterator();
spliterator.tryAdvance(x -> head.set(Optional.of(x)));
return Map.entry(head.get(), StreamSupport.stream(spliterator, stream.isParallel()));
}

If you want to get head of a stream, just:
IntStream.range(1, 5).first();
If you want to get tail of a stream, just:
IntStream.range(1, 5).skip(1);
If you want to get both head and tail of a stream, just:
IntStream s = IntStream.range(1, 5);
int head = s.head();
IntStream tail = s.tail();
If you want to find the prime, just:
LongStream.range(2, n)
.filter(i -> LongStream.range(2, (long) Math.sqrt(i) + 1).noneMatch(j -> i % j == 0))
.forEach(N::println);
If you want to know more, go to get abacus-common
Declaration： I'm the developer of abacus-common.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java Streams - Buffering huge streams - java

Related

How to remove all elements that match a certain condition except for N greatest of them with Stream API

How can I prevent mutation of a list of iterators?

Java 8 One Stream To Multiple Map

Iterate over ConcurrentHashMap while deleting entries

How to pop the first element in a java stream? [duplicate]

Categories

Resources