How can I check whether a stream instance has already been consumed (meaning that a terminal operation has been called on it, so that any further call to a terminal operation would fail with IllegalStateException: stream has already been operated upon or closed)?
Ideally I want a method that does not consume the stream if it has not yet been consumed, and that returns false if the stream has been consumed, without catching an IllegalStateException from a stream method (because using exceptions for control flow is expensive and error prone, in particular when using standard exceptions).
In other words, a method similar to hasNext() in Iterator in its exception-throwing and boolean-returning behavior (though without the contract tying it to next()).
Example:
public void consume(java.util.function.Consumer<Stream<?>> consumer, Stream<?> stream) {
consumer.accept(stream);
// defensive programming, check state
if (...) {
throw new IllegalStateException("consumer must call terminal operation on stream");
}
}
The goal is to fail early if client code calls this method without consuming the stream.
It seems there is no method to do that and I'd have to add a try-catch block calling any terminal operation like iterator(), catch an exception and throw a new one.
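For illustration, a minimal sketch of that workaround (the helper name is mine; note that it consumes a not-yet-consumed stream as a side effect and relies on an exception for control flow, which is exactly what I want to avoid):

private static boolean isAlreadyConsumed(Stream<?> stream) {
    try {
        stream.iterator(); // any terminal operation would do
        return false;
    } catch (IllegalStateException e) {
        return true;
    }
}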
An acceptable answer can also be "No solution exists" with a good justification of why the specification could not add such a method (if a good justification exists). It seems that the JDK streams usually have this snippet at the start of their terminal methods:
// in AbstractPipeline.java
if (linkedOrConsumed)
throw new IllegalStateException(MSG_STREAM_LINKED);
So for those streams, an implementation of such a method would not seem that difficult.
Taking into consideration that spliterator (for example) is a terminal operation, you can simply create a method like:
private static <T> Optional<Stream<T>> isConsumed(Stream<T> stream) {
Spliterator<T> spliterator;
try {
spliterator = stream.spliterator();
} catch (IllegalStateException ise) {
return Optional.empty();
}
return Optional.of(StreamSupport.stream(
() -> spliterator,
spliterator.characteristics(),
stream.isParallel()));
}
I don't know of a better way to do it... And usage would be:
Stream<Integer> ints = Stream.of(1, 2, 3, 4)
.filter(x -> x < 3);
YourClass.isConsumed(ints)
.ifPresent(x -> x.forEachOrdered(System.out::println));
Since I don't think there is a practical reason to return an already consumed Stream, I am returning Optional.empty() instead.
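For completeness, a quick sketch of the already-consumed case with the method above (count() is just one way to consume the stream):

Stream<Integer> consumed = Stream.of(1, 2, 3);
consumed.count(); // terminal operation, the stream is consumed now
YourClass.isConsumed(consumed)
    .ifPresent(x -> x.forEachOrdered(System.out::println)); // prints nothing, the Optional is empty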
One solution could be to add an intermediate operation (e.g. filter()) to the stream before passing it to the consumer. In that operation you do nothing but save the fact that the operation was called (e.g. with an AtomicBoolean):
public <T> void consume(Consumer<Stream<T>> consumer, Stream<T> stream) {
AtomicBoolean consumed = new AtomicBoolean(false);
consumer.accept(stream.filter(i -> {
consumed.set(true);
return true;
}));
if (!consumed.get()) {
throw new IllegalStateException("consumer must call terminal operation on stream");
}
}
Side Note: Do not use peek() for this, because it is not called with short-circuiting terminal operations (like findAny()).
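A quick usage sketch of the consume(...) method above (the lambdas are arbitrary examples):

consume(s -> s.forEach(System.out::println), Stream.of(1, 2, 3)); // fine, forEach is terminal
consume(s -> s.map(String::valueOf), Stream.of(1, 2, 3));         // throws IllegalStateException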
Here is a standalone compilable solution that uses a delegating custom Spliterator<T> implementation + an AtomicBoolean to accomplish what you seek without losing thread-safety or affecting the parallelism of a Stream<T>.
The main entry is the Stream<T> track(Stream<T> input, Consumer<Stream<T>> callback) function - you can do whatever you want in the callback function. I first tinkered with a delegating Stream<T> implementation but it's just too big an interface to delegate without any issues (see my code comment, even Spliterator<T> has its caveats when delegating):
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
class StackOverflowQuestion56927548Scratch {
private static class TrackingSpliterator<T> implements Spliterator<T> {
private final AtomicBoolean tracker;
private final Spliterator<T> delegate;
private final Runnable callback;
public TrackingSpliterator(Stream<T> forStream, Runnable callback) {
this(new AtomicBoolean(true), forStream.spliterator(), callback);
}
private TrackingSpliterator(
AtomicBoolean tracker,
Spliterator<T> delegate,
Runnable callback
) {
this.tracker = tracker;
this.delegate = delegate;
this.callback = callback;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
boolean advanced = delegate.tryAdvance(action);
if(tracker.compareAndSet(true, false)) {
callback.run();
}
return advanced;
}
@Override
public Spliterator<T> trySplit() {
Spliterator<T> split = this.delegate.trySplit();
//may return null according to JavaDoc
if(split == null) {
return null;
}
return new TrackingSpliterator<>(tracker, split, callback);
}
@Override
public long estimateSize() {
return delegate.estimateSize();
}
@Override
public int characteristics() {
return delegate.characteristics();
}
}
public static <T> Stream<T> track(Stream<T> input, Consumer<Stream<T>> callback) {
return StreamSupport.stream(
new TrackingSpliterator<>(input, () -> callback.accept(input)),
input.isParallel()
);
}
public static void main(String[] args) {
//some big stream to show it works correctly when parallelized
Stream<Integer> stream = IntStream.range(0, 100000000)
.mapToObj(Integer::valueOf)
.parallel();
Stream<Integer> trackedStream = track(stream, s -> System.out.println("consume"));
//dummy consume
System.out.println(trackedStream.anyMatch(i -> i.equals(-1)));
}
}
Just return the stream from the track function, maybe adapt the callback parameter's type (you probably don't need to pass the stream), and you are good to go.
Please note that this implementation only tracks when the stream is actually traversed: calling .count() on a Stream that was produced by e.g. IntStream.range(0, 1000) (without any filter steps etc.) will not consume the stream but return the underlying known length of the stream via Spliterator<T>.estimateSize()!
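A hedged illustration of that caveat (the exact behavior depends on the JDK's count() optimization, which can skip traversal for SIZED pipelines on Java 9 and later):

Stream<Integer> sized = track(IntStream.range(0, 1000).boxed(),
        s -> System.out.println("consumed"));
System.out.println(sized.count()); // prints 1000, but "consumed" may never be printed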
I'm trying to implement a simple promise system in java. I'm doing it for special purpose so please don't recommend any libraries.
I have a problem when I try to implement a thenApply() method which takes a Function as a parameter, similar to what CompletableFuture has, and therefore returns a promise of another type.
The promise interface:
public interface Promise<T> {
Promise<T> then(Consumer<T> handler);
<U> Promise<U> thenApply(Function<T, U> handler);
}
My implementation so far:
public class PromiseImpl<T> implements Promise<T> {
private List<Consumer<T>> resultHandlers = new ArrayList<>();
public PromiseImpl(CompletableFuture<T> future) {
future.thenAccept(this::doWork);
}
@Override
public Promise<T> then(Consumer<T> handler) {
resultHandlers.add(handler);
return this;
}
@Override
public <U> Promise<U> thenApply(Function<T, U> handler) {
// How to implement here??? I don't have the result yet
handler.apply(?);
}
private void onResult(T result) {
for (Consumer<T> handler : resultHandlers) {
handler.accept(result);
}
}
private Object doWork(T result) {
onResult(result);
return null;
}
}
The problem is that I don't know the result of my initial future in the thenApply() method, so I cannot call my handler. Also, I don't want to call future.get() because this method is blocking.
How could I make this work?
The real problem is in the design of your Promise type: it holds a set of callbacks, all of which are invoked on completion, which fundamentally limits what a generic thenApply can do with the return type of its function. This can be resolved by changing your Promise implementation to return a new promise whenever a handler is registered, instead of returning this, so that each promise object has its own handler to invoke.
In addition to solving this, it's a better design for functional-style programming, as you can make your Promise objects immutable.
I would change the interface to be:
interface Promise<T> {
<U> Promise<U> thenApply(Function<T, U> handler);
Promise<Void> thenAccept(Consumer<T> consumer);
}
The "chaining" of callbacks can then be done around the future objects to which chained Promise instances have references. So the implementation can look like:
class PromiseImpl<T> implements Promise<T> {
private CompletableFuture<T> future;
public PromiseImpl(CompletableFuture<T> future) {
this.future = future;
}
@Override
public <U> Promise<U> thenApply(Function<T, U> function) {
return new PromiseImpl<>(this.future.thenApply(function));
}
@Override
public Promise<Void> thenAccept(Consumer<T> consumer) {
return new PromiseImpl<>(this.future.thenAccept(consumer));
}
private void onResult(T result) {
this.future.complete(result);
}
private Object doWork(T result) {
onResult(result);
return null;
}
}
And using that can be as simple as:
Promise<String> stringPromise = new PromiseImpl<>(new CompletableFuture<String>());
Promise<Long> longPromise = stringPromise.thenApply(str -> Long.valueOf(str.length()));
Promise<Void> voidPromise = stringPromise.thenAccept(str -> System.out.println(str));
EDIT:
Regarding Michael's comment about retrieving the value: that was not added as it wasn't in the original Promise API. But it's easy enough to add:
T get(); //To the interface
And implemented with:
public T get() {
//try-catch
return this.future.get();
}
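One hedged way to flesh out that //try-catch, translating the future's checked exceptions into unchecked ones (the wrapper exception types are my choice, not part of the original API; assumes the usual java.util.concurrent imports):

@Override
public T get() {
    try {
        return this.future.get();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for the promise", e);
    } catch (ExecutionException e) {
        throw new IllegalStateException("Promise completed exceptionally", e.getCause());
    }
}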
Note: this is starting to look more and more like a duplication of CompletableFuture, which raises the question of why do this at all. But assuming there will be additional Promise-like methods in this interface, the method would be wrapping the future API.
If you need to use the same Promise object with a list of callbacks, then you have no choice but to parameterize the Promise interface with concrete type parameters for both sides of the Function:
public interface Promise<T, U>
And U wouldn't be able to be a method generic parameter on then or thenApply.
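A hedged sketch of what that two-parameter shape would look like (the names mirror the original interface; this is only to illustrate the limitation):

public interface Promise<T, U> {
    Promise<T, U> then(Consumer<T> handler);
    Promise<T, U> thenApply(Function<T, U> handler); // U is fixed per promise, not per call
}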
If you want to keep the rest of your class the same and just implement the thenApply method, you have to make a new CompletableFuture since that's the only way you currently have to construct a new Promise:
@Override
public <U> Promise<U> thenApply(Function<T, U> handler) {
CompletableFuture<U> downstream = new CompletableFuture<>();
this.then(t -> downstream.complete(handler.apply(t)));
return new PromiseImpl<>(downstream);
}
If you can add a private no-argument constructor for PromiseImpl, you can avoid making a new CompletableFuture:
@Override
public <U> Promise<U> thenApply(Function<T, U> handler) {
PromiseImpl<U> result = new PromiseImpl<>();
this.then(t -> result.doWork(handler.apply(t)));
return result;
}
But really what you should do if you want to implement your own API on top of CompletableFuture is use the decorator pattern and wrap a CompletableFuture instance as a private variable in PromiseImpl.
You can return an anonymous class that extends your PromiseImpl and overrides onResult so that the handlers receive the result of applying the mapper function (onResult would have to be made non-private for that). Do not forget to call the parent onResult so the parent's handlers are still invoked.
When executing async CompletableFuture tasks, the parent thread's context, and in particular the org.slf4j.MDC context, is lost.
This is bad as I'm using some kind of "fish tagging" to track logs from one request among multiple logfiles.
MDC.put("fishid", randomId())
Question: how can I retain that id during the tasks of CompletableFutures in general?
List<CompletableFuture<UpdateHotelAllotmentsRsp>> futures =
tasks.stream()
.map(task -> CompletableFuture.supplyAsync(
() -> businesslogic(task)))
.collect(Collectors.toList());
List results = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
public void businesslogic(Task task) {
LOGGER.info("mdc fishtag context is lost here");
}
The most readable way I solved this problem was as below -
---------------Thread utils class--------------------
public static Runnable withMdc(Runnable runnable) {
Map<String, String> mdc = MDC.getCopyOfContextMap();
return () -> {
MDC.setContextMap(mdc);
runnable.run();
};
}
public static <U> Supplier<U> withMdc(Supplier<U> supplier) {
Map<String, String> mdc = MDC.getCopyOfContextMap();
return () -> {
MDC.setContextMap(mdc);
return supplier.get();
};
}
---------------Usage--------------
CompletableFuture.supplyAsync(withMdc(() -> someSupplier()))
.thenRunAsync(withMdc(() -> someRunnable()))
....
The withMdc() methods in the thread utils class would have to be overloaded to cover the other functional interfaces accepted by CompletableFuture.
Please note that the withMdc() method is statically imported to improve readability.
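For example, a hedged sketch of one more overload in the same style, for Function (handy with thenApplyAsync and friends):

public static <T, R> Function<T, R> withMdc(Function<T, R> function) {
    Map<String, String> mdc = MDC.getCopyOfContextMap();
    return t -> {
        MDC.setContextMap(mdc);
        return function.apply(t);
    };
}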
At the end I created a Supplier wrapper retaining the MDC. If anyone has a better idea feel free to comment.
public static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier, Executor executor) {
return CompletableFuture.supplyAsync(new SupplierMDC<>(supplier), executor);
}
private static class SupplierMDC<T> implements Supplier<T> {
private final Supplier<T> delegate;
private final Map<String, String> mdc;
public SupplierMDC(Supplier<T> delegate) {
this.delegate = delegate;
this.mdc = MDC.getCopyOfContextMap();
}
@Override
public T get() {
MDC.setContextMap(mdc);
return delegate.get();
}
}
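A usage sketch for the wrapper above (assuming the supplyAsync wrapper is statically imported; executor is whatever pool you already use):

MDC.put("fishid", "42");
CompletableFuture<String> future = supplyAsync(() -> {
    // runs on a pool thread, but still sees the submitter's MDC
    return MDC.get("fishid"); // "42"
}, executor);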
My solution theme would be to make the complete ecosystem aware of MDC (it works with JDK 9+, as a couple of overridable methods are exposed since that version).
And for that, we need to address the following scenarios:
Where do we get new instances of CompletableFuture from within this class? → We need to return an MDC-aware version instead.
Where do we get new instances of CompletableFuture from outside this class? → We need to return an MDC-aware version instead.
Which executor is used, and when, inside the CompletableFuture class? → In all circumstances, we need to make sure that all executors are MDC-aware.
For that, let's create an MDC-aware version of CompletableFuture by extending it. My version would look like this:
import org.slf4j.MDC;
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;
import java.util.function.Supplier;
public class MDCAwareCompletableFuture<T> extends CompletableFuture<T> {
public static final ExecutorService MDC_AWARE_ASYNC_POOL = new MDCAwareForkJoinPool();
@Override
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MDCAwareCompletableFuture<>();
}
@Override
public Executor defaultExecutor() {
return MDC_AWARE_ASYNC_POOL;
}
public static <T> CompletionStage<T> getMDCAwareCompletionStage(CompletableFuture<T> future) {
return new MDCAwareCompletableFuture<>()
.completeAsync(() -> null)
.thenCombineAsync(future, (aVoid, value) -> value);
}
public static <T> CompletionStage<T> getMDCHandledCompletionStage(CompletableFuture<T> future,
Function<Throwable, T> throwableFunction) {
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return getMDCAwareCompletionStage(future)
.handle((value, throwable) -> {
setMDCContext(contextMap);
if (throwable != null) {
return throwableFunction.apply(throwable);
}
return value;
});
}
}
The MDCAwareForkJoinPool class would look like this (I have skipped the methods with ForkJoinTask parameters for simplicity):
public class MDCAwareForkJoinPool extends ForkJoinPool {
//Override constructors which you need
@Override
public <T> ForkJoinTask<T> submit(Callable<T> task) {
return super.submit(MDCUtility.wrapWithMdcContext(task));
}
@Override
public <T> ForkJoinTask<T> submit(Runnable task, T result) {
return super.submit(wrapWithMdcContext(task), result);
}
@Override
public ForkJoinTask<?> submit(Runnable task) {
return super.submit(wrapWithMdcContext(task));
}
@Override
public void execute(Runnable task) {
super.execute(wrapWithMdcContext(task));
}
}
The wrapping utility methods would look like this:
public static <T> Callable<T> wrapWithMdcContext(Callable<T> task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
return task.call();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static Runnable wrapWithMdcContext(Runnable task) {
//save the current MDC context
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
setMDCContext(contextMap);
try {
task.run();
} finally {
// once the task is complete, clear MDC
MDC.clear();
}
};
}
public static void setMDCContext(Map<String, String> contextMap) {
MDC.clear();
if (contextMap != null) {
MDC.setContextMap(contextMap);
}
}
Below are some guidelines for usage:
Use the class MDCAwareCompletableFuture rather than the class CompletableFuture.
A couple of methods in the class CompletableFuture instantiate a plain CompletableFuture internally (most of the public static methods do). For such methods, use an alternative that yields an MDCAwareCompletableFuture. For example, rather than using CompletableFuture.supplyAsync(...), you can choose new MDCAwareCompletableFuture<>().completeAsync(...).
Convert a CompletableFuture into an MDCAwareCompletableFuture using the method getMDCAwareCompletionStage when you get stuck with one, say because some external library returns you a plain CompletableFuture. Obviously, you can't retain the context within that library, but this method will still retain the context once control returns to your application code.
While supplying an executor as a parameter, make sure that it is MDC-aware, such as MDCAwareForkJoinPool. You could also create an MDCAwareThreadPoolExecutor by overriding its execute method to serve your use case (a minimal sketch follows below). You get the idea!
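A minimal sketch of that MDCAwareThreadPoolExecutor idea, reusing the wrapWithMdcContext(Runnable) helper shown above (constructor parameters trimmed to one common overload):

public class MDCAwareThreadPoolExecutor extends ThreadPoolExecutor {

    public MDCAwareThreadPoolExecutor(int corePoolSize, int maximumPoolSize,
                                      long keepAliveTime, TimeUnit unit,
                                      BlockingQueue<Runnable> workQueue) {
        super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue);
    }

    @Override
    public void execute(Runnable command) {
        // capture the submitter's MDC and restore it on the worker thread
        super.execute(wrapWithMdcContext(command));
    }
}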
With that, your code would look like this:
List<CompletableFuture<UpdateHotelAllotmentsRsp>> futures =
tasks.stream()
.map(task -> new MDCAwareCompletableFuture<UpdateHotelAllotmentsRsp>().completeAsync(
() -> businesslogic(task)))
.collect(Collectors.toList());
List results = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
public UpdateHotelAllotmentsRsp businesslogic(Task task) {
LOGGER.info("mdc fishtag context is not lost here");
}
You can find a detailed explanation of all of the above in a post about the same topic.
YES, Twitter Future did this correctly. They have a class Local.scala that Future.scala knows about.
The fix is for the Java authors to fix this issue so your Local state travels through ALL libraries that use CompletableFutures. Basically, Local.scala is used by Future and internally uses a ThreadLocal up until .thenApply or .thenAccept; it captures the state and transfers it to the next stage, on and on. This works through all third-party libraries with ZERO 3rd-party library changes.
Here is more, but poke the Java authors to fix their stuff...
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-May/047867.html
Until then, MDC will NEVER work through 3rd-party libraries.
My SO post on this
Does CompletableFuture have a corresponding Local context?
I'd like to create a stream based on a Supplier (which is normally infinite), but have the stream terminate when the supplier returns null. Or is there a better way to do this that I'm missing? I made this myself, but it seems like a fair amount of work to accomplish a pretty simple concept.
public class NullTerminatedStreamFactory {
static int characteristics = Spliterator.ORDERED | Spliterator.DISTINCT;
public static<T> Stream<T> makeNullTerminatedStream(Supplier<T> supplier) {
return StreamSupport.stream(new NullTerminatedSpliteratorFromSupplier<>(supplier, Long.MAX_VALUE, characteristics), false);
}
static class NullTerminatedSpliteratorFromSupplier<T> extends Spliterators.AbstractSpliterator<T> {
public NullTerminatedSpliteratorFromSupplier(Supplier<T> supplier, long est, int additionalCharacteristics) {
super(est, additionalCharacteristics);
this.supplier = supplier;
}
public Supplier<T> supplier;
@Override
public boolean tryAdvance(Consumer<? super T> action) {
T next = supplier.get();
if (next != null) {
action.accept(next);
return true;
}
return false;
}
}
}
For the record, I'm using it like this, to basically create a Stream from a BlockingQueue:
NullTerminatedStreamFactory.makeNullTerminatedStream(() -> {
try {
BlockingQueue<Message> queue = getBlockingQueue();
return queue.poll(1, TimeUnit.SECONDS);
} catch (Exception e) {
log.error("Exception while trying to get message from queue", e);
}
return null;
});
You've already found a perfectly valid hand-made implementation.
As mentioned in the comments, Java 9 seems to add a takeWhile(Predicate) method. Until then, you could use a third-party library that implements something like takeWhile():
jOOλ
jOOλ has limitWhile(), which does the same thing:
Seq.generate(supplier).limitWhile(Objects::nonNull);
(disclaimer, I work for the company behind jOOλ)
Javaslang
Javaslang implemented their own Stream class, which is inspired by the Scala collections and thus has takeWhile():
Stream.gen(supplier).takeWhile(Objects::nonNull);
Functional Java
Functional Java also ships with its own Stream implementation, which has a takeWhile() method:
Stream.fromFunction(i -> supplier.get()).takeWhile(o -> o != null);
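For reference, on Java 9 and later the built-in takeWhile mentioned at the top of this answer makes the whole thing a one-liner:

Stream.generate(supplier).takeWhile(Objects::nonNull);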
I am trying to achieve a behavior similar to that of an event bus. For my requirements, a PublishSubject seems suitable.
The subject emits items representing a result of some global operation, which might resolve successfully or fail in case of an exception. I can't use onNext() for success events and onError() with the Throwable in case of an error, since once onError() is invoked the subject terminates and any future subscribers will get no emissions apart from an onError() one.
Right now the way I see it I have to create a class representing the event, and optionally referencing a Throwable in case of an error. This however seems unwise, as one would have to handle errors inside onNext().
How would you go about it?
Creating a generic class wrapping events is the way to go. Say we call it ResponseOrError; it should basically contain two fields
private T data;
private Throwable error;
and two simple factory methods:
public static <T> ResponseOrError<T> fromError(Throwable throwable) {
return new ResponseOrError<>(throwable);
}
public static <T> ResponseOrError<T> fromData(T data) {
return new ResponseOrError<>(data);
}
To remove some boilerplate code, you can provide a Transformer that turns an Observable<T> into an Observable<ResponseOrError<T>>:
public static <T> Observable.Transformer<T, ResponseOrError<T>> toResponseOrErrorObservable() {
return new Observable.Transformer<T, ResponseOrError<T>>() {
@Override
public Observable<ResponseOrError<T>> call(final Observable<T> observable) {
return observable
.map(new Func1<T, ResponseOrError<T>>() {
@Override
public ResponseOrError<T> call(final T t) {
return ResponseOrError.fromData(t);
}
})
.onErrorResumeNext(new Func1<Throwable, Observable<? extends ResponseOrError<T>>>() {
@Override
public Observable<? extends ResponseOrError<T>> call(final Throwable throwable) {
return Observable.just(ResponseOrError.<T>fromError(throwable));
}
});
}
};
}
Then you can use it like this:
final Observable<ResponseOrError<ImportantData>> compose = mNetworkService
.getImportantData()
.compose(ResponseOrError.<ImportantData>toResponseOrErrorObservable());
And now you can easily map the result depending on success or failure, or even provide another Transformer returning a mapped Observable<T> instead of Observable<ResponseOrError<T>>.
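A hedged sketch of consuming the wrapped events, assuming plain getters for the two fields shown earlier (those getters are not part of the code above):

compose.subscribe(result -> {
    if (result.getError() != null) {
        // handle the failure without terminating the subject
        System.err.println("failed: " + result.getError());
    } else {
        System.out.println("data: " + result.getData());
    }
});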
I'm a little confused by all my research. I have custom interface called TabularResultSet (which I've watered down for the sake of example) which traverses through any data set that is tabular in nature. It has a next() method like an iterator and it can be looping through a QueryResultSet, a tabbed-table from a clipboard, a CSV, etc...
However, I'm trying to create a Spliterator that wraps around my TabularResultSet and easily turns it into a stream. I cannot imagine a safe way to parallelize because the TabularResultSet could be traversing a QueryResultSet, and calling next() concurrently could wreak havoc. The only way I imagine parallelization can be done safely is to have the next() called by a single working thread and it passes the data off to a parallel thread to work on it.
So I think parallelization is not an easy option. How do I just get this thing to stream without parallelizing? Here is my work so far...
public final class SpliteratorTest {
public static void main(String[] args) {
TabularResultSet rs = null; /* instantiate an implementation; */
Stream<TabularResultSet> rsStream = StreamSupport.stream(new TabularSpliterator(rs), false);
}
public static interface TabularResultSet {
public boolean next();
public List<Object> getData();
}
private static final class TabularSpliterator implements Spliterator<TabularResultSet> {
private final TabularResultSet rs;
public TabularSpliterator(TabularResultSet rs) {
this.rs = rs;
}
@Override
public boolean tryAdvance(Consumer<? super TabularResultSet> action) {
action.accept(rs);
return rs.next();
}
@Override
public Spliterator<TabularResultSet> trySplit() {
return null;
}
@Override
public long estimateSize() {
return Long.MAX_VALUE;
}
@Override
public int characteristics() {
return 0;
}
}
}
It's probably easiest to extend Spliterators.AbstractSpliterator. If you do this, you need only implement tryAdvance. This can be turned into a parallel stream; the parallelism comes from the streams implementation calling tryAdvance multiple times, batching up the data it receives, and processing it in different threads.
If TabularResultSet is anything like a JDBC ResultSet, I don't think you want a Spliterator<TabularResultSet> or a Stream<TabularResultSet>. Instead it looks like a TabularResultSet represents an entire tabular data set, so you probably want each spliterator or stream element to represent one row in that table -- the List<Object> that is returned by getData()? If so, you'd want something like the following.
class TabularSpliterator extends Spliterators.AbstractSpliterator<List<Object>> {
private final TabularResultSet rs;
public TabularSpliterator(TabularResultSet rs) {
super(...);
this.rs = rs;
}
@Override public boolean tryAdvance(Consumer<? super List<Object>> action) {
if (rs.next()) {
action.accept(rs.getData());
return true;
} else {
return false;
}
}
}
Then you can turn an instance of this spliterator into a stream by calling StreamSupport.stream().
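For example, assuming rs is the TabularResultSet instance you already have (and that the elided super(...) call supplies a size estimate and characteristics):

Stream<List<Object>> rows = StreamSupport.stream(new TabularSpliterator(rs), false);
rows.forEach(System.out::println); // or any other terminal operation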
Note: in general, a Spliterator instance is not called from multiple threads and need not even be thread-safe. See the Spliterator class documentation at the paragraph beginning "Despite..." for details.
You're mostly there. All you have to do now is convert your Spliterator into a Stream. You can do that using the StreamSupport.stream(Spliterator, boolean) method. The boolean parameter is a flag for whether you want to do parallel streaming or not (you would want false, for not parallel)
If your TabularResultSet implemented Iterator, you could use the Spliterators.spliteratorUnknownSize() method to convert the Iterator into a Spliterator which basically does what the code you have above does.
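A hedged sketch of that route, adapting the TabularResultSet from the question to an Iterator<List<Object>> (one element of lookahead is needed because next() both advances and reports):

Iterator<List<Object>> it = new Iterator<List<Object>>() {
    private boolean advanced;
    private boolean hasRow;

    @Override
    public boolean hasNext() {
        if (!advanced) {
            hasRow = rs.next(); // advance the underlying result set once
            advanced = true;
        }
        return hasRow;
    }

    @Override
    public List<Object> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false; // the following hasNext() call advances again
        return rs.getData();
    }
};

Stream<List<Object>> rows = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);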
Not sure if it's worth adding characteristics but you might want to consider
Spliterator.IMMUTABLE | Spliterator.ORDERED | Spliterator.NONNULL
good luck