Akka stream behaviour based on outside variable - Java

I have a generic question on Akka Streams.
I need to change the stream's behavior based on a variable outside of Akka. The variable is static and is changed by another piece of code.
How would you achieve this? Simply by checking the variable when processing each element?
For example:
.filterNot(ping -> pingRecieved)
pingRecieved is a static variable in a Java class.

It is legal to have a stream stage check some global state and alter its behavior based on that state.
Whether it's a great idea is another question entirely.
At minimum, you'll want to be aware of the limits and subtleties of the Java Memory Model around visibility. If the code writing to that variable isn't executing on the same thread as the stream stage, there's no guarantee about when (or even whether) the stream stage will see the write. If the writer is outside of Akka, it categorically won't be on the same thread; if it's code executed by an actor on the same dispatcher as the stream stage, it might at some point execute on the same thread, but controlling that requires tradeoffs. Ensuring that visibility (e.g. with volatile or atomics) may in turn have substantial implications for performance.
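For instance, here is a minimal sketch of the flag-checking approach from the question with that caveat addressed: the flag is marked volatile so writes from other threads become visible to the stage (class and field names are assumptions, and this targets the Akka 2.6 Java DSL).

import akka.actor.ActorSystem;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;

public class FlagControlledStream {
    // volatile: writes made by code outside the stream become visible to the filter stage
    static volatile boolean pingReceived = false;

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        Source.range(1, 1000)
                // drops every element once some other code has flipped the flag
                .filterNot(i -> pingReceived)
                .runWith(Sink.foreach(System.out::println), system);
    }
}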
You may want to investigate alternatives like a custom stream stage which materializes as an object with methods that propagate updates to that value into the stage (e.g. via the async callback mechanisms in Akka): these updates would be guaranteed to become visible to the stage, and the object would abstract away the concurrency. Another option would be to expose a source (e.g. a Source.queue) which injects changes to that value as stream elements that get merged into the stream and interpreted by it to change its behavior. Alternatively, in some cases it might be useful to use mapAsync or ask to pass stream elements to an actor.

Related

In Java can I depend on reference assignment being atomic to implement copy on write?

If I have an unsynchronized java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this

void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized (global) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}

// Do multithreaded reads here. All reads are done through a reference copy like:
//   Collection copy = global;
//   for (Object elm : copy) { ...
// so the global reference being updated half way through should have no impact
Rolling your own solution seems to often fail in these types of situations, so I'd be interested in knowing of other patterns, collections, or libraries I could use to avoid object creation and blocking for my data consumers.
[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.
Edit: A lot of good information in several of the answers and comments, some important points:
A bug was present in the code I posted. Synchronizing on global (a badly named variable) can fail to protect the synchronized block after a swap.
You could fix this by synchronizing on the class (moving the synchronized keyword to the method), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
There is no "eventual consistency guarantee" in the code I posted; one way to make sure that readers do see the writers' updates is to use the volatile keyword.
On reflection, the general problem that motivated this question was trying to implement lock-free reads with locked writes in Java; however, my (solved) problem was with a collection, which may be unnecessarily confusing for future readers. So, in case it is not obvious: the code I posted works by allowing one writer at a time to edit "some object" that is being read, unprotected, by multiple reader threads. Commits of the edit are done through an atomic operation, so readers can only get the pre-edit or post-edit "object". When/if a reader thread gets the update, it cannot occur in the middle of a read, because the read is happening on the old copy of the "object". A simple solution that had probably been discovered, and shown to be broken in some way, before better concurrency support became available in Java.
Rather than trying to roll your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.)
I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean> my guess is that it does the same thing with setting all the values to Boolean.TRUE behind-the-scenes.
Update: Addressing the poster's example
That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe—especially if you're coming back to the code weeks/months/years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance then why not use that instead? All the nasty concurrency details are encapsulated away and even 6-months-from-now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global a volatile reference then it's possible that no thread will ever see any changes to the value of global until they try to update it, at which point entering the synchronized block will flush the local cache and get the updated reference value.
However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:
We begin with the empty set, or global={}. Threads A and B both have this value in their thread-local cached memory.
Thread A obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
While Thread A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor Thread B blocks.
Thread A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
Thread B is now able to enter the monitor and makes a copy of the global={1} set.
Thread A decides to make another update, reads in its local global reference and tries to enter the synchronized block. Since Thread B currently holds the lock on {} there is no lock on {1} and Thread A successfully enters the monitor!
Thread A also makes a copy of {1} for purposes of updating.
Now Threads A and B are both inside the synchronized block and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized block. You should always be very careful which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:
private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization

void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized (globalLock) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work—but how easy would it be to screw it up again? This would be so much easier:
private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());
Since the reference is final you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally you don't have to worry about all the nasty details of monitors and memory barriers!
According to the relevant Java Tutorial,
We have already seen that an increment expression, such as c++, does not describe an atomic action. Even very simple expressions can define complex actions that can decompose into other actions. However, there are actions you can specify that are atomic:
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long and double variables).
This is reaffirmed by §17.7 of the Java Language Specification:
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
It appears that you can indeed rely on reference access being atomic; however, recognize that this does not ensure that all readers will read an updated value for global after this write -- i.e. there is no memory ordering guarantee here.
If you use an implicit lock via synchronized on all access to global, then you can forge some memory consistency here... but it might be better to use an alternative approach.
You also appear to want the collection in global to remain immutable... luckily, there is Collections.unmodifiableSet which you can use to enforce this. As an example, you should likely do something like the following...
private volatile Collection global = Collections.unmodifiableSet(new HashSet());
... that, or using AtomicReference,
private AtomicReference<Collection> global = new AtomicReference<>(Collections.unmodifiableSet(new HashSet()));
You would then use Collections.unmodifiableSet for your modified copies as well.
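For instance, a minimal sketch of the write path under the AtomicReference variant, keeping the question's raw types and method name; the retry loop is an assumption about how concurrent writers would be handled:

void allUpdatesGoThroughHere(Object exampleOperand) {
    while (true) {
        Collection current = global.get();
        HashSet copy = new HashSet(current);
        copy.remove(exampleOperand);
        // compareAndSet publishes the new set only if no other writer swapped the
        // reference in the meantime; otherwise we retry against the latest value
        if (global.compareAndSet(current, Collections.unmodifiableSet(copy))) {
            return;
        }
    }
}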
// ... All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
You should know that making a copy here is redundant, as internally for (Object elm : global) creates an Iterator as follows...
final Iterator it = global.iterator();
while (it.hasNext()) {
    Object elm = it.next();
}
There is therefore no chance of switching to an entirely different value for global in the midst of reading.
All that aside, I agree with the sentiment expressed by DaoWen... is there any reason you're rolling your own data structure here when there may be an alternative available in java.util.concurrent? I figured maybe you're dealing with an older Java, since you use raw types, but it won't hurt to ask.
You can find copy-on-write collection semantics provided by CopyOnWriteArrayList, or its cousin CopyOnWriteArraySet (which implements a Set using the former).
Also suggested by DaoWen: have you considered using a ConcurrentHashMap? Its iterators guarantee that a for loop like the one in your example will be consistent.
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
Internally, an Iterator is used for enhanced for over an Iterable.
You can craft a Set from this by utilizing Collections.newSetFromMap like follows:
final Set<E> safeSet = Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
// ...
/* guaranteed to reflect the state of the set at read-time */
for (final E elem : safeSet) {
    // ...
}
I think your original idea was sound, and DaoWen did a good job getting the bugs out. Unless you can find something that does everything for you, it's better to understand these things than hope some magical class will do it for you. Magical classes can make your life easier and reduce the number of mistakes, but you do want to understand what they are doing.
ConcurrentSkipListSet might do a better job for you here. It could get rid of all your multithreading problems.
However, it is usually slower than a HashSet (HashSets and SkipLists/Trees are hard to compare). If you are doing a lot of reads for every write, what you've got will be faster. More importantly, if you update more than one entry at a time, your reads could see inconsistent results. If you expect that whenever there is an entry A there is an entry B, and vice versa, the skip list could give you one without the other.
With your current solution, the contents of the collection are always internally consistent to readers. A reader can be sure there's an A for every B. It can be sure that the size() method gives the precise number of elements that will be returned by the iterator. Two iterations will return the same elements in the same order.
In other words, allUpdatesGoThroughHere and ConcurrentSkipListSet are two good solutions to two different problems.
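If you do go the skip-list route, a minimal usage sketch looks like the following (the element type is illustrative, and elements must be Comparable or you must supply a Comparator):

import java.util.concurrent.ConcurrentSkipListSet;

// readers and writers may call these concurrently without external locking;
// iteration is weakly consistent rather than a point-in-time snapshot
ConcurrentSkipListSet<String> names = new ConcurrentSkipListSet<String>();
names.add("alice");
names.remove("bob");
for (String name : names) {
    System.out.println(name);
}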
Can you use the Collections.synchronizedSet method? From the HashSet Javadoc (http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html):
Set s = Collections.synchronizedSet(new HashSet(...));
Replace the synchronized block by making global volatile and you'll be all right as far as the copy-on-write goes.
Although the assignment is atomic, in other threads it is not ordered with the writes to the object referenced. There needs to be a happens-before relationship which you get with a volatile or synchronising both reads and writes.
The problem of multiple updates happening at once is separate - use a single thread or whatever you want to do there.
If you used synchronized for both reads and writes then it'd be correct, but the performance may not be great, with reads needing to hand off. A ReadWriteLock may be appropriate, but you'd still have writes blocking reads.
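A minimal sketch of that ReadWriteLock alternative (class and method names are illustrative), mutating one shared set in place rather than copying it:

import java.util.Collection;
import java.util.HashSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// many readers may hold the read lock at once; a writer takes the write lock
// exclusively, so reads block only while a write is in progress
class GuardedSet {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Collection<Object> global = new HashSet<Object>();

    void remove(Object o) {
        lock.writeLock().lock();
        try {
            global.remove(o);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean contains(Object o) {
        lock.readLock().lock();
        try {
            return global.contains(o);
        } finally {
            lock.readLock().unlock();
        }
    }
}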
Another approach to the publication issue is to use final field semantics to create an object that is (in theory) safe to be published unsafely.
Of course, there are also concurrent collections available.

How to Ensure Memory Visibility in Java when passing data across threads

I have a producer-consumer-like pattern where some threads are creating data and periodically passing chunks of that data to be consumed by some other threads.
Keeping the Java Memory Model in mind, how do I ensure that the data passed to the consumer thread has full 'visibility'?
I know there are data structures in java.util.concurrent like ConcurrentLinkedQueue that are built specifically for this, but I want to do this as low level as possible without utilizing those and have full transparency on what is going on under the covers to ensure the memory visibility part.
If you want "low level" then look into volatile and synchronized.
To transfer data, you need a field somewhere available to all threads. In your case it really needs to be some sort of collection to handle multiple entries. If you made the field final, referencing, say, a ConcurrentLinkedQueue, you'd pretty much be done. The field could be made public and everyone could see it, or you could make it available with a getter.
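For instance, a minimal sketch of that final-field hand-off (names are illustrative):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// the reference never changes, and ConcurrentLinkedQueue itself guarantees that
// an element put by one thread is visible to the thread that later takes it
class SharedChannel {
    public final Queue<byte[]> chunks = new ConcurrentLinkedQueue<byte[]>();
}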
If you use an unsynchronized queue, you have more work to do, because you have to manually synchronize all access to it, which means you have to track down all usages; not easy when there's a getter method. Not only do you need to protect the queue from simultaneous access, you must make sure interdependent calls end up in the same synchronized block. For instance:
if (!queue.isEmpty()) obj = queue.remove();
If the whole thing is not synchronized, queue is perfectly capable of telling you it is not empty, then throwing a NoSuchElementException when you try to get the next element. (ConcurrentLinkedQueue's interface is specifically designed to let you do operations like this with one method call. Take a good look at it even if you don't want to use it.)
The simple solution is to wrap the queue in another object whose methods are carefully chosen and all synchronized. The wrapper class, even if it wraps a LinkedList or ArrayList, will now act (if you do it right) like CLQ, and it can be freely released to the rest of the program.
So you would have what is really a global field with an immutable (final) reference to a wrapper class, which contains a LinkedList (for example) and has synchronized methods that use the LinkedList to store and access data. The wrapper class, like CLQ, would be thread-safe.
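A minimal sketch of such a wrapper (class and method names are illustrative):

import java.util.LinkedList;

// every access to the inner LinkedList happens inside a synchronized method,
// so callers never touch it without holding the wrapper's monitor
public final class SyncQueue<T> {
    private final LinkedList<T> items = new LinkedList<T>();

    public synchronized void put(T item) {
        items.addLast(item);
    }

    // folds the isEmpty/remove pair into one atomic call, returning null
    // instead of throwing when the queue is empty
    public synchronized T poll() {
        return items.isEmpty() ? null : items.removeFirst();
    }
}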
Some variants on this might be desirable. It might make sense to combine the wrapper with some other high-level class in your program. It might also make sense to create and make available instances of nested classes: perhaps one that only adds to the queue and one that only removes from it. (You couldn't do this with CLQ.)
A final note: having synchronized everything, the next step is to figure out how to unsynchronize (to keep threads from waiting too much) without breaking thread safety. Work really hard on this, and you'll end up rewriting ConcurrentLinkedQueue.

Passing an object received from an input stream to another class

Currently I have an object which I receive from a server; it contains two co-ordinates to be drawn to a canvas, which I can currently do in another class with hard-coded co-ordinates.
My problem is that I cannot work out how to send the object (which is within a thread) to the second drawing class to be drawn.
An object is never within a thread. The best you can hope for is that only one thread references the object, and constant vigilance is required to keep it that way. Objects accessed by only one thread do not require anywhere near the thought and care that objects accessed by more than one thread do. But it's real hard to keep objects on one thread, and you can't do that here in any case.
Your object from the server can be referenced from a public, static field in any class, and so be made available anywhere in your program (and to any thread). There has to be a more elegant way to make it available where needed--to encapsulate it properly--but this will do as a fallback solution.
Then you have to take care of multi-threaded access. It sounds like your object can be made immutable. This just means that once you make it "public" by assigning it to its referencing field, you never change it again (even if it is theoretically possible to do so). This makes things simpler and faster. Create your received object and, when it is fully assembled, place it in its field. Make sure that field is marked volatile so any changes will be immediately seen elsewhere.
Now your drawing class merely needs to look at the object when it needs it. However, before using it you want to copy the object to a local variable. The local variable will continue to point to the same object throughout the drawing process. The volatile field may change at any instant, continually referring to new or different objects. Using the local variable, your X and Y coordinates will always be consistent, if out-of-date. (Everything's a bit out-of-date in a multi-threading system.) If you used the field, you could get the X from one object sent from the server and the Y from another. (The real fun with multi-threading comes when X * X, where X is an integer, gives a value of 35 because each read of X saw a different object. That, and when if (aA != null) aA.doSomething() throws a NullPointerException. Using local variables prevents all this.)
For now I think you can avoid synchronization and wait states. You might want to make your coordinate object truly immutable (with final fields) so other programmers (or even you after 6 months of doing other work) don't change the code to modify the object on the fly. (If they/you do, they/you will need synchronization.)
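A minimal sketch of this pattern (all names are illustrative):

// an immutable coordinate object published through a volatile field
// and read via a local copy
public final class Coordinates {
    public final int x;
    public final int y;

    public Coordinates(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class SharedState {
    // the network thread assigns a fully constructed object here; volatile makes
    // the new reference visible to the drawing thread
    static volatile Coordinates latest = new Coordinates(0, 0);
}

class Drawer {
    void draw() {
        Coordinates c = SharedState.latest; // local copy: x and y stay consistent
        // drawPoint(c.x, c.y);
    }
}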
A Handler is one good way to transfer messages and data from a communications worker thread to the UI thread.
It would be more ideal if you included the basic outline of your code in your question. However, I would assume that what you basically have is a custom View (which you refer to as your "drawing class") which forms part of an overall layout that's set as the content view for your Activity. I then assume you have a communications worker thread, which might be contained within that same Activity class (or perhaps in a separate Service class - but for now I'll assume the simplest case). For your communications worker thread to update your View, the View needs to be updated on the UI thread. Therefore you would instantiate a Handler object that will run on the UI thread (perhaps in onCreate()) which updates the View based on the content of messages. Your worker thread then sends messages to that Handler.

Sending objects back and forth between threads in java?

I have multiple client handler threads; these threads need to pass a received object to a server queue, and the server queue will pass another type of object back to the sending thread. The server queue is started and keeps running when the server starts. I am not sure which thread mechanism to use so the client handler threads are notified when an object is sent back. I don't intend to use sockets or write to a file.
If you wanted to do actual message passing, take a look at SynchronousQueue. Each thread would have a reference to the queue and would wait until another thread passed an object through it.
This would be thread safe and address your requirements.
Though if you are simply looking to have threads read and write a shared variable, you can use normalocity's suggestion, though its thread-safety depends on how you access it (via synchronized or volatile).
As far as making objects accessible in Java, there's no difference between multi-thread and single-thread. You just follow the scope rules (public, private, protected), and that's it. Multiple threads all run within the same process, so there aren't any special thread-only scope rules to know about.
For example, define a method where you pass the object in, and make that method accessible from the other thread. The object you want to pass around simply needs to be accessible from the other thread's scope.
As far as thread-safety, you can synchronize your writes, and for the most part, that will take care of things. Thread safety can get a bit hairy the more complicated your code, but I think this will get you started.
One method for processing objects, and producing result objects is to have a shared array or LinkedList that acts as a queue of objects, containing the objects to be processed, and the resulting objects from that processing. It's hard to go into much more detail than that without more specifics on what exactly you're trying to do, but most shared access to objects between threads comes down to either inter-thread method calls, or some shared collection/queue of objects.
Unless you are absolutely certain that it will always be only a single object at a time, use some sort of Queue.
If you are certain that it will always be only a single object at a time, use some sort of Queue anyway. :-)
Use a concurrent queue from java.util.concurrent.
why? Almost guaranteed to provide better general performance than anything hand-rolled.
recommendation: use a bounded queue and you will get back-pressure for free (a sketch follows below).
note: the depth of the queue determines your general latency characteristics: shallower queues will have lower latencies at the cost of reduced bandwidth.
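For instance, a minimal sketch of such a bounded hand-off using ArrayBlockingQueue (the capacity and payload type are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// put() blocks when the queue is full, which is where the back-pressure comes from;
// take() blocks until an element arrives
public class HandOff {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> requests = new ArrayBlockingQueue<String>(16);

        Thread producer = new Thread(() -> {
            try {
                requests.put("request-1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                System.out.println("got " + requests.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}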
Use Future semantics
why? Futures provide a proven and standard means of getting an asynchronous result.
recommendation: create a simple Request class and expose a method #getFutureResponse(). The implementation of this method can use a variety of signaling strategies, such as a Lock, a flag (using Atomic/CAS), etc. (a sketch follows below).
note: use of timeout semantics in Future will allow you to link server behavior to your server SLA, e.g. #getFutureResponse(sla_timeout_ms).
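For instance, a minimal sketch of such a Request class (names are illustrative), using CompletableFuture from Java 8 as the signaling mechanism instead of a hand-rolled lock or flag:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class Request {
    private final CompletableFuture<String> response = new CompletableFuture<String>();

    // called by the server thread once the result is ready
    void complete(String result) {
        response.complete(result);
    }

    // called by the client handler thread; waits up to the given SLA timeout
    String getFutureResponse(long slaTimeoutMs) throws Exception {
        return response.get(slaTimeoutMs, TimeUnit.MILLISECONDS);
    }
}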
A book tip for if you want to dive a bit more into communication between threads (or processes, or systems): Pattern-Oriented Software Architecture Volume 2: Patterns for Concurrent and Networked Objects
Just use simple dependency injection.
class MyFirstThread extends Thread {
    public void setData(Object o) { /* ... */ }
}

class MySecondThread extends Thread {
    MyFirstThread callback;
    MySecondThread(MyFirstThread callback) { this.callback = callback; }
}

MyFirstThread t1 = new MyFirstThread();
MySecondThread t2 = new MySecondThread(t1);
t1.start();
t2.start();
You can now do callback.setData(...) in your second thread.
I find this to be the safest way. Other solutions involve using volatile or some kind of shared object, which I think is overkill.
You may also want to use BlockingQueue and pass both of those to each thread. If you plan to have more than one thread then it is probably a better solution.

ability to get the progress on a Future<T> object

With reference to the java.util.concurrent package and the Future interface, I notice (unless I am mistaken) that the ability to start a lengthy task and query its progress only comes with the SwingWorker implementing class.
This begs the following question:
Is there a way, in a non-GUI, non-Swing application (imagine a console application) to start a lengthy task in the background and allow other threads to inspect its progress? It seems to me that there is no reason why this capability should be limited to Swing / GUI applications. Otherwise, the only available option, the way I see it, is to go through ExecutorService::submit, which returns a Future object. However, the base Future interface does not allow monitoring the progress.
Obviously, the Future object would only be good for blocking and then receiving the result.
The Runnable or Callable object that you submit would either have to know how to provide this progress (percentage complete, count of attempts, status (enum?) etc) and provide that as an API call to the object itself, or posted in some lookup resource (in memory map or database if necessary). For simplicity I tend to like the object itself, especially since you're going to most likely need a handle (id) to lookup the object or a reference to the object itself.
This does mean that you have three threads operating: one for the actual work, one that is blocked waiting for the result, and one monitoring thread. The last one could be shared depending on your requirements.
In my case I passed a HashSet, containing the objects to process, as a parameter to the method; the set was created as an instance variable in the calling class. As the asynchronous method removes the objects after processing them, the calling method can check the size of the remaining set. I think that, in general, passing objects by reference solves the problem.
I was hoping that there was a standard concurrency framework way to stay updated on the progress of a long-running task without requiring the client program to worry about orchestrating and synchronizing everything correctly. It seemed to me that one could fathom an extended version of the Future<T> interface that would support:
public short progress(); in addition to the usual isDone() and get() methods.
Obviously the implementation of the progress() would then need to poll the object directly so maybe Future<T> would need to be specified as Future<T extends CanReportProgress> where CanReportProgress is the following interface:
public interface CanReportProgress {
    public short progress();
}
This begs the question of why one would bother to go through the Future object as opposed to calling the object itself to get the progress. I don't know. I'll have to give it more thought. It could be argued that it is closer to the current contract / semantics whereby the Callable object is not, itself, accessed again by the client programmer after the call to ExecutorService::submit / execute.
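For what it's worth, a minimal sketch of polling the task object directly (CountingTask is an illustrative name; CanReportProgress is the interface defined above):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CountingTask implements Callable<Long>, CanReportProgress {
    private volatile short percentDone = 0;

    @Override
    public Long call() throws Exception {
        long sum = 0;
        for (int i = 1; i <= 100; i++) {
            sum += i;
            percentDone = (short) i; // 1..100, i.e. percent complete
            Thread.sleep(10);
        }
        return sum;
    }

    @Override
    public short progress() {
        return percentDone;
    }
}

class ProgressDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountingTask task = new CountingTask();
        Future<Long> result = pool.submit(task);
        while (!result.isDone()) {
            // the monitoring thread polls the task object, not the Future
            System.out.println("progress: " + task.progress() + "%");
            Thread.sleep(50);
        }
        System.out.println("result: " + result.get());
        pool.shutdown();
    }
}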
