Java Multithreading large arrays access

My main class generates multiple threads based on some rules (20-40 threads that live for a long time).
Each thread creates several short-lived threads; I am using an executor for this.
I need to work on multi-dimensional arrays in the short-lived threads. I wrote it as shown in the code below, but I think it is not efficient, since I pass the array so many times to so many threads/tasks. I tried to access it directly from the threads (by declaring it as public, with no success). I would be happy to get comments and advice on how to improve it.
As a next step, I am also looking at returning a one-dimensional array as a result (it might be better just to update it in the AssetFactory class), and I am not sure how to do that.
Please see the code below.
thanks
Paz
import java.util.concurrent.*;
import java.util.logging.Level;
import java.util.logging.Logger;
public class AssetFactory implements Runnable {
    private static final Logger logger = Logger.getLogger(AssetFactory.class.getName());
    private volatile boolean stop = false;
    private volatile String feed;
    private double[][][] PeriodRates = new double[10][500][4];
    private String TimeStr, Bid, periodicalRateIndicator;
    private final BlockingQueue<String> workQueue;
    ExecutorService IndicatorPool = Executors.newCachedThreadPool();
    public AssetFactory(BlockingQueue<String> workQueue) {
        this.workQueue = workQueue;
    }
    @Override
    public void run() {
        while (!stop) {
            try {
                feed = workQueue.take();
                periodicalRateIndicator = CheckPeriod(TimeStr, Bid);
                if (periodicalRateIndicator.length() > 0) {
                    IndicatorPool.submit(new CalcMvg(periodicalRateIndicator, PeriodRates));
                }
                if ("Stop".equals(feed)) {
                    stop = true;
                }
            } // try
            catch (InterruptedException ex) {
                logger.log(Level.SEVERE, null, ex);
                stop = true;
            }
        } // while
    } // run
    // CheckPeriod(...) is not shown in the question
} // AssetFactory
Here is the CalcMVG class
import java.util.logging.Level;
import java.util.logging.Logger;
public class CalcMvg implements Runnable {
    private static final Logger logger = Logger.getLogger(CalcMvg.class.getName());
    private double[][][] PeriodRates = new double[10][500][4];
    public CalcMvg(String Periods, double[][][] PeriodRates) {
        System.out.println(Periods);
        this.PeriodRates = PeriodRates;
    }
    @Override
    public void run() {
        try {
            // do some work with the data of the PeriodRates array, e.g. print it (no changes to the array)
            System.out.println(PeriodRates[1][1][1]);
        }
        catch (Exception ex) {
            System.out.println(Thread.currentThread().getName() + ex.getMessage());
            logger.log(Level.SEVERE, null, ex);
        }
    } // run
} // mvg class

There are several things going on here which seem to be wrong, but it is hard to give a good answer with the limited amount of code presented.
First the actual coding issues:
There is no need to define a variable as volatile if only one thread ever accesses it (stop, feed).
You should declare variables that are only used in a local context (the run method) locally in that method and not as fields of the whole instance (this applies to almost all of your variables). This allows the JIT to do various optimizations.
An InterruptedException should terminate the thread, because it is thrown as a request to terminate the thread's work.
In your code example the workQueue doesn't seem to do anything but to put the threads to sleep or stop them. Why doesn't it just immediately feed the actual worker-threads with the required workload?
And then the code structure issues:
You use threads to feed threads with work. This is inefficient, as you only have a limited number of cores that can actually do the work. As the execution order of threads is undefined, it is likely that the IndicatorPool is either mostly idle or overfilled with tasks that have not yet been processed.
If you have a finite set of work to be done, the ExecutorCompletionService might be helpful for your task.
I think you will gain the best speed increase by redesigning the code structure. Imagine the following (assuming that I understood your question correctly):
There is a blocking queue of tasks that is fed by some data source (e.g. file-stream, network).
A set of worker threads, equal in number to the cores, waits on that data source for input, which is then processed and put into a completion queue.
A specific data set is the "terminator" for your work (e.g. "null"). If a thread encounters this terminator, it finishes its loop and shuts down.
Now the following holds true for this construct:
Case 1: The data source is the bottleneck. It cannot be sped up by using multiple threads, as your hard disk/network won't work faster if you ask more often.
Case 2: The processing power of your machine is the bottleneck, as you cannot process more data than the worker threads/cores on your machine can handle.
In both cases the conclusion is that the worker threads need to be the ones that ask for new data as soon as they are ready to process it, because either they need to be put on hold or they need to throttle the incoming data. This will ensure maximum throughput.
If all worker threads have terminated, the work is done. This can be tracked, for example, with a CyclicBarrier or a Phaser.
Pseudo-code for the worker threads:
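public void run() {
    DataType e;
    try {
        // keep pulling input until the terminator (null) is encountered
        while ((e = dataSource.next()) != null) {
            process(e);
        }
        barrier.await(); // signal that this worker is done
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt(); // interruption is a request to stop
    } catch (BrokenBarrierException ex) {
        // another worker was interrupted or timed out while waiting at the barrier
    }
}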
I hope this is helpful in your case.
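For reference, the worker design described above could look roughly like the following sketch. It is only an illustration under several assumptions: the work items are plain strings, the terminator is a hypothetical sentinel string POISON (a BlockingQueue does not accept null), toUpperCase() stands in for process(e), and a CountDownLatch is used for completion tracking in place of the CyclicBarrier or Phaser mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

class WorkerPoolSketch {
    // Hypothetical terminator value; any value that never occurs as real input works.
    static final String POISON = "__STOP__";

    /** Processes every item with one worker per core; result order is not guaranteed. */
    static List<String> processAll(List<String> items) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> results = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(workers);

        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    String item;
                    // Each worker pulls new input as soon as it is ready for it.
                    while (!(item = queue.take()).equals(POISON)) {
                        results.add(item.toUpperCase()); // stand-in for process(e)
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // treat interruption as a request to stop
                } finally {
                    done.countDown();
                }
            });
            t.start();
        }

        queue.addAll(items);
        for (int w = 0; w < workers; w++) {
            queue.put(POISON); // one terminator per worker
        }
        done.await(); // all workers have terminated, so the work is done
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(Arrays.asList("a", "b", "c")));
    }
}
```

One terminator is queued per worker so that every thread sees its own stop signal; with a single terminator only one worker would exit.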

Passing the array as an argument to the constructor is a reasonable approach, although unless you intend to copy the array it isn't necessary to initialize PeriodRates with a large array. It seems wasteful to allocate a large block of memory and then reassign its only reference straight away in the constructor. I would initialize it like this:
private final double [][][] PeriodRates;
public CalcMvg(String Periods, double[][][] PeriodRates) {
System.out.println(Periods);
this.PeriodRates = PeriodRates;
}
The other option is to define CalcMvg as an inner class of AssetFactory and declare PeriodRates as final. This would allow instances of CalcMvg to access PeriodRates in the outer instance of AssetFactory.
Returning the result is more difficult since it involves publishing the result across threads. One way to do this is to use synchronized methods:
private double[] result = null;
private synchronized void setResult(double[] result) {
this.result = result;
}
public synchronized double[] getResult() {
if (result == null) {
throw new RuntimeException("Result has not been initialized for this instance: " + this);
}
return result;
}
There are more advanced multi-threading concepts available in the Java libraries, e.g. Future, that might be appropriate in this case.
Regarding your concerns about the number of threads, allowing a library class to manage the allocation of work to a thread pool might solve this concern. Something like an Executor might help with this.
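As a rough illustration of the Future approach mentioned above, the sketch below turns the task into a Callable that returns a one-dimensional array. The class name MovingAverageTask, the helper compute, and the averaging calculation are hypothetical stand-ins, since the question does not show the real computation. It assumes the array is filled before the task is submitted and is only read afterwards; submitting a task to an ExecutorService happens-before its execution, so the task sees those earlier writes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MovingAverageTask implements Callable<double[]> {
    private final double[][][] periodRates; // shared reference, read-only in the task
    private final int period;

    MovingAverageTask(double[][][] periodRates, int period) {
        this.periodRates = periodRates; // only the reference is copied, not the array
        this.period = period;
    }

    @Override
    public double[] call() {
        // Hypothetical calculation: average each column of one period.
        double[][] rows = periodRates[period];
        double[] avg = new double[rows[0].length];
        for (double[] row : rows) {
            for (int c = 0; c < avg.length; c++) {
                avg[c] += row[c] / rows.length;
            }
        }
        return avg;
    }

    /** Submits one task and waits for its result. */
    static double[] compute(double[][][] rates, int period)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<double[]> future = pool.submit(new MovingAverageTask(rates, period));
            return future.get(); // blocks until the task has published its result
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing the array to the constructor copies only a reference, so handing it to many tasks is cheap. In a long-running program the pool would normally be created once and reused rather than created per call as in this sketch.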

Related

Java code exits after some seconds due to concurrency

I am writing the same code as in a tutorial. But in the tutorial the program never exits, while on my computer it exits after 4 seconds. Why?
tutorial with exact time where this code is shown: https://youtu.be/vzBw1LPupnA?t=169
import java.util.concurrent.TimeUnit;
public class Main {
    private static boolean stopRequested;
    public static void main(String[] args) throws InterruptedException {
        Thread backgroundThread = new Thread(() -> {
            int i = 0;
            while (!stopRequested) {
                i++;
                System.out.println("i = " + i);
            }
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        stopRequested = true;
    }
}

The reason that you are seeing different behavior on your machine and in the video is because the program has unspecified behavior. (Or to put it another way, it is not thread-safe.)
You have two threads accessing and updating a shared variable without taking the necessary steps that will guarantee that changes made by one thread are visible to the other. What happens in that case is not specified.
In some cases (e.g. on some platforms) the changes will be visible, either immediately or within a short time.
On others, the changes may never be visible.
In technical terms, there must be a happens-before relationship between the write by one thread and the subsequent read by the other thread. This can be provided by both threads synchronizing on the same mutex or lock, by using a volatile variable, and in other ways. But this code doesn't do any of those things, so there is no guarantee that the state change will be visible.
For more details, read about the Java Memory Model.
The above is sufficient to explain the difference, but there may be a more direct explanation.
In practice, something like a System.out.println can lead to changes in the visibility. Underneath the covers, the println call will typically result in synchronization on the output stream's buffers. That can result in a serendipitous happens-before that is sufficient to guarantee visibility. But this behavior is not specified, so you should not rely on it.
At any rate, adding trace statements can change the behavior of multi-threaded code. And the fact that you (apparently) added them in your version is a second possible explanation for the difference.
The bottom line here is that a program with a memory visibility flaw is broken, but you may not be able to demonstrate that it is broken.
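A minimal sketch of one of the fixes named above, declaring the flag volatile, might look like this. The class name StopFlagDemo and the method runDemo are made up for illustration, and the println is deliberately left out of the loop so that no accidental synchronization is involved.

```java
import java.util.concurrent.TimeUnit;

class StopFlagDemo {
    // volatile guarantees that the write in runDemo() becomes visible to the background thread.
    private static volatile boolean stopRequested;

    /** Returns true if the background thread stopped after the flag was set. */
    static boolean runDemo() throws InterruptedException {
        stopRequested = false;
        Thread backgroundThread = new Thread(() -> {
            long i = 0;
            while (!stopRequested) {
                i++; // busy loop without any printing or other synchronization
            }
        });
        backgroundThread.start();
        TimeUnit.MILLISECONDS.sleep(200);
        stopRequested = true;
        backgroundThread.join(5_000); // wait up to 5 seconds for the loop to notice
        return !backgroundThread.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stopped: " + runDemo());
    }
}
```

With the volatile modifier the loop is guaranteed to observe the write; without it, the program may or may not terminate, which is exactly the unspecified behavior described above.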

As the excellent Answer by Stephen C says, your code is not thread-safe.
Establishing an AtomicBoolean early on addresses the visibility problem explained in that other Answer. This class is a thread-safe wrapper around its payload boolean value.
The volatile keyword is another solution. But I find the Atomic… classes simpler and more obvious.
Also, in modern Java we rarely need to address the Thread class directly. Instead, use the Executors framework. Define your task as a Runnable or Callable, and submit to an executor service.
Something like this untested code.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
    private static final AtomicBoolean stopRequested = new AtomicBoolean( false ) ;
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            int i = 0;
            while ( ! stopRequested.get() ) {
                i++;
                System.out.println("i = " + i);
                try {
                    TimeUnit.MILLISECONDS.sleep(100); // Don't spin too fast.
                } catch (InterruptedException e) {
                    // Runnable.run() cannot throw checked exceptions, so handle it here
                    Thread.currentThread().interrupt(); // Preserve interrupt status
                    return; // Interrupted, so stop the task
                }
            }
        };
        ExecutorService es = Executors.newSingleThreadExecutor() ;
        es.submit( task ) ;
        TimeUnit.SECONDS.sleep(1);
        stopRequested.set( true ) ;
        TimeUnit.SECONDS.sleep(1);
        // Shut down the executor service here. Boilerplate taken from Javadoc.
        es.shutdown(); // Disable new tasks from being submitted
        try {
            // Wait a while for existing tasks to terminate
            if (!es.awaitTermination(60, TimeUnit.SECONDS)) {
                es.shutdownNow(); // Cancel currently executing tasks
                // Wait a while for tasks to respond to being cancelled
                if (!es.awaitTermination(60, TimeUnit.SECONDS))
                    System.err.println("Executor service did not terminate");
            }
        } catch (InterruptedException ex) {
            // (Re-)Cancel if current thread also interrupted
            es.shutdownNow();
            // Preserve interrupt status
            Thread.currentThread().interrupt();
        }
    }
}

Variable 'runner' is not updated inside loop

I have two threads, as shown below. The SleepRunner thread adds some random numbers to a list, then sets flag to true and sleeps. The main thread waits until the flag in the SleepRunner object changes from false to true, then interrupts the SleepRunner thread, and the program ends.
The problem is that when the while loop in the main thread has an empty body, the flag read through 'runner' is never seen to change inside the loop; in other words, the program does not finish after the SleepRunner thread changes flag from false to true. When I ran it under the debugger in IDEA, the program ended normally. And if I put some code in the while loop body in the main thread, such as System.out.println() or Thread.sleep(1), the program also ends successfully. This seems incredible. Does anyone know why this happens? Thanks.
public class Test1 {
public static void main(String[] args) {
SleepRunner runner = new SleepRunner();
Thread thread = new Thread(runner);
thread.start();
while(!(runner.isFlag())){
/*try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}*/
}
System.out.println("END");
thread.interrupt();
}
}
public class SleepRunner implements Runnable {
private boolean flag = false;
public boolean isFlag() {
return flag;
}
@Override
public void run() {
List<Integer> list = new ArrayList<>();
for (int i = 0; i < 100; i++) {
try {
Thread.sleep((long) (Math.random() * 200));
}
catch (InterruptedException e) {
System.out.println("Interrupted");
}
int num = (int) (Math.random() * 100);
System.out.println(Thread.currentThread().getName() + " " + num);
list.add(num);
}
flag = true;
System.out.println("30 Seconds");
try {
Thread.sleep(30000);
}
catch (InterruptedException e) {
System.out.println("Interrupted in 30 seconds");
}
System.out.println("sleep runner thread end");
}
}

You've violated the Java Memory Model (JMM).
Here's how the JMM works*:
Each thread, whenever any field (from any object) is read or updated, flips a coin. On heads, it will make a copy and update/read from that. On tails, it won't. Your job is to ensure your code functions correctly regardless of how the coin lands, and you can't force the coinflip in a unit test. The coin need not be 'fair'. The coin's behaviour depends on the music playing in your music player, the whims of a toddler, and the phase of the moon. (In other words, any update/read may be done to a local cache copy, or not, up to the java implementation).
You may safely conclude that the only way to do it correctly, is to ensure the thread never flips that coin.
The way to accomplish that is to establish so-called 'happens-before' relationships. Establishing them is done primarily by using synchronization primitives, or by calling methods that use synchronization primitives. For example, if I do this:
thread X:
synchronized(x) {
x.foo();
System.out.println(shared.y);
shared.y = 10;
}
thread Y:
synchronized(x) {
x.foo();
System.out.println(shared.y);
shared.y = 20;
}
then you've established a relationship: thread X's block comes before thread Y's block, or vice versa, but you've at least established that they must run in order.
As a consequence, this will print either 0 10 or 0 20, guaranteed. Without the synchronized block, it can legally print 0 0 as well. All 3 results would be an acceptable result (the java lang spec says it's okay, and any bugs filed that you think this makes no sense would be disregarded as 'working as intended').
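The two synchronized blocks above can be turned into a small runnable sketch. The names HappensBeforeDemo, lock, and runOnce are made up for illustration, and the values are collected into a list instead of printed so the outcome can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

class HappensBeforeDemo {
    private static final Object lock = new Object();
    private static int y; // shared field, guarded by lock

    /** Runs both blocks once and returns the two values read, in the order they were read. */
    static List<Integer> runOnce() throws InterruptedException {
        y = 0;
        List<Integer> seen = new ArrayList<>(); // also guarded by lock
        Thread x = new Thread(() -> {
            synchronized (lock) {
                seen.add(y);
                y = 10;
            }
        });
        Thread z = new Thread(() -> {
            synchronized (lock) {
                seen.add(y);
                y = 20;
            }
        });
        x.start();
        z.start();
        x.join(); // join() also establishes happens-before with the finished thread
        z.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce());
    }
}
```

Whichever thread acquires the lock first reads 0, and the other reads the value the first one wrote, so the list holds either 0 and 10 or 0 and 20.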
volatile can also be used, but volatile is quite limited.
Generally, because this cannot be adequately tested, there are only 3 ways to do threading properly in java:
'in the large': Use a webserver or other app framework that takes care of the multithreading. You don't write the psv main() method, that framework does, and all you write are 'handlers'. None of your handlers touch any shared data at all. The handlers either don't share data, or share it via a bus designed to do it right, such as a DB in serializable transaction isolation mode, or rabbitmq or some other message bus.
'in the small': Use fork/join to parallelize a giant task. The handler for the task cannot, of course, use any shared data.
Read Java Concurrency in Practice (the book), prefer using the classes in the java.util.concurrent package, and in general be a guru about how this stuff works, because doing threading any other way is likely to result in you programming bugs which your tests probably won't catch, but which will either blow up in production, or will result in no actual multithreading (e.g. if you overzealously synchronize everything, you end up having all cores except one just waiting around, and your code will actually run way slower than if it were single-threaded).
*) The full explanation is about a book's worth. I'm just giving you oversimplified highlights, as this is merely an SO answer.

Java: Getting ExecutorService to produce repeatable behavior?

I have been trying to parallelize a portion of a method within my code (as shown in the Example class's function_to_parallelize(...) method). I have examined the executor framework and found that Futures & Callables can be used to create several worker threads that will ultimately return values. However, the online examples often shown with the executor framework are very simple and none of them appear to suffer my particular case of requiring methods in the class that contains that bit of code I'm trying to parallelize. As per one Stackoverflow thread, I've managed to write an external class that implements Callable called Solver that implements that method call() and set up the executor framework as shown in the method function_to_parallelize(...). Some of the computation that would occur in each worker thread requires methods *subroutine_A(...)* that operate on the data members of the Example class (and further, some of these subroutines make use of random numbers for various sampling functions).
My issue is while my program executes and produces results (sometimes accurate, sometimes not), every time I run it the results of the combined computation of the various worker threads is different. I figured it must be a shared memory problem, so I input into the Solver constructor copies of every data member of the Example class, including the utility that contained the Random rng. Further, I copied the subroutines that I require even directly into the Solver class (even though it's able to call those methods from Example without this). Why would I be getting different values each time? Is there something I need to implement, such as locking mechanisms or synchronization?
Alternatively, is there a simpler way to inject some parallelization into that method? Rewriting the "Example" class or drastically changing my class structuring is not an option as I need it in its current form for a variety of other aspects of my software/system.
Below is my code vignette (well, it's an incredibly abstracted/reduced form so as to show you basic structure and the target area, even if it's a bit longer than usual vignettes):
public class Tools{
Random rng;
public Tools(Random rng){
this.rng = rng;
}...
}
public class Solver implements Callable<Tuple>{
public Tools toolkit;
public Item W;
public Item v;
Item input;
double param;
public Solver(Item input, double param, Item W, Item v, Tools toolkit){
this.input = input;
this.param = param;
//...so on & so forth for rest of arguments
}
public Item call() throws Exception {
//does computation that utilizes the data members W, v
//and calls some methods housed in the "toolkit" object
}
public Item subroutine_A(Item in){....}
public Item subroutine_B(Item in){....}
}
public class Example{
private static final int NTHREDS = 4;
public Tools toolkit;
public Item W;
public Item v;
public Example(...,Tools toolkit...){
this.toolkit = toolkit; ...
}
public Item subroutine_A(Item in){
// some of its internal computation involves sampling & random # generation using
// a call to toolkit, which houses functions that use the initialize Random rng
...
}
public Item subroutine_B(Item in){....}
public void function_to_parallelize(Item input, double param,...){
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
List<Future<Tuple>> list = new ArrayList<Future<Tuple>>();
while(some_stopping_condition){
// extract subset of input and feed into Solver constructor below
Callable<Tuple> worker = new Solver(input, param, W, v, toolkit);
Future<Tuple> submit = executor.submit(worker);
list.add(submit);
}
for(Future<Tuple> future : list){
try {
Item out = future.get();
// update W via some operation using "out" (like multiplying matrices for example)
}catch(InterruptedException e) {
e.printStackTrace();
}catch(ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown(); // properly terminate the threadpool
}
}
ADDENDUM: While flob's answer below did address a problem with my vignette/code (you should make sure that you are setting your code up to wait for all threads to catch up with .await()), the issue did not go away after I made this correction. It turns out that the problem lies in how Random works with threads. In essence, the threads are scheduled in various orders (via the OS/scheduler) and hence will not repeat the order in which they are executed every run of the program to ensure that a purely deterministic result is obtained. I examined the thread-safe version of Random (and used it to gain a bit more efficiency) but alas it does not allow you to set the seed. However, I highly recommend those who are looking to incorporate random computations within their thread workers to use this as the RNG for multi-threaded work.

The problem I see is that you don't wait for all the tasks to finish before updating W, so some of the Callable instances will see the updated W instead of the one you were expecting.
At this point W is updated even if not all tasks have finished:
// update W via some operation using "out" (like multiplying matrices for example)
The tasks that have not finished will then read the W updated above instead of the one you expect.
A quick solution (if you know how many Solver tasks you'll have) would be to use a CountDownLatch in order to see when all the tasks have finished:
public void function_to_parallelize(Item input, double param,...){
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
List<Future<Tuple>> list = new ArrayList<Future<Tuple>>();
CountDownLatch latch = new CountDownLatch(<number_of_tasks_created_in_next_loop>);
while(some_stopping_condition){
// extract subset of input and feed into Solver constructor below
Callable<Tuple> worker = new Solver(input, param, W, v, toolkit,latch);
Future<Tuple> submit = executor.submit(worker);
list.add(submit);
}
latch.await();
for(Future<Tuple> future : list){
try {
Item out = future.get();
// update W via some operation using "out" (like multiplying matrices for example)
}catch(InterruptedException e) {
e.printStackTrace();
}catch(ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown(); // properly terminate the threadpool
}
then in the Solver class you have to decrement the latch when call method ends:
public Item call() throws Exception {
    try {
        //does computation that utilizes the data members W, v
        //and calls some methods housed in the "toolkit" object
    } finally {
        latch.countDown(); // always count down, even if the computation throws
    }
}
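Regarding the random-number issue raised in the question's addendum, one common way to get repeatable results is to give every task its own Random whose seed depends only on the task index, and to combine the results in submission order. The sketch below illustrates that idea; the class ReproducibleTasks, the method run, and the summing workload are hypothetical and are not taken from the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReproducibleTasks {
    /** Sums random draws from several parallel tasks; the same baseSeed always gives the same total. */
    static double run(long baseSeed, int tasks) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final long seed = baseSeed + t; // seed depends on the task index, not on the thread
                futures.add(executor.submit(() -> {
                    Random rng = new Random(seed); // private to this task, never shared
                    double sum = 0;
                    for (int i = 0; i < 1_000; i++) {
                        sum += rng.nextDouble();
                    }
                    return sum;
                }));
            }
            double total = 0;
            for (Future<Double> f : futures) {
                total += f.get(); // combine in submission order, independent of completion order
            }
            return total;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(42L, 8) == run(42L, 8));
    }
}
```

Because no Random instance is shared, the sequence each task draws no longer depends on how the scheduler interleaves the threads. The tasks here also never read a value that the combining loop modifies, which sidesteps the stale-W problem described above.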

Junit test the correct number of threads has started

So I have a method that starts five threads. I want to write a unit test just to check that the five threads have been started. How do I do that? Sample codes are much appreciated.

Instead of writing your own method to start threads, why not use an Executor, which can be injected into your class? Then you can easily test it by passing in a dummy Executor.
Edit: Here's a simple example of how your code could be structured:
public class ResultCalculator {
private final ExecutorService pool;
private final List<Future<Integer>> pendingResults;
public ResultCalculator(ExecutorService pool) {
this.pool = pool;
this.pendingResults = new ArrayList<Future<Integer>>();
}
public void startComputation() {
for (int i = 0; i < 5; i++) {
Future<Integer> future = pool.submit(new Robot(i));
pendingResults.add(future);
}
}
public int getFinalResult() throws ExecutionException, InterruptedException {
int total = 0;
for (Future<Integer> robotResult : pendingResults) {
total += robotResult.get();
}
return total;
}
}
public class Robot implements Callable<Integer> {
private final int input;
public Robot(int input) {
this.input = input;
}
@Override
public Integer call() throws InterruptedException {
// Some very long calculation
Thread.sleep(10000);
return input * input;
}
}
And here's how you'd call it from your main():
public static void main(String[] args) throws Exception {
// Note that the number of threads is now specified here
ExecutorService pool = Executors.newFixedThreadPool(5);
ResultCalculator calc = new ResultCalculator(pool);
try {
calc.startComputation();
// Maybe do something while we're waiting
System.out.printf("Result is: %d\n", calc.getFinalResult());
} finally {
pool.shutdownNow();
}
}
And here's how you'd test it (assuming JUnit 4 and Mockito):
@Test
@SuppressWarnings("unchecked")
public void testStartComputationAddsRobotsToQueue() {
ExecutorService pool = mock(ExecutorService.class);
Future<Integer> future = mock(Future.class);
when(pool.submit(any(Callable.class))).thenReturn(future);
ResultCalculator calc = new ResultCalculator(pool);
calc.startComputation();
verify(pool, times(5)).submit(any(Callable.class));
}
Note that all this code is just a sketch which I have not tested or even tried to compile yet. But it should give you an idea of how the code can be structured.
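If a mocking library is not available, the same idea can be sketched with a hand-written dummy executor. The classes TaskStarter and TaskStarterCheck below are hypothetical, and the sketch uses the plain Executor interface rather than ExecutorService to keep the dummy to a single lambda.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

class TaskStarter {
    private final Executor executor;

    TaskStarter(Executor executor) {
        this.executor = executor; // injected, so a test can pass in a dummy
    }

    /** Hands five tasks to the injected executor. */
    void startAll() {
        for (int i = 0; i < 5; i++) {
            final int id = i;
            executor.execute(() -> System.out.println("task " + id));
        }
    }
}

class TaskStarterCheck {
    /** Returns how many tasks startAll() submitted to a hand-written counting executor. */
    static int countSubmissions() {
        AtomicInteger submitted = new AtomicInteger();
        Executor counting = task -> submitted.incrementAndGet(); // counts the task, never runs it
        new TaskStarter(counting).startAll();
        return submitted.get();
    }

    public static void main(String[] args) {
        System.out.println("submitted: " + countSubmissions());
    }
}
```

The dummy never starts a thread, so the check is fast and deterministic; it verifies that five tasks were handed over, not that five threads exist.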

Rather than saying you are going to "test the five threads have been started", it would be better to step back and think about what the five threads are actually supposed to do. Then test to make sure that that "something" is actually being done.
If you really just want to test that the threads have been started, there are a few things you could do. Are you keeping references to the threads somewhere? If so, you could retrieve the references, count them, and call isAlive() on each one (checking that it returns true).
The Java platform also has methods for this: Thread.activeCount() returns an estimate of the number of active threads in the current thread's thread group, ThreadGroup.enumerate(Thread[]) copies the active threads of a group into an array, and Thread.getAllStackTraces().keySet() gives you all live threads.
More thoughts in response to your comment
If your code is as simple as new Thread(runnable).start(), I wouldn't bother to test that the threads are actually starting. If you do so, you're basically just testing that the Java platform works (it does). If your code for initializing and starting the threads is more complicated, I would stub out the thread.start() part and make sure that the stub is called the desired number of times, with the correct arguments, etc.
Regardless of what you do about that, I would definitely test that the task is completed correctly when running in multithreaded mode. From personal experience, I can tell you that as soon as you start doing anything remotely complicated with threads, it is devilishly easy to get subtle bugs which only show up under certain conditions, and perhaps only occasionally. Dealing with the complexity of multithreaded code is a very slippery slope.
Because of that, if you can do it, I would highly recommend you do more than just simple unit testing. Do stress tests where you run your task with many threads, on a multicore machine, on very large data sets, and make sure all the answers are exactly as expected.
Also, although you are expecting a performance increase from using threads, I highly recommend that you benchmark your program with varying numbers of threads, to make sure that the desired performance increase is actually achieved. Depending on how your system is designed, it's possible to wind up with concurrency bottlenecks which may make your program hardly faster with threads than without. In some cases, it can even be slower!

Java Threading Tutorial Type Question

I am fairly naive when it comes to the world of Java Threading and Concurrency. I am currently trying to learn. I made a simple example to try to figure out how concurrency works.
Here is my code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ThreadedService {
private ExecutorService exec;
/**
* @param delegate
* @param poolSize
*/
public ThreadedService(int poolSize) {
if (poolSize < 1) {
this.exec = Executors.newCachedThreadPool();
} else {
this.exec = Executors.newFixedThreadPool(poolSize);
}
}
public void add(final String str) {
exec.execute(new Runnable() {
public void run() {
System.out.println(str);
}
});
}
public static void main(String args[]) {
ThreadedService t = new ThreadedService(25);
for (int i = 0; i < 100; i++) {
t.add("ADD: " + i);
}
}
}
What do I need to do to make the code print out the numbers 0-99 in sequential order?

Thread pools are usually used for operations which do not need synchronization or are highly parallel.
Printing the numbers 0-99 sequentially is not a concurrent problem and requires threads to be synchronized to avoid printing out of order.
I recommend taking a look at the Java concurrency lesson to get an idea of concurrency in Java.

The idea of threads is not to do things sequentially.
You will need some shared state to coordinate. In this example, adding instance fields to your outer class will work. Remove the parameter from add. Add a lock object and a counter. Grab the lock, print the number, increment the number, release the lock.
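The steps above might be sketched as follows. The class name SequentialCounterService and the helper methods finish and demo are made up for illustration, and the printed list exists only so the order can be checked afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SequentialCounterService {
    private final ExecutorService exec = Executors.newFixedThreadPool(25);
    private final Object lock = new Object();
    private int counter = 0;                                 // guarded by lock
    private final List<Integer> printed = new ArrayList<>(); // guarded by lock, kept for inspection

    /** No parameter any more: each task takes the next number from the shared counter. */
    void add() {
        exec.execute(() -> {
            synchronized (lock) {
                System.out.println("ADD: " + counter);
                printed.add(counter);
                counter++;
            }
        });
    }

    /** Waits for all submitted tasks and returns the numbers in the order they were printed. */
    List<Integer> finish() throws InterruptedException {
        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.MINUTES);
        synchronized (lock) {
            return new ArrayList<>(printed);
        }
    }

    static List<Integer> demo() throws InterruptedException {
        SequentialCounterService service = new SequentialCounterService();
        for (int i = 0; i < 100; i++) {
            service.add();
        }
        return service.finish();
    }
}
```

The output is sequential because the number is chosen inside the lock at print time rather than at submission time; which pool thread prints which number is still undefined.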

The simplest solution to your problem is to use a ThreadPool size of 1. However, this isn't really the kind of problem one would use threads to solve.
To expand, if you create your executor with:
this.exec = Executors.newSingleThreadExecutor();
then your tasks will all be executed one at a time, in the order they were submitted. There are a few scenarios where this is a logical thing to do, but in most cases threads are the wrong tool for this problem.
This kind of thing makes sense to do when you need to execute the task in a different thread -- perhaps it takes a long time to execute and you don't want to block a GUI thread -- but you don't need or don't want the submitted tasks to run at the same time.

The problem is by definition not suited to threads. Threads are run independently and there isn't really a way to predict which thread is run first.
If you want to change your code to run sequentially, change add to:
public void add(final String str) {
System.out.println(str);
}
You are not using threads (not your own at least) and everything happens sequentially.
