Faster multiple threading vs single threading - java

I'm implementing an online store where customers drop in an order and my program generates a summary of who the client is and what items they bought.
public class Summarizer{
private TreeSet<Order> allOrders = new TreeSet<Order>(new OrderComparator());
public void oneThreadProcessing(){
Thread t1;
for(Order order: allOrders){
t1 = new Thread(order);
t1.start();
try {
t1.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public void multipleThreadProcessing(){
Thread[] threads = new Thread[allOrders.size()];
int i = 0;
for(Order order: allOrders){
threads[i] = new Thread(order);
threads[i].start();
i++;
}
for(Thread t: threads){
try {
t.join();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public static void main (String[] args) {
Summarizer s = new Summarizer();
long startTime = System.currentTimeMillis();
s.oneThreadProcessing();
long endTime = System.currentTimeMillis();
System.out.println("Processing time (msec): " + (endTime - startTime));
System.out.println("-------------Multiple Thread-------------------------------");
long startTime1 = System.currentTimeMillis();
s.multipleThreadProcessing();
long endTime1 = System.currentTimeMillis();
System.out.println("Processing time (msec): " + (endTime1 - startTime1));
}
}
This is my Order class:
public class Order implements Runnable, Comparable<Order> {
private int clientId;
@Override
public void run() {
/*
Print out summary in the form:
Client id: 1001
Item: Shoes, Quantity: 2, Cost per item: $30.00, Total Cost: $60.00
Item: Bag, Quantity: 1, Cost per item: $15.00, Total Cost: $15.00
Total: $75.00
*/
}
}
Assuming I fill this TreeSet with all the orders from a particular day, sorted by client ID, I want to generate the summaries of all these orders using only one thread and then using multiple threads, and compare the performance. The requirement is for the multithreaded version to perform better, and I would assume it would, but every time I run my main method the multithreaded version is almost always slower. Am I doing something wrong? I know one can never be certain that a multithreaded program will be faster, but since I'm not dealing with any locking yet, shouldn't multipleThreadProcessing() be faster? Is there any way to ensure that it is?

Multi threading is not a magic powder one sprinkles onto code to make it run faster. It requires careful thought about what you're doing.
Let's analyze the multi threaded code you've written.
You have many Orders (How many btw? dozens? hundreds? thousands? millions?) and when executed, each order prints itself to the screen. That means that you're spawning a possibly enormous number of threads, just so each of them would spend most of its time waiting for System.out to become available (since it would be occupied by other threads). Each thread requires OS and JVM involvement, plus it requires CPU time, and memory resources; so why should such a multi threaded approach run faster than a single thread? A single thread requires no additional memory, and no additional context switching, and doesn't have to wait for System.out. So naturally it would be faster.
To get the multi threaded version to work faster, you need to think of a way to distribute the work between threads without creating unnecessary contention over resources. For one, you probably don't need more than one thread performing the I/O task of writing to the screen. Multiple threads writing to System.out would just block each other. If you have several IO devices you need to write to, then create one or more threads for each (depending on the nature of the device). Computational tasks can sometimes be executed quicker in parallel, but:
A) You don't need more threads doing computation work than the number of cores your CPU has. If you have too many CPU-bound threads, they'd just waste time context switching.
B) You need to reason through your parallel processing, and plan it thoroughly. Give each thread a chunk of the computational tasks that need to be done, then combine the results of each of them.
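As a rough illustration of points A and B, here is a minimal sketch that builds the summaries on a pool sized to the number of cores and leaves all printing to a single thread. The class name OrderSummaries and the summarize method are hypothetical stand-ins for the real Order logic, not code from the question. For work as cheap as this, the single-threaded version may still win; the structure only pays off when each summary involves real computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OrderSummaries {
    // Hypothetical stand-in for the CPU-bound part of building one summary.
    static String summarize(int clientId) {
        return "Client id: " + clientId;
    }

    // Builds the summaries on a pool sized to the core count, in input order.
    static List<String> summarizeAll(List<Integer> clientIds) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Integer id : clientIds) {
                Callable<String> task = () -> summarize(id);
                futures.add(pool.submit(task));
            }
            List<String> summaries = new ArrayList<>();
            for (Future<String> future : futures) {
                summaries.add(future.get()); // waits for that task to finish
            }
            return summaries;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Only the main thread writes to System.out, so nothing contends for it.
        for (String summary : summarizeAll(Arrays.asList(1001, 1002, 1003))) {
            System.out.println(summary);
        }
    }
}
```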

Related

Get results of scheduled non-blocking operations in Java

I am trying to do some blocking operations (say HTTP request) in a scheduled and non-blocking manner. Let's say I have 10 requests and one request takes 3 seconds but I would like not to wait for 3 seconds but wait 1 second and send the next one. After all executions are finished I would like to gather all results in a list and return to the user.
Below, there is a prototype of my scenario (thread sleep used as blocking operation instead of HTTP req.)
public static List<Integer> getResults(List<Integer> inputs) throws InterruptedException, ExecutionException {
List<Integer> results = new LinkedList<Integer>();
Queue<Callable<Integer>> tasks = new LinkedList<Callable<Integer>>();
List<Future<Integer>> futures = new LinkedList<Future<Integer>>();
for (Integer input : inputs) {
Callable<Integer> task = new Callable<Integer>() {
public Integer call() throws InterruptedException {
Thread.sleep(3000);
return input + 1000;
}
};
tasks.add(task);
}
ExecutorService es = Executors.newCachedThreadPool();
ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
ses.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
Callable<Integer> task = tasks.poll();
if (task == null) {
ses.shutdown();
es.shutdown();
return;
}
futures.add(es.submit(task));
}
}, 0, 1000, TimeUnit.MILLISECONDS);
while(true) {
if(futures.size() == inputs.size()) {
for (Future<Integer> future : futures) {
Integer result = future.get();
results.add(result);
}
return results;
}
}
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
List<Integer> results = getResults(new LinkedList<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
System.out.println(Arrays.toString(results.toArray()));
}
I am waiting in a while loop until all tasks return a proper result. But execution never reaches the exit condition, and the loop runs forever. Whenever I put an I/O operation such as a logger call, or even a breakpoint, inside the loop, it exits the while loop and everything works.
I am relatively new to Java concurrency and am trying to understand what is happening and whether this is the correct way to do it. My guess is that the I/O operation triggers something in the thread scheduler and makes it check the collections' sizes.
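You need to synchronize your threads. You have two different threads (the main thread and the scheduled executor's thread) accessing the futures list, and since LinkedList is not synchronized, the main thread is not guaranteed to see what the other thread adds. Both sides must lock on the same object, so the futures.add(...) call in the scheduled task needs the same synchronized(futures) block as the check below.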
while(true) {
synchronized(futures) {
if(futures.size() == inputs.size()) {
...
}
}
}
This happens because threads in Java may work with cached copies of values (in CPU caches or registers) to improve performance, so each thread can see a different value of a variable until the threads are synchronized.
This SO question has more information on this.
Also from this answer:
It's all about memory. Threads communicate through shared memory, but when there are multiple CPUs in a system, all trying to access the same memory system, then the memory system becomes a bottleneck. Therefore, the CPUs in a typical multi-CPU computer are allowed to delay, re-order, and cache memory operations in order to speed things up.
That works great when threads are not interacting with one another, but it causes problems when they actually do want to interact: If thread A stores a value into an ordinary variable, Java makes no guarantee about when (or even if) thread B will see the value change.
In order to overcome that problem when it's important, Java gives you certain means of synchronizing threads. That is, getting the threads to agree on the state of the program's memory. The volatile keyword and the synchronized keyword are two means of establishing synchronization between threads.
And finally, the main thread never sees the futures list change because it spins in the infinite while block without any synchronization, so nothing forces it to re-read the list's state. Doing an I/O operation in the loop most likely helps only as a side effect: calls such as System.out.println go through synchronized code, which makes the thread refresh its view of memory. That is incidental and should not be relied on.
An infinite while loop is generally a bad idea because it is very resource intensive. Adding a small delay before the next iteration makes it a little better (though still inefficient).
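One way to avoid both the shared list and the polling loop is to let the main thread create every Future itself and then block on get(). The sketch below is an alternative design rather than a fix of the code above: it schedules each task with its own start delay. The class name StaggeredRequests and the spacingMs and workMs parameters are made up for the example, and Thread.sleep stands in for the HTTP call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class StaggeredRequests {
    // Starts one task every spacingMs milliseconds and returns results in input order.
    static List<Integer> getResults(List<Integer> inputs, long spacingMs, long workMs) {
        // One thread per task so that overlapping blocking calls never queue up.
        ScheduledExecutorService ses =
                Executors.newScheduledThreadPool(Math.max(1, inputs.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            long delay = 0;
            for (Integer input : inputs) {
                Callable<Integer> task = () -> {
                    Thread.sleep(workMs); // stand-in for the blocking request
                    return input + 1000;
                };
                // Only the calling thread touches the futures list.
                futures.add(ses.schedule(task, delay, TimeUnit.MILLISECONDS));
                delay += spacingMs;
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks without spinning
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            ses.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(getResults(Arrays.asList(1, 2, 3, 4, 5), 1000, 3000));
    }
}
```

Because Future.get() establishes the needed happens-before relationship, no extra synchronized or volatile is required here.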

non-fair ReentrantReadWriteLock write and read lock priorities [duplicate]

ReentrantReadWriteLock has a fair and a non-fair (default) mode, but I find the documentation hard to understand.
How can I understand it? It would be great to have a code example that demonstrates it.
UPDATE
If I have one writer thread and many reader threads, which mode is better to use? If I use non-fair mode, is it possible that the writer thread has little chance of getting the lock?
Non-fair means that when the lock is ready to be obtained by a new thread, the lock gives no guarantees to the fairness of who obtains the lock (assuming there are multiple threads requesting the lock at the time). In other words, it is conceivable that one thread might be continuously starved because other threads always manage to arbitrarily get the lock instead of it.
Fair mode acts more like first-come-first-served, where threads are guaranteed some level of fairness that they will obtain the lock in a fair manner (e.g. before a thread that started waiting long after).
Edit
Here is an example program that demonstrates the fairness of locks (in that write lock requests for a fair lock are first come, first served). Compare the results when FAIR = true (the threads are always served in order) versus FAIR = false (the threads are sometimes served out of order).
import java.util.concurrent.locks.ReentrantReadWriteLock;
public class FairLocking {
public static final boolean FAIR = true;
private static final int NUM_THREADS = 3;
private static volatile int expectedIndex = 0;
public static void main(String[] args) throws InterruptedException {
ReentrantReadWriteLock.WriteLock lock = new ReentrantReadWriteLock(FAIR).writeLock();
// we grab the lock to start to make sure the threads don't start until we're ready
lock.lock();
for (int i = 0; i < NUM_THREADS; i++) {
new Thread(new ExampleRunnable(i, lock)).start();
// a cheap way to make sure that runnable 0 requests the first lock
// before runnable 1
Thread.sleep(10);
}
// let the threads go
lock.unlock();
}
private static class ExampleRunnable implements Runnable {
private final int index;
private final ReentrantReadWriteLock.WriteLock writeLock;
public ExampleRunnable(int index, ReentrantReadWriteLock.WriteLock writeLock) {
this.index = index;
this.writeLock = writeLock;
}
public void run() {
while(true) {
writeLock.lock();
try {
// this sleep is a cheap way to make sure the previous thread loops
// around before another thread grabs the lock, does its work,
// loops around and requests the lock again ahead of it.
Thread.sleep(10);
} catch (InterruptedException e) {
//ignored
}
if (index != expectedIndex) {
System.out.printf("Unexpected thread obtained lock! " +
"Expected: %d Actual: %d%n", expectedIndex, index);
System.exit(0);
}
expectedIndex = (expectedIndex+1) % NUM_THREADS;
writeLock.unlock();
}
}
}
}
Edit (again)
Regarding your update, with non-fair locking it's not that there's a possibility that a thread will have a low chance of getting a lock, but rather that there's a low chance that a thread will have to wait a bit.
Now, typically as the starvation period increases, the probability of that length of time actually occurring decreases...just as flipping a coin "heads" 10 consecutive times is less likely to occur than flipping a coin "heads" 9 consecutive times.
But if the selection algorithm for multiple waiting threads was something non-randomized, like "the thread with the alphabetically-first name always gets the lock" then you might have a real problem because the probability does not necessarily decrease as the thread gets more and more starved...if a coin is weighted to "heads" 10 consecutive heads is essentially as likely as 9 consecutive heads.
I believe that in implementations of non-fair locking a somewhat "fair" coin is used. So the question really becomes fairness (and thus, latency) vs throughput. Using non-fair locking typically results in better throughput but at the expense of the occasional spike in latency for a lock request. Which is better for you depends on your own requirements.
When several threads are waiting for a lock and the lock has to select one of them to enter the critical section:
In non-fair mode, it selects a thread without any particular criterion.
In fair mode, it selects the thread that has been waiting the longest.
Note: This behavior applies only to the lock() and unlock() methods. The untimed tryLock() method does not put the thread to sleep and does not honor the fairness setting: it acquires the lock immediately if it is available, even if other threads are waiting.
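To make the two modes and the tryLock() caveat concrete, here is a small sketch. The LockModes class and its writeWithTimeout helper are invented for the example. It relies on the documented behavior that the timed tryLock(long, TimeUnit) honors the fairness setting while the untimed tryLock() does not.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockModes {
    // Runs the action under the write lock, giving up after timeoutMs.
    // The timed tryLock respects fair ordering; the untimed tryLock() would barge.
    static boolean writeWithTimeout(ReentrantReadWriteLock lock, long timeoutMs, Runnable action) {
        try {
            if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // could not get the lock in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.writeLock().unlock(); // always release, even if the action throws
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true);
        ReentrantReadWriteLock nonFair = new ReentrantReadWriteLock(); // default mode
        System.out.println("fair: " + fair.isFair() + ", default: " + nonFair.isFair());
        writeWithTimeout(fair, 100, () -> System.out.println("writing under the fair lock"));
    }
}
```

With one writer and many readers, the fair lock is the safer choice if writer latency matters; the non-fair lock usually gives higher throughput.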

Java Multithreading large arrays access

My main class generates multiple threads based on some rules (20-40 threads that live for a long time).
Each thread creates several short-lived threads; I am using an executor for these.
I need to work on multi-dimensional arrays in the short-lived threads. I wrote it as shown in the code below, but I think it is inefficient, since I pass the array so many times to so many threads/tasks. I tried to access it directly from the threads (by declaring it public, with no success). I would be happy to get comments or advice on how to improve it.
As a next step I also want to return a one-dimensional array as a result (it might be better just to update it in the AssetFactory class), and I am not sure how to do that.
please see the code below.
thanks
Paz
import java.util.concurrent.*;
import java.util.logging.Level;
public class AssetFactory implements Runnable{
private volatile boolean stop = false;
private volatile String feed ;
private double[][][] PeriodRates= new double[10][500][4];
private String TimeStr,Bid,periodicalRateIndicator;
private final BlockingQueue<String> workQueue;
ExecutorService IndicatorPool = Executors.newCachedThreadPool();
public AssetFactory(BlockingQueue<String> workQueue) {
this.workQueue = workQueue;
}
@Override
public void run(){
while (!stop) {
try{
feed = workQueue.take();
periodicalRateIndicator = CheckPeriod(TimeStr, Bid) ;
if (periodicalRateIndicator.length() >0) {
IndicatorPool.submit(new CalcMvg(periodicalRateIndicator,PeriodRates));
}
if ("Stop".equals(feed)) {
stop = true ;
}
} // try
catch (InterruptedException ex) {
logger.log(Level.SEVERE, null, ex);
stop = true;
}
} // while
} // run
} // AssetFactory
Here is the CalcMVG class
public class CalcMvg implements Runnable {
private double [][][] PeriodRates = new double[10][500][4];
public CalcMvg(String Periods, double[][][] PeriodRates) {
System.out.println(Periods);
this.PeriodRates = PeriodRates ;
}
@Override
public void run(){
try{
// do some work with the data of the PeriodRates array, e.g. print it (no changes to the array)
System.out.println(PeriodRates[1][1][1]);
}
catch (Exception ex){
System.out.println(Thread.currentThread().getName() + ex.getMessage());
logger.log(Level.SEVERE, null, ex);
}
}//run
} // mvg class
There are several things going on here which seem to be wrong, but it is hard to give a good answer with the limited amount of code presented.
First the actual coding issues:
There is no need to declare a variable volatile if only one thread ever accesses it (stop, feed).
You should declare variables that are only used in a local context (the run method) locally in that method rather than as fields of the whole instance (this applies to almost all of the variables). This allows the JIT to do various optimizations.
The InterruptedException should terminate the thread, because it is thrown as a request to terminate the thread's work.
In your code example the workQueue doesn't seem to do anything other than put the threads to sleep or stop them. Why doesn't it just feed the actual worker threads with the required workload directly?
And then the code structure issues:
You use threads to feed threads with work. This is inefficient, as you only have a limited amount of cores that can actually do the work. As the execution order of threads is undefined, it is likely that the IndicatorPool is either mostly idle or overfilling with tasks that have not yet been done.
If you have a finite set of work to be done, the ExecutorCompletionService might be helpful for your task.
I think you will gain the best speed increase by redesigning the code structure. Imagine the following (assuming that I understood your question correctly):
There is a blocking queue of tasks that is fed by some data source (e.g. file-stream, network).
A set of worker threads, equal in number to the cores, waits on that data source for input, which is then processed and put into a completion queue.
A specific data item is the "terminator" for your work (e.g. null). If a thread encounters this terminator, it finishes its loop and shuts down.
Now the following holds true for this construct:
Case 1: The data source is the bottleneck. It cannot be sped up by using multiple threads, as your hard disk or network won't work faster if you ask more often.
Case 2: The processing power of your machine is the bottleneck, as you cannot process more data than the worker threads/cores on your machine can handle.
In both cases the conclusion is that the worker threads need to be the ones that ask for new data as soon as they are ready to process it, because either they need to be put on hold or they need to throttle the incoming data. This ensures maximum throughput.
Once all worker threads have terminated, the work is done. This can be tracked, for example, with a CyclicBarrier or Phaser.
Pseudo-code for the worker threads:
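public void run() {
    DataType e;
    try {
        // Pull work until the data source hands out the terminator (null).
        while ((e = dataSource.next()) != null) {
            process(e);
        }
        // Signal completion; await() can also throw BrokenBarrierException.
        barrier.await();
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt(); // preserve the interrupt request
    } catch (BrokenBarrierException ex) {
        // another worker was interrupted or timed out while waiting at the barrier
    }
}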
I hope this is helpful in your case.
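For a version of this design that actually runs, here is a minimal sketch using a BlockingQueue as the data source and one terminator per worker. The names WorkerPool, POISON and sumOfSquares are invented for the illustration, and squaring integers stands in for the real processing. It uses join() instead of a barrier to detect completion, which is enough when nothing else has to wait at the same point.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class WorkerPool {
    // Hypothetical terminator; compared by identity so no real item can match it.
    private static final Integer POISON = new Integer(Integer.MIN_VALUE);

    // Workers pull items themselves, so a slow source or slow workers self-throttle.
    static long sumOfSquares(int items, int workers) {
        BlockingQueue<Integer> source = new LinkedBlockingQueue<>();
        AtomicLong total = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                try {
                    Integer item;
                    while ((item = source.take()) != POISON) {
                        int v = item;
                        total.addAndGet((long) v * v); // stand-in for process(e)
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // treat as a stop request
                }
            });
            threads[w].start();
        }
        try {
            for (int i = 1; i <= items; i++) {
                source.put(i);
            }
            for (int w = 0; w < workers; w++) {
                source.put(POISON); // one terminator per worker
            }
            for (Thread t : threads) {
                t.join(); // all workers done means all work done
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return total.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sumOfSquares(1000, cores));
    }
}
```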
Passing the array as an argument to the constructor is a reasonable approach, although unless you intend to copy the array it isn't necessary to initialize PeriodRates with a large array. It seems wasteful to allocate a large block of memory and then reassign its only reference straight away in the constructor. I would initialize it like this:
private final double [][][] PeriodRates;
public CalcMvg(String Periods, double[][][] PeriodRates) {
System.out.println(Periods);
this.PeriodRates = PeriodRates;
}
The other option is to define CalcMvg as an inner class of AssetFactory and declare PeriodRates as final. This would allow instances of CalcMvg to access PeriodRates in the outer instance of AssetFactory.
Returning the result is more difficult since it involves publishing the result across threads. One way to do this is to use synchronized methods:
private double[] result = null;
private synchronized void setResult(double[] result) {
this.result = result;
}
public synchronized double[] getResult() {
if (result == null) {
throw new RuntimeException("Result has not been initialized for this instance: " + this);
}
return result;
}
There are more advanced multi-threading concepts available in the Java libraries, e.g. Future, that might be appropriate in this case.
Regarding your concerns about the number of threads, allowing a library class to manage the allocation of work to a thread pool might solve this concern. Something like an Executor might help with this.
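Here is a sketch of the Future-based variant, under the assumption that the tasks only read the shared array. The class PeriodAverages and its methods are hypothetical, and a simple per-period average stands in for the real moving-average calculation. Passing the array reference is cheap, since no data is copied.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PeriodAverages {
    // Averages one field over every row of each period; reads the array, never writes it.
    static double[] averages(double[][][] rates, int field) {
        double[] out = new double[rates.length];
        for (int p = 0; p < rates.length; p++) {
            double sum = 0;
            for (double[] row : rates[p]) {
                sum += row[field];
            }
            out[p] = sum / rates[p].length;
        }
        return out;
    }

    // Runs the calculation on the pool and hands the one-dimensional result back.
    static double[] computeOn(ExecutorService pool, double[][][] rates, int field) {
        Callable<double[]> task = () -> averages(rates, field);
        Future<double[]> future = pool.submit(task);
        try {
            return future.get(); // safe publication of the result to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        double[][][] periodRates = new double[10][500][4];
        periodRates[1][1][1] = 500.0;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(computeOn(pool, periodRates, 1)[1]);
        } finally {
            pool.shutdown();
        }
    }
}
```

In real code the caller would usually keep the Future and call get() later, so that it can do other work in the meantime.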

Java threads - High cpu utilization?

I have two threads. I am invoking one (the SocketThread) first and then from that one I am invoking another thread (the 'ProcessThread'). My issue is that, during the execution the CPU usage is 50%. It reduces to 0% when I add TimeUnit.NANOSECONDS.sleep(1) in the ProcessThread run method. Is this the right method to modify? Or is there any advice in general for reducing the CPU utilization?
Below is my code:
public class SocketThread extends Thread {
private Set<Object> setSocketOutput = new HashSet<Object>(1, 1);
private BlockingQueue<Set<Object>> bqSocketOutput;
ProcessThread pThread;
@Override
public void run() {
pThread = new ProcessThread(bqSocketOutput);
pThread.start();
for(long i=0; i<= 30000; i++) {
System.out.println("SocketThread - Testing" + i);
}
}
}
public class ProcessThread extends Thread {
public ProcessThread(BlockingQueue<Set<Object>> bqTrace) {
System.out.println("ProcessThread - Constructor");
}
@Override
public void run() {
System.out.println("ProcessThread - Exectution");
while (true) {
/*
try {
TimeUnit.NANOSECONDS.sleep(1);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}*/
}
}
}
You can "reduce the CPU utilization" by sleeping threads, but that means that you are not getting any work done on those threads (or if you are, it's getting done dramatically slower than if you just let the threads run full-out). It's like saying you can reduce fuel consumption in your car by stopping every few miles and turning the engine off.
Typically a while(true) loop that runs without some sort of blocking thread synchronization (like Object.wait() or, in your situation, BlockingQueue.take(), as @Martin-James suggests) is a code smell and indicates code that should be refactored.
If your worker thread waits on the blocking queue by calling take() it should not consume CPU resources.
If it's in a tight loop that does nothing (as the code in the example suggests), then of course it will consume resources. Calling sleep inside a loop as a limiter probably isn't the best idea.
You have two tight loops that will hog the CPU as long as there's work to be done. Sleeping is one way to slow down your application (and decrease CPU utilization) but it's rarely, if ever, the desired result.
If you insist on sleeping, you need to increase your sleep times to be at least 20 milliseconds and tune from there. You can also look into sleeping after a batch of tasks. You'll also need a similar sleep in the SocketThread print loop.
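The take() approach mentioned in the answers could look roughly like the sketch below. BlockingConsumer, the STOP marker and consumeAll are names made up for the example, and the question's BlockingQueue of Set objects is simplified to String messages. While the queue is empty the consumer thread is parked, so it uses no CPU and needs no sleep call.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BlockingConsumer {
    // Hypothetical shutdown marker; compared by identity so no real message matches it.
    private static final String STOP = new String("STOP");

    // Feeds the messages to a consumer thread and returns how many it processed.
    static int consumeAll(String... messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        int[] processed = {0};
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take(); // blocks while the queue is empty
                    if (msg == STOP) {
                        return;
                    }
                    processed[0]++; // stand-in for the real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (String m : messages) {
            queue.add(m);
        }
        queue.add(STOP);
        try {
            consumer.join(); // join makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed[0];
    }

    public static void main(String[] args) {
        System.out.println(consumeAll("a", "b", "c"));
    }
}
```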

Java Concurrency JDK 1.6: Busy wait does better than signalling? Effective Java #51

Joshua Bloch's "Effective Java", Item 51 is about not depending on the thread scheduler, and about not keeping threads unnecessarily in the runnable state. Quoted text:
The main technique for keeping the number of runnable threads down is to have each thread
do a small amount of work and then wait for some condition using Object.wait or for some
time to elapse using Thread.sleep. Threads should not busy-wait, repeatedly checking a data
structure waiting for something to happen. Besides making the program vulnerable to the
vagaries of the scheduler, busy-waiting can greatly increase the load on the processor,
reducing the amount of useful work that other processes can accomplish on the same machine.
And then goes on to show a microbenchmark of a busy wait vs using signals properly. In the book, the busy wait does 17 round trips/s whereas the wait/notify version does 23,000 round trips per second.
However, when I tried the same benchmark on JDK 1.6, I see just the opposite: the busy wait does 760K round trips/second whereas the wait/notify version does 53.3K round trips/second. That is, wait/notify should have been ~1400 times faster, but turns out to be ~14 times slower?
I understand the busy waits aren't good and signalling is still better - cpu utilization is ~50% on the busy wait version whereas it stays at ~30% on the wait/notify version - but is there something that explains the numbers?
If it helps, I'm running JDK1.6 (32 bit) on Win 7 x64 (core i5).
UPDATE: Source below. To run the busy work bench, change the base class of PingPongQueue to BusyWorkQueue
import java.util.LinkedList;
import java.util.List;
abstract class SignalWorkQueue {
private final List queue = new LinkedList();
private boolean stopped = false;
protected SignalWorkQueue() { new WorkerThread().start(); }
public final void enqueue(Object workItem) {
synchronized (queue) {
queue.add(workItem);
queue.notify();
}
}
public final void stop() {
synchronized (queue) {
stopped = true;
queue.notify();
}
}
protected abstract void processItem(Object workItem)
throws InterruptedException;
private class WorkerThread extends Thread {
public void run() {
while (true) { // Main loop
Object workItem = null;
synchronized (queue) {
try {
while (queue.isEmpty() && !stopped)
queue.wait();
} catch (InterruptedException e) {
return;
}
if (stopped)
return;
workItem = queue.remove(0);
}
try {
processItem(workItem); // No lock held
} catch (InterruptedException e) {
return;
}
}
}
}
}
// HORRIBLE PROGRAM - uses busy-wait instead of Object.wait!
abstract class BusyWorkQueue {
private final List queue = new LinkedList();
private boolean stopped = false;
protected BusyWorkQueue() {
new WorkerThread().start();
}
public final void enqueue(Object workItem) {
synchronized (queue) {
queue.add(workItem);
}
}
public final void stop() {
synchronized (queue) {
stopped = true;
}
}
protected abstract void processItem(Object workItem)
throws InterruptedException;
private class WorkerThread extends Thread {
public void run() {
final Object QUEUE_IS_EMPTY = new Object();
while (true) { // Main loop
Object workItem = QUEUE_IS_EMPTY;
synchronized (queue) {
if (stopped)
return;
if (!queue.isEmpty())
workItem = queue.remove(0);
}
if (workItem != QUEUE_IS_EMPTY) {
try {
processItem(workItem);
} catch (InterruptedException e) {
return;
}
}
}
}
}
}
class PingPongQueue extends SignalWorkQueue {
volatile int count = 0;
protected void processItem(final Object sender) {
count++;
SignalWorkQueue recipient = (SignalWorkQueue) sender;
recipient.enqueue(this);
}
}
public class WaitQueuePerf {
public static void main(String[] args) {
PingPongQueue q1 = new PingPongQueue();
PingPongQueue q2 = new PingPongQueue();
q1.enqueue(q2); // Kick-start the system
// Give the system 10 seconds to warm up
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
}
// Measure the number of round trips in 10 seconds
int count = q1.count;
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
}
System.out.println(q1.count - count);
q1.stop();
q2.stop();
}
}
In your test, the queue gets new items continuously, therefore the busy-wait does very little actual waiting.
If the queue got one new item every 1 ms, you would see the busy-wait spend most of its time burning CPU for nothing, which slows down other parts of the application.
So it depends. Busy-waiting on user input is definitely wrong, while the short retry loops in lock-free classes like AtomicInteger are definitely fine.
Yes, busy wait will respond more quickly and execute more loops, but I think the point was that it puts a disproportionately heavier load on the entire system.
Try running 1000 busy wait threads vs 1000 wait/notify threads and check your total throughput.
I think the difference you observed is probably Sun re-optimizing the compiler for what people do rather than what people should do. Sun does that all the time. The original benchmark in the book may have even been due to some scheduler bug that Sun fixed--with that ratio it certainly sounds wrong.
It depends on the number of threads and the degree of contention: busy waits are bad if they happen often and/or consume many CPU cycles.
But atomic integers (AtomicInteger, AtomicIntegerArray, ...) are better than synchronizing on an Integer or int[], even though the thread also performs busy waits.
Use the java.util.concurrent package as often as possible; in your case, ConcurrentLinkedQueue.
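To show what the "busy wait" inside an atomic looks like, here is a small sketch of a compare-and-set retry loop. The CasCounter class and its hammer helper are hypothetical. The loop only spins when another thread changed the value in between, so each retry is short and no thread is ever put to sleep.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Increments the counter unless it has already reached the limit.
    boolean incrementIfBelow(int limit) {
        while (true) {
            int current = value.get();
            if (current >= limit) {
                return false;
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread won the race; loop and retry with the fresh value.
        }
    }

    int get() {
        return value.get();
    }

    // Lets several threads compete on one counter and returns the final value.
    static int hammer(int threads, int attemptsPerThread, int limit) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    counter.incrementIfBelow(limit);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1000, 2500));
    }
}
```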
Busy waiting is not always a bad thing. The "proper" (at the low level) way of doing things, using Java synchronization primitives, carries an overhead, oftentimes significant, of the bookkeeping necessary to implement general-purpose mechanisms that perform fairly well in most scenarios. Busy waiting, on the other hand, is very lightweight, and in some situations can be quite an improvement over one-size-fits-all synchronization. While synchronization based solely on busy-waiting is definitely a no-no in any general setting, it's occasionally quite useful. This is true not only for Java: spinlocks (a fancy name for busy-waiting based locks) are widely used in database servers, for instance.
In fact, if you take a walk through the java.util.concurrent package sources, you'll find many places containing "tricky", seemingly fragile code. I find SynchronousQueue a nice example (you can take a look at the source in the JDK distribution or here; both OpenJDK and Oracle seem to use the same implementation). Busy waiting is used as an optimization: after a certain number of "spins", the thread goes into a proper "sleep". Apart from that, it has some other niceties as well, such as volatile piggybacking and a spin threshold that depends on the number of CPUs. It's really... illuminating, in that it shows what it takes to implement efficient low-level concurrency. Even better, the code itself is really clean, well-documented and high-quality in general.
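The spin-then-sleep idea can be sketched in a few lines. This is a deliberately simplified illustration and not the SynchronousQueue code: the SpinThenPark class, the MAX_SPINS threshold and the park interval are all assumptions chosen for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class SpinThenPark {
    // Assumed threshold; real implementations tune this, e.g. by CPU count.
    static final int MAX_SPINS = 1000;

    // Waits until the flag becomes true and returns how many times it spun.
    static int awaitFlag(AtomicBoolean flag) {
        int spins = 0;
        while (!flag.get()) {
            if (spins < MAX_SPINS) {
                spins++; // cheap busy-wait, good when the wait is very short
            } else {
                LockSupport.parkNanos(100_000L); // back off and yield the CPU
            }
        }
        return spins;
    }

    public static void main(String[] args) {
        AtomicBoolean ready = new AtomicBoolean(false);
        new Thread(() -> ready.set(true)).start();
        System.out.println("spins before the flag was seen: " + awaitFlag(ready));
    }
}
```

If the flag flips quickly, the waiter never pays the cost of parking; if it does not, the waiter stops burning CPU after the threshold.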
