I have two threads. I start one (the SocketThread) first, and from it I start another (the ProcessThread). My issue is that CPU usage is 50% during execution. It drops to 0% when I add TimeUnit.NANOSECONDS.sleep(1) to the ProcessThread run method. Is this the right fix? Or is there any general advice for reducing CPU utilization?
Below is my code:
public class SocketThread extends Thread {
private Set<Object> setSocketOutput = new HashSet<Object>(1, 1);
private BlockingQueue<Set<Object>> bqSocketOutput;
ProcessThread pThread;
@Override
public void run() {
pThread = new ProcessThread(bqSocketOutput);
pThread.start();
for(long i=0; i<= 30000; i++) {
System.out.println("SocketThread - Testing" + i);
}
}
}
public class ProcessThread extends Thread {
public ProcessThread(BlockingQueue<Set<Object>> bqTrace) {
System.out.println("ProcessThread - Constructor");
}
@Override
public void run() {
System.out.println("ProcessThread - Exectution");
while (true) {
/*
try {
TimeUnit.NANOSECONDS.sleep(1);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}*/
}
}
}
You can "reduce the CPU utilization" by sleeping threads, but that means that you are not getting any work done on those threads (or if you are, it's getting done dramatically slower than if you just let the threads run full-out). It's like saying you can reduce fuel consumption in your car by stopping every few miles and turning the engine off.
Typically, a while(true) loop that runs without some sort of blocking thread synchronization (like a wait(), or in your situation, a BlockingQueue.take() as @Martin-James suggests) is a code smell and indicates code that should be refactored.
If your worker thread waits on the blocking queue by calling take() it should not consume CPU resources.
If it's in a tight loop that does nothing (as the code in the example suggests) then of course it will consume resources. Calling sleep inside a loop as a limiter probably isn't the best idea.
You have two tight loops that will hog the CPU as long as there's work to be done. Sleeping is one way to slow down your application (and decrease CPU utilization) but it's rarely, if ever, the desired result.
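As a minimal sketch of the take() approach described above: the class name, the STOP sentinel, and the message strings are hypothetical and are not part of the original SocketThread/ProcessThread code. The worker parks inside take() whenever the queue is empty, so it should use essentially no CPU while idle.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingWorkerDemo {
    // Hypothetical sentinel value that tells the worker to stop.
    private static final String STOP = "__STOP__";

    // Feeds `messages` items to a worker that blocks in take(),
    // then returns how many items the worker processed.
    static int runDemo(int messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        int[] processed = {0};

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    // take() blocks while the queue is empty instead of spinning.
                    String item = queue.take();
                    if (STOP.equals(item)) {
                        return;
                    }
                    processed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (int i = 0; i < messages; i++) {
                queue.put("message " + i);
            }
            queue.put(STOP);
            worker.join(); // join() makes the worker's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed[0];
    }

    public static void main(String[] args) {
        System.out.println("processed = " + runDemo(1000));
    }
}
```

The same shape would apply to the ProcessThread in the question if the SocketThread put its output sets on bqSocketOutput.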
If you insist on sleeping, you need to increase your sleep times to be at least 20 milliseconds and tune from there. You can also look into sleeping after a batch of tasks. You'll also need a similar sleep in the SocketThread print loop.
Like this, I have two threads. The SleepRunner thread adds some random numbers to a list, then sets flag to true and sleeps. The main thread waits until the flag in the SleepRunner object changes from false to true, then interrupts the SleepRunner thread, and the program ends.
The problem is that when the while loop in the main thread has an empty body, the variable 'runner' does not appear to be updated inside the loop; in other words, the program does not end after the SleepRunner thread changes flag from false to true. When I ran it under the debugger in IDEA, the program ended normally. And if I put some code, like System.out.println() or Thread.sleep(1), in the while loop body in the main thread, the program also ends successfully. It's incredible! Does anyone know why this happens? Thanks.
public class Test1 {
public static void main(String[] args) {
SleepRunner runner = new SleepRunner();
Thread thread = new Thread(runner);
thread.start();
while(!(runner.isFlag())){
/*try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}*/
}
System.out.println("END");
thread.interrupt();
}
}
public class SleepRunner implements Runnable {
private boolean flag = false;
public boolean isFlag() {
return flag;
}
@Override
public void run() {
List<Integer> list = new ArrayList<>();
for (int i = 0; i < 100; i++) {
try {
Thread.sleep((long) (Math.random() * 200));
}
catch (InterruptedException e) {
System.out.println("Interrupted");
}
int num = (int) (Math.random() * 100);
System.out.println(Thread.currentThread().getName() + " " + num);
list.add(num);
}
flag = true;
System.out.println("30 Seconds");
try {
Thread.sleep(30000);
}
catch (InterruptedException e) {
System.out.println("Interrupted in 30 seconds");
}
System.out.println("sleep runner thread end");
}
}
You've violated the java memory model.
Here's how the JMM works*:
Each thread, whenever any field (from any object) is read or updated, flips a coin. On heads, it will make a copy and update/read from that. On tails, it won't. Your job is to ensure your code functions correctly regardless of how the coin lands, and you can't force the coinflip in a unit test. The coin need not be 'fair'. The coin's behaviour depends on the music playing in your music player, the whims of a toddler, and the phase of the moon. (In other words, any update/read may be done to a local cache copy, or not, up to the java implementation).
You may safely conclude that the only way to do it correctly, is to ensure the thread never flips that coin.
The way to accomplish that is to establish so-called 'happens-before' (informally, 'comes before') relationships. Establishing them is done primarily by using synchronization primitives, or by calling methods that use synchronization primitives. For example, if I do this:
thread X:
synchronized(x) {
x.foo();
System.out.println(shared.y);
shared.y = 10;
}
thread Y:
synchronized(x) {
x.foo();
System.out.println(shared.y);
shared.y = 20;
}
then you've established a relationship: thread X's block comes before thread Y's block, or vice versa, but you've at least established that they must run in order.
As a consequence, this will print either 0 10 or 0 20, guaranteed. Without the synchronized block, it can legally print 0 0 as well. All 3 results would be an acceptable result (the java lang spec says it's okay, and any bugs filed that you think this makes no sense would be disregarded as 'working as intended').
volatile can also be used, but volatile is quite limited.
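For the flag-polling question above, volatile happens to be enough, because one thread only writes the flag and the other only reads it. The following is a minimal sketch under that assumption; the class names are hypothetical and the 50 ms sleep merely stands in for the SleepRunner's work. The limitation is that volatile gives visibility, not atomicity: a compound action such as count++ on a volatile field can still lose updates when several threads write.

```java
class VolatileFlagDemo {
    static class Runner implements Runnable {
        // volatile: a write to flag happens-before every later read of it,
        // so the polling thread cannot keep using a stale cached value.
        private volatile boolean flag = false;

        boolean isFlag() {
            return flag;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(50); // stand-in for the real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flag = true;
        }
    }

    // Polls the flag with an empty loop body, as in the question,
    // and returns once the write has become visible.
    static boolean waitForFlag() {
        Runner runner = new Runner();
        Thread thread = new Thread(runner);
        thread.start();
        while (!runner.isFlag()) {
            // empty body: terminates because flag is volatile
        }
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return runner.isFlag();
    }

    public static void main(String[] args) {
        System.out.println("flag seen: " + waitForFlag());
    }
}
```

This still burns CPU while polling; it only fixes the visibility problem.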
Generally, because this cannot be adequately tested, there are only 3 ways to do threading properly in java:
'in the large': Use a webserver or other app framework that takes care of the multithreading. You don't write the psv main() method, that framework does, and all you write are 'handlers'. None of your handlers touch any shared data at all. The handlers either don't share data, or share it via a bus designed to do it right, such as a DB in serializable transaction isolation mode, or rabbitmq or some other message bus.
'in the small': Use fork/join to parallelize a giant task. The handler for the task cannot, of course, use any shared data.
read Java Concurrency in Practice (the book), prefer using the classes in the java.util.concurrent package, and in general be a guru about how this stuff works, because doing threading any other way is likely to result in you programming bugs which your tests probably won't catch, but will either blow up at production time, or will result in no actual multithreading (e.g. if you overzealously synchronize everything, you end up having all cores except one core just waiting around, and your code will actually run way slower than if it was just single threaded).
*) The full explanation is about a book's worth. I'm just giving you oversimplified highlights, as this is merely an SO answer.
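As one illustration of preferring java.util.concurrent classes over a hand-rolled flag, here is a hedged sketch that uses a CountDownLatch. The class and method names are hypothetical. Per the CountDownLatch documentation, actions before countDown() happen-before actions after a successful return from await(), so the waiting thread sees the filled list and does not spin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LatchDemo {
    // A producer fills a list, then signals completion through a latch.
    // Returns the list size as observed by the waiting thread.
    static int produceAndWait() {
        List<Integer> list = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                list.add(i);
            }
            done.countDown(); // publishes everything written before this call
        });
        producer.start();

        try {
            done.await(); // blocks without consuming CPU until countDown() runs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println("items seen: " + produceAndWait());
    }
}
```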
My program looks like this:
Executor executor = Executors.newSingleThreadExecutor();
void work1(){
while (true) {
// do heavy work 1
Object data;
executor.execute(() -> work2(data));
}
}
void work2(Object data){
// do heavy work 2
}
I noticed that when work2 becomes heavy it affects work1 as well. It gets to the point when there is almost no gain in splitting the process into two threads.
What could be the reasons for this behavior and what tools do I have to find and analyze those problems?
Oh and here are my machine specs:
The while (true) loop in work1 produces tasks quickly, but work2 is heavy and runs slowly. As a result, the number of tasks waiting for the single thread grows without bound. Eventually physical memory is exhausted and virtual memory is used, which is much slower. The executor returned by Executors.newSingleThreadExecutor() uses an unbounded queue, so nothing throttles the producer. One solution is as follows:
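class WorkerThread extends Thread {
// Bounded queue: put() blocks the producer once 10 tasks are waiting.
final ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(10);
@Override
public void run() {
try {
while (true) {
queue.take().run(); // blocks while the queue is empty
}
} catch (InterruptedException e) {
// interrupted: let the worker thread exit
}
}
}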
WorkerThread workerThread = new WorkerThread();
workerThread.start();
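void work1() throws InterruptedException {
while (true) {
// do heavy work 1, producing the input for work2
Object data = null; // placeholder for the result of heavy work 1
// put() blocks while the queue is full, which throttles work1
workerThread.queue.put(() -> work2(data));
}
}
Using ArrayBlockingQueue keeps the number of waiting tasks small.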
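A related way to bound the backlog, sketched here as an alternative rather than as the approach above, is to keep the executor but give it a bounded queue. This swaps in ThreadPoolExecutor with CallerRunsPolicy: when the queue is full, the submitting thread runs the task itself, which slows the producer down instead of blocking it in put(). The class name, queue size, and timeout are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedExecutorDemo {
    // Submits taskCount trivial tasks to a single-threaded executor whose
    // queue holds at most 10 tasks, and returns how many tasks completed.
    static int runTasks(int taskCount) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1,                          // exactly one worker thread
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10),  // at most 10 waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy());

        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // If the queue is full, this call runs the task on the calling thread.
            executor.execute(completed::incrementAndGet);
        }

        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + runTasks(1000));
    }
}
```

Whether caller-runs or a blocking put() fits better depends on whether the producer thread can afford to do work2 itself.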
I'm implementing an online store where customers drop in an order and my program generates a summary of who the client is and what items they bought.
public class Summarizer{
private TreeSet<Order> allOrders = new TreeSet<Order>(new OrderComparator());
public void oneThreadProcessing(){
Thread t1;
for(Order order: allOrders){
t1 = new Thread(order);
t1.start();
try {
t1.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public void multipleThreadProcessing(){
Thread[] threads = new Thread[allOrders.size()];
int i = 0;
for(Order order: allOrders){
threads[i] = new Thread(order);
threads[i].start();
i++;
}
for(Thread t: threads){
try {
t.join();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public static void main (String[] args) {
Summarizer s = new Summarizer();
long startTime = System.currentTimeMillis();
s.oneThreadProcessing();
long endTime = System.currentTimeMillis();
System.out.println("Processing time (msec): " + (endTime - startTime));
System.out.println("-------------Multiple Thread-------------------------------");
long startTime1 = System.currentTimeMillis();
s.multipleThreadProcessing();
long endTime1 = System.currentTimeMillis();
System.out.println("Processing time (msec): " + (endTime1 - startTime1));
}
}
This is my Order class:
public class Order implements Runnable, Comparable<Order> {
private int clientId;
@Override
public void run() {
/*
Print out summary in the form:
Client id: 1001
Item: Shoes, Quantity: 2, Cost per item: $30.00, Total Cost: $60.00
Item: Bag, Quantity: 1, Cost per item: $15.00, Total Cost: $15.00
Total: $75.00
*/
}
}
Assuming I fill this TreeSet with all the orders from a particular day, sorted by client ID, I want to generate the summaries of all these orders using only one thread and then using multiple threads, and compare the performance. The requirement is that the multithreaded version performs better, and I assumed it would, but every time I run my main method the multithreaded version is actually almost always slower. Am I doing something wrong? I know one can never be certain that a multithreaded program will be faster, but since I'm not dealing with any locking yet, shouldn't multipleThreadProcessing() be faster? Is there any way to ensure it is faster?
Multithreading is not a magic powder one sprinkles onto code to make it run faster. It requires careful thought about what you're doing.
Let's analyze the multi threaded code you've written.
You have many Orders (How many btw? dozens? hundreds? thousands? millions?) and when executed, each order prints itself to the screen. That means that you're spawning a possibly enormous number of threads, just so each of them would spend most of its time waiting for System.out to become available (since it would be occupied by other threads). Each thread requires OS and JVM involvement, plus it requires CPU time, and memory resources; so why should such a multi threaded approach run faster than a single thread? A single thread requires no additional memory, and no additional context switching, and doesn't have to wait for System.out. So naturally it would be faster.
To get the multi threaded version to work faster, you need to think of a way to distribute the work between threads without creating unnecessary contention over resources. For one, you probably don't need more than one thread performing the I/O task of writing to the screen. Multiple threads writing to System.out would just block each other. If you have several IO devices you need to write to, then create one or more threads for each (depending on the nature of the device). Computational tasks can sometimes be executed quicker in parallel, but:
A) You don't need more threads doing computation work than the number of cores your CPU has. If you have too many CPU-bound threads, they'd just waste time context switching.
B) You need to reason through your parallel processing, and plan it thoroughly. Give each thread a chunk of the computational tasks that need to be done, then combine the results of each of them.
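Points A and B can be sketched as follows, using a pure computation (summing an array) as a stand-in for the real per-order work. The class and method names are hypothetical. The pool is sized to the number of cores, each task gets one contiguous chunk, and the partial results are combined on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSumDemo {
    // Splits the array into one chunk per core, sums the chunks in parallel,
    // and adds up the partial sums.
    static long parallelSum(int[] values) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (values.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < values.length; start += chunk) {
                final int from = start;
                final int to = Math.min(values.length, start + chunk);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += values[i];
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(parallelSum(values)); // 1 + 2 + ... + 1000 = 500500
    }
}
```

For the order summaries, the analogous design would be to build each summary string in a worker and let a single thread print them.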
I want to understand the logic of a thread pool. Below is a simple, incorrect, and incomplete implementation of one:
class ThreadPool {
private BlockingQueue<Runnable> taskQueue;
public ThreadPool(int numberOfThreads) {
taskQueue = new LinkedBlockingQueue<Runnable>(10);
for (int i = 0; i < numberOfThreads; i++) {
new PoolThread(taskQueue).start();
}
}
public void execute(Runnable task) throws InterruptedException {
taskQueue.put(task);
}
}
class PoolThread extends Thread {
private BlockingQueue<Runnable> taskQueue;
public PoolThread(BlockingQueue<Runnable> queue) {
taskQueue = queue;
}
public void run() {
while (true) {
try {
taskQueue.take().run();
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
What if the number of threads to execute exceed the taskQueue size, will the calling thread be blocked? ThreadPoolExecutor - here we can see that in this case it's a work of rejected execution handler, but I still can not understand how does it work. Thanks in advance for any help.
EDIT:
set max size of blocking queue to 10
Imagine a group of bricklayers (your threads) building a wall, and a pile of bricks (your BlockingQueue).
Each bricklayer takes a brick from the pile, positions it, and then picks another one (taskQueue.take()). As long as there are bricks in the pile, the bricklayers are kept busy.
A truck arrives from time to time, filling the pile with more bricks, but there is only limited space on the pile. If there is no space, the truck stops and waits until enough bricks have been used by the bricklayers.
As long as there are enough bricks in the pile (more than the number of bricklayers), you can rest assured all bricklayers will have enough to work with, but when the pile starts to run empty the bricklayers will have to stop working until new bricks are delivered.
You have to pick a suitable number of bricklayers: too few and the truck will often be waiting for space in the pile; too many and most of them will be idle, waiting for new bricks.
Implementation-wise, in general, Java gives you a threadpool, you rarely create your own -
ExecutorService threadExecutor = Executors.newFixedThreadPool( 3 );
and then you call:
threadExecutor.submit(Runnable...);
to add a task to the queue.
What if the number of threads to execute exceed the taskQueue size, will the calling thread be blocked?
The size of the queue is the number of tasks which are NOT running. Typically it will be empty even when the threads are busy. Having a queue length which matches the number of threads has no significance and nothing special happens at this point.
here we can see that in this case it's a work of rejected execution handler
The rejection handler is only called if the queue is full. Your queue has no limit so it wouldn't be called even if you supported this feature.
However, if it did have a limit and it supported this feature, the typical behaviour is to throw an exception. You can make it do other things such as block, have the current thread run the task (which is my preference) or ignore the task.
I still can not understand how does it work.
When you offer() a task to a queue, it returns false if the queue could not accept it. When this happens, call the rejected execution handler.
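The offer()-then-reject logic can be sketched in isolation like this. The RejectionHandler interface is a hypothetical stand-in, loosely modelled on the idea of RejectedExecutionHandler, and no worker threads drain the queue here, so every task beyond the capacity is rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class OfferRejectionDemo {
    // Hypothetical callback invoked when the queue cannot accept a task.
    interface RejectionHandler {
        void rejected(Runnable task);
    }

    // Offers `submitted` tasks to a queue of the given capacity that nobody
    // drains, and returns how many were handed to the rejection handler.
    static int countRejected(int capacity, int submitted) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(capacity);
        int[] rejected = {0};
        RejectionHandler handler = task -> rejected[0]++;

        for (int i = 0; i < submitted; i++) {
            Runnable task = () -> { };
            // offer() never blocks: it returns false when the queue is full.
            if (!queue.offer(task)) {
                handler.rejected(task);
            }
        }
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejected(10, 15)); // 15 - 10 = 5
    }
}
```

A handler could instead throw, run the task on the calling thread, or drop it, as described above.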
Joshua Bloch's "Effective Java", Item 51 is about not depending on the thread scheduler, as well as not keeping threads unnecessarily in the runnable state. Quoted text:
The main technique for keeping the number of runnable threads down is to have each thread
do a small amount of work and then wait for some condition using Object.wait or for some
time to elapse using Thread.sleep. Threads should not busy-wait, repeatedly checking a data
structure waiting for something to happen. Besides making the program vulnerable to the
vagaries of the scheduler, busy-waiting can greatly increase the load on the processor,
reducing the amount of useful work that other processes can accomplish on the same machine.
And then goes on to show a microbenchmark of a busy wait vs using signals properly. In the book, the busy wait does 17 round trips/s whereas the wait/notify version does 23,000 round trips per second.
However, when I tried the same benchmark on JDK 1.6, I see just the opposite - the busy wait does 760K roundtrips/second whereas the wait/notify version does 53.3K roundtrips/s - that is, wait/notify should have been ~1400 times faster, but turns out to be ~14 times slower?
I understand the busy waits aren't good and signalling is still better - cpu utilization is ~50% on the busy wait version whereas it stays at ~30% on the wait/notify version - but is there something that explains the numbers?
If it helps, I'm running JDK1.6 (32 bit) on Win 7 x64 (core i5).
UPDATE: Source below. To run the busy work bench, change the base class of PingPongQueue to BusyWorkQueue
import java.util.LinkedList;
import java.util.List;
abstract class SignalWorkQueue {
private final List queue = new LinkedList();
private boolean stopped = false;
protected SignalWorkQueue() { new WorkerThread().start(); }
public final void enqueue(Object workItem) {
synchronized (queue) {
queue.add(workItem);
queue.notify();
}
}
public final void stop() {
synchronized (queue) {
stopped = true;
queue.notify();
}
}
protected abstract void processItem(Object workItem)
throws InterruptedException;
private class WorkerThread extends Thread {
public void run() {
while (true) { // Main loop
Object workItem = null;
synchronized (queue) {
try {
while (queue.isEmpty() && !stopped)
queue.wait();
} catch (InterruptedException e) {
return;
}
if (stopped)
return;
workItem = queue.remove(0);
}
try {
processItem(workItem); // No lock held
} catch (InterruptedException e) {
return;
}
}
}
}
}
// HORRIBLE PROGRAM - uses busy-wait instead of Object.wait!
abstract class BusyWorkQueue {
private final List queue = new LinkedList();
private boolean stopped = false;
protected BusyWorkQueue() {
new WorkerThread().start();
}
public final void enqueue(Object workItem) {
synchronized (queue) {
queue.add(workItem);
}
}
public final void stop() {
synchronized (queue) {
stopped = true;
}
}
protected abstract void processItem(Object workItem)
throws InterruptedException;
private class WorkerThread extends Thread {
public void run() {
final Object QUEUE_IS_EMPTY = new Object();
while (true) { // Main loop
Object workItem = QUEUE_IS_EMPTY;
synchronized (queue) {
if (stopped)
return;
if (!queue.isEmpty())
workItem = queue.remove(0);
}
if (workItem != QUEUE_IS_EMPTY) {
try {
processItem(workItem);
} catch (InterruptedException e) {
return;
}
}
}
}
}
}
class PingPongQueue extends SignalWorkQueue {
volatile int count = 0;
protected void processItem(final Object sender) {
count++;
SignalWorkQueue recipient = (SignalWorkQueue) sender;
recipient.enqueue(this);
}
}
public class WaitQueuePerf {
public static void main(String[] args) {
PingPongQueue q1 = new PingPongQueue();
PingPongQueue q2 = new PingPongQueue();
q1.enqueue(q2); // Kick-start the system
// Give the system 10 seconds to warm up
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
}
// Measure the number of round trips in 10 seconds
int count = q1.count;
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
}
System.out.println(q1.count - count);
q1.stop();
q2.stop();
}
}
In your test, the queue gets new items continuously, therefore the busy-wait does very little actual waiting.
If the queue gets one new item every 1ms, you can see the busy-wait will spend most of its time burning CPU for nothing. It will slow down other parts of the application.
So it depends. If you busy-wait on user input, that is definitely wrong, while the busy-wait in lockless data structures like AtomicInteger is definitely good.
Yes, busy wait will respond more quickly and execute more loops, but I think the point was that it puts a disproportionately heavier load on the entire system.
Try running 1000 busy wait threads vs 1000 wait/notify threads and check your total throughput.
I think the difference you observed is probably Sun re-optimizing the compiler for what people do rather than what people should do. Sun does that all the time. The original benchmark in the book may have even been due to some scheduler bug that Sun fixed; with that ratio it certainly sounds wrong.
It depends on the number of threads and the degree of contention: busy waits are bad if they happen often and/or consume many CPU cycles.
But atomic integers (AtomicInteger, AtomicIntegerArray ...) are better than synchronizing on an Integer or int[], even though the threads also perform busy waits.
Use the java.util.concurrent package, and in your case ConcurrentLinkedQueue, as often as possible.
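The kind of short busy-wait meant here is the compare-and-set retry loop. The sketch below spells that loop out by hand for illustration; in real code incrementAndGet() does the same job. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounterDemo {
    // Increments a shared counter from several threads using an explicit
    // compare-and-set retry loop, and returns the final value.
    static int incrementConcurrently(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    int current;
                    do {
                        current = counter.get();
                        // Retry (a very short busy-wait) if another thread
                        // changed the value between get() and compareAndSet().
                    } while (!counter.compareAndSet(current, current + 1));
                }
            });
            workers[t].start();
        }

        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(incrementConcurrently(4, 10000)); // 4 * 10000 = 40000
    }
}
```

The retry only spins while another thread is actively making progress, which is why it tends to be cheap compared with spinning on a condition that may not change for a long time.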
Busy waiting is not always a bad thing. The "proper" (at the low level) way of doing things - using Java synchronization primitives - carries an overhead, oftentimes significant, of the bookkeeping necessary to implement general-purpose mechanisms that perform fairly well in most scenarios. Busy waiting, on the other hand, is very lightweight, and in some situations can be quite an improvement over one-size-fits-all synchronization. While synchronization based solely on busy-waiting is definitely a no-no in any general setting, it's occasionally quite useful. It's true not only for Java - spinlocks (a fancy name for busy-waiting based locks) are widely used in database servers, for instance.
In fact, if you take a walk through the java.util.concurrent package sources, you'll find many places containing "tricky", seemingly fragile code. I find SynchronousQueue a nice example (you can take a look at the source in the JDK distribution or here; both OpenJDK and Oracle seem to use the same implementation). Busy waiting is used as an optimization - after a certain number of "spins", the thread goes into proper "sleep". Apart from that, it has some other niceties as well - volatile piggybacking, a spin threshold dependent on the number of CPUs, etc. It's really... illuminating, in that it shows what it takes to implement efficient low-level concurrency. Even better, the code itself is really clean, well-documented and high-quality in general.
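The spin-then-sleep idea can be sketched roughly as follows. This is not the SynchronousQueue implementation; it is a simplified one-shot gate for a single waiting thread, and the class name and the SPIN_LIMIT value are assumptions chosen for illustration. It spins briefly on a volatile flag, then falls back to LockSupport.park().

```java
import java.util.concurrent.locks.LockSupport;

class SpinThenParkDemo {
    // Hypothetical spin budget; real implementations tune this, e.g. by CPU count.
    private static final int SPIN_LIMIT = 1000;

    private volatile boolean ready;
    private volatile Thread waiter;

    // Waits until signal() has been called: spin first, then park.
    // Supports a single waiting thread only.
    void awaitReady() {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (ready) {
                return; // fast path: the signal arrived while spinning
            }
        }
        waiter = Thread.currentThread();
        while (!ready) {
            // park() may return spuriously, so the flag is rechecked in a loop.
            LockSupport.park(this);
        }
    }

    void signal() {
        ready = true;
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    // Signals from a second thread after a short delay and waits for it.
    static boolean demo() {
        SpinThenParkDemo gate = new SpinThenParkDemo();
        Thread signaller = new Thread(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.signal();
        });
        signaller.start();
        gate.awaitReady();
        return gate.ready;
    }

    public static void main(String[] args) {
        System.out.println("ready = " + demo());
    }
}
```

Because both fields are volatile, either the signaller sees the registered waiter and unparks it, or the waiter sees ready set before it parks, so the wake-up is not lost.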