Computing map: computing value ahead of time - java

I have a computing map (with soft values) that I am using to cache the results of an expensive computation.
Now I have a situation where I know that a particular key is likely to be looked up within the next few seconds. That key is also more expensive to compute than most.
I would like to compute the value in advance, in a minimum-priority thread, so that when the value is eventually requested it will already be cached, improving the response time.
What is a good way to do this such that:
I have control over the thread (specifically its priority) in which the computation is performed.
Duplicate work is avoided, i.e. the computation is only done once. If the computation task is already running then the calling thread waits for that task instead of computing the value again. (FutureTask implements this; with Guava's computing maps it holds if you only call get, but not if you mix in calls to put.)
The "compute value in advance" method is asynchronous and idempotent. If a computation is already in progress it should return immediately without waiting for that computation to finish.
Avoid priority inversion, e.g. if a high-priority thread requests the value while a medium-priority thread is doing something unrelated but the computation task is queued on a low-priority thread, the high-priority thread must not be starved. Maybe this could be achieved by temporarily boosting the priority of the computing thread(s) and/or running the computation on the calling thread.
How could this be coordinated between all the threads involved?
Additional info
The computations in my application are image filtering operations, which means they are all CPU-bound. These operations include affine transforms (ranging from 50µs to 1ms) and convolutions (up to 10ms). Of course the effectiveness of varying thread priorities depends on the ability of the OS to preempt the larger tasks.

You can arrange for "once only" execution of the background computation by using a Future with the ComputedMap. The Future represents the task that computes the value. The future is created by the ComputedMap and at the same time, passed to an ExecutorService for background execution. The executor can be configured with your own ThreadFactory implementation that creates low priority threads, e.g.
class LowPriorityThreadFactory implements ThreadFactory
{
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        t.setPriority(Thread.MIN_PRIORITY);
        return t;
    }
}
When the value is needed, your high-priority thread then fetches the future from the map, and calls the get() method to retrieve the result, waiting for it to be computed if necessary. To avoid priority inversion you add some additional code to the task:
class HandlePriorityInversionTask extends FutureTask<ResultType>
{
    Integer priority;         // non-null if a boost has been requested
    Integer originalPriority; // worker's priority before any boost
    Thread thread;            // thread currently running the task

    HandlePriorityInversionTask(Callable<ResultType> callable) {
        super(callable);
    }

    public ResultType get() throws InterruptedException, ExecutionException {
        // Boost the worker to the caller's priority if the task isn't done yet
        if (!isDone())
            setPriority(Thread.currentThread().getPriority());
        return super.get();
    }

    public void run() {
        synchronized (this) {
            thread = Thread.currentThread();
            originalPriority = thread.getPriority();
            if (priority != null) setPriority(priority);
        }
        super.run();
    }

    protected synchronized void done() {
        // Restore the original priority, whether the task completed normally or not
        if (originalPriority != null) setPriority(originalPriority);
        thread = null;
    }

    synchronized void setPriority(int priority) {
        this.priority = Integer.valueOf(priority);
        if (thread != null)
            thread.setPriority(priority);
    }
}
This takes care of raising the priority of the task to the priority of the thread calling get() if the task has not completed, and returns the priority to the original when the task completes, normally or otherwise. (To keep it brief, the code doesn't check if the priority is indeed greater, but that's easy to add.)
When the high priority task calls get(), the future may not yet have begun executing. You might be tempted to avoid this by setting a large upper bound on the number of threads used by the executor service, but this may be a bad idea, since each thread could be running at high priority, consuming as much CPU as it can before the OS switches it out. The pool should probably be the same size as the number of hardware threads, e.g. sized to Runtime.getRuntime().availableProcessors(). If the task has not started executing, rather than wait for the executor to schedule it (a form of priority inversion, since your high-priority thread would be waiting on the low-priority threads to complete), you may choose to cancel it from the current executor and re-submit it on an executor running only high-priority threads.
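To tie the pieces together, here is a hedged sketch of an idempotent "compute in advance" method built on the task class above; Key, ResultType and expensiveComputation() are hypothetical stand-ins for your own types:

ConcurrentMap<Key, HandlePriorityInversionTask> cache = new ConcurrentHashMap<>();
ExecutorService lowPrioExecutor = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors(), new LowPriorityThreadFactory());

// Asynchronous and idempotent: returns immediately if the value is
// already cached or already being computed.
void precompute(final Key key) {
    if (cache.containsKey(key)) return;
    HandlePriorityInversionTask task =
            new HandlePriorityInversionTask(() -> expensiveComputation(key));
    if (cache.putIfAbsent(key, task) == null)
        lowPrioExecutor.execute(task); // only the winner of the race submits
}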

One common way of coordinating this type of situation is to have a map whose values are FutureTask objects. So, stealing as an example some code I wrote from a web server of mine, the essential idea is that for a given parameter, we see if there is already a FutureTask (meaning that the calculation with that parameter has already been scheduled), and if so we wait for it. In this example, we otherwise schedule the lookup, but that could be done elsewhere with a separate call if that was desirable:
private final ConcurrentMap<WordLookupJob, Future<CharSequence>> cache = ...

private Future<CharSequence> getOrScheduleLookup(final WordLookupJob word) {
    Future<CharSequence> f = cache.get(word);
    if (f == null) {
        Callable<CharSequence> ex = new Callable<CharSequence>() {
            public CharSequence call() throws Exception {
                return doCalculation(word);
            }
        };
        Future<CharSequence> ft = executor.submit(ex);
        f = cache.putIfAbsent(word, ft);
        if (f != null) {
            // somebody slipped in with the same word -- cancel the
            // lookup we've just started and return the previous one
            ft.cancel(true);
        } else {
            f = ft;
        }
    }
    return f;
}
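On Java 8 and later, ConcurrentHashMap.computeIfAbsent can express the same once-only guarantee more compactly, avoiding the submit-then-cancel race entirely. A sketch, assuming the cache above is backed by a ConcurrentHashMap:

private Future<CharSequence> getOrScheduleLookup(final WordLookupJob word) {
    // the mapping function is invoked at most once per key
    return cache.computeIfAbsent(word, w -> executor.submit(() -> doCalculation(w)));
}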
In terms of thread priorities: I wonder if this will achieve what you think it will? I don't quite understand your point about raising the priority of the lookup above the waiting thread: if the thread is waiting, then it's waiting, whatever the relative priorities of other threads... (You might want to have a look at some articles I've written on thread priorities and thread scheduling, but to cut a long story short, I'm not sure that changing the priority will necessarily buy you what you're expecting.)

I suspect that you are heading down the wrong path by focusing on thread priorities. Usually the data that a cache holds is expensive to compute due to I/O (fetching out-of-memory data) rather than CPU-bound computation. If you're prefetching to guess a user's future action, such as looking at unread emails, then it indicates to me that your work is likely I/O-bound. This means that as long as thread starvation does not occur (which schedulers disallow), playing games with thread priority won't offer much of a performance improvement.
If the cost is an I/O call then the background thread is blocked waiting for the data to arrive, and processing that data should be fairly cheap (e.g. deserialization). As a change in thread priority won't offer much of a speed-up, performing the work asynchronously on a background thread pool should be sufficient. If the cache miss penalty is too high, then using multiple layers of caching tends to help further reduce the user-perceived latency.

As an alternative to thread priorities, you could perform a low-priority task only if no high-priority tasks are in progress. Here's a simple way to do that:
AtomicInteger highPriorityCount = new AtomicInteger();

void highPriorityTask() {
    highPriorityCount.incrementAndGet();
    try {
        highPriorityImpl();
    } finally {
        highPriorityCount.decrementAndGet();
    }
}

void lowPriorityTask() {
    if (highPriorityCount.get() == 0) {
        lowPriorityImpl();
    }
}
In your use case, both Impl() methods would call get() on the computing map, highPriorityImpl() in the same thread and lowPriorityImpl() in a different thread.
You could write a more sophisticated version that defers low-priority tasks until the high-priority tasks complete and limits the number of concurrent low-priority tasks.
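For instance, a hedged sketch of the deferred variant (names are illustrative, and a benign race remains where a low-priority task can start just as high-priority work begins):

final AtomicInteger highPriorityCount = new AtomicInteger();
final Queue<Runnable> deferred = new ConcurrentLinkedQueue<>();

void highPriorityTask(Runnable task) {
    highPriorityCount.incrementAndGet();
    try {
        task.run();
    } finally {
        // the last high-priority task out drains the deferred queue
        if (highPriorityCount.decrementAndGet() == 0) drainDeferred();
    }
}

void lowPriorityTask(Runnable task) {
    if (highPriorityCount.get() == 0) {
        task.run();
    } else {
        deferred.add(task);
        // re-check in case high-priority work finished while we enqueued
        if (highPriorityCount.get() == 0) drainDeferred();
    }
}

private void drainDeferred() {
    Runnable task;
    while (highPriorityCount.get() == 0 && (task = deferred.poll()) != null)
        task.run();
}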

Related

Configuring threadpool size in a service

I am writing a service which takes two URLs, urlA and urlB, to fetch two integers a and b. The service returns the sum of a and b.
In its most simple form the service works like this:
public Integer getSumFromUrls(String urlA, String urlB) {
    Integer a = fetchFromUrl(urlA);
    Integer b = fetchFromUrl(urlB);
    return a + b;
}
Here fetchFromUrl is a synchronous operation, so it blocks the processing thread until the value is available. To make things efficient I would rather use an ExecutorService to schedule the two fetches and return when the results are available. Here is the changed code (ignore the syntactic nuances):
public Integer getSumFromUrls(String urlA, String urlB) {
    Future<Integer> aFuture = Executors.newSingleThreadScheduledExecutor().submit(new Callable<Integer>() {
        public Integer call() {
            return fetchFromUrl(urlA);
        }
    });
    Future<Integer> bFuture = Executors.newSingleThreadScheduledExecutor().submit(new Callable<Integer>() {
        public Integer call() {
            return fetchFromUrl(urlB);
        }
    });
    Integer a = aFuture.get();
    Integer b = bFuture.get();
    return a + b;
}
Here, I have created single thread executors to execute the requests concurrently.
Since this code would be running in the context of a web service, I should probably not be creating the single-thread executors locally inside the function, but should rather use an N-sized thread pool shared across requests.
My questions here are:
Is the above understanding (italicised part) correct?
If yes, how should I choose the optimum size of the thread pool. Should it be a function of the thread pool size of my service container, or request throughput or both etc?
Is there a better way of optimising this scenario so that service threads are not blocked on doing IO most of the time.
Note: The details provided in this question are not the completely real scenarios but are representative of the same set of complexities required to answer the question.
If getSumFromUrls runs every time a new request comes in, it will create new thread pools and submit tasks on each call. With 1000 requests in flight at once, you would create 1000 thread pools and eventually thousands of threads, which will be an issue for your application. Generally the number of active threads should be about equal to the number of available cores, though it depends on the use case: if your tasks are CPU-intensive, the thread count should match the CPU core count, but if they are IO-intensive you can use more threads. More threads mean more context switches, which have their own cost and may degrade application performance.
Is the above understanding (italicised part) correct?
-> Yes.
If yes, how should I choose the optimum size of the thread pool. Should it be a function of the thread pool size of my service container, or request throughput or both etc?
-> As mentioned above, it depends on the type of task you are doing. You should use a common thread pool, shared across requests, to execute those tasks.
Is there a better way of optimizing this scenario so that service threads are not blocked on doing IO most of the time?
-> You should benchmark the thread pool size. The operating system automatically assigns the CPU to another thread when a thread is blocked on an IO operation and does not need the CPU.
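A hedged sketch of that shared-pool approach; the pool size of 50 and the blocking fetchFromUrl() are assumptions for illustration:

// one pool for the whole service, not one per request
private static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(50);

public Integer getSumFromUrls(final String urlA, final String urlB)
        throws InterruptedException, ExecutionException {
    Future<Integer> aFuture = SHARED_POOL.submit(() -> fetchFromUrl(urlA));
    Future<Integer> bFuture = SHARED_POOL.submit(() -> fetchFromUrl(urlB));
    return aFuture.get() + bFuture.get();
}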

Java - Priority in semaphore

I have multiple threads accessing an external resource – a browser. But only one thread can access it at a time, so I am using a semaphore to synchronise them. However, one thread, which takes input from the GUI and then accesses the browser for the results, should have priority over the other threads, and I am not sure how to use a semaphore to achieve that.
I was thinking that every thread after acquiring the semaphore checks if there is the priority thread waiting in the queue and if yes, then it releases it and waits again. Only the priority thread doesn't release it once it is acquired.
Is this a good solution or is there anything else in Java API I could use?
There are no synchronization primitives in Java that would allow you to prioritise one thread over others in the manner you want.
But you could use another approach to solving your problem. Instead of synchronizing threads, make them produce small tasks (for instance, Runnable objects) and put those tasks into a PriorityBlockingQueue with tasks from the GUI thread having the highest priority. A single working thread will poll tasks from this queue and execute them. That would guarantee both mutual exclusion and prioritization.
There are special constructors in ThreadPoolExecutor that accept blocking queues. So, all you need is such an executor with a single thread, provided with your PriorityBlockingQueue<Runnable>. Then submit your tasks to this executor and it will take care of the rest.
Should you decide to choose this approach, this post might be of interest to you: How to implement PriorityBlockingQueue with ThreadPoolExecutor and custom tasks
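A hedged sketch of that setup; the wrapper class, priority values, and the guiTask/backgroundTask Runnables are illustrative. Note that tasks must go through execute() rather than submit(), because submit() wraps tasks in a FutureTask, which is not Comparable and would make the priority queue throw a ClassCastException:

// a Runnable with a priority; lower values run first
class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
    private final int priority;
    private final Runnable task;

    PrioritizedTask(int priority, Runnable task) {
        this.priority = priority;
        this.task = task;
    }

    public void run() { task.run(); }

    public int compareTo(PrioritizedTask other) {
        return Integer.compare(priority, other.priority);
    }
}

// a single worker thread gives mutual exclusion on the browser;
// the priority queue orders the waiting tasks
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());

executor.execute(new PrioritizedTask(0, guiTask));        // highest priority
executor.execute(new PrioritizedTask(1, backgroundTask)); // everything else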
Here's a simple, no frills answer. This is similar to how a read/write lock works, except that every locker has exclusive access (normally all readers proceed in parallel). Note that it does not use Semaphore because that is almost always the wrong construct to use.
public class PrioLock {
    private boolean _locked;
    private boolean _priorityWaiting;

    public synchronized void lock() throws InterruptedException {
        while (_locked || _priorityWaiting) {
            wait();
        }
        _locked = true;
    }

    public synchronized void lockPriority() throws InterruptedException {
        _priorityWaiting = true;
        try {
            while (_locked) {
                wait();
            }
            _locked = true;
        } finally {
            _priorityWaiting = false;
        }
    }

    public synchronized void unlock() {
        _locked = false;
        notifyAll();
    }
}
You would use it like one of the Lock types in java.util.concurrent:
Normal threads:
_prioLock.lock();
try {
    // ... use resource here ...
} finally {
    _prioLock.unlock();
}
"Priority" thread:
_prioLock.lockPriority();
try {
    // ... use resource here ...
} finally {
    _prioLock.unlock();
}
UPDATE:
Response to comment regarding "preemptive" thread interactions:
In the general sense, you cannot do that. You could build custom functionality that adds "pause points" to the locked section, allowing a low-priority thread to yield to a high-priority thread, but that would be fraught with peril.
The only thing you could realistically do is interrupt the working thread, causing it to exit the locked code block (assuming that your working code responds to interruption). This would allow a high-priority thread to proceed more quickly at the expense of the low-priority thread losing in-progress work (and you might have to implement rollback logic as well).
In order to implement this you would need to (see the sketch after this list):
record the "current thread" when locking succeeds.
in lockPriority(), interrupt the "current thread" if found
implement the logic between the lock()/unlock() (low priority) calls so that:
it responds to interruption in a reasonable time-frame
it implements any necessary "rollback" code when interrupted
potentially implement "retry" logic outside the lock()/unlock() (low priority) calls in order to re-do any work lost when interrupted
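A hedged sketch extending the PrioLock above with those steps; it assumes the low-priority worker responds to interruption and handles its own rollback:

public class InterruptingPrioLock {
    private boolean _locked;
    private boolean _priorityWaiting;
    private Thread _owner; // records the current locker

    public synchronized void lock() throws InterruptedException {
        while (_locked || _priorityWaiting) wait();
        _locked = true;
        _owner = Thread.currentThread();
    }

    public synchronized void lockPriority() throws InterruptedException {
        _priorityWaiting = true;
        try {
            if (_owner != null) _owner.interrupt(); // ask the worker to bail out
            while (_locked) wait();
            _locked = true;
            _owner = Thread.currentThread();
        } finally {
            _priorityWaiting = false;
        }
    }

    public synchronized void unlock() {
        _locked = false;
        _owner = null;
        notifyAll();
    }
}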
You are mixing up concepts here.
Semaphores are just one of the many options to "synchronize" the interactions of threads. They have nothing to do with thread priorities and thread scheduling.
Thread priorities, on the other hand, are a topic of their own. You have means in Java to affect them, but the results of such actions heavily depend on the underlying platform/OS and the JVM implementation itself. In theory, using those priorities is easy; in practice, reality is more complicated.
In other words: you can only use your semaphore to ensure that just one thread is using your resource at any point in time. It doesn't help at all with ensuring that your GUI-reading thread wins over other threads when CPU cycles become scarce. But if you're lucky, the answer to your problem will be simple calls to setPriority() using different priorities.

How to configure a single-threaded ForkJoinPool?

Is it possible to configure ForkJoinPool to use 1 execution thread?
I am executing code that invokes Random inside a ForkJoinPool. Every time it runs, I end up with different runtime behavior, making it difficult to investigate regressions.
I would like the codebase to offer "debug" and "release" modes. "debug" mode would configure Random with a fixed seed, and ForkJoinPool with a single execution thread. "release" mode would use system-provided Random seeds and use the default number of ForkJoinPool threads.
I tried configuring ForkJoinPool with a parallelism of 1, but it uses 2 threads (main and a second worker thread). Any ideas?
So, it turns out I was wrong.
When you configure a ForkJoinPool with parallelism set to 1, only one thread executes the tasks. The main thread is blocked on ForkJoinTask.get(); it doesn't actually execute any tasks.
That said, it turns out that it is really tricky providing deterministic behavior. Here are some of the problems I had to correct:
ForkJoinPool was executing tasks using different worker threads (with different names) if the worker thread became idle long enough. For example, if the main thread got suspended on a debugging breakpoint, the worker thread would become idle and shut down. When I resumed execution, the ForkJoinPool would spin up a new worker thread with a different name. To solve this, I had to provide a custom ForkJoinWorkerThreadFactory implementation that ensures only one thread runs at a time, and that its name is hard-coded. I also had to ensure that my code returned the same Random instance even if a worker thread shut down and came back again.
Collections with non-deterministic iteration order such as HashMap or HashSet led to elements grabbing random numbers in a different order on every run. I corrected this by using LinkedHashMap and LinkedHashSet.
Objects with non-deterministic hashCode() implementations, such as Enum.hashCode(). I forget what problems this caused but I corrected it by calculating the hashCode() myself instead of relying on the built-in method.
Here is a sample implementation of ForkJoinWorkerThreadFactory:
class MyForkJoinWorkerThread extends ForkJoinWorkerThread
{
    MyForkJoinWorkerThread(ForkJoinPool pool)
    {
        super(pool);
        // Change thread name after ForkJoinPool.registerWorker() does the same
        setName("DETERMINISTIC_WORKER");
    }
}

ForkJoinWorkerThreadFactory factory = new ForkJoinWorkerThreadFactory()
{
    private WeakReference<Thread> currentWorker = new WeakReference<>(null);

    @Override
    public synchronized ForkJoinWorkerThread newThread(ForkJoinPool pool)
    {
        // If the pool already has a live thread, wait for it to shut down.
        Thread thread = currentWorker.get();
        if (thread != null && thread.isAlive())
        {
            try
            {
                thread.join();
            }
            catch (InterruptedException e)
            {
                log.error("", e);
            }
        }
        ForkJoinWorkerThread result = new MyForkJoinWorkerThread(pool);
        currentWorker = new WeakReference<>(result);
        return result;
    }
};
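A deterministic pool can then be built from the factory; a usage sketch (the task body is a placeholder):

// parallelism 1, the factory above, default exception handler, FIFO mode
ForkJoinPool pool = new ForkJoinPool(1, factory, null, false);
pool.submit(() -> {
    // ... deterministic task body here ...
}).join();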
The main thread is always the first thread your application creates. So when you create a ForkJoinPool with a parallelism of 1, you are creating another thread; effectively there will be two threads in the application (because you created a pool of threads).
If you need only one thread, the main thread, you can execute your code sequentially (not in parallel at all).

Java Executor with throttling/throughput control

I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers.
I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard" version out there somewhere I'd at least like to look at it first.
Take a look at Guava's RateLimiter:
A rate limiter. Conceptually, a rate limiter distributes permits at a configurable rate. Each acquire() blocks if necessary until a permit is available, and then takes it. Once acquired, permits need not be released. Rate limiters are often used to restrict the rate at which some physical or logical resource is accessed. This is in contrast to Semaphore, which restricts the number of concurrent accesses instead of the rate (note though that concurrency and rate are closely related, e.g. see Little's Law).
It's thread-safe, but still marked @Beta. Might be worth a try anyway.
You would have to wrap each call to the Executor with the rate limiter. For a cleaner solution you could create some kind of wrapper for the ExecutorService.
From the javadoc:
final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"

void submitTasks(List<Runnable> tasks, Executor executor) {
    for (Runnable task : tasks) {
        rateLimiter.acquire(); // may wait
        executor.execute(task);
    }
}
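Such a wrapper might look like this hedged sketch (ThrottledExecutor is an illustrative name, assuming Guava is on the classpath):

class ThrottledExecutor implements Executor {
    private final Executor delegate;
    private final RateLimiter limiter;

    ThrottledExecutor(Executor delegate, double permitsPerSecond) {
        this.delegate = delegate;
        this.limiter = RateLimiter.create(permitsPerSecond);
    }

    @Override
    public void execute(Runnable command) {
        limiter.acquire(); // blocks the submitter until a permit is available
        delegate.execute(command);
    }
}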
The Java Executor doesn't offer such a limitation, only limitation by the number of threads, which is not what you are looking for.
In general the Executor is the wrong place to limit such actions anyway; it should happen at the moment the thread tries to call the outside server. You can do this, for example, by having a limiting Semaphore that threads wait on before they submit their requests.
Calling Thread:
public void run() {
    // ...
    requestLimiter.acquire();
    connection.send();
    // ...
}
At the same time you schedule a (single) secondary thread to periodically (say, every 60 seconds) release the acquired permits:
public void run() {
    // ...
    // make sure no more than the max are released by draining the Semaphore empty
    requestLimiter.drainPermits();
    requestLimiter.release(MAX_NUM_REQUESTS);
    // ...
}
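A wiring sketch for the two snippets above; MAX_NUM_REQUESTS and the 60-second window are illustrative:

static final int MAX_NUM_REQUESTS = 100;
static final Semaphore requestLimiter = new Semaphore(MAX_NUM_REQUESTS);

static void startReplenisher() {
    ScheduledExecutorService replenisher = Executors.newSingleThreadScheduledExecutor();
    replenisher.scheduleAtFixedRate(() -> {
        requestLimiter.drainPermits();            // cap outstanding permits
        requestLimiter.release(MAX_NUM_REQUESTS); // refill for the next window
    }, 60, 60, TimeUnit.SECONDS);
}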
no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later
You need to look into Executors.newFixedThreadPool(int nThreads). This will allow you to limit the number of threads that can run simultaneously. If you submit more tasks than there are threads, they will be queued and executed later.
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Future<?> result1 = threadPool.submit(runnable1);
Future<?> result2 = threadPool.submit(runnable2);
Future<SomeClass> result3 = threadPool.submit(callable1);
...
The snippet above shows how you would work with an ExecutorService that runs no more than 100 tasks simultaneously.
Update:
After going over the comments, here is what I have come up with (kinda stupid): how about manually keeping track of the tasks to be executed? You could store them first in an ArrayList and then submit them to the Executor based on how many have already been executed in the last second.
So, let's say 200 tasks have been submitted to our maintained ArrayList. We can iterate and add 100 to the Executor. When a second passes, we can add a few more based on how many have completed in the Executor, and so on.
Depending on the scenario, and as suggested in one of the previous responses, the basic functionalities of a ThreadPoolExecutor may do the trick.
But if the thread pool is shared by multiple clients and you want to throttle, restricting the usage of each of them to make sure that no single client uses all the threads, then a BoundedExecutor will do the job.
More details can be found in the following example:
http://jcip.net/listings/BoundedExecutor.java
Personally I found this scenario quite interesting. In my case, I wanted to stress that the interesting phase to throttle is the consuming side, as in classical producer/consumer concurrency theory. That's the opposite of some of the answers suggested before: we don't want to block the submitting thread, but block the consuming threads based on a rate (tasks/second) policy. So, even if there are tasks ready in the queue, executing/consuming threads may block waiting to meet the throttle policy.
That said, I think a good candidate would be Executors.newScheduledThreadPool(int corePoolSize). This way you would need a simple queue in front of the executor (a plain LinkedBlockingQueue would suit), and then schedule a periodic task to pick actual tasks from the queue (ScheduledExecutorService.scheduleAtFixedRate). So it is not a straightforward solution, but it should perform well enough if you are trying to throttle the consumers as discussed above.
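A consumer-side pacing sketch along those lines; the 10ms period (about 100 tasks/second) and the names are illustrative:

BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
ScheduledExecutorService pacer = Executors.newScheduledThreadPool(1);

pacer.scheduleAtFixedRate(() -> {
    Runnable task = pending.poll(); // non-blocking; skip the tick if empty
    if (task != null) task.run();
}, 0, 10, TimeUnit.MILLISECONDS);

// producers simply enqueue and never block
pending.offer(() -> System.out.println("throttled task"));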
You can limit it inside a Runnable:
public static Runnable throttle(Runnable realRunner, long delay) {
    Runnable throttleRunner = new Runnable() {
        // whether a run is already waiting
        private boolean _isWaiting = false;
        // target time to run realRunner
        private long _timeToRun;
        // specified delay time to wait
        private long _delay = delay;
        // Runnable that has the real task to run
        private Runnable _realRunner = realRunner;

        @Override
        public void run() {
            // current time
            long now;
            synchronized (this) {
                // another thread is waiting, skip
                if (_isWaiting) return;
                now = System.currentTimeMillis();
                // update time to run
                // do not update it each time since
                // you do not want to postpone it indefinitely
                _timeToRun = now + _delay;
                // set waiting status
                _isWaiting = true;
            }
            try {
                Thread.sleep(_timeToRun - now);
            } catch (InterruptedException e) {
                e.printStackTrace();
            } finally {
                // clear waiting status before run
                _isWaiting = false;
                // do the real task
                _realRunner.run();
            }
        }
    };
    return throttleRunner;
}
Taken from JAVA Thread Debounce and Throttle

What is the latency of a BlockingQueue's take() method?

I'd like to understand how take() works and if it's a suitable method to consume "fastly" elements that are pushed on a queue.
Note that, for the sake of understanding how it works, I'm not considering here the observer pattern: I know that I could use that pattern to "react quickly" to events, but that's not what my question is about.
For example if I have a BlockingQueue (mostly empty) and a thread "stuck" waiting for an element to be pushed on that queue so that it can be consumed, what would be a good way to minimize the time spent (reduce the latency) between the moment an element is pushed on the queue and the moment it is consumed?
For example what's the difference between a thread doing this:
while (true) {
    elem = queue.peek();
    if (elem == null) {
        Thread.sleep(25); // prevents busy-looping
    } else {
        ... // do something here
    }
}
and another one doing this:
while (true) {
    elem = queue.take();
    ... // do something with elem here
}
(I take it that to simplify things we can ignore discussing about exceptions here!?)
What goes on under the hood when you call take() and the queue is empty? The JVM somehow has to put the thread to sleep, because it can't be busy-looping, constantly checking if there's something on the queue. Is take() using some CAS operation under the hood? And if so, what determines how often take() calls that CAS operation?
What happens when something suddenly makes it onto the queue? How is the thread blocked on take() "notified" that it should act promptly?
Lastly, is it "common" to have one thread "stuck" on take() on a BlockingQueue for the lifetime of the application?
It's all one big question related to how the blocking take() works and I take it that answering my various questions (at least the one that makes sense) would help me understand all this better.
Internally, take waits on the notEmpty condition, which is signaled in the insert method; in other words, the waiting thread goes to sleep, and wakes up on insert. This should be fast.
Some blocking queues, e.g. ArrayBlockingQueue and SynchronousQueue, have a constructor that accepts the queue's fairness property; passing in true should prevent threads from getting stuck on take, otherwise this is a possibility. (This parameter specifies whether the underlying ReentrantLock is fair.)
Well, here's the implementation of LinkedBlockingQueue<E>.take() :
public E take() throws InterruptedException {
    E x;
    int c = -1;
    final AtomicInteger count = this.count;
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lockInterruptibly();
    try {
        while (count.get() == 0) {
            notEmpty.await();
        }
        x = dequeue();
        c = count.getAndDecrement();
        if (c > 1)
            notEmpty.signal();
    } finally {
        takeLock.unlock();
    }
    if (c == capacity)
        signalNotFull();
    return x;
}
When the queue is empty, notEmpty.await() is called, which:
Causes the current thread to wait until it is signalled or interrupted. The lock associated with this Condition is atomically released and the current thread becomes disabled for thread scheduling purposes and lies dormant until one of four things happens:
Some other thread invokes the signal method for this Condition and the current thread happens to be chosen as the thread to be awakened; or
Some other thread invokes the signalAll method for this Condition; or
Some other thread interrupts the current thread, and interruption of thread suspension is supported; or
A "spurious wakeup" occurs.
When another thread puts something in the queue, it calls signal, which wakes one of the threads waiting to consume items from this queue. This should work faster than your peek/sleep loop.
You can assume that take() will be notified that it can wake as soon as your OS can pass such a signal between threads. Note: your OS will be involved in the worst case. Typically this takes 1-10 microseconds, and in rare cases 100 or even 1000 microseconds. Note: Thread.sleep will wait for a minimum of about 1000 microseconds, and 25 milliseconds is 25,000 microseconds, so I would hope the difference is obvious to you.
The only real way of avoiding rare but long context switches is to busy-wait on a CPU locked by affinity (this dedicates a CPU to your thread). If your application is that latency-sensitive, the simpler solution is to not pass the work between threads at all. ;)
Since two threads are involved, a peek/sleep loop with a hypothetical micro/nano-sleep implementation would not differ much from take(), since both involve passing information from one thread to the next via main memory (using volatile writes/reads and a healthy amount of CAS), unless the JVM finds other ways to do inter-thread synchronization. You can try to implement a benchmark using two BlockingQueues and two threads that each act as producer for one queue and consumer for the other, moving a token back and forth: take it from one queue and offer it to the next. Then you can see how fast they produce/consume and compare that to peek/sleep. I guess performance depends a lot on the amount of work spent on each token (in this case zero, so we measure pure overhead) and the distance from CPU to memory. In my experience, single-socket machines come out way ahead of multi-socket machines.
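A minimal sketch of that ping-pong benchmark (the round count and queue capacity are arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PingPong {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<Object> ping = new ArrayBlockingQueue<>(1);
        final BlockingQueue<Object> pong = new ArrayBlockingQueue<>(1);
        final int rounds = 1_000_000;
        final Object token = new Object();

        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) pong.put(ping.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        echo.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(token); // hand the token to the echo thread
            pong.take();     // wait for it to come back
        }
        long elapsed = System.nanoTime() - start;
        echo.join();
        System.out.printf("%.0f ns per round trip%n", (double) elapsed / rounds);
    }
}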
The difference is that the first thread sleeps for up to 25ms too long, whereas the second thread doesn't waste any time at all.
