Can somebody tell me how I can find out "how many threads are in deadlock condition" in a Java multi-threading application? What is the way to find out the list of deadlocked threads?
I heard about Thread Dump and Stack Traces, but I don't know how to implement it.
I also want to know what new features have been introduced in Java 5 for Threading?
Please let me know with your comments and suggestions.
Ways of obtaining thread dumps:
ctrl-break (Windows) or ctrl-\, possibly ctrl-4 and kill -3 on Linux/UNIX
jstack and your process id (use jps)
jconsole or visualvm
just about any debugger
Major new threading features in J2SE 5.0 (released in 2004, now in its End of Service Life period):
java.util.concurrent
New Java Memory Model.
Use kill -3 on the process ID. This will print a thread dump and an overview of thread contention to the console.
From within your program, the ThreadMXBean class has a method findMonitorDeadlockedThreads(), as well as methods for querying the current stack traces of threads. From the console in Windows, doing Ctrl+Break gives you a list of stack traces and indicates deadlocked threads.
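For reference, here is a minimal sketch of doing that check programmatically via ThreadMXBean (note that findMonitorDeadlockedThreads() only covers monitor locks; findDeadlockedThreads(), added in Java 6, also covers java.util.concurrent locks):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockDetector {
    public static void main(String[] args) {
        ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
        // Returns the IDs of threads deadlocked on object monitors, or null if there are none.
        long[] ids = mxBean.findMonitorDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads");
            return;
        }
        ThreadInfo[] infos = mxBean.getThreadInfo(ids, Integer.MAX_VALUE);
        for (ThreadInfo info : infos) {
            System.out.println(info.getThreadName() + " is blocked on " + info.getLockName());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}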
As well as some tweaks to the Java memory model that tidy up some concurrency "loopholes", the most significant underlying feature in Java 5 is that it exposes Compare-And-Set (CAS) operations to the programmer. On the back of this, a whole raft of concurrency utilities is provided in the platform. There's really a whole host of stuff, but they include:
concurrent collections
executors, which effectively allow you to implement things such as thread pools
other common concurrency constructs (queues, latches, barriers)
atomic variables
You may be interested in some tutorials I've written on many of the Java 5 concurrency features.
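As a small illustration of the CAS operations mentioned above (a minimal sketch, not taken from those tutorials), here is a lock-free counter built on AtomicInteger.compareAndSet:

import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Classic CAS retry loop: read, compute the new value, attempt to swap,
    // and retry if another thread changed the value in the meantime.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}

In practice you would just call incrementAndGet(), but the retry loop shows the pattern many of the java.util.concurrent classes are built on.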
If you want to learn about the new concurrency features in Java 5, you could do a lot worse than getting a copy of Java Concurrency in Practice by Brian Goetz (Brian Goetz and a number of the coauthors designed the Java 5 concurrency libraries). It is both highly readable and authoritative, combining practical examples and theory.
The executive summary of the new concurrent utilities is as follows:
Task Scheduling Framework - The Executor framework is a framework for standardizing invocation, scheduling, execution, and control of asynchronous tasks according to a set of execution policies. Implementations are provided that allow tasks to be executed within the submitting thread, in a single background thread (as with events in Swing), in a newly created thread, or in a thread pool, and developers can create implementations of Executor supporting arbitrary execution policies. The built-in implementations offer configurable policies such as queue length limits and saturation policies, which can improve the stability of applications by preventing runaway resource consumption. (A minimal sketch follows this summary.)
Concurrent Collections - Several new Collections classes have been added, including the new Queue and BlockingQueue interfaces, and high-performance, concurrent implementations of Map, List, and Queue.
Atomic Variables - Classes for atomically manipulating single variables (primitive types or references), providing high-performance atomic arithmetic and compare-and-set methods. The atomic variable implementations in java.util.concurrent.atomic offer higher performance than would be available by using synchronization (on most platforms), making them useful for implementing high-performance concurrent algorithms as well as conveniently implementing counters and sequence number generators.
Synchronizers - General purpose synchronization classes, including semaphores, mutexes, barriers, latches, and exchangers, which facilitate coordination between threads.
Locks - While locking is built into the Java language via the synchronized keyword, there are a number of inconvenient limitations to built-in monitor locks. The java.util.concurrent.locks package provides a high-performance lock implementation with the same memory semantics as synchronization, but which also supports specifying a timeout when attempting to acquire a lock, multiple condition variables per lock, non-lexically scoped locks, and support for interrupting threads which are waiting to acquire a lock.
Nanosecond-granularity timing - The System.nanoTime method enables access to a nanosecond-granularity time source for making relative time measurements, and methods which accept timeouts (such as the BlockingQueue.offer, BlockingQueue.poll, Lock.tryLock, Condition.await, and Thread.sleep) can take timeout values in nanoseconds. The actual precision of System.nanoTime is platform-dependent.
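Here is the minimal Executor framework sketch referred to above; the pool size and the task are made up purely for illustration:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {
    public static void main(String[] args) throws Exception {
        // A fixed pool of 4 worker threads; other factory methods give other policies.
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Submit an asynchronous task and retrieve its result via a Future.
        Future<Integer> result = pool.submit(() -> 6 * 7);
        System.out.println("Result: " + result.get());

        pool.shutdown();
    }
}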
Related
I was reading about threading and learned about the fork/join API.
I found that you can either run tasks with the commonPool as the default pool managing them, or submit them to a newly created ForkJoinPool.
The difference between the two is as follows, to my understanding:
The commonPool is the main pool, created statically (some pool methods, such as shutting it down, don't work on it the way they do on other pools), and it is the pool the application mainly runs on.
The parallelism of the default/common pool is the number of cores - 1, while the default parallelism of a newly created pool equals the number of cores (or the number specified by the parallelism system property - I'm omitting the fully qualified property name).
Based on the documentation, the commonPool is fine for most uses.
This all boils down to my question:
When should I use the common pool? And why so? When should I create a new pool? And why so?
Short Story
The answer, like most things in software engineering, is: "It depends".
Pros of using the common pool
If you look at this wonderful article:
According to Oracle’s documentation, using the predefined common pool reduces resource consumption, since this discourages the creation of a separate thread pool per task.
and
Using the fork/join framework can speed up processing of large tasks, but to achieve this outcome, some guidelines should be followed:
Use as few thread pools as possible - in most cases, the best decision is to use one thread pool per application or system
Use the default common thread pool, if no specific tuning is needed (see the sketch after this list)
Use a reasonable threshold for splitting ForkJoinTask into subtasks
Avoid any blocking in your ForkJoinTasks
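As a minimal sketch of what "use the default common thread pool" can look like in practice (the task and the splitting threshold are made up for illustration), a RecursiveTask submitted with ForkJoinPool.commonPool().invoke(...) runs on the common pool:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // illustrative splitting threshold
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then join the left
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        java.util.Arrays.fill(data, 1);
        long total = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total); // 10000
    }
}

Passing the same task to a new ForkJoinPool(...) instead of the common pool is the dedicated-pool alternative discussed below.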
Pros of using dedicated pools
However, there are also some arguments AGAINST following this approach:
Dedicated Pool for Complex Applications
Having a dedicated pool per logical working unit in a complex application is sometimes the preferred approach. Imagine an application that:
Takes in a lot of events and groups them (that can be done in parallel)
Then workers do the work (that can be done in parallel as well)
Finally, some cleanup workers do some cleanup (that can be done in parallel as well).
So your application has 3 logical work groups, each of which might have its own demands for parallelism. (Keep in mind that the common pool's parallelism is set to something fairly low on most machines.)
Better not to step on each other's toes, right? Note that this only scales up to a certain level, beyond which it's recommended to have a separate microservice for each of these work units; but if for one reason or another you are not there yet, then a dedicated ForkJoinPool per logical work unit is not a bad idea.
Other libraries
If your app's code has only one place where you want parallelism, you have no guarantee that some developer won't pull in a third-party dependency which also relies on the common ForkJoinPool, leaving you with two places where this pool is in demand. That might be okay for your use case, and it might not be, especially if your default pool's parallelism is 4 or below.
Imagine a situation where your app's critical code (e.g. event handling or saving data to a database) has to compete for the common pool with some library which exports logs in parallel to some log sink.
Dedicated ForkJoinPool Makes Logging Neater
Additionally, the common ForkJoinPool has rather non-descriptive thread naming, so if you are debugging or looking at logs, chances are you will have to sift through a ton of
ForkJoinPool.commonPool-worker-xx
In the situation described above, compare that with:
ForkJoinPool.grouping-worker-xx
ForkJoinPool.payload-handler-worker-xx
ForkJoinPool.cleanup-worker
Therefore you can see there is some benefit in logging cleanliness when using a dedicated ForkJoinPool per logical work group.
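A minimal sketch of how such descriptive worker names can be produced (the pool name, parallelism, and task here are made up), using a custom ForkJoinWorkerThreadFactory:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedForkJoinPool {
    public static ForkJoinPool newPool(String name, int parallelism) {
        AtomicInteger counter = new AtomicInteger(1);
        ForkJoinPool.ForkJoinWorkerThreadFactory factory = pool -> {
            ForkJoinWorkerThread worker =
                ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
            worker.setName(name + "-worker-" + counter.getAndIncrement());
            return worker;
        };
        // null = default handler for uncaught exceptions, asyncMode = false
        return new ForkJoinPool(parallelism, factory, null, false);
    }

    public static void main(String[] args) {
        ForkJoinPool groupingPool = newPool("grouping", 2);
        groupingPool.submit(() ->
            System.out.println("running on " + Thread.currentThread().getName())
        ).join();
        groupingPool.shutdown();
    }
}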
TL;DR
Using the common ForkJoinPool has a lower memory impact, uses fewer resources, creates fewer threads, and puts less pressure on garbage collection. However, this approach might be insufficient for some use cases, as pointed out above.
Using a dedicated ForkJoinPool per logical work unit in your application gives you neater logging, is not a bad idea when you have a low parallelism level (i.e. not many cores), and helps you avoid thread contention between logically different parts of your application. This, however, comes at the price of higher CPU utilization, higher memory overhead, and more thread creation.
Is there a difference between 'ReentrantLock' and 'synchronized' on how it's implemented on CPU level?
Or do they use the same 'CAS' approach?
If we are talking about ReentrantLock vs synchronized (also known as the "intrinsic lock"), then it's a good idea to look at the Lock documentation:
All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock:
A successful lock operation acts like a successful monitorEnter action
A successful unlock operation acts like a successful monitorExit action
So in general, consider synchronized to be just an easy-to-use and concise approach to locking. You can achieve exactly the same synchronization effects by writing code with ReentrantLock, with a bit more code (but it offers more options and flexibility).
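For example, these two counters have the same locking and memory semantics (a minimal sketch; the class names are made up):

import java.util.concurrent.locks.ReentrantLock;

class SynchronizedCounter {
    private int count;
    public synchronized void increment() {
        count++;
    }
}

class ReentrantLockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;
    public void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // must unlock in finally; synchronized does this for you
        }
    }
}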
Some time ago ReentrantLock was way faster under certain conditions (high contention, for example), but now Java uses different optimization techniques (like lock coarsening and adaptive locking) to make performance differences in many typical scenarios barely visible to the programmer.
A great deal of work has also gone into optimizing the intrinsic lock in low-contention cases (e.g. biased locking). The authors of the Java platform like the synchronized keyword and the intrinsic-locking approach; they want programmers not to be afraid of using this handy tool (and to prevent possible bugs). That's why synchronized optimizations and busting the "synchronization is slow" myth were such a big deal for Sun and Oracle.
"CPU-part" of the question:
synchronized uses a locking mechanism that is built into the JVM, via the MONITORENTER / MONITOREXIT bytecode instructions. So the underlying implementation is JVM-specific (that is why it is called an intrinsic lock) and AFAIK usually (subject to change) uses a pretty conservative strategy: once a lock is "inflated" after threads collide while acquiring it, synchronized switches to OS-based locking ("fat locking") instead of fast CAS ("thin locking") and does not "like" to return to CAS soon (even if the contention is gone).
ReentrantLock's implementation is based on AbstractQueuedSynchronizer and is coded in pure Java (it uses CAS instructions and thread descheduling, which were introduced in Java 5), so it is more stable across platforms, offers more flexibility, and tries the fast CAS approach for acquiring the lock every time (falling back to OS-level locking if that fails).
So, the main difference between these lock implementations in terms of performance is the lock-acquisition strategy (which may not even differ in a specific JVM implementation or situation).
And there is no general answer as to which locking approach is better; it is also subject to change over time and across platforms. You should look at the specific problem and its nature to pick the most suitable solution (as usual in Java).
PS: since you're pretty curious, I highly recommend looking at the HotSpot sources to go deeper (and to find out the exact implementation for a specific platform version). It may really help. A starting point is somewhere here: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/synchronizer.cpp
The ReentrantLock class, which implements Lock, has the same concurrency and memory semantics as synchronized, but also adds features like lock polling, timed lock waits, and interruptible lock waits. Additionally, it offers far better performance under heavy contention.
Source
The above answer is an extract from Brian Goetz's article. You should read the entire article; it helped me understand the differences between the two.
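To illustrate the extra features mentioned in that extract (timed and interruptible lock waits), here is a minimal sketch; the timeout value and method names are arbitrary:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TimedLockExample {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean updateWithTimeout() throws InterruptedException {
        // Timed lock wait: give up if the lock isn't available within 500 ms.
        if (lock.tryLock(500, TimeUnit.MILLISECONDS)) {
            try {
                // ... critical section ...
                return true;
            } finally {
                lock.unlock();
            }
        }
        return false; // could not acquire the lock in time
    }

    public void updateInterruptibly() throws InterruptedException {
        // Interruptible lock wait: a thread blocked here can be interrupted,
        // which a thread blocked on a synchronized monitor cannot.
        lock.lockInterruptibly();
        try {
            // ... critical section ...
        } finally {
            lock.unlock();
        }
    }
}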
I have been doing some research on Google and can't quite get my head around the differences (if any) between concurrent and parallel programs in Java. Some of the information I have looked at suggests there are no differences between the two. Is this the case?
It depends on who is defining it. The people who created the Go programming language call code Concurrent if it is broken up into pieces which could be treated in parallel, whereas Parallelism implies that those pieces are actually running at the same time.
Since these are programming principles, the programming language has no bearing on how they are defined. However, Java 8 will have more features to enable both concurrency and parallelism without messing up your code too much. For example, code like this:
List<Integer> coolItemIds = new ArrayList<>();
for (Item item : getItems())
{
    if (item.isCool())
    {
        int itemId = item.getId();
        coolItemIds.add(itemId);
    }
}
... which is non-concurrent and non-parallel, could be written like this with the Stream API:

List<Integer> coolItemIds = getItems().stream()
    .filter(item -> item.isCool())
    .map(item -> item.getId())
    .collect(Collectors.toList());

The above code is written in a concurrent manner: none of it requires that the cool items be filtered one at a time, or that you can only call getId() on one item at a time, or even that items at the beginning of the list need to be filtered or mapped before items at the end. Depending on whether the stream is sequential or parallel (stream() vs parallelStream()), the given operations may or may not run in parallel, but the code you've written is concurrent.
Also of interest:
Concurrency is not Parallelism (presentation video)
Concurrency is not Parallelism? (Discussion on StackOverflow)
I suppose it depends on your definitions, but my understanding goes roughly like this:
Concurrency refers to things happening in some unspecified order. Multitasking - executing multiple programs by interleaving their instructions via time slicing - is a good way to think about this sense of concurrency.
Parallelism (or "true" parallelism) refers to things happening at literally the same time. This requires hardware support (coprocessors, multi-core processors, networked machines, etc.). All parallelism is concurrent, but not all concurrency is parallel.
As far as I'm aware, neither term is Java-specific, or has any Java-specific nuances.
Parallelization (or parallelism, or parallel computing) is a form of computation in which many calculations are carried out simultaneously. In essence, if a CPU-intensive problem can be divided into smaller, independent tasks, then those tasks can be assigned to different processors.
Concurrency is more about multitasking, which is executing many actions that are not necessarily part of a CPU-intensive problem.
I don't think the two terms have well-defined distinct meanings. They're both terms of art rather than technical terms.
That said, the way I interpret them is that something is concurrent if it can be done at the same time as other things, and parallel if it can be done by multiple threads at the same time. I take this usage largely from the JVM garbage collection documentation, which says things like
The concurrent mark sweep collector, also known as the concurrent collector or CMS, is targeted at applications that are sensitive to garbage collection pauses. It performs most garbage collection activity concurrently, i.e., while the application threads are running
and
CMS collector now uses multiple threads to perform the concurrent marking task in parallel on platforms with multiple processors.
Admittedly, this is a very specific context, and it is probably unwise to generalise from it.
If you program using threads (concurrent programming), it's not necessarily going to be executed as such (parallel execution), since it depends on whether the machine can handle several threads.
Here's a visual example. Threads on a non-threaded machine:
      --  --  --
     /          \
>---- --  --  --  -- ---->>
Threads on a threaded machine:
     ------
    /      \
>-------------->>
The dashes represent executed code. As you can see, they both split up and execute separately, but the threaded machine can execute several separate pieces at once.
Please refer to this: What is the difference between concurrent programming and parallel programming?
From the Oracle documentation page:
In a multithreaded process on a single processor, the processor can switch execution resources between threads, resulting in concurrent execution.
In the same multithreaded process in a shared-memory multiprocessor environment, each thread in the process can run on a separate processor at the same time, resulting in parallel execution.
When the process has fewer or as many threads as there are processors, the threads support system in conjunction with the operating environment ensure that each thread runs on a different processor.
Java SE 7 further enhanced parallel processing by adding the ForkJoinPool API.
Refer to below posts for more details:
Parallel programming with threads in Java ( Java specific )
Concurrency vs Parallelism - What is the difference? ( Language agnostic)
Concurrency is an architectural design pattern which allows you to run multiple operations at once (which can, but don't have to, be executed in parallel).
In the case of single-core execution of such operations, parallelism can be "simulated" by, for example, context switching (assuming your programming language uses threads for parallel execution).
Let's say you have two threads: in one you enqueue jobs; the second one waits until any job exists and picks it up for execution. Despite using a single-core processor, both of them are running and communicating (via the queue).
This is concurrent execution - even though the threads are executed sequentially on a single core (they share it).
The parallel version of the same exercise would look similar, with one difference:
Execution of the threads would happen on a multi-core processor. The threads would be running in parallel to each other and not sequentially (each on its own core).
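A minimal sketch of the two-thread example described above, using a BlockingQueue as the shared job queue (the job names and counts are made up). Whether the two threads merely interleave on one core or truly run in parallel on two is up to the machine; the code is concurrent either way:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> jobs = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                jobs.add("job-" + i); // enqueue work
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) {
                    String job = jobs.take(); // waits until a job exists
                    System.out.println("processing " + job);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}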
The question is quite old, but I'd like to summarize these two in a very clear and concise manner:
Concurrency - think of multitasking by one actor: it is when x processes/threads (x > 1) compete for the same resource. In the case of concurrency, when two processes/threads are executed on one CPU, they're not really parallel, meaning that the CPU will be switching back and forth from process to process in a super fast way, which gives the illusion of parallelism, but again - it's one CPU shared among different processes/threads. Imagine 5 instructions are to be executed, and they compete to get the CPU in order to be executed.
Parallelism - think of multiple tasks where each task is taken care of by a separate actor:
it is when x processes/threads (x > 1) execute in parallel, at the same time. Imagine there are 5 processes/threads and there are 5 CPU cores, which means that each core can independently execute each thread/process.
In a life without Java Executors, a new thread would have to be created for each Runnable task. Creating new threads involves overhead (creation and teardown) that adds complexity and wasted time to a non-Executor program.
Referring to code:
no Java Executor -
new Thread(aRunnableObject).start();
with Java Executor -
Executor executor = Executors.newFixedThreadPool(poolSize); // or some other Executors factory method
executor.execute(aRunnable);
Bottom line is that Executors abstract the low-level details of how to manage threads.
Is that true?
Thanks.
Bottom line is that Executors abstract the low-level details of how to manage threads. Is that true?
Yes.
They deal with issues such as creating the thread objects, maintaining a pool of threads, controlling the number of threads that are running, and graceful / less-than-graceful shutdown. Doing these things by hand is non-trivial.
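The shutdown part alone is a good example of what you would otherwise have to hand-roll; the usual pattern (the timeout values here are arbitrary) looks roughly like this:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class PoolShutdown {
    static void shutdownGracefully(ExecutorService pool) {
        pool.shutdown(); // stop accepting new tasks, let queued/running ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt still-running tasks
                if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                    System.err.println("Pool did not terminate");
                }
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}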
EDIT
There may or may not be a performance hit in doing this ... compared with a custom implementation perfectly tuned to the precise needs of your application. But the chances are that:
your custom implementation wouldn't be perfectly tuned, and
the performance difference wouldn't be significant anyway.
Besides, the Executor support classes allow you to simply tune various parameters (e.g. thread pool sizes) if there is an issue that needs to be addressed. I don't see how garbage collection overheads would be significantly impacted by using Executors, one way or the other.
As a general rule, you should focus on writing your applications simply and robustly (e.g. using the high level concurrency support classes), and only worry about performance if:
your application is running "too slow", and
the profiling tools tell you that you've got a problem in a particular area.
A couple of benefits of Executors as against normal threads:
Throttling can be achieved easily by varying the size of thread pools. This helps keep the number of threads flowing through your application in check, which is particularly helpful when benchmarking your application under load.
Better management of Runnable tasks can be achieved using a RejectedExecutionHandler.
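A minimal sketch of both points, with illustrative pool and queue sizes: a bounded ThreadPoolExecutor whose RejectedExecutionHandler makes the submitting thread run the task itself once the pool is saturated (a simple form of throttling):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThrottledPool {
    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
            2,                                            // core pool size
            4,                                            // maximum pool size
            60, TimeUnit.SECONDS,                         // keep-alive for idle extra threads
            new ArrayBlockingQueue<Runnable>(100),        // bounded work queue
            new ThreadPoolExecutor.CallerRunsPolicy());   // rejection handler: caller runs the task
    }
}

Rejection only kicks in once the queue is full and the maximum number of threads is busy, which slows submitters down instead of letting work pile up without bound.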
I think all that Executors do is handle the low-level tasks for you, but you still have to judiciously decide which thread pool you want. I mean, if your use case needs a maximum of 5 threads and you go and use a thread pool having 100 threads, then certainly it is going to have an impact on performance. Other than this, there is nothing extra being done at the low level which is going to halt the system. And last of all, it is always better to get an idea of what is being done at the low level, so that it gives us a fair idea about what is going on underneath.
I am having some trouble grasping the idea of a concurrent queue. I understand a queue is a FIFO (first come, first served) data structure.
Now when we add the concurrency part, which I interpret as thread safety (please let me know if that is incorrect), things get a bit fuzzy. By concurrency, do we mean the way various threads can add to the queue, or delete (service an item) from the queue? Does concurrency provide a sense of ordering to these operations?
I would greatly appreciate a general description of the functionality of a concurrent queue. A similar post here is not as general as I hoped.
Also is there such a thing as a concurrent priority queue? What would be its usage?
Many thanks in advance, for any brief explanations or helpful links on this subject.
The notion that a BlockingQueue offers little overhead is a bit misleading. Acquiring a lock invokes pretty substantial overhead. Along with the context switching, we are talking thousands of instructions. Not just that, but the progress of one thread will directly affect another thread. Now, it's not as bad as it was years ago, but compared to non-blocking, it is substantial.
BlockingQueues use locks for mutual exclusion.
ArrayBlockingQueue, LinkedBlockingQueue, and PriorityBlockingQueue are three blocking queues, while
ConcurrentLinkedQueue and (since Java 7) LinkedTransferQueue use the Michael and Scott non-blocking queue algorithm.
Under moderate to low contention (which is more of a real-world scenario), the non-blocking queues significantly outperform blocking queues.
And to note on Steve's comment about the lack of bottlenecks: under heavy contention a non-blocking algorithm can bottleneck on the constant CAS attempts, while blocking will suspend the threads. We then see that a BlockingQueue under heavy contention slightly outperforms a non-blocking queue, but that type of contention isn't the norm by any means.
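For comparison, a minimal sketch of the non-blocking style: ConcurrentLinkedQueue never blocks, so a consumer polls and simply gets null when the queue is empty instead of waiting:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class NonBlockingQueueDemo {
    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("task-1");      // lock-free (CAS-based) enqueue
        String task = queue.poll(); // lock-free dequeue; returns null if empty
        System.out.println(task != null ? "got " + task : "queue was empty");
    }
}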
I understand by "concurrency" that the queue is thread-safe. This does not mean that it will be efficient. However, I would imagine that the Java queue use a lock-free implementation which means that there is little or no penatly when two threads attempt a push or a pop at the same time. What generally happens is that they use atomic locking at an assembler level which ensures that the same object cannot be popped twice.
I once wrote a lock-free FIFO queue (in Delphi ) which worked very well. Much more efficient that a previous version which used Critical sections. The CS version ground to a halt especially with many threads all trying to access the queue. The lock-free version however had no bottlenecks depsite many threads accessing it a lot.
You should start by checking out the BlockingQueue interface definition, as this is the cornerstone for using queues for communication between threads and contains utility methods to allow producer and consumer threads to access the queue in either a blocking or non-blocking fashion. This, along with thread-safe access, is my understanding of what constitutes a "concurrent queue" (although I've never heard of that phrase - BlockingQueue merely exists in the java.util.concurrent package).
To answer the second part of your question, the priority queue implementation you should study is PriorityBlockingQueue. This may be useful if your producer thread(s) are producing tasks of varying priorities (e.g. requests from "normal users" and "power users") and you wish to control the order in which tasks are processed by your consumer thread(s). One possible pitfall to avoid is the starvation of low priority tasks that are never removed from the queue due to the constant influx of higher priority tasks.
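A minimal sketch of the priority case described above (the Request type and the priority values are made up): tasks come out of the queue by priority rather than in FIFO order:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

class PriorityQueueDemo {
    static class Request {
        final String user;
        final int priority; // higher number = more important
        Request(String user, int priority) {
            this.user = user;
            this.priority = priority;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>(
            11, Comparator.comparingInt((Request r) -> r.priority).reversed());

        queue.put(new Request("normal user", 1));
        queue.put(new Request("power user", 10));

        // The consumer takes the highest-priority request first.
        System.out.println(queue.take().user); // prints "power user"
    }
}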
Just leaving here a link to the java.util.concurrent package that I think contains very important information about some questions raised here.
See: Concurrent Collections and Memory Consistency Properties