Features for profiling concurrent program behaviour in Java

There are now several profilers that promise to capture the concurrent behaviour of a program in order to make its threaded execution understandable.
I am collecting features that would be useful in a Java profiler focused on concurrency alone:
What I've collected so far:
construction of waits-for graphs to detect potential deadlocks
time measurement of accessing resources (data-structures, etc.)
show states of every thread (alive, interrupted, dead)
which thread called which thread when accessing shared resources (wait, blocked, etc.)
What ideas do you have? Personally I am aiming to unveil some bad programming habits when dealing with concurrency in Java.
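As a starting point for the waits-for-graph idea, the JDK's own ThreadMXBean can already detect cycles of threads blocked on monitors. A minimal sketch (the class name DeadlockProbe is mine, and the demo deliberately manufactures a two-lock deadlock on daemon threads):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {
    // Returns a description of deadlocked threads, or null if none are found.
    static String findDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) return null;
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : bean.getThreadInfo(ids)) {
            sb.append(info.getThreadName())
              .append(" waits for ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
        }
        return sb.toString();
    }

    static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
    }

    public static void main(String[] args) throws InterruptedException {
        Object a = new Object(), b = new Object();
        // Two threads acquiring the same two locks in opposite order.
        Thread t1 = new Thread(() -> { synchronized (a) { pause(); synchronized (b) {} } }, "t1");
        Thread t2 = new Thread(() -> { synchronized (b) { pause(); synchronized (a) {} } }, "t2");
        t1.setDaemon(true); t2.setDaemon(true);
        t1.start(); t2.start();
        Thread.sleep(500); // give the deadlock time to form
        System.out.println(findDeadlocks());
    }
}
```

A real profiler would poll findDeadlocks() periodically and render the edges as a graph instead of printing them.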

Summary statistics for each thread: how much time was spent in each state (running, runnable, blocked, etc.).

Tools for detecting "hot" monitors in order to find where the contention is. For example, show locks sorted by the total time spent waiting for them, with the ability to see the bits of code that held the lock and the bits of code that waited for it.

Any of the following events:
contended monitor or lock acquisitions
failed CAS operations
volatile reads and writes
What would be fantastic would be a way to see shared data that wasn't protected by happens-before and was therefore racy. Hard to do though.

When each thread is blocked, if the thread code is at all complex, simply knowing that it's blocked will not be very informative, even if you can tell which other thread it's waiting for. I would want to know why it's blocked.
The way to tell why it's blocked is to capture its call stack at the time it becomes blocked. Each function call site on the stack gives one link in the chain of reasoning of why it is there.
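One way to capture that from inside the JVM is Thread.getAllStackTraces(), filtering for threads in a blocked or waiting state. A rough sketch (class and thread names are illustrative):

```java
import java.util.Map;

public class BlockedStacks {
    // Print the stack of every thread currently BLOCKED or WAITING,
    // so you can see *why* it is parked, not just that it is.
    static void dumpBlocked() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread.State s = e.getKey().getState();
            if (s == Thread.State.BLOCKED || s == Thread.State.WAITING
                    || s == Thread.State.TIMED_WAITING) {
                System.out.println(e.getKey().getName() + " is " + s);
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) {}
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();
        Thread.sleep(200); // let the waiter reach wait()
        dumpBlocked();     // the waiter's stack shows the wait() call site
    }
}
```

A profiler would snapshot this at the moment a thread transitions into a blocked state rather than on demand, but the recoverable information is the same: each frame is one link in the chain of reasoning.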


Confusion regarding the Blocking of "peer threads" when a user-level thread blocks

I was reading about differences between threads and processes, and literally everywhere online, one difference is commonly written without much explanation:
If a process gets blocked, remaining processes can continue execution.
If a user level thread gets blocked, all of its peer threads also get
blocked.
It doesn't make any sense to me. What would be the point of concurrency if a scheduler cannot switch between a blocked thread and a ready/runnable thread? The reason given is that since the OS doesn't differentiate between the various threads of a given parent process, it blocks all of them at once.
I find this very unconvincing, since all modern OSes have thread control blocks with a thread ID, even if it is valid only within the memory space of the parent process. To take the example from Galvin's Operating Systems book: I wouldn't want the thread handling my typing to be blocked just because the spell-checking thread cannot connect to some online dictionary.
Either I am understanding this concept wrong, or all these websites have just copied some old list of thread differences over the years. Moreover, I cannot find this statement in books like Galvin's, or in William Stallings's COA book, where threads are discussed.
These are the resources where I found the statements:
https://www.geeksforgeeks.org/difference-between-process-and-thread/
https://www.tutorialspoint.com/difference-between-process-and-thread
https://www.guru99.com/difference-between-process-and-thread.html
https://www.javatpoint.com/process-vs-thread
There is a difference between kernel-level and user-level threads. In simple words:
Kernel-level threads: threads that are managed by the operating system, including scheduling. They are what is executed on the processor. That is probably what most of us think of as threads.
User-level threads: Threads that are managed by the program itself. They are also called fibers or coroutines in some contexts. In contrast to kernel-level threads, they need to "yield the execution", i.e. switching from one user-level to another user-level thread is done explicitly by the program. User-level threads are mapped to kernel-level threads.
As user-level threads need to be mapped to kernel-level threads, you need to choose a suitable mapping. You could map each user-level thread to a separate kernel-level thread, or you could map many user-level threads to one kernel-level thread. In the latter mapping, you let multiple concurrent execution paths be executed by a single thread "as we know it". If one of those paths blocks (recall that user-level threads need to yield execution), then the executing kernel-level thread blocks, which effectively blocks all the other paths assigned to it. I think this is what the statement refers to. FYI: in Java, user-level threads – the multithreading you do in your programs – are mapped to kernel-level threads by the JVM, i.e. the runtime system.
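To make the many-to-one case concrete, here is a deliberately toy N:1 "scheduler" in Java: several cooperative fibers share one carrier thread, and a slice that blocked instead of returning would stall every fiber behind it. All names here are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MiniFibers {
    // A toy N:1 scheduler: many "fibers" (Runnables) share one kernel thread.
    // Each Runnable is one time slice; a fiber yields by simply returning
    // (and re-enqueueing its continuation if it has more work).
    private final Deque<Runnable> ready = new ArrayDeque<>();

    void spawn(Runnable slice) { ready.add(slice); }

    // Runs on the single carrier thread until no fiber is runnable.
    void run() {
        while (!ready.isEmpty()) {
            ready.poll().run(); // if this call blocks, EVERY fiber is stuck
        }
    }

    public static void main(String[] args) {
        MiniFibers sched = new MiniFibers();
        sched.spawn(() -> System.out.println("fiber A, slice 1"));
        sched.spawn(() -> {
            // If this slice performed a blocking read() here instead of
            // returning, fiber A's second slice below would never run.
            System.out.println("fiber B, slice 1");
        });
        sched.spawn(() -> System.out.println("fiber A, slice 2"));
        sched.run();
    }
}
```

This is exactly why pure user-level threading must pair with non-blocking I/O: the scheduler only regains control when a slice voluntarily returns.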
Related stuff:
Understanding java's native threads and the jvm
What is the difference between a thread and a fiber?
What is difference between User space and Kernel space?
What is the difference between concurrent programming and parallel programming?
Implementing threads
Back in the early days of Java at least, user-level threads ("green threads") were used to implement Java threading on OSes that didn't support native threading. There's still a Wiki article https://en.wikipedia.org/wiki/Green_threads which explains the origin and meaning.
(This was back when desktops/laptops were uniprocessor systems, with a single-core CPU in their 1 physical socket, and SMP machines mostly only existed as multi-socket.)
You're right, this was terrible, and once mainstream OSes grew up to support native threads, people mostly stopped doing this. For Java specifically, "green threads" was the name of the original thread library for the language: it shipped in version 1.1 and was abandoned in favour of native threads in version 1.3.
So use Java version 1.3 or later if you don't want your spell-check thread to block your whole application. :P This is ancient history.
There is some scope for using non-blocking I/O and context switching when a system call reports that it would block, but usually it's better to let the kernel handle threads blocking and unblocking, so that's what normal modern systems do.
IIRC, on Solaris there was also some use of an N:M model, where N user-space threads might be handled by fewer than N kernel threads. This could mean having some "peer" threads (sharing the same kernel thread) as in your quote, without being the fully terrible, purely user-space green-threads model.
(i.e. only some of your total threads are sharing the same kernel thread.)
pthreads on Linux uses a 1:1 model where every software thread is a separate task for the kernel to schedule.
Google found https://flylib.com/books/en/3.19.1.51/1/ which defines those thread models and talks about them some, including the N:M hybrid model, and the N:1 user-space aka green threads model that needs to use non-blocking I/O if it wants to avoid blocking other threads. (e.g. do a user-space context switch if a system call returns EAGAIN or after queueing an async read or write.)
Okay, the other answers provide detailed information.
But to hit your main concern right in the middle:
the article is putting it a bit wrong, lacking the necessary context (see all the details in @akuzminykh's explanation of user-level threads and kernel-level threads)
what this means for a Java programmer: don't worry about those explanations. If one of your Java threads blocks (due to I/O etc.), that will have NO IMPACT on any of your other threads (unless, of course, you explicitly WANT it to, but then you'd have to use explicit mechanisms for that)
How do Threads get blocked in Java?
If you call sleep() or wait() etc., the thread that currently executes that code (NOT the object you call them on) will be blocked. These get released on certain events: sleep() finishes once the timer runs out or the thread is interrupted by another; wait() releases once the thread is notified by another thread.
If you run into a synchronized(lockObj) block or method: the thread blocks until the thread currently holding that lockObj releases it.
Closely related to that: if you enter latches, barriers, mutexes, and the many other specialized classes for extended thread control (rendezvous, etc.).
If you call a blocking I/O method, like reading from an InputStream: int bytesRead = inputStream.read(buffer, offset, length), or String line = myBufferedReader.readLine().
Opposed to that, there are many non-blocking I/O operations, like most of the java.nio (non-blocking I/O) package, that return immediately but may indicate that no data was available yet.
If the garbage collector does a stop-the-world pause (these are usually so short you will not even notice, and the threads resume automatically afterwards).
If you call .parallelStream() operations with long-lasting lambda functions (like myList.parallelStream().forEach(myConsumerAction)): the work is handed to a common thread pool, and your calling thread blocks until the whole pipeline is done, just as if a normal method had been called. See more here: https://www.baeldung.com/java-when-to-use-parallel-stream
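A small demo of the first two cases, sleep() and monitor contention; the CountDownLatch is only there to make the timing deterministic:

```java
import java.util.concurrent.CountDownLatch;

public class BlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        // 1. sleep(): the *current* thread blocks for the given time.
        Thread.sleep(10);

        // 2. synchronized: a thread blocks trying to enter a monitor
        //    that another thread currently holds.
        Object lock = new Object();
        CountDownLatch inside = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                inside.countDown(); // signal: the lock is now held
                try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            }
        });
        holder.start();
        inside.await();           // wait until the lock is definitely held
        long t0 = System.nanoTime();
        synchronized (lock) { }   // main blocks here until holder releases
        long blockedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("blocked roughly " + blockedMs + " ms on the monitor");
        holder.join();
    }
}
```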

CPU usage is 100% during Thread.onSpinWait()

I'm writing a backtesting raw data collector for my crypto trading bot and I've run into a weird optimization issue.
I constantly have 30 runnables in an Executors.newCachedThreadPool() issuing GET requests to an API. Since the API has a request limit of 1200 per minute, I have this bit of code in my runnable:
while (minuteRequests.get() >= 1170) {
Thread.onSpinWait();
}
Yes, minuteRequests is an AtomicInteger, so I'm not running into any issues there.
Everything works, but the issue is that even though I'm using the recommended busy-waiting method onSpinWait, CPU usage shoots from around 24% to 100% when the waiting is initiated. For reference, I'm running this on a 3900X (24 threads).
Any recommendations on how to better handle this situation?
My recommendation would be to not do busy waiting at all.
The javadocs for Thread.onSpinWait say this:
Indicates that the caller is momentarily unable to progress, until the occurrence of one or more actions on the part of other activities. By invoking this method within each iteration of a spin-wait loop construct, the calling thread indicates to the runtime that it is busy-waiting. The runtime may take action to improve the performance of invoking spin-wait loop constructions.
Note that the javadoc uses the word may rather than will. That means it also may not do anything. Also, "improve the performance" does not mean that your code will be objectively efficient.
The javadoc also implies that the improvements may be hardware dependent.
In short, this is the right way to use onSpinWait ... but you are expecting too much of it. It won't make your busy-wait code efficient.
So what would I recommend you actually do?
I would recommend that you replace the AtomicInteger with a Semaphore (javadoc). This particular loop would be replaced by the following:
semaphore.acquire();
This blocks[1] until a permit is available and acquires it. Refer to the class javadocs for an explanation of how semaphores work.
Note: since you haven't shown us the complete implementation of your rate limiting, it is not clear how your current approach actually works. Therefore, I can't tell you exactly how to replace AtomicInteger with Semaphore throughout.
[1] The blocked thread is "parked" until some other thread releases a permit. While it is parked, the thread does not run and is not associated with a CPU core. The core is either left idle (typically in a low-power state) or assigned to some other thread. This is handled by the operating system's thread scheduler. When another thread releases a permit, the Semaphore.release method will tell the OS to unpark one of the threads blocked in acquire.
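One possible shape for this, assuming the budget resets once per interval: a Semaphore holding the request budget, topped up by a scheduled task. The class and its refill policy are illustrative, not a drop-in for the asker's code (the refill as written is not atomic with concurrent acquires, which is acceptable for a sketch):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class RateLimiter {
    private final Semaphore permits;

    RateLimiter(int permitsPerInterval, long intervalMillis) {
        this.permits = new Semaphore(permitsPerInterval);
        // Periodically top the semaphore back up to its cap. In the
        // question's setup this would run once a minute with ~1170 permits.
        ScheduledExecutorService refill = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // don't keep the JVM alive for the refill task
            return t;
        });
        refill.scheduleAtFixedRate(
                () -> permits.release(permitsPerInterval - permits.availablePermits()),
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Each worker calls this before issuing a request; it parks
    // (burning no CPU) instead of spinning when the budget is exhausted.
    void acquire() throws InterruptedException {
        permits.acquire();
    }

    public static void main(String[] args) throws InterruptedException {
        RateLimiter limiter = new RateLimiter(3, 60_000);
        for (int i = 0; i < 3; i++) {
            limiter.acquire();
            System.out.println("request " + i + " allowed");
        }
        System.out.println("remaining permits: " + limiter.permits.availablePermits());
    }
}
```

A fourth acquire() in main would park the caller until the refill task runs, which is exactly the behaviour the spin loop was approximating.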

Letting a thread wait vs stopping and starting

I have a consumer thread blocking on removing from a queue.
There are going to be periods during which I know nothing will be added to the queue.
My question is: is it worth adding the complexity of managing when to start/stop the thread, or should I just leave it waiting until queue starts getting elements again?
If the concurrent queue implementation you're using is worth its salt, the thread will not busy-wait for very long. Some implementations may spin briefly for performance reasons, but after that the thread will block and will not consume CPU cycles. The difference between a stopped thread and a blocked thread therefore becomes more or less meaningless.
Use a concurrent queue. See Which concurrent Queue implementation should I use in Java?
When dealing with multithreading, it's best practice to act only when you actually have a performance problem. Otherwise I would just leave it as it is, to avoid trouble.
I don't think there is a big impact on performance, since the thread is blocked (inactive waiting). It could make sense to stop the thread if it holds expensive resources that could be released for that time. I would keep this as simple as possible; especially in a concurrent environment, complexity can lead to strange errors.
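To see the "blocked thread costs nothing" point in action, here is a minimal consumer parked on an ArrayBlockingQueue; during the lull its state is WAITING and it burns no CPU. The POISON sentinel is just a simple shutdown convention for the demo:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class IdleConsumer {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    // take() parks the thread while the queue is empty,
                    // so quiet periods cost essentially nothing.
                    String item = queue.take();
                    if (item.equals("POISON")) return; // shutdown signal
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException ignored) { }
        }, "consumer");
        consumer.start();

        Thread.sleep(100); // quiet period: consumer is parked, not spinning
        System.out.println("consumer state during lull: " + consumer.getState());

        queue.put("a");
        queue.put("POISON");
        consumer.join();
    }
}
```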

Why prefer wait/notify to while cycle?

I have some misunderstanding about the advantages of wait/notify. As I understand it, the processor core will do nothing helpful in either case, so what's the reason to write complex wait/notify blocks instead of just waiting in a cycle?
I'm clear that wait/notify will not steal processor time in case when two threads are executed on only one core.
"Waiting in a cycle" is most commonly referred to as a "busy loop" or "busy wait":
while ( ! condition()) {
// do nothing
}
workThatDependsOnConditionBeingTrue();
This is very disrespectful of other threads or processes that may need CPU time (it takes 100% time from that core if it can). So there is another variant:
while ( ! condition()) {
sleepForShortInterval();
// do nothing
}
workThatDependsOnConditionBeingTrue();
The small sleep in this variant will drop CPU usage dramatically, even if it is ~100ms long, which should not be noticeable unless your application is real-time.
Note that there will generally be a delay between when the condition actually becomes true and when sleepForShortInterval() ends. If, to be more polite to others, you sleep longer -- the delay will increase. This is generally unacceptable in real-time scenarios.
The nice way to do this, assuming that whatever condition() is checking is being changed from another thread, is to have the other thread wake you up when it finishes whatever you are waiting for. Cleaner code, no wasted CPU, and no delays.
Of course, it's quicker to implement a busy wait, and it may be justified for quick'n'dirty situations.
Beware that, in a multithreaded scenario where condition() can change back to false as well as to true, you will need to protect the code between the while and workThatDependsOnConditionBeingTrue() to prevent other threads from changing its value at that precise point in time (this is called a race condition, and it is very hard to debug after the fact).
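The classic shape of that protection looks like this: the check, the wait, and the dependent work all happen under one monitor, and the condition is re-tested in a loop (guarding against spurious wakeups as well as races). Class and method names are illustrative:

```java
public class GuardedCondition {
    private final Object lock = new Object();
    private boolean condition = false;

    // Waiter: the check and the wait happen under the same monitor, and the
    // condition is re-tested in a loop to guard against spurious wakeups
    // and against the condition flipping back before we get to run.
    void awaitCondition() throws InterruptedException {
        synchronized (lock) {
            while (!condition) {
                lock.wait(); // releases the monitor and parks; no CPU used
            }
            // Still holding the monitor here, so no other thread can
            // change the condition under the dependent work.
        }
    }

    // Signaller: set the flag and wake the waiters under the same monitor.
    void signalCondition() {
        synchronized (lock) {
            condition = true;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCondition gc = new GuardedCondition();
        Thread waiter = new Thread(() -> {
            try {
                gc.awaitCondition();
                System.out.println("condition observed, doing dependent work");
            } catch (InterruptedException ignored) { }
        });
        waiter.start();
        Thread.sleep(50); // let the waiter park first
        gc.signalCondition();
        waiter.join();
    }
}
```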
I think you almost answered your own question by saying:
I'm clear that wait/notify will not steal processor time in case when two threads are executed on only one core.
The only thing I would add is that this is true irrespective of one core or multiple cores. wait/notify won't keep the CPU in a busy-wait situation the way a while loop or periodic check would.
What's the reason to wait rather than run on the core? There's no helpful work in either case, and you're unable to use the core while it's in the waiting state.
I think you are looking at it from a single application perspective where there is only one application with one thread is running. Think of it from a real world application (like web/app servers or standalone) where there are many threads running and competing for cpu cycles - you can see the advantage of wait/notify. You would definitely not want even a single thread to just do a busy-wait and burn the cpu cycles.
Even if it is a single application/thread running on the system, there are always OS processes and related services competing for CPU cycles. You don't want to starve them because your application is doing a busy-wait in a while loop.
Quoting from Gordon's comment
waiting in cycle as you suggest you are constantly checking whether the thing you are waiting for has finished, which is wasteful and if you use sleeps you are just guessing with timing, whereas with wait/notify you sit idle until the process that you are waiting on tells you it is finished.
In general, your application is not the only one running on the CPU. Using non-spinning waiting is, first of all, an act of courtesy towards the other processes/threads competing for the CPU in order to do useful work. The CPU scheduler cannot know a priori whether your thread is going to do something useful or just spin on a false flag. So it can't tune itself based on that, unless you tell it you don't want to be run because there's nothing for you to do.
Indeed, busy-waiting is faster than putting the thread to sleep, which is why waiting is often implemented in a hybrid way: first spin for a while, then actually go to sleep.
Besides, it's not just waiting in a loop. You still need to synchronize access to the resources you're spinning on. Otherwise, you'll fall victim of race conditions.
If you feel the need of a simpler interface, you might also consider using CyclicBarrier, CountDownLatch or a SynchronousQueue.
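For the common "wait until another thread finishes" case, CountDownLatch reduces to a couple of lines; await() parks the caller instead of spinning on a flag (names are illustrative):

```java
import java.util.concurrent.CountDownLatch;

public class LatchInsteadOfSpin {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            // ... produce the result the main thread is waiting for ...
            done.countDown(); // wakes up everyone parked in await()
        });
        worker.start();

        done.await(); // parks instead of spinning on a flag
        System.out.println("worker finished, continuing");
        worker.join();
    }
}
```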

How to know how much time a Java process spends waiting

I am developing a Java application that has two threads:
A producer thread that feeds an ArrayBlockingQueue at a frequency of 10 kHz (it is really C code called through JNI).
A consumer thread that takes data from the queue, using the take method, and then processes it (you can't assume the processing time is always the same). Since I am using the take method, this thread can block if no data is available in the queue.
I would like to know how I can monitor or profile the consumer thread to find out how much time it spends waiting or blocked.
I am not interested in answers such as taking timestamps with System.currentTimeMillis() and computing differences. I want to know how to analyze the whole thread's life and sum up how much time it has spent in each thread state, if that is possible.
How do you do this kind of monitoring?
Thanks in advance!
Any decent Java Profiler can separate statistics by thread, even the otherwise rather basic JVisualVM included with the JDK. Here's a screenshot of JVisualVM watching itself:
The same information can also be displayed in a table:
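If you want the numbers programmatically rather than from a GUI, ThreadMXBean exposes per-thread blocked/waited counts and times; contention timing is off by default and must be enabled explicitly. A sketch, where a sleeping "consumer" thread merely stands in for the real one:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadTimes {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Contention monitoring has a small runtime cost, hence the opt-in.
        if (bean.isThreadContentionMonitoringSupported()) {
            bean.setThreadContentionMonitoringEnabled(true);
        }

        Thread consumer = new Thread(() -> {
            try { Thread.sleep(2_000); } catch (InterruptedException ignored) {}
        }, "consumer");
        consumer.start();
        Thread.sleep(200); // let it reach TIMED_WAITING

        ThreadInfo info = bean.getThreadInfo(consumer.getId());
        System.out.println(info.getThreadName() + " state=" + info.getThreadState());
        System.out.println("times blocked: " + info.getBlockedCount()
                + ", total blocked ms: " + info.getBlockedTime());
        System.out.println("times waited: " + info.getWaitedCount()
                + ", total waited ms: " + info.getWaitedTime());

        consumer.interrupt(); // don't keep the demo alive for the full sleep
        consumer.join();
    }
}
```

Sampling this periodically and summing the deltas gives exactly the per-state accounting the question asks for, without instrumenting the consumer itself.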
