How to think about Java Threads? aka Thread.stop

How to think about Java Threads? aka Thread.stop - java

Exposition:
I think the Java VM is awesome. It's guarantee of the safety of bytecode, the the standard libraries, ... are amazing, especially the ability to load a Java class on the fly, and know that it can't crash the VM (good luck with *.so files or kernel modules).
One thing I don't understand, is how Java treats Thread.stop
I've read http://java.sun.com/j2se/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
but it seems weird for the following reasons:
1) Resource Management
On Unix OS, if a process is hogging up resources, I can kill -9 it.
2) Breaking of Abstraction:
If I start a computationally expensive job, and I no longer need the computation, I can kill -9 it. Under this Java threading model, my computation thread has to periodically check for some boolean flag to see whether it should quit [this seems like breaking abstraction layers -- when I'm writing computation code, I should focus on computation code, not where to spread out checks for whether it should terminate.
3) Safety of Lock/Monitors
So the official reason is "what is a thread is holding a Lock/Monitor and it gets Thread.stopped ? The objects will be left in damaged states" -- yet, in OSes this is not a problem, we have interrupt handlers. Why can't Java threads have interrupt handlers that work like OS interrupt handlers?
Question:
Clearly, I am thinking about Java Threads with the wrong mental model. How should I be thinking about Java threads?
Thanks!

I think the key thing to remember is that threads are not processes. The only thing that threads "own" is the execution thread, since everything else can potentially be shared with other threads within the same process (memory space). Stopping a thread can't clean anything up because things may still be completely valid for other threads to handle.
In an operating system process, the OS keeps track of everything that the process owns (memory, files, locks, etc) and cleans things up properly when you SIGKILL a process.

You seems to be confusing threads and processes.
Resources(Allocated memory, Locks, file handles and so on) belong to the process not to any specific thread in the process. The entire point(And danger) of threads is that multiple threads within the same process share resources.
So it does not sense to talk about any resource belonging to a single thread.
ps: I am pretty sure you can't use kill/kill -9 to kill a thread in Linux. The man page for kill say it can only kill a process or process groups.

Related

Reading from disk and processing in parallel

This is going to be the most basic and even may be stupid question here. When we talk about using multi threading for better resource utilization. For example, an application reads and processes files from the local file system. Lets say that reading of file from disk takes 5 seconds and processing it takes 2 seconds.
In above scenario, we say that using two threads one to read and other to process will save time. Because even when one thread is processing first file, other thread in parallel can start reading second file.
Question: Is this because of the way CPUs are designed. As in there is a different processing unit and different read/write unit so these two threads can work in parallel on even a single core machine as they are actually handled by different modules? Or this needs multiple core.
Sorry for being stupid. :)

On a single processor, multithreading is achieved through time slicing. One thread will do some work then it will switch to the other thread.
When a thread is waiting on some I/O, such as a file read, it will give up it's CPU time-slice prematurely allowing another thread to make use of the CPU.
The result is overall improved throughput compared to a single thread even on a single core.
Key for below:
= Doing work on CPU
- I/O
_ Idle
Single thread:
====--====--====--====--
Two threads:
====--__====--__====--__
____====--__====--__====
So you can see how more can get done in the same time as the CPU is kept busy where it would have been kept waiting before. The storage device is also being used more.

In theory yes. Single core has same parallelism. One thread waiting for read from file (I/O Wait), another thread is process file that already read before. First thread actually can not running state until I/O operations is completed. Rougly not use cpu resource at this state. Second thread consume CPU resource and complete task. Indeed, multi core CPU has better performance.

To start with, there is a difference between concurrency and parallelism. Theoretically, a single core machine does not support parallelism.
About the question on performance improvement as a result of concurrency (using threads), it is very implementation dependent. Take for instance, Android or Swing. Both of them have a main thread (or the UI thread). Doing large calculation on the main thread will block the UI and make in unresponsive. So from a layman perspective that would be a bad performance.
In your case(I am assuming there is no UI Thread) where you will benefit from delegating your processing to another thread depends on a lot of factors, specially the implementation of your threads. e.g. Synchronized threads would not be as good as the unsynchronized ones. Your problem statement reminds me of classic consumer producer problem. So use of threads should not really be the best thing for your work as you need synchronized threads. IMO It's better to do all the reading and processing in a single thread.
Multithreading will also have a context switching cost. It is not as big as Process's context switching, but it's still there. See this link.
[EDIT] You should preferably be using BlockingQueue for such producer consumer scenario.

how to make Java uses multiple cores with threads?

This is a similar question to the one appearing at: How to ensure Java threads run on different cores. However, there might have been a lot of progress in that in Java, and also, I couldn't find the answer I am looking for in that question.
I just finished writing a multithreaded program. The program spawns several threads, but it doesn't seem to be using more than a single core. The program is faster (I am parallelizing something which makes it faster), but it definitely does not use all cores available, judging by running "top".
Any ideas? Is that an expected behavior?
The general code structure is as following:
for (some values in i)
{
start a thread of instantiated as MyThread(i)
(this thread uses heavily ConcurrentHashMap and arrays and basic arithmetic, no IO)
add the thread to a list T
}
foreach (thread in T)
{
do thread.join()
}

If its almost exactly 100% of one CPU, it can mean you really have
one core thread which is doing all the work and the others are not doing so much.
one resource which you are locking on and only one thread has a chance to run.
If you are using approximately one CPU it can mean this is all the work your CPUs have because you are waiting for something such as IO (network and/or disk)
I suggest you look at the state of your threads in VisualVM. It will help you identify which threads are running and give you an ideal of their pattern of behaviour. I also suggest you use a CPU profiler to help find your bottlenecks.

I think I read in the SCJP book by Katherine Sierra that JVM's ask the underlying OS to create a new OS thread for every Java thread.
So it's up to the underlying Operating System to decide how to balance Java (and any other kind of) threads between the available CPU's.

Manipulate Thread Implementation in JVM

Recently, I've been working on the deployment of concurrent objects onto multicore. In a sample, I use BlockingQueue.take() method whose specification mentions that it is blocking. It means that the method does not release the enclosing thread's resources such that it can be re-used for other concurrent tasks. This is useful since the total number of live threads in a JVM instance is limited and if the application would need thousands of live threads, then it is vital to be able to re-use suspended threads. On the other hand, JVM uses a 1:1 mapping from application-level threads to OS-level threads in Java; i.e. each Java Thread instance becomes an underlying OS-level thread.
The current solution is based on java.util.concurrency in Java 1.5+. Still, we need worker threads that are such scalable to a large number. Now, I am interested to find the following answers:
Is there any way to replace the implementation of java.lang.Thread in JVM such that I can plug my own Thread implementation?
Is this only possible through tweaking C++ sections of the thread implementation in JVM and recompiling it?
Is there any library to provide a way to replace the classical thread in Java?
Again, in the same line, is there a library or a way to guide how some threads in Java can be mapped to only one thread in the OS-level?
I also found this discussing different implementations of JVM and I am not sure if they could help.
Thanks for your comments and ideas in advance.

If you are creating thousands of threads, you're doing it wrong.
Instead, consider using the Executor framework. (Start with the Executors and ThreadPoolExecutor classes.) They allow you to queue thousands of tasks while having a sane number of threads handling them.
I guess this approach is what you meant by "library to replace the classical threads". I highly recommend you look into executors.
One caveat: Executors, by default, use non-daemon threads. Therefore, you must shut down your executor when you're done with it. You can do this at program exit, if there is a normal way to exit your program that doesn't simply involve waiting for all threads to finish. :-)

Communication between Java thread and OS threads

As for as I know, Java threads can communicate using some thread APIs. But I want to know how the Java threads and the OS threads are communicting with each other. For example a Java thread needs to wait for some OS thread finishes its execution and returns some results to this Java thread and it process the same.

Many mix up threads and processes here, the jvm is a process which may spawn more threads. Threads are lighter processes which share memory within their process. A process on the other hand lives in his own address space, which makes the context switch more expensive. You can communicate between different processes via the IPC mechanisms provided by your OS and you can communicate between different threads within the same process due to shared memory and other techniques. What you can't is communicate from ThreadA(ProcessA) to ThreadA(ProcessB) without going through plain old IPC: ThreadA(ProcessA) -> ProcessA -> IPC(OS) -> ProcessB -> ThreadA(ProcessB)).
You can use RMI to communicate between two java processes, if you want to "talk" to native OS processes, you have to go JNI to call the IPC mechanisms your OS of choice provides imo.
Feel free to correct me here :)
Sidenote:
You cant see the threads of your JVM with a process manager (as long as your JVM does not map threads to native processes, which would be stupid but possible), you need to use jps and jstack to do that.

Every Instance of JVM is essentially an OS process.

Java threads usually but don't necessarily run on native threads and Java concurrency classes could but don't necessarily map onto native equivalents.
If you had to sync between a native thread and a Java thread, you will most likely have to consider writing a JNI method that your Java thread calls. This JNI method would do whatever native synchronization operation it needs to do and then return. Every platform is going to do this differently but I assume this wouldn't be too much of an issue if you need to inspect native threads in the first place.

Understanding java's native threads and the jvm

I understand that the jvm is itself an application that turns the bytecode of the java executable into native machine code, but when using native threads I have some questions that I just cannot seem to answer.
Does every thread create their own
instance of the jvm to handle their
particular execution?
If not then does the jvm have to have some way to schedule which thread it will handle next, if so wouldn't this render the multi-threaded nature of java useless since only one thread can be ran at a time?

Does every thread create their own instance of the JVM to handle their particular execution?
No. They execute in the same JVM so that (for example) they can share objects and values of static fields.
If not then does the JVM have to have some way to schedule which thread it will handle next
There are two kinds of thread implementation in Java. Native threads are mapped onto a thread abstraction which is implemented by the host OS. The OS takes care of native thread scheduling, and time slicing.
The second kind of thread is "green threads". These are implemented and managed by the JVM itself, with the JVM implementing thread scheduling. Java green thread implementations have not been supported by Sun / Oracle JVMs since Java 1.2. (See Green Threads vs Non Green Threads)
If so wouldn't this render the multi-threaded nature of Java useless since only one thread can be ran at a time?
We are talking about green threads now, and this is of historic interest (only) from the Java perspective.
Green threads have the advantage that scheduling and context switching are faster in the non-I/O case. (Based on measurements made with Java on Linux 2.2; http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.9238)
With pure green threads, N programming language threads are mapped to a single native thread. In this model you don't get true parallel execution, as you noted.
In a hybrid thread implementation, N programming language threads are mapped onto M native threads (where N > M). In this model, the in-process thread scheduler is responsible for the green thread to native thread scheduling AND you get true parallel execution (if M > 1); see https://stackoverflow.com/a/16965741/139985.
But even with the pure green threads, you still get concurrency. Control is switched to another threads a thread blocks on an I/O operation, whick acquiring a lock, and so on. Furthermore, the JVM's runtime could implement periodic thread preemption so that a CPU intensive thread doesn't monopolize the (single) core to the exclusion of other threads

Does every thread create their own instance of the jvm to handle their particular execution?
No, your application running in the JVM can have many threads that all exist within that instance of the JVM.
If not then does the jvm have to have some way to schedule which thread it will handle next...
Yes, the JVM has a thread scheduler. There are many different algorithms for thread scheduling, and which one is used is JVM-vendor dependent. (Scheduling in general is an interesting topic.)
...if so wouldn't this render the multi-threaded nature of java useless since only one thread can be ran at a time?
I'm not sure I understand this part of your question. This is kind of the point of threading. You typically have more threads than CPUs, and you want to run more than one thing at a time. Threading allows you to take full(er) advantage of your CPU by making sure it's busy processing one thread while another is waiting on I/O, or is for some other reason not busy.

A Java thread may be mapped one-to-one to a kernel thread. But this must not be so. There could be n kernel threads running m java threads, where m may be much larger than n, and n should be larger than the number of processors. The JVM itself starts the n kernel threads, and each one of them picks a java thread and runs it for a while, then switches to some other java thread. The operating system picks kernel threads and assigns them to a cpu. So there may be thread scheduling on several levels.
You may be interested to look at the GO programming language, where thousands of so called "Goroutines" are run by dozens of threads.

Java threads are mapped to native OS threads. They have little to do with the JVM itself.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.