Manipulate Thread Implementation in JVM


Recently, I've been working on deploying concurrent objects onto multicore machines. In one sample, I use the BlockingQueue.take() method, whose specification says that it is blocking. This means that the method does not release the enclosing thread's resources so that they can be re-used for other concurrent tasks. Releasing them would be useful, since the total number of live threads in a JVM instance is limited, and if the application needs thousands of live threads, it is vital to be able to re-use suspended threads. On the other hand, the JVM uses a 1:1 mapping from application-level threads to OS-level threads; i.e. each Java Thread instance becomes an underlying OS-level thread.
The current solution is based on java.util.concurrent in Java 1.5+. Still, we need worker threads that scale to a large number. Now, I am interested in answers to the following:
Is there any way to replace the implementation of java.lang.Thread in JVM such that I can plug my own Thread implementation?
Is this only possible through tweaking C++ sections of the thread implementation in JVM and recompiling it?
Is there any library to provide a way to replace the classical thread in Java?
Along the same lines, is there a library or some other way to control how several Java threads are mapped onto a single OS-level thread?
I also found this discussing different implementations of JVM and I am not sure if they could help.
Thanks for your comments and ideas in advance.

If you are creating thousands of threads, you're doing it wrong.
Instead, consider using the Executor framework. (Start with the Executors and ThreadPoolExecutor classes.) They allow you to queue thousands of tasks while having a sane number of threads handling them.
I guess this approach is what you meant by "library to replace the classical threads". I highly recommend you look into executors.
One caveat: Executors, by default, use non-daemon threads. Therefore, you must shut down your executor when you're done with it. You can do this at program exit, if there is a normal way to exit your program that doesn't simply involve waiting for all threads to finish. :-)
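To make the suggestion concrete, here is a minimal sketch (Java 8+ syntax) of queueing thousands of tasks on a handful of threads and then shutting the executor down. The class name PoolDemo, the method runTasks, and the trivial task body are made up for illustration; only the java.util.concurrent calls are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    // Runs taskCount small tasks on poolSize threads and returns the sum of their results.
    // (Hypothetical helper; the task body n * 2 is a placeholder for real work.)
    static long runTasks(int taskCount, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int n = i;
                // Tasks beyond poolSize wait in the pool's queue until a worker is free.
                results.add(pool.submit(() -> n * 2));
            }
            long sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get(); // blocks until that task has finished
            }
            return sum;
        } finally {
            // Worker threads are non-daemon by default, so without shutdown()
            // they would keep the JVM alive after main returns.
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTasks(10_000, 4));
    }
}
```

Here 10,000 tasks share 4 threads; the pool size is an arbitrary choice and would normally be tuned to the workload.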

Related

Reading from disk and processing in parallel

This is going to be the most basic and maybe even a stupid question here. We often talk about using multithreading for better resource utilization. For example, an application reads and processes files from the local file system. Let's say that reading a file from disk takes 5 seconds and processing it takes 2 seconds.
In the above scenario, we say that using two threads, one to read and the other to process, will save time, because while one thread is processing the first file, the other thread can start reading the second file in parallel.
Question: Is this because of the way CPUs are designed? As in, there is a separate processing unit and a separate read/write unit, so these two threads can work in parallel even on a single-core machine because they are actually handled by different modules? Or does this need multiple cores?
Sorry for being stupid. :)

On a single processor, multithreading is achieved through time slicing. One thread will do some work then it will switch to the other thread.
When a thread is waiting on some I/O, such as a file read, it will give up its CPU time slice early, allowing another thread to make use of the CPU.
The result is overall improved throughput compared to a single thread even on a single core.
Key for below:
= Doing work on CPU
- I/O
_ Idle
Single thread:
====--====--====--====--
Two threads:
====--__====--__====--__
____====--__====--__====
So you can see how more can get done in the same time as the CPU is kept busy where it would have been kept waiting before. The storage device is also being used more.

In theory, yes. A single core gives you the same kind of overlap. One thread waits for a read from the file (I/O wait) while another thread processes a file that has already been read. The first thread cannot be in the running state until the I/O operation completes, so it uses roughly no CPU in that state. The second thread consumes the CPU and completes its task. Of course, a multi-core CPU performs better.

To start with, there is a difference between concurrency and parallelism. Theoretically, a single core machine does not support parallelism.
As for the question of whether concurrency (using threads) improves performance, it is very implementation dependent. Take, for instance, Android or Swing. Both of them have a main thread (the UI thread). Doing a large calculation on the main thread will block the UI and make it unresponsive. So from a layman's perspective that would be bad performance.
In your case (I am assuming there is no UI thread), whether you will benefit from delegating your processing to another thread depends on a lot of factors, especially the implementation of your threads; e.g. synchronized threads would not be as good as unsynchronized ones. Your problem statement reminds me of the classic producer-consumer problem. So using threads may not really be the best thing for your work, as you need synchronized threads. IMO it's better to do all the reading and processing in a single thread.
Multithreading also has a context-switching cost. It is not as big as a process context switch, but it's still there. See this link.
[EDIT] You should preferably be using BlockingQueue for such producer consumer scenario.
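As a rough illustration of that edit, here is a minimal producer-consumer sketch using ArrayBlockingQueue. The class name ReadProcessPipeline, the run method, the "file-N" strings and the END sentinel are all made up for the example; the disk read and the processing step are only placeholders.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReadProcessPipeline {
    // Hypothetical sentinel that tells the processor there is no more input.
    private static final String END = "<end>";

    // Hands `count` simulated files from a reader thread to a processor thread
    // and returns how many items the processor handled.
    static int run(int count) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        int[] processed = {0};

        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    // Stands in for a slow disk read; put() blocks while the queue is full.
                    queue.put("file-" + i);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread processor = new Thread(() -> {
            try {
                // take() blocks while the queue is empty.
                while (!queue.take().equals(END)) {
                    processed[0]++; // stands in for the CPU-bound processing step
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        processor.start();
        reader.join();
        processor.join(); // after join(), the processor's writes are visible here
        return processed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + run(5) + " items");
    }
}
```

The small queue capacity (2) is an arbitrary choice; it just keeps the reader from running far ahead of the processor.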

How to schedule Java Threads

I have read that Java threads are user-level threads, and that one of the differences between user-level threads and kernel-level threads is that kernel-level threads are scheduled by the kernel (we cannot change it), whereas for user-level threads we can define our own scheduling algorithm.
So how do we schedule threads in Java? At any given time, when multiple threads are ready to be executed, the runtime system chooses the runnable thread with the highest priority for execution. If two threads of the same priority are waiting for the CPU, the scheduler chooses one of them to run in a round-robin fashion. What if I don't want RR? Is there a way I can change it, or am I missing something here?

You cannot change the scheduling algorithm, as this is outside the scope of the JVM. The JVM uses the threading provided by the underlying OS.
So from the Java perspective you cannot change the scheduling algorithm. The scheduling is done automatically.
The only thing in Java you can do is set the priority of the thread. But how this affects the scheduling algorithm is not defined.
You can try to change the scheduling algorithm of the OS that your VM is running on. But this is highly dependent on the OS used.

For the last 10 years or so, JVM threads have been system-level threads and not user-level ('green') threads. Even with user-level threads, you don't get to manage them (the JVM does).

The JVM Spec does not state how threads are supposed to be scheduled by an implementation. The HotSpot VM (and most likely almost every other implementation as well) uses the OS scheduling mechanisms (as Uwe stated). See also What is the JVM Scheduling algorithm?.
A simple, yet most likely not very efficient way to influence scheduling of your application threads would be to have only n runnable threads for the OS to schedule (n being the number of threads you'd actually like to run in parallel).
That could e.g. be your own implementation of ExecutorService, which makes all threads you don't want to be scheduled by the OS wait until you think they should run.
Of course this way you don't have any influence on other VM threads, let alone other applications or the OS.
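A rough sketch of that idea, using a Semaphore rather than a full ExecutorService implementation (this is a swapped-in, simpler technique, not the approach described above verbatim). The class name RunnableLimit and the method maxConcurrent are made up, and Thread.sleep only stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class RunnableLimit {
    // Starts `threads` threads but lets at most `limit` of them work at once.
    // Returns the highest number of threads observed inside the gated section.
    static int maxConcurrent(int threads, int limit) throws InterruptedException {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> all = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    permits.acquire(); // parked here until a permit is free
                    try {
                        int now = running.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // placeholder for real work
                    } finally {
                        running.decrementAndGet();
                        permits.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            all.add(t);
            t.start();
        }
        for (Thread t : all) {
            t.join();
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency: " + maxConcurrent(8, 2));
    }
}
```

The OS still decides which of the permitted threads actually gets a core; the semaphore only caps how many are eligible at a time.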
A lot more involved (and not platform independent) would be to change the OS scheduler itself to something more tailored to the needs of a JVM. A quick Google search found this abstract, and I guess there's more work being done in that field.

In Effective Java, 2nd Ed., Joshua Bloch devotes an item to the discussion of thread scheduling. He goes on at length about how trying to tweak thread scheduling usually only leads to solutions that are JVM implementation dependent, non-portable, and fragile.
If there's a particular scheduling problem that you have, then for new code you should not deal with low-level thread calls anyway. Java has higher level concurrency libraries that simplify many of these tasks. Rather than defining the solution to your problem with threads, you ought to be thinking of Executors and Tasks. There are also higher level facilities that simplify inter-thread communication, such as CountDownLatch.
Low level thread calls such as wait, notify, and notifyAll can be difficult to do properly.
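For instance, waiting for a group of tasks to finish can be done with a CountDownLatch instead of hand-written wait/notify. This is only a minimal sketch; the class name LatchDemo, the method runAndWait, the pool size of 3 and the 10-second timeout are arbitrary choices for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchDemo {
    // Runs `tasks` jobs on a small pool and waits for all of them via a latch.
    // Returns the number of completed jobs, or -1 if the wait timed out.
    static int runAndWait(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CountDownLatch done = new CountDownLatch(tasks);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    completed.incrementAndGet(); // placeholder for the real task
                } finally {
                    done.countDown(); // always signal, even if the task throws
                }
            });
        }
        // Blocks until the count reaches zero or the timeout expires.
        boolean finished = done.await(10, TimeUnit.SECONDS);
        pool.shutdown();
        return finished ? completed.get() : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed: " + runAndWait(6));
    }
}
```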

You could write your own thread scheduler, analogous to the Quartz job scheduler for batch jobs.
This would allow you to execute threads at various times of the day during the run of your application.
If all you want is to determine the order of your thread execution, execute the code from one master thread.

Does Java 7 fork/join guarantee executing threads on separate CPUs

Recently, I came to know about the Java 7 fork/join framework - what I learned is that it could be useful for divide-and-conquer like problems.
My question is, does the framework guarantee executing threads on separate CPUs? Or is it even possible to instruct the threads I create using classes of the concurrent package to run on separate CPUs available in my server?

It'll be built upon the standard JVM concurrency primitives, in which case they will (eventually) be scheduled onto real OS threads. You cannot guarantee that your OS scheduler is going to schedule threads onto separate CPUs, although it's quite likely in most instances.
Trying to guess what a concurrent scheduler is going to do at runtime is a really bad idea. Just assume that you will be able to make use of no more than as many CPUs as you have active threads, and don't try to second-guess the runtime behaviour unless you're trying to do a particular kind of very low-level optimisation.

At least it will do its best. The fork/join framework is designed to take advantage of multiple processors. By default ForkJoinPool is created with the number of worker threads equal to the number of processors.
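A minimal divide-and-conquer sketch with the fork/join framework, for illustration only. The class names ForkJoinSum and SumTask and the threshold of 1,000 elements are made up; new ForkJoinPool() with no arguments uses a parallelism equal to Runtime.availableProcessors(), but nothing here pins work to particular CPUs.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum {
    // Sums data[from, to) by splitting the range until it is small enough.
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // arbitrary cut-off for this sketch
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                          // may be picked up by another worker
            return right.compute() + left.join(); // compute the other half in this worker
        }
    }

    static long sum(long[] data) {
        // Default constructor: parallelism equals the number of available processors.
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));
    }
}
```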

Does the framework guarantee executing threads on separate CPUs?
No. No guarantees.
Or is it even possible to instruct the threads I create using classes of the concurrent package to run on separate CPUs available in my server?
Not using the standard Java libraries. In theory, anything is possible (up to the limit of what the OS allows) if you are willing to dig around in the native layers of the JVM. But you will be in for a lot of unnecessary work / pain.
My advice:
You probably don't need that level of control. (IMO) it is likely that the default behaviour of the native thread scheduler is "good enough" to achieve satisfactory performance.
If you really need that level of control, you would be better off using a different programming language; i.e. one where you can interact directly with the host OS's native thread scheduler. You may even need a different operating system ...

How to make Java use multiple cores with threads?

This is a similar question to the one appearing at: How to ensure Java threads run on different cores. However, there might have been a lot of progress in that in Java, and also, I couldn't find the answer I am looking for in that question.
I just finished writing a multithreaded program. The program spawns several threads, but it doesn't seem to be using more than a single core. The program is faster (I am parallelizing something which makes it faster), but it definitely does not use all cores available, judging by running "top".
Any ideas? Is that an expected behavior?
The general code structure is as follows:
for (some values in i)
{
    start a thread instantiated as MyThread(i)
    (this thread heavily uses ConcurrentHashMap, arrays and basic arithmetic, no IO)
    add the thread to a list T
}
foreach (thread in T)
{
    do thread.join()
}
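For reference, the outline above might look roughly like this as runnable Java. This is only a guess at the structure: the body of MyThread is a made-up CPU-only placeholder, and the class SpawnAndJoin and method runAll are hypothetical names, not the actual program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class SpawnAndJoin {
    // Hypothetical stand-in for MyThread(i): basic arithmetic only, no I/O.
    static class MyThread extends Thread {
        private final int id;
        private final ConcurrentHashMap<Integer, Long> results;

        MyThread(int id, ConcurrentHashMap<Integer, Long> results) {
            this.id = id;
            this.results = results;
        }

        @Override
        public void run() {
            long acc = 0;
            for (int k = 1; k <= 1_000; k++) {
                acc += (long) id * k;
            }
            results.put(id, acc); // shared map, safe for concurrent writes
        }
    }

    // Starts n threads, waits for all of them, and returns the shared result map.
    static ConcurrentHashMap<Integer, Long> runAll(int n) throws InterruptedException {
        ConcurrentHashMap<Integer, Long> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new MyThread(i, results);
            t.start(); // start(), not run(): run() would execute on the calling thread
            threads.add(t);
        }
        for (Thread t : threads) {
            t.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(4));
    }
}
```

Note that all threads are started before any join() is called; joining inside the first loop would serialize the work.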

If it's almost exactly 100% of one CPU, it can mean you really have
one core thread which is doing all the work while the others are not doing much, or
one resource which you are locking on, so only one thread has a chance to run.
If you are using approximately one CPU, it can mean this is all the work your CPUs have because you are waiting for something such as I/O (network and/or disk).
I suggest you look at the state of your threads in VisualVM. It will help you identify which threads are running and give you an idea of their pattern of behaviour. I also suggest you use a CPU profiler to help find your bottlenecks.

I think I read in the SCJP book by Katherine Sierra that JVMs ask the underlying OS to create a new OS thread for every Java thread.
So it's up to the underlying operating system to decide how to balance Java (and any other kind of) threads between the available CPUs.

Understanding java's native threads and the jvm

I understand that the jvm is itself an application that turns the bytecode of the java executable into native machine code, but when using native threads I have some questions that I just cannot seem to answer.
Does every thread create their own instance of the jvm to handle their particular execution?
If not, then does the jvm have to have some way to schedule which thread it will handle next? If so, wouldn't this render the multi-threaded nature of java useless, since only one thread can be run at a time?

Does every thread create their own instance of the JVM to handle their particular execution?
No. They execute in the same JVM so that (for example) they can share objects and values of static fields.
If not then does the JVM have to have some way to schedule which thread it will handle next
There are two kinds of thread implementation in Java. Native threads are mapped onto a thread abstraction which is implemented by the host OS. The OS takes care of native thread scheduling, and time slicing.
The second kind of thread is "green threads". These are implemented and managed by the JVM itself, with the JVM implementing thread scheduling. Java green thread implementations have not been supported by Sun / Oracle JVMs since Java 1.2. (See Green Threads vs Non Green Threads)
If so, wouldn't this render the multi-threaded nature of Java useless, since only one thread can be run at a time?
We are talking about green threads now, and this is of historic interest (only) from the Java perspective.
Green threads have the advantage that scheduling and context switching are faster in the non-I/O case. (Based on measurements made with Java on Linux 2.2; http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.9238)
With pure green threads, N programming language threads are mapped to a single native thread. In this model you don't get true parallel execution, as you noted.
In a hybrid thread implementation, N programming language threads are mapped onto M native threads (where N > M). In this model, the in-process thread scheduler is responsible for the green thread to native thread scheduling AND you get true parallel execution (if M > 1); see https://stackoverflow.com/a/16965741/139985.
But even with pure green threads, you still get concurrency. Control is switched to another thread when a thread blocks on an I/O operation, while acquiring a lock, and so on. Furthermore, the JVM's runtime could implement periodic thread preemption so that a CPU-intensive thread doesn't monopolize the (single) core to the exclusion of other threads.

Does every thread create their own instance of the jvm to handle their particular execution?
No, your application running in the JVM can have many threads that all exist within that instance of the JVM.
If not then does the jvm have to have some way to schedule which thread it will handle next...
Yes, the JVM has a thread scheduler. There are many different algorithms for thread scheduling, and which one is used is JVM-vendor dependent. (Scheduling in general is an interesting topic.)
...if so, wouldn't this render the multi-threaded nature of java useless, since only one thread can be run at a time?
I'm not sure I understand this part of your question. This is kind of the point of threading. You typically have more threads than CPUs, and you want to run more than one thing at a time. Threading allows you to take full(er) advantage of your CPU by making sure it's busy processing one thread while another is waiting on I/O, or is for some other reason not busy.

A Java thread may be mapped one-to-one to a kernel thread. But this need not be so. There could be n kernel threads running m Java threads, where m may be much larger than n, and n should be larger than the number of processors. The JVM itself starts the n kernel threads, and each one of them picks a Java thread and runs it for a while, then switches to some other Java thread. The operating system picks kernel threads and assigns them to a CPU. So there may be thread scheduling on several levels.
You may be interested in looking at the Go programming language, where thousands of so-called "goroutines" are run by dozens of threads.

Java threads are mapped to native OS threads. They have little to do with the JVM itself.
