I am implementing a worker pool in Java.
This is essentially a whole load of objects which will pick up chunks of data, process the data and then store the result. Because of IO latency there will be significantly more workers than processor cores.
The server is dedicated to this task and I want to wring the maximum performance out of the hardware (but no, I don't want to implement it in C++).
The simplest implementation would be to have a single Java process which creates and monitors a number of worker threads. An alternative would be to run a Java process for each worker.
Assuming, for argument's sake, a quad-core Linux server, which of these solutions would you anticipate being more performant, and why?
You can assume the workers never need to communicate with one another.
One process, multiple threads - for a few reasons.
When context-switching between jobs, it's cheaper on some processors to switch between threads than between processes. This is especially important in this kind of I/O-bound case with more workers than cores. The more work you do between getting I/O blocked, the less important this is. Good buffering will pay off whether you use threads or processes, though.
When switching between threads in the same JVM, at least some Linux implementations (x86, in particular) don't need to flush cache. See Tsuna's blog. Cache pollution between threads will be minimized, since they can share the program cache, are performing the same task, and are sharing the same copy of the code. We're talking savings on the order of 100's of nanoseconds to several microseconds per switch. If that's small potatoes for you, then read on...
Depending on the design, the I/O data path may be shorter for one process.
The startup and warmup time for a thread is generally much shorter. The OS doesn't have to start a process, Java doesn't have to start another JVM, classloading is only done once, JIT-compilation is only done once, and HotSpot optimizations are done once, and sooner.
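If you want a rough feel for that startup gap on your own machine, here is an unscientific sketch (it assumes java is on your PATH; the timings are only illustrative, not a proper benchmark):

```java
public class StartupCost {
    public static void main(String[] args) throws Exception {
        // Cost of starting (and joining) a single extra thread in this JVM.
        long t0 = System.nanoTime();
        Thread t = new Thread(() -> { });
        t.start();
        t.join();
        long threadMicros = (System.nanoTime() - t0) / 1_000;

        // Cost of starting a whole new JVM process (here just "java -version").
        long p0 = System.nanoTime();
        Process p = new ProcessBuilder("java", "-version").start();
        p.waitFor();
        long processMillis = (System.nanoTime() - p0) / 1_000_000;

        System.out.println("Thread start + join: ~" + threadMicros + " µs");
        System.out.println("New JVM process:     ~" + processMillis + " ms");
    }
}
```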
Usually, when discussing multiprocessing (with one thread per process) versus multithreading in the same process, the theoretical overhead is bigger in the first case than in the second (so multiprocessing is theoretically slower than multithreading), but in reality on most modern OSes this is not such a big issue. In the Java context, however, starting a new process is a lot more costly than starting a new thread: starting a new process means starting up a new instance of the JVM, which is very costly, especially in terms of memory. I recommend that you start multiple threads in the same JVM.
Moreover, since you say inter-thread communication is not an issue, you can use Java's ExecutorService to get a fixed thread pool of size 2 x (number of available CPUs). The number of available CPUs can be detected at runtime via Java's Runtime class. This way you get quick, simple multithreading going without any boilerplate code.
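A minimal sketch of that approach, where processChunk is a hypothetical stand-in for your own fetch/process/store logic:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPool {
    public static void main(String[] args) {
        // Detect the number of available CPUs at runtime.
        int cpus = Runtime.getRuntime().availableProcessors();

        // Fixed pool of 2x the CPU count, as suggested for an I/O-bound workload.
        ExecutorService pool = Executors.newFixedThreadPool(2 * cpus);

        for (int i = 0; i < 100; i++) {
            final int chunkId = i;
            pool.submit(() -> processChunk(chunkId));
        }

        pool.shutdown(); // stop accepting new tasks; queued tasks still run to completion
    }

    private static void processChunk(int chunkId) {
        // Placeholder: fetch the chunk, process it, store the result.
    }
}
```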
Actually, if you do this with large-scale tasks, using multiple JVM processes is way faster than one JVM with multiple threads. At least we never got one JVM running as fast as multiple JVMs.
We do some calculations where each task uses around 2-3 GB of RAM and does some heavy number crunching. If we spawn 30 JVMs and run 30 tasks, they perform around 15-20% better than spawning 30 threads in one JVM. We tried tuning the GC and the various memory regions and never caught up with the first variant.
We did this on various machines: 14 tasks on a 16-core server, 34 tasks on a 36-core server, etc. Multithreading in Java always performed worse than multiple JVM processes.
It may not make any difference on simple tasks, but on heavy calculations the JVM seems to perform badly with threads.
Related
I am new to multithreading in Java. After looking at Java virtual machine - maximum number of threads, it would appear there isn't a limit to how many threads a Java/Android app can run. However, is there an advisable limit? What I mean is: is there a number of threads beyond which it becomes unwise to go, because you are unable to determine which thread does what at what time? I hope my question makes sense.
There are some advisable limits, but they don't really have anything to do with keeping track of which thread does what.
Most multithreading comes with locking. If you are using central data storage or global mutable state, then the more threads you have, the more lock contention you will get (a small sketch of this effect follows at the end of this answer). This is app-specific and depends on how much of that state you have and how often threads read and write it.
There are no limits in desktop JVMs by default, but there are OS limits. It should be in the tens of thousands for modern Windows machines, but don't rely on being able to create much more than that.
Running multiple tasks in parallel is great, but the hardware can only cope with so much. If you are using small threads that get fired up sometimes and spend most of their time idle, that's no biggie (Java servers were written like this for years). However, if your threads are very intensive, making more of them than the number of cores you have is not likely to give you any benefit. (I believe the standard practice is twice the number of cores if you anticipate threads going idle sometimes.)
Threads have a cost to them. Whenever you switch Threads you switch context, and while it isn't that expensive, doing it constantly will hurt performance. It's not a good idea to create a Thread to sum up two integers and write back a result.
If Threads need visibility of each others state, then they are greatly slowed down, since a lot of their writes have to be written back to main memory. Threads are best used for standalone tasks that require little interaction with each other.
TL;DR
Depends on OS and Hardware: on servers creating thousands of threads is fine, on desktop machines you should limit yourself to 50-200 and choose carefully what you do with them.
Note: Android's default and suggested "UI multithread helper" - the AsyncTask - is not actually a thread. It's a task invoked from a ThreadPool, and as such there is no limit or penalty to using it: the pool has an upper limit on the number of threads it spawns and reuses them rather than creating new ones. Most Android apps should use it instead of spawning their own threads. In general, Thread Pools are fairly widespread and are a great choice unless you are forced into blocking operations.
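To make the lock-contention point above concrete, here is a rough, illustrative microbenchmark (absolute timings vary wildly by JVM and hardware; it only shows the direction of the effect):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class ContentionDemo {
    private static long shared = 0;
    private static final Object LOCK = new Object();
    private static final int INCREMENTS = 2_000_000;

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();

        // Variant 1: every increment fights over one shared lock.
        long t0 = System.nanoTime();
        run(threads, id -> {
            for (int i = 0; i < INCREMENTS; i++) {
                synchronized (LOCK) { shared++; }
            }
        });
        System.out.println("Contended:   " + (System.nanoTime() - t0) / 1_000_000 + " ms, sum=" + shared);

        // Variant 2: each thread updates its own slot; no lock at all.
        long[] perThread = new long[threads];
        long t1 = System.nanoTime();
        run(threads, id -> {
            for (int i = 0; i < INCREMENTS; i++) {
                perThread[id]++;
            }
        });
        long sum = 0;
        for (long v : perThread) sum += v;
        System.out.println("Uncontended: " + (System.nanoTime() - t1) / 1_000_000 + " ms, sum=" + sum);
    }

    // Starts 'threads' threads running 'body' and waits for all of them.
    private static void run(int threads, IntConsumer body) throws InterruptedException {
        List<Thread> started = new ArrayList<>();
        for (int id = 0; id < threads; id++) {
            final int myId = id;
            Thread t = new Thread(() -> body.accept(myId));
            started.add(t);
            t.start();
        }
        for (Thread t : started) t.join();
    }
}
```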
Based on the linked question about Linux, spawning more threads was an effective way to hog the CPU until kernel 2.6.38. What about the JVM? Assume we have implemented a lock-free algorithm and all these threads are totally independent of each other. Will more threads help us gain more CPU time from the system?
The short answer is yes. More processes will also result in getting more CPU time.
The default assumption of a typical scheduler on a modern operating system is that anything that asks for CPU time intends to use the CPU to make useful forward progress and it's generally more important to make as much forward progress as possible than to be "fair". If you have some notion of fairness that's important to your particular workload, you can specifically configure it in most operating systems.
More threads will use more CPU time. However, you also get much more overhead and can end up getting less useful work done. For a CPU-bound process where your threads can work independently, the optimal number of threads is usually the number of CPUs you have, rarely more. For processes limited by a system resource, the optimal number can be one. For processes limited by something external to the system, you can actually gain by having more threads than CPUs, but it would be a mistake to assume this is always the case.
In short: is your goal to burn CPU, or to get something done?
Assume we have implemented the lock free algorithm, all these threads are totally independent from each other. Will more threads help us to gain more CPU time from the system?
Not exactly sure what you are asking. The JVM can certainly go above 100% and take over more than a single CPU if your threads use a lot of CPU. I/O-bound applications, regardless of the number of threads, might spike over 100% but never do it sustained. If your program is waiting on web connections or reading and writing to the disk, it may max out the I/O chain and not run much concurrent processing.
When the JVM forks a thread (depending on the architecture), it works with the OS to allocate an OS thread. In Linux these are called clones, and you can see them in the process table with the right args to ps. If you fork 100 threads in your JVM, there are 100 corresponding clones in the Linux kernel. If all of them are spinning (i.e. not waiting on some system resource), then the OS will give them all time slices of the available CPU, depending on priority and competition with other applications and system processes.
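If you want to see this for yourself, here is a small sketch (it assumes a Linux box and Java 9+ for ProcessHandle, and it will deliberately peg your CPUs; while it runs, inspect the clones with something like ps -T -p <pid>):

```java
public class SpinningClones {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            Thread t = new Thread(() -> {
                long x = 0;
                // Busy-loop so the thread stays runnable and competes for CPU time.
                while (true) { x++; }
            });
            t.setDaemon(true); // let the JVM exit when main finishes
            t.start();
        }

        System.out.println("PID: " + ProcessHandle.current().pid());
        Thread.sleep(60_000); // keep the process alive for a minute so it can be inspected
    }
}
```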
I have a java application that creates SSL socket with remote hosts. I want to employ threads to hasten the process.
I want the maximum possible utilization that does not affect the program performance. How can I decide the suitable number of threads to use? After running the following line: Runtime.getRuntime().availableProcessors(); I got 4. My processor is an Intel core i7, with 8 GB RAM.
If you have 4 cores, then in theory you should have exactly four worker threads going at any given time for maximum utilization. Unfortunately, what happens in theory never happens in practice. You may have worker threads that, for whatever reason, have significant amounts of downtime. Perhaps they're hitting the web for more data, or reading from a disk, much of which is just waiting and not utilizing the CPU.
Depending on how much waiting you're doing, you'll want to bump up the number. The costs to increasing the number of threads is that you'll have more context switching and competition for resources. The benefits are that you'll have another thread ready to work in case one of the other threads decides it has to break for something.
Your best bet is to set it to something (let's start with 4) and work your way up. Profile your code with each setting, and see if your benchmarks go up or down. You should soon see a pattern and a break-even point.
When it comes to optimization, you can theorize all you want about what should be the fastest, but you won't beat actually running and timing your code to truly answer this question.
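One way to run that sweep, as a rough sketch (connectToHost below is a hypothetical placeholder for the real SSL work, and a serious measurement would also include warm-up runs):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizeSweep {
    public static void main(String[] args) throws InterruptedException {
        for (int threads = 4; threads <= 32; threads *= 2) {
            long start = System.nanoTime();

            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < 200; i++) {
                pool.submit(PoolSizeSweep::connectToHost);
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);

            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " threads: " + elapsedMs + " ms");
        }
    }

    private static void connectToHost() {
        // Placeholder for the real SSL handshake; simulate the I/O wait instead.
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```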
As DarthVader said, you can use a ThreadPool (CachedThreadPool). With this construct you don't have to specify a concrete number of threads.
From the Oracle site:
The newCachedThreadPool method creates an executor with an expandable thread pool. This executor is suitable for applications that launch many short-lived tasks.
Maybe that's what you are looking for.
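A minimal sketch of that approach, where the host list and connectToHost are hypothetical stand-ins for your own SSL socket code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SslConnector {
    public static void main(String[] args) {
        List<String> hosts = List.of("host1.example.com", "host2.example.com"); // hypothetical hosts

        // Cached pool: grows on demand and reuses idle threads, so there is no fixed size to pick.
        ExecutorService pool = Executors.newCachedThreadPool();
        for (String host : hosts) {
            pool.submit(() -> connectToHost(host));
        }
        pool.shutdown();
    }

    private static void connectToHost(String host) {
        // Placeholder for creating the SSLSocket and doing the handshake.
    }
}
```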
As for the number of threads per core, it's hard to say. You have 4 hyper-threaded cores, and you should leave at least one core for your OS. I would say 4-6 threads.
Since the JVM creates only one process initially, does creating multiple threads in this process boost CPU performance, assuming you have multiple CPU processors? I ask because all the threads inside the same process share the resources of that process, so is the execution technically sequential?
In other words, unless you create two or more processes and associate threads with each of them, you can't get the full benefit of parallel execution on multiple CPU processors?
Yes, distributing the workload over multiple threads can boost the performance of your program. It also increases the responsiveness.
However, there is increased overhead due to communication and synchronization, and not all algorithms can be parallelized.
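As a concrete illustration that threads within a single JVM process do run in parallel on multiple cores, here is a small sketch that splits a summation across one thread per available CPU (the array size and chunking are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        long[] data = new long[50_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        int parts = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        List<Future<Long>> partials = new ArrayList<>();

        int chunk = data.length / parts;
        for (int p = 0; p < parts; p++) {
            final int from = p * chunk;
            final int to = (p == parts - 1) ? data.length : from + chunk;
            // Each thread sums its own independent slice of the array.
            partials.add(pool.submit(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }));
        }

        long total = 0;
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();
        System.out.println("Sum = " + total);
    }
}
```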
Actually, I'm using Java to monitor the CPU usage of a certain Java process. Here are my questions:
First, is there a limit such that a single process can consume CPU time on only one or a limited number of CPU cores? Or can it use CPU time on every core?
Second, if I want to monitor the CPU usage of a certain Java process for each CPU core, how can I do that?
I'd prefer to handle it in pure Java, without native methods.
To the operating system, a single thread (which I assume is what you mean by "Java process") essentially cannot use CPU on more than one "processor" (which may or may not mean a physical core; see below) simultaneously.
Generally, whenever a given thread gets a "turn at running", Windows (and I assume other operating systems) will attempt to schedule a given thread on to the same "processor" that it last ran on.
However, the situation is complicated by hyperthreading CPUs, which actually present to the operating system several "processors" for what is really a single physical core. In this case, it is actually the CPU itself that switches between which instruction of which thread is running on which component of the given core at any one time. (Because, e.g., the core's arithmetic unit could be performing an arithmetic instruction for Thread 1 while the load/store unit is fetching data from memory for an instruction for Thread 2, etc.)
So given the complexity of this situation, even if you can get per-core measurements, I'm not entirely sure what useful meaning you would attach to them.
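That said, if a process-wide figure is enough, a pure-Java reading is possible. This is a sketch that assumes a HotSpot/OpenJDK JVM, where the com.sun.management extension of the platform MXBean is available (the standard java.lang.management API alone does not expose a per-core breakdown):

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class CpuLoadSampler {
    public static void main(String[] args) throws InterruptedException {
        // On HotSpot/OpenJDK the platform bean can be cast to the com.sun.management variant.
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

        for (int i = 0; i < 10; i++) {
            // Recent CPU usage of this JVM as a fraction of the whole machine (0.0-1.0),
            // or a negative value if the figure is not yet available.
            double processLoad = os.getProcessCpuLoad();
            System.out.printf("process CPU load: %.2f%n", processLoad);
            Thread.sleep(1000);
        }
    }
}
```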
P.S. If you'll permit the plug, I don't know if this Java-focussed article on thread scheduling that I wrote a couple of years ago might be useful. I should say I wrote it before either Windows 7 or the latest Intel Core CPUs were released, and there may be one or two updates to the information that would be pertinent (in particular, I don't address the issue of variable core speeds and how that could affect scheduling).