I'm using a thread pool that should be able to execute hundreds of concurrent tasks. However, the tasks usually do very little computation and spend most of their time waiting on some server response. So if the thread pool contains hundreds of threads, just a few of them will be active while most of them will be waiting.
I know that in general this is not good practice for thread pool usage, but the current design does not permit making my tasks asynchronous so that they can return control without waiting for the server's response. Given this limitation, I guess my biggest problem is increased memory consumption for the threads' stack space.
So is there any way to use some kind of lightweight threads that do not consume much memory?
I know there's a JVM option, -Xss, to control the stack memory, but it seems there's no way to control this per thread pool or per thread, as opposed to changing it for all the threads inside the VM, right?
Also do you have any suggestions for a better solution to my problem?
I know that in general this is not good practice for thread pool usage
I disagree; I think this is a perfectly good practice. Are you seeing problems with this approach? Otherwise, switching away from standard threads smacks of premature optimization to me.
So is there any way to use some kind of lightweight threads that do not consume much memory?
I think you are already there. Threads are relatively lightweight already and I see no reason to worry about hundreds of them unless you are working in a very constrained JVM.
Also do you have any suggestions for a better solution to my problem?
Any solution that I see would be a lot more complicated and would again be the definition of premature optimization. For example, you could use NIO and do your own scheduling of the thread when the server response was available, but this is the sort of thing that you get for free with threads.
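If you ever did go that route, a rough sketch of what "doing your own scheduling" with NIO might look like is shown below. This is only an illustration under assumed conditions; the host, port, and buffer handling are placeholders, not anything from the question:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioResponseReader {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();

        // Register a few non-blocking connections; host and port are placeholders.
        for (int i = 0; i < 3; i++) {
            SocketChannel channel = SocketChannel.open();
            channel.configureBlocking(false);
            channel.connect(new InetSocketAddress("example.com", 80));
            channel.register(selector, SelectionKey.OP_CONNECT);
        }

        ByteBuffer buffer = ByteBuffer.allocate(8192);
        while (selector.select() > 0) {                 // blocks until some channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                SocketChannel channel = (SocketChannel) key.channel();
                if (key.isConnectable()) {
                    channel.finishConnect();
                    // a real client would also register OP_WRITE to send its request
                    key.interestOps(SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    buffer.clear();
                    int read = channel.read(buffer);
                    if (read == -1) {
                        channel.close();                // server closed the connection
                    } else {
                        // hand the response bytes to application code here
                    }
                }
            }
        }
    }
}
```

The point of the sketch is that a single thread multiplexes all connections, so no thread ever sleeps inside a blocking read; that is exactly the scheduling work you otherwise get for free from threads.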
So is there any way to use some kind of lightweight threads that do not consume much memory?
Using plain threads in a thread pool is likely to be lightweight enough.
I know there's a JVM option, -Xss, to control the stack memory, but it seems there's no way to control this per thread pool or per thread, as opposed to changing it for all the threads inside the VM, right?
This is the maximum stack size per thread. It's the size at which you would rather get a StackOverflowError than keep running. IMHO, there is little benefit in tuning this on a per-thread basis.
The thread stack uses main memory for the portion that is actually used and virtual memory for the rest. Virtual memory is cheap if you have a 64-bit JVM, so if this is a concern I would switch to 64-bit.
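For what it's worth, the Thread constructor does accept a per-thread stack-size argument, which the JavaDoc describes as a hint that some platforms ignore. A minimal sketch of applying it only to a pool's threads via a ThreadFactory (the thread count, stack size, and name below are invented for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class SmallStackPool {
    public static ExecutorService newSmallStackPool(int threads, long stackSizeBytes) {
        ThreadFactory factory = runnable ->
                // The fourth argument is only a hint; some JVMs ignore it entirely.
                new Thread(null, runnable, "small-stack-worker", stackSizeBytes);
        return Executors.newFixedThreadPool(threads, factory);
    }

    public static void main(String[] args) {
        // 200 workers with a 256 kB stack hint each.
        ExecutorService pool = newSmallStackPool(200, 256 * 1024);
        pool.submit(() -> System.out.println("running on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```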
Also do you have any suggestions for a better solution to my problem?
If you have thousands of threads you might consider using non-blocking IO, but it doesn't sound like you need to worry. In tests I have done, having 10,000 active threads consumes one CPU (if the threads are otherwise not doing anything), so for every hundred threads you could be wasting about 1% of one CPU. This is unlikely to be a problem if you have spare CPU capacity.
Related
When using Java's blocking sockets in a situation where I intend to read and write independently, I see that my two options are either to dedicate a separate thread to each operation or to poll on a timeout using setSoTimeout().
Making a choice between the two implementations appears to be a trade-off of memory (threads) versus CPU time (polling).
I see potential scaling issues: with many threads, the scheduler and context-switching overhead may outweigh the CPU time spent polling, while a single thread may introduce latency between reading and writing depending on the size of the packets received. Alternatively, a small pool of threads could be combined with polling of several sockets, if tuned appropriately, to scale.
With the exception of Java's NIO as an alternative, which is outside the scope of this question, am I correctly understanding the options available to me for working with blocking sockets?
First of all, I think you have excluded the only option that will scale; i.e. using NIO.
Neither per-socket threads nor polling will scale.
In the thread case, you will need two threads per socket. (A thread pool doesn't work.) That consumes space for the thread stacks and Thread objects, and kernel resources for native thread descriptors. Then there are secondary effects such as context switching, extra GC tracing, and so on.
In the polling case, you need to make regular syscalls for each socket. Whether you do this with one thread or a small pool of threads, the number of syscalls is the same. If you poll more frequently, the syscall rate increases. If you poll less frequently, your system becomes less responsive.
AFAIK, there are no other options, given the restrictions that you have set.
Now if you are trying to figure out which of threads or polling is better, the answer will be "it depends". There are lots of variables:
The amount of spare physical memory and spare CPU cycles.
The number of sockets.
The relative activity of the sockets.
Requirements for responsiveness.
What else is going on in the JVM (e.g. to trigger GCs)
What else is going on outside of the JVM.
Operating system performance characteristics.
These all add up to a complicated scenario which would probably be too difficult to analyze mathematically, too difficult to simulate (accurately) and difficult to measure empirically.
Not quite.
For one, reading with a timeout is no more expensive than reading without one. In either case, the thread goes to sleep after telling the OS to wake it if there is data for it. If you have a timeout, it additionally tells the OS to wake it after the specified delay. No CPU cycles are wasted waiting.
For another, context-switching overhead is on the order of a couple of thousand CPU cycles, i.e. a few microseconds, while the delay in network communication is > 1 ms. Before this overhead brings a server to its knees, you can probably serve thousands of concurrent connections.
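To make the first point concrete, a blocking read with setSoTimeout() might look roughly like the sketch below; the 500 ms timeout and the buffer handling are arbitrary choices for illustration, not recommendations:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutReadLoop {
    public static void readWithTimeout(Socket socket) throws IOException {
        socket.setSoTimeout(500);           // wake up after 500 ms if no data arrived
        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[8192];
        while (!socket.isClosed()) {
            try {
                int read = in.read(buffer); // sleeps until data, EOF, or timeout
                if (read == -1) {
                    break;                  // peer closed the connection
                }
                // process buffer[0..read) here
            } catch (SocketTimeoutException timeout) {
                // no data within 500 ms; a chance to do pending writes, then poll again
            }
        }
    }
}
```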
I have read a lot about how Play's non-blocking approach works. Ideally, incoming requests are supposed to be blazing fast, off-loading heavy-duty work to a worker thread, and returning to the pool for servicing more requests. That's one of the reasons that initially the pool of application threads is very close to the number of available cores on the machine's CPU. Numerous sources even warn against tinkering with the pool at all.
Let's face it, though: in reality, not every part of the codebase can be made purely non-blocking. In fact, the bigger my application became, the more it started resembling a traditional Java-based server app. The alternative of making everything non-blocking and going around commodity JVM frameworks is just too costly.
I started questioning Play's original proposition. The question is: should I really increase the default thread pool size? Would this have any unexpected consequences (besides, of course, increasing the memory footprint)?
Based on the question about Linux, this was an effective way of hogging the CPU until 2.6.38. How about the JVM? Assume we have implemented a lock-free algorithm and all these threads are totally independent of each other. Will more threads help us gain more CPU time from the system?
The short answer is yes. More processes will also result in getting more CPU time.
The default assumption of a typical scheduler on a modern operating system is that anything that asks for CPU time intends to use the CPU to make useful forward progress and it's generally more important to make as much forward progress as possible than to be "fair". If you have some notion of fairness that's important to your particular workload, you can specifically configure it in most operating systems.
More threads will use more CPU time. However, you also get much more overhead, and you can end up getting less useful work done. For a CPU-bound process where your threads can work independently, the optimal number of threads is usually the number of CPUs you have, rarely more. For processes limited by a system resource, the optimal number can be one. For processes limited by something external to the system, you can actually gain by having more threads than CPUs, but it would be a mistake to assume this is always the case.
In short, is your goal to burn CPU, or is it to get something done?
Assume we have implemented a lock-free algorithm and all these threads are totally independent of each other. Will more threads help us gain more CPU time from the system?
Not exactly sure what you are asking. The JVM can certainly go above 100% and take over more than a single CPU if your threads use a lot of CPU. IO-bound applications, regardless of the number of threads, might spike over 100% but will never sustain it. If your program is waiting on web connections or reading and writing to disk, it may max out the IO chain and not run much concurrent processing.
When the JVM forks a thread (depending on the architecture), it works with the OS to allocate an OS thread. In Linux these are created via clone and you can see them in the process table with the right arguments to ps. If you fork 100 threads in your JVM, there are 100 corresponding kernel threads in Linux. If all of them are spinning (i.e. not waiting on some system resource), the OS will give them all time slices of the available CPU depending on priority and competition with other applications and system processes.
I am involved in a project where multithreading is used. Around 4-5 threads are spawned for every call (the system was developed for a taxi call center). The issue is that after reading the information from the JMS queue, a new thread has to be spawned, which is not happening. This problem occurs randomly. I earlier posted a similar question on Stack Overflow where I was advised to do load injection.
After studying load injection, I felt it is not feasible to do such a test on my development server, as my system is accessed from a call flow which controls user access. I spent some time studying JVM tuning and thread pooling. This particular system processes around 14K-15K calls/day, and during peak hours the queue gets very high (it might hit 400-500 calls waiting); for each call around 4-5 threads have to be spawned. From the logs I don't see anything like an OutOfMemoryError. Is there any other reason which might stop threads from being spawned?
My JVM configuration is -Xms128m -Xmx1024m.
The environment is 32-bit Windows Server with 4 GB of RAM.
Will setting the thread stack size help threads spawn without any hindrance?
I am also studying the feasibility of thread pooling. If I spawn a fixed number of threads, I need to understand whether it will impact the system's overall performance.
Creating a thread is a very expensive operation and uses a lot of system resources. Most importantly, each thread needs a lot of memory for its stack (512 kB by default). If you create new threads excessively, you will run into all sorts of problems. A JVM can typically only support a couple of thousand threads, depending on the operating system, the -XX:ThreadStackSize setting, and the free memory.
Thread pooling will not make your performance worse, it will make it better. So you should definitely go that way. If your thread pool size is too small, you might have some liveness problems, but that is easy to tune.
Maybe changes in the architecture can help solve the problem. I'd try thread pooling because of its efficiency, but on its own it is not guaranteed to solve the problem. It is possible that you'll need to reconsider whether all the spawned threads are really needed (having multiple threads competing for a single resource hurts performance) and tune the size of the pools. Look at Executor; it could help you with some of these changes.
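As a rough illustration of that suggestion (the class name, pool size, and task body below are invented for the example), the per-call work could be submitted to a bounded ExecutorService rather than spawned as fresh threads:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CallDispatcher {
    // A bounded pool reuses threads instead of creating 4-5 new ones per call.
    private final ExecutorService pool = Executors.newFixedThreadPool(50);

    public void onMessage(String callPayload) {
        // Hypothetical task standing in for the real per-call work.
        pool.submit(() -> handleCall(callPayload));
    }

    private void handleCall(String callPayload) {
        // parse the JMS payload and route the taxi call here
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```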
I am implementing a worker pool in Java.
This is essentially a whole load of objects which will pick up chunks of data, process the data and then store the result. Because of IO latency there will be significantly more workers than processor cores.
The server is dedicated to this task and I want to wring the maximum performance out of the hardware (but no, I don't want to implement it in C++).
The simplest implementation would be to have a single Java process which creates and monitors a number of worker threads. An alternative would be to run a Java process for each worker.
Assuming for argument's sake a quad-core Linux server, which of these solutions would you anticipate being more performant, and why?
You can assume the workers never need to communicate with one another.
One process, multiple threads - for a few reasons.
When context-switching between jobs, it's cheaper on some processors to switch between threads than between processes. This is especially important in this kind of I/O-bound case with more workers than cores. The more work you do between getting I/O blocked, the less important this is. Good buffering will pay for threads or processes, though.
When switching between threads in the same JVM, at least some Linux implementations (x86, in particular) don't need to flush the cache. See Tsuna's blog. Cache pollution between threads will be minimized, since they can share the program cache, are performing the same task, and are sharing the same copy of the code. We're talking savings on the order of hundreds of nanoseconds to several microseconds per switch. If that's small potatoes for you, then read on...
Depending on the design, the I/O data path may be shorter for one process.
The startup and warmup time for a thread is generally much shorter. The OS doesn't have to start a process, Java doesn't have to start another JVM, classloading is only done once, JIT-compilation is only done once, and HotSpot optimizations are done once, and sooner.
Usually, when discussing multiprocessing (with one thread per process) versus multithreading in the same process, the theoretical overhead is bigger in the first case than in the latter (so multiprocessing is theoretically slower than multithreading), but in reality on most modern OSes this is not such a big issue. In the Java context, however, starting a new process is a lot more costly than starting a new thread: it means starting up a new instance of the JVM, which is very costly, especially in terms of memory. I recommend that you start multiple threads in the same JVM.
Moreover, since you say inter-thread communication is not an issue, you can use Java's ExecutorService to get a fixed thread pool sized at twice the number of available CPUs. The number of available CPUs can be detected at runtime via Java's Runtime class. This way you get quick, simple multithreading going without much boilerplate code.
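A minimal sketch of that suggestion, assuming the per-chunk work is a simple Runnable (the loop body here is a placeholder for the real fetch/process/store logic):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPool {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Twice as many workers as cores, since they spend much of their time blocked on IO.
        ExecutorService pool = Executors.newFixedThreadPool(2 * cpus);

        for (int i = 0; i < 100; i++) {
            final int chunk = i;
            pool.submit(() -> {
                // fetch chunk, process it, store the result
                System.out.println("processed chunk " + chunk + " on " + Thread.currentThread().getName());
            });
        }
        pool.shutdown();
    }
}
```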
Actually, if you do this with large-scale tasks, using multiple JVM processes is way faster than one JVM with multiple threads. At least we never got one JVM running as fast as multiple JVMs.
We do some calculations where each task uses around 2-3 GB of RAM and does some heavy number crunching. If we spawn 30 JVMs and run 30 tasks, they perform around 15-20% better than spawning 30 threads in one JVM. We tried tuning the GC and the various memory regions and never caught up with the first variant.
We did this on various machines: 14 tasks on a 16-core server, 34 tasks on a 36-core server, etc. Multithreading in Java always performed worse than multiple JVM processes.
It may not make any difference for simple tasks, but for heavy calculations it seems the JVM performs worse with threads.