Same Jetty code, different number of threads

Same Jetty code, different number of threads - java

We are running same Jetty service on two servers but are seeing different number of threads created by both services (50 vs ~100 threads).
Both servers are running identical Java code on RedHat5 (they do have slightly different kernels). Yet Jetty on one of the servers creates more threads than the other one. How is it possible?

Thread counts are dynamic, depends on many many factors.
The number of threads that you see at any one point can vary greatly, based on hardware differences (number of cpu cores, number of network interfaces, etc), kernel differences, java differences, load differences, active user counts, active connection counts, transactions per second, if there are external dependencies (like databases), how async processing is done, how async I/O is done, use of http/2 vs http/1, use of websocket, and even ${jetty.base} configuration differences.
As for the counts you are seeing, 50 vs 100, that's positively tiny for a production server. Many production servers on moderately busy systems can use 500 (java) threads, and on very busy commodity systems its can be in the 5,000+ range. Even on specialized hardware (like an Azul systems devices) its not unheard of to be in the 90,000+ thread range with multiple active network interfaces.

Related

A confusion over a concurrent architecture concept

I have basic idea about concurrency but I have a confusion about the following architecture. I think it is concurrent but my colleague thinks it is not. The architecture is as follows:
I have multiple robots which publish its data to its individual gateways and there's another java service which listens on the gateways. The service creates a new thread to listen to each gateway.
My understanding is that the service is performing concurrent execution but my colleague says this is not concurrent as concurrency involves sharing of hardware.
Appreciate if some one can clarify or elaborate on this topic.

My understanding is that the service is performing concurrent execution but my colleague says this is not concurrent as concurrency involves sharing of hardware.
TL/DR: Words are squishy. That's why we have code.
"Concurrent" simply means two or more things happening at the same time. As it applies to computation, true concurrency means two or more threads of execution running at the same time, which requires separate hardware. That certainly can be separate cores of the same CPU or separate CPUs in the same chassis, so that there is some degree of shared hardware. It can also be separate cores in different chassis, however, such as in a computational cluster, though perhaps this is where your colleague is drawing his line. Such a line would be pretty arbitrary, though.
In contrast, long before it was common for even servers to feature multiple CPU (core)s, many computer systems implemented one flavor or another of multitasking, whereby multiple tasks can all be in progress at the same time by virtue of the operating system allotting slices of CPU time to each and switching them in and out. All modern general-purpose operating systems still do this. On a single core, however, this provides only simulated concurrency, because at any given instant in time, only one computation is actually making progress.
Your colleague does have a point, however, that multiple, spatially distributed robots all operating at the same time without coordination is a bit beyond what people usually mean when they talk about concurrent computation. Certainly such robots are operating concurrently, in the general-use sense of "at the same time", but it's a bit of a stretch to characterize them as participating in a concurrent computation.
The server that allocates a separate thread to handle communication with each robot may thereby be performing a concurrent computation. But as long as we're splitting hairs, do recognize that communication over a single network interface is serialized, so unless your server has multiple network interfaces, the actual communication cannot be truly concurrent. If the server is primarily just recording the data as it arrives, as opposed to incorporating it into an ongoing concurrent computation, then it would be potentially misleading to describe it as performing a concurrent operation.

Even by your colleague's definition, this is a concurrent system since there are multiple threads executing on the hardware on which the service resides.

Threading configuration for microservices written in java/jersey/grizzly

I am designing a micro-services based system. Most of the services are deployed as standalone Jersey processes with an embedded Grizzly web server.
Assuming that many of those services will execute on the same machine, shall I change any threading configuration in Grizzly to prevent a situation of too many threads machine-wide?
What is the default threading model for Grizzly? Is there a limit for number of threads that a single web server can create?

It depends on what you do with the incoming data.
If you need to process the data (cpu time > io time), then you need to match the number of physical cores to the number of data processing threads.
If most of the time is spent in IO (retrieving/storing the data) then you can start with cores * 2 and set the max to something that you must determine through testing the cpu usage and the throughput. I personally like the powers of 4 per core (4, 16, 64, 256). This will quickly narrow you down down onto the order of magnitude.
https://javaee.github.io/grizzly/coreconfig.html#/Thread_Pool_Configuration

How does a single machine share thread-pool?

I noticed that some web frameworks such as Play Framework allows you to configure multiple thread-pools with different sizes (num of threads within it). Let's say we run this play within a single machine with single core. Wouldn't there be a huge overhead by having multiple thread-pools?
For example, smaller thread pool is assuming asynchronous operations vs large thread-pool indicate a lot of blocking calls so threads can context-switch. Both cases is assuming that parallelism factor based on number of cores are in machine. My concern is that processor is further shared.
How does this work?
Thanks!

Play certainly allows you to configure multiple execution contexts (the equivalent of a thread pool), but that does not mean that you should do it, especially if you have a machine with a single core. By default the configuration should be kept low (close to the number of cores) for high-throughput - assuming, of course, that the operations are all non-blocking. If you have blocking operations the idea is to have them run on a separate execution context, as they otherwise lead to the exhaustion of the default request processing ExecutionContext (the request processing pipeline in Play runs on the default ExecutionContext, which is by default limited to a small number of threads).
As to what happens when you have more threads than cores and what happens when you do so highly depends on the operations you're running (in regards to I/O, etc.). One thread per core is supposedly optimal if you only do CPU-bound operations. See also this question.

Java in 2011: threaded sockets VS NIO: what to choose on 64bit OS and latest Java version?

I've read several posts about java.net vs java.nio here on StackOverflow and on some blogs. But I still cannot catch an idea of when should one prefer NIO over threaded sockets. Can you please examine my conclusions below and tell me which ones are incorrect and which ones are missed?
Since in threaded model you need to dedicate a thread to each active connection and each thread takes like 250Kilobytes of memory for it's stack, with thread per socket model you will quickly run out of memory on large number of concurrent connections. Unlike NIO.
In modern operating systems and processors a large number of active threads and context switch time can be considered almost insignificant for performance
NIO throughoutput can be lower because select() and poll() used by asynchronous NIO libraries in high-load environments is more expensive than waking up and putting to sleep threads.
NIO has always been slower but it allows you to process more concurrent connections. It's essentially a time/space trade-off: traditional IO is faster but has a heavier memory footprint, NIO is slower but uses less resources.
Java has a hard limit per concurrent threads of 15000 / 30000 depending on JVM and this will limit thread per connection model to this number of concurrent connections maximum, but JVM7 will have no such limit (cannot confirm this data).
So, as a conclusion, you can have this:
If you have tens of thousands concurrent connections - NIO is a better choice unless a request processing speed is a key factor for you
If you have less than that - thread per connection is a better choice (given that you can afford amount of RAM to hold stacks of all concurrent threads up to maximum)
With Java 7 you may want to go over NIO 2.0 in either case.
Am I correct?

That seems right to me, except for the part about Java limiting the number of threads – that is typically limited by the OS it's running on (see How many threads can a Java VM support? and Can't get past 2542 Threads in Java on 4GB iMac OSX 10.6.3 Snow Leopard (32bit)).
To reach that many threads you'll probably need to adjust the stack size of the JVM.

I still think the context switch overhead for the threads in traditional IO is significant. At a high level, you only gain performance using multiple threads if they won't contend for the same resources as much, or they spend time much higher than the context switch overhead on the resources.
The reason for bringing this up, is with new storage technologies like SSD, your threads come back to contend on the CPU much quicker

There is not a single "best" way to build NIO servers, but the preponderance of this particular question on SO suggests that people think there is! Your question summarizes the use cases that are suited to both options well enough to help you make the decision that is right for you.
Also, hybrid solutions are possible too! You could hand the channel off to threads when they are going to do something worthy of their expense, and stick to NIO when it is better.

I would say start with thread-per-connection and adapt from there if you run into problems.
If you really need to handle a million connections you should consider writing (or finding) a simple request broker in C (or whatever) that will use far less memory per connection than any java implementation can. The broker can receive requests asynchronously and queue them to backend workers written in your language of choice.
The backends thus only need a thread per active request, and you can just have a fixed number of them so the memory and database use is predetermined to some degree. When large numbers of requests are running in parallel the requests are made to wait a bit longer.
Thus I think you should never have to resort to NIO select channels or asynchronous I/O (NIO 2) on 64-bit systems. The thread-per-connection model works well enough and you can do your scaling to "tens or hundreds of thousands" of connections using some more appropriate low-level technology.
It is always helpful to avoid premature optimization (i.e. writing NIO code before you really have massive numbers of connections coming in) and don't reinvent the wheel (Jetty, nginx, etc.) if possible.

What most often is overlooked is that NIO allows zero copy handling. E.g. if you listen to the same multicast traffic from within multiple processes using old school sockets on one single server, any multicast packet is copied from the network/kernel buffer to each listening application. So if you build a GRID of e.g. 20 processes, you get memory bandwidth issues. With nio you can examine the incoming buffer without having to copy it to application space. The process then copies only parts of the incoming traffic it is interested in.
another application example:
see http://www.ibm.com/developerworks/java/library/j-zerocopy/ for an example.

Why do we use multi application server instances on the same server

I guess there is a good reason, but I don't understand why sometimes we put for example 5 instances having the same webapplications on the same physical server.
Has it something to do with an optimisation for a multi processor architecture?
The max allowed ram limit for JVM or something else?

Hmmm... After a long time I am seeing this question again :)
Well a multiple JVM instances on a single machine solves a lot of issues. First of let us face this: Although JDK 1.7 is coming into picture, a lot of legacy application were developed using JDK 1.3 or 1.4 or 1.5. And still a major chunk of JDK is divided among them.
Now to your question:
Historically, there are three primary issues that system architects have addressed by deploying multiple JVMs on a single box:
Garbage collection inefficiencies: As heap sizes grow, garbage collection cycles--especially for major collections--tended to introduce significant delays into processing, thanks to the single-threaded GC. Multiple JVMs combat this by allowing smaller heap sizes in general and enabling some measure of concurrency during GC cycles (e.g., with four nodes, when one goes into GC, you still have three others actively processing).
Resource utilization: Older JVMs were unable to scale efficiently past four CPUs or so. The answer? Run a separate JVM for every 2 CPUs in the box (mileage may vary depending on the application, of course).
64-bit issues: Older JVMs were unable to allocate heap sizes beyond the 32-bit maximum. Again, multiple JVMs allow you to maximize your resource utilization.
Availability: One final reason that people sometimes run multiple JVMs on a single box is for availability. While it's true that this practice doesn't address hardware failures, it does address a failure in a single instance of an application server.
Taken from ( http://www.theserverside.com/discussions/thread.tss?thread_id=20044 )
I have mostly seen weblogic. Here is a link for further reading:
http://download.oracle.com/docs/cd/E13222_01/wls/docs92/perform/WLSTuning.html#wp1104298
Hope this will help you.

I guess you are referring to application clustering.
AFAIK, JVM's spawned with really large heap size have issues when it comes to garbage collection though I'm sure by playing around with the GC algorithm and parameters you can bring down the damage to a minimum. Plus, clustered applications don't have a single point of failure. If one node goes down, the remaining nodes can keep servicing the clients. This is one of the reasons why "message based architectures" are a good fit for scalability. Each request is mapped to a message which can then be picked up by any node in a cluster.
Another point would be to service multiple requests simultaneously in case your application unfortunately uses synchronized keyword judiciously. We currently have a legacy application which has a lot of shared state (unfortunately) and hence concurrent request handling is done by spawning around 20 JVM processes with a central dispatching unit which does all the dispatching work. ;-)

I would suggest you use around least JVM per NUMA region. If a single JVM uses more than one NUMA region (often a single CPU) the performance can degrade significantly, due to a significant increase in the cost of accessing main memory of another CPU.
Additionally using multiple servers can allow you to
use different versions of java or your your applications server.
isolate different applications which could interfere (they shouldn't but they might)
limit GC pause times between services.
EDIT: It could be historical. There may have been any number of reasons to have separate JVMs in the past but since you don't know what they were, you don't know if they still apply and it may be simpler to leave things as they are.

An additional reason to use mutliple instance is serviceability.
For example if you multiple different applications for multiple customers then having seperate instances of the appserver for each application can make life a little easier when you have to do an appserver restart during a release.

Suppose you have a average configuration host and installed single instance of the web/app server. Now your application becomes more popular and number of hits increases 2 fold. What you do now ?
Add one more physical server of same configuration and instal the application and load balance these two hosts.
This is not end of life for your application. Your application will keep on becoming more popular and hence the need to scale it up. What's going to be your strategy ?
keep adding more hosts of same configuration
buy a more powerful machine where you can create more logical application servers
Which option will you go far ?
You will do cost analysis, which will involve factors like- actual hardware cost, Cost of managing these servers (power cost, space occupied in data center) etc.
Apparently, it comes that the decision is not very easy. And in most cases it's more cost effective to have a more powerful machine.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.