Context
I have a metric on a server that publishes the number of threads running at any given time. After a recent deployment, I noticed the thread count rise by about 30 on average (it previously held steady around 370 and now sits consistently at 400).
What I've done
There are many packages/possibilities that could be the root cause of this increase, which is why I started analyzing threads. I learned how to take a thread dump and took one, but I can't find any useful information in it about why these threads were created or how they are used.
My service is not negatively impacted (latency/CPU/memory), but I would still like to find the root cause, as it could point to a memory leak.
My Question
If there is a way to find the class/package that created a thread, that would be very helpful (I have searched online for a while for something like that).
Any advice on finding the root cause is much appreciated!
Find The Source Of Threads
Use Java Executors instead of dealing with Threads directly if you aren't already. If you are using Executors, you can define names for the threads in your thread pool. These names are included in thread dumps of your Java process, so if a thread has one of your custom names you will know which thread pool created it. It's common practice to maintain different thread pools for different types of tasks in your application and to give the threads in each pool different names. This way you can determine how many threads each part of your application is creating when you take a thread dump. See this question for how to define custom thread names.
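As a minimal sketch of the naming idea, a custom `ThreadFactory` can be passed to `Executors.newFixedThreadPool`. The class name `NamedThreadFactory` and the prefix `order-worker` are made up for illustration; use whatever describes your own pools.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: names every thread "<prefix>-<n>" so it is easy to spot in a thread dump.
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // The name given here is what appears in jstack / thread dump output.
        return new Thread(task, prefix + "-" + counter.getAndIncrement());
    }

    public static void main(String[] args) throws Exception {
        // "order-worker" is an assumed pool name for this example.
        ExecutorService pool = Executors.newFixedThreadPool(2, new NamedThreadFactory("order-worker"));
        try {
            Callable<String> whoAmI = () -> Thread.currentThread().getName();
            System.out.println(pool.submit(whoAmI).get()); // prints "order-worker-1"
        } finally {
            pool.shutdown();
        }
    }
}
```

With one factory per pool, counting threads per prefix in a dump tells you which part of the application grew.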
How To Limit Threads
The ThreadPoolExecutor allows you to define the maximum number of Threads in the pool. So if you know all the pools you are using and what their maximum sizes are, you should be able to determine the maximum number of concurrent Threads that can run in your application.
You may have to pay special attention to server, client, and networking libraries though since they probably create their own threads. Libraries typically name their threads as well, so you can probably google an unfamiliar name or stack trace in your thread dump to figure out where it's coming from.
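If reading a full thread dump is too noisy, a rough per-pool count can also be taken from inside the JVM. The sketch below groups live threads by name using `Thread.getAllStackTraces()`; the class name `ThreadCensus` and the regex that strips a trailing number are my own assumptions and may need adjusting to the naming schemes in your application.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts live threads per "family", i.e. thread name without its trailing number.
final class ThreadCensus {
    // "pool-3-thread-1" -> "pool-3-thread", "http-nio-8080-exec-10" -> "http-nio-8080-exec"
    static String family(String threadName) {
        return threadName.replaceAll("[-#_ ]?\\d+$", "");
    }

    static Map<String, Integer> countByFamily() {
        Map<String, Integer> counts = new TreeMap<>();
        // getAllStackTraces() returns a snapshot of all live threads in this JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(family(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByFamily().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Logging this map before and after a deployment shows which name prefix accounts for the extra threads; the stack traces in the same snapshot then show what those threads are doing.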
Related
Users of my application are suffering from OutOfMemoryErrors that are possibly caused by having too many threads. The application code has been carefully reviewed. A thread pool of no more than 2x the number of CPU cores is used for most background tasks. Some modules have dedicated threads for their own use, but the number of such threads is fixed and very limited. However, the application usually has 130+ threads at run-time. In one particular crash report, I even see 400+ threads running on the user's device. Since I'm using some third-party libraries in the application, I would like to investigate their behavior. Most of the running threads are named "pool-xx-thread-1", which is the default name generated by Executors.DefaultThreadFactory. It seems that some library code creates a lot of single-thread pools.
How can I locate the code that creates a lot of threads?
When a single user is accessing an application, multiple threads can be used, and they can run parallel if multiple cores are present. If only one processor exists, then threads will run one after another.
When multiple users are accessing an application, how are the threads handled?
I can speak from a Java perspective. Your question is: "when multiple users are accessing an application, how are the threads handled?"
The answer is that it all depends on how you programmed it. If you are using a web/app container, it provides a thread pool mechanism so that more than one thread can serve user requests. Each user initiates a request, which in turn is handled by one thread, so 10 simultaneous users means 10 threads handling the 10 requests simultaneously. Nowadays we also have non-blocking I/O, where request processing can be offloaded to other threads, allowing fewer than 10 threads to handle 10 users.
Now, if you want to know exactly how thread scheduling is done across CPU cores, that again depends on the OS. One thing is common, though: the thread is the basic unit of allocation to a CPU. Start with green threads here, and you will understand it better.
The incorrect assumption is
If only one processor exists, then threads will run one after another.
How threads are being executed is up to the runtime environment.
With Java, there are some definitions stating that certain parts of your code will not cause synchronisation with other threads and thus will not cause (potential) rescheduling of threads.
In general, the OS is in charge of scheduling units of execution. In former days these entities were mostly processes. Now there may be processes and threads (some systems schedule only at thread level). For simplicity, let's assume the OS deals with threads only.
The OS may then allow a thread to run until it reaches a point where it can't continue, e.g. waiting for an I/O operation to complete. This is good for the thread, as it can use the CPU to the maximum. It is bad for all the other threads that want to get some CPU cycles of their own. (In general there will always be more threads than available CPUs, so the problem is independent of the number of CPUs.) To improve interactive behaviour, an OS might use time slices that allow a thread to run for a certain time. After the time slice has expired, the thread is forcibly removed from the CPU and the OS selects a new thread to run (which could even be the one just interrupted).
This allows each thread to make some progress (at the cost of some scheduling overhead). This way, even on a single-processor system, threads may (seem to) run in parallel.
So for the OS it is not at all important whether a set of threads results from a single user (or even from a single call to a web application) or has been created by a number of users and web calls.
You need to understand the thread scheduler.
In fact, on a single core, the CPU divides its time among multiple threads (the process is not exactly sequential). On multiple cores, two (or more) threads can run simultaneously.
Read the article on threads on Wikipedia.
I recommend Tanenbaum's OS book.
Tomcat uses Java's multi-threading support to serve HTTP requests.
To serve an HTTP request, Tomcat takes a thread from the thread pool. The pool is maintained for efficiency, as creating a thread is expensive.
Refer to the Java documentation on concurrency to read more: https://docs.oracle.com/javase/tutorial/essential/concurrency/
Please see the Tomcat thread pool configuration for more information: https://tomcat.apache.org/tomcat-8.0-doc/config/executor.html
There are two points to cover in answering your question: Thread Scheduling and Thread Communication.
The Thread Scheduling implementation is specific to the operating system. The programmer does not have any control in this regard except for setting the priority of a Thread.
Thread Communication is driven by the program/programmer.
Assume that you have multiple processors and multiple threads. Multiple threads can run in parallel on multiple processors, but how the data is shared and accessed is specific to the program.
You can run your threads in parallel, or you can wait for threads to complete their execution before proceeding further (join, invokeAll, CountDownLatch etc.). The programmer has full control over Thread life cycle management.
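To make the waiting options concrete, here is a small sketch of two of them, `invokeAll` and `CountDownLatch`. The class and method names are invented for this example, and the "work" is a trivial stand-in for real tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WaitForWorkers {
    // Sums 1..n with one task per number; invokeAll blocks until every task has finished.
    static int sumWithInvokeAll(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                tasks.add(() -> value);
            }
            int sum = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get(); // already complete, so get() returns immediately
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Starts n plain threads and blocks until each one has counted the latch down.
    static long awaitWithLatch(int n) {
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            new Thread(done::countDown).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.getCount(); // 0 once all threads have finished
    }

    public static void main(String[] args) {
        System.out.println(sumWithInvokeAll(10)); // prints 55
        System.out.println(awaitWithLatch(3));    // prints 0
    }
}
```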
There is no difference whether you have one user or several. Threads work according to the logic of your program. The processor runs each thread for a certain amount of time and then moves on to the next one. The time is very short, so if there are not too many threads (or different processes) working, the user won't notice it. If the processor uses a 20 ms time slice and there are 1000 runnable threads, then every thread will have to wait about 20 seconds (1000 × 20 ms) for its next turn. Fortunately, many current processors, even with just one core, offer two hardware threads (e.g. via Hyper-Threading) that can run threads in parallel.
In "classic" implementations, all web requests arriving at the same port are first serviced by the same single thread. However, as soon as a request is received (Socket.accept returns), almost all servers immediately fork a new thread or reuse an existing one to complete the request. Some specialized single-user servers and some advanced next-generation servers like Netty may not.
The simple (and common) approach is to pick a new thread, or reuse an existing one, for the whole duration of a single web request (GET, POST, etc.). After the request has been served, the thread will likely be reused for another request that may belong to the same or a different user.
However, it is entirely possible to write custom server code that binds a particular thread to the web requests of a logged-in user or an IP address and reuses it for them. This may be difficult to scale. I think standard simple servers like Tomcat typically do not do this.
I just want to ask a rookie question: How to set an appropriate thread number for my thread pool on the server side?
Are there any general rules or formulas I can follow?
What are the issues I have to consider? For example, the number of network requests per second, the number of CPU cores, the CPU and memory usage rate in my application, the hardware I use on my server, etc.
Well, basically the size of the pool should be set to the maximum number of tasks that can execute concurrently on your hardware. For example, if you have 4 cores (without Hyper-Threading), you can set it to 4; with Hyper-Threading, you can set it to 8.
There are, however, questions such as: what is the expected behaviour of the application if it wants to get a thread from the pool but the pool is empty (say you had 8 threads in the pool, every single one of them is working on a video encoding job for the next 10 minutes, and you get a new request in your manager thread)?
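One possible answer to the "pool is empty" question is a bounded queue plus a rejection policy. The sketch below is only one option among several: it sizes the pool from `Runtime.availableProcessors()` (which reports logical CPUs, so hyper-threads are counted) and uses `ThreadPoolExecutor.CallerRunsPolicy`, so that when all threads are busy and the queue is full, the submitting thread runs the task itself. The class name `BoundedPool` and the queue capacity of 100 are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPool {
    // Fixed-size pool with a bounded queue; overflow work is run by the caller instead of being dropped.
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                        // core size == max size, i.e. a fixed pool
                0L, TimeUnit.MILLISECONDS,               // keep-alive is irrelevant for a fixed pool
                new ArrayBlockingQueue<>(queueCapacity), // at most queueCapacity tasks wait in line
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = create(cpus, 100); // 100 is an arbitrary example capacity
        System.out.println("pool size = " + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

Caller-runs slows the producer down under overload, which acts as simple back-pressure. If rejecting the request outright is preferable, `ThreadPoolExecutor.AbortPolicy` (the default) throws a `RejectedExecutionException` instead.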
You should consider, however, that it is NOT guaranteed that all your threads will be running at every moment, even if your application handles threading perfectly, as other applications are running on your computer in the meantime (your OS, for example), and they need CPU as well.
On the other hand, it is also a big question what the threads in your pool actually do. You provided no information about what this thread pool is used for: whether it is in your own app, or you want to configure an open-source or commercial app, etc. Creating and managing threads has serious costs (scheduling, context switching, etc.), which may be worth paying only if your threads stay alive long enough (that is, you can give them enough work to do).
For further details, a good starting point on this subject could be Google, I guess, with the following keywords: "scheduling, concurrency, threads, java executor service, hyperthreading".
I am involved in a project where multithreading is used. Around 4-5 threads are spawned for every call (the system was developed for a taxi call center). The issue is that, after reading the information in the JMS queue, a new thread has to be spawned, and this is not happening. The problem occurs randomly. I earlier posted a similar question on StackOverflow, where I was advised to do load injection.
After studying load injection, I felt it was not feasible to run such a test on my development server, as my system is accessed from a call flow which controls user access. I spent some time studying JVM tuning and thread pooling. This particular system processes approximately 14K-15K calls/day, and during peak hours the queue gets very long (it might hit 400-500 waiting calls); for each call, around 4-5 threads have to be spawned. In the logs I don't see anything like an OutOfMemoryError. Is there any other reason that might stop threads from being spawned?
My JVM configuration is -Xms128m -Xmx1024m.
The environment is Windows Server 32-bit with 4 GB of RAM.
Will setting the thread stack size help the threads spawn without any hindrance?
I am also studying the feasibility of thread pooling. If I spawn a fixed number of threads, will it impact the system's overall performance?
Creating a thread is a very expensive operation and uses a lot of system resources. Most importantly, each thread needs a lot of memory for its stack (often cited as 512 kB by default, though the actual default depends on the platform and JVM). If you create new threads excessively, you will run into all sorts of problems. A JVM can typically only support a couple of thousand threads, depending on the operating system, the -XX:ThreadStackSize setting and the free memory.
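A back-of-the-envelope estimate shows where the "couple of thousand" limit comes from. The numbers below are assumptions, not measurements: a 2 GB user address space (the usual default for a 32-bit Windows process), the 1024 MB maximum heap from the question, a guessed 256 MB for JVM code and other native memory, and the 512 kB stack size mentioned above.

```java
final class StackBudget {
    // Rough upper bound: address space left after heap and other native memory, divided by the stack size.
    static long maxThreads(long addressSpaceMb, long heapMb, long otherMb, long stackKb) {
        long freeKb = (addressSpaceMb - heapMb - otherMb) * 1024;
        return Math.max(0, freeKb / stackKb);
    }

    public static void main(String[] args) {
        // All four figures are illustrative assumptions, see the text above.
        System.out.println(maxThreads(2048, 1024, 256, 512)); // prints 1536
    }
}
```

Under these assumptions, 400-500 queued calls at 4-5 threads each (1600-2500 threads) would already exceed the budget, which is consistent with thread creation failing at peak load. A smaller stack size or a smaller heap raises the bound, but a pool avoids the problem altogether.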
Thread pooling will not make your performance worse, it will make it better. So you should definitely go that way. If your thread pool size is too small, you might have some liveness problems, but that is easy to tune.
Maybe changes in the architecture can help solve the problem. I'd try thread pooling because of its efficiency, but on its own it is not guaranteed to solve the problem. It is possible that you'll need to reconsider whether all the spawned threads are really needed (having multiple threads compete for a single resource hurts performance) and tune the size of the pools. Look at Executor; it could help you with some of these changes.
Recently, I've been working on the deployment of concurrent objects onto multicore. In a sample, I use the BlockingQueue.take() method, whose specification says that it blocks. This means the method does not release the enclosing thread's resources so that the thread could be re-used for other concurrent tasks. Being able to release them would be useful, since the total number of live threads in a JVM instance is limited, and if the application needs thousands of live threads, it is vital to be able to re-use suspended threads. On the other hand, the JVM uses a 1:1 mapping from application-level threads to OS-level threads; i.e. each Java Thread instance becomes an underlying OS-level thread.
The current solution is based on java.util.concurrent in Java 1.5+. Still, we need worker threads that scale to a large number. Now, I am interested in finding answers to the following:
Is there any way to replace the implementation of java.lang.Thread in JVM such that I can plug my own Thread implementation?
Is this only possible through tweaking C++ sections of the thread implementation in JVM and recompiling it?
Is there any library to provide a way to replace the classical thread in Java?
Again, along the same lines, is there a library or a way to control how several threads in Java can be mapped to only one thread at the OS level?
I also found this discussing different implementations of JVM and I am not sure if they could help.
Thanks for your comments and ideas in advance.
If you are creating thousands of threads, you're doing it wrong.
Instead, consider using the Executor framework. (Start with the Executors and ThreadPoolExecutor classes.) They allow you to queue thousands of tasks while having a sane number of threads handling them.
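As a rough illustration, the sketch below pushes ten thousand small tasks through a pool of four threads and records which threads actually ran them. The class name, the task count and the pool size are arbitrary choices for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ManyTasksFewThreads {
    // Runs taskCount tasks on poolSize threads and returns how many distinct threads did the work.
    static int distinctThreadsUsed(int taskCount, int poolSize) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Tasks beyond poolSize simply wait in the executor's queue.
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // no new tasks; queued ones still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return names.size();
    }

    public static void main(String[] args) {
        // Never more than 4, no matter how many tasks are queued.
        System.out.println(distinctThreadsUsed(10_000, 4));
    }
}
```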
I guess this approach is what you meant by "library to replace the classical threads". I highly recommend you look into executors.
One caveat: Executors, by default, use non-daemon threads. Therefore, you must shut down your executor when you're done with it. You can do this at program exit, if there is a normal way to exit your program that doesn't simply involve waiting for all threads to finish. :-)
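A common shutdown pattern, roughly following the one described in the `ExecutorService` documentation, looks like this. The helper name `shutdownGracefully` is made up, and the timeout is something you would pick for your own application.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PoolShutdown {
    // Returns true if the pool terminated within the given time (possibly after a forced shutdown).
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // stop accepting new tasks; already queued tasks still run
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // interrupt running tasks and discard queued ones
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(() -> System.out.println("working"));
        System.out.println(shutdownGracefully(pool, 5, TimeUnit.SECONDS)); // prints true after "working"
    }
}
```

Without a call like this, the pool's non-daemon worker thread would keep the JVM alive after `main` returns.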