Guidelines for configuring max threads in Play Framework (Java)

We use Play Framework 1.x.
We haven't touched the thread pool size, so we use the default value, which is (number of processors + 1). Our production server has a 4-core processor, so I assume 5 threads at a time.
For our use case we need at least 100 requests served at a time. Can we increase the thread pool size to 100? Will it cause any issues?

In my project, we use a thread pool of about 30 threads to serve about 100 concurrent requests. Play 1.x processes requests very quickly, so threads are released before the next request needs processing.
But you should load-test your code... I don't think increasing the thread pool to 100 is a good idea.
By the way, you should use async jobs to implement your application, as Play recommends: http://www.playframework.com/documentation/1.2.7/asynchronous
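For illustration, here is a minimal sketch of that approach against the Play 1.2.x API (the controller and job names are made up; check the linked docs for details):

    import play.jobs.Job;
    import play.libs.F;
    import play.mvc.Controller;

    public class Reports extends Controller {

        public static void report() {
            // now() schedules the job on Play's job pool and returns a Promise
            F.Promise<String> promise = new SlowReportJob().now();
            // await() suspends the HTTP request via continuations, so the
            // request thread is freed while the job runs
            String result = await(promise);
            renderText(result);
        }
    }

    class SlowReportJob extends Job<String> {
        @Override
        public String doJobWithResult() throws Exception {
            Thread.sleep(5000); // stands in for a slow computation or remote call
            return "done";
        }
    }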

Play is built around the idea of handling short requests as fast as possible, and therefore being able to keep the thread pool as small as possible. The main reason for wanting a small pool is to keep resource consumption low rather than wasteful.
Play and Java can happily run with a larger thread pool, like 100 or 1000 (although your server might not always support it; some Linux distributions, for example, have a fixed limit on threads per application per user), but it is recommended to analyze your problem and see if you really need such a big pool.
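For reference, Play 1.x controls this pool size in conf/application.conf; a sketch (verify the key against your version's documentation):

    # conf/application.conf (Play 1.x)
    # Default in prod mode is (number of processors + 1)
    play.pool=100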
In most situations, needing a big pool means either that you have too many blocking threads and should look into Play's async features, or that you have an action that tries to do too many things at once and would perform better chopped into smaller pieces.
If a request results in a long-blocking thread on the server, it usually also results in a long-blocked interface on the user's end.

Related

Scaling a Spring Integration web application

There is a stateless REST application/API, written and maintained by me using the Spring Integration API, with the following underlying concepts working hand in hand:
1) Inbound HTTP gateway as the RESTful entrypoint
2) A handful of Service Activators, Routers, Channels and Transformers
3) A Splitter (and an Aggregator), with the former subscribed to a channel which, in turn, has a task executor wired in comprising a thread pool of size 100 for parallelised execution of the split messages
The application is performing seamlessly so far. As the next step, my attempt is to scale this application to handle a higher number of requests, in order to accommodate a worst-case situation where all 100 threads in the pool are occupied at the exact same time.
Please note that the behaviour of the service is always meant to be synchronous (this is a business need) and there are times when the service can be a slightly long-running one. The worst-case roundtrip is ~15 seconds and the best case is ~2 seconds, both of which are within acceptable limits for the business team.
The application server at hand is WebSphere 8.5 in a multi-instance clustered environment and there is a provision to grow the size of the cluster as well as the power of each instance in terms of memory and processor cores.
That said, I am exploring ways to solve the problem of scaling the application within the implementation layer and these are a couple of ways I could think of:
1) Increase the size of the task executor thread pool many times over, say to 1000 or 10000 instead of 100, to accommodate a higher number of parallel requests.
2) Keep the size of the task executor thread pool intact and instead scale up by using some Spring code to convert the single application context into a pool of contexts, so that each request can grab an available one and every context has full access to its own thread pool.
Example: A pool of 250 application contexts with each context having a thread pool of size 100, facilitating a total of 250 × 100 = 25000 threads in parallel.
The 2nd approach may lead to high memory consumption, so I am thinking I should start with approach 1).
However, what I am not sure of is if either of the approaches is practical in the long run.
Can anyone kindly throw some light? Thanks in advance for your time.
Sincerely,
Bharath
In my experience, it is very easy to hit a roadblock when scaling up. In contrast, scaling out is more flexible but adds complexity to the system.
"The application server at hand is WebSphere 8.5 in a multi-instance clustered environment and there is a provision to grow the size of the cluster as well as the power of each instance in terms of memory and processor cores."
I would continue in this direction (scaling out by adding instances to the cluster) and, if possible, add a load-balancing mechanism in front of it. Start by distributing the load randomly, then enhance it by distributing the load based on "free threads in the instance's pool".
Moreover, identify the heavier portions of the system and evaluate whether you would gain anything by migrating them to their own dedicated services.
"Please note that the behaviour of the service is always meant to be synchronous (this is a business need) and there are times when the service can be a slightly long-running one."
The statement above raises some eyebrows. I understand when the business says "only return the results when everything is done". If that is the case, then this system would benefit a lot if you could change the paradigm from synchronous request/response to an Observer pattern.
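A minimal sketch of that paradigm shift in plain Java (all names here are hypothetical; in Spring you would more likely use an async gateway or messaging): the caller registers an observer and the service pushes the result when the work completes, instead of holding a request thread for the full 2-15 seconds.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    interface ResultObserver {
        void onCompleted(String result);
    }

    class ReportService {
        private final ExecutorService pool = Executors.newFixedThreadPool(100);

        void process(String request, ResultObserver observer) {
            pool.submit(() -> {
                String result = "processed: " + request; // the long-running part
                observer.onCompleted(result);            // push the result back
            });
        }
    }

The caller's thread only registers the observer and returns immediately; the notification can then be delivered via a callback URL, a message queue, or similar.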

Multithreading - multiple users

When a single user is accessing an application, multiple threads can be used, and they can run in parallel if multiple cores are present. If only one processor exists, then threads will run one after another.
When multiple users are accessing an application, how are the threads handled?
I can talk from a Java perspective, so your question becomes: "when multiple users are accessing an application, how are the threads handled?"
The answer is that it all depends on how you programmed it. If you are using a web/app container, it provides a thread-pool mechanism where you can have more than one thread to serve user requests. One request is initiated per user, and each request is in turn handled by one thread, so if there are 10 simultaneous users there will be 10 threads handling the 10 requests simultaneously. Nowadays we also have non-blocking I/O, where request processing can be offloaded to other threads, allowing fewer than 10 threads to handle 10 users.
Now, if you want to know exactly how thread scheduling is done around the CPU cores, that again depends on the OS. One thing is common though: the thread is the basic unit of allocation to a CPU. Start with green threads, and you will understand it better.
The incorrect assumption is
If only one processor exists, then threads will run one after another.
How threads are executed is up to the runtime environment.
With Java, there are definitions that certain parts of your code will not cause synchronisation with other threads and thus will not cause (potential) rescheduling of threads.
In general, the OS is in charge of scheduling units of execution. In former days such entities were mostly processes; nowadays there may be processes and threads (some OSes schedule only at the thread level). For simplicity, let's assume the OS deals with threads only.
The OS may then allow a thread to run until it reaches a point where it can't continue, e.g. waiting for an I/O operation to complete. This is good for that thread, as it can use the CPU to the maximum, but bad for all the other threads that want CPU cycles of their own. (In general there will always be more threads than available CPUs, so the problem is independent of the number of CPUs.) To improve interactive behaviour, an OS might use time slices that allow a thread to run for a certain time. After the time slice expires, the thread is forcibly removed from the CPU and the OS selects a new thread to run (which could even be the one just interrupted).
This allows each thread to make some progress (at the cost of some scheduling overhead). This way, even on a single-processor system, threads may (seem to) run in parallel.
So for the OS it is not at all important whether a set of threads results from a single user (or even from a single call to a web application) or has been created by a number of users and web calls.
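A small demonstration of this (a sketch; the exact interleaving depends on your OS scheduler): start more busy threads than you have cores and watch their output interleave as each one gets a time slice.

    public class SliceDemo {
        public static void main(String[] args) {
            int threads = Runtime.getRuntime().availableProcessors() * 4;
            for (int i = 0; i < threads; i++) {
                final int id = i;
                new Thread(() -> {
                    for (int tick = 0; tick < 3; tick++) {
                        // output from the different threads interleaves, showing
                        // that all of them make progress even with fewer cores
                        System.out.println("thread " + id + " tick " + tick);
                    }
                }).start();
            }
        }
    }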
You need to understand the thread scheduler.
In fact, on a single core, the CPU divides its time among multiple threads (the processing is not exactly sequential). On multiple cores, two (or more) threads can run simultaneously.
Read the thread article on Wikipedia.
I recommend Tanenbaum's OS book.
Tomcat uses Java's multi-threading support to serve HTTP requests.
To serve an HTTP request, Tomcat starts a thread from its thread pool. The pool is maintained for efficiency, because thread creation is expensive.
Refer to the Java documentation on concurrency to read more: https://docs.oracle.com/javase/tutorial/essential/concurrency/
See the Tomcat thread pool configuration for more information: https://tomcat.apache.org/tomcat-8.0-doc/config/executor.html
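As a sketch of what that configuration looks like (attribute values are illustrative; see the linked page for the full attribute list), a shared executor is declared in server.xml and referenced from a connector:

    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
              maxThreads="150" minSpareThreads="4"/>

    <Connector executor="tomcatThreadPool"
               port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"/>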
There are two points to answering your question: thread scheduling and thread communication.
The thread scheduling implementation is specific to the operating system. The programmer has no control in this regard, except for setting a thread's priority.
Thread communication is driven by the program/programmer.
Assume you have multiple processors and multiple threads. Multiple threads can run in parallel on multiple processors. But how data is shared and accessed is specific to the program.
You can run your threads in parallel, or you can wait for threads to complete execution before proceeding further (join, invokeAll, CountDownLatch, etc.). The programmer has full control over thread lifecycle management.
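For example, a minimal sketch of the "wait for threads to complete before proceeding" style, using CountDownLatch:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class LatchExample {
        public static void main(String[] args) throws InterruptedException {
            int tasks = 4;
            CountDownLatch done = new CountDownLatch(tasks);
            ExecutorService pool = Executors.newFixedThreadPool(tasks);
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> {
                    try {
                        // ... do the actual work here ...
                    } finally {
                        done.countDown(); // signal that this task has finished
                    }
                });
            }
            done.await(); // block until all four tasks are done
            pool.shutdown();
        }
    }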
There is no difference whether you have one user or several. Threads work according to the logic of your program. The processor runs every thread for a certain amount of time and then moves on to the next one. The time slice is very short, so if there are not too many threads (or other processes) working, the user won't notice it. If the processor uses a 20 ms slice and there are 1000 threads, then every thread has to wait almost 20 seconds for its next turn. Fortunately, current processors, even with just one core, often have two logical processing units (via hyper-threading) which can be used for parallel threads.
In "classic" implementations, all web requests arriving to the same port are first serviced by the same single thread. However as soon as request is received (Socket.accept returns), almost all servers would immediately fork or reuse another thread to complete the request. Some specialized single user servers and also some advanced next generation servers like Netty may not.
The simple (and common) approach would be to pick or reuse a new thread for the whole duration of the single web request (GET, POST, etc). After the request has been served, the thread likely will be reused for another request that may belong to the same or different user.
However it is fully possible to write the custom code for the server that binds and then reuses particular thread to the web request of the logged in user, or IP address. This may be difficult to scale. I think standard simple servers like Tomcat typically do not do this.
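A bare-bones sketch of the accept-then-dispatch model described above (no HTTP parsing, just the threading structure; the pool size is illustrative):

    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TinyServer {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(10);
            try (ServerSocket server = new ServerSocket(8080)) {
                while (true) {
                    Socket client = server.accept();   // single accepting thread
                    pool.submit(() -> handle(client)); // pooled worker per request
                }
            }
        }

        private static void handle(Socket client) {
            // read the request and write a response here, then close the socket
            try {
                client.close();
            } catch (java.io.IOException ignored) {
            }
        }
    }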

How does a single machine share thread-pool?

I noticed that some web frameworks, such as Play Framework, allow you to configure multiple thread pools with different sizes (number of threads in each). Let's say we run Play on a single machine with a single core. Wouldn't there be a huge overhead in having multiple thread pools?
For example, a smaller thread pool assumes asynchronous operations, whereas a large thread pool indicates a lot of blocking calls, so threads can context-switch. Both cases assume a parallelism factor based on the number of cores in the machine. My concern is that the processor is then shared even further.
How does this work?
Thanks!
Play certainly allows you to configure multiple execution contexts (the equivalent of a thread pool), but that does not mean you should do it, especially if you have a machine with a single core. By default, the configuration should be kept low (close to the number of cores) for high throughput, assuming, of course, that the operations are all non-blocking. If you have blocking operations, the idea is to run them on a separate execution context, as they otherwise lead to the exhaustion of the default request-processing ExecutionContext (the request-processing pipeline in Play runs on the default ExecutionContext, which is limited to a small number of threads by default).
As to what happens when you have more threads than cores: it depends highly on the operations you're running (with regard to I/O, etc.). One thread per core is supposedly optimal if you only do CPU-bound operations. See also this question.
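Outside of Play's own configuration, the underlying idea can be sketched in plain Java: a small pool sized to the cores for CPU-bound work, plus a separate, larger pool that absorbs blocking calls so they cannot starve the first one (pool sizes here are illustrative):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class Pools {
        // small pool for non-blocking, CPU-bound work: one thread per core
        static final ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // larger pool dedicated to blocking calls (JDBC, legacy I/O, ...)
        static final ExecutorService blockingPool = Executors.newFixedThreadPool(100);

        public static void main(String[] args) {
            cpuPool.submit(() -> { /* parse, transform, compute */ });
            blockingPool.submit(() -> { /* blocking database or file call */ });
        }
    }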

How to set an appropriate thread number for my thread pool on the server side?

I just want to ask a rookie question: How to set an appropriate thread number for my thread pool on the server side?
Are there any general rules or formulas I can follow?
What are the issues I have to consider? For example, the number of network requests per second, the number of CPU cores, the CPU and memory usage rate in my application, the hardware I use on my server, etc.
Well, basically the size of the pool should be set to the maximum number of commands that can execute concurrently on your configuration: if you have 4 cores (without Hyper-Threading), you can set it to 4; with Hyper-Threading, you can set it to 8.
There are, however, questions like: what is the expected behaviour of the application if it wants to get a thread from the pool, but the pool is empty (say you had 8 threads in the pool, every single one of them is working on a video-encoding job for the next 10 minutes, and you get a new request in your manager thread)?
You should consider, however, that it is NOT guaranteed that all your threads will be running at every moment, even if your application handles threading perfectly, as other applications are running on your computer in the meantime (your OS, for example), and they need CPU as well.
On the other hand, it is also a big question what a thread in your pool actually does. You provided no information about what this thread pool is used for: are the threads used in your own app, or do you want to configure an open-source app, a commercial app, etc.? Creating and managing threads has serious costs (scheduling, context switching, etc.), which are only worth paying if your threads stay alive long enough (i.e., you can provide them enough work).
For further details, a quite good starting point on this subject could be Google, I guess, with the following keywords: "scheduling, concurrency, threads, java executor service, hyperthreading".
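As for general formulas: one widely cited heuristic, from Brian Goetz's "Java Concurrency in Practice", derives the pool size from the core count, the target CPU utilization, and the ratio of wait time to compute time. A sketch; the utilization and ratio below are assumed values you would measure by profiling:

    public class PoolSize {
        public static void main(String[] args) {
            // Nthreads = Ncpu * Ucpu * (1 + W/C)
            int ncpu = Runtime.getRuntime().availableProcessors();
            double targetUtilization = 0.8; // fraction of CPU the pool may use
            double waitComputeRatio = 4.0;  // measured wait time / compute time
            int poolSize = (int) Math.ceil(ncpu * targetUtilization * (1 + waitComputeRatio));
            System.out.println("suggested pool size: " + poolSize);
        }
    }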

How to decide the suitable number of threads to create in java?

I have a Java application that creates SSL sockets to remote hosts. I want to employ threads to speed up the process.
I want the maximum possible utilization that does not hurt the program's performance. How can I decide the suitable number of threads to use? After running the following line: Runtime.getRuntime().availableProcessors(); I got 4. My machine has an Intel Core i7 processor and 8 GB of RAM.
If you have 4 cores, then in theory you should have exactly four worker threads going at any given time for maximum optimization. Unfortunately, what happens in theory never quite happens in practice. You may have worker threads that, for whatever reason, have significant amounts of downtime. Perhaps they're hitting the web for more data, or reading from disk, much of which is just waiting and not utilizing the CPU.
Depending on how much waiting you're doing, you'll want to bump the number up. The cost of increasing the number of threads is more context switching and competition for resources. The benefit is that you'll have another thread ready to work when one of the other threads has to wait for something.
Your best bet is to set it to something (let's start with 4) and work your way up. Profile your code with each setting, and see whether your benchmarks go up or down. You should soon see a pattern and a break-even point.
When it comes to optimization, you can theorize all you want about what should be fastest, but you won't beat actually running and timing your code to truly answer this question.
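A sketch of that measurement loop (the task body is a stand-in; substitute your real SSL workload):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolBenchmark {
        public static void main(String[] args) throws Exception {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < 200; i++) {
                tasks.add(() -> {
                    Thread.sleep(10); // stand-in for mixed I/O and CPU work
                    return null;
                });
            }
            // try doubling pool sizes and time each run to find the break-even point
            for (int size = 4; size <= 64; size *= 2) {
                ExecutorService pool = Executors.newFixedThreadPool(size);
                long start = System.nanoTime();
                pool.invokeAll(tasks); // blocks until every task has completed
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println(size + " threads: " + millis + " ms");
                pool.shutdown();
            }
        }
    }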
As DarthVader said, you can use a thread pool (CachedThreadPool). With this construct you don't have to specify a concrete number of threads.
From the Oracle site:
The newCachedThreadPool method creates an executor with an expandable thread pool. This executor is suitable for applications that launch many short-lived tasks.
Maybe that's what you are looking for.
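A minimal sketch of that:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CachedPoolExample {
        public static void main(String[] args) {
            // grows on demand and reuses idle threads (reaped after 60s idle)
            ExecutorService pool = Executors.newCachedThreadPool();
            for (int i = 0; i < 100; i++) {
                final int id = i;
                pool.submit(() -> System.out.println("short-lived task " + id));
            }
            pool.shutdown();
        }
    }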
Regarding the number of cores, it's hard to say. You have 4 hyper-threaded cores; you should leave at least one core for your OS. I would say 4-6 threads.
