Assume we have a computer with four physical cores and we want to decrease the latency of a task; the best number of threads to do that would be 4.
But in a web application we run behind an application server (or servlet container) such as Tomcat, Jetty, or Netty, and I think that, for throughput reasons, the application server already uses those 4 threads itself.
In that case, if we want to use 4 threads to decrease the latency of a task, but those 4 threads are already being used by the application server, then multithreading cannot give us much benefit. Is that true for web applications?
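For context, here is a minimal sketch of what "sizing the pool to the cores" looks like for a CPU-bound task outside of an application server (the splitIntoChunks helper and the workload are made up purely for illustration):

```java
import java.util.List;
import java.util.concurrent.*;

public class CpuBoundTask {
    public static void main(String[] args) throws Exception {
        // Size the pool to the available cores, since the work is CPU-bound.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Split one latency-sensitive task into `cores` independent chunks
        // (splitIntoChunks is a hypothetical helper for this sketch).
        List<Callable<Long>> chunks = splitIntoChunks(cores);

        long total = 0;
        for (Future<Long> f : pool.invokeAll(chunks)) {
            total += f.get();   // waiting for all chunks = the latency of the whole task
        }
        System.out.println("result = " + total);
        pool.shutdown();
    }

    // Hypothetical workload: each chunk burns some CPU and returns a partial result.
    private static List<Callable<Long>> splitIntoChunks(int n) {
        return java.util.stream.IntStream.range(0, n)
                .<Callable<Long>>mapToObj(i -> () -> {
                    long sum = 0;
                    for (long j = 0; j < 50_000_000L; j++) sum += j % (i + 1);
                    return sum;
                })
                .collect(java.util.stream.Collectors.toList());
    }
}
```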
Thank you so much in advance.
We have the scenario that we want to deploy ~300 Java applications for an in-house use case in a Kubernetes cluster. A lot of them are used only 4 times a year, and for the rest of the year they are just wasting RAM.
To reduce the memory footprint we're currently discussing the following options:
Using a Kubernetes built-in mechanism which starts the container when a request arrives. After a timeout (e.g. 10 hours) the container is suspended/hibernated.
Offloading the RAM to disk (for specific containers) is allowed too.
Starting the containers via a "proxy web page": the user first has to log in to a web app, where they search for and select the desired application. The application is then started on demand (perhaps by a kubectl command in the background, as in the sketch below).
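To make option #3 more concrete, here is a rough sketch of what the proxy web app could do when a user selects an application. The deployment and namespace names are hypothetical, and in a real setup you would probably use a Kubernetes client library and check pod readiness instead of shelling out to kubectl:

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of option #3: a proxy web app scales a (scaled-to-zero) Deployment
 * back up on demand. Names are illustrative only.
 */
public class OnDemandStarter {

    public void startApplication(String deploymentName) throws IOException, InterruptedException {
        // Equivalent of running: kubectl scale deployment <name> --replicas=1 -n inhouse-apps
        Process p = new ProcessBuilder(
                "kubectl", "scale", "deployment", deploymentName,
                "--replicas=1", "-n", "inhouse-apps")
                .inheritIO()
                .start();

        if (!p.waitFor(30, TimeUnit.SECONDS) || p.exitValue() != 0) {
            throw new IOException("Failed to scale up " + deploymentName);
        }
        // After this, the proxy would poll the application's readiness endpoint
        // and then redirect the user to its URL.
    }
}
```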
Does someone have this special use case, too?
We're starting this project right now, so other options are helpful too. Only Java as the development language is fixed.
Is there a built-in solution in Kubernetes to reduce the memory footprint?
Is our option #3 really a "good" solution?
There is this stateless REST application/API written and being maintained by me using Spring Integration API with the following underlying concepts working hand-in-hand:
1) Inbound HTTP gateway as the RESTful entrypoint
2) A handful of Service Activators, Routers, Channels and Transformers
3) A Splitter (and an Aggregator), with the former subscribed to a channel which in turn has a task executor wired in, comprising a thread pool of size 100 for parallelised execution of the split messages
The application is performing seamlessly so far - as the next step, my attempt is to scale this application to handle a higher number of requests in order to accommodate a worst case situation where all 100 threads in the pool are occupied at the exact same time.
Please note that the behaviour of the service is always meant to be synchronous (this is a business need) and there are times when the service can be a slightly long-running one. The worst-case roundtrip is ~15 seconds and the best case is ~2 seconds, both of which are within acceptable limits for the business team.
The application server at hand is WebSphere 8.5 in a multi-instance clustered environment and there is a provision to grow the size of the cluster as well as the power of each instance in terms of memory and processor cores.
That said, I am exploring ways to solve the problem of scaling the application within the implementation layer and these are a couple of ways I could think of:
1) Increase the size of the task executor thread pool by many times, say, to 1000 or 10000 instead of 100 to accommodate a higher number of parallel requests.
2) Keep the size of the task executor thread pool intact and instead, scale-up by using some Spring code to convert the single application context into a pool of contexts so that each request can grab one that is available and every context has full access to the thread pool.
Example: A pool of 250 application contexts with each context having a thread pool of size 100, facilitating a total of 250 × 100 = 25000 threads in parallel.
The 2nd approach may lead to high memory consumption, so I am thinking I should start with approach 1); a rough sketch of that option follows below.
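As a hedged illustration of approach 1), this is roughly what growing the splitter's executor might look like with Spring's ThreadPoolTaskExecutor. The bean name and the sizes are assumptions for the sketch; the real numbers have to come from load testing, and a huge pool of mostly idle threads still costs stack memory (roughly 1 MB per thread by default):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfig {

    @Bean(name = "splitterTaskExecutor")   // hypothetical bean name
    public ThreadPoolTaskExecutor splitterTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(100);     // current size from the question
        executor.setMaxPoolSize(1000);     // approach 1): allow growth under load
        executor.setQueueCapacity(500);    // buffer bursts instead of spawning threads immediately
        executor.setThreadNamePrefix("split-");
        return executor;
    }
}
```

Note that, like the underlying ThreadPoolExecutor, extra threads beyond the core size are only created once the queue is full, so the queue capacity and max size have to be tuned together.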
However, what I am not sure of is if either of the approaches is practical in the long run.
Can anyone kindly throw some light? Thanks in advance for your time.
Sincerely,
Bharath
In my experience, it is very easy to hit a road block when scaling up. In contrast, scaling out is more flexible but adds complexity to the system.
"The application server at hand is WebSphere 8.5 in a multi-instance clustered environment and there is a provision to grow the size of the cluster as well as the power of each instance in terms of memory and processor cores."
I would continue in this direction (scaling out by adding instances to the cluster); if possible, I would add a load-balancing mechanism in front of it. Start by distributing the load randomly and enhance it by distributing the load by "free threads in the instance's pool".
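A minimal sketch of that "free threads" routing idea; InstanceStats is a hypothetical snapshot the balancer would obtain from each cluster member (e.g. via JMX), and the balancer simply routes the next request to the instance with the most idle threads:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LeastBusyBalancer {

    /** Hypothetical per-instance snapshot: pool size and currently active threads. */
    public record InstanceStats(String url, int poolSize, int activeThreads) {
        int freeThreads() {
            return poolSize - activeThreads;
        }
    }

    /** Pick the instance with the most free worker threads. */
    public Optional<String> pickInstance(List<InstanceStats> instances) {
        return instances.stream()
                .max(Comparator.comparingInt(InstanceStats::freeThreads))
                .map(InstanceStats::url);
    }
}
```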
Moreover, identify the heavier portions of the systems and evaluate if you would gain anything by migrating them to their own dedicated services.
"Please note that the behaviour of the service is always meant to be synchronous (this is a business need) and there are times when the service can be a slightly long-running one."
The statement above raises some eyebrows. I understand when the business says "only return the results when everything is done". If that is the case then this system would benefit a lot if you could change the paradigm from a synchronous request/response to an Observer Pattern.
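To show the shape of that change, here is a hedged sketch of an observer-style flow: the caller submits the work, gets an acknowledgement immediately, and is notified (callback, webhook, message, ...) when the long-running processing finishes. AsyncProcessing and ResultListener are hypothetical names used only to illustrate the idea:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class AsyncProcessing {

    /** The "observer" that gets notified when processing completes. */
    interface ResultListener {
        void onCompleted(String requestId, String result);
    }

    private final Executor workers;

    public AsyncProcessing(Executor workers) {
        this.workers = workers;
    }

    public String submit(String requestId, String payload, ResultListener listener) {
        CompletableFuture
                .supplyAsync(() -> process(payload), workers)              // long-running step
                .thenAccept(result -> listener.onCompleted(requestId, result));
        return "ACCEPTED " + requestId;   // immediate acknowledgement instead of blocking ~15 s
    }

    private String process(String payload) {
        // stand-in for the splitter/aggregator pipeline
        return payload.toUpperCase();
    }
}
```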
We make a Java EE web application that runs on TomEE and we sell it to different customers. My boss asked me yesterday if there's a way to calculate the application's ROM, RAM and CPU requirements based on the number of clients our customer is expecting to have daily.
Is there a tool or a technique to find this information?
The application is expected to receive, analyze and store electronic invoices.
This is almost entirely dependent on your application. TomEE itself is very lightweight: startup with no apps takes a few ms, and the idle memory overhead is about 20 MB, depending on what features your application uses. TomEE generally is a constant factor when it comes to scalability.
The scientific way to calculate the production values for your application is to perform load-testing experiments and monitor them with a profiler. Simulate a bunch of users with JMeter or Selenium IDE. Monitor your application via JMX and jvisualvm and track the histograms. Monitor your CPU, garbage-collection cycles, and heap memory as you add users, and figure out whether the application scales linearly or exponentially.
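As a small illustration of the JMX part, the same MXBeans that jvisualvm reads can also be sampled from code while the load test runs, so you can correlate user count with heap and CPU load (the sampling interval and duration here are arbitrary):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;

public class LoadTestProbe {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        for (int i = 0; i < 60; i++) {            // sample once a second for a minute
            long usedHeapMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            System.out.printf("heap=%d MB, loadAvg=%.2f%n",
                    usedHeapMb, os.getSystemLoadAverage());
            Thread.sleep(1000);
        }
    }
}
```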
Good luck!
I am designing a micro-services based system. Most of the services are deployed as standalone Jersey processes with an embedded Grizzly web server.
Assuming that many of those services will execute on the same machine, shall I change any threading configuration in Grizzly to prevent a situation of too many threads machine-wide?
What is the default threading model for Grizzly? Is there a limit for number of threads that a single web server can create?
It depends on what you do with the incoming data.
If you need to process the data (CPU time > IO time), then you need to match the number of data-processing threads to the number of physical cores.
If most of the time is spent in IO (retrieving/storing the data), then you can start with cores * 2 and set the max to something that you must determine by testing the CPU usage and the throughput. I personally like powers of 4 per core (4, 16, 64, 256). This will quickly narrow you down to the right order of magnitude.
https://javaee.github.io/grizzly/coreconfig.html#/Thread_Pool_Configuration
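If you do decide to cap the worker pool per process, this is roughly what that could look like with Grizzly's ThreadPoolConfig, following the sizing rule of thumb above. Treat the wiring as an assumption and check the linked docs: the exact setup depends on how the embedded Jersey/Grizzly server is created, and the sizes must be validated by testing:

```java
import org.glassfish.grizzly.http.server.HttpServer;
import org.glassfish.grizzly.http.server.NetworkListener;
import org.glassfish.grizzly.threadpool.ThreadPoolConfig;

public class GrizzlyPoolSetup {

    /**
     * Caps the worker thread pool of an embedded Grizzly server so that many
     * co-located service processes don't each grow an unbounded pool.
     */
    public static void configure(HttpServer server) {
        int cores = Runtime.getRuntime().availableProcessors();

        ThreadPoolConfig workerPool = ThreadPoolConfig.defaultConfig()
                .setCorePoolSize(cores)        // CPU-bound baseline
                .setMaxPoolSize(cores * 2);    // IO-bound starting point; tune via testing

        for (NetworkListener listener : server.getListeners()) {
            listener.getTransport().setWorkerThreadPoolConfig(workerPool);
        }
    }
}
```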
A jk_connector worker is basically a Tomcat instance waiting to process requests from a web server.
The Apache docs tell you that you should have multiple workers if you have multiple apps, but they don't really explain why.
What are the pros/cons of having a worker per web app vs 1 worker for multiple apps?
Processor affinity, for one. If the working set is bound to one execution unit, its built-in cache can be utilized more effectively. The more applications share the space, the more contention there is.
Most systems today are based on multiple CPU cores where threads can execute independently on each core. This means that a busy server can better utilize system resources if there are more threads (e.g., 1 thread per CPU core), both for multicore (SMP) and multithreaded (SMT) systems. A common approach for servers is to provide a process/thread pool of workers which can be used and reused to serve multiple simultaneous requests.
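As a concrete illustration of such a worker pool, a minimal sketch (the handle method is a stand-in for whatever request processing the server actually does):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPool {
    // A fixed set of reusable worker threads (one per core here) serves many
    // simultaneous requests instead of spawning a new thread per request.
    private final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public void onRequest(String request) {
        workers.submit(() -> handle(request));   // reuse an idle worker thread
    }

    private void handle(String request) {
        // stand-in for the actual request processing
        System.out.println(Thread.currentThread().getName() + " handled " + request);
    }
}
```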