I'm trying to set up autoscaling in Kubernetes (hosted in Google Kubernetes Engine) for my Java Spring application. I have faced two problems:
The Spring application uses a lot of CPU at startup (something like 250 mCPU*, but sometimes even 500 mCPU), which really breaks autoscaling, because after roughly a minute (Spring context startup etc.) instances of that same application use only about 50 mCPU.
Because in some environments that application uses a small amount of mCPU (and in almost every environment at night), I would like to set the requested CPU to at most 200 mCPU (= 80% of the CPU limit), or even less. Then autoscaling would make much more sense. But I can't really do that, because of Spring's heavy startup, which won't finish if I give it too little CPU.
When the application starts receiving traffic (when a new pod is created because of an autoscaling event), its CPU usage can initially jump to something like 200% of the normal usage and then drop back to 100%. It doesn't look like too many requests are being pushed to the new pod; it looks more like the JVM is simply slower at the start and receives too much traffic right at the beginning. It seems the JVM would need some kind of warm-up (i.e. don't suddenly push 1/n of the traffic to the new pod, but shift traffic to it gradually). Because of this behaviour, autoscaling sometimes goes crazy: when it really needs just one more pod, it can scale up a lot of them, and then scale down...
* in GKE 1000mCPU = 1 core
The uploaded images show the CPU charts.
In the first, we can see that CPU usage after startup is much smaller than at the very beginning. In the second, we can spot both problems: high CPU usage at startup, then a grace period (the readiness probe initial* delay hasn't finished yet), and then a high peak when the pod starts receiving traffic.
* I have set the readiness probe initial delay to be longer than the context loading time.
Chart 1 Chart 2
The only thing I've found on the internet is to add an extra container to the pod which does nothing but "sleep x" and then exits, and to set that container's requested mCPU to the amount the Spring app uses during startup (I would then have to increase the CPU limit of the Spring app container, but that shouldn't hurt, because autoscaling should prevent the Spring app from starving the other apps on the node).
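Roughly, that idea would look like the sketch below (all names and numbers are made up, and I haven't verified that it actually reserves the CPU the way I hope):

apiVersion: v1
kind: Pod
metadata:
  name: spring-app
spec:
  containers:
    - name: spring-app
      image: my-registry/spring-app:latest   # placeholder
      resources:
        requests:
          cpu: 200m        # what the app needs after warm-up
        limits:
          cpu: 500m        # headroom for the startup spike
    - name: startup-cpu-buffer
      image: busybox
      command: ["sh", "-c", "sleep 120"]     # roughly the Spring startup time
      resources:
        requests:
          cpu: 300m        # extra CPU reserved while Spring is starting
        limits:
          cpu: 300m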
I would really appreciate any advice.
It is true that Spring applications are not the most container-friendly thing out there, but there are a few things you can try:
On startup, Spring autowires the beans and performs dependency injection, creates objects in memory, etc. All of those things are CPU intensive. If you assign less CPU to your pod, it will logically increase the startup time.
Things you can do here are:
Use a startupProbe and give your application time to start. How to calculate the delays and thresholds is explained pretty well here. (A sketch covering this and the next point follows after this list.)
Adjust the maxSurge and maxUnavailable in your deployment strategy as best fits your case (for example, with 10 replicas and a max surge / max unavailable of 10%, your pods will roll out slowly, one by one). This will help reduce spikes in traffic across the overall set of application replicas (docs are here).
If your use case allows it, you can look into lazy loading your Spring application, meaning that it will not create all objects upon startup but rather wait until they are used. This can be somewhat dangerous because in some cases you may not discover issues at startup.
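A rough, untested sketch of the first two points as one deployment fragment (the image name, probe path and all numbers are placeholders you would tune to your own startup time, and the probe path assumes the Spring Boot actuator is enabled):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: spring-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 10%          # with 10 replicas: one extra pod at a time
      maxUnavailable: 10%
  template:
    metadata:
      labels:
        app: spring-app
    spec:
      containers:
        - name: spring-app
          image: my-registry/spring-app:latest   # placeholder
          resources:
            requests:
              cpu: 200m      # steady-state usage
            limits:
              cpu: 500m      # headroom for the startup spike
          startupProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            periodSeconds: 5
            failureThreshold: 30    # 30 * 5s = up to 150s allowed for startup
          readinessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            periodSeconds: 10

For the lazy-loading route, the switch in Spring Boot 2.2+ is spring.main.lazy-initialization=true in application.properties (or the equivalent YAML entry).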
If you have HPA enabled plus a defined replicas value in the deployment, you might experience issues when deploying. I can't find the relevant GH issue ATM, but you might want to run some tests there on how it behaves (scaling more than it should, etc.). Things you can do here are:
Tweak the autoscaling thresholds and times (the default is 3 min, afaik) to allow your deployments to roll out smoothly without triggering the autoscaler. (See the sketch after this list.)
Write a custom autoscaling metric instead of scaling by CPU. This one requires some work but might solve your scaling issues for good (relevant docs).
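For the first point, if your cluster is on a reasonably recent Kubernetes/GKE version, the autoscaling/v2 behavior block exposes those knobs. A rough sketch (the numbers are placeholders to illustrate the fields, not recommendations):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120   # ignore short CPU spikes such as JVM warm-up
      policies:
        - type: Pods
          value: 1                      # add at most one pod per minute
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min of low usage before scaling down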
Lastly, what you are suggesting with a sidecar looks like a hack :)
Haven't tried it though so can't really tell the pros and cons.
Unfortunately, there is no silver bullet for Spring Boot (or Java) + K8s, but things are getting better than they were a few years back. If I find some helpful resources, I will come back and link them here.
Hope the above helps.
Cheers
Related
We have a scenario where we want to deploy ~300 Java applications for an in-house use case in a Kubernetes cluster. A lot of them are used only 4 times a year, and the rest of the year they are just wasting RAM.
To reduce the memory footprint we're currently discussing the following options:
Using a Kubernetes "built-in" mechanism which starts the container when a request arrives. After a timeout (e.g. 10 hours) the container would be suspended/hibernated.
Offloading the RAM to disk (for specific containers) would be allowed too.
Starting the containers via a "proxy web page": first the user has to log in to a web app where they search for and select the desired application. On demand (perhaps via a kubectl command in the background, etc.) the application is started. (A rough sketch of this idea follows after this list.)
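For option 3, the rough idea (all names are made up, and we haven't built any of this yet) would be to keep each application's Deployment at zero replicas and let the proxy web app scale it up on demand:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-x            # one such deployment per in-house application
spec:
  replicas: 0            # normally scaled to zero so it uses no RAM
  selector:
    matchLabels:
      app: app-x
  template:
    metadata:
      labels:
        app: app-x
    spec:
      containers:
        - name: app-x
          image: my-registry/app-x:latest   # placeholder
# The proxy web app would then run the equivalent of
#   kubectl scale deployment app-x --replicas=1
# when a user selects the application, and scale back to 0 after the idle timeout.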
Does someone have this special use-case, too?
We're starting this project right now, so other options are helpful too. Only Java as the development language is fixed.
Is there a built-in solution in kubernetes, to reduce the memory footprint?
Is our option #3 really a "good" solution?
A Spring Boot + Spring Integration app is monitored by Prometheus through the built-in micrometer.io support. The Spring Boot app exposes localhost:8080/actuator/prometheus, the monitoring data arrives in Prometheus, and it can be displayed as a graph. This is working fine.
My problem is that I get some gaps in the Prometheus data. These gaps happen when the app is under heavy load. It is normal that when the app is very busy the response times for localhost:8080/actuator/prometheus get longer; in my case it is less than 1 second without load, but around 1 minute under load. The target is then shown as offline under Prometheus Status -> Targets. One possibility would be to set scrape_interval = 2min, but it would be important to see more detailed info.
My question: is there a solution for this scenario? (Setting a priority for the monitoring URL? Temporarily storing the info in the Spring Boot app and sending it later?)
Update: I am trying to monitor the Spring Integration metrics, but for this question it is not important which metric; it could be anything, like JVM heap.
Under normal circumstances, querying the metrics endpoint is quite fast.
There are three scenarios that come to my mind that could be the reason why it's getting slower:
a) Your app is so heavily loaded that it takes too long to even accept the HTTP request. This means your app is being sent more requests than it can handle. In that case give it more resources, threads, or whatever the bottleneck is (see here).
b) You have custom gauges registered that need a lot of time to calculate or fetch their value. E.g. having a DB query in a gauge's getter function is a killer: every time the metrics endpoint is queried, your app needs to query the database, and only then can it render the metrics. It is even worse if you have several of these (they are handled sequentially) and their performance depends on your application's load (e.g. if the DB server gets slower when your app is under heavy load, this makes things worse).
c) Your metric labels' cardinality depends on your application usage (which is a bad practice). E.g. having a label per user or per session will increase the number of metrics when your application is under heavy usage. This will not only stress your application (as each metric needs some memory) but also your Prometheus server, as it creates files for each unique label-value combination.
What you could do, although it will not solve the root cause of your issue, is increase the value of scrape_timeout (see here).
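A minimal sketch of that, assuming a plain prometheus.yml scrape job for the app (the job name and target are placeholders); note that scrape_timeout must not be larger than scrape_interval:

scrape_configs:
  - job_name: spring-boot-app
    metrics_path: /actuator/prometheus
    scrape_interval: 2m
    scrape_timeout: 90s        # allow for the slow responses under load
    static_configs:
      - targets: ['localhost:8080']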
I have a legacy product in the financial domain, using Tomcat 6. We get millions of requests, around 10k requests an hour. I am wondering, at a high level,
whether I should go for a distributed application where my MVC component is on one system and the service/DAO layer on another box (using Spring Remoting/EJB).
The reason I am planning to go in this direction is to distribute the load and get better performance; it also makes the system scalable.
I only see the positive side of it, but somehow I'm not able to figure out what the negative aspects could be.
If some expert can help: what are the criteria I should consider for going to a distributed model, and what are its pros/cons? I also tried googling for some stats,
like how much load a given web server (Tomcat in my case) can handle efficiently on given hardware (16 GB RAM, Windows 7, processor).
Yes, I am going to do a POC where I will measure performance with the distributed model vs. without, but high-level input would be highly appreciated.
It is impossible to answer these questions without more details: how long does it take to reply to one request on the current server? How many resources are allocated per request?
Having 10k requests per hour means ~3 requests per second. If performing the necessary operations and replying to a request takes ~300 ms on one CPU, one simple machine is totally fine. This is simple math and doesn't always apply; I guess you still have peaks within those 10k requests per hour and they aren't evenly distributed.
If we assume one reply can take up to 1 second, then you can handle as many replies per second as your system has CPUs (given that the CPU is the bottleneck). If the CPU isn't the bottleneck for your application server, there's probably something wrong. You should set up the database(s) on a different machine and only perform computation tasks on the application server machine.
Especially in the financial sector, with legacy software, I wouldn't try splitting a running product. How old is the current server? I believe a new server should be cheaper than rewriting the application. Unless you expect 50-100k requests per hour very soon, I don't think splitting it into such small parts makes sense.
Instead, run it on up-to-date server hardware, split the application server and data storage, and you should be fine.
I am wondering at a high level whether I should go for a distributed application where my MVC component is on one system and the service/DAO layer on another box (using Spring Remoting/EJB).
I'm not sure what you mean by "system" in this context, but if it means that you are planning to run your application on two servers, one dedicated to the presentation layer and the other to the business layer, keep in mind that a simpler approach (and probably one more suitable for your app) is to build a co-located architecture.
Basically, the idea is to replicate your app across several servers (at least two) and put a load balancer in front of them that routes incoming requests among the available servers.
All servers share the same database instance. This will give you horizontal scalability and also improve the availability of your system.
I only see the positive side of it, but somehow I'm not able to figure out what the negative aspects could be.
Distributing your business logic will probably involve refactoring your application code, and even if the system is working well now, you will surely introduce some bugs.
The necessary remote calls will add latency, and the fact that you execute your business logic on several servers doesn't solve performance problems in the presentation tier.
In Expert One-on-One J2EE Development Without EJB (page 65), you can find a good discussion of why not to distribute your business logic.
I'm new here and I'm not very good with CPU consumption and multithreading, but I was wondering why my web app is consuming so much CPU. What my program does is update values in the background, so that users don't have to wait for the data to be processed and only need to fetch it on request. The updating processes are scheduled tasks using the executor library, firing off 8 threads every 5 seconds to update my data.
So I'm wondering why my application is consuming so much of the CPU. Is it because of bad code or because of a low-spec server? (2 cores, with 2 databases and 1 major application running alongside my web app.)
Thank you very much for your help.
You need to profile your application to find out where the CPU is actually being consumed. Java has some basic profiling support built in, or, if your environment permits it, you can run the built-in "hprof" profiler:
java -Xrunhprof ...
(In reality, you probably want to set some extra options: Google "hprof" for more details.)
The latter is easier in principle, but I mention the possibility of adding your own profiling routine because it's more flexible and you can do it e.g. in a Servlet environment where running another profiler is more cumbersome.
Paulo,
It is not possible for someone here to say whether the problem is that your code is inefficient or that the server is under-spec. It could be either or both of those, or something else.
You are going to need to do some research of your own:
Profile the code. This will allow you to identify where your webapp is spending most of its time.
Look at the OS-level stats that are available to you. This might tell you that the real problem is memory usage or disk I/O.
Look at the performance of the back-end database. Is it using a lot of CPU?
Once you have identified the area(s) where the CPU is being used, you need to figure out what the real cause of the problem is and work out how to fix it. And once you've got a potential fix implemented, you can rerun your profiling etc. to see if it has helped.
I want to gain more insight regarding the scale of workload a single-server Java web application deployed to a single Tomcat instance can handle. In particular, let's pretend that I am developing a wiki application with a usage pattern similar to Wikipedia's. How many simultaneous requests can my server handle reliably before running out of memory or showing signs of excessive stress, if I deploy it on a machine with the following configuration:
4-Core high-end Intel Xeon CPU
8GB RAM
2 HDDs in RAID-1 (No SSDs, no PCIe based Solid State storages)
RedHat or Centos Linux (64-bit)
Java 6 (64-bit)
MySQL 5.1 / InnoDB
Also let's assume that the MySQL DB is installed on the same machine as Tomcat and that all the Wiki data are stored inside the DB. Furthermore, let's pretend that the Java application is built on top of the following stack:
SpringMVC for the front-end
Hibernate/JPA for persistence
Spring for DI and Security, etc.
If you haven't used the exact configuration but have experience in evaluating the scalability of a similar architecture, I would be very interested in hearing about that as well.
Thanks in advance.
EDIT: I think I have not articulated my question properly. I'll mark the answer with the most upvotes as the best answer and rewrite my question in the community wiki area. In short, I just wanted to learn about your experiences with the scale of workload your Java application has been able to handle on one physical server, along with some description of the type and architecture of the application itself.
You will need to use a group of tools:
Load testing tool - JMeter can be used.
Monitoring tool - used to monitor the load on various resources. There are a lot of paid as well as free ones: JProfiler, VisualVM, etc.
Collection and reporting tool (I haven't used a particular one).
With the above tools you can find the optimal value. I would approach it in the following way:
First get to know what the ratio of pages being accessed should be, and what the background processes and their frequencies are.
Configure JMeter accordingly (for those ratios), monitor performance for the applied load (time to serve a page can be measured in JMeter), and monitor the other resources using the monitoring tool. Also check the error ratio. (NOTE: you need to decide what error ratio is not acceptable.)
Keep increasing the load step by step and keep writing down the various numbers of interest until the server fails completely.
You can decide on the optimal value based on many criteria: low error rate, max serving time, etc.
JMeter supports a lot of ways to apply load.
To be honest, it's almost impossible to say. There are probably about 3 ways (off the top of my head) to build such a system, and each would have fairly different performance characteristics. Your best bet is to build and test.
Firstly, try to get some idea of the estimated volumes you'll have and the latency constraints you'll need to meet.
Come up with a basic architecture and implement a thin slice end to end through the system (ideally the most common use case). Use a load testing tool (like Grinder or Apache JMeter) to inject load and start measuring the performance. If the performance is acceptable (be conservative: your simple implementation will likely include less functionality and be faster than the full system), continue building the system and testing to make sure you don't introduce a major performance bottleneck. If not, come up with a different design.
If your code is reasonable, the bottleneck will likely be the database, somewhere in the region of hundreds of DB ops per second. If that is insufficient then you may need to think about caching.
Definitely take a look at Spring Insight for performance monitoring and analysis.
English Wikipedia has 14 GB of data. An 8 GB in-memory cache would have a very high hit ratio, and I think hard disk reads would be well within its capacity. Therefore, the app is most likely network bound.
English Wikipedia gets about 3000 page views per second. It is possible that Tomcat could handle that load with careful tuning, and that the network has enough throughput to serve the traffic.
So the entire wikipedia site can be hosted on one moderate machine? Probably not. Just an idea.
Sources:
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm
A single Tomcat instance doesn't spread over multiple machines. If you really are concerned about scalability, you must consider what to do when your application outgrows a single machine.