I have a Spring Boot application that runs on an embedded Tomcat web server. The application uses an H2 database via JPA. The front end is a single-page application that communicates over a REST interface with the Spring back end, which only contains a business layer and a domain layer. Nothing complex.
The application is a prototype for a future product that will run on a minimalistic system, so I was measuring the CPU load and memory usage.
That's when I found an odd behaviour, which I currently cannot explain.
During startup, the app uses about 3/4 of the CPU, which is okay, as the whole framework is being initialized.
But after the app has started (Log message "Started Application in XX seconds" has appeared) it still uses about 50% of the CPU, slowly decreasing until it finally reaches 15% after about 2 or 3 minutes, although my implementation is not doing anything active. It's pretty much only waiting for a request over the REST interface.
It appears to me that Spring or the embedded Tomcat is doing something that I don't know about.
Has anybody experienced the same issue, or does anyone know what could be going on?
A Spring Boot + Spring Integration app is monitored by Prometheus through the built-in micrometer.io support. The Spring Boot app exposes localhost:8080/actuator/prometheus. The monitoring data arrives in Prometheus and can be displayed as a graph. This works fine.
My problem is that I get gaps in the Prometheus data. These gaps happen when the app is under heavy load. It is normal that when the app is very busy the response times for localhost:8080/actuator/prometheus get longer; in my case it is less than 1 second without load, but around 1 minute under load. The target is then shown as offline under Prometheus Status -> Targets. One possibility would be to set scrape_interval = 2m, but it would be important to see more detailed info.
My question: is there a solution for this scenario? (Setting a priority for the monitoring URL? Storing the info temporarily in the Spring Boot app and sending it later?)
Update: I am trying to monitor the Spring Integration metrics, but for this question it is not important which metric; it could be anything, like JVM heap.
Under normal circumstances, querying the metrics endpoint is quite fast.
There are three scenarios that come to mind that could explain why it is getting slower:
a) Your app is under such heavy load that it takes too long to even accept the HTTP request. This means your app is receiving more requests than it can handle. In that case, give it more resources, threads, or whatever the bottleneck is (see here).
b) You have custom gauges registered that need a lot of time to calculate or fetch their value. E.g. a DB query in a gauge's getter function is a killer, as every time the metrics endpoint is queried your app has to hit the database before it can render the metrics. It is even worse if you have several of these (they are handled sequentially) and their performance depends on your application's load (e.g. if the DB server gets slower when your app is under heavy load, this makes everything worse). A sketch of how to avoid this follows this list.
c) Your metric label cardinality depends on your application's usage (which is a bad practice). E.g. having a label per user or per session increases the number of metrics when your application is under heavy usage. This not only stresses your application (each metric needs some memory) but also stresses your Prometheus server, as it creates files for each unique label-value combination.
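Regarding (b), here is a minimal sketch of decoupling an expensive value from the scrape path. OrderRepository, the metric name orders.open, and the 30-second refresh interval are all made up for illustration:

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical slow, DB-backed query -- stands in for whatever your gauge reads.
interface OrderRepository {
    long countOpenOrders();
}

public class CachedOrderGauge {

    private final AtomicLong openOrders = new AtomicLong();

    public CachedOrderGauge(MeterRegistry registry, OrderRepository repository) {
        // The gauge only reads the cached value, so /actuator/prometheus stays
        // fast no matter how slow the database currently is.
        Gauge.builder("orders.open", openOrders, AtomicLong::get)
                .register(registry);

        // Refresh the expensive value in the background instead of on every scrape.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> openOrders.set(repository.countOpenOrders()),
                0, 30, TimeUnit.SECONDS);
    }
}
```

This trades freshness for scrape latency: the endpoint always answers immediately, at the cost of values being up to 30 seconds old.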
What you could do, although it will not solve the root cause of your issues, is increase the value of scrape_timeout (see here). For example:
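```yaml
# A hedged example of prometheus.yml; the job name and target are illustrative.
# Note: scrape_timeout must not exceed scrape_interval.
scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 2m
    scrape_timeout: 90s
    static_configs:
      - targets: ['localhost:8080']
```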
I'm trying to set up autoscaling in Kubernetes (hosted in Google Kubernetes Engine) for my Java Spring application. I have faced two problems:
The Spring application uses a lot of CPU at startup (something like 250 mCPU*, sometimes even 500 mCPU), which really breaks autoscaling, because after roughly a minute (Spring context startup etc.) an instance of the application uses only about 50 mCPU.
Because in some environments the application uses a small amount of CPU (and in almost every environment at night), I would like to set the requested CPU to at most 200 mCPU (= 80% of the CPU limit), or even less. Autoscaling would then make much more sense. But I can't really do that, because of Spring's heavy startup, which won't finish if I give it too little CPU.
When the application starts receiving traffic (when a new pod is created because of an autoscaling event), its CPU usage can initially jump to something like 200% of the standard usage and then fall back to 100%. It doesn't look like too many requests are being pushed to the new pod; it looks more like the JVM is simply slower at the start and receives too much traffic too early. It seems the JVM needs a warm-up (i.e. don't suddenly push 1/n of the traffic to the new pod, but shift traffic to it gradually). Because of this behaviour, autoscaling sometimes goes crazy: when it really needs just one more pod, it can scale up a lot of them and then scale back down...
* in GKE 1000mCPU = 1 core
The uploaded charts show CPU usage.
In the first, we can see that CPU usage after startup is much lower than at the beginning. In the second, we can spot both problems: high CPU usage at startup, then a grace period (the readiness probe's initial* delay hasn't finished), and then a high peak when the pod starts receiving traffic.
* I have set the readiness probe's initial delay to be longer than the context loading time.
[Chart 1] [Chart 2]
The only thing I've found on the internet is to add a container to the pod which does nothing but "sleep x" and then dies, and to set that container's requested mCPU to the amount used during Spring startup (I would then have to increase the CPU limit for the Spring app container, but that shouldn't hurt anyway, because autoscaling should prevent the Spring app from starving other apps on the node).
I would really appreciate any advice.
It is true that Spring applications are not the most container-friendly thing out there, but there are a few things you can try:
On startup, Spring autowires the beans, performs dependency injection, creates objects in memory, etc. All of those things are CPU-intensive. If you assign less CPU to your pod, it will logically increase the startup time.
Things you can do here are:
Use a startupProbe and give your application time to start. How to calculate the delays and thresholds is explained well here.
Adjust the maxSurge and maxUnavailable values in your deployment strategy to fit your case (for example, with 10 replicas and a maxSurge/maxUnavailable of 10%, your pods will roll out slowly, one by one). This helps reduce spikes in traffic across the application replicas as a whole (docs are here).
If your use case allows it, you can look into lazy loading your Spring application, meaning it will not create all objects upon startup but rather wait until they are used. This can be somewhat dangerous, because in some cases you may not discover issues at startup. A combined sketch of these three points follows.
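A hedged sketch combining the three suggestions above; the names, image, port, and timings are illustrative, and the lazy-initialization switch assumes Spring Boot 2.2+:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app                      # hypothetical name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: spring-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                       # add one new pod at a time
      maxUnavailable: 0                 # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: spring-app
    spec:
      containers:
        - name: app
          image: spring-app:latest      # hypothetical image
          env:
            # Spring Boot 2.2+: create beans on first use instead of at startup
            - name: SPRING_MAIN_LAZY_INITIALIZATION
              value: "true"
          startupProbe:
            httpGet:
              path: /actuator/health    # assumes Spring Boot Actuator is present
              port: 8080
            periodSeconds: 10
            failureThreshold: 30        # up to 300s for the context to load
```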
If you have HPA enabled plus a defined replicas value in the deployment, you might experience issues upon deploying. I can't find the relevant GH issue at the moment, but you might want to run some tests there on how it behaves (scaling more than it should, etc.). Things you can do here are:
Tweak the autoscaling thresholds and times (the default is 3 min, afaik) to allow your deployments to roll out smoothly without triggering the autoscaler (see the sketch after this list).
Write a custom autoscaling metric instead of scaling by CPU. This one requires some work but might solve your scaling issues for good (relevant docs).
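For the first point, a hedged sketch using the autoscaling/v2 behavior field (available in Kubernetes 1.18+); all names and numbers are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-app-hpa                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-app
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120   # ignore the short CPU spike of a fresh JVM
      policies:
        - type: Pods
          value: 1                      # add at most one pod per minute
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```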
Lastly, what you are suggesting with a sidecar looks like a hack :)
Haven't tried it though, so I can't really tell the pros and cons.
Unfortunately, there is no silver bullet for Spring Boot (or Java) + K8s, but things are getting better than they were a few years back. If I find some helpful resources, I will come back and link them here.
Hope the above helps.
Cheers
I have a Spring Boot application which I'm running inside Docker containers in an OpenShift cluster. In steady state there are N instances of the application (say N=5) and requests are load balanced across these N instances. Everything runs fine and the response time is low (~5 ms at a total throughput of ~60k).
Whenever I add a new instance, the response time goes up briefly (to ~70 ms) and then comes back to normal.
I checked NewRelic JVM stats.
As you can see, whenever the app starts there is a GC MarkSweep collection, which I think is probably related to the initial high response time.
How can I avoid this? I'm using Java 8. Will using a different GC (G1) help or can I somehow tune my GC settings?
The JVM itself requires quite a bit of work when starting, and Spring Boot adds a lot of its own work and classes on top of that. Try to remove or switch off all unused features, since the autoconfiguration magic can cause a lot of unnecessary overhead.
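For instance, a minimal sketch of switching off autoconfigurations you don't use; the two excluded classes below are just examples, and the real candidates depend on your classpath:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.boot.autoconfigure.jmx.JmxAutoConfiguration;

@SpringBootApplication(exclude = {
        DataSourceAutoConfiguration.class,  // example: no relational datasource needed
        JmxAutoConfiguration.class          // example: no JMX exposure needed
})
public class App {
    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }
}
```

Starting the app with the --debug flag prints Spring Boot's condition evaluation report, which helps you spot autoconfigurations that are active but never used.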
I read that Google App Engine (GAE) will shut down your application if it goes idle and start everything up again when it gets a request. And I know that Spring startup is slow, like 2-3 seconds even for a small web app. Does working with Spring on GAE really suffer from this badly?
Thanks in advance.
It's really not that bad, but considering your instances are being shut down and started constantly, you should work on getting your startup as fast as possible. A few pointers to consider (a configuration sketch follows the list):
Enable warmup requests
Enable resident instances
Optimize Spring config (There are great suggestions in this article)
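As a hedged illustration of the first two pointers, on a Java 11+ standard-environment runtime the configuration would live in app.yaml roughly like this (older Java runtimes configure the same ideas in appengine-web.xml instead); you would also map GET /_ah/warmup to a lightweight handler in your app:

```yaml
runtime: java11          # illustrative runtime
inbound_services:
  - warmup               # GAE sends GET /_ah/warmup before routing live traffic
automatic_scaling:
  min_idle_instances: 1  # keep one resident instance warm to absorb spikes
```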
We have created a Spring web app using:
Spring 3.1.0
Hibernate 3.5.4 Final
Tomcat 6.24
The application is reasonably heavy, we are sending about 1000 contacts per user request.
We tested our application with 9 concurrent users issuing repeated requests and profiled it with VisualVM; the results are as follows:
Looking at the results, the high peaks are the repeated requests and the lower points are when all requests have stopped. The first ~200 MB of memory does not seem to be released at all. Is Spring actually just this heavy, or do I have a potential memory issue? The release version of this web app will potentially handle many more users.
I have similar results testing on tomcat 7 as well.
It's not a memory issue. The GC is smart enough to release objects once your application holds no more references to them; just make sure nothing keeps global references to objects that could be local to a method. As your graph shows, objects are being released. The ~200 MB baseline may simply be needed for PermGen and the loaded framework classes, so you shouldn't worry.
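To verify that, you could cap the heap and PermGen explicitly and log GC activity; a hedged sketch for this Java 6/7-era Tomcat stack, with made-up sizes:

```sh
# Illustrative Tomcat JVM options: cap the heap, size PermGen explicitly, and
# log GC so you can confirm the ~200 MB baseline is PermGen plus loaded
# framework classes rather than a leak.
export CATALINA_OPTS="-Xms256m -Xmx512m -XX:MaxPermSize=256m -verbose:gc -XX:+PrintGCDetails"
```

If the app runs stably under those caps and the GC log shows the old generation returning to the same baseline after each load burst, the ~200 MB is framework footprint rather than a leak.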