We have a scenario where we want to deploy ~300 Java applications for an in-house use case in a Kubernetes cluster. Many of them are only used about four times a year; for the rest of the year they just waste RAM.
To reduce the memory footprint we're currently discussing the following options:
Using a Kubernetes "built-in" mechanism that starts the container when a request arrives. After a timeout (e.g. 10 hours) the container is suspended/hibernated (see the sketch after this list).
Offloading RAM to disk (for specific containers) would be acceptable too.
Starting the containers via a "proxy webpage": first, the user has to log in to a web app, where they search for and select the desired application. On demand (perhaps via a kubectl command in the background, etc.) the application is started.
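For option #1, core Kubernetes has no scale-to-zero mechanism, but add-ons such as Knative Serving or KEDA provide one. A minimal sketch, assuming Knative Serving is installed (the name, image, and timings are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: report-app-042   # hypothetical application name
spec:
  template:
    metadata:
      annotations:
        # allow this revision to scale down to zero pods when idle
        autoscaling.knative.dev/min-scale: "0"
        # keep an idle pod around before scaling to zero (illustrative value)
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "10m"
    spec:
      containers:
        - image: registry.example.com/report-app-042:1.0  # hypothetical image
          resources:
            requests:
              memory: 512Mi
```

Requests to an idle application are queued by Knative's activator while a pod starts, which is essentially option #1 without building the proxy webpage yourself.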
Does anyone else have this use case?
We're starting this project right now, so other options are helpful too. Only Java as the development language is fixed.
Is there a built-in solution in Kubernetes to reduce the memory footprint?
Is our option #3 really a "good" solution?
I have an API (Spring Boot/Spring Web using Swagger) that has a throughput (TPS?) of 9.05 (not sure how this is calculated, but it's displayed on a metrics page). The API gets hit thousands of times per hour, sometimes peaking at 9,000 calls. The average response time is roughly 2000-3000 ms. This is a simple API that accepts a POST request, queries a Postgres database, and returns the data as an HTTP response to the client. The API is containerized via Docker and running on an ECS cluster on AWS (m5a.2xlarge instance):
Instance Size | vCPU | Memory (GiB) | Instance Storage | Network Bandwidth (Gbps) | EBS Bandwidth (Mbps)
m5a.2xlarge | 8 | 32 | EBS-only | Up to 10 | Up to 2,880
I have Apache JMeter installed and I am trying to mimic the production API calls in lower environments so I can fine-tune the CPU and memory configuration of our Docker containers running in AWS Elastic Container Service (ECS).
I am currently running 5 threads, with a 1-second ramp-up and a 900-second duration.
Is there a systematic way I can replicate the traffic load in the lower environments, so that I can reproduce the PROD load and correctly fine-tune CPU and memory?
As per the Performance Testing in Scaled Down Environments. Part One: The Challenges article:
An application's underlying infrastructure is constructed of many different components such as caches, web servers, application servers and disks (I/O). Bandwidth and CDNs also play a role in its function and therefore have to be taken into consideration during scaling. Each component behaves differently in the application according to how it was configured and scaled. However, the tiered structure makes it difficult to calculate how each should be tested and scaled.
Furthermore, there are two ways to scale the application. Scaling-up adds supplementary resources, like CPUs and memory, to a single computer. Scaling-out clusters additional computers together as one system to generate combined computing power. All of these options make it almost impossible to estimate actual data from performance testing in a smaller environment.
So there is no formula for extrapolating the behaviour of a "lower environment" to a production-like environment; I would say you're quite limited in what you can do, for example:
Run a soak test; this way you will be able to detect memory leaks.
Run a test with profiler telemetry enabled and inspect the longest-running functions, the largest objects, garbage collection activity, etc.
Monitor database slow queries and inspect their query plans for optimization in case of high cardinality/cost
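That said, as a rough starting point for sizing the JMeter thread group, Little's Law relates the figures quoted in the question (~9.05 requests/s throughput, ~2.5 s average response time) to the steady-state concurrency:

```latex
N = X \cdot R \approx 9.05\ \text{req/s} \times 2.5\ \text{s} \approx 23\ \text{concurrent requests}
```

So something in the region of 20-25 JMeter threads (with realistic think times) would approximate the PROD steady state better than 5 threads.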
I'm trying to set up autoscaling in Kubernetes (hosted in Google Kubernetes Engine) for my Java Spring application. I have run into two problems:
The Spring application uses a lot of CPU at startup (something like 250 mCPU*, sometimes even 500 mCPU), which really breaks autoscaling, because after roughly one minute (Spring context startup etc.) instances of the application use only 50 mCPU.
Because in some environments the application uses a small amount of CPU (and in almost every environment at night), I would like to set the requested CPU to at most 200 mCPU (= 80% of the CPU limit), or even less. Then autoscaling would make much more sense. But I can't really do that because of Spring's heavy startup, which won't finish if I give it too little CPU.
When the application starts receiving traffic (when a new pod is created because of an autoscaling event), its CPU usage can initially jump to something like 200% of the standard usage and then drop back to 100%. It doesn't look like too many requests are being pushed to the new pod; it looks more like the JVM is simply slower at the start and receives too much traffic too soon. It seems the JVM needs a warm-up (i.e. don't suddenly push 1/n of the traffic to the new pod, but shift traffic to it gradually). Because of this behaviour, autoscaling sometimes goes crazy: when it really needs just one more pod, it can scale up a lot of them, and then scale back down...
* In GKE, 1000 mCPU = 1 core.
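For reference, the tension described in problem 1 is between the CPU request (used for scheduling and for the autoscaler's utilization math) and the CPU limit (the hard cap). One possible shape, with purely illustrative numbers, is a low request for the steady state combined with a limit high enough to let the startup burst through:

```yaml
resources:
  requests:
    cpu: 200m      # steady-state sizing, keeps autoscaling meaningful
    memory: 512Mi  # illustrative
  limits:
    cpu: 500m      # headroom for the Spring startup spike
    memory: 512Mi
```

Since scheduling and HPA utilization are computed against the request, the startup spike then only consumes burst capacity instead of forcing a 500 mCPU request all day.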
The uploaded images show CPU charts. In the first, we can see that CPU usage after startup is much lower than at the beginning. In the second, we can spot both problems: high CPU usage at the start, then a grace period (the readiness probe initial delay* hasn't finished yet), and then a high peak when the pod starts receiving traffic.
* I have set the readiness probe initial delay to be longer than the context loading.
(Chart 1, Chart 2)
The only thing I've found on the internet is to add a container to the pod that does nothing but "sleep x" and then dies, and to set that container's requested mCPU to the amount used during Spring startup (I would then have to increase the CPU limit for the Spring app container, but that shouldn't do any harm, because autoscaling should prevent the Spring app from starving other apps on the node).
I would really appreciate any advice.
It is true that Spring applications are not the most container-friendly thing out there, but there are a few things you can try:
On startup, Spring autowires the beans, performs dependency injection, creates objects in memory, etc. All of those things are CPU intensive. If you assign less CPU to your pod, it will logically increase the startup time.
Things you can do here are:
Use a startupProbe and give your application time to start. How to calculate the delays and thresholds is explained pretty well here.
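A minimal sketch of such a probe (the health path and the timing values are illustrative assumptions, not measured ones):

```yaml
startupProbe:
  httpGet:
    path: /actuator/health   # assumes Spring Boot Actuator is on the classpath
    port: 8080
  periodSeconds: 10
  failureThreshold: 30       # 30 x 10s = up to 300s of grace before a restart
```

While the startup probe has not succeeded, liveness and readiness probes are held off, so a slow Spring context load doesn't get the pod killed.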
Adjust the maxSurge and maxUnavailable in your deployment strategy to fit your case (for example, maybe you have 10 replicas and a maxSurge/maxUnavailable of 10%, so your pods roll out slowly, one by one). This will help reduce traffic spikes across the application replicas (docs are here).
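For example, a sketch of such a strategy (the replica count mirrors the example above):

```yaml
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 10%        # at most 1 extra pod (10% of 10, rounded up)
      maxUnavailable: 10%  # at most 1 pod taken down at a time
```

With these values, only one fresh (cold-JVM) pod is ever in the rotation at a time.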
If your use case allows, you can look into lazy loading your Spring application, meaning that it will not create all objects upon startup, but will wait until they are first used. This can be somewhat dangerous, because in some cases you may not discover issues at startup.
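In Spring Boot 2.2+ this is a one-line switch; a sketch in application.yml:

```yaml
spring:
  main:
    lazy-initialization: true  # beans are created on first use, not at startup
```

This trades startup CPU for slightly slower first requests, which interacts with the warm-up problem above, so it's worth load testing both ways.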
If you have HPA enabled plus a defined replicas value in the deployment, you might experience issues when deploying. I can't find the relevant GitHub issue at the moment, but you might want to run some tests to see how it behaves (scaling more than it should, etc.). Things you can do here are:
Tweak the autoscaling thresholds and times (the default is 3 min, afaik) to allow your deployments to roll out smoothly without triggering the autoscaler.
Write a custom autoscaling metric instead of scaling by CPU. This one requires some work but might solve your scaling issues for good (relevant docs).
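On the first point, newer Kubernetes versions expose this directly on the HPA. A sketch using the autoscaling/v2 behavior field (names and numbers are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120  # ignore short startup CPU spikes
    scaleDown:
      stabilizationWindowSeconds: 300
```

The scale-up stabilization window in particular stops a single cold-start spike from fanning out into a burst of new pods.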
Lastly, what you are suggesting with a sidecar looks like a hack :) I haven't tried it, though, so I can't really weigh the pros and cons.
Unfortunately, there is no silver bullet for Spring Boot (or Java) + K8s, but things are getting better than they were a few years back. If I find some helpful resources, I will come back and link them here.
Hope the above helps.
Cheers
I have a legacy product in the financial domain, using Tomcat 6. We get millions of requests - about 10k requests an hour. At a high level, I am wondering:
should I go for a distributed application where my MVC component is on one system and the service/DAO layer on another box (could use Spring Remoting/EJB)?
The reason I am planning to go in this direction is so that the load is distributed and we get better performance; it also becomes scalable. I only see the positive side of it, but somehow I'm not able to figure out the negative aspects.
If some expert can help: what criteria should I consider for going to a distributed model, and what are its pros and cons? I also tried googling for stats on how much load a given web server (Tomcat in my case) handles efficiently on given hardware (16 GB RAM, Windows 7, processor). Yes, I am going to do a PoC where I will measure performance with the distributed model vs. without, but high-level input would be highly appreciated.
It is impossible to answer these questions without more details: how long does it take to reply to one request on the current server? How many resources are allocated to one request?
Having 10k requests per hour means ~3 requests per second. If performing the necessary operations and replying to a request takes ~300 ms on one CPU, one simple machine is totally fine. This is simple math and doesn't always hold; I guess you still have peaks within those 10k requests per hour and they aren't evenly distributed.
If we assume one reply can take up to 1 second, then you can handle as many replies per second as your system has CPUs (given that the CPU is the bottleneck). If the CPU isn't the bottleneck for your application server, there's probably something wrong. You should set up the database(s) on a different machine and only perform computation tasks on the application server machine.
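As a back-of-the-envelope check under those assumptions (the 4-CPU count here is an illustrative guess, since the question doesn't state it):

```latex
\text{capacity} \approx \frac{N_{\text{CPU}}}{t_{\text{CPU per request}}}
                = \frac{4}{0.3\ \text{s}} \approx 13\ \text{requests/s}
```

which is comfortably above the ~3 requests/s average, leaving headroom for peaks.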
Especially in the financial sector with legacy software, I wouldn't try splitting a running product. How old is the current server? I believe a new server should be cheaper than rewriting the application. Unless you expect 50-100k requests per hour very soon, I don't think splitting into such small parts makes sense.
Instead, run it on up-to-date server hardware, split the application server and data storage, and you should be fine.
I am wondering at a high level if I should go for a distributed application where my MVC component is on one system and the service/DAO layer on another box (could use Spring Remoting/EJB).
I'm not sure what you mean by "system" in this context, but if it means that you are planning to run your application on two servers, one dedicated to the presentation layer and the other to the business layer, keep in mind that a simpler approach (and probably more suitable for your app) is to build a co-located architecture.
Basically, the idea is to replicate your app on several servers (at least two) and put a load balancer in front of them that routes incoming requests among the available servers. All servers share the same database instance. This gives you horizontal scalability and also improves the availability of your system.
I only see the positive side of it, but somehow I'm not able to figure out the negative aspects.
Distributing your business logic will probably involve refactoring your application code; even if the system is working well today, you will certainly introduce some bugs.
The necessary remote calls will add latency, and the fact that you execute your business logic on several servers doesn't solve performance problems in the presentation tier.
In Expert One-on-One J2EE Development Without EJB (page 65), you can find a good discussion of why not to distribute your business logic.
On 32-bit systems, the JVM has a memory limit of 1.5 to 2 GB. What is a good value for JVM memory on 64-bit Linux? How can that be mapped to the maximum number of threads and maximum requests in Tomcat?
I am using JDK 6+ and Tomcat 7. The available RAM will be 12 GB on a quad-core processor.
MRD
I don't think there's an out-of-the-box answer to this question. It depends heavily on what kind of applications you are going to host and how much load there will be on your system. I administer a small server with 3-4 applications on a 64-bit Linux system; 4 GB is more than enough for me.
My advice is to make a rough guess at how much RAM your applications require, then start Tomcat with a monitoring tool attached and watch how much load there is on it. You might have allocated too many resources for Tomcat, or maybe too few; you never know.
Please read this article on simultaneous users, and also the article about load balancing in Tomcat.
Basically, you have to differentiate between users and requests. You might have 5,000 users browsing your site, but only 100 requesting a new page at any one moment. By default, Tomcat supports 200 concurrent request-processing threads (the maxThreads default), but this number can be changed in your Tomcat configuration. Obviously you might need more hardware. In the second article, a maximum of 200 requests per Tomcat instance is recommended. Only the simple calculation rules mentioned in the second article, plus some monitoring, can help.
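Those limits are just configuration. As an illustration, if this were an embedded Tomcat inside a Spring Boot app (assuming Spring Boot 2.3+ property names), the equivalent knobs would live in application.yml; for standalone Tomcat 7, the same settings go on the Connector element in server.xml:

```yaml
server:
  tomcat:
    threads:
      max: 200        # maximum concurrent request-processing threads
    accept-count: 100 # connection queue length once all threads are busy
```

The right numbers depend entirely on per-request CPU and memory cost, which is what the monitoring above is for.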
There's even a load balancer manager for Tomcat; check it out: load balancer for tomcat.
One more thing to think about: even if you have the hardware and the right load balancing to support 5,000 users, you also need enough bandwidth to do so. Again, this is explained in the second article, "load balancing in tomcat".
Good luck
It depends on how many users will visit your application simultaneously. Sometimes the app will run very slowly at a particular point in time; for instance, at 8:00 AM the login action may bring the app to its knees. I suggest you estimate the average memory per user and scale it by the total number of users; then you may arrive at a nearly right memory setting.
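A sketch of that estimate as a formula (all numbers are illustrative assumptions, not measurements):

```latex
\text{heap} \approx \text{baseline} + N_{\text{concurrent users}} \times m_{\text{per user}}
            = 1\ \text{GB} + 5000 \times 1\ \text{MB} \approx 6\ \text{GB}
```

which would fit comfortably in the 12 GB of RAM mentioned in the question, leaving room for the OS and other processes.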
I want to gain more insight into the scale of workload a single-server Java web application deployed to a single Tomcat instance can handle. In particular, let's pretend that I am developing a wiki application that has a usage pattern similar to Wikipedia's. How many simultaneous requests can my server handle reliably before running out of memory or showing signs of excessive stress, if I deploy it on a machine with the following configuration:
4-Core high-end Intel Xeon CPU
8GB RAM
2 HDDs in RAID-1 (No SSDs, no PCIe based Solid State storages)
Red Hat or CentOS Linux (64-bit)
Java 6 (64-bit)
MySQL 5.1 / InnoDB
Also let's assume that the MySQL DB is installed on the same machine as Tomcat and that all the Wiki data are stored inside the DB. Furthermore, let's pretend that the Java application is built on top of the following stack:
Spring MVC for the front-end
Hibernate/JPA for persistence
Spring for DI and Security, etc.
If you haven't used the exact configuration but have experience in evaluating the scalability of a similar architecture, I would be very interested in hearing about that as well.
Thanks in advance.
EDIT: I think I have not articulated my question properly. I'll mark the answer with the most upvotes as the best answer, and I'll rewrite my question in the community wiki area. In short, I just wanted to learn about your experiences with the scale of workload your Java application has been able to handle on one physical server, along with some description of the type and architecture of the application itself.
You will need to use a group of tools:
Load testing tool - JMeter can be used.
Monitoring tool - this will be used to monitor the load on various resources. There are lots of paid as well as free ones: JProfiler, VisualVM, etc.
Collection and reporting tool (I haven't used any particular one).
With the above tools you can find the optimal value. I would approach it in the following way:
Get to know what the ratio of pages being accessed should be, and what the background processes and their frequencies are.
Configure JMeter accordingly (for those ratios) and monitor performance under the applied load (time to serve a page can be measured in JMeter); monitor other resources using the monitoring tool. Also track the error ratio. (NOTE: you need to decide what error ratio is unacceptable.)
Keep increasing the load step by step, and keep recording the various numbers of interest, until the server fails completely.
You can decide on the optimal value based on many criteria: low error rate, maximum serving time, etc. JMeter supports lots of ways to apply load.
To be honest, it's almost impossible to say. There are probably about 3 ways (off the top of my head) to build such a system, and each would have fairly different performance characteristics. Your best bet is to build and test.
First, try to get some idea of the estimated volumes you'll have and the latency constraints you'll need to meet.
Come up with a basic architecture and implement a thin slice end to end through the system (ideally the most common use case). Use a load testing tool (like The Grinder or Apache JMeter) to inject load and start measuring performance. If the performance is acceptable - be conservative, as your simple implementation will likely include less functionality and be faster than the full system - continue building the system and testing to make sure you don't introduce a major performance bottleneck. If not, come up with a different design.
If your code is reasonable, the bottleneck will likely be the database, somewhere in the region of hundreds of DB operations per second. If that is insufficient, you may need to think about caching.
Definitely take a look at Spring Insight for performance monitoring and analysis.
English Wikipedia has 14 GB of data. An 8 GB memory cache would have a very high hit ratio, and I think hard disk reads would be well within capacity. Therefore, the app is most likely network bound.
English Wikipedia gets about 3,000 page views per second. It is possible that Tomcat could handle the load with careful tuning, and that the network would have enough throughput to serve the traffic.
So could the entire Wikipedia site be hosted on one moderate machine? Probably not. Just an idea.
Sources:
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm
A single Tomcat instance doesn't spread over multiple machines. If you really are concerned about scalability, you must consider what to do when your application outgrows a single machine.