AppEngine performance tuning with idle instances and pending latency settings - java

I have a paid App Engine app that averages around 200 visits per day (1000 page views; sporadically it spikes to 1000 visits and 10000 page views or more), and I currently wake it up via a cron job every 5 minutes to ensure reasonable performance. This doesn't always work during App Engine latency spikes (fortunately these haven't happened too often lately), and when that happens my AJAX calls time out miserably. The cron-job strategy is also not ideal because it eats away at the quotas.
At the moment I have Idle Instances and Pending Latency settings all on "Automatic".
Does anyone have experience with manually tweaking those settings and what are some typical values that could guarantee better performance on my app given the traffic?

Instead of a cron job, just set Idle Instances to 1. Idle Instances are instances held in "reserve", giving you an instant response to increased load. So if you have a load that requires three instances and you set Idle Instances to one, you will have four instances running.
The downside is that you'll always be paying for one more instance than is currently utilised. However, keep in mind you get 28 free instance hours a day, which covers one Idle Instance for free (except for the times when you also have an instance actually serving requests; then the extra Idle Instance becomes an additional cost).
Also, if you have Idle Instances set, then Pending Latency will have little or no effect, since Pending Latency is only consulted when new instances need to be started, and you always have one instance in reserve. Caveat: this may not be true if your app code goes haywire or is poorly written (like calling external services inside a request handler), resulting in abnormally long response times.
Bottom line: set Idle Instances to 1, then set Pending Latency to the maximum value that is still acceptable for your app.
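For a Java app, these knobs can also be set in appengine-web.xml rather than via the console sliders; something along these lines (element names as in the App Engine automatic-scaling docs, values purely illustrative and to be tuned to your traffic):

<appengine-web-app xmlns="http://appspot.com/ns/1.0">
  <threadsafe>true</threadsafe>
  <automatic-scaling>
    <!-- keep one warm instance in reserve -->
    <min-idle-instances>1</min-idle-instances>
    <!-- illustrative: the longest queue wait your AJAX calls can tolerate -->
    <max-pending-latency>5s</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>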

Related

Gatling (and JMeter) struggling to maintain Requests per Second (RPS)?

I'm load testing an API. We have a problem - our response times are too long, sometimes close to a minute. We want to be in the range of under a second. But that is beside the point.
When I use a load testing tool such as Gatling, the RPS being sent seems to grind to a halt. As you can see in the attached image, there is an initial 15 seconds of 20 RPS, and then suddenly almost no RPS at all. How can I maintain a constant RPS? It probably has to do with the poor response times, but what if I don't care about the response times? I just want the RPS to stay constant.
My initial tests with JMeter also show similar behaviour.
What injection strategy are you using? What does your scenario look like? Is every user making one request, a chain of requests, or either of the above in a loop?
Assuming that you want to test a single endpoint, the best approach to get constant requests per second (not constant responses, as you already know) is to use a scenario that executes a single request and an injection strategy that adds a constant number of users per second, e.g.:
setUp(
  scn.inject(constantUsersPerSec(25) during (15 minutes))
)
If your user performs more than one request, there is an option to throttle requests, but you need to remember that it will only throttle down, not up, so you have to make sure that the active users produce enough requests per second to reach that limit, e.g.:
setUp(scn.inject(
  constantUsersPerSec(10) during (15 minutes)
).throttle(
  jumpToRps(25), holdFor(15 minutes)
))
So here, if e.g. a single user makes 5 requests, you could reach as much as 50 req/s, but it will be throttled to 25. Remember, though, that new users are added every second, so if it takes longer to finish one user, the number of active users will grow. Also, if response times are high, the active users may not produce enough req/s, since they spend most of their time waiting for responses.
In JMeter, you can achieve that by using a Constant Throughput Timer at the test plan level.
The Constant Throughput Timer lets you maintain the throughput against your server (requests/sec). It can only pause JMeter threads in order to slow them down to the target throughput, and it works only at the minute level, so you need to calculate the ramp-up period properly and let your test run long enough.
Here's a brief walkthrough:
To achieve the target throughput, you need enough threads in your test plan.
To calculate the number of threads you need for this test, you can use this formula:
RPS * max response time in seconds
In your case, if you want 20 RPS and your max response time is 60 seconds, you need at least 1200 (20 * 60 = 1200) threads in your test plan.
As the Constant Throughput Timer works at the minute level, to achieve 20 RPS you have to configure the "Target Throughput" value as 1200/min and "Calculate Throughput based on" as "All active threads".
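The same arithmetic as a quick Java sketch (the numbers are just the 20 RPS / 60 s example above):

public class CttMath {
    public static void main(String[] args) {
        int targetRps = 20;              // desired requests per second
        int maxResponseTimeSec = 60;     // worst-case response time

        int threadsNeeded = targetRps * maxResponseTimeSec;   // 1200 threads
        int targetThroughputPerMin = targetRps * 60;          // 1200/min for the timer

        System.out.println("Threads needed: " + threadsNeeded);
        System.out.println("Constant Throughput Timer target: " + targetThroughputPerMin + "/min");
    }
}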
Constant Throughput Timer Config:
Now, if you have more than a single request in your test plan (e.g. 4 requests), then the 1200 requests/min will be distributed among the 4 samplers. That means you will get 5 RPS for each sampler.
Now, for the Thread Group configuration: since you set "Calculate Throughput based on" in the Constant Throughput Timer to "All active threads", all of your 1200 threads need to be started to achieve that 20 RPS. Use the Ramp-Up Period to control how those threads start.
The Ramp-Up Period is the time over which all the threads arrive at the application under test. So if you use 60 seconds, it will take 60 seconds to start all 1200 threads; after that they are all active.
You also need to set your test duration accordingly. Say you want to hold that 20 RPS for 5 minutes; in that case you have to set the test duration to 7 minutes (the extra 2 minutes cover the 1-minute ramp-up of the 1200 threads and the final 1-minute ramp-down). Don't forget to set the loop count to Forever if you are using a plain Thread Group.
Thread Group Config for the above-mentioned scenario:
You can also use another handy JMeter plugin, Ultimate Thread Group, if you find the default Thread Group configuration confusing. You can install JMeter plugins by using the JMeter Plugins Manager.
Here is the Ultimate Thread Group Config for the above-mentioned scenario:
After the test finishes, you can inspect those 5 minutes during which all 1200 threads were active by using the Hits Per Second listener as well as the Active Threads Over Time listener.
Do not use the JMeter GUI for load testing; use non-GUI mode. Also, remove any assertions from your test plan while you're trying to hit a target RPS.

How does AWS Lambda serve multiple requests?

How does AWS Lambda serve multiple requests?
I want to know is it a multi-thread kind of a model here as well?
If I am calling a Lambda from API Gateway and there are 1000 requests to the API in 10 seconds, how many containers will be created and how many threads?
How does AWS Lambda serve multiple requests?
Independently.
I want to know is it a multi-thread kind of a model here as well?
No, it is not a multi-threaded model in the sense that you are asking.
Your code can, of course, be written to use multiple threads and/or child processes to accomplish whatever purpose it is intended to accomplish for one invocation, but Lambda doesn't send more than one invocation at a time to the same container. The container is not used for a second invocation until the first one finishes. If a second request arrives while a first one is running, the second one will run in a different container.
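To make that concrete, here is a minimal, hypothetical Java handler using the aws-lambda-java-core interfaces; the static counter only ever increments sequentially within one container, because Lambda never delivers two invocations to the same container at once (state in static fields survives only while that container stays warm):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler: fields are initialized once per container, then reused
// by successive (never concurrent) invocations routed to that container.
public class CounterHandler implements RequestHandler<Map<String, Object>, String> {

    // Survives between invocations only while this container stays warm.
    private static final AtomicInteger invocationsInThisContainer = new AtomicInteger();

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        int n = invocationsInThisContainer.incrementAndGet();
        // Two overlapping requests never run here at the same time;
        // the second one is routed to a different (possibly new) container.
        return "invocation #" + n + " of " + context.getFunctionName();
    }
}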
If I am calling a Lambda from an API gateway. And there are 1000 requests in 10 secs to the API. How many containers will be created and how many threads?
As many containers will be created as are needed to process each of the arriving requests in its own container.
The duration of each invocation will be the largest determinant of this.
1000 very quick requests in 10 seconds are roughly equivalent to 100 requests in 1 second. Assuming each request finishes in less than 1 second and arrival times are evenly distributed, you could expect fewer than 100 containers to be created.
On the other hand, if 1000 requests arrived in 10 seconds and each request took 30 seconds to complete, you would have 1000 containers in existence during this event.
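You can put rough numbers on that reasoning with a back-of-the-envelope estimate (peak in-flight requests are roughly the arrival rate times the per-request duration, capped by the total number of requests); the 0.5 s figure below is an assumed "fast request" case, the rest are the numbers from above:

public class ContainerEstimate {
    public static void main(String[] args) {
        long totalRequests = 1000;
        double windowSeconds = 10.0;
        double arrivalRatePerSec = totalRequests / windowSeconds;   // 100 req/s

        for (double perRequestSeconds : new double[] {0.5, 30.0}) {
            // peak in-flight ~= arrival rate * duration, capped by the total
            double inFlight = Math.min(totalRequests, arrivalRatePerSec * perRequestSeconds);
            System.out.printf("%.1f s per request -> ~%.0f concurrent containers%n",
                    perRequestSeconds, inFlight);
        }
        // 0.5 s per request -> ~50 containers; 30 s per request -> ~1000 containers
    }
}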
After a spike in traffic inflates the number of containers, they will all tend to linger for a few minutes, ready to handle the additional load if it arrives, and then Lambda will start terminating them.
AWS Lambda serves multiple requests by scaling horizontally across multiple containers. By default, Lambda supports up to 1000 parallel container executions.
there are 1000 requests in 10 secs to the API. How many containers will be created and how many threads.
Requests per second = 1000/10 = 100
There will be 100 parallel Lambda executions assuming each execution takes 1 second or more to complete.
Note: you can also spawn multiple threads, but it's difficult to predict the performance gain.
Also keep in mind that having multiple threads is not always efficient. The CPU available to your Lambda function is shared between all threads and processes your Lambda function creates. Generally you will not get more CPU in a Lambda function by running work in parallel among multiple threads. Your code in this case isn’t actually running on two cores, but on two “hyperthreads” on a single core; depending on the workload, this may be better or worse than a single thread. The service team is looking at ways to better leverage multiple cores in the Lambda execution environment, and we will take your feedback as a +1 for that feature.
Reference: AWS Forum Post
For further details on concurrent executions of Lambda, refer to the AWS documentation.
There are a few angles to discuss.
AWS Lambda does support handling requests in parallel, but any single instance / container of a Lambda will only process one request at a time. If all existing instances are busy then new ones will be provisioned (depending on concurrency settings, discussed below).
Within a single Lambda instance multi-threading is supported, but still only one request will be handled per instance. In practice parallelization is rarely beneficial in Lambda: it adds significant overhead and is best reserved for processing very large sets. Additionally, a Lambda needs more than one virtual core for it to have any benefit; cores are allocated by raising the memory setting, and many Lambdas run with a memory setting low enough to have just one core.
Determining exactly how many containers / instances are created isn't always possible due to there being many factors:
Lambda will reuse any existing, paused, instances
Existing instances are often very fast to handle requests; a small number of warm instances can process many, many requests in the time it takes to provision new ones (especially with runtimes like Java or .NET Core, which often have startup times of 1+ seconds)
The concurrency settings of your Lambda are a significant factor
If you have Reserved Concurrency of X, you will never have more than X instances
If you have unreserved concurrency, then the limit is based on available concurrency. This defaults to 1000 instances per account, so if 990 instances of any Lambdas already exist then only 10 could be created
If you have provisioned concurrency then you will always have a minimum number of instances, reducing cold-starts
But, to try to answer your story problem, let's assume you send your 1000 requests at a steady pace over 10 minutes. That's one request every 600 milliseconds. Let's also assume your Java app is given a fairly high memory allocation and its initialization is relatively quick -- say 1 second for a cold start. Once the cold start is complete, invocation is fast -- say 10 ms. And let's assume there are no instances when the traffic begins.
The first request will see a response time of ~1,010ms -- 1 second for a cold start, and 10ms for handling the request. A second request will arrive while the first is still processing, so it's likely that Lambda will provision a second instance, and the second request will see a similar response time.
By the time the third request comes in (1,200 ms after the start) the first instance is already idle again and can be reused -- so this request will not experience a cold start, and the response time will be 10 ms. From this point forward it's likely that no additional instances are needed -- but this all assumes a steady rate of requests.
But--changing any variable can have a big impact.

Hashing Password in Google App Engine and Instance Hours Quota

I've been reading a lot about password storage, hashing, salting, "peppering", MACs, etc. because I'm about to build a new website and security is really important to me. There are reasons why I'm considering not using Google Authentication (or Facebook, OpenID or any other) which are not relevant right now, but it brings me to this point.
I'm new to Google App Engine (this is going to be my first project on it), and I'm a little confused about "Instance Hours" and how there is no longer a "CPU time" quota but the aforementioned one instead. Even worse, I haven't been able to understand what the Instance Hours free quota is.
Here's why I'm worried about the quotas and what that has to do with my security concerns: one recommendation I've read everywhere is to make multiple iterations and hash the password several times, because that would make an attacker spend much, much more time (I don't have numbers, but they are all over https://security.stackexchange.com/).
Multiple iterations have a direct impact on CPU time, and if GAE had a CPU time quota, I think making 1000 iterations every time a user logs in could be a problem. However, what they count instead is Instance Hours, from the moment the request is made up to fifteen minutes later, and as the GAE quota docs read:
In general, instance usage is billed on an hourly basis based on the instance's uptime. Billing begins when the instance starts and ends fifteen minutes after the instance shuts down. You will be billed only for idle instances up to the number of maximum idle instances set in the Performance Settings tab of the Admin Console. Runtime overhead is counted against the instance memory.
then it means that if my users log in (hashing 1000 times) and then continue to use the site, the Instance Hours will keep accumulating until all of them leave the page, plus 15 minutes? If that is true, then iterating 1000 times wouldn't have a significant impact on my quota, other than the "extra" time it takes for the user to log in, but I'm aware of that and it's a price I'm willing to pay.
I'll pick the number of iterations that keeps the login time acceptable and imperceptible to the user, so don't worry about that.
My questions are:
Will making MANY iterations have a direct impact on the Instance Hours, or are my assumptions about how the Instance Hours are summed correct?
Is there a CPU time quota on Google App Engine I'm missing somehow? Does it have a Free Quota?
What is the Instance Hours Free Quota?
Answers:
See Moishe's accepted answer and the other question he asked (which has not been answered but has useful comments): When does the App Engine scheduler use a new thread vs. a new instance?
According to Google there is no CPU time quota: http://googleappengine.blogspot.com.es/2009/02/skys-almost-limit-high-cpu-is-no-more.html
Found an answer to question number 3 here: Google App Engine Frontend Instance Hours Limit Reached
If it takes a long time to process a request, e.g. because you're doing something very computationally intensive, and you don't want other users to wait a long time, the App Engine scheduler may spin up another instance of your application to serve incoming requests.
Imagine that computing the hash for a password takes 1 minute and during that minute your application gets a request from another user. That user could wait for a minute to get a response to their request, or the App Engine scheduler could spin up another instance to service that request and get a response back much sooner. You can tune whether or not another instance will come up using the Performance sliders on your Application Settings page in the admin console.
Basically the question you need to ask about instance hours is: is it likely you'll get overlapping requests (i.e. a new request coming in before the current request is complete)? If this happens not infrequently, and you want snappy responses for your users, you'll need to budget more instance hours.
I suspect that the big computation you'll need to do will be infrequent -- only on initial sign-in to generate a cookie, say, rather than for every request.
To explicitly answer your question #1, making many iterations will only have an effect on your instance hours if it causes overlapping requests. If you only get one request every 30 seconds, you could spend 30 seconds serving each request (including calculating each hash, and doing other operations) and not exceed your free instance-hours quota. Conversely if you get 10 requests per second and spend any more than 100ms serving the request, then you'll start to exceed your instance hours fairly quickly.
Instance hours accrue for as long as the server is running, answering requests, and so on. If your server isn't running, it can't wake up on a request or anything.
Imagine instance hours as having the computer on. You are billed when it's on, and not when it's off.
You could have multiple instances, so let's say you have two instances, you're burning twice as many instance hours.
Your password hashing won't affect this, because it will only incur instance hours when the instance is on; when it's off, it won't be incurring any instance hours, but it won't be hashing either.
There are multiple sources covering passwords. You evidently have read some that encourage multi-pass hashing. Consider the first link below before finalizing this decision. Excerpt from this page: "It's easy to get carried away and try to combine different hash functions, hoping that the result will be more secure. In practice, though, there is no benefit to doing it. All it does is create interoperability problems, and can sometimes even make the hashes less secure."
Two valuable links to consider( first has quote above, second is good "how to" source):
http://crackstation.net/hashing-security.htm
http://throwingfire.com/storing-passwords-securely/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+throwingfire+%28Throwing+Fire%29#notpasswordhashes
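In the spirit of those links (which recommend a standard, deliberately slow KDF over hand-rolled multi-pass hashing), here is a minimal Java sketch using the JDK's built-in PBKDF2; the iteration count, key length and salt size are placeholders you would tune so that login latency stays acceptable:

import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHashing {
    // Illustrative values; tune ITERATIONS so login latency stays acceptable on your instances.
    private static final int ITERATIONS = 10_000;
    private static final int KEY_LENGTH_BITS = 256;
    private static final int SALT_BYTES = 16;

    public static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    public static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        return factory.generateSecret(spec).getEncoded();
    }
}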

Quartz Performance

It seems there is a limit on the number of jobs that the Quartz scheduler can run per second. In our scenario we have about 20 jobs per second firing, 24x7. Quartz worked well up to 10 jobs per second (with 100 Quartz threads and a database connection pool size of 100 for a JDBC-backed JobStore); however, when we increased it to 20 jobs per second, Quartz became very slow and its triggered jobs fired very late compared to their scheduled time, causing many misfires and eventually slowing down the overall performance of the system significantly. One interesting fact is that JobExecutionContext.getScheduledFireTime().getTime() for such delayed triggers comes out to be 10-20 minutes or more after their scheduled time.
How many jobs can the Quartz scheduler run per second without affecting the scheduled time of the jobs, and what would be the optimum number of Quartz threads for such a load?
Or am I missing something here?
Details about what we want to achieve:
We have almost 10k items (categorized among 2 or more categories; in the current case we have 2 categories) on which we need to do some processing at a given frequency, e.g. 15, 30, 60... minutes, and these items should be processed within that frequency with a given throttle per minute. For example, for a 60-minute frequency, 5k items for each category should be processed with a throttle of 500 items per minute. So ideally these items should be processed within the first 10 (5000/500) minutes of each hour of the day, with each minute having 500 items to process, distributed evenly across each second of the minute, so we would have around 8-9 items per second for one category.
Now, to achieve this we use Quartz as the scheduler, which triggers jobs for processing these items. However, we don't process each item within the Job.execute method, because item processing involves a webservice call and would take 5-50 seconds (30 seconds on average). Instead we push a message for each item onto a JMS queue and separate server machines process those jobs. I have observed that the Job.execute method itself takes no more than 30 milliseconds.
Server Details:
Solaris SPARC 64-bit server with an 8-core/16-thread CPU and 16 GB RAM for the scheduler; we have two such machines in the scheduler cluster.
In a previous project, I was confronted with the same problem. In our case, Quartz performed well up to a granularity of one second. Sub-second scheduling was a stretch and, as you are observing, misfires happened often and the system became unreliable.
We solved this issue by creating two levels of scheduling: Quartz would schedule a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system gets this job 'set' to execute. The n tasks in the set are then taken over by a "micro-scheduler": basically a timing facility that uses the native JDK API to further time the jobs down to roughly 10 ms granularity.
To handle the individual jobs, we used a master-worker design, where the master was taking care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.
If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:
import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ScheduledThreadPoolExecutor scheduledExecutor;
...
scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...
// Evenly spread the execution of a set of tasks over a period of time
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...

    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size();
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        // schedule each task 'delay' ms after the previous one
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}
This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. check for failing tasks, manage retries if supported, etc.)
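For the numbers in the question, usage could look roughly like this (hypothetical sketch: loadItemsDueThisMinute() stands in for however you build the 500 tasks for the current minute, and Task is assumed to implement Runnable):

// Spread this minute's 500 items evenly across the next 60 seconds (~8-9 per second)
Set<Task> itemsDueThisMinute = loadItemsDueThisMinute(); // assumed helper, ~500 tasks
schedule(itemsDueThisMinute, 1, TimeUnit.MINUTES);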
With some testing + tuning, we found an optimal balance between the Quartz jobs and the amount of jobs in one scheduled set.
We experienced a 100X throughput improvement in this way. Network bandwidth was our actual limit.
First of all, check "How do I improve the performance of JDBC-JobStore?" in the Quartz documentation.
As you can probably guess, there is no absolute and definitive metric; it all depends on your setup. However, here are a few hints:
20 jobs per second means around 100 database queries per second, including updates and locking. That's quite a lot!
Consider distributing your Quartz setup into a cluster. However, if the database is the bottleneck, that won't help you. Maybe TerracottaJobStore will come to the rescue?
With K cores in the system, anything fewer than K threads will underutilize it. If your jobs are CPU intensive, K threads is fine. If they call external web services, block or sleep, consider much bigger values. However, more than 100-200 threads will significantly slow your system down due to context switching.
Have you tried profiling? What is your machine doing most of the time? Can you post a thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
You should limit your number of threads to somewhere between n and n*3 where n is the number of processors available. Spinning up more threads is going to cause a lot of context switching, since most of them will be blocked most of the time.
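A quick Java sketch of that rule of thumb:

// n .. 3n worker threads, where n = number of available processors
int n = Runtime.getRuntime().availableProcessors();  // available cores
int minWorkerThreads = n;                            // lower bound of the rule of thumb
int maxWorkerThreads = n * 3;                        // upper bound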
As far as jobs per second, it really depends on how long the jobs run and how often they're blocked for operations like network and disk io.
Also, something to consider is that perhaps quartz isn't the tool you need. If you're sending off 1-2 million jobs a day, you might want to look into a custom solution. What are you even doing with 2 million jobs a day?!
Another option, which is a really bad way to approach the problem but sometimes works... what server is it running on? Is it an older server? Bumping up the RAM or other specs might give you some extra 'oomph'. Not the best solution, for sure, because it delays the problem rather than addressing it, but if you're in a crunch it might help.
In situations with a high number of jobs per second, make sure your SQL server uses row-level locking and not table locking. In MySQL this is done by using the InnoDB storage engine rather than the default MyISAM storage engine, which only supports table-level locking.
Fundamentally the approach of doing 1 item at a time is doomed and inefficient when you're dealing with such a large number of things to do within such a short time. You need to group things - the suggested approach of using a job set that then micro-schedules each individual job is a first step, but that still means doing a whole lot of almost nothing per job. Better would be to improve your webservice so you can tell it to process N items at a time, and then invoke it with sets of items to process. And even better is to avoid doing this sort of thing via webservices and process them all inside a database, as sets, which is what databases are good for. Any sort of job that processes one item at a time is fundamentally an unscalable design.
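A rough sketch of that batching idea in plain Java (the batch size is an arbitrary assumption; whether a batch becomes one JMS message or one webservice call is up to your design):

import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Split a large work list into fixed-size batches so each job/webservice call
    // handles many items instead of one.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }
}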

Optimise billing on app engine for continual polling

I'm creating a service on App Engine that feeds measurements back to the user. The measurements are collected by polling another server every fifteen minutes (the user needs four measurements over the last hour). The other server replies with the data immediately, so this isn't a "long poll request". I don't expect a high load on the server because there aren't a lot of users (maybe 20 requests a day or so), so there won't be many requests coming in for the data, but because the user needs data over the last hour I am forced to poll continuously. This makes me concerned about billing, because the new billing system charges per instance hour at 15-minute granularity, and this would mean I'd have an instance actively running 24/7 (as far as I can tell).
Question
So, I expect a low request rate and am not too concerned about latency etc. How can I optimise this setup for the lowest possible billing?
What I had planned
What I was planning to do was try to stay within the free quota for now by setting max idle instances to 1 and using only the frontend for both polling and serving (I'm guessing site responsiveness will suffer a fair amount), because the frontend has far more free instance hours (28) than the backend (9). Can the frontend even be set up to poll every 15 minutes?
There's nothing you can really tweak here for this. You'll want to use cron or the task queue for the polling anyway; these use frontend instances, not backend instances. As long as you have multithreading enabled, frontend latency will not be affected, and you'll likely remain within your free quota as long as you don't do enough polling or get enough traffic to require more than one concurrent instance.
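For reference, a 15-minute cron entry for a Java app goes in WEB-INF/cron.xml, roughly like this (the URL is hypothetical; the servlet behind it would do the polling on a frontend instance):

<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/tasks/poll-measurements</url>
    <description>Poll the measurement server for new data</description>
    <schedule>every 15 minutes</schedule>
  </cron>
</cronentries>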
