Optimise billing on app engine for continual polling


I'm creating a service on App Engine that feeds back measurements to the user. The measurements are collected by polling another server every fifteen minutes (the user needs four measurements over the last hour). The other server replies with the data immediately, so this isn't a "long poll" request. I don't expect a high load on the server because there aren't many users (maybe 20 requests a day or so), so there won't be many requests coming in for the data; but because the user needs data over the last hour, I am forced to poll continuously. This makes me concerned about billing, because the new billing system charges per instance hour at a 15-minute granularity, and as far as I can tell this would mean I'd have an instance actively running 24/7.
Question
So, I expect a low request rate and am not too concerned about latency etc. How can I optimise this setup for the lowest possible billing?
What I had planned
What I was planning to do was try to get away with the free quota for now by setting max idle instances to 1 and using only the frontend for both polling and serving (I'm guessing site responsiveness will suffer a fair amount), because the frontend has far more free instance hours (28) than the backend (9). Can the frontend even be set up to poll every 15 minutes?

There's nothing you can really tweak here for this. You'll want to use cron or the task queue for the polling anyway; these use frontend instances, not backend instances. As long as you have multithreading enabled, frontend latency will not be affected, and you'll likely remain within your free quota as long as you don't do enough polling or get enough traffic to require more than one concurrent instance.
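A minimal sketch of the data side of that setup, assuming the cron job (scheduled "every 15 minutes" in cron.xml) hits a handler that fetches one reading and appends it here, while the user-facing handler reads the last four. `MeasurementWindow` is a hypothetical name, and keeping readings in instance memory is only for illustration: an idle instance can be shut down, so a real app would persist them (e.g. in the Datastore). The methods are `synchronized` because, with multithreading enabled, the cron request and user requests can run concurrently on the same instance.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical holder for the most recent readings. A cron-triggered handler
 * calls add(...) with each value fetched from the other server; the
 * user-facing handler calls snapshot().
 */
class MeasurementWindow {
    private final int capacity;
    private final Deque<Double> readings = new ArrayDeque<>();

    MeasurementWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Stores a new reading, dropping the oldest once the window is full. */
    synchronized void add(double reading) {
        if (readings.size() == capacity) {
            readings.removeFirst();
        }
        readings.addLast(reading);
    }

    /** Returns a copy of the readings, oldest first. */
    synchronized List<Double> snapshot() {
        return new ArrayList<>(readings);
    }
}
```

With a capacity of 4 and a 15-minute schedule, the window covers roughly the last hour.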


How many active users maximum does a Parse.com database allow on the free plan?

I am exploring Parse.com and building a proof-of-concept web application using Parse.com data storage. I keep receiving a Daily App Stats email. That's OK, but I noticed that one of my database applications had 8 ACTIVE USERS and it reported 14% of the monthly quota. I tried searching around Parse.com for its meaning but didn't find anything.
My question is how many users at maximum a database can handle on the free plan? My app will have as many as 300+ users.

My reading of the pricing and FAQ pages is that you are not charged based on the number of users you have. Rather, you are charged based on your aggregate requests per second, file storage, database storage and data transfer.
The FAQ has this advice:
Q: Does requests per second roughly translate to # of concurrent users? I anticipate millions of users but I can't afford to pay $10M per month.
A: Generally when your user count doubles, your requests per second also double. However, different apps send different numbers of requests per second depending on how frequently they save objects or issue queries. We estimate that the average app's active user will issue 10 requests. Thus, if you had a million users on a particular day, and their traffic was evenly spread throughout the day, you could estimate your app would need about 10,000,000 total API requests, or about 120 requests per second. Every app is different, so we strongly encourage you to measure how many requests your users send. You can see this in the Performance Analytics tab of your account.
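The FAQ's estimate is plain arithmetic: 1,000,000 users × 10 requests = 10,000,000 requests, and 10,000,000 / 86,400 seconds ≈ 116 requests per second, which the FAQ rounds to about 120. A sketch of the same calculation (class and method names are illustrative), assuming traffic is spread evenly over the day; real traffic peaks above this average.

```java
/** Back-of-the-envelope load estimate following the FAQ's arithmetic. */
final class LoadEstimate {
    private static final double SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    private LoadEstimate() {}

    /** Average requests per second if the day's traffic were spread evenly over 24 hours. */
    static double averageRequestsPerSecond(long dailyActiveUsers, double requestsPerUser) {
        return dailyActiveUsers * requestsPerUser / SECONDS_PER_DAY;
    }
}
```

For the 300 users in the question at the FAQ's 10 requests each, `averageRequestsPerSecond(300, 10)` is about 0.035 requests per second on average.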
So ....
My question is how many users at maximum a database can handle on the free plan? My app will have as many as 300+ users.
There is no limit on the number of users, but if you have lots of simultaneous active users, that is likely to translate into lots of requests per second. The actual ratio depends on your application and what your users are doing, which is impossible for us to predict.
However, if 8 active users take about 15% of the allowed free usage, then a simple linear extrapolation says that you are liable to need to start paying for capacity when you reach about 50 to 60 users. But that is little better than a guess, and it takes no account of how evenly the usage is spread over a typical 24-hour period.
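The extrapolation is a single division: if 8 users used 14% of the quota, 100% corresponds to about 8 / 0.14 ≈ 57 users (8 / 0.15 ≈ 53). A sketch with an illustrative name, assuming usage scales linearly with active users, which is exactly the assumption the answer warns about:

```java
final class QuotaExtrapolation {
    private QuotaExtrapolation() {}

    /**
     * Linearly extrapolates how many active users would consume 100% of the
     * free quota from one observation (users, fraction of quota used).
     */
    static double usersAtFullQuota(int observedUsers, double observedQuotaFraction) {
        if (observedUsers <= 0 || observedQuotaFraction <= 0) {
            throw new IllegalArgumentException("need a positive observation");
        }
        return observedUsers / observedQuotaFraction;
    }
}
```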

What's a good way to enforce a single rate limit on multiple machines?

I have a web service with a load balancer that maps requests to multiple machines. Each of these requests ends up sending an HTTP call to an external API, and for that reason I would like to rate limit the number of requests I send to the external API.
My current design:
Service has a queue in memory that stores all received requests
I rate limit how often we can grab a request from the queue and process it.
This doesn't work when I'm using multiple machines, because each machine has its own queue and rate limiter. For example: when I set my rate limiter to 10,000 requests/day, and I use 10 machines, I will end up processing 100,000 requests/day at full load because each machine processes 10,000 requests/day. I would like to rate limit so that only 10,000 requests get processed/day, while still load balancing those 10,000 requests.
I'm using Java and MySQL.

Use memcached or Redis to keep an API request counter per client, and check on every request whether the client is over the rate limit.
If you think checking on every request is too expensive, you can try Storm to process the request log and calculate the request counter asynchronously.
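A minimal sketch of the counter idea, assuming a fixed daily window per client. The `ConcurrentHashMap` stands in for Redis or memcached so the example runs on its own; in the real setup every machine would increment the same key in the shared store (Redis `INCR`, memcached `incr`), which is what makes the limit global across machines. The class name and the `client:date` key scheme are assumptions.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Per-client daily counter; the map is a local stand-in for a shared store. */
class SharedRateLimiter {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private final long dailyLimit;
    private final Clock clock;

    SharedRateLimiter(long dailyLimit, Clock clock) {
        this.dailyLimit = dailyLimit;
        this.clock = clock;
    }

    /** Increments the client's counter for today and returns true if it is still within the limit. */
    boolean tryAcquire(String clientId) {
        String key = clientId + ":" + LocalDate.now(clock); // one key per client per day
        long count = counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        return count <= dailyLimit;
    }
}
```

With Redis you would also set an expiry on each key so old days' counters disappear; the in-memory map above never evicts them.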

The two things you stated were:
1)"I would like to rate limit so that only 10,000 requests get processed/day"
2)"while still load balancing those 10,000 requests."
First off, it seems like you are using a divide and conquer approach where each request from your end user gets mapped to one of the n machines. So, for ensuring that only the 10,000 requests get processed within the given time span, there are two options:
1) Implement a combiner that routes the results from all n machines to another endpoint, which is the one that talks to the external API. This endpoint can keep a count of the number of jobs being processed and, if it's over your threshold, reject the job.
2) Another approach is to store the number of jobs you've processed for the day as a variable in your database. It's then common practice to check, upon the initial request of the job (before you even pass it off to one of your machines), whether the value in your database has reached your threshold. If it has, reject the job at the beginning. Coupled with an appropriate message, this has the advantage of giving the end user a better experience.
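A sketch of the second option: reserve a slot in a shared daily counter before handing the job to any machine, and reject it up front once the limit is reached. The `AtomicReference` stands in for the database row so the example is self-contained. Against MySQL, one way to get the same check-and-increment atomically is a conditional update such as `UPDATE job_counter SET processed = processed + 1 WHERE day = ? AND processed < ?`, treating zero affected rows as a rejection (assuming the day's row already exists; table and column names are hypothetical).

```java
import java.time.LocalDate;
import java.util.concurrent.atomic.AtomicReference;

/** Front-door gate checked before dispatch; the AtomicReference stands in for the DB row. */
class DailyJobGate {
    /** Immutable (day, count) pair so day rollover and increment happen in one atomic step. */
    private static final class DayCount {
        final LocalDate day;
        final long count;

        DayCount(LocalDate day, long count) {
            this.day = day;
            this.count = count;
        }
    }

    private final long dailyLimit;
    private final AtomicReference<DayCount> state =
            new AtomicReference<>(new DayCount(LocalDate.MIN, 0));

    DailyJobGate(long dailyLimit) {
        this.dailyLimit = dailyLimit;
    }

    /** Returns true if the job may proceed; false means reject it up front with a message. */
    boolean tryAdmit(LocalDate today) {
        while (true) {
            DayCount current = state.get();
            boolean newDay = today.isAfter(current.day);
            LocalDate day = newDay ? today : current.day;
            long used = newDay ? 0 : current.count; // a new day starts from zero
            if (used >= dailyLimit) {
                return false;
            }
            if (state.compareAndSet(current, new DayCount(day, used + 1))) {
                return true;
            }
            // another thread updated the counter first; re-read and retry
        }
    }
}
```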
In order to ensure that all these 10,000 requests are still load balanced, so that no one CPU processes more jobs than any other, you should use a simple round-robin approach to distribute your jobs over the n CPUs. With round robin, as opposed to a bin/categorization approach, you'll ensure that job requests are distributed as uniformly as possible over your n CPUs. A downside of round robin is that, depending on the type of job you're processing, you might be replicating a lot of data as you start to scale up. If this is a concern for you, you should think about implementing a form of locality-sensitive hash (LSH) function. While a good hash function distributes the data as uniformly as possible, LSH exposes you to having one CPU process more jobs than the others if the attribute you choose to hash on is likely to be skewed. As always, there are tradeoffs associated with both, so you'll know best for your use cases.
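Round robin itself is just a counter taken modulo the number of machines. A minimal sketch (the class name and generic worker type are illustrative; a load balancer often provides this policy already):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal thread-safe round-robin selector over n workers. */
class RoundRobin<T> {
    private final List<T> workers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<T> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("no workers");
        }
        this.workers = new ArrayList<>(workers);
    }

    /** Returns workers in turn: 0, 1, ..., n-1, 0, 1, ... */
    T pick() {
        // floorMod keeps the index non-negative after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), workers.size());
        return workers.get(i);
    }
}
```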

Why not implement a simple counter in your database and make the API client implement the throttling?
User Agent -> LB -> Your Service -> Q -> Your Q consumer(s) -> API Client -> External API
The API client checks the number (for today), and you can implement whatever rate-limiting algorithm you like. E.g., if the number is > 10k, the client could simply blow up, have the exception put the message back on the queue, and keep doing so until today becomes tomorrow and all the queued-up requests can be processed.
Alternatively, you could implement a tiered throttling system, e.g. flat out until 8k, then 1 message every 5 seconds per node until you hit the limit, at which point you can send 503 errors back to the User Agent.
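A sketch of the tiered policy, assuming each node can read today's shared count (for example from the database counter). The thresholds are constructor parameters; 8,000, 10,000 and 5 seconds mirror the example above, and what a node does on `REJECT` (such as returning a 503) is left to the caller. The names are illustrative.

```java
import java.time.Duration;

/** Full speed below a soft limit, a fixed per-node delay up to the hard limit, then reject. */
class TieredThrottle {
    enum Action { SEND_NOW, DELAY, REJECT }

    private final long softLimit;
    private final long hardLimit;
    private final Duration slowDelay;

    TieredThrottle(long softLimit, long hardLimit, Duration slowDelay) {
        if (softLimit > hardLimit) {
            throw new IllegalArgumentException("softLimit must not exceed hardLimit");
        }
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
        this.slowDelay = slowDelay;
    }

    /** Decides what to do with the next message given how many were sent today. */
    Action decide(long sentToday) {
        if (sentToday >= hardLimit) {
            return Action.REJECT;
        }
        if (sentToday >= softLimit) {
            return Action.DELAY;
        }
        return Action.SEND_NOW;
    }

    /** Delay before sending; only meaningful when decide(...) is not REJECT. */
    Duration delayFor(long sentToday) {
        return decide(sentToday) == Action.DELAY ? slowDelay : Duration.ZERO;
    }
}
```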
Otherwise you could go the complex route and implement a distributed queue (e.g. an AMQP server); however, this may not solve the issue entirely, since your only control mechanism would be throttling such that you never process faster than something less than the max limit per day. E.g., if your max limit is 10k, you never go faster than 1 message every 9 seconds (86,400 s / 10,000 = 8.64 s).
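The steady-rate arithmetic is worth checking: a day has 86,400 seconds, so a 10,000-message cap means one message every 8.64 seconds on average. Spacing messages 8 seconds apart allows 10,800 per day, so a sender that must stay under the cap needs at least 9 seconds (9,600 per day). A small sketch of that calculation with illustrative names:

```java
/** Whole-second spacing that keeps a steady single sender under a daily cap. */
final class SteadyRate {
    private static final long SECONDS_PER_DAY = 86_400;

    private SteadyRate() {}

    /** Smallest whole number of seconds between messages that stays within maxPerDay. */
    static long minIntervalSeconds(long maxPerDay) {
        if (maxPerDay <= 0) {
            throw new IllegalArgumentException("maxPerDay must be positive");
        }
        // ceiling division: 86,400 / 10,000 = 8.64, rounded up to 9
        return (SECONDS_PER_DAY + maxPerDay - 1) / maxPerDay;
    }

    /** Messages sent in one day at a fixed interval. */
    static long messagesPerDay(long intervalSeconds) {
        return SECONDS_PER_DAY / intervalSeconds;
    }
}
```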

If you're not averse to using a library/service, https://github.com/jdwyah/ratelimit-java is an easy way to get distributed rate limits.
If performance is of utmost concern, you can acquire more than 1 token per request so that you don't need to make an API request to the limiter 100k times. See https://www.ratelim.it/documentation/batches for details.
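A sketch of the batching idea in general terms. The remote limiter is modelled as a function "ask for n tokens, get back how many were granted"; this is a hypothetical interface for illustration, not the API of ratelimit-java or ratelim.it. Each node fetches tokens in batches and spends them locally, so the shared limiter is contacted roughly once per batch instead of once per request. The tradeoff is that tokens held by one node can sit unused while another node is refused.

```java
import java.util.function.LongUnaryOperator;

/** Spends locally cached tokens, refilling from a (hypothetical) shared limiter in batches. */
class BatchedTokens {
    private final LongUnaryOperator remoteAcquire; // hypothetical: grants up to n tokens, returns count granted
    private final long batchSize;
    private long localTokens;
    private long remoteCalls;

    BatchedTokens(LongUnaryOperator remoteAcquire, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.remoteAcquire = remoteAcquire;
        this.batchSize = batchSize;
    }

    /** Takes one token, contacting the shared limiter only when the local supply is empty. */
    synchronized boolean tryAcquire() {
        if (localTokens == 0) {
            localTokens = remoteAcquire.applyAsLong(batchSize);
            remoteCalls++;
        }
        if (localTokens == 0) {
            return false; // shared limit exhausted
        }
        localTokens--;
        return true;
    }

    /** Number of round trips made to the shared limiter so far. */
    synchronized long remoteCalls() {
        return remoteCalls;
    }
}
```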

AppEngine performance tuning with idle instances and pending latency settings

I have an App Engine (paid) app that's averaging around 200 visits per day (1,000 page views; sporadically it spikes up to 1,000 visits and 10,000 page views or more), and I am currently waking it up via cron jobs every 5 minutes to ensure reasonable performance. This doesn't always work during App Engine latency spikes (fortunately this hasn't happened too often lately), and when that happens my AJAX calls time out miserably. Also, the cron-job strategy is not ideal because it eats away at the quotas.
At the moment I have Idle Instances and Pending Latency settings all on "Automatic".
Does anyone have experience with manually tweaking those settings and what are some typical values that could guarantee better performance on my app given the traffic?

Instead of a cron job, just set Idle Instances to 1. Idle Instances are instances that are in "reserve", giving you instant response to increased load. So if you have a load that requires three instances and you set Idle Instances to one, then you will have 4 instances running.
The downside is that you'll always be paying for one more instance than you're currently using. However, keep in mind that you get 28 free instance hours a day, which covers one Idle Instance for free (except at times when you have one instance actually serving requests; then the additional Idle Instance will be an extra cost).
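The arithmetic behind that: one idle instance kept around all day is 24 instance hours, which fits inside the 28 free hours, leaving 4 hours for a serving instance before anything is billable. A rough sketch with illustrative names, assuming the 28-hour allowance quoted above and ignoring the 15-minute billing granularity:

```java
/** Rough daily instance-hour bill for the "keep one idle instance" setup. */
final class InstanceHours {
    static final double FREE_HOURS_PER_DAY = 28; // free allowance as quoted in the answer

    private InstanceHours() {}

    /** Billable hours = serving hours + 24h per idle instance, minus the free allowance. */
    static double billableHours(double servingInstanceHours, int idleInstances) {
        double total = servingInstanceHours + idleInstances * 24.0;
        return Math.max(0, total - FREE_HOURS_PER_DAY);
    }
}
```

For example, one idle instance plus 6 hours of a busy serving instance gives 30 instance hours, so about 2 are billable.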
Also, if you have Idle Instances set, then Pending Latency will have little or no effect, since Pending Latency is consulted when new instances need to be started, but you always have one instance in reserve. Caveat: this may not be true if app code goes haywire or is poorly written (like calling external services inside a request handler), resulting in abnormally long response times.
Bottom line: set Idle Instances to 1, then set Pending Latency to some max value that is still acceptable by your app.

Hashing Password in Google App Engine and Instance Hours Quota

I've been reading a lot about password storing, hashing, salting, "peppering", MACs, etc., because I'm about to make a new website and security is really important to me. However, there are some reasons why I'm considering not using Google Authentication (or Facebook, OpenID or any other), which are not relevant right now, but they bring me to this point.
I'm new to Google App Engine; this is going to be my first project on it, and I'm a little confused about "Instance Hours" and how there is no longer a "CPU time" quota but the aforementioned one instead. Even worse, I haven't been able to understand what the Instance Hours free quota is.
Here's why I'm worried about the quotas and what that has to do with my security concerns: one recommendation I've read everywhere is to make multiple iterations and hash the password several times, because that makes an attacker spend much, much more time (I don't have numbers, but they are everywhere on https://security.stackexchange.com/).
Multiple iterations have a direct impact on CPU time, and if GAE had a CPU time quota, I think making 1000 iterations every time a user logs in could be a problem. However, if what they count is Instance Hours, from the moment a request is made until fifteen minutes later, as the GAE quota docs say:
In general, instance usage is billed on an hourly basis based on the instance's uptime. Billing begins when the instance starts and ends fifteen minutes after the instance shuts down. You will be billed only for idle instances up to the number of maximum idle instances set in the Performance Settings tab of the Admin Console. Runtime overhead is counted against the instance memory.
then does it mean that if my users log in (hash 1000 times) and then continue to use the site, the Instance Hours will keep accumulating until all of them leave the page, plus 15 minutes? If this is true, then making it iterate 1000 times wouldn't have a significant impact on my quota, other than the "extra" time it takes for the user to log in, but I'm aware of that and it's a price I'm willing to pay.
The number of iterations I'll make will be the ones that make the time to log in acceptable and imperceptible to the user, so don't worry about this.
My questions are:
Will making MANY iterations have a direct impact on the Instance Hours, or are my assumptions about how Instance Hours are summed correct?
Is there a CPU time quota on Google App Engine I'm missing somehow? Does it have a Free Quota?
What is the Instance Hours Free Quota?
Answers:
Look at Moishe's accepted answer and the other question he asked (which has not been answered but has useful comments): When does the App Engine scheduler use a new thread vs. a new instance?
According to Google there is no CPU time quota: http://googleappengine.blogspot.com.es/2009/02/skys-almost-limit-high-cpu-is-no-more.html
Found an answer to question number 3 here: Google App Engine Frontend Instance Hours Limit Reached

If it takes a long time to process a request because, e.g., you're doing something very computationally intensive, and you don't want other users to wait a long time, the App Engine scheduler may spin up another instance of your application to serve incoming requests.
Imagine that computing the hash for a password takes 1 minute and during that minute your application gets a request from another user. That user could wait for a minute to get a response to their request, or the App Engine scheduler could spin up another instance to service that request and get a response back much sooner. You can tune whether or not another instance will come up using the Performance sliders on your Application Settings page in the admin console.
Basically, the question you need to ask about instance hours is: is it likely you'll get overlapping requests (i.e. a new request coming in before the current request is complete)? If this happens not infrequently, and you want snappy responses for your users, you'll need to budget more instance hours.
I suspect that the big computation you'll need to do will be infrequent -- only on initial sign-in to generate a cookie, say, rather than for every request.
To explicitly answer your question #1, making many iterations will only have an effect on your instance hours if it causes overlapping requests. If you only get one request every 30 seconds, you could spend 30 seconds serving each request (including calculating each hash, and doing other operations) and not exceed your free instance-hours quota. Conversely if you get 10 requests per second and spend any more than 100ms serving the request, then you'll start to exceed your instance hours fairly quickly.
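Those two examples follow from Little's law: the average number of requests in flight equals the arrival rate times the time each request takes, and for single-threaded instances that is roughly the number of instances kept busy. One request every 30 seconds at 30 seconds each keeps one instance busy (24 instance hours a day, inside the free 28); 10 requests per second at 200 ms each keeps two busy. A sketch with illustrative names; it gives averages, while the scheduler also reacts to bursts and pending latency, so treat the result as a lower bound.

```java
/** Average instance demand from arrival rate and service time (Little's law: L = lambda * W). */
final class InstanceDemand {
    private InstanceDemand() {}

    /** Average number of requests in service at once, i.e. busy single-threaded instances. */
    static double averageBusyInstances(double requestsPerSecond, double secondsPerRequest) {
        return requestsPerSecond * secondsPerRequest;
    }

    /** Instance hours per day if that average load were sustained around the clock. */
    static double instanceHoursPerDay(double requestsPerSecond, double secondsPerRequest) {
        return averageBusyInstances(requestsPerSecond, secondsPerRequest) * 24;
    }
}
```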

Instance hours accrue for as long as the server is running, answering requests, etc. If your server isn't running, it can't wake up on a request or anything.
Imagine instance hours as having the computer on. You are billed when it's on, and not when it's off.
You could have multiple instances, so let's say you have two instances, you're burning twice as many instance hours.
Your password hashing won't affect this, because it will only incur instance hours while the instance is on; when it's off, it won't be incurring any instance hours, but it won't be hashing either.

There are multiple sources covering passwords. You evidently have read some that encourage multi-pass hashing. Consider the first link below before finalizing this decision. Excerpt from this page: "It's easy to get carried away and try to combine different hash functions, hoping that the result will be more secure. In practice, though, there is no benefit to doing it. All it does is create interoperability problems, and can sometimes even make the hashes less secure."
Two valuable links to consider (the first has the quote above; the second is a good "how-to" source):
http://crackstation.net/hashing-security.htm
http://throwingfire.com/storing-passwords-securely/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+throwingfire+%28Throwing+Fire%29#notpasswordhashes
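One standard way to get the "many iterations" the question asks about, without combining hash functions yourself, is a key-stretching function such as PBKDF2, which the JDK ships through `SecretKeyFactory`. A minimal sketch, assuming a 16-byte random salt and a 256-bit output (both choices are illustrative); `PBKDF2WithHmacSHA256` is available from Java 8, while older JDKs may only offer `PBKDF2WithHmacSHA1`. The iteration count is the tunable cost discussed in the question.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Salted, iterated password hashing with the JDK's PBKDF2 implementation. */
final class PasswordHasher {
    private static final String ALGORITHM = "PBKDF2WithHmacSHA256"; // Java 8+
    private static final int KEY_LENGTH_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    private PasswordHasher() {}

    /** Fresh random salt, stored alongside the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    /** Derives the hash; cost grows linearly with the iteration count. */
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_LENGTH_BITS);
        try {
            return SecretKeyFactory.getInstance(ALGORITHM).generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException("PBKDF2 unavailable", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    /** Recomputes the hash for a login attempt and compares in constant time. */
    static boolean verify(char[] attempt, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(attempt, salt, iterations), expected);
    }
}
```

Because the work happens only on login, it affects instance hours only in the way the first answer describes: when it makes requests overlap.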

How much more efficient is Java Google App Engine in threadsafe mode?

In Java Google App Engine you can turn on Concurrent Requests / Threadsafe mode: https://developers.google.com/appengine/docs/java/config/appconfig#Using_Concurrent_Requests
The only reason to do this is that the Google servers will need to spin up fewer instances of your app to serve a given number of requests and therefore potentially save you money. Of course doing this will also mean you will have to write threadsafe code.
So the interesting question is: how much money does this tend to save? Has anyone attempted to measure it under some benchmark configuration / application functionality / load ?

This really depends on your code:
In single request mode, you can easily calculate requests per second: if a request on average takes 100ms to finish, then one instance will be able to perform 10 requests per second.
In concurrent request mode this depends on two factors:
A. The type of instance you are using. AFAIK they are all the same; you just get a different number of cores. More cores means higher concurrent performance.
B. The ratio of CPU-bound code versus IO-bound code a request is performing. If your code is more IO-bound (= waiting for Datastore or other external service) then CPU will be able to run more of it in parallel.
In my app I see 15-20 rps at 200ms per request on the basic instance, so I could say that the factor between single-request and multi-request mode is about 3-4.
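The 3-4 factor comes from comparing the observed rate with the single-request ceiling: at 200 ms per request, one single-threaded instance manages 1000 / 200 = 5 requests per second, and 15-20 / 5 = 3-4. A sketch of that calculation (names are illustrative):

```java
/** Compares observed concurrent throughput with the single-request-mode ceiling. */
final class ConcurrencyGain {
    private ConcurrencyGain() {}

    /** Requests per second one single-threaded instance can serve at a given average latency. */
    static double singleThreadedRps(double latencyMillis) {
        return 1000.0 / latencyMillis;
    }

    /** How many times the single-threaded ceiling the observed rate is. */
    static double gainFactor(double observedRps, double latencyMillis) {
        return observedRps / singleThreadedRps(latencyMillis);
    }
}
```

The same `singleThreadedRps(100)` gives the 10 requests per second from the single-request example.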
