Hashing Password in Google App Engine and Instance Hours Quota - java

I've been reading a lot about password storage, hashing, salting, "peppering", MACs, etc., because I'm about to build a new website and security is really important to me. There are some reasons why I'm considering not using Google Authentication (or Facebook, OpenID or any other), which are not relevant right now, but that decision brings me to this point.
I'm new to Google App Engine; this is going to be my first project on it, and I'm a little confused about "Instance Hours" and how there is no longer a "CPU time" quota but the aforementioned one instead. Even worse, I haven't been able to work out what the Instance Hours Free Quota is.
Here's why I'm worried about the quotas and what that has to do with my security concerns: one recommendation I've read everywhere is to hash the password over multiple iterations, because that makes an attacker spend much, much more time (I don't have numbers, but they are everywhere on https://security.stackexchange.com/).
Multiple iterations have a direct impact on CPU time, and if GAE had a CPU time quota I think running 1000 iterations every time a user logs in could be a problem. However, if what they count is Instance Hours, from the moment the request is made until up to fifteen minutes later, as the GAE quota docs say:
In general, instance usage is billed on an hourly basis based on the
instance's uptime. Billing begins when the instance starts and ends
fifteen minutes after the instance shuts down. You will be billed only
for idle instances up to the number of maximum idle instances set in
the Performance Settings tab of the Admin Console. Runtime overhead is
counted against the instance memory.
then does it mean that if my users log in (hash 1000 times) and then continue to use the site, the Instance Hours will keep accumulating until all of them leave the page, plus 15 minutes? If this is true, then iterating 1000 times wouldn't have a significant impact on my quota, other than the "extra" time it takes for the user to log in, but I'm aware of that and it's a price I'm willing to pay.
The number of iterations I'll use will be whatever keeps the login time acceptable and imperceptible to the user, so don't worry about this.
My questions are:
Will making MANY iterations have a direct impact on the Instance Hours, or are my assumptions about how the Instance Hours are summed correct?
Is there a CPU time quota on Google App Engine I'm missing somehow? Does it have a Free Quota?
What is the Instance Hours Free Quota?
Answers:
See Moishe's accepted answer and the other question he asked (which has not been answered but has useful comments): When does the App Engine scheduler use a new thread vs. a new instance?
According to Google there is no CPU time quota: http://googleappengine.blogspot.com.es/2009/02/skys-almost-limit-high-cpu-is-no-more.html
Found an answer to question number 3 here: Google App Engine Frontend Instance Hours Limit Reached

If it takes a long time to process a request, e.g. because you're doing something very computationally intensive, and you don't want other users to wait a long time, the App Engine scheduler may spin up another instance of your application to serve incoming requests.
Imagine that computing the hash for a password takes 1 minute and during that minute your application gets a request from another user. That user could wait for a minute to get a response to their request, or the App Engine scheduler could spin up another instance to service that request and get a response back much sooner. You can tune whether or not another instance will come up using the Performance sliders on your Application Settings page in the admin console.
Basically the question you need to ask about instance hours is: is it likely you'll get overlapping requests (i.e. a new request coming in before the current request is complete)? If this happens fairly often, and you want snappy responses for your users, you'll need to budget more instance hours.
I suspect that the big computation you'll need to do will be infrequent -- only on initial sign-in to generate a cookie, say, rather than for every request.
To explicitly answer your question #1, making many iterations will only have an effect on your instance hours if it causes overlapping requests. If you only get one request every 30 seconds, you could spend 30 seconds serving each request (including calculating each hash, and doing other operations) and not exceed your free instance-hours quota. Conversely if you get 10 requests per second and spend any more than 100ms serving the request, then you'll start to exceed your instance hours fairly quickly.
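The overlap condition can be made concrete with a back-of-the-envelope calculation: arrival rate times service time gives the average number of requests in flight. The sketch below is illustrative only. The class and method names are made up here, and it assumes each instance serves one request at a time, which does not hold if concurrent requests are enabled.

```java
// Rough estimate of how many requests are being served at once on average.
// A result above 1.0 suggests requests overlap and a second instance may be started.
final class InstanceLoadEstimate {

    // requestsPerSecond: average arrival rate
    // serviceTimeMillis: average time to serve one request, hashing included
    static double averageRequestsInFlight(double requestsPerSecond, double serviceTimeMillis) {
        return requestsPerSecond * (serviceTimeMillis / 1000.0);
    }

    public static void main(String[] args) {
        // The two scenarios from the answer: one request every 30 s served in 30 s,
        // and 10 requests per second served in 100 ms each.
        System.out.println(averageRequestsInFlight(1.0 / 30.0, 30_000));
        System.out.println(averageRequestsInFlight(10.0, 100));
    }
}
```

Both scenarios sit right at the boundary of one fully busy instance, which is why any extra time per request in the second case starts to cost additional instance hours.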

Instance hours accrue for as long as the server is running, answering requests, etc. If your server isn't running, it can't wake up on a request or anything.
Imagine instance hours as having the computer on. You are billed when it's on, and not when it's off.
You could have multiple instances; if you have two instances, you're burning twice as many instance hours.
Your password hashing won't affect this because it will only incur instance hours when the instance is on. When it's off, it won't be incurring any instance hours, but it won't be hashing either.

There are multiple sources covering passwords. You evidently have read some that encourage multi-pass hashing. Consider the first link below before finalizing this decision. Excerpt from this page: "It's easy to get carried away and try to combine different hash functions, hoping that the result will be more secure. In practice, though, there is no benefit to doing it. All it does is create interoperability problems, and can sometimes even make the hashes less secure."
Two valuable links to consider (the first has the quote above, the second is a good "how to" source):
http://crackstation.net/hashing-security.htm
http://throwingfire.com/storing-passwords-securely/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+throwingfire+%28Throwing+Fire%29#notpasswordhashes
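Note that the quote warns against combining different hash functions, which is not the same thing as iterating one function through a standard key-stretching scheme. As a minimal sketch of the latter, the JDK ships PBKDF2 through `SecretKeyFactory`. The class name `PasswordHasher`, the 16-byte salt, and the 256-bit output are choices made for this example, not something taken from the thread, and whether the App Engine runtime you target provides the `PBKDF2WithHmacSHA256` algorithm (Java 8 and later) is an assumption to verify.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Salted, iterated password hashing with PBKDF2 from the standard library.
final class PasswordHasher {
    static final int KEY_BITS = 256; // output length, an assumption for this sketch

    // Generates a fresh random salt; store it next to the hash for each user.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives the hash; the iteration count is the knob that trades login time for attacker cost.
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available on this runtime", e);
        } finally {
            spec.clearPassword();
        }
    }

    // Recomputes the hash and compares it with the stored value.
    static boolean verify(char[] password, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(password, salt, iterations), expected);
    }
}
```

A caller would store the salt, the iteration count, and the hash per user, and pick the iteration count by timing `hash` on the target instance class until the login delay is still acceptable.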

Related

Adobe CQ Evaluation: Are there problems with Multi Site Manager / TarOptimizer?

I work at a retailer and we are considering introducing CQ5 as a CMS.
However, after doing some research and talking to consultants, it turns out that some things may be "complicated". Perhaps one of you can shed a little light on this.
The first thing is, we were told that when you use the Multi Site Manager to create multi-language pages (about 80 languages), the update process can take as long as half an hour until a change is finally published. Has anyone experienced something similar?
The other thing is that the TarOptimizer has pretty long running times. I was told that runs that take up to 24 hours are not uncommon. Again my question: has anyone had such a problem, or does anyone have an explanation for this?
I am really looking forward to your response.
These are really two separate questions, but I'll address them based on my experience.
The update process for creating new multi-language pages will vary based on the number of languages, and also the number of publish instances and web-servers (assuming you're using dispatcher to cache) you are running. This is because the replication process is where the bottleneck is (at least in my experience), and as such if you're trying to push out a large amount of content across a large number of publishers with a large number of front-end web-servers whose cache needs to be cleared, there will be some delay in getting this to happen since replication is an asynchronous process. The longest delay I've seen for this has been in the 10-15 minute range, that was with 12 publishers and 12 front end webservers, but this comes with the obvious caveat that your mileage may vary.
For the Tar Optimization job, I'd encourage you to take a look at this page as it has a lot of good info about the Tar Optimizer job and how to tune it. The job can take a long time to run when you have a large repository, especially on an instance with a large number of write operations, but the run times can be configured so that it only runs during a given time period, and it will pick up where it left off the night before if the total run time is longer than the allowed run time. By default, it runs from 2-5 am each night, so if it takes more than that 3 hour period, it will continue where it left off the next night, allowing it to optimize the entire repository over a period of a few days if needed.

AppEngine performance tuning with idle instances and pending latency settings

I have an app-engine (paid) app that's averaging around 200 visits per day (1000 page views; sporadically it spikes up to 1000 visits and 10000 page views or more), and I am currently waking it up via cron jobs every 5 minutes to ensure reasonable performance. This doesn't always work during app-engine latency spikes (fortunately this has not happened too often lately), and when that happens my ajax calls time out miserably. Also, the cron-job strategy is not ideal because it eats away at the quotas.
At the moment I have Idle Instances and Pending Latency settings all on "Automatic".
Does anyone have experience with manually tweaking those settings and what are some typical values that could guarantee better performance on my app given the traffic?
Instead of a cron job, just set Idle Instances to 1. Idle Instances are instances that are in "reserve", giving you instant response to increased load. So if you have a load that requires three instances and you set Idle Instances to one, then you will have 4 instances running.
The downside is that you'll always be paying for one more instance than is currently utilised. However, keep in mind you get 28 free instance hours a day, covering one Idle Instance for free (except for times when you have one instance actually serving requests, when one more Idle Instance will be an additional cost).
Also, if you have Idle Instances set, then Pending Latency will have little or no effect, since Pending Latency is consulted when new instances need to be started, but you always have one instance in reserve. Caveat: this may not be true if app code goes haywire or is poorly written (like calling external services inside a request handler), resulting in abnormally long response times.
Bottom line: set Idle Instances to 1, then set Pending Latency to some max value that is still acceptable by your app.
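To see how the free allowance interacts with an extra idle instance, the arithmetic can be sketched as below. The 28-hour figure is the one quoted in the answer above. The class and method names are hypothetical, and the sketch ignores billing details such as the fifteen-minute tail after an instance shuts down.

```java
// Daily instance-hours arithmetic using the free allowance quoted in the answer.
final class InstanceHours {
    static final double FREE_HOURS_PER_DAY = 28.0; // figure taken from the answer, not verified here

    // instanceHoursUsed: sum over all instances of the hours each one was up that day
    static double billableHoursPerDay(double instanceHoursUsed) {
        return Math.max(0.0, instanceHoursUsed - FREE_HOURS_PER_DAY);
    }

    public static void main(String[] args) {
        // One instance up all day stays inside the allowance.
        System.out.println(billableHoursPerDay(1 * 24.0));
        // One serving instance plus one idle instance, both up all day, does not.
        System.out.println(billableHoursPerDay(2 * 24.0));
    }
}
```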

Optimise billing on app engine for continual polling

I'm creating a service on appengine that feeds back measurements to the user. The measurements are collected by polling another server every fifteen minutes (the user needs four measurements over the last hour). The other server replies with the data immediately, so this isn't a "long poll request". I don't expect there to be a high load on the server because there aren't a lot of users (maybe 20 requests a day or so), so there won't be many requests coming in for the data, but because the user needs data over the last hour I am forced to poll continuously. This makes me concerned about billing, because the new billing system charges per instance hour at a 15 min granularity and this would mean I'd have an instance actively running 24/7 (as far as I can tell).
Question
So, I expect a low request rate and am not too concerned about latency etc. How can I optimise this setup for the lowest possible billing?
What I had planned
What I was planning to do was try to get away with the free quota for now by setting max idle instances to 1 and only using the frontend to do both polling and serving (I'm guessing site responsiveness will suffer a fair amount), because the frontend has far more free instance hours (28) than the backend (9). Can the frontend even be set up to poll every 15 mins?
There's nothing you can really tweak here for this. You'll want to use cron or the task queue for the polling anyway; these use frontend instances, not backend instances. As long as you have multithreading enabled, frontend latency will not be affected, and you'll likely remain within your free quota as long as you don't do enough polling or get enough traffic to require more than one concurrent instance.

Cons of using Thread.sleep to slow down script?

This pertains to Google App Engine, using Java.
("threadsafe" in this scenario would be set to true, so please include that in your thought process.)
Which of the App Engine quotas are affected?
The goal would be to slow down some users (non-paying users), in such a way that their requests would take longer to arrive. (The time spent on the website/app is in the range of hours, per user.) So in a situation where the user's wait is doubled, half of the theoretical bandwidth usage is saved, per user, since they don't spend even more time on the site to compensate; they go to bed at the end of the day, in this scenario.
Specifically, my goal is for this to be more cost effective, more than anything.
Are CPU quotas affected by thread delays?
(Excuse my ignorance on this topic. I assume that such a Java thread delay doesn't incur CPU usage itself, but I could be wrong, which is why I'm asking you guys, who are more knowledgeable.)
Any other things I should consider to be detrimental?
(Make sure to factor in that, in this scenario, the user themselves aren't negatively impacted by the site loading slower, and that they patiently wait for the page to load. It is the type of application where the user isn't bothered by a little bit of lag because they spend a lot of time on the site just to kill time. Hope this is understood.)
Thanks in advance for any thoughts, and thanks for at least reading my question.
Under the current billing model, your app will not scale up if your handlers take more than 1000ms on average to return a response. Deliberately adding sleep calls will slow your app down, and make it less likely to be autoscaled.
Under the new billing model, as Chris points out, you will be charged by instance-hours, which means that although your app will scale fine, you will pay more to slow your users down.
When most people are trying to shave milliseconds off their response times, you're asking about ways to artificially slow requests down; this seems very odd, and likely to drive your users off. Wouldn't you be better off serving them a page informing them they're over-quota and inviting them to come back later (or, say, pay you)?

How do you determine an acceptable response time for App Engine DB requests?

According to this discussion of Google App Engine on Hacker News,
A DB (read) request takes over 100ms on the
datastore. That's insane and unusable
for about 90% of applications.
How do you determine what is an acceptable response time for a DB read request?
I have been using App Engine without noticing any issues with DB responsiveness. But, on the other hand, I'm not sure I would even know what to look for in that regard :)
You can measure precisely how long each RPC call (datastore or otherwise) is taking, thanks to Guido van Rossum's relatively new AppStats component (it's been part of the standard SDK since 1.3.1). See here for more. 100 milliseconds is fine for most well-designed apps -- if you need to make two or three queries to serve a page, you can still serve in less than half a second even if there's lots of processing and rendering involved... not too shabby. Plus, you can use memcache to reduce many of those latencies, etc.
The poster is wrong. Datastore get operations are much faster - about 15-20ms each, currently. Datastore query operations can be slower, because they're much more involved and return more data, but they still complete in anywhere from 30-100ms for a typical query. Other posters have amply addressed whether that's "acceptable" or not.
What do you mean by acceptable? What kind of application are you writing? Acceptable means different things for different domains/applications/people. First, you should decide how quickly you want your app to respond to a request. Let's pick 1 second, just for argument's sake. Now, how many DB requests do you need to make to fulfill that request? Let's say 5. Let's also say that we also have 400ms worth of other processing to do. OK, so that's 5 reads times 100ms each, plus 400ms of other stuff. 900ms total, which is less than our goal of 1 second. Perfect! 100ms is an acceptable read rate. In fact, 120ms would still be acceptable, just barely.
Now, let's generalize:
numberOfReads * readTime + otherStuffTime = TotalTime
Fill in your numbers, and you can see what is an acceptable time for your particular situation.
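The same formula can be turned around to ask how slow a read may be before the target is missed. A small sketch using the numbers from this answer follows; the class and method names are made up for illustration.

```java
// Response-time budget: numberOfReads * readTime + otherStuffTime = totalTime.
final class ReadBudget {

    // Total time to serve a request, in milliseconds.
    static double totalTimeMillis(int numberOfReads, double readTimeMillis, double otherStuffMillis) {
        return numberOfReads * readTimeMillis + otherStuffMillis;
    }

    // Largest per-read time that still meets the target, in milliseconds.
    static double maxReadTimeMillis(double targetMillis, int numberOfReads, double otherStuffMillis) {
        return (targetMillis - otherStuffMillis) / numberOfReads;
    }

    public static void main(String[] args) {
        // 5 reads at 100 ms plus 400 ms of other work, against a 1 second target.
        System.out.println(totalTimeMillis(5, 100, 400));    // 900.0
        System.out.println(maxReadTimeMillis(1000, 5, 400)); // 120.0
    }
}
```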
If you haven't noticed any issues then it is by definition an acceptable response time. The only question is how long your users are happy to wait.
An "acceptable response time for a DB read request" depends entirely on your application and your users.
If the net result is that your site runs fast enough to satisfy you and your users, then the slow response time of the services provided by Google in their AppEngine is acceptable.
Now, looking deeper at this particular issue, it sounds like we are talking about GETs. Here are the figures for GET latency, and it looks to me like the average latency is closer to 50ms than 100. I'm not saying that is good, but I don't think it is accurate to say 100ms.
