Why is my PBKDF2 implementation so slow (vs. SQLCipher)? - Java

I have written a simple Android App on my Xoom tablet, which simply stores some string notes in a SQLCipher database.
The user is prompted to type in a passphrase which will be used for the database by the SQLCipher lib. This works fine so far and runs very smoothly.
Now I have also implemented a small PBKDF2 routine for authentication purposes
(in fact, I want to encrypt some other files in the future, which cannot be stored in a database).
For now, I only want to check whether my PBKDF2 code is correct.
I only used the javax.crypto and java.security libs.
Code snippet as follows:
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.spec.KeySpec;

int derivedKeyLength = 128;
int iterations = 500;
KeySpec spec = new PBEKeySpec(passphrase.toCharArray(), salt, iterations, derivedKeyLength);
SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
byte[] derivedKey = f.generateSecret(spec).getEncoded();
The salt is a 16 byte random number, generated with SecureRandom.
So I hardcoded the key and the salt and compare them against the derivedKey for authentication (only a test case!).
My problem now is that on my Xoom it takes about 5 seconds until the derivation is done, although the iteration count is set to only 500.
AFAIK SQLCipher uses an iteration count of 4000 by default, and it responds instantly, whether the key is wrong or correct.
(If I set the iteration count to 4000, it takes at least 15 seconds.)
The question is: did I implement this inefficiently, or is SQLCipher just that good performance-wise (native NDK functions, etc.)?
Thank you in advance
P.S.: Sorry, my English isn't that great yet!
Edit:
Sorry, I was not clear enough :-)
I know PBKDF2 is supposed to be slow (specifically via the iteration count, to slow down brute-force attacks); that's exactly the reason I am asking! I wanted to set the iteration count to, let's say, 5000 (which is not acceptable at over 15 seconds).
I'm just wondering because, like I said, SQLCipher also uses PBKDF2 (iterations = 4000, while I am using 500) for deriving a key from a given password. I'm not talking about the encryption with AES in the end; it's only about the difference in deriving the key.
Of course it seems plausible that SQLCipher is way faster than a self-made key-derivation function, but I did not think the difference would be this big, since SQLCipher's PBKDF2 really works instantly!
Greetings!

OK, that (see below) is not exactly your problem: PBKDF2 is slow, but it should be nowhere near as slow as described with those parameters on that hardware.
There are some stats (and tips) on Android PBE/KDF performance here: http://nelenkov.blogspot.com/2012/04/using-password-based-encryption-on.html . SecretKeyFactory performance problems are not unknown: Any way around awful SecretKeyFactory performance with LVL and AESObfuscator?
SecretKeyFactory is likely using a pure Java implementation. SQLCipher has two relevant features:
- it uses OpenSSL, i.e. compiled native code (on my desktop, OpenSSL's PBKDF2 is nearly 100x faster than a JVM6 SecretKeyFactory version for 2000 iterations, excluding JVM startup time; I haven't compared AES speed, and it appears other people find it slow on Android too)
- the 4000-iteration PBKDF2 is only done on database open; after that there are at most 2 iterations for the page HMAC secret (assuming the default configuration, as documented)
Your code seems correct; there should not be such a large (linear?) performance degradation when you increase your iteration count. The Xoom should be running a non-ancient JVM with JIT; can you verify the performance problem with other code?
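For example, a standalone timing loop along these lines (a minimal sketch; the iteration counts and key length are just illustrative) should make it obvious whether generateSecret itself is the bottleneck and whether the cost really scales with the iteration count:
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.SecureRandom;

public class Pbkdf2Timing {
    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        for (int iterations : new int[] {500, 1000, 2000, 4000}) {
            PBEKeySpec spec = new PBEKeySpec("test passphrase".toCharArray(), salt, iterations, 128);
            long start = System.nanoTime();
            f.generateSecret(spec).getEncoded();
            long ms = (System.nanoTime() - start) / 1_000_000;
            // PBKDF2 cost should grow roughly linearly with the iteration count.
            System.out.println(iterations + " iterations: " + ms + " ms");
        }
    }
}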
PBKDF2 is designed to be slow (see the answer to this question https://security.stackexchange.com/questions/7689/clarification-needed-for-nists-whitepaper-recommendation-for-password-based-ke ) due to the intended key stretching operation. The iteration counter lets you trade off speed for security.
AES was always intended to be fast, and it is fast (speed comparison PDF; the chosen AES candidate is referred to by its original name, Rijndael, in that paper).
I assume you are comparing the PBKDF2 computation time directly to the time taken to perform an SQL operation on your SQLCipher database, which will almost certainly have been designed to be fast.
You are effectively comparing two different operations with different requirements, hence the speed difference.

Ok I figured out what the problem was.
If I disconnect the device from my PC, it works instantly. It also stays fast if I reconnect it afterwards.
Now, even with an iteration count of 5000 and above, the derivation takes less than a second! This is great, since my Xoom isn't the newest of devices!
Maybe it is because of the debug mode or something; I don't really know, actually!
Anyways, thanks to mr.spuratic. Hope this helps someone in the future :-)

Related

Improve performance using Bcrypt in VertX

I'm creating a register method in Vert.x which uses BCrypt to encode the password in the database.
My problem comes from the slow performance of encoding the password with BCrypt.
When I'm using:
- BCrypt, my query takes around ~1200ms
- without BCrypt, ~220ms
So what can I do to improve performance? Is there another way to encode passwords in Vert.x?
I'm using BCrypt (http://www.mindrot.org/projects/jBCrypt/) in Vert.x.
As you stated: that's not a Vert.x issue/problem. The BCrypt algorithm takes a certain amount of time to encode a given value, and it's slow on purpose.
I guess you can leverage Vert.x's capabilities and have N instances of "worker verticles" doing the hashing work. Again, the time won't shrink, but you will have some dedicated workers just for that task, and you can always tweak the number of instances to your needs. Maybe that's too much, but I'm just throwing it out in case you haven't thought about it; see the sketch below.
Moreover, I think using BCrypt is (one of) the way(s) to go: hashing is a one-time operation, and later on "checking" a given value is not so time-consuming. Additionally, it will give you better/stronger security compared to other (hashing) algorithms if you use a proper salt size, and so on.
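As an illustration of the worker idea above, a minimal sketch (assuming Vert.x 4's executeBlocking and the jBCrypt API from the mindrot project; the names and cost factor are illustrative, not a fixed recipe):
import io.vertx.core.Vertx;
import org.mindrot.jbcrypt.BCrypt;

public class HashOffload {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        String password = "s3cret"; // illustrative input

        // Run the deliberately slow BCrypt call on a worker thread so the
        // event loop stays responsive; the hash itself still takes the same time.
        vertx.<String>executeBlocking(
            promise -> promise.complete(BCrypt.hashpw(password, BCrypt.gensalt(10))),
            res -> {
                if (res.succeeded()) {
                    System.out.println("stored hash: " + res.result());
                    // Later verification is a single BCrypt run against the stored hash.
                    System.out.println("matches: " + BCrypt.checkpw(password, res.result()));
                }
                vertx.close();
            });
    }
}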
Note that BCrypt is slow on purpose (see e.g. here: Bcrypt for password hashing because it is slow?) so it is not a "bug" but a feature.
(As mentioned in the link, the slowness adds extra security: it makes brute-forcing the password slower.)
So you really should think twice before wanting BCrypt password hashing to be fast.
Honestly, that is probably the best way. BCrypt produces a better hash. The faster algorithms aren't nearly as good and certainly don't future-proof whatever system you are making. But yes, you can use MD5 and it'll go much faster.

What is the maximum number of bytes I can extract from Java’s SecureRandom CSPRNG before needing to reseed

I am implementing an application in Java and I am using SecureRandom for generating randomness. I need to be able to encrypt 1 billion files. I looked on StackOverflow for the answer of my question but I found no exact answers. Everyone is pretty vague:
You don’t need to reseed SecureRandom. It has a “large” period. But what is large?
You don’t need to reseed SecureRandom because it is a “well designed” CSPRNG. But what is a well-designed CSPRNG’s period?
So I decided to do the math myself and see if anyone can check it for me. What facts do we know about SecureRandom’s current implementation in Java 8? Actually there is some controversy from the sources I found.
What I know about Java's SecureRandom implementation
Internally it uses SHA1PRNG when generating randomness via the calls to nextBytes(), nextInt(), nextLong() etc. (see Tersesystems, Cigital).
The Cigital explanation of what SecureRandom does under the hood is contradicting the official explanation from the Java docs. The Official documentation from Oracle states that NativePRNG, NativePRNGBlocking, NativePRNGNonBlocking and Windows-PRNG always use the native source of randomness instead of Java’s own SHA1PRNG algorithm (also mentioned by this answer and this one). Cigital’s link says that Java always uses SHA1PRNG, but the type of SecureRandom dictates where it is seeded from (/dev/random, /dev/urandom etc.).
Is SecureRandom always using SHA1PRNG under the hood? This is what I assume in my math calculations so if this is not the case please correct me.
The official Oracle documents state that SHA1PRNG truncates its output to 64 bits, from the full 160-bit hash output. But looking at OpenJDK’s implementation of SecureRandom, I see no truncation of the SHA-1 output anywhere. Actually it does the opposite: it saves any unused output from the SHA-1 hash for future calls to engineNextBytes(). If the official Oracle documents are the main authority on the subject, then why does OpenJDK’s implementation do something different? This is strange.
When nextBytes() is called immediately on the SecureRandom instance, it is seeded automatically by the operating system’s CSPRNG (/dev/random in Linux/Unix and CryptGenRandom in Windows) or when those are not available by custom made entropy generators like ThreadLocalRandom. But ThreadLocalRandom provides very low and slow entropy so it should be avoided when possible.
Nowhere did I see evidence of automatic reseeding functionality in SecureRandom. The Fortuna CSPRNG, as explained in Cryptography Engineering, has an elaborate mechanism of reseeding and I would expect that any modern CSPRNG would abstract that logic from the client and handle it automatically. I am not sure why there is such a lack of information and understanding about CSPRNG reseeding. It needs to happen if you exhaust the period of the CSPRNG. There is no doubt about it. The question is what is the period and is it a real concern in Java.
Doing the math
Due to the birthday bound, we know that we should generally expect a collision after producing about 2^(n/2) outputs, where n is the output size of the internal hash. We know that the internal hash output is truncated to 64 bits. This means that we can generate at most 2^32 − 1 rounds of 64-bit randomness before we have a 50% chance of a collision. This will be the time we want to reseed.
(2^32 − 1) × 64 ≈ 2^38 bits of randomness per seed of SecureRandom.
Converted to bytes we get
2^38 / 8 = 2^35 bytes ≈ 33.55 GB of randomness per seed of SecureRandom.
This means that we can generally expect around 33.55GB of high quality randomness generated by SecureRandom’s SHA1PRNG before it needs to be reseeded with a strong seed.
Is my math correct?
Why does this matter? Because my use case is encrypting 1 billion files. Each one needs an IV (128 bits) and a key (256 bits), i.e. 384 bits. This comes out to 46.875 GB of randomness needed in the best case. Because of this I will exhaust the period of SecureRandom if that period is only 33.55 GB. A reseeding strategy will be needed in this case, which I’ll post as a separate question.
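To make this concrete, here is a minimal sketch of the kind of generation loop I have in mind (the ~33 GB threshold is just my estimate from above, and the explicit setSeed call is only one possible reseeding strategy, not an established recommendation):
import java.security.SecureRandom;

public class PerFileMaterial {
    // Roughly the 33.55 GB estimate derived above.
    private static final long RESEED_AFTER_BYTES = 33L * 1000 * 1000 * 1000;

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        long produced = 0;

        for (long file = 0; file < 1_000_000_000L; file++) {
            byte[] iv = new byte[16];   // 128-bit IV
            byte[] key = new byte[32];  // 256-bit key
            rng.nextBytes(iv);
            rng.nextBytes(key);
            produced += iv.length + key.length;

            if (produced >= RESEED_AFTER_BYTES) {
                // setSeed() supplements (does not replace) the existing state
                // with fresh entropy from the underlying seed source.
                rng.setSeed(rng.generateSeed(32));
                produced = 0;
            }
            // ... use iv and key to encrypt the file ...
        }
    }
}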
My issue with collisions and SecureRandom
I have a lot to learn about cryptography, so my understanding is improving all the time. I didn’t find any information about collisions being a problem for a CSPRNG, but guessing the internal state of the generator is detrimental (see Wiki and Schneier’s cryptanalysis paper on CSPRNGs).
Here is my train of thought for collisions, though. I generate a table of 2^32 − 1 random, unique 64-bit values (this comes out to ~33.55 GB, so pretty achievable). Then I have a 50% chance of guessing any 64-bit output of SecureRandom. An IV/key, for example, is comprised of 2 × 64 = 128 bits. This means I can guess the first 64 bits of the IV/key with 50% probability and the second 64 bits with 50% probability, without needing to know the internal state of the CSPRNG. The combined probability of guessing the full key will be less than 50%, but much more than the negligible probability we are looking at when working with such cryptographic primitives. This can lead to generating weak keys, which is what I am trying to avoid. In other words, I believe that the 64-bit output of a CSPRNG is too small for serious cryptographic work, solely based on the collision properties of a 64-bit output (because of the birthday attack). If, on the other hand, SecureRandom used a 512-bit SHA-2 hash and truncated it to a 256-bit output, then the collision attack I am theorizing here would be impossible due to the sheer size of the key space. Am I making sense, or have I got all of this wrong?

Find message from hash?

I recently read that MD5 is not secure because it can be cracked within a small amount of time.
Suppose I give only a fixed 512-bit block of data as input.
MD5 will give a 128-bit hash (32 hex values).
If MD5 is flawed, then can anyone suggest the best way to reconstruct the 512-bit input, given the 128-bit hash?
(Side note: I badly want to implement this. Would C++ be a better choice for speed or Java for its inbuilt security packages?)
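For reference, the forward direction I am describing (a 512-bit block in, a 128-bit hash out) is just a few lines with Java's built-in MessageDigest; what I am asking about is the reverse (a minimal sketch, not an attempt at inversion):
import java.security.MessageDigest;

public class Md5Demo {
    public static void main(String[] args) throws Exception {
        byte[] block = new byte[64]; // a fixed 512-bit input block
        byte[] digest = MessageDigest.getInstance("MD5").digest(block);

        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex); // 128-bit hash printed as 32 hex characters
    }
}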
There are 2^384 (about 4 × 10^115) different 512-bit blocks that hash to the same MD5. Reversing isn't possible even in principle.
It is possible, however, to find one of those 4 × 10^115 blocks that produces the same MD5 as the block you want, and that's why it's considered insecure. For example, if you posted a file to the net along with an MD5 hash to verify its integrity, a hacker might be able to replace it with a different file with the same hash.
With a more secure hash like SHA256, even this wouldn't be possible.
MD5 is a simple and fast algorithm, especially when implemented on a GPU, so it can be cracked by brute force; that is why it is not 'secure'.
But the typical context in which it is insecure is with respect to passwords, which have a limited number of characters and typical combinations (dictionary words).
For a 512-bit message the brute force would take too long even with MD5; it is equivalent to a 64-character password, and currently the practical brute-force limit is around 10 characters.

How long does SHA-1 take to create hashes?

Roughly how long, and how much processing power is required to create SHA-1 hashes of data? Does this differ a lot depending on the original data size? Would generating the hash of a standard HTML file take significantly longer than the string "blah"? How would C++, Java, and PHP compare in speed?
You've asked a lot of questions, so hopefully I can try to answer each one in turn.
SHA-1 (and many other hashes designed to be cryptographically strong) are based on repeated application of an encryption or decryption routine to fixed-sized blocks of data. Consequently, when computing a hash value of a long string, the algorithm takes proportionally more time than computing the hash value of a small string. Mathematically, we say that the runtime to hash a string of length N is O(N) when using SHA-1. Consequently, hashing an HTML document should take longer than hashing the string "blah," but only proportionally so. It won't take dramatically longer to do the hash.
As for comparing C++, Java, and PHP in terms of speed, this is dangerous territory and my answer is likely to get blasted, but generally speaking C++ is slightly faster than Java, which is slightly faster than PHP. A good hash implementation written in one of those languages might dramatically outperform the others if they aren't written well. However, you shouldn't need to worry about this. It is generally considered a bad idea to implement your own hash functions, encryption routines, or decryption routines because they are often vulnerable to side-channel attacks in which an attacker can break your security by using bugs in the implementation that are often extremely difficult to have anticipated. If you want to use a good hash function, use a prewritten version. It's likely to be faster, safer, and less error-prone than anything you do by hand.
Finally, I'd suggest not using SHA-1 at all. SHA-1 has known cryptographic weaknesses and you should consider using a strong hash algorithm instead, such as SHA-256.
Hope this helps!
The "speed" of cryptographic hash functions is often measured in "clock cycles per byte". See this page for an admittedly outdated comparison - you can see how implementation and architecture influence the results. The results vary largely not only due to the algorithm being used, but they are also largely dependent on your processor architecture, the quality of the implementation and if the implementation uses the hardware efficiently. That's why some companies specialize in creating hardware especially well suited for the exact purpose of performing certain cryptographic algorithms as efficiently as possible.
A good example is SHA-512: although it works on larger data chunks than SHA-256, one might be inclined to think that it should generally perform slower than SHA-256 working on smaller input, but SHA-512 is especially well suited to 64-bit processors and sometimes performs even better than SHA-256 there.
All modern hash algorithms work on fixed-size blocks of data. They perform a fixed number of deterministic operations on a block, and do this for every block until you finally get the result. This also means that the longer your input, the longer the operation will take. From the characteristics just explained we can deduce that the length of the operation is directly proportional to the input size of a message. Mathematically or computer-scientifically speaking, we describe this as an O(n) operation, where n is the input size of the message, as templatetypedef already pointed out.
You should not let the speed of hashing influence your choice of programming language, all modern hash algorithms are really, really fast, regardless of the language. Although C-based implementations will do slightly better than Java, which again will probably be slightly faster than PHP, I bet in practice you won't know the difference.
SHA-1 processes the data by chunks of 64 bytes. The CPU time needed to hash a file of length n bytes is thus roughly equal to n/64 times the CPU time needed to process one chunk. For a short string, you must first convert the string to a sequence of bytes (SHA-1 works on bytes, not on characters); the string "blah" will become 4 or 8 bytes (if you use UTF-8 or UTF-16, respectively) so it will be hashed as a single chunk. Note that the conversion from characters to bytes may take more time than the hashing itself.
Using the pure Java SHA-1 implementation from sphlib, on my PC (x86 Core2, 2.4 GHz, 64-bit mode), I can hash long messages at a bandwidth of 132 MB/s (that's using a single CPU core). Note that this exceeds the speed of a common hard disk, so when hashing a big file, chances are that the disk will be the bottleneck, not the CPU: the time needed to hash the file will be the time needed to read the file from the disk.
(Also, using native code written in C, SHA-1 speed goes up to 330 MB/s.)
SHA-256 is considered to be widely more secure than SHA-1, and a pure Java implementation of SHA-256 ranks at 85 MB/s on my PC, which is still quite fast. As of 2011, SHA-1 is not recommended.
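If you want to see the throughput on your own machine, a rough benchmark against the standard JCA provider looks like this (a minimal sketch; results depend heavily on hardware, the provider, and JIT warm-up):
import java.security.MessageDigest;

public class Sha1Throughput {
    public static void main(String[] args) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[1 << 20]; // 1 MB chunk; the contents don't matter for timing

        // Warm up the JIT before measuring.
        for (int i = 0; i < 50; i++) sha1.update(buffer);
        sha1.reset();

        int megabytes = 256;
        long start = System.nanoTime();
        for (int i = 0; i < megabytes; i++) sha1.update(buffer);
        sha1.digest();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("about %.0f MB/s%n", megabytes / seconds);
    }
}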

Choosing an encryption key from Diffie-Hellman output

I implemented Diffie–Hellman key exchange in Java with some large groups from RFC 3526. My output is a fairly large array of bytes. Is it safe to use the first 448 bits (56 bytes) of the output for a blowfish key? Should I transform the bytes in any way, or pick any specific bytes for the key?
From a theoretical point of view, no, it is not safe. Not that I could pinpoint an actual attack; but the output of a Diffie-Hellman key exchange is an element of a group consisting of q elements and offering sqrt(q) security at most. Truncating parts of the encoding of that element does not look like a good idea...
The "proper" way is to use a one-way key derivation function. In simple words, process the Diffie-Hellman output with a good hash function such as SHA-256 and use the hash result as key. Hashing time will be negligible with regards to the Diffie-Hellman step. Java already includes fine implementations of SHA-256 and SHA-512, and if you are after compatibility with very old Java implementations (e.g. the Microsoft JVM which was coming with Internet Explorer 5.5) then you can use an independent Java implementation of SHA-2 such as the one in sphlib. Or possibly reimplement it from the spec (that's not hard): FIPS 180-3 (a PDF file).
If you need more than 128 bits for your key then this means that you are a time-traveler from year 2050 or so; 128 bits are (much) more than enough to protect you for the time being, assuming that you use a proper symmetric encryption scheme.
Speaking of which: Blowfish is not really recommended anymore. It has 64-bit blocks, which implies trouble when the encrypted data length reaches a few gigabytes, a size which is not that big nowadays. You would be better off using a 128-bit block cipher such as the AES. Also, in any serious symmetric encryption system you will need a keyed integrity check. This can be done with a MAC (Message Authentication Code) such as HMAC, itself built over a hash function (then again, easy to implement, and there is a Java implementation in sphlib). Or, even better, use the AES in a combined encryption/MAC mode which will handle the tricky details for you (because using a block cipher properly is not easy); lookup CWC and GCM (both are patent-free; the latter has been approved by NIST).
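And for the combined encryption/MAC suggestion, GCM is exposed directly through the JCA in recent Java versions (a minimal sketch, assuming the default provider supports "AES/GCM/NoPadding"; nonce handling is simplified):
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

public class GcmSketch {
    static byte[] encrypt(SecretKeySpec aesKey, byte[] plaintext) throws Exception {
        byte[] nonce = new byte[12]; // 96-bit nonce, must never repeat for the same key
        new SecureRandom().nextBytes(nonce);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, nonce));

        // The output already carries the 128-bit authentication tag;
        // a real implementation would store the nonce alongside the ciphertext.
        return cipher.doFinal(plaintext);
    }
}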
The solution that you propose depends on whether the most significant bits of a Diffie-Hellman exchange are hard core. There are some small results known that show that the most significant bits are unpredictable, but I'm not aware of a paper that is strong enough to show that your approach is correct.
However, there are several proposals for a key derivation from Diffie-Hellman keys.
E.g. a nice paper is NIST SP 800-135. So far this is only a draft and can be found here. However, it reviews some existing standards. Of course, using a standard is always preferable to developing one yourself.
While Thomas Pornin's proposal looks reasonable, it is nonetheless an ad hoc solution. And to be on the safe side you should probably not use it. Rather, I'd use something that has been analyzed (e.g. the key derivation scheme used in TLS version 1.2).
