Find message from hash? - java

I recently read that MD5 is not secure because it can be broken in a small amount of time.
Suppose I give a fixed 512-bit block of data as input; MD5 will produce a 128-bit hash (32 hex characters).
If MD5 is flawed, can anyone suggest the best way to reconstruct the 512-bit input, given the 128-bit hash?
(Side note: I really want to implement this. Would C++ be a better choice for speed, or Java for its built-in security packages?)

MD5 maps 2^512 possible inputs onto only 2^128 possible outputs, so on average there are 2^512 / 2^128 = 2^384 (about 4x10^115) different 512-bit blocks that hash to the same MD5 value. Reversing isn't possible even in principle: the hash simply doesn't contain enough information to tell you which of those blocks was the input.
It is possible, however, to find one of those 4x10**115 blocks that produces the same MD5 as the block you want, and that's why it's considered insecure. For example, if you posted a file to the net along with an MD5 hash to verify its integrity, a hacker might be able to replace it with a different file with the same hash.
With a more secure hash like SHA256, even this wouldn't be possible.
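As a quick illustration (the class name and hex-printing loop are my own, not from the question), the standard MessageDigest API shows that MD5 produces a 16-byte digest no matter which 512-bit block you feed it, and nothing in that digest identifies which of the many possible inputs produced it:
import java.security.MessageDigest;

public class Md5Demo {
    public static void main(String[] args) throws Exception {
        byte[] block = new byte[64];                        // a fixed 512-bit (64-byte) input block
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(block);                  // always 16 bytes (128 bits)
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(digest.length + " bytes: " + hex);
    }
}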

MD5 is a simple and fast algorithm, especially when implemented on a GPU, so it can be attacked by brute force; that is why it is not considered 'secure'.
But the typical context in which it is insecure is passwords, which have a limited number of characters and predictable combinations (dictionary words).
For a 512-bit message, brute force would take far too long even with MD5: 512 bits is equivalent to a 64-character password, while the practical brute-force limit is currently around 10 characters.

Related

why is my pbkdf2 implementation so slow (vs. SQLCipher)?

I have written a simple Android App on my Xoom tablet, which simply stores some string notes in a SQLCipher database.
The user is prompted to type in a passphrase, which will be used for the database by the SQLCipher lib. This works fine so far and is very smooth.
Now I have also implemented a small PBKDF2 routine for authentication purposes
(in fact, I want to encrypt some other files in the future, which cannot be stored in a database).
For now, I only want to check whether my PBKDF2 code is correct.
I only used the javax.crypto and java.security libs.
Code snippet as follows:
import java.security.spec.KeySpec;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

int derivedKeyLength = 128; // key length in bits, not bytes
int iterations = 500;
KeySpec spec = new PBEKeySpec(passphrase.toCharArray(), salt, iterations, derivedKeyLength);
SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
byte[] derivedKey = f.generateSecret(spec).getEncoded();
The salt is a 16 byte random number, generated with SecureRandom.
So I hardcoded the key and the salt and compare the derivedKey for authentication (only a test case!).
My problem now is that on my Xoom the key derivation takes about 5 seconds, although the iteration count is set to only 500.
AFAIK SQLCipher uses an iteration count of 4000 by default, and it responds instantly whether the key is right or wrong.
(If I set the iteration count to 4000, it takes at least 15 seconds.)
The question is: did I implement this inefficiently, or is SQLCipher just that good in performance (native NDK functions, etc.)?
Thank you in advance
P.S.: Sorry, my English isn't that great yet!
Edit:
Sorry, I was not clear enough :-)
I know PBKDF2 is supposed to be slow (specifically through the iteration count, to slow down brute-force attacks); that's exactly the reason I am asking! I wanted to set the iteration count to, let's say, 5000, which is not acceptable at over 15 seconds.
I'm just wondering because, like I said, SQLCipher also uses PBKDF2 (iteration count = 4000, while I am using 500) for deriving a key from a given password. I'm not talking about the encryption with AES in the end; it's only about the difference in deriving the key.
Of course it seems plausible that SQLCipher is faster than a self-made key-deriving function, but I did not expect this much of a difference, since SQLCipher's PBKDF2 really is instant!
Greetings!
OK, that (see below) is not exactly your problem: PBKDF2 is slow, but it should be nowhere near as slow as described with those parameters on that hardware.
There are some stats (and tips) on Android PBE/KDF performance here: http://nelenkov.blogspot.com/2012/04/using-password-based-encryption-on.html. SecretKeyFactory performance problems are not unknown: see "Any way around awful SecretKeyFactory performance with LVL and AESObfuscator?".
SecretKeyFactory is likely using a pure Java implementation. SQLCipher has two relevant features:
- it uses OpenSSL, i.e. compiled native code (on my desktop, OpenSSL's PBKDF2 is nearly 100x faster than a JVM6 SecretKeyFactory version for 2000 iterations, excluding JVM startup time; I haven't compared AES speed, and it appears other people find it slow on Android too)
- the 4000-iteration PBKDF2 is only done on database open; after that there are at most 2 iterations for the page HMAC secret (assuming the default configuration, as documented)
Your code seems correct; there should not be such a large (linear?) performance degradation when you increase your iterations. The Xoom should be running a non-ancient JVM with JIT; can you verify the performance problem with other code?
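For example, a crude timing sketch along the following lines (class name and passphrase are just placeholders) would show whether SecretKeyFactory alone accounts for the 5 seconds, independently of any SQLCipher or UI code:
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class Pbkdf2Timing {
    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        for (int iterations : new int[] { 500, 4000 }) {
            PBEKeySpec spec = new PBEKeySpec("test passphrase".toCharArray(), salt, iterations, 128);
            long start = System.nanoTime();
            f.generateSecret(spec).getEncoded();
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println(iterations + " iterations: " + elapsedMs + " ms");
        }
    }
}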
PBKDF2 is designed to be slow (see the answer to this question https://security.stackexchange.com/questions/7689/clarification-needed-for-nists-whitepaper-recommendation-for-password-based-ke ) due to the intended key stretching operation. The iteration counter lets you trade off speed for security.
AES was always intended to be fast, and it is fast (see the speed comparison PDF; the chosen AES candidate is referred to by its original name, Rijndael, in that paper).
I assume you are comparing the PBKDF2 computation time directly to the time taken to perform an SQL operation on your SQLCipher database which will almost certainly have been designed to be fast.
You are effectively comparing two different operations with different requirements, hence the speed difference.
Ok I figured out what the problem was.
If I disconnect the device from my PC, it works instantly. The same holds if I reconnect it after that.
Now even with an iteration count of 5000 and above, the derivation takes less than a second! This is great, since my Xoom isn't the newest of devices!
Maybe it is because of the debug mode or something; I don't really know, actually!
Anyways, thanks to mr.spuratic. Hope this helps someone in the future :-)

Java AES key generation

I'm trying to write a simple password manager in Java. I would like to encrypt the file with the stored passwords using AES 256-bit encryption. In addition, I would like the user to be able to decrypt the file with a password. Almost all the other posts I have read online stress that it is not secure to simply use a password as a key; they mention using random salts to add security. But I do not understand how I can use random salts when generating the key. If I create the key from the user's password and a random salt, then when they try to decrypt their file, how will I know what the salt was? This has me completely confused.
Currently I run their password through several different hashes using a constant salt at each step. Is this sufficiently secure, or am I missing something? Any help on how to securely generate a key from a password would be greatly appreciated! Thanks in advance.
Remember that a salt isn't a secret. You can just append it to the encrypted data. The point of the salt is to prevent somebody from using a pre-computed dictionary of common pieces of data encrypted with common passwords as a way into "cracking" the encrypted file.
By making sure that the salt is random and combining it with the password, you remove the possibility of a dictionary attack because there's (effectively) no chance that a hacker will have a database of data pre-encrypted with your "salt+password". (As a starter, see this page, from one of my tutorials, on salts in password-based encryption.)
You also (effectively) eliminate the problem of collisions: where using the same password on two files may give an attacker a clue to the content if the same block of data occurring in both files looks the same in the encrypted version.
You still usually need to take other precautions, though, simply because a typical password doesn't usually contain much entropy. For example, 8 perfectly random lower case letters will generate about 40 bits of entropy; 8 lower case letters obeying typical patterns of English will generate about 20 bits of entropy. In other words, of the 2^256 possible keys, in reality typical users will be choosing among some small fraction in the range 2^20-2^40. In the case of a savvy user, the situation gets a little better, but you will be very unlikely to get close to 256 bits of entropy. (Consider that in a "pass phrase", there'll be about 2.5-3 bits of entropy per character, so a 30-character pass phrase gives you about 75 bits of entropy-- and let's be honest, how many people use anything like a 30 character password?; 8 perfectly random characters using the 'full' range of printable ASCII will give you a little under 64 bits.)
One way of alleviating this situation a little is to transform the password (with salt appended) using a computationally complex one-way function so that it will take a hacker a little longer to try each key that they want to guess. Again, see this page for more details.
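To make that concrete, here is a minimal sketch (class name, iteration count and key length are illustrative assumptions, not a recommendation) that derives the key with the built-in PBKDF2 implementation and simply stores the salt and IV in front of the ciphertext, since neither of them needs to be secret:
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class PasswordEncryptionSketch {
    public static byte[] encrypt(char[] password, byte[] plaintext) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] salt = new byte[16];
        random.nextBytes(salt);

        // Derive a 128-bit AES key from the password and the fresh random salt.
        // (Use 256 bits only if your JRE has the unlimited-strength policy files installed.)
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        byte[] keyBytes = factory.generateSecret(
                new PBEKeySpec(password, salt, 10000, 128)).getEncoded();
        SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

        byte[] iv = new byte[16];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);

        // Output layout: [16-byte salt][16-byte IV][ciphertext].
        // To decrypt, read the salt and IV back from the front and re-derive the same key.
        byte[] out = new byte[salt.length + iv.length + ciphertext.length];
        System.arraycopy(salt, 0, out, 0, salt.length);
        System.arraycopy(iv, 0, out, salt.length, iv.length);
        System.arraycopy(ciphertext, 0, out, salt.length + iv.length, ciphertext.length);
        return out;
    }
}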
To give you a rough idea of the pitfalls of password-based encryption of files, you may also want to have a look at the Arcmexer library I wrote a couple of years ago, which includes a method named isProbablyCorrectPassword(). Combined with a dictionary/algorithm for generating candidate passwords, you can use it to gauge the effectiveness of the above methods (since ZIP file encryption uses a combination of these techniques).
Use this library: http://www.jcraft.com/jsch/
There's a good AES example here:
http://www.jcraft.com/jsch/examples/AES.java.html
A lot of big names use this package: Maven, Eclipse, etc.

How long does SHA-1 take to create hashes?

Roughly how long, and how much processing power is required to create SHA-1 hashes of data? Does this differ a lot depending on the original data size? Would generating the hash of a standard HTML file take significantly longer than the string "blah"? How would C++, Java, and PHP compare in speed?
You've asked a lot of questions, so hopefully I can try to answer each one in turn.
SHA-1 (and many other hashes designed to be cryptographically strong) are based on repeated application of an encryption or decryption routine to fixed-sized blocks of data. Consequently, when computing a hash value of a long string, the algorithm takes proportionally more time than computing the hash value of a small string. Mathematically, we say that the runtime to hash a string of length N is O(N) when using SHA-1. Consequently, hashing an HTML document should take longer than hashing the string "blah," but only proportionally so. It won't take dramatically longer to do the hash.
As for comparing C++, Java, and PHP in terms of speed, this is dangerous territory and my answer is likely to get blasted, but generally speaking C++ is slightly faster than Java, which is slightly faster than PHP. A good hash implementation written in one of those languages might dramatically outperform the others if they aren't written well. However, you shouldn't need to worry about this. It is generally considered a bad idea to implement your own hash functions, encryption routines, or decryption routines because they are often vulnerable to side-channel attacks in which an attacker can break your security by using bugs in the implementation that are often extremely difficult to have anticipated. If you want to use a good hash function, use a prewritten version. It's likely to be faster, safer, and less error-prone than anything you do by hand.
Finally, I'd suggest not using SHA-1 at all. SHA-1 has known cryptographic weaknesses and you should consider using a strong hash algorithm instead, such as SHA-256.
Hope this helps!
The "speed" of cryptographic hash functions is often measured in "clock cycles per byte". See this page for an admittedly outdated comparison - you can see how implementation and architecture influence the results. The results vary largely not only due to the algorithm being used, but they are also largely dependent on your processor architecture, the quality of the implementation and if the implementation uses the hardware efficiently. That's why some companies specialize in creating hardware especially well suited for the exact purpose of performing certain cryptographic algorithms as efficiently as possible.
A good example is SHA-512: although it works on larger data chunks than SHA-256, one might be inclined to think that it should generally perform slower, yet SHA-512 is especially well suited to 64-bit processors and sometimes performs even better than SHA-256 there.
All modern hash algorithms work on fixed-size blocks of data. They perform a fixed number of deterministic operations per block, and do this for every block until you finally get the result. This also means that the longer your input, the longer the operation will take; the running time is directly proportional to the input size of the message. In computer-science terms this is an O(n) operation, where n is the input size of the message, as templatetypedef already pointed out.
You should not let the speed of hashing influence your choice of programming language, all modern hash algorithms are really, really fast, regardless of the language. Although C-based implementations will do slightly better than Java, which again will probably be slightly faster than PHP, I bet in practice you won't know the difference.
SHA-1 processes the data by chunks of 64 bytes. The CPU time needed to hash a file of length n bytes is thus roughly equal to n/64 times the CPU time needed to process one chunk. For a short string, you must first convert the string to a sequence of bytes (SHA-1 works on bytes, not on characters); the string "blah" will become 4 or 8 bytes (if you use UTF-8 or UTF-16, respectively) so it will be hashed as a single chunk. Note that the conversion from characters to bytes may take more time than the hashing itself.
Using the pure Java SHA-1 implementation from sphlib, on my PC (x86 Core2, 2.4 GHz, 64-bit mode), I can hash long messages at a bandwidth of 132 MB/s (that's using a single CPU core). Note that this exceeds the speed of a common hard disk, so when hashing a big file, chances are that the disk will be the bottleneck, not the CPU: the time needed to hash the file will be the time needed to read the file from the disk.
(Also, using native code written in C, SHA-1 speed goes up to 330 MB/s.)
SHA-256 is considered to be much more secure than SHA-1, and a pure Java implementation of SHA-256 ranks at 85 MB/s on my PC, which is still quite fast. As of 2011, SHA-1 is not recommended.
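If you want numbers for your own machine, a rough throughput measurement with the built-in MessageDigest is straightforward (the class name and buffer size here are arbitrary; results will obviously vary with JVM and hardware):
import java.security.MessageDigest;

public class HashThroughput {
    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1024 * 1024];                      // 1 MB buffer; contents don't matter for timing
        MessageDigest md = MessageDigest.getInstance("SHA-256");  // or "SHA-1"
        for (int i = 0; i < 10; i++) {
            md.digest(data);                                      // warm up the JIT before measuring
        }
        int rounds = 100;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            md.digest(data);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("~" + (rounds / seconds) + " MB/s");
    }
}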

BCrypt (blowfish) password for AES 256 (Rijndael) encrypted text

I decided to try BCrypt for hashing the key for AES-256 (Rijndael/CBC).
The problem is that an AES-256 key has to be 32 bytes long, while BCrypt output is 60 bytes long and, naturally, always different. Maybe a pretty hard and long week is to blame, but I cannot see how I could use a key hashed with BCrypt in combination with AES-256. Am I just tired and blind, or is there no way to do this?
Thanks
Are you trying to hash something (like a password) and use that as an AES Key?
I'm not familiar with BCrypt, but SHA-256 would create a hash that is the same size as an AES-256 key. Or, if you're bent on using BCrypt, you could just take the first 32 bytes of its output and discard the rest.
I don't think you should ever discard bytes from cryptography calculations, because those bytes are supposed to support the other bytes you kept - discarding some weakens the output.
What you need is a secure Key Derivation Function. Truncating the bytes as suggested in the comments works sometimes, but it always depends on the context, so don't do it if you're not absolutely sure about it.
Truncating won't work anyway in situations where you need to "stretch" your input, which is also where the most mistakes are made. If you can't create your key using a secure random generator, what you typically want to do is transform some non-random input (e.g. a password) into something usable as key material. Obviously, the entropy of non-random data is normally not good enough for that purpose.
Look into PKCS#5 and use its PBKDF2 if you want to transform passwords into arbitrary-length keys for AES or any other symmetric encryption algorithm.
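As a brief sketch of that suggestion (the method and parameter values are illustrative only; note that 256-bit AES may require the unlimited-strength JCE policy files on older JREs), PBKDF2 lets you ask for exactly the 32 bytes an AES-256 key needs:
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class Pbkdf2AesKey {
    // salt: a random value stored alongside the ciphertext; password: supplied by the user
    static SecretKey deriveAesKey(char[] password, byte[] salt) throws Exception {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        PBEKeySpec spec = new PBEKeySpec(password, salt, 10000, 256); // key length in bits
        byte[] keyBytes = factory.generateSecret(spec).getEncoded();  // 32 bytes
        return new SecretKeySpec(keyBytes, "AES");
    }
}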

Choosing an encryption key from Diffie-Hellman output

I implemented Diffie–Hellman key exchange in Java with some large groups from RFC 3526. My output is a fairly large array of bytes. Is it safe to use the first 448 bits (56 bytes) of the output for a blowfish key? Should I transform the bytes in any way, or pick any specific bytes for the key?
From a theoretical point of view, no, it is not safe. Not that I could pinpoint an actual attack, but the output of a Diffie-Hellman key exchange is an element of a group consisting of q elements and offering at most sqrt(q) security. Truncating parts of the encoding of that element does not look like a good idea...
The "proper" way is to use a one-way key derivation function. In simple words, process the Diffie-Hellman output with a good hash function such as SHA-256 and use the hash result as key. Hashing time will be negligible with regards to the Diffie-Hellman step. Java already includes fine implementations of SHA-256 and SHA-512, and if you are after compatibility with very old Java implementations (e.g. the Microsoft JVM which was coming with Internet Explorer 5.5) then you can use an independent Java implementation of SHA-2 such as the one in sphlib. Or possibly reimplement it from the spec (that's not hard): FIPS 180-3 (a PDF file).
If you need more than 128 bits for your key then this means that you are a time-traveler from year 2050 or so; 128 bits are (much) more than enough to protect you for the time being, assuming that you use a proper symmetric encryption scheme.
Speaking of which: Blowfish is not really recommended anymore. It has 64-bit blocks, which implies trouble when the encrypted data length reaches a few gigabytes, a size which is not that big nowadays. You would be better off using a 128-bit block cipher such as the AES. Also, in any serious symmetric encryption system you will need a keyed integrity check. This can be done with a MAC (Message Authentication Code) such as HMAC, itself built over a hash function (then again, easy to implement, and there is a Java implementation in sphlib). Or, even better, use the AES in a combined encryption/MAC mode which will handle the tricky details for you (because using a block cipher properly is not easy); lookup CWC and GCM (both are patent-free; the latter has been approved by NIST).
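A small sketch of that approach (hash the raw Diffie-Hellman output with SHA-256 to get the key, then use AES in GCM mode so encryption and integrity are handled together); the class and method names are mine, and "AES/GCM/NoPadding" needs Java 8's default providers or an external provider such as Bouncy Castle on older versions:
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class DhKeyDerivationSketch {
    static byte[] encrypt(byte[] dhSharedSecret, byte[] plaintext) throws Exception {
        // Derive a 128-bit AES key: hash the DH output, then keep the first 16 bytes of the digest.
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(dhSharedSecret);
        SecretKeySpec key = new SecretKeySpec(digest, 0, 16, "AES");

        // AES-GCM provides both confidentiality and integrity, so no separate HMAC is needed.
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        byte[] ciphertext = cipher.doFinal(plaintext);

        // Send the nonce along with the ciphertext; it is not secret, but it must never repeat for a key.
        byte[] out = new byte[nonce.length + ciphertext.length];
        System.arraycopy(nonce, 0, out, 0, nonce.length);
        System.arraycopy(ciphertext, 0, out, nonce.length, ciphertext.length);
        return out;
    }
}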
The solution that you propose depends on whether the most significant bits of a Diffie-Hellman exchange are hard core. There are some small results known that show that the most significant bits are unpredictable, but I'm not aware of a paper that is strong enough to show that your approach is correct.
However, there are several proposals for a key derivation from Diffie-Hellman keys.
E.g. a nice paper is NIST SP 800-135. So far this is only a draft and can be found here. However, it reviews some existing standards. Of course, using a standard is always preferable to developing one yourself.
While Thomas Pornin's proposal looks reasonable, it is nonetheless an ad hoc solution. To be on the safe side you should probably not use it; rather, I'd use something that has been analyzed (e.g. the key derivation scheme used in TLS version 1.2).
