How to decrease the size of SHA1? - java

I have a problem, maybe a silly question. I want to store data in a database after hashing it with the SHA1 algorithm. However, over time the database will grow, because SHA1 hashes are large.
Can we decrease the size of a SHA1 hash, maybe to half the size? I'm sorry for my silly question, and for my bad English. Thanks. :D
I am using Java.

Is 20 bytes per hash (assuming binary storage) really too much? If you currently use hex encoding, switching to binary saves you 20 bytes per hash. Base64 saves about 12 bytes compared to hex (28 characters instead of 40).
If you simply truncate a cryptographic hash it is still a good cryptographic hash, but with a reduced output size. What output size you need depends on your application.
Integrity checks against random changes can use a much shorter hash of 32-64 bits and don't need a cryptographic hash function.
If you need uniqueness you should have >>2*log_2(entries) bits in your hash (see the birthday paradox). At around 120 bits it's similar to a GUID/UUID (there is a SHA-1 based generation mode for GUIDs).
If you want cryptographic strength I'd avoid going below 128 bits.
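A minimal Java sketch of the truncation idea, assuming UTF-8 input and that keeping the leftmost 16 bytes (128 bits) is acceptable for your application. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class TruncatedSha1 {
    // Returns the leftmost `bytes` bytes of the SHA-1 digest of `input`.
    static byte[] truncatedSha1(String input, int bytes) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // The full digest is always 20 bytes (160 bits).
            byte[] full = sha1.digest(input.getBytes(StandardCharsets.UTF_8));
            // Truncation: keep only the leftmost bytes.
            return Arrays.copyOf(full, bytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling TruncatedSha1.truncatedSha1("hello", 16) gives a 16-byte value that can be stored in a binary column.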

No; a SHA-1 hash has a size of 160 bits by definition. I strongly doubt that the size of the hash will be a problem; I suppose that you have other data in your database as well? Most likely, you will find that other parts of the data contribute even more to the database size. And how many rows do you expect to have with these hashes?
However, there is a size difference between storing the hash as a string (this will take at least 40 bytes, depending on the string encoding) and storing it as binary data (this will take 20 bytes).
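To make the size difference concrete, here is a small sketch (helper names are hypothetical) that produces the same SHA-1 digest as raw bytes, as a hex string and as a Base64 string, so the three storage sizes can be compared.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class Sha1Encodings {
    // Raw SHA-1 digest: 20 bytes, suitable for a binary column.
    static byte[] sha1(String input) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hex encoding: two characters per byte, so 40 characters.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Base64 encoding: four characters per three bytes, so 28 characters with padding.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }
}
```

For the input "hello", the raw digest is 20 bytes, the hex string is 40 characters and the Base64 string is 28 characters.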
You can switch to another algorithm, as others have noted, but that might not be a good choice from a security perspective - the shorter the output length of a hash algorithm is, the weaker it is.

If you reduce it, it is no longer SHA1 :). You have to consider a different algorithm.

To store a SHA1 hash in a MySQL database as hex, we need a CHAR(40).
SIZE REDUCTION
But we can reduce the size by 30% by choosing Base64 encoding. The column type will be CHAR(28).
Example (SHA1 of "hello"):
SHA1 -> digest hex -> 40 chars:
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
SHA1 -> digest Base64 -> 28 chars:
qvTGHdzF6KLavt4PO0gs2a6pQ00=
PERFORMANCE INCREASE
To guarantee more performance when reading (especially with PRIMARY, INDEX, UNIQUE, ... or using a JOIN) a BINARY(20) is more appropriate.
The hash must be in hex form (0-9/a-f) so that MySQL's UNHEX() function can be applied during the insertion.
INSERT INTO my_table (
id,
my_hash
) VALUES (
1,
UNHEX('aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d')
);
It could also be written with short X'...' syntax like this :
INSERT INTO my_table (
id,
my_hash
) VALUES (
1,
X'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d'
);

Related

How to pad a key if the input key is not 16 byte in java encryption?

What if the input key is less than 16 bytes? I found a solution, but it's not best practice.
// Hashing acts as padding here: for any input, it produces a fixed 20-byte output.
MessageDigest sha = MessageDigest.getInstance("SHA-1");
key = sha.digest(key);
// Trim the digest to only 16 bytes.
key = Arrays.copyOf(key, 16);
I'm not planning to use salt because it is not necessary in my project. Is there any better way?
There are three approaches:
1. Pad the key out to 16 bytes. You can use any value(s) you want to as padding, just so long as you do it consistently.
2. Your scheme of using a SHA-1 hash is OK. It would be better if you could use all of the bits in the hash as the key, but 128 bits should be enough.
3. Tell the user that the key needs to be at least N characters. A key that is too short may be susceptible to a password guessing attack. (A 15 character key is probably too long to be guessed, but 8 characters is tractable.) In fact, you probably should do some other password quality checks.
My recommendation is to combine 1 or 2 with 3, plus password quality checks.
I'm not convinced that seeding the hash will make much difference. (I am assuming that the bad guy would be able to inspect your file encryption app and work out how you turn passwords into keys.) Seeding means that the bad guy cannot pre-generate a set of candidate keys for common / weak passwords, but he still needs to try each of the generated keys in turn.
But the flip-side is that using a crypto hash doesn't help if the passwords you start with are weak.
Don't confuse keys and passwords. Keys are randomly generated and may consist of any possible byte value. Passwords on the other hand need to be typable by a human and usually rememberable. If the key is too short then either emit an error to the user or treat it as a password.
A key should then only be entered in encoded format such as hex or Base64. Only check the length when you successfully decode it.
A password has all kinds of issues that makes it brute forceable such as short length or low complexity. There you would need to use a password-based key derivation function such as PBKDF2 and a sufficiently large work factor (iterations) in order to make a single key derivation attempt so slow that an attacker would need much more time to check the whole input space.
You should combine that with some message to the user to give some hints that the password is too short or doesn't include some character classes and is therefore not recommended.
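A rough sketch of deriving a 16-byte key from a password with PBKDF2, using the JDK's built-in PBKDF2WithHmacSHA256 factory. The salt and the iteration count are assumptions on my part (PBEKeySpec requires a non-empty salt, and a suitable iteration count depends on your hardware); the class and method names are made up.

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordKey {
    // Derives a 128-bit (16-byte) key from a password.
    // `salt` and `iterations` are caller-supplied assumptions, not values from the question.
    static byte[] deriveKey(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 128);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same password, salt and iteration count always give the same 16 bytes, so the result can be used wherever the SHA-1-and-trim key was used before, for example with new SecretKeySpec(key, "AES").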

How to generate custom length hash key?

I am trying to generate a hash key using SHA-256, which produces a 64-character string, but I need a key of size <= 32. What is the best algorithm that maintains uniqueness? Please advise.
As already indicated, you lose collision resistance for each bit you drop. Hashes however are considered to be indistinguishable from random. Because of the avalanche effect, each bit of the input is "represented" by each of the bits in the hash. So you can indeed simply use the first 128 bits / 16 bytes of the output of the hash. That would still leave you with some 64 bits of collision resistance. The more or less standard way to do this is to take the leftmost bytes.
Additional hints:
To have some additional security, use the result of a HMAC with a static, randomly generated 128 bit key instead of the output of a hash;
Of course you could also encode the hash in base 64 and use that, if you can store any string instead of only hexadecimals. In that case you can fit 32 * 6 = 192 bits into the value, which would result in higher security than a SHA-1 hash (which is considered insecure nowadays, keep to SHA-2).
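Both hints can be sketched in Java roughly as follows, assuming HMAC-SHA256 and a caller-supplied secret key (all names here are hypothetical). One helper keeps the leftmost 16 bytes and prints them as 32 hex characters; the other keeps the leftmost 24 bytes (192 bits) and prints them as 32 Base64 characters.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class ShortKey {
    // Full 32-byte HMAC-SHA256 of the input under a static secret key.
    static byte[] hmacSha256(byte[] key, String input) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(input.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Leftmost 128 bits as 32 hexadecimal characters.
    static String hex32(byte[] key, String input) {
        byte[] first16 = Arrays.copyOf(hmacSha256(key, input), 16);
        StringBuilder sb = new StringBuilder(32);
        for (byte b : first16) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Leftmost 192 bits as 32 Base64 characters (24 bytes encode without padding).
    static String base64Of24(byte[] key, String input) {
        return Base64.getEncoder().encodeToString(Arrays.copyOf(hmacSha256(key, input), 24));
    }
}
```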

Good Hash function? (32-bit too small, 64-bit too large)

I need to generate a hash value used for uniqueness of many billions of records in Java. Trouble is, I only have 16 numeric digits to play with. In researching this, I have found algorithms for 32-bit hashes, which return Java integers. But this is too small, as it only has a range of +/- 2 billion, and I will have more records than that. I cannot go to a 64-bit hash, as that will give me numeric values back that are too large (+/- 9 quintillion, or 19 digits). Trouble is, I am dealing with a legacy system that is forcing me into a static key length of 16 digits.
Suggestions? I know no hash function will guarantee uniqueness, but I need a good one that will fit into these restrictions.
Thanks
If your generated hash is too large you can just mod it with your keyspace max to make it fit.
myhash = Math.floorMod(hash64bitvalue, 10_000_000_000_000_000L); // keyspace max is 10^16
If you are limited to 16 decimal digits, your key space contains 10^16 values.
Even if you find a hash that gives uniform distribution on your data set, due to Birthday Paradox you will have a 50% chance of collision on ~10^8 items of data, which is an order of magnitude less than your billions of records.
This means that you cannot use any kind of hash alone and rely on uniqueness.
A straightforward solution is to use a global counter instead. If a global counter is infeasible, counters with preallocated ranges can be used. For example, the 6 most significant digits denote a fixed data source index, and the 10 least significant digits contain a monotonic counter maintained by that data source.
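The birthday estimate can be checked with a few lines of Java. This is only the usual approximation (n is roughly sqrt(2 * N * ln(1 / (1 - p))) for a key space of N values), and the class name is made up.

```java
final class BirthdayBound {
    // Approximate number of items at which the collision probability reaches p
    // in a key space of `space` equally likely values.
    static double itemsForCollisionProbability(double space, double p) {
        return Math.sqrt(2.0 * space * Math.log(1.0 / (1.0 - p)));
    }

    // Approximate expected number of colliding pairs among n items.
    static double expectedCollidingPairs(double n, double space) {
        return n * (n - 1.0) / (2.0 * space);
    }
}
```

With a key space of 10^16, the 50% point comes out at roughly 1.2 * 10^8 items, and one billion records would be expected to produce about 50 colliding pairs.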
So your restriction is 53 bit?
As I understand it, the position of a bit in a hash code doesn't affect its value (the bits are independent of each other). So you could take a 64-bit hash function and use only the last 53 bits of it. Use bitwise operations for this, e.g. ( hash64 & ((1L << 53) - 1) ), not arithmetic.
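In Java the mask needs a long literal and explicit parentheses, because - binds tighter than << and an int shift distance wraps at 32. A small sketch (class name hypothetical):

```java
final class Low53Bits {
    // Mask with the low 53 bits set: 2^53 - 1 = 9007199254740991.
    // Note: writing 1<<54 - 1 would be parsed as 1 << 53 on an int,
    // where the shift distance wraps to 21, giving 2097152 instead of a mask.
    static final long MASK_53 = (1L << 53) - 1;

    // Keeps only the low 53 bits of a 64-bit hash; the result is non-negative
    // and always fits in 16 decimal digits.
    static long low53(long hash64) {
        return hash64 & MASK_53;
    }
}
```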
You don't have to store your hashes in a human-readable form (hex, as you said). Just store the 64-bit long datatype (generated by a 64-bit hash function) in your database, which is only 8 bytes, not the 19 bytes that scared you off.
If that isn't a solution, improve the legacy system.
Edit: Wait!
64-bit: 2^64 =
18446744073709551616
16 hex-digits: 16^16 =
18446744073709551616
Exact fit! So make a hex representation of your 64-bit hash, and there you are.
If you can save 16 alphanumeric characters then you can use a hexadecimal representation and pack 16^16 distinct values (64 bits) into 16 chars. 16^16 is 2^64.
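If hex characters are allowed in the key, the conversion in Java could look like this sketch (names are made up). It zero-pads to exactly 16 characters and treats the long as unsigned, so negative hash values round-trip as well.

```java
final class Hex16 {
    // Formats a 64-bit value as exactly 16 lowercase hex characters.
    static String toHex16(long hash64) {
        return String.format("%016x", hash64);
    }

    // Parses the 16-character hex form back into the original 64-bit value.
    static long fromHex16(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }
}
```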

What type of encryption to use for 48-bit to 48-bit?

I've got a bunch of 48-bit (6 byte) values that I need to encrypt symmetrically. The two requirements are:
The resulting encrypted value also needs to be 48 bits (6 bytes) long. The key itself can be (and would preferably be) much longer to guard against brute force attacks.
The resulting encrypted value needs to be deterministic, i.e. value A using key B will always produce encrypted value C (we encrypt on the fly and show the encrypted data to the user so need to always show the same value)
All block ciphers I've found have used a minimum block size of 64 and appear to be fixed (you can't use an arbitrary block size). Should I be thinking about a stream cipher?
I'm doing this in Java.
Note: I've seen this question and associated answers but wasn't clear on whether the suggestions would satisfy my 2nd requirement.
Consider format preserving encryption.
(Sorry, I originally misread the requirements thinking it was the INPUT data that needed to be 6 bytes.)
I don't think you can do exactly what you want with standard cryptographic algorithms:
the problem with stream ciphers is that standard ones effectively work by generating a stream of pseudorandom bits from the key and then XORing these bits with the plaintext; effectively this means that you should never use the same stream of bits twice (e.g. if you do, then XORing two ciphertexts gives you the same result as XORing the corresponding plaintexts; and in any case with 48 bits, there are only 2^48 possible bitstreams, so you can just test them all by brute force);
the problem with block ciphers is that there's no standard one as far as I'm aware that has a block size of 48 bits.
Now, that doesn't mean that a 48-bit block cipher couldn't be developed-- and indeed I dare say there are some out there-- just that none of the bog-standard ciphers that have undergone years of scrutiny from the cryptographic community have that block size.
So I would suggest options are:
relax the requirement of a 48-bit ciphertext; for example, TripleDES has a 64-bit block size and is "fairly" secure (equivalent to 112 bit security)[*];
in principle, you could implement your own block cipher with whatever block size you require, sticking as close as you can to a standard design, e.g. a Feistel network following some generally recommended design principles-- as a starting point, see Schneier, "Applied Cryptography", pp. 346ff, "Theory of Block Cipher Design".
The obvious problem with the latter option is that, whilst standard block ciphers are generally based on common general principles, they adopt particular design decisions that have been subject to considerable scrutiny; yours presumably won't be.
I would also recommend standing back a bit from the problem (or perhaps explaining a bit more what you're trying to do), because it seems to be based on requirements that would normally go against good security practice (having the same plaintext always encrypt to the same ciphertext is something one would normally specifically avoid, for example). So you could have the best designed Feistel cipher in the world, but introduce some other vulnerability in how you're using it.
[*] TripleDES is generally not recommended because AES gives better security more efficiently (you might like to see some comparative timings of block ciphers that I took in Java to see just how bad it is). However, this might not matter in your particular application.
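To illustrate what a Feistel network over 48 bits looks like, here is a toy sketch with two 24-bit halves. The round function (HMAC-SHA256 truncated to 24 bits) and the round count are my own assumptions for illustration only; this has had no cryptographic review and is not a vetted cipher.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class Feistel48 {
    private static final int ROUNDS = 10;            // assumption: illustrative round count
    private static final long HALF_MASK = 0xFFFFFFL; // 24 bits

    // Round function: first 24 bits of HMAC-SHA256(key, round || half).
    static long roundFunction(byte[] key, int round, long half) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] out = mac.doFinal(ByteBuffer.allocate(12).putInt(round).putLong(half).array());
            return ((out[0] & 0xFFL) << 16) | ((out[1] & 0xFFL) << 8) | (out[2] & 0xFFL);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each round maps (left, right) to (right, left XOR F(right)).
    static long encrypt(byte[] key, long value48) {
        long left = (value48 >>> 24) & HALF_MASK;
        long right = value48 & HALF_MASK;
        for (int r = 0; r < ROUNDS; r++) {
            long next = left ^ roundFunction(key, r, right);
            left = right;
            right = next;
        }
        return (left << 24) | right;
    }

    // Runs the rounds in reverse order to undo encrypt().
    static long decrypt(byte[] key, long cipher48) {
        long left = (cipher48 >>> 24) & HALF_MASK;
        long right = cipher48 & HALF_MASK;
        for (int r = ROUNDS - 1; r >= 0; r--) {
            long prev = right ^ roundFunction(key, r, left);
            right = left;
            left = prev;
        }
        return (left << 24) | right;
    }
}
```

The structure guarantees that the mapping is a deterministic permutation of 48-bit values for a given key, whatever the round function is; how secure it is depends entirely on the design choices mentioned above.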
No, just "pad" your data out with some bytes you don't care about (but which are always the same if that's your requirement) so that you reach the size of a block. (If you're using an appropriate padding mode, then this will be done for you.)
I believe this is what you are looking for
http://web.cs.ucdavis.edu/~rogaway/papers/shuffle.html
This algorithm lets you construct PRP (i.e. arbitrary length block cipher) from secure PRF (e.g. sha256, blake2)
Block cipher in CTR mode has the same issue as a stream cipher.
Without a proper MAC (which requires more bytes to be added) it will be susceptible to bit flipping.
And without a unique IV (which also requires more bytes to be added) it will be just a broken implementation.
You can use a stream cipher only, if you have a unique salt for every encryption (don't even think about re-using the same salt, as that would be trivial to break).
When you have such unique values (e. g. a sequence number that's already associated with your values), you can use e.g. the stream cipher RC4-drop.
When you don't have such unique numbers already, you probably can't use a stream cipher, because you only have 48 bits for your result (so no space left for the salt.)
As for a block cipher with 48 bits - sorry, I don't know such a cipher, either. Maybe what you can do, is combining four 48 bit values into a 192 bit value, resulting in three 64 bit blocks, encode them, and then split them into four 48 bit values again. (I have no idea, if that would be possible in your situation or not?)
If you have a unique counter / sequence number associated with each plaintext value, then you can just use any block cipher in CTR (counter) mode.
To encrypt value V that is sequence number N under key K:
Expand N to the size of a block;
Encrypt N with the block cipher and key K;
Take the first 48 bits of the result and XOR them with V.
Decryption is the same. The most important thing to remember with this method is:
Never, ever use the same key and sequence number to encrypt two different values.
Sequence numbers must be unique. If your sequence restarts, you must use a new key.
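The three steps above can be sketched in Java with AES as the block cipher. The choice of AES, the 16-byte key and the way N is expanded to a block (placed in the last 8 bytes) are assumptions for the sketch; names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class Counter48 {
    // Encrypts or decrypts a 48-bit value V under key K and sequence number N.
    // The same call does both, because XOR is its own inverse.
    static long crypt(byte[] aesKey, long sequenceNumber, long value48) {
        try {
            // Step 1: expand N to the size of one 16-byte AES block.
            byte[] block = ByteBuffer.allocate(16).putLong(8, sequenceNumber).array();
            // Step 2: encrypt the block with the block cipher under key K.
            Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"));
            byte[] keystream = aes.doFinal(block);
            // Step 3: take the first 48 bits of the result and XOR them with V.
            long pad = 0;
            for (int i = 0; i < 6; i++) {
                pad = (pad << 8) | (keystream[i] & 0xFFL);
            }
            return (value48 ^ pad) & 0xFFFFFFFFFFFFL;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The output is deterministic for a given key, sequence number and value, but the sequence number has to be available again at decryption time.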
There is a 48-bit block, 80-bit key cipher designed in 2009: KATAN48 (the related family member KTANTAN48 has a key-scheduling issue). So far it has not been broken and has quite high security margins, so it has passed the test of time.
Here's a proposed solution that meets both of your requirements.
What if you use a cipher with a 32-bit block size (such as Skip32) and run it twice, as described below.
You have a 48-bit value to encode, for example:
f2800af40110
Step 1:
Split this into a 32-bit value and a 16-bit value using some method. Here we'll just grab the left 4 bytes and the right 2 bytes (but in practice you could use a secret bitmask, see below).
32-bit value: f2800af4
16-bit value: 0110
Step 2:
Encrypt the first one with a secret key K1, using Skip32 and let's say we get:
Encrypted 32-bit value: b0daf2b9
Step 3:
Split this into two 16-bit values (again you could use a secret bitmask, but for this example we'll grab the left/right two bytes).
Value 1: b0da
Value 2: f2b9
Step 4:
Combine value 1 with the 16-bit value from step 1 to get a new 32-bit value:
b0da0110
Step 5:
Encrypt the resulting 32-bit value with secret key K2, again using Skip32:
Encrypted 32-bit value: 6135d8f4
Step 6:
Combine this 32-bit value with value 2 from step 3 to get a 48-bit encrypted result.
6135d8f4f2b9
The result is both deterministic and reversible. No two inputs will produce the same output.
Note on splitting/combining values
Steps 1 and 3 above will split a value in a predictable way. I'm not sure if this introduces any weakness, but one alternative is to use a bitmask. If we want to split a 48-bit input number into 32-bits and 16-bits, we could come up with a bitmask, essentially a number with 16 1's that can dictate how we split bits of the input number to get two output numbers, as follows:
INPUT  : 111100101000000000001010111101000000000100010000
BITMASK: 001010111001110000100000010001100000011001100000
           | | |||  |||    |      |   ||      ||  ||
VALUE 1:   1 0 101  000    0      1   10      00  00      => a860
VALUE 2: 11 1 0   00   0000 010101 110  000000  10  10000 => e015c050
Similarly for steps 4 and 6 you could combine two values by interleaving bits based on a bitmask.
If you use separate bitmasks for each step and separate keys for each step you end up with 2 keys and 4 bitmasks, all of which would be needed to encrypt/decrypt values.
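The bitmask split and its inverse could be written in Java along these lines. This is only a sketch of the bit shuffling (it does not include Skip32 or any encryption), and the names are made up.

```java
final class BitmaskSplit {
    // Splits the low `width` bits of `input` into two values:
    // result[0] collects the bits where the mask is 1, result[1] the bits where it is 0.
    // Bits are taken from most significant to least significant.
    static long[] split(long input, long mask, int width) {
        long selected = 0;
        long rest = 0;
        for (int bit = width - 1; bit >= 0; bit--) {
            long b = (input >>> bit) & 1L;
            if (((mask >>> bit) & 1L) == 1L) {
                selected = (selected << 1) | b;
            } else {
                rest = (rest << 1) | b;
            }
        }
        return new long[] { selected, rest };
    }

    // Inverse of split(): interleaves the two values back according to the mask.
    static long combine(long selected, long rest, long mask, int width) {
        long out = 0;
        for (int bit = 0; bit < width; bit++) {
            if (((mask >>> bit) & 1L) == 1L) {
                out |= (selected & 1L) << bit;
                selected >>>= 1;
            } else {
                out |= (rest & 1L) << bit;
                rest >>>= 1;
            }
        }
        return out;
    }
}
```

With the input 0xf2800af40110 and the mask 0x2B9C20460660 (the bitmask from the example written in hex), split() returns 0xa860 and 0xe015c050, and combine() restores the original input.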
Quick note on Skip32 and its use cases
"The Skip32 has the uncommon properties of being fast, creating very dissimilar encrypted values for consecutive input values, and producing output of the same size as the input. These make this cipher particularly useful for obfuscating series of sequential integers (e.g. auto-incremented database ids)."
- https://docs.rs/skip32/1.0.5/skip32/
Any thoughts about this approach from someone more experienced with cryptography than I am?

64bit MessageDigest - store short texts as long

I want to represent short texts (i.e. a word or a couple of words) as a 64-bit hash (I want to store them as longs).
MessageDigest.getInstance("MD5") returns 128 bits.
Is there anything else I could use, or could I just peel off half of it? I am not worried about someone trying to duplicate a hash; I would like to minimize the number of clashes (two different strings having the same hash).
MD5 (and SHA) hashes "smear" the data in a uniform way across the hashed value, so any 64 bits you choose out of the final value will be as sensitive to a change as any other 64 bits. Your only concern will be the increased probability of collisions.
You can just use any part of the MD5 hash.
We tried to fold 128-bit into 64-bit with various algorithms but the folding action didn't make any noticeable difference in hash distribution.
Why don't you just use hashCode() of String? We hashed 8 million email addresses into 32-bit integers and there were actually more collisions with MD5 than with String hashCode. You can run hashCode twice (forward and backward) and make it a 64-bit long.
You can take a sampling of 64 bits from the 128-bit hash. You cannot guarantee there will be no clashes (only a perfect hash will give you that, and there is no perfect hash for arbitrary-length strings), but the chances of a clash will be very small.
As well as a sampling, you could derive the hash using a more complex function, such as XOR consecutive pairs of bits.
As a cryptographic hash (even one nowadays considered broken), MD5 has no significant correlation between input and output bits. That means, simply taking the first or last half will give you a perfectly well-distributed hash function. Anything else would never have been seriously considered as a cryptographic hash.
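Taking the first half of the MD5 digest as a long might look like this in Java. It assumes UTF-8 input and big-endian byte order for the conversion; the class name is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Long {
    // Returns the first 64 bits of the MD5 digest of `text` as a long.
    static long hash64(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            // ByteBuffer reads the first 8 of the 16 digest bytes, big-endian.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For "hello" (MD5 5d41402abc4b2a76b9719d911017c575) this returns 0x5d41402abc4b2a76. Note that the result can be negative for other inputs, since Java longs are signed.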
What about using a block cipher with a 64-bit block size?
