How do I cast a BigInteger into a Key for the Java cryptography library?
I am trying to use a shared Diffie-Hellman key that I generated myself as the key for AES encryption.
Below is the code that I used:
BigInteger bi; long value = 1000000000;
bi = BigInteger.valueOf(value);
Key key = new Key (bi);
However, it did not work.
How do I convert a BigInteger value into a Key value?
Thanks in advance!
First, you cannot cast it. There is no relationship between the BigInteger class and the Key interface.
Second, Key is an interface not a class, so you can't create instances of it. What you need to create is an instance of some class that implements Key. And it most likely needs to be a specific implementation class, not (say) an anonymous class.
The final thing is that the Java crypto APIs are designed to hide the representation of the key. To create a key from bytes, you normally create a KeySpec object and then use a KeyFactory (or, for secret keys, a SecretKeyFactory) to "generate" a key from it. Typical KeySpec constructors take a byte[] as a parameter, so you first need to get the byte array from your BigInteger instance. For AES, SecretKeySpec(byte[] key, String algorithm) is a convenient special case: it implements SecretKey itself, so it can be passed directly to Cipher.init without a factory step.
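A minimal sketch of the point above, assuming Java 8 or later: it shows that a SecretKeySpec can be used directly as a Key for AES, and that the raw toByteArray() of the question's value (4 bytes) is rejected, because AES only accepts 16-, 24- or 32-byte keys. The class and method names are hypothetical, and the random IV is only there so the cipher can be initialized.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class for illustration.
final class BigIntegerKeyDemo {

    // SecretKeySpec implements SecretKey (and therefore Key), so for AES the
    // spec object itself can be handed to Cipher.init without a factory step.
    static Key wrapAsAesKey(byte[] keyBytes) {
        return new SecretKeySpec(keyBytes, "AES");
    }

    // Returns true if AES/GCM accepts the key. AES only allows 16-, 24- or 32-byte keys,
    // so any other length makes Cipher.init throw an InvalidKeyException.
    static boolean acceptedByAes(Key key) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }
}
```

In other words, the bytes you wrap must already be a valid AES key, which is why they should come out of a key derivation step rather than straight out of toByteArray().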
You need to convert your BigInteger to a byte array of a specific size, then use the first (leftmost) bytes to create a key. For this you need to know the size of the prime p used in DH, as the value needs to be left-padded to the size of p. I would suggest using standardized DH parameters (or at least making sure that the size of the prime is divisible by 8).
Note that there may be a zero-valued byte in front of the byte array returned by BigInteger.toByteArray(), because the value is encoded as a signed (two's-complement) big-endian byte array. You need to remove this byte if it makes the result longer than the prime (in bytes).
public static byte[] encodeSharedSecret(final BigInteger sharedSecret, final int primeSizeBits) {
    // TODO assignment: add additional tests on input
    if (sharedSecret.signum() < 0) {
        throw new IllegalArgumentException("Shared secret must not be negative");
    }
    // Size of the prime p in bytes, rounded up.
    final int sharedSecretSize = (primeSizeBits + Byte.SIZE - 1) / Byte.SIZE;
    // Signed big-endian encoding; may carry one extra leading 00h sign byte.
    final byte[] signedSharedSecretEncoding = sharedSecret.toByteArray();
    final int signedSharedSecretEncodingLength = signedSharedSecretEncoding.length;
    if (signedSharedSecretEncodingLength == sharedSecretSize) {
        return signedSharedSecretEncoding;
    }
    // Only strip the first byte if it really is the 00h sign byte.
    if (signedSharedSecretEncodingLength == sharedSecretSize + 1
            && signedSharedSecretEncoding[0] == 0) {
        final byte[] sharedSecretEncoding = new byte[sharedSecretSize];
        System.arraycopy(signedSharedSecretEncoding, 1, sharedSecretEncoding, 0, sharedSecretSize);
        return sharedSecretEncoding;
    }
    if (signedSharedSecretEncodingLength < sharedSecretSize) {
        // Left-pad with zero bytes up to the size of the prime.
        final byte[] sharedSecretEncoding = new byte[sharedSecretSize];
        System.arraycopy(signedSharedSecretEncoding, 0,
                sharedSecretEncoding, sharedSecretSize - signedSharedSecretEncodingLength,
                signedSharedSecretEncodingLength);
        return sharedSecretEncoding;
    }
    throw new IllegalArgumentException("Shared secret is too big");
}
After that you need to derive the key bytes using some kind of key derivation scheme. The one you should use depends on the standard you are implementing:
As stated in RFC 2631:
X9.42 provides an algorithm for generating an essentially arbitrary amount of keying material from ZZ. Our algorithm is derived from that algorithm by mandating some optional fields and omitting others.
KM = H ( ZZ || OtherInfo)
H is the message digest function SHA-1 [FIPS-180]
ZZ is the shared secret value computed in Section 2.1.1. Leading zeros MUST be preserved, so that ZZ occupies as many octets as p.
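The following is a simplified sketch of the RFC 2631 computation KM = H(ZZ || OtherInfo) with SHA-1, assuming Java 8+ and that the caller supplies OtherInfo as already-encoded bytes (RFC 2631 defines OtherInfo as a DER-encoded structure that includes a counter, which this sketch does not build). ZZ is left-padded to the byte size of p, as the quote requires. The class and method names, and the choice of the leftmost 16 bytes of KM as an AES-128 key, are illustrative assumptions.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class; a simplified reading of KM = H(ZZ || OtherInfo).
final class DhKeyDerivation {

    // Unsigned big-endian encoding of a non-negative value in exactly `length` bytes,
    // keeping leading zeros so that ZZ occupies as many octets as p.
    static byte[] toFixedLength(BigInteger value, int length) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("value must not be negative");
        }
        byte[] signed = value.toByteArray();
        // toByteArray() may prepend one 0x00 sign byte; skip it.
        int offset = (signed.length > 1 && signed[0] == 0) ? 1 : 0;
        int magnitudeLength = signed.length - offset;
        if (magnitudeLength > length) {
            throw new IllegalArgumentException("value does not fit in " + length + " bytes");
        }
        byte[] out = new byte[length];
        System.arraycopy(signed, offset, out, length - magnitudeLength, magnitudeLength);
        return out;
    }

    // KM = SHA-1(ZZ || OtherInfo); otherInfo is treated as opaque, already-encoded bytes.
    static byte[] keyingMaterial(BigInteger sharedSecret, int primeSizeBits, byte[] otherInfo) {
        byte[] zz = toFixedLength(sharedSecret, (primeSizeBits + 7) / 8);
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(zz);
            sha1.update(otherInfo);
            return sha1.digest(); // 20 bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    // Assumption for illustration: use the leftmost 16 bytes of KM as an AES-128 key.
    static SecretKey aes128Key(byte[] keyingMaterial) {
        return new SecretKeySpec(Arrays.copyOf(keyingMaterial, 16), "AES");
    }
}
```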
Note that I have discovered a bug in the Bouncy Castle libraries up to 1.49 (that's the current version at this date) in the DH implementation regarding the secret extraction - it does strip the spurious leading 00h valued bytes, but it forgets to left-pad the result up to the prime size p. This will lead to an incorrect derived key once in 192 times (!)
Since Ed25519 has not been around for long (in JDK), there are very few resources on how to use it.
While their example is very neat and useful, I have some trouble understanding what I am doing wrong regarding key parsing.
The public key is read from a packet sent by an iDevice.
(Let's just say it's an array of bytes.)
While searching and trying my best to understand how the keys are encoded, I stumbled upon this passage:
4. The public key A is the encoding of the point [s]B. First, encode the y-coordinate (in the range 0 <= y < p) as a little-endian string of 32 octets. The most significant bit of the final octet is always zero. To form the encoding of the point [s]B, copy the least significant bit of the x coordinate to the most significant bit of the final octet. The result is the public key.
That means that if I want to get y and isXOdd, I have to do some work (if I understood correctly).
Below is my code for it, yet verification still fails.
I think I did it correctly by reversing the array to get it back into big-endian order for BigInteger to use.
My questions are:
Is this the correct way to parse the public key from a byte array?
If it is, what could be the reason for verification to fail?
// devicePublicKey: ByteArray
val lastIndex = devicePublicKey.lastIndex
val lastByte = devicePublicKey[lastIndex]
val lastByteAsInt = lastByte.toInt()
val isXOdd = lastByteAsInt.and(255).shr(7) == 1
devicePublicKey[lastIndex] = (lastByteAsInt and 127).toByte()
val y = devicePublicKey.reversedArray().asBigInteger
val keyFactory = KeyFactory.getInstance("Ed25519")
val nameSpec = NamedParameterSpec.ED25519
val point = EdECPoint(isXOdd, y)
val keySpec = EdECPublicKeySpec(nameSpec, point)
val key = keyFactory.generatePublic(keySpec)
Signature.getInstance("Ed25519").apply {
    initVerify(key)
    update(deviceInfo)
    println(verify(deviceSignature))
}
And the data before manipulation (all in hex):
Device identifier: 34444432393531392d463432322d343237442d414436302d444644393737354244443533
Device public key: e0a611c84db0ae91abfe2e6db91b6a457a4b41f9d8e09afdc7207ce3e4942e94
Device signature: a0383afb3bcbd43d08b04274a9214036f16195dc890c07a81aa06e964668955b29c5026d73d8ddefb12160529eeb66f843be4a925b804b575e6a259871259907
Device info: a86a71d42874b36e81a0acc65df0f2a84551b263b80b61d2f70929cd737176a434444432393531392d463432322d343237442d414436302d444644393737354244443533e0a611c84db0ae91abfe2e6db91b6a457a4b41f9d8e09afdc7207ce3e4942e94
// Device info is simply concatenated [hkdf, identifier, public key]
And the public key after the manipulation:
e0a611c84db0ae91abfe2e6db91b6a457a4b41f9d8e09afdc7207ce3e4942e14
Thank you very much, and every bit of help is greatly appreciated.
This should also help others who run into this problem later, when the Ed25519 implementation is less new.
This helped me a lot. I would never have figured it out without your example.
I did it in Java.
public static PublicKey getPublicKey(byte[] pk)
        throws NoSuchAlgorithmException, InvalidKeySpecException, InvalidParameterSpecException {
    // pk is the raw 32-byte key, already converted from a hex string to a byte array.
    KeyFactory kf = KeyFactory.getInstance("Ed25519");
    // Work on a copy so the caller's array is not modified.
    byte[] bytes = pk.clone();
    // The most significant bit of the last byte holds the parity of x.
    boolean xisodd = (bytes[bytes.length - 1] & 0x80) != 0;
    // Clear that bit so it is 0 after reversing and only y remains.
    bytes[bytes.length - 1] &= 0x7F;
    // The encoding is little-endian, but BigInteger expects big-endian: reverse the array.
    for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {
        byte tmp = bytes[i];
        bytes[i] = bytes[j];
        bytes[j] = tmp;
    }
    BigInteger y = new BigInteger(1, bytes);
    NamedParameterSpec paramSpec = new NamedParameterSpec("Ed25519");
    EdECPoint ep = new EdECPoint(xisodd, y);
    EdECPublicKeySpec pubSpec = new EdECPublicKeySpec(paramSpec, ep);
    return kf.generatePublic(pubSpec);
}
Actually, the whole encoding and decoding is correct.
In the end, the problem was that I had (by mistake) reversed the array I read one time too many.
The array has to be reversed because these keys are encoded in little-endian order, while BigInteger on the JVM expects big-endian bytes.
Hopefully this helps anyone who gets stuck on a similar problem in the future.
If there are any questions, simply comment here or send me a message.
I'll do my best to help you out.
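A round-trip check of the decoding described above, as a sketch assuming Java 15 or later (where Ed25519, EdECPoint and NamedParameterSpec.ED25519 are available). It generates a key pair, takes the raw 32 key bytes from the end of the X.509 encoding, decodes them by reading and clearing the parity bit and reversing the array exactly once, and then verifies a signature with the result. The class name is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.EdECPoint;
import java.security.spec.EdECPublicKeySpec;
import java.security.spec.NamedParameterSpec;
import java.util.Arrays;

// Hypothetical helper class (requires Java 15+).
final class Ed25519RawKey {

    // Decode a raw 32-byte Ed25519 public key: little-endian y, parity of x in the top bit.
    static PublicKey decode(byte[] raw) throws GeneralSecurityException {
        if (raw.length != 32) {
            throw new IllegalArgumentException("Ed25519 public keys are 32 bytes");
        }
        byte[] bytes = raw.clone();                   // leave the caller's array untouched
        boolean xIsOdd = (bytes[31] & 0x80) != 0;     // top bit of the last byte
        bytes[31] &= 0x7F;                            // clear it so only y remains
        for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {  // reverse exactly once
            byte tmp = bytes[i];
            bytes[i] = bytes[j];
            bytes[j] = tmp;
        }
        BigInteger y = new BigInteger(1, bytes);      // now big-endian, unsigned
        EdECPublicKeySpec spec =
                new EdECPublicKeySpec(NamedParameterSpec.ED25519, new EdECPoint(xIsOdd, y));
        return KeyFactory.getInstance("Ed25519").generatePublic(spec);
    }

    // Generate a key, export its raw bytes, decode them and verify a signature with the result.
    static boolean roundTrip() throws GeneralSecurityException {
        KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        byte[] spki = kp.getPublic().getEncoded();    // 12-byte header + 32 raw key bytes
        byte[] raw = Arrays.copyOfRange(spki, spki.length - 32, spki.length);
        byte[] message = "device info".getBytes(StandardCharsets.UTF_8);
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(kp.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(decode(raw));
        verifier.update(message);
        return verifier.verify(signature);
    }
}
```

If the array were reversed a second time somewhere (the mistake described above), decode would produce a different point and the verification in roundTrip would fail.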
I think the code here is much more than you probably need. I investigated this and came up with what I think is an equivalent but much simpler approach. Here is the blog piece: https://www.tbray.org/ongoing/When/202x/2021/04/19/PKI-Detective and here is the Java code: https://github.com/timbray/blueskidjava
You can check how it is done in the OpenJDK implementation:
https://github.com/openjdk/jdk15/blob/master/src/jdk.crypto.ec/share/classes/sun/security/ec/ed/EdDSAPublicKeyImpl.java#L65
Basically encodedPoint is your byte array (just the plain bytes, without ASN.1 encoding).
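If you would rather let the JDK do the point decoding, one alternative (a sketch, assuming Java 15 or later) is to put the fixed 12-byte DER header of an Ed25519 SubjectPublicKeyInfo (RFC 8410) in front of the raw 32 bytes and pass the result to X509EncodedKeySpec. This is the same header the JDK itself emits in getEncoded() for Ed25519 public keys. The class name is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Arrays;

// Hypothetical helper class (requires Java 15+).
final class Ed25519FromRaw {

    // SEQUENCE { SEQUENCE { OID 1.3.101.112 }, BIT STRING of length 33 }, where the
    // BIT STRING holds one "unused bits" byte (0) followed by the 32 raw key bytes.
    private static final byte[] SPKI_PREFIX = {
        0x30, 0x2a, 0x30, 0x05, 0x06, 0x03, 0x2b, 0x65, 0x70, 0x03, 0x21, 0x00
    };

    static PublicKey fromRaw(byte[] raw) throws GeneralSecurityException {
        if (raw.length != 32) {
            throw new IllegalArgumentException("Ed25519 public keys are 32 bytes");
        }
        byte[] der = Arrays.copyOf(SPKI_PREFIX, SPKI_PREFIX.length + raw.length);
        System.arraycopy(raw, 0, der, SPKI_PREFIX.length, raw.length);
        return KeyFactory.getInstance("Ed25519").generatePublic(new X509EncodedKeySpec(der));
    }
}
```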
This is going to be a long question, but I have a really weird bug. I use OpenSSL in C++ to compute an HMAC and compare it to a similar implementation using javax.crypto.Mac. For some keys the two HMACs match, and for others they differ. I believe the problem occurs when the keys get too big. Here are the details.
Here is the most important code for C++:
#include <openssl/bn.h>
#include <openssl/crypto.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <algorithm>
#include <cctype>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
void computeHMAC(std::string message, const std::string& key){
    // Parse the hex key; BN_bn2bin then yields its minimal unsigned big-endian bytes.
    BIGNUM* key_ = nullptr;
    if (BN_hex2bn(&key_, key.c_str()) == 0) {
        return; // not a valid hex string
    }
    std::vector<unsigned char> convertedKey(BN_num_bytes(key_));
    int length = BN_bn2bin(key_, convertedKey.data());
    /* Calc HMAC with the one-shot HMAC() call */
    std::transform(message.begin(), message.end(), message.begin(), ::tolower);
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int digestLength = 0;
    HMAC(EVP_sha1(), convertedKey.data(), length,
         reinterpret_cast<const unsigned char*>(message.data()), message.size(),
         digest, &digestLength);
    // Two hex characters per byte plus the terminating NUL.
    char mdString[2 * EVP_MAX_MD_SIZE + 1] = {0};
    for (unsigned int i = 0; i < digestLength; ++i) {
        std::snprintf(&mdString[i * 2], 3, "%02x", static_cast<unsigned int>(digest[i]));
    }
    char* keyHex = BN_bn2hex(key_);
    std::cout << "\n\nMSG:\n" << message << "\nKEY:\n" << keyHex << "\nHMAC\n" << mdString << "\n\n";
    OPENSSL_free(keyHex);
    BN_free(key_);
}
The Java test looks like this:
public String calculateKey(String msg, String key) throws Exception{
    HMAC = Mac.getInstance("HmacSHA1");
    BigInteger k = new BigInteger(key, 16);
    HMAC.init(new SecretKeySpec(k.toByteArray(), "HmacSHA1"));
    msg = msg.toLowerCase();
    HMAC.update(msg.getBytes());
    byte[] digest = HMAC.doFinal();
    System.out.println("Key:\n" + k.toString(16) + "\n");
    System.out.println("HMAC:\n" + DatatypeConverter.printHexBinary(digest).toLowerCase() + "\n");
    return DatatypeConverter.printHexBinary(digest).toLowerCase();
}
Some test runs with different keys (all strings are interpreted as hex):
Key 1:
736A66B29072C49AB6DC93BB2BA41A53E169D14621872B0345F01EBBF117FCE48EEEA2409CFC1BD92B0428BA0A34092E3117BEB4A8A14F03391C661994863DAC1A75ED437C1394DA0741B16740D018CA243A800DA25311FDFB9CA4361743E8511E220B79C2A3483FCC29C7A54F1EB804481B2DC87E54A3A7D8A94253A60AC77FA4584A525EDC42BF82AE2A1FD6E3746F626E0AFB211F6984367B34C954B0E08E3F612590EFB8396ECD9AE77F15D5222A6DB106E8325C3ABEA54BB59E060F9EA0
Msg:
test
Hmac OpenSSL:
b37f79df52afdbbc4282d3146f9fe7a254dd23b3
Hmac Java Mac:
b37f79df52afdbbc4282d3146f9fe7a254dd23b3
Key 2:
636A66B29072C49AB6DC93BB2BA41A53E169D14621872B0345F01EBBF117FCE48EEEA2409CFC1BD92B0428BA0A34092E3117BEB4A8A14F03391C661994863DAC1A75ED437C1394DA0741B16740D018CA243A800DA25311FDFB9CA4361743E8511E220B79C2A3483FCC29C7A54F1EB804481B2DC87E54A3A7D8A94253A60AC77FA4584A525EDC42BF82AE2A1FD6E3746F626E0AFB211F6984367B34C954B0E08E3F612590EFB8396ECD9AE77F15D5222A6DB106E8325C3ABEA54BB59E060F9EA0
Msg:
test
Hmac OpenSSL:
bac64a905fa6ae3f7bf5131be06ca037b3b498d7
Hmac Java Mac:
bac64a905fa6ae3f7bf5131be06ca037b3b498d7
Key 3:
836A66B29072C49AB6DC93BB2BA41A53E169D14621872B0345F01EBBF117FCE48EEEA2409CFC1BD92B0428BA0A34092E3117BEB4A8A14F03391C661994863DAC1A75ED437C1394DA0741B16740D018CA243A800DA25311FDFB9CA4361743E8511E220B79C2A3483FCC29C7A54F1EB804481B2DC87E54A3A7D8A94253A60AC77FA4584A525EDC42BF82AE2A1FD6E3746F626E0AFB211F6984367B34C954B0E08E3F612590EFB8396ECD9AE77F15D5222A6DB106E8325C3ABEA54BB59E060F9EA0
Msg:
test
Hmac OpenSSL:
c189c637317b67cee04361e78c3ef576c3530aa7
Hmac Java Mac:
472d734762c264bea19b043094ad0416d1b2cd9c
As the data shows, when the key gets too big, the results differ. I have no idea which implementation is faulty. I have also tried bigger and smaller keys, but I haven't determined the exact threshold. Can anyone spot the problem? Can anyone tell me which HMAC is incorrect in the last case, for example by checking it with different software, or suggest a third implementation I could use to check mine?
Kind regards,
Roel Storms
When you convert a hexadecimal string to a BigInteger in Java, it assumes the number is positive (unless the string includes a - sign).
But its byte representation (what toByteArray() returns) is two's-complement, meaning that the leftmost bit is used for the sign.
If you are converting a value whose first byte is between 00 and 7F inclusive, that's not a problem. It can convert the byte directly, because the leftmost bit is zero, which means that the number is considered positive.
But if you are converting a value that starts with 80 through FF, then the leftmost bit is 1, which would be considered negative. To avoid this and keep the BigInteger value exactly as supplied, toByteArray() adds another zero byte at the beginning.
So, internally, the conversion of a number such as 7ABCDE is the byte array
0x7a 0xbc 0xde
But the conversion of a number such as FABCDE (only the first byte is different!), is:
0x00 0xfa 0xbc 0xde
This means that for keys that begin with a byte in the range 80-FF, the BigInteger.toByteArray() is not producing the same array that your C++ program produced, but an array one byte longer.
There are several ways to work around this - like using your own hex-to-byte-array parser or finding an existing one in some library. If you want to use the one produced by BigInteger, you could do something like this:
BigInteger k = new BigInteger(key, 16);
byte[] kByteArr = k.toByteArray();
if ( kByteArr.length > (key.length() + 1) / 2 ) {
kByteArr = Arrays.copyOfRange(kByteArr,1,kByteArr.length);
}
Now you can use the kByteArr to perform the operation properly.
Another issue you should watch out for is keys whose length is odd. In general, you shouldn't have a hex octet string with an odd length. A string like F8ACB is actually 0F8ACB (which is not going to cause an extra byte in BigInteger) and should be interpreted as such. This is why I wrote (key.length() + 1) in my formula: if key has odd length, it should be interpreted as one octet longer. This is also important if you write your own hex-to-byte-array converter: if the length is odd, add a zero at the beginning before you start converting.
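A minimal sketch of the "own hex-to-byte-array parser" route, assuming Java 8 or later; the class and method names are hypothetical. It never goes through BigInteger, so no sign byte can appear, and it left-pads odd-length strings with a 0 as described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class.
final class HexKeyHmac {

    // Parse hex without BigInteger, so no sign byte is ever added.
    // An odd-length string is read as if it had a leading '0' (F8ACB -> 0F 8A CB).
    static byte[] hexToBytes(String hex) {
        String s = (hex.length() % 2 == 1) ? "0" + hex : hex;
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // HMAC-SHA1 of the UTF-8 bytes of message, keyed with the parsed hex key; lowercase hex output.
    static String hmacSha1Hex(String hexKey, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(hexToBytes(hexKey), "HmacSHA1"));
            byte[] digest = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One difference to keep in mind: this parser keeps leading 00 bytes, whereas BN_bn2bin in the C++ code produces the minimal encoding and drops them. Because HMAC pads short keys with zeros on the right, a key starting with 00 would give different HMACs in the two programs; none of the keys in the question start with 00. To check either implementation independently, the published HMAC-SHA1 test vectors in RFC 2202 can be used, e.g. key "Jefe" (hex 4a656665) with the message "what do ya want for nothing?" gives effcdf6ae5eb2fa2d27416d5f184df9c259a7c79.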
I am using CVC certificates (if you haven't heard of them, pretend they are X.509) with elliptic curve signatures on the brainpoolP256r1 curve and SHA-1 hashes. In Java with Bouncy Castle, I simply verify them like this:
Signature sign = Signature.getInstance("SHA1withECDSA", "BC");
sign.initVerify(key);
sign.update(certificate_data_to_be_verified);
sign.verify(signature);
And everything works fine. However, I also need to verify them on an embedded device, and I have run into a problem: according to the Wikipedia ECDSA article, I am supposed to use the leftmost 256 bits of the hash to get the value of z, but SHA-1 has only 160 bits.
How is this solved by bouncycastle, and is there some general theory on how to handle this?
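You are confusing the order of the base point with the key length. The rule is to take the leftmost bits of the hash up to the bit length of the group order n (256 for brainpoolP256r1); if the hash is shorter than that, as SHA-1's 160 bits are here, the whole hash is used unchanged.
Here is how the Bouncy Castle code converts the hash into an integer during ECDSA signature verification: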
private BigInteger calculateE(BigInteger n, byte[] message)
{
    /* n is curve order value */
    int log2n = n.bitLength();
    /* and message is a hash */
    int messageBitLength = message.length * 8;
    BigInteger e = new BigInteger(1, message);
    /* If message is longer than curve order */
    if (log2n < messageBitLength)
    {
        /* only log2n bits are taken from the left */
        e = e.shiftRight(messageBitLength - log2n);
    }
    return e;
}
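To see the rule applied to the question's case, here is a standalone restatement of the same logic as a sketch (Java 8+, hypothetical class name). It is meant to be exercised with a placeholder 256-bit order such as n = 2^255 + 1 rather than the real brainpoolP256r1 order: a 160-bit SHA-1 digest passes through unchanged, while a 512-bit SHA-512 digest is cut down to its leftmost 256 bits.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class restating the calculateE rule above.
final class EcdsaHashToInteger {

    // Keep the whole hash if it has no more bits than n; otherwise keep its leftmost n.bitLength() bits.
    static BigInteger calculateE(BigInteger n, byte[] hash) {
        int log2n = n.bitLength();
        int hashBits = hash.length * 8;
        BigInteger e = new BigInteger(1, hash);   // unsigned, big-endian
        return hashBits > log2n ? e.shiftRight(hashBits - log2n) : e;
    }

    static byte[] digest(String algorithm, String message) {
        try {
            return MessageDigest.getInstance(algorithm).digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

So on the embedded device, with a 160-bit SHA-1 hash and a 256-bit order, z is simply the hash read as an unsigned big-endian integer.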
I'm making a system where I want to verify the server's identity via RSA, but I can't seem to get the server to properly decrypt the client's message.
The public and private keys are in slot 0 of the array and the modulus is in slot 1, so they are set up correctly.
Client side code
int keyLength = 3072 / 8;//RSA key size
byte[] data = new byte[keyLength];
//Generate some random data. Note that
//only the first half of this will be used.
new SecureRandom().nextBytes(data);
int serverKeySize = in.readInt();
if (serverKeySize != keyLength) {//Definitely not the right heard
    return false;
}
//Take the server's half of the random data and pass ours
in.readFully(data, keyLength / 2 , keyLength / 2);
//Encrypt the data
BigInteger[] keys = getKeys();
BigInteger original = new BigInteger(data);
BigInteger encrypted = original.modPow(keys[0], keys[1]);
data = encrypted.toByteArray();
out.write(data);
//If the server's hash doesn't match, the server has the wrong key!
in.readFully(data, 0, data.length);
BigInteger decrypted = new BigInteger(data);
return original.equals(decrypted);
Server side code
int keyLength = 3072 / 8;//Key length
byte[] data = new byte[keyLength];
//Send the second half of the key
out.write(data, keyLength / 2, keyLength / 2);
in.readFully(data);
BigInteger[] keys = getKeys();
BigInteger encrypted = new BigInteger(data);
BigInteger original = encrypted.modPow(keys[0], keys[1]);
data = original.toByteArray();
out.write(data);
AFAIK that implementation is correct, but it doesn't seem to produce the correct output. And no, I do not wish to use a Cipher, for various reasons.
There are some critical details that are not being accounted for. The data you want to apply RSA to must be encoded as a BigInteger x with 0 <= x < n, where n is your modulus. You aren't doing that; in fact, because you fill your entire data array with random data, you cannot guarantee it. The PKCS#1 padding algorithm is designed to do this correctly, but since you are rolling your own, you'll have to handle it in your code. Also, examine carefully how the BigInteger(byte[]) constructor and BigInteger.toByteArray() decode and encode integers. Naively, many people expect a plain base-256 encoding and forget that BigInteger must also accommodate negative integers. It does so by using the ASN.1 DER integer rules (two's-complement): if a positive integer's high-order byte would be >= 128, a leading zero byte is added, and a byte array whose first byte is >= 128 is decoded as a negative number. As a result, encrypted.toByteArray() can be one byte longer or several bytes shorter than keyLength, while the server always reads exactly keyLength bytes.
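A sketch of those encoding rules, under stated assumptions: textbook RSA with no padding (insecure, for illustration only), Java 8+, and hypothetical class and method names. It follows the idea of PKCS#1's I2OSP/OS2IP primitives: integers are always decoded as unsigned with new BigInteger(1, bytes), always encoded into exactly k bytes (the modulus length), and the plaintext is one byte shorter than the modulus so that 0 <= x < n holds.

```java
import java.math.BigInteger;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.interfaces.RSAPrivateKey;
import java.security.interfaces.RSAPublicKey;
import java.util.Arrays;

// Hypothetical helper class; textbook RSA without padding is insecure and only illustrates encoding.
final class RawRsaEncoding {

    // I2OSP-style: unsigned big-endian encoding of x in exactly len bytes.
    static byte[] toFixedLength(BigInteger x, int len) {
        byte[] signed = x.toByteArray();
        int offset = (signed.length > 1 && signed[0] == 0) ? 1 : 0;  // drop the sign byte
        int magnitudeLength = signed.length - offset;
        if (x.signum() < 0 || magnitudeLength > len) {
            throw new IllegalArgumentException("value does not fit in " + len + " bytes");
        }
        byte[] out = new byte[len];
        System.arraycopy(signed, offset, out, len - magnitudeLength, magnitudeLength);
        return out;
    }

    // OS2IP-style: always read bytes as a non-negative integer.
    static BigInteger fromBytes(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    static boolean roundTrip(int modulusBits) {
        KeyPair kp;
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(modulusBits);
            kp = generator.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        RSAPublicKey pub = (RSAPublicKey) kp.getPublic();
        RSAPrivateKey priv = (RSAPrivateKey) kp.getPrivate();
        BigInteger n = pub.getModulus();
        int k = (n.bitLength() + 7) / 8;              // modulus length in bytes
        byte[] message = new byte[k - 1];             // one byte shorter than n, so x < n
        new SecureRandom().nextBytes(message);
        BigInteger x = fromBytes(message);
        // The ciphertext always goes over the wire as exactly k bytes.
        byte[] wire = toFixedLength(x.modPow(pub.getPublicExponent(), n), k);
        BigInteger decrypted = fromBytes(wire).modPow(priv.getPrivateExponent(), n);
        return Arrays.equals(toFixedLength(decrypted, k - 1), message);
    }
}
```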
We have an app in which a Python module writes data to Redis shards and a Java module reads data from them, so I need to implement exactly the same consistent hashing algorithm in Java and Python to make sure the data can be found.
I googled around and tried several implementations, but found that the Java and Python implementations always differ and can't be used together. I need your help.
Edit: online implementations I have tried:
Java: http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
Python: http://techspot.zzzeek.org/2012/07/07/the-absolutely-simplest-consistent-hashing-example/
http://amix.dk/blog/post/19367
Edit: attached are the Java (using the Google Guava library) and Python code I wrote, based on the articles above.
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;
import com.google.common.hash.HashFunction;
public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas;
    private final SortedMap<Long, T> circle = new TreeMap<Long, T>();
    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
            Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hashString(node.toString() + i).asLong(),
                    node);
        }
    }
    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hashString(node.toString() + i).asLong());
        }
    }
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        long hash = hashFunction.hashString(key.toString()).asLong();
        if (!circle.containsKey(hash)) {
            SortedMap<Long, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }
}
Test code:
ArrayList<String> al = new ArrayList<String>();
al.add("redis1");
al.add("redis2");
al.add("redis3");
al.add("redis4");
String[] userIds =
    {"-84942321036308",
     "-76029520310209",
     "-68343931116147",
     "-54921760962352"
    };
HashFunction hf = Hashing.md5();
ConsistentHash<String> consistentHash = new ConsistentHash<String>(hf, 100, al);
for (String userId : userIds) {
    System.out.println(consistentHash.get(userId));
}
Python code:
import bisect
import md5
class ConsistentHashRing(object):
    """Implement a consistent hashing ring."""
    def __init__(self, replicas=100):
        """Create a new ConsistentHashRing.
        :param replicas: number of replicas.
        """
        self.replicas = replicas
        self._keys = []
        self._nodes = {}
    def _hash(self, key):
        """Given a string key, return a hash value."""
        return long(md5.md5(key).hexdigest(), 16)
    def _repl_iterator(self, nodename):
        """Given a node name, return an iterable of replica hashes."""
        return (self._hash("%s%s" % (nodename, i))
                for i in xrange(self.replicas))
    def __setitem__(self, nodename, node):
        """Add a node, given its name.
        The given nodename is hashed
        among the number of replicas.
        """
        for hash_ in self._repl_iterator(nodename):
            if hash_ in self._nodes:
                raise ValueError("Node name %r is "
                                 "already present" % nodename)
            self._nodes[hash_] = node
            bisect.insort(self._keys, hash_)
    def __delitem__(self, nodename):
        """Remove a node, given its name."""
        for hash_ in self._repl_iterator(nodename):
            # will raise KeyError for nonexistent node name
            del self._nodes[hash_]
            index = bisect.bisect_left(self._keys, hash_)
            del self._keys[index]
    def __getitem__(self, key):
        """Return a node, given a key.
        The node replica with a hash value nearest
        but not less than that of the given
        name is returned. If the hash of the
        given name is greater than the greatest
        hash, returns the lowest hashed node.
        """
        hash_ = self._hash(key)
        start = bisect.bisect(self._keys, hash_)
        if start == len(self._keys):
            start = 0
        return self._nodes[self._keys[start]]
Test code:
# assumes the class above is saved in ConsistentHashRing.py
from ConsistentHashRing import ConsistentHashRing
if __name__ == '__main__':
    server_infos = ["redis1", "redis2", "redis3", "redis4"]
    hash_ring = ConsistentHashRing()
    test_keys = ["-84942321036308",
                 "-76029520310209",
                 "-68343931116147",
                 "-54921760962352",
                 "-53401599829545"
                 ]
    for server in server_infos:
        hash_ring[server] = server
    for key in test_keys:
        print str(hash_ring[key])
You seem to be running into two issues simultaneously: encoding issues and representation issues.
Encoding issues come about particularly since you appear to be using Python 2 - Python 2's str type is not at all like Java's String type, and is actually more like a Java array of byte. But Java's String.getBytes() isn't guaranteed to give you a byte array with the same contents as a Python str (they probably use compatible encodings, but aren't guaranteed to - even if this fix doesn't change things, it's a good idea in general to avoid problems in the future).
So, the way around this is to use a Python type that behaves like Java's String, and convert the corresponding objects from both languages to bytes specifying the same encoding. From the Python side, this means you want to use the unicode type, which is the default string literal type if you are using Python 3, or put this near the top of your .py file:
from __future__ import unicode_literals
If neither of those is an option, specify your string literals this way:
u'text'
The u at the front forces it to unicode. This can then be converted to bytes using its encode method, which takes (unsurprisingly) an encoding:
u'text'.encode('utf-8')
From the Java side, there is an overloaded version of String.getBytes that takes an encoding, either as a java.nio.charset.Charset or as a charset name, so you'll want to do:
"text".getBytes(java.nio.charset.Charset.forName("UTF-8"))
These will give you equivalent sequences of bytes in both languages, so that the hashes have the same input and will give you the same answer.
The other issue you may have is representation, depending on which hash function you use. Python's hashlib (which is the preferred implementation of md5 and other cryptographic hashes since Python 2.5) is exactly compatible with Java's MessageDigest in this - they both give bytes, so their output should be equivalent.
Python's zlib.crc32 and Java's java.util.zip.CRC32, on the other hand, both give numeric results, but Java's getValue() always returns the unsigned 32-bit CRC in a 64-bit long, while Python's (in Python 2) is a signed 32-bit number (in Python 3, it's an unsigned 32-bit number, so this problem goes away). To convert a signed result to an unsigned one, do result & 0xffffffff, and the result should be comparable to the Java one.
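A sketch of a Java lookup that follows the Python ConsistentHashRing above, assuming Java 8+ and ASCII node names and keys (so their UTF-8 bytes equal the bytes of a Python 2 str). The class name is hypothetical. It hashes UTF-8 bytes with MessageDigest, uses the whole 128-bit MD5 digest as an unsigned integer like long(md5.md5(key).hexdigest(), 16), and picks the first ring entry strictly greater than the key's hash, like bisect.bisect, wrapping around to the lowest entry.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Java counterpart of the Python ConsistentHashRing above.
final class PythonCompatibleRing {
    private final int replicas;
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    PythonCompatibleRing(int replicas) {
        this.replicas = replicas;
    }

    // Same value as Python's long(md5(key).hexdigest(), 16): the full digest, unsigned, big-endian.
    static BigInteger md5AsUnsigned(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    void add(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(md5AsUnsigned(node + i), node);  // like "%s%s" % (nodename, i)
        }
    }

    // bisect.bisect finds the first hash strictly greater than the key's; wrap to the start.
    String get(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<BigInteger, String> entry = ring.higherEntry(md5AsUnsigned(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Unlike the Python version, add() silently overwrites a colliding replica hash instead of raising ValueError. By contrast, the Guava code in the question uses hashString(...).asLong(), which keeps only the first 8 bytes of the digest (as a little-endian long), so its ring positions cannot line up with the 128-bit values used on the Python side.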
According to this analysis of hash functions:
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Python has zlib.crc32 and Java has a CRC32 class. Since it's a standard algorithm, you should get the same result in both languages.
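For the CRC32 route, a minimal sketch (Java 8+, hypothetical class name) that hashes the UTF-8 bytes of a key. getValue() returns the unsigned 32-bit CRC in a long, which matches Python 3's zlib.crc32(data) and Python 2's zlib.crc32(data) & 0xffffffff for the same bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper class.
final class Crc32Compat {

    // Unsigned 32-bit CRC of the key's UTF-8 bytes, returned in a long.
    static long crc32(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```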
MurmurHash 3 is available in Google Guava (a very useful Java library) and in pyfasthash for Python.
Note that these aren't cryptographic hash functions, so they're fast but don't provide the same guarantees. If these hashes are important for security, use a cryptographic hash.
Different language implementations of a hashing algorithm do not produce different hash values: the SHA-1 hash of the same input bytes will be the same whether it is generated in Java or Python.
I'm not familiar with Redis, but the Python example appears to be hashing keys, so I'm assuming we're talking about some sort of HashMap implementation.
Your python example appears to be using MD5 hashes, which will be the same in both Java and Python.
Here is an example of MD5 hashing in Java:
http://www.dzone.com/snippets/get-md5-hash-few-lines-java
And in Python:
http://docs.python.org/library/md5.html
Now, you may want to find a faster hashing algorithm. MD5 is focused on cryptographic security, which isn't really needed in this case.
Here is a simple hashing function that produces the same result on both python and java for your keys:
Python
def hash(key):
    h = 0
    for c in key:
        h = ((h*37) + ord(c)) & 0xFFFFFFFF
    return h
Java
public static long hash(String key) {
    long h = 0;
    for (char c : key.toCharArray())
        h = (h * 37 + c) & 0xFFFFFFFFL;  // keep an unsigned 32-bit value, like the Python version
    return h;
}
You don't need a cryptographically secure hash for this. That's just overkill.
Let's get this straight: the same binary input to the same hash function (SHA-1, MD5, ...) in different environments/implementations (Python, Java, ...) will yield the same binary output. That's because these hash functions are implemented according to standards.
Hence, you will discover the sources of the problem(s) you experience when answering these questions:
do you provide the same binary input to both hash functions (e.g. MD5 in Python and Java)?
do you interpret the binary output of both hash functions (e.g. MD5 in Python and Java) equivalently?
@lvc's answer provides much more detail on these questions.
For the Java version, I would recommend using MD5, which produces a 128-bit result that can then be converted into a BigInteger (int and long are too small to hold 128 bits).
Sample code here:
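private static class HashFunc {
    static MessageDigest md5;
    static {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new ExceptionInInitializerError(e);
        }
    }
    // static synchronized: the shared MessageDigest instance is not thread-safe.
    public static synchronized int hash(String s) {
        md5.update(StandardCharsets.UTF_8.encode(s));
        // digest() also resets md5 for the next call.
        return new BigInteger(1, md5.digest()).intValue();
    }
}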
Note that:
The java.math.BigInteger.intValue() converts this BigInteger to an int. This conversion is analogous to a narrowing primitive conversion from long to int. If this BigInteger is too big to fit in an int, only the low-order 32 bits are returned. This conversion can lose information about the overall magnitude of the BigInteger value as well as return a result with the opposite sign.