MD5 for my Java hashCode, which is a 32-bit int - java

I am looking to implement consistent hashing, to which I can supply a good HashFunction. A decent implementation using a SortedMap is explained here: https://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
As suggested in that post, I would like to use a cryptographic function like MD5, which has good randomization. I understand MD5 produces a 128-bit output, but I need a well-randomized 32 bits. Would the following approaches give a good distribution?
(1) Would the first 4 bytes of the MD5 output be random enough? In that case I could just take the first 32 bits of the 128-bit MD5 hash:
class MD5Hashing implements HashFunction {
    @Override
    public int getHash(String key) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] byteArray = digest.digest(key.getBytes("UTF-8"));
        ByteBuffer buffer = ByteBuffer.wrap(byteArray);
        return buffer.getInt() & 0x7fffffff; // clear the sign bit to keep the result non-negative
    }
}
(2) What if I just use the String's internal Horner's-method hashCode, which applies 31x + y over all characters of the String?
class StringHashing implements HashFunction {
    @Override
    public int getHash(String key) throws Exception {
        return key.hashCode() & 0x7fffffff;
    }
}
(3) My internal consistent-hashing structure, like in the link above, is just a TreeMap. Should I be using a BigInteger key instead, so I can keep all 128 bits from MD5 or another crypto algorithm?
private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();
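One way (3) could look - a sketch with hypothetical class names, keying the ring by a non-negative BigInteger built from the full MD5 digest:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

class Md5BigIntHashing {
    // Full 128-bit MD5 digest as a non-negative BigInteger.
    static BigInteger getHash(String key) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] bytes = digest.digest(key.getBytes("UTF-8"));
            // signum = 1 keeps the value non-negative even when the top bit is set
            return new BigInteger(1, bytes);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

class Md5Ring<T> {
    // BigInteger keys preserve all 128 bits of the digest on the ring.
    private final SortedMap<BigInteger, T> circle = new TreeMap<BigInteger, T>();
}
```

BigInteger implements Comparable, so the TreeMap-based ring logic works unchanged.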
EDIT:
Looks like both are bad; I even tried taking the last 4 bytes of the MD5 hash with buffer.getInt(12).
Running 5000 random strings, the following was the distribution:
{host4.a.b.com=1599, host3.a.b.com=1075, host2.a.b.com=238, host1.a.b.com=2088}

I found Murmur hash, which has an API to convert a String input to a 32-bit hash output. It gave me a nicer distribution as well.
{host4.a.b.com=1665, host3.a.b.com=1373, host2.a.b.com=648, host1.a.b.com=1314}
http://d3s.mff.cuni.cz/~holub/sw/javamurmurhash/
public static int hash32( final String text) {...}


Ed25519 in JDK 15, Parse public key from byte array and verify

Since Ed25519 has not been around for long (in JDK), there are very few resources on how to use it.
While their example is very neat and useful, I have some trouble understanding what I am doing wrong regarding key parsing.
The public key is read from a packet sent by an iDevice.
(Let's just say, it's an array of bytes)
After searching and trying my best to understand how the keys are encoded, I stumbled upon this message:
4. The public key A is the encoding of the point [s]B. First,
encode the y-coordinate (in the range 0 <= y < p) as a little-
endian string of 32 octets. The most significant bit of the
final octet is always zero. To form the encoding of the point
[s]B, copy the least significant bit of the x coordinate to the
most significant bit of the final octet. The result is the
public key.
That means that if I want to get y and isXOdd I have to do some work.
(If I understood correctly)
Below is the code for it, yet the verifying still fails.
I think I did it correctly by reversing the array to get it back into big-endian for BigInteger to use.
My questions are:
Is this the correct way to parse the public key from a byte array?
If it is, what could possibly be the reason for it to fail the verification process?
// devicePublicKey: ByteArray
val lastIndex = devicePublicKey.lastIndex
val lastByte = devicePublicKey[lastIndex]
val lastByteAsInt = lastByte.toInt()
val isXOdd = lastByteAsInt.and(255).shr(7) == 1
devicePublicKey[lastIndex] = (lastByteAsInt and 127).toByte()
val y = devicePublicKey.reversedArray().asBigInteger
val keyFactory = KeyFactory.getInstance("Ed25519")
val nameSpec = NamedParameterSpec.ED25519
val point = EdECPoint(isXOdd, y)
val keySpec = EdECPublicKeySpec(nameSpec, point)
val key = keyFactory.generatePublic(keySpec)
Signature.getInstance("Ed25519").apply {
initVerify(key)
update(deviceInfo)
println(verify(deviceSignature))
}
And the data (before manipulation) (all in HEX):
Device identifier: 34444432393531392d463432322d343237442d414436302d444644393737354244443533
Device public key: e0a611c84db0ae91abfe2e6db91b6a457a4b41f9d8e09afdc7207ce3e4942e94
Device signature: a0383afb3bcbd43d08b04274a9214036f16195dc890c07a81aa06e964668955b29c5026d73d8ddefb12160529eeb66f843be4a925b804b575e6a259871259907
Device info: a86a71d42874b36e81a0acc65df0f2a84551b263b80b61d2f70929cd737176a434444432393531392d463432322d343237442d414436302d444644393737354244443533e0a611c84db0ae91abfe2e6db91b6a457a4b41f9d8e09afdc7207ce3e4942e94
// Device info is simply concatenated [hkdf, identifier, public key]
And the public key after the manipulation:
e0a611c84db0ae91abfe2e6db91b6a457a4b41f9d8e09afdc7207ce3e4942e14
Thank you very much, and every bit of help is greatly appreciated.
This will help many others who stumble upon this problem later, when the Ed25519 implementation is no longer so fresh.
Helped me a lot; I would never have figured it out without your example.
I did it in Java.
public static PublicKey getPublicKey(byte[] pk)
        throws NoSuchAlgorithmException, InvalidKeySpecException, InvalidParameterSpecException {
    // key is already converted from hex string to a byte array.
    KeyFactory kf = KeyFactory.getInstance("Ed25519");
    // determine if x is odd: the most significant bit of the last byte carries it.
    boolean xisodd = false;
    int lastbyteInt = pk[pk.length - 1];
    if ((lastbyteInt & 255) >> 7 == 1) {
        xisodd = true;
    }
    // make sure the most significant bit will be 0 - after reversing.
    pk[pk.length - 1] &= 127;
    // the key is little-endian; BigInteger expects big-endian, so reverse.
    pk = ReverseBytes(pk);
    BigInteger y = new BigInteger(1, pk);
    NamedParameterSpec paramSpec = new NamedParameterSpec("Ed25519");
    EdECPoint ep = new EdECPoint(xisodd, y);
    EdECPublicKeySpec pubSpec = new EdECPublicKeySpec(paramSpec, ep);
    PublicKey pub = kf.generatePublic(pubSpec);
    return pub;
}
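The ReverseBytes helper is called above but not shown; a plausible implementation (assumed, matching the call site's name) is:

```java
class ByteOrderUtil {
    // Return a new array with the bytes in reverse order,
    // converting little-endian key material to big-endian for BigInteger.
    static byte[] ReverseBytes(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return out;
    }
}
```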
Actually, the whole encoding and decoding is correct.
The one thing that turned out to be the problem was that I had (by mistake) reversed the array I read one time too many.
The arrays are reversed because certain keys are encoded in little-endian, while to represent one as a BigInteger on the JVM you have to reverse the little-endian bytes so they become big-endian.
Hopefully this helps everyone in the future who will get stuck on any similar problems.
If there are any questions, simply comment here or send me a message.
I'll do my best to help you out.
I think the code here is way more than you probably need. I investigated this and came up with what I think is equivalent, only much simpler. Anyhow, here is the blog piece: https://www.tbray.org/ongoing/When/202x/2021/04/19/PKI-Detective and here is the Java code: https://github.com/timbray/blueskidjava
You can check how it is done in the OpenJDK implementation:
https://github.com/openjdk/jdk15/blob/master/src/jdk.crypto.ec/share/classes/sun/security/ec/ed/EdDSAPublicKeyImpl.java#L65
Basically encodedPoint is your byte array (just the plain bytes, without ASN.1 encoding).

Java Mac HMAC vs C++ OpenSSL hmac

This is going to be a long question, but I have a really weird bug. I use OpenSSL in C++ to compute an HMAC and compare it to a similar implementation using javax.crypto.Mac. For some keys the HMAC calculation is correct, and for others there is a difference in the HMAC. I believe the problem occurs when the keys get too big. Here are the details.
Here is the most important code for C++:
void computeHMAC(std::string message, std::string key){
    unsigned int digestLength = 20;
    HMAC_CTX hmac_ctx_;
    BIGNUM* key_ = BN_new();
    BN_hex2bn(&key_, key.c_str());
    unsigned char convertedKey[BN_num_bytes(key_)];
    HMAC_CTX_init(&hmac_ctx_);
    int length = BN_bn2bin(key_, convertedKey);
    HMAC_Init_ex(&hmac_ctx_, convertedKey, length, EVP_sha1(), NULL);
    /* Calc HMAC */
    std::transform(message.begin(), message.end(), message.begin(), ::tolower);
    unsigned char digest[digestLength];
    HMAC_Update(&hmac_ctx_, reinterpret_cast<const unsigned char*>(message.c_str()),
                message.length());
    HMAC_Final(&hmac_ctx_, digest, &digestLength);
    char mdString[41]; // 40 hex chars + terminating NUL written by sprintf
    for(unsigned int i = 0; i < 20; ++i){
        sprintf(&mdString[i*2], "%02x", (unsigned int)digest[i]);
    }
    std::cout << "\n\nMSG:\n" << message << "\nKEY:\n" + std::string(BN_bn2hex(key_)) + "\nHMAC\n" + std::string(mdString) + "\n\n";
}
The java test looks like this:
public String calculateKey(String msg, String key) throws Exception {
    Mac HMAC = Mac.getInstance("HmacSHA1");
    BigInteger k = new BigInteger(key, 16);
    HMAC.init(new SecretKeySpec(k.toByteArray(), "HmacSHA1"));
    msg = msg.toLowerCase();
    HMAC.update(msg.getBytes());
    byte[] digest = HMAC.doFinal();
    System.out.println("Key:\n" + k.toString(16) + "\n");
    System.out.println("HMAC:\n" + DatatypeConverter.printHexBinary(digest).toLowerCase() + "\n");
    return DatatypeConverter.printHexBinary(digest).toLowerCase();
}
Some test runs with different keys (all strings are interpreted as hex):
Key 1:
736A66B29072C49AB6DC93BB2BA41A53E169D14621872B0345F01EBBF117FCE48EEEA2409CFC1BD92B0428BA0A34092E3117BEB4A8A14F03391C661994863DAC1A75ED437C1394DA0741B16740D018CA243A800DA25311FDFB9CA4361743E8511E220B79C2A3483FCC29C7A54F1EB804481B2DC87E54A3A7D8A94253A60AC77FA4584A525EDC42BF82AE2A1FD6E3746F626E0AFB211F6984367B34C954B0E08E3F612590EFB8396ECD9AE77F15D5222A6DB106E8325C3ABEA54BB59E060F9EA0
Msg:
test
Hmac OpenSSL:
b37f79df52afdbbc4282d3146f9fe7a254dd23b3
Hmac Java Mac:
b37f79df52afdbbc4282d3146f9fe7a254dd23b3
Key 2: 636A66B29072C49AB6DC93BB2BA41A53E169D14621872B0345F01EBBF117FCE48EEEA2409CFC1BD92B0428BA0A34092E3117BEB4A8A14F03391C661994863DAC1A75ED437C1394DA0741B16740D018CA243A800DA25311FDFB9CA4361743E8511E220B79C2A3483FCC29C7A54F1EB804481B2DC87E54A3A7D8A94253A60AC77FA4584A525EDC42BF82AE2A1FD6E3746F626E0AFB211F6984367B34C954B0E08E3F612590EFB8396ECD9AE77F15D5222A6DB106E8325C3ABEA54BB59E060F9EA0
Msg:
test
Hmac OpenSSL:
bac64a905fa6ae3f7bf5131be06ca037b3b498d7
Hmac Java Mac:
bac64a905fa6ae3f7bf5131be06ca037b3b498d7
Key 3: 836A66B29072C49AB6DC93BB2BA41A53E169D14621872B0345F01EBBF117FCE48EEEA2409CFC1BD92B0428BA0A34092E3117BEB4A8A14F03391C661994863DAC1A75ED437C1394DA0741B16740D018CA243A800DA25311FDFB9CA4361743E8511E220B79C2A3483FCC29C7A54F1EB804481B2DC87E54A3A7D8A94253A60AC77FA4584A525EDC42BF82AE2A1FD6E3746F626E0AFB211F6984367B34C954B0E08E3F612590EFB8396ECD9AE77F15D5222A6DB106E8325C3ABEA54BB59E060F9EA0
Msg:
test
Hmac OpenSSL:
c189c637317b67cee04361e78c3ef576c3530aa7
Hmac Java Mac:
472d734762c264bea19b043094ad0416d1b2cd9c
As the data shows, when the key gets too big, an error occurs. I have no idea which implementation is faulty. I have also tried bigger and smaller keys, but haven't determined the exact threshold. Can anyone spot the problem? Can anyone tell me which HMAC is incorrect in the last case, by doing a simulation using different software, or tell me which third implementation I could use to check mine?
Kind regards,
Roel Storms
When you convert a hexadecimal string to a BigInteger in Java, it assumes the number is positive (unless the string includes a - sign).
But the internal representation is two's complement, meaning that the leftmost bit is used for the sign.
If you are converting a value that starts with a hex octet between 00 and 7F inclusive, that's not a problem: the leftmost bit is zero, so the number is considered positive and the byte converts directly.
But if you are converting a value that starts with 80 through FF, the leftmost bit is 1, which would be considered negative. To avoid this, and keep the BigInteger value exactly as supplied, it adds an extra zero byte at the beginning.
So, internally, the conversion of a number such as 7ABCDE is the byte array
0x7a 0xbc 0xde
But the conversion of a number such as FABCDE (only the first byte is different!), is:
0x00 0xfa 0xbc 0xde
This means that for keys that begin with a byte in the range 80-FF, the BigInteger.toByteArray() is not producing the same array that your C++ program produced, but an array one byte longer.
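That growth is easy to check directly; a small sketch:

```java
import java.math.BigInteger;

class SignByteDemo {
    // toByteArray() yields the minimal two's-complement representation,
    // which grows by one byte when the top bit of the value is set.
    static int encodedLength(String hex) {
        return new BigInteger(hex, 16).toByteArray().length;
    }
}
```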
There are several ways to work around this - like using your own hex-to-byte-array parser or finding an existing one in some library. If you want to use the one produced by BigInteger, you could do something like this:
BigInteger k = new BigInteger(key, 16);
byte[] kByteArr = k.toByteArray();
if ( kByteArr.length > (key.length() + 1) / 2 ) {
kByteArr = Arrays.copyOfRange(kByteArr,1,kByteArr.length);
}
Now you can use the kByteArr to perform the operation properly.
Another issue you should watch out for is keys whose length is odd. In general, you shouldn't have a hex octet string that has an odd length. A string like F8ACB is actually 0F8ACB (which is not going to cause an extra byte in BigInteger) and should be interpreted as such. This is why I wrote (key.length() + 1) in my formula - if key is odd-length, it should be interpreted as a one octet longer. This is also important to watch out for if you write your own hex-to-byte-array converter - if the length is odd, you should add a zero at the beginning before you start converting.
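Putting the two caveats together - no sign byte, odd length padded with a leading zero - a sketch of such a hex-to-byte-array parser (helper name is mine):

```java
class HexParser {
    // Parse a hex octet string into bytes; an odd-length string is
    // treated as if it had a leading zero (F8ACB -> 0F 8A CB).
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            hex = "0" + hex;
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Unlike BigInteger.toByteArray(), this never inserts an extra sign byte, so the array length always matches the octet count of the key.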

Md5 hash namespace partition

I intend to partition the md5 hash space (2^128) into four quadrants. I'm using the BigInteger Java class to represent the md5 hash. Below is the code I tried to partition it with, but it does not seem to work:
private static final BigInteger chunkSize_ = BigInteger.valueOf(2).pow(125);
...
byte[] hashValue = new byte[hash.remaining()]; // hash is md5 bytebuffer
hash.get(hashValue);
BigInteger hashNumber = new BigInteger(hashValue);
String chunk = hashNumber.divide(chunkSize_).toString();
Collecting stats, I see chunk values as high as 34. The idea was to divide 2^128 into 4 parts. Please let me know what is wrong with the above code snippet.
Thanks
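For reference, one way to get exactly four quadrants (a sketch of my own, not a fix of the exact code above): build the BigInteger with the sign-magnitude constructor so it cannot come out negative, then keep only the top two of the 128 bits:

```java
import java.math.BigInteger;

class Md5Quadrant {
    // Map a 16-byte MD5 digest to a quadrant 0..3 of the 2^128 space.
    static int quadrant(byte[] md5Digest) {
        // signum = 1 avoids negative values when the digest's top bit is set
        BigInteger hashNumber = new BigInteger(1, md5Digest);
        // 128 - 2 = 126: only the two most significant bits remain
        return hashNumber.shiftRight(126).intValue();
    }
}
```

The original snippet can exceed 3 for two reasons: new BigInteger(byte[]) interprets the digest as signed, and dividing by 2^125 yields 8 chunks rather than 4 even for positive values.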

Converting a BigInteger into a Key

How do I cast a BigInteger into a Key for the Java cryptography library?
I am trying to use a shared Diffie-Hellman key that I generated myself as the key value for AES encryption.
Below is the code that I used:
BigInteger bi;
long value = 1000000000;
bi = BigInteger.valueOf(value);
Key key = new Key(bi);
However, it did not work.
How do I convert a BigInteger value into a Key value?
Thanks in advance!
First, you cannot cast it. There is no relationship between the BigInteger class and the Key interface.
Second, Key is an interface not a class, so you can't create instances of it. What you need to create is an instance of some class that implements Key. And it most likely needs to be a specific implementation class, not (say) an anonymous class.
The final thing is that the Java crypto APIs are designed to hide the representation of the key. To create a key from bytes, you need to create a KeySpec object; e.g. SecretKeySpec(byte[] key, String algorithm). For symmetric algorithms, SecretKeySpec itself implements Key, so it can be used directly; other KeySpec types are turned into keys by a KeyFactory. Typical KeySpec constructors take a byte[] as a parameter, so you first need to get the byte array from your BigInteger instance.
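For AES specifically, a minimal sketch along these lines (class and method names are mine; as the next answer explains, in a real DH exchange the secret should be padded to the prime size and run through a KDF rather than used raw):

```java
import java.math.BigInteger;
import java.security.Key;
import javax.crypto.spec.SecretKeySpec;

class BigIntegerKeys {
    // Turn a BigInteger into a 128-bit AES key by left-padding
    // its magnitude to 16 bytes. SecretKeySpec implements Key directly.
    static Key toAesKey(BigInteger bi) {
        byte[] magnitude = bi.toByteArray();
        // strip the possible leading sign byte from toByteArray()
        int offset = (magnitude.length > 1 && magnitude[0] == 0) ? 1 : 0;
        int len = magnitude.length - offset;
        if (len > 16) {
            throw new IllegalArgumentException("value too large for a 128-bit key");
        }
        byte[] keyBytes = new byte[16];
        System.arraycopy(magnitude, offset, keyBytes, 16 - len, len);
        return new SecretKeySpec(keyBytes, "AES");
    }
}
```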
You need to convert your BigInteger to a byte array of a specific size, then use the first (leftmost) bytes to create a key. For this you need to know the size of the prime p used in DH, as the value needs to be left-padded to represent a key. I would suggest using standardized DH parameters (or at least making sure that the size of the prime is divisible by 8).
Note that there may be a zero-valued byte at the front of the byte array retrieved using BigInteger.toByteArray(), because the value returned is encoded as a signed (two's-complement) big-endian byte array. You need to remove this byte if it makes the result bigger than the prime (in bytes).
public static byte[] encodeSharedSecret(final BigInteger sharedSecret, final int primeSizeBits) {
// TODO assignment add additional tests on input
final int sharedSecretSize = (primeSizeBits + Byte.SIZE - 1) / Byte.SIZE;
final byte[] signedSharedSecretEncoding = sharedSecret.toByteArray();
final int signedSharedSecretEncodingLength = signedSharedSecretEncoding.length;
if (signedSharedSecretEncodingLength == sharedSecretSize) {
return signedSharedSecretEncoding;
}
if (signedSharedSecretEncodingLength == sharedSecretSize + 1) {
final byte[] sharedSecretEncoding = new byte[sharedSecretSize];
System.arraycopy(signedSharedSecretEncoding, 1, sharedSecretEncoding, 0, sharedSecretSize);
return sharedSecretEncoding;
}
if (signedSharedSecretEncodingLength < sharedSecretSize) {
final byte[] sharedSecretEncoding = new byte[sharedSecretSize];
System.arraycopy(signedSharedSecretEncoding, 0,
sharedSecretEncoding, sharedSecretSize - signedSharedSecretEncodingLength, signedSharedSecretEncodingLength);
return sharedSecretEncoding;
}
throw new IllegalArgumentException("Shared secret is too big");
}
After that you need to derive the key bytes using some kind of key derivation scheme. The one you should use depends on the standard you are implementing:
As stated in RFC 2631
X9.42 provides an algorithm for generating an essentially arbitrary
amount of keying material from ZZ. Our algorithm is derived from that
algorithm by mandating some optional fields and omitting others.
KM = H ( ZZ || OtherInfo)
H is the message digest function SHA-1 [FIPS-180] ZZ is the shared
secret value computed in Section 2.1.1. Leading zeros MUST be
preserved, so that ZZ occupies as many octets as p.
Note that I have discovered a bug in the Bouncy Castle libraries up to 1.49 (that's the current version at this date) in the DH implementation regarding the secret extraction - it does strip the spurious leading 00h valued bytes, but it forgets to left-pad the result up to the prime size p. This will lead to an incorrect derived key once in 192 times (!)

Same consistent-hashing algorithm implementation for Java and Python program

We have an app where a Python module writes data to Redis shards and a Java module reads data from those shards, so I need to implement the exact same consistent hashing algorithm in Java and Python to make sure the data can be found.
I googled around and tried several implementations, but found that the Java and Python implementations are always different and can't be used together. Need your help.
Edit, online implementations I have tried:
Java: http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
Python: http://techspot.zzzeek.org/2012/07/07/the-absolutely-simplest-consistent-hashing-example/
http://amix.dk/blog/post/19367
Edit: attached are the Java (using the Google Guava lib) and Python code I wrote. The code is based on the above articles.
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;
import com.google.common.hash.HashFunction;
public class ConsistentHash<T> {
private final HashFunction hashFunction;
private final int numberOfReplicas;
private final SortedMap<Long, T> circle = new TreeMap<Long, T>();
public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
Collection<T> nodes) {
this.hashFunction = hashFunction;
this.numberOfReplicas = numberOfReplicas;
for (T node : nodes) {
add(node);
}
}
public void add(T node) {
for (int i = 0; i < numberOfReplicas; i++) {
circle.put(hashFunction.hashString(node.toString() + i).asLong(),
node);
}
}
public void remove(T node) {
for (int i = 0; i < numberOfReplicas; i++) {
circle.remove(hashFunction.hashString(node.toString() + i).asLong());
}
}
public T get(Object key) {
if (circle.isEmpty()) {
return null;
}
long hash = hashFunction.hashString(key.toString()).asLong();
if (!circle.containsKey(hash)) {
SortedMap<Long, T> tailMap = circle.tailMap(hash);
hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
}
return circle.get(hash);
}
}
Test code:
ArrayList<String> al = new ArrayList<String>();
al.add("redis1");
al.add("redis2");
al.add("redis3");
al.add("redis4");
String[] userIds =
{"-84942321036308",
"-76029520310209",
"-68343931116147",
"-54921760962352"
};
HashFunction hf = Hashing.md5();
ConsistentHash<String> consistentHash = new ConsistentHash<String>(hf, 100, al);
for (String userId : userIds) {
System.out.println(consistentHash.get(userId));
}
Python code:
import bisect
import md5
class ConsistentHashRing(object):
"""Implement a consistent hashing ring."""
def __init__(self, replicas=100):
"""Create a new ConsistentHashRing.
:param replicas: number of replicas.
"""
self.replicas = replicas
self._keys = []
self._nodes = {}
def _hash(self, key):
"""Given a string key, return a hash value."""
return long(md5.md5(key).hexdigest(), 16)
def _repl_iterator(self, nodename):
"""Given a node name, return an iterable of replica hashes."""
return (self._hash("%s%s" % (nodename, i))
for i in xrange(self.replicas))
def __setitem__(self, nodename, node):
"""Add a node, given its name.
The given nodename is hashed
among the number of replicas.
"""
for hash_ in self._repl_iterator(nodename):
if hash_ in self._nodes:
raise ValueError("Node name %r is "
"already present" % nodename)
self._nodes[hash_] = node
bisect.insort(self._keys, hash_)
def __delitem__(self, nodename):
"""Remove a node, given its name."""
for hash_ in self._repl_iterator(nodename):
# will raise KeyError for nonexistent node name
del self._nodes[hash_]
index = bisect.bisect_left(self._keys, hash_)
del self._keys[index]
def __getitem__(self, key):
"""Return a node, given a key.
The node replica with a hash value nearest
but not less than that of the given
name is returned. If the hash of the
given name is greater than the greatest
hash, returns the lowest hashed node.
"""
hash_ = self._hash(key)
start = bisect.bisect(self._keys, hash_)
if start == len(self._keys):
start = 0
return self._nodes[self._keys[start]]
Test code:
import ConsistentHashRing
if __name__ == '__main__':
server_infos = ["redis1", "redis2", "redis3", "redis4"];
hash_ring = ConsistentHashRing()
test_keys = ["-84942321036308",
"-76029520310209",
"-68343931116147",
"-54921760962352",
"-53401599829545"
];
for server in server_infos:
hash_ring[server] = server
for key in test_keys:
print str(hash_ring[key])
You seem to be running into two issues simultaneously: encoding issues and representation issues.
Encoding issues come about particularly since you appear to be using Python 2 - Python 2's str type is not at all like Java's String type, and is actually more like a Java array of byte. But Java's String.getBytes() isn't guaranteed to give you a byte array with the same contents as a Python str (they probably use compatible encodings, but aren't guaranteed to - even if this fix doesn't change things, it's a good idea in general to avoid problems in the future).
So, the way around this is to use a Python type that behaves like Java's String, and convert the corresponding objects from both languages to bytes specifying the same encoding. From the Python side, this means you want to use the unicode type, which is the default string literal type if you are using Python 3, or put this near the top of your .py file:
from __future__ import unicode_literals
If neither of those is an option, specify your string literals this way:
u'text'
The u at the front forces it to unicode. This can then be converted to bytes using its encode method, which takes (unsurprisingly) an encoding:
u'text'.encode('utf-8')
From the Java side, there is an overloaded version of String.getBytes that takes an encoding - but it takes it as a java.nio.charset.Charset rather than a string - so you'll want to do:
"text".getBytes(java.nio.charset.Charset.forName("UTF-8"))
These will give you equivalent sequences of bytes in both languages, so that the hashes have the same input and will give you the same answer.
The other issue you may have is representation, depending on which hash function you use. Python's hashlib (which is the preferred implementation of md5 and other cryptographic hashes since Python 2.5) is exactly compatible with Java's MessageDigest in this - they both give bytes, so their output should be equivalent.
Python's zlib.crc32 and Java's java.util.zip.CRC32, on the other hand, both give numeric results - but Java's CRC32.getValue() returns a long holding an unsigned 32-bit value, while Python's (in Python 2) is a signed 32-bit number (in Python 3 it's an unsigned 32-bit number, so this problem goes away). To convert a signed result to an unsigned one, do result & 0xffffffff, and the result will be comparable to the Java one.
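A quick sketch of the Java side of that comparison; the value should match Python's zlib.crc32(b'hello') & 0xffffffff:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Demo {
    // CRC32.getValue() already returns the unsigned 32-bit value in a long,
    // matching Python's zlib.crc32(data) & 0xffffffff.
    static long unsignedCrc32(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```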
According to this analysis of hash functions:
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Python has zlib.crc32 and Java has a CRC32 class. Since it's a standard algorithm, you should get the same result in both languages.
MurmurHash 3 is available in Google Guava (a very useful Java library) and in pyfasthash for Python.
Note that these aren't cryptographic hash functions, so they're fast but don't provide the same guarantees. If these hashes are important for security, use a cryptographic hash.
Different language implementations of a hashing algorithm do not make the hash value different. A SHA-1 hash, whether generated in Java or Python, will be the same.
I'm not familiar with Redis, but the Python example appears to be hashing keys, so I'm assuming we're talking about some sort of HashMap implementation.
Your python example appears to be using MD5 hashes, which will be the same in both Java and Python.
Here is an example of MD5 hashing in Java:
http://www.dzone.com/snippets/get-md5-hash-few-lines-java
And in Python:
http://docs.python.org/library/md5.html
Now, you may want to find a faster hashing algorithm. MD5 is focused on cryptographic security, which isn't really needed in this case.
Here is a simple hashing function that produces the same result on both python and java for your keys:
Python
def hash(key):
h = 0
for c in key:
h = ((h*37) + ord(c)) & 0xFFFFFFFF
return h;
Java
public static long hash(String key) {
    long h = 0;
    for (char c : key.toCharArray())
        h = (h * 37 + c) & 0xFFFFFFFFL; // compute in a long: on an int, & 0xFFFFFFFF is a no-op and large hashes go negative
    return h;
}
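Working the loop by hand for "abc" (97, 98, 99) gives 97, then 97*37+98 = 3687, then 3687*37+99 = 136518; a long-based variant of the same hash (so large values stay unsigned, as in Python) can be checked against that:

```java
class PolyHash {
    // Same 37-based polynomial as the Python version; computing in a long
    // keeps the value in the unsigned 32-bit range, as Python's & 0xFFFFFFFF does.
    static long hash(String key) {
        long h = 0;
        for (char c : key.toCharArray()) {
            h = (h * 37 + c) & 0xFFFFFFFFL;
        }
        return h;
    }
}
```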
You don't need a cryptographically secure hash for this. That's just overkill.
Let's get this straight: the same binary input to the same hash function (SHA-1, MD5, ...) in different environments/implementations (Python, Java, ...) will yield the same binary output. That's because these hash functions are implemented according to standards.
Hence, you will discover the sources of the problem(s) you experience when answering these questions:
do you provide the same binary input to both hash functions (e.g. MD5 in Python and Java)?
do you interpret the binary output of both hash functions (e.g. MD5 in Python and Java) equivalently?
@lvc's answer provides much more detail on these questions.
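Concretely, hashing the same UTF-8 bytes gives the same digest in both languages; a Java sketch whose output should match Python's hashlib.md5('abc'.encode('utf-8')).hexdigest():

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class Md5Hex {
    // Hex digest of the UTF-8 bytes of a string - the Java counterpart
    // of Python's hashlib.md5(s.encode('utf-8')).hexdigest().
    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            // pad to 32 hex chars in case of leading zero bytes
            return String.format("%032x", new BigInteger(1, d));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```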
For the Java version, I would recommend using MD5, which generates a 128-bit result; it can then be converted into a BigInteger (Integer and Long are not big enough to hold 128 bits of data).
Sample code here:
private static class HashFunc {
    static MessageDigest md5;
    static {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // every JRE is required to ship MD5
        }
    }
    public synchronized int hash(String s) {
        md5.update(StandardCharsets.UTF_8.encode(s));
        return new BigInteger(1, md5.digest()).intValue();
    }
}
Note that:
The java.math.BigInteger.intValue() converts this BigInteger to an int. This conversion is analogous to a narrowing primitive conversion from long to int. If this BigInteger is too big to fit in an int, only the low-order 32 bits are returned. This conversion can lose information about the overall magnitude of the BigInteger value as well as return a result with the opposite sign.
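The narrowing described above is easy to demonstrate (a sketch):

```java
import java.math.BigInteger;

class NarrowingDemo {
    // intValue() keeps only the low-order 32 bits; a 128-bit digest
    // interpreted this way can come out negative.
    static int low32(String hex) {
        return new BigInteger(hex, 16).intValue();
    }
}
```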
