Murmur3 hash different result between Python and Java implementation

I have two different programs that need to hash the same string using Murmur3, in Python and Java respectively.
Python version 2.7.9:
mmh3.hash128('abc')
Gives 79267961763742113019008347020647561319L.
Java is Guava 18.0:
HashCode hashCode = Hashing.murmur3_128().newHasher().putString("abc", StandardCharsets.UTF_8).hash();
Gives the string "6778ad3f3f3f96b4522dca264174a23b"; converting it to a BigInteger gives 137537073056680613988840834069010096699.
How can I get the same result from both?
Thanks

Here's how to get the same result from both:
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
assertEquals("79267961763742113019008347020647561319",
new BigInteger(mm3_be).toString());
The hash code's bytes need to be treated as little endian, but BigInteger interprets bytes as big endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order they're returned by asBytes() (little endian). (You can also reverse the order of those pairs to get a hex number that does produce the same result when passed to new BigInteger(reversedHex, 16).)
I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.
We have an open issue for adding asBigInteger() to HashCode.
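As a rough illustration of the byte-order reversal using only the JDK (no Guava), something like the following could work. The class and method names are made up for this example, and the hex string is the HashCode.toString() output quoted in the question.

```java
import java.math.BigInteger;

class LittleEndianHash {
    // Parse two hex digits per byte, keeping the bytes in the order they are written.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Interpret the bytes as an unsigned little-endian number.
    static BigInteger fromLittleEndian(byte[] littleEndian) {
        byte[] bigEndian = new byte[littleEndian.length];
        for (int i = 0; i < littleEndian.length; i++) {
            bigEndian[i] = littleEndian[littleEndian.length - 1 - i];
        }
        // signum 1 forces a non-negative result even if the top bit is set
        return new BigInteger(1, bigEndian);
    }

    public static void main(String[] args) {
        byte[] hashBytes = hexToBytes("6778ad3f3f3f96b4522dca264174a23b");
        // Compare the printed value with the output of mmh3.hash128('abc')
        System.out.println(fromLittleEndian(hashBytes));
    }
}
```

Using the BigInteger(int signum, byte[] magnitude) constructor keeps the value non-negative; new BigInteger(byte[]) would read the same bytes as a negative number whenever the first big-endian byte is 0x80 or above.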

If anyone is interested in the reverse direction, converting the Python output to the Java output:
import mmh3
import string
# hash_bytes returns the raw 128-bit hash as bytes
murmur = mmh3.hash_bytes('abc')
# format each byte as two lowercase hex digits
result = [f'{string.hexdigits[(char >> 4) & 0xf]}{string.hexdigits[char & 0xf]}' for char in murmur]
print(''.join(result))

Related

Hash a hexadecimal number with sha-256 in java [duplicate]

The question is about the correct way of creating a hash in Java:
Let's assume I have a positive BigInteger value that I would like to create a hash from. Let's also assume that the messageDigest instance below is a valid SHA-256 instance.
public static final BigInteger B = new BigInteger("BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58", 16);
byte[] byteArrayBBigInt = B.toByteArray();
this.printArray(byteArrayBBigInt);
messageDigest.reset();
messageDigest.update(byteArrayBBigInt);
byte[] outputBBigInt = messageDigest.digest();
I can only assume that the code above is correct, since according to my test the hashes I produce match the ones produced by:
http://www.fileformat.info/tool/hash.htm?hex=BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58
However, I am not sure why we need the step below.
Because the byte array returned by digest() is signed, and in this case negative, I suspect that we need to convert it to a positive number, e.g. with a function like this:
public static String byteArrayToHexString(byte[] b) {
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
}
return result;
}
thus:
String hex = byteArrayToHexString(outputBBigInt);
BigInteger unsignedBigInteger = new BigInteger(hex, 16);
When I construct a BigInteger from the new hex string and convert it back to a byte array, I see that the sign bit (the most significant, leftmost bit) is set to 0, which means the number is positive; moreover, the whole leading byte consists of zeros (00000000).
My question is: is there any RFC that describes why we always need to convert the hash to a "positive" unsigned byte array? Even if the number produced by the digest call is negative, it is still a valid hash, right? So why do we need that additional procedure? Basically, I am looking for a paper, standard or RFC stating that we need to do so.

A hash consists of an octet string (called a byte array in Java). How you convert it to or from a large number (a BigInteger in Java) is completely out of the scope for cryptographic hash algorithms. So no, there is no RFC to describe it as there is (usually) no reason to treat a hash as a number. In that sense a cryptographic hash is rather different from Object.hashCode().
That you can only treat hexadecimals as unsigned is a bit of an issue, but if you really want to, you can first convert the hex string back to a byte array and then call new BigInteger(result). That constructor does treat the encoding within result as signed. Note that in protocols it is often not necessary to convert back and forth to hexadecimals; hexadecimals are mainly for human consumption, while a computer is fine with bytes.
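To make the signed versus unsigned point concrete, here is a small sketch using only the JDK. The class name is invented for the example, and the input "abc" is an arbitrary choice rather than the value from the question.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestAsNumber {
    // Compute SHA-256 over the given bytes; the result is just 32 bytes.
    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hex-encode the digest, zero-padded to two digits per byte.
    static String toHex(byte[] digest) {
        return String.format("%0" + (digest.length * 2) + "x", new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        byte[] digest = sha256("abc".getBytes(StandardCharsets.US_ASCII));
        // Same bytes, two interpretations: two's complement vs. unsigned magnitude
        System.out.println("signed:   " + new BigInteger(digest));
        System.out.println("unsigned: " + new BigInteger(1, digest));
        System.out.println("hex:      " + toHex(digest));
    }
}
```

Both BigInteger values are built from the same 32 bytes; only the interpretation differs, which is why the hash itself is valid either way.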

How to convert RSA encrypted numbers into text/characters

I wrote an RSA encryption in Java. I am trying to turn the numbers that it outputs into text or characters. For example, if I feed it Hello I get:
23805663430659911910
However, online RSA encryptions return something to the effect of this:
GVom5zCerZ+dmOCE7YAp0F+N3L26L
I would just like to know how to convert my numbers into something similar. The number returned by my system is a BigInteger. This is what I've tried so far:
RSA rsa = new RSA("Hello");
BigInteger cypher_number = rsa.encrypt(); // 23805663430659911910
byte[] cypher_bytes = cypher_number.toByteArray(); // [B@368102c8
String cypher_text = new String(cypher_bytes); // J^��*���
// Now even though cypher_text is J^��*��� I wouldn't care as long as I can turn it back.
byte[] plain_bytes = cypher_text.getBytes(); // [B@6996db8 | Not the same as cypher_bytes but let's keep going.
BigInteger plain_number = new BigInteger(plain_bytes); // 28779359581043512470254837759607478877667261
// plain_number has more than doubled in size compared to cypher_number and won't decrypt properly.
Using bytes is the only way I can think of. Can someone please help me understand what I'm supposed to be doing, or whether it's even possible?

This is generally a 2-step process:
convert to binary encoding of the number;
convert the binary encoding to a text-based encoding.
For both steps there are multiple schemes possible.
For binary encoding: the PKCS#1 specifications have always included one that converts the number to a statically sized integer. To be precise, it encodes the number as a statically sized, unsigned, big endian octet string. An octet string is nothing but a byte array.
Now, BigInteger.toByteArray returns a dynamically sized, signed, big endian octet string. So you need to implement the possible resizing and the removal of the initial 00 byte in a separate method, which I have at my other post here. Fortunately, going back to a number is much easier, as the Java implementation provides a BigInteger(int signum, byte[] magnitude) constructor that reads in an unsigned number and skips leading zero bytes.
Having a standardized and statically sized octet string can be terribly useful, so I would not go for any other scheme.
This leaves the conversion to and from text. For that you can (indeed) use the java.util.Base64 class, which doesn't need much explaining. The only note that I must make is that it converts to an ASCII byte[] for some of the methods, so you need to use the encodeToString(byte[] src) instead.
Another method would be hexadecimals, but since Java doesn't contain a hex encoder for byte arrays in the base classes, I'd go for base 64 instead.
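The two steps could be sketched roughly as follows. This is not the code from the linked post: the class and method names are hypothetical, and the size of 16 bytes used in main is an arbitrary stand-in for the modulus size in bytes that a real RSA key would dictate.

```java
import java.math.BigInteger;
import java.util.Base64;

class FixedLengthBase64 {
    // Step 1: non-negative number -> statically sized, unsigned, big endian octet string.
    static byte[] toFixedLength(BigInteger value, int size) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        byte[] signed = value.toByteArray();
        // toByteArray may prepend a 0x00 sign byte; skip it
        int start = (signed.length > 1 && signed[0] == 0) ? 1 : 0;
        int length = signed.length - start;
        if (length > size) {
            throw new IllegalArgumentException("value does not fit in " + size + " bytes");
        }
        byte[] out = new byte[size];
        // left-pad with zeros by copying to the end of the output array
        System.arraycopy(signed, start, out, size - length, length);
        return out;
    }

    // Step 2: octet string -> text.
    static String encode(BigInteger value, int size) {
        return Base64.getEncoder().encodeToString(toFixedLength(value, size));
    }

    // Reverse: text -> octet string -> unsigned number (leading zeros are ignored).
    static BigInteger decode(String text) {
        return new BigInteger(1, Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        BigInteger cipher = new BigInteger("23805663430659911910");
        String text = encode(cipher, 16);
        System.out.println(text);
        System.out.println(decode(text).equals(cipher));
    }
}
```

Because the octet string always has the same length, the Base64 text also has a fixed length, which tends to make storage and parsing simpler.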

I have found the answer. In case you've found this looking for the answer, you just need to encode the numbers into Base64.
The following code converts the number into a dynamically sized, signed, big endian encoded integer, and then converts it back into a number using the reverse process.
// Encode
BigInteger numbers = new BigInteger("5109763");
byte[] bytes = Base64.getEncoder().encode(numbers.toByteArray());
String encoded = new String(bytes); // Encoded value
// Decode
byte[] decoded_bytes = Base64.getDecoder().decode(encoded.getBytes());
BigInteger numbers_again = new BigInteger(decoded_bytes); // Original numbers

TOTP / HOTP / HmacSHA256 with unsigned bytes key in Java

As we can see from the following questions, Java doesn't really play nicely when using a key in a TOTP / HOTP / HmacSHA256 use case:
Java HmacSHA256 with key
Java vs. Golang for HOTP (rfc-4226)
My analysis is that the following cause trouble:
String.getBytes will (of course) give negative byte values for characters with a character value > 127;
javax.crypto.Mac and javax.crypto.spec.SecretKeySpec both externally and internally use byte[] for accepting and transforming the key.
We have acquired a number of Feitian C-200 Single Button OTP devices, and they come with a hexadecimal string secret which consists of byte values > 127.
We have successfully created a PoC in Ruby for these tokens, which works flawlessly. Since we want to integrate these in Keycloak, we need to find a Java solution.
Since every implementation of TOTP / HOTP / HmacSHA256 we have seen makes use of the javax.crypto library and byte[], we fear we have to rewrite all the used classes using int in order to support this scenario.
Q: Is there another way? How can we use secrets in a HmacSHA256 calculation in Java of which the bytes have values > 127 without having to rewrite everything?
Update
I was looking in the wrong direction. My problem was that the key was represented as a String (UTF-16 in Java), which contained Unicode characters that were exploded into two bytes by getBytes() before being passed into the SecretKeySpec.
Forcing StandardCharsets.ISO_8859_1 on this conversion fixes the problem.

Signed vs. unsigned is a presentation issue that's mainly relevant to humans only. The computer doesn't know or care whether 0xFF means -1 or 255 to you. So no, you don't need to use ints, using byte[] works just fine.
This doesn't mean that you can't break things, since some operations work based on default signed variable types. For example:
byte b = (byte)255; // b is -1 or 255, depending on how you interpret it
int i = b; // i is -1 (or 2³² - 1 if read as unsigned), not 255
int u = b & 0xFF; // u is 255
It seems to throw many people off that Java has only signed primitives (boolean and char notwithstanding). However, Java is perfectly capable of performing cryptographic operations, so all these questions where something is "impossible" are just user errors, which is not something you want when writing security-sensitive code.

Don't be afraid of Java :) I've tested dozens of tokens from different vendors, and everything works fine with Java; you just need to pick the correct converter.
A common mistake is to get the bytes from a String with getBytes() instead of using a proper converter. The file from your vendor represents the secret keys in hex format, so just google 'java hex string to byte array' and choose a solution that works for you.
Hex, Base32, Base64 is just a representation and you can easily convert from one to another.
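For illustration, a hex secret could be decoded straight into a byte[] and handed to HmacSHA256 along the following lines. This is only a sketch: the class and method names are invented, and the key and message in main are arbitrary placeholders rather than a real token seed.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class HexKeyHmac {
    // Decode a hex string into raw bytes; values above 0x7f simply become negative bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Plain HMAC-SHA256 over byte arrays; no String conversion of the key anywhere.
    static byte[] hmacSha256(byte[] key, byte[] message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hex-encode bytes, masking with 0xff so negative bytes print as two digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = hexToBytes("A1B2C3D4E5F60718"); // placeholder secret
        byte[] mac = hmacSha256(key, "message".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(mac));
    }
}
```

Since the key never passes through a String, no charset can split bytes above 127 into multi-byte sequences.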

I've run into exactly the same issue (some years later): we got Feitian devices and had to set up their server-side code.
None of the available implementations worked with them (neither PHP nor Java).
Solution: Feitian devices come with seeds in hexadecimal. First you have to decode the seed into raw binary (e.g. in PHP using the hex2bin()). That data is the correct input of the TOTP/HOTP functions.
The hex2bin() version of java is a bit tricky, and its solution is clearly written in the question of the OP.
(long story short: the result of hex2bin you have to interpret with StandardCharsets.ISO_8859_1, otherwise some chars will be interpreted as 2 bytes utf-16 char, which causes different passcode at the end)
String hex = "1234567890ABCDEF"; // original seed from Feitian
String secretKey = new String(hex2bin(hex), StandardCharsets.ISO_8859_1);
Key key = new SecretKeySpec(secretKey.getBytes(StandardCharsets.ISO_8859_1), "RAW");
// or without String representation:
Key key = new SecretKeySpec(hex2bin(hex), "RAW");

What integer format when reading binary data from Java DataOutputStream in PHP?

I'm aware that this is probably not the best idea but I've been playing around trying to read a file in PHP that was encoded using Java's DataOutputStream.
Specifically, in Java I use:
dataOutputStream.writeInt(number);
Then in PHP I read the file using:
$data = fread($handle, 4);
$number = unpack('N', $data);
The strange thing is that the only format character in PHP that gives the correct value is 'N', which is supposed to represent "unsigned long (always 32 bit, big endian byte order)". I thought that int in java was always signed?
Is it possible to reliably read data encoded in Java in this way or not? In this case the integer will only ever need to be positive. It may also need to be quite large so writeShort() is not possible. Otherwise of course I could use XML or JSON or something.

This is fine, as long as you don't need that extra bit. l (instead of N) would work on a big endian machine.
Note, however, that the maximum number that you can store is 2,147,483,647 unless you want to do some math on the Java side to get the proper negative integer to represent the desired unsigned integer.
Note that a signed Java integer uses the two's complement method to represent a negative number, so it's not as easy as flipping a bit.

DataOutputStream.writeInt:
Writes an int to the underlying output stream as four bytes, high byte
first.
The formats available for the unpack function for signed integers all use machine dependent byte order. My guess is that your machine uses a different byte order than Java. If that is true, the DataOutputStream + unpack combination will not work for any signed primitive.
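On the Java side, the byte layout and the "math" for values above Integer.MAX_VALUE could look roughly like this. The class and method names are hypothetical; the sketch only shows that writeInt always emits four big endian bytes, which can be read back as an unsigned 32-bit value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class UnsignedIntStream {
    // Write a value in [0, 2^32 - 1] as four big endian bytes via writeInt.
    static byte[] writeUnsignedInt(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("value out of unsigned 32-bit range");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // the narrowing cast keeps the low 32 bits; large values become negative ints
            out.writeInt((int) value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read four big endian bytes back as an unsigned 32-bit value.
    static long readUnsignedInt(byte[] fourBytes) {
        // ByteBuffer is big endian by default, matching DataOutputStream
        return Integer.toUnsignedLong(ByteBuffer.wrap(fourBytes).getInt());
    }

    public static void main(String[] args) {
        byte[] bytes = writeUnsignedInt(3_000_000_000L);
        for (byte b : bytes) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
        System.out.println(readUnsignedInt(bytes));
    }
}
```

The bytes on disk are identical whether the four bytes are later interpreted as signed or unsigned; the sign only appears when a reader decides how to interpret the top bit.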

Converting US-ASCII encoded byte to integer and back

I have a byte array that can be of size 2, 3 or 4. I need to convert this to the correct integer value. I also need to do this in reverse, i.e. a 2-, 3- or 4-character integer to a byte array.
e.g., the raw bytes are 54 and 49 (decimal values, i.e. 0x36 and 0x31). The decoded US-ASCII string is "61", so the integer answer needs to be 61.
I have read all the conversion questions on Stack Overflow etc. that I could find, but they all give the completely wrong answer. I don't know whether it could be the encoding?
If I do new String(lne,"US-ASCII"), where lne is my byte array, I get the correct 61. But when doing this ((int)lne[0] << 8) | ((int)lne[1] & 0xFF), I get the complete wrong answer.
This may be a silly mistake or I completely don't understand the number representation schemes in Java and the encoding/decoding idea.
Any help would be appreciated.
NOTE: I know I can just parse the String to integer, but I would like to know if there is a way to use fast operations like shifting and binary arithmetic instead?

Here's a thought on how to use fast operations like byte shifting and decimal arithmetic to speed this up. Assuming you have the current code:
byte[] token; // bytes representing a bunch of ascii numbers
int n = Integer.parseInt(new String(token)); // current approach
Then you could instead replace that last line and do the following (assuming no negative numbers, no foreign language characters, etc.):
int n = 0;
for (byte b : token)
n = 10*n + (b-'0');
Out of interest, this resulted in roughly a 28% speedup for me on a massive data set. I think this is due to not having to allocate new String objects and then trash them after each parseInt call.

You need two conversion steps. First, convert your ascii bytes to a string. That's what new String(lne,"us-ascii") does for you. Then, convert the string representation of the number to an actual number. For that you use something like Integer.parseInt(theString) -- remember to handle NumberFormatException.

As you say, new String(lne,"US-ASCII") will give you the correct string. To convert your String to an integer, use int myInt = Integer.parseInt(new String(lne,"US-ASCII"));
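For reference, both directions can be sketched without an intermediate parse step. The shift-based expression from the question combines the raw byte values (54 * 256 + 49 = 13873) instead of the decimal digits they encode, which is why it gives an unexpected number. The class and method names below are made up for the example, and the input is assumed to contain only a few ASCII digits, so overflow is not handled.

```java
import java.nio.charset.StandardCharsets;

class AsciiDigits {
    // ASCII digit bytes -> int, e.g. {54, 49} ('6', '1') -> 61
    static int parse(byte[] digits) {
        if (digits.length == 0) {
            throw new NumberFormatException("empty input");
        }
        int n = 0;
        for (byte b : digits) {
            if (b < '0' || b > '9') {
                throw new NumberFormatException("not an ASCII digit: " + b);
            }
            // shift the previous digits one decimal place and add the new one
            n = 10 * n + (b - '0');
        }
        return n;
    }

    // int -> ASCII digit bytes, e.g. 61 -> {54, 49}
    static byte[] format(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] lne = {54, 49};
        System.out.println(parse(lne));                   // 61
        System.out.println((lne[0] << 8) | (lne[1] & 0xFF)); // 13873
    }
}
```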
