Generate Ethereum addresses in HD Wallet using public key only (bitcoinj/web3j) - java

I'm trying to generate Ethereum addresses for HD wallet keys implemented with the bitcoinj library, but I got confused:
DeterministicSeed seed = new DeterministicSeed("some seed code here", null, "", 1409478661L);
DeterministicKeyChain chain = DeterministicKeyChain.builder().seed(seed).build();
DeterministicKey addrKey = chain.getKeyByPath(HDUtils.parsePath("M/44H/60H/0H/0/0"), true);
System.out.println("address from pub=" + Keys.getAddress(Sign.publicKeyFromPrivate(addrKey.getPrivKey())));
this code prints a correct Ethereum address accordingly to https://iancoleman.io/bip39/. Everything is fine here.
But when I try to avoid using the private key and generate non-hardened keys from public keys only, I get a different result, i.e. the following call returns something else:
System.out.println("address from pub=" + Keys.getAddress(addrKey.getPublicKeyAsHex()));
And it looks like the issue is in the "different public keys", i.e. the results of Sign.publicKeyFromPrivate(addrKey.getPrivKey()) and addrKey.getPublicKeyAsHex() are different.
I'm not experienced with cryptography, so it may be a silly question... but I would appreciate any advice here.

Like Bitcoin, Ethereum uses secp256k1. Ethereum addresses are derived as follows:
Step 1: The 32-byte x and y coordinates of the public key are concatenated into 64 bytes (each coordinate padded with leading 0x00 bytes if necessary).
Step 2: The Keccak-256 hash of these 64 bytes is computed.
Step 3: The last 20 bytes of that hash are used as the Ethereum address.
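These steps can also be performed by hand; the following is a minimal sketch using web3j's Keccak helper, where xBytes and yBytes are assumed to already hold the 32-byte, zero-padded big-endian coordinates:
byte[] xy = new byte[64];
System.arraycopy(xBytes, 0, xy, 0, 32);                        // step 1: x || y
System.arraycopy(yBytes, 0, xy, 32, 32);
byte[] hash = org.web3j.crypto.Hash.sha3(xy);                  // step 2: web3j's sha3 is Keccak-256
byte[] address = java.util.Arrays.copyOfRange(hash, 12, 32);   // step 3: last 20 of the 32 hash bytes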
For the examples used here, the key is generated with:
String mnemonic = "elevator dinosaur switch you armor vote black syrup fork onion nurse illegal trim rocket combine";
DeterministicSeed seed = new DeterministicSeed(mnemonic, null, "", 1409478661L);
DeterministicKeyChain chain = DeterministicKeyChain.builder().seed(seed).build();
DeterministicKey addrKey = chain.getKeyByPath(HDUtils.parsePath("M/44H/60H/0H/0/0"), true);
This corresponds to the following public key and Ethereum address:
X: a35bf0fdf5df296cc3600422c3c8af480edb766ff6231521a517eb822dff52cd
Y: 5440f87f5689c2929542e75e739ff30cd1e8cb0ef0beb77380d02cd7904978ca
Address: 23ad59cc6afff2e508772f69d22b19ffebf579e7
as can also be verified with the website https://iancoleman.io/bip39/.
Step 1:
In the posted question, the expressions Sign.publicKeyFromPrivate() and addrKey.getPublicKeyAsHex() provide different results. The two also return the public key as different types: Sign.publicKeyFromPrivate() returns a BigInteger, while addrKey.getPublicKeyAsHex() returns a hex string. For a direct comparison, the BigInteger can be converted to a hex string with toString(16). When the results of both expressions are displayed with:
System.out.println(Sign.publicKeyFromPrivate(addrKey.getPrivKey()).toString(16));
System.out.println(addrKey.getPublicKeyAsHex());
the following result is obtained:
a35bf0fdf5df296cc3600422c3c8af480edb766ff6231521a517eb822dff52cd5440f87f5689c2929542e75e739ff30cd1e8cb0ef0beb77380d02cd7904978ca
02a35bf0fdf5df296cc3600422c3c8af480edb766ff6231521a517eb822dff52cd
The output of Sign.publicKeyFromPrivate() has a length of 64 bytes and corresponds to the concatenated x and y coordinate as defined in step 1. Therefore, the address generated with this is a valid Ethereum address, as also described in the posted question.
The output of addrKey.getPublicKeyAsHex(), on the other hand, is the x coordinate prefixed with a 0x02 byte. This is the compressed format of the public key: the leading byte is 0x02 if the y value is even (as in this example) and 0x03 if it is odd. Since the compressed format does not contain the y coordinate, it cannot be fed directly into the address derivation; doing so anyway produces a wrong address. (Indirectly it is possible, of course, since the y coordinate can be recovered from a compressed public key.)
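If only the compressed key is available, the full point can be recovered via the secp256k1 curve parameters that bitcoinj exposes (a sketch; the Bouncy Castle types may be packaged differently depending on the bitcoinj version):
byte[] compressed = addrKey.getPubKey();                                   // 33 bytes: 0x02/0x03 || x
org.bouncycastle.math.ec.ECPoint point =
        org.bitcoinj.core.ECKey.CURVE.getCurve().decodePoint(compressed);
byte[] uncompressed = point.getEncoded(false);                             // 65 bytes: 0x04 || x || y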
The uncompressed format of the public key can be obtained, e.g. with addrKey.decompress():
System.out.println(addrKey.decompress().getPublicKeyAsHex());
which gives this result:
04a35bf0fdf5df296cc3600422c3c8af480edb766ff6231521a517eb822dff52cd5440f87f5689c2929542e75e739ff30cd1e8cb0ef0beb77380d02cd7904978ca
The uncompressed format consists of a leading marker byte with the value 0x04 followed by the x and y coordinates. So if the leading marker byte is removed, just the data according to step 1 is obtained, which is needed for the derivation of the Ethereum address:
System.out.println(addrKey.decompress().getPublicKeyAsHex().substring(2));
which results in:
a35bf0fdf5df296cc3600422c3c8af480edb766ff6231521a517eb822dff52cd5440f87f5689c2929542e75e739ff30cd1e8cb0ef0beb77380d02cd7904978ca
Steps 2 and 3:
Steps 2 and 3 are performed by Keys.getAddress(). This allows the Ethereum address to be obtained using the uncompressed public key as follows:
System.out.println(Keys.getAddress(addrKey.decompress().getPublicKeyAsHex().substring(2)));
System.out.println(Keys.getAddress(Sign.publicKeyFromPrivate(addrKey.getPrivKey()))); // For comparison
which gives the Ethereum address:
23ad59cc6afff2e508772f69d22b19ffebf579e7
23ad59cc6afff2e508772f69d22b19ffebf579e7
Overloads of Keys.getAddress():
Keys.getAddress() provides various overloads for the data types BigInteger, hex string and byte[]. If the uncompressed key is given as byte[], e.g. with addrKey.getPubKeyPoint().getEncoded(false), the byte[] can be used directly after removing the marker byte. Alternatively, the byte[] can be converted to a BigInteger with the marker byte removed:
byte[] uncompressed = addrKey.getPubKeyPoint().getEncoded(false);
System.out.println(bytesToHex(Keys.getAddress(Arrays.copyOfRange(uncompressed, 1, uncompressed.length))).toLowerCase()); // bytesToHex() from https://stackoverflow.com/a/9855338
System.out.println(Keys.getAddress(new BigInteger(1, uncompressed, 1, uncompressed.length - 1)));
System.out.println(Keys.getAddress(Sign.publicKeyFromPrivate(addrKey.getPrivKey()))); // For comparison
which as expected returns the same Ethereum address:
23ad59cc6afff2e508772f69d22b19ffebf579e7
23ad59cc6afff2e508772f69d22b19ffebf579e7
23ad59cc6afff2e508772f69d22b19ffebf579e7
One thing to note here is that Keys.getAddress(byte[]) does not pad the passed byte[], while the overloads for BigInteger or hex strings implicitly pad. This can be relevant e.g. when converting a BigInteger (e.g. provided by Sign.publicKeyFromPrivate(addrKey.getPrivKey())) to a byte[], since the result can also have less than 64 bytes (which would lead to different Keccak-256 hashes). If Keys.getAddress(byte[]) is used in this case, it must be explicitly padded with leading 0x00 values up to a length of 64 bytes.
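A sketch of that explicit padding, using web3j's Numeric helper together with the byte[] overload:
BigInteger pubKey = Sign.publicKeyFromPrivate(addrKey.getPrivKey());
byte[] padded = org.web3j.utils.Numeric.toBytesPadded(pubKey, 64);     // pad with leading 0x00 up to 64 bytes
System.out.println(bytesToHex(Keys.getAddress(padded)).toLowerCase());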

Related

Keeping Java String Offsets With Unicode Consistent in Python

We are building a Python 3 program which calls a Java program. The Java program (which is a 3rd party program we cannot modify) is used to tokenize strings (find the words) and provide other annotations. Those annotations are in the form of character offsets.
As an example, we might provide the program with string data such as "lovely weather today". It provides something like the following output:
0,6
7,14
15,20
Where 0,6 are the offsets corresponding to word "lovely", 7,14 are the offsets corresponding to the word "weather" and 15,20 are offsets corresponding to the word "today" within the source string. We read these offsets in Python to extract the text at those points and perform further processing.
All is well and good as long as the characters are within the Basic Multilingual Plane (BMP). However, when they are not, the offsets reported by this Java program show up all wrong on the Python side.
For example, given the string "I feel 🙂 today", the Java program will output:
0,1
2,6
7,9
10,15
On the Python side, these translate to:
0,1 "I"
2,6 "feel"
7,9 "🙂 "
10,15 "oday"
Where the last index is technically invalid. Java sees "🙂" as length 2, which causes all the annotations after that point to be off by one from the Python program's perspective.
Presumably this occurs because Java strings expose their contents as UTF-16 code units, and all string operations (length, indexing) work on those code units. Python 3 strings, on the other hand, operate on the actual Unicode characters (code points). So when a character falls outside the BMP, the Java program sees it as length 2, whereas Python sees it as length 1.
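For illustration, a small Java check (a minimal sketch) shows both counts for the example string:
String s = "I feel \uD83D\uDE42 today";                  // "I feel 🙂 today"
System.out.println(s.length());                          // 15 - UTF-16 code units (what the offsets count)
System.out.println(s.codePointCount(0, s.length()));     // 14 - code points (what Python's len() counts)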
So now the question is: what is the best way to "correct" those offsets before Python uses them, so that the annotation substrings are consistent with what the Java program intended to output?
You could convert the string to a bytearray in UTF-16 encoding, then use the offsets (multiplied by 2, since there are two bytes per UTF-16 code unit) to index that array:
x = "I feel 🙂 today"
y = bytearray(x, "UTF-16LE")
offsets = [(0,1),(2,6),(7,9),(10,15)]
for word in offsets:
    print(str(y[word[0]*2:word[1]*2], 'UTF-16LE'))
Output:
I
feel
🙂
today
Alternatively, you could convert every python character in the string individually to UTF-16 and count the number of code-units it takes. This lets you map the indices in terms of code-units (from Java) to indices in terms of Python characters:
from itertools import accumulate
x = "I feel 🙂 today"
utf16offsets = [(0,1),(2,6),(7,9),(10,15)] # from java program
# map python string indices to an index in terms of utf-16 code units
chrLengths = [len(bytearray(ch, "UTF-16LE"))//2 for ch in x]
utf16indices = [0] + list(accumulate(chrLengths))
# reverse the map so that it maps utf16 indices to python indices
index_map = dict((x,i) for i, x in enumerate(utf16indices))
# convert the offsets from utf16 code-unit indices to python string indices
offsets = [(index_map[o[0]], index_map[o[1]]) for o in utf16offsets]
# now you can just use those indices as normal
for word in offsets:
    print(x[word[0]:word[1]])
Output:
I
feel
🙂
today
The above code is messy and can probably be made clearer, but you get the idea.
This solves the problem given the proper encoding, which, in our situation, appears to be 'UTF-16BE':
def correct_offsets(input, offsets, encoding):
    offset_list = [{'old': o, 'new': [o[0], o[1]]} for o in offsets]
    for idx in range(0, len(input)):
        if len(input[idx].encode(encoding)) > 2:
            for o in offset_list:
                if o['old'][0] > idx:
                    o['new'][0] -= 1
                if o['old'][1] > idx:
                    o['new'][1] -= 1
    return [o['new'] for o in offset_list]
This may be pretty inefficient though. I gladly welcome any performance improvements.

Why does my Jose4j JSON Web Key cause this InvalidKeyException?

I am using Jose4j to perform the encryption of a JSON Web Token in Java.
I create a key as a String in JSON format to pass to the JsonWebKey.Factory.newJwk method, thus:
String jwkJson = "{\"kty\":\"oct\",\"k\":\"5uP3r53cR37k3yPW\"}";
I pass it to the factory and get a JsonWebKey (jwk) back.
Then pass the key (from the jwk.getKey() method) in to the JsonWebEncryption's setKey() method.
I set the AlgorithmHeaderValue and the EncryptionMethodHeaderParameter...
Then, when I call jwe.getCompactSerialization() it throws the following exception
org.jose4j.lang.InvalidKeyException:
Invalid key for JWE A128KW, expected a 128 bit key but a 96 bit key was provided.
I passed in 16 bytes, so why does this evaluate to 96 bits instead of 128?
You need to base64url-encode the key string before adding it to the JSON object jwkJson.
E.g.:
String pass = "5uP3r53cR37k3yPW";
String jwkJson = "{\"kty\":\"oct\",\"k\":\""+ Base64Url.encodeUtf8ByteRepresentation(pass) +"\"}";
In the factory method of JsonWebKey, after it has retrieved the key (k) value from the JSON object, it base64url-decodes it. If you have not encoded the value first, this decoding shrinks it: each base64url character carries only 6 bits, so every 4 characters decode to 3 bytes, and the 16-character string ends up as 12 bytes (96 bits) rather than the 16 bytes (128 bits) that were intended.
The "oct" JWK key type used for symmetric keys base64url encodes the key value for the value of the "k" parameter (see https://www.rfc-editor.org/rfc/rfc7518#section-6.4). While "5uP3r53cR37k3yPW" is 16 characters, it uses the base64url alphabet and decodes to 12 bytes (96 bits) of raw data when processed as the JWK key value. The k value needs to be a bit longer to represent 16 bytes / 128 bits. Something like String jwkJson = "{\"kty\":\"oct\",\"k\":\"5uP3r53cR37k3yPWj_____\"}";, for example, is a 128 bit symmetric JWK that would work with what you are doing. However, encryption keys really should be created using secure random number generation rather than something that looks like a password. FWIW, JsonWebKey jwk = OctJwkGenerator.generateJwk(128); is one maybe convenient way to generate 128 bit symmetric JWK objects.

Hashing raw bytes in Python and Java produces different results

I'm trying to replicate the behavior of a Python 2.7 function in Java, but I'm getting different results when running a (seemingly) identical sequence of bytes through a SHA-256 hash. The bytes are generated by manipulating a very large integer (exactly 2048 bits long) in a specific way (2nd line of my Python code example).
For my examples, the original 2048-bit integer is stored as big_int and bigInt in Python and Java respectively, and both variables contain the same number.
Python2 code I'm trying to replicate:
raw_big_int = ("%x" % big_int).decode("hex")
buff = struct.pack(">i", len(raw_big_int) + 1) + "\x00" + raw_big_int
pprint("Buffer contains: " + buff)
pprint("Encoded: " + buff.encode("hex").upper())
digest = hashlib.sha256(buff).digest()
pprint("Digest contains: " + digest)
pprint("Encoded: " + digest.encode("hex").upper())
Running this code prints the following (note that the only result I'm actually interested in is the last one - the hex-encoded digest. The other 3 prints are just to see what's going on under the hood):
'Buffer contains: \x00\x00\x01\x01\x00\xe3\xbb\xd3\x84\x94P\xff\x9c\'\xd0P\xf2\xf0s,a^\xf0i\xac~\xeb\xb9_\xb0m\xa2&f\x8d~W\xa0\xb3\xcd\xf9\xf0\xa8\xa2\x8f\x85\x02\xd4&\x7f\xfc\xe8\xd0\xf2\xe2y"\xd0\x84ck\xc2\x18\xad\xf6\x81\xb1\xb0q\x19\xabd\x1b>\xc8$g\xd7\xd2g\xe01\xd4r\xa3\x86"+N\\\x8c\n\xb7q\x1c \x0c\xa8\xbcW\x9bt\xb0\xae\xff\xc3\x8aG\x80\xb6\x9a}\xd9*\x9f\x10\x14\x14\xcc\xc0\xb6\xa9\x18*\x01/eC\x0eQ\x1b]\n\xc2\x1f\x9e\xb6\x8d\xbfb\xc7\xce\x0c\xa1\xa3\x82\x98H\x85\xa1\\\xb2\xf1\'\xafmX|\x82\xe7%\x8f\x0eT\xaa\xe4\x04*\x91\xd9\xf4e\xf7\x8c\xd6\xe5\x84\xa8\x01*\x86\x1cx\x8c\xf0d\x9cOs\xebh\xbc1\xd6\'\xb1\xb0\xcfy\xd7(\x8b\xeaIf6\xb4\xb7p\xcdgc\xca\xbb\x94\x01\xb5&\xd7M\xf9\x9co\xf3\x10\x87U\xc3jB3?vv\xc4JY\xc9>\xa3cec\x01\x86\xe9c\x81F-\x1d\x0f\xdd\xbf\xe8\xe9k\xbd\xe7c5'
'Encoded: 0000010100E3BBD3849450FF9C27D050F2F0732C615EF069AC7EEBB95FB06DA226668D7E57A0B3CDF9F0A8A28F8502D4267FFCE8D0F2E27922D084636BC218ADF681B1B07119AB641B3EC82467D7D267E031D472A386222B4E5C8C0AB7711C200CA8BC579B74B0AEFFC38A4780B69A7DD92A9F101414CCC0B6A9182A012F65430E511B5D0AC21F9EB68DBF62C7CE0CA1A382984885A15CB2F127AF6D587C82E7258F0E54AAE4042A91D9F465F78CD6E584A8012A861C788CF0649C4F73EB68BC31D627B1B0CF79D7288BEA496636B4B770CD6763CABB9401B526D74DF99C6FF3108755C36A42333F7676C44A59C93EA36365630186E96381462D1D0FDDBFE8E96BBDE76335'
'Digest contains: Q\xf9\xb9\xaf\xe1\xbey\xdc\xfa\xc4.\xa9 \xfckz\xfeB\xa0>\xb3\xd6\xd0*S\xff\xe1\xe5*\xf0\xa3i'
'Encoded: 51F9B9AFE1BE79DCFAC42EA920FC6B7AFE42A03EB3D6D02A53FFE1E52AF0A369'
Now, below is my Java code so far. When I test it, I get the same value for the input buffer, but a different value for the digest. (bigInt contains a BigInteger object containing the same number as big_int in the Python example above)
byte[] rawBigInt = bigInt.toByteArray();
ByteBuffer buff = ByteBuffer.allocate(rawBigInt.length + 4);
buff.order(ByteOrder.BIG_ENDIAN);
buff.putInt(rawBigInt.length).put(rawBigInt);
System.out.print("Buffer contains: ");
System.out.println( DatatypeConverter.printHexBinary(buff.array()) );
MessageDigest hash = MessageDigest.getInstance("SHA-256");
hash.update(buff);
byte[] digest = hash.digest();
System.out.print("Digest contains: ");
System.out.println( DatatypeConverter.printHexBinary(digest) );
Notice that in my Python example, I started the buffer off with len(raw_big_int) + 1 packed, where in Java I started with just rawBigInt.length. I also omitted the extra 0-byte ("\x00") when writing in Java. I did both of these for the same reason - in my tests, calling toByteArray() on a BigInteger returned a byte array already beginning with a 0-byte that was exactly 1 byte longer than Python's byte sequence. So, at least in my tests, len(raw_big_int) + 1 equaled rawBigInt.length, since rawBigInt began with a 0-byte and raw_big_int did not.
Alright, that aside, here is the Java code's output:
Buffer contains: 0000010100E3BBD3849450FF9C27D050F2F0732C615EF069AC7EEBB95FB06DA226668D7E57A0B3CDF9F0A8A28F8502D4267FFCE8D0F2E27922D084636BC218ADF681B1B07119AB641B3EC82467D7D267E031D472A386222B4E5C8C0AB7711C200CA8BC579B74B0AEFFC38A4780B69A7DD92A9F101414CCC0B6A9182A012F65430E511B5D0AC21F9EB68DBF62C7CE0CA1A382984885A15CB2F127AF6D587C82E7258F0E54AAE4042A91D9F465F78CD6E584A8012A861C788CF0649C4F73EB68BC31D627B1B0CF79D7288BEA496636B4B770CD6763CABB9401B526D74DF99C6FF3108755C36A42333F7676C44A59C93EA36365630186E96381462D1D0FDDBFE8E96BBDE76335
Digest contains: E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
As you can see, the buffer contents appear the same in both Python and Java, but the digests are obviously different. Can someone point out where I'm going wrong?
I suspect it has something to do with the strange way Python seems to store bytes - the variables raw_big_int and buff show as type str in the interpreter, and when printed out by themselves have that strange format with the '\x's that is almost the same as the bytes themselves in some places, but is utter gibberish in others. I don't have enough Python experience to understand exactly what's going on here, and my searches have turned up fruitless.
Also, since I'm trying to port the Python code into Java, I can't just change the Python - my goal is to write Java code that takes the same input and produces the same output. I've searched around (this question in particular seemed related) but didn't find anything to help me out. Thanks in advance, if for nothing else than for reading this long-winded question! :)
In Java, you've got the data in the buffer, but the cursor positions are all wrong. After you've written your data to the ByteBuffer it looks like this, where the x's represent your data and the 0's are unwritten bytes in the buffer:
xxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000000000000
                    ^ position                               ^ limit
The cursor is positioned after the data you've written. A read at this point will read from position to limit, which is the bytes you haven't written.
Instead, you want this:
xxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000000000000
^ position          ^ limit
where the position is 0 and the limit is the number of bytes you've written. To get there, call flip(). Flipping a buffer conceptually switches it from write mode to read mode. I say "conceptually" because ByteBuffers don't have explicit read and write modes, but you should think of them as if they do.
(The opposite operation is compact(), which switches the buffer back to write mode.)
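Applied to the code from the question, the smallest fix is one extra call before hashing (a sketch):
buff.flip();                 // switch to read mode: position = 0, limit = bytes written
hash.update(buff);
byte[] digest = hash.digest();
// alternatively, bypass the cursor entirely: hash.update(buff.array());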

unknown packets sent by Flash

I'm learning about Flash (AMF) and Java (BlazeDS) using a project I found on the internet, but I noticed that the server is receiving the data below via a socket:
When I tried to use Amf0Input/Amf3Input to deserialize it back into an object, I get an error saying it does not recognize this type of packet. Does anyone know which library I should use to decode this message?
The packet you got seems to be a length-prefixed AMF3 AmfObject.
In general, whenever you see a string that follows the usual naming convention of fully qualified class names (i.e. like reverse domains), chances are you're dealing with an object instance [1].
Looking at the first few bytes, you see 0x00 repeated three times. If we assume AMF3, this would be 3 undefineds, followed by an object with type marker 0x3e - which does not exist. If we instead assume AMF0, we would first have a number (0x00 type marker, followed by 8 bytes of data), followed by an object with type marker 0x6d - which again does not exist.
Thus, the data you got there can't be AMF payload alone. However, if we interpret the first 4 bytes as network byte order (i.e. big endian) integer, we get 0x3E = 62 - which is exactly the length of the remaining data.
Assuming then that the first 4 bytes are just a length prefix, the next byte must be a type marker. In AMF3, 0x0a indicates an object instance. So let's just try to decode the remaining data (section 3.12 of the AMF3 spec, if you want to follow along [2]): the next byte must indicate the object traits. 0x23 means we have a direct encoding of the traits in that byte - as opposed to a reference to earlier submitted traits.
Since the fourth bit (counted from least significant first) is 0, the object is not dynamic - as in, an instance of some class, not just a plain object instance. The remaining bits, shifted to the right by 4, indicate the number of sealed properties this instance has, which is 2.
Next, we expect the classname, encoded as UTF-8-vr - i.e. length prefixed (when shifted right by 1), UTF-8 encoded string. The next byte is 0x1d, which means the length is 0x1d >> 1 = 14. The next 14 bytes encode common.net.APC, so that's the instance's class name.
Afterwards, we have the two sealed property names, also encoded as UTF-8-vr. The first one has a prefix of 0x15, so a length of 10 - giving us parameters, followed by a prefix of 0x19 (length 12) and payload functionName.
After this, you have the values corresponding to these sealed properties, in the same order. The first one has a type marker of 0x09, which corresponds to an array. The length marker is 0x03, which means the array contains one element, and the next byte is 0x01, indicating we have no associative members. The only element itself has a type marker of 0x04, meaning it's an integer - in this case with value 0.
This is followed by a type marker of 0x06 - a string, with length 14. That string - you probably have guessed it by now - is syncServerTime.
So, in summary, your packet is a length-prefixed instance of common.net.APC, with its parameters attribute set to [0] and its functionName attribute set to "syncServerTime".
[1]: The only other alternatives are a vector of object instances - which would require a type marker of 0x10 somewhere - or an AMF0 packet. In the case of an AMF0 packet, you'd also have to have a URI-style path somewhere in the packet, which is not the case here.
[2]: Note that the EBNF given at the end of the section is not exactly correct - neither syntactically nor semantically ...
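As a rough sketch of how to handle such a packet before AMF decoding (the socket variable and the exact BlazeDS deserializer setup are assumptions, not taken from the question):
java.io.DataInputStream in = new java.io.DataInputStream(socket.getInputStream());
int length = in.readInt();                       // big-endian length prefix, 0x0000003E = 62 here
byte[] amfPayload = new byte[length];
in.readFully(amfPayload);
// amfPayload now starts with the AMF3 object marker 0x0a and can be handed
// to an AMF3 deserializer such as flex.messaging.io.amf.Amf3Input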

How to convert an int to a byte array (which is a local variable) guaranteeing the same result regardless of the endianness of the underlying hardware

I am writing a method which generates a hash from a collection of Objects. And I need to be sure that, given a particular set of inputs, the generated hash will be the same on all machines, as this hash value is used in a verification process in a distributed system.
This verification process involves users generating a hash on their machine and sending it to a central authority; that central authority then regenerates the hash (using the same inputs the user used) and verifies that the hash values match.
The method uses MessageDigest to generate the hash. In this method, we loop through each received object, updating the MessageDigest with the hashcode from each Object. Finally, once all objects have been processed, we return a hash from the MessageDigest.
My concern is the conversion of the int to the byte array. At the moment we are using the class ByteBuffer to perform this conversion. The question is: will all JVMs, regardless of whether they are running on a little-endian or a big-endian hardware, always generate the same byte array? or will the "endianness" of the hardware affect the byte array?
I have looked through the JVM spec, and it mentions big-endian in relation to how class data is stored. But it does not specifically mention local variables. So I am not sure if the endianness of local variables could affect the output of my method which generates the hash.
The class that I am writing looks like:
...
private final MessageDigest md;
...
public byte[] buildHashFromHashcodes(final Object... listOfObjects)
        throws UnsupportedEncodingException {
    byte[] bytes;
    for (Object obj : listOfObjects) {
        bytes = ByteBuffer.allocate(4).putInt(obj.hashCode()).array();
        md.update(bytes);
    }
    return md.digest();
}
Thanks a lot!
The ByteBuffer.order() method allows you to get and to set the byte order used to store multi-byte values such as int or long into the buffer.
The initial byte order of the newly-created ByteBuffer is always big-endian regardless of JVM, OS or hardware.
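A minimal check (a sketch) makes this visible:
java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(4);
System.out.println(buf.order());                                        // BIG_ENDIAN on every platform
System.out.println(java.util.Arrays.toString(buf.putInt(1).array()));   // [0, 0, 0, 1]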
regardless of the endianness of the underlying hardware
Then you are fine. To your question:
The question is: will all JVMs, regardless of whether they are running on a little-endian or a big-endian hardware, always generate the same byte array?
Yes. The JVM always uses big endian at the bytecode level, and you create a ByteBuffer which also uses big endian by default.
Only on optimization will the JIT use native and, therefore, potentially little endian code on relevant architectures; but the "user view" of the data you manipulate will never change.
So you are perfectly safe.
Do it manually.
public byte[] intToBytes(int i) {
    return new byte[] {
        (byte)(i >> 24),
        (byte)(i >> 16),
        (byte)(i >> 8),
        (byte)i,
    };
}
Edit: In the Java language, endianness is not observable. It only matters when you convert an int to a byte array, and in that case it's just how your conversion function works - not part of Java itself. Some of the standard library classes use such conversion functions, and therefore have endianness, though - like NIO buffers and DataOutputStream.
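For example, DataOutputStream is specified to write big-endian, so it produces the same bytes as the manual shifts above on every platform (a sketch):
java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
new java.io.DataOutputStream(bos).writeInt(0x12345678);
System.out.println(java.util.Arrays.toString(bos.toByteArray()));   // [18, 52, 86, 120]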
