I've been writing a port of a networking library from Java and this is the last line of code I have yet to decipher and move on over. The line of code is as follows:
Float.floatToIntBits(Float);
Which returns an integer.
The code of floatToIntBits in Java
public static int floatToIntBits(float value) {
int result = floatToRawIntBits(value);
// Check for NaN based on values of bit fields, maximum
// exponent and nonzero significand.
if ( ((result & FloatConsts.EXP_BIT_MASK) ==
FloatConsts.EXP_BIT_MASK) &&
(result & FloatConsts.SIGNIF_BIT_MASK) != 0)
result = 0x7fc00000;
return result;
}
I'm not nearly experienced enough with memory and hex values to port this over myself, not to mention the bit shifting that's all over the place that's been driving me absolutely mad.
Take a look at the BitConverter class. For doubles it has methods DoubleToInt64Bits and Int64BitsToDouble. For floats you could do something like this:
float f = ...;
int i = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
Or changing endianness:
byte[] bytes = BitConverter.GetBytes(f);
Array.Reverse(bytes);
int i = BitConverter.ToInt32(bytes, 0);
If you can compile with unsafe, this becomes trivial:
public static unsafe uint FloatToUInt32Bits(float f) {
return *((uint*)&f);
}
Replace uint with int if you want to work with signed values, but I would say unsigned makes more sense. This is actually equivalent to Java's floatToRawIntBits(); floatToIntBits() is identical except that it always returns the same bitmask for all NaN values. If you want that functionality, you can just replicate that if statement from the Java version, but it's probably unnecesssary.
You'll need to switch on 'unsafe' support for your assembly, so it's up to you whether you want to go this route. It's not at all uncommon for high performance networking libraries to use unsafe code.
Related
Can someone please explain me why this code doesn‘t compile:
boolean r = (boolean) 0;
Why does this one compile?
double z = (float) 2.0_0+0___2;
I don‘t understand the Alphabet in which the numbers after float are written.
The first one doesn't compile because you simply can't cast a number to a boolean. A boolean is true or false.
The second one just uses underscores, which can be used to separate numbers like 2_000_000 for improved readability. In this case they're used to decrease readability, as is the cast to float (a double cast to float and assigned to double doesn't do anything in this particular case).
The latter case seems to be designed for confusion, as there are several pitfalls. If we remove the unnecessary underscores we get 2.00+02 which adds a literal double with an octal 02. This is still basically just 2+2, but if the octal value were 0___10 you'd get a result of z = 10. Then you have the cast to float which could affect the final result, as 64 bits are forced to 32 bits and then back to 64 bits. This could make the end result less precise than without the cast.
In some languages, like PHP or Javascript 0 is falsy, that is, not false, but evaluated as a boolean value, it will be false. In C, 0 is false. These are possible reasons for your expectation. However, in Java you cannot convert a number to a boolean. If you want to have a truey-ish evaluation, you can implement helper methods, like:
public class LooselyTyped {
public boolean toBoolean(int input) {
return input != 0;
}
public boolean toBoolean(Object input) {
return (input != null) && (!input.equals(""));
}
}
and then:
boolean lt = LooselyTyped.toBoolean(yourvariable);
In C I could the Endianess of the machine by the following method. how would I get using a python or Java program?. In Java, char is 2-bytes unlike C where it is 1-byte. I think it might not be possible with python since it is a dynamic language, but I could be wrong
bool isLittleEndian()
{
// 16 bit value, represented as 0x0100 on Intel, and 0x0001 else
short pattern = 0x0001;
// access first byte, will be 1 on Intel, and 0 else
return *(char*) &pattern == 0x01;
}
In Java, it's just
ByteOrder.nativeOrder();
...which returns either BIG_ENDIAN or LITTLE_ENDIAN.
http://docs.oracle.com/javase/6/docs/api/java/nio/ByteOrder.html
The (byte) cast of short is defined in both Java and C to return the "little" end of the short, regardless of the endianness of the processor. And >> is defined to shift from "big" end to "little" end (and << the opposite), regardless of the endianness of the processor. Likewise +, -, *, /, et al, are all defined to be independent of processor endianness.
So no sequence of operations in Java or C will detect endianness. What is required is some sort of "alias" of one size value on top of another (such as taking the address of an int and casting to char*), but Java does not have any way to do this.
import java.nio.ByteOrder;
public class Endian {
public static void main(String argv[]) {
ByteOrder b = ByteOrder.nativeOrder();
if (b.equals(ByteOrder.BIG_ENDIAN)) {
System.out.println("Big-endian");
} else {
System.out.println("Little-endian");
}
}
}
valter
You can use sys.byteorder:
>>> import sys
>>> print sys.byteorder
'little'
Or you can get endianness by yourself with a little help of the built-in struct module:
import struct
def is_little():
packed = struct.pack("i", 1)
return packed[0] == "\x01";
print is_little()
That was all Python of course.
We have an app that the Python module will write data to redis shards and the Java module will read data from redis shards, so I need to implement the exact same consistent hashing algorithm for Java and Python to make sure the data can be found.
I googled around and tried several implementations, but found the Java and Python implementations are always different, can't be used togather. Need your help.
Edit, online implementations I have tried:
Java: http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
Python: http://techspot.zzzeek.org/2012/07/07/the-absolutely-simplest-consistent-hashing-example/
http://amix.dk/blog/post/19367
Edit, attached Java (Google Guava lib used) and Python code I wrote. Code are based on the above articles.
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;
import com.google.common.hash.HashFunction;
public class ConsistentHash<T> {
private final HashFunction hashFunction;
private final int numberOfReplicas;
private final SortedMap<Long, T> circle = new TreeMap<Long, T>();
public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
Collection<T> nodes) {
this.hashFunction = hashFunction;
this.numberOfReplicas = numberOfReplicas;
for (T node : nodes) {
add(node);
}
}
public void add(T node) {
for (int i = 0; i < numberOfReplicas; i++) {
circle.put(hashFunction.hashString(node.toString() + i).asLong(),
node);
}
}
public void remove(T node) {
for (int i = 0; i < numberOfReplicas; i++) {
circle.remove(hashFunction.hashString(node.toString() + i).asLong());
}
}
public T get(Object key) {
if (circle.isEmpty()) {
return null;
}
long hash = hashFunction.hashString(key.toString()).asLong();
if (!circle.containsKey(hash)) {
SortedMap<Long, T> tailMap = circle.tailMap(hash);
hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
}
return circle.get(hash);
}
}
Test code:
ArrayList<String> al = new ArrayList<String>();
al.add("redis1");
al.add("redis2");
al.add("redis3");
al.add("redis4");
String[] userIds =
{"-84942321036308",
"-76029520310209",
"-68343931116147",
"-54921760962352"
};
HashFunction hf = Hashing.md5();
ConsistentHash<String> consistentHash = new ConsistentHash<String>(hf, 100, al);
for (String userId : userIds) {
System.out.println(consistentHash.get(userId));
}
Python code:
import bisect
import md5
class ConsistentHashRing(object):
"""Implement a consistent hashing ring."""
def __init__(self, replicas=100):
"""Create a new ConsistentHashRing.
:param replicas: number of replicas.
"""
self.replicas = replicas
self._keys = []
self._nodes = {}
def _hash(self, key):
"""Given a string key, return a hash value."""
return long(md5.md5(key).hexdigest(), 16)
def _repl_iterator(self, nodename):
"""Given a node name, return an iterable of replica hashes."""
return (self._hash("%s%s" % (nodename, i))
for i in xrange(self.replicas))
def __setitem__(self, nodename, node):
"""Add a node, given its name.
The given nodename is hashed
among the number of replicas.
"""
for hash_ in self._repl_iterator(nodename):
if hash_ in self._nodes:
raise ValueError("Node name %r is "
"already present" % nodename)
self._nodes[hash_] = node
bisect.insort(self._keys, hash_)
def __delitem__(self, nodename):
"""Remove a node, given its name."""
for hash_ in self._repl_iterator(nodename):
# will raise KeyError for nonexistent node name
del self._nodes[hash_]
index = bisect.bisect_left(self._keys, hash_)
del self._keys[index]
def __getitem__(self, key):
"""Return a node, given a key.
The node replica with a hash value nearest
but not less than that of the given
name is returned. If the hash of the
given name is greater than the greatest
hash, returns the lowest hashed node.
"""
hash_ = self._hash(key)
start = bisect.bisect(self._keys, hash_)
if start == len(self._keys):
start = 0
return self._nodes[self._keys[start]]
Test code:
import ConsistentHashRing
if __name__ == '__main__':
server_infos = ["redis1", "redis2", "redis3", "redis4"];
hash_ring = ConsistentHashRing()
test_keys = ["-84942321036308",
"-76029520310209",
"-68343931116147",
"-54921760962352",
"-53401599829545"
];
for server in server_infos:
hash_ring[server] = server
for key in test_keys:
print str(hash_ring[key])
You seem to be running into two issues simultaneously: encoding issues and representation issues.
Encoding issues come about particularly since you appear to be using Python 2 - Python 2's str type is not at all like Java's String type, and is actually more like a Java array of byte. But Java's String.getBytes() isn't guaranteed to give you a byte array with the same contents as a Python str (they probably use compatible encodings, but aren't guaranteed to - even if this fix doesn't change things, it's a good idea in general to avoid problems in the future).
So, the way around this is to use a Python type that behaves like Java's String, and convert the corresponding objects from both languages to bytes specifying the same encoding. From the Python side, this means you want to use the unicode type, which is the default string literal type if you are using Python 3, or put this near the top of your .py file:
from __future__ import unicode_literals
If neither of those is an option, specify your string literals this way:
u'text'
The u at the front forces it to unicode. This can then be converted to bytes using its encode method, which takes (unsurprisingly) an encoding:
u'text'.encode('utf-8')
From the Java side, there is an overloaded version of String.getBytes that takes an encoding - but it takes it as a java.nio.Charset rather than a string - so, you'll want to do:
"text".getBytes(java.nio.charset.Charset.forName("UTF-8"))
These will give you equivalent sequences of bytes in both languages, so that the hashes have the same input and will give you the same answer.
The other issue you may have is representation, depending on which hash function you use. Python's hashlib (which is the preferred implementation of md5 and other cryptographic hashes since Python 2.5) is exactly compatible with Java's MessageDigest in this - they both give bytes, so their output should be equivalent.
Python's zlib.crc32 and Java's java.util.zip.CRC32, on the other hand, both give numeric results - but Java's is always an unsigned 64 bit number, while Python's (in Python 2) is a signed 32 bit number (in Python 3, its now an unsigned 32-bit number, so this problem goes away). To convert a signed result to an unsigned one, do: result & 0xffffffff, and the result should be comparable to the Java one.
According to this analysis of hash functions:
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Python has zlib.crc32 and Java has a CRC32 class. Since it's a standard algorithm, you should get the same result in both languages.
MurmurHash 3 is available in Google Guava (a very useful Java library) and in pyfasthash for Python.
Note that these aren't cryptographic hash functions, so they're fast but don't provide the same guarantees. If these hashes are important for security, use a cryptographic hash.
Differnt language implementations of a hashing algorithm does not make the hash value different. The SHA-1 hash whether generated in java or python will be the same.
I'm not familiar with Redis, but the Python example appears to be hashing keys, so I'm assuming we're talking about some sort of HashMap implementation.
Your python example appears to be using MD5 hashes, which will be the same in both Java and Python.
Here is an example of MD5 hashing in Java:
http://www.dzone.com/snippets/get-md5-hash-few-lines-java
And in Python:
http://docs.python.org/library/md5.html
Now, you may want to find a faster hashing algorithm. MD5 is focused on cryptographic security, which isn't really needed in this case.
Here is a simple hashing function that produces the same result on both python and java for your keys:
Python
def hash(key):
h = 0
for c in key:
h = ((h*37) + ord(c)) & 0xFFFFFFFF
return h;
Java
public static int hash(String key) {
int h = 0;
for (char c : key.toCharArray())
h = (h * 37 + c) & 0xFFFFFFFF;
return h;
}
You don't need a cryptographically secure hash for this. That's just overkill.
Let's get this straight: the same binary input to the same hash function (SHA-1, MD5, ...) in different environments/implementations (Python, Java, ...) will yield the same binary output. That's because these hash functions are implemented according to standards.
Hence, you will discover the sources of the problem(s) you experience when answering these questions:
do you provide the same binary input to both hash functions (e.g. MD5 in Python and Java)?
do you interpret the binary output of both hash functions (e.g. MD5 in Python and Java) equivalently?
#lvc's answer provides much more detail on these questions.
For the java version, I would recommend using MD5 which generates 128bit string result and it can then be converted into BigInteger (Integer and Long are not enough to hold 128bit data).
Sample code here:
private static class HashFunc {
static MessageDigest md5;
static {
try {
md5 = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException e) {
//
}
}
public synchronized int hash(String s) {
md5.update(StandardCharsets.UTF_8.encode(s));
return new BigInteger(1, md5.digest()).intValue();
}
}
Note that:
The java.math.BigInteger.intValue() converts this BigInteger to an int. This conversion is analogous to a narrowing primitive conversion from long to int. If this BigInteger is too big to fit in an int, only the low-order 32 bits are returned. This conversion can lose information about the overall magnitude of the BigInteger value as well as return a result with the opposite sign.
How to convert a long number in base 10 to base 9 without converting to string ?
FWIW, all values are actually in base 2 inside your machine (I bet you already knew that). It only shows up as base 10 because string conversion creates string representations in base 10 (e.g. when you print), because methods like parseLong assumes the input string is in base 10 and because the compiler expects all literals to be in base 10 when you actually write code. In other words, everything is in binary, the computer only converts stuff into and from base 10 for the convenience of us humans.
It follows that we should be easily able to change the output base to be something other than 10, and hence get string representations for the same value in base 9. In Java this is done by passing an optional extra base parameter into the Long.toString method.
long x=10;
System.out.println(Long.toString(x,9));
Long base10 = 10;
Long.valueOf(base10.toString(), 9);
What does "convert to base 9 without converting to string" actually mean?
Base-9, base-10, base-2 (binary), base-16 (hexadecimal), are just ways to represent numbers. The value itself does not depend on how you represent it. int x = 256 is exactly the same as int x = 0xff as far as the compiler is concerned.
If you don't want to "convert to string" (I read this as meaning you are not concerned with the representation of the value), then what do you want to do exactly?
You can't convert to base 9 without converting to string.
When you write
Long a = 123;
you're making the implicit assumption that it's in base 10. If you want to interpret that as a base 9 number that's fine, but there's no way Java (or any other language I know of) is suddenly going to see it that way and so 8+1 will return 9 and not 10. There's native support for base 2, 8, 16 and 10 but for any other base you'll have to treat it as a string. (And then, if you're sure you want this, convert it back to a long)
You have to apply the algorithm that converts number from one base to another by applying repeated modulo operations. Look here for a Java implementation. I report here the code found on that site. The variable M must contain the number to be converted, and N is the new base.
Caveat: for the snippet to work properly, N>=1 && N<=10 must be true. The extension with N>10 is left to the interested reader (you have to use letters instead of digits).
String Conversion(int M, int N) // return string, accept two integers
{
Stack stack = new Stack(); // create a stack
while (M >= N) // now the repetitive loop is clearly seen
{
stack.push(M mod N); // store a digit
M = M/N; // find new M
}
// now it's time to collect the digits together
String str = new String(""+M); // create a string with a single digit M
while (stack.NotEmpty())
str = str+stack.pop() // get from the stack next digit
return str;
}
If you LITERALLY can do anything but convert to string do the following:
public static long toBase(long num, int base) {
long result;
StringBuilder buffer = new StringBuilder();
buffer.append(Long.toString(num, base));
return Long.parseLong(buffer.toString());
}
I'm currently looking at a simple programming problem that might be fun to optimize - at least for anybody who believes that programming is art :) So here is it:
How to best represent long's as Strings while keeping their natural order?
Additionally, the String representation should match ^[A-Za-z0-9]+$. (I'm not too strict here, but avoid using control characters or anything that might cause headaches with encodings, is illegal in XML, has line breaks, or similar characters that will certainly cause problems)
Here's a JUnit test case:
#Test
public void longConversion() {
final long[] longs = { Long.MIN_VALUE, Long.MAX_VALUE, -5664572164553633853L,
-8089688774612278460L, 7275969614015446693L, 6698053890185294393L,
734107703014507538L, -350843201400906614L, -4760869192643699168L,
-2113787362183747885L, -5933876587372268970L, -7214749093842310327L, };
// keep it reproducible
//Collections.shuffle(Arrays.asList(longs));
final String[] strings = new String[longs.length];
for (int i = 0; i < longs.length; i++) {
strings[i] = Converter.convertLong(longs[i]);
}
// Note: Comparator is not an option
Arrays.sort(longs);
Arrays.sort(strings);
final Pattern allowed = Pattern.compile("^[A-Za-z0-9]+$");
for (int i = 0; i < longs.length; i++) {
assertTrue("string: " + strings[i], allowed.matcher(strings[i]).matches());
assertEquals("string: " + strings[i], longs[i], Converter.parseLong(strings[i]));
}
}
and here are the methods I'm looking for
public static class Converter {
public static String convertLong(final long value) {
// TODO
}
public static long parseLong(final String value) {
// TODO
}
}
I already have some ideas on how to approach this problem. Still, I though I might get some nice (creative) suggestions from the community.
Additionally, it would be nice if this conversion would be
as short as possible
easy to implement in other languages
EDIT: I'm quite glad to see that two very reputable programmers ran into the same problem as I did: using '-' for negative numbers can't work as the '-' doesn't reverse the order of sorting:
-0001
-0002
0000
0001
0002
Ok, take two:
class Converter {
public static String convertLong(final long value) {
return String.format("%016x", value - Long.MIN_VALUE);
}
public static long parseLong(final String value) {
String first = value.substring(0, 8);
String second = value.substring(8);
long temp = (Long.parseLong(first, 16) << 32) | Long.parseLong(second, 16);
return temp + Long.MIN_VALUE;
}
}
This one takes a little explanation. Firstly, let me demonstrate that it is reversible and the resultant conversions should demonstrate the ordering:
for (long aLong : longs) {
String out = Converter.convertLong(aLong);
System.out.printf("%20d %16s %20d\n", aLong, out, Converter.parseLong(out));
}
Output:
-9223372036854775808 0000000000000000 -9223372036854775808
9223372036854775807 ffffffffffffffff 9223372036854775807
-5664572164553633853 316365a0e7370fc3 -5664572164553633853
-8089688774612278460 0fbba6eba5c52344 -8089688774612278460
7275969614015446693 e4f96fd06fed3ea5 7275969614015446693
6698053890185294393 dcf444867aeaf239 6698053890185294393
734107703014507538 8a301311010ec412 734107703014507538
-350843201400906614 7b218df798a35c8a -350843201400906614
-4760869192643699168 3dedfeb1865f1e20 -4760869192643699168
-2113787362183747885 62aa5197ea53e6d3 -2113787362183747885
-5933876587372268970 2da6a2aeccab3256 -5933876587372268970
-7214749093842310327 1be00fecadf52b49 -7214749093842310327
As you can see Long.MIN_VALUE and Long.MAX_VALUE (the first two rows) are correct and the other values basically fall in line.
What is this doing?
Assuming signed byte values you have:
-128 => 0x80
-1 => 0xFF
0 => 0x00
1 => 0x01
127 => 0x7F
Now if you add 0x80 to those values you get:
-128 => 0x00
-1 => 0x7F
0 => 0x80
1 => 0x81
127 => 0xFF
which is the correct order (with overflow).
Basically the above is doing that with 64 bit signed longs instead of 8 bit signed bytes.
The conversion back is a little more roundabout. You might think you can use:
return Long.parseLong(value, 16);
but you can't. Pass in 16 f's to that function (-1) and it will throw an exception. It seems to be treating that as an unsigned hex value, which long cannot accommodate. So instead I split it in half and parse each piece, combining them together, left-shifting the first half by 32 bits.
EDIT: Okay, so just adding the negative sign for negative numbers doesn't work... but you could convert the value into an effectively "unsigned" long such that Long.MIN_VALUE maps to "0000000000000000", and Long.MAX_VALUE maps to "FFFFFFFFFFFFFFFF". Harder to read, but will get the right results.
Basically you just need to add 2^63 to the value before turning it into hex - but that may be a slight pain to do in Java due to it not having unsigned longs... it may be easiest to do using BigInteger:
private static final BigInteger OFFSET = BigInteger.valueOf(Long.MIN_VALUE)
.negate();
public static String convertLong(long value) {
BigInteger afterOffset = BigInteger.valueOf(value).add(OFFSET);
return String.format("%016x", afterOffset);
}
public static long parseLong(String text) {
BigInteger beforeOffset = new BigInteger(text, 16);
return beforeOffset.subtract(OFFSET).longValue();
}
That wouldn't be terribly efficient, admittedly, but it works with all your test cases.
If you don't need a printable String, you can encode the long in four chars after you've shifted the value by Long.MIN_VALUE (-0x80000000) to emulate an unsigned long:
public static String convertLong(long value) {
value += Long.MIN_VALUE;
return "" +
(char)(value>>48) + (char)(value>>32) +
(char)(value>>16) + (char)value;
}
public static long parseLong(String value) {
return (
(((long)value.charAt(0))<<48) +
(((long)value.charAt(1))<<32) +
(((long)value.charAt(2))<<16) +
(long)value.charAt(3)) + Long.MIN_VALUE;
}
Usage of surrogate pairs is not a problem, since the natural order of a string is defined by the UTF-16 values in its chars and not by the UCS-2 codepoint values.
There's a technique in RFC2550 -- an April 1st joke RFC about the Y10K problem with 4-digit dates -- that could be applied to this purpose. Essentially, each time the integer's string representation grows to require another digit, another letter or other (printable) character is prepended to retain desired sort-order. The negative rules are more arcane, yielding strings that are harder to read at a glance... but still easy enough to apply in code.
Nicely, for positive numbers, they're still readable.
See:
http://www.faqs.org/rfcs/rfc2550.html