I have a Tuple object that holds three primitives: Tuple(double, long, long). To avoid creating a huge number of Tuple objects, I'm thinking of using the Trove library's primitive maps, which take a primitive key and a primitive value. In my case, it would be Map<double, some primitive>.
My question: is it possible to efficiently encode the two longs into a single primitive that I can store in the map, and later decode them?
is it possible to efficiently encode the two longs into a single primitive
No, simply because longs are 64-bit, and no Java primitive is longer than that. You would need a 128-bit primitive to encode two longs into it.
That's right: you cannot pack two 64-bit primitives into another primitive, since no Java primitive is wider than 64 bits. Both double and long are defined as 64-bit types.
The question is whether you can impose some restrictions on the numbers you are dealing with. If you know that the values are always even, or always odd, or that the first component fits in the int range, or that you are only dealing with multiples of 1000, you can win some bits here.
Practically speaking, you will never make use of all 2^64 x 2^64 combinations of pairs of long values.
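As a rough illustration of "winning bits": if both values happen to fit in the int range (my assumption for this sketch, not something the question guarantees), the pair fits into one long. The class and method names below are made up for the example.

```java
// Hypothetical helper: only valid when both components fit in 32 bits.
final class LongPacking {
    private LongPacking() {}

    // The high 32 bits hold the first value, the low 32 bits hold the second.
    static long pack(int first, int second) {
        // Mask the second value so its sign bits do not spill into the high half.
        return ((long) first << 32) | (second & 0xFFFFFFFFL);
    }

    // Arithmetic shift restores the sign of the first value.
    static int first(long packed) {
        return (int) (packed >> 32);
    }

    // Narrowing to int keeps exactly the low 32 bits.
    static int second(long packed) {
        return (int) packed;
    }
}
```

The packed long could then be stored as the value in a primitive double-to-long map and split again on lookup.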
On the other hand, it's no big deal to handle maps of pairs of values. That is the whole point of object-oriented languages like Java: they not only offer data types like C's struct, but also bind methods to the data.
You can find good implementations of a Pair class on the web, e.g. at angelikalanger.com. Or you can easily code one yourself, especially since you only need a pair of Long values. Also consider using Pair<Double, Pair<Long, Long>>, or implementing a Tuple<M,N,T> class right away instead of a Map (i.e. a key-value combination), following the outline of the Pair<M,N> implementation.
Finally, you could even employ an in-memory database like H2 to hold your Tuple(double, long, long) entries. It is enough to include it in your project as a Java library and configure it properly.
By the way, a 3-tuple is called a triple. Therefore, you could correctly call your class Triple(double, long, long) or better Triple(Double, Long, Long).
You could use Trove's double-Object map and encode the two longs into a BigInteger, but if your objective is to stay strictly with primitive types, that obviously isn't any help.
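For what it's worth, the BigInteger encoding could look roughly like this. It is only a sketch using java.math.BigInteger from the JDK; the class and method names are invented for the example, and the result is an object, not a primitive.

```java
import java.math.BigInteger;

// Hypothetical helper that stores two longs in one 128-bit BigInteger.
final class BigIntegerPair {
    // 2^64 - 1, used to treat the low long as an unsigned 64-bit field.
    private static final BigInteger MASK_64 =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    private BigIntegerPair() {}

    static BigInteger encode(long high, long low) {
        BigInteger lowBits = BigInteger.valueOf(low).and(MASK_64);
        return BigInteger.valueOf(high).shiftLeft(64).or(lowBits);
    }

    // Shifting right by 64 bits drops the low field and keeps the sign of the high one.
    static long high(BigInteger encoded) {
        return encoded.shiftRight(64).longValue();
    }

    // longValue() returns the low-order 64 bits.
    static long low(BigInteger encoded) {
        return encoded.longValue();
    }
}
```

Each BigInteger is a heap object with its own backing array, so this is unlikely to use less memory than a small pair class.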
As Joonas says, there is no single primitive that will hold 128 bits. What might meet your need is to use an array to hold the two longs: Map<Double, long[]>. While Double and long[] are not strictly primitives, that might suit. Remember that you cannot put double (small-d) into a Map, as Maps can only contain reference types, not primitives.
Alternatively, how about Map<Double, Pair>, where Pair is a small class to hold two longs? Most libraries have something like that lying around somewhere.
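If no library class is at hand, such a pair is only a few lines. A minimal sketch, with LongPair and LongPairDemo as made-up names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable holder for two primitive longs.
final class LongPair {
    final long first;
    final long second;

    LongPair(long first, long second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LongPair)) return false;
        LongPair that = (LongPair) other;
        return first == that.first && second == that.second;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(first) + Long.hashCode(second);
    }
}

final class LongPairDemo {
    // Builds a small map keyed by a boxed Double, as suggested above.
    static Map<Double, LongPair> sample() {
        Map<Double, LongPair> map = new HashMap<>();
        map.put(1.5, new LongPair(10L, 20L));
        return map;
    }
}
```

The longs inside the pair stay unboxed; only the Double key and the pair itself are objects.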
In my project I have seen many places where constants like INTEGER_ONE and INTEGER_ZERO are used. What is the purpose of using them like this? We normally use constants so that changing the value in one place is automatically reflected everywhere it is used. But using a constant like INTEGER_ONE is the same as using 1: it is a common value, used in all places regardless of the case. If we needed to change the value in one of those places, we would obviously have to go there and change it to another constant like INTEGER_N. So why can't we use the numbers directly?
If you go through the documentation, you will find that, INTEGER_ONE is stated as:
Reusable Integer constant for one.
And, the actual declaration of INTEGER_ONE is:
public static final Integer INTEGER_ONE = Integer.valueOf(1);
In Java, values between -128 and 127 are kept in IntegerCache for reuse. So if you call Integer.valueOf() with an argument in that range, the same object is returned every time. As object creation is comparatively expensive, this is a kind of optimization for cases where you have to use Integer.
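The caching is easy to observe with a reference comparison. A small sketch (the class name is made up); note that for values outside -128..127 the result depends on the JVM and its settings, so no output is claimed for that case.

```java
// Demonstrates that Integer.valueOf returns shared objects for small values.
final class IntegerCacheDemo {
    private IntegerCacheDemo() {}

    // Compares references, not values: true means both calls returned the same object.
    static boolean sameObject(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameObject(1));    // inside the guaranteed cache range
        System.out.println(sameObject(127));  // upper end of the guaranteed range
        System.out.println(sameObject(1000)); // outside the range: JVM-dependent
    }
}
```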
But, if you are using primitive types, like int, then there is no need to use INTEGER_ONE, you can directly use 1.
There is a style rule against using magic constants: for 7, the number of days in a week, you would write DAYS_IN_WEEK. However, INTEGER_ONE, meant for any kind of usage of 1, is IMHO even worse.
There is a use case: constant object, to share. BigDecimal.ZERO is such a case.
If INTEGER_ONE is not an int but the wrapper object Integer, then you can share the same object and need not create a plethora of Integer objects with the same value.
However, Integer.valueOf(n) would also give the same object for values between -128 and 127. In general, one should not work with the wrapper classes (Integer here) when plain int values will do. Collections like List<Integer>, however, do use Integers.
We use a HashMap<Integer, SomeType>() with more than a million entries. I consider that large.
But integers are their own hash code. Couldn't we save memory with a, say, IntegerHashMap<Integer, SomeType>() that uses a special Map.Entry, using int directly instead of a pointer to an Integer object? In our case, that would save 1000000x the memory required for an Integer object.
Any faults in my line of thought? Too special to be of general interest? (at least, there is an EnumHashMap)
add1. The first generic parameter of IntegerHashMap is used to make it closely similar to the other Map implementations. It could be dropped, of course.
add2. The same should be possible with other maps and collections. For example ToIntegerHashMap<KeyType, Integer>, IntegerHashSet<Integer>, etc.
What you're looking for is a "primitive collections" library. These are usually much better in memory usage and performance. One of the oldest and most popular libraries is called "Trove". However, it is a bit outdated now. The main active libraries in use now are:
Goldman Sachs Collections (GS Collections, now Eclipse Collections)
fastutil
Koloboke
See Benchmarks Here
Some words of caution:
"integers are their own hash code" I'd be very careful with this statement. Depending on the integers you have, the distribution of keys may be anything from optimal to terrible. Ideally, I'd design the map so that you can pass in a custom IntFunction as hashing strategy. You can still default this to (i) -> i if you want, but you probably want to introduce a modulo factor, or your internal array will be enormous. You may even want to use an IntBinaryOperator, where one param is the int and the other is the number of buckets.
I would drop the first generic param. You probably don't want to implement Map<Integer, SomeType>, because then you will have to box / unbox in all your methods, and you will lose all your optimizations (except space). Trying to make a primitive collection compatible with an object collection will make the whole exercise pointless.
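To make the two points above concrete, here is a rough sketch of an int-keyed map with a pluggable bucket function and no Map<Integer, V> interface. It is not how any of the libraries listed above is implemented; IntKeyMap and its methods are invented for illustration, removal is left out, and the bucket function is assumed to return a value in [0, buckets).

```java
import java.util.function.IntBinaryOperator;

// Hypothetical open-addressing map from primitive int keys to objects.
final class IntKeyMap<V> {
    private int[] keys;
    private Object[] values;
    private boolean[] used;
    private int size;
    // Maps (key, number of buckets) to a bucket index in [0, buckets).
    private final IntBinaryOperator bucketOf;

    IntKeyMap(IntBinaryOperator bucketOf) {
        this.bucketOf = bucketOf;
        allocate(16);
    }

    // Default strategy: the int is its own hash, reduced modulo the table size.
    IntKeyMap() {
        this((key, buckets) -> Math.floorMod(key, buckets));
    }

    private void allocate(int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
        size = 0;
    }

    // Linear probing: walk forward until the key or a free slot is found.
    private int slotFor(int key) {
        int i = bucketOf.applyAsInt(key, keys.length);
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(int key, V value) {
        // Keep the table at most half full so probing always finds a free slot.
        if ((size + 1) * 2 > keys.length) {
            grow();
        }
        int i = slotFor(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = slotFor(key);
        return used[i] ? (V) values[i] : null;
    }

    int size() {
        return size;
    }

    // Doubles the table and re-inserts every existing entry.
    @SuppressWarnings("unchecked")
    private void grow() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        boolean[] oldUsed = used;
        allocate(oldKeys.length * 2);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], (V) oldValues[i]);
            }
        }
    }
}
```

Because put and get take an int directly, no Integer is ever allocated for a key. A poorly distributed key set would show up as long probe sequences, which is where a custom bucket function comes in.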
I work with messages that contain a few attributes and an array of a thousand floating point values (double[]). When the messages are serialized with protocol buffers, thanks to the "packed=true" directive, the double values are aligned and stored compactly in the messages.
But by default the Java classes generated for that message represent the double array as an array list (!), boxing primitive double values into objects, scattering those objects in memory, while at the end I need the double[] representation for further aggregations...
Is there an option to generate classes that handle repeated primitive values as Java primitive arrays?
As explained here, what is needed are versions of ArrayList that store unboxed values. Since Java generics work only with objects (boxed types), a separate implementation is needed for each primitive type. You can use the ones provided by Apache Commons Primitives.
After discussing this topic in several places, the answer is a clear no.
With protocol buffers the binary representation for vectors of numbers is efficient. But it is currently not possible with the Java implementation to efficiently deserialize those vectors (instead of primitive arrays you get collections of boxed numbers...)
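Until that changes, the usual workaround is to copy the boxed list into a double[] once, right after parsing. A sketch, assuming the generated class exposes the repeated field as a List<Double>; the helper name is made up, and this does not avoid the boxing done during deserialization.

```java
import java.util.List;

// Hypothetical helper: one-time copy from a boxed list to a primitive array.
final class DoubleUnboxing {
    private DoubleUnboxing() {}

    static double[] toPrimitiveArray(List<Double> boxed) {
        double[] out = new double[boxed.size()];
        int i = 0;
        for (Double d : boxed) {
            out[i++] = d; // auto-unboxing; a null element would throw NullPointerException
        }
        return out;
    }
}
```

After the copy, the aggregations can run over the contiguous double[] and the boxed list can be dropped.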
I'm just wondering about the performance of the Number class as opposed to, say using generics or even a whole lot of functions to handle primitive types.
The primitive types would clearly be the fastest option I would assume, however if the performance hit is not too huge, it would likely be easier for the coder to just use the Number class or generics rather than making a function that accepts and returns long, double (etc).
I am about to do a performance benchmark of the 3 options mentioned. Is there anything I should be aware of/try out when doing this, or even better, has someone done this before that they can give me results to?
Typically you use the Number class as opposed to primitive types because you need to use these values in collections or other classes that are based on Objects. If you are not restricted by this requirement, then you should use primitives.
Yes, there is a performance hit associated with using the Number class, in comparison with primitive types like int, long, etc. Especially if you are creating a lot of new Numbers, you will want to worry about the performance when compared with creating primitive types. But this is not necessarily the case for passing Numbers to methods. Passing an instance of Number to a method is no slower than passing an int or a long, since the compiler can basically pass a "pointer" to a memory location. This is very general information because your question is very general.
One thing you should be aware of is that object allocation is likely to be the largest cost when you use Numbers. This affects your benchmarks, as certain operations that use auto-boxing can use cached values (which don't create objects) and can give you much better performance results. E.g. if you use Integers between -128 and 127, you will get much better results than with Doubles from -128 to 127, because the former uses cached values and the latter does not.
In short, if you are micro-benchmarking the use of Numbers, you need to ensure the range of values you use is realistic; not all values are equal in terms of performance (of course, for primitives this doesn't matter so much).
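A very naive way to see the effect is to run the same loop once with a primitive accumulator and once with a boxed one. This is only a sketch with made-up names: it ignores JIT warm-up and other pitfalls, so the timings are indicative at best, and a dedicated harness such as JMH would be more trustworthy.

```java
// Compares a primitive accumulator with a boxed one; both compute the same sum.
final class BoxingCost {
    private BoxingCost() {}

    static long sumPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i; // unboxes, adds, and boxes a new Long on most iterations
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        long a = sumPrimitive(n);
        long t1 = System.nanoTime();
        long b = sumBoxed(n);
        long t2 = System.nanoTime();
        System.out.printf("primitive: %d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("boxed:     %d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```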
I am generating large arrays (size > 1000) with elements of type int from a function. I need to pass such an array to a generic-type array parameter, but since a generic-type array doesn't accept arrays of primitive type, I am unable to do so.
I am afraid of using an Integer array, since it will be costly in terms of creation, performance and space used (an array of 12 byte objects) for arrays of this size. Moreover, it will create new immutable Integer objects whenever I perform addition on the array elements.
What would be the best way to go with ?
EDIT Just to remove some confusions around, I need to pass int[] to a method of signature type: void setKeys(K... keys).
I want to pass an int[] to this function: public Query<K> setKeys(K... keys);
I assume that you mean that int[] should be the set of keys ... not just one key.
That is impossible. The type parameters of a generic type have to be reference types. Your use-case requires K to be an int.
You have two choices:
use Integer (or a mutable int holder class) and pay the performance penalty, or
forgo the use of generics and change the signature of that method.
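The first choice, and the trap of passing the int[] as-is, can be sketched like this. countKeys is only a stand-in for a generic varargs method like setKeys(K... keys), and all names here are made up.

```java
import java.util.Arrays;

final class VarargsBoxing {
    private VarargsBoxing() {}

    // Stand-in for a generic varargs method; returns how many keys it received.
    @SafeVarargs
    static <K> int countKeys(K... keys) {
        return keys.length;
    }

    // Boxes every element so the array can be passed as K... with K = Integer.
    static Integer[] box(int[] values) {
        return Arrays.stream(values).boxed().toArray(Integer[]::new);
    }

    public static void main(String[] args) {
        int[] raw = {1, 2, 3};
        // K is inferred as int[], so the whole array counts as ONE key.
        System.out.println(countKeys(raw));
        // K is inferred as Integer, so each element is its own key.
        System.out.println(countKeys(box(raw)));
    }
}
```

Passing the raw int[] compiles without complaint, which makes the single-key behaviour easy to miss.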
Incidentally, the Integer class keeps a cache of Integer objects for small int values. If you create your objects using Integer.valueOf(int), there's a good chance that you will get a reference to a pre-existing object. (Of course, this only works because Integer objects are immutable.)
If your arrays are on the order of 1000 (or even 10,000 or 100,000) elements, the cost difference in terms of memory and performance probably wouldn't be noticeable unless you're processing the arrays thousands of times each. Write the code with Integer and optimize later if you have performance problems.
If you're that concerned about performance, you could write a simple class that wraps a public int, thus meaning you can make your call and still mutate it as needed. Having said that, I do agree that you want to make absolute sure you need this performance improvement before doing it.
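Such a wrapper is tiny. A sketch with made-up names; note that each element is still a separate heap object, so this saves re-boxing on every update rather than space.

```java
// Hypothetical mutable holder: a reference type, so it can be used as K.
final class MutableInt {
    public int value;

    MutableInt(int value) {
        this.value = value;
    }
}

final class MutableIntDemo {
    private MutableIntDemo() {}

    // Wraps each int once; later updates mutate the holders in place.
    static MutableInt[] wrap(int[] values) {
        MutableInt[] out = new MutableInt[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = new MutableInt(values[i]);
        }
        return out;
    }

    // Adds delta to every element without allocating new objects.
    static void addToAll(MutableInt[] holders, int delta) {
        for (MutableInt holder : holders) {
            holder.value += delta;
        }
    }
}
```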
If you actually do need to worry about the performance implications of boxing/unboxing integers, you could consider GNU Trove, specifically their TIntArrayList. It lets you mimic the functionality of an ArrayList<Integer> while being backed by primitives. That said, I'm not certain you need this, and I'm not certain this is exactly what you are looking for.
If you don't want the integers permanently boxed, you could pass in the result of Ints.asList() from Guava, the successor of the Google Collections library (http://guava-libraries.googlecode.com/svn/tags/release08/javadoc/com/google/common/primitives/Ints.html#asList(int...)), which would be a List<Integer> backed by the array. The values will get boxed as they're accessed, so this only makes sense if the values are not being accessed lots of times.
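The idea behind such an array-backed view can be sketched with the JDK alone. This is not Guava's implementation, just an illustration of the technique with a made-up class name: the list keeps no Integer objects and boxes on each access.

```java
import java.util.AbstractList;
import java.util.List;

final class IntArrayView {
    private IntArrayView() {}

    // Returns a fixed-size List<Integer> that reads and writes through to the array.
    static List<Integer> asList(final int[] backing) {
        return new AbstractList<Integer>() {
            @Override
            public Integer get(int index) {
                return backing[index]; // boxed on demand, nothing is stored
            }

            @Override
            public Integer set(int index, Integer element) {
                int old = backing[index];
                backing[index] = element; // unboxed straight into the array
                return old;
            }

            @Override
            public int size() {
                return backing.length;
            }
        };
    }
}
```

Changes made through the view show up in the array and vice versa, since both share the same storage.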