I'm just wondering about the performance of the Number class as opposed to, say, using generics or even a whole lot of functions to handle the primitive types.
The primitive types would clearly be the fastest option, I assume; however, if the performance hit is not too big, it would likely be easier for the coder to just use the Number class or generics rather than writing a function that accepts and returns long, double, etc.
I am about to do a performance benchmark of the three options mentioned. Is there anything I should be aware of or try out when doing this, or even better, has someone done this before who can share their results?
Typically you use the Number class as opposed to primitive types because you need to use these values in collections or other classes that are based on Objects. If you are not restricted by this requirement, then you should use primitives.
Yes, there is a performance hit associated with using the Number class in comparison with primitive types like int, long, etc. Especially if you are creating a lot of new Number objects, you will want to worry about the cost compared with creating primitives. But this is not necessarily the case when passing Numbers to methods: passing an instance of Number is no slower than passing an int or a long, since what gets passed is essentially a "pointer" to a memory location. This is very general information because your question is very general.
One thing you should be aware of is that object allocation is likely to be the largest cost when you use Numbers. This affects your benchmarks, because operations that use auto-boxing can hit cached values (which don't create new objects) and thus give you much better performance results. E.g. if you use Integers between -128 and 127, you will get much better results than with Doubles over the same range, because the former uses cached values and the latter does not.
In short, if you are micro-benchmarking the use of Numbers, you need to ensure the range of values you use is realistic; not all values are equal in terms of performance (for primitives, of course, this doesn't matter so much).
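For example, a quick check like the following illustrates why the value range matters when benchmarking boxed types (a small sketch; the Integer cache covers -128..127 by default and the upper bound can be enlarged via a JVM option):

    public class BoxingCacheDemo {
        public static void main(String[] args) {
            // Small values: autoboxing reuses cached Integer instances.
            Integer a = 100, b = 100;
            System.out.println(a == b);   // true on a default JVM (cache covers -128..127)

            // Values outside the cache: each boxing allocates a new object.
            Integer c = 1000, d = 1000;
            System.out.println(c == d);   // typically false: two distinct objects

            // Double has no such cache, so every boxed value is a fresh allocation.
            Double e = 1.0, f = 1.0;
            System.out.println(e == f);   // false
        }
    }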
Related
We use a HashMap<Integer, SomeType>() with more than a million entries. I consider that large.
But integers are their own hash code. Couldn't we save memory with, say, an IntegerHashMap<Integer, SomeType>() that uses a special Map.Entry storing an int directly instead of a pointer to an Integer object? In our case, that would save the memory required for a million Integer objects.
Any faults in my line of thought? Too special to be of general interest? (At least there is an EnumMap.)
Addendum 1: The first generic parameter of IntegerHashMap is only there to make it closely resemble the other Map implementations. It could be dropped, of course.
Addendum 2: The same should be possible with other maps and collections, for example ToIntegerHashMap<KeyType, Integer>, IntegerHashSet<Integer>, etc.
What you're looking for is a "primitive collections" library. These are usually much better in terms of memory usage and performance. One of the oldest and most popular libraries was Trove; however, it is a bit outdated now. The main actively maintained libraries now are:
Goldman Sachs Collections
fastutil
Koloboke
See Benchmarks Here
Some words of caution:
"integers are their own hash code" I'd be very careful with this statement. Depending on the integers you have, the distribution of keys may be anything from optimal to terrible. Ideally, I'd design the map so that you can pass in a custom IntFunction as hashing strategy. You can still default this to (i) -> i if you want, but you probably want to introduce a modulo factor, or your internal array will be enormous. You may even want to use an IntBinaryOperator, where one param is the int and the other is the number of buckets.
I would drop the first generic param. You probably don't want to implement Map<Integer, SomeType>, because then you will have to box / unbox in all your methods, and you will lose all your optimizations (except space). Trying to make a primitive collection compatible with an object collection will make the whole exercise pointless.
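To illustrate both points (an int-keyed API with a pluggable hashing strategy, and deliberately not implementing Map<Integer, SomeType>), here is a minimal open-addressing sketch. All names are hypothetical, and it omits removal, resizing, and any handling of a full table:

    import java.util.function.IntUnaryOperator;

    // Minimal int-keyed map sketch: open addressing with linear probing.
    // Hypothetical and incomplete: no removal, no resizing, and it assumes
    // the table is never filled to capacity.
    class IntObjectMap<V> {
        private final int[] keys;
        private final Object[] values;
        private final boolean[] used;
        private final IntUnaryOperator hash;  // pluggable hashing strategy

        IntObjectMap(int capacity, IntUnaryOperator hash) {
            this.keys = new int[capacity];
            this.values = new Object[capacity];
            this.used = new boolean[capacity];
            this.hash = hash;
        }

        void put(int key, V value) {
            int i = index(key);
            while (used[i] && keys[i] != key) {
                i = (i + 1) % keys.length;    // linear probing
            }
            used[i] = true;
            keys[i] = key;
            values[i] = value;
        }

        @SuppressWarnings("unchecked")
        V get(int key) {
            int i = index(key);
            while (used[i]) {
                if (keys[i] == key) {
                    return (V) values[i];
                }
                i = (i + 1) % keys.length;
            }
            return null;
        }

        private int index(int key) {
            return Math.floorMod(hash.applyAsInt(key), keys.length);
        }
    }

A usage such as new IntObjectMap<SomeType>(capacity, k -> k * 0x9E3779B9) spreads the keys with a multiplicative hash before the modulo is applied; k -> k reproduces the "integers are their own hash code" behaviour.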
I know you can use public fields or some other workarounds, or maybe you don't need them at all. But just out of curiosity, why did Sun leave structures out?
Here's a link that explains Sun's decision:
2.2.2 No More Structures or Unions
Java has no structures or unions as complex data types. You don't need structures and unions when you have classes; you can achieve the same effect simply by declaring a class with the appropriate instance variables.
Although Java can support arbitrarily many kinds of classes, the Runtime only supports a few variable types: int, long, float, double, and reference; additionally, the Runtime only recognizes a few object types: byte[], char[], short[], int[], long[], float[], double[], reference[], and non-array object. The system will record a class type for each reference variable or array instance, and the Runtime will perform certain checks like ensuring that a reference stored into an array is compatible with the array type, but such behaviors merely regard the types of objects as "data".
I disagree with the claim that the existence of classes eliminates the need for structures, since structures have semantics which are fundamentally different from class objects. On the other hand, from a Runtime-system design perspective, adding structures greatly complicates the type system. In the absence of structures, the type system only needs eight array types. Adding structures to the type system would require the type system to recognize an arbitrary number of distinct variable types and array types. Such recognition is useful, but Sun felt that it wasn't worth the complexity.
Given the constraints under which Java's Runtime and type system operate, I personally think it should have included a limited form of aggregate type. Much of this would be handled by the language compiler, but it would need a couple of features in the Runtime to really work well. Given a declaration
aggregate TimedNamedPoint {
    int x, y;
    long startTime;
    String name;
}
a field declaration like TimedNamedPoint tpt; would create four variables: tpt.x, tpt.y of type int, tpt.startTime of type long, and tpt.name of type String. Declaring a parameter of that type would behave similarly.
For such types to be useful, the Runtime would need a couple of slight additions: it would be necessary to allow functions to leave multiple values on the stack when they return, rather than simply having a single return value of one of the five main types. Additionally, it would be necessary to have a means of storing multiple kinds of things in an array. While that could be accomplished by having the creation of something declared as TimedNamedPoint[12] actually be an Object[4] initialized to identify two instances of int[12], a long[12], and a String[12], it would be better to have a means by which code could construct a single array instance that could hold 24 values of type int, 12 of type long, and 12 of type String.
Personally, I think that for things like Point, the semantics of a simple aggregate would be much cleaner than those of a class. Further, the lack of aggregates often makes it impractical to have a method return more than one kind of information at once. There are many situations where a method could compute and report both the sine and cosine of a passed-in angle with much less work than computing the two separately, but having to construct a SineAndCosineResult object instance would negate any speed advantage that could have been gained by doing so. The execution model wouldn't need to change much to allow a method to leave two floating-point values on the evaluation stack when it returns, but at present no such thing is supported.
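To make that concrete, this is the kind of small carrier class Java currently forces on you; the names are purely illustrative:

    // Illustrative only: the small result class the text laments having to create.
    final class SineAndCosineResult {
        final double sine;
        final double cosine;

        SineAndCosineResult(double sine, double cosine) {
            this.sine = sine;
            this.cosine = cosine;
        }
    }

    class Trig {
        // Computes both values in one call, but pays for an object allocation
        // instead of leaving two doubles on the evaluation stack.
        static SineAndCosineResult sinCos(double angle) {
            return new SineAndCosineResult(Math.sin(angle), Math.cos(angle));
        }
    }

Every call allocates a SineAndCosineResult just to hand the two doubles back to the caller, which is exactly the overhead described above.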
I have a Tuple object that holds 3 primitives: Tuple(double, long, long). To avoid creating a huge number of Tuples, I'm thinking of using the Trove library's primitive maps, which take two primitives as key and value. In my case, it would be a Map<double, some primitive>.
My question: is it possible efficiently to encode the two long into a single primitive that I can store in the map, and later decode them?
is it possible efficiently to encode the two long into a single primitive
No, simply because longs are 64-bit, and no Java primitive is longer than that. You would need a 128-bit primitive to encode two longs into it.
That's right: you cannot pack two 64-bit primitives into another primitive, which is at most 64 bits wide. Both double and long are, by definition, 64-bit types.
The question is whether you can impose some restrictions on the numbers you are dealing with. If you know you will always have even numbers, or odd numbers, or that the first component fits in integer range, or that you are dealing with multiples of 1000, you can save some bits there.
Practically speaking, you will never make use of all 2^64 x 2^64 combinations of pairs of long values.
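For example, if you can guarantee that each of the two values actually fits into 32 bits (an assumption, not something given in the question), the classic trick is to pack them into one long and unpack them again:

    final class LongPacking {
        // Only valid under the assumption that both values fit into an int (32 bits).
        static long pack(int high, int low) {
            return ((long) high << 32) | (low & 0xFFFFFFFFL);
        }

        static int unpackHigh(long packed) {
            return (int) (packed >>> 32);
        }

        static int unpackLow(long packed) {
            return (int) packed;
        }
    }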
On the other hand, it's no big deal to handle maps keyed by pairs of values. That was the whole point of object-oriented languages like Java: not only to deal with data types the way struct does in C, but also to bind methods to the data.
You can find good implementations of a Pair class on the web, e.g. at angelikalanger.com, or you can easily code one yourself, especially since you only need a pair of Long values. Also consider using Pair<Double, Pair<Long, Long>>, or implementing a Tuple<M, N, T> class right away instead of a Map (i.e. a key-value combination), following the outline of the Pair<M, N> implementation.
Finally, you could even employ an in-memory database like H2 to hold your Tuple(double, long, long) entries. It is enough to include it in your project as a Java library and configure it properly.
By the way, a 3-tuple is called a triple. Therefore, you could correctly call your class Triple(double, long, long) or better Triple(Double, Long, Long).
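A minimal version of such a class could look like this; the class and field names are just an illustration:

    // Minimal immutable triple holding the three primitives from the question.
    final class Triple {
        final double d;
        final long a;
        final long b;

        Triple(double d, long a, long b) {
            this.d = d;
            this.a = a;
            this.b = b;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Triple)) return false;
            Triple t = (Triple) o;
            return Double.compare(d, t.d) == 0 && a == t.a && b == t.b;
        }

        @Override
        public int hashCode() {
            return java.util.Objects.hash(d, a, b);
        }
    }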
You could use Trove's double-Object map and encode the two longs into a BigInteger, but if your objective is to stay strictly with primitive types, that obviously isn't any help.
As Joonas says, there is no single primitive that will hold 128 bits. What might meet your need is to use an array to hold the two longs: Map<Double, long[]>. While Double and long[] are not strictly primitives, that might suit your needs. Remember that you cannot use double (small d) in a Map, as Maps can only contain reference types, not primitives.
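The array-valued variant is about as simple as it gets; a trivial sketch:

    import java.util.HashMap;
    import java.util.Map;

    public class DoubleKeyDemo {
        public static void main(String[] args) {
            Map<Double, long[]> map = new HashMap<>();
            map.put(1.5, new long[] { 42L, 7L });  // the two longs travel together
            long[] pair = map.get(1.5);            // Double keys compare by exact value
            System.out.println(pair[0] + ", " + pair[1]);
        }
    }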
Alternatively, how about Map<Double, Pair>, where Pair is a small class holding two longs? Most libraries have something like that lying around somewhere.
I am generating large arrays (size > 1000) with elements of int type from a function. I need to pass such an array to a method with a generic array parameter, but since generic type arrays don't accept arrays of a primitive type, I am unable to do so.
I am reluctant to use an Integer array, since it will be costly in terms of creation, performance, and space (an array of 12-byte objects) for arrays of this size. Moreover, it will create immutable Integers when I need to perform addition operations on the array elements.
What would be the best way to go with ?
EDIT: Just to remove some confusion, I need to pass an int[] to a method with the signature void setKeys(K... keys).
I want to pass an int[] to this function: public Query<K> setKeys(K... keys);
I assume that you mean that int[] should be the set of keys ... not just one key.
That is impossible. The type parameters of a generic type have to be reference types, and your use case requires K to be an int.
You have two choices:
use Integer (or a mutable int holder class) and pay the performance penalty, or
forgo the use of generics and change the signature of that method.
Incidentally, the Integer class keeps a cache of Integer objects for small int values. If you create your objects using Integer.valueOf(int), there's a good chance that you will get a reference to a pre-existing object. (Of course, this only works because Integer objects are immutable.)
If your arrays are on the order of 1000 (or even 10,000 or 100,000) elements, the cost difference in terms of memory and performance probably wouldn't be noticeable unless you're processing the arrays thousands of times each. Write the code with Integer and optimize later if you have performance problems.
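If you do take the Integer route, the conversion itself is simple; a sketch, where boxed() is my own helper name and setKeys comes from the question:

    final class Boxing {
        // Boxes an int[] into an Integer[] so it can be passed to a
        // varargs method such as setKeys(K... keys) with K = Integer.
        static Integer[] boxed(int[] values) {
            Integer[] result = new Integer[values.length];
            for (int i = 0; i < values.length; i++) {
                result[i] = Integer.valueOf(values[i]);  // small values come from the cache
            }
            return result;
        }
    }

The call then becomes something like query.setKeys(Boxing.boxed(myInts)).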
If you're that concerned about performance, you could write a simple class that wraps a public int, thus meaning you can make your call and still mutate it as needed. Having said that, I do agree that you want to make absolute sure you need this performance improvement before doing it.
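Such a wrapper can be as small as this (the name is illustrative):

    // A tiny mutable holder: unlike Integer, the value can be changed in place.
    final class MutableInt {
        public int value;

        public MutableInt(int value) {
            this.value = value;
        }
    }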
If you actually do need to worry about the performance implications of boxing/unboxing integers, you could consider GNU Trove, specifically their TIntArrayList. It lets you mimic the functionality of an ArrayList<Integer> while being backed by primitives. That said, I'm not certain you need this, and I'm not certain this is exactly what you are looking for.
If you don't want the integers permanently boxed, you could pass in the result of Ints.asList() from the Google Collections library (http://guava-libraries.googlecode.com/svn/tags/release08/javadoc/com/google/common/primitives/Ints.html#asList(int...)), which would be a List<Integer> backed by the array. The values will get boxed as they're accessed, so this only makes sense if the values are not being accessed lots of times.
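Assuming Guava (the successor of Google Collections) is on the classpath, the usage is a one-liner; a sketch:

    import com.google.common.primitives.Ints;
    import java.util.List;

    public class IntsAsListDemo {
        public static void main(String[] args) {
            int[] data = { 1, 2, 3 };
            // A fixed-size List<Integer> view backed by the int[]:
            // elements are boxed lazily on access, not up front.
            List<Integer> view = Ints.asList(data);
            System.out.println(view.get(1));  // prints 2
        }
    }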
We have wrapper classes in Java like Integer and Float. Why does it still support primitives, which stops Java from being a fully object-oriented language?
Wrappers, being objects, live on the heap. Primitives are just values and go on the stack. The latter is more efficient, because a wrapped primitive needs (at least) both the object on the heap holding the value and a reference to it (which lives on the stack).
Whether this performance gain matters at all depends on what you're doing. For heavy numerical work, definitely, but for 99 % of stuff out there, this is rather an annoyance. For one thing, you can't store primitives in a Collection anyway; they get autoboxed. So the only way to store lots of them is to use plain arrays, which in turn can lead to other kinds of inefficiencies (if you need to resize them, for instance).
Because primitives are lighter and more efficient in term of memory and CPU processing.
One word: Performance.
The wrapper types are also immutable, which makes them extra expensive if you want to use one as a loop counter, for example.
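For example, a boxed loop counter has to be unboxed, incremented, and re-boxed on every iteration (illustrative only):

    public class CounterDemo {
        public static void main(String[] args) {
            Integer boxed = 0;
            for (int i = 0; i < 1_000_000; i++) {
                boxed = boxed + 1;  // unbox, add, re-box: a new object outside the cache range
            }

            int primitive = 0;
            for (int i = 0; i < 1_000_000; i++) {
                primitive++;        // a plain increment on a local value
            }
            System.out.println(boxed + " " + primitive);
        }
    }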
The JVM also has opcodes for doing arithmetic directly on primitives.