Is there a mutable reduction operation (Collector) for Stream<BigDecimal>? - java

AFAIK the only way to sum a stream of BigDecimal is:
BigDecimal result = stream.reduce(BigDecimal.ZERO, BigDecimal::add);
The problem here is that every call to BigDecimal::add will create a new BigDecimal as opposed to changing a mutable type.
Is there a mutable reduction operation aka Collector for Stream<BigDecimal>?

BigDecimal: "Immutable, arbitrary-precision signed decimal numbers."
Since it is immutable, there is no method to modify an instance in place without creating a new object. Any method that could do so would break the guarantees of the class (such as BigDecimal.ZERO always being 0).

Well, there is no public mutable BigDecimal companion class, so there is no Collector using one. But you should not worry about the performance implications of the instance creation unless a profiling tool tells you that there is a problem.
Modern JVMs like HotSpot are usually good at dealing with temporary objects created in a hot loop. Even if they cannot elide the allocation, the allocation cost itself is small. This is different from, e.g., String::concat, where the instance creation cost includes not only the allocation but also copying the entire contents of the previously created String instances, yielding quadratic time complexity for such a reduction (unless the optimizer manages to rewrite the code). The same applies to attempts to produce a Collection via pure (immutable) reduction.
This might seem to contradict the existence of primitive type specializations like IntStream, LongStream and DoubleStream, but that's a trade-off. Generally, the JRE developers prefer improving JVM performance (to the benefit of all value types) over adding a mutable helper class for every immutable class. Special support for primitive types may continue until full value type support arrives, but don't expect new public mutable companion classes for immutable types (unless construction costs are involved, as in the String example).
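For what it's worth, you can still phrase the summation as a mutable reduction by using a one-element array as the Collector's container. This is only a sketch: it still creates a new BigDecimal per element (the class is immutable), so it does not remove the allocations discussed above.

```java
import java.math.BigDecimal;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class BigDecimalSum {
    // Collector whose mutable container is a one-element BigDecimal array.
    static final Collector<BigDecimal, BigDecimal[], BigDecimal> SUM = Collector.of(
            () -> new BigDecimal[] { BigDecimal.ZERO },                         // supplier
            (holder, value) -> holder[0] = holder[0].add(value),                // accumulator
            (left, right) -> { left[0] = left[0].add(right[0]); return left; }, // combiner
            holder -> holder[0]);                                               // finisher

    public static void main(String[] args) {
        Stream<BigDecimal> stream = Stream.of(new BigDecimal("1.1"), new BigDecimal("2.2"));
        System.out.println(stream.collect(SUM)); // prints 3.3
    }
}
```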

Related

Improve memory usage: IntegerHashMap

We use a HashMap<Integer, SomeType>() with more than a million entries. I consider that large.
But integers are their own hash code. Couldn't we save memory with a, say, IntegerHashMap<Integer, SomeType>() that uses a special Map.Entry, using int directly instead of a pointer to an Integer object? In our case, that would save 1000000x the memory required for an Integer object.
Any faults in my line of thought? Too special to be of general interest? (At least there is an EnumMap.)
Addendum 1: The first generic parameter of IntegerHashMap is only there to keep it closely similar to the other Map implementations. It could be dropped, of course.
Addendum 2: The same should be possible for other maps and collections, for example ToIntegerHashMap<KeyType, Integer>, IntegerHashSet<Integer>, etc.
What you're looking for is a "primitive collections" library. They are usually much better in terms of memory usage and performance. One of the oldest and most popular libraries was "Trove"; however, it is a bit outdated now. The main actively maintained libraries in use now are (a short usage sketch follows the list):
Goldman Sachs Collections (now Eclipse Collections)
fastutil
Koloboke
See Benchmarks Here
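To give a feel for the kind of API these libraries expose, here is a minimal sketch using fastutil's Int2ObjectOpenHashMap; the other libraries offer equivalents. Keys are stored as plain ints, so no Integer object is allocated per entry.

```java
import it.unimi.dsi.fastutil.ints.Int2ObjectOpenHashMap;

public class PrimitiveKeyMapDemo {
    public static void main(String[] args) {
        Int2ObjectOpenHashMap<String> map = new Int2ObjectOpenHashMap<>();
        map.defaultReturnValue(null);     // value to return for missing keys
        map.put(42, "answer");            // put(int, V): the key is never boxed
        System.out.println(map.get(42));  // get(int): prints "answer"
    }
}
```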
Some words of caution:
"integers are their own hash code" I'd be very careful with this statement. Depending on the integers you have, the distribution of keys may be anything from optimal to terrible. Ideally, I'd design the map so that you can pass in a custom IntFunction as hashing strategy. You can still default this to (i) -> i if you want, but you probably want to introduce a modulo factor, or your internal array will be enormous. You may even want to use an IntBinaryOperator, where one param is the int and the other is the number of buckets.
I would drop the first generic param. You probably don't want to implement Map<Integer, SomeType>, because then you will have to box / unbox in all your methods, and you will lose all your optimizations (except space). Trying to make a primitive collection compatible with an object collection will make the whole exercise pointless.
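A minimal sketch of the hashing-strategy idea above, using an IntBinaryOperator that takes the key and the number of buckets. The multiplier is just one common way to spread clustered keys; this is an illustration, not part of any particular library.

```java
import java.util.function.IntBinaryOperator;

public class HashStrategyDemo {
    public static void main(String[] args) {
        // One operand is the key, the other the bucket count.
        // Math.floorMod keeps the result in [0, buckets) even for negative keys.
        IntBinaryOperator bucketOf =
                (key, buckets) -> Math.floorMod(key * 0x9E3779B9, buckets);

        System.out.println(bucketOf.applyAsInt(42, 1 << 10));  // bucket index for key 42
        System.out.println(bucketOf.applyAsInt(-7, 1 << 10));  // negative keys work too
    }
}
```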

Why doesn't Java support structures? (Just out of curiosity)

I know you can use public fields, or some other workarounds. Or maybe you don't need them at all. But just out of curiosity, why did Sun leave structures out?
Here's a link that explains Sun's decision:
2.2.2 No More Structures or Unions
Java has no structures or unions as complex data types. You don't need structures and unions when you have classes; you can achieve the same effect simply by declaring a class with the appropriate instance variables.
Although Java can support arbitrarily many kinds of classes, the Runtime only supports a few variable types: int, long, float, double, and reference; additionally, the Runtime only recognizes a few object types: byte[], char[], short[], int[], long[], float[], double[], reference[], and non-array object. The system records a class type for each reference variable or array instance, and the Runtime performs certain checks, such as ensuring that a reference stored into an array is compatible with the array type, but such behavior merely treats the types of objects as "data".
I disagree with the claim that the existence of classes eliminates the need for structures, since structures have semantics which are fundamentally different from class objects. On the other hand, from a Runtime-system design perspective, adding structures greatly complicates the type system. In the absence of structures, the type system only needs eight array types. Adding structures to the type system would require the type system to recognize an arbitrary number of distinct variable types and array types. Such recognition is useful, but Sun felt that it wasn't worth the complexity.
Given the constraints under which Java's Runtime and type system operate, I personally think it should have included a limited form of aggregate type. Much of this would be handled by the language compiler, but it would need a couple of features in the Runtime to really work well. Given a declaration
aggregate TimedNamedPoint
{ int x,y; long startTime; String name; }
a field declaration like TimedNamedPoint tpt; would create four variables: tpt.x, tpt.y of type int, tpt.startTime of type long, and tpt.name of type String. Declaring a parameter of that type would behave similarly.
For such types to be useful, the Runtime would need a couple of slight additions: it would need to allow methods to leave multiple values on the stack when they return, rather than just a single return value of one of the five main types. Additionally, it would need a means of storing multiple kinds of things in an array. While that could be accomplished by having the creation of something declared as TimedNamedPoint[12] actually create an Object[4], initialized to hold two int[12] instances, a long[12], and a String[12], it would be better to have a means by which code could construct a single array instance holding 24 values of type int, 12 of type long, and 12 of type String.
Personally, I think that for things like Point, the semantics of a simple aggregate would be much cleaner than those of a class. Further, the lack of aggregates often makes it impractical to have a method return more than one kind of information at once. There are many situations where a method could compute and report both the sine and cosine of a passed-in angle with much less work than computing them separately, but having to construct a SineAndCosineResult object instance would negate any speed advantage gained by doing so. The execution model wouldn't need to change much to allow a method to leave two floating-point values on the evaluation stack when it returns, but at present no such thing is supported.
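For comparison, here is a minimal sketch of the workaround available in Java today: a small carrier class for the two results (SineAndCosineResult is the hypothetical name used above). The allocation of this carrier object is exactly the overhead the answer argues a stack-based aggregate return would avoid.

```java
// Hypothetical carrier class; the per-call allocation is the cost being discussed.
final class SineAndCosineResult {
    final double sine;
    final double cosine;

    SineAndCosineResult(double sine, double cosine) {
        this.sine = sine;
        this.cosine = cosine;
    }

    static SineAndCosineResult of(double angle) {
        // Both values are computed together, but must be heap-allocated to be returned together.
        return new SineAndCosineResult(Math.sin(angle), Math.cos(angle));
    }
}
```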

Performance of the Number class

I'm just wondering about the performance of the Number class as opposed to, say, using generics or even a whole lot of overloaded functions to handle primitive types.
I assume the primitive types would clearly be the fastest option; however, if the performance hit is not too big, it would likely be easier for the coder to just use the Number class or generics rather than writing a separate function that accepts and returns long, double, etc.
I am about to do a performance benchmark of the three options mentioned. Is there anything I should be aware of or try out when doing this, or better yet, has someone done this before and can share their results?
Typically you use the Number class as opposed to primitive types because you need to use these values in collections or other classes that are based on Objects. If you are not restricted by this requirement, then you should use primitives.
Yes, there is a performance hit associated with using the Number class, in comparison with primitive types like int, long, etc. Especially if you are creating a lot of new Numbers, you will want to worry about the performance when compared with creating primitive types. But this is not necessarily the case for passing Numbers to methods. Passing an instance of Number to a method is no slower than passing an int or a long, since the compiler can basically pass a "pointer" to a memory location. This is very general information because your question is very general.
One thing you should be aware of is that object allocation is likely to be the largest cost when you use Numbers. This affects your benchmarks because certain operations which use auto-boxing can use cached values (which don't create objects) and give much better performance results. E.g., if you use Integers between -128 and 127, you will get much better results than with Doubles from -128 to 127, because the former uses cached values and the latter does not.
In short, if you are micro-benchmarking the use of Numbers, you need to ensure the range of values you use is realistic; not all values are equal in terms of performance (for primitives, of course, this matters much less).
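A small sketch that makes the caching effect visible (the == comparisons compare references on purpose; on HotSpot the upper bound of the Integer cache can be raised with -XX:AutoBoxCacheMax, so treat the exact range as configurable):

```java
public class BoxCacheDemo {
    public static void main(String[] args) {
        // Auto-boxing goes through Integer.valueOf, which caches at least -128..127.
        Integer a = 127, b = 127;
        System.out.println(a == b);   // true: both refer to the same cached instance

        Integer c = 128, d = 128;
        System.out.println(c == d);   // false: outside the cache, two separate objects

        // Double.valueOf has no such cache, so every boxed value is a fresh object.
        Double e = 1.0, f = 1.0;
        System.out.println(e == f);   // false
    }
}
```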

Large array of 'int' type needs to be passed to a generic array & collections

I am generating large arrays (size > 1000) with elements of int type from a function. I need to pass such an array where a generic type is expected, but since generics don't accept primitive types, I am unable to do so.
I'm hesitant to use an Integer array, since it will be costly in terms of creation, performance, and space (an array of 12-byte objects) for large arrays. Moreover, it will create immutable Integers, while I need to perform addition operations on the array elements.
What would be the best way to go?
EDIT: To clear up some confusion, I need to pass an int[] to a method with the signature void setKeys(K... keys).
I want to pass an int[] to this function: public Query<K> setKeys(K... keys);
I assume that you mean that int[] should be the set of keys ... not just one key.
That is impossible. The type parameters of a generic type have to be reference types. Your use-case requires K to be an int.
You have two choices:
use Integer (or a mutable int holder class) and pay the performance penalty, or
forgo the use of generics and change the signature of that method.
Incidentally, the Integer class keeps a cache of Integer objects for small int values. If you create your objects using Integer.valueOf(int), there's a good chance that you will get a reference to a pre-existing object. (Of course, this only works because Integer objects are immutable.)
If your arrays are on the order of 1000 (or even 10,000 or 100,000) elements, the cost difference in terms of memory and performance probably wouldn't be noticeable unless you're processing the arrays thousands of times each. Write the code with Integer and optimize later if you have performance problems.
If you're that concerned about performance, you could write a simple class that wraps a public int, meaning you can make your call and still mutate the value as needed. Having said that, I do agree that you want to make absolutely sure you need this performance improvement before doing it.
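A minimal sketch of such a wrapper (IntHolder is a made-up name). If it ends up being used as a key in a hash-based collection, it must not be mutated while stored there, since equals and hashCode depend on the value.

```java
public final class IntHolder {
    public int value; // mutable, unlike Integer

    public IntHolder(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IntHolder && ((IntHolder) o).value == value;
    }

    @Override
    public int hashCode() {
        return value;
    }
}
```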
If you actually do need to worry about the performance implications of boxing/unboxing integers, you could consider GNU Trove, specifically their TIntArrayList. It lets you mimic the functionality of an ArrayList<Integer> while being backed by primitives. That said, I'm not certain you need this, and I'm not certain this is exactly what you are looking for.
If you don't want the integers permanently boxed, you could pass in the result of Ints.asList() from the Google Collections library (http://guava-libraries.googlecode.com/svn/tags/release08/javadoc/com/google/common/primitives/Ints.html#asList(int...)), which would be a List<Integer> backed by the array. The values will get boxed as they're accessed, so this only makes sense if the values are not being accessed lots of times.
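If the values are accessed many times and you do want them boxed up front, the plain-Java route for the setKeys(K... keys) call is a one-pass boxing loop; a minimal sketch (Query<K> is the interface from the question):

```java
public final class Boxing {
    // Boxes an int[] into an Integer[] so it can be passed to a generic
    // varargs method such as setKeys(K... keys) with K = Integer.
    public static Integer[] box(int[] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            boxed[i] = values[i]; // auto-boxing calls Integer.valueOf, so small values hit the cache
        }
        return boxed;
    }
}

// Usage:
//   query.setKeys(Boxing.box(myIntArray));
```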

When we have wrapper classes, why are primitives still supported?

We have wrapper classes in Java like Integer and Float. Why does it still support primitives, which stops Java from being a fully object-oriented language?
Wrappers, being objects, are placed on the heap. Primitives are just "values" and live on the stack. This is more efficient, because a wrapped primitive costs (at least) both the object on the heap that holds the value and the reference to that object on the stack.
Whether this performance gain matters at all depends on what you're doing. For heavy numerical work, definitely, but for 99 % of stuff out there, this is rather an annoyance. For one thing, you can't store primitives in a Collection anyway; they get autoboxed. So the only way to store lots of them is to use plain arrays, which in turn can lead to other kinds of inefficiencies (if you need to resize them, for instance).
Because primitives are lighter and more efficient in terms of memory and CPU processing.
One word: Performance.
The wrapper types are also immutable, which makes them extra expensive if one wanted to use one as a loop counter, for example.
The JVM also has opcodes for doing arithmetic directly on primitives.
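A small sketch of the loop-counter/accumulator cost mentioned above: using the wrapper type as an accumulator unboxes, adds, and re-boxes on every iteration, while the primitive version is plain arithmetic with no allocation (outside the cached small-value range, each re-boxing creates a new Long).

```java
public class BoxedLoopDemo {
    public static void main(String[] args) {
        long primitiveSum = 0L;
        for (int i = 0; i < 1_000_000; i++) {
            primitiveSum += i;        // plain long arithmetic, no objects involved
        }

        Long boxedSum = 0L;
        for (int i = 0; i < 1_000_000; i++) {
            boxedSum += i;            // unbox, add, re-box: a new Long on (almost) every iteration
        }

        System.out.println(primitiveSum + " " + boxedSum);
    }
}
```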
