Protocol Buffers in Java: can we handle primitive arrays efficiently? - java

I work with messages that contain a few attributes and an array of a thousand floating point values (double[]). When the messages are serialized with protocol buffers, thanks to the "packed=true" directive, the double values are aligned and stored compactly in the messages.
But by default the Java classes generated for that message represent the double array as an array list (!), boxing primitive double values into objects, scattering those objects in memory, while at the end I need the double[] representation for further aggregations...
Is there an option to generate classes that handle repeated primitive values as Java primitive arrays?

As explained here, what is needed is a version of ArrayList that stores unboxed values. Since Java generics work only with objects (boxed types), a separate implementation is needed for each primitive type. You can use the ones provided by Apache Commons Primitives.

After discussing this topic in several places, the answer is a clear no.
With protocol buffers, the binary representation for vectors of numbers is efficient. But with the Java implementation it is currently not possible to deserialize those vectors efficiently: instead of primitive arrays you get collections of boxed numbers.
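If the generated accessor only hands back a List<Double>, the usual fallback is a single unboxing copy per message. A minimal sketch, using a plain List<Double> as a stand-in for whatever the generated getter returns (the class and method names below are hypothetical, not part of the protobuf API):

```java
import java.util.Arrays;
import java.util.List;

class BoxedListToArray {
    // Copies a List<Double> (for example, the result of a generated
    // repeated-field getter) into a primitive double[].
    // Each element is unboxed exactly once.
    static double[] toPrimitive(List<Double> values) {
        double[] out = new double[values.size()];
        int i = 0;
        for (double v : values) { // auto-unboxing happens here
            out[i++] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> boxed = Arrays.asList(1.5, 2.5, 4.0); // stand-in for message data
        double[] raw = toPrimitive(boxed);
        System.out.println(Arrays.toString(raw)); // prints [1.5, 2.5, 4.0]
    }
}
```

This does not avoid the boxed objects created during parsing; it only gives the aggregation code a contiguous double[] to work on afterwards.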

Related

How to efficiently serialize primitive arrays with Message Pack?

Message Pack formats can serialize small integers or short strings in a compact way that merges type identifier with actual data.
Now when the data to serialize contains a primitive array (Java double[] for instance) then the Message Pack serialization will apparently waste one byte for each value in the array, to specify its type, instead of seeing that the type is constant for all values in the array.
Is there a way to avoid this behavior while remaining inter-operable? (other than using a binary string and converting in the application)

Memory comparison of Scala's Array[Int] v/s int[] in Java?

Do they take the same amount of memory? Array is an abstract class so does it incur any object header cost? Is it same for other arrays of Java primitives in Scala?
PS: I read somewhere that Scala stores them as primitive arrays in JVM but now am confused.
Scala's Array[T] is exactly represented as Java's T[], there's no overhead. They generate the same bytecode. You additionally have the operations provided by ArrayOps, but it is an implicit conversion, which does not affect the pure Array[T] representation.
If you are not concerned about potential difference in several bytes (Scala Array vs Java array headers), they are roughly the same in terms of memory use, since Scala Int is represented as Java primitive int:
http://www.scala-lang.org/api/current/index.html#scala.Int

Do collections convert int to integers sometimes?

I was reading somewhere that even if I use int primitives, adding them to certain types of collections may result in a conversion from int to integers.
When is this the case?
Java collections can only contain objects. Therefore, all collections will autobox any primitive values you pass them into their equivalent object (boxed) form before storing them. So int primitives will be converted to Integers before being stored in a collection.
Java generics require their type arguments to be full-fledged objects, which primitives obviously aren't. Note that collections before generics were introduced worked with Objects, so they also required full-fledged objects. Java also introduced auto-boxing and auto-unboxing to make this requirement less of a pain: when you pass an int where a method expects an Integer, an appropriate Integer will automatically be created with the correct value.
It is called auto-boxing, and all collections have done this since Java 5. The conversion is from the int primitive to the java.lang.Integer wrapper class (not "integers" as you mentioned in your post).
The Java guide on autoboxing says about this:
... you can’t put an int (or other primitive value) into a collection. Collections can only hold object references, so you have to box primitive values into the appropriate wrapper class
Since the JDK's collections maintain data as java.lang.Object references (or Object arrays) internally, they all do have to store the values in boxed/wrapped form.
If you do want to reduce the memory footprint for specialized collections, consider using Trove, which has specialized implementations that store data in primitive arrays.
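To make the boxing visible, here is a small sketch using only java.util (the class name AutoboxingDemo is made up for illustration). It shows that an int added to a List<Integer> comes back as an Integer object, and that reading it into an int variable unboxes it again:

```java
import java.util.ArrayList;
import java.util.List;

class AutoboxingDemo {
    // Sums a boxed list; each element is unboxed back to int inside the loop.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        int primitive = 42;
        list.add(primitive);         // the compiler inserts the boxing conversion here
        Object stored = list.get(0); // what the list actually holds
        System.out.println(stored instanceof Integer); // prints true
        System.out.println(sum(list));                 // prints 42
    }
}
```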

Encode two longs into another primitive in java

I have a Tuple object that holds 3 primitives: Tuple(double, long, long). To avoid creating a huge number of Tuple objects, I'm thinking of using the Trove library's primitive map, which would take two primitives as key and value. In my case, it would be Map<double, some primitive>.
My question: is it possible to efficiently encode the two longs into a single primitive that I can store in the map, and later decode them?
is it possible efficiently to encode the two long into a single primitive
No, simply because longs are 64-bit, and no Java primitive is longer than that. You would need a 128-bit primitive to encode two longs into it.
That's right: you cannot pack two 64-bit primitives into another primitive, which is at most 64 bits in size. Both double and long are defined by the language as 64-bit types.
The question is whether you can impose some restrictions on the numbers you are dealing with. If you know that you will always have even numbers or odd numbers, or that the first component fits in the int range, or that you are dealing with multiples of 1000, you can win some bits here.
Practically speaking, you will never make use of all 2^64 x 2^64 combinations of pairs of long values.
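As an illustration of such a restriction: if both components are known to fit into the 32-bit int range, they can share a single long. A minimal sketch under that assumption (the class and method names are hypothetical):

```java
class PackedPair {
    // Packs two 32-bit values into one 64-bit long:
    // 'high' goes into the upper 32 bits, 'low' into the lower 32 bits.
    static long pack(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Recovers the upper 32 bits (arithmetic shift keeps the sign).
    static int high(long packed) {
        return (int) (packed >> 32);
    }

    // Recovers the lower 32 bits by truncation.
    static int low(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long packed = pack(-7, 123456);
        System.out.println(high(packed)); // prints -7
        System.out.println(low(packed));  // prints 123456
    }
}
```

The mask on 'low' matters: without it, a negative low value would sign-extend and overwrite the upper half.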
On the other hand, it's no big deal to handle maps of pairs of values. That was the whole point of object-oriented languages like Java: not only to deal with data types like struct in C, but also to bind methods to the data.
You can find good implementations of a Pair class on the web, e.g. at angelikalanger.com. Or you can easily code an implementation yourself, especially since you only need a pair of Long values. Also consider using Pair<Double, Pair<Long, Long>>, or implementing a Tuple<M,N,T> class right away instead of a Map (i.e. a key-value combination), following the outline of the Pair<M,N> implementation.
Finally, you could even employ an in-memory database like H2 to hold your Tuple(double, long, long) entries. It is enough to include it in your project as a Java library and configure it properly.
By the way, a 3-tuple is called a triple. Therefore, you could correctly call your class Triple(double, long, long) or better Triple(Double, Long, Long).
You could use Trove's double-Object map and encode the two longs into a BigInteger, but if your objective is to stay strictly with primitive types, that obviously isn't any help.
As Joonas says, there is no single primitive that will hold 128 bits. What might meet your need is to use an array to hold the two longs: Map<Double, long[]>. Double and long[] are not strictly primitives, but that might suit. Remember that you cannot put double (small-d) into a Map, as Maps can only contain reference types, not primitives.
Alternatively, how about Map<Double, Pair>, where Pair is a small class to hold two longs? Most libraries have something like that lying around somewhere.
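A small class for two longs could look like the following sketch. LongPair is a hypothetical name, and a plain java.util.HashMap stands in for whichever map implementation you end up using:

```java
import java.util.HashMap;
import java.util.Map;

final class LongPair {
    final long first;
    final long second;

    LongPair(long first, long second) {
        this.first = first;
        this.second = second;
    }

    // Value-based equality, so two pairs with the same longs compare equal.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof LongPair)) {
            return false;
        }
        LongPair that = (LongPair) other;
        return first == that.first && second == that.second;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(first) + Long.hashCode(second);
    }

    public static void main(String[] args) {
        Map<Double, LongPair> map = new HashMap<>();
        map.put(1.5, new LongPair(10L, 20L)); // the double key is boxed to Double
        LongPair found = map.get(1.5);
        System.out.println(found.first + ", " + found.second); // prints 10, 20
    }
}
```

This stores one object per entry instead of three boxed values, but it does not remove object allocation entirely.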

Large array of 'int' type needs to be passed to a generic array & collections

I am generating large arrays (size > 1000) of int elements from a function. I need to pass such an array to a generic-typed array parameter, but since a generic array type doesn't accept arrays of primitive type, I am unable to do so.
I am wary of using an Integer array since it will be costly in terms of creation, performance, and space (an array of 12-byte objects) for large arrays. Moreover, it will create immutable Integers when I need to perform addition operations on the array elements.
What would be the best way to go with ?
EDIT Just to remove some confusions around, I need to pass int[] to a method of signature type: void setKeys(K... keys).
I want to pass an int[] to this function: public Query<K> setKeys(K... keys);
I assume that you mean that int[] should be the set of keys ... not just one key.
That is impossible. The type parameters of a generic type have to be reference types. Your use-case requires K to be an int.
You have two choices:
use Integer (or a mutable int holder class) and pay the performance penalty, or
forgo the use of generics and change the signature of that method.
Incidentally, the Integer class keeps a cache of Integer objects for small int values. If you create your objects using Integer.valueOf(int), there's a good chance that you will get a reference to a pre-existing object. (Of course, this only works because Integer objects are immutable.)
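The cache can be observed with a reference comparison. In this sketch (IntegerCacheDemo is a made-up name), values from -128 to 127 are guaranteed to come from the cache; for larger values the result depends on the JVM and its settings, so no output is claimed for that case:

```java
class IntegerCacheDemo {
    // Returns true if two valueOf calls for the same int yield the same object.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(100));    // prints true: within the guaranteed cache range
        System.out.println(sameInstance(100000)); // implementation-dependent, usually false
    }
}
```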
If your arrays are on the order of 1000 (or even 10,000 or 100,000) elements, the cost difference in terms of memory and performance probably wouldn't be noticeable unless you're processing the arrays thousands of times each. Write the code with Integer and optimize later if you have performance problems.
If you're that concerned about performance, you could write a simple class that wraps a public int, thus meaning you can make your call and still mutate it as needed. Having said that, I do agree that you want to make absolute sure you need this performance improvement before doing it.
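A rough sketch of such a holder follows. MutableInt and wrap are hypothetical names, and the static setKeys here is only a stand-in with the same K... shape as the method in the question, not the real API:

```java
import java.util.Arrays;
import java.util.List;

class MutableInt {
    public int value; // public and mutable on purpose

    MutableInt(int value) {
        this.value = value;
    }

    // Wraps each element of an int[] in its own holder object.
    static MutableInt[] wrap(int[] raw) {
        MutableInt[] out = new MutableInt[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = new MutableInt(raw[i]);
        }
        return out;
    }

    // Hypothetical stand-in for a generic varargs method such as setKeys(K... keys).
    @SafeVarargs
    static <K> List<K> setKeys(K... keys) {
        return Arrays.asList(keys);
    }

    public static void main(String[] args) {
        MutableInt[] keys = wrap(new int[] {1, 2, 3});
        List<MutableInt> stored = setKeys(keys); // K is inferred as MutableInt
        stored.get(0).value += 10;               // mutate in place, no new object
        System.out.println(keys[0].value);       // prints 11
    }
}
```

Note that this still allocates one object per element; what it saves is the re-allocation that immutable Integers would require on every update.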
If you actually do need to worry about the performance implications of boxing/unboxing integers, you could consider GNU Trove, specifically their TIntArrayList. It lets you mimic the functionality of an ArrayList<Integer> while being backed by primitives. That said, I'm not certain you need this, and I'm not certain this is exactly what you are looking for.
If you don't want the integers permanently boxed, you could pass in the result of Ints.asList() from Google's Guava library (http://guava-libraries.googlecode.com/svn/tags/release08/javadoc/com/google/common/primitives/Ints.html#asList(int...)), which would be a List<Integer> backed by the array. The values get boxed as they're accessed, so this only makes sense if the values are not accessed many times.
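The idea of a list view backed by a primitive array can be sketched with the JDK alone. The following is not Guava's implementation, only an illustration of the same technique under a made-up class name, IntArrayView:

```java
import java.util.AbstractList;
import java.util.List;

class IntArrayView extends AbstractList<Integer> {
    private final int[] backing; // shared with the caller, not copied

    IntArrayView(int[] backing) {
        this.backing = backing;
    }

    @Override
    public Integer get(int index) {
        return backing[index]; // boxed only at the moment of access
    }

    @Override
    public Integer set(int index, Integer element) {
        int old = backing[index];
        backing[index] = element; // unboxed and written through to the array
        return old;
    }

    @Override
    public int size() {
        return backing.length;
    }

    public static void main(String[] args) {
        int[] raw = {5, 6, 7};
        List<Integer> view = new IntArrayView(raw);
        raw[1] = 60;                     // change the array directly
        System.out.println(view.get(1)); // prints 60
        view.set(0, 50);                 // change through the view
        System.out.println(raw[0]);      // prints 50
    }
}
```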
