Kryo Serialization Type Detection - java

I'm using Kryo IO directly to do my own low level primitive serialization of Strings, Longs and Doubles.
What I'm wondering is if there is any way for Kryo IO to automatically detect the primitive data types from the serialized bytes when reading them back?
If I have a byte array of say 10 serialized values, and I don't know if they were Strings, Longs, or Doubles; is there any way for Kryo to determine the data types (like MsgPack can)?

Kryo is no different from standard Java serialization in this respect. There are two ways the deserializer can know which type it is reading at each point:
It is a field in a known class, so the deserializer implementation reads each field in its proper order.
There is type information embedded in the stream in some manner to let it know. The writeClassAndObject() method in Kryo does just that - it prepends a compact class identifier to the actual object content, letting the deserializer know what to do.
Alternatively, you can do this manually, e.g. by writing a single tag byte before each value that selects among a limited number of supported types.
This is also what the MessagePack format mandates: every value carries its type in the encoding.
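The manual tag-byte approach could look roughly like the sketch below. It uses the JDK's DataOutputStream/DataInputStream so that it runs without extra dependencies; with Kryo IO the same pattern should map onto the corresponding write/read calls of its Output and Input classes. The class name and the tag values are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TaggedValues {
    // Hypothetical one-byte type tags; any stable numbering would do.
    static final byte TAG_STRING = 0;
    static final byte TAG_LONG = 1;
    static final byte TAG_DOUBLE = 2;

    // Writes each value preceded by a tag byte naming its type.
    static byte[] write(List<Object> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Object v : values) {
                if (v instanceof String) {
                    out.writeByte(TAG_STRING);
                    out.writeUTF((String) v);
                } else if (v instanceof Long) {
                    out.writeByte(TAG_LONG);
                    out.writeLong((Long) v);
                } else if (v instanceof Double) {
                    out.writeByte(TAG_DOUBLE);
                    out.writeDouble((Double) v);
                } else {
                    throw new IllegalArgumentException("Unsupported type: " + v.getClass());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads values back, using the tag byte to decide how to decode each one.
    static List<Object> read(byte[] data) {
        List<Object> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte tag = in.readByte();
                switch (tag) {
                    case TAG_STRING: result.add(in.readUTF()); break;
                    case TAG_LONG:   result.add(in.readLong()); break;
                    case TAG_DOUBLE: result.add(in.readDouble()); break;
                    default: throw new IllegalArgumentException("Unknown type tag: " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

The cost is one extra byte per value, which is the price of not knowing the schema on the reading side.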

Related

Java json object size: JsonNode vs. String

I work on a java application that needs to hold ~50k json objects in memory.
Each json string is ~5000 characters long.
Extra memory consumption is my concern.
I want to compare the json objects later, but processing is not my concern, only extra memory consumption.
What is more efficient:
Keep json as java String
Keep json as Jackson JsonNode object
I tried serializing the JsonNode objects and the resulting files are smaller than the strings, but I am not sure whether the same is true in memory.
My use-case:
I need to detect changes to some objects, which are encoded as json. This change detection runs every minute and compares the current state with the last state (which we hold in memory).
There are no hooks or events or similar to get changes.
We already hold a list of these objects in memory - with only a limited subset of the json fields.
I cannot change that architecture.
Now instead of mapping json data to some Pojo and comparing each property manually, the idea is to hold the json string/objects and then calculate the diff/patch with some library.
This simplifies the logic a lot and is more generic - but we are worried about the extra memory consumption.
You can use the getObjectSize() method of Instrumentation in the java.lang.instrument package to get approximate object sizes for both approaches.
long getObjectSize(Object objectToSize)
From the javadoc:
Returns an implementation-specific approximation of the amount of storage consumed by the specified object. The result may include some or all of the object's overhead, and thus is useful for comparison within an implementation but not between implementations. The estimate may change during a single invocation of the JVM.
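To call getObjectSize() you need an Instrumentation instance, which the JVM hands to a Java agent. A minimal sketch, assuming a hypothetical agent class named SizeAgent that is packaged in a jar whose manifest names it as Premain-Class and that is loaded with -javaagent:

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; in a real agent jar it would be declared public
// in its own file and referenced by the Premain-Class manifest attribute.
class SizeAgent {
    private static volatile Instrumentation instrumentation;

    // Invoked by the JVM before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Returns the JVM's size estimate for the given object itself.
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("Agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

Note that the estimate covers only the object passed in, not the objects it references. As far as I know, for a String that excludes the backing array and for a JsonNode it excludes the child nodes, so to compare the two representations you would have to walk the object graph and sum the sizes.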

How to efficiently serialize primitive arrays with Message Pack?

Message Pack formats can serialize small integers or short strings in a compact way that merges type identifier with actual data.
Now when the data to serialize contains a primitive array (Java double[] for instance) then the Message Pack serialization will apparently waste one byte for each value in the array, to specify its type, instead of seeing that the type is constant for all values in the array.
Is there a way to avoid this behavior while remaining inter-operable? (other than using a binary string and converting in the application)
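To put numbers on the overhead, here is a hand-rolled sketch that encodes a double[] both ways: as a MessagePack array of float64 values (one 0xcb type byte plus 8 data bytes per element) and as a single bin value holding the raw big-endian doubles. The second form is the "binary string" workaround mentioned above, shown only for the size comparison. The class and method names are made up, and the sketch handles only small inputs.

```java
import java.nio.ByteBuffer;

class MsgPackArraySize {
    // MessagePack array of float64: header + (1 type byte + 8 data bytes) per element.
    static byte[] asFloat64Array(double[] values) {
        if (values.length > 0xFFFF) {
            throw new IllegalArgumentException("sketch handles at most 65535 elements");
        }
        boolean fix = values.length <= 15;
        ByteBuffer buf = ByteBuffer.allocate((fix ? 1 : 3) + 9 * values.length);
        if (fix) {
            buf.put((byte) (0x90 | values.length));   // fixarray
        } else {
            buf.put((byte) 0xdc);                     // array 16
            buf.putShort((short) values.length);
        }
        for (double v : values) {
            buf.put((byte) 0xcb);                     // float 64 type byte
            buf.putDouble(v);                         // ByteBuffer is big-endian by default
        }
        return buf.array();
    }

    // MessagePack bin: header + 8 raw bytes per element, no per-element type byte.
    static byte[] asBin(double[] values) {
        int payload = 8 * values.length;
        if (payload > 0xFFFF) {
            throw new IllegalArgumentException("sketch handles at most 65535 payload bytes");
        }
        boolean small = payload <= 0xFF;
        ByteBuffer buf = ByteBuffer.allocate((small ? 2 : 3) + payload);
        if (small) {
            buf.put((byte) 0xc4);                     // bin 8
            buf.put((byte) payload);
        } else {
            buf.put((byte) 0xc5);                     // bin 16
            buf.putShort((short) payload);
        }
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }
}
```

For 1000 doubles this gives 9003 bytes as an array versus 8003 bytes as bin, so the per-element type byte costs about 12.5%.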

Protocol Buffers in Java: can we handle primitive arrays efficiently?

I work with messages that contain a few attributes and an array of a thousand floating point values (double[]). When the messages are serialized with protocol buffers, thanks to the "packed=true" directive, the double values are aligned and stored compactly in the messages.
But by default the Java classes generated for that message represent the double array as an array list (!), boxing primitive double values into objects, scattering those objects in memory, while at the end I need the double[] representation for further aggregations...
Is there an option to generate classes that handle repeated primitive values as Java primitive arrays?
What is needed are versions of ArrayList that store unboxed values. Since Java generics work only with objects (boxed types), a separate implementation is needed for each primitive type. You can use the ones provided by Apache Commons Primitives.
After discussing this topic in several places, the answer is a clear no.
With protocol buffers the binary representation for vectors of numbers is efficient. But it is currently not possible with the Java implementation to efficiently deserialize those vectors (instead of primitive arrays you get collections of boxed numbers...)
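In practice this means copying the boxed list into a primitive array after parsing. A minimal sketch, assuming the generated class exposes the repeated field as a List<Double> (for a hypothetical field named values that would be something like message.getValuesList()):

```java
import java.util.List;

class RepeatedDoubles {
    // Copies a list of boxed doubles into a primitive array.
    static double[] toPrimitiveArray(List<Double> boxed) {
        double[] result = new double[boxed.size()];
        int i = 0;
        for (double v : boxed) {   // auto-unboxing; throws NullPointerException on null elements
            result[i++] = v;
        }
        return result;
    }
}
```

This does not avoid the boxing during deserialization; it only gives you the double[] for the aggregation step afterwards.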

Comparing objects before and after some events in java in printed texts

I need a solution to the following problem. Suppose I have different fields in a class, each of a different type: some may be basic types such as Integers, some may be complex object types. I need to find a way to compare those fields after exit and restart of the app, but I am limited to dumping the values to a file and comparing those. How can I write something to a file and compare it so that I can determine whether the fields have changed or not? I do not need the values. Will getHashCode() help?
If I understand your question, you would like to compare content in a file after exit and before restart. One way would be to use a message digest. As in calculating the SHA1 of the contents and comparing that before restart.
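A minimal sketch of the digest approach using the JDK's MessageDigest; the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileDigest {
    // Returns the SHA-1 digest of the given bytes as a lowercase hex string.
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience overload: digest of a whole file read into memory.
    static String sha1Hex(Path file) {
        try {
            return sha1Hex(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Store the hex string before exit and compare it with the one computed after restart; equal digests mean the dumped contents are, for practical purposes, unchanged.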
It sounds like Java object serialization might do the trick for you. With serialization, you can write any Serializable object to a file, and later read it in again and reconstruct the original object. If the object then has a suitable equals() method, you can use that to check whether the object is the same.
EDIT: reread the question. If you want to compare the file contents, then serialization is not particularly useful, as there are bound to be small differences between the two files.
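A sketch of the serialize-then-compare idea, with a hypothetical Settings class standing in for your own type. It writes to a byte array to stay self-contained; a FileOutputStream/FileInputStream pair would work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;

class RoundTrip {
    // Hypothetical class whose state we want to compare across runs.
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int retries;

        Settings(String name, int retries) {
            this.name = name;
            this.retries = retries;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Settings)) return false;
            Settings other = (Settings) o;
            return retries == other.retries && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, retries);
        }
    }

    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After restart, deserialize the stored object and call equals() against the current one; the comparison is then defined by your equals() implementation rather than by the bytes in the file.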
I guess hashCode() will help only if it's implemented in such a way that it returns the same result for two objects that have the same values. Of course, for non-primitive fields you'll have to decide what "same value" means, and you would probably need to implement hashCode() for the types of those fields as well.
If you can't/don't want to implement hashCode() maybe JSON could help. I suggest using a library like Google's Gson to render a string representation of your object which you can then dump to file. If the way in which the object (or any of its members) is converted to string does not suit your needs you can specify the conversion with a JsonSerializer.
String strRep = new Gson().toJson(myObject);

Java's Representation of Serialized Objects

I'm looking for the format that Java uses to serialize objects. The default serialization serializes the object in a binary format. In particular, I'm curious to know if two runs of a program can serialize the same object differently.
What condition should an object satisfy so that the object maintains its behavior under Java's default serialization/deserialization round-trip?
You need the Java Object Serialization Specification at http://java.sun.com/javase/6/docs/platform/serialization/spec/protocol.html.
If you have two objects with all properties set to identical values, then they will be serialized the same way.
If it weren't repeatable, then it wouldn't be useful!
They will always serialize it the same way. If this wasn't the case, there would be no guarantee that another program could de-serialize the data correctly, defeating the purpose of serialization.
Typically running the same single-threaded algorithm with the same data will result in the same result.
However, things such as the order with which a HashSet serialises entries is not guaranteed. Indeed, an object may be subtly altered when serialised.
I like @Stephen C's example of Object.hashCode(). If such nondeterministic hash codes are serialized, then when we deserialize, the hash codes will be of no use. For example, if we serialize a HashMap that works based on Object.hashCode(), its deserialized version would behave differently than the original map. That is, looking up the same object would give us different results in the two maps.
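The effect can be checked with a small experiment. The sketch below uses a hypothetical Key class that keeps the default identity-based equals() and hashCode(). As far as I know, java.util.HashMap does not store hash codes in the stream but re-inserts its entries on deserialization, so the copy is consistent with its own restored keys; only lookups with the original key object fail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

class IdentityKeyDemo {
    // Hypothetical key type that does NOT override equals()/hashCode().
    static class Key implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;

        Key(String label) {
            this.label = label;
        }
    }

    // Serializes the object to memory and reads it back as a new object graph.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Key original = new Key("a");
        HashMap<Key, String> map = new HashMap<>();
        map.put(original, "value");

        HashMap<Key, String> copy = roundTrip(map);
        Key restored = copy.keySet().iterator().next();

        System.out.println(copy.get(restored)); // restored key is found in the copy
        System.out.println(copy.get(original)); // original key is a different object: not found
    }
}
```

Keys with value-based equals() and hashCode(), such as String or Long, do not have this problem.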
If you don't want binary then you can use JSON (http://www.json.org/example.html); for Java see http://www.json.org/java/
Or XML for that matter http://www.developer.com/xml/article.php/1377961/Serializing-Java-Objects-as-XML.htm
"I'm looking for the format that Java uses to serialize objects."
Not to be inane, it writes them somehow. How exactly that is can and probably should be determined by you. A Character maps to .... uh, it gets involved but rather than re-inventing the wheel let us ask exactly what do you need to have available to reconstruct an object to what state?
"The default serialization serializes the object in a binary format."
So? ( again, not trying to be inane - sounds like we need to define a problem that may not have data concepted )
"I'm curious to know if two runs of a program can serialize the same object differently."
If you had a Stream of information, how would you determine what states the object needed to be restored to?
