Data Exchange Formats and Java Serialization

A number of very useful answers posted in this thread helped clear up my questions about serialization. From the responses I understand that it is just a means to persist and re-create data in a JVM, so serialization is used for re-creating a Java object from a byte stream. However, data could also be transferred as XML, JSON, or any other data format. Could this also be called serialization? I assume the difference is that the relevant Java libraries would re-create the object from a byte stream, XML data, JSON data, etc., depending on the format of the data passed. For communication between two Java-based systems, I assume a byte stream would be useful, whereas for communication between two systems built on different technologies, other standard data formats would be used. In the case of EJBs / Java RMI, I assume the objects transferred between client and server must be serializable, as I assume Java would be using the standard serialization APIs to deserialize the objects. Are all of these assumptions correct?

Wiki sums it up well,
In computer science, in the context of data storage and transmission,
serialization is the process of translating data structures or
object state into a format that can be stored
So your first question
However data could be transferred by means of XML / JSON or via any other data format. So could this be called as serialization?
Yes, absolutely. Any format you like, as long as it's able to be stored.
Question two:
In case of communication between 2 java based systems, I assume bytestream would be useful where as in case of communication between 2 systems working in different technologies other standard data formats will be used.
Actually, Java's built-in serialization tends to be used only when it's largely invisible to the user and when speed doesn't matter. For example, some distributed products might send objects from one node to another using Java serialization. For any kind of web service, even from one JVM-backed service to another, some kind of friendly format like JSON or XML is far more common. Products where speed is important or where the payload must be as small as possible wouldn't use Java's serialization but more likely some proprietary binary format.
Protocols like Protobuf, Avro and Thrift were designed to try to give you the best of both worlds. They're somewhat popular but far from universal.
You might also hear the term marshalling, as in a marshaller or marshalling an object. The two terms basically mean the same thing, although in Java land it's more common to hear marshalling when you're talking about a non-binary format, and serialization when it's binary.
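To make the two flavours concrete, here is a minimal sketch that turns the same object into bytes with Java's built-in serialization and into JSON text. The `Player` class and the hand-written `toJson` helper are hypothetical and exist only for this illustration; real code would normally let a library such as Jackson or Gson produce the JSON.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {

    // Hypothetical value class, used only for this illustration.
    static class Player implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int score;

        Player(String name, int score) {
            this.name = name;
            this.score = score;
        }
    }

    // Built-in Java serialization: object state -> bytes.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Built-in Java deserialization: bytes -> a new, equivalent object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Hand-written JSON for the same state (no escaping; illustration only).
    static String toJson(Player p) {
        return "{\"name\":\"" + p.name + "\",\"score\":" + p.score + "}";
    }

    public static void main(String[] args) throws Exception {
        Player original = new Player("alice", 42);
        Player copy = (Player) fromBytes(toBytes(original));
        System.out.println(copy.name + " " + copy.score);
        System.out.println(toJson(original));
    }
}
```

Both outputs count as serialized forms of the object. The binary one can only be read back by a JVM that has the `Player` class, while the JSON text can be parsed by any platform.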

Related

POJOs to byte array vs POJOs to json

I'm a newbie at Java and I'm confused about serialization and deserialization.
I'm not sure which approach I should use.
Looking around, I found Boon, Jackson and GSON for JSON serialization (I'm currently using GSON, but some articles use Jackson or Boon). The alternative is to serialize the object into a byte array or binary object.
Which one is faster, and which one should I choose?
I'm building this for a simple application that saves its current state, documents and some other things.
Thanks in advance :)

Serializing data means converting it to a sequence of bytes.
This sequence can be interpreted as readable characters, as happens with JSON, XML, YAML and so on.
The same sequence can also be binary data that is not human readable.
There are pros and cons to each serialization approach.
Pros of human-readable formats:
Many existing libraries to serialize and deserialize data
The data can be easily debugged
Many existing libraries and APIs use this solution
Cons:
More bytes are needed
Serialization and deserialization can be complex and time consuming for the machine
Pros and cons of binary data:
Pros:
Faster serialization and deserialization
Less data to transfer over the network
Cons:
Difficult to read the data passed over the network
More than one possible representation of the same data (big-endian or little-endian byte order, for example)
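The size and byte-order points in the list above can be seen with a single `int`. This is only a sketch; the class and method names are made up for the example. It encodes one value as decimal text and as four binary bytes in both byte orders.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class EncodingComparison {

    // Human-readable form: the decimal digits as UTF-8 text.
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Binary form: always 4 bytes, but the byte order must be agreed on.
    static byte[] asBinary(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 1_000_000_000; // 0x3B9ACA00
        System.out.println("text bytes:   " + asText(value).length);
        System.out.println("binary bytes: " + asBinary(value, ByteOrder.BIG_ENDIAN).length);

        byte[] big = asBinary(value, ByteOrder.BIG_ENDIAN);
        byte[] little = asBinary(value, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("big-endian first byte:    0x%02X%n", big[0]);
        System.out.printf("little-endian first byte: 0x%02X%n", little[0]);
    }
}
```

The text form needs ten bytes here and the binary form four, but a reader of the binary form has to know which byte order the writer used.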

Serialization vs. Byte Code Translation

I'm a beginner with programming, and I was just wondering if there is a difference between the process of serialization and the process of converting to and from byte code (intermediate language).
I found this on javacodegeeks.com:
Serialization is usually used When the need arises to send your data
over network or stored in files. By data I mean objects and not text.
Now the problem is your Network infrastructure and your Hard disk are
hardware components that understand bits and bytes but not Java
objects. Serialization is the translation of your Java object’s
values/states to bytes to send it over network or save it. --> On
other hand, Deserialization is conversion of byte code to
corresponding java objects. <--
From my understanding of this paragraph, serialization may be the process by which java converts its programs to byte code for the ability to transport to different computer environments and still function correctly.
Am I correct in thinking this?

From my understanding of this paragraph, serialization may be the process by which java converts its programs to byte code for the ability to transport to different computer environments and still function correctly. Am I correct in thinking this?
No. Compiling with javac creates the bytecode that runs on the JVM. VMs (such as the JVM) INTERPRET the bytecode and use some clever and complicated just-in-time compilation (which IS machine/platform-dependent) to give you the final product. Bytecode is just a set of instructions that the JVM executes. Each bytecode opcode is one byte in length, hence the name bytecode.
Serialization, on the other hand, converts the state of a Java object into a stream of bytes. These bytes are not instructions like bytecode. The primary purpose of Java serialization is to write an object into a stream so that it can be transported over a network and rebuilt on the other side. When two different parties are involved, you need a protocol to rebuild the exact same object again, and the Java serialization API provides exactly that. Another way to leverage serialization is to use it to perform a deep copy.
Now the problem is your Network infrastructure and your Hard disk are hardware components that understand bits and bytes but not Java objects. Serialization is the translation of your Java object’s values/states to bytes to send it over network or save it. --> On other hand, Deserialization is conversion of byte code to corresponding java objects.
You can't just pass a Java object to the link layer of the network and expect it to be sent. Networks send bits and bytes across the physical medium. So Serializable lets you encode an object to binary in a standard way, pass it across the network, and then decode it at the receiving end back into an object in the exact state it was in on the sending side.
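As a small illustration of the deep-copy use mentioned above, the sketch below writes an object graph to an in-memory byte array and reads it straight back. The `DeepCopy` class and `deepCopy` method are names invented for this example, and the approach assumes everything in the graph implements `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DeepCopy {

    // Serializes the object graph to bytes, then rebuilds a fresh graph from them.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<ArrayList<String>> original = new ArrayList<>();
        original.add(new ArrayList<>(List.of("sword")));

        ArrayList<ArrayList<String>> copy = deepCopy(original);
        copy.get(0).add("shield"); // changes only the copy's inner list

        System.out.println(original.get(0).size() + " vs " + copy.get(0).size());
    }
}
```

Note that what travels through the byte array is the object's state (field values plus a description of the class), not any bytecode.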

Is there a java library which can find the binary delta between 2 object instance's serialized state

I have uploaded a Java Game Server to github. I would like to provide the following functionality to users. When the game state changes, only transmit the delta to the connected game clients, thereby reducing network load.
I have the idea below for doing it... which is pretty dumb as far as I can see.
1) Serialize object before modification
2) Serialize object after modification
3) Convert both to String and find diff (not sure how, but sure some libraries will be there to do that)
4) Transmit diff to interested clients.
How are these kind of requirements normally handled in enterprise?

It would be simpler to produce the delta first and serialize that. You don't need serialization at all to produce the delta. You could get a long way with it just using the Bean Introspector on object properties, if your objects are bean-ish enough.
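A rough sketch of that idea, assuming bean-style objects with public getters: walk the readable properties with `java.beans.Introspector` and collect the ones whose values differ. `GameState` and the `delta` method are hypothetical names for this example, not part of any library.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class BeanDelta {

    // Hypothetical bean, used only for this illustration.
    public static class GameState {
        private int level;
        private int score;

        public GameState(int level, int score) {
            this.level = level;
            this.score = score;
        }

        public int getLevel() { return level; }
        public void setLevel(int level) { this.level = level; }
        public int getScore() { return score; }
        public void setScore(int score) { this.score = score; }
    }

    // Returns property name -> new value for every property that changed.
    static Map<String, Object> delta(Object before, Object after)
            throws IntrospectionException, ReflectiveOperationException {
        Map<String, Object> changed = new TreeMap<>();
        // Stop at Object.class so the synthetic "class" property is skipped.
        BeanInfo info = Introspector.getBeanInfo(before.getClass(), Object.class);
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
            Method getter = property.getReadMethod();
            if (getter == null) {
                continue; // write-only property
            }
            Object oldValue = getter.invoke(before);
            Object newValue = getter.invoke(after);
            if (!Objects.equals(oldValue, newValue)) {
                changed.put(property.getName(), newValue);
            }
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        GameState before = new GameState(1, 10);
        GameState after = new GameState(1, 25);
        System.out.println(delta(before, after));
    }
}
```

The resulting map is what you would then serialize and send, in whatever format the clients understand. It only handles flat properties; nested objects would need recursion or a dedicated diff library.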

I would use a library like kryo (https://code.google.com/p/kryo/wiki/V1Documentation) or java-object-diff by Daniel Bechler (SQiShER). The latter is suitable for beans, that is, it requires get and set methods for each variable. kryo is more flexible and very fast (https://github.com/eishay/jvm-serializers/wiki). Search for kryo and delta to find more information. To make use of the delta functionality you have to use kryo version 1 and kryonet.

Well, as for the diff, a couple of options exist; a few are pointed out here:
How to perform string Diffs in Java?
Another way might be to serialize the object to XML and use an XML diff tool to produce the delta. XML has the advantage of offering a structure where your binary serialized instances won't. However, you should make sure to compress your messages to minimize traffic if you use this strategy.

You might want to investigate badiff for this. badiff is a pure-java binary differ with an emphasis on small diffs and parallelization. See the website at http://badiff.org/ .
Since the serialized form of an object is deterministic, your approach should work. Just serialize the object to a ByteArrayOutputStream, make your modifications, serialize it to another ByteArrayOutputStream, and use badiff to compute the diff between the two byte arrays.
This is not an appropriate solution for gaming applications where diffs might arrive out of order. In those cases, if you still want to send diffs, consider sending "key" serialized objects every now and then, which you can compute diffs from, such that keys will always be in order.
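The before/after snapshot step can be sketched with the JDK alone. The example below does not use badiff (its API is not shown here); it only captures the two serialized forms and locates the first differing byte with `Arrays.mismatch` (Java 9+), as a stand-in for a real binary differ. `GameState` and the method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class SerializedDiff {

    // Hypothetical game state, used only for this illustration.
    static class GameState implements Serializable {
        private static final long serialVersionUID = 1L;
        int score;

        GameState(int score) {
            this.score = score;
        }
    }

    // Captures the serialized form of the object as it is right now.
    static byte[] snapshot(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Index of the first differing byte, or -1 if the snapshots are identical.
    static int firstDifference(byte[] before, byte[] after) {
        return Arrays.mismatch(before, after);
    }

    public static void main(String[] args) throws IOException {
        GameState state = new GameState(10);
        byte[] before = snapshot(state);
        state.score = 11;
        byte[] after = snapshot(state);
        System.out.println("first differing byte at index " + firstDifference(before, after));
    }
}
```

A real differ would turn the two arrays into a compact patch. This sketch only shows where the inputs for such a tool come from.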

Serialization objects in Java

I want to send some objects through sockets from client to server. I can serialize them as objects or convert them to XML. Which of these methods takes less memory?

Serializing them will take A LOT less space. You can also try kryo to get an even better size for your serialized objects. It supports Deflate compression/decompression. Take note however that it's non-standard, so the other side of the socket must use the library as well to de-serialize.

Naturally serialization takes a lot less memory than converting to XML... think of all those <...> and </...> tags! Serialization takes care of all that with numbers, not ASCII characters.
Also, you can serialize to xml! http://x-stream.github.io/

Converting to XML takes up more space on the client and server than just sending them serialized, since you are basically copying the content into a new variable. Sending them serialized may not use the full capacity of a packet, but you can always just process the first packet and overwrite it with the next to save some space (At least that's how I'm currently doing it).
However, serializing it will probably make the transfer slower, since you have to send multiple packets. On the other hand, if you put everything into one XML document, you might run into size restrictions on the packets.
(I'm talking about DatagramSocket and DatagramPacket here, since these are the ones I use. I don't know how the situation is with other transfer methods.)

Between XML and Java serialization, one may use more bandwidth than the other, but the main memory used will be for your objects themselves. If you are worried about memory usage, I would make your object structure more efficient (assuming it is a real issue).
You can stream XML and Java objects as you serialize/deserialize, which is why they shouldn't use much memory.
Obviously, if you build the whole serialized payload in memory before sending it, this will be inefficient.

Java - Object Stream efficiency over network

Quick design question: I need to implement a form of communication between a client-server network in my game-engine architecture in order to send events between one another.
I had opted to create event objects and as such, I was wondering how efficient it would be to serialize these objects and pass them through an object stream over the simple socket network?
That is, how efficient is it comparatively to creating a string representation of the object, sending the string over via a char stream, and parsing the string client side?
The events will be sent every game loop, if not more; but the event object itself is just a simple wrapper for a few java primitives.
Thanks for your insight!
(tl;dr - are object streams over networks efficient?)

If performance is the primary issue, I suggest using Protocol Buffers over both your own custom serialization and Java's native serialization.
Jon Skeet gives a good explanation as well as benchmarks here: High performance serialization: Java vs Google Protocol Buffers vs ...?
If you can't use PBs, I suspect Java's native serialization will be more optimized than manually serializing/deserializing from a String. Whether or not this difference is significant is likely dependent on how complex of an object you're serializing. As always, you should benchmark to confirm your predictions.
The fact that you're sending things over a network shouldn't matter.

Edit: For time-critical applications Protocol Buffers appear a better choice. However, it appears to me that there is a significant increase in development time. Effectively you'll have to code every exchange message twice: Once as a .proto file which is compiled and spits out java wrappers, and once as a POJO which makes something useful out of these wrappers. But that's guessing from the documentation.
End of Edit
Abstract: Go for the Object Stream
So, what is less? The time it takes to code the object, send the byte stream, and decode it - all by hand - or the time it takes to code the object, send the byte stream, and decode it - all by the trusty and tried serialization mechanism?
You should make sure the objects you send are as small as possible. This can be achieved with enum values, lookup tables and the like, where possible. It might shave a few bytes off each transmission. The serialization algorithm appears very speedy to me, and anything you would code would do exactly the same. When you reinvent the wheel, more often than not you end up with triangles.
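If you want to check the size side of this yourself, here is a small measurement sketch. It writes the same event once through an `ObjectOutputStream` and once field by field through a `DataOutputStream`. `MoveEvent` and the method names are hypothetical, and the numbers only cover payload size, not CPU time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class EventEncoding {

    // Hypothetical event: a simple wrapper around a few primitives.
    static class MoveEvent implements Serializable {
        private static final long serialVersionUID = 1L;
        final int entityId;
        final float x;
        final float y;

        MoveEvent(int entityId, float x, float y) {
            this.entityId = entityId;
            this.x = x;
            this.y = y;
        }
    }

    // Java serialization: stream header and class description plus field data.
    static byte[] viaObjectStream(MoveEvent event) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(event);
        }
        return buffer.toByteArray();
    }

    // Hand-rolled binary layout: int + float + float = 12 bytes.
    static byte[] viaDataStream(MoveEvent event) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(event.entityId);
            out.writeFloat(event.x);
            out.writeFloat(event.y);
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static MoveEvent readDataStream(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new MoveEvent(in.readInt(), in.readFloat(), in.readFloat());
        }
    }

    public static void main(String[] args) throws IOException {
        MoveEvent event = new MoveEvent(7, 1.5f, -2.0f);
        System.out.println("ObjectOutputStream bytes: " + viaObjectStream(event).length);
        System.out.println("DataOutputStream bytes:   " + viaDataStream(event).length);
    }
}
```

This measures a fresh stream per event, which is the worst case for the object stream. On one long-lived `ObjectOutputStream` the class description is written only once, so the per-event overhead shrinks after the first event. As always, benchmark with your own events before deciding.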
