I want to send some objects through sockets from a client to a server. I can serialize them as objects or convert them to XML. Which of these methods takes less memory?
Serializing them will take A LOT less space. You can also try Kryo to get an even better size for your serialized objects; it supports Deflate compression/decompression. Note, however, that it's non-standard, so the other side of the socket must use the library as well to deserialize.
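For instance, here is a minimal sketch of sending one object with Kryo over a socket; the GameEvent class, and the idea of creating a fresh Kryo per call, are just placeholders for the example, and both sides must register classes in the same order:

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.IOException;
import java.net.Socket;

public class KryoSocketSketch {
    // Made-up payload type; replace with your own classes.
    public static class GameEvent {
        public int id;
        public String name;
    }

    static Kryo newKryo() {
        Kryo kryo = new Kryo();
        kryo.register(GameEvent.class); // registration order must match on both ends
        return kryo;
    }

    static void send(Socket socket, GameEvent event) throws IOException {
        Output output = new Output(socket.getOutputStream());
        newKryo().writeObject(output, event); // compact binary, no field names or XML tags
        output.flush();
    }

    static GameEvent receive(Socket socket) throws IOException {
        Input input = new Input(socket.getInputStream());
        return newKryo().readObject(input, GameEvent.class);
    }
}
```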
Naturally, serialization takes a lot less memory than converting to XML... think of all those <...> and </...> tags! Serialization takes care of all that with numbers, not ASCII characters.
Also, you can serialize to XML! http://x-stream.github.io/
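A quick sketch of what that looks like with XStream 1.4+; the Point class is invented for the example:

```java
import com.thoughtworks.xstream.XStream;

public class XStreamSketch {
    // Example type to round-trip; any POJO works.
    public static class Point {
        int x, y;
        Point() { }               // XStream's pure-Java mode needs a default constructor
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) {
        XStream xstream = new XStream();
        xstream.alias("point", Point.class);                 // use <point> instead of the full class name
        xstream.allowTypes(new Class[] { Point.class });     // newer XStream versions require an allow-list

        String xml = xstream.toXML(new Point(3, 4));
        System.out.println(xml);                             // roughly: <point><x>3</x><y>4</y></point>
        Point back = (Point) xstream.fromXML(xml);
        System.out.println(back.x + "," + back.y);           // 3,4
    }
}
```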
Converting to XML takes up more space on the client and server than just sending the objects serialized, since you are basically copying the content into a new variable. Sending them serialized may not use the full capacity of a packet, but you can always process the first packet and overwrite it with the next to save some space (at least that's how I'm currently doing it).
However, serializing will probably make the transfer slower, since you have to send multiple packets. On the other hand, if you put everything into one XML document, you might run into size restrictions on the packets.
(I'm talking about DatagramSocket and DatagramPacket here, since those are the ones I use. I don't know how the situation is with other transfer methods.)
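For what it's worth, here is a minimal sketch of the "process one packet, then overwrite it with the next" pattern described above; the port and buffer size are arbitrary choices:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class DatagramReuseSketch {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(9000)) {
            byte[] buffer = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            while (true) {
                socket.receive(packet);            // overwrites the same buffer each time
                process(buffer, packet.getLength());
                packet.setLength(buffer.length);   // reset the length before the next receive
            }
        }
    }

    static void process(byte[] data, int length) {
        // handle the current packet's bytes before the next receive() clobbers them
    }
}
```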
XML vs. Java serialization: one may use more bandwidth, but the main memory used will be your objects themselves. If you are worried about memory use, I would make your object structure more efficient (assuming it is a real issue).
You can stream XML and Java objects as you serialize/deserialize, which is why neither should use much memory.
Obviously, if you build all of your serialized data in memory before sending it, this will be inefficient.
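As an illustration, here is a sketch of streaming objects straight onto the socket instead of building the whole byte array first, so memory use stays roughly the size of one object; the host, port, and payload are placeholders:

```java
import java.io.ObjectOutputStream;
import java.net.Socket;

public class StreamingSendSketch {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 9000);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            for (int i = 0; i < 1000; i++) {
                out.writeObject(new int[] { i, i * 2 }); // written directly to the stream
                out.reset(); // drop back-references so the stream doesn't retain every object sent
            }
        }
    }
}
```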
I was just introduced to the concept of serialisation in Java, and while I 'get' the fundamentals, I can't help but feel like it's a bit of overkill. My logic is that I have pointers to the objects and I know how many bytes they take up in memory, so why can't I just write these bytes to some file, along with some extra bytes to indicate the type? Then couldn't I just read these bytes back and restore my original object?
The amount of detail my book goes into on serialisation gives me a good indication that I'm not really understanding its importance, and that there is probably something more subtle than just writing out all the bytes exactly as they are. Any help is greatly appreciated! (I have some background in C++, if that helps.)
Why can't I just write these bytes to some file, along with some extra bytes to indicate the type? Then couldn't I just read these bytes back and restore my original object?
How could anyone ever read them back in? Say I'm writing code that's supposed to read in your file. Please tell me what the third byte means so that I can decode it properly.
What if the internal representation of the object contains pointers to other objects that might be in different memory locations the next time the program runs? For example, it is quite common to manage identical strings by having internal references to the same internal string object. How will writing that reference to a file be sensible given that the internal string object may not exist in the next run?
To write data to a file, you need to write it out in some specific format that actually contains all the information you need to be able to read back in. What happens to work internally for this program at this time just won't do as there's no guarantee another program at another time can make sense of it.
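To make the pointer problem concrete, here is a small sketch showing how Java serialization deals with it: shared references are recorded explicitly in the stream rather than as raw memory addresses, so the sharing survives a round trip:

```java
import java.io.*;

public class SharedReferenceSketch {
    static class Pair implements Serializable {
        StringBuilder a, b;
    }

    public static void main(String[] args) throws Exception {
        Pair p = new Pair();
        p.a = new StringBuilder("shared");
        p.b = p.a; // one object, two references

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p); // the stream records "same object as before", not an address
        }
        Pair copy;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (Pair) in.readObject();
        }
        System.out.println(copy.a == copy.b); // true: sharing is preserved
    }
}
```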
What you suggest works provided (see the sketch after this list):
the order and type of the fields doesn't change. Note this is not set at compile time.
the byte order doesn't change.
you don't have any references, e.g. no String, enum, List or Map.
the name and package of the type doesn't change.
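Here is a rough illustration of such a fixed-layout scheme; the Price type and its fields are invented for the example. Every one of the constraints above is baked into the code by hand:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FixedLayoutSketch {
    static class Price {
        long instrumentId;
        double bid, ask;
    }

    static void write(ByteBuffer buf, Price p) {
        buf.order(ByteOrder.LITTLE_ENDIAN); // byte order must match on both sides
        buf.putLong(p.instrumentId);        // field order and types are fixed by hand
        buf.putDouble(p.bid);
        buf.putDouble(p.ask);
    }

    static Price read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        Price p = new Price();
        p.instrumentId = buf.getLong();     // must read in exactly the order written
        p.bid = buf.getDouble();
        p.ask = buf.getDouble();
        return p;
    }
}
```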
At Chronicle, we use a form of serialization which supports this; it's much faster but very limiting. You have to be very aware of those limitations and have a problem which suits them. We also have a form of serialization which has none of these constraints, but it is slower.
The purpose of Java Serialization is to support arbitrary object graphs even if data is exchanged between systems which might arrange the data differently.
Serialization is the process of converting an object stored in memory into a stream of bytes to be transferred over a network, stored in a DB, etc.
But isn't the object already stored in memory as bits and bytes? Why do we need another process to convert the object stored as bytes into another byte representation? Can't we just transmit the object directly over the network?
I think I may be missing something in the way the objects are stored in memory, or the way the object fields are accessed.
Can someone please help me in clearing up this confusion?
Different systems don't store things in memory in the same way. The obvious example is endianness.
Serialization defines a way by which systems using different in-memory representations can communicate.
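Endianness is easy to demonstrate: the same int produces different byte sequences depending on the byte order chosen:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessSketch {
    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
        byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        // prints: 0a 0b 0c 0d
        System.out.printf("big-endian:    %02x %02x %02x %02x%n", big[0], big[1], big[2], big[3]);
        // prints: 0d 0c 0b 0a
        System.out.printf("little-endian: %02x %02x %02x %02x%n", little[0], little[1], little[2], little[3]);
    }
}
```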
Another important fact is that the requirements on in-memory and serialized data may be different: when in-memory, fast read (and maybe write) access is desirable; when serialized, small size is desirable. It is easier to create two different formats to fit these two use cases than it is to create one format which is good for both.
An example which springs to mind is LinkedHashMap: this basically stores two versions of the mapping when in memory (one to capture insertion order; one as a traditional hash map). However, you don't need both of these representations to reconstruct the same map from a serialized form: you only need the insertion order of key/value pairs. As such, the serialized form does not store the same data as the in-memory form.
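A sketch of that idea: replaying only the key/value pairs in iteration order is enough to rebuild an equal LinkedHashMap, because the hash table and the insertion-order list are both reconstructed by the puts themselves:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InsertionOrderSketch {
    // Rebuilds a LinkedHashMap from nothing but entries in insertion order,
    // which is all a serialized form would need to carry.
    static <K, V> LinkedHashMap<K, V> rebuild(Iterable<Map.Entry<K, V>> entriesInOrder) {
        LinkedHashMap<K, V> map = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entriesInOrder) {
            map.put(e.getKey(), e.getValue()); // each put recreates both in-memory structures
        }
        return map;
    }
}
```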
Serialization turns an object's pre-existing bytes in memory into a universal form.
This is done because different systems allocate memory in different ways. Thus, we cannot ensure that an object saved directly from memory on one machine can be loaded back properly on another, different machine.
Maybe you can find more information on this page of the Oracle docs.
An explanation of object serialization from the book Thinking in Java:
When you create an object, it exists for as long as you need it, but under no circumstances does it exist when the program terminates. While this makes sense at first, there are situations in which it would be incredibly useful if an object could exist and hold its information even while the program wasn’t running. Then, the next time you started the program, the object would be there and it would have the same information it had the previous time the program was running. Of course, you can get a similar effect by writing the information to a file or to a database, but in the spirit of making everything an object, it would be quite convenient to declare an object to be "persistent," and have all the details taken care of for you.
Java’s object serialization allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be fully restored to regenerate the original object. This is even true across a network, which means that the serialization mechanism automatically compensates for differences in operating systems. That is, you can create an object on a Windows machine, serialize it, and send it across the network to a Unix machine, where it will be correctly reconstructed. You don’t have to worry about the data representations on the different machines, the byte ordering, or any other details.
Hope this helps you.
Let's go with that mindset: we take the object as is, and we send it as a byte array over the network. Another socket/HTTP handler receives that byte array.
Now, two things come to mind:
how many bytes to send?
what are these bytes? what class do these bytes represent?
You will have to provide this data as well, so for this action alone we need two extra steps.
Now, in C# and Java, as opposed to C++, the objects are scattered throughout the heap, and each object holds references to the objects it contains, so now we have another requirement:
recursively "catch" all the inner objects and pack them into the byte array
Now we have a packed byte array which represents some object hierarchy, and we need to tell the other side how to unpack this byte array back into the object plus the objects it holds, so:
send information on how to unpack that byte array into an object hierarchy
Some things an object has cannot be sent over the net, such as functions. So now we have yet another step:
strip away things that cannot be serialized, like functions
This process goes on and on; for every new solution you will find many problems. Serialization is the process of taking that byte array you are talking about and making it something that can be handled in other environments, like networks and files.
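Java serialization handles each of those steps for you. A small sketch, with an invented Node type: the stream records class metadata and lengths, follows references recursively, and transient fields (the parts that "cannot be sent", like behavior) are simply skipped:

```java
import java.io.*;

public class GraphSerializationSketch {
    static class Node implements Serializable {
        String name;
        Node next;                  // references are followed recursively
        transient Runnable onVisit; // behavior is not serialized
        Node(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        a.next = new Node("b");
        a.onVisit = () -> System.out.println("visited");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a); // writes a, b, class names, and lengths in one go
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Node copy = (Node) in.readObject();
            System.out.println(copy.next.name);        // "b": the graph came along
            System.out.println(copy.onVisit == null);  // true: stripped away
        }
    }
}
```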
I have uploaded a Java game server to GitHub. I would like to provide the following functionality to users: when the game state changes, only transmit the delta to the connected game clients, thereby reducing network load.
I have the idea below for doing it, which is pretty dumb as far as I can see:
1) Serialize object before modification
2) Serialize object after modification
3) Convert both to String and find the diff (not sure how, but I'm sure there are libraries for that)
4) Transmit diff to interested clients.
How are these kinds of requirements normally handled in the enterprise?
It would be simpler to produce the delta first and serialize that. You don't need serialization at all to produce the delta; you could get a long way just using the Beans Introspector on object properties, if your objects are bean-ish enough.
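A rough sketch of that idea, assuming bean-style getters on your state objects; how you serialize and transmit the resulting map is up to you:

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BeanDiffSketch {
    // Compares each readable property of the old and new state and
    // collects only the ones that changed.
    static Map<String, Object> diff(Object before, Object after) throws Exception {
        Map<String, Object> delta = new HashMap<>();
        BeanInfo info = Introspector.getBeanInfo(before.getClass(), Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() == null) continue;        // skip write-only properties
            Object oldValue = pd.getReadMethod().invoke(before);
            Object newValue = pd.getReadMethod().invoke(after);
            if (!Objects.equals(oldValue, newValue)) {
                delta.put(pd.getName(), newValue);           // only changed properties go out
            }
        }
        return delta;
    }
}
```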
I would use a library like Kryo (https://code.google.com/p/kryo/wiki/V1Documentation) or Sqisher java-object-diff from Daniel Bechler. The latter is suitable for beans, that is, it requires get and set methods for each variable. Kryo is more flexible and very fast (https://github.com/eishay/jvm-serializers/wiki). Search for Kryo and delta to find more information. To make use of the delta functionality you have to use Kryo version 1 and KryoNet.
Well, as for the diff, a couple of options exist; a few are pointed out here:
How to perform string Diffs in Java?
Another way might be to serialize the object to XML and use an XML diff tool to produce the delta. XML has the advantage of offering structure where your binary serialized instances won't. However, you should make sure to compress your messages to minimize traffic if you use this strategy.
You might want to investigate badiff for this. badiff is a pure-Java binary differ with an emphasis on small diffs and parallelization. See the website at http://badiff.org/.
Since the serialized form of an object is deterministic, your approach should work. Just serialize the object to a ByteArrayOutputStream, make your modifications, serialize it to another ByteArrayOutputStream, and use badiff to compute the diff.
This is not an appropriate solution for gaming applications where diffs might arrive out of order. In those cases, if you still want to send diffs, consider sending "key" serialized objects every now and then, which you can compute diffs from, such that keys will always be in order.
Quick design question: I need to implement a form of communication between a client-server network in my game-engine architecture in order to send events between one another.
I opted to create event objects, and as such I was wondering how efficient it would be to serialize these objects and pass them through an object stream over the simple socket network.
That is, how efficient is it compared to creating a string representation of the object, sending the string over via a char stream, and parsing the string client-side?
The events will be sent every game loop, if not more; but the event object itself is just a simple wrapper for a few java primitives.
Thanks for your insight!
(tl;dr - are object streams over networks efficient?)
If performance is the primary issue, I suggest using Protocol Buffers over both your own custom serialization and Java's native serialization.
Jon Skeet gives a good explanation as well as benchmarks here: High performance serialization: Java vs Google Protocol Buffers vs ...?
If you can't use PBs, I suspect Java's native serialization will be more optimized than manually serializing/deserializing from a String. Whether or not this difference is significant likely depends on how complex the objects you're serializing are. As always, you should benchmark to confirm your predictions.
The fact that you're sending things over a network shouldn't matter.
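If you do go the Protocol Buffers route, the delimited-stream helpers fit a socket carrying many small events. Here is a sketch using the stock Int32Value wrapper message and assuming the protobuf-java 3.x runtime; a real game would define its own message types in a .proto file:

```java
import com.google.protobuf.Int32Value;
import java.io.*;

public class ProtobufDelimitedSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // writeDelimitedTo length-prefixes each message, so many can share one stream
        Int32Value.of(42).writeDelimitedTo(bytes);
        Int32Value.of(7).writeDelimitedTo(bytes);

        InputStream in = new ByteArrayInputStream(bytes.toByteArray());
        System.out.println(Int32Value.parseDelimitedFrom(in).getValue()); // 42
        System.out.println(Int32Value.parseDelimitedFrom(in).getValue()); // 7
    }
}
```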
Edit: For time-critical applications, Protocol Buffers appear to be the better choice. However, it appears to me that there is a significant increase in development time: effectively you'll have to code every exchange message twice, once as a .proto file which is compiled and spits out Java wrappers, and once as a POJO which makes something useful out of these wrappers. But that's guessing from the documentation.
End of Edit
Abstract: Go for the Object Stream
So, which takes less time: coding the object, sending the byte stream, and decoding it - all by hand - or coding the object, sending the byte stream, and decoding it - all by the tried and trusted serialization mechanism?
You should make sure the objects you send are as small as possible. This can be achieved with enum values, lookup tables and the like, where possible; it might shave a few bytes off each transmission. The serialization algorithm appears very speedy to me, and anything you would code by hand would do exactly the same work. When you reinvent the wheel, more often than not you end up with triangles.
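For concreteness, here is a minimal sketch of the object-stream approach with a small event wrapping a few primitives; the host, port, and event fields are placeholders:

```java
import java.io.*;
import java.net.Socket;

public class EventStreamSketch {
    // Small Serializable wrapper around a few primitives, as described in the question.
    static class MoveEvent implements Serializable {
        final short entityId;
        final float x, y;
        MoveEvent(short entityId, float x, float y) {
            this.entityId = entityId; this.x = x; this.y = y;
        }
    }

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 7777);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(new MoveEvent((short) 1, 10.5f, 3.25f));
            out.reset(); // keep the stream from caching every event sent in the game loop
        }
    }
}
```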
I'm trying to design a lightweight way to store persistent data in Java. I've already got a very efficient way to serialize POJOs to DataOutputStreams (and back), but I'm trying to think of a good way to ensure that changes to the data in the POJOs gets serialized when necessary.
This is for a client-side app where I'm trying to keep the size of the eventual distributable as low as possible, so I'm reluctant to use anything that would pull in heavyweight dependencies. Right now my distributable is almost 10MB, and I don't want it to get much bigger.
I've considered DB4O, but it's too heavy; I need something light. Really, it's probably more a design pattern I need than a library.
Any ideas?
The 'lightest weight' persistence option will almost surely be simply marking some classes Serializable and reading/writing from some fixed location. Are you trying to accomplish something more complex than this? If so, it's time to bundle hsqldb and use an ORM.
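That bare-bones option looks roughly like this; the Settings class and the file name are made up for the example:

```java
import java.io.*;

public class FilePersistenceSketch {
    // Example state to persist; mark it Serializable and write it whole.
    static class Settings implements Serializable {
        String username;
        int volume;
    }

    static final File STORE = new File("app-settings.ser"); // the "fixed location"

    static void save(Settings s) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(STORE))) {
            out.writeObject(s);
        }
    }

    static Settings load() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(STORE))) {
            return (Settings) in.readObject();
        }
    }
}
```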
If your users are tech savvy, or you're just worried about initial payload, there are libraries which can pull dependencies at runtime, such as Grape.
If you already have a compact data output format in bytes (which I assume you have if you can persist efficiently to a DataOutputStream), then an efficient and general technique is to use run-length encoding on the difference between the previous byte-array output and the new one.
Points to note:
If the object has not changed, the difference in byte arrays will be all zeros and hence will compress very well.
The first time you serialize the object, treat the previous output as all zeros, so that you communicate a complete set of data.
You probably want to be a bit clever when the object has variable-sized substructures.
You can also try zipping the difference rather than RLE; that might be more efficient in some cases where you have a large object graph with a lot of changes.
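A rough sketch of the XOR-plus-RLE idea, under the simplifying assumption that both serialized forms have the same length; variable-sized substructures need more care, as noted above:

```java
import java.io.ByteArrayOutputStream;

public class DeltaRleSketch {
    // XORs the previous and current serialized forms, then run-length-encodes
    // the result. Unchanged regions XOR to zero and collapse to almost nothing.
    static byte[] delta(byte[] previous, byte[] current) {
        byte[] diff = new byte[current.length];
        for (int i = 0; i < current.length; i++) {
            byte prev = i < previous.length ? previous[i] : 0; // first time: all zeros
            diff[i] = (byte) (prev ^ current[i]);
        }
        return runLengthEncode(diff);
    }

    static byte[] runLengthEncode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            byte value = data[i];
            int run = 1;
            while (i + run < data.length && data[i + run] == value && run < 255) run++;
            out.write(run);   // run length (1..255)
            out.write(value); // the repeated byte
            i += run;
        }
        return out.toByteArray();
    }
}
```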