I'm a newbie at Java and I'm confused about serialization and deserialization, so I'm not sure which approach I should use.
I've been looking around and found Boon, Jackson and GSON for JSON serialization (I'm currently using GSON, but some articles use Jackson or Boon), as well as serializing objects into a byte array or a binary object.
Which one is faster, and which one should I choose?
I'm doing this for a simple application: saving the current state, documents and a few other things.
Thanks in advance :)
Serializing data means converting it to a sequence of bytes.
This sequence can be interpreted as a sequence of readable characters, as happens with JSON, XML, YAML and so on.
The same sequence can also be binary data that is not human readable.
There are pros and cons to each serialization approach; a short comparison sketch follows the lists below.
Pros of human readable:
Many existing libraries to serialize and deserialize the data
The data can be easily debugged
Many existing libraries and APIs use this solution
Cons:
More bytes needed
The serialization and deserialization process can be difficult and time consuming
Pros and cons of binary data:
Pros:
Faster serialization/deserialization process
Less data to transfer over the network
Cons:
Difficult to read the data that is passed over the network
More than one possible representation of the same data (big-endian or little-endian byte order, for example)
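To make the trade-off concrete, here is a minimal sketch (not a benchmark) that serializes the same object both ways: as JSON text with GSON, which the asker already uses, and as binary with Java's built-in ObjectOutputStream. The AppState class and its fields are made up for illustration, and GSON is assumed to be on the classpath.

import com.google.gson.Gson;

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SizeComparison {

    // Hypothetical application state; replace with your own classes.
    static class AppState implements Serializable {
        private static final long serialVersionUID = 1L;
        String documentName = "notes.txt";
        int cursorPosition = 42;
        boolean dirty = true;
    }

    public static void main(String[] args) throws Exception {
        AppState state = new AppState();

        // Human-readable: JSON text produced by GSON.
        String json = new Gson().toJson(state);
        byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);

        // Binary: Java's built-in serialization.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(state);
        }
        byte[] binaryBytes = bos.toByteArray();

        System.out.println("JSON (" + jsonBytes.length + " bytes): " + json);
        System.out.println("Java serialization: " + binaryBytes.length + " bytes");
    }
}

Depending on the class, either form can win: Java's binary stream embeds class and field names as metadata, so for a tiny object the JSON text can actually be smaller. Print real numbers for your own classes before choosing.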
Related
I am sending a big array of data. What is more optimized: to concatenate the data with a symbol, or to send it as a JSONArray?
Data is being sent from Android client to Apache PHP.
Example of concatenated data:
data1_data2_data3_data4
Example of a JSONArray:
{ "Data": [data1, data2, data3, data4] }
It completely depends on your use case. From your example, here are some thoughts:
In terms of bytes sent, the concatenation is slightly better, as JSON adds some metadata and symbols.
In terms of ease of use, JSON clearly wins, as there are libraries and standards. If you just have plain data without any _, concatenated data is OK. But what happens if one of your values contains a _? You will need to escape it and keep track of your custom format all over your code... (And that's just the tip of the iceberg.)
In general, my advice is: use standard data serialization schemes, always. In case the size of the serialized data is a concern, have a look at binary standards (for example protobuf).
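To illustrate the escaping problem mentioned above, here is a small sketch using org.json's JSONArray (bundled with Android); the values are made up, and one of them deliberately contains the delimiter character.

import org.json.JSONArray;

public class PayloadExample {
    public static void main(String[] args) {
        // One of the values happens to contain the delimiter character.
        String[] values = {"data1", "data_2", "data3", "data4"};

        // Naive concatenation: the receiver can no longer split this reliably.
        String concatenated = String.join("_", values);
        System.out.println(concatenated); // data1_data_2_data3_data4

        // JSONArray: quoting and escaping are handled for you.
        JSONArray array = new JSONArray();
        for (String v : values) {
            array.put(v);
        }
        System.out.println(array.toString()); // ["data1","data_2","data3","data4"]
    }
}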
It doesn't really matter: if you're asking about optimizing the transfer size in bytes, the difference is minimal.
However, the concatenated data example that you gave will require more processing on the recipient's side, as your script will have to split the sent data on the symbol and parse it into a usable object.
So it's best to stick with the usual JSON object, as I don't think you will gain any optimization the other way.
It depends on what you mean by optimization.
Realistically speaking, even if you were to parse it with a custom-made function/class vs. some built-in function (like json_decode in PHP), the time difference would be minimal or irrelevant.
If you can stick to the standard, then do it. Send it as proper JSON, not some weirdly concatenated string.
The advantages outweigh anything else.
Concatenating the data would be more compact, but you want to make sure your data does not contain "_", or else handle the delimiter properly.
A number of very useful answers posted in this thread helped clear up my questions around serialization. From the responses I understand that it is just a means to persist and re-create data in a JVM, so serialization is used for recreating a Java object from a byte stream.
However, data could be transferred by means of XML / JSON or any other data format. So could this also be called serialization? I assume that the difference is that the relevant Java libraries would re-create the object using the byte stream / XML data / JSON data etc., based on the format of the data passed.
In the case of communication between two Java-based systems, I assume a byte stream would be useful, whereas in the case of communication between two systems built on different technologies, other standard data formats would be used. In the case of EJBs / Java RMI, I assume the objects that are transferred between client and server must be serialized, since Java would use its standard serialization APIs to deserialize them. Are all of the points listed above correct?
Wikipedia sums it up well:
In computer science, in the context of data storage and transmission,
serialization is the process of translating data structures or
object state into a format that can be stored
So, your first question:
However, data could be transferred by means of XML / JSON or any other data format. So could this also be called serialization?
Yes, absolutely. Any format you like, as long as it's able to be stored.
Question two:
In the case of communication between two Java-based systems, I assume a byte stream would be useful, whereas in the case of communication between two systems built on different technologies, other standard data formats would be used.
Actually, Java's built-in serialization tends to be used only when it's largely invisible to the user and when speed doesn't matter. For example, some distributed products might send objects from one node to another using Java serialization. For any kind of web service, even from one JVM-backed service to another, some friendly format like JSON or XML is far more common. For any product where speed is important or the payload size must be as small as possible, they wouldn't use Java's serialization but more likely some proprietary binary format.
Protocols like protobuf, avro and thrift were designed to try and give you the best of both worlds. They're somewhat popular but far from universal.
You might also hear the term marshalling, as in a marshaller or marshalling an object. They basically mean the same thing, although in Java land it's more common to hear marshalling when you're talking about a non-binary format, and serialization when it's binary.
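As a rough illustration of the two terms, the sketch below serializes a made-up User object with Java's built-in binary mechanism and then marshals the same object to XML with JAXB. JAXB ships with Java 8 but needs a separate dependency on Java 11+, so treat this as a sketch rather than a drop-in snippet.

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StringWriter;

public class MarshalVsSerialize {

    // Hypothetical example class.
    @XmlRootElement
    public static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        public String name = "alice";
        public int age = 30;
    }

    public static void main(String[] args) throws Exception {
        User user = new User();

        // "Serialization" in the Java sense: an opaque binary stream.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(user);
        }
        System.out.println("Binary form: " + bos.size() + " bytes");

        // "Marshalling": a human-readable representation, here XML via JAXB.
        StringWriter xml = new StringWriter();
        Marshaller marshaller = JAXBContext.newInstance(User.class).createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        marshaller.marshal(user, xml);
        System.out.println(xml);
    }
}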
I want to store some small chunks of data and don't want to go for any database. We have two choices, XML and JSON. Can anyone please suggest which one I should select from a performance and architecture point of view?
1. Which is better to use for storing data, XML or JSON?
2. What are the pros and cons of JSON and XML?
Any help would be greatly appreciated.
EDIT
We are not using any web service; our application is a standalone app. We want to use XML or JSON for storing some local data which will be used in the application. The data would be things like question-and-answer details, static user details, etc.
Please keep in mind that JSON is only smaller if the tags are longer than the data.
Probably the main difference is the fact that XML is a lot easier to read, while JSON has a smaller footprint.
XML Pros
Easier to read
Used a lot more than JSON
One of the main industry standards
Versioning possible
Namespace support
Multiple elements with the same name
Validation
XML Cons
Takes up more space
Increased bandwidth because of the size
JSON Pros
Doesn't take up a lot of space
Uses less bandwidth because of its size (footprint)
Rising in the ranks as one of the main industry standards
JSON Cons
Harder to read
Versioning breaks client/data
If you are sending more data than tags, then they are about the same, and you would be better off using XML for its fast parsing speed. I would also argue that people expect slow mobile load times and fast app running times, so try not to slow the app down by using a format that is slower to parse.
Finally, I say JSON. The small footprint will speed up transactions between your app and the web services you're sending data to and receiving data from.
JSON is the best choice for mobile application development, because parsing JSON is a very lightweight operation compared to XML, while XML parsing tends to lead to complex memory problems. JSON can also be easily built/parsed with the GSON library, which is again very lightweight.
XML parsing will be a headache if you have different versions of parsers to deal with, so go for JSON.
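For example, a minimal GSON round trip might look like the sketch below. The Question class and the JSON string are invented for illustration, and GSON is assumed to be on the classpath.

import com.google.gson.Gson;

public class GsonParseExample {

    // Hypothetical class mirroring the kind of local data mentioned in the question.
    static class Question {
        String text;
        String answer;
    }

    public static void main(String[] args) {
        String json = "{\"text\":\"What is the capital of France?\",\"answer\":\"Paris\"}";

        Gson gson = new Gson();

        // Parse JSON text into a POJO...
        Question q = gson.fromJson(json, Question.class);
        System.out.println(q.text + " -> " + q.answer);

        // ...and build JSON text back from the POJO.
        System.out.println(gson.toJson(q));
    }
}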
Extensible Markup Language (XML) is a text format derived from Standard Generalized Markup Language (SGML).
Most of the excitement around XML is around a new role as an interchangeable data serialization format. XML provides two enormous advantages as a data representation language:
It is text-based.
It is position-independent.
These together encouraged a higher level of application-independence than other data-interchange formats. The fact that XML was already a W3C standard meant that there wasn't much left to fight about (or so it seemed).
Unfortunately, XML is not well suited to data-interchange, much as a wrench is not well-suited to driving nails. It carries a lot of baggage, and it doesn't match the data model of most programming languages. When most programmers saw XML for the first time, they were shocked at how ugly and inefficient it was. It turns out that that first reaction was the correct one. There is another text notation that has all of the advantages of XML, but is much better suited to data-interchange. That notation is JavaScript Object Notation (JSON).
The most informed opinions on XML (see for example xmlsuck.org) suggest that XML has big problems as a data-interchange format, but the disadvantages are compensated for by the benefits of interoperability and openness.
JSON promises the same benefits of interoperability and openness, but without the disadvantages.
The rest of the comparison is here.
I want to send some objects through sockets from client to server. I can serialize them as objects or convert them to XML. Which of these methods takes less memory?
Serializing them will take A LOT less space. You can also try Kryo to get an even better size for your serialized objects. It supports Deflate compression/decompression. Take note, however, that it's non-standard, so the other side of the socket must use the library as well to deserialize.
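As a rough sketch of that suggestion, written against the Kryo 5 API (which requires class registration by default): the GameState class is made up, and Deflate compression is added here with the JDK's java.util.zip streams rather than Kryo's own compression serializers.

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class KryoExample {

    // Hypothetical object to send over the socket.
    public static class GameState {
        public String player = "alice";
        public int score = 1200;
    }

    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        kryo.register(GameState.class); // Kryo 5 requires registration by default

        // Serialize and compress into a byte array (this is what you would
        // write to the socket's OutputStream).
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Output output = new Output(new DeflaterOutputStream(bos))) {
            kryo.writeObject(output, new GameState());
        }
        byte[] bytes = bos.toByteArray();
        System.out.println("Compressed size: " + bytes.length + " bytes");

        // The receiving side must use Kryo as well to read it back.
        try (Input input = new Input(new InflaterInputStream(new ByteArrayInputStream(bytes)))) {
            GameState copy = kryo.readObject(input, GameState.class);
            System.out.println(copy.player + ": " + copy.score);
        }
    }
}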
Naturally serialization takes a lot less memory than converting to XML... think of all those <...> and </...> tags! Serialization takes care of all that with numbers, not ASCII characters.
Also, you can serialize to XML! http://x-stream.github.io/
Converting to XML takes up more space on the client and server than just sending the objects serialized, since you are basically copying the content into a new variable. Sending them serialized may not use the full capacity of a packet, but you can always just process the first packet and overwrite it with the next to save some space (at least that's how I'm currently doing it).
However, serializing will probably make the transfer slower, since you have to send multiple packets. On the other hand, if you put everything into one XML document, you might run into size restrictions on the packets.
(I'm talking about DatagramSocket and DatagramPacket here, since these are the ones I use. I don't know how the situation is with other transfer methods.)
XML vs. Java serialization: one may use more bandwidth, but the main memory used will be your objects themselves. If you are worried about memory usage, I would make your object structure more efficient (assuming it is a real issue).
You can stream XML and Java objects as you serialize/deserialize, which is why they shouldn't use much memory.
Obviously, if you build your serialized data before sending it, this will be inefficient.
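Here is a small sketch of that streaming point, assuming a plain TCP Socket and Java serialization; the Record class and send() method are made-up names. The objects are written straight to the socket's OutputStream instead of being materialized into one big byte array first.

import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.Socket;
import java.util.List;

public class StreamingSender {

    // Hypothetical payload class.
    public static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        public String id;
        public double value;

        public Record(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Streams objects straight onto the socket instead of first building
    // the whole serialized form in memory.
    public static void send(Socket socket, List<Record> records) throws Exception {
        // Note: closing the ObjectOutputStream also closes the socket's stream.
        try (ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            for (Record r : records) {
                out.writeObject(r);
                out.reset(); // drop back-references so memory doesn't grow with the stream
            }
        }
    }
}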
I'm trying to design a lightweight way to store persistent data in Java. I've already got a very efficient way to serialize POJOs to DataOutputStreams (and back), but I'm trying to think of a good way to ensure that changes to the data in the POJOs get serialized when necessary.
This is for a client-side app where I'm trying to keep the size of the eventual distributable as low as possible, so I'm reluctant to use anything that would pull in heavyweight dependencies. Right now my distributable is almost 10MB, and I don't want it to get much bigger.
I've considered DB4O, but it's too heavy; I need something light. Really, it's probably more a design pattern I need, rather than a library.
Any ideas?
The 'lightest weight' persistence option will almost surely be simply marking some classes Serializable and reading/writing from some fixed location. Are you trying to accomplish something more complex than this? If so, it's time to bundle hsqldb and use an ORM.
If your users are tech savvy, or you're just worried about initial payload, there are libraries which can pull dependencies at runtime, such as Grape.
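A minimal sketch of the "mark it Serializable and read/write from a fixed location" approach from the first paragraph above; the Settings class and the file path are made up.

import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SimpleStore {

    // Hypothetical state class; any Serializable POJO works.
    public static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        public String theme = "dark";
        public int recentFileCount = 5;
    }

    // Fixed location in the user's home directory (made up for this sketch).
    private static final Path FILE =
            Paths.get(System.getProperty("user.home"), ".myapp", "settings.bin");

    public static void save(Settings settings) throws Exception {
        Files.createDirectories(FILE.getParent());
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(FILE))) {
            out.writeObject(settings);
        }
    }

    public static Settings load() throws Exception {
        if (!Files.exists(FILE)) {
            return new Settings(); // defaults on first run
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(FILE))) {
            return (Settings) in.readObject();
        }
    }
}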
If you already have a compact data output format in bytes (which I assume you have if you can persist efficiently to a DataOutputStream) then an efficient and general technique is to use run-length-encoding on the difference between the previous byte array output and the new byte array output.
Points to note:
If the object has not changed, the difference in the byte arrays will be an array of zeros and hence will compress down to almost nothing....
The first time you serialize the object, consider the previous output to be all zeros, so that you communicate a complete set of data
You probably want to be a bit clever when the object has variable-sized substructures....
You can also try zipping the difference rather than using RLE; that might be more efficient in cases where you have a large object graph with a lot of changes (see the sketch below)
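Here is a rough sketch of the idea, assuming the previous and current serialized snapshots are byte arrays: unchanged bytes XOR to zero, and the difference is then compressed with the JDK's Deflater (the "zipping" option) instead of hand-rolled RLE. All class and method names are invented for illustration.

import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;

public class SnapshotDiff {

    // Compute the byte-wise difference between the previous and new snapshots.
    // Unchanged bytes become 0, so the diff compresses extremely well.
    static byte[] diff(byte[] previous, byte[] current) {
        byte[] result = new byte[current.length];
        for (int i = 0; i < current.length; i++) {
            byte prev = i < previous.length ? previous[i] : 0; // first run: treat previous as all zeros
            result[i] = (byte) (current[i] ^ prev);
        }
        return result;
    }

    // Compress the diff with Deflate instead of run-length encoding.
    static byte[] compress(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] previous = new byte[1024];            // last serialized snapshot
        byte[] current = Arrays.copyOf(previous, 1024);
        current[100] = 42;                           // only one byte changed

        byte[] compressedDiff = compress(diff(previous, current));
        System.out.println("Diff payload: " + compressedDiff.length + " bytes");
    }
}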