Serialization vs String format


Can anyone suggest which approach is better: storing an object in serialized form, or reading the file content as a String and constructing the object from it?
Put simply:
1. I have a string like (str,str1,str2,str3,....) in my file store. I read this string from the file and construct a Java object from it (for example, building a LinkedList from the comma-separated values).
2. I retrieve the LinkedList object from the file store using serialization.
So the choice is between reading a serialized object from the file store and constructing the object from a string.
Which one is the better way?
I am using LinkedList here only as an example.
The type may differ; from the string I may have to construct JSONObject or JsonArray values instead.
A JSON object is not serializable as such, so I would make it serializable some other way.
For a lengthy string, which is better: serializing, or constructing the object from the string?
Everything here relates to Java.
Please advise.
Regards,
S.Chinna

The advantage of using a text format is that you can read and maintain the data in a simple text editor.
The advantage of using a binary format like Object Serialization is that you don't have to worry about separators, e.g. what happens if a string contains a comma.
Either approach you suggest is likely to be efficient enough (though I would use an ArrayList).
EDIT: If you have multiple strings, a better approach may be to put each one on a separate line. This way you don't need to worry about commas, and the file is easier to read, edit and version.
List<String> list = FileUtils.readLines(file);
As you can see, the entire file can be read with a single line of code.
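Both options from the question can be sketched with the JDK alone. This is only an illustration under assumptions: the class and method names (ListStorageSketch, writeAsLines and so on) are made up for the example, and the one-string-per-line format assumes no string contains a line break.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStorageSketch {

    // Text format: one string per line, so commas inside a string need no escaping.
    static void writeAsLines(Path file, List<String> items) throws IOException {
        Files.write(file, items, StandardCharsets.UTF_8);
    }

    static List<String> readAsLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Binary format: Java object serialization of the whole list.
    static void writeSerialized(Path file, List<String> items) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            // Copy into an ArrayList so the written object is known to be Serializable.
            out.writeObject(new ArrayList<>(items));
        }
    }

    @SuppressWarnings("unchecked")
    static List<String> readSerialized(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> items = Arrays.asList("str", "str1", "contains,a,comma");

        Path textFile = Files.createTempFile("items", ".txt");
        writeAsLines(textFile, items);
        System.out.println(readAsLines(textFile));

        Path binaryFile = Files.createTempFile("items", ".ser");
        writeSerialized(binaryFile, items);
        System.out.println(readSerialized(binaryFile));
    }
}
```

The text file can be opened in any editor; the serialized file cannot, but it needs no separator rules at all.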

It depends on the complexity of the objects you have to store. If they are simple, or if you have the time to write your own Writer and Reader for your objects, I would always go with a custom text format, because text formats are the easiest to debug.
If you have a server that understands text commands, you could even connect with PuTTY or telnet and test your server!
But if you have to transport complex object structures that might even change during development, I would definitely go with some form of serialization. Please note that Java's default serialization is NOT a good candidate for a communication protocol, because of the large overhead it produces by describing classes over and over again. Better go with JBoss Serialization if you want something API-compatible with Java's built-in classes, or go with JSON if you don't have to transport much binary data.

Well, if you care about speed, go for binary serialization. If you want to be able to read the serialized objects easily, go for a string-based format (JSON, for example). Here is a performance test for various serializers:
http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2

The advantages of Java ObjectStream serialization are:
You have minimal code to write, and minimal thinking to do when designing your serialization format.
Dealing with complicated (graph-structured) networks of objects is simple.
The end result should be type-safe and bug-free (assuming that you don't implement your own custom read/write object methods, etc.).
The main disadvantage of Java ObjectStream serialization is that it is fragile in the face of changes to the classes that you've serialized. Dealing with this can be difficult. (By contrast, a hand-parsed text format is largely immune to this issue ... and problems are easier to fix.)
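The point about graph-structured objects can be illustrated with a small sketch. The Node class and helper names here are hypothetical; the only thing assumed about the JDK is that ObjectOutputStream records back-references, so two nodes that point at each other come back as two nodes that still point at each other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class GraphSerializationSketch {

    // A node that may take part in a cycle.
    static class Node implements Serializable {
        // Pinning the version id is one common way to soften the fragility
        // mentioned above: compatible class changes keep old data readable.
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // a -> b -> a forms a cycle

        Node copy = (Node) fromBytes(toBytes(a));
        // The copy is a new object graph with the same shape, cycle included.
        System.out.println(copy.name + " -> " + copy.next.name + " -> " + copy.next.next.name);
        System.out.println("cycle preserved: " + (copy.next.next == copy));
    }
}
```

A hand-written text format would need its own scheme (ids, references) to represent the same cycle.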

Related

Read proto partly instead of full parsing in java

I have a proto file defined, for example:
option java_package = "proto.data";
message Data {
repeated string strs = 1;
repeated int ints = 2;
}
I receive this object from the network as an InputStream (or as bytes). Normally I then parse it with Data.parseFrom(stream) or Data.parseFrom(bytes) to get the object.
This way the whole Data object has to be held in memory, even though I only need to traverse all the string and integer values in it. That is bad when the object is big.
What should I do about this?

Unfortunately, there is no way to parse just part of a protobuf. If you want to be sure that you've seen all of the strs or all of the ints, you have to parse the entire message, since the values could appear in any order or even interleaved.
If you only care about memory usage and not CPU time then you could, in theory, use a hand-written parser to parse the message and ignore fields that you don't care about. You still have to do the work of parsing, you can just discard them immediately rather than keeping them in memory. However, to do this you'd need to study the Protobuf wire format and write your own parser. You can use Protobuf's CodedInputStream class but a lot of work still needs to be done manually. The Protobuf library really isn't designed for this.
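To give a feel for what such a hand-written parser involves, here is a rough sketch that walks the wire format with plain JDK streams and hands each value to a callback instead of building a Data object. It is deliberately not based on CodedInputStream, and it rests on assumptions: field numbers 1 and 2 come from the question, ints is taken to be an int32 field encoded as varints (packed or unpacked), groups are not supported, and the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.function.IntConsumer;

public class WireScanSketch {

    // Decodes one base-128 varint whose first byte has already been read.
    static long readVarint(InputStream in, int firstByte) throws IOException {
        long result = 0;
        int b = firstByte;
        for (int shift = 0; shift < 70; shift += 7) {
            if (b < 0) throw new EOFException("truncated varint");
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            b = in.read();
        }
        throw new IOException("varint longer than 10 bytes");
    }

    // Streams through one message, reporting field 1 (strings) and field 2 (ints)
    // and discarding everything else. Only one field is held in memory at a time.
    static void scan(InputStream raw, Consumer<String> onStr, IntConsumer onInt) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int first;
        while ((first = in.read()) != -1) {
            long tag = readVarint(in, first);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            switch (wireType) {
                case 0: { // varint
                    long value = readVarint(in, in.read());
                    if (field == 2) onInt.accept((int) value);
                    break;
                }
                case 1: // 64-bit fixed, skipped
                    in.readFully(new byte[8]);
                    break;
                case 5: // 32-bit fixed, skipped
                    in.readFully(new byte[4]);
                    break;
                case 2: { // length-delimited: string, bytes, sub-message or packed field
                    byte[] payload = new byte[(int) readVarint(in, in.read())];
                    in.readFully(payload);
                    if (field == 1) {
                        onStr.accept(new String(payload, StandardCharsets.UTF_8));
                    } else if (field == 2) { // packed repeated varints
                        ByteArrayInputStream packed = new ByteArrayInputStream(payload);
                        int b;
                        while ((b = packed.read()) != -1) {
                            onInt.accept((int) readVarint(packed, b));
                        }
                    }
                    break;
                }
                default:
                    throw new IOException("unsupported wire type " + wireType);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hand-encoded message: strs = ["ab"], ints = [150] unpacked, then [150] packed.
        byte[] msg = {0x0A, 0x02, 'a', 'b', 0x10, (byte) 0x96, 0x01, 0x12, 0x02, (byte) 0x96, 0x01};
        scan(new ByteArrayInputStream(msg),
                s -> System.out.println("str: " + s),
                i -> System.out.println("int: " + i));
    }
}
```

Even this small version shows why the answer calls it manual work: every wire type has to be handled or skipped explicitly, and nothing checks the bytes against the .proto definition.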
If you are willing to consider using a different protocol framework, Cap'n Proto is extremely similar in design to Protobufs but features the ability to read only the part of the message you care about. Cap'n Proto incurs no overhead for the fields you don't examine, other than obviously the bandwidth and memory to receive the raw message bytes. If you are reading from a file, and you use memory mapping (MappedByteBuffer in Java), then only the parts of the message you actually use will be read from disk.
(Disclosure: I am the author of most of Google Protobufs v2 (the version you are probably using) as well as Cap'n Proto.)

Hmm. It appears that it may be already implemented but not adequately documented.
Have you tested it?
See for discussion:
https://groups.google.com/forum/#!topic/protobuf/7vTGDHe0ZyM
See also, sample test code in google's github:
https://github.com/google/protobuf/blob/4644f99d1af4250dec95339be6a13e149787ab33/java/src/test/java/com/google/protobuf/lazy_fields_lite.proto

Java Serialised object vs Non serialised object

1) Can a non-serialised java object be sent over the network to be executed by another JVM or stored in local file storage to get the data restored?
2) What is the difference between serialising and storing the java object vs storing the java object without serialising it?

Serialization is a way to represent a Java object as a series of bytes. It's just a format, nothing more.
The "built-in" Java serialization is an API for converting a Java object to a series of bytes. That's it. Of course, deserialization is the "complementary" process that converts this binary stream back into the object.
Serialization/deserialization itself has nothing to do with the "sending over the network" thing. It's just convenient to send a binary stream that can be created from the object with serialization.
What's more, the built-in serialization is sometimes not the optimal way to get the binary stream, because the object could be represented using fewer bytes.
So you can use your own custom protocol, provide your own customization for serialization (for example, Externalizable),
or even use third-party libraries like Apache Avro.
I think this effectively answers both of your questions:
You can turn the non-serialized object (I guess you mean one that doesn't implement the "Serializable" interface) into a series of bytes (a byte stream) yourself if you want, and then send it over the network, store it in a binary file, or whatever.
Of course, you'll have to know how to read this binary format in order to convert it back.
Since serialization is just a protocol of conversion and not a "storage related thing", the answer is obvious.
Hope this helps.
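As a sketch of the "do it yourself" route, the class below does not implement Serializable and is turned into bytes by hand with DataOutputStream. The Point class, its fields and the field order are assumptions made for the example; the only rule is that the reader must consume the fields in the same order the writer produced them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ManualBytesSketch {

    // Deliberately does NOT implement Serializable.
    static final class Point {
        final int x;
        final int y;
        final String label;

        Point(int x, int y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Custom format: two 4-byte ints, then the label as a length-prefixed string.
    static byte[] encode(Point p) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
            out.writeUTF(p.label);
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in exactly the order encode() wrote them.
    static Point decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int x = in.readInt();
            int y = in.readInt();
            String label = in.readUTF();
            return new Point(x, y, label);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(new Point(3, 4, "home"));
        Point copy = decode(bytes);
        System.out.println(bytes.length + " bytes: " + copy.x + "," + copy.y + "," + copy.label);
    }
}
```

The resulting byte array can be written to a socket or a file like any other; nothing in it identifies the class, which is why it can be smaller than the built-in format and why the reading side has to know the layout.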

In short, you don't store a non-serialized object in java. So I would say no to both questions.
Edit: ObjectOutputStream and ObjectInputStream can write primitives as well as serializable objects, if that's what you are using.

1) Can a non-serialised java object be sent over the network to be executed by another JVM or stored in local file storage to get the data restored?
An object is marshalled using ObjectOutputStream to be sent over the wire. Serialization is the standard Java way of storing the state of an object. You can devise your own way of doing the same, but there is no point in reinventing the wheel unless you see a big problem with the standard way.
2) What is the difference between serialising and storing the java object vs storing the java object without serialising it?
Serialization stores the state of the object using ObjectOutputStream, and the object can be deserialized using ObjectInputStream. A serialized object can be saved to a file or sent over the network. Serialization is the standard way to achieve all this, but you can always invent your own way if you have a real reason to.

The purpose of serialization is to store the state of objects in a self contained way that doesn't require raw memory references, run time state etc. In other words, objects can be represented as a string of bits that can be stored on disk, sent over a network etc.

Best file format regarding standard string and integer data?

For my project, I need to store info about protocols (the data sent (most likely integers) and in the order it's sent) and info that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This info will be read by a Java program and stored in memory for processing, but I don't know what would be the most sensible format to store this data in?
EDIT: Here's some extra information:
1) I will be using this data in a game server.
2) Since it is a game server, speed is not the primary concern, since this data will primarily be read and used during startup, which shouldn't occur very often.
3) I would like to keep memory consumption to a minimum, however.
4) The second data "example" will be used as a "dictionary" to look up names of specific in-game items, their stats and other integer data (and therefore might become very large, unlike the first data containing the protocol information, where each file will only describe small pieces of a protocol, like a login protocol for instance).
5) And yes, I would like the data to be "human-editable".
EDIT 2: Here's the choices that I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries

There are many factors to weigh; here are some things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
If you could post more info about your situation and how important these factors are, we might be able to guide you further.
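For the 'ID' 'STRING' 'ADDITIONAL INTEGER DATA' records from the question, the double-quoting rule in point 3 could look roughly like this. It is a minimal sketch, not a full CSV parser: it handles one line at a time, assumes fields never contain line breaks, and the class name and sample record are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSketch {

    // Splits one CSV line. Fields may be wrapped in double quotes; inside a
    // quoted field a comma is literal and "" stands for one quote character.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary record: id, name, then integer stats.
        List<String> fields = splitCsvLine("1001,\"Sword, rusty\",3,12");
        int id = Integer.parseInt(fields.get(0));
        String name = fields.get(1);
        System.out.println(id + " / " + name + " / " + fields.subList(2, fields.size()));
    }
}
```

Choosing a delimiter that never occurs in the data, as the answer suggests, would reduce all of this to a single String.split call.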

How about XML, JSON or CSV?

I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages / network packages / fields etc. The order of fields is well defined, and so on.
I even wrote a code-generator that generated the message sending / receiving classes with methods for each message type in XSLT.
The only drawback as I see it is the verbosity. If you have a really simple structure of the specification, I would suggest you use some simple home-brewed format and write a parser for it using a parser-generator of your choice.

In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
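A minimal sketch of the properties approach, using only java.util.Properties. The key naming scheme (item.1001.name and so on) is an assumption for illustration; in a real program the text would go to a file rather than a String.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class PropertiesSketch {

    // Writes the properties in the standard key=value text format.
    static String save(Properties props) throws IOException {
        StringWriter writer = new StringWriter();
        props.store(writer, "item dictionary");
        return writer.toString();
    }

    // Parses key=value text back into a Properties object.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties items = new Properties();
        // Hypothetical naming scheme: item.<id>.<attribute>
        items.setProperty("item.1001.name", "Iron Sword");
        items.setProperty("item.1001.attack", "12");

        String text = save(items);
        System.out.println(text);

        Properties reloaded = load(text);
        int attack = Integer.parseInt(reloaded.getProperty("item.1001.attack"));
        System.out.println(reloaded.getProperty("item.1001.name") + " has attack " + attack);
    }
}
```

All values are strings, so integer data has to be parsed on load, and every key stays in memory, which may matter for a very large dictionary.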

Writing data into files with Java

I am writing a server in java that allows clients to play a game similar to 20 questions. The game itself is basically a binary tree with nodes that are questions about an object and leaves that are guesses at the object's identity. When the game guesses wrong it needs to be able to get the right answer from the player and add it to the tree. This data is then saved to a random access file.
The question is: How do you go about representing a tree within a file so that the data can be reaccessed as a tree at a later time.
If you know where I can find information on keeping data structures like trees organized as such when writing/reading to files then please link it. Thanks a lot.
Thanks for the quick answers everyone. This is a school project so it has some odd requirements like using random access files and telnet.

This data is then saved to a random access file.
That's the hard way to solve your problem (the "random access" bit, I mean).
The problem you are really trying to solve is how to persist a "complicated" data structure. In fact, there are a number of ways that this can be done. Here are some of them ...
Use Java persistence. This is simple to implement; make sure that your data structure is serializable, and then it's just a few lines of code to serialize and a few more lines to deserialize. The downsides are:
Serialized objects can be fragile in the face of code changes.
Serialization is not incremental. You write/read the whole graph each time.
If you have multiple separate serialized graphs, you need some scheme to name and manage them.
Use XML. This is more work to implement than Java persistence, but it has the advantage of being less fragile. And if something does go wrong, there's a chance you can fix it with XSLT or a text editor. (There are XML "binding" libraries that eliminate a lot of the glue coding.)
Use an SQL database. This addresses all of the downsides of Java persistence, but involves more coding ... and using a different computational model to access the persistent data (query versus graph navigation).
Use a database and an Object Relational Mapping technology, e.g. a JPA or JDO implementation. (Hibernate is a popular choice.) These bridge between the database and in-memory views of data in a more or less transparent fashion, and avoid a lot of the glue code that you need to write in the SQL database and XML cases.

I think you're looking for serialization. Try this:
http://java.sun.com/developer/technicalArticles/Programming/serialization/

As mentioned, serialization is what you are looking for. It allows you to write an object to a file, and read it back later with minimal effort. The file will automatically be read back in as your object type. This makes things much easier than trying to store the object yourself using XML.

Java serialization has some pitfalls (for example, when you update your class). I would serialize to a text format. JSON is my first choice here, but XML and YAML would work as well.
This way you would have a file that doesn't rely on the binary version of your class.
There are several Java libraries: http://www.json.org
Some examples:
http://code.google.com/p/json-simple/wiki/DecodingExamples
http://code.google.com/p/json-simple/wiki/EncodingExamples
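And to save to and read from the file you can use Commons IO:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.nio.charset.StandardCharsets;
...
File dataFile = new File("yourfile.json");
// Read the whole file into a String.
String data = FileUtils.readFileToString(dataFile, StandardCharsets.UTF_8);
// Write a String back to the file, replacing its contents.
FileUtils.writeStringToFile(dataFile, data, StandardCharsets.UTF_8);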
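For the question tree itself, a text format does not have to be JSON. One simple possibility, sketched here with only the JDK, is a pre-order listing with one node per line: a question line is followed by its "yes" subtree and then its "no" subtree, and a guess line is a leaf. The Q:/A: prefixes, the Node class and the method names are assumptions for the example, node text is assumed to contain no line breaks, and the sketch does not address the random access file requirement of the assignment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class QuestionTreeTextSketch {

    // A question has both children; a guess (leaf) has none.
    static final class Node {
        final String text;
        final Node yes;
        final Node no;

        Node(String text, Node yes, Node no) {
            this.text = text;
            this.yes = yes;
            this.no = no;
        }

        boolean isLeaf() {
            return yes == null && no == null;
        }
    }

    // Pre-order: the node itself, then the whole "yes" subtree, then the "no" subtree.
    static void write(Node node, List<String> out) {
        if (node.isLeaf()) {
            out.add("A:" + node.text);
            return;
        }
        out.add("Q:" + node.text);
        write(node.yes, out);
        write(node.no, out);
    }

    // Rebuilds the tree by consuming lines in the same order write() produced them.
    static Node read(Iterator<String> lines) {
        String line = lines.next();
        String text = line.substring(2);
        if (line.startsWith("A:")) {
            return new Node(text, null, null);
        }
        Node yes = read(lines);
        Node no = read(lines);
        return new Node(text, yes, no);
    }

    public static void main(String[] args) {
        Node tree = new Node("Is it alive?",
                new Node("Does it bark?", new Node("dog", null, null), new Node("cat", null, null)),
                new Node("rock", null, null));

        List<String> lines = new ArrayList<>();
        write(tree, lines);
        lines.forEach(System.out::println);

        Node copy = read(lines.iterator());
        System.out.println("reloaded root: " + copy.text);
    }
}
```

The resulting lines can be saved and loaded with any line-based file API, and the file stays readable and editable by hand.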

Best way to save data in a Java application?

I'm trying to find the best way to save the state of a simple application.
From a DB point of view there are 4 or 5 tables with date fields and, of course, relationships.
Because the app is simple, and I want the user to have the option of moving the data around (usb pen, dropbox, etc), I wanted to put all data in a single file.
What is the best way/lib to do this?
XML usually is the best format for this (readability & openness), but I haven't found any great lib for this without doing SAX/DOM.

If you want to use XML, take a look at XStream for simple serialization of Java objects into XML. Here is "Two minute tutorial".
If you want something simple, the standard Java Properties format can also be a way to store and load small amounts of data.

Consider using plain JAXB annotations that come with the JDK (Java 6 through 10; from Java 11 on, JAXB is a separate dependency):
@XmlRootElement
private class Foo {
@XmlAttribute
private String text = "bar";
}
Here's a blog post of mine that gives more details on this simple usage of JAXB (it also mentions a more "classy" JAXB-based approach, in case you need better control over your XML schema, e.g. to guarantee backwards compatibility).

Two other options you might consider:
HSQLDB is a small SQL database written in Java. More relevant for your purposes, it can be configured to simply write to a CSV file as its data store, so you could conceivably use its text output as a portable datastore and still use SQL, if that's what you prefer.
A second option might be to write the datastore to a serialized file, either directly or through a library like Prevayler. This gives very good performance and is simple to implement; the cons are the fragility and opacity of the format.
But if the data is small enough, XML is probably much less bother.

If you don't need to provide semantic meaning to your data then XML is probably a wrong choice. I would recommend using the fat-free alternative JSON, which is much more naturally built for data structures.
