When should I change a SerialUID? - java

I have a bunch of serialized classes. Normally I have generated serial UIDs for all of them as the Java rules are rather restrictive and recreate Serial Numbers with basically any change. But this lead me to the question, that I haven't been able to find an answer for in the internet:
When does it make sense to break backwards compatibility and manually change the Serial Version UID in the class?

Section 5.6 of the Java Spec helps here:
http://download.oracle.com/javase/6/docs/platform/serialization/spec/version.html#6678
5.6 Type Changes Affecting Serialization
With these concepts, we can now describe how the design will cope with
the different cases of an evolving class. The cases are described in
terms of a stream written by some version of a class. When the stream
is read back by the same version of the class, there is no loss of
information or functionality. The stream is the only source of
information about the original class. Its class descriptions, while a
subset of the original class description, are sufficient to match up
the data in the stream with the version of the class being
reconstituted.
The descriptions are from the perspective of the stream being read in
order to reconstitute either an earlier or later version of the class.
In the parlance of RPC systems, this is a "receiver makes right"
system. The writer writes its data in the most suitable form and the
receiver must interpret that information to extract the parts it needs
and to fill in the parts that are not available.
5.6.1 Incompatible Changes
Incompatible changes to classes are those changes for which the
guarantee of interoperability cannot be maintained. The incompatible
changes that may occur while evolving a class are:
Deleting fields - If a field is deleted in a class, the stream written will not contain its value. When the stream is read by an
earlier class, the value of the field will be set to the default value
because no value is available in the stream. However, this default
value may adversely impair the ability of the earlier version to
fulfill its contract.
Moving classes up or down the hierarchy - This cannot be allowed since the data in the stream appears in the wrong sequence.
Changing a nonstatic field to static or a nontransient field to transient - When relying on default serialization, this change is
equivalent to deleting a field from the class. This version of the
class will not write that data to the stream, so it will not be
available to be read by earlier versions of the class. As when
deleting a field, the field of the earlier version will be initialized
to the default value, which can cause the class to fail in unexpected
ways.
Changing the declared type of a primitive field - Each version of the class writes the data with its declared type. Earlier versions of
the class attempting to read the field will fail because the type of
the data in the stream does not match the type of the field.
Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it
attempts to write it or read it when the previous version did not. The
default field data must consistently either appear or not appear in
the stream.
Changing a class from Serializable to Externalizable or vice versa is an incompatible change since the stream will contain data that is
incompatible with the implementation of the available class.
Changing a class from a non-enum type to an enum type or vice versa since the stream will contain data that is incompatible with the
implementation of the available class.
Removing either Serializable or Externalizable is an incompatible change since when written it will no longer supply the fields needed
by older versions of the class.
Adding the writeReplace or readResolve method to a class is incompatible if the behavior would produce an object that is
incompatible with any older version of the class.
5.6.2 Compatible Changes
The compatible changes to a class are handled as follows:
Adding fields - When the class being reconstituted has a field that does not occur in the stream, that field in the object will be
initialized to the default value for its type. If class-specific
initialization is needed, the class may provide a readObject method
that can initialize the field to nondefault values.
Adding classes - The stream will contain the type hierarchy of each object in the stream. Comparing this hierarchy in the stream with the
current class can detect additional classes. Since there is no
information in the stream from which to initialize the object, the
class fields will be initialized to the default values.
Removing classes - Comparing the class hierarchy in the stream with that of the current class can detect that a class has been deleted. In
this case, the fields and objects corresponding to that class are read
from the stream. Primitive fields are discarded, but the objects
referenced by the deleted class are created, since they may be
referred to later in the stream. They will be garbage-collected when
the stream is garbage-collected or reset.
Adding writeObject/readObject methods - If the version reading the stream has these methods then readObject is expected, as usual, to
read the required data written to the stream by the default
serialization. It should call defaultReadObject first before reading
any optional data. The writeObject method is expected as usual to call
defaultWriteObject to write the required data and then may write
optional data.
Removing writeObject/readObject methods - If the class reading the stream does not have these methods, the required data will be read by
default serialization, and the optional data will be discarded.
Adding java.io.Serializable - This is equivalent to adding types. There will be no values in the stream for this class so its fields
will be initialized to default values. The support for subclassing
nonserializable classes requires that the class supertype have a
no-arg constructor and the class itself will be initialized to default
values. If the no-arg constructor is not available, the
InvalidClassException is thrown.
Changing the access to a field - The access modifiers public, package, protected, and private have no effect on the ability of
serialization to assign values to the fields.
Changing a field from static to nonstatic or transient to nontransient - When relying on default serialization to compute the
serializable fields, this change is equivalent to adding a field to
the class. The new field will be written to the stream but earlier
classes will ignore the value since serialization will not assign
values to static or transient fields.

Never. You should organize yourself so that classes have the same serialVersionUID for their entire lifetime. You should (a) resist serialization-incompatible changes to the class; (b) write your own readObject()/writeObject()/readResolve()/writeReplace() objects so as to preserve the initial serialization format, and define an explicit serialVersionUID right at the beginning of the class's lifetime. The instant you change this value you have an enormous headache on your hands. Plan to avoid it.

From the JavaDoc of the Serializable interface:
The serialization runtime associates with each serializable class a version number, called a serialVersionUID, which is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization.
I think this is a good hint to answer your question: As soon as you change the class in a way, that serialization is affected (like adding/removing/changing serialized class members), then you really should change the value of serialVersionUID.

Related

Do a class's methods have to be the same for serialization?

I feel pretty stupid asking this, but for serialization over sockets via ObjectInput/Output Streams to work, do the classes in the program receiving it have to be the exact same and will produce an error if the classes aren't the exact same, or can they have a few different methods/etc? If this is too simple a question to ask here I'm sorry.
By default, the java compiler will generate a private static long serialVersionUID and this ID must match at each end. If the methods are not the same the generated id won't match.
The simple way around this is to hard code the id.
private static final long serialVersionUID = -1;
and it will always match.
The next step is that the fields should match also. However, these can be changed if you write your own readObject() method and in some classes do this because the field shave changed over time.
Spec for how serialVersionUID is calculated
The sequence of items in the stream is as follows:
The class name.
The class modifiers written as a 32-bit integer.
The name of each interface sorted by name.
For each field of the class sorted by field name (except private static and private transient fields:
a. The name of the field.
b. The modifiers of the field written as a 32-bit integer.
c. The descriptor of the field.
If a class initializer exists, write out the following:
a. The name of the method, .
b. The modifier of the method, java.lang.reflect.Modifier.STATIC, written as a 32-bit integer.
c. The descriptor of the method, ()V.
For each non-private constructor sorted by method name and signature:
a. The name of the method, < init >.
b. The modifiers of the method written as a 32-bit integer.
c. The descriptor of the method.
For each non-private method sorted by method name and signature:
a. The name of the method.
b. The modifiers of the method written as a 32-bit integer.
c. The descriptor of the method.
The SHA-1 algorithm is executed on the stream of bytes produced by DataOutputStream and produces five 32-bit values sha[0..4].
The hash value is assembled from the first and second 32-bit values of the SHA-1 message digest. If the result of the message digest, the five 32-bit words H0 H1 H2 H3 H4, is in an array of five int values named sha, the hash value would be computed as follows:
If you are using default (de)serialization there are several compatible changes that can be made without breaking it, see Versioning of Serializable Objects which in 5.6.1 Incompatible Changes lists the following as incompatible changes. Specifically, adding methods does not break serialization.
That said, the default serialization is known to be appropriate in very limited use-cases, and it would better of using other methods, such as JSON.
Deleting fields - If a field is deleted in a class, the stream written will not contain its value. When the stream is read by an earlier class, the value of the field will be set to the default value because no value is available in the stream. However, this default value may adversely impair the ability of the earlier version to fulfill its contract.
Moving classes up or down the hierarchy - This cannot be allowed since the data in the stream appears in the wrong sequence.
Changing a nonstatic field to static or a nontransient field to transient - When relying on default serialization, this change is equivalent to deleting a field from the class. This version of the class will not write that data to the stream, so it will not be available to be read by earlier versions of the class. As when deleting a field, the field of the earlier version will be initialized to the default value, which can cause the class to fail in unexpected ways.
Changing the declared type of a primitive field - Each version of the class writes the data with its declared type. Earlier versions of the class attempting to read the field will fail because the type of the data in the stream does not match the type of the field.
Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it attempts to write it or read it when the previous version did not. The default field data must consistently either appear or not appear in the stream.
Changing a class from Serializable to Externalizable or vice versa is an incompatible change since the stream will contain data that is incompatible with the implementation of the available class.
Changing a class from a non-enum type to an enum type or vice versa since the stream will contain data that is incompatible with the implementation of the available class.
Removing either Serializable or Externalizable is an incompatible change since when written it will no longer supply the fields needed by older versions of the class.
Adding the writeReplace or readResolve method to a class is incompatible if the behavior would produce an object that is incompatible with any older version of the class.

Java: what happens when an already loaded class is deserialized

Let's say class com.Foo is loaded from a JAR and later a class with the same name com.Foo, but different definition (other fields) is deserialized (e.g loaded either from DB, or received from a remote call).
What could be the consequences? Will the new received class have any impact? Let's say that the class is used in other parts of the application, being persisted in DB and serialized/JSON encoded later.
If you deserialize a Class<?> object then the class with the fully-qualified class name gets loaded. If it is already loaded you will get the reference to that class.
I think that for a complete answer you should read Java Object Serialization Specification
Here are some quotations from the spec that I think are interessting:
1.1 Overview
Special handling is required for arrays, enum constants, and objects of type Class, ObjectStreamClass, and String. Other objects must implement either the Serializable or the Externalizable interface to be saved in or restored from a stream.
2. Object Output Classes
If the object is a Class, the corresponding ObjectStreamClass is written to the stream, a handle is assigned for the class, and writeObject returns.
3. Object Input Classes
If the object in the stream is a Class, read its ObjectStreamClass descriptor, add it and its handle to the set of known objects, and return the corresponding Class object.
You have the wrong imagination of how Serialization works. You can write a Class instance to an object stream just like other objects but this will not write the byte code of that class nor its definition to the stream. It just creates a symbolic reference to the class which is resolved like any other class reference of the stream: by using its symbolic name trying to resolve it in the context of the class deserializing it. It does not create a new class.
In fact, an instance of java.lang.Class creates even less dependencies to the actual class than writing an instance of it. The instance depends on the serialized form, e.g. the non-transient field of the class, while the symbolic reference represented by an instance of java.lang.Class does not depend on it.
The compatibility between the class present when writing a stream and the class present when deserializing it is determined by the serialVersionUID if it doesn’t match, deserialization will always fail with an exception. If it matches, the implementation will try its best to recover. Fields not present in the stream get their default values, stream fields not present in the actual class and any other unprocessed extra data will be ignored.

SerializationVersionId same but class is modified?

I serialize an object and transfer it over the network.My serialized class object has serilizableId which i defined myself.Now in another JVM I keep the SeriliazibleId same but change some attributes.
What will happen and why?Will It be able deserialize it?
You must refer to the Java Object Serialization Specification here.
In the specific, what you are NOT allowed to do:
Deleting fields - If a field is deleted in a class, the stream written will not contain its value. When the stream is read by an earlier class, the value of the field will be set to the default value because no value is available in the stream. However, this default value may adversely impair the ability of the earlier version to fulfill its contract.
Moving classes up or down the hierarchy - This cannot be allowed since the data in the stream appears in the wrong sequence.
Changing a nonstatic field to static or a nontransient field to transient - When relying on default serialization, this change is equivalent to deleting a field from the class. This version of the class will not write that data to the stream, so it will not be available to be read by earlier versions of the class. As when deleting a field, the field of the earlier version will be initialized to the default value, which can cause the class to fail in unexpected ways.
Changing the declared type of a primitive field - Each version of the class writes the data with its declared type. Earlier versions of the class attempting to read the field will fail because the type of the data in the stream does not match the type of the field.
Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it attempts to write it or read it when the previous version did not. The default field data must consistently either appear or not appear in the stream.
Changing a class from Serializable to Externalizable or vice versa is an incompatible change since the stream will contain data that is incompatible with the implementation of the available class.
Changing a class from a non-enum type to an enum type or vice versa since the stream will contain data that is incompatible with the implementation of the available class.
Removing either Serializable or Externalizable is an incompatible change since when written it will no longer supply the fields needed by older versions of the class.
Adding the writeReplace or readResolve method to a class is incompatible if the behavior would produce an object that is incompatible with any older version of the class.
What you are allowed to do instead:
Adding fields - When the class being reconstituted has a field that does not occur in the stream, that field in the object will be initialized to the default value for its type. If class-specific initialization is needed, the class may provide a readObject method that can initialize the field to nondefault values.
Adding classes - The stream will contain the type hierarchy of each object in the stream. Comparing this hierarchy in the stream with the current class can detect additional classes. Since there is no information in the stream from which to initialize the object, the class's fields will be initialized to the default values.
Removing classes - Comparing the class hierarchy in the stream with that of the current class can detect that a class has been deleted. In this case, the fields and objects corresponding to that class are read from the stream. Primitive fields are discarded, but the objects referenced by the deleted class are created, since they may be referred to later in the stream. They will be garbage-collected when the stream is garbage-collected or reset.
Adding writeObject/readObject methods - If the version reading the stream has these methods then readObject is expected, as usual, to read the required data written to the stream by the default serialization. It should call defaultReadObject first before reading any optional data. The writeObject method is expected as usual to call defaultWriteObject to write the required data and then may write optional data.
Removing writeObject/readObject methods - If the class reading the stream does not have these methods, the required data will be read by default serialization, and the optional data will be discarded.
Adding java.io.Serializable - This is equivalent to adding types. There will be no values in the stream for this class so its fields will be initialized to default values. The support for subclassing nonserializable classes requires that the class's supertype have a no-arg constructor and the class itself will be initialized to default values. If the no-arg constructor is not available, the InvalidClassException is thrown.
Changing the access to a field - The access modifiers public, package, protected, and private have no effect on the ability of serialization to assign values to the fields.
Changing a field from static to nonstatic or transient to nontransient - When relying on default serialization to compute the serializable fields, this change is equivalent to adding a field to the class. The new field will be written to the stream but earlier classes will ignore the value since serialization will not assign values to static or transient fields.
If you are using private static final long serialVersionUID; for your class, then you are making sure that any version changes, until they are backward compatible, would not affect the deserialization of your class. If its not backward compatible, then you need to increment the serial version ID.

Serializing enums

I have a serialized object which I added an enum to. This makes it no longer compatible with older versions of the software. I believe this is because the older version is compiled with Java 1.4. I get:
java.io.InvalidClassException: cannot bind enum descriptor to a non-enum class
Ideally I would like to replace the enum with a String and somehow fix the existing object.
A couple of ideas are:
When reading the serialized object, ignore the field with the enum. The value would be lost, but that's OK.
Have two copies of the serialized class, renaming the one with the enum and somehow read the object into the newly renamed class.
Just mark the enum field as transient. It won't get serialized.
You will lose that value, but you said that you don't mind.
transient is a Java keyword.
It flags a field as something that should not be considered part of an object's persistent state.
It marks a member variable not to be serialized when it is persisted to streams of bytes. When an object is transferred through the network, the object needs to be 'serialized'. Serialization converts the object state to serial bytes. Those bytes are sent over the network and the object is recreated from those bytes. Member variables marked by the java transient keyword are not transferred, they are lost intentionally. [source]
I believe this is because the older
version is compiled with Java 1.4. I
get:
No. There were no enums in Java 1.4.1. They were introduced in 1.5. What appears to have happened is that something that was serialized as a String was subsequently changed to an Enum.
Ideally I would like to replace the
enum with a String
Your proposed change just reverses that.
and somehow fix the
existing object.
You can't fix the existing serialized object, but you can make the current class compatible with it, by reverting the source, or figuring out what it must have been and declaring an identical serialVersionUID.
The simplest solution would be to read the serialized entities using your new code and write it out in a format which is understandable by your old version of software. Of course, if this isn't a viable solution, you might need to clarify your requirements a bit more. As in what parts can be changed, what can't be changed etc.

Java serialization, UID not changed. Can I add new variables and method to the class?

I have a class that is serialised. Now I need to add a new variable into the class, with setter and getter methods. This class is sent over wire in RMI.
Without changing the UID, can I add new parameters and getter and setter methods for it? I tried to write an example class that is sent over wire, and did not change the UID, and added new parameters and getter and setter methods for it. On the other end, I tested it and I still got the values properly. I had assumed, if I add new parameters, getter and setter methods, I need to change the UID. Am I wrong?
If you hard-code the SerialVersionUID of a class, (to 1L, usually), store some instances, and then re-define the class, you basically get this behavior (which is more or less common sense):
New fields (present in class definition, not present in the serialized instance) are assigned a default value, which is null for objects, or the same value as an uninitialized field for primitives.
Removed fields (not present in class definition but present in the serialized instance) are simply ignored.
So the general rule of thumb is, if you simply add fields and methods, and don't change any of the existing stuff, AND if you're OK with default values for these new fields, you're generally OK.
Wow, a lot of bad information.
Java serialization is +very+ robust. There are a very well defined set of rules governing backwards compatibility of objects with the same uid and different data. the basic idea is that as long as you don't change the the type of an existing member, you can maintain the same uid without data issues.
that said, your code still needs to be smart about handling classes with potentially missing data. the object may deserialize correctly, but there may not be data in certain fields (e.g. if you added a field to the class and are deserializing an old version of the class). if your code can handle this, than you can probably keep the current uid. if not, then you should probably change it.
in addition to the pre-defined rules, there are advanced usage scenarios where you could even change the type of existing fields and still manage to deserialize the data, but that generally only necessary in extreme situations.
java serialization is very well documented online, you should be able to find all this information in the relevant sun/oracle tutorials/docs.
This only matters if you let Java generate a default UID for your class. It uses the actual members and methods of the class to generate it, thus making it invalid once you change the class structure. If you provide an UID for your class then this only matters if you need to deserialize older versions of your class from a file and such.
Want to define few point to highlight the changes which impacts serialization.
Below you will find the link to Oracle Java Docs for more details.
Incompatible Changes
Incompatible changes to classes are those changes for which the guarantee of interoperability cannot be maintained. The incompatible changes that may occur while evolving a class are:
Deleting fields
Moving classes up or down the hierarchy
Changing a nonstatic field to static or a nontransient field to transient
Changing the declared type of a primitive field
Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it attempts to write it or read it when the previous version did not.
Changing a class from Serializable to Externalizable or vice versa.
Changing a class from a non-enum type to an enum type or vice versa.
Removing either Serializable or Externalizable.
Adding the writeReplace or readResolve method to a class, if the behavior would produce an object that is incompatible with any older version of the class.
Link from where the above information is taken
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html#6678

Categories

Resources