If we want to serialize an object, we can simply do the following implementation:
class MyClass implements Serializable
{
private static final long serialVersionUID = 12345L;
}
And no extra effort is needed to imperatively implement how the object will be written into and read from files. Java simply takes care of everything.
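For reference, the entire hands-off round trip looks something like this (I've added a made-up name field to MyClass purely for illustration):

```java
import java.io.*;

class MyClass implements Serializable {
    private static final long serialVersionUID = 12345L;
    String name = "example";
}

public class RoundTrip {
    public static void main(String[] args) throws Exception {
        MyClass original = new MyClass();

        // Write the object to a byte array; a FileOutputStream works the same way
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }

        // Read it back; no custom read/write code was needed anywhere
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            MyClass copy = (MyClass) in.readObject();
            System.out.println(copy.name); // prints "example"
        }
    }
}
```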
On the other hand, Externalizable defines explicit serialization and deserialization methods, so we can program the process imperatively.
This leaves me with the question: if no extra effort is needed for Serializable, what's the rationale for making it an interface that we must implement in order to serialize/deserialize objects, instead of making every object serializable by default?
When a programmer marks a class as Serializable, they take on the responsibility that if the class changes in the future, programs which saved objects of the old version will still be able to read them back into the updated class. Details are in Effective Java, Item 74: Implement Serializable judiciously.
There is another rationale. Did you ever notice that ObjectOutput.writeObject(Object obj) accepts Object, not Serializable? This is because the API assumes that objects may be saved using different serialization mechanisms. Serializable only means that the object is meant to be saved using Java's standard serialization.
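You can observe this at runtime: writeObject compiles against any Object, but the standard mechanism rejects non-Serializable instances only when the call executes. A minimal demonstration (the class names here are mine):

```java
import java.io.*;

public class NotSerializableDemo {
    static class Plain { } // deliberately does NOT implement Serializable

    public static void main(String[] args) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            out.writeObject(new Plain()); // compiles fine: the parameter type is Object
        } catch (NotSerializableException e) {
            // The standard mechanism rejects the object only at runtime
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```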
Because:
Not all objects have meaningful serialization semantics. Example: a singleton object, where deserialization would quietly create a second instance.
Security. If you pass objects to someone else's code, and that code could always capture and transmit them, then security-related code would need a way to opt out, and there would be security bugs whenever someone overlooked an object. So "off by default" is more secure.
The built-in serialisation format writes out the class name for every object you write, so it is quite inefficient. Only use it for very simple cases with little data.
The default serialisation does not share data easily with code written in other languages, so a specific representation should be considered if data written today may need to be read by other software in the future. It's not a good long-term format.
The exact rules of how it works in all cases are not well remembered by most developers.
If you read the book Effective Java by Joshua Bloch, it explains how tricky the built-in feature can be. Most developers avoid it in a lot of cases. This answer gives a good rule of thumb: https://softwareengineering.stackexchange.com/a/240432/129659
The Serializable interface works only as a marker for identification. If every class could be serialized, then every class would need to maintain a serialVersionUID to avoid version conflicts. It could also cause a security problem: someone could use deserialization as a way of creating new objects, even though the object was never intended to be created by client code. In that sense, making everything Serializable is not safe. See Effective Java for more information.
I am perplexed. I read some threads on Stack Overflow regarding marker interfaces in Java.
On this thread it is written as:
Marker interfaces aren't 'identified by the JVM' at all. They're identified by the Java code that is interested in them, for example ObjectOutputStream, via the instanceof operator.
Then in comments, it is asserted that:
The implementation in ObjectOutputStream checks whether the object has implemented the Serializable interface, and if yes, performs writeObject(objectToBeSerialized). So we can even write our own marker interface, check whether an object is an instance of that marker interface, and take appropriate action on it.
On another thread it is written as:
Only Serializable will mark an object as being compatible with Java's built-in serialization machinery.
You can create other empty interfaces, but they won't mean the same thing. Each interface is distinct, even if it defines the same set of methods.
So my question is: can we make classes that use serialization without implementing Java's corresponding built-in interface?
Or is it a special interface that is mandatory to implement?
(Assume I don't want to use instanceof.)
TL;DR
Can I use Java serialization/deserialization (the Serializable interface with the ObjectOutputStream and ObjectInputStream classes, probably adding readObject and writeObject to the classes implementing Serializable) as a valid implementation of the Prototype pattern, or not?
Note
This question is not meant to discuss whether using a copy constructor is better than serialization/deserialization.
I'm aware of the Prototype Pattern concept (from Wikipedia, emphasis mine):
The prototype pattern is a creational design pattern in software development. It is used when the type of objects to create is determined by a prototypical instance, which is cloned to produce new objects. This pattern is used to:
avoid subclasses of an object creator in the client application, like the abstract factory pattern does.
avoid the inherent cost of creating a new object in the standard way (e.g., using the 'new' keyword) when it is prohibitively expensive for a given application.
And from this Q/A, Examples of GoF Design Patterns in Java's core libraries, BalusC explains that the prototype pattern in Java is implemented by Object#clone, which only works if the class implements the Cloneable interface (a marker interface similar to Serializable). The problems with this approach are noted in blog posts/related Q/As like these:
Copy Constructor versus Cloning
Java: recommended solution for deep cloning/copying an instance
So, another alternative is using a copy constructor to clone your objects (the DIY way), but this fails to implement the prototype pattern for the text I emphasized above:
avoid the inherent cost of creating a new object in the standard way (e.g., using the 'new' keyword)
AFAIK the only way to create an object without invoking its constructor is by deserialization, as noted in the example of the accepted answer of this question: How are constructors called during serialization and deserialization?
So I'm just asking whether object deserialization through ObjectInputStream (knowing what you're doing, marking the necessary fields as transient, and understanding all the implications of this process), or a similar approach, would be a proper implementation of the Prototype pattern.
Note: I don't think unmarshalling XML documents is a right implementation of this pattern because it invokes the class constructor. This probably also happens when unmarshalling JSON content.
People would advise using the object's constructor, and I would consider that option when working with simple objects. This question is more oriented toward deep copying complex objects, where I may have five levels of objects to clone. For example:
//fields is an abbreviation for primitive type and String type fields
//that can vary between 1 and 20 (or more) declared fields in the class
//and all of them will be filled during application execution
class CustomerType {
//fields...
}
class Customer {
CustomerType customerType;
//fields
}
class Product {
//fields
}
class Order {
List<Product> productList;
Customer customer;
//fields
}
class InvoiceStatus {
//fields
}
class Invoice {
List<Order> orderList;
InvoiceStatus invoiceStatus;
//fields
}
//class to communicate invoice data for external systems
class InvoiceOutboundMessage {
List<Invoice> invoice;
//fields
}
Let's say I want/need to copy an instance of InvoiceOutboundMessage. I don't think a copy constructor would apply in this case. IMO, having a lot of copy constructors doesn't seem like a good design here.
Using Java object serialization directly is not quite the Prototype pattern, but serialization can be used to implement the pattern.
The Prototype pattern puts the responsibility of copying on the object to be copied. If you use serialization directly, the client needs to provide the deserialization and serialization code. If you own, or plan to write, all of the classes that are to be copied, it is easy to move the responsibility to those classes:
define a Prototype interface which extends Serializable and adds an instance method copy
define a concrete class PrototypeUtility with a static method copy that implements the serialization and deserialization in one place
define an abstract class AbstractPrototype that implements Prototype. Make its copy method delegate to PrototypeUtility.copy.
A class which needs to be a Prototype can either implement Prototype itself and use PrototypeUtility to do the work, or can just extend AbstractPrototype. By doing so it also advertises that it is safely Serializable.
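A minimal sketch of that outline, assuming in-memory byte-array streams are acceptable (the Prototype, PrototypeUtility, and AbstractPrototype names follow the suggestion above; the exception handling is my own choice):

```java
import java.io.*;

// The three pieces named above; any graph of Serializable objects can be copied.
interface Prototype extends Serializable {
    Prototype copy();
}

final class PrototypeUtility {
    static <T extends Serializable> T copy(T original) {
        try {
            // Serialize to memory...
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            // ...and deserialize a brand-new, deep-copied object graph
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T clone = (T) in.readObject();
                return clone;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Copy failed", e);
        }
    }
}

abstract class AbstractPrototype implements Prototype {
    @Override
    public Prototype copy() {
        return PrototypeUtility.copy(this);
    }
}
```

A class can then simply extend AbstractPrototype and get copying for free, provided all of its fields are themselves serializable.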
If you don't own the classes whose instances are to be copied, you can't follow the Prototype pattern exactly, because you can't move the responsibility for copying to those classes. However, if those classes implement Serializable, you can still get the job done by using serialization directly.
Regarding copy constructors, those are a fine way to copy Java objects whose classes you know, but they don't meet the Prototype pattern's requirement that the client should not need to know the class of the object instance it is copying. A client which doesn't know an instance's class but wants to use its copy constructor would have to use reflection to find a constructor whose only argument has the same class as the class it belongs to. That's ugly, and the client couldn't be sure that the constructor it found was a copy constructor. Implementing an interface addresses those issues cleanly.
Wikipedia's comment that the Prototype pattern avoids the cost of creating a new object seems misguided to me. (I see nothing about that in the Gang of Four description.) Wikipedia's example of an object that is expensive to create is one that lists the occurrences of a word in a text, which are of course expensive to find. But it would be foolish to design your program so that the only way to get an instance of WordOccurrences was to actually analyze a text, especially if you then needed to copy that instance for some reason. Just give it a constructor with parameters that describe the entire state of the instance and assign them to its fields, or a copy constructor.
So unless you're working with a third-party library that hides its reasonable constructors, forget about that performance canard. The important points of Prototype are that
it allows the client to copy an object instance without knowing its class, and
it accomplishes that goal without creating a hierarchy of factories, as meeting the same goal with the AbstractFactory pattern would.
I'm puzzled by this part of your requirements:
Note: I don't think unmarshalling XML documents is a right implementation of this pattern because it invokes the class constructor. This probably also happens when unmarshalling JSON content.
I understand that you might not want to implement a copy constructor, but you will always have a regular constructor. If this constructor is invoked by a library then what does it matter? Furthermore object creation in Java is cheap. I've used Jackson for marshalling/unmarshalling Java objects with great success. It is performant and has a number of awesome features that might be very helpful in your case. You could implement a deep copier as follows:
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
public class MyCloner {
    private ObjectMapper cloner; // with getter and setter

    public void setCloner(ObjectMapper cloner) { this.cloner = cloner; }

    @SuppressWarnings("unchecked")
    public <T> T clone(T toClone) throws IOException {
        String stringCopy = cloner.writeValueAsString(toClone);
        T deepClone = (T) cloner.readValue(stringCopy, toClone.getClass());
        return deepClone;
    }
}
Note that Jackson will work automatically with Beans (getter + setter pairs, no-arg constructor). For classes that break that pattern it needs additional configuration. One nice thing about this configuration is that it won't require you to edit your existing classes, so you can clone using JSON without any other part of your code knowing that JSON is being used.
Another reason I like this approach over serialization is that it is easier for a human to debug (just look at the string to see what the data is). Additionally, there are tons of tools out there for working with JSON:
Online JSON formatters
Viewing JSON as an HTML-based webpage
whereas the tooling around Java serialization isn't great.
One drawback to this approach is that, by default, duplicate references in the original object will become distinct objects in the copy. Here is an example:
public class CloneTest {
    public static class MyObject { }

    public static class MyObjectContainer {
        MyObject refA;
        MyObject refB;
        // Getters and setters omitted
    }

    public static void runTest() throws Exception {
        MyCloner cloner = new MyCloner();
        cloner.setCloner(new ObjectMapper());

        MyObjectContainer container = new MyObjectContainer();
        MyObject duplicateReference = new MyObject();
        container.setRefA(duplicateReference);
        container.setRefB(duplicateReference);

        MyObjectContainer cloned = cloner.clone(container);
        System.out.println(cloned.getRefA() == cloned.getRefB());       // Will print false
        System.out.println(container.getRefA() == container.getRefB()); // Will print true
    }
}
Given that there are several approaches to this problem, each with its own pros and cons, I would claim there isn't one 'proper' way to implement the prototype pattern in Java. The right approach depends heavily on the environment you find yourself coding in. If you have constructors which do heavy computation (and can't circumvent them), then I suppose you don't have much option but to use deserialization. Otherwise, I would prefer the JSON/XML approach. If external libraries weren't allowed and I could modify my beans, then I'd use Dave's approach.
Your question is really interesting, Luiggi (I voted for it because the idea is great); it's a pity you don't say what you are really concerned about. So I'll try to answer with what I know and let you choose what you find arguable:
Advantages :
In terms of memory use, you will get very good memory consumption by using serialization, since it serializes your objects in a binary format (and not in text like JSON or, worse, XML). You may have to choose a strategy to keep your "pattern" objects in memory as long as you need them, and persist them with a "least used, first persisted" or "first used, first persisted" strategy.
Coding it is pretty direct. There are some rules to respect, but if you don't have many complex structures, this remains maintainable.
No need for external libraries; this is quite an advantage in institutions with strict security/legal rules (validations required for each library used in a program).
If you don't need to maintain your objects between versions of the program or versions of the JVM, you can profit from each JVM update: speed is a real concern for Java programs, and it's closely related to I/O operations (JMX, memory reads/writes, NIO, etc.). So there is a good chance that new versions will have optimized I/O, memory usage, and serialization algorithms, and you will find yourself writing/reading faster with no code change.
Disadvantages :
You lose all your prototypes if you change any object in the tree. Serialization only works with the same object definition.
You need to deserialize an object to see what is inside it, as opposed to a prototype that is 'self-documenting' if you take it from a Spring/Guice configuration file. The binary objects saved to disk are pretty opaque.
If you're planning to build a reusable library, you're imposing a pretty strict pattern on your library's users (implementing Serializable on each object, or using transient for fields that are not serializable). In addition, these constraints cannot be checked by the compiler; you have to run the program to see if something is wrong (which might not be visible immediately if an object in the tree is null during the tests). Naturally, I'm comparing it to other prototyping technologies (Guice, for example, had compile-time checking as its main feature; Spring added it later too).
I think that's all that comes to mind for now; I'll add a comment if any new aspect comes up :)
Naturally, I don't know how fast writing an object as bytes is compared to invoking a constructor. The answer should come from mass write/read tests.
But the question is worth thinking.
There are cases where creating a new object using a copy constructor is different from creating one "in the standard way". One example is explained in the Wikipedia link in your question. In that example, creating a new WordOccurrences with the constructor WordOccurrences(text, word) requires heavyweight computation. If we use the copy constructor WordOccurrences(wordOccurrences) instead, we can immediately get the result of that computation (Wikipedia uses a clone method, but the principle is the same).
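A hypothetical sketch of that WordOccurrences contrast (the field layout below is my own guess at the Wikipedia example, not its actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class WordOccurrences {
    private final List<Integer> occurrences = new ArrayList<>();

    // "Standard" constructor: heavyweight, scans the entire text
    public WordOccurrences(String text, String word) {
        int index = text.indexOf(word);
        while (index >= 0) {
            occurrences.add(index);
            index = text.indexOf(word, index + 1);
        }
    }

    // Copy constructor: cheap, reuses the already-computed positions
    public WordOccurrences(WordOccurrences other) {
        this.occurrences.addAll(other.occurrences);
    }

    public List<Integer> getOccurrences() {
        return new ArrayList<>(occurrences);
    }
}
```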
I'd like to save a Java object to a file (for unit testing later on). I have tried JSON, but since I don't own the classes of the objects I'm serializing, deserialization becomes more effort than it is worth. (The getters and setters of the existing classes are overloaded with different types, and Jackson cannot figure out how to deserialize the object.)
Now I'm exploring other avenues (i.e. serializing to binary or some other format). I'm wondering if there is anything out there that can dump a Java object to binary/file so that deserialization is trivial. I understand you can do this with the Serializable interface, but again I don't own these classes which don't implement this interface so they cannot be modified.
You could also use XStream, which does not depend on classes implementing the Serializable interface.
If they implement Serializable, you do not need to own them (or modify them); you can just write them to a file using an ObjectOutputStream and read them back in with an ObjectInputStream.
Assuming the classes have proper getters and setters, you should have no issue doing this.
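As a sketch of that suggestion, using java.util.Date to stand in for a third-party Serializable class you don't own (the file name is arbitrary):

```java
import java.io.*;
import java.util.Date; // stands in for a third-party Serializable class

public class SaveForTest {
    public static void main(String[] args) throws Exception {
        Date original = new Date(0L);

        // Write the object to a file without touching the class itself
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream("snapshot.ser"))) {
            out.writeObject(original);
        }

        // Later, e.g. inside a unit test, read it back
        try (ObjectInputStream in = new ObjectInputStream(
                new FileInputStream("snapshot.ser"))) {
            Date restored = (Date) in.readObject();
            System.out.println(restored.equals(original)); // prints "true"
        }
    }
}
```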
You can also use Kryo. Benchmarks (here and here) say it is one of the fastest to serialize/deserialize and uses less space too. It also doesn't need Serializable to be implemented. That said, I have never used it personally.
I have a question about Java serialization in scenarios where you may need to modify your serializable class and maintain backward compatibility.
I come from deep C# experience, so please allow me to compare Java with .NET.
In my Java scenario, I need to serialize an object with Java's runtime serialization mechanism, and store the binary data in permanent storage to reuse the objects in future. The problem is that, in the future, classes may be subject to changes. Fields may be added or removed.
I don't know Java serialization in depth, except for this fantastic article about how not to program in Java when dealing with serialization.
As I imagine(d), the serialVersionUID plays a key role in Java serialization, and this is where I need your help.
Apart from the article's example (I know it's bad coding), should that field be left unmodified when Eclipse asks to update it after I've modified the class?
I remember from the .NET world that when I add new fields I must mark them with the [OptionalField] attribute to get backward compatibility, so the CLR won't require them in old serialized data. Also, when I need to deprecate a field, I must only remove the public methods and not the private fields.
What are the guidelines for best serialization?
Thank you.
[Add] Here is an example. Suppose I have class Foo
public class Foo {
private String bar;
}
Then I change to:
public class Foo {
private String bar;
private Integer eggs;
}
Is compatibility broken between these two versions? If I deserialize an "oldFoo" with the "newFoo" compiled, does eggs equal null, or is an exception thrown? I prefer the first, obviously!!
Let's say you have a class MyClass and you want to ensure serialization compatibility going forward, or at least make sure that you don't change its serialized form unintentionally. You can use Verify.assertSerializedForm() from GS Collections test utilities in most cases.
Start by writing a test that asserts that your class has a serialVersionUID of 0L and has a serial form that's the empty string.
@Test
public void serialized_form()
{
Verify.assertSerializedForm(
0L,
"",
new MyClass());
}
Run the test. It will fail since the String represents a Base64 encoding and is never empty.
org.junit.ComparisonFailure: Serialization was broken. <Click to see difference>
When you click to see the difference, you'll see the actual Base64 encoding. Paste it inside the empty string.
@Test
public void serialized_form()
{
Verify.assertSerializedForm(
0L,
"rO0ABXNyAC9jYXJhbWVsa2F0YS5zaHVrbmlfZ29lbHZhLkV4ZXJjaXNlOVRlc3QkTXlDbGFzc56U\n"
+ "hVp0q+1aAgAAeHA=",
new MyClass());
}
Re-run the test. It's likely to fail again with an error message like this.
java.lang.AssertionError: serialVersionUID's differ expected:<0> but was:<-7019839295612785318>
Paste the new serialVersionUID into the test in place of 0L.
@Test
public void serialized_form()
{
Verify.assertSerializedForm(
-7019839295612785318L,
"rO0ABXNyAC9jYXJhbWVsa2F0YS5zaHVrbmlfZ29lbHZhLkV4ZXJjaXNlOVRlc3QkTXlDbGFzc56U\n"
+ "hVp0q+1aAgAAeHA=",
new MyClass());
}
The test will now pass until you change the serialized form. If you break the test (change the serialized form) by accident, the first thing to do is check that you've specified the serialVersionUID in the Serializable class. If you leave it out, the JVM generates one for you, and it's quite brittle.
public class MyClass implements Serializable
{
private static final long serialVersionUID = -7019839295612785318L;
}
If the test is still broken, you can try to restore the serialized form by marking new fields as transient, taking full control over the serialized form using writeObject(), etc.
If the test is still broken, you have to decide whether to find and revert your changes which broke serialization or treat your changes as an intentional change to the serialized form.
When you change the serialized form on purpose, you'll need to update the Base64 String to get the test to pass. When you do, it's crucial that you change the serialVersionUID at the same time. It doesn't matter what number you choose, as long as it's a number you've never used for the class before. The convention is to change it to 2L, then 3L, etc. If you're starting from a randomly generated serialVersionUID (like -7019839295612785318L in the example), you should still bump the number to 2L because it's still the 2nd version of the serialized form.
Note: I am a developer on GS Collections.
Java's native serialization support is mainly useful for short term storage or transmission via a network, so instances of an application can communicate with little effort. If you're after longer term storage, I'd suggest you have a look at some XML serialization technique like JAXB.
It's best not to use serialization when you need to keep your data for a long period of time. Try using a database or Protocol Buffers (a way of encoding structured data in an efficient yet extensible format).
If you want to manage the serialized version of the class, you should implement the Externalizable interface and specify how to serialize and deserialize the state of your class. This way, the serialized state can be simpler than the "real" state. For example, a TreeMap object's state is a red-black tree, while its serialized form is just a list of key-value pairs (the tree is re-created when the object is deserialized).
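A small sketch of that idea (SimpleRegistry is a made-up class, not a real API): the serialized form is just the flat entries, and the internal map is re-created in readExternal, much as TreeMap rebuilds its tree:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Made-up example: the serialized form is a flat list of key-value pairs,
// and the in-memory map is rebuilt when the object is read back.
public class SimpleRegistry implements Externalizable {
    private Map<String, String> entries = new HashMap<>();

    public SimpleRegistry() { } // Externalizable requires a public no-arg constructor

    public void put(String key, String value) { entries.put(key, value); }
    public String get(String key) { return entries.get(key); }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int size = in.readInt();
        entries = new HashMap<>(); // re-create the internal structure
        for (int i = 0; i < size; i++) {
            entries.put(in.readUTF(), in.readUTF());
        }
    }
}
```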
However, if your class is simple and it only has some optional fields, you can use the keyword "transient" and make the default serialization ignore it. For example:
public class Foo {
private String bar;
private transient Integer eggs;
}
Unfortunately, I do not have deep knowledge of C#, but based on your words I can conclude that Java serialization is weaker. The serialVersionUID field is optional and helps only if you changed the class's binary signature but have not changed the serializable fields. If you changed the fields, you cannot read a previously serialized object.
The workaround is to implement your own serialization mechanism. Java allows this: implement your own readObject() and writeObject() methods, and make them smart enough to support backward compatibility.
Please see javadoc of java.io.Serializable for more details.
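As a hedged sketch of such a readObject, reusing the Foo/eggs names from the question (the default value of 0 is my own arbitrary choice): defaultReadObject() fills whichever fields the stream actually contains, and the method then repairs anything an older stream left unset:

```java
import java.io.*;

// Sketch: a readObject that tolerates streams written before `eggs` existed.
public class Foo implements Serializable {
    private static final long serialVersionUID = 1L;

    private String bar;
    private Integer eggs; // field added in a later version of the class

    public Integer getEggs() { return eggs; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // reads whichever fields the stream contains
        if (eggs == null) {
            eggs = 0; // supply a default when an old stream had no eggs field
        }
    }
}
```

Because the serialVersionUID is pinned, a stream written before eggs existed still deserializes; eggs simply arrives as null and readObject repairs it.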
If you set the serialVersionUID to a constant (say, 1L), then you can freely add new fields without breaking anything. By leaving the serialVersionUID the same between versions, you are telling the serialization machinery that you know the classes are compatible.
What is the difference between serialization and externalization in Java?
Basically, the difference between Serializable and Externalizable is that for classes which implement Serializable, serialization of the object is taken care of automatically, while a class that implements Externalizable is responsible for serializing itself, without the help of the default serialization procedures.
There is more information provided in the API Specification for the Externalizable interface, and the Serializable interface. From the Externalizable interface documentation:
Only the identity of the class of an Externalizable instance is written in the serialization stream and it is the responsibility of the class to save and restore the contents of its instances. The writeExternal and readExternal methods of the Externalizable interface are implemented by a class to give the class complete control over the format and contents of the stream for an object and its supertypes.
The Discover the secrets of the Java Serialization API article has a discussion on the Externalizable interface in the "Create Your Own Protocol: the Externalizable Interface" section.
I recommend reading an article called Understand When to Serialize v. Externalize Objects in Java that described the differences between serialization and externalization.
First it describes what serialization is:
The serialization of objects in Java allows you to make a byte sequence from any object that has implemented the Serializable interface; it also allows you to turn that byte sequence back into an object.
Next it describes a situation in which externalization might be preferable to serialization:
There might be times when you have special requirements for the serialization of an object. For example, you may have some security-sensitive parts of the object, like passwords, which you do not want to keep and transfer somewhere. Or, it may be worthless to save a particular object referenced from the main object because its value will become worthless after restoring. You can control the process of serialization by implementing the Externalizable interface instead of Serializable. This interface extends the original Serializable interface and adds writeExternal() and readExternal(). These two methods will automatically be called in your object's serialization and deserialization, allowing you to control the whole process.
I recommend reading the entire article, because the excerpts above do not cover the details. The article also contains several code snippets you might find useful.
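As an illustration of the password scenario the article describes (the class and field names here are hypothetical), writeExternal can simply omit the sensitive field so it never reaches the stream:

```java
import java.io.*;

// Hypothetical class: the password is deliberately never written to the stream.
public class Credentials implements Externalizable {
    String username;
    String password; // security-sensitive: must not be persisted

    public Credentials() { } // required by Externalizable

    public Credentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(username); // the password is intentionally omitted
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        username = in.readUTF();
        password = null; // must be obtained again after restoring
    }
}
```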