Deep copy, shallow copy, clone

I need clarification on the differences between deep copy, shallow copy, and clone in Java

Unfortunately, "shallow copy", "deep copy" and "clone" are all rather ill-defined terms.
In the Java context, we first need to make a distinction between "copying a value" and "copying an object".
int a = 1;
int b = a; // copying a value
int[] s = new int[]{42};
int[] t = s; // copying a value (the object reference for the array above)
StringBuffer sb = new StringBuffer("Hi mom");
// copying an object.
StringBuffer sb2 = new StringBuffer(sb);
In short, an assignment of a reference to a variable whose type is a reference type is "copying a value" where the value is the object reference. To copy an object, something needs to use new, either explicitly or under the hood.
Now for "shallow" versus "deep" copying of objects. Shallow copying generally means copying only one level of an object, while deep copying generally means copying more than one level. The problem is in deciding what we mean by a level. Consider this:
public class Example {
public int foo;
public int[] bar;
public Example() { }
public Example(int foo, int[] bar) { this.foo = foo; this.bar = bar; }
}
Example eg1 = new Example(1, new int[]{1, 2});
Example eg2 = ...
The normal interpretation is that a "shallow" copy of eg1 would be a new Example object whose foo equals 1 and whose bar field refers to the same array as in the original; e.g.
Example eg2 = new Example(eg1.foo, eg1.bar);
The normal interpretation of a "deep" copy of eg1 would be a new Example object whose foo equals 1 and whose bar field refers to a copy of the original array; e.g.
Example eg2 = new Example(eg1.foo, Arrays.copyOf(eg1.bar, eg1.bar.length));
(People coming from a C / C++ background might say that a reference assignment produces a shallow copy. However, that's not what we normally mean by shallow copying in the Java context ...)
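The two interpretations above can be checked by mutating the original's array after copying. This is a minimal sketch that reuses the Example class from this answer; the wrapper class CopyDepthDemo and the helpers shallowCopy and deepCopy are names made up for illustration.

```java
import java.util.Arrays;

class CopyDepthDemo {
    static class Example {
        public int foo;
        public int[] bar;
        public Example(int foo, int[] bar) { this.foo = foo; this.bar = bar; }
    }

    // Hypothetical helper: new Example object, same array object.
    static Example shallowCopy(Example e) {
        return new Example(e.foo, e.bar);
    }

    // Hypothetical helper: new Example object, new array with the same contents.
    static Example deepCopy(Example e) {
        return new Example(e.foo, Arrays.copyOf(e.bar, e.bar.length));
    }

    public static void main(String[] args) {
        Example eg1 = new Example(1, new int[]{1, 2});
        Example shallow = shallowCopy(eg1);
        Example deep = deepCopy(eg1);

        eg1.bar[0] = 99; // mutate the original's array

        System.out.println(shallow.bar[0]); // 99: the array is shared
        System.out.println(deep.bar[0]);    // 1: the array was copied
    }
}
```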
Two more questions / areas of uncertainty exist:
How deep is deep? Does it stop at two levels? Three levels? Does it mean the whole graph of connected objects?
What about encapsulated data types; e.g. a String? A String is actually not just one object. In fact, it is an "object" with some scalar fields, and a reference to an array of characters. However, the array of characters is completely hidden by the API. So, when we talk about copying a String, does it make sense to call it a "shallow" copy or a "deep" copy? Or should we just call it a copy?
Finally, clone. Clone is a method that exists on all classes (and arrays) that is generally thought to produce a copy of the target object. However:
The specification of this method deliberately does not say whether this is a shallow or deep copy (assuming that is a meaningful distinction).
In fact, the specification does not even specifically state that clone produces a new object.
Here's what the javadoc says:
"Creates and returns a copy of this object. The precise meaning of "copy" may depend on the class of the object. The general intent is that, for any object x, the expression x.clone() != x will be true, and that the expression x.clone().getClass() == x.getClass() will be true, but these are not absolute requirements. While it is typically the case that x.clone().equals(x) will be true, this is not an absolute requirement."
Note that this is saying that at one extreme the clone might be the target object, and at the other extreme the clone might not equal the original. And this assumes that clone is even supported.
In short, clone potentially means something different for every Java class.
Some people argue (as @supercat does in comments) that the Java clone() method is broken. But I think the correct conclusion is that the concept of clone is broken in the context of OO. AFAIK, it is impossible to develop a unified model of cloning that is consistent and usable across all object types.

The term "clone" is ambiguous (though the Java class library includes a Cloneable interface) and can refer to a deep copy or a shallow copy. Deep and shallow copies are not specifically tied to Java; they are a general concept relating to making a copy of an object, and the terms refer to how the members of an object are also copied.
As an example, let's say you have a person class:
class Person {
String name;
List<String> emailAddresses;
}
How do you clone objects of this class? If you are performing a shallow copy, you might copy name and put a reference to emailAddresses in the new object. But if you modified the contents of the emailAddresses list, you would be modifying the list in both copies (since that's how object references work).
A deep copy would mean that you recursively copy every member, so you would need to create a new List for the new Person, and then copy the contents from the old to the new object.
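As a rough sketch of the two options for the Person class above: the constructor and the methods shallowCopy and deepCopy are not part of the original class and are added here only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PersonCopyDemo {
    static class Person {
        String name;
        List<String> emailAddresses;

        Person(String name, List<String> emailAddresses) {
            this.name = name;
            this.emailAddresses = emailAddresses;
        }

        // Shallow: the new Person refers to the same list object.
        Person shallowCopy() {
            return new Person(name, emailAddresses);
        }

        // Deep: the new Person gets its own list. The String elements are
        // immutable, so they do not need to be copied individually.
        Person deepCopy() {
            return new Person(name, new ArrayList<>(emailAddresses));
        }
    }

    public static void main(String[] args) {
        List<String> emails = new ArrayList<>();
        emails.add("a@example.com");
        Person original = new Person("Ann", emails);
        Person shallow = original.shallowCopy();
        Person deep = original.deepCopy();

        original.emailAddresses.add("b@example.com");

        System.out.println(shallow.emailAddresses.size()); // 2: list is shared
        System.out.println(deep.emailAddresses.size());    // 1: list was copied
    }
}
```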
Although the above example is trivial, the differences between deep and shallow copies are significant and have a major impact on any application, especially if you are trying to devise a generic clone method in advance, without knowing how someone might use it later. There are times when you need deep or shallow semantics, or some hybrid where you deep copy some members but not others.

Deep copy: Clone this object and every reference to every other object it has
Shallow copy: Clone this object and keep its references
Object clone() throws CloneNotSupportedException: It is not specified whether this should return a deep or shallow copy, but at the very least: o.clone() != o

The terms "shallow copy" and "deep copy" are a bit vague; I would suggest using the terms "memberwise clone" and what I would call a "semantic clone". A "memberwise clone" of an object is a new object of the same run-time type as the original in which, for every field, the system effectively performs "newObject.field = oldObject.field". The base Object.clone() performs a memberwise clone; memberwise cloning is generally the right starting point for cloning an object, but in most cases some "fixup work" will be required following a memberwise clone. In many cases attempting to use an object produced via memberwise clone without first performing the necessary fixup will cause bad things to happen, including the corruption of the object that was cloned and possibly other objects as well. Some people use the term "shallow cloning" to refer to memberwise cloning, but that's not the only use of the term.
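A small sketch of a memberwise clone followed by fixup work. IntStack is a hypothetical class invented for this example; the pattern of calling super.clone() and then replacing the array field is the part being illustrated.

```java
class MemberwiseCloneDemo {
    // Hypothetical class: a small fixed-capacity stack of ints.
    static class IntStack implements Cloneable {
        private int[] items = new int[8];
        private int size;

        void push(int value) { items[size++] = value; }
        int pop() { return items[--size]; }

        @Override
        public IntStack clone() {
            try {
                // Object.clone() copies every field as-is, so at this point
                // copy.items is the same array object as this.items.
                IntStack copy = (IntStack) super.clone();
                // Fixup: give the clone its own backing array.
                copy.items = items.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we implement Cloneable
            }
        }
    }

    public static void main(String[] args) {
        IntStack a = new IntStack();
        a.push(1);
        IntStack b = a.clone();
        b.pop();
        b.push(2); // without the fixup this would overwrite a's element
        System.out.println(a.pop()); // 1
    }
}
```

Without the fixup line, both stacks would write into the same array, which is the kind of corruption of the original object described above.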
A "semantic clone" is an object which contains the same data as the original, from the point of view of the type. For example, consider a BigList<T> which contains an Array<Array<T>> and a count. A semantic-level clone of such an object would perform a memberwise clone, then replace the Array<Array<T>> with a new array, create new nested arrays, and copy all of the T's from the original arrays to the new ones. It would not attempt any sort of deep-cloning of the T's themselves. Ironically, some people refer to this kind of cloning as "shallow cloning", while others call it "deep cloning". Not exactly useful terminology.
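To make the idea of a semantic-level clone concrete, here is a sketch using a hypothetical ChunkedList<T> (not the BigList from the text; nested lists stand in for nested arrays to keep the generics simple). The copy duplicates the container structure but shares the T instances.

```java
import java.util.ArrayList;
import java.util.List;

class SemanticCloneDemo {
    // Hypothetical container that stores its elements in chunks of two.
    static class ChunkedList<T> {
        private final List<List<T>> chunks = new ArrayList<>();
        private int count;

        void add(T item) {
            if (chunks.isEmpty() || chunks.get(chunks.size() - 1).size() == 2) {
                chunks.add(new ArrayList<>()); // start a new chunk
            }
            chunks.get(chunks.size() - 1).add(item);
            count++;
        }

        T get(int index) { return chunks.get(index / 2).get(index % 2); }
        int size() { return count; }

        // Copies the outer and inner lists, but not the T instances themselves.
        ChunkedList<T> semanticCopy() {
            ChunkedList<T> copy = new ChunkedList<>();
            for (List<T> chunk : chunks) {
                copy.chunks.add(new ArrayList<>(chunk));
            }
            copy.count = count;
            return copy;
        }
    }

    public static void main(String[] args) {
        ChunkedList<StringBuilder> a = new ChunkedList<>();
        a.add(new StringBuilder("x"));
        ChunkedList<StringBuilder> b = a.semanticCopy();
        b.add(new StringBuilder("y"));

        System.out.println(a.size());             // 1: the structure is independent
        System.out.println(a.get(0) == b.get(0)); // true: the elements are shared
    }
}
```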
While there are cases where truly deep cloning (recursively copying all mutable types) is useful, it should only be performed by types whose constituents are designed for such an architecture. In many cases, truly deep cloning is excessive, and it may interfere with situations where what's needed is in fact an object whose visible contents refer to the same objects as another (i.e. a semantic-level copy). In cases where the visible contents of an object are recursively derived from other objects, a semantic-level clone would imply a recursive deep clone, but in cases where the visible contents are just some generic type, code shouldn't blindly deep-clone everything that looks like it might possibly be deep-clone-able.

Related

Different types of Shallow copy [SOLVED: aliasing vs copying]

This question is not about explaining how shallow/deep work, but examples of shallow copy.
I am making notes, and I wonder if these two examples both 'fall under the roof' of shallow copies. Yes, they are a little different, but are they both so called 'shallow copies'? Code:
ArrayList<String> colors = new ArrayList<>();
colors.add("Red");
colors.add("Blue");
ArrayList<String> shallowOne = new ArrayList<>(colors);
ArrayList<String> shallowTwo = colors;
ShallowOne has reference to elements inside object colors.
ShallowTwo has direct reference to object colors.
So, are they both using famous concept of Shallow copy? Or..terminology is different for them?

The second example you've given is not an example of shallow copying. In fact, no copy has occurred: this is called aliasing.
There are some languages, however, where this isn't necessarily true. In Rust, for instance, a shallow/deep copy is made on assignment if the datatype has been marked as Copy, and a move assignment happens otherwise (where the old name is no longer usable, so no aliasing happens here either).
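For the Java code in the question, the difference between the copy and the alias can be observed by modifying colors afterwards. A minimal sketch; the class name AliasVsCopyDemo is made up, and the variable names are taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

class AliasVsCopyDemo {
    public static void main(String[] args) {
        List<String> colors = new ArrayList<>();
        colors.add("Red");
        colors.add("Blue");

        List<String> shallowOne = new ArrayList<>(colors); // new list, same elements
        List<String> shallowTwo = colors;                  // alias: same list object

        colors.add("Green");

        System.out.println(shallowOne.size());                  // 2
        System.out.println(shallowTwo.size());                  // 3
        System.out.println(shallowTwo == colors);               // true
        System.out.println(shallowOne.get(0) == colors.get(0)); // true
    }
}
```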

Is shallow copy really needed?

I am currently working on a graph library for Java. As you would expect, there is a Vertex class. That class contains an object of type VertexData<T>, and that itself can contain anything.
(I know this might be redundant and I could just do Vertex<T>, but for the purpose of the question it doesn't matter.)
I made VertexData<T> implement Cloneable and have a public VertexData<T> clone() method returning a deep copy by serializing and deserializing the calling Object, pretty much as described here
Now the question is: since I do have a deep copy method, does it make sense to also have a shallow copy? If so, what is a case where a shallow copy would be preferred over a deep copy?
UPDATE: Since most of the answers and comments include some explanation of what a shallow copy is in one way or another, I feel that I have to clarify a bit. I do know what a shallow copy is, how it works, everything. My question is: since it is a library that I am developing, and since I have created a deep copy method, does it make sense to also make available a method for shallow copy?
I will add here too that there are no primitive types contained in VertexData<T> class.
So in the context of a container class used to store Vertex data for a graph library, will a shallow copy ever be needed?
If so, can you think of an example, within the context of what I am developing?
If not, should I add a shallow copy method only for the sake of completeness?
Is that a good practice or does it not matter?
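For readers unfamiliar with the serialization approach mentioned in the question, it usually looks something like the sketch below. This is not the asker's code: the class SerializationCopyDemo and the helper deepCopy are assumed names, and the approach only works when every object in the graph is Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class SerializationCopyDemo {
    // Hypothetical helper: deep-copies a Serializable object graph by
    // writing it to a byte array and reading it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not copy via serialization", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("v1"));

        ArrayList<StringBuilder> copy = deepCopy(original);
        copy.get(0).append("-changed");

        System.out.println(original.get(0)); // v1: the element was copied too
    }
}
```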

A container type like List<Point> may in some cases be used to hold a bunch of X,Y coordinate pairs, but in other cases may be used to identify a bunch of movable points which are used by other code. The former case may be subdivided into subcases where the owner of the List<Point> is also the exclusive owner of the Point instances therein and may modify them at will, or where the owner will never modify those instances but may share references to them with code that promises not to modify them either.
If the List<Point> is used to encapsulate (X,Y) coordinate pairs, but the owner might modify the Point objects held therein, then a proper clone of the List<Point> must hold references to copies of the Point objects in question. If it encapsulates coordinate pairs, but nobody will ever modify the objects therein (and recipients of a cloned list wouldn't expose references to the objects therein to any code that might modify them) then a proper clone of the List<Point> could hold references to either the original Point objects or copies thereof; the former would be faster, but the latter would still be semantically correct.
If the List<Point> serves to identify Point instances which may be modified by other code, and any such modification needs to be reflected in the List<Point> itself, then a proper clone must hold references to the same Point objects as the original list. If a clone were to instead hold copies of those Point objects, then it would no longer hold the same semantic information as the original list.
If Java had segregated collection types based upon whether they encapsulate value using exclusively owned mutable instances or shareable immutable instances, or whether they serve to identify the things therein, then it would be possible to have a single concept of "cloning", rather than requiring "deep" and "shallow" cloning. Without such a distinction between collection types, however, it's necessary to have cloning methods which can do whatever will be needed based upon the things in the collection.
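The two kinds of "proper clone" described in this answer might be sketched as follows. The Point class here is a hypothetical mutable type (not java.awt.Point), and cloneAsValues and cloneAsIdentities are made-up names for the two strategies.

```java
import java.util.ArrayList;
import java.util.List;

class PointListCloneDemo {
    // Hypothetical mutable point type.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // For a list that encapsulates coordinate values: copy each Point.
    static List<Point> cloneAsValues(List<Point> source) {
        List<Point> result = new ArrayList<>();
        for (Point p : source) {
            result.add(new Point(p.x, p.y));
        }
        return result;
    }

    // For a list that identifies particular Point instances: share them.
    static List<Point> cloneAsIdentities(List<Point> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        List<Point> points = new ArrayList<>();
        points.add(new Point(1, 2));

        List<Point> values = cloneAsValues(points);
        List<Point> identities = cloneAsIdentities(points);

        points.get(0).x = 10; // some other code moves the point

        System.out.println(values.get(0).x);     // 1: a snapshot of the coordinates
        System.out.println(identities.get(0).x); // 10: still tracks the same point
    }
}
```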

It really comes down to requirements. Knowing your object has more than primitive fields alone, it should (and thankfully does) have a deep copy. There is no "hard and fast rule" about whether to use shallow or deep. Since it is "based on requirement", it would be safe to provide both, as @RyanJ points out in a comment to another answer.
If you make a shallow copy of your collection or object and then change an attribute of a shared element, the change is visible through both the original and the copy. On the other hand, if you want to be able to change values in the original or in the copy WITHOUT affecting the other, a deep copy is all you need. It all comes down to requirements and what you need your object/system to do. My final recommendation is to provide both.

Yes, it is required in a few cases. You can infer the requirement from the following points.
If the object has only primitive fields, then you should go for a shallow copy.
If the object has references to other objects, then, based on the requirement, you should consider either a shallow copy or a deep copy.
If the referenced objects are not modified, then a deep copy is not required; here you should go for a shallow copy.
If the referenced objects are modified, then a deep copy is preferred.
Shallow copy:
A shallow copy can lead to unwanted effects if the shared elements are changed through another reference.
Deep copy:
With a deep copy, changes to the data that one reference refers to will not result in changes to the data the other refers to.
You can refer to this link to understand more about this with examples.

You will not need a shallow copy. A shallow copy will just assign a new reference variable to your already existing object in memory. An '=' operator will do the work. For more details, please go through this post - In Java, what is a shallow copy?

"There is no deep copy in java," but does it matter for list of Strings?

Let's say I do:
public <E> List<E> gimmeAList(List<E> original) {
return new ArrayList<E>(original); // this is a shallow memcopy.
}
I try to modify the new list, but since Strings (like any immutable objects) cannot be changed, doing something like: gimmeAList(something).get(0).replace("a", "b"); doesn't ACTUALLY seem to change anything in either list.
So in this case, am I allowed to just assume (for immutable objects in a list) that new ArrayList<E>(original) is guaranteed to be, for all intents and purposes, basically a deep copy?

Yes, absolutely - copying a collection of immutable objects is effectively deep, unless you do something crazy like synchronizing on the references or comparing them for reference equality (i.e. operations which depend on object identity).
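A quick sketch of why this holds for Strings: replace returns a new String rather than modifying the element, so neither list can observe a change. The class name ImmutableElementsDemo is made up; gimmeAList mirrors the method from the question.

```java
import java.util.ArrayList;
import java.util.List;

class ImmutableElementsDemo {
    static <E> List<E> gimmeAList(List<E> original) {
        return new ArrayList<>(original); // new list, same element references
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>();
        original.add("banana");
        List<String> copy = gimmeAList(original);

        // replace() builds a new String; the shared element is untouched.
        String replaced = copy.get(0).replace("a", "b");
        System.out.println(replaced);        // bbnbnb
        System.out.println(copy.get(0));     // banana
        System.out.println(original.get(0)); // banana

        // Replacing an element changes only the list it is called on.
        copy.set(0, "cherry");
        System.out.println(original.get(0)); // banana
    }
}
```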

Copies of immutable objects are practically (or effectively) deep. However, it matters if the type E is mutable. But if it were so, you could call clone() on each Cloneable E and then you have a "deep copy". I believe your original premise "there is no deep copy in java" is incorrect, assuming you meant there is no way to create a "deep copy".

Object references in Java may be used to encapsulate identity, mutable state, both, or neither, in addition to encapsulating immutable aspects of state other than identity. Making a proper copy of an object which contains references to other objects requires knowing what is encapsulated by those references. Assuming foo is an object with reference-type field bar, and fooCopy is supposed to be a copy of foo...
If foo.bar is used to encapsulate the identity of an object, but not mutable state, then fooCopy.bar should identify the same object.
If foo.bar is used to encapsulate an object's mutable state, but not its identity, then fooCopy.bar should identify a newly-constructed copy of the object identified by foo.bar.
If foo.bar is used to encapsulate both identity and mutable state, then it will generally not be possible to make a proper copy of foo in isolation. If foo is part of a large forest of interconnected objects, it may be possible to construct a new forest of objects whose interconnections are isomorphic to those of the original, and have fooCopy.bar identify the object within that forest that corresponds to foo.bar's place in the original, but that's not apt to be simple.
If foo.bar encapsulates neither identity nor mutable state, then fooCopy.bar may legitimately identify either the same object as foo.bar, or a legitimate copy of the object identified by foo.bar.
Since string references are generally used to encapsulate neither identity nor mutable state, the last scenario applies; one may either copy the string reference or store a reference to a new copy of the underlying string, whichever is more convenient.

what is the need of cloning a object in Java

I was reading about the cloning in Java, how to make shallow/deep copies of object etc.
I was wondering why I would need to create object clones in Java. Any real-world examples would be helpful in understanding.

Having a cloned copy of something means you can have "before" and "after" versions. You can leave the original alone while you test something out with a copy. You can provide undo by simply reverting to the original version.

Quite often you want to use immutable objects, in which case cloning is an essential part of your code. If for example you have an immutable object that has a list or array type field, your getter should always return a clone of the list or array to preserve immutability.
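A sketch of the getter advice above, using a hypothetical immutable Route class with a list-typed field. Copying in the constructor as well is an addition of this sketch, made so that the caller's list cannot be used to change the object later.

```java
import java.util.ArrayList;
import java.util.List;

class DefensiveCopyDemo {
    // Hypothetical immutable class with a list-typed field.
    static final class Route {
        private final List<String> stops;

        Route(List<String> stops) {
            this.stops = new ArrayList<>(stops); // copy on the way in
        }

        List<String> getStops() {
            return new ArrayList<>(stops); // copy on the way out
        }
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("A");
        Route route = new Route(input);

        input.add("B");            // does not affect the route
        route.getStops().add("C"); // modifies only the returned copy

        System.out.println(route.getStops()); // [A]
    }
}
```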
The other typical use case is when you want "transactional" modifications, when you call several state changing methods but only want the result to be visible if all of them are successful.
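One possible shape for such "transactional" modifications is to apply every change to a copy and swap it in only if all steps succeed. The Account class and its overdraft rule below are invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TransactionalUpdateDemo {
    // Hypothetical class: a ledger whose running balance may never go negative.
    static class Account {
        private List<Integer> ledger = new ArrayList<>();

        // Applies all amounts, or none of them if any step fails.
        void applyAll(List<Integer> amounts) {
            List<Integer> working = new ArrayList<>(ledger); // work on a copy
            int balance = 0;
            for (int entry : working) {
                balance += entry;
            }
            for (int amount : amounts) {
                balance += amount;
                if (balance < 0) {
                    throw new IllegalArgumentException("overdraft");
                }
                working.add(amount);
            }
            ledger = working; // publish only after every step succeeded
        }

        int entries() { return ledger.size(); }
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.applyAll(List.of(10, -5));
        try {
            account.applyAll(List.of(5, -100)); // second step fails
        } catch (IllegalArgumentException e) {
            System.out.println("rolled back");
        }
        System.out.println(account.entries()); // 2: the failed batch left no trace
    }
}
```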

A concrete example of cloning is the prototype design pattern.

As the word itself suggests, cloning means making a duplicate of something, so in Java, cloning an object means creating another object that is the same as an existing one.
When do we clone? When creating a new object from scratch every time is time-consuming, or when we need a new object that is the same as, or only slightly different from, an already created object.
Cloning is of 3 types in Java:
Shallow copy
Deep copy
Cloning
Shallow copy
Shallow copy is the process in which the state of the object is copied to another object, but both objects point to the same referenced objects in the heap.
Deep copy
In a deep copy, two separate objects are created, and each field of one object is copied to the other.
The third category, intended to overcome this difficulty in Java, is the concept of cloning.
Cloning in Java is done by implementing the Cloneable interface. Cloneable is a marker interface.
For more in-depth knowledge on cloning, refer to: Cloning in java

You may use a deep cloned copy of your object because you may need a partial result in some method which you would like to use later.

As a way to help preserve encapsulation (and therefore make your code more robust), you could clone objects before returning them from a getter. For example, a getDate method might clone a date field before returning it to the caller.

What's the most elegant workaround for inability to pass by reference in Java?

I have deep nested structures, and methods like "remove(<Something>)", "contains(<Something>)" etc. rely on access to the original reference in order to remove it etc. (a copy of the reference won't do).
How have people best worked around this so they can still conveniently add, remove, test for etc. the object they want, within different arbitrary methods and constructors etc., without adding any unnecessary complexity or any unnecessary loss in performance?

Methods like remove and contains work fine with pass by value. The reason is that even though the references are copied, the copy has the same value as the original. Just look at the Collections API. If you do something like (pseudocode)
List list = new List();
list.add(object1) // assume you have an object1 reference
and then you do
myRemove(list, object1);
both list and object1 are passed by value, so in the myRemove method they are copies of the original references. If in myRemove you do
list.remove(object1);
the object is still removed from the list no problem. Furthermore, since the list and object1 references in both scopes point to the same underlying objects, the list reference in the calling scope refers to the list that has the object removed.
The only time you would need pass by reference semantics is if you want to modify a reference in a method and have the modification apply in the scope that called the method.
So if you want to do
List myList = new List();
changeList(myList);
and have changeList change where myList points in the calling scope, it won't work without some trickery. The trickery is called double indirection. Basically, you create an object that has an instance of what you want to be able to access with pass by reference semantics, and you pass the container object around.
So
class MyContainer {
List list;
}
Now you can pass an instance of MyContainer into a method and change the list value, and where the list points will also be changed in the calling scope. Note that you are not doing anything special here; everything is still pass by value.
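Putting the pieces of this answer together, a runnable sketch could look like this. MyContainer follows the class above (with a type parameter added to the list); reassignParameter and the demo class name HolderDemo are made up to show the contrast.

```java
import java.util.ArrayList;
import java.util.List;

class HolderDemo {
    static class MyContainer {
        List<String> list;
    }

    // Reassigning the parameter only changes the method's local copy
    // of the reference; the caller's variable is untouched.
    static void reassignParameter(List<String> list) {
        list = new ArrayList<>();
        list.add("lost");
    }

    // Reassigning a field of the container is visible to the caller,
    // because both sides refer to the same container object.
    static void changeList(MyContainer container) {
        container.list = new ArrayList<>();
        container.list.add("replaced");
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>();
        myList.add("original");

        reassignParameter(myList);
        System.out.println(myList); // [original]

        MyContainer container = new MyContainer();
        container.list = myList;
        changeList(container);
        System.out.println(container.list); // [replaced]
    }
}
```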

How have people best worked around this so...
By use of member fields (for working with references, not copies) and by use of inheritance and interfaces (for handling nested structures).
