Good evening.
I have a rather involved question. To practice Java, I've been re-implementing some of the data structures in the standard library. Stacks, LinkedLists, Trees, etc. I just established, through a very simple example, that the java.util.Stack class performs a deep copy when either the peek() or pop() methods are used. This is understandable, since the goal would be to protect the contents of the class from outside interference. So far, in my own implementation of the Stack (a naive implementation with a simple array, linked lists will come later), I have not cared for this at all:
public class ArrayStack<T> implements Stack<T> {
private T[] data; // Will expand the array when stack is full.
private int top; // serves as both top and count indicator.
...
...
@Override
public T pop() throws EmptyStackException {
if(top == -1)
throw new EmptyStackException("Stack is empty.");
return data[top--]; // Shallow copy, dangerous!
}
Unfortunately, since a generic type cannot be instantiated, I cannot assume a copy constructor and do stuff like return new T(data[top--]);. I've been looking around on S.O. and I've found two relevant threads which attempt to solve the problem by using some variant of clone(). This thread suggests that the class's signature be extended to:
public class ArrayStack<T extends DeepCloneableClass> implements Stack<T>
...
where DeepCloneableClass is a class that implements an interface that allows for "deep cloning" (see the top response in that thread for the relevant details). The problem with this method, of course, is that I can't really expect standard classes such as String or Integer to extend that custom class of mine, and, of course, all my existing jUnit tests are now complaining at compile time, since they depend on such Stacks of Integers and Strings. So I don't feel this solution is viable.
This thread suggests the use of a third-party library for cloning pretty much any object. While it appears that this library is still supported (latest bug fixes date from less than a month ago), I would rather not rely on third-party tools and use whatever Java can provide for me. The reason for this is that the source code for these ADTs might be someday shared with undergraduate college students, and I would rather not have them burdened with installing extra tools.
I am therefore looking for a simple and, if possible, efficient way to maintain a generic Java data structure's inner integrity while still allowing for a simple interface to methods such as pop(), peek(), popFront(), etc.
Thanks very much for any help!
Jason
Why do you need to clone the objects?
Your stack just has a collection of references. You probably don't need to clone them, just make a new array and put the appropriate references in it, then throw away the old array.
Integer, String, etc. are all immutable, so their contents are safe by design.
As for custom objects, while experienced Java programmers will certainly have mixed feelings about it, implementing a custom interface is one way to approach the problem.
Another one is to make <T extends Serializable> (which is implemented by Integer, String, etc) and "clone" through serialization.
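For illustration, here is a minimal sketch of that serialization approach (the DeepCopy class and deepCopy method names are mine, not from any library); it only works when T and everything it references are Serializable, and it is considerably slower than just handing out the reference:
import java.io.*;

public final class DeepCopy {

    // Serialize the object to an in-memory byte array and read it back,
    // producing a completely independent copy of the whole object graph.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Deep copy via serialization failed", e);
        }
    }
}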
But if you want to teach your students the "right way", I would definitely use a third-party library... You can just create a lib folder in your project and configure your build tool / IDE to add the needed jars to the classpath using relative paths, so your undergraduate students will not have to install or set up anything.
Just for reference, this question may be very useful.
I've been teaching Java introductory classes (as an IT Instructor / not as a college Professor) using this kind of approach, and it is way less painful than it sounds.
The comments helped me understand what I had wrong. I was using the following example to "prove" to myself and others that the Java Standard Library's Collections do a deep copy when providing references to objects in the Collection:
import java.util.Stack;
public class StackTestDeepCopy {
public static void main(String[] args){
Stack<String> st = new Stack<String>();
st.push("Jim");
st.push("Jill");
String top = st.peek();
top = "Jack";
System.out.println(st);
}
}
When printing st, I saw that the objects were unchanged, and concluded that a deep copy had taken place. Wrong! Strings are immutable, and the assignment top = "Jack" does not modify the String in any way (no object would be "modified" by a plain assignment like that, but I wasn't thinking straight); it just makes the local reference top point to a different String object. A new example, involving an actually mutable class, made me understand the error in my ways.
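For anyone making the same mistake, here is a small sketch with a mutable element type (StringBuilder); it shows that peek() hands back a reference to the very object stored in the stack, so mutating it is visible through the stack as well:
import java.util.Stack;

public class StackShallowDemo {
    public static void main(String[] args) {
        Stack<StringBuilder> st = new Stack<StringBuilder>();
        st.push(new StringBuilder("Jim"));

        StringBuilder top = st.peek(); // same object that the stack holds
        top.append(" Jones");          // mutate it in place

        System.out.println(st);        // prints [Jim Jones] -- no deep copy was made
    }
}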
Now that this problem has been solved, I'm quite baffled by the fact that the standard library allows for this. Why is it that accessing elements in the standard library is implemented as a shallow copy? Sounds very unsafe.
java.util.Stack doesn't do a deep copy:
import java.util.Stack;
public class Test {
String foo;
public static void main(String[] args) {
Test test = new Test();
test.foo = "bar";
Stack<Test> stack = new Stack<Test>();
stack.push(test);
Test otherTest = stack.pop();
otherTest.foo = "wibble";
System.out.println("Are the same object: "+(test.foo == otherTest.foo));
}
}
Results in:
Are the same object: true
If it did do a copy then test and otherTest would point to a different object. A typical stack implementation simply returns a reference to the same object that was added onto the stack, not a copy.
You probably also want to set the array item to null before returning, otherwise the array will still hold a reference to the object.
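A sketch of what that could look like in the ArrayStack from the question (it still returns the same reference, it just stops the internal array from pinning the object):
@Override
public T pop() throws EmptyStackException {
    if (top == -1)
        throw new EmptyStackException("Stack is empty.");
    T result = data[top];
    data[top] = null; // drop the stack's reference so the object can be garbage collected
    top--;
    return result;
}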
Related
Consider this database model:
Book
isbn primary key
title
In a RDBMS, the database makes sure that two identical rows don't exist for the above model.
Similarly, in Java consider this object model:
Book
- isbn: int
- title: String
+ Book(isbn)
Let's say we are creating a Book object:
Book b = new Book(123456);
Later, in some other part of the code we are creating again an identical Book object:
Book c = new Book(123456);
Can Java make sure that no two objects exist in the JVM heap if they are identical? Just like a RDBMS does?
There's no built-in mechanism in Java that automatically does this for you. You could build something for this, but probably shouldn't. And if you do, then probably not in the way that you show in your question.
First: let's assume that these objects are immutable, so the problem is reduced to "let no two objects be constructed that have the same attributes". This is not a necessary restriction, but this way I can already demonstrate the issues with this approach.
The first issue is that it requires you to keep track of each Book instance in your program in a single central place. You can do that quite easily by having a collection that you fill when an object is constructed.
However, this basically builds a massive memory leak into your program because if nothing else hangs on to this Book object, that collection still will reference it, preventing it from being garbage collected.
You can work around that issue by using WeakReference objects to hold on to your Book objects (see the sketch after the BookSession example below).
Next, if you want to avoid duplicates, you almost certainly want a way to fetch the "original" instance of a Book if you can't create a new one. You can't do that if you simply use the constructor, since the constructor can't "return another object", it will always create and return a new object.
So instead of new Book(12345) you want something like BookFactory.getOrCreateBook(12345). That factory can then either fetch the existing Book object with the given id or create a new one, as required.
One way to make the memory leak issue easier to handle (and also to potentially allow multiple parallel sessions, each with their own set of unique Book objects) is to make the BookFactory a BookSession: i.e. you instantiate one and it keeps track of its books. Now that BookSession is the "root" of all Books, and if it no longer gets referenced, it (and all the books it created) can potentially be garbage collected.
All of this doesn't even get into thread safety which is solvable reasonably easily for immutable objects but can get quite convoluted if you want to allow modifications while still maintaining uniqueness.
A simple BookSession could look a little like this (note that I use a record for Book only for brevity of this sample code; this would leave the constructor visible. In "real" code I'd use an equivalent normal class where the constructor isn't accessible to others):
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
record Book(int isbn, String title) {}
class BookSession {
private final ConcurrentHashMap<Integer, Book> books = new ConcurrentHashMap<>();
public Optional<Book> get(int isbn) {
return Optional.ofNullable(books.get(isbn));
}
public Book getOrCreate(int isbn, String title) {
return books.computeIfAbsent(isbn, (i) -> new Book(i, title));
}
}
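A quick usage sketch (the title literal is just an example): two calls with the same isbn hand back the same instance, not merely an equal one:
BookSession session = new BookSession();
Book first = session.getOrCreate(123456, "Some Title");
Book second = session.getOrCreate(123456, "Some Title");
System.out.println(first == second); // true -- the very same object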
You can easily add other methods to the session (such as findByTitle or something like that).
And if you only ever want a single BookSession, you could even have a public static final BookSession BOOKS somewhere, if you wanted (but at that point you have re-created the memory leak).
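Coming back to the WeakReference idea from above, here is a rough sketch (the class and method names are mine) that stores the values weakly, so a Book that nobody else references can still be garbage collected:
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class WeakBookSession {
    private final Map<Integer, WeakReference<Book>> books = new HashMap<>();

    // Returns the existing Book for this isbn if it is still reachable, otherwise
    // creates a new one and remembers it weakly. Stale entries (whose referent was
    // collected) are simply overwritten on the next call; a ReferenceQueue could be
    // used to clean them up eagerly.
    synchronized Book getOrCreate(int isbn, String title) {
        WeakReference<Book> ref = books.get(isbn);
        Book book = (ref == null) ? null : ref.get();
        if (book == null) {
            book = new Book(isbn, title);
            books.put(isbn, new WeakReference<>(book));
        }
        return book;
    }
}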
I do not know of a JVM internals specific way of doing this, but it is not that hard to achieve the basic goal. Joachim Sauer's answer goes into depth on why this might not be the greatest idea without some additional forethought :)
If you forgo thread safety, the code basically comes down to a private constructor plus a factory method that keeps tabs on the created objects.
A rough Java sketch follows:
import java.util.HashMap;
import java.util.Map;

public class Book {
    // potential memory leak, see Joachim Sauer's answer (WeakReference)
    private static final Map<String, Book> created = new HashMap<>();
    // other internal fields follow

    // can only be invoked from the factory method
    private Book(String isbn) { /* internals */ }

    public static Book get(String isbn) {
        Book existing = created.get(isbn);
        if (existing != null) return existing;
        Book b = new Book(isbn);
        created.put(isbn, b);
        return b;
    }
}
Converting this to a thread-safe implementation is just a matter of adding some details* and is another question. Avoiding the potential memory leak means reading up on weak references.
* i.e. locks (synchronized), mutexes, Concurrent*, Atomic*, etc.
Neither of the other answers is technically correct.
They will often work, but in situations where multiple ClassLoaders are in play they will both fail.
Any object instance can ever only be unique within the context of a specific ClassLoader, thus 2 instances of the same Book can exist, even if you guard against multiples being created within a specific ClassLoader.
Usually this won't be a problem as many (especially simpler) programs will never have to deal with multiple ClassLoaders existing at the same time.
There is btw no real way to protect against this.
For example, some method has the following implementation:
void setExcludedCategories(List<Long> excludedCategories) {
if (excludedCategories.contains(1L)) {
excludedCategories.remove(1L);
}
}
And it's called like this:
setExcludedCategories(Arrays.asList(1L, 2L, 3L));
Of course, this leads to a java.lang.UnsupportedOperationException when it tries to remove the item.
The question: how can I modify this code to be sure that the input parameter excludedCategories supports remove?
UPD:
Thanks for answers. Let's summarize results:
1. Always create a new ArrayList from the input list to be sure it's mutable - a lot of useless memory would be used -> NO.
2. Catch the UnsupportedOperationException.
3. Specify in the JavaDoc that a caller mustn't pass an immutable list - but who reads the JavaDoc? Only when something doesn't work :)
4. Don't use Arrays.asList() in the caller's code - that's an option if you are the owner of that code, but you still need to know whether this particular method accepts an immutable list or not (see 3).
It seems the second variant is the only way to resolve this problem.
How can I modify this code to be sure that the input parameter excludedCategories supports remove?
In the general case, you can't. Given an arbitrary class that implements the List API, you cannot tell (statically or dynamically) if the optional methods are supported.
You can use instanceof tests to check whether the class of the list is known to implement the method or not. For example ArrayList and LinkedList do, but Collections.UnmodifiableList does not. The problem is that your code could encounter list classes that your tests don't cover. (Especially if it is a library that is intended to be reusable in other people's applications.)
You could also try to test the behavior of previously unknown classes; e.g. create a test instance, try a remove to see what happens, and record the behavior in a Map<Class, Boolean>. There are two problems with this:
You may not be able to (correctly) instantiate the list class to test it.
The behavior could depend on how you instantiate the class (e.g. constructor parameters) or even on the nature of the element you are trying to remove ... though the latter is pushing the boundary of plausibility.
In fact, the only completely reliable approach is to call the method and catch the exception (if it is thrown) each and every time.
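A sketch of that "try it and catch" approach (the class and field names here are mine, purely for illustration):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExcludedCategoriesDemo {

    private List<Long> excludedCategories;

    // Attempt the removal directly; fall back to a mutable copy if the
    // caller handed us a fixed-size or unmodifiable list.
    void setExcludedCategories(List<Long> categories) {
        try {
            categories.remove(1L); // remove(Object); simply returns false if 1L is absent
            this.excludedCategories = categories;
        } catch (UnsupportedOperationException e) {
            List<Long> copy = new ArrayList<>(categories);
            copy.remove(1L);
            this.excludedCategories = copy;
        }
    }

    public static void main(String[] args) {
        ExcludedCategoriesDemo demo = new ExcludedCategoriesDemo();
        demo.setExcludedCategories(Arrays.asList(1L, 2L, 3L)); // no exception escapes
        System.out.println(demo.excludedCategories);           // [2, 3]
    }
}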
In short, you can't know. If an object implements an interface (such as List) you can't know whether it will actually do what is expected for all of the methods. For instance, Collections.unmodifiableList() returns a List that throws UnsupportedOperationException from its mutating methods. It can't be filtered out via the method signature if you want to be able to accept other List implementations.
The best you can do is to throw IllegalArgumentException for known subtypes that don't support what you want. And catch UnsupportedOperationException for other types of cases. But really you should javadoc your method with what is required and that it throws IllegalArgumentException in other cases.
That depends somewhat on what you're trying to do. In your posted example for example you could just catch the UnsupportedOperationException and do something else instead.
This assumes that non-mutable containers throw that exception on every attempt to modify the container and do so without side effects (that is, they are indeed non-mutable).
In cases where your code has side effects other than modifying the container, you will have to make sure those don't happen before you know that you can modify the container.
You can catch the exception in a utility class like in the example below (as others mentioned). The bad thing is that you have to do a remove/insert to test whether the exception will be thrown. You cannot use instanceof, since all the Collections.UnmodifiableXxx classes have default (package-private) access.
CollectionUtils:
import java.util.List;
public class CollectionUtils {
// Probes mutability by removing the first element and putting it back.
// Assumes the list is non-empty and that a list which supports remove()
// also supports add(); otherwise the probe itself could lose an element.
public <T> boolean isUnmodifiableList(List<T> listToCheck) {
T object = listToCheck.get(0);
try {
listToCheck.remove(object);
} catch (UnsupportedOperationException unsupportedOperationException) {
return true;
}
listToCheck.add(0, object);
return false;
}
}
Main:
import java.util.Arrays;
import java.util.List;
public class Main {
private static final CollectionUtils COLLECTION_UTILS = new CollectionUtils();
public static void main(String[] args) {
setExcludedCategories(Arrays.asList(1L, 2L, 3L));
}
private static void setExcludedCategories(List<Long> excludedCategories) {
if (excludedCategories.contains(1L)) {
if(!COLLECTION_UTILS.<Long>isUnmodifiableList(excludedCategories)){
excludedCategories.remove(1L);
}
}
}
}
Arrays.asList(T... a) returns a fixed-size List (java.util.Arrays$ArrayList) backed by the given array, so add and remove are not supported. To get your code working, just wrap the result in a java.util.ArrayList<T> as shown below
setExcludedCategories(new ArrayList<Long>(Arrays.asList(1L, 2L, 3L)));
Always create a new ArrayList from the input list to be sure it's mutable - a lot of useless memory would be used -> NO.
That's actually the preferred way to do things. "A lot of useless memory" isn't a lot in most practical situations, and certainly not in your cited example.
And ignoring that, it's the only robust and intuitively understood idiom.
The only workable alternative would be to explicitly change the name of your method (thus communicating its behavior better); from the example you show, name it "removeExcludedCategories" if it's meant to modify the argument list (but not an object's state).
Otherwise, if it is meant as a bulk setter, you're out of luck: there is no commonly recognized naming idiom that clearly communicates that the argument collection is directly incorporated into the state of an object (it's also dangerous because the object's state can then be altered without the object knowing about it).
Also, only marginally related, I would design not an exclusion list, but an exclusion set. Sets are conceptually better suited (no duplicates) and there are set implementations that have far better runtime complexity for the most commonly asked question: contains().
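For instance, a small sketch of the exclusion-set idea:
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ExclusionSetDemo {
    public static void main(String[] args) {
        // Copy the caller's input into a mutable HashSet: no duplicates,
        // and contains()/remove() are O(1) on average instead of O(n).
        Set<Long> excluded = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        excluded.remove(1L);
        System.out.println(excluded.contains(1L)); // false
    }
}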
I'm fairly new to Java so my knowledge is pretty limited. I'm working on a personal project where I'm trying out some of the techniques used in Guava for creating views/transformations of collections. I made a class called View to take an inputted collection as the backing iterable, and a transformation, and then present it as a read-only iterable. (not a collection, though I don't think it makes much of a difference for this question). Here is a quick example of using it...
public class Node {
public enum Change implements Function<Node, Coordinate> {
TO_COORDINATE;
@Override public Coordinate apply(Node node) {
return new Coordinate(node);
}
}
private HashSet<Node> neighborNodes = new HashSet<Node>();
//various other members
public View<Coordinate> viewNeighborCoordinates() {
return new View<Coordinate>(neighborNodes, Change.TO_COORDINATE);
}
}
now if some method wants to use viewNeighborCoordinates() of this node, and then later some other method also wants viewNeighborCoordinates() of this node, it seems wasteful to always be returning new objects, right? I mean, any number of things should be able to share a reference to a view of the same backing iterable with the same transformation, since all they're doing is reading through it. Is there an established way of managing a shared pool of objects which can be "interned" like Strings are? Is it just a matter of making some sort of ViewFactory that stores a running list of views in use, and every time someone wants a view, it checks whether it already has that view and hands it out? (Is that even more efficient?)
As already stated, interning is possible (look at Interners), but most probably a bad idea.
Another possibility is lazy initialization of a field storing the View. Since I'm lazy as well, I only point you to a Lombok implementation. Be careful with double-checked locking (DCL), if you want to try this. In case your class is immutable, you may need no synchronization at all, like e.g. String.hashCode.
A very simple possibility is eager initialization of a field. Assuming you need the view often, it's the best way.
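A sketch of both options against the Node class from the question (field names are mine; the Change enum and other members from the question are omitted, and the lazy variant as written is only safe for single-threaded use, see the DCL caveat above):
import java.util.HashSet;

public class Node {
    private final HashSet<Node> neighborNodes = new HashSet<Node>();
    // Change enum and other members from the question omitted.

    // Eager: build the view once, up front.
    private final View<Coordinate> eagerView =
            new View<Coordinate>(neighborNodes, Change.TO_COORDINATE);

    // Lazy: build on first request, then cache.
    private View<Coordinate> cachedView;

    public View<Coordinate> viewNeighborCoordinates() {
        if (cachedView == null) {
            cachedView = new View<Coordinate>(neighborNodes, Change.TO_COORDINATE);
        }
        return cachedView;
    }
}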
But without knowing more, your current implementation is best. Beware the root of all evil.
Don't optimize without profiling or benchmarking (and if you benchmark, then do it right, i.e., using caliper or jmh. Home-baked benchmarking in Java just doesn't work).
As it's a pain to handle structural changes of the class in two places, I often do:
import java.util.Arrays;

class A {
    class C {}
    class B {}

    private B bChild;
    private C cChild;

    private Object[] structure() {
        return new Object[]{bChild, cChild};
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(structure());
    }

    @Override
    public boolean equals(Object that) {
        // type check here
        return Arrays.equals(this.structure(), ((A) that).structure());
    }
}
What's bad about this approach besides boxing of primitives?
Can it be improved?
It's a clever way to reuse library methods, which is generally a good idea; but it does a great deal of excess allocation and array manipulation, which might be terribly inefficient in such frequently used methods. All in all, I'd say it's cute, but it wouldn't pass a review.
In JDK 7 they added the java.util.Objects class. It actually implements hash and equals utilities in a manner reminiscent of what you wrote. The point is that this approach is actually sanctioned by the JDK developers. Ernest Friedman-Hill has a point, but in the majority of cases I don't think the extra few machine instructions are worth saving at the expense of readability.
For example: the hash utility method is implemented as:
public static int hash(Object... values) {
return Arrays.hashCode(values);
}
Someone familiarizing themselves with the code will have a bit more difficulty seeing what's going on. It's less "obvious" than listing the individual fields, as demonstrated by my previously erroneous answer. It is true that equals() is generally implemented with an Object passed in, so it's debatable, but there the input is cast only after the reference-equality check; that is not the case here.
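For comparison, the conventional field-by-field version, written against the class from the question, would look roughly like this (a sketch, with a minimal type check filled in):
import java.util.Objects;

class A {
    class C {}
    class B {}

    private B bChild;
    private C cChild;

    @Override
    public int hashCode() {
        return Objects.hash(bChild, cChild); // delegates to Arrays.hashCode internally
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) return true;
        if (!(that instanceof A)) return false;
        A other = (A) that;
        return Objects.equals(this.bChild, other.bChild)
                && Objects.equals(this.cChild, other.cChild);
    }
}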
One improvement might be to store the array as a private data member rather than create it with the structure method, sacrificing a bit of memory to avoid the boxing.
I have code that looks like this:
public class Polynomial {
List<Term> term = new LinkedList<Term>();
and it seems that whenever I do something like term.add(anotherTerm), with anotherTerm being... another Term object, anotherTerm references the same thing as what I've just inserted into term, so that whenever I try to change anotherTerm, term.get(2) (let's say) gets changed too.
How can I prevent this from happening?
Since code was requested:
//since I was lazy and didn't want to go through the extra step of Polynomial.term.add
public void insert(Term inserting) {
term.add(inserting);
}
Code calling the insert method:
poly.insert(anotherTerm);
Code creating the anotherTerm Term:
Term anotherTerm = new Term(3, 7.6); //sets coefficient and power to 3 and 7.6
New code calling the insert method:
poly.insert((Term)anotherTerm.clone());
Which unfortunately still doesn't work, because clone() has protected access in java.lang.Object, even after declaring public class Term implements Cloneable {
The solution is simple: make Term immutable.
Effective Java 2nd Edition, Item 15: Minimize mutability:
Immutable objects are simple.
Immutable objects can be shared freely.
Immutable objects make great building blocks for other objects.
Classes should be immutable unless there's a very good reason to make them mutable.
If a class cannot be made immutable, limit its mutability as much as possible.
Make every field final unless there is a compelling reason to make it non-final
Something as simple and small as Term really should be made immutable. It's a much better overall design, and you wouldn't have to worry about things like you were asking in your question.
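A minimal sketch of an immutable Term (the field names are guessed from the question's constructor call):
public final class Term {
    private final int coefficient;
    private final double power;

    public Term(int coefficient, double power) {
        this.coefficient = coefficient;
        this.power = power;
    }

    public int getCoefficient() { return coefficient; }
    public double getPower() { return power; }
}
With Term immutable, handing the same instance to several Polynomials is perfectly safe, so no copying or cloning is needed in insert().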
See also
What is meant by immutable?
This advice becomes even more compelling since the other answers are suggesting that you use clone().
Effective Java 2nd Edition, Item 11: Override clone judiciously
Because of the many shortcomings, some expert programmers simply choose to never override the clone method and never invoke it except, perhaps, to copy arrays.
From an interview with author Josh Bloch:
If you've read the item about cloning in my book, especially if you read between the lines, you will know that I think clone is deeply broken.
DO NOT make Term implement Cloneable. Make it immutable instead.
See also
How to properly override clone method?
Why people are so afraid of using clone() (on collection and JDK classes) ?
OK, replacing my old answer with this, now that I understand the question and behavior better.
You can do this if you like:
public void insertTerm(Term term) {
polynomial.insert(new Term(term));
}
and then create a new Term constructor like this:
public Term(Term term) {
this.coefficient = term.coefficient;
this.exponent = term.exponent;
}
That should work.
EDIT: Ok, I think I see what it is you're doing now. If you have this class:
public class Polynomial
{
List<Term> term = new LinkedList<Term>();
public void insert(Term inserting)
{
term.add(inserting);
}
}
And then you do this:
Polynomial poly = new Polynomial();
Term term = new Term();
poly.insert(term);
term.coefficient = 4;
...then the object term is the same object as poly.term.get(0). "term" and "poly.term.get(0)" are both references to the same object - changing one will change the other.
The question is not so clear, but I'll just try: when you are adding the objects, add anotherTerm.clone().
It sounds like you are not instantiating new Objects, just referencing the same one. You should instantiate a new Term, either with Term term = new Term(); or by cloning term.clone().
EDIT: to be able to be cloned, Term needs to implement the Cloneable interface. That means that you are responsible for defining how the new copy of a Term is made.
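If you do go the Cloneable route, the usual pattern (shown here only as a sketch, given the caveats about clone() in the other answers) is to override clone() with public access and a covariant return type:
public class Term implements Cloneable {
    private int coefficient;
    private double power;

    public Term(int coefficient, double power) {
        this.coefficient = coefficient;
        this.power = power;
    }

    @Override
    public Term clone() {
        try {
            return (Term) super.clone(); // field-by-field copy; fine for primitive fields
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }
}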
Hard to tell without seeing the code that calls the insert method, but sounds like that is the problem.