When using Guava's ImmutableCollection as a parameter for a function, is it better to require an ImmutableCollection as the parameter type:
<T> void foo(ImmutableCollection<T> l)
or should the function take a Collection<T> and create an immutable collection itself as in
<T> void foo(Collection<T> l)
{
ImmutableCollection<T> l2 = ImmutableCollection.copyOf(l);
// ...
}
The first version seems preferable because the caller can be sure that the collection passed to the function is not modified by it. But the first version requires client code that holds a plain collection to call copyOf() first, that is:
Collection collection = map.values();
foo(ImmutableCollection.copyOf(collection));
// instead of simply
foo(collection);
PS: This is not completely true, since ImmutableCollection does not have copyOf() but ImmutableList and ImmutableSet do.
I think that it depends on what the foo function is supposed to do with the collection argument.
If foo is only going to read the collection elements, then <T> void foo(Collection<T> l) is preferable, because it leaves the decision to the caller.
If foo is going to incorporate the collection into the state of some object, then an immutable collection may be preferable. However, we then need to ask ourselves whether it should be the foo method's responsibility to deal with this, or the caller's responsibility.
There isn't a single right (or "best practice") answer to this. However, using ImmutableCollection as the parameter's formal type could result in complexity and/or unnecessary copying in some cases.
Look at the Guava docs: "copyOf is smarter than you think." It attempts to avoid copying the data when it is safe to do so.
So you can use the generic Collection interface without worrying about performance.
Whether the copy is necessary (rather than a function comment) depends, in my view, on how long you're holding on to the data.
Use the more generic Collection interface; it's much better for you to write the call once than to require all of the clients to do so on every call. If you're really concerned about performance, and profiling shows it to be an issue, you could do a class check on the incoming collection to see whether you can avoid the copy.
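A minimal sketch of "write the copy once, inside the callee". It uses the JDK's List.copyOf (Java 10+) as a stand-in for Guava's ImmutableList.copyOf so that it runs without extra dependencies; the Roster class and its names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical holder class, used only to illustrate the idea.
final class Roster {
    private final List<String> names;

    // Accept the general Collection type and copy once, here,
    // instead of asking every caller to convert first.
    Roster(Collection<String> names) {
        this.names = List.copyOf(names);
    }

    List<String> names() {
        return names;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("ann", "bob"));
        Roster roster = new Roster(source);
        source.add("cid"); // later changes by the caller do not leak in
        System.out.println(roster.names()); // [ann, bob]
    }
}
```

The copy matters here because the object holds on to the data; a method that only reads its argument would not need it.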
It depends on what foo is doing. Most probably it simply reads the collection values, in which case it does not need to make a copy of it, let alone an immutable one.
The one advantage of using ImmutableCollection is that the method guarantees that it will not modify the collection. But then that guarantee is to the user only, the platform does not understand it, so you might as well express it in comments or a custom annotation.
Related
I'm writing a library method that will be used in several places. One of the method's parameters is a collection of objects, and the method does not mutate this collection. Should the method signature specify a mutable or immutable collection?
Option 1: mutable collection as parameter
public static void foo(List<Bar> list) {
// ...
}
Pros: Clients can pass in whichever of List<Bar> or ImmutableList<Bar> is more convenient for them.
Cons: It is not immediately obvious that the list parameter will not be mutated. Clients must read documentation and/or code to realize this. Clients may make unnecessary defensive copies anyway.
Option 2: immutable collection as parameter
public static void foo(ImmutableList<Bar> list) {
// ...
}
Pros: Clients have a guarantee that the list parameter will not be mutated.
Cons: If the client has a List<Bar>, they must first convert it to an ImmutableList<Bar> before calling foo. This conversion wastes a small amount of time, and it is forced on clients whether they like it or not.
Note: For the purposes of this question, let's assume that all clients will have Guava's ImmutableList available, for example because the library and client code all belong to the same codebase that already uses ImmutableList elsewhere.
As the creator of the foo() API, this is none of your business. You take a list and you don't modify it, and that's it. Your code, in fact, doesn't care about the list's mutability (it's a concern of the caller): so document your intent and stop there.
If the caller needs to guarantee that the list will not be tampered with, they would create defensive copies not because you don't promise to leave the list unchanged, but because they need that guarantee.
This follows the same logic by which we perform null checks in method implementations: they're needed because our code needs to be robust, not because the caller can send a null argument.
In other words, document your method as you intend to implement it, and leave it up to the caller to pick the list implementation. The reasons for their choice will vary (i.e., it won't always be only whether you'll modify the list).
Leave it at List.
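As a rough sketch of "document your intent and stop there": the method below takes a plain List, states in its Javadoc that it only reads, and works with whichever list the caller prefers. The Lengths class and totalLength method are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class Lengths {
    /**
     * Returns the total length of all strings in {@code items}.
     * The list is only read; this method never modifies it.
     */
    static int totalLength(List<String> items) {
        int total = 0;
        for (String item : items) {
            total += item.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(List.of("one", "two"));
        List<String> fixed = List.of("three");
        System.out.println(totalLength(mutable)); // 6
        System.out.println(totalLength(fixed));   // 5
    }
}
```

A caller who needs a hard guarantee can still pass a defensive copy; that decision stays on the caller's side.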
Here are some situations to consider:
Collections.synchronizedCollection does not modify the client's collection.
If it forced clients to input an immutable collection, there'd be no use in making it a synchronized collection, since an immutable collection would already be thread safe.
Collections.frequency does not modify the client's collection.
To check frequency, users would be forced to endure the excess overhead of transferring elements to a new collection.
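To make the frequency case concrete, here is a small sketch: Collections.frequency accepts any Collection, so the caller can query a mutable list directly and the list is left untouched. The FrequencyDemo class and countOf helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FrequencyDemo {
    // Thin wrapper around the JDK query; it only reads the list.
    static int countOf(List<String> words, String word) {
        return Collections.frequency(words, word);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("a", "b", "a"));
        System.out.println(countOf(words, "a")); // 2
        System.out.println(words);               // [a, b, a]
    }
}
```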
These reasons aren't why the JDK doesn't expose an immutable interface. Those reasons are explained in the documentation. In the case of synchronizedCollection, although it doesn't modify the client's collection, it does return a modifiable view of the client's collection; some would say this function wouldn't apply here. However, frequency and other query functions still hold strong.
You shouldn't restrict clients for the purpose of trying to advertise safety and nothing more. There should be more justification; otherwise your attempt to help could be a burden on the client. In some cases, your goal can contradict what the function/system is meant to achieve, such as with the synchronizedCollection example.
It's good to encourage safety, but having your system force it onto your clients without the system benefiting from it would be abuse of power; that's not your decision to make.
ernest_k makes a really good point in his answer, suggesting that you need to analyze what your system should be in charge of, and what the client should be in charge of. In this case, it should be up to the client whether the collection is immutable or not, since your system doesn't care about mutability. As he put it, "it's none of your business", which I agree with.
The list returned by the method List.of(E... elements) in Java is immutable, but this is not visible at all from the type of the created list. The created list simply throws an exception when you try to change it, instead of not offering the possibility to change the list at all.
My point is that List.of(E... elements) should return an ImmutableList that extends List. This way the user can decide whether he cares to show this fact of immutability or not.
But I can't find anybody complaining or showing alternative solutions. Even Guava and Apache Commons don't do this by default. Only Guava gives the possibility to create it (albeit with a lot of code):
List<String> list = new ArrayList<String>(Arrays.asList("one", "two", "three"));
ImmutableList<String> unmodifiableList = ImmutableList.<String>builder().addAll(list).build();
But even this class has a (deprecated) add and remove method.
Can anyone tell me why nobody cares about this (seemingly fundamental) issue?
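The behaviour described in the question can be reproduced with a few lines. This is only an illustration (Java 9+ for List.of); the ListOfDemo class and rejectsAdd helper are hypothetical names. The static type is plain List, and the restriction only shows up when add() is called.

```java
import java.util.ArrayList;
import java.util.List;

final class ListOfDemo {
    // Returns true if the list rejects add() at run time.
    // Note: for a mutable list this really does add the element.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("four");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = List.of("one", "two", "three");
        List<String> growable = new ArrayList<>(fixed);
        System.out.println(rejectsAdd(fixed));    // true
        System.out.println(rejectsAdd(growable)); // false
    }
}
```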
It's not that nobody cares; it's that this is a problem of considerable subtlety.
The original reason there isn't a family of "immutable" collection interfaces is because of a concern about interface proliferation. There could potentially be interfaces not only for immutability, but synchronized and runtime type-checked collections, and also collections that can have elements set but not added or removed (e.g., Arrays.asList) or collections from which elements can be removed but not added (e.g., Map.keySet).
But it could also be argued that immutability is so important that it should be special-cased, and that there be support in the type hierarchy for it even if there isn't support for all those other characteristics. Fair enough.
The initial suggestion is to have an ImmutableList interface extend List, as
ImmutableList <: List <: Collection
(Where <: means "is a subtype of".)
This can certainly be done, but then ImmutableList would inherit all of the methods from List, including all the mutator methods. Something would have to be done with them; a sub-interface can't "disinherit" methods from a super-interface. The best that could be done is to specify that these methods throw an exception, provide default implementations that do so, and perhaps mark the methods as deprecated so that programmers get a warning at compile time.
This works, but it doesn't help much. An implementation of such an interface cannot be guaranteed to be immutable at all. A malicious or buggy implementation could override the mutator methods, or it could simply add more methods that mutate the state. Any programs that used ImmutableList couldn't make any assumptions that the list was, in fact, immutable.
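A sketch of why the sub-interface approach gives no real guarantee. HypotheticalImmutableList is an invented interface (it is not part of the JDK or Guava), with only add() overridden for brevity. A class that extends ArrayList and implements the interface inherits ArrayList's concrete add(), which takes precedence over the interface default.

```java
import java.util.ArrayList;
import java.util.List;

final class SubInterfaceDemo {
    // Invented for illustration: mutators throw by default and are deprecated.
    interface HypotheticalImmutableList<E> extends List<E> {
        @Deprecated
        @Override
        default boolean add(E e) {
            throw new UnsupportedOperationException("immutable");
        }
    }

    // A buggy implementation: the add() inherited from ArrayList wins
    // over the interface default, so this "immutable" list is mutable.
    static final class LeakyList<E> extends ArrayList<E>
            implements HypotheticalImmutableList<E> {
    }

    // Code that trusts the type and gets a surprise.
    static int sizeAfterAdd(HypotheticalImmutableList<String> list) {
        list.add("surprise"); // compiles, with a deprecation warning
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdd(new LeakyList<>())); // 1
    }
}
```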
A variation on this is to make ImmutableList be a class instead of an interface, to define its mutator methods to throw exceptions, to make them final, and to provide no public constructors, in order to restrict implementations. In fact, this is exactly what Guava's ImmutableList has done. If you trust the Guava developers (I think they're pretty reputable) then if you have a Guava ImmutableList instance, you're assured that it is in fact immutable. For example, you could store it in a field with the knowledge that it won't change out from under you unexpectedly. But this also means that you can't add another ImmutableList implementation, at least not without modifying Guava.
A problem that isn't solved by this approach is the "scrubbing" of immutability by upcasting. A lot of existing APIs define methods with parameters of type Collection or Iterable. If you were to pass an ImmutableList to such a method, it would lose the type information indicating that the list is immutable. To benefit from this, you'd have to add immutable-flavored overloads everywhere. Or, you could add instanceof checks everywhere. Both are pretty messy.
(Note that the JDK's List.copyOf sidesteps this problem. Even though there are no immutable types, it checks the implementation before making a copy, and avoids making copies unnecessarily. Thus, callers can use List.copyOf to make defensive copies with impunity.)
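A small sketch of defensive copying with List.copyOf (Java 10+). The CopyOfDemo class and snapshot helper are hypothetical names. The identity check at the end is printed without an expected value: the Javadoc only says that copying an unmodifiable List will "generally" not create a copy, so the result depends on the implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class CopyOfDemo {
    // Defensive copy; may return the argument itself if it is already
    // an unmodifiable list produced by List.of or List.copyOf.
    static <T> List<T> snapshot(List<T> input) {
        return List.copyOf(input);
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(List.of("a", "b"));
        List<String> copy = snapshot(mutable);
        mutable.add("c");
        System.out.println(copy); // [a, b]

        List<String> already = List.of("a", "b");
        // Implementation-dependent: whether a second copy was avoided.
        System.out.println(snapshot(already) == already);
    }
}
```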
As an alternative, one might argue that we don't want ImmutableList to be a sub-interface of List, we want it to be a super-interface:
List <: ImmutableList
This way, instead of ImmutableList having to specify that all those mutator methods throw exceptions, they wouldn't be present in the interface at all. This is nice, except that this model is completely wrong. Since ArrayList is a List, that means ArrayList is also an ImmutableList, which is clearly nonsensical. The problem is that "immutable" implies a restriction on subtypes, which can't be done in an inheritance hierarchy. Instead, it would need to be renamed to allow capabilities to be added as one goes down the hierarchy, for example,
List <: ReadableList
which is more accurate. However, ReadableList is altogether a different thing from an ImmutableList.
Finally, there are a bunch of semantic issues that we haven't considered. One concerns immutability vs. unmodifiability. Java has APIs that support unmodifiability, for example:
List<String> alist = new ArrayList<>(...);
??? ulist = Collections.unmodifiableList(alist);
What should the type of ulist be? It's not immutable, since it will change if somebody changes the backing list alist. Now consider:
???<String[]> arlist = List.of(new String[] { ... }, new String[] { ... });
What should the type be? It's certainly not immutable, as it contains arrays, and arrays are always mutable. Thus it's not at all clear that it would be reasonable to say that List.of returns something immutable.
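Both cases can be checked directly. The sketch below fills in the elided arguments with placeholder values of my own; the UnmodifiableDemo class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UnmodifiableDemo {
    // The unmodifiable view reflects later changes to the backing list.
    static int viewSizeAfterBackingAdd() {
        List<String> alist = new ArrayList<>(List.of("a"));
        List<String> ulist = Collections.unmodifiableList(alist);
        alist.add("b");
        return ulist.size();
    }

    // The list structure is fixed, but the arrays inside it are not.
    static String elementAfterArrayWrite() {
        List<String[]> arlist = List.of(new String[] {"x"}, new String[] {"y"});
        arlist.get(0)[0] = "changed";
        return arlist.get(0)[0];
    }

    public static void main(String[] args) {
        System.out.println(viewSizeAfterBackingAdd()); // 2
        System.out.println(elementAfterArrayWrite());  // changed
    }
}
```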
Removing add, remove, etc. from all the Collection types and creating subinterfaces MutableCollection, MutableList, MutableSet would double the number of Collection interfaces, which is a complexity cost to be considered. Furthermore, Collections aren't cleanly separated into Mutable and Immutable: Arrays.asList supports set, but not add.
Ultimately there's a tradeoff to be made about how much to capture in the type system and how much to enforce at runtime. Reasonable people can disagree as to where to draw the line.
I would say that since commonly collections tend to (or at least should) be treated as "immutable by default" (meaning you're rarely modifying collections that you didn't create), it's not very important to specify that "this is immutable". It would be more useful to specify "you can safely modify this collection if you wish".
Secondly, your suggested approach wouldn't work. You can't extend List and hide methods, so the only option would be to make it return an ImmutableList that's not a subtype of List. That would make it useless, as it would require a new ImmutableList interface, and any existing code wouldn't be able to use it.
So is this optimal design? No, not really, but for backwards compatibility that's not going to change.
Say you are adding x number of objects to a collection, and either before or after adding them you are modifying the objects' attributes. Would you add the element to the collection before or after the object has been modified?
Option A)
public static void addToCollection(List<MyObject> objects) {
MyObject newObject = new MyObject();
objects.add(newObject);
newObject.setMyAttr("ok");
}
Option B)
public static void addToCollection(List<MyObject> objects) {
MyObject newObject = new MyObject();
newObject.setMyAttr("ok");
objects.add(newObject);
}
To be on the safe side, you should modify before adding, unless there is a specific reason you cannot do this, and you know the collection can handle the modification. The example can reasonably be assumed to be safe, since the general List contract does not depend upon object attributes - but that says nothing about specific implementations, which may have additional behavior that depends upon the object's value.
TreeSet, and Maps in general, do not tolerate modifying objects after they have been inserted, because the structure of the collection is dependent upon the attributes of the object. For trees, any attributes used by the comparator cannot be changed once the item has been added. For maps, it's the hashCode of the key that must remain constant.
So, in general, modify first, and then add. This becomes even more important with concurrent collections, since adding first can lead to other collection users seeing an object before it has been assigned its final state.
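To illustrate the hash-based case, here is a minimal sketch with a HashSet. Tag is a made-up mutable class whose hashCode depends on its name field; once the field changes after insertion, the set looks for the element under the new hash and no longer finds it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class MutatedKeyDemo {
    // Hypothetical mutable element whose hashCode depends on a field.
    static final class Tag {
        String name;

        Tag(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && Objects.equals(name, ((Tag) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    static boolean foundAfterMutation() {
        Set<Tag> tags = new HashSet<>();
        Tag tag = new Tag("draft");
        tags.add(tag);
        tag.name = "final"; // mutated after insertion
        return tags.contains(tag);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The element is still physically in the set (its size is 1), but lookups by the mutated object fail.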
The example you provided won't have any issues because you're using a List collection which doesn't care about the Object contents.
If you were using something like TreeMap, which internally keeps its keys sorted, changing a key after insertion could leave the collection in an unexpected state. Again, this depends on whether the key's compareTo method (or the map's Comparator) uses the attribute you're changing.
The safest way is to modify the object before placing it into the collection.
One of the good design rules to follow is not to expose a half-constructed object to a 3rd party subsystem.
So, according to this rule, initialize your object to the best of your abilities and then add it to the list.
If objects is an ArrayList then the net result is probably the same; however, imagine that objects is a special flavor of List that fires some kind of notification event every time a new object is added to it. Then the order will matter greatly.
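A sketch of such a "special flavor of List". NotifyingList is a hypothetical class that calls a listener on every add(); MyObject is a minimal stand-in for the class in the question. With Option A the listener sees the object before its attribute is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class NotifyDemo {
    // Minimal stand-in for the question's MyObject.
    static final class MyObject {
        private String myAttr;

        void setMyAttr(String value) {
            this.myAttr = value;
        }

        String getMyAttr() {
            return myAttr;
        }
    }

    // Hypothetical list that reports every added element to a listener.
    static final class NotifyingList<E> extends ArrayList<E> {
        private final Consumer<E> onAdd;

        NotifyingList(Consumer<E> onAdd) {
            this.onAdd = onAdd;
        }

        @Override
        public boolean add(E e) {
            boolean added = super.add(e);
            onAdd.accept(e);
            return added;
        }
    }

    // Returns the attribute value the listener observed at add() time.
    static String attrSeenByListener(boolean modifyBeforeAdd) {
        List<String> seen = new ArrayList<>();
        List<MyObject> objects =
                new NotifyingList<MyObject>(o -> seen.add(o.getMyAttr()));
        MyObject newObject = new MyObject();
        if (modifyBeforeAdd) {
            newObject.setMyAttr("ok"); // Option B
            objects.add(newObject);
        } else {
            objects.add(newObject);    // Option A
            newObject.setMyAttr("ok");
        }
        return seen.get(0);
    }

    public static void main(String[] args) {
        System.out.println(attrSeenByListener(false)); // null
        System.out.println(attrSeenByListener(true));  // ok
    }
}
```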
In my opinion it depends on the attribute being set and the type of collection. If the collection is a Set and the attribute influences the equals or hashCode method, then I would definitely set this property before adding; the same applies to sorted lists and the like. In other cases it is irrelevant. But for this example, where the object is being created, I would first set the attributes and then add it to the collection, because the code is better organized.
I think either way it's the same, personally I like B, :)
It really does boil down to what the situation requires. Functionally there's no difference.
One thing you should be careful with, is being sure you have the correct handle to the object you want to modify.
Certainly in this instance, modifying the object is part of the "create the object" thought, and so should be grouped with the constructor as such. After you "create the object" you "add it to the collection". Thus, I would do B, and maybe even add a blank line after the modification to give more emphasis on the two separate thoughts.
The possible answers are either "never" or "it depends".
Personally, I would say, it depends.
The following usage would make a collection appear (to me) to be a flyweight:
public final static List<Integer> SOME_LIST =
Collections.unmodifiableList(
new LinkedList<Integer>(){ // scope begins
{
add(1);
add(2);
add(3);
}
} // scope ends
);
Right? You can't ever change it, because the only place where the
"original" collection object is known (which could be changed), is the
scope inside unmodifiableList's parameter list, which ends immediately.
Second thing is: when you retrieve an element from the list, it's an
Integer which itself is a flyweight.
Other obvious cases where final static and unmodifiableList are
not used, would not be considered as flyweights.
Did I miss something?
Do I have to consider some internal aspects of LinkedList which could
compromise the flyweight?
i think you are referring to the flyweight pattern. the fundamental idea of this pattern is that you are dealing with complex objects whose instances can be reused, and put out different representations with its methods.
to make such an object work correctly it should be immutable.
immutability is clearly given when creating a List the way you described.
but since there are no external objects/parameters on which SOME_LIST operates, i would not call this an example of a flyweight pattern.
another typical property of the flyweight pattern is the "interning" of such objects. when creating just a single instance of an object this does not make sense.
if you are dealing a lot with lists that are passed around from one object to another and you want to ensure the Immutability, a better option might be to use Google-Collections.
final static ImmutableList<Integer> someList = ImmutableList.of(1, 2, 3);
of course it is also possible to construct more complex Immutable Objects with Builders.
this creates an instance of an immutable list. it will still implement the List interface, but will refuse to execute any add(), addAll(), set() or remove() operation.
so you can still pass it to methods when a List interface is required, yet be sure that its content is not altered.
I think your example is about immutable objects; a flyweight is something quite different. Immutable objects are candidates for flyweight, but a flyweight doesn't have to be immutable, it just has to be designed to save memory.
Having the library detect that the mutable List has not otherwise escaped is a bit of an ask, although theoretically possible.
If you serialise the returned object, then trusted code could view the internal object. Although the serialised forms of the classes are documented, it's not documented that the method uses those classes.
In practical terms, any cache is down to the user of the API.
(Why LinkedList for an immutable list, btw? Other than it changes the unmodifiable implementation.)
Integer is only a flyweight from -128 to 127.
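A quick way to see the cache at work; this is only an illustration, and IntegerCacheDemo and sameInstance are made-up names. Integer.valueOf is documented to always cache values from -128 to 127 and may cache others, so the result for 128 is printed without an expected value.

```java
final class IntegerCacheDemo {
    // Compares references on purpose: true means both calls
    // returned the very same cached Integer object.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127)); // true
        // Outside the guaranteed range the result depends on the JVM
        // and its cache settings.
        System.out.println(sameInstance(128));
    }
}
```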
See also http://www.javaworld.com/javaworld/jw-07-2003/jw-0725-designpatterns.html.
I usually find it sufficient to use the concrete classes for the interfaces listed in the title. Usually when I use other types (such as LinkedList or TreeSet), the reason is functionality and not performance, for example a LinkedList for a queue.
I do sometimes construct an ArrayList with an initial capacity larger than the default of 10 and a HashMap with more than the default 16 buckets, but I usually (especially for business CRUD) never find myself thinking "hmmm...should I use a LinkedList instead of an ArrayList if I am just going to insert and iterate through the whole List?"
I am just wondering what everyone else here uses (and why) and what type of applications they develop.
Those are definitely my default, although often a LinkedList would in fact be the better choice for lists, as the vast majority of lists seem to just iterate in order, or get converted to an array via toArray anyway.
But in terms of keeping consistent maintainable code, it makes sense to standardize on those and use alternatives for a reason, that way when someone reads the code and sees an alternative, they immediately start thinking that the code is doing something special.
I always type the parameters and variables as Collection, Map and List unless I have a special reason to refer to the sub type, that way switching is one line of code when you need it.
I could see explicitly requiring an ArrayList sometimes if you need the random access, but in practice that really doesn't happen.
For some kinds of lists (e.g. listeners) it makes sense to use a CopyOnWriteArrayList instead of a normal ArrayList. For almost everything else the basic implementations you mentioned are sufficient.
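A minimal sketch of the listener case, with made-up names (ListenerDemo, fireAll). One listener removes itself while the list is being iterated. CopyOnWriteArrayList iterates over a snapshot, so all three listeners run; with a plain ArrayList this particular sequence would fail with a ConcurrentModificationException on the next iteration step.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ListenerDemo {
    // Returns {listeners that ran, listeners left registered}.
    static int[] fireAll() {
        List<Runnable> listeners = new CopyOnWriteArrayList<>();
        int[] ran = new int[1];

        // A one-shot listener that unregisters itself when notified.
        Runnable[] oneShot = new Runnable[1];
        oneShot[0] = () -> {
            ran[0]++;
            listeners.remove(oneShot[0]);
        };

        listeners.add(oneShot[0]);
        listeners.add(() -> ran[0]++);
        listeners.add(() -> ran[0]++);

        // Iteration works on the snapshot taken when the loop started.
        for (Runnable listener : listeners) {
            listener.run();
        }
        return new int[] {ran[0], listeners.size()};
    }

    public static void main(String[] args) {
        int[] result = fireAll();
        System.out.println(result[0] + " ran, " + result[1] + " left"); // 3 ran, 2 left
    }
}
```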
Yep, I use those as defaults. I generally have a rule that on public class methods, I always return the interface type (ie. Map, Set, List, etc.), since other classes (usually) don't need to know what the specific concrete class is. Inside class methods, I'll use the concrete type only if I need access to any extra methods it may have (or if it makes understanding the code easier), otherwise the interface is used.
It's good to be pretty flexible with any rules you do use, though, as a dependency on concrete class visibility is something that can change over time (especially as your code gets more complex).
Indeed, always use the base interfaces Collection, List and Map instead of their implementations. To make things even more flexible you could hide your implementations behind static factory methods, which allow you to switch to a different implementation in case you find something better (I doubt there will be big changes in this field, but you never know). Another benefit is that the syntax is shorter thanks to generics.
Map<String, LongObjectClasName> map = CollectionUtils.newMap();
instead of
Map<String, LongObjectClasName> map = new HashMap<String, LongObjectClasName>();
public class CollectionUtils {
    .....
    // Factory methods are static so that callers can write CollectionUtils.newList().
    public static <T> List<T> newList() {
        return new ArrayList<T>();
    }
    public static <T> List<T> newList(int initialCapacity) {
        return new ArrayList<T>(initialCapacity);
    }
    public static <T> List<T> newSynchronizedList() {
        return new Vector<T>();
    }
    public static <T> List<T> newConcurrentList() {
        return new CopyOnWriteArrayList<T>();
    }
    public static <T> List<T> newSynchronizedList(int initialCapacity) {
        return new Vector<T>(initialCapacity);
    }
    ...
}
Having just come out of a class about data structure performance, I'll usually look at the kind of algorithm I'm developing or the purpose of the structure before I choose an implementation.
For example, if I'm building a list that has a lot of random accesses into it, I'll use an ArrayList because its random access performance is good, but if I'm inserting things into the list a lot, I might choose a LinkedList instead. (I know modern implementations remove a lot of performance barriers, but this was the first example that came to mind.)
You might want to look at some of the Wikipedia pages for data structures (especially those dealing with sorting algorithms, where performance is especially important) for more information about performance, and the article about Big O notation for a general discussion of measuring the performance of various functions on data structures.
I don't really have a "default", though I suppose I use the implementations listed in the question more often than not. I think about what would be appropriate for whatever particular problem I'm working on, and use it. I don't just blindly default to using ArrayList, I put in 30 seconds of thought along the lines of "well, I'm going to be doing a lot of iterating and removing elements in the middle of this list so I should use a LinkedList".
And I almost always use the interface type for my reference, rather than the implementation. Remember that List is not the only interface that LinkedList implements. I see this a lot:
LinkedList<Item> queue = new LinkedList<Item>();
when what the programmer meant was:
Queue<Item> queue = new LinkedList<Item>();
I also use the Iterable interface a fair amount.
If you are using LinkedList for a queue, you might consider using the Deque interface and ArrayDeque implementing class (introduced in Java 6) instead. To quote the Javadoc for ArrayDeque:
This class is likely to be faster than
Stack when used as a stack, and faster
than LinkedList when used as a queue.
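A short sketch of ArrayDeque in both roles, typed by interface as suggested above. DequeDemo and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

final class DequeDemo {
    // FIFO: elements come out in insertion order.
    static String firstOutOfQueue() {
        Queue<String> queue = new ArrayDeque<>();
        queue.offer("a");
        queue.offer("b");
        return queue.poll();
    }

    // LIFO: the most recently pushed element comes out first.
    static String firstOutOfStack() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(firstOutOfQueue()); // a
        System.out.println(firstOutOfStack()); // b
    }
}
```

One difference to keep in mind: ArrayDeque does not permit null elements, while LinkedList does.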
I tend to use one of the *Queue classes for queues. However, LinkedList is a good choice if you don't need thread safety.
Using the interface type (List, Map) instead of the implementation type (ArrayList, HashMap) is irrelevant within methods - it's mainly important in public APIs, i.e. method signatures (and "public" doesn't necessarily mean "intended to be published outside your team").
When a method takes an ArrayList as a parameter, and you have something else, you're screwed and have to copy your data pointlessly. If the parameter type is List, callers are much more flexible and can, e.g. use Collections.EMPTY_LIST or Collections.singletonList().
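As an illustration of that flexibility, the sketch below declares its parameter as List and is called with three different list sources, none of which is an ArrayList. SumDemo and sum are made-up names; Collections.emptyList() is used as the type-safe counterpart of Collections.EMPTY_LIST.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SumDemo {
    // Declared against the interface, so any List implementation works.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Collections.emptyList()));     // 0
        System.out.println(sum(Collections.singletonList(5))); // 5
        System.out.println(sum(Arrays.asList(1, 2, 3)));       // 6
    }
}
```

Had the parameter been ArrayList<Integer>, each of these calls would first need a copy into a new ArrayList.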
I too typically use ArrayList, but I will use TreeSet or HashSet depending on the circumstances. When writing tests, however, Arrays.asList and Collections.singletonList are also frequently used. I've mostly been writing thread-local code, but I could also see using the various concurrent classes as well.
Also, there were times I used ArrayList when what I really wanted was a LinkedHashSet (before it was available).