Avoiding semantic coupling with Java Collection interfaces - java

These days I am reading the Code Complete book and I've passed the part about coupling-levels (simple-data-parameter, simple-object, object-parameter and semantic coupling). Semantic being the "worst" kind:
The most insidious kind of coupling occurs when one module
makes use not of some syntactic element of another module but of some semantic
knowledge of another module’s inner workings.
The examples in the book usually lead to run-time failures, and are typical bad code, but today I had a situation that I'm really not sure how to treat.
First I had a class, let's call it Provider fetching some data and returning a List of Foo's.
class Provider
{
    public List<Foo> getFoos()
    {
        ArrayList<Foo> foos = createFoos(); // some processing here where I create the objects
        return foos;
    }
}
A consuming class executes an algorithm, processing the Foo's, merging or removing from the List based on some attributes, not really important. The algorithm does all of its processing with the head of the list. So there are a lot of operations reading/removing/adding at the head.
(I just realized I could have made the algorithm looking like merge sort, recursively calling it on halves of an array, but that doesn't answer my question :) )
I noticed I'm returning an ArrayList so I changed the providing class' getFoos method to return a LinkedList. The ArrayList has O(n) complexity with head removals, while LinkedList has constant complexity. But it then struck me that I am possibly making a semantic dependency. The code will certainly work with both implementations of List, there are no side-effects, but with the wrong implementation the performance will be degraded. And I wrote both classes so I can easily understand, but what if a colleague had to implement the algorithm, or if he uses the Provider as a source for another algorithm which favors random access? If he doesn't bother with the internals, as he should not, I would mess up his algorithm's performance.
Declaring the method to return LinkedList is also possible, but what about the "program to interface" principle?
Is there any way to handle this situation, or is my design flawed at its roots?

The general problem is, how does a producer return something in the form that the consumer prefers? Usually the consumer needs to include the preference in the request. For example
as a flag - getFoos(randomOrLinked)
different methods - getFoosAsArrayList(), getFoosAsLinkedList()
pass a function that creates desired List - getFoos(ArrayList::new)
or pass a desired output List - getFoos(new ArrayList())
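As a rough sketch of the third option (passing a function that creates the desired List), something like the following could work. The Foo class, its id field, and the loop that fills the list are placeholders I made up for illustration; only the getFoos(ArrayList::new) call shape comes from the list above.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

class SupplierSketch {
    // Hypothetical stand-in for the question's Foo type.
    static final class Foo {
        final int id;
        Foo(int id) { this.id = id; }
    }

    static class Provider {
        // The caller supplies the factory, so the caller picks the List implementation.
        <L extends List<Foo>> L getFoos(Supplier<L> listFactory) {
            L foos = listFactory.get();
            for (int i = 0; i < 3; i++) {
                foos.add(new Foo(i)); // placeholder for the real processing
            }
            return foos;
        }
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        // Head-heavy consumer asks for a LinkedList.
        LinkedList<Foo> linked = provider.getFoos(LinkedList::new);
        System.out.println(linked.removeFirst().id);
        // Random-access consumer asks for an ArrayList.
        ArrayList<Foo> indexed = provider.getFoos(ArrayList::new);
        System.out.println(indexed.get(2).id);
    }
}
```

The producer stays ignorant of which implementation is in play, and the performance preference is visible at the call site instead of hidden inside the Provider.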
But, the producer may have the right to say, this is too complicated for me, I don't care. I'll return a form that's suitable for most use cases, and the consumer needs to handle it properly. If I think ArrayList is best, I'll just do it. (Actually you may have a better choice - a ring structure - that suits both of the two use cases in consideration)
Of course, it should be well documented. Or you could be honest and return ArrayList as the method signature, as long as you commit to it. Don't worry too much about "interface" - ArrayList is an interface (in the general sense), Iterable is an interface, so what's so special about the List interface that's between the two.
There can be another criticism of your design: you return a mutable data structure that the consumer can modify directly. That is less desirable than returning a read-only data structure. If you can, you should return a read-only view of the underlying data; the construction of the view should be inexpensive. The consumer needs to make its own copy if it needs a mutable one.
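A minimal sketch of the read-only idea, assuming the provider keeps its data in an ArrayList of strings (the class names and the sample data are made up for illustration). Collections.unmodifiableList wraps the list without copying it, and the consumer copies into whatever structure suits its algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class ReadOnlyViewSketch {
    static class Provider {
        private final List<String> foos = new ArrayList<>(List.of("a", "b", "c"));

        // Wrapping is cheap: no elements are copied, the view delegates to foos.
        List<String> getFoos() {
            return Collections.unmodifiableList(foos);
        }
    }

    public static void main(String[] args) {
        List<String> view = new Provider().getFoos();
        try {
            view.remove(0); // mutation through the view is rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
        // The consumer makes its own mutable copy, in the form it prefers.
        LinkedList<String> work = new LinkedList<>(view);
        work.removeFirst();
        System.out.println(work);
    }
}
```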

You need to make a compromise somewhere. In this case, there are two compromises that make sense to me:
If you don't know that all consumers of your Provider will be performing operations that are appropriate to a LinkedList, then stick with the signature that has return type List with implementation that returns ArrayList (ArrayList is a good all-around List implementation). Then within the calling method, wrap the returned List in a LinkedList: LinkedList<Foo> fooList = new LinkedList<>(provider.getFoos()); Make the caller responsible for its own optimizations.
If you know that all consumers of your Provider are going to use it in a LinkedList-appropriate way, then just change the return type to LinkedList -- or add another method that returns a LinkedList.
I strongly prefer the former, but both make sense.

You could have the caller create the LinkedList, instead of the provider - i.e. change
List<Foo> foos = provider.getFoos();
to
List<Foo> foos = new LinkedList<>(provider.getFoos());
Then it doesn't matter what kind of list the provider returns. The downside is that an ArrayList is still created, so there is a tradeoff between efficiency and cleanliness.

Related

Understanding "Coding to an Interface" to hide implementation details [duplicate]

I'm learning about the "coding to an interface to hide implementation details" design principle, and I'm confused over this idea. In my example, I have a WorkoutRoutine class that has a List of RoutineStep. By following the coding to an interface principle, my getWorkoutRoutine() method will return the List interface so that the implementation details of which list is used is hidden from the client.
What if I decide to change the implementation from a List of RoutineStep to an array of RoutineStep, won't disclosing the return type as List reveal to the client that a list was implemented and not an array, tree, graph, or any other data structure? How can I best encapsulate my data structure for holding a collection of RoutineStep, such that I can change the implementation of the data structure without the client code breaking if the List was changed to an array, tree, graph, etc?
import java.util.List;

public class WorkoutRoutine {
    private List<RoutineStep> workoutRoutine;

    public List<RoutineStep> getWorkoutRoutine() {
        // What if I later decide to change the data structure from a List to an array, tree,
        // graph, set, map. What approach should I take so that the client code doesn't break
        // as they would have already coded to receive a List but after changing the
        // implementation from a list to an array or any other data structure, their code would
        // break.
        return workoutRoutine;
    }
}
The idea is to return a type that is as generic as possible but specific enough.
For example, returning a LinkedList may be too specific: if the client starts using the getFirst method and you later on decide to return an ArrayList for performance reasons, the client code will break. Hence the principle of returning a more generic type, such as a List.
You could even go more generic and return a Collection - it may make sense if you don't think the client code will need to access the n-th position in the collection for example. But if you feel that accessing the n-th step of the routine (routine.get(n-1)) without having to iterate is an important feature, it means that Collection is too generic for your use case.
decide to change the implementation from a List of RoutineStep to an array of RoutineStep
The ArrayList is just what you would choose – a List implementation backed by an array structure.
Later, you may find that your app is more often inserting elements into the middle of the list rather than appending to the end. So you decide to switch your choice of List implementation from ArrayList to LinkedList. By having your method return an object of the more general interface List rather than the more specific concrete classes of ArrayList and LinkedList, your change from one class to the other does not break calling code.
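To illustrate, here is a small sketch where the routine can be backed by either implementation while the client only ever sees List. Steps are plain strings here instead of RoutineStep, and the constructor parameter and the describe helper are my own additions for the demo.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RoutineSketch {
    static class WorkoutRoutine {
        private final List<String> steps;

        // The backing list is injected only to make the swap easy to demonstrate.
        WorkoutRoutine(List<String> backingList) {
            this.steps = backingList;
        }

        void addStep(String step) {
            steps.add(step);
        }

        // The declared type is List, so the concrete class never leaks to callers.
        List<String> getWorkoutRoutine() {
            return steps;
        }
    }

    // Client code written against List only.
    static String describe(WorkoutRoutine routine) {
        List<String> steps = routine.getWorkoutRoutine();
        return steps.size() + " steps, first: " + steps.get(0);
    }

    public static void main(String[] args) {
        WorkoutRoutine arrayBacked = new WorkoutRoutine(new ArrayList<>());
        WorkoutRoutine linkedBacked = new WorkoutRoutine(new LinkedList<>());
        for (WorkoutRoutine r : List.of(arrayBacked, linkedBacked)) {
            r.addStep("warm-up");
            r.addStep("squats");
            System.out.println(describe(r)); // same result for both implementations
        }
    }
}
```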
By the way, we generally do not use arrays in Java where we expect to need the features and flexibility of a List or Set from the Java Collections Framework. We generally use arrays only where we need to conserve RAM because of deploying to constrained devices or we need maximum performance. Arrays were also used for their more convenient compact literals, but the new List.of and Set.of methods fill that need.
without the client code breaking if the List was changed to an array, tree, graph, etc?
If you are making such a massive change to the data structures of your app, then no amount of encapsulation will mask that change. At some point, your design changes may indeed break existing code.
Such breaking changes may be a natural part of the early stages in an emergent design. This is a normal part of our work. Later, in an evolved design, proper use of encapsulation will help protect against smaller changes having wider impact than is necessary, and will make your codebase less brittle.

Preferring mutable or immutable collections as method parameters

I'm writing a library method that will be used in several places. One of the method's parameters is a collection of objects, and the method does not mutate this collection. Should the method signature specify a mutable or immutable collection?
Option 1: mutable collection as parameter
public static void foo(List<Bar> list) {
// ...
}
Pros: Clients can pass in whichever of List<Bar> or ImmutableList<Bar> is more convenient for them.
Cons: It is not immediately obvious that the list parameter will not be mutated. Clients must read documentation and/or code to realize this. Clients may make unnecessary defensive copies anyway.
Option 2: immutable collection as parameter
public static void foo(ImmutableList<Bar> list) {
// ...
}
Pros: Clients have a guarantee that the list parameter will not be mutated.
Cons: If the client has a List<Bar>, they must first convert it to an ImmutableList<Bar> before calling foo. This conversion wastes a small amount of time, and it is forced on clients whether they like it or not.
Note: For the purposes of this question, let's assume that all clients will have Guava's ImmutableList available, for example because the library and client code all belong to the same codebase that already uses ImmutableList elsewhere.
As the creator of the foo() API, this is none of your business. You take a list and you don't modify it, and that's it. Your code, in fact, doesn't care about the list's mutability (it's a concern of the caller): so document your intent and stop there.
If the caller needs to guarantee that the list will not be tampered with, they would create defensive copies not because you don't promise to leave the list unchanged, but because they need that guarantee.
It's following the same logic that we perform null checks in method implementations: it's needed because our code needs to be robust, not because the caller can send a null argument.
In other words, document your method as you intend to implement it, and leave it up to the caller to pick the list implementation. The reasons for their choices will vary (i.e., it won't always be only whether you'll modify the list).
Leave it at List.
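A small sketch of what that looks like in practice. The method name totalLength and the use of strings are placeholders of mine; the immutable list comes from the JDK's List.of instead of Guava's ImmutableList, to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadOnlyParameterSketch {
    /** Returns the combined length of all strings. Does not modify {@code list}. */
    static int totalLength(List<String> list) {
        int total = 0;
        for (String s : list) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(List.of("ab", "cde"));
        List<String> immutable = List.of("ab", "cde");
        List<String> view = Collections.unmodifiableList(mutable);

        // The same method serves every caller; nobody is forced to convert.
        System.out.println(totalLength(mutable));
        System.out.println(totalLength(immutable));
        System.out.println(totalLength(view));
    }
}
```

The "does not modify" promise lives in the Javadoc, and a caller who needs a hard guarantee can pass an unmodifiable list on their own initiative.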
Here are some situations to consider:
Collections.synchronizedCollection does not modify the client's collection.
If it forced clients to input an immutable collection, there'd be no use in making it a synchronized collection, since an immutable collection would already be thread safe.
Collections.frequency does not modify the client's collection.
To check frequency, users would be forced to endure the excess overhead of transferring elements to a new collection.
These reasons aren't why the JDK doesn't expose an immutable interface. Those reasons are explained in the documentation. In the case of synchronizedCollection, although it doesn't modify the client's collection, it does return a modifiable view of the client's collection; some would say this function wouldn't apply here. However, frequency and other query functions still hold strong.
You shouldn't restrict clients for the purpose of trying to advertise safety and nothing more. There should be more justification, otherwise your attempt to help could be a burden on the client. In some cases, your goal can be contradictory to what the function/system is achieving, such as with the synchronizedCollection example.
It's good to encourage safety, but having your system force it onto your clients without the system benefiting from it would be abuse of power; that's not your decision to make.
ernest_k makes a really good point in his answer, suggesting that you need to analyze what your system should be in charge of, and what the client should be in charge of. In this case, it should be up to the client whether the collection is immutable or not, since your system doesn't care about mutability. As he put it, "it's none of your business", which I agree with.

Why is there no "List.reverse()" method in Java?

In Java, to reverse elements in a List, I need to use:
Collections.reverse(list)
I was just wondering why Java doesn't implement the reverse method within the List interface so that I could do the in-place reverse like this:
list.reverse()
Does anyone have any ideas about this?
Why is there no List.reverse() method in Java?
Because there is a Collections.reverse(List) method instead.
Because the API designers figured it was a bad idea to force every List implementation [1] to implement a method that wasn't used 99.9% of the time [2]. This could be addressed by making the method "optional", but that has downsides too; e.g. runtime exceptions.
Because for some kinds of list (stream wrappers / adapters for example) implementing in-place reverse would be problematic. It changes the memory usage characteristics of the list by requiring it to be reified.
Also note that the generic implementation (source code) of reverse() that is provided by Collections uses set to swap elements. It is close to optimal for the standard list types.
@shmosel comments:
I assume OP is asking why it wasn't added as a default method, as List.sort() was.
Good point. Possibly the 99.9% argument applies. Bear in mind that this would only help people with a codebase that is built using a Java 8 or later compiler, etc.
1 - This includes implementations in your codebase and 3rd-party libraries.
2 - 86% of statistics are made up for theatrical effect :-)
For the same reason that fill and rotate and shuffle and swap and infinitely more possible list functions aren't declared in the List interface. They're not part of the "list" abstraction; rather, they can be implemented on top of that abstraction.
Once a List implements the methods already in the List interface, a reverse function can be written on top of the List abstraction without any knowledge of a particular List implementation. Therefore, it would be pointless to force every class implementing List to provide a custom implementation of reverse (and fill, rotate, shuffle, swap, etc.).
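For instance, a naive reverse can be sketched using nothing but size(), get() and set(). This is my own illustration, not the JDK's code; on a LinkedList the index-based access makes it quadratic, so treat it as a demonstration of the abstraction and not as a tuned implementation.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ReverseSketch {
    // Works for any modifiable List; relies only on size(), get() and set().
    static <T> void reverse(List<T> list) {
        for (int i = 0, j = list.size() - 1; i < j; i++, j--) {
            T tmp = list.get(i);
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(List.of(1, 2, 3, 4));
        List<Integer> linkedList = new LinkedList<>(List.of(1, 2, 3));
        reverse(arrayList);
        reverse(linkedList);
        System.out.println(arrayList);
        System.out.println(linkedList);
    }
}
```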
Note: This question is a very specific case of "Why does the Collections class contain standalone (static) methods, instead of them being added to the List interface?" - one could even consider it as a duplicate. Beyond that, arguing about the reasoning behind the decision for each individual method is reading tea leaves, and nobody can tell "the reason" for the design decision for the particular case of the reverse method (until, maybe Josh Bloch posts an answer here). Interestingly, this is a point that is not covered in the Java Collections API Design FAQ...
Some of the other answers seem convincing at the first glance, but raise other questions. Particularly, some of them don't give a reason for the design decision at all. Even if there are other ways to emulate the behavior of a certain method, or when a method is not used "99.9% of all time", it can still make sense to include it in the interface.
Looking at the List interface, you will notice that you can basically implement all methods based on two others:
T get(int index)
int size()
(For a mutable list, you also need set). These are exactly the ones that are still abstract in AbstractList. So all other methods are rather "convenience" methods that can be implemented canonically, based on these two methods. In this regard, I think that the answer by Sam Estep contains an important point: One could argue to implement dozens of other methods. And there would certainly be good reasons to do so. Having a look at the actual implementation of Collections#reverse(List):
public static void reverse(List<?> list) {
    int size = list.size();
    if (size < REVERSE_THRESHOLD || list instanceof RandomAccess) {
        for (int i=0, mid=size>>1, j=size-1; i<mid; i++, j--)
            swap(list, i, j);
    } else {
        ListIterator fwd = list.listIterator();
        ListIterator rev = list.listIterator(size);
        for (int i=0, mid=list.size()>>1; i<mid; i++) {
            Object tmp = fwd.next();
            fwd.set(rev.previous());
            rev.set(tmp);
        }
    }
}
What is this REVERSE_THRESHOLD and RandomAccess thing there? Seriously, if I felt the necessity to introduce a tagging interface like RandomAccess, I would strongly question my design. Whenever you have a method like
void doSomethingWith(Type x) {
    if (x instanceof Special) doSomethingSpecial((Special)x);
    else doSomethingNormal(x);
}
then this is a strong sign that this should actually be a polymorphic method, which should be implemented accordingly for the Special type.
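A sketch of the polymorphic alternative, reusing the Type and Special names from the snippet above. The default method, the Plain class, and the string return values are hypothetical additions so the behavior is observable.

```java
class PolymorphicSketch {
    interface Type {
        // The "normal" behavior lives in the interface as a default.
        default String doSomething() {
            return "normal";
        }
    }

    static class Plain implements Type {
        // Inherits the default behavior.
    }

    static class Special implements Type {
        // The special case overrides the method instead of being detected by instanceof.
        @Override
        public String doSomething() {
            return "special";
        }
    }

    // No type check needed: dispatch picks the right implementation.
    static String doSomethingWith(Type x) {
        return x.doSomething();
    }

    public static void main(String[] args) {
        System.out.println(doSomethingWith(new Plain()));
        System.out.println(doSomethingWith(new Special()));
    }
}
```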
So yes, it would have been justified to pull the reverse method into the interface, to allow a polymorphic implementation. The same applies to fill, rotate, shuffle, swap, sort and others. Similarly, one could have introduced a static method like
Collections.containsAll(containing, others);
that offers what is now done with the Collection#containsAll method. But in general: The designers chose a particular set of methods that they found suitable. One of the reasonings behind leaving out certain methods may be given by one of the bottom lines of the talk about "How to Design a Good API & Why it Matters" by Joshua Bloch, one of the core designers of the Java Collections API:
When in doubt, leave it out
Interestingly, of all the methods for which a polymorphic implementation (via a method in the List interface) could have been reasonable, one actually found its way into the interface, using a Java 8 default method: List#sort(). Maybe others, like reverse, will be added later...
Because Collections is a utility class, and that split actually follows one of the SOLID principles: S, the Single Responsibility Principle.
This principle states that if we have 2 reasons to change for a class, we have to split the functionality in two classes.
You have a class that plays one role, and if you need to manipulate its inner data you create a helper class that plays another role.
If you need list.reverse(), you can use Eclipse Collections, where you can just call list.reverseThis(). In the JDK List, many methods (like max and min) were not added.
These are two different approaches to API design:
Many methods in the collection itself -> rich collection -> Eclipse Collections; drawback: a lot of rarely used methods in List.
Only the most-used methods plus a utility class -> JDK collections; drawback: you need a utility class like Collections.
Reverse is defined in Collections (with an extra s). This is not a part of the collection hierarchy; rather, it is provided by a utility class which can be used for different Lists.
Reversing a list is not a key part of defining a list, so it's kept out of the interface and provided separately. If it were defined in the interface, everyone would have to implement it, which may not be suitable for all.
The makers of the collections could have built this into the List hierarchy as well (since most list derivations have an abstract class in between, they could have put it in any abstract class in between). However, to simplify everyone's life it makes sense to keep it in a single utility class so that we don't have to figure out which class to look in for all collection-related utility functions.

Is a Collection better than a LinkedList?

Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
First variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?
These are design decisions, and one size usually doesn't fit all. Also the choice of what is used internally for the member variable can (and usually should be) different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe the performance characteristics without exposing the implementation details. The one interface that describes performance, RandomAccess is a marker interface, and doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, move from List to LinkedList if say efficient removal from front becomes important.
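A rough sketch of that rule of thumb, with three made-up helper methods whose parameter types get narrower only as each one needs a stronger guarantee:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;

class DeclaredTypeSketch {
    // Only iteration is needed, so Collection is specific enough.
    static int countNonEmpty(Collection<String> items) {
        int n = 0;
        for (String s : items) {
            if (!s.isEmpty()) n++;
        }
        return n;
    }

    // Position matters here, so the parameter is narrowed to List.
    static String second(List<String> items) {
        return items.get(1);
    }

    // Cheap removal from the front matters, so the concrete type is named.
    static String takeFirst(LinkedList<String> items) {
        return items.removeFirst();
    }

    public static void main(String[] args) {
        System.out.println(countNonEmpty(new HashSet<>(List.of("a", "", "b"))));
        System.out.println(second(new ArrayList<>(List.of("x", "y"))));
        System.out.println(takeFirst(new LinkedList<>(List.of("head", "tail"))));
    }
}
```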
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().
Collection list = new LinkedList(); //bad
This is bad because you don't want this reference to be able to refer to, say, a HashSet (HashSet also implements Collection, as do many other classes in the collections framework).
LinkedList list = new LinkedList(); //bad?
This is bad because good practice is to always code to the interface.
List list = new LinkedList();//good
This is good because point 2 says so (always program to an interface).
Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.
Sure. If, for example, Java gains a more efficient List implementation but your API accepts only LinkedList, you won't be able to switch implementations once clients depend on that API. If you use the interface, you can easily replace the implementation without breaking the API.
They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.
My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List just because the methods available on Collection aren't very powerful, and the distinction between a List and another kind of collection (Set, Map, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.

Java: ArrayList for List, HashMap for Map, and HashSet for Set?

I usually find it sufficient to use the concrete classes for the interfaces listed in the title. Usually when I use other types (such as LinkedList or TreeSet), the reason is for functionality and not performance - for example, a LinkedList for a queue.
I do sometimes construct an ArrayList with an initial capacity greater than the default of 10 and a HashMap with more than the default 16 buckets, but I usually (especially for business CRUD) never see myself thinking "hmmm...should I use a LinkedList instead of ArrayList if I am just going to insert and iterate through the whole List?"
I am just wondering what everyone else here uses (and why) and what type of applications they develop.
Those are definitely my default, although often a LinkedList would in fact be the better choice for lists, as the vast majority of lists seem to just iterate in order, or get converted to an array via toArray anyway.
But in terms of keeping consistent maintainable code, it makes sense to standardize on those and use alternatives for a reason, that way when someone reads the code and sees an alternative, they immediately start thinking that the code is doing something special.
I always type the parameters and variables as Collection, Map and List unless I have a special reason to refer to the sub type, that way switching is one line of code when you need it.
I could see explicitly requiring an ArrayList sometimes if you need the random access, but in practice that really doesn't happen.
For some kind of lists (e.g. listeners) it makes sense to use a CopyOnWriteArrayList instead of a normal ArrayList. For almost everything else the basic implementations you mentioned are sufficient.
Yep, I use those as defaults. I generally have a rule that on public class methods, I always return the interface type (ie. Map, Set, List, etc.), since other classes (usually) don't need to know what the specific concrete class is. Inside class methods, I'll use the concrete type only if I need access to any extra methods it may have (or if it makes understanding the code easier), otherwise the interface is used.
It's good to be pretty flexible with any rules you do use, though, as a dependency on concrete class visibility is something that can change over time (especially as your code gets more complex).
Indeed, always use the base interfaces Collection, List, Map instead of their implementations. To make things even more flexible you could hide your implementations behind static factory methods, which allow you to switch to a different implementation in case you find something better (I doubt there will be big changes in this field, but you never know). Another benefit is that the syntax is shorter thanks to generics.
Map<String, LongObjectClasName> map = CollectionUtils.newMap();
instead of
Map<String, LongObjectClasName> map = new HashMap<String, LongObjectClasName>();
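import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class CollectionUtils {
    // ... (further factories elided in the original)

    // The factories are static so they can be called as CollectionUtils.newList().
    public static <T> List<T> newList() {
        return new ArrayList<T>();
    }

    public static <T> List<T> newList(int initialCapacity) {
        return new ArrayList<T>(initialCapacity);
    }

    public static <T> List<T> newSynchronizedList() {
        return new Vector<T>();
    }

    public static <T> List<T> newConcurrentList() {
        return new CopyOnWriteArrayList<T>();
    }

    public static <T> List<T> newSynchronizedList(int initialCapacity) {
        return new Vector<T>(initialCapacity);
    }

    // ...
}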
Having just come out of a class about data structure performance, I'll usually look at the kind of algorithm I'm developing or the purpose of the structure before I choose an implementation.
For example, if I'm building a list that has a lot of random accesses into it, I'll use an ArrayList because its random access performance is good, but if I'm inserting things into the list a lot, I might choose a LinkedList instead. (I know modern implementations remove a lot of performance barriers, but this was the first example that came to mind.)
You might want to look at some of the Wikipedia pages for data structures (especially those dealing with sorting algorithms, where performance is especially important) for more information about performance, and the article about Big O notation for a general discussion of measuring the performance of various functions on data structures.
I don't really have a "default", though I suppose I use the implementations listed in the question more often than not. I think about what would be appropriate for whatever particular problem I'm working on, and use it. I don't just blindly default to using ArrayList, I put in 30 seconds of thought along the lines of "well, I'm going to be doing a lot of iterating and removing elements in the middle of this list so I should use a LinkedList".
And I almost always use the interface type for my reference, rather than the implementation. Remember that List is not the only interface that LinkedList implements. I see this a lot:
LinkedList<Item> queue = new LinkedList<Item>();
when what the programmer meant was:
Queue<Item> queue = new LinkedList<Item>();
I also use the Iterable interface a fair amount.
If you are using LinkedList for a queue, you might consider using the Deque interface and ArrayDeque implementing class (introduced in Java 6) instead. To quote the Javadoc for ArrayDeque:
This class is likely to be faster than
Stack when used as a stack, and faster
than LinkedList when used as a queue.
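A short sketch of ArrayDeque in both roles. The drain helper is a made-up example method; it only depends on the Queue interface, so a LinkedList could be passed in just as well.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

class QueueSketch {
    // Depends only on Queue, so any implementation can be passed in.
    static String drain(Queue<String> queue) {
        StringBuilder out = new StringBuilder();
        String item;
        while ((item = queue.poll()) != null) { // poll() returns null once the queue is empty
            out.append(item);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Used as a FIFO queue.
        Queue<String> queue = new ArrayDeque<>();
        queue.offer("a");
        queue.offer("b");
        System.out.println(drain(queue));

        // Used as a LIFO stack through the Deque interface.
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop());
    }
}
```

One thing to keep in mind: ArrayDeque does not accept null elements, which is what makes the null check on poll() unambiguous here.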
I tend to use one of the *Queue classes for queues. However LinkedList is a good choice if you don't need thread safety.
Using the interface type (List, Map) instead of the implementation type (ArrayList, HashMap) is irrelevant within methods - it's mainly important in public APIs, i.e. method signatures (and "public" doesn't necessarily mean "intended to be published outside your team").
When a method takes an ArrayList as a parameter, and you have something else, you're screwed and have to copy your data pointlessly. If the parameter type is List, callers are much more flexible and can, e.g. use Collections.EMPTY_LIST or Collections.singletonList().
I too typically use ArrayList, but I will use TreeSet or HashSet depending on the circumstances. When writing tests, however, Arrays.asList and Collections.singletonList are also frequently used. I've mostly been writing thread-local code, but I could also see using the various concurrent classes as well.
Also, there were times I used ArrayList when what I really wanted was a LinkedHashSet (before it was available).
