I'm learning about the "coding to an interface to hide implementation details" design principle, and I'm confused by this idea. In my example, I have a WorkoutRoutine class that has a List of RoutineStep. By following the coding to an interface principle, my getWorkoutRoutine() method will return the List interface so that the implementation details of which list is used are hidden from the client.
What if I decide to change the implementation from a List of RoutineStep to an array of RoutineStep, won't disclosing the return type as List reveal to the client that a list was implemented and not an array, tree, graph, or any other data structure? How can I best encapsulate my data structure for holding a collection of RoutineStep, such that I can change the implementation of the data structure without the client code breaking if the List was changed to an array, tree, graph, etc?
import java.util.List;

public class WorkoutRoutine {
    private List<RoutineStep> workoutRoutine;

    public List<RoutineStep> getWorkoutRoutine() {
        // What if I later decide to change the data structure from a List to an array,
        // tree, graph, set, or map? What approach should I take so that client code,
        // which was already written to receive a List, doesn't break after the change?
        return workoutRoutine;
    }
}
The idea is to return a type that is as generic as possible but specific enough.
For example, returning a LinkedList may be too specific: if the client starts using the getFirst method and you later decide to return an ArrayList for performance reasons, the client code will break. Hence the principle of returning a more generic type, such as a List.
You could even go more generic and return a Collection - it may make sense if you don't think the client code will need to access the n-th position in the collection for example. But if you feel that accessing the n-th step of the routine (routine.get(n-1)) without having to iterate is an important feature, it means that Collection is too generic for your use case.
decide to change the implementation from a List of RoutineStep to an array of RoutineStep
The ArrayList is just what you would choose – a List implementation backed by an array structure.
Later, you may find that your app is more often inserting elements into the middle of the list rather than appending to the end. So you decide to switch your choice of List implementation from ArrayList to LinkedList. By having your method return an object of the more general interface List rather than the more specific concrete classes of ArrayList and LinkedList, your change from one class to the other does not break calling code.
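A minimal sketch of that swap, under some assumptions: RoutineStep here is a hypothetical stand-in with a single name field, and the backing list is passed into the constructor only so that both implementations can be exercised side by side. The client method depends on nothing but the List contract.

```java
import java.util.List;

// Hypothetical stand-in for the question's RoutineStep type.
class RoutineStep {
    final String name;

    RoutineStep(String name) {
        this.name = name;
    }
}

class WorkoutRoutine {
    // Typed as List, so the concrete class (ArrayList, LinkedList, ...) stays hidden.
    private final List<RoutineStep> steps;

    // Assumption for the demo: the backing list is injected so both can be tried.
    WorkoutRoutine(List<RoutineStep> backing) {
        this.steps = backing;
    }

    void addStep(RoutineStep step) {
        steps.add(step);
    }

    List<RoutineStep> getWorkoutRoutine() {
        return steps;
    }
}

class RoutineClient {
    // Uses only iteration, which every List implementation supports.
    static String describe(WorkoutRoutine routine) {
        StringBuilder sb = new StringBuilder();
        for (RoutineStep step : routine.getWorkoutRoutine()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(step.name);
        }
        return sb.toString();
    }
}
```

Calling RoutineClient.describe on a routine backed by new ArrayList<>() and on one backed by new LinkedList<>() gives the same result, and RoutineClient never has to be recompiled against a different type.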
By the way, we generally do not use arrays in Java where we expect to need the features and flexibility of a List or Set from the Java Collections Framework. We generally use arrays only where we need to conserve RAM because we are deploying to constrained devices, or where we need maximum performance. Arrays were also used for their more convenient compact literals, but the newer List.of and Set.of methods fill that need.
without the client code breaking if the List was changed to an array, tree, graph, etc?
If you are making such a massive change to the data structures of your app, then no amount of encapsulation will mask that change. At some point, your design changes may indeed break existing code.
Such breaking changes may be a natural part of the early stages in an emergent design. This is a normal part of our work. Later, in an evolved design, proper use of encapsulation will help protect against smaller changes having wider impact than necessary, and will make your codebase less brittle.
These days I am reading the Code Complete book and I've passed the part about coupling-levels (simple-data-parameter, simple-object, object-parameter and semantic coupling). Semantic being the "worst" kind:
The most insidious kind of coupling occurs when one module makes use not of some syntactic element of another module but of some semantic knowledge of another module’s inner workings.
The examples in the book usually lead to run-time failures, and are typical bad code, but today I had a situation that I'm really not sure how to treat.
First I had a class, let's call it Provider fetching some data and returning a List of Foo's.
class Provider
{
    public List<Foo> getFoos()
    {
        ArrayList<Foo> foos = createFoos(); // some processing here where I create the objects
        return foos;
    }
}
A consuming class executes an algorithm, processing the Foo's, merging or removing from the List based on some attributes, not really important. The algorithm does all of its processing with the head of the list, so there are a lot of operations reading/removing/adding at the head.
(I just realized I could have made the algorithm looking like merge sort, recursively calling it on halves of an array, but that doesn't answer my question :) )
I noticed I'm returning an ArrayList, so I changed the providing class's getFoos method to return a LinkedList. The ArrayList has O(n) complexity for head removals, while LinkedList has constant complexity. But it then struck me that I am possibly creating a semantic dependency. The code will certainly work with both implementations of List and there are no side effects, but performance will be degraded if the wrong one is used. I wrote both classes, so I can easily understand this, but what if a colleague had to implement the algorithm, or if he uses the Provider as a source for another algorithm which favors random access? If he doesn't bother with the internals, as he should not have to, I would mess up his algorithm's performance.
Declaring the method to return LinkedList is also possible, but what about the "program to interface" principle?
Is there any way to handle this situation, or is my design flawed at its roots?
The general problem is, how does a producer return something in the form that the consumer prefers? Usually the consumer needs to include the preference in the request. For example
as a flag - getFoos(randomOrLinked)
different methods - getFoosAsArrayList(), getFoosAsLinkedList()
pass a function that creates desired List - getFoos(ArrayList::new)
or pass a desired output List - getFoos(new ArrayList())
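The third option, passing a function that creates the desired List, can be sketched as follows. This is only an illustration: Foo is a hypothetical placeholder with an id field, and the loop stands in for whatever processing the real Provider does.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical placeholder for the question's Foo type.
class Foo {
    final int id;

    Foo(int id) {
        this.id = id;
    }
}

class Provider {
    // The caller passes a factory, so the caller decides which List is filled.
    <L extends List<Foo>> L getFoos(Supplier<L> listFactory) {
        L foos = listFactory.get();
        for (int i = 0; i < 3; i++) {
            foos.add(new Foo(i)); // stand-in for the real processing
        }
        return foos;
    }
}
```

A consumer that works on the head of the list would write LinkedList<Foo> foos = provider.getFoos(LinkedList::new); while one that favors random access would pass ArrayList::new. The return type follows the factory, so no cast or copy is needed.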
But, the producer may have the right to say, this is too complicated for me, I don't care. I'll return a form that's suitable for most use cases, and the consumer needs to handle it properly. If I think ArrayList is best, I'll just do it. (Actually you may have a better choice - a ring structure - that suits both of the two use cases in consideration)
Of course, it should be well documented. Or you could be honest and declare ArrayList as the return type in the method signature, as long as you commit to it. Don't worry too much about "interface": ArrayList is an interface (in the general sense), Iterable is an interface, so what's so special about the List interface that sits between the two?
There can be another criticism of your design: you return a mutable data structure that the consumer can modify directly. That is less desirable than returning a read-only data structure. If you can, you should return a read-only view of the underlying data; constructing the view should be inexpensive. The consumer needs to make its own copy if it needs a mutable one.
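One way to get such a read-only view is Collections.unmodifiableList, which wraps the existing list instead of copying it. The sketch below assumes a hypothetical ReadOnlyProvider that holds String elements purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical provider; String elements stand in for Foo.
class ReadOnlyProvider {
    private final List<String> foos = new ArrayList<>();

    ReadOnlyProvider() {
        foos.add("a");
        foos.add("b");
    }

    // Returns a wrapper around the same list: no elements are copied,
    // and any attempt to modify it throws UnsupportedOperationException.
    List<String> getFoos() {
        return Collections.unmodifiableList(foos);
    }
}
```

A consumer that needs to mutate makes its own copy, for example new LinkedList<>(provider.getFoos()), and the provider's data stays untouched.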
You need to make a compromise somewhere. In this case, there are two compromises that make sense to me:
If you don't know that all consumers of your Provider will be performing operations that are appropriate to a LinkedList, then stick with the signature that has return type List with implementation that returns ArrayList (ArrayList is a good all-around List implementation). Then within the calling method, wrap the returned List in a LinkedList: LinkedList<Foo> fooList = new LinkedList<>(provider.getFoos()); Make the caller responsible for its own optimizations.
If you know that all consumers of your Provider are going to use it in a LinkedList-appropriate way, then just change the return type to LinkedList -- or add another method that returns a LinkedList.
I strongly prefer the former, but both make sense.
You could have the caller create the LinkedList, instead of the provider - i.e. change
List<Foo> foos = provider.getFoos();
to
List<Foo> foos = new LinkedList<>(provider.getFoos());
Then it doesn't matter what kind of list the provider returns. The downside is that an ArrayList is still created, so there is a tradeoff between efficiency and cleanliness.
Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort for sorting lists.
I can use toArray, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)
Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)
The library can test for RandomAccess to avoid trying to, e.g., quicksort a linked list, so that can't be a reason for a deliberate omission.
How can I sort a List in parallel without creating a temporary array?
There doesn't appear to be any straightforward way to sort a List in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp) is that the Collections implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp). As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp) extension method over Collections.sort(list, cmp). It might seem that this is merely a small syntactic advantage being able to write myList.sort(cmp) instead of Collections.sort(myList, cmp). The difference is that myList.sort(cmp), being an interface extension method, can be overridden by the specific List implementation. For example, ArrayList.sort(cmp) sorts the list in-place using Arrays.sort() whereas the default implementation implements the old copyout-sort-copyback technique.
It should be possible to add a parallelSort extension method to the List interface that has similar semantics to List.sort but does the sorting in parallel. This would allow ArrayList to do a straightforward in-place sort using Arrays.parallelSort. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List implementation and override sort() to sort in parallel. Or you could subclass ArrayList, override sort(), grab the elementData array via reflection and call parallelSort() on it. Of course you could just write your own List implementation and provide a parallelSort() method, but the advantage of overriding List.sort() is that this works on the plain List interface and you don't have to modify all the code in your code base to use a different List subclass.
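The first workaround, an array-based List that overrides sort(), might look roughly like this. It is a sketch under assumptions: ParallelSortList is a hypothetical name, and the list is fixed-size (it wraps the array it is given) to keep the example short. Arrays.parallelSort still needs its own working space, and for arrays below its minimum granularity it sorts sequentially.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.RandomAccess;

// Hypothetical fixed-size, array-backed List whose sort() delegates to parallelSort.
class ParallelSortList<E> extends AbstractList<E> implements RandomAccess {
    private final E[] elements;

    // Wraps the given array directly; no copy is made.
    ParallelSortList(E[] elements) {
        this.elements = elements;
    }

    @Override
    public E get(int index) {
        return elements[index];
    }

    @Override
    public E set(int index, E value) {
        E old = elements[index];
        elements[index] = value;
        return old;
    }

    @Override
    public int size() {
        return elements.length;
    }

    // Callers that only know the List interface still end up here.
    @Override
    public void sort(Comparator<? super E> c) {
        // A null comparator means natural ordering, as with List.sort.
        Arrays.parallelSort(elements, c);
    }
}
```

Because the override sits behind List.sort, existing code that calls myList.sort(cmp) picks it up without any change at the call site.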
I think you are doomed to use a custom List implementation augmented with your own parallelSort or else change all your other code to store the big data in Array types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List sort documents provide an example. After the explanation that mergesort is used, they say
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n² log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List "i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List implementer is just prevented from doing it.
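To illustrate the claim that mergesort is easy on a linked structure, here is a sketch on a hand-rolled singly linked Node type. Both Node and LinkedMergeSort are hypothetical names for this example; this is not how java.util.LinkedList is implemented or sorted. The point is that the sort relinks nodes and never asks for the i-th element.

```java
import java.util.Comparator;

// Hypothetical minimal singly linked node.
class Node<T> {
    final T value;
    Node<T> next;

    Node(T value, Node<T> next) {
        this.value = value;
        this.next = next;
    }
}

class LinkedMergeSort {
    // Returns the head of the sorted list; runs in O(n log n) comparisons.
    static <T> Node<T> sort(Node<T> head, Comparator<? super T> cmp) {
        if (head == null || head.next == null) {
            return head;
        }
        // Find the middle with slow/fast pointers, then cut the list in two.
        Node<T> slow = head;
        Node<T> fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node<T> second = slow.next;
        slow.next = null;
        return merge(sort(head, cmp), sort(second, cmp), cmp);
    }

    // Merges two sorted lists by relinking; "<=" keeps equal elements in order.
    private static <T> Node<T> merge(Node<T> a, Node<T> b, Comparator<? super T> cmp) {
        Node<T> dummy = new Node<>(null, null);
        Node<T> tail = dummy;
        while (a != null && b != null) {
            if (cmp.compare(a.value, b.value) <= 0) {
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }
}
```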
A parallel sort on List has the same problem. The standard sequential sort fixes it with custom sorts in the concrete List implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.
Use the following:
yourCollection.parallelStream().sorted().collect(Collectors.toList());
This will be parallel when sorting, because of parallelStream(). I believe this is what you mean by parallel sort?
Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:
Element access is performed via method calls. Despite all the optimizations JIT can apply, even for a list that implements RandomAccess, this probably means a lot of overhead compared to plain array accesses which can be optimized very well.
Many algorithms require copying some fragments of the array to temporary structures. There are efficient methods for copying arrays or their fragments. An arbitrary List instance, on the other hand, can't be easily copied. New lists would have to be allocated, which poses two problems. First, this means allocating some new objects, which is likely more costly than allocating arrays. Second, the algorithm would have to choose which implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad. Either just choose some hard-coded implementation, e.g. ArrayList, but then it could just as well allocate simple arrays (and if we're generating arrays then it's much easier if the source is also an array). Or let the user provide some list factory object, which makes the code much more complicated.
Related to the previous issue: there is no obvious way of copying a list into another due to how the API is designed. The best the List interface offers is addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs adding elements one by one which many implementations do).
Most lists that need to be sorted will be small enough for another copy to not be an issue.
So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and probably is less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from array performed behind the scenes).
By combining the existing answers I came up with this code.
This works if you are not interested in creating a custom List class and if you don't mind creating a temporary array (Collections.sort is doing it anyway).
This uses the initial list and does not create a new one as in the parallelStream solution.
// Convert the List to an array so we can use Arrays.parallelSort rather than Collections.sort.
// Note that the default List.sort implementation performs this very same conversion,
// so we're not adding overhead in comparison with it.
Foo[] fooArr = fooLst.toArray(new Foo[0]);
// Multithreaded sort. Automatically falls back to a single thread when the size is less than 8192.
// Comparator.comparing is one option; yourmethod is a placeholder for your own sort key.
Arrays.parallelSort(fooArr, Comparator.comparing(Foo::yourmethod));
// Refill the List from the sorted array, the same way the default List.sort does it.
ListIterator<Foo> i = fooLst.listIterator();
for (Foo e : fooArr) {
    i.next();
    i.set(e);
}
I have observed that in Java we usually write code like the following:
List mylist = new ArrayList();
Why shouldn't we use the following instead?
ArrayList mylist = new ArrayList();
While the second option is viable, the first is preferable in most cases. Typically you want to code to interfaces to make your code less coupled and more cohesive. This is a type of data abstraction, where the user of mylist (I would suggest myList) does not care about its actual implementation, only that it is a list.
We may want to change the underlying data structure at some point, and by keeping references typed as the interface, we only need to change the declaration.
The separation of abstract data type and specific implementation is one of the key aspects of object-oriented programming.
See Interface Instantiation
Just to avoid tight coupling. You should in theory never tie yourself to implementation details, because they might change, opposite to interface contract, which is supposed to be stable. Also, it really simplifies testing.
You could view interface as an overall contract all implementing classes must obey. Instead, implementation-specific details may vary, like how data is represented internally, accessed, etc. - the information that you'd never want to rely on.
If you use ArrayList, you are saying it has to be an ArrayList, not any other kind of List, and to replace it you would have to change every reference to the type. If you use List you are making it clear there is nothing special about the List and it is used as a plain list. It can be changed to another List by changing just one line.
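A small sketch of the "one line" claim, using hypothetical names (ListDeclarationDemo, buildWords, totalLength): the concrete class appears only where the list is constructed, and everything else is written against List.

```java
import java.util.ArrayList;
import java.util.List;

class ListDeclarationDemo {
    // Accepts any List, so it does not care which implementation it is handed.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    static List<String> buildWords() {
        // Only this line names the concrete class. Replacing it with
        // "new LinkedList<>()" (plus the import) requires no other change.
        List<String> words = new ArrayList<>();
        words.add("code");
        words.add("to");
        words.add("interfaces");
        return words;
    }
}
```

Had the variable and the parameter been declared as ArrayList<String>, the same switch would mean editing every declaration and method signature that mentions the type.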
Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
First variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?
These are design decisions, and one size usually doesn't fit all. Also the choice of what is used internally for the member variable can (and usually should be) different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe the performance characteristics without exposing the implementation details. The one interface that describes performance, RandomAccess is a marker interface, and doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, move from List to LinkedList if say efficient removal from front becomes important.
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().
Collection list = new LinkedList(); //bad
This is bad because you don't want this reference to be able to refer to, say, a HashSet (HashSet also implements Collection, as do many other classes in the collections framework).
LinkedList list = new LinkedList(); //bad?
This is bad because, good practice is to always code to the interface.
List list = new LinkedList();//good
This is good because point 2 says so. (Always program to an interface.)
Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.
Sure. If, for example, Java introduces a more efficient implementation of the List interface, but you already have an API that accepts only LinkedList, you won't be able to swap the implementation once clients depend on that API. If you use the interface, you can easily replace the implementation without breaking the API.
They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.
My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List, just because the methods available on Collection aren't very powerful, and the distinction between a List and another kind of Collection (Set, Queue, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.
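As a short illustration of List<?> in place of a raw List (WildcardDemo and countNonNull are hypothetical names): the method accepts a list of any element type and can read each element as Object, while the compiler rejects adding anything other than null, which is the safety a raw List gives up.

```java
import java.util.List;

class WildcardDemo {
    // Works for List<String>, List<Integer>, and so on, without raw types.
    static int countNonNull(List<?> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        // items.add("x") would not compile here: the element type is unknown.
        return count;
    }
}
```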
In Java, when would it be preferential to use a List rather than an Array?
I see the question as being the opposite-
When should you use an Array over a List?
Only when you have a specific reason to do so (e.g. project constraints, memory concerns (not really a good reason), etc.).
Lists are much easier to use (imo), and have much more functionality.
Note: You should also consider whether or not something like a Set, or another data structure, is a better fit than a List for what you are trying to do.
Each data structure, and implementation, has different pros/cons. Pick the ones that excel at the things that you need to do.
Need get() to be O(1) for any item? Likely use an ArrayList. Need O(1) insert()? Possibly a LinkedList. Need O(1) contains()? Possibly a HashSet.
TLDR: Each data structure is good at some things, and bad at others. Look at your objectives and choose the data structure that best fits the given problem.
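The three cases just mentioned can be put side by side in a small sketch. StructureChoiceDemo is a hypothetical name, and the comments state the usual complexity of each call rather than anything measured here; for LinkedList the constant-time insert shown is the one at the head.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class StructureChoiceDemo {
    static String run() {
        // ArrayList: indexed read is a constant-time array lookup.
        List<String> byIndex = new ArrayList<>(Arrays.asList("a", "b", "c"));
        String middle = byIndex.get(1);

        // LinkedList: inserting at the head is constant time.
        LinkedList<String> ends = new LinkedList<>(byIndex);
        ends.addFirst("start");

        // HashSet: membership test is expected constant time.
        Set<String> members = new HashSet<>(byIndex);
        boolean hasC = members.contains("c");

        return middle + " " + ends.getFirst() + " " + hasC;
    }
}
```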
Edit:
One thing not noted is that you're better off declaring the variable as its interface (i.e. List or Queue) rather than its implementing class. This way, you can change the implementation at some later date without changing anything else in the code.
As an example:
List<String> myList = new ArrayList<String>();
vs
List<String> myList = new LinkedList<String>();
Note that myList is a List in both examples.
--R. Bemrose
Rules of thumb:
Use a List for reference types.
Use arrays for primitives.
If you have to deal with an API that is using arrays, it might be useful to use arrays. OTOH, it may be useful to enforce defensive copying with the type system by using Lists.
If you are doing a lot of List type operations on the sequence and it is not in a performance/memory critical section, then use List.
Low-level optimisations may use arrays. Expect nastiness with low-level optimisations.
Most people have answered it already.
There is almost no good reason to use an array instead of a List. The main exception is the primitive array (like int[]): you cannot create a list of primitives (you must use List<Integer>).
The most important difference is that when using List you can decide which implementation will be used. The most obvious choice is between LinkedList and ArrayList.
I would like to point out in this answer that choosing the implementation gives you very fine grained control over the data that is simply not available to array:
You can prevent clients from modifying your list by wrapping it with Collections.unmodifiableList
You can synchronize a list for multithreading using Collections.synchronizedList
You can create a fixed-capacity queue by using a LinkedBlockingQueue
... etc
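The three options listed above can be sketched with standard JDK calls; ListControlDemo and its method names are hypothetical wrappers added only for this illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ListControlDemo {
    // Read-only view: modification attempts throw UnsupportedOperationException.
    static List<String> readOnly(List<String> source) {
        return Collections.unmodifiableList(source);
    }

    // Synchronized view: each call locks on the wrapper. Iterating over it
    // still needs a manual synchronized block, per the JDK documentation.
    static List<String> threadSafe(List<String> source) {
        return Collections.synchronizedList(source);
    }

    // Bounded queue: offer() returns false once the capacity is reached.
    static BlockingQueue<String> bounded(int capacity) {
        return new LinkedBlockingQueue<>(capacity);
    }
}
```

None of these has a direct equivalent for a plain array: an array's length is fixed, but its elements are always writable by anyone holding the reference.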
In any case, even if you don't want any extra features of the list right now, just use an ArrayList and size it with the size of the array you would have created. It uses an array as its backing store, and the performance difference compared with a real array will be negligible (except for primitive arrays).
Pretty much always prefer a list. Lists have much more functionality, particularly iterator support. You can convert a list to an array at any time with the toArray() method.
Always prefer lists.
Arrays when
Varargs for a method ( I guess you are forced to use Arrays here ).
When you want your collections to be covariant ( arrays of reference types are covariant ).
Performance critical code.
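The varargs and covariance points can be seen in a few lines. CovarianceDemo is a hypothetical name. Note the flip side of array covariance: the element type is only checked at run time, whereas a List<String> is simply not a List<Object> and the mistake is caught by the compiler.

```java
class CovarianceDemo {
    // A String[] can be passed wherever an Object[] is expected.
    static int countElements(Object[] items) {
        return items.length;
    }

    // Storing the wrong type compiles but fails when executed.
    static boolean storeFailsAtRuntime() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42); // the array is really a String[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Varargs arguments arrive inside the method as an array.
    static int sum(int... numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }
}
```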
If you know how many things you'll be holding, you'll want an array. My screen is 1024x768, and a buffer of pixels for that isn't going to change in size ever during runtime.
If you know you'll need to access specific indexes (go get item #763!), use an array or array-backed list.
If you need to add or remove items from the group regularly, use a linked list.
In general, dealing with hardware, arrays, dealing with users, lists.
It depends on what kind of List.
It's better to use a LinkedList if you know you'll be inserting many elements in positions other than the end. LinkedList is not suitable for random access (getting the i'th element).
It's better to use an ArrayList if you don't know, in advance, how many elements there are going to be. The ArrayList correctly amortizes the cost of growing the backing array as you add more elements to it, and is suitable for random access once the elements are in place. An ArrayList can be efficiently sorted.
If you want the array of items to expand (i.e. if you don't know what the size of the list will be beforehand), a List will be beneficial. However, if you want performance, you would generally use an array.
In many cases the type of collection used is an implementation detail which shouldn't be exposed to the outside world. The more generic your return type is, the more flexibility you have to change the implementation afterwards.
Arrays (primitive type, i.e. new int[10]) are not generic; you won't be able to change your implementation without an internal conversion or altering the client code. You might want to consider Iterable as a return type.
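A sketch of the Iterable suggestion, assuming a hypothetical Routine class that stores its steps as plain strings: callers can only iterate, so the array behind it could later become a List or a tree without any client change.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical class; the String[] storage is an internal detail.
class Routine implements Iterable<String> {
    private final String[] steps;

    Routine(String... steps) {
        this.steps = steps.clone(); // defensive copy of the caller's array
    }

    @Override
    public Iterator<String> iterator() {
        // Arrays.asList wraps the array without copying it; its iterator
        // does not support remove(), so callers cannot shrink the routine.
        return Arrays.asList(steps).iterator();
    }
}
```

Client code is then just for (String step : routine) { ... }, which compiles against Iterable and nothing more specific. The trade-off is that callers lose size() and indexed access unless the class offers them explicitly.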