For example, I often see this:
Set<Integer> s = new TreeSet<Integer>();
Set<Integer> s = new HashSet<Integer>();
Map<Integer, String> m = new HashMap<Integer, String>();
over
TreeSet<Integer> ts = new TreeSet<Integer>();
HashSet<Integer> hs = new HashSet<Integer>();
HashMap<Integer, String> hm = new HashMap<Integer, String>();
What are the advantages/disadvantages of the former vs. the latter?
For me it comes down to a number of points.
Do you care about the implementation? Does your code need to know that the Map is a HashMap or a TreeMap, or does it just care that it has a key/value structure of some kind?
It also means that when I'm building my implementation code, if I expose a method that returns Map, I can change the implementation over time without affecting any code that relies on it (which is why it's a bad idea to cast these values to a concrete type).
The other point is that it becomes easier to move these structures around the code: any method that accepts a Map is easier to work with than one that insists on a HashMap, for instance.
The convention (that I follow) is basically to use the lowest functional interface that meets the need of the API. No point using an interface that does not provide the functionality your API needs (for example, if you need a SortedMap, no point using a Map then)
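A rough sketch of that convention, assuming a made-up `UserDirectory` class (the class and method names are purely illustrative): the first method only promises `Map`, so its implementation can be swapped freely, while the second promises `SortedMap` because callers rely on key ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical example class; the names are illustrative only.
final class UserDirectory {

    // Callers only need key/value lookup, so the return type is Map.
    // Swapping HashMap for another Map implementation would not affect them.
    public static Map<Integer, String> loadUsers() {
        Map<Integer, String> users = new HashMap<>();
        users.put(42, "alice");
        users.put(7, "bob");
        return users;
    }

    // Callers rely on key ordering here, so the API promises SortedMap.
    public static SortedMap<Integer, String> loadUsersSortedById() {
        return new TreeMap<>(loadUsers());
    }

    public static void main(String[] args) {
        Map<Integer, String> users = loadUsers();
        System.out.println(users.get(42));        // alice

        SortedMap<Integer, String> sorted = loadUsersSortedById();
        System.out.println(sorted.firstKey());    // 7 (smallest key)
        System.out.println(sorted.headMap(42));   // {7=bob}
    }
}
```

Neither `firstKey()` nor `headMap()` is available through a plain `Map` reference, which is the case where the more specific interface earns its place.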
IMHO
Generally, you want to declare the most general type that has the behavior you're actually using. That way you don't have to change as much code if you decide to take a different concrete class. And you allow users of the function more freedom.
You should read On Understanding Data Abstraction, Revisited by William R. Cook and also his Proposal for Simplified, Modern Definitions of "Object" and "Object Oriented".
Basically: if you use Java classes as anything other than factories, i.e. if you have a class name anywhere except after a new operator, then you are not doing object-oriented programming. Following this rule does not guarantee that you are doing OO, but violating this rule means that you aren't.
Note: there's nothing wrong with not doing OO.
The maintenance of an application can cost three times as much as the development. This means you want the code to be as simple and as clear as possible.
If you use a List instead of an ArrayList, you make it clear you are not using any method specific to ArrayList and that it can be changed to another List implementation. The problem with using an ArrayList when you don't have to is that it takes a long time to determine safely that it really could have been a List; i.e. it's very hard to prove you never needed something. (It is easier to add something than to remove it.)
A similar example is using Vector when a List will do. If you see a Vector, you say: the developer chose a Vector for a good reason; it is thread-safe. But I need to change it now and check that the code is still thread-safe. Then you say: I can't see how it is used in a multi-threaded way, so I have to check all the ways it could possibly be used, or work out whether I need to add synchronized when I iterate over it, when actually it never needed to be thread-safe. Using a thread-safe collection when it doesn't need to be is not just a waste of CPU time but, more importantly, a waste of the developer's time.
I am developing an application where as a background I need to monitor the user activity on particular objects and later when they are visualized they need to be sorted based on the order of which the user used them ( the last used object must be visualized on the first row of a grid for example.)
So if I have an ArrayList where I store the objects which the user is dealing with in order to add the last used object I need to check if it is already in the list and then move it at the first position. If the object is not there I simply add it at the first position of the list.
So instead of doing all these steps I want to make my own list where the logic explained above will be available.
My question is which scenario is better:
Implement the list interface
Extend the ArrayList class and override the ADD method
Create a class that contains an ArrayList and handles any additional functionality.
I.e. prefer composition over inheritance (and in this case, implementing an interface). It's also possible to have that class implement List for relevant cases and just direct the (relevant) operations to the ArrayList inside.
Also note that LinkedHashMap supports insertion order (default) and access order for iteration, if you don't need a List (or if you can suitably replace it with a Map).
So instead of doing all these steps i want to make my own list where
the logic explained above will be available.
I would try to refactor your design parameters (if you can) in order to be able to use the existing Java Collection Framework classes (perhaps a linked collection type). As a part of the Collections Framework, these have been optimized and maintained for years (so efficiency is likely already nearly optimal), and you won't have to worry about maintaining it yourself.
Of the two options you give, it is possible that neither is the easiest or best.
It doesn't sound like you'll be able to extend AbstractList (as a way of implementing List) so you'll have a lot of wheel reinvention to do.
The ArrayList class is not final, but not expressly designed and documented for inheritance. This can result in some code fragility as inheritance breaks encapsulation (discussed in Effective Java, 2nd Ed. by J. Bloch). This solution may not be the best way to go.
Of the options, if you can't refactor your design to allow use of the Collection classes directly, then write a class that encapsulates a List (or other Collection) as an instance field and add instrumentation to it. Favor composition over inheritance. In this way, your solution will be more robust and easier to maintain than a solution based on inheritance.
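To make the composition option concrete, here is a minimal sketch. The `RecentlyUsedList` class and its `touch` method are hypothetical names, not something from the question; the wrapped `ArrayList` stays private and only the "move to front" operation is exposed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper class: keeps the most recently used item first.
final class RecentlyUsedList<T> {
    // The wrapped list is an implementation detail and is never exposed directly.
    private final List<T> items = new ArrayList<>();

    // Moves the item to the front, adding it if it was not present.
    public void touch(T item) {
        items.remove(item);   // no-op if the item is absent
        items.add(0, item);
    }

    // Read-only view, most recently used first.
    public List<T> asList() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        RecentlyUsedList<String> recent = new RecentlyUsedList<>();
        recent.touch("a");
        recent.touch("b");
        recent.touch("c");
        recent.touch("a");
        System.out.println(recent.asList()); // [a, c, b]
    }
}
```

Both `remove` and `add(0, ...)` are linear in the list size here, which is probably fine for a short "recently used" list but is worth measuring for large ones.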
I think LinkedHashMap already does what you need - it keeps the elements in the order they were inserted or last accessed (this is determined by the parameter accessOrder in one of the constructors).
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
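For illustration, a small sketch of the accessOrder constructor (the `AccessOrderDemo` class name is made up). Note that in access order the map iterates from least recently to most recently accessed, so the last used entry comes out last and you would have to reverse the order for display.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AccessOrderDemo {

    // Returns the keys in iteration order after one access.
    public static List<String> keysAfterAccess() {
        // Third constructor argument: true = access order, false = insertion order.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // touching "a" moves it to the end of the iteration order
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess()); // [b, c, a]
    }
}
```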
EDIT
I don't have enough reputation to comment, so I'm putting it here: You don't actually need a map, so Venkatesh's LinkedHashSet suggestion is better.
You can do something like this:
<T> void update(Set<T> set, T value) {
set.remove(value);
set.add(value);
}
and then
LinkedHashSet<String> set = new LinkedHashSet<>();
update(set, "a");
update(set, "b");
update(set, "c");
update(set, "a");
Iterator<String> it = new LinkedList<String>(set).descendingIterator();
while (it.hasNext()) {
System.out.println(it.next());
}
Output:
a
c
b
You might try using HashMap<Integer, TrackedObject> where TrackedObject is the class of the object you're keeping track of.
When your user uses an object, do
void trackObject(TrackedObject object)
{
int x = hashMap.size();
hashMap.put(Integer.valueOf(x), object); // Map has put(), not add()
}
then when you want to read out the tracked objects in order of use:
TrackedObject[] getOrderedArray()
{
TrackedObject[] array = new TrackedObject[hashMap.size()];
for(int i = 0; i < hashMap.size(); i++)
{
array[i] = hashMap.get(Integer.valueOf(i));
}
return array;
}
A LinkedHashSet can also be helpful in your case. You can keep adding elements to it; it will keep them in insertion order and will hold only unique values.
So say I have a TreeMap<MyDataType, Integer>, where MyDataType is an object that contains a String and a Long. I want to check if the TreeMap contains a key that has a certain String; however, the Long associated with the object does not matter to me. For instance, my TreeMap could look like this:
{MyDataType: ["Tom", 1L] -> 1, MyDataType: ["Billy", 3L] -> 1, MyDataType: ["Ryan", 8L] -> 1}
I want to see if the TreeMap contains a Key (of type MyDataType) whose String value is "Billy". I can think of two ways to do this:
(1) iterate through the TreeMap one by one, checking the String of each MyDataType key.
(2) write a new class that extends TreeMap<MyDataType, Integer> and write a new containsKeyWithStringValue(String toCheck) that specifically does what I want it to do.
Are there any other more concise ways?
I would create a TreeMap<String, Integer> that will map "Tom" directly to 2, "Billy" to 5, etc.
Basically, you are asking if there is a more concise way of doing this (not more efficient, more scalable, etcetera).
I think that the answer is no.
The standard Java Map API doesn't provide any mechanisms for querying a map ... apart from get.
The Guava libraries include support for functional-style programming, including stuff for filtering the keys, values and entries of a Map; see here. However, when you include all the boilerplate that is necessary to implement a Guava predicate, it is doubtful that it will be more concise than iterating and testing the entries by hand.
You also have a requirement that an exception is thrown on a "miss"; i.e. when the String / Integer don't match an entry. With that constraint, the answer is definitely No. Collection APIs are generally defined to return null if there is a "miss" because creating and throwing exceptions is relatively expensive in Java.
With Java 8, the answer to the first part is likely to change because the new lambda and related features will make functional-style programming easier. But this won't address your requirement that misses should result in exceptions.
Finally, I don't like your idea of extending TreeMap with custom methods. I think you should either wrap the class, or implement the "extended functionality" as a static helper method. This is not an objective reason, but extending doesn't "feel right" to me in this situation.
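A sketch of the static helper approach using Java 8 streams. The shape of `MyDataType` is an assumption (the `name` and `id` field names are invented, as is the `MyDataTypeMaps` helper class), and a miss simply returns false rather than throwing.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shape of MyDataType: the field names are assumptions.
final class MyDataType implements Comparable<MyDataType> {
    public final String name;
    public final Long id;

    public MyDataType(String name, Long id) {
        this.name = name;
        this.id = id;
    }

    // Order by name first, then by id, so the type can be a TreeMap key.
    @Override
    public int compareTo(MyDataType other) {
        int byName = name.compareTo(other.name);
        return byName != 0 ? byName : id.compareTo(other.id);
    }
}

// Hypothetical helper class holding the static method.
final class MyDataTypeMaps {
    private MyDataTypeMaps() {}

    // Linear scan over the keys; works for any Map, not only TreeMap.
    public static boolean containsKeyWithName(Map<MyDataType, ?> map, String name) {
        return map.keySet().stream().anyMatch(key -> key.name.equals(name));
    }

    public static void main(String[] args) {
        Map<MyDataType, Integer> map = new TreeMap<>();
        map.put(new MyDataType("Tom", 1L), 1);
        map.put(new MyDataType("Billy", 3L), 1);
        map.put(new MyDataType("Ryan", 8L), 1);
        System.out.println(containsKeyWithName(map, "Billy")); // true
        System.out.println(containsKeyWithName(map, "Alice")); // false
    }
}
```

This is still option (1) under the hood, an O(n) iteration over the keys; it is only more concise, not faster.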
Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
First variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?
These are design decisions, and one size usually doesn't fit all. Also the choice of what is used internally for the member variable can (and usually should be) different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe the performance characteristics without exposing the implementation details. The one interface that describes performance, RandomAccess, is a marker interface, and doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, move from List to LinkedList if say efficient removal from front becomes important.
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().
Collection list = new LinkedList(); //bad
This is bad because you don't want this reference to be able to refer to, say, a HashSet (HashSet also implements Collection, as do many other classes in the collections framework).
LinkedList list = new LinkedList(); //bad?
This is bad because good practice is to always code to the interface.
List list = new LinkedList();//good
This is good because point 2 says so. (Always program to an interface.)
Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.
Sure. If, for example, a more efficient List implementation comes along, but your API accepts only LinkedList, you won't be able to switch implementations once clients depend on that API. If you use the interface, you can replace the implementation without breaking the API.
They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.
My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List just because the methods available on Collection aren't very powerful, and the distinction between a List and another Collection (Map, Set, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.
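A small sketch of the `List<?>` suggestion (the `WildcardDemo` class and `countNonNull` method are made-up names): the wildcard version accepts a list of any element type while keeping the compiler's type checks, which a raw `List` gives up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WildcardDemo {

    // List<?> accepts a list of any element type but only lets us read Objects.
    public static int countNonNull(List<?> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", null, "b"));
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        System.out.println(countNonNull(names));   // 2
        System.out.println(countNonNull(numbers)); // 3

        // names.add(42);  // would not compile: the element type is checked.
        // Through a raw List reference the same add compiles (with a warning)
        // and fails later with a ClassCastException when read back as a String.
    }
}
```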
Occasionally I see somebody create an arraylist like this, why?
List numbers = new ArrayList( );
Instead of:
ArrayList<something> numbers = new ArrayList<something>();
If you are asking about using an interface instead of a concrete class, then it is good practice. Imagine you switch to LinkedList tomorrow: in the first case you won't need to fix the variable declaration.
If the question is about not using generics, then that is bad: generics give you compile-time type safety.
What's good:
1. List is a general case for many implementations.
List trololo = new ListImpl();
Hides real implementation for the user:
public List giveMeTheList(){
List trololo = new SomeCoolListImpl();
return trololo;
}
By design it's good: the user shouldn't have to care about the implementation. They just get access to it through the interface. The implementation should already have all the necessary properties: fast appends, fast inserts, unmodifiability, etc.
What's bad:
I've read that all raw types will be restricted in future Java versions, so such code is better written this way (a wildcard is not allowed after new, so the constructor needs a concrete type argument):
List<?> trololo = new ListImpl<Object>();
In general the wildcard carries the same meaning: you don't know for sure whether your collection will be heterogeneous or homogeneous.
Someday you could do:
List<something> numbers = new LinkedList<something>();
without changing client code which calls numbers.
Declaring the interface instead of the implementation is indeed a good and widespread practice, but it is not always the best way. Use it every time unless all of the following conditions are true:
You are completely sure, that chosen implementation will satisfy your needs.
You need some implementation-specific feature that is not available through the interface, e.g. ArrayList.trimToSize()
Of course, you may use casting, but then using interface makes no sense at all.
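One way to combine both ideas, as a sketch (the `TrimDemo` class and `squares` method are invented for the example): keep the concrete type on the local variable that needs `trimToSize()`, and still return the interface so callers are not tied to ArrayList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TrimDemo {

    // The local variable is an ArrayList because trimToSize() is not on List.
    // The return type stays List, so callers do not depend on ArrayList.
    public static List<Integer> squares(int n) {
        ArrayList<Integer> result = new ArrayList<>(1024); // deliberately oversized
        for (int i = 1; i <= n; i++) {
            result.add(i * i);
        }
        result.trimToSize(); // release the unused backing-array capacity
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        System.out.println(squares(4)); // [1, 4, 9, 16]
    }
}
```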
The first line is old style Java, we had to do it before Java 1.5 introduced generics. But a lot of brilliant software engineers are still forced to use Java 1.4 (or less), because their companies fear risk and effort to upgrade the applications...
OK, that was off the records. A lot of legacy code has been produced with java 1.4 or less and has not been refactored.
The second line uses generics (so it's clearly 1.5+) and the variable is declared as an ArrayList. There's no big problem with that. Still, it's better to code against interfaces, so in my (and others') opinion, don't declare a variable as ArrayList unless you really need the ArrayList-specific methods.
Most of the time, when you don't care about the implementation, it's better to program to interface. So, something like:
List<something> numbers = new ArrayList<something>();
would be preferred over:
ArrayList<something> numbers = new ArrayList<something>();
The reason is that you can swap the implementation later, for example for performance reasons.
But you have to be careful not to just choose the most generic interface available. For example, if you want a sorted set, you should program to SortedSet instead of Set, like this:
SortedSet<something> s = new TreeSet<something>();
If you just blindly use the most general interface, like this:
Set<something> s = new TreeSet<something>();
Someone can modify the implementation to HashSet and your program will be broken.
Lastly, programming to an interface is even more useful when you define a public API.
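To illustrate the SortedSet point, a short sketch (the `SortedSetDemo` class and `smallest` method are hypothetical): declaring the parameter as `SortedSet` documents the ordering requirement, and the compiler rejects a HashSet.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class SortedSetDemo {

    // The parameter type says the caller must supply a sorted set;
    // a HashSet cannot be passed by mistake.
    public static int smallest(SortedSet<Integer> values) {
        return values.first(); // first() exists on SortedSet, not on Set
    }

    public static void main(String[] args) {
        SortedSet<Integer> s = new TreeSet<Integer>(Arrays.asList(5, 1, 3));
        System.out.println(smallest(s));   // 1
        System.out.println(s.headSet(5));  // [1, 3]
    }
}
```

Note that `first()` throws NoSuchElementException on an empty set, so real code would need to decide how to handle that case.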
There are two differences. First, numbers in the first line is of type List, not ArrayList. This is possible because ArrayList implements List; that is, it has everything that List has, so it can stand in for a List. (This doesn't work the other way around.)
Second, the second line's ArrayList is typed. This means that the second numbers list can only hold objects of type something.
I usually find it sufficient to use the concrete classes for the interfaces listed in the title. Usually when I use other types (such as LinkedList or TreeSet), the reason is functionality and not performance - for example, a LinkedList for a queue.
I do sometimes construct an ArrayList with an initial capacity larger than the default of 10 and a HashMap with more than the default 16 buckets, but I (especially for business CRUD) never see myself thinking "hmmm...should I use a LinkedList instead of an ArrayList if I am just going to insert and iterate through the whole List?"
I am just wondering what everyone else here uses (and why) and what type of applications they develop.
Those are definitely my default, although often a LinkedList would in fact be the better choice for lists, as the vast majority of lists seem to just iterate in order, or get converted to an array via toArray anyway.
But in terms of keeping consistent maintainable code, it makes sense to standardize on those and use alternatives for a reason, that way when someone reads the code and sees an alternative, they immediately start thinking that the code is doing something special.
I always type the parameters and variables as Collection, Map and List unless I have a special reason to refer to the sub type, that way switching is one line of code when you need it.
I could see explicitly requiring an ArrayList sometimes if you need the random access, but in practice that really doesn't happen.
For some kind of lists (e.g. listeners) it makes sense to use a CopyOnWriteArrayList instead of a normal ArrayList. For almost everything else the basic implementations you mentioned are sufficient.
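A minimal sketch of the listener case (the `EventSource` class is made up for illustration): CopyOnWriteArrayList iterates over a snapshot, so a listener can unregister itself from inside its own callback. With a plain ArrayList the same loop would typically either throw ConcurrentModificationException or silently skip a listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event source holding a list of listeners.
final class EventSource {
    // Iteration works on a snapshot, so listeners may unregister during a callback.
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    public void addListener(Runnable l) { listeners.add(l); }
    public void removeListener(Runnable l) { listeners.remove(l); }
    public int listenerCount() { return listeners.size(); }

    public void fire() {
        for (Runnable l : listeners) {
            l.run();
        }
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        List<String> log = new ArrayList<>();

        // A one-shot listener that unregisters itself while being notified.
        Runnable[] oneShot = new Runnable[1];
        oneShot[0] = () -> {
            log.add("one-shot");
            source.removeListener(oneShot[0]);
        };
        source.addListener(oneShot[0]);
        source.addListener(() -> log.add("permanent"));

        source.fire();
        source.fire();
        System.out.println(log); // [one-shot, permanent, permanent]
    }
}
```

The trade-off is that every add or remove copies the backing array, which suits listener lists that are read far more often than they change.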
Yep, I use those as defaults. I generally have a rule that on public class methods, I always return the interface type (ie. Map, Set, List, etc.), since other classes (usually) don't need to know what the specific concrete class is. Inside class methods, I'll use the concrete type only if I need access to any extra methods it may have (or if it makes understanding the code easier), otherwise the interface is used.
It's good to be pretty flexible with any rules you do use, though, as a dependency on concrete class visibility is something that can change over time (especially as your code gets more complex).
Indeed, always use the base interfaces Collection, List, Map instead of their implementations. To make things even more flexible you could hide your implementations behind static factory methods, which allow you to switch to a different implementation in case you find something better (I doubt there will be big changes in this field, but you never know). Another benefit is that the syntax is shorter thanks to generics.
Map<String, LongObjectClasName> map = CollectionUtils.newMap();
instead of
Map<String, LongObjectClasName> map = new HashMap<String, LongObjectClasName>();
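import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class CollectionUtils {
.....
    // Static, so the factories can be called as CollectionUtils.newList() without an instance.
    public static <T> List<T> newList() {
        return new ArrayList<T>();
    }
    public static <T> List<T> newList(int initialCapacity) {
        return new ArrayList<T>(initialCapacity);
    }
    public static <T> List<T> newSynchronizedList() {
        return new Vector<T>();
    }
    public static <T> List<T> newConcurrentList() {
        return new CopyOnWriteArrayList<T>();
    }
    public static <T> List<T> newSynchronizedList(int initialCapacity) {
        return new Vector<T>(initialCapacity);
    }
...
}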
Having just come out of a class about data structure performance, I'll usually look at the kind of algorithm I'm developing or the purpose of the structure before I choose an implementation.
For example, if I'm building a list that has a lot of random accesses into it, I'll use an ArrayList because its random access performance is good, but if I'm inserting things into the list a lot, I might choose a LinkedList instead. (I know modern implementations remove a lot of performance barriers, but this was the first example that came to mind.)
You might want to look at some of the Wikipedia pages for data structures (especially those dealing with sorting algorithms, where performance is especially important) for more information about performance, and the article about Big O notation for a general discussion of measuring the performance of various functions on data structures.
I don't really have a "default", though I suppose I use the implementations listed in the question more often than not. I think about what would be appropriate for whatever particular problem I'm working on, and use it. I don't just blindly default to using ArrayList; I put in 30 seconds of thought along the lines of "well, I'm going to be doing a lot of iterating and removing elements in the middle of this list so I should use a LinkedList".
And I almost always use the interface type for my reference, rather than the implementation. Remember that List is not the only interface that LinkedList implements. I see this a lot:
LinkedList<Item> queue = new LinkedList<Item>();
when what the programmer meant was:
Queue<Item> queue = new LinkedList<Item>();
I also use the Iterable interface a fair amount.
If you are using LinkedList for a queue, you might consider using the Deque interface and ArrayDeque implementing class (introduced in Java 6) instead. To quote the Javadoc for ArrayDeque:
This class is likely to be faster than
Stack when used as a stack, and faster
than LinkedList when used as a queue.
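A quick sketch of ArrayDeque in both roles (the `DequeDemo` class and its method names are invented): declared as `Queue` it behaves as a FIFO queue, declared as `Deque` it can be used as a LIFO stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

final class DequeDemo {

    // FIFO: offer at the tail, poll from the head.
    public static String fifoOrder() {
        Queue<String> queue = new ArrayDeque<>();
        queue.offer("a");
        queue.offer("b");
        queue.offer("c");
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) {
            sb.append(queue.poll());
        }
        return sb.toString();
    }

    // LIFO: push and pop both work on the head.
    public static String lifoOrder() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fifoOrder()); // abc
        System.out.println(lifoOrder()); // cba
    }
}
```

One difference from LinkedList to keep in mind: ArrayDeque does not permit null elements.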
I tend to use one of the *Queue classes for queues. However, LinkedList is a good choice if you don't need thread safety.
Using the interface type (List, Map) instead of the implementation type (ArrayList, HashMap) is irrelevant within methods - it's mainly important in public APIs, i.e. method signatures (and "public" doesn't necessarily mean "intended to be published outside your team").
When a method takes an ArrayList as a parameter, and you have something else, you're screwed and have to copy your data pointlessly. If the parameter type is List, callers are much more flexible and can, e.g. use Collections.EMPTY_LIST or Collections.singletonList().
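As a sketch of that flexibility (the `ListParamDemo` class and `totalLength` method are made-up names): because the parameter is a `List`, callers can pass whatever they already have, with no copying into an ArrayList. `Collections.emptyList()` is the type-safe counterpart of `Collections.EMPTY_LIST`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

final class ListParamDemo {

    // Accepting List (not ArrayList) lets callers pass any implementation.
    public static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Collections.<String>emptyList()));              // 0
        System.out.println(totalLength(Collections.singletonList("abc")));             // 3
        System.out.println(totalLength(Arrays.asList("ab", "cde")));                   // 5
        System.out.println(totalLength(new LinkedList<String>(Arrays.asList("x"))));   // 1
    }
}
```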
I too typically use ArrayList, but I will use TreeSet or HashSet depending on the circumstances. When writing tests, however, Arrays.asList and Collections.singletonList are also frequently used. I've mostly been writing thread-local code, but I could also see using the various concurrent classes as well.
Also, there were times I used ArrayList when what I really wanted was a LinkedHashSet (before it was available).