questions on using hashset to define a new set - java

With respect to the following two different definitions of sets, what are the differences:
Set<Integer> intset = new HashSet<Integer>();
Set<Integer> intset = new Set<Integer>();
Thanks.

Since Set is an interface, the second won't compile.

You can't write new Set because it's an interface. Set, Map, and List are all interfaces in the java.util package. They cannot be instantiated directly; an implementation (HashSet, ArrayList, HashMap) must be supplied on the right-hand side of the assignment.

The second one won't even compile. Often people ask what's the difference between these:
HashSet<Integer> intset = new HashSet<Integer>();
Set<Integer> intset = new HashSet<Integer>();
and perhaps that's what you meant to ask. The difference here is that code written using the first definition is dependent on the particular choice of Set implementation (HashSet vs. TreeSet or something else) whereas the second declaration would let you trivially change to a different implementation without modifying any other code. It's a good practice in general -- keeps you flexible.
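As a minimal sketch of that flexibility (the class and method names here are made up for illustration), the helper below depends only on Set, so the caller can swap HashSet for TreeSet by changing one constructor call:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetDeclarationDemo {
    // Depends only on the Set interface, so any implementation can be passed in.
    static int countDistinct(Set<Integer> target, int[] values) {
        for (int v : values) {
            target.add(v);
        }
        return target.size();
    }

    public static void main(String[] args) {
        int[] values = {3, 1, 3, 2, 1};
        // Only the right-hand side changes when the implementation is swapped.
        Set<Integer> hashed = new HashSet<>();
        Set<Integer> sorted = new TreeSet<>();
        System.out.println(countDistinct(hashed, values)); // 3
        System.out.println(countDistinct(sorted, values)); // 3
    }
}
```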

java.util.Set is an interface while java.util.HashSet is an actual implementation.

You cannot instantiate a Set the way you are in your second definition.
Set is an interface and cannot be instantiated; it simply defines a contract that concrete implementations must follow.
You can, however, instantiate an anonymous inner class that would follow Set's interface:
Set<Integer> intSet = new Set<Integer>() {
//need to define all set methods here...
};
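Implementing every Set method by hand is tedious. One shortcut, shown here only as a sketch with a made-up helper name, is to subclass java.util.AbstractSet anonymously; it needs just iterator() and size() for a read-only set:

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;

class AnonymousSetDemo {
    // Builds a read-only Set over the fixed values 1, 2, 3 (illustrative data).
    static Set<Integer> smallSet() {
        return new AbstractSet<Integer>() {
            private final Integer[] items = {1, 2, 3};

            @Override
            public Iterator<Integer> iterator() {
                return Arrays.asList(items).iterator();
            }

            @Override
            public int size() {
                return items.length;
            }
        };
    }

    public static void main(String[] args) {
        Set<Integer> intSet = smallSet();
        System.out.println(intSet.contains(2)); // true
        System.out.println(intSet.size());      // 3
    }
}
```

Methods such as add() are inherited from AbstractSet and throw UnsupportedOperationException unless overridden.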

In Java, Set is an interface and thus cannot be instantiated, so the second one is wrong.
But the actual difference in terminology is that a Set is a mathematical concept that conforms to certain rules (e.g. uniqueness of elements, no significance to order). A HashSet is one technique for implementing the Set concept, using a hash table, which makes it computationally very fast: expected constant-time insertion, deletion and lookup, assuming a reasonable hash function.
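The uniqueness rule is easy to observe. A small sketch (the class name and sample data are invented for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetContractDemo {
    // Collapses duplicates by copying the words into a HashSet.
    static Set<String> distinct(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(seen.add("a")); // true: "a" was not present yet
        System.out.println(seen.add("a")); // false: the duplicate is rejected
        System.out.println(distinct(Arrays.asList("a", "b", "a")).size()); // 2
    }
}
```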

Set is an interface that declares abstract methods for specific Set implementations to provide. You can't instantiate a Set, just as you can't instantiate a List. An interface is similar to a class, but its methods are abstract (all of them before Java 8; since Java 8 an interface may also carry default and static methods), and a class can implement multiple interfaces but extend only a single class. Also, a class can hold instance state and constructors while an interface can't. Interfaces are sort of a way of dealing with multiple inheritance.
Anyway:
http://download.oracle.com/javase/6/docs/api/java/util/Set.html
http://download.oracle.com/javase/tutorial/java/concepts/interface.html

Compiler error on line:
Set<Integer> intset = new Set<Integer>();

Related

What are the differences between these two object declarations which uses an Interface and a Class in Java? [duplicate]

PMD would report a violation for:
ArrayList<Object> list = new ArrayList<Object>();
The violation was "Avoid using implementation types like 'ArrayList'; use the interface instead".
The following line would correct the violation:
List<Object> list = new ArrayList<Object>();
Why should the latter with List be used instead of ArrayList?
Using interfaces over concrete types is the key to good encapsulation and to loosely coupling your code.
It's even a good idea to follow this practice when writing your own APIs. If you do, you'll find later that it's easier to add unit tests to your code (using Mocking techniques), and to change the underlying implementation if needed in the future.
Here's a good article on the subject.
Hope it helps!
This is preferred because you decouple your code from the implementation of the list. Using the interface lets you easily change the implementation, ArrayList in this case, to another list implementation without changing any of the rest of the code as long as it only uses methods defined in List.
In general I agree that decoupling interface from implementation is a good thing and will make your code easier to maintain.
There are, however, exceptions that you must consider. Accessing objects through interfaces adds an additional layer of indirection that can make your code slower.
For interest I ran an experiment that generated ten billion sequential accesses to an ArrayList of one million elements. On my 2.4 GHz MacBook, accessing the ArrayList through a List interface took 2.10 seconds on average; when declaring it of type ArrayList it took 1.67 seconds on average.
If you are working with large lists, deep inside an inner loop or frequently called function, then this is something to consider.
ArrayList and LinkedList are two implementations of a List, which is an ordered collection of items. Logic-wise it doesn't matter if you use an ArrayList or a LinkedList, so you shouldn't constrain the type to be that.
This contrasts with, say, Collection and List, which are different things (List implies a defined element order, Collection does not).
Why should the latter with List be used instead of ArrayList?
It's a good practice: program to an interface rather than an implementation.
By replacing ArrayList with List, you can change the List implementation in the future, as below, depending on your business use case.
List<Object> list = new LinkedList<Object>();
/* Doubly-linked list implementation of the List and Deque interfaces.
Implements all optional list operations, and permits all elements (including null).*/
OR
List<Object> list = new CopyOnWriteArrayList<Object>();
/* A thread-safe variant of ArrayList in which all mutative operations
(add, set, and so on) are implemented by making a fresh copy of the underlying array.*/
OR
List<Object> list = new Stack<Object>();
/* The Stack class represents a last-in-first-out (LIFO) stack of objects.*/
OR
some other List specific implementation.
The List interface defines the contract, and the specific implementation of List can be changed. In this way, interface and implementation are loosely coupled.
Related SE question:
What does it mean to "program to an interface"?
Even for local variables, using the interface over the concrete class helps. Otherwise you may end up calling a method that is outside the interface, and then it is difficult to change the implementation of the List if necessary.
Also, it is best to use the least specific class or interface in a declaration. If element order does not matter, use a Collection instead of a List. That gives your code the maximum flexibility.
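To illustrate the "least specific type" advice, here is a sketch of a method that only iterates, so it asks for a Collection and works with lists and sets alike (the names are hypothetical):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class LeastSpecificDemo {
    // Only iteration is needed, so Collection is specific enough.
    static int sum(Collection<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        Set<Integer> asSet = new HashSet<>(Arrays.asList(1, 2, 3));
        System.out.println(sum(Arrays.asList(1, 2, 3))); // 6
        System.out.println(sum(asSet));                  // 6
    }
}
```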
Properties of your classes/interfaces should be exposed through interfaces because it gives your classes a contract of behavior to use, regardless of the implementation.
However...
In local variable declarations, it makes little sense to do this:
public void someMethod() {
List theList = new ArrayList();
//do stuff with the list
}
If it's a local variable, just use the type. It is still implicitly upcastable to its appropriate interface, and your methods should hopefully accept interface types for their arguments, but for local variables it makes total sense to use the implementation type as a container, just in case you do need the implementation-specific functionality.
In general, for your line of code it does not make sense to bother with interfaces. But if we are talking about APIs, there is a really good reason. Take this small class:
class Counter {
static int sizeOf(List<?> items) {
return items.size();
}
}
In this case, using the interface is required, because I want to count the size of every possible implementation, including my own custom one: class MyList extends AbstractList<String>....
An interface is what is exposed to the end user. One class can implement multiple interfaces. A user who is exposed to a specific interface has access to the specific behavior defined in that particular interface.
One interface can also have multiple implementations. Depending on the scenario, the system will work with a different implementation of the interface.
Let me know if you need more explanation.
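The answer leaves MyList unfinished, so the body below is purely an assumption: a fixed, read-only list of two strings, just enough to show that sizeOf accepts a custom implementation as readily as an ArrayList.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

class Counter {
    static int sizeOf(List<?> items) {
        return items.size();
    }
}

// Hypothetical custom list: AbstractList only requires get() and size().
class MyList extends AbstractList<String> {
    @Override
    public String get(int index) {
        if (index == 0) return "first";
        if (index == 1) return "second";
        throw new IndexOutOfBoundsException("index: " + index);
    }

    @Override
    public int size() {
        return 2;
    }
}

class CounterDemo {
    public static void main(String[] args) {
        System.out.println(Counter.sizeOf(new MyList()));            // 2
        System.out.println(Counter.sizeOf(new ArrayList<String>())); // 0
    }
}
```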
The interface often has better representation in the debugger view than the concrete class.

TreeSet in Java

I understand Java best practice suggests that while declaring a variable the generic set interface is used to declare on the left and the specific implementation on the right.
Thus if I have to declare the Set interfaces, the right way is,
Set<String> set = new TreeSet<>();
However with this declaration I'm not able to access the set.last() method.
However when I declare it this way,
TreeSet<String> set = new TreeSet<>();
I can access, last() and first().
Can someone help me understand why?
The last() and first() methods are available on TreeSet (they come from the SortedSet interface) and are not part of the general Set interface. When you use the variable, the compiler looks at its declared type and not the runtime type of the object, so if you store a TreeSet in a Set variable, it can only be treated as a Set. This essentially hides the additional functionality.
As an example, every class in Java extends Object. Therefore this is a totally valid allocation:
final Object mMyList = new ArrayList<String>();
However, we'll never be able to use ArrayList style functionality when referring directly to mMyList without applying type-casting, since all Java can tell is that it's dealing with a generic Object; nothing more.
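A sketch of what that cast looks like in practice (the helper name is invented for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeclaredTypeDemo {
    // Recovers list behaviour from an Object reference via a checked cast.
    static int sizeViaCast(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            return list.size();
        }
        return -1; // not a list at all
    }

    public static void main(String[] args) {
        final Object mMyList = new ArrayList<String>(Arrays.asList("a", "b"));
        // mMyList.size(); // would not compile: Object declares no size()
        System.out.println(sizeViaCast(mMyList)); // 2
        System.out.println(sizeViaCast("text"));  // -1
    }
}
```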
You can use the SortedSet interface as your declaration (which has the first() and last() methods).
The interface of Set does not provide last() and first() because it does not always make sense for a Set.
Java is a statically typed language. When the compiler sees you calling set.last(), it only knows that set is declared as a Set. It cannot know whether set is a TreeSet that provides last() or a HashSet that does not. That's why it complains.
Hold on, though. You do not need to declare it using the concrete class TreeSet here. TreeSet implements the SortedSet interface, which provides these methods. In other words: because you need to work with a Set that is sorted (for which it makes sense to provide first() and last()), the interface that you should work against is SortedSet instead of Set.
Hence what you should be doing is
SortedSet<String> set = new TreeSet<>();
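A short runnable sketch of that declaration in use (the sample strings and the range helper are made up):

```java
import java.util.SortedSet;
import java.util.TreeSet;

class SortedSetDemo {
    // first() and last() are declared on SortedSet, so no TreeSet type is needed.
    static String range(SortedSet<String> set) {
        return set.first() + ".." + set.last();
    }

    public static void main(String[] args) {
        SortedSet<String> set = new TreeSet<>();
        set.add("pear");
        set.add("apple");
        set.add("fig");
        System.out.println(range(set)); // apple..pear
    }
}
```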
To add to the existing answers here, the reason is that TreeSet implements the interface NavigableSet, which inherits the methods comparator, first, last and spliterator from the interface java.util.SortedSet.
Plain Set, on the other hand, doesn't have the SortedSet methods because it does not inherit from SortedSet.
See more Java TreeSet examples for details.

Collection List and Subclass Initialization

It's always said that it's better to use a collection object as below
1) List st = new LinkedList();
2) Map mp = new HashMap();
Than
3) LinkedList st = new LinkedList();
4) HashMap mp = new HashMap();
I agree that by defining as above (1, 2) I can reassign the same variable (st, mp) to other objects of the List and Map interfaces.
But here I can't use the methods that are defined only in LinkedList or HashMap, which is correct as those are not visible through List or Map. (Please correct me if I am wrong.)
But if I am defining an object of HashMap or LinkedList, I want to use it for some special functionality from these.
Then why is it said that the best way to create a collection object is as done in (1, 2)?
Because most of the time you don't need the special methods. If you need the special methods, then obviously you need to reference the specific type.
Lesson for today: Don't blindly apply programming principles without using your own brain.
But if I am defining an object of HashMap or LinkedList, I want to use it for some special functionality from these.
In that case, you should absolutely declare the variable using the concrete class. That's fine.
The point of using the interface instead is to indicate that you only need the functionality exposed by that interface, leaving you open to potentially change implementation later. (Although you'd need to be careful of the performance and even behavioural implications of which concrete implementation you choose.)
I agree that by defining as above (1, 2) I can reassign the same variable
(st, mp) to other objects of the List and Map interfaces.
Yes, it's a general practice called programming against interfaces.
But here I can't use the methods that are defined only in LinkedList or
HashMap, which is correct as those are not visible through List or Map.
(Please correct me if I am wrong.)
No, you are right.
But if I am defining an object of HashMap or LinkedList, I want to use it
for some special functionality from these.
Then why is it said that the best way to create a collection object is as
done in (1, 2)?
This isn't the best way. If you need to use specific methods of those classes, you need a reference of the concrete type. If those collections are used from a client class that is not supposed to know the internal implementation, then it's better to expose only the interface.
Through interfaces you define service contracts. As you say, should you change the underlying implementation of a given interface, you can do it flawlessly without any impact on your current code.
If you need any particular behaviour of the particular classes it's absolutely right to use them. Maps usually extend the AbstractMap class that itself implements Map, making the subclasses inherit those methods.
Of course, many classes throw UnsupportedOperationException from some methods defined by the Map interface, so changing the implementation type is not always flawless (but in most cases it is, because each map has a particular strength that makes it the most appropriate choice for a given context).
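In the JDK, an optional operation that an implementation does not support signals this with java.lang.UnsupportedOperationException. A minimal sketch of how swapping Map implementations can change behaviour, using Collections.unmodifiableMap as the second implementation (the tryPut helper is hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class OptionalOperationDemo {
    // Returns true if the map accepted the entry, false if put() is unsupported.
    static boolean tryPut(Map<String, Integer> map, String key, int value) {
        try {
            map.put(key, value);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> mutable = new HashMap<>();
        Map<String, Integer> readOnly = Collections.unmodifiableMap(new HashMap<String, Integer>());
        System.out.println(tryPut(mutable, "a", 1));  // true
        System.out.println(tryPut(readOnly, "a", 1)); // false
    }
}
```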
Use the type that suits you, not the one that someone says it's the correct one. Every rule has exceptions.
Because if you use the interface to access the collections, you are free to change the implementation, e.g. use an ArrayList instead of a LinkedList, or a synchronized version of it.
This mostly applies to cases where you have a Collection in the public interface of a class; internally I wouldn't bother, just use what you need.

Why do we first declare subtypes as their supertype before we instantiate them?

Reading other people's code, I've seen a lot of:
List<E> ints = new ArrayList<E>();
Map<K, V> map = new HashMap<K, V>();
My question is: what is the point/advantage of instantiating them that way as opposed to:
ArrayList<E> ints = new ArrayList<E>();
HashMap<K, V> map = new HashMap<K, V>();
What also makes it odd is that I've never seen anything like:
CharSequence s = new String("String");
or
OutputStream out = new PrintStream(OutputStream);
Duplicates (of the first part of the question):
When/why to use/define an interface
Use interface or type for variable definition in java?
When should I use an interface in java?
why are interfaces created instead of their implementations for every class
What's the difference between these two java variable declarations?
Quick answer? Using interfaces and superclasses increases the portability and maintainability of your code, principally by hiding implementation detail. Take the following hypothetical example:
class Account {
private Collection<Transaction> transactions;
public Account() {
super();
transactions = new ArrayList<Transaction>(4);
}
public Collection<Transaction> getTransactions() {
return transactions;
}
}
I've declared a contract for an Account that states that the transactions posted to the account can be retrieved as a Collection. The callers of my code don't have to care what kind of collection my method actually returns, and shouldn't. And that frees me to change up the internal implementation if I need to, without impacting (aka breaking) unknown number of clients. So to wit, if I discover that I need to impose some kind of uniqueness on my transactions, I can change the implementation shown above from an ArrayList to a HashSet, with no negative impact on anyone using my class.
public Account() {
super();
transactions = new HashSet<Transaction>(4);
}
As for your second question, I can say that you use the principles of portability and encapsulation wherever they make sense. There are not a terrible lot of CharSequence implementations out there, and String is by far the most commonly used. So you just won't see a lot of developers declaring CharSequence variables in their code.
Using interfaces has the main advantage that you can later change the implementation (the class) without the need to change more than the single line where you create the instance and do the assignment.
For
List<E> ints = new ArrayList<E>();
Map<K, V> map = new HashMap<K, V>();
List and Map are the interfaces, so any class implementing those interfaces can be assigned to these references.
ArrayList is one of the several classes (another is LinkedList) which implement List interface.
Same with Map. HashMap, LinkedHashMap, TreeMap all implement Map.
It is a general principle to program to interfaces and not to implementations. This makes the programming task easier: you can change which object a reference points to without changing the code that uses it.
If you write
ArrayList<E> ints = new ArrayList<E>();
HashMap<K, V> map = new HashMap<K, V>();
ints and map will be ArrayList and HashMap only, forever.
It is a design principle that you program to the interface and not to the implementation.
That way you may later provide a new implementation of the same interface.
From the above link, Erich Gamma explains:
This principle is really about dependency relationships which have to be carefully managed in a large app. It's easy to add a dependency on a class. It's almost too easy; just add an import statement and modern Java development tools like Eclipse even write this statement for you. Interestingly the inverse isn't that easy and getting rid of an unwanted dependency can be real refactoring work or even worse, block you from reusing the code in another context. For this reason you have to develop with open eyes when it comes to introducing dependencies. This principle tells us that depending on an interface is often beneficial.
Here, the term interface refers not only to the Java artifact but to the public interface a given object has, which is basically composed of the methods it has. So it could be a Java interface (like List in your example) or a concrete superclass.
So in your example, if you ever wanted to use a LinkedList instead, it would be harder because the type is already declared as ArrayList when just List would have been enough.
Of course, if you need specific methods from a given implementation, you have to declare it of that type.
I hope this helps.
@Bhushan answered why.
To address your confusion about why nobody uses
CharSequence s = new String("String");
or
OutputStream out = new PrintStream(OutputStream);
CharSequence contains only a few common methods. Other classes that implement this interface are mostly buffers, and only String is immutable. CharSequence defines a common API for classes backed by a char array, and the interface does not refine the general contracts of the equals and hashCode methods (see the Javadoc).
OutputStream is a low-level API for writing data. Because PrintStream adds extra convenience methods for writing (a higher level of abstraction), it's used in preference to OutputStream.
You do this to make sure later when working with the variable you (or anyone using your classes) won't rely on methods specific for the implementation chosen (ArrayList, HashMap, etc.)
The reason behind this is not technical but the stuff you have to read between the lines of code: the List and Map examples say: "I'm only interested in basic list/map stuff, basically you can use anything here." An extreme example of that would be
Iterable<Foo> items = new ArrayList<Foo>();
when you really only want to do some stuff for each thing.
As an added bonus this makes it a little easier to refactor the code later into common utility classes/methods where the concrete type is not required. Or do you want to code your algorithm multiple times for each kind of collection?
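For instance, a utility written once against Iterable works for any collection type. This is only a sketch, and joinAll is an invented name:

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class IterableDemo {
    // Needs nothing beyond iteration, so Iterable is the only requirement.
    static String joinAll(Iterable<?> items) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Object item : items) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(item);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Iterable<Integer> queue = new ArrayDeque<>(Arrays.asList(1, 2));
        System.out.println(joinAll(Arrays.asList("a", "b"))); // a, b
        System.out.println(joinAll(queue));                   // 1, 2
    }
}
```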
The String example, on the other hand, is not seen widely, because a) String is a special class in Java: each "foo" literal is automatically a String, and sooner or later you have to give the characters to some method which only accepts String; and b) CharSequence is really, ahh, minimal. It does not even support Unicode beyond the BMP properly and it lacks most of String's query/manipulation methods.
This (good) style of declaring the type as the Interface the class implements is important because it forces us to use methods only defined in the Interface.
As a result, when we need to change our class implementation (e.g. we find our ArraySet is better than the standard HashSet), we are guaranteed that our code will still work after the change, because both classes implement the strictly enforced interface.
It is just easier to think of a String as a String, just as it's easier (and more beneficial) to think of WhateverList as a List.
The bonuses are discussed many times, but in brief you simply separate the concerns: when you need a CharSequence, you use it. It's highly unlikely that you need ArrayList only: usually, any List will do.
When you at some point decide to use a different implementation, say:
List<E> ints = new LinkedList<E>();
instead of
List<E> ints = new ArrayList<E>();
this change needs to be done only at a single place.
There is a balance to strike:
usually you use the type which gives you the most appropriate guarantees. Obviously, a List is also a Collection, which is also something Iterable. But a Collection does not give you an order, and an Iterable does not have an "add" method.
Using ArrayList for the variable type is also reasonable when you want to be a bit more explicit about the need for fast random access by position: in a LinkedList, get(100) is a lot slower. (Java's closest equivalent is the marker interface java.util.RandomAccess, but it declares no methods, so it cannot stand in for List as a variable type. Note that by using ArrayList you also rule out passing an array wrapped as a list, e.g. the result of Arrays.asList.)
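On the random-access point: the JDK has a marker interface, java.util.RandomAccess, that list implementations use to advertise fast positional access. It has no methods, so it is checked at runtime rather than used as a declared type. A sketch (hasFastGet is a made-up helper):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessDemo {
    // True when the list advertises fast get(int) via the marker interface.
    static boolean hasFastGet(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        System.out.println(hasFastGet(new ArrayList<String>()));  // true
        System.out.println(hasFastGet(new LinkedList<String>())); // false
    }
}
```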
List<E> ints = new ArrayList<E>();
If you write some code that deals only with List then it will work for any class that implements List (e.g. LinkedList, etc). But, if your code directly deals with ArrayList then it's limited to ArrayList.
CharSequence s = new String("String");
Manually instantiating a String object is not good practice; you should use a string literal instead. I am just guessing, but the reason you don't see CharSequence much might be that it's quite new and, also, that strings are immutable.
This is programming to the interface not the implementation, as per the Gang of Four. This will help to stop the code becoming dependent on methods that are added to particular implementations only, and make it easier to change to use a different implementation if that becomes necessary for whatever reason, e.g. performance.

