I saw a code snippet like this:
Set<Record> instances = new HashSet<Record>();
I am wondering whether HashSet is a special kind of Set. Is there any difference between them?
A Set represents a generic "set of values". A TreeSet is a set where the elements are sorted (and thus ordered), a HashSet is a set where the elements are not sorted or ordered.
A HashSet is typically a lot faster than a TreeSet.
A TreeSet is typically implemented as a red-black tree (See http://en.wikipedia.org/wiki/Red-black_tree - I've not validated the actual implementation of sun/oracle's TreeSet), whereas a HashSet uses Object.hashCode() to create an index in an array. Access time for a red-black tree is O(log(n)) whereas access time for a HashSet ranges from constant-time to the worst case (every item has the same hashCode) where you can have a linear search time O(n).
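To make the ordering difference concrete, here is a minimal sketch. The class name SetOrderingDemo and the helper iterationOrder are made up for illustration; only the java.util classes are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of any library.
class SetOrderingDemo {
    // Adds the values to the given set and returns them in the set's iteration order.
    static List<String> iterationOrder(Set<String> target, List<String> values) {
        target.addAll(values);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("pear", "apple", "fig", "apple");

        // TreeSet: the duplicate is dropped and iteration follows natural (alphabetical) order.
        System.out.println(iterationOrder(new TreeSet<>(), values));

        // HashSet: the duplicate is dropped, but the iteration order is unspecified.
        System.out.println(iterationOrder(new HashSet<>(), values));
    }
}
```

Both sets end up with three elements; only the TreeSet promises the order in which they come back.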
The HashSet is an implementation of a Set.
Set is a collection that contains no duplicate elements. Set is an interface.
HashSet implements the Set interface, backed by a hash table (actually a HashMap instance).
HashSet is one of the specific implementations of the Set interface.
A Set can be any of the following, since the interface is implemented by these classes:
ConcurrentSkipListSet : A scalable concurrent NavigableSet implementation based on a ConcurrentSkipListMap. The elements of the set are kept sorted according to their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
CopyOnWriteArraySet : A Set that uses an internal CopyOnWriteArrayList for all of its operations.
EnumSet : A specialized Set implementation for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created.
TreeSet : A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
LinkedHashSet : Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
A reference of type HashSet, however, can only point to a HashSet or a LinkedHashSet, since among the classes above only LinkedHashSet subclasses HashSet.
The question has been answered, but I haven't seen the answer to why the code mentions both types in the same code.
Typically, you want to code against interfaces, which in this case is Set. Why? Because if you always reference your object through the interface (except at the new HashSet() call), it is trivial to change the implementation later if you find a better one: you have mentioned the concrete class only once in your code base (where you did new HashSet()).
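A small sketch of that idea, with made-up names (InterfaceDemo, newStore, countDistinct). The concrete class appears in exactly one place, so swapping it for TreeSet or LinkedHashSet is a one-line change:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example class; the names are illustrative only.
class InterfaceDemo {
    // The only line that names a concrete implementation.
    static Set<String> newStore() {
        return new HashSet<>();
    }

    // Everything else depends only on the Set interface.
    static int countDistinct(String... words) {
        Set<String> store = newStore();
        for (String word : words) {
            store.add(word); // add() ignores elements that are already present
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("a", "b", "a"));
    }
}
```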
Set is the general interface to a set-like collection, while HashSet is a specific implementation of the Set interface (which uses hash codes, hence the name).
Set is a parent interface of all set classes like TreeSet, LinkedHashSet etc.
HashSet is a class implementing Set interface.
HashSet is a class that implements the Set interface. As an implementation of Set, HashSet has all the properties of a Set. The most important and most frequently used implementations of Set are HashSet and TreeSet.
**Set:**
It is an interface which is a subtype of the Collection interface, just like List and Queue.
It is used to store multiple objects without duplicates. Three commonly used implementing classes are:
HashSet
LinkedHashSet
TreeSet (which implements the SortedSet interface)
**HashSet:**
It can hold at most one null element (since duplicates are not allowed), and it does not maintain any particular order of its elements.
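A quick way to check the null and duplicate behaviour yourself. This is only a sketch; the class name HashSetNullDemo and the method build are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class.
class HashSetNullDemo {
    static Set<String> build() {
        Set<String> set = new HashSet<>();
        set.add("a");
        set.add(null);
        set.add(null); // ignored: null is already in the set
        set.add("a");  // ignored: duplicate element
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = build();
        System.out.println(set.size());         // two elements: "a" and null
        System.out.println(set.contains(null)); // the single null is stored
    }
}
```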
I know LinkedHashMap has a predictable iteration order (insertion order). Does the Set returned by LinkedHashMap.keySet() and the Collection returned by LinkedHashMap.values() also maintain this order?
The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.
-- Map
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
-- LinkedHashMap
So, yes, keySet(), values(), and entrySet() (the three collection views mentioned) return values in the order the internal linked list uses. And yes, the JavaDoc for Map and LinkedHashMap guarantee it.
That is the point of this class, after all.
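If you want to see it for yourself, a minimal sketch like the following prints all three views. LinkedHashMapViewsDemo and sample are made-up names; the keys are deliberately inserted in non-alphabetical order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class.
class LinkedHashMapViewsDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        // All three views iterate in insertion order: zebra, apple, mango.
        System.out.println(map.keySet());
        System.out.println(map.values());
        System.out.println(map.entrySet());
    }
}
```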
Looking at the source, it looks like it does. keySet(), values(), and entrySet() all use the same entry iterator internally.
Don't be misled by the fact that LinkedHashMap.keySet() and LinkedHashMap.entrySet() return a Set into thinking that ordering cannot be guaranteed!
Set is an interface, with HashSet, TreeSet, etc. being its implementations. The HashSet implementation of the Set interface does not guarantee ordering, but TreeSet does, and so does LinkedHashSet.
Therefore it depends on how Set has been implemented in LinkedHashMap to know whether the returning Set reference will guarantee ordering or not.
I went through the source code of LinkedHashMap, it looks like this:
private final class KeySet extends AbstractSet<K> {...}
public abstract class AbstractSet<E> extends AbstractCollection<E> implements Set<E> {...}
Thus LinkedHashMap/HashMap has its own implementation of Set, namely KeySet, so don't confuse it with HashSet.
Also, the order is maintained by how the elements are inserted into the bucket. Look at the addEntry(..) method of LinkedHashMap and compare it with that of HashMap which highlights the main difference between HashMap and LinkedHashMap.
You can assume so. The Javadoc says 'predictable iteration order', and the only iterators available in a Map are those for the keySet(), entrySet(), and values().
So in the absence of any further qualification it is clearly intended to apply to all of those iterators.
AFAIK it is not documented so you cannot "formally" assume so. It is unlikely, however, that the current implementation would change.
If you want to ensure order, you may want to iterate over the map entries and insert them into a sorted set with an order function of your choice, though you will be paying a performance cost, naturally.
I need to store objects in a collection in the order they were added, that's why I need a List. However, the list should contain no duplicates. I also need to quickly determine if an object already exists in the collection. Instead of iterating the list every time, it would be better to have something like a HashSet. I can quickly both find and add elements and preserve the insertion order.
The question is - should I:
extend ArrayList by adding a HashSet field?
implement one of the Java collection interfaces (List or Set)?
simply create a new class with two fields - ArrayList and HashSet?
The 1st option has a disadvantage: I don't need all of the ArrayList methods, so I'd have to override all of them so that users of my class can't call base class methods that would simply mess things up (for instance, one could remove an object from the list but the object would still exist in the set). And there's no way to remove the base class methods (other than overriding them and throwing an exception).
Similarly for 2, I'd really have to implement all methods of the interface.
The 3rd option looks the best to me, but it makes the code implementation dependent, because my class doesn't implement any interface.
What should I do in this case? I'd like to have all add methods the List interface has. - LinkedHashSet is not an option.
You could use a LinkedHashSet, which is a Set implementation that ensures that iteration order is the same order you added elements in.
Hash table and linked list implementation of the Set interface, with predictable iteration order. ... This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
No need to implement anything on your own. Use LinkedHashSet, which maintains encounter order.
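As a rough illustration of what LinkedHashSet gives you (the class name InsertionOrderDemo and the helper dedupeKeepingOrder are invented for this sketch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class.
class InsertionOrderDemo {
    // Drops duplicates while keeping the order in which elements first appeared.
    static List<String> dedupeKeepingOrder(List<String> input) {
        Set<String> seen = new LinkedHashSet<>(input);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("c");
        set.add("a");
        set.add("c"); // ignored: already present, position of the first "c" is kept
        set.add("b");

        System.out.println(set);               // iterates in insertion order
        System.out.println(set.contains("a")); // hash-based lookup, no list scan
        System.out.println(dedupeKeepingOrder(Arrays.asList("c", "a", "c", "b", "a")));
    }
}
```

What it does not give you is positional access such as get(index) or add(index, element), so it only fits if you can live without those List methods.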
Does a Java Set retain order? A method is returning a Set to me and supposedly the data is ordered but iterating over the Set, the data is unordered. Is there a better way to manage this? Does the method need to be changed to return something other than a Set?
The Set interface does not provide any ordering guarantees.
Its sub-interface SortedSet represents a set that is sorted according to some criterion. In Java 6, there are two standard containers that implement SortedSet. They are TreeSet and ConcurrentSkipListSet.
In addition to the SortedSet interface, there is also the LinkedHashSet class. It remembers the order in which the elements were inserted into the set, and returns its elements in that order.
LinkedHashSet is what you need.
As many of the members suggested, use LinkedHashSet to retain the order of the collection.
You can wrap your set using this implementation.
A SortedSet implementation can be used for sorted order, but for your purpose use LinkedHashSet.
Also from the docs,
"This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet. It can be used to produce a copy of a set that has the same order as the original, regardless of the original set's implementation:"
Source : http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html
Set is just an interface. In order to retain order, you have to use a specific implementation of that interface, for example TreeSet (which implements the sub-interface SortedSet) or LinkedHashSet. You can wrap your Set this way:
Set myOrderedSet = new LinkedHashSet(mySet);
To retain the order use List or a LinkedHashSet.
Here is a quick summary of the order characteristics of the standard Set implementations available in Java:
1. keep the insertion order: LinkedHashSet and CopyOnWriteArraySet (thread-safe)
2. keep the items sorted within the set: TreeSet, EnumSet (specific to enums) and ConcurrentSkipListSet (thread-safe)
3. does not keep the items in any specific order: HashSet (the one you tried)
For your specific case, you can either sort the items first and then use any of 1 or 2 (most likely LinkedHashSet or TreeSet). Or alternatively and more efficiently, you can just add unsorted data to a TreeSet which will take care of the sorting automatically for you.
A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked List across all elements. Use this class instead of HashSet when you care about the iteration order.
From the javadoc for Set.iterator():
Returns an iterator over the elements in this set. The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).
And, as already stated by shuuchan, a TreeSet is an implementation of Set that has a guaranteed order:
The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
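Both constructor variants from that quote can be tried with a short sketch. TreeSetOrderingDemo and its two helper methods are made-up names; Comparator.reverseOrder() is just one example of a Comparator you might pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class.
class TreeSetOrderingDemo {
    // No-comparator construction: elements are kept in their natural ordering.
    static List<String> natural(List<String> values) {
        TreeSet<String> set = new TreeSet<>(values);
        return new ArrayList<>(set);
    }

    // Comparator constructor: elements are kept in the order the Comparator defines.
    static List<String> reversed(List<String> values) {
        TreeSet<String> set = new TreeSet<>(Comparator.reverseOrder());
        set.addAll(values);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("pear", "apple", "fig");
        System.out.println(natural(values));  // alphabetical
        System.out.println(reversed(values)); // reverse alphabetical
    }
}
```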
Normally a set, such as HashSet, does not keep the order, so that it can find an element quickly, but you can try LinkedHashSet, which keeps the order in which you put the elements in.
There are 2 different things.
Sort the elements in a set. For which we have SortedSet and similar implementations.
Maintain insertion order in a set. For which LinkedHashSet and CopyOnWriteArraySet (thread-safe) can be used.
The Set interface itself does not stipulate any particular order. The SortedSet does however.
The Iterator returned by a Set is not supposed to return data in any particular order.
See this question: Two java.util.Iterators to the same collection: do they have to return elements in the same order?
Only a SortedSet keeps the Set's elements sorted.
Why does Collections.sort() apply only for Lists and not for Sets? Is there any particular reason?
Most (but not all) Set implementations do not have a concept of order, so Collections.sort does not support them as a whole. If you want a set with a concept of order, you can use something like a TreeSet:
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
Or a LinkedHashSet:
Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order)
A Set, by definition, has no order.
A Set is not a List. While a List, by contract, is supposed to retain insertion order (otherwise, methods such as .get(someindex) would not make any sense), this is not the case for a Set. You have no method to get an element at a particular index in a Set! Neither do you have methods to insert at a particular position etc.
More specifically, the ordering of Set is undefined; however, implementations of Set can add ordering constraints.
For instance:
LinkedHashSet retains insertion ordering;
TreeSet maintains natural ordering of its elements, either because its elements implement Comparable, or because you supply a Comparator.
If you sorted a LinkedHashSet, you would break its insertion ordering guarantee!
A set is not ordered. You can use SortedSet. Or you can create a List from the set and sort it.
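The "create a List from the set and sort it" route might look like this. It is a sketch only; SortSetDemo and sortedCopy are names invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class.
class SortSetDemo {
    // Copies the set into a List and sorts the copy; the set itself is left untouched.
    static List<Integer> sortedCopy(Set<Integer> set) {
        List<Integer> list = new ArrayList<>(set);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = new HashSet<>(Arrays.asList(3, 1, 2));
        System.out.println(sortedCopy(numbers));
    }
}
```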
A List is an ordered sequence of elements while a Set is not, which implies that the elements of a Set have no sequence number. So you can't sort it.
Why does HashSet implementation in Sun Java use HashMap as its backing?
I know what a HashSet and a HashMap are - I'm pretty well versed with them.
There is 1 thing which really puzzled me.
Example:
Set<String> testing = new HashSet<String>();
Now if you debug it using Eclipse right after the above statement, under the debugger's Variables tab you will notice that the set 'testing' is internally implemented as a HashMap.
Why does it need a HashMap, since there is no key-value pair involved in a set?
It's an implementation detail. The HashMap is actually used as the backing store for the HashSet. From the docs:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
(emphasis mine)
The answer is right in the API docs
"This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important."
So you don't even need the debugger to know this.
In answer to your question: it is an implementation detail. It doesn't need to use a HashMap, but it is probably just good code re-use. If you think about it, in this case the only difference is that a Set has different semantics from a Map. Namely, maps have a get(key) method, and Sets do not. Sets do not allow duplicates, Maps allow duplicate values, but they must be under different keys.
It is probably really easy to use a HashMap as the backing of a HashSet, because all you have to do is use the value you are putting in the Set as the map key, and let hashCode and equals (defined on all objects) determine whether it is a dupe, i.e., it is probably just doing something like
backingHashMap.put(toInsert, PRESENT);
to insert items into the Set, where PRESENT is a dummy value. (Using toInsert.hashCode() as the key would not work, since two different objects can share a hash code.)
In most cases the Set is implemented as a wrapper for the keySet() of a Map. This avoids duplicate implementations. If you look at the source you will see how it does this.
You might find the method Collections.newSetFromMap() useful; it can be used to wrap a ConcurrentHashMap, for example.
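For instance, a concurrent Set can be obtained from a ConcurrentHashMap along these lines. The class name ConcurrentSetDemo and the factory method newConcurrentSet are made up; Collections.newSetFromMap itself is a standard java.util method that takes an empty Map whose value type is Boolean.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class.
class ConcurrentSetDemo {
    // The returned Set stores its elements as keys of the supplied (empty) map.
    static Set<String> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> set = newConcurrentSet();
        System.out.println(set.add("x")); // true: newly added
        System.out.println(set.add("x")); // false: already present
        System.out.println(set.size());
    }
}
```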
The very first sentence of the class's Javadoc states that it is backed by a HashMap:
This class implements the Set interface, backed by a hash table (actually a HashMap instance).
If you look at the source code of HashSet, you'll see that each element you add is stored in the map as a key, and the value is a mere marker Object (named PRESENT).
Why is it backed by a HashMap? Because this is the simplest way to store a set of items in a (conceptual) hashtable and there is no need for HashSet to re-invent an implementation of a hashtable data structure.
It's just a matter of convenience that the standard Java class library implements HashSet using a HashMap: they only need to implement one data structure, and then HashSet stores its data in a HashMap with the actual set objects as the key and a dummy value (in the OpenJDK source, a shared marker Object named PRESENT) as the value.
HashMap has already all the functionality that HashSet requires. There would be no sense to duplicate the same algorithms.
It allows you to easily and quickly determine whether an object is already in the set or not.
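To illustrate the idea discussed in these answers, here is a heavily simplified sketch of a set backed by a HashMap. It is not the JDK source: MapBackedSet is a made-up class, and the real HashSet handles more (capacity arguments, cloning, serialization). The PRESENT marker mirrors the name mentioned above.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified set that delegates everything to a HashMap.
class MapBackedSet<E> extends AbstractSet<E> {
    // Dummy value stored against every key; only the keys matter.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    @Override
    public boolean add(E e) {
        // put() returns the previous value, which is null only if the key was absent.
        return map.put(e, PRESENT) == null;
    }

    @Override
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    @Override
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    @Override
    public int size() {
        return map.size();
    }
}
```

The set semantics (no duplicates, fast contains) fall out of the map's key handling, which is why there is little reason to write a second hash table.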