Does a Java Set retain order? A method is returning a Set to me and supposedly the data is ordered but iterating over the Set, the data is unordered. Is there a better way to manage this? Does the method need to be changed to return something other than a Set?
The Set interface does not provide any ordering guarantees.
Its sub-interface SortedSet represents a set that is sorted according to some criterion. In Java 6, there are two standard containers that implement SortedSet. They are TreeSet and ConcurrentSkipListSet.
In addition to the SortedSet interface, there is also the LinkedHashSet class. It remembers the order in which the elements were inserted into the set, and returns its elements in that order.
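A minimal sketch of the difference, assuming a small set of strings (the class name SetOrderDemo and the sample values are made up for illustration). It fills a LinkedHashSet, a TreeSet and a HashSet with the same elements and prints the order each one iterates in:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // Sample data, deliberately not in alphabetical order.
    static final List<String> INPUT = Arrays.asList("pear", "apple", "mango");

    // Fills the given set with INPUT and returns the order its iterator yields.
    static List<String> iterationOrder(Set<String> set) {
        set.addAll(INPUT);
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        // Insertion order: [pear, apple, mango]
        System.out.println(iterationOrder(new LinkedHashSet<String>()));
        // Sorted (natural) order: [apple, mango, pear]
        System.out.println(iterationOrder(new TreeSet<String>()));
        // Unspecified order: may differ between JVMs and versions
        System.out.println(iterationOrder(new HashSet<String>()));
    }
}
```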
LinkedHashSet is what you need.
As many of the members suggested, use LinkedHashSet to retain the order of the collection.
You can wrap your set using this implementation.
A SortedSet implementation can be used for sorted order, but for your purpose use LinkedHashSet.
Also from the docs,
"This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet. It can be used to produce a copy of a set that has the same order as the original, regardless of the original set's implementation:"
Source : http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html
Set is just an interface. In order to retain order, you have to use a specific implementation of that interface that guarantees an order, for example TreeSet (which implements the sub-interface SortedSet) or LinkedHashSet. You can wrap your Set this way:
Set myOrderedSet = new LinkedHashSet(mySet);
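One caveat worth spelling out: copying into a LinkedHashSet preserves whatever order the source set iterates in at the time of the copy; it cannot recover an insertion order that a HashSet never recorded. A small sketch, assuming a TreeSet as the source (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CopyOrderDemo {
    // Copies the source into a LinkedHashSet, then adds one more element.
    // Returns the resulting iteration order.
    static List<String> copyThenAdd(Set<String> source, String extra) {
        Set<String> copy = new LinkedHashSet<String>(source);
        copy.add(extra); // appended at the end, not sorted into place
        return new ArrayList<String>(copy);
    }

    public static void main(String[] args) {
        Set<String> sorted = new TreeSet<String>(Arrays.asList("c", "a", "b"));
        // The copy starts in the TreeSet's order (a, b, c); "0" goes last.
        System.out.println(copyThenAdd(sorted, "0")); // [a, b, c, 0]
    }
}
```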
To retain the order use List or a LinkedHashSet.
Here is a quick summary of the order characteristics of the standard Set implementations available in Java:
1. keep the insertion order: LinkedHashSet and CopyOnWriteArraySet (thread-safe)
2. keep the items sorted within the set: TreeSet, EnumSet (specific to enums) and ConcurrentSkipListSet (thread-safe)
3. does not keep the items in any specific order: HashSet (the one you tried)
For your specific case, you can either sort the items first and then use any of 1 or 2 (most likely LinkedHashSet or TreeSet). Or alternatively and more efficiently, you can just add unsorted data to a TreeSet which will take care of the sorting automatically for you.
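The second option can be sketched as follows, assuming integer data (the class and method names are illustrative only). The TreeSet both sorts the values and drops duplicates:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class TreeSetSortDemo {
    // Returns the distinct values of the input in ascending (natural) order.
    static List<Integer> sortedUnique(List<Integer> unsorted) {
        return new ArrayList<Integer>(new TreeSet<Integer>(unsorted));
    }

    public static void main(String[] args) {
        System.out.println(sortedUnique(Arrays.asList(42, 7, 19, 7, 3))); // [3, 7, 19, 42]
    }
}
```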
A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked List across all elements. Use this class instead of HashSet when you care about the iteration order.
From the javadoc for Set.iterator():
Returns an iterator over the elements in this set. The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).
And, as already stated by shuuchan, a TreeSet is an implementation of Set that has a guaranteed order:
The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
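As a rough illustration of the Comparator constructor, here is a sketch that keeps strings in descending order. Collections.reverseOrder() is just one possible comparator, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetComparatorDemo {
    // Returns the items in reverse natural order, as maintained by the TreeSet.
    static List<String> descending(Collection<String> items) {
        Set<String> set = new TreeSet<String>(Collections.<String>reverseOrder());
        set.addAll(items);
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        System.out.println(descending(Arrays.asList("b", "c", "a"))); // [c, b, a]
    }
}
```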
Normally a set, such as HashSet, does not keep the order, so that it can find an element quickly. But you can try LinkedHashSet, which keeps the order in which you put the elements in.
There are 2 different things.
Sort the elements in a set. For which we have SortedSet and similar implementations.
Maintain insertion order in a set. For which LinkedHashSet and CopyOnWriteArraySet (thread-safe) can be used.
The Set interface itself does not stipulate any particular order. The SortedSet does however.
The Iterator returned by a Set is not supposed to return data in any particular order.
See this related question: "Two java.util.Iterators to the same collection: do they have to return elements in the same order?"
Only a SortedSet keeps the elements of the Set sorted.
Related
I know LinkedHashMap has a predictable iteration order (insertion order). Does the Set returned by LinkedHashMap.keySet() and the Collection returned by LinkedHashMap.values() also maintain this order?
The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.
-- Map
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
-- LinkedHashMap
So, yes, keySet(), values(), and entrySet() (the three collection views mentioned) return values in the order the internal linked list uses. And yes, the JavaDoc for Map and LinkedHashMap guarantee it.
That is the point of this class, after all.
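A quick way to see this for yourself, sketched with made-up keys and values (the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedHashMapViewsDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        map.put("three", 3);
        map.put("one", 1);
        map.put("two", 2);
        map.put("three", 33); // re-inserting a key keeps its original position
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(new ArrayList<String>(map.keySet()));  // [three, one, two]
        System.out.println(new ArrayList<Integer>(map.values())); // [33, 1, 2]
        // entrySet() iterates in the same order as the two views above.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```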
Looking at the source, it looks like it does. keySet(), values(), and entrySet() all use the same entry iterator internally.
Don't be misled into thinking that, because LinkedHashMap.keySet() and LinkedHashMap.entrySet() return a Set, they cannot guarantee ordering!
Set is an interface, with HashSet, TreeSet etc. being its implementations. The HashSet implementation of the Set interface does not guarantee ordering. But TreeSet does. Also LinkedHashSet does.
Therefore it depends on how Set has been implemented in LinkedHashMap to know whether the returning Set reference will guarantee ordering or not.
I went through the source code of LinkedHashMap, it looks like this:
private final class KeySet extends AbstractSet<K> {...}
public abstract class AbstractSet<E> extends AbstractCollection<E> implements Set<E> {...}
Thus LinkedHashMap/HashMap has its own implementation of Set i.e. KeySet. Thus don't confuse this with HashSet.
Also, the order is maintained by how the entries are linked together when they are inserted, not by the hash buckets. Look at the addEntry(..) method of LinkedHashMap and compare it with that of HashMap, which highlights the main difference between HashMap and LinkedHashMap.
You can assume so. The Javadoc says 'predictable iteration order', and the only iterators available in a Map are those for the keySet(), entrySet(), and values().
So in the absence of any further qualification it is clearly intended to apply to all of those iterators.
AFAIK it is not documented so you cannot "formally" assume so. It is unlikely, however, that the current implementation would change.
If you want to ensure order, you may want to iterate over the map entries and insert them into a sorted set with an order function of your choice, though you will be paying a performance cost, naturally.
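That approach might look roughly like this, assuming the entries should be ordered by key. It uses Map.Entry.comparingByKey(), which requires Java 8 or later; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedEntriesDemo {
    // Copies the entries of any map into a set sorted by key.
    static SortedSet<Map.Entry<String, Integer>> entriesByKey(Map<String, Integer> map) {
        SortedSet<Map.Entry<String, Integer>> sorted =
                new TreeSet<Map.Entry<String, Integer>>(Map.Entry.<String, Integer>comparingByKey());
        sorted.addAll(map.entrySet()); // extra O(n log n) work and a second collection
        return sorted;
    }

    // Convenience helper: just the keys, in the sorted set's order.
    static List<String> keysInOrder(Map<String, Integer> map) {
        List<String> keys = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : entriesByKey(map)) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("b", 2);
        map.put("c", 3);
        map.put("a", 1);
        System.out.println(keysInOrder(map)); // [a, b, c]
    }
}
```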
I have data of which the sequence is as important as its unique elements. Meaning if something has already been added it should not be added again and the sequence must be remembered.
Set does not remember the sequence in which it was added (either hash or sort), and List is not unique.
What is the best solution to this problem?
Should one have a list and loop through it to test for uniqueness - which I'm trying to avoid?
Or should one have two collections, one a List and one a Set - which I'm also trying to avoid?
Or is there a different solution to this problem altogether.
The code below is for your reference:
LinkedHashSet<String> al = new LinkedHashSet<String>();
al.add("guru");
al.add("karthik");
al.add("raja");
al.add("karthik"); // duplicate: ignored, order unchanged
Iterator<String> itr = al.iterator();
while (itr.hasNext()) {
    System.out.println(itr.next());
}
Output:
guru
karthik
raja
Use LinkedHashSet. It combines the uniqueness of a Set with the predictable ordering you would get from a List (although it does not implement List, so there is no access by index). It remembers the order in which you inserted items, which allows you to iterate over it in order of insertion.
From the Docs:
Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.)
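The re-insertion rule means add() can double as the uniqueness test, so no second collection or manual loop is needed. A sketch under that assumption (the class and method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueSequenceDemo {
    // Keeps only the first occurrence of each item, in arrival order.
    static List<String> firstOccurrences(List<String> items) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String item : items) {
            boolean added = seen.add(item); // false when the item is already present
            if (!added) {
                System.out.println("skipped duplicate: " + item);
            }
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        // Prints "skipped duplicate: karthik", then [guru, karthik, raja]
        System.out.println(firstOccurrences(Arrays.asList("guru", "karthik", "raja", "karthik")));
    }
}
```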
You can use SortedSet or LinkedHashSet.
LinkedHashSet is the best possible way out
I have a Set<String> set that I persist with Neo4j Spring in Java. I need to be able to retrieve elements from that set in the order in which they were added to it. Sets do not retain order. I have tried using a Collection<String>/List<String> instead because Lists have ordering, but Neo4j doesn't like Collection. What else can be used for ordered storage?
EDIT: By order, I mean insertion order.
There is a special implementation of Set, the class TreeSet, which keeps the elements in the set sorted, either by their natural ordering or by asking a Comparator how they should be ordered. A TreeSet keeps its elements in sorted position whenever you add or remove elements.
There is also the LinkedHashSet implementation which keeps the items according to the insertion order.
Collection is an interface that the interfaces Set and List both extend. (And other interfaces as well)
Collection does not guarantee ordering. All it cares about is the ability to add and remove elements. Set does not allow more than one copy of each element to be added. The Set interface itself does not guarantee ordering. The List interface guarantees ordering but also allows multiple copies of the same element.
Summary: For your case, use LinkedHashSet.
List is an ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.
List<String> list = new ArrayList<String>();
Why does Collections.sort() apply only for Lists and not for Sets? Is there any particular reason?
Most (but not all) Set implementations do not have a concept of order, so Collections.sort does not support them as a whole. If you want a set with a concept of order, you can use something like a TreeSet:
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
Or a LinkedHashSet:
Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order)
A Set, by definition, has no order.
A Set is not a List. While a List, by contract, is supposed to retain insertion order (otherwise, methods such as .get(someindex) would not make any sense), this is not the case for a Set. You have no method to get an element at a particular index in a Set! Neither do you have methods to insert at a particular position etc.
More specifically, the ordering of Set is undefined; however, implementations of Set can add ordering constraints.
For instance:
LinkedHashSet retains insertion ordering;
TreeSet keeps its elements sorted, either by their natural ordering (because the elements implement Comparable) or by a Comparator that you supply.
If you sorted a LinkedHashSet, you would break its insertion ordering guarantee!
A set is not ordered. You can use SortedSet. Or you can create a List from the set and sort it.
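The "create a List from the set and sort it" route can be sketched like this (the class and method names are hypothetical). The original set is left untouched:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SortSetAsListDemo {
    // Copies the set into a list and sorts the copy by natural ordering.
    static List<String> sortedCopy(Set<String> set) {
        List<String> list = new ArrayList<String>(set);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<String>(Arrays.asList("b", "c", "a"));
        System.out.println(sortedCopy(set)); // [a, b, c]
    }
}
```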
A List is an ordered sequence of elements while a Set is not, which implies that the elements of a Set have no position or sequence number. So you can't sort it.
I saw a code snippet like:
Set<Record> instances = new HashSet<Record>();
I am wondering if HashSet is a special kind of Set. Is there any difference between them?
A Set represents a generic "set of values". A TreeSet is a set where the elements are sorted (and thus ordered), a HashSet is a set where the elements are not sorted or ordered.
A HashSet is typically a lot faster than a TreeSet.
A TreeSet is typically implemented as a red-black tree (See http://en.wikipedia.org/wiki/Red-black_tree - I've not validated the actual implementation of sun/oracle's TreeSet), whereas a HashSet uses Object.hashCode() to create an index in an array. Access time for a red-black tree is O(log(n)) whereas access time for a HashSet ranges from constant-time to the worst case (every item has the same hashCode) where you can have a linear search time O(n).
The HashSet is an implementation of a Set.
Set is a collection that contains no duplicate elements. Set is an interface.
HashSet implements the Set interface, backed by a hash table (actually a HashMap instance).
HashSet is one of the specific implementations of the Set interface.
A Set can be any of the following, since it is implemented by the classes below:
ConcurrentSkipListSet : A scalable concurrent NavigableSet implementation based on a ConcurrentSkipListMap. The elements of the set are kept sorted according to their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
CopyOnWriteArraySet : A Set that uses an internal CopyOnWriteArrayList for all of its operations.
EnumSet : A specialized Set implementation for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created.
TreeSet : A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
LinkedHashSet : Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
But a variable of type HashSet can only hold a HashSet or a LinkedHashSet, since LinkedHashSet is a subclass of HashSet.
The question has been answered, but I haven't seen the answer to why the code mentions both types in the same code.
Typically, you want to code against interfaces, which in this case is Set. Why? Because if you always reference your object through the interface (everywhere except the new HashSet() call), it is trivial to change the implementation later if you find it would be better to do so: the concrete class is mentioned only once in your code base (where you did new HashSet()).
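A small sketch of that idea, with invented names: only newSet() mentions a concrete class, so switching to, say, a LinkedHashSet or TreeSet would be a one-line change.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InterfaceDemo {
    // The only place that names a concrete class; swap it to change the implementation.
    static Set<String> newSet() {
        return new HashSet<String>();
    }

    // Everything else depends only on the Set interface.
    static int countDistinct(List<String> words) {
        Set<String> distinct = newSet();
        distinct.addAll(words);
        return distinct.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(Arrays.asList("a", "b", "a"))); // 2
    }
}
```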
Set is the general interface to a set-like collection, while HashSet is a specific implementation of the Set interface (which uses hash codes, hence the name).
Set is a parent interface of all set classes like TreeSet, LinkedHashSet etc.
HashSet is a class implementing Set interface.
HashSet is a class that implements the Set interface. As an implementation of Set, HashSet has all the properties of a Set. The most important and most frequently used implementations of Set are HashSet and TreeSet.
**Set:**
It is an interface which is a subtype of the Collection interface, just like List and Queue.
Set is used to store multiple objects without duplicates. Its three most commonly used implementing classes are:
HashSet
LinkedHashSet
TreeSet(which implements SortedSet interface)
**HashSet:**
It can hold at most one null element (as duplicates are not allowed). Elements are stored in no particular order, as it does not maintain insertion sequence.
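The null behaviour can be checked with a short sketch like the one below (the class and method names are made up). By contrast, a TreeSet that uses natural ordering rejects null with a NullPointerException.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetNullDemo {
    // Adds null twice plus one real value and reports the resulting size.
    static int sizeAfterAddingNullTwice() {
        Set<String> set = new HashSet<String>();
        set.add(null);
        set.add(null); // duplicate null: add() returns false, set unchanged
        set.add("x");
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingNullTwice()); // 2
    }
}
```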