I need to store objects in a collection in the order they were added, which is why I need a List. However, the list should contain no duplicates. I also need to quickly determine whether an object already exists in the collection. Instead of iterating over the list every time, it would be better to also have something like a HashSet, so that I can quickly find and add elements while preserving insertion order.
The question is, should I:
1. extend ArrayList by adding a HashSet field?
2. implement one of the Java collection interfaces (List or Set)?
3. simply create a new class with two fields, an ArrayList and a HashSet?
The 1st option has a disadvantage: I don't need all of the ArrayList methods, so I'd have to override all of them so that users of my class can't call base class methods that would mess things up (for instance, one could remove an object from the list but the object would still exist in the set). And there's no way to remove the base class methods (other than overriding them and throwing an exception).
Similarly for 2, I'd really have to implement all methods of the interface.
The 3rd option looks the best to me, but it makes the code implementation dependent, because my class doesn't implement any interface.
What should I do in this case? I'd like to have all add methods the List interface has. - LinkedHashSet is not an option.
You could use a LinkedHashSet, which is a Set implementation that ensures that iteration order is the same as the order in which you added the elements.
Hash table and linked list implementation of the Set interface, with predictable iteration order. ... This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
No need to implement anything on your own. Use LinkedHashSet, which maintains insertion order.
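As a minimal sketch of the LinkedHashSet suggestion, the following removes duplicates from a list while keeping first-seen order. The class name InsertionOrderDemo and the method distinctInOrder are made up for illustration; only LinkedHashSet and its add method come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of any library.
class InsertionOrderDemo {
    // Returns the distinct elements of the input in the order they were first seen.
    static List<String> distinctInOrder(List<String> input) {
        Set<String> seen = new LinkedHashSet<>();
        for (String s : input) {
            // add() returns false for a duplicate and leaves the existing order untouched
            seen.add(s);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(distinctInOrder(Arrays.asList("b", "a", "b", "c", "a")));
    }
}
```

Note that a LinkedHashSet offers no positional methods such as add(int, E), so this only helps if appending is enough.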
As I understand it, a Set prevents duplicate records, but it cannot be sorted.
If I use a List, it can be sorted, but I can't prevent duplicate records.
If I use a TreeSet, the objects in it must implement Comparable, which would require big changes in my project, so I'd prefer to avoid that if possible.
Maybe I could convert the Set to a List and then sort it. However, my program flow is a bit complicated: after sorting, I still need to add objects to this list while preventing duplicates, so I would have to convert the sorted List back to a Set before adding each new object.
I am looking for an approach that keeps my collection sorted by object value and prevents duplicate records at the same time.
Hopefully I am not asking the wrong thing here.
TreeSet seems the best fit for your requirements.
If I use a TreeSet, the objects in it must implement Comparable
That's not true. TreeSet doesn't require your element type to implement Comparable. You can pass a Comparator to the TreeSet constructor instead.
Since your element type doesn't implement Comparable, you would have needed a Comparator anyway if you were going to sort a List, so instead of sorting a List, use that Comparator with a TreeSet.
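A minimal sketch of the Comparator approach might look like this. The Item class and its fields are hypothetical stand-ins for an element type that does not implement Comparable; the TreeSet(Comparator) constructor is the standard JDK one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class, not part of any library.
class TreeSetComparatorDemo {
    // Hypothetical element type; note that it does NOT implement Comparable.
    static final class Item {
        final String name;
        final int value;

        Item(String name, int value) {
            this.name = name;
            this.value = value;
        }
    }

    // Sorted by value, then by name. Elements that compare as equal count as duplicates.
    static TreeSet<Item> newSortedItems() {
        return new TreeSet<>(
                Comparator.comparingInt((Item i) -> i.value).thenComparing(i -> i.name));
    }

    // Collects the names in the set's iteration (sorted) order.
    static List<String> names(TreeSet<Item> items) {
        List<String> result = new ArrayList<>();
        for (Item i : items) {
            result.add(i.name);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Item> items = newSortedItems();
        items.add(new Item("c", 3));
        items.add(new Item("a", 1));
        items.add(new Item("a", 1)); // rejected: compares equal to an existing element
        items.add(new Item("b", 2));
        System.out.println(names(items));
    }
}
```

One design point to keep in mind: a TreeSet decides what is a duplicate from the comparator, not from equals, so the comparator should cover every field that makes two objects "the same".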
This is more of a theoretical question. If I have an arbitrary collection c that isn't ordered and I obtain two java.util.Iterators by calling c.iterator() twice, do both iterators have to return c's elements in the same order?
I mean, in practice they probably always will, but are they forced to do so by contract?
Thanks,
Jan
No they are not.
"There are no guarantees concerning the order in which the elements are returned (unless this collection is an instance of some class that provides a guarantee)."
See the Collection#iterator api contract.
That includes from one iterator to the next (as it doesn't say anything about requiring that).
Also consider that something could have changed in the underlying collection between getting those two iterators! Something added or removed.
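Iterator implementations are provided by the specific collection class. A List's iterator returns the elements in sequence order, while a Set's iterator generally makes no ordering guarantee (unless the implementation, such as LinkedHashSet or TreeSet, provides one).
Because many data structures are not ordered by default, it is not guaranteed that they will iterate in the same order each time.
If you need a consistent order, copy the elements into a List and sort it, or use a collection that guarantees its iteration order.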
I'm looking for a constantly sorted list in Java that can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for fast retrieval by key, but I need both in the same list. At one point I had written my own, but it does not implement the collections interfaces (so it can't be used as a drop-in replacement for a java.util.List etc.), and I'd rather stick to standard Java classes if possible.
Is there such a list out there? Right now I'm using 2 lists, a priority queue and a hashmap, both contain the same objects. I use the priority queue to traverse the first part of the list in sorted order, the hashmap for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need to have the list sorted by a different comparator than the one used for retrieval by key; the list is sorted by a long value, while the key used for retrieval is a String.
Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.
It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists which use "list pipelines" to efficiently propagate changes -- their SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage, one that traverses all items, the next that traverses every K items on average (for a given constant K), the next that traverses every K^2 items on average, etc.
TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html
Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
I haven't tested this so I might be wrong, so consider this just an attempt.
Use a TreeMap, and wrap the key of this map in an object that has two attributes (the String you use as the key in the HashMap and the long you use to maintain the sort order in the PriorityQueue). For this object, override the equals and hashCode methods using the string, and implement the Comparable interface using the long. One caveat: TreeMap locates keys with compareTo rather than with equals and hashCode, so lookups would effectively go by the long, not by the string.
Why don't you encapsulate your solution in a class that implements Collection or Map?
This way you could simply delegate the retrieval methods to the faster or better-suited collection. Just make sure that calls to write methods (add/remove/put) are forwarded to both collections. Remember indirect accesses, like iterator.remove(). Most of these methods are optional to implement, but you have to deactivate them (Collections.unmodifiableXXX will help here in most cases).
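The encapsulation idea can be sketched roughly as follows. This is only one possible shape, under several assumptions: the class name IndexedSortedStore, the Entry type, and all method names are invented here, a TreeSet stands in for the PriorityQueue so that old entries can be removed efficiently, and the sketch is not thread-safe and does not implement Collection or Map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical wrapper that keeps a key index and a sorted index in sync.
class IndexedSortedStore {
    // Hypothetical element: a String key for lookup and a long used for ordering.
    static final class Entry {
        final String key;
        final long priority;

        Entry(String key, long priority) {
            this.key = key;
            this.priority = priority;
        }
    }

    private final Map<String, Entry> byKey = new HashMap<>();
    // Ordered by priority; the key breaks ties so distinct entries never compare as equal.
    private final TreeSet<Entry> byPriority = new TreeSet<>(
            Comparator.comparingLong((Entry e) -> e.priority).thenComparing(e -> e.key));

    // Every write goes through both collections so they cannot drift apart.
    void put(String key, long priority) {
        Entry fresh = new Entry(key, priority);
        Entry old = byKey.put(key, fresh);
        if (old != null) {
            byPriority.remove(old);
        }
        byPriority.add(fresh);
    }

    boolean remove(String key) {
        Entry old = byKey.remove(key);
        return old != null && byPriority.remove(old);
    }

    // Fast retrieval by key is delegated to the HashMap; returns null if absent.
    Long priorityOf(String key) {
        Entry e = byKey.get(key);
        return e == null ? null : e.priority;
    }

    // Sorted traversal is delegated to the TreeSet.
    List<String> keysInPriorityOrder() {
        List<String> keys = new ArrayList<>();
        for (Entry e : byPriority) {
            keys.add(e.key);
        }
        return keys;
    }

    int size() {
        return byKey.size();
    }
}
```

Because neither internal collection is exposed, callers have no way to modify one without the other, which is the point of the delegation approach.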
I saw a code snippet like this:
Set<Record> instances = new HashSet<Record>();
I am wondering whether HashSet is a special kind of Set. Is there any difference between them?
A Set represents a generic "set of values". A TreeSet is a set where the elements are sorted (and thus ordered), a HashSet is a set where the elements are not sorted or ordered.
A HashSet is typically a lot faster than a TreeSet.
A TreeSet is typically implemented as a red-black tree (See http://en.wikipedia.org/wiki/Red-black_tree - I've not validated the actual implementation of sun/oracle's TreeSet), whereas a HashSet uses Object.hashCode() to create an index in an array. Access time for a red-black tree is O(log(n)) whereas access time for a HashSet ranges from constant-time to the worst case (every item has the same hashCode) where you can have a linear search time O(n).
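To make the ordering difference concrete, here is a small sketch. The class name HashVsTreeDemo and the helper sortedView are made up; the behavior shown relies only on documented HashSet and TreeSet semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of any library.
class HashVsTreeDemo {
    // Copies the values into a TreeSet, which drops duplicates and iterates in ascending order.
    static List<Integer> sortedView(Collection<Integer> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    public static void main(String[] args) {
        Set<Integer> hash = new HashSet<>(Arrays.asList(42, 7, 19, 7));
        Set<Integer> tree = new TreeSet<>(hash);

        // Both sets hold the same elements, so they are equal as sets.
        System.out.println(hash.equals(tree));
        // The TreeSet always iterates in ascending order.
        System.out.println(tree);
        // The HashSet's iteration order is unspecified and should not be relied on.
        System.out.println(hash);
    }
}
```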
The HashSet is an implementation of a Set.
Set is a collection that contains no duplicate elements. Set is an interface.
HashSet implements the Set interface, backed by a hash table (actually a HashMap instance).
HashSet is one of the specific implementations of the Set interface.
A Set can be any of the following, since the interface is implemented by these classes:
ConcurrentSkipListSet : A scalable concurrent NavigableSet implementation based on a ConcurrentSkipListMap. The elements of the set are kept sorted according to their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
CopyOnWriteArraySet : A Set that uses an internal CopyOnWriteArrayList for all of its operations.
EnumSet : A specialized Set implementation for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created.
TreeSet : A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
LinkedHashSet : Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
A HashSet reference, by contrast, can only hold a HashSet or one of its subclasses, such as LinkedHashSet.
The question has been answered, but I haven't seen the answer to why the code mentions both types in the same code.
Typically, you want to code against interfaces which in this case is Set. Why? Because if you reference your object through interfaces always (except the new HashSet()) then it is trivial to change the implementation of the object later if you find it would be better to do so because you've only mentioned it once in your code base (where you did new HashSet()).
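A short sketch of what coding against the interface buys you: the helper below depends only on Set, so the concrete class can be swapped at the single place where it is constructed. The names InterfaceDemo and addDistinct are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of any library.
class InterfaceDemo {
    // Depends only on the Set interface, so any implementation can be passed in.
    // Returns how many of the values were actually added (i.e. were not duplicates).
    static int addDistinct(Set<String> target, List<String> values) {
        int added = 0;
        for (String v : values) {
            if (target.add(v)) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b");

        // The concrete class is named only where the object is created.
        Set<String> instances = new HashSet<>();
        System.out.println(addDistinct(instances, input));

        // Switching to a sorted set later means changing just this one expression.
        Set<String> sorted = new TreeSet<>();
        System.out.println(addDistinct(sorted, input));
    }
}
```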
Set is the general interface to a set-like collection, while HashSet is a specific implementation of the Set interface (which uses hash codes, hence the name).
Set is a parent interface of all set classes like TreeSet, LinkedHashSet etc.
HashSet is a class implementing Set interface.
HashSet is a class that implements the Set interface, so it has all the properties of a Set. The most important and most frequently used implementations of Set are HashSet and TreeSet.
**Set:**
It is an interface that is a subtype of the Collection interface, just like List and Queue.
Set is used to store multiple objects without duplicates. Its commonly used implementations include these three classes:
HashSet
LinkedHashSet
TreeSet (which implements the SortedSet interface)
**HashSet:**
It can contain one null element (since duplicates are not allowed), and it does not maintain insertion order, so its iteration order is unspecified.
What is the difference between Collection and List in Java? When should I use which?
First off: a List is a Collection. It is a specialized Collection, however.
A Collection is just that: a collection of items. You can add stuff, remove stuff, iterate over stuff and query how much stuff is in there.
A List adds the information about a defined sequence of stuff to it: You can get the element at position n, you can add an element at position n, you can remove the element at position n.
In a Collection you can't do that: "the 5th element in this collection" isn't defined, because there is no defined order.
There are other specialized Collections as well, for example a Set which adds the feature that it will never contain the same element twice.
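The difference shows up directly in which methods are available. In this sketch (the class and method names are hypothetical), total works for any Collection because it only iterates, while third needs a List because it asks for an element by position.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class, not part of any library.
class CollectionVsListDemo {
    // Needs only iteration, so the most general type (Collection) is enough.
    static int total(Collection<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Needs positional access, which only List defines; Collection has no get(int).
    static int third(List<Integer> values) {
        return values.get(2);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(10, 20, 30);
        System.out.println(total(list));                // a List is a Collection
        System.out.println(total(new HashSet<>(list))); // so is a Set
        System.out.println(third(list));                // only meaningful for a List
    }
}
```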
Collection is the root interface to the java Collections hierarchy. List is one sub interface which defines an ordered Collection, other sub interfaces are Queue which typically will store elements ready for processing (e.g. stack).
[Diagram: the relationship between the different Java collection types]
The Java API documentation answers this best:
Collection
The root interface in the collection hierarchy. A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered. The JDK does not provide any direct implementations of this interface: it provides implementations of more specific subinterfaces like Set and List. This interface is typically used to pass collections around and manipulate them where maximum generality is desired.
List (extends Collection)
An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list. Unlike sets, lists typically allow duplicate elements. More formally, lists typically allow pairs of elements e1 and e2 such that e1.equals(e2), and they typically allow multiple null elements if they allow null elements at all. It is not inconceivable that someone might wish to implement a list that prohibits duplicates, by throwing runtime exceptions when the user attempts to insert them, but we expect this usage to be rare.
List and Set are two subinterfaces of Collection.
In a List, the data is kept in a particular order.
A Set cannot contain the same data twice.
A Collection in general makes no guarantee about order and may contain duplicate data.
Collection is the superinterface of List, so every Java list is also an instance of Collection. Collections can only be iterated sequentially (and in no particular order), whereas a List allows access to an element at a certain position via the get(int index) method.
Collection is the main interface of Java Collections hierarchy and List(Sequence) is one of the sub interfaces that defines an ordered collection.
Collection is a high-level interface describing Java objects that can contain collections of other objects. It's not very specific about how they are accessed, whether multiple copies of the same object can exist in the same collection, or whether the order is important. List is specifically an ordered collection of objects. If you put objects into a List in a particular order, they will stay in that order.
Deciding where to use these two interfaces is much less important than deciding which concrete implementation to use. That choice has implications for the time and space performance of your program. For example, if you want a list, you could use an ArrayList or a LinkedList, each of which performs differently depending on how the application uses it. For other collection types (e.g. Sets), similar considerations apply.
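As a rough illustration of that last point, the helper below behaves identically for both implementations but costs them very different amounts of work. The class name ListImplementationDemo and the method prependAll are hypothetical, and no timings are claimed here.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class ListImplementationDemo {
    // Inserts 0..n-1 at the front of the list, one at a time.
    // ArrayList has to shift every existing element on each insert,
    // while LinkedList only relinks the head node.
    static List<Integer> prependAll(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(0, i);
        }
        return target;
    }

    public static void main(String[] args) {
        // Same interface, same result, different performance characteristics.
        System.out.println(prependAll(new ArrayList<>(), 3));
        System.out.println(prependAll(new LinkedList<>(), 3));
    }
}
```

The reverse holds for random access: get(int) is constant-time on an ArrayList but requires walking the nodes on a LinkedList.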