What are the pros and cons of a TreeSet [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Just wondering what the pros and cons of a TreeSet are, if anyone could tell me please? Thanks!

TreeSet is one of the Collection classes. It lets you look up the elements in your collection by key, or iterate over them sequentially in key order. It has considerably more overhead than ArrayList or HashMap. Use HashSet when you don’t need sequential access, just lookup by key. Use an ArrayList and Collections.sort if you just want the elements in order. TreeSet keeps the elements in order at all times. With ArrayList you just sort when you need to.
With TreeSets the key must be embedded in the object you store in the collection. Often you might have a TreeSet of Strings. All you can do then is tell if a given String is in the Set. It won’t find you an associated object the way a TreeMap will. With a TreeMap the keys and the objects they are associated with are separate.
TreeSet and its brother TreeMap oddly have nothing to do with representing trees. Internally they use a tree organisation to give you a sorted Set/Map (alphabetical for Strings, by default), but you have no control over links between parents and children.
Internally TreeSet uses red-black trees. There is no need to presort the data to get a well-balanced tree. On the other hand, if the data are sorted (ascending or descending), it won’t hurt as it does with some other types of tree.
If you don’t supply a Comparator to define the ordering you want, TreeSet requires a Comparable implementation on the item class to define the natural order.
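To make the two ordering options concrete, here is a minimal sketch. The class name, method names and sample words are made up for illustration; only TreeSet, Comparator and the other java.util types come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderingSketch {
    // Hypothetical comparator: shorter strings first, ties broken alphabetically.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA = (a, b) -> {
        int c = Integer.compare(a.length(), b.length());
        return c != 0 ? c : a.compareTo(b);
    };

    // No Comparator supplied: String's Comparable implementation defines the order.
    static List<String> naturalOrder(List<String> words) {
        return new ArrayList<>(new TreeSet<>(words));
    }

    // Comparator supplied at construction time: it replaces the natural order.
    static List<String> byLength(List<String> words) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH_THEN_ALPHA);
        set.addAll(words);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi", "fig");
        System.out.println(naturalOrder(words)); // [banana, fig, kiwi, pear]
        System.out.println(byLength(words));     // [fig, kiwi, pear, banana]
    }
}
```

Note that the duplicate "fig" disappears in both cases, since a TreeSet is still a Set.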

Cons: One pitfall with TreeSet is that it implements the Set interface in an unexpected way.
If a TreeSet contains object a, then object b is considered part of the set if a.compareTo(b) returns 0, even if a.equals(b) is false. So if compareTo and equals aren't implemented consistently, you are in for a bad ride.
This is especially a problem when a method returns a Set, and you don't know if the implementation is a TreeSet or, for instance, a HashSet.
The lesson to learn here is, always avoid implementing compareTo and equals inconsistently. If you need to order objects in a way that is inconsistent with equals, use a Comparator.
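A well-known JDK class where compareTo and equals disagree is BigDecimal: 1.0 and 1.00 compare as equal but are not equals() because their scales differ. The sketch below (class and method names are illustrative) shows how the same two values behave in a HashSet and in a TreeSet.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class CompareToVsEqualsSketch {
    // Adds 1.0 and 1.00 to the given set and reports how many elements it kept.
    static int sizeAfterAdding(Set<BigDecimal> set) {
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        // HashSet relies on equals/hashCode: the two values are distinct.
        System.out.println(sizeAfterAdding(new HashSet<BigDecimal>())); // 2
        // TreeSet relies on compareTo: the second value counts as a duplicate.
        System.out.println(sizeAfterAdding(new TreeSet<BigDecimal>())); // 1
    }
}
```

Code that receives a plain Set cannot tell which of the two behaviours it will get, which is exactly the pitfall described above.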

TreeSet:
Pros: sorted, based on a red/black tree algorithm, provides O(log(N)) complexity for add, remove and contains.
Cons: elements must either be Comparable or you need to provide a Comparator in the constructor. Moreover, the HashSet implementation provides better performance, as it offers ~O(1) complexity for the same operations.

TreeSet fragments memory and has additional memory overhead. You can look at the sources and calculate how much additional memory and how many additional objects it creates. Of course it depends on the nature of the stored objects, and you may also suspect me of being paranoid about memory :) but it's better not to spend it here and there: you have GC, you have cache misses, and all of these things are slow.
Often you can use PriorityQueue instead of TreeSet. And in your typical use case it's better just to sort the array of strings.

I guess this data structure uses a binary tree to maintain its data so that ascending-order retrieval is possible. In that case, if it tries to keep the tree balanced, the remove operation would be a bit costly.

Related

TreeSet Vs Tree

I have a few questions related to the Collections Framework's TreeSet that I am putting here.
Is the only functional difference between the TreeSet and ArrayList classes the constraint of unique elements, plus the elements being sorted in TreeSet?
The prefix Tree creates confusion about whether to visualize a TreeSet as a hierarchical data structure or a linear one. Mathematical sets are linear data structures, while the name Tree in computing indicates a hierarchical one.
Is there really any similarity / relation between the tree data structure and Java's TreeSet, or is the name TreeSet just a coincidence?
I mean, it doesn't seem that a set would have anything to do with parent-child relationships.
EDIT - Looks like I was confused about what I was trying to ask, which got clarified after pondering over the comments and answers. I guess my main question should have been "why is a mathematical set DS (sorted or unsorted) implemented via a tree?" and that is a duplicate of How to implement Set data structure?
Is the only functional difference between the TreeSet and ArrayList
classes the constraint of unique elements, plus the elements being
sorted in TreeSet?
That is the major difference, apart from the internal implementation, and it enables TreeSet to provide methods like subSet, tailSet and headSet, which are not possible with an ArrayList.
The prefix Tree creates confusion about whether to visualize a
TreeSet as a hierarchical data structure or a linear one. Mathematical
sets are linear data structures, while the name Tree in computing
indicates a hierarchical one.
Yes, it is a hierarchical structure. Internally the implementation is a red-black binary tree.
Is there really any similarity / relation between the tree data structure
and Java's TreeSet, or is the name TreeSet just a coincidence?
The internal implementation is a R-B binary tree.
On a side note, since these two are different data structures, the time complexity of TreeSet is completely different from that of ArrayList for the same set of operations. For example, add is amortized O(1) for ArrayList but O(log n) for TreeSet, search is O(n) for ArrayList but O(log n) for TreeSet, and so on.
TreeSet is a real tree, not a coincidence.
So there are many differences from ArrayList.
For example, performance (I mean Big-O) is totally different.
In terms of usage it is just a Set, plus some extra goodies like having a definite sequence. However, it is internally implemented as a tree.
The naming convention here is the same as with HashSet, another Set, which is internally implemented as a hash table.
Internally TreeSet is represented as a tree structure, and this fact influences the complexity of its operations. Most operations require O(log n) steps for tree-based structures, whereas array-based structures work in constant time for the most common read-only operations. HashSet is based on an array and so allows constant-time access to its values.
They also provide different functionality. HashSet just stores elements. It behaves like a mathematical set, as you said, in a linear manner.
But TreeSet provides more operations: take a look at the NavigableSet and SortedSet interfaces it implements. Elements of a TreeSet are always sorted. At the same time, they require sorting rules, provided either by implementing the Comparable interface or by supplying a separate Comparator object.
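As a rough illustration of what NavigableSet and SortedSet add on top of Set, the sketch below runs a few of their methods against a small set of integers. The class name and sample values are invented for the example; the methods themselves are part of java.util.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class NavigableSetSketch {
    // Sample data, deliberately inserted out of order.
    static NavigableSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(40, 10, 30, 20, 50));
    }

    public static void main(String[] args) {
        NavigableSet<Integer> s = sample();
        System.out.println(s.first());         // 10 (smallest element)
        System.out.println(s.last());          // 50 (largest element)
        System.out.println(s.floor(25));       // 20 (greatest element <= 25)
        System.out.println(s.ceiling(25));     // 30 (smallest element >= 25)
        System.out.println(s.higher(30));      // 40 (smallest element > 30)
        System.out.println(s.headSet(30));     // [10, 20] (strictly below 30)
        System.out.println(s.tailSet(30));     // [30, 40, 50] (30 and above)
        System.out.println(s.descendingSet()); // [50, 40, 30, 20, 10]
    }
}
```

None of these queries make sense for a HashSet, because it keeps no ordering to navigate.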

Storing data in sorted manner in a HashSet

From what I read, HashSet stores data in an unsorted manner. However, I was given this question. (I don't want anyone to solve it.)
Write a program to keep car details in a HashSet with sorted manner
on the based of car name using comparator. Also calculate which car
have maximum and minimum speed And average speed of all car.car
class structure could be like.Also consider exception scenarios
(car is a class that has a element name)
I am confused now. Is it possible the question is slightly wrong? If so can anyone help me in figuring the correct question?
The question appears to be slightly wrong. It should be:
Write a program to keep car details in a Set with sorted manner on the based of car name using comparator. Also calculate which car have maximum and minimum speed And average speed of all car.car class structure could be like.Also consider exception scenarios
This is because, of the two, only TreeSet has a constructor that accepts a Comparator. You cannot use a Comparator with HashSet.
I suppose that what is asked is to use a LinkedHashSet. It is a subclass of HashSet, as per the first requirement, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
If you cannot make sure that car details will be added in the right order, you would need a dedicated subclass of HashSet that also uses a LinkedList for the order but controls where each element is inserted. But as the question does not state any performance requirement, it is hard to say if this is really the required implementation.
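The difference between the two suggestions can be sketched without solving the exercise itself. The Car class below is a hypothetical stand-in with only a name field, and the car names are made up. The point is just that a LinkedHashSet iterates in insertion order while a TreeSet built with a Comparator iterates in the Comparator's order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CarSetSketch {
    // Hypothetical element type; the real exercise would carry more details.
    static final class Car {
        final String name;
        Car(String name) { this.name = name; }
    }

    // Orders cars by name; this is what the TreeSet constructor accepts.
    static final Comparator<Car> BY_NAME = Comparator.comparing(c -> c.name);

    // Adds the same three cars, in the same order, to whichever set is passed in.
    static Set<Car> fill(Set<Car> cars) {
        cars.add(new Car("Volvo"));
        cars.add(new Car("Audi"));
        cars.add(new Car("Fiat"));
        return cars;
    }

    // Collects the names in the set's iteration order.
    static List<String> names(Set<Car> cars) {
        List<String> out = new ArrayList<>();
        for (Car c : cars) out.add(c.name);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(names(fill(new LinkedHashSet<Car>()))); // [Volvo, Audi, Fiat]
        System.out.println(names(fill(new TreeSet<Car>(BY_NAME)))); // [Audi, Fiat, Volvo]
    }
}
```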

Which is faster in accessing elements from Java collections [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
I am trying to understand which is faster for accessing elements among the Java collections, such as ArrayList, LinkedList, HashSet, TreeSet, HashMap, TreeMap, etc.
From this question: Suitable java collection for fast get and fast removal, I learned that ArrayList takes O(1) and TreeMap O(log n),
whereas this: Map/ArrayList: which one is faster to search for an element shows ArrayList as O(n), HashMap as O(1) and TreeMap as O(log n),
whereas this: Why is it faster to process a sorted array than an unsorted array? says that a sorted array is faster than an unsorted array. As the elements in a TreeMap are sorted, can I assume all sorted collections are faster than unsorted collections?
Please help me understand which of the list, set, map, etc. implementations is faster for accessing elements in Java collections.
Every collection type is suitable for a particular scenario. There is no fastest or best collection.
If you need fast access to elements using index, ArrayList is your answer.
If you need fast access to elements using a key, use HashMap.
If you need fast add and removal of elements at the ends of the list or through an iterator, use LinkedList (but it has very poor index access performance).
and so on.
It depends on whether you want to access an element by index (in the case of a list) or see if an Object exists in the Collection.
If you want to access an element by index, then ArrayList is faster, as it implements the RandomAccess marker interface and is internally backed by an array.
Sets are internally backed by a Map, so the performance of a Map and the corresponding Set is the same (the Set uses a dummy Object as the value in each key-value pair). I would suggest you use a HashSet.
The problem many programmers don't notice is that HashSet and HashMap reach their best-case O(1) performance only when the hashing function of the key object is good, i.e. it produces different values for different objects (though this is not a strict requirement).
NOTE: If your hashing function is not good, the structure degrades internally to a linked list and its performance degrades to O(n).
My personal preference is to use EnumMap or EnumSet. They simply use the enum values for their functioning, and programmers don't have to worry about the enum's hashCode/equals methods. For all other cases, use HashSet or HashMap (if you don't need them ordered).
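For readers who have not used the enum-based collections, here is a small sketch. The Level enum and its labels are invented for the example; EnumMap and EnumSet iterate in the declaration order of the enum constants, regardless of insertion order.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;

class EnumCollectionsSketch {
    // Hypothetical enum used as the key type.
    enum Level { LOW, MEDIUM, HIGH }

    // Inserts HIGH before LOW, then reports the keys in iteration order.
    static List<Level> keysInIterationOrder() {
        EnumMap<Level, String> labels = new EnumMap<>(Level.class);
        labels.put(Level.HIGH, "red");
        labels.put(Level.LOW, "green");
        return new ArrayList<>(labels.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInIterationOrder());       // [LOW, HIGH]
        EnumSet<Level> alerts = EnumSet.of(Level.HIGH, Level.MEDIUM);
        System.out.println(alerts);                       // [MEDIUM, HIGH]
        System.out.println(EnumSet.complementOf(alerts)); // [LOW]
    }
}
```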

java constantly sorted list with quick retrieval

I'm looking for a constantly sorted list in Java, which can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for the fast retrieval by key, but I need both in the same list. At one point I wrote my own, but it does not implement the collections interfaces (so it can't be used as a drop-in replacement for a java.util.List etc.), and I'd rather stick to standard Java classes if possible.
Is there such a list out there? Right now I'm using two collections, a priority queue and a hashmap, both containing the same objects. I use the priority queue to traverse the first part of the list in sorted order and the hashmap for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need to have the list sorted by a different comparator than what is used for retrieval by key; the list is sorted by a long value, and the key retrieval is by a String.
Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.
It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists which use "list pipelines" to efficiently propagate changes -- their SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage, one that traverses all items, the next that traverses every K items on average (for a given constant K), the next that traverses every K^2 items on average, etc.
TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html
Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
I haven't tested this so I might be wrong, so consider this just an attempt.
Use a TreeMap, and wrap the key of this map in an object which has two attributes (the String which you use as the key in the HashMap and the long which you use to maintain the sort order in the PriorityQueue). Now for this object, override the equals and hashCode methods using the String. Implement the Comparable interface using the long.
Why don't you encapsulate your solution to a class that implements Collection or Map?
This way you could simply delegate the retrieval methods to the faster/better suiting collection. Just make sure that calls to write-methods (add/remove/put) will be forwarded to both collections. Remember indirect accesses, like iterator.remove(). Most of these methods are optional to implement, but you have to deactivate them (Collections.unmodifiableXXX will help here in most cases).
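The encapsulation idea can be sketched roughly as follows. This is not a full Collection or Map implementation, just the core of keeping two indices in sync; the class name DualIndexSketch, the Entry type and all method names are assumptions made for the example. The sorted index breaks ties on the key, so two entries with the same long value are not mistaken for duplicates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class DualIndexSketch {
    // Hypothetical element: a String key for lookup, a long for ordering.
    static final class Entry {
        final String key;
        final long priority;
        Entry(String key, long priority) { this.key = key; this.priority = priority; }
    }

    // Orders by priority, then by key so that equal priorities stay distinct.
    static final Comparator<Entry> ORDER = (a, b) -> {
        int c = Long.compare(a.priority, b.priority);
        return c != 0 ? c : a.key.compareTo(b.key);
    };

    private final Map<String, Entry> byKey = new HashMap<>();
    private final TreeSet<Entry> byPriority = new TreeSet<>(ORDER);

    // Every write goes to both indices; an existing key is replaced.
    void put(String key, long priority) {
        Entry old = byKey.remove(key);
        if (old != null) byPriority.remove(old);
        Entry e = new Entry(key, priority);
        byKey.put(key, e);
        byPriority.add(e);
    }

    boolean remove(String key) {
        Entry e = byKey.remove(key);
        return e != null && byPriority.remove(e);
    }

    // Fast retrieval by key is delegated to the HashMap.
    boolean containsKey(String key) { return byKey.containsKey(key); }

    // Sorted traversal is delegated to the TreeSet.
    List<String> keysInPriorityOrder() {
        List<String> out = new ArrayList<>();
        for (Entry e : byPriority) out.add(e.key);
        return out;
    }

    public static void main(String[] args) {
        DualIndexSketch index = new DualIndexSketch();
        index.put("a", 30L);
        index.put("b", 10L);
        index.put("c", 20L);
        System.out.println(index.keysInPriorityOrder()); // [b, c, a]
        index.put("a", 5L);                              // re-prioritise "a"
        System.out.println(index.keysInPriorityOrder()); // [a, b, c]
    }
}
```

The sketch is not thread-safe; access from multiple threads would need external synchronization.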

When do you know when to use a TreeSet or LinkedList?

What are the advantages of each structure?
In my program I will be performing these steps and I was wondering which data structure above I should be using:
Taking in an unsorted array and
adding them to a sorted structure1.
Traversing through sorted data and removing the right one
Adding data (never removing) and returning that structure as an array
When do you know when to use a TreeSet or LinkedList? What are the advantages of each structure?
In general, you decide on a collection type based on the structural and performance properties that you need it to have. For instance, a TreeSet is a Set, and therefore does not allow duplicates and does not preserve insertion order of elements. By contrast a LinkedList is a List and therefore does allow duplicates and does preserve insertion order. On the performance side, TreeSet gives you O(logN) insertion and deletion, whereas LinkedList gives O(1) insertion at the beginning or end, and O(N) insertion at a selected position or deletion.
The details are all spelled out in the respective class and interface javadocs, but a useful summary may be found in the Java Collections Cheatsheet.
In practice though, the choice of collection type is intimately connected to algorithm design. The two need to be done in parallel. (It is no good deciding that your algorithm requires a collection with properties X, Y and Z, and then discovering that no such collection type exists.)
In your use-case, it looks like TreeSet would be a better fit. There is no efficient way (i.e. better than O(N^2)) to sort a large LinkedList that doesn't involve turning it into some other data structure to do the sorting. There is no efficient way (i.e. better than O(N)) to insert an element into the correct position in a previously sorted LinkedList. The third part (copying to an array) works equally well with a LinkedList or TreeSet; it is an O(N) operation in both cases.
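The three steps from the question could look roughly like this with a TreeSet. The method name, the int element type and the sample values are assumptions. Note one consequence of the choice: because a TreeSet is a Set, duplicate values in the input array collapse into one.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

class SortedPipelineSketch {
    static int[] process(int[] unsorted, int toRemove, int toAdd) {
        // Step 1: load the unsorted array; the set keeps itself sorted.
        TreeSet<Integer> sorted = new TreeSet<>();
        for (int v : unsorted) sorted.add(v);

        // Step 2: traverse in sorted order and remove the matching element
        // through the iterator, which is safe during iteration.
        for (Iterator<Integer> it = sorted.iterator(); it.hasNext(); ) {
            if (it.next() == toRemove) {
                it.remove();
                break;
            }
        }

        // Step 3: add more data, then copy everything out as an array (O(N)).
        sorted.add(toAdd);
        int[] out = new int[sorted.size()];
        int i = 0;
        for (int v : sorted) out[i++] = v;
        return out;
    }

    public static void main(String[] args) {
        int[] result = process(new int[] {5, 3, 9, 1}, 3, 4);
        System.out.println(Arrays.toString(result)); // [1, 4, 5, 9]
    }
}
```

If the element to remove is known up front, sorted.remove(value) does the same job in O(log N) without a traversal.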
[I'm assuming that the collections are large enough that the big O complexity predicts the actual performance accurately ... ]
The genuine power and advantage of TreeSet lies in the interface it implements: NavigableSet.
Why is it so powerful, and in which cases?
The NavigableSet interface adds, for example, these three nice methods:
headSet(E toElement, boolean inclusive)
tailSet(E fromElement, boolean inclusive)
subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive)
These methods make it possible to build efficient (very fast) search algorithms.
Example: we need to find all the names that fall between "Milla" and "Wladimir" in sort order:
TreeSet<String> authors = new TreeSet<String>();
authors.add("Andreas Gryphius");
authors.add("Fjodor Michailowitsch Dostojewski");
authors.add("Alexander Puschkin");
authors.add("Ruslana Lyzhichko");
authors.add("Wladimir Klitschko");
authors.add("Andrij Schewtschenko");
authors.add("Wayne Gretzky");
authors.add("Johann Jakob Christoffel");
authors.add("Milla Jovovich");
authors.add("Taras Schewtschenko");
System.out.println(authors.subSet("Milla", "Wladimir"));
output:
[Milla Jovovich, Ruslana Lyzhichko, Taras Schewtschenko, Wayne Gretzky]
TreeSet doesn't go over all the elements; it locates the first and last elements of the range and returns a view of the set that covers everything in between.
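Two details of subSet are easy to miss: the two-argument form excludes the upper bound, and the result is a live view backed by the original set rather than a copy. A small sketch, with a shortened and partly made-up list of first names:

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class SubSetViewSketch {
    static TreeSet<String> names() {
        return new TreeSet<>(Arrays.asList("Anna", "Milla", "Ruslana", "Wayne", "Wladimir"));
    }

    public static void main(String[] args) {
        TreeSet<String> all = names();

        // Two-argument form: lower bound inclusive, upper bound exclusive.
        System.out.println(all.subSet("Milla", "Wladimir")); // [Milla, Ruslana, Wayne]

        // Four-argument form from NavigableSet: both bounds chosen explicitly.
        NavigableSet<String> range = all.subSet("Milla", true, "Wladimir", true);
        System.out.println(range); // [Milla, Ruslana, Wayne, Wladimir]

        // The range is a view: changes to the backing set show up in it...
        all.add("Taras");
        System.out.println(range); // [Milla, Ruslana, Taras, Wayne, Wladimir]

        // ...and changes made through the view reach the backing set.
        range.remove("Wayne");
        System.out.println(all.contains("Wayne")); // false
    }
}
```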
TreeSet:
TreeSet uses a red-black tree underneath, so the set can be thought of as a dynamic search tree. When you need a structure that is read and written frequently and must also keep its order, TreeSet is a good choice.
If you want to keep it sorted and it's append-mostly, TreeSet with a Comparator is your best bet. With a LinkedList, the code would have to traverse the list from the beginning to decide where to place an item. LinkedList is O(n) for search and positional operations; TreeSet is O(log(n)) for add, remove and contains.
The most important point when choosing a data structure is its inherent limitations. For example, if you use a TreeSet to store objects and at run time your algorithm changes attributes of these objects that affect comparisons while the objects are elements of the set, get ready for some strange bugs.
The Javadoc for the Set interface states:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interface Set Java Doc
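One of those strange bugs can be reproduced in a few lines. The Box class and its comparator are hypothetical. The behaviour after mutation is unspecified by the Set contract; the result shown is what the current OpenJDK red-black tree produces for this particular insertion sequence.

```java
import java.util.Comparator;
import java.util.TreeSet;

class MutableElementSketch {
    // Hypothetical mutable element, ordered by its value field.
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    static final Comparator<Box> BY_VALUE = (x, y) -> Integer.compare(x.value, y.value);

    // Builds a set of three boxes, mutates the smallest one, then looks it up.
    static boolean containsAfterMutation() {
        TreeSet<Box> set = new TreeSet<>(BY_VALUE);
        Box a = new Box(1);
        set.add(a);
        set.add(new Box(2));
        set.add(new Box(3));

        // The tree placed 'a' in the left branch because its value was 1.
        // After this change a lookup searches the right branch and misses it.
        a.value = 10;
        return set.contains(a);
    }

    public static void main(String[] args) {
        System.out.println(containsAfterMutation()); // false, although 'a' was never removed
    }
}
```

The element is still physically in the set and still shows up during iteration, but lookups and removals can no longer find it.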
