TreeSet Vs Tree - java

I have a few questions related to the Collections Framework's TreeSet that I am putting here.
Is the only functional difference between the TreeSet and ArrayList classes the constraint that elements are unique and sorted in TreeSet?
The prefix Tree creates confusion about whether to visualize a TreeSet as a hierarchical data structure or a linear one. Mathematical sets are linear data structures, while the name Tree in computing indicates a hierarchical one.
Is there really any similarity or relation between the tree data structure and Java's TreeSet, or is the name TreeSet just a coincidence?
I mean, it doesn't seem that a set would have anything to do with parent-child relationships.
EDIT - Looks like I was confused about what I was trying to ask, which got clarified after pondering the comments and answers. I guess my main question should have been "why is a mathematical set DS (sorted or unsorted) implemented via a tree?", and that is a duplicate of How to implement Set data structure?

Is the only functional difference between TreeSet and ArrayList
classes is constraint of unique elements and elements being sorted too
in TreeSet?
That is the major difference, apart from the internal implementation, and it enables TreeSet to provide methods like subSet, tailSet and headSet, which are not possible with an ArrayList.
Presence of prefix Tree creates a confusion about visualizing a
TreeSet as a hierarchical data structure or linear one. Mathematical
sets are linear data structures while name Tree in computing indicates
a hierarchical one.
Yes, it is a hierarchical structure. Internally, the implementation is a red-black binary tree.
Is there really any similarity / relation between Tree Data Structure
and Java's TreeSet or name TreeSet just a coincidence?
The internal implementation is a R-B binary tree.
On a side note, since these are two different data structures, the time complexity of TreeSet is completely different from that of ArrayList for the same set of operations. For example, add is amortized O(1) for ArrayList but O(log n) for TreeSet; search is O(n) for ArrayList and O(log n) for TreeSet, and so on.
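As a rough illustration of the points above (the class and method names are made up for this sketch), an ArrayList keeps duplicates in insertion order, while a TreeSet drops duplicates and keeps its elements sorted, which is what makes range views such as headSet and tailSet possible:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; the names are illustrative only.
class ListVersusTreeSet {

    // Copies the input into an ArrayList: duplicates and insertion order are kept.
    static List<Integer> toArrayList(Integer... values) {
        return new ArrayList<>(Arrays.asList(values));
    }

    // Copies the input into a TreeSet: duplicates are dropped, elements are sorted.
    static TreeSet<Integer> toTreeSet(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        System.out.println(toArrayList(5, 1, 3, 1)); // [5, 1, 3, 1]

        TreeSet<Integer> set = toTreeSet(5, 1, 3, 1);
        System.out.println(set);                     // [1, 3, 5]

        // Range views rely on the sort order, so a plain list has no equivalent.
        System.out.println(set.headSet(3));          // elements < 3:  [1]
        System.out.println(set.tailSet(3));          // elements >= 3: [3, 5]
    }
}
```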

TreeSet is a real tree; the name is not a coincidence.
So there are many differences from ArrayList.
For example, performance (I mean Big-O) is totally different.

In terms of usage it is just a Set, plus some extra goodies like having a definite sequence. However, it is internally implemented as a tree.
The naming convention here is similar as with HashSet, another Set internally implemented as a hash table.

Internally, TreeSet is stored as a tree structure, and this affects the complexity of its operations. Most operations take O(log n) steps on tree-based structures, whereas array-based structures run the most common read-only operations in constant time. HashSet, for instance, is backed by an array (a hash table) and gives expected constant-time access to its values.
They also provide different functionality. HashSet just stores elements. It behaves like a mathematical set, as you said, in a linear manner.
But TreeSet provides more operations: take a look at the NavigableSet and SortedSet interfaces it implements. Elements of a TreeSet are always sorted. At the same time, they need an ordering rule, supplied either by implementing the Comparable interface or by passing a separate Comparator object.
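To make the last point concrete, here is a small sketch of a TreeSet ordered by a separate Comparator, together with a few NavigableSet calls. The class name, the ordering rule (by length, then alphabetically) and the sample words are assumptions chosen only for illustration:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical demo class; not taken from the original answers.
class NavigableDemo {

    // Orders strings by length first, then alphabetically.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA = (a, b) ->
            a.length() != b.length()
                    ? Integer.compare(a.length(), b.length())
                    : a.compareTo(b);

    static TreeSet<String> byLength(String... words) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH_THEN_ALPHA);
        for (String word : words) {
            set.add(word);
        }
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> set = byLength("pear", "fig", "banana", "kiwi");
        System.out.println(set);                // [fig, kiwi, pear, banana]
        System.out.println(set.first());        // fig
        System.out.println(set.higher("kiwi")); // pear (next element after kiwi)
        System.out.println(set.lower("fig"));   // null (nothing sorts before fig)
    }
}
```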

Related

What's the basic (not immediate) backing data structure used by JAVA TreeSet?

So TreeSet uses TreeMap as its backing data structure (with dummy values corresponding to keys), and TreeMap in turn uses a red-black tree, which is a self-balancing BST.
Now what does this red-black tree use as a backing data structure? Is it an array or a linked list?
My understanding is that it's a linked list, because in TreeSet operations like .first() return the smallest value and not the root, and it has O(1) time complexity.
So basically it's a linked list along with a bunch of pointers for the least, greatest, root of the linked list, etc. Is my understanding correct?
It is neither an array nor a linked list. It is a tree of Java objects, which is distinct from either.
Look at, for example, the difference between the diagram of a linked list and a tree. They're fundamentally different.
The red-black tree you mention is the data structure. It does not have a "backing data structure."
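A simplified sketch of what "a tree of Java objects" means may help. The SimpleBst class below is hypothetical and unbalanced; a real red-black tree node would additionally carry a parent link and a colour, and would rebalance on insertion. Note how the smallest key is found by walking left links from the root. As far as I can tell, OpenJDK's TreeMap locates its first entry by the same kind of walk, so first() costs a number of steps proportional to the tree height rather than following a stored pointer to the minimum.

```java
import java.util.NoSuchElementException;

// Hypothetical, simplified (unbalanced) binary search tree for illustration.
class SimpleBst {

    static final class Node {
        final int key;
        Node left;   // subtree with smaller keys
        Node right;  // subtree with larger keys

        Node(int key) {
            this.key = key;
        }
    }

    Node root;

    void insert(int key) {
        if (root == null) {
            root = new Node(key);
            return;
        }
        Node current = root;
        while (true) {
            if (key < current.key) {
                if (current.left == null) { current.left = new Node(key); return; }
                current = current.left;
            } else if (key > current.key) {
                if (current.right == null) { current.right = new Node(key); return; }
                current = current.right;
            } else {
                return; // duplicate keys are ignored, as in a set
            }
        }
    }

    // The smallest key lives in the leftmost node, which is not the root in general.
    int first() {
        if (root == null) {
            throw new NoSuchElementException();
        }
        Node current = root;
        while (current.left != null) {
            current = current.left;
        }
        return current.key;
    }
}
```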

Fast LinkedList search and delete in java

I am using Java's LinkedList in my project. I have to build a delete function that removes an element with a specified unique id (id is a field in my class) from the LinkedList. As per the official Java documentation, were I to use LinkedList.remove, the runtime would be O(n), as the process happens in two steps: a linear search with a runtime of O(n), followed by the actual delete, which takes O(1).
In an attempt to speed things up, I wanted to use a binary tree for lookup, where each node in the tree is (id, reference to the node in the linkedlist). I am not exactly sure how to implement this in Java. In C/C++, one could just store a pointer as reference to the node in the linkedlist.
==
If you are wondering why I have to use LinkedList, it's because I am building an order-matching engine for exchanges. LinkedList offers superior runtime as far as insert is concerned. I am also using insertion sort to keep prices in the orderbook sorted. Priority queue does not suit my needs because I have to show the sorted order book in real time.
Have you seen the video of Stroustrup's conference talk where he showed that you should use std::vector unless you have measured a performance benefit of not using std::vector? He showed that std::vector is almost always the correct collection to use, and showed that it is faster than linked list even when inserting and deleting in the middle.
Now translate that to Java: use ArrayList unless you have measured better performance with something else.
Why is that? With modern processor architectures, there is a locality benefit: elements that you compare together, elements that you process together, are all stored next to each other in memory and are likely to be in the CPU's cache at the same time. This allows them to be fetched and written to much faster than when they're in main memory. This is not the case with a linked list, where elements are allocated individually and spread all over the place. (This locality benefit is much more pronounced in C++ where you have the actual objects next to each other, but it's still valid to a smaller extent in Java, where you have the references next to each other, albeit not the actual objects.)
Now with ArrayList, you can keep the orders sorted by price, and use binary search to insert an order in the right place.
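A minimal sketch of that idea, assuming prices are plain long values (the class and method names are hypothetical). Collections.binarySearch finds the position in O(log n); the insertion itself still shifts the tail of the array, which is O(n) but done as one bulk copy:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; prices modelled as long values for the sake of the sketch.
class SortedPrices {

    // Inserts price so that the list stays sorted in ascending order.
    static void insertSorted(List<Long> prices, long price) {
        int pos = Collections.binarySearch(prices, Long.valueOf(price));
        if (pos < 0) {
            // Not found: binarySearch returned -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        prices.add(pos, price);
    }

    public static void main(String[] args) {
        List<Long> prices = new ArrayList<>();
        for (long p : new long[] {105, 101, 103, 101}) {
            insertSorted(prices, p);
        }
        System.out.println(prices); // [101, 101, 103, 105]
    }
}
```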
If your performance measurement shows that LinkedList is preferable, then unfortunately Java doesn't give you access to the internal representation – the actual nodes – of the LinkedList, so you'll have to homebrew your own list.
Why are you using a List?
If you have a unique id for each object, why not put it in a Map with the id as the key? If you choose HashMap as the implementation, removal is O(1). If you use LinkedHashMap, you can preserve insertion order as well.
LinkedList insertion is superior to....what?
HashMap get/put complexity
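The Map suggestion could look roughly like this. OrdersById and its String payload are placeholders invented for the sketch; a real order type would replace the String:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper; the String payload stands in for a real order object.
class OrdersById {

    // id -> order; LinkedHashMap iterates in insertion order.
    private final Map<Integer, String> orders = new LinkedHashMap<>();

    void add(int id, String order) {
        orders.put(id, order);
    }

    // Removes by id in expected constant time; returns the removed order or null.
    String remove(int id) {
        return orders.remove(id);
    }

    // Remaining orders, in the order they were inserted.
    String listing() {
        return orders.values().toString();
    }

    public static void main(String[] args) {
        OrdersById book = new OrdersById();
        book.add(7, "buy 10");
        book.add(3, "sell 5");
        book.add(9, "buy 2");
        System.out.println(book.remove(3)); // sell 5
        System.out.println(book.listing()); // [buy 10, buy 2]
    }
}
```

Note that this keeps insertion order, not price order, so it only fits if that is the order you need to display.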
You can easily solve this by having a small change.
First have an object that has your value and id as fields
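class MyElement implements Comparable<MyElement> {
    int id, value;
    // Sort by value; ties are broken by id, because TreeSet treats elements
    // whose compareTo() returns 0 as duplicates
    @Override
    public int compareTo(MyElement other) {
        int byValue = Integer.compare(value, other.value);
        return byValue != 0 ? byValue : Integer.compare(id, other.id);
    }
    // Also override equals() to compare ids and hashCode() to return the id
}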
Now use a TreeSet to store these objects.
In this data structure the incoming objects are kept sorted, and insertion and deletion both take O(log n) time.
To preserve order by id and to get good performance use TreeMap. Put, remove and get operations will be O(log n).
EDIT:
For preserving order of insertion of elements for each id you can use TreeMap < Integer, ArrayList < T > >, i.e. for each id you can save elements with particular id in list in order of insertion.
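A short sketch of the TreeMap < Integer, ArrayList < T > > idea, with T fixed to String for brevity. GroupedById is a made-up name; computeIfAbsent creates the per-id list the first time an id is seen:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical demo class; String stands in for the element type T.
class GroupedById {

    // Keys are kept sorted by id; each list keeps its elements in insertion order.
    private final TreeMap<Integer, List<String>> byId = new TreeMap<>();

    void add(int id, String element) {
        byId.computeIfAbsent(id, k -> new ArrayList<>()).add(element);
    }

    TreeMap<Integer, List<String>> contents() {
        return byId;
    }

    public static void main(String[] args) {
        GroupedById groups = new GroupedById();
        groups.add(2, "b1");
        groups.add(1, "a1");
        groups.add(2, "b2");
        System.out.println(groups.contents()); // {1=[a1], 2=[b1, b2]}
    }
}
```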

What is the Big-O of a TreeSet with a comparator?

I'm using it to sort objects based on particular attributes of each object (e.g. date and quantity). When I attach a comparator to a TreeSet, what kind of performance will I get when I add N values to it and let it self-sort?
From the documentation for TreeSet, emphasis mine:
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
Your choice of comparator does not matter assuming your comparator is O(1) with respect to the size of the set (which it generally is).
The number of values you are inserting is not relevant to the complexity analysis of the insert operation itself.
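For illustration, here is a sketch of a TreeSet with a comparator on date and quantity. The Order class and its fields are assumptions based on the attributes mentioned in the question. Each comparison is O(1), so each add is O(log n), and adding N values costs O(N log N) overall. One thing to keep in mind: two elements that the comparator considers equal (same date and same quantity here) count as duplicates, and only the first is kept.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical example types; the field names are assumptions for this sketch.
class OrderSorting {

    static final class Order {
        final LocalDate date;
        final int quantity;

        Order(LocalDate date, int quantity) {
            this.date = date;
            this.quantity = quantity;
        }

        @Override
        public String toString() {
            return date + " x" + quantity;
        }
    }

    // Constant-time comparison: by date first, then by quantity.
    static final Comparator<Order> BY_DATE_THEN_QUANTITY =
            Comparator.comparing((Order o) -> o.date).thenComparingInt(o -> o.quantity);

    // Adds every order to a TreeSet; each add is O(log n).
    static TreeSet<Order> sorted(Order... orders) {
        TreeSet<Order> set = new TreeSet<>(BY_DATE_THEN_QUANTITY);
        for (Order order : orders) {
            set.add(order);
        }
        return set;
    }
}
```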

When do you know when to use a TreeSet or LinkedList?

What are the advantages of each structure?
In my program I will be performing these steps and I was wondering which data structure above I should be using:
Taking in an unsorted array and adding its elements to a sorted structure.
Traversing through sorted data and removing the right one
Adding data (never removing) and returning that structure as an array
When do you know when to use a TreeSet or LinkedList? What are the advantages of each structure?
In general, you decide on a collection type based on the structural and performance properties that you need it to have. For instance, a TreeSet is a Set, and therefore does not allow duplicates and does not preserve insertion order of elements. By contrast a LinkedList is a List and therefore does allow duplicates and does preserve insertion order. On the performance side, TreeSet gives you O(logN) insertion and deletion, whereas LinkedList gives O(1) insertion at the beginning or end, and O(N) insertion at a selected position or deletion.
The details are all spelled out in the respective class and interface javadocs, but a useful summary may be found in the Java Collections Cheatsheet.
In practice though, the choice of collection type is intimately connected to algorithm design. The two need to be done in parallel. (It is no good deciding that your algorithm requires a collection with properties X, Y and Z, and then discovering that no such collection type exists.)
In your use-case, it looks like TreeSet would be a better fit. There is no efficient way (i.e. better than O(N^2)) to sort a large LinkedList that doesn't involve turning it into some other data structure to do the sorting. There is no efficient way (i.e. better than O(N)) to insert an element into the correct position in a previously sorted LinkedList. The third part (copying to an array) works equally well with a LinkedList or TreeSet; it is an O(N) operation in both cases.
[I'm assuming that the collections are large enough that the big O complexity predicts the actual performance accurately ... ]
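The three steps from the question might be sketched with a TreeSet as follows. SortedPipeline and its parameters are hypothetical, and the sketch assumes duplicates do not need to be kept, since a TreeSet drops them:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical sketch of the three steps; assumes duplicate values may be dropped.
class SortedPipeline {

    static Integer[] process(Integer[] unsorted, int toRemove, Integer... extra) {
        // Step 1: load the unsorted array into a sorted structure, O(N log N).
        TreeSet<Integer> sorted = new TreeSet<>(Arrays.asList(unsorted));

        // Step 2: traverse in sorted order and remove the matching element
        // through the iterator, which is safe during iteration.
        for (Iterator<Integer> it = sorted.iterator(); it.hasNext(); ) {
            if (it.next() == toRemove) {
                it.remove();
                break;
            }
        }

        // Step 3: add more data, then return the contents as an array, O(N).
        sorted.addAll(Arrays.asList(extra));
        return sorted.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        Integer[] result = process(new Integer[] {4, 1, 3}, 3, 2, 7);
        System.out.println(Arrays.toString(result)); // [1, 2, 4, 7]
    }
}
```

If the element to remove is known by value, sorted.remove(value) does the same in O(log n) without a traversal.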
The genuine power and advantage of TreeSet lies in the interface it implements: NavigableSet.
Why is it so powerful, and in which cases?
The NavigableSet interface adds, for example, these three useful methods:
headSet(E toElement, boolean inclusive)
tailSet(E fromElement, boolean inclusive)
subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive)
These methods make it possible to build very fast search algorithms.
Example: we need to find all the names that sort from "Milla" (inclusive) up to "Wladimir" (exclusive):
TreeSet<String> authors = new TreeSet<String>();
authors.add("Andreas Gryphius");
authors.add("Fjodor Michailowitsch Dostojewski");
authors.add("Alexander Puschkin");
authors.add("Ruslana Lyzhichko");
authors.add("Wladimir Klitschko");
authors.add("Andrij Schewtschenko");
authors.add("Wayne Gretzky");
authors.add("Johann Jakob Christoffel");
authors.add("Milla Jovovich");
authors.add("Taras Schewtschenko");
System.out.println(authors.subSet("Milla", "Wladimir"));
output:
[Milla Jovovich, Ruslana Lyzhichko, Taras Schewtschenko, Wayne Gretzky]
TreeSet doesn't go over all the elements; it locates the first and last elements of the range and returns a view of the elements in between, backed by the original set.
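According to the NavigableSet Javadoc, subSet returns a view backed by the original set rather than a copy. The following sketch (class name and sample names are made up) uses the four-argument form with both bounds inclusive and shows that a later addition to the set shows up in the range:

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical demo class for the four-argument subSet.
class SubSetView {

    static TreeSet<String> names() {
        TreeSet<String> names = new TreeSet<>();
        names.add("Anna");
        names.add("Milla");
        names.add("Taras");
        names.add("Wladimir");
        return names;
    }

    public static void main(String[] args) {
        TreeSet<String> names = names();

        // Both bounds inclusive this time.
        NavigableSet<String> range = names.subSet("Milla", true, "Wladimir", true);
        System.out.println(range); // [Milla, Taras, Wladimir]

        // The range is a live view: changes to the set are visible through it.
        names.add("Ruslana");
        System.out.println(range); // [Milla, Ruslana, Taras, Wladimir]
    }
}
```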
TreeSet:
TreeSet uses a red-black tree underneath, so the set can be thought of as a dynamic search tree. When you need a structure that is read and written frequently and must also stay ordered, TreeSet is a good choice.
If you want to keep it sorted and it's append-mostly, TreeSet with a Comparator is your best bet. With a LinkedList, the code would have to traverse the list from the beginning to decide where to place an item. Finding a position in a LinkedList is O(n), whereas TreeSet is O(log(n)) for the basic operations.
The most important point when choosing a data structure is its inherent limitations. For example, if you use a TreeSet to store objects and, at run time, your algorithm changes attributes of these objects that affect comparisons while the objects are elements of the set, get ready for some strange bugs.
The Javadoc for the Set interface states that:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interface Set Java Doc
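A small sketch of the kind of bug this warning is about. Box is a hypothetical mutable element ordered by its value field. The behaviour after the mutation is unspecified in general; with OpenJDK's red-black tree and this particular insertion order, the lookup descends the wrong branch and no longer finds the element, even though it is still stored in the set:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical demo of mutating an element's sort key while it is in a TreeSet.
class MutableElementPitfall {

    static final class Box {
        int value;

        Box(int value) {
            this.value = value;
        }
    }

    // Returns whether the set still finds 'a' after its sort key was changed.
    static boolean foundAfterMutation() {
        Comparator<Box> byValue = Comparator.comparingInt(b -> b.value);
        TreeSet<Box> set = new TreeSet<>(byValue);

        Box a = new Box(1);
        set.add(a);
        set.add(new Box(2));
        set.add(new Box(3));

        // The tree was built for value 1; the lookup now searches for 10.
        a.value = 10;
        return set.contains(a);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

A safer pattern is to remove the element, change it, and add it back.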

confusing java data structures

Maybe the title is not appropriate, but I couldn't think of any other at the moment. My question is: what is the difference between LinkedList and ArrayList, or HashMap and THashMap?
Is there a tree structure already available for Java (e.g. AVL, red-black), balanced or not balanced (linked list)? If this kind of question is not appropriate for SO, please let me know and I will delete it. Thank you.
ArrayList and LinkedList are implementations of the List abstraction. The first holds the elements of the list in an internal array which is automatically reallocated as necessary to make space for new elements. The second constructs a doubly linked list of holder cells, each of which refers to a list element. While the respective operations have identical semantics, they differ considerably in performance characteristics. For example:
The get(int) operation on an ArrayList takes constant time, but it takes time proportional to the length of the list for a LinkedList.
Removing an element via the Iterator.remove() takes constant time for a LinkedList, but it takes time proportional to the length of the list for an ArrayList.
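The two operations can be sketched as follows (the class and method names are illustrative). The same code works for both list types because it only uses the List interface; what differs is the cost: each Iterator.remove() is constant time on a LinkedList but shifts the remaining elements on an ArrayList, while get(int) is constant time on an ArrayList but walks the links on a LinkedList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; works with any List implementation.
class RemoveWhileIterating {

    // Removes every even number in a single pass using Iterator.remove().
    static void removeEvens(List<Integer> numbers) {
        for (Iterator<Integer> it = numbers.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        List<Integer> array = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));

        removeEvens(linked); // constant time per removal
        removeEvens(array);  // each removal shifts the tail of the array

        System.out.println(linked);       // [1, 3, 5]
        System.out.println(array.get(1)); // 3, by direct index into the array
    }
}
```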
The HashMap and THashMap are both implementations of the Map abstraction that use hash tables. The difference is in the form of hash table data structure used in each case. The HashMap class uses closed addressing, which means that each bucket in the table points to a separate linked list of elements. The THashMap class uses open addressing, which means that elements that hash to the same bucket are stored in the table itself. The net result is that THashMap uses less memory and is faster than HashMap for most operations, but is much slower if you need the map's set of key/value pairs.
For more detail, read a good textbook on data structures. Failing that, look up the concepts in Wikipedia. Finally, take a look at the source code of the respective classes.
Read the API docs for the classes you have mentioned. The collections tutorial also explains the differences fairly well.
java.util.TreeMap is based on a red-black tree.
Regarding the lists:
Both comply with the List interface, but their implementation is different, and they differ in the efficiency of some of their operations.
ArrayList is a list stored internally as an array. It has the advantage of random access, but a single item addition is not guaranteed to run in constant time. Also, removal of items is inefficient.
A LinkedList is implemented as a doubly connected linked list. It does not support random access, but removing an item while iterating through it is efficient.
As I remember, both LinkedList and ArrayList are lists, but they have different internal implementations.
