Dealing with identical data in a binary search tree - java

I'm in the process of teaching myself data structures and I am currently working on a binary search tree. I was curious how you would sort the tree if you had identical data. For example, say that my data consists of [4,6,2,8,4,5,7,3].
I set 4 as the root element
put 6 to the right of it
put 2 to the left of 4
put 8 to the right of 6
Then I get to the second 4. Where do I put it, since 4 = 4? To the left or the right?
Option #1
Option #2
Is either of these correct, or are they both wrong? If they are both wrong, could you show me how they should be sorted? Thanks!

Usually binary search trees do not allow duplicate data. If you write a custom implementation, you can store a count with each element instead. TreeSet in Java is an example: it contains only unique elements.
Actually, the cases you listed break the structure of the tree. Search operations become awkward and can no longer be performed in O(log n). They take O(n) in the worst case, so you lose the benefits of this data structure.
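The counting approach mentioned above could look roughly like this. It is only a minimal, unbalanced sketch; the class name CountingBst and its methods are made up for illustration and do not come from any library.

```java
// Hypothetical counting BST: every distinct key is stored once, with a counter.
final class CountingBst {
    private static final class Node {
        final int key;
        int count = 1;      // how many times this key has been inserted
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    /** Inserts a key; a duplicate only increments the counter of the existing node. */
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node cur = root;
        while (true) {
            if (key == cur.key) { cur.count++; return; }
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return; }
                cur = cur.right;
            }
        }
    }

    /** Returns how many times the key was inserted, or 0 if it is absent. */
    int count(int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return cur.count;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return 0;
    }

    public static void main(String[] args) {
        CountingBst tree = new CountingBst();
        for (int key : new int[] {4, 6, 2, 8, 4, 5, 7, 3}) tree.insert(key);
        System.out.println(tree.count(4)); // prints 2
    }
}
```

With the data from the question, the second 4 never creates a new node, so the shape of the tree is the same as for the unique keys alone.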

If this is a sort-tree, then what you have will work fine, either way; in the end you'll do a tree-walk and dump the data.
If this is a search-tree, then I'd just drop the extra (redundant) data once it's been encountered; "it exists". You did say this is a search-tree, and while not ideal, it's not actually broken: if you search for "4" you'll simply catch the root node (in this case), and never descend below that to see any other "4". It isn't optimal, having all the extra numbers around.
There will be best-case and worst-case situations regardless of which way you choose; don't worry too much about left/right decisions, as it generally doesn't matter. If you have a solid grasp of the details of a known data stream, you'd be able to make an optimal decision for that specific case.
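For the sort-tree case, here is a small sketch of "insert everything, then walk the tree and dump the data". The class name DuplicateTreeSort is invented for this example, and sending equal keys to the right is an arbitrary choice; sending them left would give the same output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree sort that keeps duplicates as separate nodes.
final class DuplicateTreeSort {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Keys smaller than the node go left; equal or larger keys go right.
    private static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else node.right = insert(node.right, key);
        return node;
    }

    // In-order walk: left subtree, node, right subtree.
    private static void inOrder(Node node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.key);
        inOrder(node.right, out);
    }

    static List<Integer> sort(int[] data) {
        Node root = null;
        for (int key : data) root = insert(root, key);
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(new int[] {4, 6, 2, 8, 4, 5, 7, 3}));
        // prints [2, 3, 4, 4, 5, 6, 7, 8]
    }
}
```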

Related

Name this collection

This question is language-agnostic (although it assumes one that is both procedural and OO).
I'm having trouble finding if there is a standard name for a collection with the following behavior:
-Fixed-capacity of N elements, maintaining insertion order.
-Elements are added to the 'Tail'
-Whenever an item is added, the head of the collection is returned (FIFO), although not necessarily removed.
-If the collection now contains more than N elements, the Head is removed - otherwise it remains in the collection (now having advanced one step further towards its ultimate removal).
I often use this structure to keep a running count - i.e. the frame length of the past N frames, so as to provide 'moving window' across which I can average, sum, etc.
Sounds very similar to a circular buffer to me; with the exception that you are probably under-defining or over constraining the add / remove behavior.
Note that there are two "views" of a circular buffer. One is the layout view, which has a section of memory being written to with "head" and "tail" indexes and a bit of logic to "wrap" around when the tail is "before" the head. The other is a "logical" view where you have a queue that's not exposing how it is being laid out, but definitely has a limited number of slots which it can "grow to".
Within the context of doing computation, there is a very long standing project that I love (although the CLI is a bit foreign if you're not used to such things). It's called the RoundRobinDatabase, where each database stores exactly N copies of a single value (providing graphs, averages, etc). It adjusts the next bin based on a number of parameters, but most often it advances bins based on time. It's often the tool behind a large number of network throughput graphs, and it has configurable bin collision resolution, etc.
In general, algorithms that are sensitive to the last "some number" of entries are often called "sliding box" algorithms, but that's focusing on the algorithm and not on the data structure :)
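To make the layout view concrete, here is a rough sketch of the collection described in the question on top of a fixed array. The class name SlidingWindow and the running sum are assumptions made for this example (the sum is there for the "average over the last N frames" use case); this is one possible reading of the add/return behaviour, not a standard class.

```java
// Hypothetical fixed-capacity window backed by a circular array.
final class SlidingWindow {
    private final long[] slots;
    private int head;   // index of the oldest element
    private int size;   // number of elements currently stored
    private long sum;   // running sum of the stored elements

    SlidingWindow(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        slots = new long[capacity];
    }

    /**
     * Adds a value at the tail and returns the head (oldest element).
     * The head is removed only when the window was already full.
     */
    long add(long value) {
        if (size < slots.length) {
            slots[(head + size) % slots.length] = value;
            size++;
            sum += value;
            return slots[head];            // head stays in the window
        }
        long evicted = slots[head];
        slots[head] = value;               // the tail wraps onto the freed slot
        head = (head + 1) % slots.length;  // the next-oldest element becomes the head
        sum += value - evicted;
        return evicted;
    }

    long sum() { return sum; }

    double average() { return size == 0 ? 0.0 : (double) sum / size; }

    public static void main(String[] args) {
        SlidingWindow frames = new SlidingWindow(3);
        for (long ms : new long[] {1, 2, 3, 4, 5}) frames.add(ms);
        System.out.println(frames.sum());     // prints 12 (the window holds 3, 4, 5)
    }
}
```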
The programming riddle sounds like a circular linked list to me.
Well, all of these descriptions fit, don't they?
• Fixed-capacity of N elements, maintaining insertion order.
• Elements are added to the 'Tail'
• Whenever an item is added, the head of the collection is returned (FIFO), although not necessarily removed.
This link with source codes for counting frames probably helps too: frameCounter

Quadtree performance

Does anyone know where I can find documentation on, or know, how many operations insertions and queries take in a quadtree?
The wiki says O(log n), but I found another source saying O(n log n), and I need to know which is true.
I'm working with a point quadtree
http://www.codeproject.com/Articles/30535/A-Simple-QuadTree-Implementation-in-C
http://en.wikipedia.org/wiki/Quadtree
Search: O(log n), assuming a reasonably balanced tree: it must traverse the tree from the root down to the element. To be specific, the log in this case is log_4, as each node has 4 children.
Insert (single point): O(log n): you must traverse the tree to find the insertion location, then do a small constant amount of work to split the points in that quadrant.
Insert (n points): O(n log n): every point must be inserted, leading to n log n. I hope this is what the other site you read meant by n log n; otherwise they would be very wrong.
The original paper is called "Quad trees a data structure for retrieval on composite keys" by Finkel and Bentley.
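As a rough illustration of why a single insert or search costs one step per tree level, here is a minimal point quadtree sketch. The class name PointQuadtree, the quadrant numbering, and the convention that insert returns the number of nodes visited are all choices made for this example. Note that nothing here rebalances the tree, so the O(log n) figure only holds when the points arrive in an order that keeps it shallow; points inserted in order along a diagonal would form a chain.

```java
// Hypothetical, unbalanced point quadtree: every node stores one point
// and splits the plane into four quadrants around it.
final class PointQuadtree {
    private static final class Node {
        final double x, y;
        final Node[] child = new Node[4];   // one slot per quadrant
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    private Node root;

    // Quadrant of (x, y) relative to node n: bit 0 set = east, bit 1 set = north.
    private static int quadrant(Node n, double x, double y) {
        return (x >= n.x ? 1 : 0) | (y >= n.y ? 2 : 0);
    }

    /** Inserts a point and returns how many existing nodes were visited on the way down. */
    int insert(double x, double y) {
        if (root == null) { root = new Node(x, y); return 0; }
        Node cur = root;
        int visited = 0;
        while (true) {
            visited++;
            if (cur.x == x && cur.y == y) return visited;   // point already stored
            int q = quadrant(cur, x, y);
            if (cur.child[q] == null) { cur.child[q] = new Node(x, y); return visited; }
            cur = cur.child[q];
        }
    }

    /** Follows exactly one child per level, so the cost is the depth of the tree. */
    boolean contains(double x, double y) {
        Node cur = root;
        while (cur != null) {
            if (cur.x == x && cur.y == y) return true;
            cur = cur.child[quadrant(cur, x, y)];
        }
        return false;
    }

    public static void main(String[] args) {
        PointQuadtree tree = new PointQuadtree();
        tree.insert(5, 5);
        tree.insert(2, 2);
        tree.insert(8, 8);
        System.out.println(tree.contains(8, 8));   // prints true
        System.out.println(tree.insert(1, 1));     // prints 2 (visits (5,5) and (2,2))
    }
}
```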

Comparing two B-trees to see if they contain the same values

Seeing that two B-trees could have the same values, yet a different shape, is there an algorithm to go through the values and check whether both trees have the same keys?
The point is to be able to bail out if they contain different keys (as soon as possible).
A recursive algorithm probably won't work unless you are performing a lookup in both B-trees at the same time, I'm guessing.
I've seen algorithms that traverse a B-tree, but I don't want to traverse both and then compare the keys. I want something smarter that will bail out as early as possible if there is a difference.
Basically the function returns true/false.
The fundamental technique is to somehow have an object that represents the current point in the in-order traversal. Once you have two of those, one for each instance of the tree, you just keep pumping them for the next key, and the first time the two return a different next key, you're done.
In C# you'd use yield return to make a traversal that yields up a single key at a time, and keeps track of where it is in the tree. You can then pass two of those to SequenceEqual, and it will bail out as soon as it encounters the first difference. In Java you'd have to build that mechanism yourself, but it's not that hard to do.
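In Java, the "object that represents the current point in the in-order traversal" can be an Iterator that keeps an explicit stack. The sketch below uses a plain binary node as a stand-in for a B-tree node to keep it short; the names TreeKeyCompare, Node, InOrder and sameKeys are made up for this example, and a real B-tree iterator would also need to track a position inside each multi-key node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class TreeKeyCompare {
    /** Hypothetical binary node, used here as a simplified stand-in for a B-tree node. */
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** In-order iterator: the stack holds the current position in the traversal. */
    static final class InOrder implements Iterator<Integer> {
        private final Deque<Node> stack = new ArrayDeque<>();

        InOrder(Node root) { pushLeftSpine(root); }

        private void pushLeftSpine(Node n) {
            for (; n != null; n = n.left) stack.push(n);
        }

        @Override public boolean hasNext() { return !stack.isEmpty(); }

        @Override public Integer next() {
            if (stack.isEmpty()) throw new NoSuchElementException();
            Node n = stack.pop();
            pushLeftSpine(n.right);   // the successor is the leftmost node of the right subtree
            return n.key;
        }
    }

    /** Pumps both iterators in lockstep and stops at the first differing key. */
    static boolean sameKeys(Node a, Node b) {
        Iterator<Integer> ia = new InOrder(a);
        Iterator<Integer> ib = new InOrder(b);
        while (ia.hasNext() && ib.hasNext()) {
            if (!ia.next().equals(ib.next())) return false;
        }
        return !ia.hasNext() && !ib.hasNext();   // one tree having leftover keys is also a difference
    }

    public static void main(String[] args) {
        Node balanced = new Node(2, new Node(1, null, null), new Node(3, null, null));
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        System.out.println(sameKeys(balanced, chain));   // prints true
    }
}
```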
Assuming you mean a b-tree then all you need to do is iterate over both at once. Any deviation between either iterator will prove that their contents differ. It is unlikely you will find a better algorithm than that without collecting more details as you build the trees.
If you are not talking about the b-tree which is described as:
... a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
then you need to sort it first then traverse it.

Usage examples of binary search

I just realized that in my 4+ years of Java programming (mostly desktop apps) I never used the binary search methods in the Arrays class for anything practical. Not even once. Some reasons I can think of:
100% of the time you can get away with linear search, maps or something else that isn't binary search.
The incoming data is almost never sorted, and making it sorted requires an extra sorting step.
So I wonder if it's just me, or do a lot of people never use binary search? And what are some good, practical usage examples of binary search?
On the desktop, you're probably just dealing with the user's data, which might not be all that big. If you are querying over very large datasets, shared by many users, then it can be a different matter. A lot of people don't necessarily deal with binary search directly, but anyone using a database is probably using it implicitly. If you use AppEngine, for example, datastore queries almost certainly use binary search.
I would say it boils down to this:
If we are going to do a binary search, we're going to have a key to search by. If we have a key, we're probably using a map instead of an array.
There's another very important thing to keep in mind:
Binary search is a clear-cut example of how thinking like a good programmer is very different than thinking like a normal person. It's one of those cognitive leaps that makes you really think about taking operations that are traditionally done (when done by humans) in order-n time and taking it down to order-lg-n time. And that makes it very, very useful even if it's never used in production code.
I hardly ever, if ever use a binary search.
But I would if:
I needed to search the same list multiple times
the list was long enough to have a performance problem (although I'm often guilty of micro-optimization)
However, I often use hash tables / dictionaries for fast lookups.
For production code on my day job, a Set or Map is always good enough so far.
For algorithmic problems that I solve for fun, binary search is a very useful technique. For starters, if the set of elements never changes (i.e. you are never going to insert or delete elements in the set being queried) a Map/Set has no advantage over binary search, and a binary search over a simple array avoids a lot of the overhead associated with querying a more complex data structure. In many cases I have seen it to be actually faster than a HashMap.
Binary search is also a more general technique than just querying for membership in a set. Binary search can be performed on any monotone function to find a value for which the function satisfies a certain criterion. You can find a more detailed explanation here. But as I said, my line of work does not bring up enough computationally involved problems for this to be applicable.
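Here is a small sketch of binary search over a monotone predicate rather than an array. The helper name firstTrue is invented for this example; it assumes the predicate is false for every value below some threshold and true from there on, and that hi is smaller than Long.MAX_VALUE so that hi + 1 does not overflow.

```java
import java.util.function.LongPredicate;

final class MonotoneSearch {
    /**
     * Returns the smallest x in [lo, hi] for which pred is true, assuming pred
     * switches from false to true at most once; returns hi + 1 if it is never true.
     */
    static long firstTrue(long lo, long hi, LongPredicate pred) {
        long result = hi + 1;
        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            if (pred.test(mid)) {
                result = mid;                // mid works, look for something smaller
                hi = mid - 1;
            } else {
                lo = mid + 1;                // mid fails, the answer must be larger
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Smallest integer whose square is at least 50.
        System.out.println(firstTrue(0, 1_000_000, x -> x * x >= 50));   // prints 8
    }
}
```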
Assume you have to search for an element in a list.
You could use linear search, and you'll get O(n).
Alternatively, you could sort the list with a fast algorithm, O(n log n), and then binary search it in O(log n). You'll get O(n log n + log n) in total.
That means binary search pays off on large lists when the sorting cost is spread over many lookups; for a single lookup, the linear search is cheaper. It also depends on the data structure behind the list. If the list is a linked list, binary search is bad practice.
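A short sketch of the sort-then-search pattern using the Arrays class from the question. The wrapper class SortedLookup is hypothetical; Arrays.sort and Arrays.binarySearch are the standard java.util.Arrays methods. The sort is paid once in the constructor so that each later lookup is only O(log n).

```java
import java.util.Arrays;

// Hypothetical read-only lookup table: sort once, binary search many times.
final class SortedLookup {
    private final int[] sorted;

    SortedLookup(int[] data) {
        sorted = data.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);     // O(n log n), paid once
    }

    /** O(log n) per lookup; binarySearch returns a negative value when the key is absent. */
    boolean contains(int key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        SortedLookup lookup = new SortedLookup(new int[] {4, 6, 2, 8, 4, 5, 7, 3});
        System.out.println(lookup.contains(5));   // prints true
        System.out.println(lookup.contains(9));   // prints false
    }
}
```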

Data structures: Which should I use for these conditions?

This shouldn't be a difficult question, but I'd just like someone to bounce it off of before I continue. I simply need to decide what data structure to use based on these expected activities:
Will need to frequently iterate through in sorted order (starting at the head).
Will need to remove/restore arbitrary elements from the/a sorted view.
Later I'll be frequently resorting the data and working with multiple sorted views.
Also later I'll be frequently changing the position of elements within their sorted views.
This is in Java, by the way.
My best guess is that I'll either be rolling some custom Linked Hash Set (to arrange the links in sorted order) or possibly just using a Tree Set. But I'm still not completely sure yet. Recommendations?
Edit: I guess because of the arbitrary remove/restore, I should probably stick with a Tree Set, right?
Actually, not necessarily. Hmmm...
In theory, I'd say the right data structure is a multiway tree - preferably something like a B+ tree. Traditionally this is a disk-based data structure, but modern main memory has a lot of similar characteristics due to layers of cache and virtual memory.
In-order iteration of a B+ tree is very efficient because (1) you only iterate through the linked-list of leaf nodes - branch nodes aren't needed, and (2) you get extremely good locality.
Finding, removing and inserting arbitrary elements is log(n) as with any balanced tree, though with different constant factors.
Resorting within the tree is mostly a matter of choosing an algorithm that gives good performance when operating on a linked list of blocks (the leaf nodes), minimising the need to use branch nodes; variants of quicksort or mergesort seem like likely candidates. Once the items are sorted in the leaf nodes, just propagate the summary information back through the branch nodes.
BUT - pragmatically, this is only something you'd do if you're very sure that you need it. Odds are good that you're better off using some standard container. Algorithm/data structure optimisation is the best kind of optimisation, but it can still be premature.
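As a sketch of the "standard container" route, the requirements from the question can be approximated with one java.util.TreeSet per sorted view. The class name SortedViews and the two example orderings are assumptions made for illustration; the elements are plain strings to keep it short.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical holder of two sorted views over the same elements.
final class SortedViews {
    // View 1: natural (alphabetical) order.
    final TreeSet<String> byName = new TreeSet<>();

    // View 2: shortest first, ties broken alphabetically so distinct strings never compare as equal.
    final TreeSet<String> byLength = new TreeSet<>(
            Comparator.<String>comparingInt(String::length)
                      .thenComparing(Comparator.<String>naturalOrder()));

    /** Adds (or restores) an element in every view; O(log n) per view. */
    void add(String s) {
        byName.add(s);
        byLength.add(s);
    }

    /** Removes an arbitrary element from every view; O(log n) per view. */
    void remove(String s) {
        byName.remove(s);
        byLength.remove(s);
    }

    public static void main(String[] args) {
        SortedViews views = new SortedViews();
        views.add("pear");
        views.add("fig");
        views.add("banana");
        System.out.println(views.byName);     // prints [banana, fig, pear]
        System.out.println(views.byLength);   // prints [fig, pear, banana]
        views.remove("fig");
        System.out.println(views.byLength.first());   // prints pear
    }
}
```

Changing an element's position in a view means removing it, changing whatever the comparator looks at, and adding it again; mutating a sort key while the element is still inside a TreeSet leaves the set in an inconsistent state.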
The standard LinkedHashSet, or LinkedHashMultiset from Google Collections if you want your data structure to store non-unique values.
