At what level is a given element in a Java TreeSet? - java

Does anybody know a fast way to detect at what level a given element is in a TreeSet? By level, I mean the depth of this element in the tree, i.e. the number of its ancestors.
Background.
I use Java's TreeSet class to store my elements.
To compare two elements I need to compute some auxiliary information about them. I cannot store this auxiliary information for each element as it would take too much memory.
On the other hand, if I regenerate the auxiliary information for each comparison, my program is too slow.
When an element is inserted in the TreeSet, my current implementation computes the auxiliary information for the element it inserts and does not recompute it until the element has found its place in the TreeSet. Afterwards, the auxiliary information is discarded.
To speed up my program, I would like to store the auxiliary information also for the top levels of the TreeSet, as they are involved in many comparisons. So, after comparing two nodes, I would like to decide whether to keep or discard their auxiliary information based on their depth in the TreeSet.
Update.
I would also be grateful if somebody could suggest an alternate class implementing some kind of balanced trees (AVL trees, red/black trees, Splay trees, ...), and where one has access to the height of an element.

The exact depth of each node is simply not exposed by a TreeSet. You'd have to write your own if this is how you want to do it.
You might be able to do something like have each key in the set refer to a shared object that manages the auxiliary information. Each time compareTo() is invoked on a key, it would notify the manager to update its counter for that key. The manager would use these stats to decide which should maintain their auxiliary information.
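A rough sketch of the manager idea above might look like the following. The names `AuxManager`, `Key`, `hotThreshold` and `expensiveAux` are all hypothetical, and the arithmetic in `expensiveAux` is only a stand-in for the real auxiliary computation. Each key carries a small counter, and the manager keeps the auxiliary data only for keys that take part in many comparisons, which in practice tend to be the ones near the top of the tree.

```java
import java.util.TreeSet;

// Hypothetical shared manager: decides which keys keep their auxiliary data.
final class AuxManager {
    private final int hotThreshold; // comparisons before a key's data is kept
    int computations = 0;           // counts recomputations, for illustration only

    AuxManager(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    // Called from Key.compareTo() each time a key takes part in a comparison.
    long auxFor(Key key) {
        if (key.cachedAux != null) {
            return key.cachedAux; // hot key: reuse the stored data
        }
        key.compareCount++;
        computations++;
        long aux = expensiveAux(key.id);
        if (key.compareCount >= hotThreshold) {
            key.cachedAux = aux; // compared often enough: keep the data
        }
        return aux;
    }

    // Stand-in (assumption) for the real, costly computation.
    private static long expensiveAux(int id) {
        return 31L * id + 7;
    }
}

final class Key implements Comparable<Key> {
    final int id;
    final AuxManager manager;
    int compareCount; // cheap per-key statistic
    Long cachedAux;   // null unless the manager decided to keep it

    Key(int id, AuxManager manager) {
        this.id = id;
        this.manager = manager;
    }

    @Override
    public int compareTo(Key other) {
        return Long.compare(manager.auxFor(this), manager.auxFor(other));
    }
}
```

Keys built this way can go straight into a `TreeSet<Key>`. The threshold trades memory for speed, and a real version would probably also need a way to evict data for keys that stop being compared.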

Related

Implementing a traversal with dynamic depth

Is it possible to create a traversal in java using neo4j that keeps a state for the duration of the traversal?
For example, I need an Evaluator that is almost identical to toDepth(), except that the depth at the current node is based on another comparison. Say that you had a linked list with 20 items, and you wanted the 10th [valid] one, meaning that some of the items had a particular property flag excluding them from the count. So the final returned item might actually be the 12th in the Path.
The only efficient way I can think of doing this is being able to store some state variable that is accessible to each individual evaluation. Is that possible?
I understand that I could write my own custom traversing functions to do this, but it would be nice if I could build it into the Traversal Framework.

Most efficient way to keep a collection in Java?

Currently, I have a LinkedList which stores a custom Node class. The Nodes are currently removed in order and evaluated, which generally adds more Nodes back into the LinkedList, treating it like a Queue.
But in reality I don't care about maintaining the order of the Nodes because the order they are being added or removed doesn't matter. You can remove the 1st, 54th, or 1032nd Node from the List, it doesn't matter. All that matters is the Nodes are being processed quickly, which means one is removed (at random), mutated, then added back along with several variations of it (once again the order doesn't matter).
Since I haven't been able to find a Java Bag implementation, what is the most efficient way to maintain this type of collection?
PS Out of laziness I have avoided using arrays because the collection of nodes could theoretically range from 1 Node to 3^64 Nodes in size, though it's more likely to stay under a million.
The Java HashSet or TreeSet types might be good here, since they represent unordered collections of elements that support quick insertion and deletion of elements. That said, you can't possibly hold 3^64 values in memory, since that's approximately 3.4336838 × 10^30, a number vastly bigger than any amount of RAM that I know of can hold.
EDIT: Based on the described use case (support efficient insertion and removal of random elements), you might want to adopt the approach described in this older question for building a data structure that does just that. Intuitively, you would use an ArrayList, then remove elements by swapping them to the end of the ArrayList and removing them. This gives O(1) insertion and O(1) removal with extremely low overhead.
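A minimal sketch of that swap-to-the-end idea follows, assuming a hypothetical class name `RandomBag`. Insertion is amortized O(1) because the backing ArrayList occasionally has to grow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

// Unordered "bag" sketch: constant-time add and removal of a random element.
final class RandomBag<T> {
    private final List<T> items = new ArrayList<>();
    private final Random random;

    RandomBag(Random random) {
        this.random = random;
    }

    void add(T item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }

    boolean isEmpty() {
        return items.isEmpty();
    }

    // Removes and returns a randomly chosen element.
    T removeRandom() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("bag is empty");
        }
        int last = items.size() - 1;
        int i = random.nextInt(items.size());
        T picked = items.get(i);
        items.set(i, items.get(last)); // move the last element into the hole
        items.remove(last);            // removing the tail shifts nothing
        return picked;
    }
}
```

Since the order of processing does not matter here, always removing the last element (a plain stack) would be simpler still. The random pick is only there to match the use case as described.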
Hope this helps!

Comparing 2 b-trees to see if they contain the same values

Seeing that two b-trees could have the same values, yet a different shape, is there an algorithm to go through the values and compare if both trees have the same keys?
The point is to be able to bail out if they contain different keys (as soon as possible).
A recursive algorithm probably won't work unless you are performing a lookup in both b-trees at the same time, I'm guessing.
I've seen algorithms that traverse a b-tree, but I don't want to traverse both and then compare the keys; I want something smarter that will bail out as early as possible if there is a difference.
Basically the function returns true/false.
The fundamental technique is to somehow have an object that represents the current point in the in-order traversal. Once you have two of those, one for each instance of the tree, you just keep pumping them for the next key, and the first time the two return a different next key, you're done.
In C# you'd use yield return to make a traversal that yields up a single key at a time, and keeps track of where it is in the tree. You can then pass two of those to SequenceEqual, and it will bail out as soon as it encounters the first difference. In Java you'd have to build that mechanism yourself, but it's not that hard to do.
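In Java, the lockstep comparison described above could look roughly like this, assuming each tree can hand out an in-order `Iterator` over its keys. The class name `SortedCompare` is hypothetical, and `java.util.TreeSet` (a red-black tree, not a B-tree) is used in the usage note only as a stand-in for a tree with a sorted iterator.

```java
import java.util.Iterator;
import java.util.Objects;

final class SortedCompare {
    // Returns true if both iterators yield equal elements in the same order.
    // Stops at the first mismatch, so differing trees are rejected early.
    static <T> boolean sameSequence(Iterator<? extends T> a, Iterator<? extends T> b) {
        while (a.hasNext() && b.hasNext()) {
            if (!Objects.equals(a.next(), b.next())) {
                return false;
            }
        }
        // Equal only if both ran out at the same time.
        return !a.hasNext() && !b.hasNext();
    }
}
```

With two `TreeSet<Integer>` instances `left` and `right`, the call would be `SortedCompare.sameSequence(left.iterator(), right.iterator())`. If the trees expose their sizes cheaply, comparing sizes first is an even earlier way to bail out.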
Assuming you mean a b-tree then all you need to do is iterate over both at once. Any deviation between either iterator will prove that their contents differ. It is unlikely you will find a better algorithm than that without collecting more details as you build the trees.
If you are not talking about the b-tree which is described as:
... a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
then you need to sort it first then traverse it.

Speeding up a linked list?

I'm a student and fairly new to Java. I was looking over the different speeds achieved by two collections in Java, LinkedList and ArrayList. I know that an ArrayList is much, much faster at looking up values and placing them at its indexes. My question is:
how can one make a linked list faster, if at all possible?
Thanks for any help.
zmahir
When talking about speed, perhaps you mean complexity. Insertion and retrieval operations for ArrayList (and arrays) are O(1), while for LinkedList they are O(n). And this cannot be changed - it is 'by definition'.
O(n) means that in order to insert an object at a given position, or retrieve it, you must traverse, in the worst case, all (n) the items in the list. Hence n operations. For ArrayList this is only one operation.
You probably can't. You don't know the size (well, ok you can), nor the location of each element. To find element 100 in a linked list, you need to start with item 1, find its link to item 2, etc. until you find 100. This makes inserting into this list a tedious job.
There are many alternatives depending on your exact goals. You can use b-trees or similar methods to split the large linked list into smaller ones. Or use hashlists if you want to quickly find items. Or use simple arrays. But if you want a list that performs like an ArrayList, why not use an ArrayList?
You can split off regions which are linked to the main linked list, so this gives you entry points directly inside the list so you don't have to walk up to them. See the subList method here: http://download.oracle.com/javase/1.4.2/docs/api/java/util/AbstractList.html. This is useful if you have a number of 'sentences' made out of words, say. You can use a separate linked list to iterate over the sentences, which are sublists of the main linked list.
You can also use a ListIterator when adding, removing, or accessing elements. This helps greatly with increasing the speed of sequential access. See the listIterator method for this, and the class: http://download.oracle.com/javase/1.4.2/docs/api/java/util/ListIterator.html.
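As a small illustration of the ListIterator point, the sketch below walks a list once and inserts at the iterator's current position. `ListIterator.add` is part of the standard API; the helper name `insertAfterEach` and the class name are made up for this example.

```java
import java.util.List;
import java.util.ListIterator;
import java.util.Objects;

final class ListIteratorDemo {
    // Inserts `marker` after every element equal to `target`, in a single pass.
    static <T> void insertAfterEach(List<T> list, T target, T marker) {
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            if (Objects.equals(it.next(), target)) {
                // For a LinkedList this is O(1): the iterator already sits
                // at the insertion point, so no walk from the head is needed.
                it.add(marker);
            }
        }
    }
}
```

Doing the same with `list.add(index, marker)` on a LinkedList would walk to `index` for every insertion, which is where the O(n) cost per operation comes from.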
Speed of a linked list could be improved by using skip lists: http://igoro.com/archive/skip-lists-are-fascinating/
A linked list uses pointers to walk through the items, so for example if you asked for the 5th item, the runtime will start from the first item and walk through each pointer until it reaches the 5th item.
There is really not much you can do about it. A linked list may not be a good choice if you need fast access to items. There are some optimizations for it, such as creating a circular linked list or a doubly linked list where you can walk back and forth through the list, but this really depends on the business logic and the application requirements.
My advice is to avoid linked lists if they do not match your needs; changing to a different data structure might be the best approach.
As a general rule, data structures are designed to do certain things well. LinkedLists are designed to be faster than ArrayLists at inserting elements and removing elements and about the same as ArrayLists at iterating across the list in order. When you change the way a LinkedList works, you make it no longer a true LinkedList, so there's not really any way to modify them to be faster at something and still be a LinkedList.
You'll need to examine the way you're using this particular collection and decide whether a LinkedList is really the best data structure for your purposes. If you share with us how you're using it, and why you need it to be faster, then we can advise you on which data structure you ought to consider using.
Lots of people smarter than you or I have looked at the implementation of the Java collection classes. If there were an optimization to be made, they would have found it and already made it.
Since the collection classes are pretty much as optimized as they can be, our primary task should be to choose the correct one.
When choosing your collection type, don't forget about things like HashSet. If order doesn't matter, and you don't need to put duplicates in the collection, then HashSet may be appropriate.
I'm a student and fairly new to Java. ... how can one make a linked list faster, if at all possible?
The standard Java collection types (indeed all data structures implemented in any language!) represent compromises on various "measures" such as:
The amount of memory needed to represent the data structure.
The time taken to perform various operations; e.g. for a "list" the operations of interest are insertion, removal, indexing, contains, iteration and so on.
How easy or hard it is to integrate / reuse the collection type; see below.
So for instance:
ArrayList offers lower memory overheads, fast indexing (O(1)), but slow contains, random insertion and removal (O(N)).
LinkedList has higher memory overheads, slow indexing and contains (O(N)), but faster removal (O(1)) under certain circumstances.
The various performance measures are typically determined by the maths of the various data structures. For example, if you have a chain of nodes, the only way to get the ith node is to step through them from the beginning. This involves following i pointers.
Sometimes you can modify the data structures to improve one aspect of the performance. But this typically comes at the cost of some other aspect of the performance. (For example, you could add a separate index to make indexing of a linked list faster. But the cost of maintaining the index on insertion / deletion would mean that you'd probably be better off using an ArrayList.)
In some cases the integration / reuse requirements have significant impact on performance.
For example, it is theoretically possible to optimize a linked list's space usage by adding a next field to the list element type, combining the element and node objects and saving 16 or so bytes per list entry. However, this would make the list type less general (the member/element class would need to implement a specific interface), and has the restriction that an element can belong to at most one list at any time. These restrictions are so limiting that this approach is rarely used in Java.
For a second example, consider the problem of inserting at a given position in a linked list. For the LinkedList class, this is normally an O(N) operation, because you have to step through the list to find the position. In theory, if an application could find and remember a position, it should be able to perform the insertion at that position in O(1). Unfortunately, the List API provides no way to "remember" a position.
While neither of these examples is a fundamental roadblock to a developer "doing his own thing", they illustrate that using general data structure APIs and general implementations of those APIs has performance implications, and therefore represents a trade-off between performance and ease-of-use.
I'm a bit surprised by the answers here. There is a big difference between the theoretical performance of LinkedLists and ArrayLists and the actual performance of the Java implementations.
What makes the Java LinkedList slower than a theoretical LinkedList is that it does a lot more than just the operations. For example it checks for concurrent modifications and other safeties.
If you know your use case, you can write your own simple implementation of a LinkedList and it will be much faster.
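For illustration only, a stripped-down list of the kind this answer has in mind might look like the sketch below: a singly linked FIFO with no List interface and no concurrent-modification tracking. The class name `SimpleLinkedQueue` is hypothetical. Whether it actually beats java.util.LinkedList for a given workload is something to measure rather than assume.

```java
import java.util.NoSuchElementException;

// Minimal singly linked FIFO list: append at the tail, remove at the head.
final class SimpleLinkedQueue<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;

    // O(1): link the new node behind the current tail.
    void addLast(T value) {
        Node<T> node = new Node<>(value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // O(1): unlink and return the head node's value.
    T removeFirst() {
        if (head == null) {
            throw new NoSuchElementException("list is empty");
        }
        T value = head.value;
        head = head.next;
        if (head == null) {
            tail = null; // the list became empty
        }
        size--;
        return value;
    }

    int size() {
        return size;
    }
}
```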

Data structures: Which should I use for these conditions?

This shouldn't be a difficult question, but I'd just like someone to bounce it off of before I continue. I simply need to decide what data structure to use based on these expected activities:
Will need to frequently iterate through in sorted order (starting at the head).
Will need to remove/restore arbitrary elements from the/a sorted view.
Later I'll be frequently resorting the data and working with multiple sorted views.
Also later I'll be frequently changing the position of elements within their sorted views.
This is in Java, by the way.
My best guess is that I'll either be rolling some custom Linked Hash Set (to arrange the links in sorted order) or possibly just using a Tree Set. But I'm still not completely sure yet. Recommendations?
Edit: I guess because of the arbitrary remove/restore, I should probably stick with a Tree Set, right?
Actually, not necessarily. Hmmm...
In theory, I'd say the right data structure is a multiway tree - preferably something like a B+ tree. Traditionally this is a disk-based data structure, but modern main memory has a lot of similar characteristics due to layers of cache and virtual memory.
In-order iteration of a B+ tree is very efficient because (1) you only iterate through the linked-list of leaf nodes - branch nodes aren't needed, and (2) you get extremely good locality.
Finding, removing and inserting arbitrary elements is log(n) as with any balanced tree, though with different constant factors.
Resorting within the tree is mostly a matter of choosing an algorithm that gives good performance when operating on a linked list of blocks (the leaf nodes), minimising the need to use branch nodes - variants of quicksort or mergesort seem like likely candidates. Once the items are sorted in the leaf nodes, just propagate the summary information back through the branch nodes.
BUT - pragmatically, this is only something you'd do if you're very sure that you need it. Odds are good that you're better off using some standard container. Algorithm/data structure optimisation is the best kind of optimisation, but it can still be premature.
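As a point of comparison, here is a sketch of the "standard container" route using java.util.TreeSet, with one set per sorted view. The `Task` type, its fields and the comparator names are assumptions made for this example, and it assumes task names are unique, since a TreeSet treats elements that compare as equal as duplicates.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical element type with two possible sort keys.
final class Task {
    final String name;
    final int priority;

    Task(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class SortedViews {
    static final Comparator<Task> BY_NAME = Comparator.comparing((Task t) -> t.name);

    // Ties on priority are broken by name so that tasks sharing a priority
    // are not dropped as duplicates.
    static final Comparator<Task> BY_PRIORITY =
            Comparator.comparingInt((Task t) -> t.priority).thenComparing(BY_NAME);

    // Builds one sorted view over the given tasks.
    static TreeSet<Task> view(Comparator<Task> order, List<Task> tasks) {
        TreeSet<Task> set = new TreeSet<>(order);
        set.addAll(tasks);
        return set;
    }
}
```

Iterating a view visits elements in sorted order, and `remove` and `add` are O(log n), which covers the remove/restore requirement. Changing an element's position means removing it from the affected view, changing its key, and adding it back.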
Use the standard LinkedHashSet, or LinkedHashMultiset from Google Collections if you want your data structure to store non-unique values.
