I must work with a Collection, and I am not sure whether to use a List or a Set. The collection must be sorted, not by insertion order but by another criterion, so each time a new item is added, a Comparator should run to reorder the Collection. For this reason, an ArrayList could be the best option.
Removing objects from that Collection must be possible too. Furthermore, I would really like to use the removeIf method, so a Set would be the best option here.
Getting and iterating over the Collection will be the most frequent scenario, so it must perform well there.
Given that, I think a Set would be a good decision. However, I was thinking about converting the Set into a List when adding items and then, once the list has been re-sorted, converting it back to a Set. Would that perform badly? What do you think?
Thanks in advance
Unless you have bulk inserts during which you would need no sorting, TreeSet is fine. Simply measure both solutions.
With TreeSet, inserting already ordered items, like rereading a set from disk, performs badly in that even a balanced tree will end up a bit deeper than necessary. That, however, can be remedied.
For better performance you might go for a B-tree (needs 3rd party code) instead of the binary TreeSet. Measure that too, as typically a facet such as deletion with rebalancing might be done suboptimally.
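As a minimal sketch of the TreeSet suggestion, the following assumes String elements and a hypothetical comparator (length first, then alphabetical); neither comes from the question. It shows that a TreeSet keeps comparator order on every add and that removeIf, which is inherited from Collection, works on it directly, so no List/Set round trip is needed.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetRemoveIfDemo {
    // Hypothetical ordering: by length, then alphabetically. The tie-breaker matters
    // because a TreeSet treats elements that compare as 0 as duplicates.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA = (a, b) -> {
        int byLength = Integer.compare(a.length(), b.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    };

    static TreeSet<String> build(String... words) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH_THEN_ALPHA);
        set.addAll(Arrays.asList(words));
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> items = build("banana", "fig", "kiwi", "pear");
        items.add("apple");                      // lands in comparator order
        items.removeIf(w -> w.startsWith("k")); // removeIf comes from Collection
        System.out.println(items);               // [fig, pear, apple, banana]
    }
}
```

Whether this beats a sorted ArrayList for a given workload is still something to measure, as the answer says.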
This depends a lot on how you fill and use your collection, and on which operation's performance matters most.
Do you fill the collection with items all at once? Or do you add new elements from time to time? Does the performance of adding elements matter? Or is only the iteration performance important?
If performance is critical, it might make sense to implement a few solutions and compare their performance using a benchmark.
I personally don't believe that the iteration performance of a TreeSet is that much worse than that of ArrayLists, LinkedLists or LinkedHashMaps, especially compared to the linked data structures. Iteration over a tree should not perform that differently. But I have no data, so this is just a belief.
Below are two implementation ideas.
First, if you load a lot of data at once and then add new items rather seldom, load the data into an ArrayList and sort it using Collections.sort. If you need to add another item do a binary search (Collections.binarySearch) and insert the element at the corresponding position. Wrap it all in a custom List implementation and you're good to go.
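The first idea can be sketched roughly as follows. The helper name insertSorted and the sample data are assumptions for illustration; Collections.sort and Collections.binarySearch are the JDK methods the paragraph refers to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortedInsertDemo {
    // Inserts item into an already sorted list so that the list stays sorted.
    static <T> void insertSorted(List<T> sorted, T item, Comparator<? super T> order) {
        int pos = Collections.binarySearch(sorted, item, order);
        // When the item is absent, binarySearch returns (-(insertion point) - 1).
        if (pos < 0) {
            pos = -pos - 1;
        }
        sorted.add(pos, item); // O(n) shift in an ArrayList, but a fast block move
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        List<String> words = new ArrayList<>(Arrays.asList("pear", "fig", "banana"));
        Collections.sort(words, byLength);      // bulk load, sort once
        insertSorted(words, "apple", byLength); // occasional single insert
        System.out.println(words);              // [fig, pear, apple, banana]
    }
}
```

The binary search is O(log n), but the insert itself still shifts the tail of the array, which is why this approach suits collections that are loaded in bulk and then modified rarely.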
Next, if you fill the collection with the data "in the beginning" and the collection is hardly modified afterwards, you may simply cache the iteration order in an ArrayList. Every time the collection is modified, reset this list to null. When iteration is requested and the list is not null, just use it; otherwise first fill it in the order of the sorted set.
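A minimal sketch of this caching idea, under the assumption that the sorted set is a TreeSet and that all modifications go through the wrapper. CachedSortedSet and snapshot are hypothetical names, not an existing API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class CachedSortedSet<T> {
    private final TreeSet<T> items;
    private List<T> cache; // null means "stale, rebuild on next read"

    CachedSortedSet(Comparator<? super T> order) {
        this.items = new TreeSet<>(order);
    }

    boolean add(T item) {
        boolean changed = items.add(item);
        if (changed) cache = null; // invalidate only on a real modification
        return changed;
    }

    boolean remove(T item) {
        boolean changed = items.remove(item);
        if (changed) cache = null;
        return changed;
    }

    // Returns the elements in sorted order, rebuilding the ArrayList only when stale.
    List<T> snapshot() {
        if (cache == null) {
            cache = Collections.unmodifiableList(new ArrayList<>(items));
        }
        return cache;
    }
}
```

Iterating over snapshot() then has plain ArrayList cost until the next add or remove. This sketch is not thread-safe.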
Basically, I have some data structure of a ton of objects, and this structure will be accessed by multiple threads and will need to account for that.
A lot of iteration and object manipulation will need to be performed constantly (each main loop iteration can result in every single object in the data structure being modified in a worst case, nothing modified in best/normal case).
Currently, I am using a CopyOnWriteArrayList as my structure. Additionally, on each iteration, I make sure not to add duplicates, in an attempt to keep the size of the list down.
Using locks/synchronized is not ideal as I want to avoid holding up the threads for these operations.
As far as I can tell, my options for this are as follows:
Run a contains() check for each element to be added
Create a HashSet from the list and convert it back (essentially removing all duplicates)
Use a ConcurrentHashMap instead of a list for the data structure
Something else?
I am aware that ArrayLists are much better with iteration while object manipulation and duplicate checking are better handled by strictly using a HashMap. Since my case will need both, I'm wondering what the best solution is here.
I should also mention that the ordering of the elements is a non-issue.
Edit: To clarify this further, the collection will be having elements constantly added, removed, and modified. To which degree depends on each specific run time (based off of generally random events), so I'm cautious about making any assumptions about how often it will occur. The only thing that is guaranteed to happen is that the collection will be iterated through completely each time, performing multiple checks on each element.
This answer addresses your concurrency concerns:
A lot of iteration and object manipulation will need to be performed constantly (each main loop iteration can result in every single object in the data structure being modified in worst case, nothing modified in best/normal case).
Will the collection be modified? If not, just choose whichever collection makes the most sense and synchronize on the objects. Once they are inside the collection, you get no synchronization benefits from the CopyOnWriteArrayList or ConcurrentHashMap.
If the collection will be modified, the follow-up question is: how often?
If a lot, do not use a CopyOnWriteArrayList. If only a little, then choose based on the highest search performance.
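For the ConcurrentHashMap option from the question, one possible sketch is a concurrent set view created with ConcurrentHashMap.newKeySet() (available since Java 8). The String elements here are placeholders. It assumes the elements have stable equals and hashCode implementations, which the question does not state.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentSetDemo {
    // A thread-safe Set backed by a ConcurrentHashMap; duplicates are rejected by add().
    static Set<String> newConcurrentSet() {
        return ConcurrentHashMap.newKeySet();
    }

    public static void main(String[] args) {
        Set<String> entities = newConcurrentSet();
        entities.add("a");
        entities.add("b");
        boolean addedAgain = entities.add("a"); // false: no contains() check needed
        System.out.println(addedAgain);

        // Iterators are weakly consistent: removing while iterating does not throw
        // ConcurrentModificationException.
        for (String e : entities) {
            if (e.equals("b")) {
                entities.remove(e);
            }
        }
        System.out.println(entities); // [a]
    }
}
```

As the answer notes, the set only protects its own structure; modifying the fields of an element that is already stored still needs its own synchronization.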
With the Collections API we can sort a set or map according to our requirements. TreeSet and TreeMap also provide a sorted collection. Is there any benefit to using a TreeSet when we require a sorted collection?
The posters before me have not mentioned an important criterion: If the elements in your collection change their state often after insertion, i.e. you need to re-sort the same collection several times, maybe a TreeSet or TreeMap are not ideal because elements are only sorted during insertion, never afterwards. I.e. if you change the sorting key of an element in a TreeSet/TreeMap, it will not be re-sorted automatically. You need to remove the element from the collection before updating it and re-add it after updating it, so as to make sure it will be inserted at the right spot. You could use my UpdateableTreeSet to help you keep a TreeSet sorted.
Having said the above, you can conclude that in this case maybe an unsorted collection plus using Collections.sort() on demand might be the easier way to go. Which way is faster overall depends on your situation. I guess that UpdateableTreeSet should pretty much help you keep sorting of an existing collection limited to the places where you really change sorting keys.
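The remove, update, re-add sequence described above might look like this with a plain TreeSet. Player, its fields and updateScore are hypothetical names for illustration; this is not the UpdateableTreeSet API.

```java
import java.util.Comparator;
import java.util.TreeSet;

class ResortDemo {
    // Hypothetical element type with a mutable sort key.
    static final class Player {
        final String name;
        int score;

        Player(String name, int score) {
            this.name = name;
            this.score = score;
        }
    }

    static final Comparator<Player> BY_SCORE_THEN_NAME = (a, b) -> {
        int byScore = Integer.compare(a.score, b.score);
        return byScore != 0 ? byScore : a.name.compareTo(b.name);
    };

    // Remove under the old key, mutate, then re-insert under the new key.
    static void updateScore(TreeSet<Player> set, Player p, int newScore) {
        boolean wasPresent = set.remove(p); // must happen BEFORE the key changes
        p.score = newScore;
        if (wasPresent) {
            set.add(p);
        }
    }

    public static void main(String[] args) {
        TreeSet<Player> ranking = new TreeSet<>(BY_SCORE_THEN_NAME);
        Player ann = new Player("ann", 10);
        Player bob = new Player("bob", 20);
        ranking.add(ann);
        ranking.add(bob);
        updateScore(ranking, ann, 30);
        System.out.println(ranking.last().name); // ann
    }
}
```

Changing score while the element is still in the set would leave it at its old position, and a later remove or contains could fail to find it.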
TreeSet
It is always beneficial to have a sorted set whenever one is required.
log(n) time cost for the basic operations (add, remove and contains)
TreeSet has a few handy methods for dealing with the ordered set, such as first(), last(), headSet() and tailSet()
The items inside a TreeSet are automatically sorted according to their natural ordering if you do not supply your own comparator.
Also, please go through the TreeSet documentation.
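A short illustration of the methods listed above, using made-up Integer data. Note that headSet excludes its bound, tailSet includes it, and subSet is inclusive at the lower end and exclusive at the upper end.

```java
import java.util.Arrays;
import java.util.TreeSet;

class TreeSetViewsDemo {
    static TreeSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(50, 10, 40, 20, 30));
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = sample();        // natural ordering, no comparator
        System.out.println(set);                // [10, 20, 30, 40, 50]
        System.out.println(set.first());        // 10
        System.out.println(set.last());         // 50
        System.out.println(set.headSet(30));    // [10, 20]
        System.out.println(set.tailSet(30));    // [30, 40, 50]
        System.out.println(set.subSet(20, 40)); // [20, 30]
    }
}
```

The returned sub-sets are views backed by the original set, so they are cheap to obtain and reflect later changes.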
In addition to what others have said, TreeSet has some pretty cool capabilities, like the ability to quickly obtain a sub-set.
Outside of that, it's a question of how often you need things sorted. If you are going to create 100 sets, and only need 1 or 2 of them sorted, then the overhead of sorting during insertion is probably not worth it. But if you are going to sort the set even a single time, tree set will be the way to go.
The biggest difference is
TreeSet keeps the data sorted all the time, whereas a set that you keep sorted manually may not be sorted at all times.
So TreeSet is recommended if you don't want to keep sorting the set all the time.
I'm a student and fairly new to Java. I was looking over the different speeds achieved by two collections in Java, LinkedList and ArrayList. I know that an ArrayList is much, much faster at looking up values and placing them at its indexes. My question is:
how can one make a linked list faster, if at all possible?
Thanks for any help.
zmahir
When talking about speed, perhaps you mean complexity. Retrieval by index is O(1) for ArrayList (and arrays), and so is appending at the end (amortized), while for LinkedList reaching a given index is O(n). And this cannot be changed - it is 'by definition'.
O(n) means that in order to insert an object at a given position in a LinkedList, or retrieve it, you must traverse, in the worst case, all (n) the items in the list. Hence n operations. For ArrayList, reading an index is only one operation, although inserting in the middle still has to shift the elements that follow.
You probably can't. You don't know the size (well, ok you can), nor the location of each element. To find element 100 in a linked list, you need to start with item 1, find its link to item 2, etc. until you find 100. This makes inserting into this list a tedious job.
There are many alternatives depending on your exact goals. You can use b-trees or similar methods to split the large linked list into smaller ones. Or use hashlists if you want to quickly find items. Or use simple arrays. But if you want a list that performs like an ArrayList, why not use an ArrayList?
You can split off regions which are linked to the main linked list, so this gives you entry points directly inside the list so you don't have to walk up to them. See the subList method here: http://download.oracle.com/javase/1.4.2/docs/api/java/util/AbstractList.html. This is useful if you have a number of 'sentences' made out of words, say. You can use a separate linked list to iterate over the sentences, which are sublists of the main linked list.
You can also use a ListIterator when adding, removing, or accessing elements. This helps greatly with increasing the speed of sequential access. See the listIterator method for this, and the class: http://download.oracle.com/javase/1.4.2/docs/api/java/util/ListIterator.html.
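A small sketch of the ListIterator point: one sequential pass that removes and inserts at the iterator's current position, without any index lookups. The editing rule used here (drop negatives, put a 0 after each positive) is made up purely for illustration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Removes every negative number and inserts a 0 after each positive one, in one pass.
    static void edit(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value < 0) {
                it.remove(); // unlinks the node just returned; no traversal on a LinkedList
            } else if (value > 0) {
                it.add(0);   // inserted right after the element just returned
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new LinkedList<>(Arrays.asList(1, -2, 3));
        edit(numbers);
        System.out.println(numbers); // [1, 0, 3, 0]
    }
}
```

Calling list.remove(index) or list.add(index, x) inside the loop instead would walk the list again on every call.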
Speed of a linked list could be improved by using skip lists: http://igoro.com/archive/skip-lists-are-fascinating/
A linked list uses pointers to walk through the items, so, for example, if you ask for the 5th item, the runtime starts from the first item and follows each pointer until it reaches the 5th item.
There is really not much you can do about it. A linked list may not be a good choice if you need fast access to items. There are some optimizations, such as a circular linked list or a doubly linked list in which you can walk back and forth, but this really depends on the business logic and the application requirements.
My advice is to avoid linked lists if they do not match your needs; changing to a different data structure might be the best approach.
As a general rule, data structures are designed to do certain things well. LinkedLists are designed to be faster than ArrayLists at inserting elements and removing elements and about the same as ArrayLists at iterating across the list in order. When you change the way a LinkedList works, you make it no longer a true LinkedList, so there's not really any way to modify them to be faster at something and still be a LinkedList.
You'll need to examine the way you're using this particular collection and decide whether a LinkedList is really the best data structure for your purposes. If you share with us how you're using it, and why you need it to be faster, then we can advise you on which data structure you ought to consider using.
Lots of people smarter than you or I have looked at the implementation of the Java collection classes. If there were an optimization to be made, they would have found it and already made it.
Since the collection classes are pretty much as optimized as they can be, our primary task should be to choose the correct one.
When choosing your collection type, don't forget about things like HashSet. If order doesn't matter, and you don't need to put duplicates in the collection, then HashSet may be appropriate.
I'm a student and fairly new to Java. ... how can one make a linked list faster, if at all possible?
The standard Java collection types (indeed all data structures implemented in any language!) represent compromises on various "measures" such as:
The amount of memory needed to represent the data structure.
The time taken to perform various operations; e.g. for a "list" the operations of interest are insertion, removal, indexing, contains, iteration and so on.
How easy or hard it is to integrate / reuse the collection type; see below.
So for instance:
ArrayList offers lower memory overheads, fast indexing (O(1)), but slow contains, random insertion and removal (O(N)).
LinkedList has higher memory overheads, slow indexing and contains (O(N)), but faster removal (O(1)) under certain circumstances.
The various performance measures are typically determined by the maths of the various data structures. For example, if you have a chain of nodes, the only way to get the ith node is to step through them from the beginning. This involves following i pointers.
Sometimes you can modify the data structures to improve one aspect of the performance. But this typically comes at the cost of some other aspect of the performance. (For example, you could add a separate index to make indexing of a linked list faster. But the cost of maintaining the index on insertion / deletion would mean that you'd probably be better off using an ArrayList.)
In some cases the integration / reuse requirements have significant impact on performance.
For example, it is theoretically possible to optimize a linked list's space usage by adding a next field to the list element type, combining the element and node objects and saving 16 or so bytes per list entry. However, this would make the list type less general (the member/element class would need to implement a specific interface), and has the restriction that an element can belong to at most one list at any time. These restrictions are so limiting that this approach is rarely used in Java.
For a second example, consider the problem of inserting at a given position in a linked list. For the LinkedList class, this is normally an O(N) operation, because you have to step through the list to find the position. In theory, if an application could find and remember a position, it should be able to perform the insertion at that position in O(1). Unfortunately, the List APIs provide no way to "remember" a position.
While neither of these examples is a fundamental roadblock to a developer "doing his own thing", they illustrate that using general data structure APIs and general implementations of those APIs has performance implications, and therefore represents a trade-off between performance and ease-of-use.
I'm a bit surprised by the answers here. There is a big difference between the theoretical performance of LinkedLists and ArrayLists and the actual performance of the Java implementations.
What makes the Java LinkedList slower than a theoretical LinkedList is that it does a lot more than just the operations. For example, it checks for concurrent modifications and performs other safety checks.
If you know your use case, you can write your own simple implementation of a LinkedList, and it will be much faster.
I have specific requirements for the data structure to be used in my Java program. The data structure should be able to hold large amounts of data (not a fixed amount); my main operations would be to add at the end and to delete/read from the beginning (LinkedLists look good so far). But occasionally I need to delete from the middle too, and this is where LinkedLists are so painful. Can anyone suggest a way around this? Or any optimizations that would make deletion from LinkedLists less painful?
Thanks for the help!
A LinkedHashMap may suit your purpose
You'd use an iterator to pull stuff from the front
and lookup the entry by key when you needed to access the middle of the list
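Roughly, the LinkedHashMap suggestion could be wrapped like this. KeyedQueue and its method names are hypothetical, and the sketch assumes every element has a unique key of some kind, which the question does not say.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class KeyedQueue<K, V> {
    // LinkedHashMap iterates in insertion order, so the oldest entry comes first.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>();

    // Adds at the end. Re-putting an existing key keeps its original position.
    void addLast(K key, V value) {
        entries.put(key, value);
    }

    // Removes and returns the oldest value, or null if the queue is empty.
    V pollFirst() {
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        V value = it.next().getValue();
        it.remove();
        return value;
    }

    // Deletes from anywhere in the sequence by key, without walking the list.
    V removeByKey(K key) {
        return entries.remove(key);
    }

    int size() {
        return entries.size();
    }
}
```

Usage would be along the lines of addLast(id, item) when producing, pollFirst() when consuming, and removeByKey(id) for the occasional deletion from the middle.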
LinkedList falls down on random accesses. Deletion, without the random access look up, is constant time and so really not too bad for long lists.
ArrayList is generally fast. Inserts and removes from the middle are faster than you might expect because block memory moves are surprisingly fast. Removals and insertions near the start do cause all the following data to be moved down or up.
ArrayDeque is like ArrayList only it uses a circular buffer and has a strange interface.
Usual advice: try it.
You can try using a linked list with a pointer after every 10000th element, so that you reduce the time needed to find the middle element you wish to delete.
Here are some different variations of linked list:
http://experimentgarden.blogspot.com/2009/08/performance-analysis-of-thirty-eight.html
LinkedHashMap is probably the way to go. Great for iteration, deque operations, and seeking into the middle. Costs extra in memory, though, as you'll need to manage a set of keys on top of your basic collection. Plus I think it'll leave 'gaps' in the spaces you've deleted, leading to a non-consecutive set of keys (shouldn't affect iteration, though).
Edit: Aha! I know what you need: A LinkedMultiSet! All the benefit of a LinkedHashMap, but without the superfluous key set. It's only a little more complex to use, though.
First you need to consider whether you will delete from the center of the list often compared to the length of the list. If your list has N items but you delete much less often than 1/N, don't worry about it. Use LinkedList or ArrayDeque as you prefer. (If your lists are occasionally huge and then shrink, but are mostly small, LinkedList is better as it's easy to recover the memory; otherwise, ArrayDeque doesn't need extra objects, so it's a bit faster and more compact--except the underlying array never shrinks.)
If, on the other hand, you delete quite a bit more often than 1/N, then you should consider a LinkedHashSet, which maintains a linked list queue on top of a hash set--but it is a set, so keep in mind that you can't store duplicate elements. This has the overhead of LinkedList and ArrayDeque put together, but if you're doing central deletes often, it'll likely be worth it.
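A minimal sketch of using a LinkedHashSet as a queue with cheap deletes from the middle. The helper pollFirst is a hypothetical name, and the elements are assumed to be unique with proper equals and hashCode.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;

class LinkedHashSetQueueDemo {
    // Removes and returns the oldest element, or null if the set is empty.
    static <T> T pollFirst(LinkedHashSet<T> set) {
        Iterator<T> it = set.iterator();
        if (!it.hasNext()) {
            return null;
        }
        T head = it.next();
        it.remove();
        return head;
    }

    public static void main(String[] args) {
        LinkedHashSet<String> queue = new LinkedHashSet<>(Arrays.asList("a", "b", "c", "d"));
        queue.add("e");                       // add at the end
        queue.remove("c");                    // delete from the middle via hash lookup
        System.out.println(pollFirst(queue)); // a
        System.out.println(queue);            // [b, d, e]
    }
}
```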
The optimal structure, however--if you really need every last ounce of speed and are willing to spend the coding time to get it--would be a "resizable" array (i.e. reallocated when it was too small) with a circular buffer where you could blank out elements from the middle by setting them to null. (You could also reallocate the buffer when too much was empty if you had a perverse use case then.) I don't advise coding this unless you either really enjoy coding high-performance data structures or have good evidence that this is one of the key bottlenecks in your code and thus you really need it.
I need a class that implements Iterable, and does not need to be safe for concurrent usage. Of the various options, such as LinkedList, HashSet, ArrayList etc, which is the lightest-weight?
To clarify the use-case, I need to be able to add a number of objects to the Iterable (typically 3 or 4), and then something else needs to iterate over it.
ArrayList. From the Javadoc:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
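For the use case in the question (add 3 or 4 objects, then hand them to something that iterates), a sketch could be as small as this. The method name collect is an assumption; presizing the list is optional and only avoids the first internal array growth.

```java
import java.util.ArrayList;
import java.util.List;

class SmallIterableDemo {
    // Collects a handful of items into an ArrayList and exposes it only as an Iterable.
    static Iterable<String> collect(String... items) {
        List<String> list = new ArrayList<>(items.length); // presized backing array
        for (String item : items) {
            list.add(item); // amortized constant time
        }
        return list;
    }

    public static void main(String[] args) {
        for (String s : collect("x", "y", "z")) {
            System.out.println(s);
        }
    }
}
```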
That entirely depends on what you mean by "lightest weight". What operations do you need to do, and how often? Do you know the final size beforehand? Are you trying to save execution time or memory?
I would agree with zkarthik that ArrayList is very often a good choice... but it will behave very badly if you want to create a large collection and then repeatedly remove the first element, for example. There's a good reason for there being so many different collections: they have different performance characteristics for different situations.
They all have very different features and behavior, so you should base your choice on how you will use them. For example, for random access and high locality, use an ArrayList; if you need fast unordered insertion and querying, use a HashSet.
If by 'lightweight' you mean 'best performance', then the question is almost impossible to answer without understanding how the collection will be used. All you've told us so far is that it doesn't need to support concurrent usage, but in order to have any hope of answering the question we'd need to know things like:
How many objects will be stored in the collection (on average)
What is the relative frequency of read and write access
Is random-access required
Is ordered access required
A number of people have suggested ArrayList may be best. However, I seem to recall reading (possibly in Effective Java 2nd edition) that for certain patterns of usage, a Queue performs better than a List, because it does not incur the penalty of supporting random access. In other words, you can add/remove items from a List in any order, but you can only add/remove items in a queue in a specific order (i.e. add to the tail, and remove from the head).