Which concurrent collection to use?

Which concurrent collection to use? - java

I have a scenario where an unknown amount of threads add elements to a collection on a server. The data in this collection doesn't have to be sorted and it also won't be iterated. Only two simple operations shall work on this collection:
Adding an element (and removing old element in some cases)
Reading all elements from the collection (not one by one, but the whole collection in order to serialize it and send it to a client. Of course the elements could also be moved to another collection which is then serialized afterwards.)
Which collection would be ideal for this use case? I would choose ConcurrentHashMap, bu I don't know if this choice is good.
Edit: I've forgotten one important requirement: If an element of a certain kind is already in this collection and another one of the same kind is added, then the old one shall be removed before the new one is added. For this requirement I wanted to use hash values to avoid searching. The objects that are stored are simple: They contain a unique user name and some strings and ints. The object's user name should be used as key.

Yes, ConcurrentHashMap is appropriate for this. Use the user name as key type (K) and the associated user information ("some strings and ints") as value type (V) in the map. Use put to add new key-value pairs, remove to remove key-value pairs, and entrySet to get all key-value pairs in the container (if that is what you meant by "reading all elements from the collection").

i think the best thing to use is actually ConcurrentSkipListSet . the reason:
Iterators are weakly consistent, returning elements reflecting the
state of the set at some point at or since the creation of the
iterator. They do not throw ConcurrentModificationException, and may
proceed concurrently with other operations. Ascending ordered views
and their iterators are faster than descending ones.
this means you can go over the entire list and read all of the items, while adding other items. it's completely concurrent !
do note that adding items take O(logN) time.

I believe there is a concurrent list implementation in java.util.concurrent. CopyOnWriteArrayList which can be useful for your requirement.
or you can use :
List<Object> objList = Collections.synchronizedList(new ArrayList<Object>());

It's not part of the standard library, but you can use this concurrent doubly linked list. Its iterator is weakly consistent and won't throw a ConcurrentModificationException, or you can use toArray and loop through the returned array.

Related

Which Java Collection should I use?

In this question How can I efficiently select a Standard Library container in C++11? is a handy flow chart to use when choosing C++ collections.
I thought that this was a useful resource for people who are not sure which collection they should be using so I tried to find a similar flow chart for Java and was not able to do so.
What resources and "cheat sheets" are available to help people choose the right Collection to use when programming in Java? How do people know what List, Set and Map implementations they should use?

Since I couldn't find a similar flowchart I decided to make one myself.
This flow chart does not try and cover things like synchronized access, thread safety etc or the legacy collections, but it does cover the 3 standard Sets, 3 standard Maps and 2 standard Lists.
This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.
Other resources
Probably the most useful other reference is the following page from the oracle documentation which describes each Collection.
HashSet vs TreeSet
There is a detailed discussion of when to use HashSet or TreeSet here:
Hashset vs Treeset
ArrayList vs LinkedList
Detailed discussion: When to use LinkedList over ArrayList?

Summary of the major non-concurrent, non-synchronized collections
Collection: An interface representing an unordered "bag" of items, called "elements". The "next" element is undefined (random).
Set: An interface representing a Collection with no duplicates.
HashSet: A Set backed by a Hashtable. Fastest and smallest memory usage, when ordering is unimportant.
LinkedHashSet: A HashSet with the addition of a linked list to associate elements in insertion order. The "next" element is the next-most-recently inserted element.
TreeSet: A Set where elements are ordered by a Comparator (typically natural ordering). Slowest and largest memory usage, but necessary for comparator-based ordering.
EnumSet: An extremely fast and efficient Set customized for a single enum type.
List: An interface representing a Collection whose elements are ordered and each have a numeric index representing its position, where zero is the first element, and (length - 1) is the last.
ArrayList: A List backed by an array, where the array has a length (called "capacity") that is at least as large as the number of elements (the list's "size"). When size exceeds capacity (when the (capacity + 1)-th element is added), the array is recreated with a new capacity of (new length * 1.5)--this recreation is fast, since it uses System.arrayCopy(). Deleting and inserting/adding elements requires all neighboring elements (to the right) be shifted into or out of that space. Accessing any element is fast, as it only requires the calculation (element-zero-address + desired-index * element-size) to find it's location. In most situations, an ArrayList is preferred over a LinkedList.
LinkedList: A List backed by a set of objects, each linked to its "previous" and "next" neighbors. A LinkedList is also a Queue and Deque. Accessing elements is done starting at the first or last element, and traversing until the desired index is reached. Insertion and deletion, once the desired index is reached via traversal is a trivial matter of re-mapping only the immediate-neighbor links to point to the new element or bypass the now-deleted element.
Map: An interface representing an Collection where each element has an identifying "key"--each element is a key-value pair.
HashMap: A Map where keys are unordered, and backed by a Hashtable.
LinkedhashMap: Keys are ordered by insertion order.
TreeMap: A Map where keys are ordered by a Comparator (typically natural ordering).
Queue: An interface that represents a Collection where elements are, typically, added to one end, and removed from the other (FIFO: first-in, first-out).
Stack: An interface that represents a Collection where elements are, typically, both added (pushed) and removed (popped) from the same end (LIFO: last-in, first-out).
Deque: Short for "double ended queue", usually pronounced "deck". A linked list that is typically only added to and read from either end (not the middle).
Basic collection diagrams:
Comparing the insertion of an element with an ArrayList and LinkedList:

Even simpler picture is here. Intentionally simplified!
Collection is anything holding data called "elements" (of the same type). Nothing more specific is assumed.
List is an indexed collection of data where each element has an index. Something like the array, but more flexible.
Data in the list keep the order of insertion.
Typical operation: get the n-th element.
Set is a bag of elements, each elements just once (the elements are distinguished using their equals() method.
Data in the set are stored mostly just to know what data are there.
Typical operation: tell if an element is present in the list.
Map is something like the List, but instead of accessing the elements by their integer index, you access them by their key, which is any object. Like the array in PHP :)
Data in Map are searchable by their key.
Typical operation: get an element by its ID (where ID is of any type, not only int as in case of List).
The differences
Set vs. Map: in Set you search data by themselves, whilst in Map by their key.
N.B. The standard library Sets are indeed implemented exactly like this: a map where the keys are the Set elements themselves, and with a dummy value.
List vs. Map: in List you access elements by their int index (position in List), whilst in Map by their key which os of any type (typically: ID)
List vs. Set: in List the elements are bound by their position and can be duplicate, whilst in Set the elements are just "present" (or not present) and are unique (in the meaning of equals(), or compareTo() for SortedSet)

It is simple: if you need to store values with keys mapped to them go for the Map interface, otherwise use List for values which may be duplicated and finally use the Set interface if you don’t want duplicated values in your collection.
Here is the complete explanation http://javatutorial.net/choose-the-right-java-collection , including flowchart etc

Map
If choosing a Map, I made this table summarizing the features of each of the ten implementations bundled with Java 11.

Common collections, Common collections

Two java.util.Iterators to the same collection: do they have to return elements in the same order?

This is more of a theoretical question. If I have an arbitrary collection c that isn't ordered and I obtain two java.util.Iterators by calling c.iterator() twice, do both iterators have to return c's elements in the same order?
I mean, in practice they probably always will, but are they forced to do so by contract?
Thanks,
Jan

No they are not.
"There are no guarantees concerning the order in which the elements are
returned (unless this collection is an instance of some class that
provides a guarantee)."
See the Collection#iterator api contract.
That includes from one iterator to the next (as it doesn't say anything about requiring that).
Also consider that something could have changed in the underlying collection between getting those two iterators! Something added or removed.

Implementation of Iterators are provided by the specific Collection class. Iterator for List will give the ordered element while Set will not

Because most Data structures are not ordered by default so it is not certain that they will iterate in same order.
If you want same order you have to sort the collection first.

When to use HashMap over LinkedList or ArrayList and vice-versa

What is the reason why we cannot always use a HashMap, even though it is much more efficient than ArrayList or LinkedList in add,remove operations, also irrespective of the number of the elements.
I googled it and found some reasons, but there was always a workaround for using HashMap, with advantages still alive.

Lists represent a sequential ordering of elements.
Maps are used to represent a collection of key / value pairs.
While you could use a map as a list, there are some definite downsides of doing so.
Maintaining order:
A list by definition is ordered. You add items and then you are able to iterate back through the list in the order that you inserted the items. When you add items to a HashMap, you are not guaranteed to retrieve the items in the same order you put them in. There are subclasses of HashMap like LinkedHashMap that will maintain the order, but in general order is not guaranteed with a Map.
Key/Value semantics:
The purpose of a map is to store items based on a key that can be used to retrieve the item at a later point. Similar functionality can only be achieved with a list in the limited case where the key happens to be the position in the list.
Code readability
Consider the following examples.
// Adding to a List
list.add(myObject); // adds to the end of the list
map.put(myKey, myObject); // sure, you can do this, but what is myKey?
map.put("1", myObject); // you could use the position as a key but why?
// Iterating through the items
for (Object o : myList) // nice and easy
for (Object o : myMap.values()) // more code and the order is not guaranteed
Collection functionality
Some great utility functions are available for lists via the Collections class. For example ...
// Randomize the list
Collections.shuffle(myList);
// Sort the list
Collections.sort(myList, myComparator);

Lists and Maps are different data structures. Maps are used for when you want to associate a key with a value and Lists are an ordered collection.
Map is an interface in the Java Collection Framework and a HashMap is one implementation of the Map interface. HashMap are efficient for locating a value based on a key and inserting and deleting values based on a key. The entries of a HashMap are not ordered.
ArrayList and LinkedList are an implementation of the List interface. LinkedList provides sequential access and is generally more efficient at inserting and deleting elements in the list, however, it is it less efficient at accessing elements in a list. ArrayList provides random access and is more efficient at accessing elements but is generally slower at inserting and deleting elements.

I will put here some real case examples and scenarios when to use one or another, it might be of help for somebody else:
HashMap
When you have to use cache in your application. Redis and membase are some type of extended HashMap. (Doesn't matter the order of the elements, you need quick ( O(1) ) read access (a value), using a key).
LinkedList
When the order is important (they are ordered as they were added to the LinkedList), the number of elements are unknown (don't waste memory allocation) and you require quick insertion time ( O(1) ). A list of to-do items that can be listed sequentially as they are added is a good example.

The downfall of ArrayList and LinkedList is that when iterating through them, depending on the search algorithm, the time it takes to find an item grows with the size of the list.
The beauty of hashing is that although you sacrifice some extra time searching for the element, the time taken does not grow with the size of the map. This is because the HashMap finds information by converting the element you are searching for, directly into the index, so it can make the jump.
Long story short...
LinkedList: Consumes a little more memory than ArrayList, low cost for insertions(add & remove)
ArrayList: Consumes low memory, but similar to LinkedList, and takes extra time to search when large.
HashMap: Can perform a jump to the value, making the search time constant for large maps. Consumes more memory and takes longer to find the value than small lists.

java constantly sorted list with quick retrieval

I'm looking for a constantly sorted list in java, which can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for the fast retrieval by key, but I need both in the same list. At one point I had wrote my own, but it does not implement the collections interfaces (so can't be used as a drop-in replacement for a java.util.List etc), and I'd rather stick to standard java classes if possible.
Is there such a list out there? Right now I'm using 2 lists, a priority queue and a hashmap, both contain the same objects. I use the priority queue to traverse the first part of the list in sorted order, the hashmap for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need to have the list sorted by a different comparator then what is used for retrieval by key; the list is sorted by a long value, the key retrieval is a String.

Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.

It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists which use "list pipelines" to efficiently propagate changes -- their SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage, one that traverses all items, the next that traverses every K items on average (for a given constant K), the next that traverses every K2 items on average, etc.

TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html

Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

I haven't tested this so I might be wrong, so consider this just an attempt.
Use TreeMap, wrap the key of this map as an object which has two attributes (the string which you use as the key in hashmap and the long which you use to maintain the sort order in PriorityQueue). Now for this object, override the equals and hashcode method using the string. Implement the comparable interface using the long.

Why don't you encapsulate your solution to a class that implements Collection or Map?
This way you could simply delegate the retrieval methods to the faster/better suiting collection. Just make sure that calls to write-methods (add/remove/put) will be forwarded to both collections. Remember indirect accesses, like iterator.remove(). Most of these methods are optional to implement, but you have to deactivate them (Collections.unmodifiableXXX will help here in most cases).

Java collection insertion: Set vs. List

I'm thinking about filling a collection with a large amount of unique objects.
How is the cost of an insert in a Set (say HashSet) compared to an List (say ArrayList)?
My feeling is that duplicate elimination in sets might cause a slight overhead.

There is no "duplicate elimination" such as comparing to all existing elements. If you insert into hash set, it's really a dictionary of items by hash code. There's no duplicate checking unless there already are items with the same hash code. Given a reasonable (well-distributed) hash function, it's not that bad.
As Will has noted, because of the dictionary structure HashSet is probably a bit slower than an ArrayList (unless you want to insert "between" existing elements). It also is a bit larger. I'm not sure that's a significant difference though.

You're right: set structures are inherently more complex in order to recognize and eliminate duplicates. Whether this overhead is significant for your case should be tested with a benchmark.
Another factor is memory usage. If your objects are very small, the memory overhead introduced by the set structure can be significant. In the most extreme case (TreeSet<Integer> vs. ArrayList<Integer>) the set structure can require more than 10 times as much memory.

If you're certain your data will be unique, use a List. You can use a Set to enforce this rule.
Sets are faster than Lists if you have a large data set, while the inverse is true for smaller data sets. I haven't personally tested this claim.
Which type of List?
Also, consider which List to use. LinkedLists are faster at adding, removing elements.
ArrayLists are faster at random access (for loops, etc), but this can be worked around using the Iterator of a LinkedList. ArrayLists are are much faster at: list.toArray().

You have to compare concrete implementations (for example HashSet with ArrayList), because the abstract interfaces Set/List don't really tell you anything about performance.
Inserting into a HashSet is a pretty cheap operation, as long as the hashCode() of the object to be inserted is sane. It will still be slightly slower than ArrayList, because it's insertion is a simple insertion into an array (assuming you insert in the end and there's still free space; I don't factor in resizing the internal array, because the same cost applies to HashSet as well).

If the goal is the uniqueness of the elements, you should use an implementation of the java.util.Set interface. The class java.util.HashSet and java.util.LinkedHashSet have O(alpha) (close to O(1) in the best case) complexity for insert, delete and contains check.
ArrayList have O(n) for object (not index) contains check (you have to scroll through the whole list) and insertion (if the insertion is not in tail of the list, you have to shift the whole underline array).
You can use LinkedHashSet that preserve the order of insertion and have the same potentiality of HashSet (takes up only a bit more of memory).

I don't think you can make this judgement simply on the cost of building the collection. Other things that you need to take into account are:
Is the input dataset ordered? Is there a requirement that the output data structure preserves insertion order?
Is there a requirement that the output data structure is ordered (or reordered) based on element values?
Will the output data structure be subsequently modified? How?
Is there a requirement that the output data structure is duplicate free if other elements are added subsequently?
Do you know how many elements are likely to be in the input dataset?
Can you measure the size of the input dataset? (Or is it provided via an iterator?)
Does space utilization matter?
These can all effect your choice of data structure.

Java List:
If you don't have such requirement that you have to keep duplicate or not. Then you can use List instead of Set.
List is an interface in Collection framework. Which extends Collection interface. and ArrayList, LinkedList is the implementation of List interface.
When to use ArrayList or LinkedList
ArrayList: If you have such requirement that in your application mostly work is accessing the data. Then you should go for ArrayList. because ArrayList implements RtandomAccess interface which is Marker Interface. because of Marker interface ArrayList have capability to access the data in O(1) time. and you can use ArrayList over LinkedList where you want to get data according to insertion order.
LinkedList: If you have such requirement that your mostly work is insertion or deletion. Then you should use LinkedList over the ArrayList. because in LinkedList insertion and deletion happen in O(1) time whereas in ArrayList it's O(n) time.
Java Set:
If you have requirement in your application that you don't want any duplicates. Then you should go for Set instead of List. Because Set doesn't store any duplicates. Because Set works on the principle of Hashing. If we add object in Set then first it checks object's hashCode in the bucket if it's find any hashCode present in it's bucked then it'll not add that object.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.