I was thinking about how I would implement a thread-safe RingBuffer in Java and Android, since for some reason there is none, even after all these years; not even a circular queue. So, no (Circular/Ring)ByteBuffer, nor (Circular/Ring)(Buffer/Queue).
Even the majority of third-party RingBuffer implementations are said not to be thread safe, which makes me think it really isn't as simple as I expect it to be. What I had in mind was something like this:
Have an object (say, RingBufferPosition) that encapsulates both the head and tail positions.
Have the RingBuffer maintain an AtomicReference to the RingBufferPosition.
When a thread adds something, it creates a temporary (ideally "stack-allocated", though I don't know enough about Java internals to say) object, reuses it over and over, updating it with the new head and tail, until it can CAS successfully.
When a thread removes something, it does much the same as when adding.
Everything is stored in an array allocated to the maximum length, so the head and tail can access/update the current element in O(1) time.
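Roughly, the idea in (untested) Java would look something like this; the class and field names are just placeholders, and note that I allocate a fresh position object per CAS attempt, since AtomicReference compares references:

```java
import java.util.concurrent.atomic.AtomicReference;

// Untested sketch of the idea described above; not a proven lock-free queue.
final class RingBuffer<T> {

    // Immutable snapshot of head/tail, swapped in a single CAS.
    private static final class Position {
        final int head, tail;
        Position(int head, int tail) { this.head = head; this.tail = tail; }
    }

    private final Object[] items;
    private final AtomicReference<Position> position =
            new AtomicReference<>(new Position(0, 0));

    RingBuffer(int capacity) {
        items = new Object[capacity];
    }

    boolean offer(T item) {
        for (;;) {
            Position cur = position.get();
            int nextTail = (cur.tail + 1) % items.length;
            if (nextTail == cur.head) return false;   // full
            // Note: this array write is NOT covered by the CAS below,
            // which is the part I am unsure about.
            items[cur.tail] = item;
            if (position.compareAndSet(cur, new Position(cur.head, nextTail))) return true;
        }
    }

    @SuppressWarnings("unchecked")
    T poll() {
        for (;;) {
            Position cur = position.get();
            if (cur.head == cur.tail) return null;    // empty
            T item = (T) items[cur.head];
            int nextHead = (cur.head + 1) % items.length;
            if (position.compareAndSet(cur, new Position(nextHead, cur.tail))) return item;
        }
    }
}
```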
Would this work, and better yet, would it yield any benefits over simply synchronizing access to the collection?
A small code sample/pseudocode can be found here (it has not been run yet, and I don't even know how to properly test an atomic data structure; I plan on using it for buffering/streaming media, but I haven't gotten that far yet, as I need to create this first). The comments/documentation there detail my concerns.
Lastly, to address a possible "Why do you need such performance?" question, I'll be truthful: I have always found data structures, especially atomic/lock-free ones, very interesting, and this seemed like a very good learning exercise; plus, I have always wanted to create a ring buffer. I could have just synchronized everything, but I also value performance.
Multiple reader/multiple writer ring buffers are tricky.
Your way doesn't work, because you can't update that start/end position AND the array contents atomically. Consider adding to the buffer: If you update the end position first, then there is a moment before you update the array when the buffer contains an invalid item. If you update the array first, then there's nothing to stop simultaneous additions from stomping on the same array element.
There are lots of ways to deal with these problems, but the various ways have different trade-offs, and you have better options available if you can get rid of the multiple reader or multiple writer requirement.
If I had to guess at why we don't have a concurrent ring buffer in the standard library, I'd say it's because there is no one best way to implement it that is going to be good for most scenarios. The data structure used for ConcurrentLinkedQueue, in contrast, is simple and elegant and an obvious choice when a concurrent linked list is required.
Related
I have an ArrayList that I am using in my multi-threaded application, and I want some way to iterate through the list without any exceptions being thrown when an element is added to the list while I am iterating. Is there a way to stop the ArrayList from being modified while I iterate through it?
Edit:
I now realize that my question was very poorly submitted, and the down votes are deserved. This is an attempt to fix my question.
What I want to do is have some way to 'block' the list before I iterate through it so that I don't get a ConcurrentModificationException. The issue is that the iteration will take quite a bit of processor time to complete, because the list will be very large and the action I wish to carry out on each element takes a fair amount of time. That is a problem if I use synchronized methods, because the add method would then block for a long time and decrease application performance. So what I am trying to do is create a class that imitates an ArrayList, except that when it is 'blocked' and a method tries to modify it, it stores that request, and when the list is 'unblocked' it performs all the requests on a separate thread. The issue is that when I try to implement this strategy, I have to store the requests in some sort of list, and I run into the same problem as before: I have to block additions to the request list while the request thread is iterating over it. I'm at a loss for how to implement this solution, or even whether it is the correct one. If anyone could help me, that would be much appreciated.
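For concreteness, this is roughly the kind of thing I have been trying to write (untested, and all the names are just placeholders):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Untested sketch of the "block, queue the requests, replay them" idea.
class BlockableList<E> {
    private final List<E> list = new ArrayList<>();
    private final List<E> pendingAdds = new ArrayList<>();
    private boolean blocked = false;

    // Writers only ever hold the lock for a moment, even during a long iteration.
    public synchronized void add(E e) {
        if (blocked) {
            pendingAdds.add(e);      // remember the request for later
        } else {
            list.add(e);
        }
    }

    // Called by the single iterating thread.
    public void iterate(Consumer<E> action) {
        synchronized (this) { blocked = true; }   // let any in-flight add finish first
        try {
            for (E e : list) {
                action.accept(e);    // the slow per-element work runs without the lock
            }
        } finally {
            synchronized (this) {    // replay the queued requests and unblock
                list.addAll(pendingAdds);
                pendingAdds.clear();
                blocked = false;
            }
        }
    }
}
```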
Your options are either to work with synchronization or to use one of the implementations provided in the java.util.concurrent package. You also need to read up on the subject: you are asking a very fundamental and classical question, and there is a LOT of information about it. But here are your basic options:
Use synchronization. It is very expensive performance-wise but absolutely bullet-proof. Read up on the synchronized keyword, or on the Lock interface and its implementations; also look at the Semaphore class. Note that this option can create a very serious bottleneck in your performance.
As one of the comments said, use the CopyOnWriteArrayList class (see the small example after these points). It also has some drawbacks, but in the majority of cases it is a better option than full synchronization.
When choosing between the two options, consider the following points: while synchronization is a bullet-proof solution if done right, it is tedious work that demands a lot of non-trivial testing, which is already a big drawback. Also, in the majority of cases reads outnumber writes, or the array is small enough (say, up to a few hundred elements) that copy-on-write is acceptable. So my guess is that in the majority of cases CopyOnWriteArrayList would be preferable. The point is that there is no clear-cut answer when choosing between the two options above: a programmer needs to look at the circumstances and choose the option that fits his/her case better.
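For example, here is a minimal sketch showing why CopyOnWriteArrayList avoids the ConcurrentModificationException in the first place: its iterator works on a snapshot of the underlying array, so a concurrent add never disturbs an iteration that is already in progress.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowIterationDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        Thread writer = new Thread(() -> list.add("c"));   // concurrent add

        // The for-each below iterates over a snapshot taken when the iterator
        // was created, so the concurrent add can never throw
        // ConcurrentModificationException.
        for (String s : list) {
            if ("a".equals(s)) writer.start();
            System.out.println(s);
        }
        writer.join();
        System.out.println("final size = " + list.size()); // 3, once the writer is done
    }
}
```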
I am currently designing a big in-memory index structure (several gigabytes). The index is actually an RTree whose leaves are BTrees (don't ask). It supports a special kind of query and pushes it to the logical limit.
Since those nodes are solely search nodes, I am asking myself how best to make the structure parallel.
I know of six solutions so far:
Block reads when a write is scheduled. The tree is completely blocked until the last read has finished; then the write is performed, and afterwards the tree can again be used for multiple reads (which need no locking among themselves).
Clone the nodes that change, reuse the existing nodes (including leaves), and switch between the two versions by briefly stopping reads, switching, and resuming. Since leaf pointers must also be altered, the leaf pointers might become their own collection, making it possible to switch modifications atomically; changes can then be redone on a second version to avoid copying the pointers on each insert.
Use independent copies of the index, like double buffering. Update one copy of the index, then switch to it. Once no one is reading the old index anymore, apply the same change to it. This way the change can be done without blocking existing reads, and if another insert hits the tree within a reasonable amount of time, it can be applied in the same way.
Use a serial, share-nothing architecture so that each search thread has its own copy. Since a thread can only alter its tree between reads, this would also be lock-free and simple. Because reads are spread evenly over the worker threads (each bound to a certain core), throughput would not be harmed.
Use read/write locks on each node that is about to be written, and only block a subtree during the write. This involves additional operations against the tree, since splitting and merging propagate upwards and therefore require a re-pass of the insert (expanding locks upwards, parent-wise, would introduce the chance of a deadlock). Since splits and merges are not that frequent with a larger page size, this would also be a good approach. Actually, my current BTree implementation uses a similar mechanism: it splits a node and reinserts the value until no split is needed (which is not optimal, but simpler).
Use a double buffer for each node, like the shadow cache in databases where each page is switched between two versions. So every time a node is modified, a copy is modified instead, and a read uses either the old version or the new one. Each node gets a version number, and the version closest to the active version (the latest change) is chosen. To switch between the two versions, only an atomic change to the root information is needed; this way the tree can be altered and used at the same time. The switch can be done at any time, but it must be ensured that no read is still using the old version when it is overwritten. This method has the potential not to interfere with the cache locality used to link leaves and the like. But it also requires twice the amount of memory, since a back buffer must be present; in exchange it saves allocation time and might be good for a high frequency of changes.
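To make the double-buffering options a bit more concrete, the core trick is that readers only ever see a fully built version and the switch is a single atomic store of the root. A minimal sketch (the Index interface and the rebuild logic are placeholders, not the actual RTree/BTree code):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of "publish a new version atomically, readers keep using the old one".
final class VersionedIndex<Q, R> {

    interface Index<Q, R> { R search(Q query); }   // placeholder for the real tree

    private final AtomicReference<Index<Q, R>> current;

    VersionedIndex(Index<Q, R> initial) {
        current = new AtomicReference<>(initial);
    }

    // Readers never block: they use whatever version is published right now.
    R search(Q query) {
        return current.get().search(query);
    }

    // A writer builds the next version off to the side and swaps the root in
    // one CAS; in-flight readers simply finish on the old version.
    void update(UnaryOperator<Index<Q, R>> rebuild) {
        for (;;) {
            Index<Q, R> old = current.get();
            Index<Q, R> next = rebuild.apply(old);
            if (current.compareAndSet(old, next)) return;
        }
    }
}
```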
With all those thoughts, what is best? I know it depends, but what is done in the wild? If there are 10 read threads (or even more) being blocked by a single write operation, I guess that is not something I really want.
Also, what about the L3, L2, and L1 caches, and scenarios with multiple CPUs? Any issues there? The beauty of double buffering is the chance that reads hitting the old version are still working with the correctly cached version.
The variant that creates a fresh copy of a node is not very appealing. So what is actually found in the wild of today's database landscapes?
[update]
Rereading the post, I wonder whether the write locks for split and merge would be better replaced by creating replacement nodes, since for a split and a merge I need to copy about half of the elements around anyway. Those operations are very rare, so completely copying a node would do the trick: the new node replaces the old one in its parent, which is a simple and fast operation. This way the actual blocking of reads would be very limited, and since we create copies anyway, blocking only happens when the new nodes are swapped in. Since leaves may not be altered during those accesses, this is unimportant, because the information density has not changed. But again, this requires incrementing and decrementing a read lock and checking for intended write locks on every access of a node. All of this is overhead, and all of it blocks further reads.
[Update2]
Solution 7. (currently favored)
Currently we favor a double buffer for the internal (non-leaf) nodes and use something similar to row locking.
The logical tables that we try to decompose using this index structure (which is all an index does) lead us to apply set algebra to that information. I noticed that this set algebra is linear (O(m+n) for intersection and union), which gives us the chance to lock each entry that is part of such an operation.
By double buffering the internal nodes (which is neither hard to implement nor expensive, at less than about 1% memory overhead), we can live with that issue without blocking too many read operations.
Since we batch modifications in a certain way, it is quite rare for a given column to be updated; but once it is, it takes more time, since the modifications for that single entry might number in the thousands.
So the goal is to adapt the set algebra so that columns currently being modified are simply intersected later on. Since only one column is modified at a time, such an operation would only block once, and for everyone currently reading it, the write operation has to wait. And guess what: once a write operation waits, it usually lets another write operation on a column that is not busy take place. We calculate the probability of such a block to be very, very low, so we don't need to care.
The locking mechanism works like this: check for a write, check for a write intention, add a read, check for a write again, and proceed with the read. So there is no explicit object locking. We access fixed areas of bytes, and once the structure is settled, everything critical is planned to move into a C++ version to make it somewhat faster (2x, we guess, and that only takes one person one or two weeks, especially if you use a Java-to-C++ translator).
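In plain Java terms (this is only an illustration of the sequence above, not our actual byte-level code; the names are invented), the per-column read/write protocol would look roughly like this:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Rough illustration of: check write, check write intention, add read, check again.
final class ColumnLock {
    private final AtomicBoolean writeActive = new AtomicBoolean(false);
    private final AtomicBoolean writeIntent = new AtomicBoolean(false);
    private final AtomicInteger readers = new AtomicInteger(0);

    void beginRead() {
        for (;;) {
            // check for a write / a write intention (cheap pre-check)
            if (writeActive.get() || writeIntent.get()) { Thread.onSpinWait(); continue; }
            readers.incrementAndGet();                        // add the read
            if (!writeActive.get() && !writeIntent.get()) {   // check for a write again
                return;                                       // proceed with the read
            }
            readers.decrementAndGet();   // a writer announced itself meanwhile; back off
        }
    }

    void endRead() {
        readers.decrementAndGet();
    }

    void beginWrite() {
        // announce the intention first (one writer per column at a time) ...
        while (!writeIntent.compareAndSet(false, true)) Thread.onSpinWait();
        // ... then wait for the readers that got in before the announcement.
        while (readers.get() > 0) Thread.onSpinWait();
        writeActive.set(true);
    }

    void endWrite() {
        writeActive.set(false);
        writeIntent.set(false);
    }
}
```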
The only remaining effect that matters might be the caching issue, since writes invalidate L1 caches and maybe L2 as well. So we plan to collect all modifications to such a table/index and schedule them to run within a time share of one or more minutes, evenly distributed so as not to create a system with performance hiccups.
If you know of anything that would help us, please go ahead.
Since no one replied, I would like to summarize what we (I) finally did. The structure is now separated: we have an RTree whose leaves are actually tables. Those tables can even be remote, so we have a distribution scheme that is mostly transparent thanks to RMI and proxies.
The rest was simply easy. The RTree has a way to advise a table to split, and the result of that split is again a table. The split is done on a single machine and transferred to another one if it has to be remote. Merge works almost the same way.
The same remoting approach applies to threads bound to different CPUs, to avoid cache issues.
As for modification in memory, it is as I already suggested: we duplicate internal nodes, turned the table 90°, and adapted the set-algebra algorithms to handle locked columns efficiently. The check for a single table is simple and, compared to the thousands of entries per column, not a performance issue after all. Deadlocks are also impossible, since only one column is used at a time, so there is only one lock per thread. We are experimenting with processing columns in parallel, which would improve response time. We are also thinking about binding columns to a given virtual core so that there is no locking at all, since each column is in isolation, and once again the modifications can be serialized.
This way one can utilize 20 or more cores per CPU and also avoid cache misses.
Basically, I have some data structure of a ton of objects, and this structure will be accessed by multiple threads and will need to account for that.
A lot of iteration and object manipulation will need to be performed constantly (each main loop iteration can result in every single object in the data structure being modified in a worst case, nothing modified in best/normal case).
Currently, I am using a CopyOnWriteArrayList as my structure. Additionally, on each iteration, I make sure not to add duplicates, in an attempt to keep the size of the list down.
Using locks/synchronized is not ideal as I want to avoid holding up the threads for these operations.
As far as I can tell, my options for this are as follows:
Run a contains() check for each element to be added
Create a HashSet from the list and convert it back (essentially removing all duplicates)
Use a ConcurrentHashMap instead of a list for the data structure
Something else?
I am aware that ArrayLists are much better for iteration, while object manipulation and duplicate checking are better handled by a HashMap. Since my case needs both, I'm wondering what the best solution is here.
I should also mention that the ordering of the elements is a non-issue.
Edit: To clarify this further, the collection will constantly have elements added, removed, and modified. To what degree depends on each specific run (driven by generally random events), so I'm cautious about making any assumptions about how often it will occur. The only thing that is guaranteed to happen is that the collection will be iterated through completely each time, performing multiple checks on each element.
This answer addresses your concurrency concerns:
A lot of iteration and object manipulation will need to be performed constantly (each main loop iteration can result in every single object in the data structure being modified in worst case, nothing modified in best/normal case).
Will the collection itself be modified? If not, just choose whichever collection makes the most sense and synchronize on the objects. Once they are inside the collection, you get no synchronization benefits from CopyOnWriteArrayList or ConcurrentHashMap.
If the collection will be modified, the follow-up question is: how often?
If a lot, do not use a CopyOnWriteArrayList. If only a little, then choose based on the best search performance.
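As a concrete illustration of the non-copy-on-write route, here is a small sketch using the standard ConcurrentHashMap-backed set view: duplicates are rejected by add() itself, and iteration is weakly consistent, so it never throws ConcurrentModificationException.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    public static void main(String[] args) {
        // A thread-safe Set backed by a ConcurrentHashMap.
        Set<String> items = ConcurrentHashMap.newKeySet();

        items.add("a");
        items.add("a");   // duplicate, silently ignored: no separate contains() pass needed
        items.add("b");

        // Weakly consistent iteration: no ConcurrentModificationException;
        // concurrent additions may or may not be seen by this loop.
        for (String s : items) {
            System.out.println(s);
        }
        System.out.println("size = " + items.size());   // 2
    }
}
```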
All,
I have been going through a lot of sites that post about the performance of various Collection classes for various actions, i.e. adding an element, searching, and deleting. But I also notice that all of them describe different environments in which the tests were conducted, i.e. OS, memory, threads running, etc.
My question is whether there is any site/material that provides the same performance information based on a fair test environment, i.e. one where the configuration is not an issue or a catalyst for poor performance of any specific data structure.
[Updated]: For example, HashSet and LinkedHashSet both have a complexity of O(1) for inserting an element. However, Bruce Eckel's test claims that insertion is going to take more time for LinkedHashSet than for HashSet [http://www.artima.com/weblogs/viewpost.jsp?thread=122295]. So should I still go by the Big-O notation?
Here are my recommendations:
First of all, don't optimize :) Not that I am telling you to design crap software, but just to focus on design and code quality more than premature optimization. Assuming you've done that, and now you really need to worry about which collection is best beyond purely conceptual reasons, let's move on to point 2
Really, don't optimize yet (roughly stolen from M. A. Jackson)
Fine. So your problem is that even though you have theoretical time complexity formulas for best cases, worst cases and average cases, you've noticed that people say different things and that practical settings are a very different thing from theory. So run your own benchmarks! You can only read so much, and while you do that your code doesn't write itself. Once you're done with the theory, write your own benchmark - for your real-life application, not some irrelevant mini-application for testing purposes - and see what actually happens to your software and why. Then pick the best algorithm. It's empirical, it could be regarded as a waste of time, but it's the only way that actually works flawlessly (until you reach the next point). A minimal benchmark sketch is shown after these recommendations.
Now that you've done that, you have the fastest app ever. Until the next update of the JVM. Or of some underlying component of the operating system your particular performance bottleneck depends on. Guess what? Maybe your clients have different ones. Here comes the fun: you need to be sure that your benchmark is valid for others or in most cases (or have fun writing code for different cases). You need to collect data from users. LOTS. And then you need to do that over and over again to see what happens and if it still holds true. And then rewrite your code accordingly, over and over again. (The now-terminated Engineering Windows 7 blog is actually a nice example of how user data collection helps to make educated decisions to improve user experience.)
Or you can... you know... NOT optimize. Platforms and compilers will change, but a good design should - on average - perform well enough.
Other things you can also do:
Have a look at the JVM's source code. It's very educational, and you'll discover a host of hidden things (I'm not saying that you have to use them...)
See that other thing on your TODO list that you need to work on? Yes, the one near the top but that you always skip because it's too hard or not fun enough. That one right there. Well get to it and leave the optimization thingy alone: it's the evil child of a Pandora's Box and a Moebius band. You'll never get out of it, and you'll deeply regret you tried to have your way with it.
That being said, I don't know why you need the performance boost so maybe you have a very valid reason.
And I am not saying that picking the right collection doesn't matter. Just that once you know which one to pick for a particular problem, and you've looked at the alternatives, then you've done your job without having to feel guilty. Collections usually have a semantic meaning, and as long as you respect it you'll be fine.
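For what it's worth, the benchmark mentioned above does not have to be elaborate to beat someone else's numbers for your own workload. A crude sketch (JMH would be the rigorous choice; the classes and sizes below are just examples):

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Crude illustration only; a serious benchmark should use JMH (warmup, forks, etc.).
public class InsertBenchmark {
    private static long timeInsert(Set<Integer> set, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (int run = 0; run < 5; run++) {   // repeat so the JIT gets a chance to warm up
            long hash   = timeInsert(new HashSet<>(), n);
            long linked = timeInsert(new LinkedHashSet<>(), n);
            System.out.printf("run %d: HashSet %d ms, LinkedHashSet %d ms%n",
                    run, hash / 1_000_000, linked / 1_000_000);
        }
    }
}
```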
In my opinion, all you need to know about a data structure is the Big-O of the operations on it, not subjective measures from different architectures. Different collections serve different purposes.
Maps are dictionaries
Sets assert uniqueness
Lists provide grouping and preserve iteration order
Trees provide cheap ordering and quick searches on dynamically changing contents that require constant ordering
Edited to include bwawok's statement on the use case of tree structures
Update
From the javadoc on LinkedHashSet
Hash table and linked list implementation of the Set interface, with predictable iteration order.
...
Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
Now we have moved from the very general case of choosing an appropriate data-structure interface to the more specific case of which implementation to use. However, we still ultimately arrive at the conclusion that specific implementations are well suited for specific applications based on the unique, subtle invariant offered by each implementation.
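To make that invariant tangible, a small sketch: the two sets hold the same elements, but only the LinkedHashSet promises to iterate in insertion order.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Set<String> hash = new HashSet<>();
        Set<String> linked = new LinkedHashSet<>();
        for (String s : new String[] {"delta", "alpha", "charlie", "bravo"}) {
            hash.add(s);
            linked.add(s);
        }
        System.out.println(hash);    // order depends on hash codes and capacity
        System.out.println(linked);  // always [delta, alpha, charlie, bravo]
    }
}
```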
What do you need to know about them, and why? The reason that benchmarks show a given JDK and hardware setup is so that they could (in theory) be reproduced. What you should get from benchmarks is an idea of how things will work. For an ABSOLUTE number, you will need to run it vs your own code doing your own thing.
The most important thing to know is the Big O runtime of various collections. Knowing that getting an element out of an unsorted ArrayList is O(n), but getting it out of a HashMap is O(1) is HUGE.
If you are already using the correct collection for a given job, you are 90% of the way there. The times when you need to worry about how fast you can, say, get items out of a HashMap should be pretty darn rare.
Once you leave single-threaded land and move into multi-threaded land, you will need to start worrying about things like ConcurrentHashMap vs Collections.synchronizedMap. Until you are multi-threaded, you can just not worry about this kind of stuff and focus on which collection to use for which purpose.
Update to HashSet vs LinkedHashSet
I haven't ever found a use case where I needed a LinkedHashSet (because if I care about order I tend to have a List, and if I care about O(1) gets, I tend to use a HashSet). Realistically, most code will use ArrayList, HashMap, or HashSet. If you need anything else, you are in an "edge" case.
The different collection classes have different big-O performances, but all that tells you is how they scale as they get large. If your set is big enough the one with O(1) will outperform the one with O(N) or O(logN), but there's no way to tell what value of N is the break-even point, except by experiment.
Generally, I just use the simplest possible thing, and then if it becomes a "bottleneck", as indicated by operations on that data structure taking a significant percentage of the time, I will switch to something with a better big-O rating. Quite often, either the number of items in the collection never comes near the break-even point, or there's another simple way to resolve the performance problem.
Both HashSet and LinkedHashSet have O(1) performance. The same goes for HashMap and LinkedHashMap (in fact, the Set versions are implemented on top of the Map versions). This only tells you how these algorithms scale, not how they actually perform. In this case, LinkedHashSet does all the same work as HashSet but also always has to update a previous and a next pointer to maintain the order. This means that the constant factor (an important value when talking about actual algorithm performance) for HashSet is lower than for LinkedHashSet.
Thus, since these two have the same Big-O, they scale essentially the same: as n changes, both see the same change in performance, and with O(1) the performance, on average, does not change.
So now your choice is based on functionality and your requirements (which really should be what you consider first anyway). If you only need fast add and get operations, you should always pick HashSet. If you also need consistent ordering - such as last accessed or insertion order - then you must also use the Linked... version of the class.
I have used the "linked" class in production applications; well, LinkedHashMap. I used it in one case for a symbol-table-like structure, so I wanted quick access to the symbols and their related information. But I also wanted to output the information, in at least one context, in the order that the user defined those symbols (insertion order). This makes the output friendlier for the user, since they can find things in the same order that they were defined.
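A sketch of that kind of use (the symbol names are made up): lookups stay O(1), and iteration reports the entries in the order the user defined them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SymbolTableDemo {
    public static void main(String[] args) {
        // Quick access by name, plus iteration in definition (insertion) order.
        Map<String, Integer> symbols = new LinkedHashMap<>();
        symbols.put("WIDTH", 800);
        symbols.put("HEIGHT", 600);
        symbols.put("DEPTH", 32);

        System.out.println("HEIGHT = " + symbols.get("HEIGHT"));   // fast lookup

        // Output in the same order the user defined the symbols.
        for (Map.Entry<String, Integer> e : symbols.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```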
If I had to sort millions of rows I'd try to find a different way. Maybe I could improve my SQL, improve my algorithm, or perhaps write the elements to disk and use the operating system's sort command.
I've never had a case where collections were the cause of my performance issues.
I created my own experiment with HashSets and LinkedHashSets. For add() and contains() the running time is O(1), not taking a lot of collisions into consideration. In the add() method for my linked hash set, I put the object into a user-created hash table, which is O(1), and then put the object into a separate linked list to account for order. So to remove an element from that linked hash set, you must find the element in the hash table and then search through the linked list that holds the order. The running time is therefore O(1) + O(n), which is O(n) for remove().
I have specific requirements for the data structure to be used in my Java program. It should be able to hold large amounts of data (not of fixed size); my main operations are adding at the end and deleting/reading from the beginning (LinkedList looks good so far). But occasionally I also need to delete from the middle, and that is where LinkedList is so painful. Can anyone suggest a way around this, or any optimizations through which I can make deletion from a LinkedList less painful?
Thanks for the help!
A LinkedHashMap may suit your purpose
You'd use an iterator to pull stuff from the front
and look up the entry by key when you need to access the middle of the list
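Something along these lines (a sketch; what you use as the key is up to you):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class FrontAndMiddleDemo {
    public static void main(String[] args) {
        Map<Long, String> map = new LinkedHashMap<>();
        map.put(1L, "first");
        map.put(2L, "middle");
        map.put(3L, "last");     // add at the end

        map.remove(2L);          // delete from the middle by key, no list traversal

        // Pull from the front with an iterator (insertion order is preserved).
        Iterator<Map.Entry<Long, String>> it = map.entrySet().iterator();
        if (it.hasNext()) {
            Map.Entry<Long, String> head = it.next();
            System.out.println("took from front: " + head.getValue());
            it.remove();
        }
        System.out.println(map); // {3=last}
    }
}
```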
LinkedList falls down on random accesses. Deletion, without the random access look up, is constant time and so really not too bad for long lists.
ArrayList is generally fast. Inserts and removes from the middle are faster than you might expect because block memory moves are surprisingly fast. Removals and insertions near the start do cause all the following data to be moved down or up.
ArrayDeque is like ArrayList only it uses a circular buffer and has a strange interface.
Usual advice: try it.
You can try using a linked list with a pointer to every 10,000th element, so that you can reduce the time needed to find the middle element you wish to delete.
Here are some different variations of linked lists:
http://experimentgarden.blogspot.com/2009/08/performance-analysis-of-thirty-eight.html
LinkedHashMap is probably the way to go. Great for iteration, deque operations, and seeking into the middle. Costs extra in memory, though, as you'll need to manage a set of keys on top of your basic collection. Plus I think it'll leave 'gaps' in the spaces you've deleted, leading to a non-consecutive set of keys (shouldn't affect iteration, though).
Edit: Aha! I know what you need: A LinkedMultiSet! All the benefit of a LinkedHashMap, but without the superfluous key set. It's only a little more complex to use, though.
First you need to consider whether you will delete from the center of the list often compared to the length of the list. If your list has N items but you delete much less often than 1/N, don't worry about it. Use LinkedList or ArrayDeque as you prefer. (If your lists are occasionally huge and then shrink, but are mostly small, LinkedList is better as it's easy to recover the memory; otherwise, ArrayDeque doesn't need extra objects, so it's a bit faster and more compact--except the underlying array never shrinks.)
If, on the other hand, you delete quite a bit more often than 1/N, then you should consider a LinkedHashSet, which maintains a linked list queue on top of a hash set--but it is a set, so keep in mind that you can't store duplicate elements. This has the overhead of LinkedList and ArrayDeque put together, but if you're doing central deletes often, it'll likely be worth it.
The optimal structure, however--if you really need every last ounce of speed and are willing to spend the coding time to get it--would be a "resizable" array (i.e. reallocated when it was too small) with a circular buffer where you could blank out elements from the middle by setting them to null. (You could also reallocate the buffer when too much was empty if you had a perverse use case then.) I don't advise coding this unless you either really enjoy coding high-performance data structures or have good evidence that this is one of the key bottlenecks in your code and thus you really need it.
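For illustration only, a very rough sketch of that last idea (untested; no shrinking or compaction policy is shown, and null elements are not supported):

```java
// Untested sketch of a circular buffer that "blanks out" middle elements with null.
class NullableRingDeque<E> {
    private Object[] buf = new Object[16];
    private int head = 0;   // index of the first slot in use
    private int size = 0;   // number of slots in use, including blanked ones

    void addLast(E e) {
        if (size == buf.length) grow();
        buf[(head + size) % buf.length] = e;
        size++;
    }

    @SuppressWarnings("unchecked")
    E pollFirst() {
        while (size > 0) {
            Object e = buf[head];
            buf[head] = null;
            head = (head + 1) % buf.length;
            size--;
            if (e != null) return (E) e;   // skip slots blanked by removeMiddle()
        }
        return null;
    }

    // "Delete from the middle" by blanking the slot instead of shifting elements.
    void removeMiddle(E e) {
        for (int i = 0; i < size; i++) {
            int idx = (head + i) % buf.length;
            if (e.equals(buf[idx])) { buf[idx] = null; return; }
        }
    }

    private void grow() {
        Object[] bigger = new Object[buf.length * 2];
        for (int i = 0; i < size; i++) bigger[i] = buf[(head + i) % buf.length];
        buf = bigger;
        head = 0;
    }
}
```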