Sorry if this was asked before, but I could not find my exact scenario.
Currently I have a background thread that adds an element to a list and removes the old data every few minutes. In theory there can be at most 2 items in the list at a time, and the items are immutable. I also have multiple threads that grab the first element in the list whenever they need it. In this scenario, is it necessary to explicitly serialize operations on the list? My assumption is that since I am just grabbing references to the elements, it should not matter if the background thread deletes elements from the list, because the reading thread has already grabbed a copy of the reference before the deletion. There is probably a better way to do this. Thanks in advance.
Yes, synchronization is still needed here, because adding and removing are not atomic operations. If one thread calls add(0, new Object()) at the same time another calls remove(0), the result is undefined; for example, the remove() might end up having no effect.
Depending on your usage, you might be able to use a non-blocking list class like ConcurrentLinkedQueue. However, given that you are pushing one change every few minutes, I doubt you are gaining much in performance by avoiding synchronization.
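A minimal sketch of the ConcurrentLinkedQueue idea, assuming a single background writer and immutable items as described in the question. The class and method names (LatestHolder, publish, current) are hypothetical and only serve the illustration. The writer appends the new item before dropping the old one, so readers never observe an empty queue once the first item has been published.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical holder: one writer thread publishes, many reader threads peek. */
class LatestHolder<T> {
    private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<>();

    /** Assumption: called only by the single background thread. */
    void publish(T newest) {
        items.offer(newest);          // briefly two items: old head, new tail
        while (items.size() > 1) {
            items.poll();             // drop the old head; the new item becomes the head
        }
    }

    /** Safe to call from any thread; returns null before the first publish. */
    T current() {
        return items.peek();
    }

    public static void main(String[] args) {
        LatestHolder<String> holder = new LatestHolder<>();
        holder.publish("first");
        holder.publish("second");
        System.out.println(holder.current());
    }
}
```

With only one writer and immutable items, a single volatile field or an AtomicReference holding the newest item would give readers the same guarantee with less machinery.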
Related
I'm looking for a fast Set or Map implementation that has a weaker thread-safety in favor of speed.
The idea is to have a data structure that can quickly be checked whether it contains a (Long) entry at best without thread-synchronization. It is okay if a new entry that is written by another thread becomes visible to the other threads at a later time.
I already know that the non-thread-safe standard HashSet may corrupt its internal data structure while a new element is being inserted, so that a reader thread can end up in an endless loop during lookup.
I also know that whenever the writing methods are using synchronized blocks, all reader methods should be synchronized as well in a multi-threaded implementation.
So my ultimate goal is to find a possibility to insert in O(1) and lookup in O(1) where
the inserts might get queued in some way for bulk insert at a sync-point (if there is no other possibility)
the read does not get stuck but should not need to wait (for any writers)
the inserted element should be visible to any subsequent reads of the thread that added the element (which might prevent the aforementioned queue)
I am experimenting with Longs that represent hash-codes mapping to Lists of usually one, sometimes two or more entries.
Is there a way to achieve this e.g. via an array and compare-and-exchange and which is faster than using the ConcurrentHashMap?
What would a sketched implementation look like, given that the input consists of node IDs (type Long) of a graph that is traversed by multiple threads, which somehow exchange information about which nodes have already been visited (as described in the list above)?
I really appreciate any comments and ideas,
thanks in advance!
Edit: Added extended information on the actual task that I am doing some hobby-research on and which led me to asking the question here in the forum.
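One way the "array plus compare-and-exchange" idea could look is an insert-only, fixed-capacity open-addressing table over an AtomicLongArray. This is only a sketch under loud assumptions: the class name VisitedSet is hypothetical, 0 is assumed never to be a valid node ID (it marks an empty slot), the capacity is a power of two chosen up front, and there is no resizing or removal. Whether it actually beats ConcurrentHashMap would have to be measured.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical insert-only set of non-zero long keys using CAS and linear probing. */
class VisitedSet {
    private static final long EMPTY = 0L; // assumption: 0 is never a valid node ID
    private final AtomicLongArray slots;
    private final int mask;

    VisitedSet(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new AtomicLongArray(capacityPowerOfTwo);
        mask = capacityPowerOfTwo - 1;
    }

    private int indexFor(long key) {
        int h = Long.hashCode(key);
        h ^= (h >>> 16);
        return h & mask;
    }

    /** Returns true if this call inserted the key, false if it was already present. */
    boolean add(long key) {
        if (key == EMPTY) throw new IllegalArgumentException("0 is reserved");
        int i = indexFor(key);
        for (int probes = 0; probes <= mask; ) {
            long current = slots.get(i);
            if (current == key) return false;
            if (current == EMPTY) {
                if (slots.compareAndSet(i, EMPTY, key)) return true;
                continue; // lost a race for this slot; re-read it before moving on
            }
            i = (i + 1) & mask;
            probes++;
        }
        throw new IllegalStateException("table full");
    }

    /** Never blocks: probes until the key or an empty slot is found. */
    boolean contains(long key) {
        int i = indexFor(key);
        for (int probes = 0; probes <= mask; probes++) {
            long current = slots.get(i);
            if (current == key) return true;
            if (current == EMPTY) return false;
            i = (i + 1) & mask;
        }
        return false;
    }

    public static void main(String[] args) {
        VisitedSet visited = new VisitedSet(1024);
        System.out.println(visited.add(42L));      // first visit
        System.out.println(visited.add(42L));      // already visited
        System.out.println(visited.contains(7L));
    }
}
```

Because slots are never cleared, a reader that reaches an empty slot can safely conclude the key is absent at that moment, and the volatile semantics of AtomicLongArray make an insert visible to any read that comes after it.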
I have a Java program using a CopyOnWriteArrayList that is being iterated through by an update() method in a thread that runs on a certain time interval and that processes each item in the list. The items in the list are not thread-safe, so I'm processing modifications to the items using various CopyOnWriteArrayList buffers that are used to know which items to modify and how when the update() method iterates through the list.
The issue is that once I've used those buffers, I clear() them, but am afraid that an item may have been added to the buffer between when the buffer is used and when it's cleared.
For example:
mainList.addAll(bufferAddList);
bufferAddList.clear();
Example 2:
for (Item item : mainList) { // Main iterator loop
if (bufferModItemList.contains(item)) {
item.modify();
}
// Do other stuff
}
bufferModItemList.clear();
I'm afraid that using synchronized around the code blocks that read/modify those lists would cause the main thread to lock up (which is where calls to modifications would come from), because the processing that happens during iteration through the items takes a while (Python scripts are called and waited on). That's why I'm using the buffers in the first place.
Is there a better way to do this than either of the ways I've mentioned?
Note that both examples show code that would run in the update() method, which is in the thread's "infinite" while loop.
UPDATE
It appears that using ConcurrentLinkedQueue will satisfy the issue with Example 2. However, it seems as though using it for the first example would be overkill, since I essentially just want to add all the nodes to the mainList. It would be great if ConcurrentLinkedQueue had a .pollAll() method!
Also, for the following example, I'd have to iteratively remove items from mainList, which I don't want to do since it's a CopyOnWriteArrayList. Although I'm not sure it needs to be one any more...
Example 3:
mainList.removeAll(bufferRemoveList);
bufferRemoveList.clear();
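The missing .pollAll() can be approximated with a small helper that drains the queue through repeated poll() calls. This is a sketch, not part of the JDK: the helper name pollAll and the class BufferDrain are hypothetical. Anything another thread adds after the loop has seen an empty queue simply stays in the queue for the next update() pass, so nothing is lost between "use" and "clear".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

class BufferDrain {
    /** Hypothetical helper: removes and returns everything currently in the queue. */
    static <T> List<T> pollAll(Queue<T> buffer) {
        List<T> drained = new ArrayList<>();
        T item;
        while ((item = buffer.poll()) != null) {
            drained.add(item); // each element is removed exactly once
        }
        return drained;
    }

    public static void main(String[] args) {
        Queue<String> bufferAddQueue = new ConcurrentLinkedQueue<>();
        List<String> mainList = new CopyOnWriteArrayList<>();
        bufferAddQueue.add("item1");
        bufferAddQueue.add("item2");
        // One bulk call on the CopyOnWriteArrayList instead of one add per element
        mainList.addAll(pollAll(bufferAddQueue));
        System.out.println(mainList);
    }
}
```

The same helper would fit Example 3 as mainList.removeAll(pollAll(bufferRemoveQueue)), which keeps the removal a single bulk call on mainList.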
I have a class which has:
2 fields holding time-ordered lists (list1, list2).
3 read-only methods which iterate over the above lists to generate summary statistics.
1 mutating method, which looks for a match of given 'new-item' in list1. If match is not found, it adds 'new-item' to list1. If match is found, it removes the match from list1 and adds both match and 'new-item' to list2.
Let's assume that multiple concurrent invocations of all methods are possible. I need to achieve thread-safety while maximising performance.
Approach1 (extremely slow) - Declare field-types as ArrayList and use the synchronized keyword on all methods.
Approach2 - Declare field-type as CopyOnWriteArrayList and synchronise the mutating method.
Questions
Does Approach2 ensure thread-safety?
Are there better alternatives?
Do you need the random access offered by an ArrayList? Can you instead use a thread-safe ordered collection like ConcurrentSkipListSet (non-blocking) or PriorityBlockingQueue (blocking)? Both have O(log n) insertions.
Mutation methods in both cases are thread-safe.
Edit: Just note, you would still run into atomicity concerns. If you need the adds to be done atomically, then you would need coarser locking.
Approach number 2 does not guarantee thread-safety.
The two operations on collections are not atomic: first you remove an item, then you add it to the other collection. Some thread might in the meantime execute a read-only method to find out that the item is missing in list 1, and is not yet added to the list 2. It depends on your application whether this is acceptable.
On the other hand, it is also possible that a read-only method first iterates through list 1 and finds that it contains item x; in the meantime the updating method executes and transfers item x; the read-only method then continues and iterates through list 2, in which it finds item x one more time. Again, it depends on your application whether this is acceptable.
Other solutions are possible, but that would require more details about what you are trying to achieve exactly.
One obvious way would be to modify approach number 1, and instead of using synchronized on every method, use a readers-writer lock. You would read-lock in every read-only method and write-lock in the mutating one.
You could also use two separate readers-writer locks. One for the first collection and one for the other. If your read-only methods iterate through both of the lists, they would have to read-acquire both of the locks up front, before doing anything. On the other hand the mutating method would have to first write-acquire the first lock, and if it wishes to transfer an item, then it should write-acquire the second lock.
You'd need to do some testing to see if it works nicely for you. Still there are definitely even better ways to handle it, but you'd need to provide more details.
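The single readers-writer lock variant could be sketched as follows. The class name PairingLists, the method names, and the use of equals() to decide what counts as a "match" are assumptions made for the illustration; the question does not say how matching works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: two plain ArrayLists guarded by one readers-writer lock. */
class PairingLists<T> {
    private final List<T> list1 = new ArrayList<>();
    private final List<T> list2 = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Read-only: many threads may hold the read lock at the same time. */
    int unmatchedCount() {
        lock.readLock().lock();
        try {
            return list1.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Read-only: sees both lists in a mutually consistent state. */
    int pairedCount() {
        lock.readLock().lock();
        try {
            return list2.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Mutating: the find, remove, and add steps happen as one atomic unit. */
    void submit(T newItem) {
        lock.writeLock().lock();
        try {
            int index = list1.indexOf(newItem); // assumption: "match" means equals()
            if (index < 0) {
                list1.add(newItem);
            } else {
                T match = list1.remove(index);
                list2.add(match);
                list2.add(newItem);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        PairingLists<String> lists = new PairingLists<>();
        lists.submit("x");
        lists.submit("x");
        System.out.println(lists.unmatchedCount() + " unmatched, " + lists.pairedCount() + " paired");
    }
}
```

Because the transfer runs entirely under the write lock, no reader can observe the item missing from both lists or present in both.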
The time it takes to acquire an uncontended lock on a method is less than a microsecond. If a fraction of a microsecond matters, you might consider something more complex, but otherwise something simple is usually better.
Just using a thread-safe collection is not enough when you perform multiple operations, e.g. removing from one list and adding to another are two operations, and any number of threads can get in between those operations.
Note: if you do lots of updates this can be slower.
Project background aside, I've implemented a table of custom JComboBoxes. Each row of ComboBoxes is exclusive: while each ComboBox has its own model (to allow different selections), each choice can only be selected once per row. This is done by adding a tag to the front of an item when selected and removing it again when deselected. If a user tries to select a tagged item, nothing happens.
However, this only works when using a Vector as the backing for the list of options. I can get the Vector of strings, use either set() or setElementAt(), and boom presto it works.
With an ArrayList instead of a Vector, however, this doesn't work at all. I was under the impression that ArrayLists functioned similarly in that I can retrieve an anonymous ArrayList, change its contents, and all other objects relying on the contents of that ArrayList will update accordingly, just like the Vector implementation does.
I was hoping someone could tell me why this is different, as both Vector and ArrayList implement List and supposedly should have similar behavior.
EDIT:
Thanks for the prompt responses! All answers refer to synchronization disparities between ArrayList and Vector. However, my project does not explicitly create new threads. Is it possible that this is a synchronization issue between my data and the Swing thread? I'm not good enough with threads to know...
2nd EDIT:
Thanks again everybody! The synchronization between data and Swing answers my question readily enough, though I'd still be interested in more details if there's more to it.
I suspect the difference is due to Vector being thread-safe and ArrayList not. This affects the visibility of changes to its elements to different threads. When you change an element in a Vector, the change is guaranteed to be visible to any other thread that subsequently calls a method on the same Vector. (This is because its methods are synchronized using locks, which create a memory barrier, effectively synchronizing the current state of the thread's memory - including the latest changes in it - with that of other threads.) However, with ArrayList such synchronization does not automatically happen, thus the changes made by one thread may become visible to other threads only later (and in arbitrary order), or not at all.
Since Swing is inherently multithreaded, you need to ensure that data changes are visible between different (worker, UI) threads.
Vector is synchronized. It uses the synchronized keyword to ensure that all threads that access it see a consistent result. ArrayList is not synchronized. When one thread sets an element of an ArrayList there is no guarantee that another thread will see the update.
Access to Vector elements is synchronized, whereas it's not for an ArrayList. If you have different threads accessing and modifying the lists, you will see different behavior between the two.
I don't have time to test this code, and your code sample is still really light (a nice fully functional sample would be more helpful - I don't want to write a full app to test this) but I'm willing to bet that if you wrapped your call to 'setSelectDeselect' (as shown in your pastebin) like this then ArrayList would work as well as Vector:
Runnable selectRunnable = new Runnable()
{
public void run()
{
setSelectDeselect(cat, itemName, selected);
}
};
SwingUtilities.invokeLater(selectRunnable);
You're updating your ArrayList in the middle of event processing. The above code will defer the update until after the event is complete. I suspect there's something else at play here that would be apparent from reviewing the rest of your code.
My program has 100 threads.
Every single thread does this:
1) if arrayList is empty, add element with certain properties to it
2) if arrayList is not empty, iterate through the elements in arrayList; if a suitable element is found (matching certain properties), get it and remove it from the arrayList
The problem here is that while one thread is iterating through the arrayList, other 99 threads are waiting for the lock on arrayList.
What would you suggest to me if I want all 100 threads to work in lock-less condition? So they all have work to do?
Thanks
Have you looked at shared vs exclusive locking? You could use a shared lock on the list, and then have a 'deleted' property on the list elements. The predicate you use to check the list elements would need to make sure the element is not marked 'deleted' in addition to whatever other queries you have - also due to potential read-write conflicts, you would need to lock on each element as you traverse. Then periodically get an exclusive lock on the list to perform the deletes for real.
The read lock allows for a lot of concurrency on the list. The exclusive locks on each element of the list are not as nice, but you need to force the memory model to update your 'deleted' flag to each thread, so there's no way around that.
First if you're not running on a machine that has 64 cores or more your 100 threads are probably a performance hog in themselves.
Then an ArrayList for what you're describing is certainly not a good choice because removing an element does not run in amortized constant time but in linear time O(n). So that's a second performance hog. You probably want to use a LinkedList instead of your ArrayList (if you insist on using a List).
Now of course I doubt very much that you need to iterate over your complete list each time you need to find one element: wouldn't another data structure be more appropriate? Maybe the elements that you put in your list have such a concept as "equality" and hence a Map with an O(1) lookup time could be used instead?
That's just for a start: as I showed you, there are at least two serious performance issues in what you described... Maybe you should clarify your question if you want more help.
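If the "certain properties" can be reduced to a key with equality, the Map idea might look like the sketch below. Everything here is an assumption for illustration: the class name ClaimableItems, String keys and values, and the simplification that at most one element is stored per key. ConcurrentHashMap.remove(key) hands the element to exactly one caller, so no thread needs to hold a lock while searching.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: elements are looked up by a key derived from their properties. */
class ClaimableItems {
    private final ConcurrentMap<String, String> byProperty = new ConcurrentHashMap<>();

    /** Stores the item unless one with the same key is already waiting. */
    boolean offer(String propertyKey, String item) {
        return byProperty.putIfAbsent(propertyKey, item) == null;
    }

    /** Atomically removes and returns the matching item, or null if there is none. */
    String claim(String propertyKey) {
        return byProperty.remove(propertyKey);
    }

    public static void main(String[] args) {
        ClaimableItems items = new ClaimableItems();
        items.offer("red", "item-1");
        System.out.println(items.claim("red"));  // the stored item
        System.out.println(items.claim("red"));  // nothing left under this key
    }
}
```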
If your notion of "suitable element (matching certain properties)" can be encoded using a Comparator then a PriorityBlockingQueue would allow each thread to poll the queue, taking the next element without having to search the list or enqueuing a new element if the queue is empty.
Addendum: Thilo raises an essential point: As your approach evolves, you may want to determine empirically how many threads are optimal.
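A rough sketch of the PriorityBlockingQueue suggestion, assuming suitability can be expressed as a numeric score where lower is better. The Job type, its score field, and the class name SuitabilityQueue are hypothetical stand-ins for the real element and its "certain properties".

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

class SuitabilityQueue {
    /** Hypothetical element: a name plus a score standing in for the real properties. */
    static final class Job {
        final String name;
        final int score;
        Job(String name, int score) {
            this.name = name;
            this.score = score;
        }
    }

    // The comparator decides which element counts as "most suitable" (lowest score first)
    private final PriorityBlockingQueue<Job> queue =
            new PriorityBlockingQueue<>(11, Comparator.comparingInt((Job j) -> j.score));

    void add(Job job) {
        queue.add(job);
    }

    /** Takes the most suitable job; if the queue is empty, enqueues the fallback and returns null. */
    Job pollOrEnqueue(Job fallback) {
        Job best = queue.poll(); // non-blocking; null when the queue is empty
        if (best == null) {
            queue.add(fallback);
        }
        return best;
    }

    public static void main(String[] args) {
        SuitabilityQueue q = new SuitabilityQueue();
        q.add(new Job("slow", 5));
        q.add(new Job("fast", 2));
        System.out.println(q.pollOrEnqueue(new Job("spare", 9)).name);
    }
}
```

Note that poll() followed by add() is not one atomic step, so two threads that both find the queue empty will both enqueue their fallback; whether that is acceptable depends on the application.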
The key is to only use the object lock on arraylist when you actually need to.
A good idea would be to subclass arraylist and provide synchro on single read + write + delete processes.
This will ensure fine granularity with the locking while allowing the threads to run through the array list while protecting the semantics of the arraylist.
Have a single thread own the array and be responsible for adding to it and iterating over it to find work to do. Once a unit of work is found, put the work on a BlockingQueue. Have all your worker threads use take() to remove work from the queue.
This allows multiple units of work to be discovered per pass through the array and they can be handed off to waiting worker threads fairly efficiently.
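The hand-off described above could be sketched like this. The names (WorkHandOff, process), the use of Integer as the unit of work, "even numbers" as the stand-in for "suitable", and the poison-pill value used to stop the workers are all assumptions for the sake of a small runnable example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class WorkHandOff {
    private static final Integer STOP = Integer.MIN_VALUE; // assumed poison pill, never real work

    /** Returns how many units of work the workers processed. */
    static int process(List<Integer> source, int workerCount) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Integer unit = queue.take(); // blocks until work arrives
                        if (unit.equals(STOP)) return;
                        processed.incrementAndGet(); // placeholder for the real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        // Owner thread: the only one that ever iterates the source list, so it needs no lock
        for (Integer candidate : source) {
            if (candidate % 2 == 0) {   // hypothetical "suitable" test
                queue.add(candidate);   // unbounded queue, so add never blocks
            }
        }
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);            // one pill per worker
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList(1, 2, 3, 4, 5, 6), 3));
    }
}
```

The workers only ever contend on the queue, which is built for that, while the list itself stays confined to one thread.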