How to deep copy a hashmap when working with multiple threads - java

In my application I have two threads. Thread 1 is transferring data to Thread 2. After the data is transferred, the data in Thread 1 is cleared from within Thread 2. Thread 1 goes on its merry way placing more data in the HashMap as it comes in, to be transferred to Thread 2 later. In the meantime, Thread 2 does what it needs to do with the data. The code I have below is the part in Thread 2 where the data transfer between threads happens. The entire application works just fine, but my question is: is there a better way to make this copy of the Thread 1 data for Thread 2 without using the keyword new to create a whole new object?
I figure doing this might cause more garbage collections to occur. Should I worry about this?
synchronized(this){
// Make a copy of the data map then clear it.
cachedData = new HashMap<String,ArrayList<Float>>(data);
data.clear();
}

So if you are accessing this data HashMap from multiple threads then you will have to have a synchronized block on every access. Just because you are grabbing a cached copy here does not mean that other threads get to use the data without synchronization.
If you want to have concurrent usage of HashMap without having to synchronize around each usage then you should be using a ConcurrentHashMap.
The entire application works just fine, but my question is: is there a better way to make this copy of the Thread 1 data for Thread 2 without using the keyword new to create a whole new object?
Taking into account the cautions I mentioned above, if you want to take a snapshot of a HashMap so you can work with the contents in a specific thread, then the pattern you mention is fine and is often used. This pattern is also used when you need to iterate through a Collection and modify it inside the loop without using iterator.remove().
If you just need the keys or the values then make sure to take a copy of the data.keySet() or data.values() instead.
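As a sketch of the snapshot pattern (the class name SnapshotExample and the separate lock object are illustrative, not from the question), the code below copies the keys or values while holding the lock, so the caller can iterate the copy without it. Note that new HashMap<>(data), like the question's code, is a shallow copy: the ArrayList<Float> values are still shared with the original map. The deepSnapshot method also copies each inner list, which is what a true deep copy of this particular map needs (Float itself is immutable).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SnapshotExample {
    // Every access to 'data' must hold this lock.
    private final Object lock = new Object();
    private final Map<String, List<Float>> data = new HashMap<>();

    void put(String key, float value) {
        synchronized (lock) {
            data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    // Copy only the keys under the lock; iterate the returned set without it.
    Set<String> snapshotKeys() {
        synchronized (lock) {
            return new HashSet<>(data.keySet());
        }
    }

    // Copy only the values (shallow: the inner lists are still shared references).
    List<List<Float>> snapshotValues() {
        synchronized (lock) {
            return new ArrayList<>(data.values());
        }
    }

    // Deep copy: a new outer map AND new inner lists, so later changes
    // to the original lists cannot leak into the snapshot.
    Map<String, List<Float>> deepSnapshot() {
        synchronized (lock) {
            Map<String, List<Float>> copy = new HashMap<>();
            for (Map.Entry<String, List<Float>> e : data.entrySet()) {
                copy.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
            return copy;
        }
    }
}
```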

Why not just:
synchronized(this){
cachedData = data;
data = new HashMap<String,ArrayList<Float>>();
}
This is similar to what you have, but involves no copying of the data.
I wouldn't worry about the new too much (not unless you can prove through profiling that it's a problem).
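A self-contained version of the swap idea, assuming a hypothetical SwapBuffer class that guards the map with its own monitor (its synchronized methods lock on this, like the snippet above): thread 1 calls add, thread 2 calls takeAll, and no entries are ever copied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SwapBuffer {
    // The map thread 1 currently writes into; guarded by 'this'.
    private Map<String, List<Float>> data = new HashMap<>();

    // Called by thread 1 as new data arrives.
    synchronized void add(String key, float value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Called by thread 2: take ownership of the filled map and give thread 1
    // a fresh, empty one. Only the reference moves; no entries are copied.
    synchronized Map<String, List<Float>> takeAll() {
        Map<String, List<Float>> cachedData = data;
        data = new HashMap<>();
        return cachedData;
    }
}
```

Because takeAll runs under the same lock as add, thread 2 sees everything thread 1 added before the swap, and thread 1 never touches the old map again afterwards, so thread 2 can read it without further locking.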

Related

Can all asynchronous tasks read an ArrayList at the same time without any delay?

In my Android app, I have an ArrayList as shown below.
public List<String> prefCoinList = new ArrayList<String>();
I will be executing 10 asynchronous tasks using THREAD_POOL_EXECUTOR as shown below.
new asyncTask().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR, order);
Each asynchronous task will only read the ArrayList prefCoinList and check for a particular value.
Question
Will all 10 asynchronous tasks run without any deadlocks on the ArrayList prefCoinList?
Will there be any thread-locking or hanging issues?
If reading the ArrayList at the same time is possible, will each thread get its own copy, or will all threads wait and read the ArrayList when they get their turn?
If the list is not modified after construction then you can access it from multiple threads without issue. Each thread will read the single copy of the object.
If the list is modified occasionally then you could use a ReadWriteLock or other access control mechanism.
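A minimal sketch of the ReadWriteLock option, using ReentrantReadWriteLock from java.util.concurrent.locks; the wrapper class GuardedCoinList is hypothetical. Any number of readers can hold the read lock at the same time, and a writer waits until they are done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedCoinList {
    private final List<String> prefCoinList = new ArrayList<>();
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Shared read lock: many tasks can check the list concurrently.
    boolean contains(String coin) {
        rwLock.readLock().lock();
        try {
            return prefCoinList.contains(coin);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Exclusive write lock: readers are blocked only while an update is in progress.
    void add(String coin) {
        rwLock.writeLock().lock();
        try {
            prefCoinList.add(coin);
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}
```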
Let me answer each question.
1) Yes, all 10 asynchronous calls will run without any problem; it is not a lot of information to process.
2) You will not have thread locks; you only have to be careful when using runOnUiThread(), because the code you pass to it runs on the UI thread rather than in the background task.
3) Java passes object references by value, so every task receives a reference to the same list object. That gives you two options: pass the same list to every asynchronous task and never modify it, or create a copy of the list for each task to avoid problems.
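The "same list, never modified" option can be sketched on a plain JVM as below. It uses an ExecutorService as a stand-in for AsyncTask.THREAD_POOL_EXECUTOR (an assumption made so the example runs outside Android), and SharedReadOnlyList is a hypothetical name. All tasks read one unmodifiable list: nobody gets a copy and nobody waits for a turn.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedReadOnlyList {
    // Each task checks one coin against the SAME read-only list.
    static int countMatches(List<String> source, List<String> wanted) {
        // Build once, then expose only a read-only view before any task starts.
        List<String> prefCoinList = Collections.unmodifiableList(new ArrayList<>(source));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String coin : wanted) {
                // Submitting a task happens-before it runs, so it sees the fully built list.
                results.add(pool.submit(() -> prefCoinList.contains(coin)));
            }
            int matches = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    matches++;
                }
            }
            return matches;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```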

How to know that no operation is running on a ConcurrentHashMap (i.e. it is idle) in Java?

I have a situation where, whenever my ConcurrentHashMap is updated, I need to clear an existing file and write the entire data into the file again. Clearing the file and rewriting the data on every update causes high latency. So I am thinking that whenever my hashmap is idle, i.e. no update operation is going on, I will write the entire data into the file; otherwise I will wait until the hashmap is idle.
Basically, I will be deleting Strings continuously from the map, and writing to the file every time I delete a String from the map is a very costly operation. So is there a way to know that no deletion operation is going on in the ConcurrentHashMap?
So is there a way to know that no deletion operation is going on in the ConcurrentHashMap?
Short answer: no, there isn't a way.
But even if there were, you would still run into problems. For example, suppose that new updates arrived immediately after you started clearing / writing the file.
I think the solution is to use two maps and a queue.
When an update request happens:
perform the update on the concurrent hashmap
add the request to the queue
In a background thread:
pull requests from the queue, and perform updates on the second (shadow) hashmap
periodically or based on some other criteria, cease pulling requests and flush the shadow hashmap to the file.
The primary hashmap is always updated quickly, and is always up to date. Operations updating and using the primary hashmap do not get (significantly) blocked.
The queue provides request buffering while the shadow hashmap is being written.
The second hashmap is only accessed by one thread, so it doesn't need to be concurrent. Therefore it will be faster.
The state of the file will typically be a little behind the primary hashmap. But that is inevitable. The only way to avoid that is to block updates to the primary map ... which is what you are trying to avoid.
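A compact sketch of the two-maps-and-a-queue design, under a few assumptions: string keys and values, a hypothetical ShadowMapPersister class, and a writeFile callback standing in for the real file-rewrite code. One deviation from the description above: the enqueue happens inside ConcurrentHashMap.compute(), which is atomic per key, so two threads updating the same key cannot enqueue their requests in a different order than they were applied to the primary map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ShadowMapPersister {
    // One queued request; value == null means "remove key".
    private static final class Update {
        final String key;
        final String value;
        Update(String key, String value) { this.key = key; this.value = value; }
    }

    // Primary map: always current, used by the rest of the application.
    final Map<String, String> primary = new ConcurrentHashMap<>();
    private final BlockingQueue<Update> queue = new LinkedBlockingQueue<>();
    // Shadow map: touched only by the single background thread, so a plain HashMap.
    private final Map<String, String> shadow = new HashMap<>();

    // 'value' must be non-null (ConcurrentHashMap does not allow null values).
    void put(String key, String value) {
        primary.compute(key, (k, old) -> { queue.add(new Update(k, value)); return value; });
    }

    void remove(String key) {
        // Returning null from compute() removes the mapping.
        primary.compute(key, (k, old) -> { queue.add(new Update(k, null)); return null; });
    }

    // One background step: replay queued requests onto the shadow map,
    // then flush it only if something changed.
    void drainAndFlush(Consumer<Map<String, String>> writeFile) {
        boolean changed = false;
        Update u;
        while ((u = queue.poll()) != null) {
            if (u.value == null) {
                shadow.remove(u.key);
            } else {
                shadow.put(u.key, u.value);
            }
            changed = true;
        }
        if (changed) {
            writeFile.accept(Collections.unmodifiableMap(shadow));
        }
    }

    // Run drainAndFlush periodically on ONE background thread.
    ScheduledExecutorService start(Consumer<Map<String, String>> writeFile, long periodMillis) {
        ScheduledExecutorService background = Executors.newSingleThreadScheduledExecutor();
        background.scheduleWithFixedDelay(() -> drainAndFlush(writeFile),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return background;
    }
}
```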
Another way to approach this would be to make writing to the file faster. I suspect that the reason it is slow is because your current design requires you to clear and rewrite the file each time. Another approach would be to write only the changes to the file. This means you may have more work to do on restart ... assuming the purpose of the file is to record the map state so that you can restart.
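Writing only the changes could look like the append-only log below. This is a sketch under assumptions not in the answer: a hypothetical tab-separated line format (P for put, R for remove) and keys/values that contain no tabs or newlines. In practice the Writer would be opened in append mode, e.g. new FileWriter(path, true), and replay rebuilds the map on restart. The caller should log changes in the same order it applies them to the map.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

class ChangeLog {
    private final Writer out;

    ChangeLog(Writer out) { this.out = out; }

    // Append one line per change instead of rewriting the whole file.
    synchronized void logPut(String key, String value) { append("P\t" + key + "\t" + value + "\n"); }
    synchronized void logRemove(String key) { append("R\t" + key + "\n"); }

    private void append(String line) {
        try {
            out.write(line);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On restart: replay the log from the beginning to rebuild the map.
    static Map<String, String> replay(Reader in) {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.split("\t", 3);
                if (parts[0].equals("P")) {
                    map.put(parts[1], parts[2]);
                } else {
                    map.remove(parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }
}
```

The log grows without bound, so you would typically compact it now and then by writing a full snapshot; that is the "more work on restart" trade-off mentioned above.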
Sounds like you would need to make use of encapsulation by wrapping the ConcurrentHashMap in a class and possibly have add/remove methods with a Queue. Look at the java.util.concurrent package for other options.
The idea would be to use a Queue. Every access to the map would go through the wrapper's add/remove methods, which put the operation on the queue. Then a dedicated thread would loop forever consuming the queue. While doing that, you can check whether the queue is empty and, if so, persist the map to the file.
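One way to sketch this wrapper, assuming a hypothetical QueuedMapWrapper whose map is touched only by the consumer thread: callers enqueue Runnable operations, and the consumer persists whenever it has drained the queue. "Empty" is only a momentary state, so new operations may arrive right after a persist; they simply trigger the next one. Reads by other threads would also have to go through the queue (or a separate concurrent map), which this sketch leaves out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class QueuedMapWrapper {
    // Only the consumer thread touches this map; other threads just enqueue work.
    private final Map<String, String> map = new HashMap<>();
    private final BlockingQueue<Runnable> ops = new LinkedBlockingQueue<>();

    void add(String key, String value) { ops.add(() -> map.put(key, value)); }
    void remove(String key) { ops.add(() -> map.remove(key)); }

    // Body of the dedicated consumer thread; stop it by interrupting that thread.
    void consumeForever(Consumer<Map<String, String>> persist) throws InterruptedException {
        while (true) {
            ops.take().run();          // block until at least one operation arrives
            drainAndPersist(persist);
        }
    }

    // Apply everything already queued; once the queue is empty, persist the map.
    void drainAndPersist(Consumer<Map<String, String>> persist) {
        Runnable op;
        while ((op = ops.poll()) != null) {
            op.run();
        }
        persist.accept(Collections.unmodifiableMap(map));
    }
}
```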

Are simultaneous reads from an array thread-safe?

I have an array which contains integer values declared like this:
int data[] = new int[n];
Each value needs to be processed and I am splitting the work into pieces so that it can be processed by separate threads. The array will not be modified during processing.
Can all the processing threads read separate parts of the array concurrently? Or do I have to use a lock?
In other words: is this work order thread-safe?
Array is created and filled
Threads are created and started
Thread 0 reads data[0..3]
Thread 1 reads data[4..7]
Thread 2 reads data[8..n]
Reading contents of an array (or any other collection, fields of an object, etc.) by multiple threads is thread-safe provided that the data is not modified in the meantime.
If you fill the array with data to process and pass it to different threads for reading, then the data will be properly read and no data race will be possible.
Note that this will only work if you create the threads after you have filled the array. If you pass the array for processing to some already existing threads without synchronization, the contents of the array may not be read correctly. In such a case the method through which the thread obtains the reference to the array should be synchronized, because a synchronized block forces a memory update between threads (it establishes a happens-before relationship).
On a side note: using an immutable collection may be a good idea. That way you ensure no modification is even possible. I would suggest using such a wrapper, for example Collections.unmodifiableList() over a List<Integer> (a plain int[] cannot be made read-only). The java.util.concurrent.atomic package, by contrast, provides classes such as AtomicIntegerArray for values that are modified concurrently.
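A sketch of the safe ordering described above (fill the array, then create and start the threads), with a hypothetical parallelSum as the processing step. It relies on two guarantees of the Java memory model: everything done before Thread.start() is visible to the started thread, and everything a thread did is visible to another thread after join() on it returns.

```java
class ChunkedArraySum {
    // Sums 'data' with 'threadCount' threads; each thread reads only its own slice.
    static long parallelSum(int[] data, int threadCount) {
        long[] partial = new long[threadCount];   // slot t is written only by thread t
        Thread[] threads = new Thread[threadCount];
        int chunk = (data.length + threadCount - 1) / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            final int from = Math.min(data.length, t * chunk);
            final int to = Math.min(data.length, from + chunk);
            // 'data' is already filled, and start() happens-before every action of
            // the new thread, so it sees the filled values without any lock.
            threads[t] = new Thread(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                partial[id] = sum;
            });
            threads[t].start();
        }
        long total = 0;
        try {
            for (int t = 0; t < threadCount; t++) {
                threads[t].join();                 // makes partial[t] visible here
                total += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }
}
```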
As long as the threads don't modify the contents in the array, it is fine to read the array from multiple threads.
If you ensure all the threads are just reading, it's thread-safe. However, you should not rely on that fact alone; try to make your array immutable via a wrapper.
Sure, if you just want to read it, pass the array to the threads when you create them. There won't be any problem as long as you don't modify it.
Reading from an array is a thread-safe operation, but if you are modifying the array, consider using the class AtomicIntegerArray.
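For the case where the array is modified concurrently, here is a small sketch with AtomicIntegerArray in which several threads increment shared counters; the class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class AtomicArrayExample {
    // Each of 'threads' workers increments the slots round-robin; every
    // incrementAndGet is atomic, so no update is lost and no lock is needed.
    static AtomicIntegerArray countConcurrently(int slots, int threads, int incrementsPerThread) {
        AtomicIntegerArray counts = new AtomicIntegerArray(slots);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.incrementAndGet(i % slots);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }
}
```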
Consider populating a ConcurrentLinkedQueue and having each thread pull from it. This would ensure that there are no concurrency issues.
Your threads would each be pulling their data from the top of the queue and processing it.
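A sketch of the queue-based alternative, assuming the work is summing the array (a hypothetical stand-in for the real processing). Each poll() atomically removes the head, so every element is processed by exactly one thread.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class QueueWorkers {
    static long sumWithQueue(int[] data, int threadCount) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int value : data) {
            queue.add(value);                 // fill the queue before any worker starts
        }
        AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                Integer next;
                // poll() removes the head atomically, or returns null once the queue is empty.
                while ((next = queue.poll()) != null) {
                    total.addAndGet(next);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }
}
```

Note the trade-off: every int is boxed into an Integer and each element costs a queue operation, so for a plain array the fixed slices described in the question are usually cheaper.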

Data structure for non-blocking aggregation of Thread values?

Background:
I have a large thread pool in Java; each thread has some internal state.
I would like to gather some global information about the states. To do that I have an associative, commutative aggregation function (e.g. sum; mine needs to be pluggable, though).
The solution needs to have a fixed memory consumption and be lock-free, in the best case not disturbing the pool at all. So no thread should need to acquire a lock (or enter a synchronized block) when writing to the data structure. The aggregated value is only read after the threads are done, so I don't need an accurate value all the time. Simply collecting all values and aggregating them after the pool is done might lead to memory problems.
The values are going to be more complex data types, so I cannot use AtomicInteger etc.
My general idea for the solution:
Have a lock-free collection that all threads put their updates into. I don't even need the order of the events.
If it gets too big, run the aggregation function on it (compacting it) while the threads continue filling it.
My question:
Is there a data structure that allows for something like that or do I need to implement it from scratch? I couldn't find anything that directly matches my problem. If I have to implement from scratch what would be a good non-blocking collection class to start from?
If the updates are infrequent (relatively speaking) and the aggregation function is fast, I would recommend aggregating every time:
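State myState;                               // this thread's contribution
AtomicReference<State> combinedState;        // shared, initialised elsewhere
State original;
State newCombined;
do
{
original = combinedState.get();
newCombined = aggregate(original, myState);  // your aggregation function; must not modify 'original'
} while(!combinedState.compareAndSet(original, newCombined));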
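Since Java 8, AtomicReference.accumulateAndGet(x, f) runs essentially this compare-and-set retry loop for you. The sketch below wraps it with a pluggable BinaryOperator, which matches the "pluggable" requirement; LockFreeAggregator is a hypothetical name. Because the function can be re-run when another thread wins the race, it must be side-effect free and return a new state rather than mutating the old one.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BinaryOperator;

class LockFreeAggregator<S> {
    private final AtomicReference<S> combined;
    private final BinaryOperator<S> aggregate;   // assumed associative, commutative, side-effect free

    LockFreeAggregator(S identity, BinaryOperator<S> aggregate) {
        this.combined = new AtomicReference<>(identity);
        this.aggregate = aggregate;
    }

    // Lock-free: internally a get / compute / compareAndSet loop that retries on contention.
    void add(S myState) {
        combined.accumulateAndGet(myState, aggregate);
    }

    // Intended to be read after the worker threads have finished.
    S result() {
        return combined.get();
    }
}
```

Memory use stays fixed at one combined state, regardless of how many updates arrive.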
I don't quite understand the question but I would, at first sight, suggest an IdentityHashMap where keys are (references to) your thread objects and values are where your thread objects write their statistics.
An IdentityHashMap only relies on reference equality, as such there would never be any conflict between two thread objects; you could pass a reference to that map to each thread (which would then call .get(this) on the map to get a reference to the collecting data structure), which would then collect the data it wants. Otherwise you could just pass a reference to the collecting data structure to the thread object.
Such a map is inherently thread-safe for your use case, as long as you create the key/value pair for each thread before starting that thread, and because no thread object will ever modify the map anyway (they won't even have a reference to it). With some careful management you can even remove entries from this map once a thread is done with its work, even though the map itself is not thread-safe.
When all is done, you have a map whose values contains all the data collected.
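A minimal sketch of the per-thread idea, using the variant where each thread receives a direct reference to its own collector; Stats and PerThreadCollectors are hypothetical names, and summing stands in for the real aggregation. The IdentityHashMap is filled completely before any thread starts and is only read by the coordinating thread afterwards.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class PerThreadCollectors {
    // Hypothetical per-thread collector; each instance is written by exactly one thread.
    static final class Stats {
        long sum;
    }

    // Each thread adds 1..valuesPerThread to its own Stats; results are combined at the end.
    static long run(int threadCount, int valuesPerThread) {
        Map<Thread, Stats> slots = new IdentityHashMap<>();
        for (int t = 0; t < threadCount; t++) {
            Stats mine = new Stats();
            Thread worker = new Thread(() -> {
                for (int i = 1; i <= valuesPerThread; i++) {
                    mine.sum += i;        // no lock: nobody else writes this Stats
                }
            });
            slots.put(worker, mine);      // the map is complete before any thread starts
        }
        for (Thread worker : slots.keySet()) {
            worker.start();
        }
        long total = 0;
        try {
            for (Map.Entry<Thread, Stats> e : slots.entrySet()) {
                e.getKey().join();        // join() makes that thread's writes visible here
                total += e.getValue().sum;
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return total;
    }
}
```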
Hope this helps... Reading the question again, in any case...

threads accessing non-synchronised methods in Java

Can someone explain to me how threads and synchronisation work in Java?
I want to write a high-performance application. Inside this application, I read data from files into some nested classes, which are basically thin wrappers around HashMap.
After the data reading is finished, I start threads which need to go through the data and perform different checks on it. However, the threads never change the data!
If I can guarantee (or at least try to guarantee ;) that my threads never change the data, can they call non-synchronised methods of the objects containing the data?
If multiple threads access a non-synchronised method which does not change any class field but has some local variables, is it safe?
artificial example:
import java.util.HashMap;
import java.util.Map;

public class Data{
    // this hash map is filled before I start threads
    protected Map<Integer, Spike> allSpikes = new HashMap<Integer, Spike>();

    public Map<Integer, Spike> returnBigSpikes(){
        Map<Integer, Spike> bigSpikes = new HashMap<Integer, Spike>();
        for (Map.Entry<Integer, Spike> entry : allSpikes.entrySet()){
            if (entry.getValue().spikeSize > 100){
                bigSpikes.put(entry.getKey(), entry.getValue());
            }
        }
        return bigSpikes;
    }
}
Is it safe to call a NON-synchronised method returnBigSpikes() from threads?
I understand now that such use cases are potentially very dangerous, because it's hard to make sure that the data (e.g. the returned bigSpikes) will not be modified. But I have already implemented and tested it like this, and I want to know whether I can use the results of my application now and change the architecture later...
What happens if I make the methods synchronised? Will the application be slowed down to single-CPU performance? If so, how can I design it correctly and keep the performance?
(I read about 20-40 GB of data (log messages) into main memory and then run threads which need to go through all the data to find some correlations in it. Each thread gets only a part of the messages to analyse, but for the analysis the thread has to compare each message from its part with many other messages from the data. That's why I first decided to let the threads read the data without synchronisation.)
Thank You very much in advance.
If allSpikes is populated before all the threads start, you could make sure it isn't changed later by saving it as an unmodifiable map.
Assuming Spike is immutable, your method would then be perfectly safe to use concurrently.
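A sketch of that suggestion, assuming Spike can be made immutable (final fields, no setters); SafeData is a hypothetical stand-in for the question's Data class. The constructor copies the loaded map and freezes it behind Collections.unmodifiableMap, and returnBigSpikes only writes to a map local to each call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SafeData {
    // Assumed immutable: a final field set once in the constructor, no setters.
    static final class Spike {
        final int spikeSize;
        Spike(int spikeSize) { this.spikeSize = spikeSize; }
    }

    // final + unmodifiable: fully built before any thread can see it, read-only afterwards.
    private final Map<Integer, Spike> allSpikes;

    SafeData(Map<Integer, Spike> loaded) {
        this.allSpikes = Collections.unmodifiableMap(new HashMap<>(loaded));
    }

    // Only reads the shared map; writes go to a map local to this call,
    // so any number of threads can call it at the same time.
    Map<Integer, Spike> returnBigSpikes() {
        Map<Integer, Spike> bigSpikes = new HashMap<>();
        for (Map.Entry<Integer, Spike> e : allSpikes.entrySet()) {
            if (e.getValue().spikeSize > 100) {
                bigSpikes.put(e.getKey(), e.getValue());
            }
        }
        return bigSpikes;
    }
}
```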
In general, if you have a bunch of threads where you can guarantee that only one thread modifies a resource, and that it finishes doing so before the other threads start, while the rest only read that resource, then access to that resource doesn't need to be synchronised. In your example, each call to returnBigSpikes() creates a new local bigSpikes map, so although you're creating a HashMap, it is unique to each call (and therefore to each thread), so there are no synchronisation problems there.
As long as everything is effectively immutable (e.g. using the final keyword) and you use an unmodifiableMap, everything is fine.
I would suggest the following UnmodifiableData:
public class UnmodifiableData {
final Map<Integer,Spike> bigSpikes;
public UnmodifiableData(Map<Integer,Spike> bigSpikes) {
this.bigSpikes = Collections.unmodifiableMap(new HashMap<>(bigSpikes));
}
....
}
Your plan should work fine. You do not need to synchronize reads, only writes.
If, however, in the future you wish to cache bigSpikes so that all threads get the same map then you need to be more careful about synchronisation.
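If you do want to cache bigSpikes, one careful option is lazy initialisation with double-checked locking on a volatile field, sketched below (CachedSpikes and its nested Spike are hypothetical). The volatile write publishes the fully built map, and the unmodifiable view keeps callers from changing the shared result.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CachedSpikes {
    static final class Spike {
        final int spikeSize;
        Spike(int spikeSize) { this.spikeSize = spikeSize; }
    }

    private final Map<Integer, Spike> allSpikes;
    // volatile: a thread that sees a non-null reference also sees the fully built map.
    private volatile Map<Integer, Spike> bigSpikesCache;

    CachedSpikes(Map<Integer, Spike> loaded) {
        allSpikes = Collections.unmodifiableMap(new HashMap<>(loaded));
    }

    Map<Integer, Spike> bigSpikes() {
        Map<Integer, Spike> result = bigSpikesCache;      // fast path: no lock once cached
        if (result == null) {
            synchronized (this) {
                result = bigSpikesCache;                  // re-check under the lock
                if (result == null) {
                    Map<Integer, Spike> m = new HashMap<>();
                    for (Map.Entry<Integer, Spike> e : allSpikes.entrySet()) {
                        if (e.getValue().spikeSize > 100) {
                            m.put(e.getKey(), e.getValue());
                        }
                    }
                    // Publish a read-only view so no thread can modify the shared result.
                    result = Collections.unmodifiableMap(m);
                    bigSpikesCache = result;
                }
            }
        }
        return result;
    }
}
```

If the data is fully loaded before the object is created, computing bigSpikes eagerly in the constructor into a final field is simpler still and needs no locking at all.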
If you use ConcurrentHashMap, it will do all the synchronization work for you. It's better than putting synchronization around an ordinary HashMap.
Since allSpikes is initialized before you start threads it's safe. Concurrency problems appear only when a thread writes to a resource and others read from it.