Concurrently accessing different members of the same object in Java

I am familiar with many of the mechanisms and idioms surrounding concurrency in Java. Where I am confused is with a simple concept: concurrent access of different members of the same object.
I have a set of variables which can be accessed by two threads, in this case concerning graphical information within a game engine. I need to be able to modify the position of an object in one thread and read it off in another. The standard approach to this problem is to write the following code:
private int xpos;
private final Object xposAccess = new Object();

public int getXpos() {
    int result;
    synchronized (xposAccess) {
        result = xpos;
    }
    return result;
}

public void setXpos(int xpos) {
    synchronized (xposAccess) {
        this.xpos = xpos;
    }
}
However, I'm writing a real-time game engine, not a 20 questions application. I need things to work fast, especially when I access and modify them as often as I do the position of a graphical asset. I want to remove the synchronized overhead. Even better, I'd like to remove the function call overhead altogether.
private int xpos;
private int bufxpos;
...

public void finalize()
{
    bufxpos = xpos;
    ...
}
Using locks, I can make the threads wait on each other, and then call finalize() while the object is neither being accessed nor modified. After this quick buffering step, both threads are free to act on the object, with one modifying/accessing xpos and one accessing bufxpos.
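One way to realize that wait-then-buffer handshake is a CyclicBarrier whose barrier action runs the buffering step while both threads are parked. This is only a sketch of the idea, not code from the post; the FrameSync name and endOfFrame method are mine:

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class FrameSync {
    private final CyclicBarrier barrier;

    // bufferStep is the buffering action, e.g. the post's finalize() copy of
    // xpos into bufxpos. The barrier action runs exactly once per cycle, after
    // both threads arrive and before either is released, so neither thread is
    // touching the object while the copy happens.
    public FrameSync(Runnable bufferStep) {
        this.barrier = new CyclicBarrier(2, bufferStep);
    }

    // Both the update thread and the render thread call this once per frame.
    public void endOfFrame() throws InterruptedException, BrokenBarrierException {
        barrier.await();
    }
}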
I have already had success using a similar method where the information was copied into a second object, and each thread acted on a separate object. However, both members are still part of the same object in the above code, and some funny things begin to happen when both my threads access the object concurrently, even when acting on different members: unpredictable behaviour, phantom graphical objects, random errors in screen position, etc. To verify that this was indeed a concurrency issue, I ran the code for both threads in a single thread, where it executed flawlessly.
I want performance above all else, and I am considering buffering the critical data into separate objects. Are my errors caused by concurrent access of the same objects? Is there a better solution for concurrency?
EDIT: If you are doubting my valuation of performance, I should give you more context. My engine is written for Android, and I use it to draw hundreds or thousands of graphic assets. I have a single-threaded solution working, but I have seen a near doubling in performance since implementing the multi-threaded solution, despite the phantom concurrency issues and occasional uncaught exceptions.
EDIT: Thanks for the fantastic discussion about multi-threading performance. In the end, I was able to solve the problem by buffering the data while the worker threads were dormant, and then allowing them each their own set of data within the object to operate on.

If you are dealing with just individual primitives, the atomic classes such as AtomicInteger, which has operations like compareAndSet, are great. They are non-blocking, you get a good deal of atomicity, and you can fall back to blocking locks when needed.
For atomically setting and accessing variables or objects, you can leverage these non-blocking constructs, falling back to traditional locks when necessary.
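For instance, the x position from the question could live in an AtomicInteger (a minimal sketch, not from the original answer; the Position class and method names are mine):

import java.util.concurrent.atomic.AtomicInteger;

public class Position {
    private final AtomicInteger xpos = new AtomicInteger();

    public int getXpos() {
        return xpos.get();       // volatile-read semantics, no lock
    }

    public void setXpos(int value) {
        xpos.set(value);         // volatile-write semantics, no lock
    }

    // Conditional update: succeeds only if nobody moved the object under us.
    public boolean moveIfAt(int expected, int next) {
        return xpos.compareAndSet(expected, next);
    }
}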
However, the simplest step forward from where you are in your code is to keep using synchronized, but not on the implicit this object. Instead, synchronize on several different member objects, one per partition of members that need atomic access: synchronized(partition1) { /* ... */ }, synchronized(partition2) { /* ... */ }, etc., where you have members such as private final Object partition1 = new Object(); and private final Object partition2 = new Object();.
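A sketch of that partitioning (the Sprite class and its fields are illustrative, not from the original answer):

public class Sprite {
    private final Object positionLock = new Object();  // guards xpos, ypos
    private final Object textureLock = new Object();   // guards textureId

    private int xpos, ypos;
    private int textureId;

    public void moveTo(int x, int y) {
        synchronized (positionLock) {   // position readers/writers contend only here
            xpos = x;
            ypos = y;
        }
    }

    public void setTexture(int id) {
        synchronized (textureLock) {    // texture updates never block position updates
            textureId = id;
        }
    }
}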
However, if the members cannot be partitioned, then each operation must acquire more than one lock. If so, use the Lock object linked earlier, but make sure that every operation acquires the locks it needs in some universal order; otherwise your code might deadlock.
Update: Perhaps it is genuinely not possible to increase the performance if even volatile presents an unacceptable hit. The fundamental issue, which you cannot work around, is that mutual exclusion necessarily trades away the substantial benefits of the memory hierarchy, i.e. caches. The fastest per-core memory cache cannot hold variables that you are synchronizing. Processor registers are arguably the fastest "cache", and even if the processor is sophisticated enough to keep the closest caches consistent, synchronization still precludes keeping values in registers. Hopefully this helps you see that this is a fundamental limit on performance and there is no magic wand.
On mobile platforms, the platform is deliberately designed against letting arbitrary apps run as fast as possible, because of battery-life concerns. It is not a priority to let any one app exhaust the battery in a couple of hours.
Given the first factor, the best thing to do would be to redesign your app so that it needs less mutual exclusion. For example, track x-pos without synchronization except when two objects come close to each other, say within a 10x10 box. Then you lock only on a coarse grid of 10x10 boxes, and as long as an object stays within one box you track its position without synchronization. I am not sure whether that applies or makes sense for your app; it is just an example to convey the spirit of an algorithm redesign rather than a search for a faster synchronization method. See the sketch below.
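A very rough sketch of that grid idea (all names are hypothetical, and an object crossing a cell boundary would really need both cells' locks, which is omitted here for brevity):

public class CollisionGrid {
    private static final int CELL = 10;   // the 10x10 boxes from the suggestion
    private final Object[][] cellLocks;

    public CollisionGrid(int widthCells, int heightCells) {
        cellLocks = new Object[widthCells][heightCells];
        for (int i = 0; i < widthCells; i++)
            for (int j = 0; j < heightCells; j++)
                cellLocks[i][j] = new Object();
    }

    // Lock only the coarse cell the object lands in; movement that stays
    // inside one cell contends only with neighbours in that same cell.
    public void updatePosition(GameObjectStub obj, int newX, int newY) {
        synchronized (cellLocks[newX / CELL][newY / CELL]) {
            obj.x = newX;
            obj.y = newY;
        }
    }

    // Hypothetical minimal stand-in for the poster's graphical asset.
    public static class GameObjectStub {
        volatile int x, y;
    }
}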

I don't think that I get exactly what you mean, but generally
Is there a better solution for concurrency?
Yes, there is:
prefer the Java Lock API over the intrinsic built-in lock.
consider the non-blocking constructs provided in the atomics API, such as AtomicInteger, for better performance.
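As a quick illustration of the first point (a sketch only; PositionHolder and its methods are mine), a ReentrantLock's tryLock lets a reader take a stale value instead of blocking, which an intrinsic lock cannot do:

import java.util.concurrent.locks.ReentrantLock;

public class PositionHolder {
    private final ReentrantLock lock = new ReentrantLock();
    private int xpos;

    public void setXpos(int value) {
        lock.lock();
        try {
            xpos = value;
        } finally {
            lock.unlock();   // always release, even on exceptions
        }
    }

    // Returns the fresh value, or the caller's stale copy if the writer
    // currently holds the lock -- the reader never blocks.
    public int getXposOrStale(int stale) {
        if (lock.tryLock()) {
            try {
                return xpos;
            } finally {
                lock.unlock();
            }
        }
        return stale;
    }
}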

I think synchronization or any kind of locking can be avoided here by using an immutable object for inter-thread communication. Let's say the message to be sent looks like this:
public final class ImmutableMessage {
    private final int xPos;
    // ... other fields, adhering to the rules of immutability

    public ImmutableMessage(int xPos /* , other arguments */) { this.xPos = xPos; }

    public int getXPos() { return xPos; }
}
Then somewhere in the writer thread:
sharedObject.message = new ImmutableMessage(1);
The reader thread:
ImmutableMessage message = sharedObject.message;
int xPos = message.getXPos();
The shared object (public field for the sake of simplicity):
public class SharedObject {
    public volatile ImmutableMessage message;
}
I guess things change rapidly in a real-time game engine, which might end up creating a lot of ImmutableMessage objects and in the end may degrade performance, but maybe that is balanced by the non-locking nature of this solution.
Finally, if you have a free hour for this topic, I think it's worth watching this video about the Java Memory Model by Angelika Langer.

Related

How does the volatile keyword work?

I'm reading about the volatile keyword and going through the example below for more understanding.
public class TaskRunner {
    private static int number;
    private static boolean ready;

    private static class Reader extends Thread {
        @Override
        public void run() {
            while (!ready) {
                Thread.yield();
            }
            System.out.println(number);
        }
    }

    public static void main(String[] args) {
        new Reader().start();
        number = 42;
        ready = true;
    }
}
What I've understood is that in a Java application, multiple threads can access a shared data structure at any point in time. Some write to it for the first time, some update it, some read it, and so on.
While this is happening, every thread is supposed to read the shared data structure's value from main memory. But sometimes a thread's update to the shared data remains in its CPU cache until it is written back to main memory. During that window, any other thread accessing the shared data will not see the updated value, which is still sitting in the first thread's cache.
volatile is used so that, once the shared value is changed, it is moved to main memory before it can be accessed by any other thread. Is that a correct understanding?
In what scenario would a thread still not see the updated value even after volatile is used?
But sometimes a thread's update to the shared data remains in its CPU cache until it is written back to main memory. During that window, any other thread accessing the shared data will not see the updated value, which is still sitting in the first thread's cache.
It is not the OS. There is a CPU instruction the JVM can use to flush the CPU cache. Strictly speaking, even that claim is imprecise, because the Java Memory Model says nothing about such instructions; this is merely one of the ways to implement volatile behaviour.
Does the volatile keyword in Java really have to do with caches?
Java is rather high-level: as a language it is not designed for any particular CPU. Furthermore, Java compiles to bytecode, which is an intermediate product: Java does not offer, nor does it have the goal of offering, low-level CPU-architecture-specific operations.
And yet, caches are a low-level CPU architecture specific concept. Sure, every modern CPU has them, pretty much, but who knows what happens in 20 years?
So putting volatile in terms of what it does to CPU caches is skipping some steps.
volatile has an effect on your java code. That effect is currently implemented on most VMs I know of by sending the CPU some instructions about flushing caches.
It's better to deal with volatile at the java level itself, not at the 'well most VMs implement it like this' level - after all, that can change.
The way java is set up is essentially as follows:
If there are no comes-before relationships established between any 2 lines of code anywhere in Java, then you should assume that Java is like Schrödinger's cat: every thread both has and does not have a local cached copy of every field on every object loaded in the entire VM, and whenever you either write or read anything, the universe flips a coin, uses that to determine whether you get the cached copy or not, and will always flip it to mess with you. During tests on your own machine, the coin flips to make the tests pass. During production on crunch weekend, when millions of dollars are on the line, it flips to make your code fail.
The only way out is to ensure your code doesn't depend on the coin flip.
The way to do that is to use the comes-before rules, which you can review in the Java Memory Model.
volatile is one way to add them.
In the above code, without the volatile, the Reader thread may always use its local copy of ready, and thus will NEVER be ready, even if it has been many hours since your main thread set ready to true. In practice that's unlikely, but the JMM says a VM is allowed to coin-flip here: it may have your Reader thread continue almost immediately, it may hold it up for an hour, it may hold it up forever. All legal: this code is broken, because its behaviour depends on the coin flip, which is bad.
Once you introduce volatile, though, you establish a comes-before relationship, and now you're guaranteed that Reader continues. Effectively, volatile both disables coin flips on the variable so marked and also establishes comes-before relationships once reads/writes occur:
IF a thread observes an updated value in a volatile variable, THEN all lines that ran in the writing thread before it updated that variable have a comes-before relationship with all lines that run in the reading thread after it read the update.
So, to be clear:
Without any volatile marks here, it is legal for a VM to let Reader hang forever. It is also legal for a VM to let Reader continue (to let it observe that ready is now true) while Reader STILL sees that number is 0 (and not 42), even after it passed the ready check! But it doesn't have to: the VM is also allowed to have Reader never pass the ready check, or to have it pass the check and observe 42. A VM is free to do it however it wants; whatever seems fastest for this particular mix of CPU, architecture, and phase of the moon right now.
With volatile, reader WILL end up continuing sooner rather than later, and once it has done so, it will definitely observe 42. But if you swap ready = true; and number = 42; that guarantee is no longer granted.
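For reference, a sketch of the corrected example (marking ready volatile is enough on its own to guarantee the 42, since the write to number comes before the volatile write; number is left plain here):

public class TaskRunner {
    private static int number;
    private static volatile boolean ready;   // creates the comes-before edge

    private static class Reader extends Thread {
        @Override
        public void run() {
            while (!ready) {              // now guaranteed to eventually see true...
                Thread.yield();
            }
            System.out.println(number);   // ...and then guaranteed to print 42
        }
    }

    public static void main(String[] args) {
        new Reader().start();
        number = 42;    // must stay BEFORE the volatile write to ready
        ready = true;
    }
}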

Emulating a memory barrier in Java to get rid of volatile reads

Assume I have a field that's accessed concurrently and it's read many times and seldom written to.
public Object myRef = new Object();
Let's say a Thread T1 will be setting myRef to another value, once a minute, while N other Threads will be reading myRef billions of times continuously and concurrently. I only need that myRef is eventually visible to all threads.
A simple solution would be to use an AtomicReference or simply volatile like this:
public volatile Object myRef = new Object();
However, AFAIK volatile reads do incur a performance cost. I know it's minuscule; this is more something I wonder about than something I actually need. So let's not be concerned with performance and assume this is a purely theoretical question.
So the question boils down to: is there a way to safely bypass volatile reads for references that are only seldom written to, by doing something at the write site?
After some reading, it looks like memory barriers could be what I need. So if a construct like this existed, my problem would be solved:
Write
Invoke Barrier (sync)
Everything is synced and all threads will see the new value. (without a permanent cost at read sites, it can be stale or incur a one time cost as the caches are synced, but after that it's all back to regular field gets till next write).
Is there such a construct in Java, or in general? At this point I can't help but think that if something like this existed, it would already have been incorporated into the atomic packages by the much smarter people maintaining those. (Disproportionately frequent reads vs. writes might not have been a case worth catering for?) So maybe there is something wrong in my thinking and such a construct is not possible at all?
I have seen some code samples use volatile for a similar purpose, exploiting its happens-before contract. There is a separate sync field, e.g.:
public Object myRef = new Object();
public volatile int sync = 0;
and at writing thread/site:
myRef = new Object();
sync += 1; // volatile write to emulate a barrier
I am not sure this works, and some argue it works only on the x86 architecture. After reading the related sections of the JMM, I think it's only guaranteed to work if that volatile write is coupled with a volatile read from the threads that need to see the new value of myRef. (So it doesn't get rid of the volatile read.)
Returning to my original question: is this possible at all? Is it possible in Java? Is it possible with one of the new APIs in Java 9, such as VarHandles?
So basically you want the semantics of a volatile without the runtime cost.
I don't think it is possible.
The problem is that the runtime cost of volatile is due to the instructions that implement the memory barriers in the writer and the reader code. If you "optimize" the reader by getting rid of its memory barrier, then you are no longer guaranteed that the reader will see the "seldom written" new value when it is actually written.
FWIW, some versions of the sun.misc.Unsafe class provide explicit loadFence, storeFence and fullFence methods, but I don't think that using them will give any performance benefit over using a volatile.
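Since Java 9, the supported equivalents are the static fence methods on java.lang.invoke.VarHandle. A sketch (the Publisher class is mine, and note, as above, that fencing only the writer does not by itself guarantee that plain readers see the update):

import java.lang.invoke.VarHandle;

public class Publisher {
    private Object myRef = new Object();   // deliberately NOT volatile

    public void write(Object newRef) {
        myRef = newRef;
        VarHandle.fullFence();   // orders this thread's store against its own
                                 // later memory operations; readers still need
                                 // their own acquire ordering to be guaranteed
                                 // to observe it
    }

    public Object read() {
        return myRef;            // plain read: may legally return a stale value
    }
}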
Hypothetically ...
what you want is for one processor in a multi-processor system to be able to tell all of the other processors:
"Hey! Whatever you are doing, invalidate your memory cache for address XYZ, and do it now."
Unfortunately, modern ISAs don't support this.
In practice, each processor controls its own cache.
Not quite sure if this is correct but I might solve this using a queue.
Create a class that wraps an ArrayBlockingQueue attribute. The class has an update method and a read method. The update method posts the new value onto the queue and removes all values except the last value. The read method returns the result of a peek operation on the queue, i.e. read but do not remove. Threads peeking the element at the front of the queue do so unimpeded. Threads updating the queue do so cleanly.
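A rough sketch of that wrapper (the class name, generics, and capacity are mine; as the answer itself hedges, this is not obviously race-free with multiple writers):

import java.util.concurrent.ArrayBlockingQueue;

public class LatestValueHolder<T> {
    private final ArrayBlockingQueue<T> queue = new ArrayBlockingQueue<>(16);

    // Post the new value, then drop everything except the most recent entry.
    public void update(T value) throws InterruptedException {
        queue.put(value);
        while (queue.size() > 1) {
            queue.poll();   // discard stale values
        }
    }

    // Peek: read the front element without removing it; null if nothing posted yet.
    public T read() {
        return queue.peek();
    }
}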
You can use ReentrantReadWriteLock which is designed for few writes many reads scenario.
You can use StampedLock which is designed for the same case of few writes many reads, but also reads can be attempted optimistically. Example:
private StampedLock lock = new StampedLock();

public void modify() {                        // write method
    long stamp = lock.writeLock();
    try {
        modifyStateHere();
    } finally {
        lock.unlockWrite(stamp);
    }
}

public Object read() {                        // read method
    long stamp = lock.tryOptimisticRead();
    Object result = doRead();                 // try without lock; method should be fast
    if (!lock.validate(stamp)) {              // optimistic read failed
        stamp = lock.readLock();              // acquire read lock and repeat the read
        try {
            result = doRead();
        } finally {
            lock.unlockRead(stamp);
        }
    }
    return result;
}
Make your state immutable and allow controlled modifications only by cloning the existing object and altering only the necessary properties via the constructor. Once the new state is constructed, assign it to the reference being read by the many reading threads. This way, reading threads incur zero cost.
X86 provides TSO; you get [LoadLoad][LoadStore][StoreStore] fences for free.
A volatile read requires acquire semantics:
r1=Y
[LoadLoad]
[LoadStore]
...
And as you can see, this is already provided by the X86 for free.
In your case most of the calls are a read and the cacheline will already be in the local cache.
There is a price to pay in terms of compiler-level optimizations, but at the hardware level a volatile read is just as expensive as a regular read.
On the other hand, a volatile write is more expensive because it requires a [StoreLoad] to guarantee sequential consistency (in the JVM this is done using a lock addl $0, (%rsp) or an MFENCE). Since writes are very rare in your situation, this isn't an issue.
I would be careful with optimizations at this level because it is very easy to make the code more complex than is actually needed. It is best to guide your development efforts by benchmarks, e.g. using JMH, and preferably to test on real hardware. Also, there could be other nasty creatures hiding, like false sharing.

Under what conditions will writes to non-volatile variables be unseen by other threads? Can I force such conditions for experimental purposes?

I've recently been reading a lot here on SO and elsewhere about threaded memory management, in particular the use of the volatile keyword. I'm beginning to feel reasonably confident with the concept; however, in order to fully appreciate the effect it has, I would like to run some experiments which illustrate it.
Here is my setup: I have a producer thread (it reads audio data from the microphone, related to my previous question, but the actual data doesn't matter) which passes on data as byte[] to a separate consumer thread. The way in which the data is shared between threads is the primary variable in my experiment: I have tried an ArrayBlockingQueue; I have tried a shared volatile byte[] reference (with an array = array self-reference, as recommended in this blog post); and I have also tried a normal non-volatile byte[] with no self-reference. Both threads also write the data to disk as they go along.
My hope was to find that, after running for some length of time, the non-volatile byte[] version would show discrepancies between the data that the producer attempted to share and the data that the consumer read, due to some memory writes not being visible in time, while the other two versions would have exactly the same data logged by each thread because of the precautions taken to ensure publication of memory writes. As it happens, however, I find 100% accuracy whatever method I use.
I can already think of a few possibilities as to why this occurred, but my main question is: under what conditions are writes to a non-volatile variable unseen by another thread, which as far as I understand is the whole point of volatile? And can I force these conditions for experimental purposes?
My thoughts so far are:
Maybe the two threads are running on the same core and share the same cache, so memory writes are visible immediately?
Maybe CPU load is a factor? Perhaps I need many threads all doing different things before I see any problem?
Maybe I need to wait longer: perhaps such problems are very rare?
Could anyone either suggest how I could design such an experiment or explain why my idea is flawed?
Many thanks.
You won't be able to easily observe the effects of a lack of barriers in your code on an x86 because it has a fairly strong memory model. But that does not mean that the same code would not break on a different architecture. On x86, you generally need to play with the JIT compiler and help it make an optimisation that would not be allowed with a volatile variable, for example variable hoisting.
The code below, on my machine with hotspot 7u25 server, never ends if the variable is non-volatile but stops promptly if it is. You might need to change the sleep delay depending on your machine.
public class Test {
    static /* volatile */ boolean done = false;

    public static void main(String[] args) throws Exception {
        Runnable waiter = new Runnable() {
            @Override
            public void run() {
                while (!done);
                System.out.println("Exited loop");
            }
        };
        new Thread(waiter).start();
        Thread.sleep(100); // wait for JIT compilation
        done = true;
        System.out.println("done is true");
    }
}

In Java can I depend on reference assignment being atomic to implement copy on write?

If I have an unsynchronized java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this

void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized (global) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}
// Do multithreaded reads here. All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
Rolling your own solution seems to often fail in these type of situations, so I'd be interested in knowing other patterns, collections or libraries I could use to prevent object creation and blocking for my data consumers.
[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.
Edit: A lot of good information in several of the answers and comments, some important points:
A bug was present in the code I posted. Synchronizing on global (a badly named variable) can fail to protect the synchronized block after a swap.
You could fix this by synchronizing on the class (moving the synchronized keyword to the method), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
There is no "eventual consistency guarantee" in the code I posted, one way to make sure that readers do get to see the updates by writers is to use the volatile keyword.
On reflection, the general problem that motivated this question was trying to implement lock-free reads with locked writes in Java; however, my (solved) problem was with a collection, which may be unnecessarily confusing for future readers. So, in case it is not obvious, the code I posted works by allowing one writer at a time to perform edits to "some object" that is being read, unprotected, by multiple reader threads. Commits of the edit are done through an atomic operation, so readers can only get the pre-edit or post-edit "object". When/if a reader thread gets the update, it cannot occur in the middle of a read, as the read is occurring on the old copy of the "object". A simple solution that had probably been discovered, and proved to be broken in some way, prior to the availability of better concurrency support in Java.
Rather than trying to roll out your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.)
I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean> my guess is that it does the same thing with setting all the values to Boolean.TRUE behind-the-scenes.
Update: Addressing the poster's example
That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe—especially if you're coming back to the code weeks/months/years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance then why not use that instead? All the nasty concurrency details are encapsulated away and even 6-months-from-now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global a volatile reference then it's possible that no thread will ever see any changes to the value of global until they try to update it, at which point entering the synchronized block will flush the local cache and get the updated reference value.
However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:
We begin with the empty set, or global={}. Threads A and B both have this value in their thread-local cached memory.
Thread A obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
While Thread A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor Thread B blocks.
Thread A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
Thread B is now able to enter the monitor and makes a copy of the global={1} set.
Thread A decides to make another update, reads in its local global reference, and tries to enter the synchronized block. Since Thread B currently holds the lock on {}, there is no lock on {1}, and Thread A successfully enters the monitor!
Thread A also makes a copy of {1} for purposes of updating.
Now Threads A and B are both inside the synchronized block, and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by synchronizing on an object stored in a reference that you update inside your synchronized block. You should always be very careful about which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:
private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization

void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized (globalLock) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work—but how easy would it be to screw it up again? This would be so much easier:
private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());
Since the reference is final you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally you don't have to worry about all the nasty details of monitors and memory barriers!
According to the relevant Java Tutorial,
We have already seen that an increment expression, such as c++, does not describe an atomic action. Even very simple expressions can define complex actions that can decompose into other actions. However, there are actions you can specify that are atomic:
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long and double variables).
This is reaffirmed by Section §17.7 of the Java Language Specification
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
It appears that you can indeed rely on reference access being atomic; however, recognize that this does not ensure that all readers will read an updated value for global after this write -- i.e. there is no memory ordering guarantee here.
If you use an implicit lock via synchronized on all access to global, then you can force some memory consistency here... but it might be better to use an alternative approach.
You also appear to want the collection in global to remain immutable... luckily, there is Collections.unmodifiableSet which you can use to enforce this. As an example, you should likely do something like the following...
private volatile Collection global = Collections.unmodifiableSet(new HashSet());
... that, or using AtomicReference,
private AtomicReference<Collection> global = new AtomicReference<>(Collections.unmodifiableSet(new HashSet()));
You would then use Collections.unmodifiableSet for your modified copies as well.
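With the AtomicReference variant, the synchronized block could even be replaced by a lock-free retry loop. A sketch of that idea (not the poster's code; the holder class and method names are mine):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class CopyOnWriteSetHolder {
    private final AtomicReference<Set<Object>> global =
            new AtomicReference<>(Collections.unmodifiableSet(new HashSet<>()));

    // Copy-on-write without a lock: retry if another writer slipped in between
    // reading the old set and the compareAndSet.
    public void remove(Object operand) {
        while (true) {
            Set<Object> oldSet = global.get();
            Set<Object> copy = new HashSet<>(oldSet);
            copy.remove(operand);
            if (global.compareAndSet(oldSet, Collections.unmodifiableSet(copy))) {
                return;
            }
        }
    }

    public Set<Object> read() {   // readers pay only a volatile read
        return global.get();
    }
}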
// ... All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
You should know that making a copy here is redundant, as internally for (Object elm : global) creates an Iterator as follows...
final Iterator it = global.iterator();
while (it.hasNext()) {
    Object elm = it.next();
}
There is therefore no chance of switching to an entirely different value for global in the midst of reading.
All that aside, I agree with the sentiment expressed by DaoWen... is there any reason you're rolling your own data structure here when there may be an alternative available in java.util.concurrent? I figured maybe you're dealing with an older Java, since you use raw types, but it won't hurt to ask.
You can find copy-on-write collection semantics provided by CopyOnWriteArrayList, or its cousin CopyOnWriteArraySet (which implements a Set using the former).
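A minimal sketch of the snapshot behaviour those collections give you (the values are illustrative):

import java.util.concurrent.CopyOnWriteArraySet;

public class SnapshotDemo {
    public static void main(String[] args) {
        CopyOnWriteArraySet<String> set = new CopyOnWriteArraySet<>();
        set.add("a");
        set.add("b");

        // The iterator sees a snapshot taken when it was created; modifications
        // made during the loop never affect it and never throw
        // ConcurrentModificationException.
        for (String s : set) {
            set.add("c");              // allowed, but invisible to this loop
            System.out.println(s);     // prints only "a" and "b"
        }
    }
}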
Also suggested by DaoWen, have you considered using a ConcurrentHashMap? They guarantee that using a for loop as you've done in your example will be consistent.
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
Internally, an Iterator is used by the enhanced for loop over an Iterable.
You can craft a Set from this by utilizing Collections.newSetFromMap like follows:
final Set<E> safeSet = Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
...
/* guaranteed to reflect the state of the set at read-time */
for (final E elem : safeSet) {
...
}
I think your original idea was sound, and DaoWen did a good job getting the bugs out. Unless you can find something that does everything for you, it's better to understand these things than hope some magical class will do it for you. Magical classes can make your life easier and reduce the number of mistakes, but you do want to understand what they are doing.
ConcurrentSkipListSet might do a better job for you here. It could get rid of all your multithreading problems.
However, it is slower than a HashSet (usually; HashSets and SkipLists/Trees are hard to compare). If you are doing a lot of reads for every write, what you've got will be faster. More importantly, if you update more than one entry at a time, your reads could see inconsistent results. If you expect that whenever there is an entry A there is an entry B, and vice versa, the skip list could give you one without the other.
With your current solution, to the readers, the contents of the map are always internally consistent. A read can be sure there's an A for every B. It can be sure that the size() method gives the precise number of elements that will be returned by the iterator. Two iterations will return the same elements in the same order.
In other words, allUpdatesGoThroughHere and ConcurrentSkipListSet are two good solutions to two different problems.
Can you use the Collections.synchronizedSet method? From the HashSet Javadoc (http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html):
Set s = Collections.synchronizedSet(new HashSet(...));
Replace the synchronized block by making global volatile and you'll be all right as far as the copy-on-write goes.
Although the assignment is atomic, in other threads it is not ordered with the writes to the object referenced. There needs to be a happens-before relationship which you get with a volatile or synchronising both reads and writes.
The problem of multiple updates happening at once is separate - use a single thread or whatever you want to do there.
If you used synchronized for both reads and writes then it'd be correct, but the performance may not be great, with reads needing to hand off. A ReadWriteLock may be appropriate, but you'd still have writes blocking reads.
Another approach to the publication issue is to use final field semantics to create an object that is (in theory) safe to be published unsafely.
Of course, there are also concurrent collections available.

Why does unsynchronization make ArrayList faster and less secure?

I read the following statement:
ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment.
I would like to know why unsynchronization can improve the speed, and why it will be less secure?
I will try to address both of your questions:
Improve speed
If the ArrayList were synchronized and multiple threads were trying to read data out of the list at the same time, the threads would have to wait to get an exclusive lock on the list. By leaving the list unsynchronized, the threads don't have to wait and the program will run faster.
Unsafe
If multiple threads are reading and writing to a list at the same time, the threads can get an inconsistent view of the list, and this can cause instability in multi-threaded programs.
The whole point of synchronization is that it means only one thread has access to an object at any given time. Take a box of chocolates as an example. If the box is synchronized (Vector), and you get there first, no one else can take any and you get your pick. If the box is NOT synchronized (ArrayList), anyone walking by can snag a chocolate - It will disappear faster, but you may not get the ones you want.
ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment. I would like to know why unsynchronization can improve the speed, and why it will be less secure?
When multiple threads are reading/writing to a shared memory location, the program might compute incorrect results due to lack of mutual exclusion and proper visibility. Hence lack of synchronization is considered "unsafe". This blog post by Jeremy Manson might provide a good introduction to the topic.
When the JVM executes a synchronized method, it makes sure that the current thread has an exclusive lock on the object on which the method is invoked. Similarly when the method finishes execution, the JVM releases the lock held by the executing thread. Synchronized methods provide mutual exclusion and visibility guarantees - and is important for "safety" (i.e. guaranteeing correctness) of the executing code. But, if only one thread is ever accessing the methods of the object, there is no safety issues to worry about. Although the JVM performance has improved over the years, uncontended synchronization (i.e. locking/unlocking of objects accessed by only one thread) still takes non-zero amount of time. For unsynchronized methods, the JVM does not pay this extra penalty - hence they are faster than their synchronized counterparts.
Vectors force their choice on you. All methods are synchronized and it is difficult to use them incorrectly. But when Vectors are used in a single-threaded context, you pay the price for the extra synchronization unnecessarily. ArrayLists leave the choice to you. When used in a multi-threaded context, it is up to you (the programmer) to correctly synchronize the code; but when used in a single-threaded context you are guaranteed not to pay any extra synchronization overhead.
Also, when a collection is populated initially and only read subsequently, ArrayLists perform better even in a multi-threaded context. For example, consider this method:
public synchronized List<String> getList() {
    List<String> list = new Vector<String>();
    list.add("Foo");
    list.add("Bar");
    return Collections.unmodifiableList(list);
}
A list is created, populated, and an immutable view of it is safely published. Looking at the code above it is clear that all subsequent uses of this list are reads and won't need any synchronization even when used by multiple threads - the object is effectively immutable. Using a Vector here incurs the synchronization overhead even for reads where it is not needed; using an ArrayList instead would perform better.
Data structures that synchronize use locks (or other synchronization constructs) to ensure that their data is always in a consistent state. Oftentimes, this requires that one or more threads wait on another thread to finish updating the structure's state, which will then reduce performance, since a wait has been introduced where before there was none.
Two threads can modify the list at the same time, adding a new item or deleting/modifying the same item simultaneously, because no synchronization (or lock mechanism, if you prefer) exists. So imagine you delete one item of the list while somebody else is trying to work with it, or you modify an item while someone uses it: it's not very secure.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html
Read the "Note that this implementation is not synchronized." paragraph, it explains a bit better.
And I forgot, considering speed, it seems quite trivial to imagine that when you try to control the access to a data, you add some mechanisms that prevent other people from accessing your data. Thus, you add some more computations so it is slower...
Non-blocking data structures will be faster than ones that block, because of that fact. With blocking data structures, if a resource is acquired by some entity, it will take time for another entity to acquire that same resource once it becomes available.
However, this can be less secure in some instances, depending on the situation. The main points of contention are writes. If it can be guaranteed that the data contained in a data structure will not change once it has been added, and will only be accessed for reading, then there will not be a problem. The issues arise when there is a conflict between a write and a read, or between a write and a write.
