To take advantage of the wide range of query methods included in java.util.stream of JDK 8, I am attempting to design domain models where getters of relationships with * multiplicity (zero or more instances) return a Stream<T>, instead of an Iterable<T> or Iterator<T>.
My doubt is whether there is any additional overhead incurred by a Stream<T> in comparison to an Iterator<T>.
So, is there any disadvantage of compromising my domain model with a Stream<T>?
Or should I instead always return an Iterator<T> or Iterable<T>, and leave to the end user the decision of whether to use a stream or not, by converting that iterator with StreamUtils?
Note that returning a Collection is not a valid option, because in this case most of the relationships are lazy and of unknown size.
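For illustration, that conversion needs only plain JDK 8 APIs; a minimal sketch (the Order type is just a placeholder):

import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IteratorToStream {
    static class Order {} // placeholder domain type

    // Wrap an Iterator from a lazy relationship of unknown size into a Stream.
    static Stream<Order> toStream(Iterator<Order> it) {
        Spliterator<Order> sp = Spliterators.spliteratorUnknownSize(it, 0);
        return StreamSupport.stream(sp, false); // false = sequential
    }
}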
There's lots of performance advice here, but sadly much of it is guesswork, and little of it points to the real performance considerations.
@Holger gets it right by pointing out that we should resist the seemingly overwhelming tendency to let the performance tail wag the API design dog.
While there are a zillion considerations that can make a stream slower than, the same as, or faster than some other form of traversal in any given case, there are some factors that point to streams having a performance advantage where it counts -- on big data sets.
There is some additional fixed startup overhead of creating a Stream compared to creating an Iterator -- a few more objects before you start calculating. If your data set is large, it doesn't matter; it's a small startup cost amortized over a lot of computation. (And if your data set is small, it probably also doesn't matter -- because if your program is operating on small data sets, performance is generally not your #1 concern either.) Where this does matter is when going parallel; any time spent setting up the pipeline goes into the serial fraction of Amdahl's law; if you look at the implementation, we work hard to keep the object count down during stream setup, but I'd be happy to find ways to reduce it as that has a direct effect on the breakeven data set size where parallel starts to win over sequential.
But, more important than the fixed startup cost is the per-element access cost. Here, streams actually win -- and often win big -- which some may find surprising. (In our performance tests, we routinely see stream pipelines which can outperform their for-loop over Collection counterparts.) And, there's a simple explanation for this: Spliterator has fundamentally lower per-element access costs than Iterator, even sequentially. There are several reasons for this.
The Iterator protocol is fundamentally less efficient. It requires calling two methods to get each element. Further, because Iterators must be robust to things like calling next() without hasNext(), or hasNext() multiple times without next(), both of these methods generally have to do some defensive coding (and generally more statefulness and branching), which adds to inefficiency. On the other hand, even the slow way to traverse a spliterator (tryAdvance) doesn't have this burden. (It's even worse for concurrent data structures, because the next/hasNext duality is fundamentally racy, and Iterator implementations have to do more work to defend against concurrent modifications than do Spliterator implementations.)
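To make the protocol difference concrete, here is a minimal sketch of the two traversal styles (the element type is arbitrary):

import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

class TraversalDemo {
    static void demo(List<String> list) {
        // Iterator protocol: two method calls per element, each with its own checks.
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {      // call #1: may recheck state defensively
            String s = it.next();   // call #2: must verify again that an element exists
            System.out.println(s);
        }

        // Spliterator's "slow path": one call per element; the boolean return
        // value folds hasNext and next into a single operation.
        Spliterator<String> sp = list.spliterator();
        while (sp.tryAdvance(s -> System.out.println(s))) {
            // empty: tryAdvance already did the work
        }
    }
}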
Spliterator further offers a "fast-path" iteration -- forEachRemaining -- which can be used most of the time (reduction, forEach), further reducing the overhead of the iteration code that mediates access to the data structure internals. This also tends to inline very well, which in turn increases the effectiveness of other optimizations such as code motion, bounds check elimination, etc.
Further, traversal via Spliterator tends to have many fewer heap writes than with Iterator. With Iterator, every element causes one or more heap writes (unless the Iterator can be scalarized via escape analysis and its fields hoisted into registers.) Among other issues, this causes GC card mark activity, leading to cache line contention for the card marks. On the other hand, Spliterators tend to have less state, and industrial-strength forEachRemaining implementations tend to defer writing anything to the heap until the end of the traversal, storing iteration state in locals which naturally map to registers, resulting in reduced memory bus activity.
Summary: don't worry, be happy. Spliterator is a better Iterator, even without parallelism. (They're also generally just easier to write and harder to get wrong.)
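As a rough illustration of "easier to write": with Spliterators.AbstractSpliterator you only have to implement tryAdvance. A toy source, purely a sketch (not JDK code):

import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Streams the integers [0, n); there is no hasNext/next duality to get wrong.
class RangeSpliterator extends Spliterators.AbstractSpliterator<Integer> {
    private int next;
    private final int end;

    RangeSpliterator(int end) {
        super(end, Spliterator.ORDERED); // size estimate + characteristics
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (next >= end) return false;
        action.accept(next++);
        return true;
    }

    static Stream<Integer> stream(int end) {
        return StreamSupport.stream(new RangeSpliterator(end), false);
    }
}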
Let’s compare the common operation of iterating over all elements, assuming that the source is an ArrayList. Then, there are three standard ways to achieve this:
Collection.forEach
final E[] elementData = (E[]) this.elementData;
final int size = this.size;
for (int i = 0; modCount == expectedModCount && i < size; i++) {
    action.accept(elementData[i]);
}
Iterator.forEachRemaining
final Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length) {
    throw new ConcurrentModificationException();
}
while (i != size && modCount == expectedModCount) {
    consumer.accept((E) elementData[i++]);
}
Stream.forEach, which will end up calling Spliterator.forEachRemaining
if ((i = index) >= 0 && (index = hi) <= a.length) {
    for (; i < hi; ++i) {
        @SuppressWarnings("unchecked") E e = (E) a[i];
        action.accept(e);
    }
    if (lst.modCount == mc)
        return;
}
As you can see, the inner loop of the implementation code, where these operations end up, is basically the same, iterating over indices and directly reading the array and passing the element to the Consumer.
Similar things apply to all standard collections of the JRE: all of them have adapted implementations for all ways to do it, even if you are using a read-only wrapper. In the latter case, the Stream API would even slightly win, as Collection.forEach has to be called on the read-only view in order to delegate to the original collection’s forEach. Similarly, the iterator has to be wrapped to protect against attempts to invoke the remove() method. In contrast, spliterator() can directly return the original collection’s Spliterator, as it has no modification support. Thus, the stream of a read-only view is exactly the same as the stream of the original collection.
Though all these differences are hardly noticeable when measuring real-life performance, since, as said, the inner loop, which is the most performance-relevant part, is the same in all cases.
The question is which conclusion to draw from that. You still can return a read-only wrapper view to the original collection, as the caller still may invoke stream().forEach(…) to directly iterate in the context of the original collection.
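In code, that recommendation might look like this (Customer and Order are illustrative placeholders):

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class Customer {
    static class Order {} // illustrative placeholder

    private final List<Order> orders = new ArrayList<>();

    // Expose a read-only view; callers can still iterate cheaply,
    // e.g. via orders().stream().forEach(...) or an enhanced for loop.
    public Collection<Order> orders() {
        return Collections.unmodifiableList(orders);
    }
}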
Since the performance isn’t really different, you should rather focus on the higher-level design, as discussed in “Should I return a Collection or a Stream?”
Related
There seems to be a lot of different implementations and ways to generate thread-safe Sets in Java.
Some examples include
1) CopyOnWriteArraySet
2) Collections.synchronizedSet(Set set)
3) ConcurrentSkipListSet
4) Collections.newSetFromMap(new ConcurrentHashMap())
5) Other Sets generated in a way similar to (4)
These examples come from Concurrency Pattern: Concurrent Set implementations in Java 6
Could someone please simply explain the differences, advantages, and disadvantages of these examples and others? I'm having trouble understanding and keeping straight everything from the Java Std Docs.
The CopyOnWriteArraySet is a quite simple implementation - it basically has a list of elements in an array, and when changing the list, it copies the array. Iterations and other accesses which are running at this time continue with the old array, avoiding the necessity of synchronization between readers and writers (though writing itself needs to be synchronized). The normally fast set operations (especially contains()) are quite slow here, as the array will be searched in linear time.
Use this only for really small sets which will be read (iterated) often and changed seldom. (Swing's listener-sets would be an example, but these are not really sets, and should be only used from the EDT anyway.)
Collections.synchronizedSet will simply wrap a synchronized-block around each method of the original set. You should not access the original set directly. This means that no two methods of the set can be executed concurrently (one will block until the other finishes) - this is thread-safe, but you will not have concurrency if multiple threads are using the set. If you use the iterator, you usually still need to synchronize externally to avoid ConcurrentModificationExceptions when modifying the set between iterator calls. The performance will be like the performance of the original set (but with some synchronization overhead, and blocking if used concurrently).
Use this if you only have low concurrency, and want to be sure all changes are immediately visible to the other threads.
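The external synchronization required for iteration is the idiom from the Collections.synchronizedSet javadoc; a minimal sketch:

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SynchronizedSetDemo {
    static void demo() {
        Set<String> s = Collections.synchronizedSet(new HashSet<>());
        s.add("a");
        // Iteration must hold the set's own monitor; otherwise a concurrent
        // modification by another thread can break the iterator.
        synchronized (s) {
            for (String e : s) {
                System.out.println(e);
            }
        }
    }
}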
ConcurrentSkipListSet is the concurrent SortedSet implementation, with most basic operations in O(log n). It allows concurrent adding/removing and reading/iteration, where iteration may or may not tell about changes since the iterator was created. The bulk operations are simply multiple single calls, and not done atomically – other threads may observe only some of them.
Obviously you can use this only if you have some total order on your elements.
This looks like an ideal candidate for high-concurrency situations, for not-too-large sets (because of the O(log n)).
For the ConcurrentHashMap (and the Set derived from it): here most basic operations are (on average, if you have a good and fast hashCode()) in O(1) (but might degenerate to O(n) when many keys have the same hash code), like for HashMap/HashSet. There is limited concurrency for writing (the table is partitioned, and write access will be synchronized on the needed partition), while read access is fully concurrent to itself and the writing threads (but might not yet see the results of the changes currently being written). The iterator may or may not see changes since it was created, and bulk operations are not atomic.
Resizing is slow (as for HashMap/HashSet), thus you should try to avoid this by estimating the needed size on creation (and using about 1/3 more of that, as it resizes when 3/4 full).
Use this when you have large sets, a good (and fast) hash function and can estimate the set size and needed concurrency before creating the map.
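A sketch of that presizing advice (the oversizing follows the 1/3 rule above; the expected size is illustrative):

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PresizedConcurrentSet {
    static Set<String> create(int expectedSize) {
        // Oversize by about 1/3 so the map stays below its 3/4 load
        // factor and avoids resizing.
        int capacity = expectedSize + expectedSize / 3;
        return Collections.newSetFromMap(new ConcurrentHashMap<>(capacity));
    }
}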
Are there other concurrent map implementations one could use here?
It is possible to combine the contains() performance of HashSet with the concurrency-related properties of CopyOnWriteArraySet by using an AtomicReference<Set> and replacing the whole set on each modification.
The implementation sketch:
public abstract class CopyOnWriteSet<E> implements Set<E> {

    private final AtomicReference<Set<E>> ref;

    protected CopyOnWriteSet( Collection<? extends E> c ) {
        ref = new AtomicReference<Set<E>>( new HashSet<E>( c ) );
    }

    @Override
    public boolean contains( Object o ) {
        return ref.get().contains( o );
    }

    @Override
    public boolean add( E e ) {
        while ( true ) {
            Set<E> current = ref.get();
            if ( current.contains( e ) ) {
                return false;
            }
            Set<E> modified = new HashSet<E>( current );
            modified.add( e );
            if ( ref.compareAndSet( current, modified ) ) {
                return true;
            }
        }
    }

    @Override
    public boolean remove( Object o ) {
        while ( true ) {
            Set<E> current = ref.get();
            if ( !current.contains( o ) ) {
                return false;
            }
            Set<E> modified = new HashSet<E>( current );
            modified.remove( o );
            if ( ref.compareAndSet( current, modified ) ) {
                return true;
            }
        }
    }

}
If the Javadocs don't help, you probably should just find a book or article to read about data structures. At a glance:
CopyOnWriteArraySet makes a new copy of the underlying array every time you mutate the collection, so writes are slow and Iterators are fast and consistent.
Collections.synchronizedSet() uses old-school synchronized method calls to make a Set threadsafe. This would be a low-performing version.
ConcurrentSkipListSet offers performant writes with inconsistent batch operations (addAll, removeAll, etc.) and Iterators.
Collections.newSetFromMap(new ConcurrentHashMap()) has the semantics of ConcurrentHashMap, which I believe isn't necessarily optimized for reads or writes, but like ConcurrentSkipListSet, has inconsistent batch operations.
Concurrent set of weak references
Another twist is a thread-safe set of weak references.
Such a set is handy for tracking subscribers in a pub-sub scenario. When a subscriber is going out of scope in other places, and therefore headed towards becoming a candidate for garbage-collection, the subscriber need not be bothered with gracefully unsubscribing. The weak reference allows the subscriber to complete its transition to being a candidate for garbage-collection. When the garbage is eventually collected, the entry in the set is removed.
While no such set is directly provided with the bundled classes, you can create one with a few calls.
First we start with making a Set of weak references by leveraging the WeakHashMap class. This is shown in the class documentation for Collections.newSetFromMap.
Set< YourClassGoesHere > weakHashSet =
    Collections.newSetFromMap(
        new WeakHashMap< YourClassGoesHere , Boolean >()
    );
The Value of the map, Boolean, is irrelevant here as the Key of the map makes up our Set.
In a scenario such as pub-sub, we need thread-safety if the subscribers and publishers are operating on separate threads (quite likely the case).
Go one step further by wrapping as a synchronized set to make this set thread-safe. Feed into a call to Collections.synchronizedSet.
this.subscribers =
    Collections.synchronizedSet(
        Collections.newSetFromMap(
            new WeakHashMap<>() // Parameterized types `< YourClassGoesHere , Boolean >` are inferred, no need to specify.
        )
    );
Now we can add and remove subscribers from our resulting Set. And any “disappearing” subscribers will eventually be automatically removed after garbage-collection executes. When this execution happens depends on your JVM’s garbage-collector implementation, and on the runtime situation at the moment. For discussion and an example of when and how the underlying WeakHashMap clears the expired entries, see this Question: Is WeakHashMap ever-growing, or does it clear out the garbage keys?
CopyOnWriteArrayList almost has the behavior I want, and if unnecessary copies were removed it would be exactly what I am looking for. In particular, it could act exactly like ArrayList for adds made to the end of the ArrayList - i.e., there is no reason to actually make a new copy every single time which is so wasteful. It could just virtually restrict the end of the ArrayList to capture the snapshot for the readers, and update the end after the new items are added.
This enhancement seems like it would be worth having since for many applications the most common type of addition would be to the end of the ArrayList - which is even a reason for choosing to use an ArrayList to begin with.
There also would be no extra overhead, since it would only skip the copy when appending; and although it would still have to check whether a resize is necessary, ArrayList has to do this anyway.
Is there any alternative implementation or data structure that has this behavior without the unnecessary copies for additions at the end (i.e., thread-safe and optimized to allow frequent reads with writes only being additions at the end of the list)?
How can I submit a change request to request a change to the Java specification to eliminate copies for additions to the end of a CopyOnWriteArrayList (unless a re-size is necessary)?
I'd really liked to see this changed with the core Java libraries rather than maintaining and using my own custom code.
Sounds like you're looking for a BlockingQueue, and in particular an ArrayBlockingQueue.
You may also want a ConcurrentLinkedQueue, which uses a lock-free (non-blocking) algorithm and may therefore be faster in many circumstances. It's only a Queue (not a Deque), and thus you can only insert at the tail and remove at the head of the collection, but it sounds like that might be good for your use case. But in exchange for the lock-free algorithm, it has to use a linked list rather than an array internally, and that means more memory (including more garbage when you poll items) and worse memory locality. The lock-free algorithm also relies on a compare-and-set (CAS) loop, which means that while it's faster in the "normal" case, it can actually be slower under high contention, as each thread needs to try its CAS several times before it wins and is able to move forward.
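For reference, a CAS retry loop has roughly this shape; here an AtomicInteger counter stands in for the queue's internal head/tail updates (purely illustrative):

import java.util.concurrent.atomic.AtomicInteger;

class CasRetryDemo {
    private final AtomicInteger counter = new AtomicInteger();

    int incrementManually() {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            // Under contention this CAS can fail repeatedly and the thread
            // must loop -- which is why lock-free structures can slow down
            // when many threads hammer the same location.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}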
My guess is that the reason lists don't get as much love in java.util.concurrent is that a list is an inherently racy data structure in most use cases other than iteration. For instance, something like if (!list.isEmpty()) { return list.get(0); } is racy unless it's surrounded by a synchronized block, in which case you don't need an inherently thread-safe structure. What you really need is a "list-type" interface that only allows operations at the ends -- and that's exactly what Queue and Deque are.
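For example, with a Queue the emptiness check and the read collapse into a single atomic call; a minimal sketch:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class HeadDemo {
    // peek() combines the emptiness check and the read into one atomic
    // operation, so no external synchronization is needed.
    static String firstOrDefault(Queue<String> queue, String fallback) {
        String head = queue.peek();
        return (head != null) ? head : fallback;
    }

    public static void main(String[] args) {
        Queue<String> q = new ConcurrentLinkedQueue<>();
        q.add("a");
        System.out.println(firstOrDefault(q, "empty"));
    }
}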
To answer your questions:
I'm not aware of an alternative implementation that is a fully functional list.
If your idea is truly viable, I can think of a number of ways to proceed:
You can submit "requests for enhancement" (RFE) through the Java Bugs Database. However, in this case I doubt that you will get a positive response. (Certainly, not a timely one!)
You could create an RFE issue on Guava or Apache Commons issues tracker. This might be more fruitful, though it depends on convincing them ...
You could submit a patch to the OpenJDK team with an implementation of your idea. I can't say what the result might be ...
You could submit a patch (as above) to Guava or Apache Commons via their respective issues trackers. This is the approach that is most likely to succeed, though it still depends on convincing "them" that it is technically sound, and "a good thing".
You could just put the code for your proposed alternative implementation on Github, and see what happens.
However, all of this presupposes that your idea is actually going to work. Based on the scant information you have provided, I'm doubtful. I suspect that there may be issues with incomplete encapsulation, concurrency and/or not implementing the List abstraction fully / correctly.
I suggest that you put your code on Github so that other people can take a good hard look at it.
there is no reason to actually make a new copy every single time which is so wasteful.
This is how it works: it replaces the previous array with a new array in a single atomic update of a volatile reference. It is a key part of the thread-safety design that you always get a new array, even if all you do is replace an entry.
thread-safe and optimized to allow frequent reads with writes only being additions at the end of the list
This is heavily optimised for reads; any other solution will be faster for writes but slower for reads, and you have to decide which one you really want.
You can have a custom data structure which will be the best of both worlds, but it is no longer a generic solution, which is what CopyOnWriteArrayList and ArrayDeque provide.
How can I submit a change request to request a change to the Java specification to eliminate copies for additions to the end of a CopyOnWriteArrayList (unless a re-size is necessary)?
You can do this through the bugs database, but what you propose is a fundamental change in how the data structure works. I suggest proposing a new/different data structure which works the way you want. In the meantime, I suggest implementing it yourself as a working example, as you will get what you want faster.
I would start with an AtomicReferenceArray, as this can be used to perform the low-level actions you need. The only problem with it is that it is not resizable, so you would need to determine the maximum size you would ever need.
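A very rough sketch of that idea, assuming a single writer thread and a fixed capacity (both strong simplifications; the class and method names are made up):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical append-only list that avoids copying on append.
// Readers see a consistent snapshot [0, size) because each element is
// published (via the array's volatile write semantics) before size is bumped.
final class AppendOnlyList<E> {
    private final AtomicReferenceArray<E> elements;
    private final AtomicInteger size = new AtomicInteger();

    AppendOnlyList(int capacity) {
        this.elements = new AtomicReferenceArray<>(capacity); // fixed capacity
    }

    // Single-writer assumption: only one thread ever appends.
    boolean add(E e) {
        int i = size.get();
        if (i >= elements.length()) return false; // full; a real version would resize
        elements.set(i, e);  // volatile write publishes the element...
        size.set(i + 1);     // ...before readers can observe the new size
        return true;
    }

    E get(int index) {
        if (index >= size.get()) throw new IndexOutOfBoundsException();
        return elements.get(index); // volatile read
    }

    int size() {
        return size.get();
    }
}

With multiple writers the size bump would itself need a CAS loop, and growing beyond the fixed capacity would reintroduce copying, which is where the design stops being simple.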
CopyOnWriteArrayList has a performance drawback because it creates a copy of the underlying array of the list on write operations. The array copying makes the write operations slow. Maybe CopyOnWriteArrayList is advantageous for a List with a high read rate and low write rate.
Eventually I started coding my own implementation using java.util.concurrent.locks.ReadWriteLock. I did my implementation simply by maintaining an object-level ReadWriteLock instance, acquiring the read lock in the read operations and the write lock in the write operations. The code looks like this.
public class ConcurrentList< T > implements List< T >
{
    private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    private final List< T > list;

    public ConcurrentList( List<T> list )
    {
        this.list = list;
    }

    public boolean remove( Object o )
    {
        readWriteLock.writeLock().lock();
        boolean ret;
        try
        {
            ret = list.remove( o );
        }
        finally
        {
            readWriteLock.writeLock().unlock();
        }
        return ret;
    }

    public boolean add( T t )
    {
        readWriteLock.writeLock().lock();
        boolean ret;
        try
        {
            ret = list.add( t );
        }
        finally
        {
            readWriteLock.writeLock().unlock();
        }
        return ret;
    }

    public void clear()
    {
        readWriteLock.writeLock().lock();
        try
        {
            list.clear();
        }
        finally
        {
            readWriteLock.writeLock().unlock();
        }
    }

    public int size()
    {
        readWriteLock.readLock().lock();
        try
        {
            return list.size();
        }
        finally
        {
            readWriteLock.readLock().unlock();
        }
    }

    public boolean contains( Object o )
    {
        readWriteLock.readLock().lock();
        try
        {
            return list.contains( o );
        }
        finally
        {
            readWriteLock.readLock().unlock();
        }
    }

    public T get( int index )
    {
        readWriteLock.readLock().lock();
        try
        {
            return list.get( index );
        }
        finally
        {
            readWriteLock.readLock().unlock();
        }
    }

    //etc
}
The performance improvement observed was notable.
Total time taken for 5000 reads + 5000 writes (read:write ratio 1:1) by 10 threads was:
ArrayList            - 16450 ns  (not thread-safe)
ConcurrentList       - 20999 ns
Vector               - 35696 ns
CopyOnWriteArrayList - 197032 ns
Please follow this link for more info about the test case used for obtaining the above results.
However, in order to avoid ConcurrentModificationException when using the Iterator, I just created a copy of the current List and returned the iterator of that. This means this list does not return an Iterator which can modify the original List. Well, for me, this is OK for the moment.
public Iterator<T> iterator()
{
    readWriteLock.readLock().lock();
    try
    {
        return new ArrayList<T>( list ).iterator();
    }
    finally
    {
        readWriteLock.readLock().unlock();
    }
}
After some googling I found out that CopyOnWriteArrayList has a similar implementation, as it does not return an Iterator which can modify the original List. The Javadoc says,
The returned iterator provides a snapshot of the state of the list when the iterator was constructed. No synchronization is needed while traversing the iterator. The iterator does NOT support the remove method.
if (var != X)
    var = X;
Is it sensible or not? Will the compiler always optimize-out the if statement? Are there any use cases that would benefit from the if statement?
What if var is a volatile variable?
I'm interested in both C++ and Java answers as the volatile variables have different semantics in both of the languages. Also the Java's JIT-compiling can make a difference.
The if statement introduces branching and an additional read that wouldn't happen if we always overwrote var with X, so it's bad. On the other hand, if var == X, then using this optimization we perform only a read and no write, which could have some effects on the cache. Clearly, there are some trade-offs here. I'd like to know what this looks like in practice. Has anyone done any testing on this?
EDIT:
I'm mostly interested in what it looks like in a multi-processor environment. In a trivial situation there doesn't seem to be much sense in checking the variable first. But when cache coherency has to be kept between processors/cores, the extra check might actually be beneficial. I just wonder how big an impact it can have. Also, shouldn't the processor do such an optimization itself? If var == X, assigning it the value X once more should not 'dirty' the cache. But can we rely on this?
Yes, there are definitely cases where this is sensible, and as you suggest, volatile variables are one of those cases - even for single threaded access!
Volatile writes are expensive, both from a hardware and a compiler/JIT perspective. At the hardware level, these writes might be 10x-100x more expensive than a normal write, since write buffers have to be flushed (on x86, the details will vary by platform). At the compiler/JIT level, volatile writes inhibit many common optimizations.
Speculation, however, can only get you so far - the proof is always in the benchmarking. Here's a microbenchmark that tries your two strategies. The basic idea is to copy values from one array to another (pretty much System.arraycopy), with two variants - one which copies unconditionally, and one that checks to see if the values are different first.
Here are the copy routines for the simple, non-volatile case (full source here):
// no check
for (int i = 0; i < ARRAY_LENGTH; i++) {
    target[i] = source[i];
}

// check, then set if unequal
for (int i = 0; i < ARRAY_LENGTH; i++) {
    int x = source[i];
    if (target[i] != x) {
        target[i] = x;
    }
}
The results using the above code to copy an array length of 1000, using Caliper as my microbenchmark harness, are:
benchmark    arrayType    ns    linear runtime
CopyNoCheck  SAME          470  =
CopyNoCheck  DIFFERENT     460  =
CopyCheck    SAME         1378  ===
CopyCheck    DIFFERENT    1856  ====
This also includes about 150ns of overhead per run to reset the target array each time. Skipping the check is much faster - about 0.47 ns per element (or around 0.32 ns per element after we remove the setup overhead, so pretty much exactly 1 cycle on my box).
Checking is about 3x slower when the arrays are the same, and 4x slower when they are different. I'm surprised at how bad the check is, given that it is perfectly predicted. I suspect that the culprit is largely the JIT - with a much more complex loop body, it may be unrolled fewer times, and other optimizations may not apply.
Let's switch to the volatile case. Here, I've used AtomicIntegerArray as my arrays of volatile elements, since Java doesn't have any native array types with volatile elements. Internally, this class is just writing straight through to the array using sun.misc.Unsafe, which allows volatile writes. The assembly generated is substantially similar to normal array access, other than the volatile aspect (and possibly range check elimination, which may not be effective in the AIA case).
Here's the code:
// no check
for (int i = 0; i < ARRAY_LENGTH; i++) {
    target.set(i, source[i]);
}

// check, then set if unequal
for (int i = 0; i < ARRAY_LENGTH; i++) {
    int x = source[i];
    if (target.get(i) != x) {
        target.set(i, x);
    }
}
And here are the results:
arrayType  benchmark       us      linear runtime
SAME       CopyCheckAI      2.85   =======
SAME       CopyNoCheckAI   10.21   ===========================
DIFFERENT  CopyCheckAI     11.33   ==============================
DIFFERENT  CopyNoCheckAI   11.19   =============================
The tables have turned. Checking first is ~3.5x faster than the usual method. Everything is much slower overall - in the check case, we are paying ~3 ns per loop iteration, and in the worst cases ~10 ns (the times above are in us, and cover the copy of the whole 1000 element array). Volatile writes really are more expensive. There is about 1 ns of overhead included in the DIFFERENT case to reset the array on each iteration (which is why even the simple method is slightly slower for DIFFERENT). I suspect a lot of the overhead in the "check" case is actually bounds checking.
This is all single threaded. If you actually had cross-core contention over a volatile, the results would be much, much worse for the simple method, and just about as good as the above for the check case (the cache line would just sit in the shared state - no coherency traffic needed).
I've also only tested the extremes of "every element equal" vs "every element different". This means the branch in the "check" algorithm is always perfectly predicted. If you had a mix of equal and different, you wouldn't get just a weighted combination of the times for the SAME and DIFFERENT cases - you do worse, due to misprediction (both at the hardware level, and perhaps also at the JIT level, which can no longer optimize for the always-taken branch).
So whether it is sensible, even for volatile, depends on the specific context - the mix of equal and unequal values, the surrounding code, and so on. I'd usually not do it for volatile alone in a single-threaded scenario, unless I suspected a large number of sets are redundant. In heavily multi-threaded structures, however, reading and then doing a volatile write (or other expensive operation, like a CAS) is a best practice, and you'll see it in quality code such as the java.util.concurrent structures.
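A common shape of that read-before-write practice, e.g. claiming a one-shot flag (an illustrative sketch, not taken from java.util.concurrent itself):

import java.util.concurrent.atomic.AtomicBoolean;

class OneShot {
    private final AtomicBoolean done = new AtomicBoolean();

    void runOnce(Runnable action) {
        // Cheap volatile read first; the expensive CAS (and its cache
        // coherency traffic) happens only when the flag may actually
        // need to change.
        if (!done.get() && done.compareAndSet(false, true)) {
            action.run();
        }
    }
}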
Is it a sensible optimization to check whether a variable holds a specific value before writing that value?
Are there any use cases that would benefit from the if statement?
It is when assignment is significantly more costly than an inequality comparison returning false.
An example would be a large std::set (for some definition of "large"), which may require many heap allocations to duplicate.
Will the compiler always optimize-out the if statement?
That's a fairly safe "no", as are most questions that contain both "optimize" and "always".
The C++ standard makes rare mention of optimizations, but never demands one.
What if var is a volatile variable?
Then it may perform the if, although volatile doesn't achieve what most people assume.
In general the answer is no, since if you have a simple data type the compiler is able to perform any necessary optimizations, and in the case of types with a heavy operator= it is the responsibility of operator= to choose the optimal way to assign a new value.
There are situations where even a trivial assignment of, say, a pointer-sized variable can be more expensive than a read and branch (especially if the branch is predictable).
Why? Multithreading. If several threads are only reading the same value, they can all share that value in their caches. But as soon as you write to it, you have to invalidate the cacheline and get the new value the next time you want to read it or you have to get the updated value to keep your cache coherent. Both situations lead to more traffic between the cores and add latency to the reads.
If the branch is pretty unpredictable though it's probably still slower.
In C++, assigning a SIMPLE variable (that is, a normal integer or float variable) is definitely and always faster than checking if it already has that value and then setting it if it didn't have the value. I would be very surprised if this wasn't true in Java too, but I don't know how complicated or simple things are in Java - I've written a few hundred lines, and not actually studied how byte code and JITed bytecode actually works.
Clearly, if the variable is very easy to check but complicated to set, which could be the case for classes and other such things, then there may be value in checking first. The typical case where you'd find this would be in some code where the "value" is some sort of index or hash, but if it's not a match, a whole lot of work is required. One example would be in a task switch:
if (current_process != new_process_to_run)
    current_process = new_process_to_run;
Because here, a "process" is a complex object to alter, but the != can be done on the ID of the process.
Whether the object is simple or complex, the compiler will almost certainly not understand what you are trying to do here, so it will probably not optimize it away - but compilers are more clever than you think SOMETIMES, and more stupid at other times, so I wouldn't bet either way.
volatile should always force the compiler to read and write values to the variable, whether it "thinks" it is necessary or not, so it will definitely READ the variable and WRITE the variable. Of course, if the variable is volatile it probably means that it can change or represents some hardware, so you should be EXTRA careful with how you treat it yourself too... An extra read of a PCI-X card could incur several bus cycles (bus cycles being an order of magnitude slower than the processor speed!), which is likely to affect the performance much more. But then writing to a hardware register may (for example) cause the hardware to do something unexpected, and checking that we have that value first MAY make it faster, because "some operation starts over", or something like that.
It would be sensible if you had read-write locking semantics involved, whenever reading is usually less disruptive than writing.
In Objective-C you have the situation where assigning an object address to a pointer variable may require that the object be "retained" (reference count incremented). In such a case it makes sense to see if the value being assigned is the same as the value currently in the pointer variable, to avoid having to do the relatively expensive increment/decrement operations.
Other languages that use reference counting likely have similar scenarios.
But when assigning, say, an int or a boolean to a simple variable (outside of the multiprocessor cache scenario mentioned elsewhere) the test is rarely merited. The speed of a store in most processors is at least as fast as the load/test/branch.
In Java the answer is always no, since all assignments you can do in Java are of primitive values or references, and the assignment itself is always cheap. In C++, the answer is still pretty much always no - if copying is so much more expensive than an equality check, the class in question should do that equality check itself.
Currently, in a multithreaded environment, we are using a LinkedList to hold data. Sometimes in the logs we get a NoSuchElementException while polling the LinkedList. Please help in understanding the performance impact if we move from LinkedList to the ConcurrentLinkedQueue implementation.
When you get a NoSuchElementException, this may be because of not synchronizing properly.
For example: you're checking with it.hasNext() whether an element is in the list and afterwards trying to fetch it with it.next(). This may fail when the element has been removed in between, and that can also happen when you use synchronized versions of the Collection API.
So your problem cannot really be solved by moving to ConcurrentLinkedQueue. You may not get an exception, but you have to be prepared for null being returned even when you checked beforehand that the queue is not empty. (This is still the same error, but the manifestation differs.) This is true as long as there is no proper synchronization in YOUR code putting the emptiness check and the element retrieval in the SAME synchronized scope.
There is a good chance that you trade the NoSuchElementException for a new NullPointerException.
This may not be an answer directly addressing your question about performance, but having NoSuchElementException in LinkedList as a reason to move to ConcurrentLinkedQueue sounds a bit strange.
Edit
Some pseudo-code for broken implementations:
//list is a LinkedList
if(!list.isEmpty()) {
    ... list.getFirst()
}
Some pseudo-code for proper sync:
//list is a LinkedList
synchronized(list) {
    if(!list.isEmpty()) {
        ... list.getFirst()
    }
}
Some code for "broken" sync (does not work as intended).
This may be the result of directly switching from LinkedList to CLQ in the hope of getting rid of doing the synchronization on your own.
//queue is instance of CLQ
if(!queue.isEmpty()) { // Does not really make sense, because ...
    ... queue.poll() //May return null! Good chance for NPE here!
}
Some proper code:
//queue is instance of CLQ
element = queue.poll();
if(element != null) {
    ...
}
or
//queue is instance of CLQ
synchronized(queue) {
    if(!queue.isEmpty()) {
        ... queue.poll() //is not null
    }
}
ConcurrentLinkedQueue [is] an unbounded, thread-safe, FIFO-ordered queue. It uses a linked structure, similar to those we saw in Section 13.2.2 as the basis for skip lists, and in Section 13.1.1 for hash table overflow chaining. We noticed there that one of the main attractions of linked structures is that the insertion and removal operations implemented by pointer rearrangements perform in constant time. This makes them especially useful as queue implementations, where these operations are always required on cells at the ends of the structure, that is, cells that do not need to be located using the slow sequential search of linked structures.
ConcurrentLinkedQueue uses a CAS-based wait-free algorithm, that is, one that guarantees that any thread can always complete its current operation, regardless of the state of other threads accessing the queue. It executes queue insertion and removal operations in constant time, but requires linear time to execute size. This is because the algorithm, which relies on co-operation between threads for insertion and removal, does not keep track of the queue size and has to iterate over the queue to calculate it when it is required.
From Java Generics and Collections, ch. 14.2.
Note that ConcurrentLinkedQueue does not implement the List interface, so it suffices as a replacement for LinkedList only if the latter was used purely as a queue. In this case, ConcurrentLinkedQueue is obviously a better choice. There should be no big performance issue unless its size is frequently queried. But as a disclaimer, you can only be sure about performance if you measure it within your own concrete environment and program.