I have been reading Effective Java, specifically:
Item 46: Prefer for-each loops to traditional for loops
In the part that lists the cases where an iterator or traditional for loop is needed instead of a for-each loop, there is this point:
Parallel iteration—If you need to traverse multiple collections in
parallel, then you need explicit control over the iterator or index
variable, so that all iterators or index variables can be advanced
in lockstep.
Now, I understand what explicit control over the iterator/index variable means (it is not controlled by the for-each loop). But I could not understand the meaning of lockstep in this sense. I tried to google it and found an article on Wikipedia which states:
Lockstep systems are fault-tolerant computer systems that run the same
set of operations at the same time in parallel.
I understand this as having an additional instance of, for example, a server for fail-over. That's OK. But I fail to fully understand what the exact meaning could be in the context of iterating over a collection in programming.
In this context, the meaning is more like the military marching.
Or, when one operation advances, the other operations advance with it.
More specifically, if you want to iterate over two collections, you cannot easily use the for-each construct:
for (Item i : list1) { //only allows you to iterate over 1 list.
}
To iterate over 2 collections:
Iterator<Item> iter1 = list1.iterator();
Iterator<Item> iter2 = list2.iterator();
while (iter1.hasNext() && iter2.hasNext()){
Item a = iter1.next();
Item b = iter2.next();
doSomething(a, b);
}
i.e. while iterating list1, iterating list2 follows with it - "in lockstep"
Lockstep execution means that the same statement is executed on all the processors at the same time, "in parallel". This is of special importance when you are dealing with GPGPU (general-purpose computing on graphics processing units) programming. GPUs actually perform the exact same operation in parallel on different data sets.
Example: In a for loop with independent operations on data (say, a vector addition problem), all the processors may perform the add operation simultaneously, then the assignment operation simultaneously, each on a separate vector index, in a lockstep fashion, as one addition and assignment is independent of another.
The meaning of "lockstep" in this context is not special, but is the English-language meaning, interpreted as "at the same time".
Here, it just means that the index and iterator are advanced at the same time, so that they always correspond to the same element. Kind of like two people walking side-by-side--they have to step forward together if they're to remain side-by-side.
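The quoted passage also mentions index variables, so here is a minimal sketch of the index-based variant. The class and method names (LockstepDemo, pairByIndex) are made up for illustration, and it assumes both lists support efficient get(int):

```java
import java.util.ArrayList;
import java.util.List;

final class LockstepDemo {
    // Walks both lists with ONE shared index, so element i of the first list
    // is always paired with element i of the second. Stops at the shorter list.
    static List<String> pairByIndex(List<String> names, List<Integer> scores) {
        int n = Math.min(names.size(), scores.size());
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(names.get(i) + "=" + scores.get(i));
        }
        return out;
    }
}
```

A for-each loop over names alone would give no handle on the matching position in scores, which is why the explicit index (or a pair of iterators) is needed.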
Question: What is the optimal (performance-wise) solution for the add, removal, modification of items within an ArrayList which at the same time avoids the ConcurrentModificationException from being thrown during operations?
Context: Based on my research looking into this question, there doesn't seem to be any straight-forward answers to the question at hand - most recommend using CopyOnWriteArrayList, but my understanding is that it is not recommended for array lists of large size (which I am working with, hence the performance-aspect of the question).
Thus, my understanding can be summarized as follows, but I want to check whether it is correct:
IMPORTANT NOTE: The following statements all assume that the operation is done within a synchronized block.
Remove during iteration of an ArrayList should be done with an Iterator, because for loop results in unpredictable behavior if removal is done within the middle of a collection. Example:
Iterator<Item> itemIterator = items.iterator();
while (itemIterator.hasNext()) {
Item item = itemIterator.next();
// check if item needs to be removed
itemIterator.remove();
}
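Add operations cannot be done with an Iterator, but can be done with a ListIterator. Example:
ListIterator<Item> itemIterator = list.listIterator();
while (itemIterator.hasNext()) {
Item current = itemIterator.next(); // advance; without this call the loop never ends
// do some operation which requires iteration of the ArrayList
itemIterator.add(item); // inserted before the cursor, so it is not visited again
}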
For add operations, a ListIterator does NOT necessarily have to be used (i.e. simply items.add(item) should not cause any problems).
Add operations while going through the collection can be done with EITHER a ListIterator or a for loop, but NOT an Iterator. Example:
Iterator<Item> itemIterator = items.iterator();
while (itemIterator.hasNext()) {
Item current = itemIterator.next();
// do some operation which requires iteration of the ArrayList
items.add(item); // NOT acceptable - cannot modify ArrayList while in an Iterator of that ArrayList
}
Modification of an item within an ArrayList can be done with either an Iterator or a for loop with the same performance complexity (is this true?). Example:
// iterator example
Iterator<Item> itemIterator = items.iterator();
while (itemIterator.hasNext()) {
Item item = itemIterator.next();
item.update(); // modifies the item within the ArrayList during iteration
}
// for loop example
for (Item item : items){
item.update();
}
Will modification during iteration with the Iterator have the same performance as the for loop? Are there any thread-safety differences between the approaches?
Bonus question: what advantage does a synchronizedList wrapping the ArrayList offer for add/remove/modify operations, compared with a for loop or an iterator, if it also requires a synchronized block?
There is no difference between while loops and for loops and in fact, the idiomatic form of a loop using an iterator explicitly, is a for loop:
for(Iterator<Item> it = items.iterator(); it.hasNext(); ) {
Item item = it.next();
item.update();
}
which gets compiled to exactly the same code as
for(Item item: items) {
item.update();
}
Identical compiled code performs identically, regardless of the source code that was used to produce it.
Instead of focusing on the loop form, you have to focus on the fundamental limitations when inserting or removing elements of an ArrayList. Each time you insert or remove an element, the elements behind the affected index have to be copied to a new location. This isn’t very expensive, as the array only consists of references to the objects, but the costs can easily add up when doing it repeatedly.
So, if you know that the number of insertions or removals is predictably small or will happen at the end or close to the end (so there is only a small number of elements to copy), it’s not a problem. But when inserting or removing an arbitrary number of elements at arbitrary positions in a loop, you run into a quadratic time complexity.
You can avoid this, by using
items.removeIf(item -> /* check and return whether to remove the item*/);
This will use an internal iteration and postpone the moving of elements until their final position is known, leading to a linear time complexity.
If that’s not feasible, you might be better off copying the list into a new list, skipping the unwanted elements. This will be slightly less efficient but still have a linear time complexity. That’s also the solution for inserting a significant number of items at arbitrary positions.
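A rough sketch of both linear-time options described above, using Integer elements and made-up predicates (drop even numbers, insert a marker before multiples of 5) purely for illustration; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

final class BulkRemovalDemo {
    // In-place bulk removal: one call, elements are moved once at most.
    static List<Integer> removeEvensInPlace(List<Integer> items) {
        items.removeIf(n -> n % 2 == 0);
        return items;
    }

    // Copy approach: build a new list, skipping unwanted elements and
    // appending "inserted" elements where they belong. Also a single pass.
    static List<Integer> copyWithChanges(List<Integer> items) {
        List<Integer> result = new ArrayList<>(items.size());
        for (Integer n : items) {
            if (n % 2 == 0) {
                continue;           // skip instead of calling remove()
            }
            if (n % 5 == 0) {
                result.add(-1);     // hypothetical marker inserted before n
            }
            result.add(n);
        }
        return result;
    }
}
```

Neither method shifts the tail of the array once per affected element, which is what causes the quadratic behaviour of repeated remove(int) or add(int, E) calls.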
The item.update(); call is in an entirely different category. “The item within the ArrayList” is the wrong mindset. As said above, the ArrayList contains references to objects, whereas the object itself is not affected by “being inside the ArrayList”. In fact, objects can be in multiple collections at the same time, as all standard collections only contain references.
So item.update(); changes the Item object, which is an operation independent of the ArrayList, which is dangerous when you assume a thread safety based on the list.
When you have code like
Item item = items.get(someIndex);
// followed by using item
where get is from a synchronizedList
or a manually synchronized retrieval operation which returns the item to the caller or any other form of code which uses a retrieved Item outside the synchronized block,
then your code is not thread safe. It doesn’t help when the update() call is done under a synchronization or lock when looping over the list, when the other uses are outside the synchronization or lock. To be thread safe, all uses of an object must be protected by the same thread safety construct.
So even when you use the synchronizedList, you must not only guard your loops manually, as the documentation already tells, you also have to expand the protection to all other uses of the contained elements, if they are mutable.
Alternatively, you could have different mechanisms for the list and the contained elements, if you know what you are doing, but it still means that the simplicity of “just wrap the list with synchronizedList” isn’t there.
So what advantage does it have? Actually none. It might have helped developers during the migration from Java 1.1 and its all-synchronized Vector and Hashtable to Java 2’s Collection API. But I never had a use for the synchronized wrappers at all. Any nontrivial use case requires manual synchronization (or locking) anyway.
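To illustrate the point about guarding every use with the same construct, here is a minimal sketch. Item and GuardedItemsDemo are hypothetical types, and the design choice (one lock for both the list and its elements, namely the wrapper list itself) is an assumption, not the only option:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class GuardedItemsDemo {
    // Hypothetical mutable element type with no synchronization of its own.
    static final class Item {
        private int value;
        void update() { value++; }
        int value() { return value; }
    }

    private final List<Item> items = Collections.synchronizedList(new ArrayList<>());

    void add(Item item) {
        items.add(item); // a single call, already synchronized by the wrapper
    }

    // Iteration is several calls, so it needs a manual lock on the wrapper.
    void updateAll() {
        synchronized (items) {
            for (Item item : items) {
                item.update();
            }
        }
    }

    // Reads the element's state under the SAME lock, instead of handing the
    // Item out and touching it outside any synchronization.
    int valueAt(int index) {
        synchronized (items) {
            return items.get(index).value();
        }
    }
}
```

Note that the wrapper only saved the synchronized block in add; everything else is manual, which is the answer's point.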
I have multiple threads iterating over a list. All these threads will in the end find a matching element to remove from such list.
To avoid inconsistent states what should I use for the list? Vector? ArrayList? Other?
Here is an example with Vectors. It doesn't give errors but I'm sure it could:
for(int i=0; i<timersVector.size(); i++){
currTimerThread = timersVector.get(i);
if(currTimerThread.getRowViewTag().equals(parent.getTag())){
currTimerThread.stopTimer();
timersVector.remove(i);
Log.i(tag, "timerVector size: "+timersVector.size());
}
}
For example, if one thread is entering the loop and size is 10 and right after another thread is removing the element at 5, what would happen to the first one?
Thanks for any help
For a Vector each operation is thread safe, however multiple operations are not. As you are performing multiple operations, you need to hold a lock on the collection while performing them all. i.e. outside the loop in this case.
e.g. the element you get(i) and the element you remove(i) could be changed by another thread. There is no guarantee the element you removed is the one you checked.
BTW ArrayList replaced Vector in 1998. I suggest you use that and synchronize as required and/or use Collections.synchronizedList(new ArrayList<>())
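A sketch of what "hold a lock while performing them all" could look like. It uses String tags instead of the question's timer objects, the names are made up, and it assumes every thread that touches the list locks on the list object itself (which is what Vector and the synchronizedList wrapper do internally):

```java
import java.util.Iterator;
import java.util.List;

final class LockedRemovalDemo {
    // Finds and removes the first matching element while holding ONE lock for
    // the whole scan, so no other thread can shift elements between the
    // check and the removal.
    static boolean removeFirstMatch(List<String> tags, String tag) {
        synchronized (tags) {
            for (Iterator<String> it = tags.iterator(); it.hasNext(); ) {
                if (it.next().equals(tag)) {
                    it.remove();
                    return true;
                }
            }
            return false;
        }
    }
}
```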
Accessing a List from multiple threads requires a synchronized List wrapper. The java.util.Collections utility class contains all kinds of synchronized wrappers.
In your case, wrap your list (don't use Vector, it's there for backward compatibility only) using this simple line of code:
List<Timer> timers = Collections.synchronizedList(originalTimers);
Suggestion: Using a synchronized map would be more efficient in your case and wouldn't require a loop to search through the items.
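A possible shape for the map-based approach, assuming each timer can be keyed by its row tag. The key type (String), the use of Runnable as a stand-in for "stop this timer", and all names are assumptions made for the sketch:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class TimerMapDemo {
    // tag -> action that stops the corresponding timer (hypothetical stand-in)
    private final Map<String, Runnable> timersByTag =
            Collections.synchronizedMap(new HashMap<>());

    void register(String tag, Runnable stopAction) {
        timersByTag.put(tag, stopAction);
    }

    // remove(key) is a single synchronized call, so lookup and removal cannot
    // be interleaved with another thread's removal of the same key.
    boolean stopAndRemove(String tag) {
        Runnable stop = timersByTag.remove(tag);
        if (stop == null) {
            return false; // nothing registered, or another thread got there first
        }
        stop.run();
        return true;
    }
}
```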
I was wondering about the Java 8 streams (Stream<E>), they have the following methods:
forEach(Consumer<? super E> action)
forEachOrdered(Consumer<? super E> action)
What were the arguments against supplying the following signature?
forEachOrdered(BiConsumer<Integer, ? super E> action)
This would pass the index of the item in the stream, along with the item itself, to the action.
With this overload it would be possible to actually use the index in case the stream was ordered.
I am really curious to see what the arguments are against it.
Edit, the same actually holds for Iterator<E> with forEachRemaining, and possibly more classes.
If none of the classes provide such option, then I suspect it has been considered for Java 8 and denied.
Indexing every element requires a sequential assignment of the indexes. This would defeat the point of parallel operations, since each operation would have to synchronize to get its index.
Streams and Iterators do not have to be finite. Both Stream::generate
and Stream::iterate return infinite Streams. How would you handle indexing with an infinite stream? Let the index overflow to negative numbers? Use a BigInteger (and potentially run out of memory)?
There isn't a good solution to handling indexing for infinite streams, so the designers (correctly, in my opinion) left it out of the API.
Adding a single method providing an index would require all implementation methods to be doubled to have one maintaining an index and one without. There’s more to it than visible in the API. If you are curious you may look at the type tree of the internal interface java.util.stream.Sink<T> to get an idea. All of them would be affected. The alternative would be to always maintain an index even if it is not required.
And it adds an ambiguity. Does the index reflect the source index, i.e. does not change on filtering, or is it a position in the final stream? On the other hand you can always insert a mapping from an item type to a type holding the item and an index at any places in the chain. This would clear the ambiguity. And the limitations to that solution are the same that a JRE provided solution would have.
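One way to do such a mapping, sketched under the assumption that the source is a random-access List (so the index can come from IntStream.range rather than a shared counter). The entry type and the names are chosen for illustration only:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IndexedStreamDemo {
    // Attaches the SOURCE index before filtering, so the ambiguity is resolved
    // explicitly: surviving elements keep their original positions.
    static List<String> indexThenFilter(List<String> source) {
        return IntStream.range(0, source.size())
                .mapToObj(i -> new SimpleImmutableEntry<Integer, String>(i, source.get(i)))
                .filter(e -> !e.getValue().isEmpty())
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.toList());
    }
}
```

For the input ["a", "", "c"] this yields ["0:a", "2:c"]. Moving the index step after the filter would instead give positions in the final stream.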
In case of an Iterator the answer is even simpler. Since forEachRemaining must be provided as a default interface method it cannot add the maintenance of an index. So at the time it is invoked, it doesn’t know how many items have been consumed so far. And starting the count with zero at that time, ignoring all previous items would be a feature that a lot of developers would question even more.
I have read all the above answers; however, personally I disagree with them. I think some method (e.g. indexed()) should be added, and it can be executed sequentially even in a parallel stream, because this method will be very fast, so there is no need to execute it in parallel. You can add an 'index' with map, for example:
List<String> list = N.asList("a", "b", "c");
final AtomicLong idx = new AtomicLong(0);
list.stream().map(e -> Indexed.of(idx.getAndIncrement(), e)).forEach(N::println);
Or you can use a third-party library, abacus-common; the code will be:
List<String> list = N.asList("a", "b", "c");
Stream.of(list).indexed().forEach(N::println);
// output:
// [0]=a
// [1]=b
// [2]=c
Disclosure: I'm the developer of abacus-common.
I wonder what is the best way to implement a "for-each" loop over an ArrayList or every kind of List.
Which of the followings implementations is the best and why? Or is there a best way?
Thank you for your help.
List<String> values = new ArrayList<>();
values.add("one");
values.add("two");
values.add("three");
...
//#0
for(String value : values) {
...
}
//#1
for(int i = 0; i < values.size(); i++) {
String value = values.get(i);
...
}
//#2
for(Iterator<String> it = values.iterator(); it.hasNext(); ) {
String value = it.next();
...
}
//#3
Iterator<String> it = values.iterator();
while (it.hasNext()) {
String value = it.next();
...
}
#3 has a disadvantage because the scope of the iterator it extends beyond the end of the loop. The other solutions don't have this problem.
#2 is exactly the same as #0, except #0 is more readable and less prone to error.
#1 is (probably) less efficient because it calls .size() every time through the loop.
#0 is usually best because:
it is the shortest
it is least prone to error
it is idiomatic and easy for other people to read at a glance
it is efficiently implemented by the compiler
it does not pollute your method scope (outside the loop) with unnecessary names
The short answer is to use version 0. Take a peek at the section titled Use Enhanced For Loop Syntax in Android's documentation on Designing for Performance. That page has a bunch of goodies and is very clear and concise.
#0 is the easiest to read, in my opinion, but #2 and #3 will work just as well. There should be no performance difference between those three.
In almost no circumstances should you use #1. You state in your question that you might want to iterate over "every kind of List". If you happen to be iterating over a LinkedList then #1 will be n^2 complexity: not good. Even if you are absolutely sure that you are using a list that supports efficient random access (e.g. ArrayList) there's usually no reason to use #1 over any of the others.
In response to this comment from the OP.
However, #1 is required when updating (if not just mutating the current item or building the results as a new list) and comes with the index. Since the List<> is an ArrayList<> in this case, the get() (and size()) is O(1), but that isn't the same for all List-contract types.
Let's look at these issues:
It is certainly true that get(int) is not O(1) for all implementations of the List contract. However, AFAIK, size() is O(1) for all List implementations in java.util. But you are correct that #1 is suboptimal for many List implementations. Indeed, for lists like LinkedList where get(int) is O(N), the #1 approach results in a O(N^2) list iteration.
In the ArrayList case, it is a simple matter to manually hoist the call to size(), assigning it to a (final) local variable. With this optimization, the #1 code is significantly faster than the other cases ... for ArrayLists.
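A small sketch of the hoisted form, guarded by the java.util.RandomAccess marker interface so that lists with slow get(int), such as LinkedList, fall back to the iterator-based loop. Whether the indexed branch is actually faster on a given JVM is not measured here; the names are made up:

```java
import java.util.List;
import java.util.RandomAccess;

final class HoistedSizeDemo {
    static int sum(List<Integer> values) {
        int sum = 0;
        if (values instanceof RandomAccess) {
            // size() is read once; only valid if the list is not resized
            // while the loop runs.
            final int size = values.size();
            for (int i = 0; i < size; i++) {
                sum += values.get(i);
            }
        } else {
            // get(i) may be O(N) here, so let the iterator do the walking.
            for (int v : values) {
                sum += v;
            }
        }
        return sum;
    }
}
```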
Your point about changing the list while iterating the elements raises a number of issues:
If you do this with a solution that explicitly or implicitly uses iterators, then depending on the list class you may get ConcurrentModificationExceptions. If you use one of the concurrent collection classes, you won't get the exception, but the javadocs state that the iterator won't necessarily return all of the list elements.
If you do this using the #1 code (as is) then, you have a problem. If the modification is performed by the same thread, you need to adjust the index variable to avoid missing entries, or returning them twice. Even if you get everything right, a list entry concurrently inserted before the current position won't show up.
If the modification in the #1 case is performed by a different thread, it is hard to synchronize properly. The core problem is that get(int) and size() are separate operations. Even if they are individually synchronized, there is nothing to stop the other thread from modifying the list between a size and get call.
In short, iterating a list that is being concurrently modified is tricky, and should be avoided ... unless you really know what you are doing.
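For the single-threaded case, the index adjustment mentioned above could look like the following sketch (removing empty strings is just an illustrative condition, and the names are hypothetical):

```java
import java.util.List;

final class IndexAdjustDemo {
    // Removes empty strings with an indexed loop. size() must be re-read on
    // every pass because the list shrinks as elements are removed.
    static void removeBlanks(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i).isEmpty()) {
                values.remove(i);
                i--; // the following element slid into slot i; revisit it
            }
        }
    }
}
```

Without the i-- step, the second of two adjacent empty strings would be skipped. Note that this still shifts the tail once per removal, so it has the quadratic worst case.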
I'm returning to c++ after being away for a bit and trying to dust off the old melon.
In Java Iterator is an interface to a container having methods: hasNext(), next() and remove(). The presence of hasNext() means it has the concept of a limit for the container being traversed.
//with an Iterator
Iterator<String> iter = trees.iterator();
while (iter.hasNext())
{
System.out.println(iter.next());
}
In the C++ Standard Template Library, iterators seem to represent a datatype or class that supports operator++ and operator== but has no concept of a limit built in, so a comparison is required before advancing to the next item. The limit has to be checked by the user comparing two iterators; in the normal case the second iterator is the container end.
vector<int> v;
vector<int>::iterator iter;
// Add some elements to vector
v.push_back(1);
v.push_back(4);
v.push_back(8);
for (iter = v.begin(); iter != v.end(); ++iter)
{
cout << *iter << " "; // Should output 1 4 8
}
The interesting part here is that in C++ a pointer is an iterator to an array. The STL took what was existing and built a convention around it.
Is there any further subtlety to this that I am missing?
Perhaps a bit more theoretical. Mathematically, collections in C++ can be described as a half-open interval of iterators, namely one iterator pointing to the start of the collection and one iterator pointing just behind the last element.
This convention opens up a host of possibilities. The way algorithms work in C++, they can all be applied to subsequences of a larger collection. To make such a thing work in Java, you have to create a wrapper around an existing collection that returns a different iterator.
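For lists, the Java standard library already ships such a wrapper: List.subList returns a view of a range. A minimal sketch (the class and method names are made up) of applying an operation to a subsequence this way:

```java
import java.util.List;

final class SubRangeDemo {
    // Upper-cases only the elements in [from, to). subList is a view, so the
    // changes are written through to the backing list.
    static void upperCaseRange(List<String> values, int from, int to) {
        values.subList(from, to).replaceAll(String::toUpperCase);
    }
}
```

This only covers List; an arbitrary Iterable or Iterator has no comparable range view, which is where the C++ iterator-pair convention is more general.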
Another important aspect of iterators has already been mentioned by Frank. There are different concepts of iterators. Java iterators correspond to C++' input iterators, i.e. they are read-only iterators that can only be incremented one step at a time and can't go backwards.
On the other extreme, you have C pointers which correspond exactly to C++' concept of a random access iterator.
All in all, C++ offers a much richer and purer concept that can be applied to a much wider variety of tasks than either C pointers or Java iterators.
Yes, there is a large conceptual difference. C++ utilizes different "classes" of iterators. Some are used for random access (unlike Java), some are used for forward access (like Java), while others are used for writing data (for use with, say, transform).
See the iterators concept in the C++ Documentation:
Input Iterator
Output Iterator
Forward Iterator
Bidirectional Iterator
Random Access Iterator
These are far more interesting and powerful compared to Java/C#'s puny iterators. Hopefully these conventions will be codified using C++0x's Concepts.
As mentioned, Java and C# iterators describe an intermixed position(state)-and-range(value), while C++ iterators separate the concepts of position and range. C++ iterators represent 'where am I now' separately from 'where can I go?'.
Java and C# iterators can't be copied. You can't recover a previous position. The common C++ iterators can.
Consider this example:
// for each element in vec
for(iter a = vec.begin(); a != vec.end(); ++a){
// critical step! We will revisit 'a' later.
iter cur = a;
unsigned i = 0;
// print 3 elements
for(; cur != vec.end() && i < 3; ++cur, ++i){
cout << *cur << " ";
}
cout << "\n";
}
This rather silly loop goes through a sequence (using forward iterator semantics only), printing each contiguous subsequence of 3 elements exactly once (and a couple shorter subsequences at the end). But supposing N elements, and M elements per line instead of 3, this algorithm would still be O(N*M) iterator increments, and O(1) space.
Java-style iterators lack the ability to store a position independently. You will either
lose O(1) space, using (for example) an array of size M to store history as you iterate,
need to traverse the list N times, making O(N^2+N*M) time,
or use a concrete Array type with a GetAt member function, losing genericism and the ability to use linked list container types.
Since only forward iteration mechanics were used in this example, I was able to swap in a list with no problems. This is critical to authoring generic algorithms, such as search, delayed initialization and evaluation, sorting, etc.
The inability to retain state corresponds most closely to the C++ STL input iterator, on which very few algorithms are built.
A pointer to an array element is indeed an iterator into the array.
As you say, in Java, an iterator has more knowledge of the underlying container than in C++. C++ iterators are general, and a pair of iterators can denote any range: this can be a sub-range of a container, a range over multiple containers (see http://www.justsoftwaresolutions.co.uk/articles/pair_iterators.pdf or http://www.boost.org/doc/libs/1_36_0/libs/iterator/doc/zip_iterator.html) or even a range of numbers (see http://www.boost.org/doc/libs/1_36_0/libs/iterator/doc/counting_iterator.html)
The iterator categories identify what you can and can't do with a given iterator.
To me the fundamental difference is that Java Iterators point between items, whereas C++ STL iterators point at items.
C++ iterators are a generalization of the pointer concept; they make it applicable to a wider range of situations. It means that they can be used to do such things as define arbitrary ranges.
Java iterators are relatively dumb enumerators (though not so bad as C#'s; at least Java has ListIterator and can be used to mutate the collection).
There are plenty of good answers about the differences, but I felt the thing that annoys me the most with Java iterators wasn't emphasized--You can't read the current value multiple times. This is really useful in a lot of scenarios, especially when you are merging iterators.
In c++, you have a method to advance the iterator and to read the current value. Reading its value doesn't advance the iteration; so you can read it multiple times. This is not possible with Java iterators, and I end up creating wrappers that do this.
A side note: one easy way to create a wrapper is to use an existing one--PeekingIterator from Guava.
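For readers who do not want the Guava dependency, a hand-rolled wrapper might look like the sketch below. This is not Guava's implementation; Peeking and merge are made-up names, and the merge assumes both inputs are already sorted:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class Peeking<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T buffered;          // element fetched by peek() but not yet consumed
    private boolean hasBuffered;

    Peeking(Iterator<T> source) { this.source = source; }

    // Returns the upcoming element without consuming it; may be called repeatedly.
    T peek() {
        if (!hasBuffered) {
            buffered = source.next(); // throws NoSuchElementException if exhausted
            hasBuffered = true;
        }
        return buffered;
    }

    @Override public boolean hasNext() { return hasBuffered || source.hasNext(); }

    @Override public T next() {
        T result = peek();
        hasBuffered = false;
        buffered = null;
        return result;
    }

    // Merging two sorted sequences needs to look at both heads before
    // deciding which one to consume.
    static List<Integer> merge(Iterator<Integer> a, Iterator<Integer> b) {
        Peeking<Integer> x = new Peeking<>(a);
        Peeking<Integer> y = new Peeking<>(b);
        List<Integer> out = new ArrayList<>();
        while (x.hasNext() && y.hasNext()) {
            out.add(x.peek() <= y.peek() ? x.next() : y.next());
        }
        while (x.hasNext()) out.add(x.next());
        while (y.hasNext()) out.add(y.next());
        return out;
    }
}
```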
Iterators are only equivalent to pointers in the trivial case of iterating over the contents of an array in sequence. An iterator could be supplying objects from any number of other sources: from a database, from a file, from the network, from some other calculation, etc.
C++ library (the part formerly known as STL) iterators are designed to be compatible with pointers. Java, without pointer arithmetic, had the freedom to be more programmer-friendly.
In C++ you end up having to use a pair of iterators. In Java you either use an iterator or a collection. Iterators are supposed to be the glue between algorithm and data structure. Code written for 1.5+ rarely needs to mention iterators, unless it is implementing a particular algorithm or data structure (which the vast majority of programmers have no need to do). As Java goes for dynamic polymorphism, subsets and the like are much easier to handle.