I'm returning to C++ after being away for a bit and trying to dust off the old melon.
In Java, Iterator is an interface over a container, with the methods hasNext(), next() and remove(). The presence of hasNext() means the iterator has a built-in concept of the limit of the container being traversed.
// with an Iterator
Iterator<String> iter = trees.iterator();
while (iter.hasNext())
{
    System.out.println(iter.next());
}
In the C++ Standard Template Library, iterators seem to represent a datatype or class that supports operator++ and operator==, but has no concept of a limit built in, so a comparison is required before advancing to the next item. The limit has to be checked by the user by comparing two iterators; in the normal case the second iterator is the container's end.
vector<int> vec;
vector<int>::iterator iter;

// Add some elements to the vector
vec.push_back(1);
vec.push_back(4);
vec.push_back(8);

for (iter = vec.begin(); iter != vec.end(); ++iter)
{
    cout << *iter << " "; // Should output: 1 4 8
}
The interesting part here is that in C++ a pointer is an iterator into an array. The STL took what already existed and built conventions around it.
Is there any further subtlety to this that I am missing?
Perhaps a bit more theoretical. Mathematically, collections in C++ can be described as a half-open interval of iterators, namely one iterator pointing to the start of the collection and one iterator pointing just behind the last element.
This convention opens up a host of possibilities. The way algorithms work in C++, they can all be applied to subsequences of a larger collection. To make such a thing work in Java, you have to create a wrapper around an existing collection that returns a different iterator.
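To illustrate the Java side of that comparison: the closest built-in analogue to a C++ iterator pair is List.subList, which returns a view of part of a list, so an operation applied to the view affects that subrange of the backing list. A minimal sketch (names and data are made up for illustration):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SubRangeDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(5, 3, 9, 1, 7, 2));

        // Sort only indexes 1..4; subList returns a view,
        // so the sort writes through to the backing list.
        Collections.sort(list.subList(1, 5));

        System.out.println(list); // [5, 1, 3, 7, 9, 2]
    }
}
In C++ the same idea falls out of the iterator convention directly, e.g. std::sort(vec.begin() + 1, vec.begin() + 5), without needing a wrapper object.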
Another important aspect of iterators has already been mentioned by Frank. There are different concepts of iterators. Java iterators correspond to C++'s input iterators, i.e. they are read-only iterators that can only be incremented one step at a time and can't go backwards.
At the other extreme, you have C pointers, which correspond exactly to C++'s concept of a random access iterator.
All in all, C++ offers a much richer and purer concept that can be applied to a much wider variety of tasks than either C pointers or Java iterators.
Yes, there is a large conceptual difference. C++ utilizes different "classes" of iterators. Some are used for random access (unlike Java), some are used for forward access (like Java), and still others are used for writing data (for use with, say, transform).
See the iterator concepts in the C++ documentation:
Input Iterator
Output Iterator
Forward Iterator
Bidirectional Iterator
Random Access Iterator
These are far more interesting and powerful compared to Java/C#'s puny iterators. Hopefully these conventions will be codified using C++0x's Concepts.
As mentioned, Java and C# iterators describe an intermixed position(state)-and-range(value), while C++ iterators separate the concepts of position and range. C++ iterators represent 'where am I now' separately from 'where can I go?'.
Java and C# iterators can't be copied. You can't recover a previous position. The common C++ iterators can.
Consider this example:
// assuming: typedef vector<int>::iterator iter;

// for each element in vec
for (iter a = vec.begin(); a != vec.end(); ++a) {
    // critical step! We will revisit 'a' later.
    iter cur = a;
    unsigned i = 0;
    // print 3 elements
    for (; cur != vec.end() && i < 3; ++cur, ++i) {
        cout << *cur << " ";
    }
    cout << "\n";
}
This rather silly loop goes through a sequence (using forward iterator semantics only), printing each contiguous subsequence of 3 elements exactly once (and a couple shorter subsequences at the end). But supposing N elements, and M elements per line instead of 3, this algorithm would still be O(N*M) iterator increments, and O(1) space.
The Java-style iterators lack the ability to store a position independently. You will either:
lose the O(1) space bound, using (for example) an array of size M to store history as you iterate,
need to traverse the list N times, making it O(N^2 + N*M) time,
or use a concrete array type with a GetAt member function, losing genericity and the ability to use linked-list container types.
Since only forward iteration mechanics were used in this example, I was able to swap in a list with no problems. This is critical to authoring generic algorithms, such as search, delayed initialization and evaluation, sorting, etc.
The inability to retain state corresponds most closely to the C++ STL input iterator, on which very few algorithms are built.
A pointer to an array element is indeed an iterator into the array.
As you say, in Java, an iterator has more knowledge of the underlying container than in C++. C++ iterators are general, and a pair of iterators can denote any range: this can be a sub-range of a container, a range over multiple containers (see http://www.justsoftwaresolutions.co.uk/articles/pair_iterators.pdf or http://www.boost.org/doc/libs/1_36_0/libs/iterator/doc/zip_iterator.html) or even a range of numbers (see http://www.boost.org/doc/libs/1_36_0/libs/iterator/doc/counting_iterator.html)
The iterator categories identify what you can and can't do with a given iterator.
To me the fundamental difference is that Java Iterators point between items, whereas C++ STL iterators point at items.
C++ iterators are a generalization of the pointer concept; they make it applicable to a wider range of situations. It means that they can be used to do such things as define arbitrary ranges.
Java iterators are relatively dumb enumerators (though not so bad as C#'s; at least Java has ListIterator and can be used to mutate the collection).
There are plenty of good answers about the differences, but I felt the thing that annoys me the most with Java iterators wasn't emphasized: you can't read the current value multiple times. This is really useful in a lot of scenarios, especially when you are merging iterators.
In C++, you have separate operations to advance the iterator and to read the current value. Reading the value doesn't advance the iteration, so you can read it multiple times. This is not possible with Java iterators, and I end up creating wrappers that do this.
A side note: one easy way to create such a wrapper is to use an existing one: PeekingIterator from Guava.
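For anyone curious what such a wrapper looks like, here is a minimal sketch of a peeking wrapper (the class name and structure are my own, not Guava's actual implementation): peek() reads the upcoming element without consuming it, which is exactly what a plain java.util.Iterator cannot do.
import java.util.Iterator;

class PeekIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private boolean buffered;
    private T buffer;

    PeekIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    // Read the next value without advancing past it.
    public T peek() {
        if (!buffered) {
            buffer = delegate.next(); // throws NoSuchElementException if exhausted
            buffered = true;
        }
        return buffer;
    }

    @Override
    public boolean hasNext() {
        return buffered || delegate.hasNext();
    }

    @Override
    public T next() {
        if (buffered) {
            buffered = false;
            return buffer;
        }
        return delegate.next();
    }
}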
Iterators are only equivalent to pointers in the trivial case of iterating over the contents of an array in sequence. An iterator could be supplying objects from any number of other sources: from a database, from a file, from the network, from some other calculation, etc.
C++ library (the part formerly known as STL) iterators are designed to be compatible with pointers. Java, without pointer arithmetic, had the freedom to be more programmer-friendly.
In C++ you end up having to use a pair of iterators. In Java you either use an iterator or a collection. Iterators are supposed to be the glue between algorithm and data structure. Code written for 1.5+ rarely needs to mention iterators, unless it is implementing a particular algorithm or data structure (which the vast majority of programmers have no need to do). Because Java goes for dynamic polymorphism, subsets and the like are much easier to handle.
I am concerned with different styles of creating a sorted array from a Java PriorityQueue containing a few thousand elements. The Java 8 docs say
If you need ordered traversal, consider using Arrays.sort(pq.toArray()).
However, I do like the streaming API, so what I had at first was
Something[] elems = theHeap.stream().sorted(BY_CRITERION.reversed())
.toArray(Something[]::new);
(Where BY_CRITERION is the custom comparator of the PriorityQueue, and I do want the reverse order of that.) Is there any disadvantage to using this idiom, as compared to the following:
Something[] elems = theHeap.toArray(new Something[0]);
Arrays.sort(elems, BY_CRITERION.reversed());
This latter code certainly appears to be following the API doc recommendation more directly, but apart from that, is it really more efficient in terms of memory, such as fewer temporary structures being allocated etc.?
I would think that the streaming solution would have to buffer the stream elements in a temporary structure (an array?) and then sort them, and finally copy the sorted elements to the array allocated in toArray().
The imperative solution, on the other hand, would buffer the heap elements in a newly allocated array and then sort them in place. So this might be one copy operation fewer. (And one array allocation. The discussion of Collection.toArray(new T[size]) vs. Collection.toArray(new T[0]) is tangentially related here; for example, see here for why on OpenJDK the latter is faster.)
And what about sorting efficiency? The docs of Arrays.sort() say
Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays
while the documentation of Stream.sorted() is silent on this point. So at least in terms of being reliably documented, the imperative solution would seem to have an advantage.
But is there anything more to know?
Fundamentally, both variants do the same thing, and since both are valid solutions within the library's intended use cases, there is no reason why the implementations should prefer one over the other when choosing algorithms or adding optimizations.
In practice, this means that the most expensive operation, the sorting, ends up in the same implementation methods internally. The sorted(…) operation of the Stream implementation buffers all elements into an intermediate array and then calls Arrays.sort(T[], int, int, Comparator<? super T>), which delegates to the same method as the Arrays.sort(T[], Comparator<? super T>) call in the other variant, the target being a sort method within the internal TimSort class.
So everything that has been said about the time and space complexity of Arrays.sort applies to Stream.sorted as well. There is a performance difference, though. For the OpenJDK implementations up to Java 10, the Stream is not capable of fusing the sorted step with the subsequent toArray step to use the result array directly for sorting. So currently, the Stream variant bears a final copying step from the intermediate array used for sorting to the final array created by the function passed to toArray. Future implementations might learn this trick; then there will be no relevant performance difference at all between the two solutions.
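For completeness, here is a self-contained version of the two variants with a stand-in element type and comparator (Integer and natural order, just for illustration); both produce the same sorted array, and the difference discussed above lies only in the internal buffering and copying:
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapToSortedArray {
    public static void main(String[] args) {
        Comparator<Integer> byCriterion = Comparator.naturalOrder(); // stand-in for BY_CRITERION
        PriorityQueue<Integer> theHeap = new PriorityQueue<>(byCriterion);
        theHeap.addAll(Arrays.asList(5, 1, 4, 2, 3));

        // Stream variant: buffers, sorts the buffer, then copies into the toArray result.
        Integer[] viaStream = theHeap.stream()
                .sorted(byCriterion.reversed())
                .toArray(Integer[]::new);

        // Imperative variant: toArray allocates once, then Arrays.sort works in place.
        Integer[] viaSort = theHeap.toArray(new Integer[0]);
        Arrays.sort(viaSort, byCriterion.reversed());

        System.out.println(Arrays.toString(viaStream)); // [5, 4, 3, 2, 1]
        System.out.println(Arrays.toString(viaSort));   // [5, 4, 3, 2, 1]
    }
}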
I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back, where linear-time operations are not required, whereas in an ArrayList insertion requires copying the suffix of the array after position p. But I could easily counter that by saying that if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words, we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for ArrayList, even though in both cases the linear operations may or may not be invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?
This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of an item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc.), that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessarily natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given a pointer (iterator) to an object, you can remove and add from linked lists in constant time, but for an array-like list this takes linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and here the constant-time deletion of linked lists does change the algorithmic complexity.
Of course, if you need to do something like delete the first entry containing the value "foo", that implies both a search and a delete operation. Both array-based and linked lists take O(n) for search, so they don't vary here - but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well, because actually using an Iterator is very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator - so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators - even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
So Java's support for pointers to list elements is pretty half-hearted, and so it's tough to leverage the constant-time operations that a linked list offers, except in cases such as adding to the front of a list, or deleting items during iteration.
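A small sketch of the one case that does work well, deleting during iteration (hypothetical data; LinkedList chosen so each remove() is constant time once the iterator is positioned):
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class RemoveDuringIteration {
    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("keep", "drop", "keep", "drop"));

        // Iterator.remove() unlinks the last returned node in O(1) for a LinkedList,
        // so the whole pass is O(n); repeatedly calling list.remove(int) on an
        // ArrayList would shift elements and cost O(n) per removal.
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals("drop")) {
                it.remove();
            }
        }

        System.out.println(list); // [keep, keep]
    }
}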
It's not limited to lists, either - ConcurrentLinkedQueue is also a linked structure whose nodes would support constant-time deletes, but you can't reliably use that ability from Java.
If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst case linear. Most of the time it simply does an arraycopy to shift the elements - technically linear in the number of elements shifted, but a very fast low-level operation. On top of that, resizing the backing array costs an extra full copy when the capacity is exceeded.
I have been reading Effective Java on
Item 46: Prefer for-each loops to traditional for loops
In the part that mentions the cases where an explicit iterator or index variable is needed instead of a for-each loop, there is this point:
Parallel iteration—If you need to traverse multiple collections in
parallel, then you need explicit control over the iterator or index
variable, so that all iterators or index variables can be advanced
in lockstep.
Now, I understand what explicit control over the iterator or index variable means (it is not controlled by the for-each loop). But I could not understand the meaning of lockstep in this sense. I tried to google it and found an article on Wikipedia which states:
Lockstep systems are fault-tolerant computer systems that run the same
set of operations at the same time in parallel.
This I understand as having an additional instance of, for example, a server for fail-over. That's OK. But I fail to fully understand what the exact meaning could be in the context of iterating over a collection in programming.
In this context, the meaning is more like military marching.
That is, when one operation advances, the other operations advance with it.
Or, more specifically: if you want to iterate over two collections, you cannot easily use the for-each construct:
for (Item i : list1) { // only allows you to iterate over 1 list
}
To iterate over 2 collections:
Iterator<Item> iter1 = list1.iterator();
Iterator<Item> iter2 = list2.iterator();
while (iter1.hasNext() && iter2.hasNext()) {
    Item a = iter1.next();
    Item b = iter2.next();
    doSomething(a, b);
}
i.e. while iterating over list1, the iteration over list2 advances with it - "in lockstep".
Lockstep execution means that the same statement will be executed on all the processors at the same time, "in parallel". This is of special importance when you are dealing with GPGPU (General Purpose Graphics Processing Unit) programming. GPUs actually do the exact same operation in parallel on different data sets.
Example: In a for loop with independent operations on data (say a vector addition problem), all the processors may call the add operation simultaneously, then assignment operation simultaneously on two separate vector index, in a lockstep fashion, as one addition and assignment is independent from another.
The meaning of "lockstep" in this context is not special, but is the English-language meaning, interpreted as "at the same time".
Here, it just means that the index and iterator are advanced at the same time, so that they always correspond to the same element. Kind of like two people walking side-by-side--they have to step forward together if they're to remain side-by-side.
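A small index-based sketch of that idea (hypothetical lists), to complement the iterator version above: one loop variable drives both lists, so the two positions always correspond.
import java.util.Arrays;
import java.util.List;

public class Lockstep {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        List<Integer> scores = Arrays.asList(1, 2, 3);

        // Both lists advance together, i.e. "in lockstep".
        for (int i = 0; i < names.size() && i < scores.size(); i++) {
            System.out.println(names.get(i) + " -> " + scores.get(i));
        }
    }
}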
In C++ I can insert an item at an arbitrary position in a vector, as in the code below:
std::vector<int> vec(10);
vec.insert(vec.begin()+2,2);
vec.insert(vec.begin()+4,3);
In Java I cannot do the same; I get a java.lang.ArrayIndexOutOfBoundsException with the code below:
Vector l5 = new Vector(10);
l5.add(0, 1);
l5.add(1, "Test");
l5.add(3, "test");
Does this mean that C++ is better designed, or is it just a Java design decision?
Why does Java use this approach?
In the C++ code:
std::vector<int> vec(10);
You are creating a vector of size 10. So all indexes from 0 to 9 are valid afterwards.
In the Java code:
Vector l5 = new Vector(10);
You are creating an empty vector with an initial capacity of 10. It means the underlying array is of size 10 but the vector itself has the size 0.
It does not mean one is better designed than the other. The API is just different and this is not a difference that makes one better than the other.
Note that in Java it is now preferred to use ArrayList, which has almost the same API but is not synchronized. If you want to find a bad design decision in Java's Vector, this synchronization on every operation was probably one.
Therefore the best way to write an equivalent of the C++ initialization code in Java is:
// std::vector<int> vec(10) creates ten value-initialized (zero) ints
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < 10; i++) {
    list.add(0);
}
The Javadoc for Vector.add(int, Object) states pretty clearly that an IndexOutOfBoundsException will be thrown if the index is less than zero or greater than the size. The Vector type grows as needed, and the constructor you've used sets the initial capacity, which is different from the size. Please read the linked Javadoc to better understand how to use the Vector type. Also, Java developers typically use a List type, such as ArrayList, in situations where you would generally use a std::vector in C++.
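A quick sketch of the size-versus-capacity point (illustrative only): the constructor argument reserves space but creates no elements, so only indexes up to size() are valid for add(int, Object).
import java.util.Vector;

public class SizeVsCapacity {
    public static void main(String[] args) {
        Vector<String> v = new Vector<>(10); // capacity 10, but size 0
        System.out.println(v.size());        // 0

        v.add("first");                      // append; index 0 now exists
        v.add(1, "second");                  // fine: index 1 equals size()
        // v.add(3, "oops");                 // would throw ArrayIndexOutOfBoundsException
        System.out.println(v);               // [first, second]
    }
}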
Differences? You cannot really compare how the two languages do this. Normally you use a Vector like a stack or a list: you put one item in, then another after it, and so on; with a LinkedList it is a bit different in that you "pull" values, but the idea is the same. So in C++ it is better to use the push_back() method.
In C++ the vector's elements are created automatically (the constructor fills in default values), but in Java they are not; you have to fill the vector yourself. I also disagree with filling it using l5.add(1, "Test"); use l5.add("test") instead.
Since you asked about differences: you can also define your object this way,
Vector a = new Vector();
That is, without a type parameter; in Java this is called using the raw type (without generics, which were added in Java 5 - the raw form still compiles).
Vector is not widely used in Java any more; its synchronization adds overhead. These days we use ArrayList, which implements the List interface.
Edit
Variable names such as l5 are common enough in C++, but the Java community expects more meaningful variable names :)
What is the need for the Collections framework in Java, since all the data operations (sorting/adding/deleting) are possible with arrays? Moreover, an array is better for memory consumption, and its performance is also better compared with collections.
Can anyone point me to a real, data-oriented example that shows the difference between the two (array/Collections) implementations?
Arrays are not resizable.
The Java Collections Framework provides lots of different useful data types, such as linked lists (which allow insertion anywhere in constant time), resizable array lists (like Vector but cooler), red-black trees, and hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
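A short sketch of what that abstraction buys you (hypothetical method and data): code written against the List interface runs unchanged whichever implementation backs it.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ProgramToInterface {
    // The caller depends only on List, so either implementation works.
    static int countEven(List<Integer> numbers) {
        int count = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> arrayBacked = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> nodeBacked = new LinkedList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(countEven(arrayBacked)); // 2
        System.out.println(countEven(nodeBacked));  // 2
    }
}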
Several reasons:
Java's collection classes provide a higher-level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementations you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized wrappers, etc.
Finally, arrays are covariant: setting an element of an array is not guaranteed to succeed, due to type errors that are detectable only at run time. Generic collections prevent this problem at compile time.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
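For contrast, a tiny sketch of how the generic collections catch the same mistake at compile time rather than at run time:
import java.util.ArrayList;
import java.util.List;

public class GenericsAreInvariant {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        // The equivalent assignment is rejected by the compiler,
        // so the ArrayStoreException above can never happen here:
        // List<Object> objects = strings;   // does not compile
        // objects.add(new Date());          // never gets the chance to fail at run time
    }
}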
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1) - see the sketch after this list.
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
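Here is the sketch referred to above, comparing the two membership tests (hypothetical data; the array scan is written with Arrays.asList().contains() for brevity, but it is still a linear search):
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SetLookup {
    public static void main(String[] args) {
        String[] words = {"alpha", "beta", "gamma"};

        // Array: the membership test is a linear scan, O(N).
        boolean inArray = Arrays.asList(words).contains("beta");

        // HashSet: the membership test hashes the key, O(1) on average.
        Set<String> wordSet = new HashSet<>(Arrays.asList(words));
        boolean inSet = wordSet.contains("beta");

        System.out.println(inArray + " " + inSet); // true true
    }
}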
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it is based on an array, whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
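To make the key/value case concrete, here is a minimal sketch (made-up data) of how little code the out-of-the-box Map needs:
import java.util.HashMap;
import java.util.Map;

public class KeyValueDemo {
    public static void main(String[] args) {
        // With a bare array you would have to write the key lookup yourself;
        // a Map provides it directly.
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);
        ages.put("bob", 25);

        System.out.println(ages.get("alice"));         // 30
        System.out.println(ages.containsKey("carol")); // false
    }
}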
Well, the basic premise is "wrong", since Java has included the Dictionary class since before the collection interfaces existed in the language...
Collections offer Lists, which are somewhat similar to arrays, but they also offer many more things that arrays are not. I'll assume you were just talking about List (and even Set) and will leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array; however, there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developer's learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
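To make the "growing the array yourself" point concrete, here is a sketch of the sort of utility you would end up writing by hand (a hypothetical helper, roughly what ArrayList already does for you internally):
import java.util.Arrays;

public class GrowableArray {
    // Append a value, growing the backing array when it is full.
    static int[] append(int[] data, int size, int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2)); // grow
        }
        data[size] = value;
        return data;
    }

    public static void main(String[] args) {
        int[] data = new int[2];
        int size = 0;
        data = append(data, size++, 7);
        data = append(data, size++, 8);
        data = append(data, size++, 9); // triggers a resize
        System.out.println(Arrays.toString(Arrays.copyOf(data, size))); // [7, 8, 9]
    }
}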
Arrays are not always efficient. What if you need something like a LinkedList? It looks like you need to learn some data structures: http://en.wikipedia.org/wiki/List_of_data_structures
Java Collections offer additional functionality, usability and convenience.
When we want to work with a group of objects in an application, an array alone cannot help us, or rather it forces us into some cumbersome operations.
One important difference is one of usability and convenience, especially given that collections automatically expand in size when needed.
Collections come with methods that simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?", "is the item in this category?", etc.
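A small sketch of the "already processed" use of a Set (made-up URLs): Set.add() returns false when the element is already present, which makes the duplicate check a one-liner.
import java.util.HashSet;
import java.util.Set;

public class VisitedTracker {
    public static void main(String[] args) {
        Set<String> visited = new HashSet<>();
        String[] toCrawl = {"/home", "/about", "/home"};

        for (String url : toCrawl) {
            if (visited.add(url)) {       // add() returns false if already present
                System.out.println("crawling " + url);
            } else {
                System.out.println("skipping " + url);
            }
        }
    }
}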
Collections are a framework in Java, and a framework is much easier to use than implementing things yourself and then using them. Your concern is why we don't just use arrays: arrays have drawbacks, for example they are static - you have to define the size at the beginning - so if your array is large it can result in a lot of wasted memory.
So you may prefer ArrayList, which is part of the collection hierarchy.
Complexity is another issue: if you want to insert into an array, you have to track the index yourself, whereas with LinkedList all the functions are already implemented - you only need to use them, and your code becomes less complex. You can read about the various other advantages of the collection hierarchy.
The Collections framework is much higher level than arrays and provides important interfaces and classes; by using them we can manage groups of objects in a much more sophisticated way, with many methods already provided by the specific collection.
For example:
ArrayList - It's like a dynamic array i.e. we don't need to declare its size, it grows as we add elements to it and it shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements and their fast access.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
TreeMap - TreeMap stores key-value pairs in sorted ascending order, and retrieval of an element from a TreeMap is quite fast.