Why does AbstractCollection.toArray() handle the case of changed size? - java

There's this strange code in AbstractCollection:
public Object[] toArray() {
    // Estimate size of array; be prepared to see more or fewer elements
    Object[] r = new Object[size()];
    Iterator<E> it = iterator();
    for (int i = 0; i < r.length; i++) {
        if (! it.hasNext()) // fewer elements than expected
            return Arrays.copyOf(r, i);
        r[i] = it.next();
    }
    return it.hasNext() ? finishToArray(r, it) : r;
}
The part "be prepared to see more or fewer elements" is IMHO pure nonsense:
in case the collection changes in the meantime, the iterator throws a ConcurrentModificationException anyway.
I haven't found any non-concurrent subclass supporting this. In particular:
ArrayList uses Arrays.copyOf(elementData, size), which could (due to visibility issues) copy a bunch of nulls instead of the data in case of resizing;
LinkedList throws an ArrayIndexOutOfBoundsException if you're lucky enough.
Am I overlooking something?
Would you support this feature in your collection (meant for general use)?

From the Javadoc of toArray():
This implementation returns an array containing all the elements
returned by this collection's iterator, in the same order, stored in
consecutive elements of the array, starting with index 0. The length
of the returned array is equal to the number of elements returned by
the iterator, even if the size of this collection changes during
iteration, as might happen if the collection permits concurrent
modification during iteration. The size method is called only as an
optimization hint; the correct result is returned even if the iterator
returns a different number of elements.
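The contract is easy to exercise with a toy subclass. Here is a sketch (the class name and the deliberately wrong size() are mine) where size() under-reports, yet toArray() still returns every element because the loop falls through to finishToArray:

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ToArrayDemo {
    // Hypothetical collection whose size() lies on purpose; the iterator
    // is the source of truth, exactly as the javadoc describes.
    public static class LyingCollection extends AbstractCollection<String> {
        private final List<String> data = List.of("a", "b", "c");
        @Override public Iterator<String> iterator() { return data.iterator(); }
        @Override public int size() { return 1; } // deliberately wrong
    }

    public static void main(String[] args) {
        Object[] arr = new LyingCollection().toArray();
        // The array has 3 elements despite size() claiming 1.
        System.out.println(arr.length + " " + Arrays.toString(arr)); // 3 [a, b, c]
    }
}
```

So the code path is not dead even for a non-concurrent collection: any subclass whose size() is approximate (or momentarily stale) is still served correctly.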

Related

add element to limited size list

I have the following method which adds an element to a size-limited ArrayList. If the size of the ArrayList exceeds the limit, the oldest elements are removed (like FIFO, "first in first out") (version 1):
// adds the "item" into "list" and satisfies the "limit" of the list
public static <T> void add(List<T> list, final T item, int limit) {
    var size = list.size() + 1;
    if (size > limit) {
        var exeeded = size - limit;
        for (var i = 0; i < exeeded; i++) {
            list.remove(0);
        }
    }
    list.add(item);
}
The "version 1"-method works. However, I wanted to improve this method by using subList (version 2):
public static <T> void add(List<T> list, final T item, int limit) {
    var size = list.size() + 1;
    if (size > limit) {
        var exeeded = size - limit;
        list.subList(0, exeeded).clear();
    }
    list.add(item);
}
Both methods work. However, I want to know whether "version 2" is also more performant than "version 1".
EDIT:
improved "Version 3":
public static <T> void add(List<T> list, final T item, int limit) {
    var size = list.size() + 1;
    if (size > limit) {
        var exeeded = size - limit;
        if (exeeded > 1) {
            list.subList(0, exeeded).clear();
        } else {
            list.remove(0);
        }
    }
    list.add(item);
}
It seems you have the ArrayList implementation in mind where remove(0) imposes the cost of copying all remaining elements in the backing array, repeatedly if you invoke remove(0) repeatedly.
In this case, using subList(0, number).clear() is a significant improvement, as you’re paying the cost of copying elements only once instead of number times.
Since the copying costs of remove(0) and subList(0, number).clear() are identical when number is one, the 3rd variant would save the cost of creating a temporary object for the sub list in that case. This, however, is a tiny impact that doesn't depend on the size of the list (or any other aspect of the input) and usually isn't worth the more complex code. See also this answer for a discussion of the costs of a single temporary object. It's even possible that the cost of constructing the sub list is removed entirely by the JVM's runtime optimizer. Hence, such a conditional should only be used when you experience an actual performance problem, the profiler traces the problem back to this point, and benchmarks prove that the more complicated code has a positive effect.
But this is all moot when you use an ArrayDeque instead. This class has no copying costs when removing its head element, hence you can simply remove excess elements in a loop.
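The ArrayDeque suggestion can be sketched as follows (the class and method names are mine, mirroring the question's add): removing the head of a deque is O(1), so excess elements can simply be polled off in a loop with no array copying at all.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedAdd {
    // Drop the oldest elements until there is room, then append the new one.
    public static <T> void add(Deque<T> deque, T item, int limit) {
        while (deque.size() >= limit) {
            deque.pollFirst(); // O(1), no copying of remaining elements
        }
        deque.addLast(item);
    }

    public static void main(String[] args) {
        Deque<Integer> d = new ArrayDeque<>();
        for (int i = 1; i <= 5; i++) {
            add(d, i, 3);
        }
        System.out.println(d); // [3, 4, 5]
    }
}
```

If the caller truly needs a List, this does not apply, but for a pure FIFO buffer the deque removes the cost question entirely.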
Question 1: The problem is this line:
list = list.subList(exeeded, list.size());
You're reassigning the local variable list; this does not change the object passed as an argument, only its local counterpart.
Question 2: The sublist will (on an ArrayList) still need to recreate the array at some point. If you don't want that, you could use a LinkedList. But as a general rule the ArrayList will still perform better on the whole: since the underlying array only has to be recreated when the maximum capacity is exceeded, it usually doesn't matter a lot.
You could also shift the elements in place, moving each element to the next slot in the array. That way you would have to move all elements whenever a new one is added, but you'd never need to recreate the array, so you avoid the extra heap allocation, which is usually the biggest impact on performance.

Why are ArrayList created with empty elements array but HashSet with null table?

Maybe a bit of a philosophical question.
Looking at Java's ArrayList implementation I noticed that when creating a new instance, the internal elementData array (which holds the items) is created as a new empty array:
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
However, a HashSet (which is backed by a HashMap) is created with its table and entrySet fields simply left null:
transient Node<K,V>[] table;
transient Set<Map.Entry<K,V>> entrySet;

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
This got me thinking so I went and looked up C#'s List and HashSet:
https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,61f6a8d9f0c40f6e
https://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,2d265edc718b158b
List:
static readonly T[] _emptyArray = new T[0];

public List() {
    _items = _emptyArray;
}
HashSet:
private int[] m_buckets;

public HashSet()
    : this(EqualityComparer<T>.Default) { }

public HashSet(IEqualityComparer<T> comparer) {
    if (comparer == null) {
        comparer = EqualityComparer<T>.Default;
    }
    this.m_comparer = comparer;
    m_lastIndex = 0;
    m_count = 0;
    m_freeList = -1;
    m_version = 0;
}
So, is there a good reason why both languages picked empty for list and null for set/map?
They both used the "single instance" for the empty array trick, which is nice, but why not just have a null array?
Answering from a C# perspective.
For an empty ArrayList, you'll find that all the logic (get, add, grow, ...) works "as-is" if you have an empty array as backing store. No need for additional code to handle the uninitialized case, this makes the whole implementation neater. And since the empty array is cached, this does not result in an additional heap allocation, so you get the cleaner code at no extra cost.
For HashSet this is not possible, as accessing a bucket is done through the formula hashCode % m_buckets.Length. Trying to compute % 0 is considered as a division by 0, and therefore invalid. This means you need to handle specifically the "not initialized" case, so you gain nothing from pre-assigning the field with an empty array.
Initializing elementData to an empty array in ArrayList avoids a null check in the grow(int minCapacity) method, which calls:
elementData = Arrays.copyOf(elementData, newCapacity);
to increase the capacity of the backing array. When that method is first called, that statement will "copy" the empty array to the start of the new array (actually it copies nothing).
In HashMap a similar strategy wouldn't be useful: when you resize the array of buckets, you don't copy the original array to the start of the new one; you have to go over all the entries and find the new bucket of each entry. Therefore initializing the buckets array to an empty array instead of keeping it null would merely require you to check whether the array's length == 0 instead of checking whether it's null. Replacing one condition with another gains nothing.
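Both arguments can be seen in a few lines. A sketch (it uses the C#-style modulo indexing from the answer above; Java's HashMap actually indexes with hash & (n - 1), but a zero-length table is unusable either way):

```java
import java.util.Arrays;

public class EmptyVsNull {
    public static void main(String[] args) {
        // Growing an ArrayList-style backing store from a shared empty
        // array needs no special case: copyOf simply copies zero elements.
        Object[] elementData = {};
        elementData = Arrays.copyOf(elementData, 10);
        System.out.println(elementData.length); // 10

        // A zero-length bucket array cannot be indexed the same way:
        // modulo by zero throws, so the "uninitialized" case has to be
        // handled with an explicit check anyway.
        int[] buckets = new int[0];
        try {
            int idx = Math.abs("key".hashCode()) % buckets.length;
            System.out.println(idx);
        } catch (ArithmeticException e) {
            System.out.println("must check for an empty table first: " + e.getMessage());
        }
    }
}
```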

Handling duplicates when shuffling an array

I wish to shuffle an array with duplicate elements. Using the shuffle method from Collections, how does it handle duplicates? I don't want two duplicates swapping each other. Thanks
All that is guaranteed about the behavior of the method is in the Javadoc.
The current implementation chooses swaps randomly, without regard to the content of the list. The general contract states that the ideal is for all permutations to be equally likely, so I would not anticipate it ever going to an implementation that requires an element to move, much less requires that it be swapped with an element of differing value. In general that's what shuffling is - random order, which can just as well (some small percentage of the time) mean "the same order that came in (or an equivalent order)". And the shuffle() method addresses the general case.
If you need every element to be swapped with an element of differing value, you can of course write your own method to do that. Beware that a naive implementation could fall into an infinite loop if there are too many duplicates relative to the size of the collection.
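Such a hand-written method could be sketched like this (all names are mine): reshuffle until no position holds a value equal to the one it started with, with a retry cap instead of risking the infinite loop mentioned above (a list whose elements are all equal can never satisfy the condition).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Random;

public class NoFixedPointShuffle {
    // Retry-based sketch: shuffle, then verify that no index holds a value
    // equal to the original value at that index; give up after maxTries.
    public static <T> boolean shuffleAvoidingFixedPoints(List<T> list, Random rnd, int maxTries) {
        List<T> original = new ArrayList<>(list);
        for (int t = 0; t < maxTries; t++) {
            Collections.shuffle(list, rnd);
            boolean ok = true;
            for (int i = 0; i < list.size(); i++) {
                if (Objects.equals(list.get(i), original.get(i))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false; // list is left in its last shuffled state
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(List.of(1, 1, 2, 3));
        boolean done = shuffleAvoidingFixedPoints(values, new Random(), 1000);
        System.out.println(done + " " + values);
    }
}
```

Note that rejecting some outcomes means the result is no longer uniform over all permutations; this trades the ideal contract of shuffle() for the "no value stays in place" property.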
This is the implementation of Collections#shuffle(List, Random):
@SuppressWarnings({"rawtypes", "unchecked"})
public static void shuffle(List<?> list, Random rnd) {
    int size = list.size();
    if (size < SHUFFLE_THRESHOLD || list instanceof RandomAccess) {
        for (int i=size; i>1; i--)
            swap(list, i-1, rnd.nextInt(i));
    } else {
        Object arr[] = list.toArray();

        // Shuffle array
        for (int i=size; i>1; i--)
            swap(arr, i-1, rnd.nextInt(i));

        // Dump array back into list
        // instead of using a raw type here, it's possible to capture
        // the wildcard but it will require a call to a supplementary
        // private method
        ListIterator it = list.listIterator();
        for (int i=0; i<arr.length; i++) {
            it.next();
            it.set(arr[i]);
        }
    }
}
This is the overloaded variant of Collections#shuffle(List).
The difference is that you can pass your own Random object if you want to seed it yourself.
As you can see, it does not look at the values in the slots at all. You could write your own variant of this method that includes a check for duplicates.
On a side note: try checking the JavaDocs for these kinds of questions. If you are unsure how a method works, just google the class and method name, or use the local Java source code on your computer.

ArrayList.remove gives different result when called as Collection.remove

This code:
Collection<String> col = new ArrayList<String>();
col.add("a");
col.add("b");
col.add("c");

for (String s : col) {
    if (s.equals("b"))
        col.remove(1);
    System.out.print(s);
}
prints: abc
Meanwhile this one:
ArrayList<String> col = new ArrayList<String>();
col.add("a");
col.add("b");
col.add("c");

for (String s : col) {
    if (s.equals("b"))
        col.remove(1);
    System.out.print(s);
}
prints: ab
However, I expected both snippets to print the same result...
What's the problem?
Collection has only boolean remove(Object o) method, which removes the passed object if found.
ArrayList also has public E remove(int index), which can remove an element by its index.
Your first snippet calls boolean remove(Object o), which doesn't remove anything, since your ArrayList doesn't contain the Integer 1. Your second snippet calls public E remove(int index) and removes the element whose index is 1 (i.e. it removes "b").
The different behavior results from the fact that method overload resolution occurs at compile time and depends on the compile time type of the variable for which you are calling the method. When the type of col is Collection, only remove methods of the Collection interface (and methods inherited by that interface) are considered for overloading resolution.
If you replace col.remove(1) with col.remove("b"), both snippets would behave the same.
As Tamoghna Chowdhury commented, boolean remove(Object o) can accept a primitive argument - int in your case - due to auto-boxing of the int to an Integer instance. For the second snippet, the reason public E remove(int index) is chosen over boolean remove(Object o) is that the method overloading resolution process first attempts to find a matching method without doing auto-boxing/unboxing conversions, so it only considers public E remove(int index).
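The overload resolution can be demonstrated side by side on the same underlying list (a minimal sketch; the class and variable names are mine):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class RemoveOverload {
    public static void main(String[] args) {
        List<String> asList = new ArrayList<>(List.of("a", "b", "c"));
        Collection<String> asCollection = asList; // same object, different compile-time type

        // Only remove(Object) is visible on Collection, so 1 is auto-boxed
        // to Integer.valueOf(1); the list holds no such element.
        System.out.println(asCollection.remove(1)); // false

        // On List, remove(int index) matches without boxing and wins,
        // removing the element at index 1.
        System.out.println(asList.remove(1)); // b
        System.out.println(asList); // [a, c]
    }
}
```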
To safely remove from a Collection while iterating over it, you should use an Iterator.
ArrayList<String> col = new ArrayList<String>();
col.add("a");
col.add("b");
col.add("c");

Iterator<String> i = col.iterator();
while (i.hasNext()) {
    String s = i.next(); // must be called before you can call remove
    if (s.equals("b"))
        i.remove();
    System.out.print(s);
}
As for why removal through the Collection reference did not work for you while the ArrayList version did:
The java.util.ArrayList.remove(int index) method removes the element at the specified position in this list. Shifts any subsequent elements to the left (subtracts one from their indices). Hence, this one worked for you.
The java.util.Collection.remove(Object o) method removes a single instance of the specified element from this collection, if it is present (it is an optional operation). More formally, removes an element e such that (o==null ? e==null : o.equals(e)), if this collection contains one or more such elements. Returns true if this collection contained the specified element (or equivalently, if this collection changed as a result of the call).
Hope this helps.
Both snippets are broken in different ways!
Case 1 (with Collection<String> col):
Since a Collection is unindexed, the only remove method its interface exposes is Collection.remove(Object o), which removes the specified equal object. Doing col.remove(1); first calls Integer.valueOf(1) to get an Integer object, then asks the list to remove that object. Since the list does not contain any such Integer object, nothing is removed. Iteration continues normally through the list and abc is printed out.
Case 2 (with ArrayList<String> col):
When col's compile-time type is ArrayList, calling col.remove(1); instead invokes the method ArrayList.remove(int index) to remove the element at the specified position, thus removing b.
Now, why isn't c printed out? To loop over a collection with the for (X : Y) syntax, the compiler behind the scenes asks the collection for an Iterator object. For the Iterator returned by an ArrayList (and most collections) it is not safe to perform structural modifications to the list during iteration – unless you modify it through the methods of the Iterator itself – because the Iterator becomes confused and loses track of which element to return next. That can result in elements being iterated multiple times, elements being skipped, or other errors. That's what happens here: element c is present in the list but never printed, because you confused the Iterator.
When an Iterator can detect this problem has happened it will warn you by throwing a ConcurrentModificationException. However, the check that an Iterator does for the problem is optimized for speed, not 100% correctness, and it doesn't always detect the problem. In your code if you change s.equals("b") to s.equals("a") or s.equals("c"), it does throw the exception (although this may be dependent on the particular Java version). From the ArrayList documentation:
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis.
To remove elements during iteration, you must change the for (X : Y)-style of loop into a manual loop over an explicit Iterator, using its remove method:
for (Iterator<String> it = col.iterator(); it.hasNext();) {
    String s = it.next();
    if (s.equals("b"))
        it.remove();
    System.out.print(s);
}
This is now completely safe. It will iterate all elements exactly once (printing abc), while element b will be removed.
If you want, you can achieve the same effect without an Iterator using an int i-style loop, if you carefully adjust the index after any removals:
for (int i = 0; i < col.size(); i++) {
    String s = col.get(i);
    if (s.equals("b")) {
        col.remove(i);
        i--;
    }
    System.out.print(s);
}

Java Collections.sort() missing ConcurrentModificationException

I stumbled over this odd bug. It seems that Collections.sort() does not modify the sorted list in a way that enables detection of concurrent modification while also iterating over the same list. Example code:
List<Integer> my_list = new ArrayList<Integer>();
my_list.add(2);
my_list.add(1);

for (Integer num : my_list) {
    /*
     * print list
     */
    StringBuilder sb = new StringBuilder();
    for (Integer i : my_list)
        sb.append(i).append(",");
    System.out.println("List: " + sb.toString());

    /*
     * sort list
     */
    System.out.println("CurrentElement: " + num);
    Collections.sort(my_list);
}
outputs
List: 2,1,
CurrentElement: 2
List: 1,2,
CurrentElement: 2
One would expect a ConcurrentModificationException, but it is not being raised and the code works although it shouldn't.
Why would it throw a ConcurrentModificationException when you are not adding or removing elements from your collection while iterating?
Note that a ConcurrentModificationException only occurs when an element is added to or removed from your collection while iterating, i.e. when the collection is structurally modified.
(Structural modifications are those that change the size of this list,
or otherwise perturb it in such a fashion that iterations in progress
may yield incorrect results.)
sort wouldn't structurally modify your Collection, all it does is modify the order.
The code below would throw a ConcurrentModificationException, as it adds an extra element to the collection while iterating:
for (Integer num : my_list) {
    my_list.add(12);
}
If you look at the source of the sort method in the Collections class, it does not throw a ConcurrentModificationException.
This implementation dumps the specified list into an array, sorts the
array, and iterates over the list resetting each element from the
corresponding position in the array. This avoids the n² log(n)
performance that would result from attempting to sort a linked list in
place.
public static <T extends Comparable<? super T>> void sort(List<T> list) {
    Object[] a = list.toArray();
    Arrays.sort(a);
    ListIterator<T> i = list.listIterator();
    for (int j=0; j<a.length; j++) {
        i.next();
        i.set((T)a[j]);
    }
}
Extract from the book java Generics and Collections:
The policy of the iterators for the Java 2 collections is to fail
fast, as described in Section 11.1: every time they access the backing
collection, they check it for structural modification (which, in
general, means that elements have been added or removed from the
collection). If they detect structural modification, they fail
immediately, throwing ConcurrentModificationException rather than
continuing to attempt to iterate over the modified collection with
unpredictable results.
Speaking of functionality, I don't see why it should not throw a ConcurrentModificationException. But according to the documentation, the iterator throws the exception when it notices a structural modification, and structural modification is defined as:
Structural modifications are those that change the size of the list,
or otherwise perturb it in such a fashion that iterations in progress
may yield incorrect results.
I think there is an argument for claiming that sort rearranging the elements causes the iterator to yield wrong results, but I haven't checked how the "correct" results of an iterator are actually defined.
Speaking of implementation, it is easy to see why it does not; see the source of ArrayList and Collections:
ArrayList.modCount changes with the so-called structural modifications.
ListItr makes a copy of modCount when it is created and checks in its methods that the value hasn't changed.
Collections.sort calls ListItr.set, which calls ArrayList.set. This last method does not increment modCount.
So ListItr.next() sees the same modCount and no exception is thrown.
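These modCount mechanics can be observed directly (a small sketch; class name is mine): overwriting elements with set() during iteration is tolerated, while a structural add() trips the fail-fast check. (Note that on Java 8 and later, Collections.sort delegates to ArrayList.sort, which does bump modCount, as the Android snippet below shows, so the question's original code can throw on modern JDKs.)

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ModCountDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(2, 1));

        // set() is not a structural modification: modCount is untouched,
        // so overwriting elements while iterating is fine.
        for (Integer n : list) {
            list.set(0, 99);
        }
        System.out.println(list); // [99, 1]

        // add() increments modCount, so the iterator fails fast.
        boolean threw = false;
        try {
            for (Integer n : list) {
                list.add(12);
            }
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println("add during iteration threw: " + threw); // true
    }
}
```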
For Android, it depends on API versions. From API 26, Collections#sort(List<T>, Comparator<? super T>) actually calls List#sort(Comparator<? super E>). So, if you sort ArrayList, you can get ConcurrentModificationException depending on whether you've modified the list in another thread. Here's the source code from java/util/ArrayList.java that throws the exception:
public void sort(Comparator<? super E> c) {
    final int expectedModCount = modCount;
    Arrays.sort((E[]) elementData, 0, size, c);
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
    modCount++;
}
