Java Iterated HashTable vs ArrayList speed

I am writing a simple 3D software rendering engine. I have a default ArrayList<Object3D> containing the whole scene. Now I want to be able to add, remove and select objects by name, like 3D editors do (because it's MUCH simpler than mouse selection, but still looks good in homework :) ).
So, the first thing I thought of was to have a Hashtable mapping each name to an index into the scene ArrayList. But then I thought I could simply store the scene in the Hashtable directly, and go through it with an iterator to render.
So I want to ask: in a 3D engine, which is preferable for speed? I will loop over the scene many times per second, far more often than I select an object. Is an ArrayList any faster than an iterated Hashtable? Thanks.

First, I suggest you use a HashMap instead of a Hashtable, for the same reason that ArrayList is a better choice than a Vector: less overhead due to useless synchronization.
My guess is that iterating through an ArrayList will be faster than iterating through the Set returned by a Hashtable's (or HashMap's) entrySet() method. But the only way to know is to profile.
Obviously, changes to the display list (other than appending or chopping off the last element) are going to be faster for a HashMap than for an ArrayList.
EDIT
So I followed my own advice and benchmarked. Here's the code I used:
import java.util.*;
public class IterTest {
    static class Thing {
        Thing(String name) { this.name = name; }
        String name;
    }
    static class ArrayIterTest implements Runnable {
        private final ArrayList<Thing> list;
        ArrayIterTest(ArrayList<Thing> list) {
            this.list = list;
        }
        public void run() {
            int i = 0;
            for (Thing thing : list) {
                ++i;
            }
        }
    }
    static class ArraySubscriptTest implements Runnable {
        private final ArrayList<Thing> list;
        ArraySubscriptTest(ArrayList<Thing> list) {
            this.list = list;
        }
        public void run() {
            int i = 0;
            int n = list.size();
            for (int j = 0; j < n; ++j) {
                Thing thing = list.get(j);
                ++i;
            }
        }
    }
    static class MapIterTest implements Runnable {
        private final Map<String, Thing> map;
        MapIterTest(Map<String, Thing> map) {
            this.map = map;
        }
        public void run() {
            int i = 0;
            Set<Map.Entry<String, Thing>> set = map.entrySet();
            for (Map.Entry<String, Thing> entry : set) {
                ++i;
            }
        }
    }
    public static void main(String[] args) {
        final int ITERS = 10000;
        final Thing[] things = new Thing[1000];
        for (int i = 0; i < things.length; ++i) {
            things[i] = new Thing("thing " + i);
        }
        final ArrayList<Thing> arrayList = new ArrayList<Thing>();
        Collections.addAll(arrayList, things);
        final HashMap<String, Thing> hashMap = new HashMap<String, Thing>();
        for (Thing thing : things) {
            hashMap.put(thing.name, thing);
        }
        final ArrayIterTest t1 = new ArrayIterTest(arrayList);
        final ArraySubscriptTest t2 = new ArraySubscriptTest(arrayList);
        final MapIterTest t3 = new MapIterTest(hashMap);
        System.out.println("t1 time: " + time(t1, ITERS));
        System.out.println("t2 time: " + time(t2, ITERS));
        System.out.println("t3 time: " + time(t3, ITERS));
    }
    private static long time(Runnable runnable, int iters) {
        System.gc();
        long start = System.nanoTime();
        while (iters-- > 0) {
            runnable.run();
        }
        return System.nanoTime() - start;
    }
}
And here are the results for a typical run:
t1 time: 41412897
t2 time: 30580187
t3 time: 146536728
Clearly using an ArrayList is a big win (by a factor of 3-4) over a HashMap, at least for my style of iterating through a HashMap. I suspect the reason the array iterator is slower than array subscripting is all the iterator objects that need to be created and then garbage-collected.
For reference, this was done with Java 1.6.0_26 (64-bit JVM) on an Intel 1.6GHz quad-core Windows machine with plenty of free memory.
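Given numbers like these, one option for the original question is to keep the ArrayList for the render loop and add a HashMap purely as a name index. The following is only a minimal sketch of that idea, not code from the benchmark: the Scene class, its method names, and the name field on Object3D are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the engine's object type; only the name matters here.
class Object3D {
    final String name;
    Object3D(String name) { this.name = name; }
}

// Hypothetical scene container: a list for fast iteration, a map for lookup by name.
class Scene {
    private final List<Object3D> objects = new ArrayList<>();
    private final Map<String, Object3D> byName = new HashMap<>();

    // Returns false (and changes nothing) if the name is already taken.
    boolean add(Object3D obj) {
        if (byName.containsKey(obj.name)) {
            return false;
        }
        byName.put(obj.name, obj);
        objects.add(obj);
        return true;
    }

    // Selection by name only touches the map.
    Object3D select(String name) {
        return byName.get(name);
    }

    // Removal is O(n) in the list, which is acceptable if edits are rare.
    boolean remove(String name) {
        Object3D obj = byName.remove(name);
        return obj != null && objects.remove(obj);
    }

    // The render loop walks the list by index, like ArraySubscriptTest above.
    int render() {
        int drawn = 0;
        for (int i = 0, n = objects.size(); i < n; ++i) {
            Object3D obj = objects.get(i); // a real engine would draw obj here
            ++drawn;
        }
        return drawn;
    }
}
```

The trade-off is that every add and remove has to update two structures, in exchange for keeping the per-frame loop on the faster container.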

I'm fairly sure that iterating through the ArrayList will be faster than iterating over the Hashtable. Not sure how significant the difference is, though -- maybe (thumb suck) 2x in the actual internal logic, but that's not much.
But note that, unless you need multithread synchronization, you should use a HashMap rather than a Hashtable. There's some performance to be gained there.

Actually, I looked at the current HashMap implementation (preferred over Hashtable, as everyone points out). Iterating over the values looks like it's simply iterating through an underlying array.
So, speed will probably be comparable to iterating an ArrayList, though some time may be spent skipping over gaps in the HashMap's underlying array.
All said, profiling is king.

A) don't use Hashtable, use HashMap. Hashtable is informally deprecated
B) That depends on the application. Lookup will be faster in the HashMap; iteration will likely be about the same, as both use arrays internally (but the arrays in a HashMap have gaps, so that might give a slight advantage to the ArrayList). Oh, and if you want to maintain a fixed order of iteration, use LinkedHashMap (ordered by insertion) or TreeMap (sorted by the keys' natural ordering).
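To make the ordering point concrete, here is a small sketch. The class name OrderDemo and the sample keys are made up for illustration; the ordering behaviour itself is what the Javadoc of LinkedHashMap and TreeMap specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderDemo {
    // Inserts the same three keys into the given map and returns them in iteration order.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        map.put("sphere", 1);
        map.put("cube", 2);
        map.put("lamp", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order.
        System.out.println(keysInIterationOrder(new LinkedHashMap<>())); // [sphere, cube, lamp]
        // TreeMap iterates in the natural (here alphabetical) order of its keys.
        System.out.println(keysInIterationOrder(new TreeMap<>()));       // [cube, lamp, sphere]
    }
}
```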

As already said, it's better to use HashMap. Regarding iteration, in theory ArrayList should be faster for two reasons. The first is that the data structure is simpler, which gives a shorter access time. The second is that an ArrayList can be iterated by index without creating an Iterator object, which, under intense use, produces less garbage and therefore less GC work.
In practice you may not notice a difference; it depends on how heavily you are going to use it.

Use java.util.HashMap instead of java.util.Hashtable if you don't need retrieval synchronization.

Related

best way to Iterate over a collection and array consecutively

It's a fairly trivial question about coding style; I am just asking in order to make my code more readable.
Suppose I have a Collection such as a LinkedList, and an array, and I need to iterate over both simultaneously.
Currently the best way I know is to get an iterator over the list, define an index variable outside the loop, and increment the index on each pass to access the next element of both the list and the array. Please refer to the example below:
LinkedList<Integer> list = new LinkedList<Integer>();
Integer[] arr = new Integer[25];
// let's suppose both have 25 elements.
// My iteration method will be
int index = 0;
for (Integer val : list) {
    System.out.println(val);
    System.out.println(arr[index++]);
}
So is this the only way, or is there another way to perform this iteration in a more readable manner, where I don't have to keep a separate index variable?
I know the array might have fewer or more elements than the collection, but I am only talking about the cases where they have equal sizes and we need to iterate over both of them.
PS: anybody can write code that a computer can understand; the actual challenge is to write code that humans can understand easily.

What you have is essentially fine: it's simple, and simple can be sufficient to make code readable.
The only thing I would caution about is the side effect of index++ inside arr[index++]: if, say, you want to use the same value multiple times in the loop body, you couldn't simply copy+paste.
Consider pulling out a variable as the first thing in the loop to store the "current" array element (which is essentially what the enhanced for loop does for the list element).
for (Integer val : list) {
Integer fromArr = arr[index++];
// ...
}
Just to point out an alternative without having a separate variable for the index, you can use ListIterator, which provides you with the index of the element.
// Assuming list and arr have the same number of elements.
for (ListIterator<Integer> it = list.listIterator();
it.hasNext();) {
// The ordering of these statements is important, because next() changes nextIndex().
Integer fromArr = arr[it.nextIndex()];
Integer val = it.next();
// ...
}
ListIterator is not an especially widely-used class, though; its use may in and of itself be confusing.
One of the downsides of the ListIterator approach is that you have to use it correctly: you shouldn't touch the iterator inside the loop after getting the values, you have to put the statements in the right order, etc.
Another approach would be to create a library method analogous to Python's enumerate:
static <T> Iterable<Map.Entry<Integer, T>> enumerate(Iterable<? extends T> iterable) {
    // Iterable has a single abstract method, so a lambda can supply iterator().
    return () -> new Iterator<Map.Entry<Integer, T>>() {
        int index = 0;
        Iterator<? extends T> delegate = iterable.iterator();
        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public Map.Entry<Integer, T> next() {
            return new AbstractMap.SimpleEntry<>(index++, delegate.next());
        }
    };
}
This returns an iterable of map entries, where the key is the index and the value is the corresponding value.
You could then use this in an enhanced for loop:
for (Map.Entry<Integer, Integer> entry : enumerate(list)) {
Integer fromList = entry.getValue();
Integer fromArr = arr[entry.getKey()];
}

One option is to have 2 iterators, but I don't think it is any clearer:
for (Iterator<Integer> i1 = list.iterator(), i2 = Arrays.asList(arr).iterator();
i1.hasNext() && i2.hasNext();) {
System.out.println(i1.next());
System.out.println(i2.next());
}
But it is more robust in that it finishes at the shorter of the 2 collections.

I tried to simplify this and to handle collections that need not be the same size. I believe this works even if the sizes differ, and a single loop suffices. Code snippet below:
LinkedList<Integer> list = new LinkedList<Integer>();
Integer[] arr = new Integer[25];
int maxLength = Math.max(list.size(), arr.length);
// Loop up to the length of the longer of the two
for (int i = 0; i < maxLength; i++) {
    if (list.size() > i)
        System.out.println(list.get(i)); // note: get(i) is O(n) on a LinkedList
    if (arr.length > i)
        System.out.println(arr[i]);
}
Hope this helps! Thanks

ConcurrentModificationException only in Java 1.8.0_45

I've got two questions about this code:
import java.util.*;
public class TestClass {
private static List<String> list;
public static void main(String[] argv) {
list = generateStringList(new Random(), "qwertyuioasdfghjklzxcvbnmPOIUYTREWQLKJHGFDSAMNBVCXZ1232456789", 50, 1000);
// Collections.sort(list, new Comparator<String>() {
// public int compare(String f1, String f2) {
// return -f1.compareTo(f2);
// }
// });
for (int i = 0; i < 500; i++) {
new MyThread(i).start();
}
}
private static class MyThread extends Thread {
int id;
MyThread(int id) { this.id = id; }
public void run() {
Collections.sort(list, new Comparator<String>() {
public int compare(String f1, String f2) {
return -f1.compareTo(f2);
}
});
for (Iterator it = list.iterator(); it.hasNext();) {
String s = (String) it.next();
try {
Thread.sleep(10 + (int)(Math.random()*100));
}catch (Exception e) { e.printStackTrace(); }
System.out.println(id+" -> "+s);
}
}
}
public static List<String> generateStringList(Random rng, String characters, int length, int size)
{
List<String> list = new ArrayList<String>();
for (int j = 0; j < size; j++) {
char[] text = new char[length];
for (int i = 0; i < length; i++)
{
text[i] = characters.charAt(rng.nextInt(characters.length()));
}
list.add(new String(text));
}
return list;
}
}
Running this code on Java 1.8.0_45, I get a java.util.ConcurrentModificationException.
1) Why do I get the exception even if I uncomment the sort before the thread start?
2) Why do I only get the exception on Java 1.8.0_45? On 1.6.0_45, 1.7.0_79 and 1.8.0_5 it works fine.

@nbokmans already nailed the general reason why you get that exception. However, it's true that this seems to be version dependent. I'll fill in why you get it in Java 1.8.0_45 but not in 1.6.0_45, 1.7.0_79 or 1.8.0_5.
This is due to the fact that Collections.sort() was changed in Java 1.8.0_20. There's an in-depth article about it here. In the new version, sort, according to the article, looks like this:
public void sort(Comparator<? super E> c) {
final int expectedModCount = modCount;
Arrays.sort((E[]) elementData, 0, size, c);
if (modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
modCount++;
}
Like the article explains:
Contrary to the old Collections.sort, this implementation modifies the
modCount of the collection (line 7 above) once the list has been
sorted, even if the structure itself didn’t really change (still the
same number of elements).
So it will make an internal change even if the collection is already sorted, whereas before that change it didn't. That's why you're getting an exception now.
The actual fix is not to sort a collection from multiple threads at the same time. You shouldn't do that.

A ConcurrentModificationException is thrown by methods that have detected concurrent (i.e. in a separate thread) modification of an object when such modification is not permissible.
The reason you're getting this exception is because you are modifying (sorting) the collection in a separate thread and iterating it.
I quote from the ConcurrentModificationException javadoc:
For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it. In general, the results of the iteration are undefined under these circumstances.
Source
In your code, you are starting 500 threads that each sort and iterate over the list.
Try sorting the list before you start your threads, and remove the call to Collections#sort from MyThread's #run().
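A minimal sketch of that suggestion follows. The class and method names are made up, Collections.reverseOrder() stands in for the question's descending comparator, and the unmodifiable wrapper is an extra safeguard that is not part of the original advice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortOnceDemo {
    // Sorts once on the calling thread, then lets every worker only read the list.
    // Returns the total number of elements visited across all workers.
    static int runReaders(List<String> list, int threadCount) {
        Collections.sort(list, Collections.reverseOrder()); // descending, as in the question
        final List<String> readOnly = Collections.unmodifiableList(list);
        final int[] visited = new int[threadCount];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            Thread thread = new Thread(() -> {
                for (String s : readOnly) {
                    visited[id]++; // each thread writes only to its own slot
                }
            });
            threads.add(thread);
            thread.start();
        }
        int total = 0;
        try {
            for (Thread thread : threads) {
                thread.join(); // join makes each worker's count visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        }
        for (int count : visited) {
            total += count;
        }
        return total;
    }
}
```

Because no thread modifies the list after the workers start, none of the iterators can observe a changed modification count.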

With Java 8, the Collections::sort method was reimplemented to delegate to the List::sort method. This way, a list can implement a more efficient sort algorithm if this is possible for the given implementation. For example, an ArrayList can use its random access property to implement a more efficient sorting algorithm than a LinkedList without random access.
The current implementation of ArrayList::sort explicitly checks for modifications, as the implementation is defined within the class and is capable of accessing internal properties.
Before Java 8, the Collections::sort method had to implement the actual sorting itself and could not delegate. Of course, the implementation could not access any internal properties of the specific list. The more generic sorting was implemented as follows:
public static <T> void sort(List<T> list, Comparator<? super T> c) {
Object[] a = list.toArray();
Arrays.sort(a, (Comparator)c);
ListIterator i = list.listIterator();
for (int j=0; j<a.length; j++) {
i.next();
i.set(a[j]);
}
}
The implementation first extracts a copy of elements and delegates the sorting to the implementation of Arrays::sort. This can not cause the observed exception as the sorting is conducted on a non-shared copy of elements. Later, the elements are updated element by element according to the sorted array by using a ListIterator.
For an ArrayList, the ArrayList and its iterator keep track of the number of structural modifications, i.e. modifications that change the size of the list. If those numbers diverge for the iterator and the list, the iterator can know that the list was modified outside of its own iteration. It is however not capable of discovering that the elements of a list were altered as it happens for the Collections::sort implementation.
The contract of an ArrayList does not, however, permit concurrent modifications. Although the sorting did not fail before Java 8, applying it concurrently could lead to incorrect results. With Java 8, this misuse is detected by the implementation for the first time.
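The effect can be reproduced without any threads. The sketch below assumes a JDK whose ArrayList.sort increments modCount in the way the listing quoted earlier on this page shows (Java 8u20 and later, according to that answer); on older JDKs the method would be expected to return false instead. The class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class SortDuringIterationDemo {
    // Returns true if sorting between two next() calls makes the iterator fail.
    static boolean sortInvalidatesIterator() {
        List<String> list = new ArrayList<>(Arrays.asList("b", "a", "c"));
        Iterator<String> it = list.iterator();
        it.next();
        list.sort(null); // a null comparator means natural ordering
        try {
            it.next(); // the iterator compares its expected modCount with the list's
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sortInvalidatesIterator());
    }
}
```

This is the same check that trips in the multi-threaded program, just with the sort and the iteration interleaved deterministically.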

You're getting this exception because you have separate threads modifying and iterating the list at the same time.
The commented out sort is not causing the problem. The CME is caused by the sort and the iteration inside the thread. Since you have multiple threads sorting and iterating, you're getting a CME. This is not dependent on the Java version.
It looks like your threads don't need to modify the list, so you can perform the sort once before the loop that creates the threads, then remove it from the thread.

Fastest way to remove a Collection of Longs from another in Java

I have two collections of Long type, both of size 20-30 million. What is the quickest way to remove from one those elements that also occur in the second? The less heap space used, the better, as there are other things going on in parallel.
I know LinkedList is better than ArrayList for removals using an Iterator, but I'm just not sure whether I need to iterate over each element. I want to ask about any better approaches; both collections are sorted.
Edit: I previously stated my collection sizes as 2-3 million, I realized it is 20-30 million.
There will be lots of overlaps. The exact type of Collections is open to debate as well.

With counts in the range of millions, solutions with O(n^2) complexity should be out. You have two basic solutions here:
1) Sort the second collection, and use binary search for an O((N+M)*logM) solution, or
2) Put elements from the second collection into a hash container, for an O(N+M) solution
Above, N is the number of elements in the first collection, and M is the number of elements in the second collection.
Set<Long> toRemove = new HashSet<Long>(collection2);
Iterator<Long> iter = collection1.iterator();
while (iter.hasNext()) {
if (toRemove.contains(iter.next())) {
iter.remove();
}
}
Note that if collection1 is an ArrayList, this will be very slow, because each iter.remove() shifts all later elements. If you must keep it an ArrayList, you can do it like this:
int rd = 0, wr = 0;
// Copy the elements you are keeping into a contiguous range
while (rd != arrayList1.size()) {
    Long last = arrayList1.get(rd++);
    if (!toRemove.contains(last)) {
        arrayList1.set(wr++, last);
    }
}
// Remove "tail" elements, from the end so nothing needs to shift
while (rd > wr) {
    arrayList1.remove(--rd);
}
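The first option listed above (binary search against the sorted second collection) could be sketched roughly as follows. This is an illustration under assumptions rather than a tuned implementation: the class and method names are made up, and the second collection is assumed to be an already sorted ArrayList so that Collections.binarySearch gets random access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BinarySearchRemoval {
    // Removes from 'first' every value that occurs in 'sortedSecond', in place.
    // 'sortedSecond' must be sorted ascending.
    static void removeCommon(ArrayList<Long> first, ArrayList<Long> sortedSecond) {
        int wr = 0;
        for (int rd = 0; rd < first.size(); rd++) {
            Long value = first.get(rd);
            // binarySearch returns a negative number when the key is absent
            if (Collections.binarySearch(sortedSecond, value) < 0) {
                first.set(wr++, value); // keep: compact towards the front
            }
        }
        // Drop the leftover tail in one operation
        first.subList(wr, first.size()).clear();
    }

    public static void main(String[] args) {
        ArrayList<Long> first = new ArrayList<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L));
        ArrayList<Long> second = new ArrayList<>(Arrays.asList(2L, 4L, 6L, 8L));
        removeCommon(first, second);
        System.out.println(first); // [1, 3, 5]
    }
}
```

Compared with the HashSet version, this avoids building a second container at the cost of a log M factor per lookup.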

Without growing the heap:
Collection<Long> a = new HashSet<Long>();
// fill a
List<Long> b = new ArrayList<Long>();
// fill b
for (int i = 0; i < b.size(); i++) {
    a.remove(b.get(i));
}
b.size() and b.get(int i) run in constant time for an ArrayList, according to Oracle's Javadoc.
Also, a.remove(Object o) runs in expected constant time for a HashSet.

First port of call would be the Collection.removeAll method. This uses no extra heap space, and its time complexity depends on the performance of the contains method on your second collection. If your second collection is a TreeSet, then a.removeAll(b) takes O(n . log(m)) time (where n is the size of a and m is the size of b); if b is a HashSet, it takes O(n) time; if b is a sorted ArrayList, it's O(nm), but you can create a new wrapper Collection that uses a binary search to reduce it to O(n . log(m)) for negligible constant memory cost:
private static class SortedList<T extends Comparable<? super T>> extends com.google.common.collect.ForwardingList<T>
{
    private final List<T> delegate;
    public SortedList(ArrayList<T> delegate)
    {
        this.delegate = delegate;
    }
    @Override
    protected List<T> delegate()
    {
        return delegate;
    }
    @Override
    @SuppressWarnings("unchecked")
    public boolean contains(Object object)
    {
        // relies on the delegate being sorted
        return Collections.binarySearch(delegate, (T) object) >= 0;
    }
}
static <E extends Comparable<? super E>> void removeAll(Collection<E> a, ArrayList<E> b)
{
    //assumes that b is sorted
    a.removeAll(new SortedList<E>(b));
}

You should take a look at Apache Commons Collections.
I tested it with LinkedLists containing ~3M Longs, and it gives pretty good results:
Random r = new Random();
List<Long> list1 = new LinkedList<Long>();
for (int i = 0; i < 3000000; i++) {
    list1.add(r.nextLong());
}
List<Long> list2 = new LinkedList<Long>();
for (int i = 0; i < 2000000; i++) {
    list2.add(r.nextLong());
}
Collections.sort(list1);
Collections.sort(list2);
long time = System.currentTimeMillis();
List<Long> list3 = ListUtils.subtract(list2, list1);
System.out.println("ListUtils.subtract = " + (System.currentTimeMillis() - time));
I can't guarantee this is the best solution, but it is an easy one.
I get an execution time of:
1247 ms
Drawback: it creates a new List.

Filling Lists with Default Values from final static Lists

The title is probably not the best, I apologize for that.
I have several final static Lists I am using to define defaults for database values. The default list of values should never change, as such when populating them, I use Collections.nCopies(int,T) to obtain an immutable List. These Lists are then used to populate lists in another class with defaults. The values in these Lists are expected to change.
Pseudocode for the class of defaults:
public final class FooDefaults {
public final static List<Integer> LIST_ONE;
public final static List<String> LIST_TWO;
//This map allows easier access to "column" values.
public final static List<Map<String,String>> LIST_THREE;
static {
LIST_ONE = Collections.nCopies(7, 5);
LIST_TWO = Collections.nCopies(10, "boo");
Map<String, String> temp = new java.util.LinkedHashMap<>();
for(int i=0;i<15;i++) {
temp.put(("Param"+i),"foo");
}
LIST_THREE = Collections.nCopies(10, temp);
}
}
Pseudocode for the class of editable values:
public class Foo {
    //Keep the reference from changing.
    //Prevents an accidental new.
    private final List<Integer> listOne;
    private final List<String> listTwo;
    private final List<Map<String,String>> listThree;
    public Foo() {
        listOne = new java.util.ArrayList<>(FooDefaults.LIST_ONE);
        listTwo = new java.util.ArrayList<>(FooDefaults.LIST_TWO);
        listThree = new java.util.ArrayList<>(FooDefaults.LIST_THREE);
    }
}
My concern is that as I have performed a shallow copy on these lists, changes in the lists in Foo, will be visible in the Lists in FooDefaults.
This post: https://stackoverflow.com/a/1685158/1391956 suggests that due to Strings and Integers being immutable, I need not worry about accidentally overwriting the values in FooDefaults.LIST_ONE and FooDefaults.LIST_TWO.
Thus, my primary concern are the values contained within the maps in FooDefaults.LIST_THREE. If I change the values in the maps in Foo's listThree, will the change be visible in FooDefaults?
If so, what would be the most efficient way to handle this? Class Foo is likely to be instantiated over a thousand times and added to a List in another class, thus speed will potentially be an issue.
I originally created the final static lists in FooDefaults in the interest of speed, as it is my (probably incorrect) assumption that creating those Lists in FooDefaults and simply copying the data would be faster than creating them every time Foo is instantiated.
EDIT: If I must perform a Deep Copy I plan on using something similar to:
public static final List<Map<String, String>> getListThreeCopy() {
Map<String,String> temp = new java.util.LinkedHashMap<>();
for(Map.Entry<String, String> entry: LIST_THREE.get(0).entrySet()) {
temp.put(entry.getKey(),entry.getValue());
}
List<Map<String,String>> rtnList = new java.util.ArrayList<>();
for(int i=0;i<LIST_THREE.size();i++) {
rtnList.add(temp);
}
return rtnList;
}
Would there be a faster way?

In the end, I used the code mentioned at the end of my post in a loop to perform a Deep Copy 1,500 times and timed it. I did the same with the instantiation code.
As I somewhat expected, recreating the List from scratch is much faster than a deep copy.
16 milliseconds with the Deep Copy versus 0 milliseconds with instantiation. I timed it using System.currentTimeMillis().
So long as I only create it in a static function, I have little reason to worry about errors.
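On the original worry about whether changes made through Foo would show up in FooDefaults: with a shallow copy they would, because the new list still points at the same map objects. The sketch below illustrates this with hypothetical names (ShallowCopyDemo, DEFAULTS) and a single parameter per map. Note also that the getListThreeCopy() code in the question adds one copied map to every slot, so its entries still share state with each other; the deepCopy() here creates a fresh map per element instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShallowCopyDemo {
    // Stand-in for FooDefaults.LIST_THREE: ten references to one shared map.
    static final List<Map<String, String>> DEFAULTS = Collections.nCopies(10, newDefaultMap());

    static Map<String, String> newDefaultMap() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("Param0", "foo");
        return map;
    }

    // Shallow copy: a new list, but the same map objects as DEFAULTS.
    static List<Map<String, String>> shallowCopy() {
        return new ArrayList<>(DEFAULTS);
    }

    // Deep copy: a new list and a fresh map for every element.
    static List<Map<String, String>> deepCopy() {
        List<Map<String, String>> copy = new ArrayList<>(DEFAULTS.size());
        for (Map<String, String> map : DEFAULTS) {
            copy.add(new LinkedHashMap<>(map));
        }
        return copy;
    }

    public static void main(String[] args) {
        deepCopy().get(0).put("Param0", "changed");
        System.out.println(DEFAULTS.get(0).get("Param0")); // foo (defaults untouched)

        shallowCopy().get(0).put("Param0", "changed");
        System.out.println(DEFAULTS.get(5).get("Param0")); // changed (shared map was modified)
    }
}
```

Wrapping the default map in Collections.unmodifiableMap would turn such accidental writes into an exception instead of silent corruption.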

methods in foreach and for loops in java

My question is about optimization in Java with the Android compiler. Will map.values() in the following be called on every iteration, or will the Android compiler optimize it out?
LinkedHashMap<String, Object> map;
for (Object object : map.values())
{
    //do something with object
}
Likewise, here is another example. Will aList.size() be called on every iteration?
List<Object> aList;
for (int i = 0; i < aList.size(); i++)
{
    Object object = aList.get(i);
    //do something with i
}
And after all this, does it really matter if the methods are called on every iteration? Do Map.values() and List.size() do much of anything?

In the first example, map.values() will be evaluated once. According to Section 14.14.2 of the Java Language Specification, it is equivalent to:
for (Iterator<Object> i = map.values().iterator(); i.hasNext(); ) {
Object object = i.next();
// do something with object
}
In the second, aList.size() will be called every time the test is evaluated. For readability, it would be better to code it as:
for (Object object : aList) {
// do something with object
}
However, per the Android docs, this will be slower. Assuming that you aren't changing the list size inside the loop, the fastest way would be to pull out the list size ahead of the loop:
final int size = aList.size();
for (int i = 0; i < size; i++)
{
    Object object = aList.get(i);
    //do something with i
}
This will be substantially faster (the Android docs mentioned above say by a factor of 3) if aList happens to be an ArrayList, but is likely to be slower (possibly by a lot) for a LinkedList. It all depends on exactly what kind of List implementation class aList is.
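The difference in how often each method is called can be observed directly. This is a small sketch with made-up counting subclasses (CountingMap, CountingList); it only shows call counts as defined by the language semantics and says nothing about what a particular compiler or runtime may optimize afterwards.

```java
import java.util.AbstractList;
import java.util.Collection;
import java.util.LinkedHashMap;

class CallCountDemo {
    // Map that counts how often values() is called.
    static class CountingMap extends LinkedHashMap<String, Object> {
        int valuesCalls = 0;
        @Override public Collection<Object> values() {
            valuesCalls++;
            return super.values();
        }
    }

    // Fixed three-element list that counts how often size() is called.
    static class CountingList extends AbstractList<Object> {
        int sizeCalls = 0;
        @Override public Object get(int index) { return "item " + index; }
        @Override public int size() { sizeCalls++; return 3; }
    }

    static int valuesCallsInForEach() {
        CountingMap map = new CountingMap();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        for (Object object : map.values()) {
            // body intentionally empty
        }
        return map.valuesCalls;
    }

    static int sizeCallsInIndexedLoop() {
        CountingList list = new CountingList();
        for (int i = 0; i < list.size(); i++) {
            list.get(i);
        }
        return list.sizeCalls;
    }

    public static void main(String[] args) {
        System.out.println(valuesCallsInForEach());   // 1: evaluated once, before iterating
        System.out.println(sizeCallsInIndexedLoop()); // 4: once per loop test (3 passes + the final failing test)
    }
}
```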
