My partner and I are implementing a LinkedList data structure. It is complete and works properly with all required methods. We are required to compare the runtime of the addFirst() method of our LinkedList class against the add(0, item) method of Java's ArrayList. The expected complexity of addFirst() for our LinkedList is O(1), and this held true in our test. When timing ArrayList's add(0, item), we expected O(N), but we again measured approximately O(1). This seemed a strange discrepancy, since we are using Java's own ArrayList. We suspect there is an issue in our timing code, and we would be most appreciative if anyone could help us identify the problem. Our Java code for timing both methods is listed below:
import java.util.ArrayList;

public class timingAnalysis {
public static void main(String[] args) {
//timeAddFirst();
timeAddArray();
}
public static void timeAddFirst()
{
long startTime, midTime, endTime;
long timesToLoop = 10000;
int inputSize = 20000;
MyLinkedList<Long> linkedList = new MyLinkedList<Long>();
for (; inputSize <= 1000000; inputSize = inputSize + 20000)
{
// Clear the collection so we can add new random
// values.
linkedList.clear();
// Let some time pass to stabilize the thread.
startTime = System.nanoTime();
while (System.nanoTime() - startTime < 1000000000)
{ }
// Start timing.
startTime = System.nanoTime();
for (long i = 0; i < timesToLoop; i++)
linkedList.addFirst(i);
midTime = System.nanoTime();
// Run an empty loop to capture the cost of running the loop.
for (long i = 0; i < timesToLoop; i++)
{} // empty block
endTime = System.nanoTime();
// Compute the time: subtract the cost of the empty loop from the
// cost of the loop that calls addFirst(), then average over the
// number of runs. The cast avoids truncating integer division.
double averageTime = ((midTime - startTime) - (endTime - midTime)) / (double) timesToLoop;
System.out.println(inputSize + " " + averageTime);
}
}
public static void timeAddArray()
{
long startTime, midTime, endTime;
long timesToLoop = 10000;
int inputSize = 20000;
ArrayList<Long> testList = new ArrayList<Long>();
for (; inputSize <= 1000000; inputSize = inputSize + 20000)
{
// Clear the collection so we can add new random
// values.
testList.clear();
// Let some time pass to stabilize the thread.
startTime = System.nanoTime();
while (System.nanoTime() - startTime < 1000000000)
{ }
// Start timing.
startTime = System.nanoTime();
for (long i = 0; i < timesToLoop; i++)
testList.add(0, i);
midTime = System.nanoTime();
// Run an empty loop to capture the cost of running the loop.
for (long i = 0; i < timesToLoop; i++)
{} // empty block
endTime = System.nanoTime();
// Compute the time: subtract the cost of the empty loop from the
// cost of the loop that calls add(0, i), then average over the
// number of runs. The cast avoids truncating integer division.
double averageTime = ((midTime - startTime) - (endTime - midTime)) / (double) timesToLoop;
System.out.println(inputSize + " " + averageTime);
}
}
}
You want to test different values of inputSize, but the loop you are timing runs timesToLoop times, which is constant, and the list is cleared before each run. Every measurement therefore does the same amount of work, so of course it takes the same time. You should use:
for (long i = 0; i < inputSize; i++)
testList.add(0, i);
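A minimal sketch of that change, assuming the goal is to watch the average cost per insertion grow with the list size. The class name FrontInsertTiming and the helper fillFront are made up for illustration, and the sizes are smaller than in the question so that it finishes quickly:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class and method names, for illustration only.
class FrontInsertTiming {

    // Inserts n values at index 0; every call shifts all existing elements.
    static List<Long> fillFront(int n) {
        List<Long> list = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            list.add(0, i);
        }
        return list;
    }

    public static void main(String[] args) {
        for (int inputSize = 10_000; inputSize <= 50_000; inputSize += 10_000) {
            long start = System.nanoTime();
            List<Long> list = fillFront(inputSize);
            long elapsed = System.nanoTime() - start;
            // Floating-point division keeps the fractional part of the average.
            double averageNanos = (double) elapsed / inputSize;
            System.out.println(inputSize + " " + averageNanos + " (size " + list.size() + ")");
        }
    }
}
```

Because the number of insertions now follows inputSize, the average time per add(0, i) should rise roughly in proportion to the size, although a run without warm-up will still be noisy.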
As far as I know, ArrayList's add operation runs in amortized constant time.
As per the Javadoc:
adding n elements requires O(n) time, which is why adding is amortized constant time. Note that this guarantee describes appending with add(item); add(0, item) has to shift every existing element by one position, so each such call is O(n).
Consider the code snippets below and the time taken to execute them -
public static void main(String[] args) {
Long startTime = System.currentTimeMillis();
long sum = 0L;
for(int i = 0; i< Integer.MAX_VALUE; i++){
sum+=i;
}
Long timeDiff = (System.currentTimeMillis() - startTime) / 1000;
System.out.println("Time Difference : " + timeDiff + "secs");
}
Output -
Time Difference : 0secs
public static void main(String[] args) {
Long startTime = System.currentTimeMillis();
Long sum = 0L;
for(int i = 0; i< Integer.MAX_VALUE; i++){
sum+=i;
}
Long timeDiff = (System.currentTimeMillis() - startTime) / 1000;
System.out.println("Time Difference : " + timeDiff + "secs");
}
Output -
Time Difference : 8secs
public static void main(String[] args) {
Long startTime = System.currentTimeMillis();
Long sum = 0L;
for(Long i = 0L; i< Integer.MAX_VALUE; i++){
sum+=i;
}
Long timeDiff = (System.currentTimeMillis() - startTime) / 1000;
System.out.println("Time Difference : " + timeDiff + "secs");
}
Output -
Time Difference : 16secs
As I understand it, this happens because a new Long object is created every time, but I am not sure how exactly. I tried looking into the bytecode, but it didn't help much.
Help me understand how exactly things are internally happening?
Thanks in advance!
The "++" and "+=" operators are only defined for primitives.
Hence, when you apply them to a Long, an unboxing must take place before the operator is evaluated and then a boxing must take place to store the result.
The boxing probably costs more than the unboxing, since unboxing requires just a method call, while boxing requires object instantiation.
Each boxing involves the creation of a Long instance. Your loop has Integer.MAX_VALUE iterations, so the second loop creates over 2 billion Long objects (one for each sum+=i operation) while the third loop creates over 4 billion Long objects (one for each i++ operation and one for each sum+=i operation). These objects have to be instantiated and later garbage collected. That costs time.
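To make the boxing visible, here is a small sketch that sums the same range with a primitive long and with a boxed Long. The class name BoxingCost and the loop size are assumptions chosen to keep the run short; the timings it prints will vary by machine and JVM:

```java
// Hypothetical class name; a sketch to compare primitive and boxed accumulation.
class BoxingCost {

    // No objects are created: sum lives in a register or on the stack.
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Each iteration unboxes sum, adds i, and boxes the result again.
    static Long sumBoxed(int n) {
        Long sum = 0L;
        for (int i = 0; i < n; i++) {
            // Equivalent to: sum = Long.valueOf(sum.longValue() + i);
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;

        long start = System.nanoTime();
        long primitive = sumPrimitive(n);
        long primitiveNanos = System.nanoTime() - start;

        start = System.nanoTime();
        Long boxed = sumBoxed(n);
        long boxedNanos = System.nanoTime() - start;

        System.out.println("primitive: " + primitive + " in " + primitiveNanos / 1_000_000.0 + " ms");
        System.out.println("boxed:     " + boxed + " in " + boxedNanos / 1_000_000.0 + " ms");
    }
}
```

Both methods return the same sum; only the amount of allocation differs.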
Plausible causes:
- Too many object creations, leading to periodic GC activity.
- Too much boxing and unboxing between wrapper and primitive types.
So I'm still fairly new to programming and I'm just wondering if I'm doing these benchmarks correctly. For queue I'm basically giving it a list filled with integers, and I would time how long it would take for it to find a number on the list. As for the HashMap it's basically the same idea, I would time how long it would take to get a number from the list. Also for both of them I would also time how long it would take for them to remove the contents of the list. Any help on this would be appreciated. Thank you!
// Create a queue, and test its performance
PriorityQueue<Integer> queue = new PriorityQueue <> (list);
System.out.println("Member test time for Priority Queue is " +
getTestTime(queue) + " milliseconds");
System.out.println("Remove element time for Priority Queue is " +
getRemoveTime(queue) + " milliseconds");
// Create a hash map, and test its performance
HashMap<Integer, Integer> newmap = new HashMap<Integer, Integer>();
for (int i = 0; i <N;i++) {
newmap.put(i, i);
}
System.out.println("Member test time for hash map is " +
getTestTime1(newmap) + " milliseconds");
System.out.println("Remove element time for hash map is " +
getRemoveTime1(newmap) + " milliseconds");
}
public static long getTestTime(Collection<Integer> c) {
long startTime = System.currentTimeMillis();
// Test if a number is in the collection
for (int i = 0; i < N; i++)
c.contains((int)(Math.random() * 2 * N));
return System.currentTimeMillis() - startTime;
}
public static long getTestTime1(HashMap<Integer,Integer> newmap) {
long startTime = System.currentTimeMillis();
// Test if a number is in the collection
for (int i = 0; i < N; i++)
newmap.containsKey((int)(Math.random() * 2 * N));
return System.currentTimeMillis() - startTime;
}
public static long getRemoveTime(Collection<Integer> c) {
long startTime = System.currentTimeMillis();
for (int i = 0; i < N; i++)
c.remove(i);
return System.currentTimeMillis() - startTime;
}
public static long getRemoveTime1(HashMap<Integer,Integer> newmap) {
long startTime = System.currentTimeMillis();
for (int i = 0; i < N; i++)
newmap.remove(i);
return System.currentTimeMillis() - startTime;
}
}
I have two suggestions. First, when benchmarking, do the bare minimum of work immediately before and after the code you are evaluating. You don't want your benchmark activity to affect the result.
Second, System.currentTimeMillis() can, depending on the OS, only be accurate within 10 milliseconds. Better to use System.nanoTime(), which is accurate to perhaps 200 nanoseconds. Divide by 1_000_000 to get milliseconds.
Practically,
final long startNanos, endNanos;
startNanos = System.nanoTime();
// your code goes here
endNanos = System.nanoTime();
// display the results of the benchmark
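Wrapped into a reusable helper, the pattern might look like the sketch below. ElapsedTimer and timeNanos are hypothetical names, and the summing loop is only a stand-in for the code being measured:

```java
// Hypothetical helper; a sketch of the nanoTime pattern above.
class ElapsedTimer {

    // Runs the task once and returns the elapsed time in nanoseconds.
    static long timeNanos(Runnable task) {
        final long startNanos = System.nanoTime();
        task.run();
        return System.nanoTime() - startNanos;
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // holds the result so the work is actually used

        long elapsedNanos = timeNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            sink[0] = sum;
        });

        System.out.println("sum = " + sink[0]);
        // Divide by 1_000_000 to convert nanoseconds to milliseconds.
        System.out.println("elapsed: " + elapsedNanos / 1_000_000.0 + " ms");
    }
}
```

Nothing but the task itself runs between the two nanoTime() calls, which keeps the bookkeeping out of the measurement.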
For my work I have done some tests for time chart.
I have come to something that surprised me and need help understanding it.
I used a few data structures as queues and wanted to know how fast deletion is as a function of the number of items. An ArrayList with 10 items, deleting from the front without an initial capacity set, is much slower than the same list with the initial capacity set (to 15). Why? And why are the two the same at 100 items?
Here's the chart:
Data Structures: L - implements List, C - set initial capacity, B - removing from back, Q - implements Queue
EDIT:
Appending relevant piece of code
new Thread(new Runnable() {
@Override
public void run()
{
long time;
final int[] arr = {10, 100, 1000, 10000, 100000, 1000000};
for (int anArr : arr)
{
final List<Word> temp = new ArrayList<>();
while (temp.size() < anArr) temp.add(new Item());
final int top = (int) Math.sqrt(anArr);
final List<Word> first = new ArrayList<>();
final List<Word> second = new ArrayList<>(anArr);
...
first.addAll(temp);
second.addAll(temp);
...
SystemClock.sleep(5000);
time = System.nanoTime();
for (int i = 0; i < top; ++i) first.remove(0);
Log.d("al_l", "rem: " + (System.nanoTime() - time));
time = System.nanoTime();
for (int i = 0; i < top; ++i) second.remove(0);
Log.d("al_lc", "rem: " + (System.nanoTime() - time));
...
}
}
}).start();
Read this article about Avoiding Benchmarking Pitfalls on the JVM. It explains the impact of the HotSpot VM on test results. If you don't account for it, your measurements won't be right, as you have found out with your own test.
If you want to do reliable benchmarking, use JMH.
I too was able to replicate it by creating the code below. However, I noticed that whatever is run first (the set capacity vs non-set capacity) is the one that will take the longest. I assume this is some kind of optimization, maybe the JVM, or some kind of Caching?
import java.util.ArrayList;

public class Test {
public static void main(String[] args) {
measure(-1, 10); // switch with line below
measure(15, 10); // switch with line above
measure(-1, 100);
measure(15, 100);
}
public static void measure(int capacity, long numItems) {
ArrayList<String> arr = new ArrayList<>();
if (capacity >= 1) {
arr.ensureCapacity(capacity);
}
for (int i = 0; i <= numItems; i++) {
arr.add("T");
}
long start = System.nanoTime();
for (int i = 0; i <= numItems; i++) {
arr.remove(0);
}
long end = System.nanoTime();
System.out.println("Capacity: " + capacity + ", " + "Runtime: "
+ (end - start));
}
}
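One way to check whether the "whatever runs first is slowest" effect is just JVM warm-up is to run both variants many times before measuring. The sketch below assumes that is the cause; WarmupSketch, timeFrontRemoval, and the iteration counts are made up for illustration, and a harness such as JMH remains the more reliable route:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; a hand-rolled warm-up sketch, not a replacement for JMH.
class WarmupSketch {

    // Fills a list with numItems strings, then times removing them all from the front.
    // A capacity of zero or less means "use the default constructor".
    static long timeFrontRemoval(int capacity, int numItems) {
        List<String> list = capacity > 0 ? new ArrayList<>(capacity) : new ArrayList<>();
        for (int i = 0; i < numItems; i++) {
            list.add("T");
        }
        long start = System.nanoTime();
        while (!list.isEmpty()) {
            list.remove(0);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm-up: run both variants and discard the results, giving the
        // JIT compiler a chance to compile the hot paths first.
        for (int i = 0; i < 20_000; i++) {
            timeFrontRemoval(-1, 10);
            timeFrontRemoval(15, 10);
        }

        // Measurement: average over many interleaved runs to smooth out noise.
        int runs = 10_000;
        long defaultTotal = 0;
        long presizedTotal = 0;
        for (int i = 0; i < runs; i++) {
            defaultTotal += timeFrontRemoval(-1, 10);
            presizedTotal += timeFrontRemoval(15, 10);
        }
        System.out.println("default capacity:  " + defaultTotal / (double) runs + " ns per run");
        System.out.println("initial capacity:  " + presizedTotal / (double) runs + " ns per run");
    }
}
```

If the two averages end up close after warm-up, the original gap most likely came from measuring cold code rather than from the capacity setting.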
I have written the code below to observe the timing of a loop. Surprisingly, it gives me different values on each run.
public static void main(String[] args) {
for (int attempt = 0; attempt < 10; attempt++) {
runloop();
}
}
public static void runloop() {
long sum = 0L;
long starttime = System.nanoTime();
for (int x = 0; x < 1000000; x++) {
sum += x;
}
long end = System.nanoTime();
System.out.println("Time taken:" + (end - starttime) / 1000L);
}
}
Observation :
Time taken:4062
Time taken:3122
Time taken:2707
Time taken:2445
Time taken:3575
Time taken:2823
Time taken:2228
Time taken:1816
Time taken:1839
Time taken:1811
I am not able to understand why there is such a difference in the timing.
What is the reason ?
It could be anything:
- Other processes running on your computer limiting the time given to Java
- Run of the garbage collector
- Loop initialization time
...
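Since a single run can be disturbed by any of these, a common approach is to take many samples and report a robust summary instead of one number. A rough sketch, with RepeatedTiming and its methods being hypothetical names and 30 samples an arbitrary choice:

```java
import java.util.Arrays;

// Hypothetical class; a sketch of summarising noisy timings.
class RepeatedTiming {

    // Returns the middle element of the sorted samples (upper median for even counts).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Times the same summing loop as in the question, in nanoseconds.
    static long timeLoopNanos() {
        long sum = 0L;
        long start = System.nanoTime();
        for (int x = 0; x < 1_000_000; x++) {
            sum += x;
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop is not obviously dead code.
        if (sum < 0) {
            throw new IllegalStateException("unexpected overflow");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] samples = new long[30];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = timeLoopNanos();
        }
        long min = Arrays.stream(samples).min().getAsLong();
        System.out.println("min:    " + min / 1000L + " us");
        System.out.println("median: " + median(samples) / 1000L + " us");
    }
}
```

The minimum approximates the undisturbed cost, while the median shows what a typical run looks like.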
I had thought that HashMaps were faster for random access of individual values than ArrayLists . . . that is, to say, that HashMap.get(key) should be faster than ArrayList.get(index) simply because the ArrayList has to traverse every element of the collection to reach its value, whereas the HashMap does not. You know, O(1) vs O(n) and all that.
edit: So my understanding of HashMaps was/is inadequate, hence my confusion. The results from this code are as expected. Thanks for the many explanations.
So I decided to test it, on a lark. Here is my code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.ListIterator;
import java.util.NoSuchElementException;
import java.util.Scanner;
public class Testing
{
public static void main(String[] args)
{
ArrayList<SomeClass> alist = new ArrayList<>();
HashMap<Short, SomeClass> hmap = new HashMap<>(4000, (float).75);
ListIterator<SomeClass> alistiterator = alist.listIterator();
short j = 0;
do
{
alistiterator.add(new SomeClass());
j++;
}
while(j < 4000);
for (short i = 0; i < 4000; i++)
{
hmap.put(i, new SomeClass());
}
boolean done = false;
Scanner input = new Scanner(System.in);
String blargh = null;
do
{
System.out.println("\nEnter 1 to run iteration tests.");
System.out.println("Enter w to run warmup (recommended)");
System.out.println("Enter x to terminate program.");
try
{
blargh = input.nextLine();
}
catch (NoSuchElementException e)
{
System.out.println("Uh, what? Try again.\n");
continue;
}
switch (blargh)
{
case "1":
long starttime = 0;
long total = 0;
for (short i = 0; i < 1000; i++)
{
starttime = System.nanoTime();
iteratearraylist(alist);
total += System.nanoTime() - starttime;
}
total = (long)(total * .001);
System.out.println(total + " ns: iterating sequentially"
+ " through ArrayList");
total = 0;
for (short i = 0; i< 1000; i++)
{
starttime = System.nanoTime();
iteratearraylistbyget(alist);
total += System.nanoTime() - starttime;
}
total = (long)(total * .001);
System.out.println(total + " ns: iterating sequentially"
+ " through ArrayList via .get()");
total = 0;
for (short i = 0; i< 1000; i++)
{
starttime = System.nanoTime();
iteratehashmap(hmap);
total += System.nanoTime() - starttime;
}
total = (long)(total * .001);
System.out.println(total + " ns: iterating sequentially"
+ " through HashMap via .next()");
total = 0;
for (short i = 0; i< 1000; i++)
{
starttime = System.nanoTime();
iteratehashmapbykey(hmap);
total += System.nanoTime() - starttime;
}
total = (long)(total * .001);
System.out.println(total + " ns: iterating sequentially"
+ " through HashMap via .get()");
total = 0;
for (short i = 0; i< 1000; i++)
{
starttime = System.nanoTime();
getvaluebyindex(alist);
total += System.nanoTime() - starttime;
}
total = (long)(total * .001);
System.out.println(total + " ns: getting end value"
+ " from ArrayList");
total = 0;
for (short i = 0; i< 1000; i++)
{
starttime = System.nanoTime();
getvaluebykey(hmap);
total += System.nanoTime() - starttime;
}
total = (long)(total * .001);
System.out.println(total + " ns: getting end value"
+ " from HashMap");
break;
case "w":
for (int i = 0; i < 60000; i++)
{
iteratearraylist(alist);
iteratearraylistbyget(alist);
iteratehashmap(hmap);
iteratehashmapbykey(hmap);
getvaluebyindex(alist);
getvaluebykey(hmap);
}
break;
case "x":
done = true;
break;
default:
System.out.println("Invalid entry. Please try again.");
break;
}
}
while (!done);
input.close();
}
public static void iteratearraylist(ArrayList<SomeClass> alist)
{
ListIterator<SomeClass> tempiterator = alist.listIterator();
do
{
tempiterator.next();
}
while (tempiterator.hasNext());
}
public static void iteratearraylistbyget(ArrayList<SomeClass> alist)
{
short i = 0;
do
{
alist.get(i);
i++;
}
while (i < 4000);
}
public static void iteratehashmap(HashMap<Short, SomeClass> hmap)
{
Iterator<HashMap.Entry<Short, SomeClass>> hmapiterator =
hmap.entrySet().iterator();
do
{
hmapiterator.next();
}
while (hmapiterator.hasNext());
}
public static void iteratehashmapbykey(HashMap<Short, SomeClass> hmap)
{
short i = 0;
do
{
hmap.get(i);
i++;
}
while (i < 4000);
}
public static void getvaluebykey(HashMap<Short, SomeClass> hmap)
{
// The cast matters: a bare 3999 autoboxes to Integer, which never equals a Short key.
hmap.get((short) 3999);
}
public static void getvaluebyindex(ArrayList<SomeClass> alist)
{
alist.get(3999);
}
}
and
public class SomeClass
{
int a = 0;
float b = 0;
short c = 0;
public SomeClass()
{
a = (int)(Math.random() * 100000) + 1;
b = (float)(Math.random() * 100000) + 1.0f;
c = (short)((Math.random() * 32000) + 1);
}
}
Interestingly enough, the code seems to warm up in stages. The final stage that I've identified comes after around 120,000 iterations of all methods. Anyway, on my test machine (AMD x2-220, L3 + 1 extra core unlocked, 3.6 ghz, 2.1 ghz NB), the numbers that really jumped out at me were the last two reported. Namely, the time taken to .get() the last entry of the ArrayList (index == 3999) and the time taken to .get() the value associated with a Short key of 3999.
After 2-3 warmup cycles, testing shows that ArrayList.get() takes around 56 ns, while HashMap.get() takes around 68 ns. That is . . . not what I expected. Is my HashMap all eaten up with collisions? All the key entries are supposed to autobox to Shorts, which are supposed to report their stored short value in response to .hashCode(), so all the hash codes should be unique. I think?
Even without warmups, the ArrayList.get() is still faster. That is contrary to everything I've seen elsewhere, such as this question. Of course, I've also read that traversing an ArrayList with a ListIterator is faster than just using .get() in a loop, and obviously, that is also not the case . . .
Hashmaps aren't faster at retrieval of something at a known index. If you are storing things in a known order, the list will win.
But say instead of your example of inserting everything into the list 1-4000, you did it in a total random order. Now to retrieve the correct item from a list, you have to check each item one by one looking for the right item. But to retrieve it from the hashmap, all you need to know is the key you would have given it when you inserted it.
So really, you should be comparing Hashmap.get(i) to
for (Integer i : integerList)
if (i.equals(value)) // equals(), not ==, which compares references for boxed Integers
break; // found it!
Then you would see the real efficiency of the hashmap.
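A rough sketch of that comparison, assuming 4000 shuffled Integer values as in the question. LookupComparison, indexOfLinear, and the fixed shuffle seed are made up for illustration, and the timings are unwarmed, so treat them as indicative only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical class; compares search-by-value in a list with lookup-by-key in a map.
class LookupComparison {

    // Linear search: returns the index of the first element equal to value, or -1.
    static int indexOfLinear(List<Integer> list, Integer value) {
        for (int idx = 0; idx < list.size(); idx++) {
            if (list.get(idx).equals(value)) {
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Values 0..3999 stored in random order.
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 4000; i++) {
            values.add(i);
        }
        Collections.shuffle(values, new Random(42));

        // The map remembers where each value ended up.
        Map<Integer, Integer> positionByValue = new HashMap<>();
        for (int idx = 0; idx < values.size(); idx++) {
            positionByValue.put(values.get(idx), idx);
        }

        long checksum = 0;
        long start = System.nanoTime();
        for (int v = 0; v < 4000; v++) {
            checksum += indexOfLinear(values, v);
        }
        long listNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int v = 0; v < 4000; v++) {
            checksum -= positionByValue.get(v);
        }
        long mapNanos = System.nanoTime() - start;

        System.out.println("list search: " + listNanos / 4000.0 + " ns per lookup");
        System.out.println("map lookup:  " + mapNanos / 4000.0 + " ns per lookup");
        System.out.println("checksum (0 means both agree): " + checksum);
    }
}
```

Here the list has to scan on average half of its elements per lookup, while the map goes straight to the bucket for the key.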
"the ArrayList has to traverse every element of the collection to reach its value"
This is not true. ArrayList is backed by an array which allows for constant-time get operations.
HashMap's get, on the other hand, first must hash its argument, then it must traverse the bucket to which the hash code corresponds, testing each element in the bucket for equality with the given key. This will generally be slower than just indexing an array.
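The steps described above can be sketched with a toy chained hash table. This is a simplified illustration under my own assumptions, not the actual java.util.HashMap implementation; ToyHashLookup, Node, put, and bucketGet are all hypothetical names:

```java
// Toy chained hash table; NOT how java.util.HashMap is really implemented.
class ToyHashLookup {

    // One entry in a bucket's singly linked chain.
    static final class Node {
        final Object key;
        final Object value;
        final Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Step 1: hash the key and reduce it to a bucket index.
    static int bucketIndex(Object key, int bucketCount) {
        return (key.hashCode() & 0x7fffffff) % bucketCount;
    }

    // Prepends a new entry to the key's bucket (no replacement of existing keys).
    static void put(Node[] buckets, Object key, Object value) {
        int index = bucketIndex(key, buckets.length);
        buckets[index] = new Node(key, value, buckets[index]);
    }

    // Steps 2 and 3: walk the bucket and test each entry for equality.
    static Object bucketGet(Node[] buckets, Object key) {
        for (Node n = buckets[bucketIndex(key, buckets.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node[] buckets = new Node[16];
        put(buckets, (short) 3999, "last");
        System.out.println(bucketGet(buckets, (short) 3999)); // prints last
        System.out.println(bucketGet(buckets, 3999));         // prints null
    }
}
```

An array read is a single indexed access, whereas even this stripped-down lookup hashes, indexes, and compares. The second call also shows that an Integer key lands in the same bucket as the Short key but fails the equality test.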
ArrayList.get(index) actually takes constant time: since ArrayList is backed by an array, it just uses that index into the backing array. ArrayList.contains(Object), by contrast, is O(n) in the worst case.
Big O for HashMap is O(1+α). Your α comes from hashcode collisions and a bucket must be traversed to check for equality.
Big O for pulling an item out of an ArrayList by index O(1)
When in doubt... draw it out...
Both ArrayList and HashMap are backed by arrays. HashMap has to compute a hash code for the key, from which it derives the index used to access its array, whereas for ArrayList.get you provide the index directly. So it's 3 operations versus 1 operation for the ArrayList.
But whether a List or a Map is backed with an array is implementation detail. So the answer may differ depending on which implementations you use.