Intermediate between LinkedList and ArrayList - Java

In the following piece of code, I have the result of a query, but I have no clue about the total number of records. I have to store them in a container. Reading the records back is a simple loop, so index-based access will not be used.
List<MyObject> list;
while (source.hasNext()) {
    MyObject ob = new MyObject();
    convertObject(ob, source.next());
    list.add(ob);
}
...
// Another method
for (MyObject ob : objects) {
    showThings(ob);
}
LinkedList is poor because it creates many small node objects, each with a pointer to the next. It uses more memory, fragments the heap, and causes more cache misses.
ArrayList is poor because I don't know the number of records that I will insert. Whenever I insert a new item and the inner array is full, it will allocate a bigger block of memory and copy everything to the new block.
I didn't find any solution in java.util, so I'm considering writing a custom list. It would be like a LinkedList, but each node holds an array: the first node behaves like an ArrayList, but when it is full and I insert a new object, another node with a fresh array is created instead of copying everything. However, I may be reinventing the wheel.
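That idea is known as an unrolled or chunked list. For illustration, a minimal append-only sketch of it (hypothetical code, not a production implementation):

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// A list of fixed-size blocks: when the current block fills up, a new block
// is allocated instead of copying the existing elements.
class ChunkedList<E> extends AbstractList<E> {
    private static final int CHUNK = 1024;
    private final List<E[]> chunks = new ArrayList<>();
    private int size = 0;

    @Override
    @SuppressWarnings("unchecked")
    public boolean add(E e) {
        if (size % CHUNK == 0) {
            chunks.add((E[]) new Object[CHUNK]); // new block, nothing copied
        }
        chunks.get(size / CHUNK)[size % CHUNK] = e;
        size++;
        return true;
    }

    @Override
    public E get(int index) {
        return chunks.get(index / CHUNK)[index % CHUNK];
    }

    @Override
    public int size() {
        return size;
    }
}

AbstractList derives an iterator from get() and size(), so the enhanced for loop above works unchanged.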

Whenever I insert a new item and the inner array is full, it will allocate a bigger block of memory and copy everything to the new block
This is not quite true: you can preallocate the backing array at any size you want, and it only grows when it runs out of space, increasing by about 50% each time. So if you have a reasonable guess of the expected size, or even guess a large number, the reallocation cost will be zero or minimal.
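For example (the expected row count here is a hypothetical guess):

int expectedRows = 10_000; // rough estimate of the result size
List<MyObject> list = new ArrayList<>(expectedRows); // one allocation up front; no copying until the guess is exceeded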

Some googling unveils the Brownies Collections' GapList.
It organizes its contents in blocks and manages them by arranging them in a tree. It also merges blocks that become sparse after element removal.

Related

Reference to element in ArrayList which is later moved

Assume the following code:
ArrayList<A> aList = new ArrayList<>();
for (int i = 0; i < 1000; ++i)
    aList.add(new A());
A anElement = aList.get(500);
for (int i = 0; i < 100000; ++i)
    aList.add(new A());
Afterwards, anElement still correctly references the element at index 500, even though the ArrayList presumably reallocated its data multiple times during the second for loop. Is this assumption incorrect, and if not, how does Java manage to have anElement still point at the correct data in memory?
My theories are that either, instead of freeing the memory anElement references, that memory now points to the current aList data, or alternatively the reference anElement holds is updated when the array grows. Both of these theories, however, have really bad space/time performance implications, so I consider them unlikely.
Edit:
I misunderstood how arrays store elements: I assumed they store the elements directly, but in reality they store references, meaning that anElement and the entry at index 500 both point to the same object on the heap, which resolves the problem I failed to understand!
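A quick snippet that makes this concrete:

List<StringBuilder> list = new ArrayList<>();
StringBuilder element = new StringBuilder("hello");
list.add(element);

// Growing the list reallocates the backing array, but only the references
// are copied; the StringBuilder object itself never moves (as far as the
// program can observe).
for (int i = 0; i < 100_000; i++) {
    list.add(new StringBuilder());
}

System.out.println(list.get(0) == element); // true: same object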
When the array that internally stores the elements of an ArrayList becomes full, a new, larger array is created, all elements from the previous array are copied to the new one at the same indexes, and there is now room for new elements. The garbage collector will get rid of the previous, no-longer-needed array.
You might want to have a look at the implementation of ArrayList here to see the details of how it works "under the hood".
The second loop in your code added the next 100000 elements after the first 1000, so now you have 101000 elements in aList; the first 1000 elements weren't moved to different indexes. The get() method only reads an element; nothing is moved or deleted from the ArrayList.
Note that an ArrayList doesn't really work like an array (an array of As is A[]) and it's not a fixed-size collection: an ArrayList changes its size when adding or removing elements. For example, if you remove the element at index 0 (aList.remove(0);), then the element that was stored at index 1 is now stored at index 0 (and so on for every later element), and the size of the ArrayList changes from 1000 to 999.
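A small demonstration of that shifting behaviour:

List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
list.remove(0);                  // removes "a"
System.out.println(list.get(0)); // "b": every later element shifted left
System.out.println(list.size()); // 2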
If you want to know how an ArrayList works internally, just look at the sources you can find online, for example here:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java
Here you can clearly see that an Object[] is used internally, and that it is resized to int newCapacity = (oldCapacity * 3)/2 + 1; when deemed necessary.
The indexes stay the same as long as you add to the back. Should you insert something in the middle, the indexes of all elements after it are incremented by one.
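You can trace the growth formula from that JDK 6 source by hand, starting from the default initial capacity of 10:

int capacity = 10;
for (int i = 0; i < 5; i++) {
    System.out.print(capacity + " ");
    capacity = (capacity * 3) / 2 + 1; // the JDK 6 policy quoted above
}
// prints: 10 16 25 38 58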
That is not Java as such, but an implementation detail of the given JVM. You can read more here, for example: https://www.artima.com/insidejvm/ed2/jvm6.html; there are complete books about JVM internals.
A simple approach is to have a map of objects: you consult that map first and then find the current location of the actual object. It is then simple to move the actual object around, as its address is stored in a single place, in that map.
One could argue this is slow and store the address directly, skipping the extra map lookup. Moving the object around then requires more work, though it is still possible: all pointers to it have to be updated. (Raw pointers do not appear at the language level, so tricks like casting to numbers temporarily or storing references in some random array can do no harm here; the runtime can always track whether something is a pointer to the object being moved.)
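As an illustration only (a hypothetical handle table, not how any particular JVM is implemented):

import java.util.HashMap;
import java.util.Map;

// Objects are reached through one slot in a table; relocating an object
// means updating that single slot rather than every reference to it.
final class HandleTable {
    private final Map<Integer, Object> slots = new HashMap<>();
    private int nextHandle = 0;

    int register(Object o) {
        slots.put(nextHandle, o);
        return nextHandle++;
    }

    Object resolve(int handle) {
        return slots.get(handle);
    }

    void relocate(int handle, Object moved) {
        slots.put(handle, moved);
    }
}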

what is the time complexity for copying list back to arrays and vice-versa in Java?

I am wondering what the time complexity, in big-O notation, of ArrayList-to-array conversion is:
ArrayList<String> assetTradingList = new ArrayList<>();
assetTradingList.add("Stocks trading");
assetTradingList.add("futures and option trading");
assetTradingList.add("electronic trading");
assetTradingList.add("forex trading");
assetTradingList.add("gold trading");
assetTradingList.add("fixed income bond trading");
String[] assetTradingArray = new String[assetTradingList.size()];
assetTradingList.toArray(assetTradingArray);
Similarly, what is the time complexity of converting an array to a list in the following ways?
Method 1, using Arrays.asList:
String[] asset = {"equity", "stocks", "gold", "foreign exchange",
        "fixed income", "futures", "options"};
List<String> assetList = Arrays.asList(asset);
Method 2, using Collections.addAll:
List<String> assetList = new ArrayList<>();
String[] asset = {"equity", "stocks", "gold", "foreign exchange",
        "fixed income", "futures", "options"};
Collections.addAll(assetList, asset);
Method 3, using addAll:
ArrayList<String> newAssetList = new ArrayList<>();
newAssetList.addAll(Arrays.asList(asset));
The reason I am interested in the overhead of copying back and forth is that in typical interviews, questions come up such as "given an array of pre-order traversal elements, convert it to a binary search tree", and so on, involving arrays. Since List offers a whole bunch of operations such as remove(), it is simpler to code against a List than against an array.
In that case, I would like to be able to defend using a list instead of an array by saying "I would first convert the array to a List because the overhead of this operation is small (hopefully)".
Any better, faster methods for copying elements back and forth between arrays and lists would also be good to know.
Thanks
It would seem that Arrays.asList(T[]) is the fastest, with O(1).
The method returns a fixed-size List backed by the given array, so there is no reason to copy the references over to a new data structure: the given array simply serves as the backing array of the List implementation that is returned.
The other methods seem like they copy each element, one by one to an underlying data structure. ArrayList#toArray(..) uses System.arraycopy(..) deep down (O(n) but faster because it's done natively). Collections.addAll(..) loops through the array elements (O(n)).
Careful when using ArrayList. The backing array grows (by about half its current size in the OpenJDK implementation) when its capacity is reached, i.e. when it's full, and that growth takes O(n) time. Adding to an ArrayList might not be the best idea unless you know from the beginning how many elements you are adding and create it with that capacity.
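The view behaviour of Arrays.asList is easy to verify:

String[] array = {"stocks", "gold"};
List<String> view = Arrays.asList(array); // O(1): the array itself is the backing store
array[1] = "forex";
System.out.println(view.get(1));          // "forex": the list sees the change
// view.add("bonds");                     // throws UnsupportedOperationException: fixed size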
Since the backing data structure of ArrayList is an array, and copying an array's elements is O(n), the conversion is O(n).
The only overhead I see is pollution of the heap with those intermediate objects. Most of the time developers (especially beginners) don't care about that and treat the Java GC as a magic wand that cleans everything up after them. My personal opinion is: if you can avoid an unwanted transformation of an array to a list and vice versa, do so.
If you know the foreseeable (e.g. defined) size of a list beforehand, preallocate it with the ArrayList(int initialCapacity) constructor to avoid the internal array copying that takes place inside ArrayList when its capacity is exhausted. Depending on the use case, consider other implementations, e.g. LinkedList if you are only interested in appending to the list and reading it back iteratively.
An ArrayList is fundamentally just a wrapper around an Object[] array. It has a lot of helpful methods for doing things like finding items, adding and removing items, and so on, but from a time complexity perspective these are no better than doing the operations yourself on a plain array.
If your hope is that an ArrayList is fundamentally more efficient than a manually managed array, it's not. Sorry!
Converting an array to an ArrayList takes O(n) time. Every element must be copied.
Inserting or removing an element takes O(m) amortized time, where m is the number of elements following the insertion/removal index. These elements have to be moved to new indices whether you use an array or ArrayList.
"Amortized" means average -- sometimes the backing array will need to be grown or shrunk, which takes additional time on the order of O(n). This doesn't happen every time, so on the whole the additional time averages out to an O(1) additional cost.
Accessing an element at an arbitrary index takes O(1) time. Both provide constant time random access.
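If you want to sidestep the amortized regrowth entirely, ArrayList lets you reserve capacity up front; a sketch:

ArrayList<String> list = new ArrayList<>();
list.ensureCapacity(1_000_000); // one allocation now instead of ~log n regrowths

// or, equivalently, size it at construction:
ArrayList<String> sized = new ArrayList<>(1_000_000);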

java linkedlist slower than arraylist when adding elements?

I thought linked lists were supposed to be faster than an ArrayList when adding elements? I just did a test of how long it takes to add, sort, and search for elements (ArrayList vs LinkedList vs HashSet), just using the java.util classes for ArrayList and LinkedList and the add(Object) methods available to each class.
The ArrayList outperformed the LinkedList in filling the list, and in a linear search of the list.
Is this right? Did I do something wrong in the implementation maybe?
EDIT:
I just want to make sure I'm using these things right. Here's what I'm doing:
public class LinkedListTest {
    private List<String> Names;

    public LinkedListTest() {
        Names = new LinkedList<String>();
    }
}
Then I just use LinkedList methods, i.e. Names.add(string). When I tested ArrayList, it was nearly identical:
public class ArrayListTest {
    private List<String> Names;

    public ArrayListTest() {
        Names = new ArrayList<String>();
    }
}
Am I doing it right?
Yes, that's right. LinkedList has to do a memory allocation on each insertion, while ArrayList is permitted to do far fewer of them, giving it amortized O(1) insertion. Memory allocation looks cheap, but may actually be very expensive.
The linear search time is likely slower in LinkedList due to locality of reference: the ArrayList elements are closer together, so there are fewer cache misses.
When you plan to insert only at the end of a List, ArrayList is the implementation of choice.
Remember that:
there's a difference in "raw" performance for a given number of elements, and in how different structures scale;
different structures perform differently at different operations, and that's essentially part of what you need to take into account in choosing which structure to use.
So, for example, a linked list has more to do when adding to the end, because it has an additional node object to allocate and initialise per item added. But whatever that "intrinsic" per-item cost, both structures have O(1) performance for adding to the end of the list, i.e. an effectively "constant" time per addition whatever the size of the list; that constant simply differs between ArrayList and LinkedList, and is likely to be greater for the latter.
On the other hand, a linked list has constant time for adding to the beginning of the list, whereas an ArrayList must "shufty" the elements along, an operation that takes time proportional to the number of elements. But for a given list size, say 100 elements, it may still be quicker to "shufty" 100 elements than to allocate and initialise the single placeholder object of the linked list (though by the time you get to, say, a thousand or a million objects, or wherever the threshold is, it won't be).
So in your testing, you probably want to consider both the "raw" time of the operations at a given size and how these operations scale as the list size grows.
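A naive harness along these lines shows both effects. Treat the numbers as rough: a loop like this ignores JIT warm-up and GC noise, and a real measurement should use a framework such as JMH.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class AppendBench {
    static long timeAppends(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Several sizes, to see scaling as well as raw time at one size.
        for (int n : new int[] {1_000, 100_000, 1_000_000}) {
            long arrayNs = timeAppends(new ArrayList<>(), n);
            long linkedNs = timeAppends(new LinkedList<>(), n);
            System.out.printf("n=%,d: ArrayList %d ms, LinkedList %d ms%n",
                    n, arrayNs / 1_000_000, linkedNs / 1_000_000);
        }
    }
}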
Why did you think LinkedList would be faster? In the general case, an append to an array list is simply a matter of writing a reference into a single array cell (with O(1) random access). The LinkedList insert must additionally allocate a "cell" (node) object to hold the entry and update a pair of pointers, as well as ultimately setting the reference to the object being inserted.
Of course, periodically the ArrayList's backing array may need to be resized (which won't be the case if it was created with a large enough initial capacity), but since the array grows exponentially the amortized cost is low, and the number of reallocations is bounded by O(log n).
Simply put - inserts into array lists are much simpler and therefore much faster overall.
A linked list may be slower than an array list in these cases for a few reasons. If you are inserting at the end of the list, it is likely that the array list already has that space allocated: the underlying array is grown in large chunks, because growing it is a time-consuming process. So, in most cases, adding an element at the back requires only storing a reference, whereas the linked list needs to create a node. Adding at the front or in the middle will give different performance for both types of list.
Linear traversal of the list will always be faster in an array-based list, because it only has to walk the backing array, one dereferencing operation per cell. In a linked list, the nodes of the list must also be dereferenced, taking roughly double the amount of time.
When adding an element to the back of a LinkedList (in Java, LinkedList is actually a doubly linked list), it is an O(1) operation, as is adding an element to the front. Adding an element at the ith position is roughly an O(i) operation.
So, if you were adding to the front of the list, a LinkedList would be significantly faster.
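For example:

LinkedList<String> linked = new LinkedList<>();
linked.addFirst("x"); // O(1): just rewires the head pointers

List<String> array = new ArrayList<>();
array.add(0, "x");    // O(n): shifts every existing element one slot right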
ArrayList is faster at accessing data at a random index, but slower when inserting elements in the middle of the list: with a linked list you just have to change reference values, whereas with an array list you have to shift all elements after the inserted index one position along.
EDIT: Isn't there a linked list implementation which keeps a reference to the last element? That would speed up inserting at the end. (In fact, java.util.LinkedList does exactly this: it is doubly linked and keeps references to both the first and last nodes, so appending is already O(1).)

Non runtime allocation solution - ArrayList

I'm making a game in Java. I need a solution for my current runtime allocation, caused by my ArrayList. Every minute or thirty seconds the garbage collector kicks in, because I am calling the draw and update methods through this collection.
How can I achieve a solution without runtime allocation?
Thanks in advance. If needed, my code is posted below, from my Manager class which contains the ArrayList of objects:
Some code:
@Override
public void draw(GL10 gl) {
    final int size = objects.size();
    for (int x = 0; x < size; x++) {
        Object object = objects.get(x);
        object.draw(gl);
    }
}
public void add(Object parent) {
    objects.add(parent);
}

// Get the collection; later we call the draw function on these objects
public ArrayList<Object> getObjects() {
    return objects;
}

public int getNumberOfObjects() {
    return objects.size();
}
More explanation: the reason I am fiddling with this is (1) I see that the ArrayList implementation is slow and causing lags, and (2) I want to merge the objects/components together. When an update call fires from my Thread class, it goes through my collection and sends things down the tree/graph using the Manager's update function.
Looking at an open-source project, Replica Island, I found that the author used an alternative class, FixedSizeArray, that he wrote himself. Since I'm not that good at Java, I wanted to make things easier, and now I'm looking for another solution. Finally, he explained WHY he made the special class:
FixedSizeArray is an alternative to a standard Java collection like ArrayList. It is designed to provide a contiguous array of fixed length which can be accessed, sorted, and searched without requiring any runtime allocation. This implementation makes a distinction between the "capacity" of an array (the maximum number of objects it can contain) and the "count" of an array (the current number of objects inserted into the array). Operations such as set() and remove() can only operate on objects that have been explicitly add()-ed to the array; that is, indexes larger than getCount() but smaller than getCapacity() can't be used on their own.
I see that the ArrayList implementation is slow and causing lags ...
If you see that, you are misinterpreting the evidence and jumping to unjustifiable conclusions. ArrayList is NOT slow, and it does NOT cause lags ... unless you use the class in a particularly suboptimal way.
The only times that an array list allocates memory are when you create the list, add more elements, copy the list, or call iterator().
When you create the array list, two Java objects are created: one for the ArrayList and one for its backing array. If you pass an appropriate value as the initialCapacity argument, you can arrange that subsequent updates will not allocate memory.
When you add or insert an element, the array list may allocate one new object. But this only happens when the backing array is too small to hold all of the elements, and when it does happen the new backing array is typically 1.5 to 2 times the size of the old one. So inserting N elements results in at most O(log N) allocations. Besides, if you create the array list with an appropriate initialCapacity, you can guarantee that there are zero allocations on add or insert.
When you copy a list to another list or array (using toArray or a copy constructor) you will get 1 or 2 allocations.
The iterator() method creates a new object each time you call it. But you can avoid this by iterating using an explicit index variable, List.size() and List.get(int). (Be aware that for (E e : someList) { ... } implicitly calls List.iterator().)
(External operations like Collections.sort do entail extra allocations, but that is not the fault of the array list. It will happen with any list type.)
In short, the only way you can get lots of allocations using an array list is if you create lots of array lists, or use them unintelligently.
The FixedSizedArray class you have found sounds like a waste of time. It sounds like it is equivalent to creating an ArrayList with an initial capacity ... with the restriction that it will break if you get the initial capacity wrong. Whoever wrote it probably doesn't understand Java collections very well.
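To make the iterator point concrete: the posted draw() method already iterates allocation-free, while the enhanced for loop would not (sketch, reusing the poster's objects list and custom Object type):

// Allocation-free: no Iterator object is ever created.
for (int i = 0, n = objects.size(); i < n; i++) {
    objects.get(i).draw(gl);
}

// Convenient, but calls objects.iterator() and allocates a small
// Iterator object on every pass:
// for (Object object : objects) { object.draw(gl); }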
It's not quite clear what you are asking, but:
If you know at compile time what objects should be in the collection, make it an array not an ArrayList and set the contents in an initialisation block.
Object[] objects = new Object[]{obj1,obj2,obj3};
What makes you think you know what the GC is reclaiming? Have you profiled your application?
What do you mean by "non-runtime allocation"? I'm really not even sure what you mean by "allocation" in this context... allocation of memory? That's done at runtime, obviously. You clearly aren't referring to any kind of fixed pool of objects known at compile time either, since your code allows adding objects to your list in several different ways (not that you'd be able to allocate anything for them at compile time even if you were).
Beyond that, nothing in the code you've posted is going to cause garbage collection by itself. Objects can only be garbage collected when nothing in the program has a strong reference to them, and your posted code only allows adding objects to the ArrayList (though they can be removed by calling getObjects() and removing from that, of course). As long as you aren't removing objects from the objects list, you aren't reassigning objects to point to a different list, and the object containing it isn't itself becoming eligible for garbage collection, none of the objects it contains will ever be available for garbage collection either.
So basically, there isn't any specific problem with the code you've posted and your question doesn't make sense as asked. Perhaps there are more details you can provide or there's a better explanation of what exactly your issue is and what you want. If so, please try to add that to your question.
Edit:
From the description of FixedSizeArray and the code I looked at, it seems largely equivalent to an ArrayList that is initialized with a specific array capacity (using the constructor that takes an int initialCapacity), except that it will fail at runtime if something tries to add to it when its array is full, where ArrayList will expand itself to hold more and continue working just fine. To be honest, it seems like a pointless class, possibly written because the author didn't actually understand ArrayList.
Note also that its statement about "not requiring any runtime allocation" is a bit misleading... it does of course have to allocate an array when it is created, but it just refuses to allocate a new array if its initial array fills up. You can achieve the same thing using ArrayList by simply giving it an initialCapacity that is at least large enough to hold the maximum number of objects you will ever add to it. If you do so, and you do in fact ensure you never add more than that number of objects to it, it will never allocate a new array after it is created.
However, none of this relates in any way to your stated issue about garbage collection, and your code still doesn't show anything that would cause huge numbers of objects to be garbage collected. If there is any issue at all, it may relate to the code that is actually calling the add and getObjects methods and what it's doing.

Are Java HashMap.clear() and remove() memory-efficient?

Consider the following HashMap.clear() code:
/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    modCount++;
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;
    size = 0;
}
It seems that the internal array (table) of Entry objects is never shrunk. So, when I add 10000 elements to a map and then call map.clear(), it will keep 10000 nulls in its internal array. So, my question is: how does the JVM handle this array of nothing, and is HashMap memory-efficient?
The idea is that clear() is only called when you want to re-use the HashMap. Reusing an object should only be done for the same reason it was used before, so chances are that you'll have roughly the same number of entries. To avoid useless shrinking and resizing of the Map the capacity is held the same when clear() is called.
If all you want to do is discard the data in the Map, then you need not (and in fact should not) call clear() on it, but simply clear all references to the Map itself, in which case it will be garbage collected eventually.
Looking at the source code, it does look like HashMap never shrinks. The resize method is called to double the size whenever required, but doesn't have anything like ArrayList.trimToSize().
If you're using a HashMap in such a way that it grows and shrinks dramatically often, you may want to just create a new HashMap instead of calling clear().
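A sketch of that approach (expectedEntries is a hypothetical estimate; the capacity arithmetic accounts for HashMap's default load factor of 0.75):

// Instead of map.clear(): drop the map and let the GC reclaim the big table.
map = new HashMap<>();

// If you can estimate the next round's size, presize to avoid rehashing:
int expectedEntries = 10_000;
map = new HashMap<>((int) (expectedEntries / 0.75f) + 1);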
You are right, but considering that increasing the array is a much more expensive operation, it's not unreasonable for the HashMap to think "once the user has increased the array, chances are he'll need the array this size again later" and just leave the array as is, instead of shrinking it and risking having to expensively expand it again later. It's a heuristic, I guess; you could advocate the other way around too.
Another thing to consider is that each element in table is simply a reference. Setting these entries to null will remove the references from the items in your Map, which will then be free for garbage collection. So it isn't as if you are not freeing any memory at all.
However, if you need to free even the memory being used by the Map itself, then you should release it as per Joachim Sauer's suggestion.
