I have a task about building a pyramid using a list of numbers, but there is a problem with one test. In my task I need to sort a list, and I use Collections.sort():
Collections.sort(inputNumbers, (o1, o2) -> {
    if (o1 != null && o2 != null) {
        return o1.compareTo(o2);
    } else {
        throw new CannotBuildPyramidException("Unable to build a pyramid");
    }
});
But this test fails
@Test(expected = CannotBuildPyramidException.class)
public void buildPyramid8() {
    // given
    List<Integer> input = Collections.nCopies(Integer.MAX_VALUE - 1, 0);
    // run
    int[][] pyramid = pyramidBuilder.buildPyramid(input);
    // assert (exception)
}
with an OutOfMemoryError instead of my own CannotBuildPyramidException (which would be thrown in another method, after sorting). I understand that this is because of TimSort in the Collections.sort() method. I tried to use heap sort instead, but I couldn't even swap elements, because my input list was initialized as Arrays.asList() and when I use the set() method I get UnsupportedOperationException. Then I tried to convert my list to a plain ArrayList
ArrayList<Integer> list = new ArrayList<>(inputNumbers);
but I got an OutOfMemoryError again. It's not allowed to edit the tests, and I don't know what to do with this problem. I'm using Java 8 and the IntelliJ IDEA SDK.
Note that the list created by Collections.nCopies(Integer.MAX_VALUE - 1, 0) uses a tiny amount of memory and is immutable. The documentation says "Returns an immutable list consisting of n copies of the specified object. The newly allocated data object is tiny (it contains a single reference to the data object)". And if you look at the implementation, you'll see it does exactly what one would expect from that description. It returns a List object that only pretends to be large, only holding the size and the element once and returning that element when asked about any index.
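A quick demonstration of that behavior (a small sketch; the outputs follow directly from the documented contract):

import java.util.Collections;
import java.util.List;

// The nCopies view stores only the size and a single element reference.
List<Integer> huge = Collections.nCopies(Integer.MAX_VALUE - 1, 0);
System.out.println(huge.size());         // 2147483646 -- no element storage at all
System.out.println(huge.get(1_000_000)); // 0 -- answered from the single reference
// huge.set(0, 1);                       // would throw UnsupportedOperationException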
The problem with Collections.sort is then two-fold:
The list must not be immutable, but that list is. That btw also explains the UnsupportedOperationException you got when you tried to set().
For performance reasons, it "obtains an array containing all elements in this list, sorts the array, [and writes back to the list]". So at this point the tiny pretend-list does get blown up and causes your memory problem.
So you need to find some other way to sort. One that works in-place and doesn't swap anything for this input (which is correct, as the list is already sorted). You could for example use bubble sort, which takes O(n) time and O(1) space on this input and doesn't attempt any swaps here.
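For illustration, here is a minimal sketch of such a bubble sort with an early exit (the null check and CannotBuildPyramidException are carried over from your comparator; the generic signature and method name are my own):

import java.util.List;

// In-place bubble sort with early exit. On an already-sorted list it makes one
// O(n) pass of comparisons, performs no swaps, and uses O(1) extra space, so it
// never calls set() on the immutable nCopies list and never copies it.
static <T extends Comparable<T>> void bubbleSort(List<T> list) {
    for (int end = list.size() - 1; end > 0; end--) {
        boolean swapped = false;
        for (int i = 0; i < end; i++) {
            T a = list.get(i);
            T b = list.get(i + 1);
            if (a == null || b == null) {
                throw new CannotBuildPyramidException("Unable to build a pyramid");
            }
            if (a.compareTo(b) > 0) {
                list.set(i, b);     // mutates only when a pair is out of order
                list.set(i + 1, a);
                swapped = true;
            }
        }
        if (!swapped) {
            return; // no inversions found: the list is sorted
        }
    }
}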
Btw, about getting the memory problem "because of TimSort": Timsort is really not to blame. You don't even get to the Timsort part, as it's the preparatory copy-to-array that causes the memory problem. And furthermore, Timsort is smart and would detect that the data is already sorted and then wouldn't do anything. So if you actually did get to the Timsort part, or if you could directly apply it to the list, Timsort wouldn't cause a problem.
This list is too huge! Collections.nCopies(Integer.MAX_VALUE - 1, 0) gives us a list of 2^31 - 2 elements (2,147,483,646), each one taking about 4 bytes in memory (a simplified size for an Integer). Multiplying, we'd need about 8.59 GB of memory to store all those numbers. Are you sure you have enough memory to store it?
I believe this test is written in a very bad manner - one should never try to create such a huge List.
Related
I have to optimize an algorithm, and I noticed that we have a loop like this:
while (!floorQueues.values().stream().allMatch(List::isEmpty))
It seems like on each iteration it checks if all of the lists in this map are empty.
The data in the map is taken from a two dimensional array like this
int currentFloorNumber = 0;
for (int[] que : queues) {
    List<Integer> list = Arrays.stream(que).boxed().collect(Collectors.toList());
    floorQueues.put(currentFloorNumber, list);
    currentFloorNumber++;
}
I thought it would be more optimal to take the count of elements in the arrays while transforming the data, and then use the number of elements I have deleted from the lists as the condition to end the loop
while (countOfDeletedElements < totalCountOfElements)
but when I tested the code, it ran slower than before. So I wonder how isEmpty works behind the scenes to be faster than my solution.
This may depend on which class implementing List is used.
An ArrayList simply checks if there are 0 elements:
/**
 * Returns <tt>true</tt> if this list contains no elements.
 *
 * @return <tt>true</tt> if this list contains no elements
 */
public boolean isEmpty() {
    return size == 0;
}
A distinct non-answer: Java performance, and Java benchmarking doesn't work this way.
You can't look at these 5 lines of source code to understand what exactly will happen at runtime. Or to be precise: you have to understand that streams are a very advanced, i.e. complex, piece of machinery. Stream code might create plenty of objects at runtime that are only used once and then thrown away. In order to assess the true performance impact, you really have to understand what that code is doing. And that could be: a lot!
The serious answer is: if you really need to understand what is going on, then you will need to:
study the stream implementation in great detail
and worse: you will need to look into the activities of the Just in Time compiler within the JVM (to see for example how that "source" code gets translated and optimised to machine code).
and most likely: you will need to apply a real profiler, and to plenty of experiments.
In any case, you should start reading here to ensure that the numbers you are measuring make sense in the first place.
Normally we have two ways to check whether a list is empty: list.size() > 0 or !list.isEmpty().
When we call size() on an implementation that doesn't cache its size, the backend has to walk to the end of the list, and if the list has a big number of elements it will surely take long to reach the end. On the other hand, isEmpty() only has to check whether a first element exists (O(1)) and returns true/false immediately, which is obviously fast.
From a performance perspective, we should always prefer isEmpty() as a best practice.
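To illustrate why isEmpty() can win, here is a hypothetical linked structure that does not cache its size (a sketch, not any particular JDK class; for ArrayList both calls are O(1) field checks, as the code above shows):

// Hypothetical singly-linked collection without a cached size field.
class SinglyLinked<T> {
    private static class Node<T> { T value; Node<T> next; }
    private Node<T> head;

    int size() {             // O(n): must walk the whole chain to count
        int n = 0;
        for (Node<T> cur = head; cur != null; cur = cur.next) n++;
        return n;
    }

    boolean isEmpty() {      // O(1): only the head matters
        return head == null;
    }
}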
I have a list with a large number of elements. While processing this list, in some cases I want the list to be partitioned into smaller sub-lists and in some cases I want to process the entire list.
private void processList(List<X> entireList, int partitionSize)
{
    Iterator<X> entireListIterator = entireList.iterator();
    Iterator<List<X>> chunkOfEntireList = Iterators.partition(entireListIterator, partitionSize);
    while (chunkOfEntireList.hasNext()) {
        doSomething(chunkOfEntireList.next());
        if (chunkOfEntireList.hasNext()) {
            doSomethingOnlyIfTheresMore();
        }
    }
}
I'm using com.google.common.collect.Iterators for creating partitions. Link to the documentation here.
So in cases where I want to partition the list with size 100, I call
processList(entireList, 100);
Now, when I don't want to create chunks of the list, I thought I could pass Integer.MAX_VALUE as partitionSize.
processList(entireList, Integer.MAX_VALUE);
But this leads to my code going out of memory. Can someone help me out? What am I missing? What is Iterators doing internally and how do I overcome this?
EDIT: I also require the "if" clause inside, to do something only if there are more lists to process, i.e. I require the iterator's hasNext() function.
You're getting an out of memory error because Iterators.partition() internally populates an array with the given partition length. The allocated array is always the partition size because the actual number of elements is not known until the iteration is complete. (The issue could have been prevented if they had used an ArrayList internally; I guess the designers decided that arrays would offer better performance in the common case.)
Using Lists.partition() will avoid the problem since it delegates to List.subList(), which is only a view of the underlying list:
private void processList(List<X> entireList, int partitionSize) {
for (List<X> chunk : Lists.partition(entireList, partitionSize)) {
doSomething(chunk);
}
}
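If you also need the "is there more?" check from the EDIT, the same idea works with an index over the partition view (a sketch reusing the question's doSomething and doSomethingOnlyIfTheresMore):

import com.google.common.collect.Lists;
import java.util.List;

private void processList(List<X> entireList, int partitionSize) {
    // Lists.partition returns a lazy view; no chunk arrays are pre-allocated.
    List<List<X>> partitions = Lists.partition(entireList, partitionSize);
    for (int i = 0; i < partitions.size(); i++) {
        doSomething(partitions.get(i));
        if (i < partitions.size() - 1) {
            doSomethingOnlyIfTheresMore(); // only when more chunks remain
        }
    }
}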
Normally, while partitioning, it will allocate a new list of the given partitionSize, so it is no surprise that this error occurs. Why don't you use the original list when you need only a single partition? Possible solutions:
create a separate overloaded method that doesn't take the size;
pass the size as -1 when you don't need any partitioning, and in the method check the value: if it is -1, put the original list into chunkOfEntireList, as sketched below.
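A sketch of that second option, keeping the original iterator-based loop (the -1 convention is just this example's assumption):

import com.google.common.collect.Iterators;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

private void processList(List<X> entireList, int partitionSize) {
    Iterator<List<X>> chunkOfEntireList;
    if (partitionSize == -1) {
        // No partitioning: wrap the original list without copying it.
        chunkOfEntireList = Collections.singletonList(entireList).iterator();
    } else {
        chunkOfEntireList = Iterators.partition(entireList.iterator(), partitionSize);
    }
    while (chunkOfEntireList.hasNext()) {
        doSomething(chunkOfEntireList.next());
        if (chunkOfEntireList.hasNext()) {
            doSomethingOnlyIfTheresMore();
        }
    }
}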
I'm making a game in Java. I need a solution for my current runtime-allocation problem, caused by my ArrayList: every minute or 30 seconds the garbage collector starts to run because I am calling the draw and update methods through this collection.
How can I implement a non-runtime allocation solution?
Thanks in advance; if needed, my code is posted below, from my Manager class, which contains the ArrayList of objects:
Some code:
@Override
public void draw(GL10 gl) {
    final int size = objects.size();
    for (int x = 0; x < size; x++) {
        Object object = objects.get(x);
        object.draw(gl);
    }
}

public void add(Object parent) {
    objects.add(parent);
}

// Get collection; later we call the draw function from these objects
public ArrayList<Object> getObjects() {
    return objects;
}

public int getNumberOfObjects() {
    return objects.size();
}
More explanation: the reason I am fiddling with this is that (1) I see that the ArrayList implementation is slow and causing lags, and (2) I want to merge the objects/components together. When an update call fires from my Thread class, it goes through my collection and sends things down the tree/graph using the Manager's update function.
When looking at an open-source project, Replica Island, I found that the author used an alternative class, FixedSizeArray, that he wrote on his own. Since I'm not that good at Java, I wanted to make things easier, and now I'm looking for another solution. Finally, he explained WHY he made the special class:
FixedSizeArray is an alternative to a standard Java collection like ArrayList. It is designed to provide a contiguous array of fixed length which can be accessed, sorted, and searched without requiring any runtime allocation. This implementation makes a distinction between the "capacity" of an array (the maximum number of objects it can contain) and the "count" of an array (the current number of objects inserted into the array). Operations such as set() and remove() can only operate on objects that have been explicitly add()-ed to the array; that is, indexes larger than getCount() but smaller than getCapacity() can't be used on their own.
I see that the ArrayList implementation is slow and causing lags ...
If you see that, you are misinterpreting the evidence and jumping to unjustifiable conclusions. ArrayList is NOT slow, and it does NOT cause lags ... unless you use the class in a particularly suboptimal way.
The only times that an array list allocates memory are when you create the list, add more elements, copy the list, or call iterator().
When you create the array list, 2 java objects are created; one for the ArrayList and one for its backing array. If you use the initialCapacity argument and give an appropriate value, you can arrange that subsequent updates will not allocate memory.
When you add or insert an element, the array list may allocate one new object. But this only happens when the backing array is too small to hold all of the elements, and when it does happen the new backing array is typically twice the size of the old one. So inserting N elements will result in at most log2(N) allocations. Besides, if you create the array list with an appropriate initialCapacity, you can guarantee that there are zero allocations on add or insert.
When you copy a list to another list or array (using toArray or a copy constructor) you will get 1 or 2 allocations.
The iterator() method creates a new object each time you call it. But you can avoid this by iterating using an explicit index variable, List.size() and List.get(int). (Be aware that for (E e : someList) { ... } implicitly calls List.iterator().)
(External operations like Collections.sort do entail extra allocations, but that is not the fault of the array list. It will happen with any list type.)
In short, the only way you can get lots of allocations using an array list is if you create lots of array lists, or use them unintelligently.
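A minimal sketch of those two techniques applied to a draw loop like the one in the question (MAX_OBJECTS is an assumed upper bound, and GameObject stands in for whatever type the question's Object actually is):

import java.util.ArrayList;
import java.util.List;

private static final int MAX_OBJECTS = 256;  // assumption: known upper bound
// Pre-sized: add() never needs to grow the backing array.
private final List<GameObject> objects = new ArrayList<>(MAX_OBJECTS);

public void draw(GL10 gl) {
    final int size = objects.size();
    for (int i = 0; i < size; i++) {   // index loop: no Iterator object created
        objects.get(i).draw(gl);
    }
}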
The FixedSizedArray class you have found sounds like a waste of time. It sounds like it is equivalent to creating an ArrayList with an initial capacity ... with the restriction that it will break if you get the initial capacity wrong. Whoever wrote it probably doesn't understand Java collections very well.
It's not quite clear what you are asking, but:
If you know at compile time what objects should be in the collection, make it an array not an ArrayList and set the contents in an initialisation block.
Object[] objects = new Object[]{obj1,obj2,obj3};
What makes you think you know what the GC is reclaiming? Have you profiled your application?
What do you mean by "non-runtime allocation"? I'm really not even sure what you mean by "allocation" in this context... allocation of memory? That's done at runtime, obviously. You clearly aren't referring to any kind of fixed pool of objects that are known at compile time either, since your code allows adding objects to your list several different ways (not that you'd be able to allocate anything for them at compile time even if you were).
Beyond that, nothing in the code you've posted is going to cause garbage collection by itself. Objects can only be garbage collected when nothing in the program has a strong reference to them, and your posted code only allows adding objects to the ArrayList (though they can be removed by calling getObjects() and removing from that, of course). As long as you aren't removing objects from the objects list, you aren't reassigning objects to point to a different list, and the object containing it isn't itself becoming eligible for garbage collection, none of the objects it contains will ever be available for garbage collection either.
So basically, there isn't any specific problem with the code you've posted and your question doesn't make sense as asked. Perhaps there are more details you can provide or there's a better explanation of what exactly your issue is and what you want. If so, please try to add that to your question.
Edit:
From the description of FixedSizeArray and the code I looked at in it, it seems largely equivalent to an ArrayList that is initialized with a specific array capacity (using the constructor that takes an int initialCapacity), except that it will fail at runtime if something tries to add to it when its array is full, where ArrayList would expand itself to hold more and continue working just fine. To be honest, it seems like a pointless class, possibly written because the author didn't actually understand ArrayList.
Note also that its statement about "not requiring any runtime allocation" is a bit misleading... it does of course have to allocate an array when it is created, but it just refuses to allocate a new array if its initial array fills up. You can achieve the same thing using ArrayList by simply giving it an initialCapacity that is at least large enough to hold the maximum number of objects you will ever add to it. If you do so, and you do in fact ensure you never add more than that number of objects to it, it will never allocate a new array after it is created.
However, none of this relates in any way to your stated issue about garbage collection, and your code still doesn't show anything that would cause huge numbers of objects to be garbage collected. If there is any issue at all, it may relate to the code that is actually calling the add and getObjects methods and what it's doing.
This is a two-part question:
First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.
Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?
My second (related) question is: what is the best way to convert an array to a Set? Assuming an array arr, the way I have been doing it is the following:
Set x = new HashSet(Arrays.asList(arr));
This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?
Thanks!
Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.
Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.
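A small demonstration of that backing-store behavior:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

Integer[] arr = {3, 1, 3};
List<Integer> view = Arrays.asList(arr);   // no copy: the array is the storage
view.set(0, 7);                            // writes through to arr
System.out.println(arr[0]);                // prints 7
Set<Integer> unique = new HashSet<>(view); // one pass copies the elements into the set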
Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:
Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.

Collection<Type> noDups = new HashSet<Type>(c);

It works by creating a Set (which, by definition, cannot contain a duplicate), initially containing all the elements in c. It uses the standard conversion constructor described in The Collection Interface section.

Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate elements.

Collection<Type> noDups = new LinkedHashSet<Type>(c);

The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.

public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.
For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.
I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.
If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.
1. Duplicates
Concurring with other answers: using a Set should be the most efficient way to remove duplicates. A HashSet should run in O(n) time on average, while looping and removing repeats runs in the order of O(n^2). So using a Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.
2. Array to Set
Arrays.asList() is a cheap operation that doesn't copy the array and has minimal memory overhead. Alternatively, you can add the elements manually by iterating through the array:
public static <T> Set<T> arrayToSet(T[] array) {
    Set<T> set = new HashSet<>(array.length / 2); // initial capacity is a guess; the set grows as needed
    for (T item : array)
        set.add(item);
    return set;
}
Barring any specific performance bottlenecks that you know of (say, a collection of tens of thousands of items), converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem; only look for something fancier if there is a specific problem to solve.
I have a fairly expensive array calculation (SpectralResponse) which I'd like to keep to a minimum. I figured the best way is to store the results and bring them back up when the same array is needed again in the future. The decision is made using BasicParameters.
So right now, I use a LinkedList of objects for the arrays of SpectralResponse, and another LinkedList for the BasicParameters. And BasicParameters has an isParamsEqualTo(BasicParameters) method to compare parameter sets.
LinkedList<SpectralResponse> responses
LinkedList<BasicParameters> fitParams
LinkedList<Integer> responseNumbers
So to look up, I just go through the list of BasicParameters and check for a match; if matched, I return the SpectralResponse, and if there is no match, I calculate the SpectralResponse.
Here is the for loop I use for the lookup:
size: LinkedList size, limited to a reasonable value
responseNumber: just another variable to distinguish the SpectralResponse.
for (i = size - 1; i > 0; i--) {
    if (responseNumbers.get(i) == responseNum) {
        tempFit = fitParams.get(i);
        if (tempFit.isParamsEqualTo(fit)) {
            return responses.get(i);
        }
    }
}
But somehow, doing it this way not only takes up lots of memory, it's actually slower than just calculating the SpectralResponse directly. Much slower.
So is my implementation wrong, or was I mistaken that precalculating and looking up is faster?
You are accessing a LinkedList by index, this is the worst possible way to access it ;)
You should use ArrayList instead, or use iterators for all your lists.
Possibly you should merge the three objects into one, and keep them in a map with responseNum as key.
Hope this helps!
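A hedged sketch of that suggestion, bundling the three parallel lists into one value type keyed by responseNum (Entry and calculateSpectralResponse are names invented here; SpectralResponse, BasicParameters and isParamsEqualTo are from the question):

import java.util.HashMap;
import java.util.Map;

// One entry bundles what the three parallel LinkedLists used to hold.
class Entry {
    final BasicParameters params;
    final SpectralResponse response;
    Entry(BasicParameters params, SpectralResponse response) {
        this.params = params;
        this.response = response;
    }
}

private final Map<Integer, Entry> cache = new HashMap<>();

SpectralResponse lookup(int responseNum, BasicParameters fit) {
    Entry e = cache.get(responseNum);  // O(1) hash lookup instead of a LinkedList scan
    if (e != null && e.params.isParamsEqualTo(fit)) {
        return e.response;
    }
    SpectralResponse computed = calculateSpectralResponse(fit); // your existing calculation
    cache.put(responseNum, new Entry(fit, computed));
    return computed;
}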
You should probably use an array-based type (an actual array, or Vector, or ArrayList), not a linked list. Linked lists are best for stack or queue operations, not for indexing (since you have to traverse them from one end). Vector is an auto-resizing array, which has less overhead when accessing indexes.
The get(i) method of LinkedList has to walk along the list from one end, so fetching each item requires going further and further through the list. Consider using an ArrayList, the iterator() method, or just an array.
The second line, if (responseNumbers.get(i) == responseNum), will also be inefficient, as responseNumbers.get(i) returns an Integer that has to be unboxed to an int (Java 5 onwards does this automatically; your code would not compile on Java 1.4 or earlier if responseNum is declared as an int). See this for more information on boxing.
To remove this unboxing overhead, use an IntList from the Apache primitives library. This library contains collections that store the underlying values (ints in your case) in a primitive array (e.g. int[]) instead of an Object array. That means no boxing is required, as the IntList's methods return primitive types, not Integers.
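A sketch assuming the Commons Primitives API, where ArrayIntList is the array-backed IntList implementation (check the library's javadoc for the exact signatures):

import org.apache.commons.collections.primitives.ArrayIntList;
import org.apache.commons.collections.primitives.IntList;

IntList responseNumbers = new ArrayIntList(); // backed by an int[], no Integer objects
responseNumbers.add(42);

for (int i = responseNumbers.size() - 1; i >= 0; i--) {
    if (responseNumbers.get(i) == responseNum) { // primitive int comparison, no unboxing
        // ... check fitParams as before
    }
}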