Issue with addAll method - java

Whenever Collection#addAll is called, it copies the argument collection's elements into the collection on which addAll was called.
Below is the code for case I:
if (parentData != 0) {
    if (nodeParentMap.get(parentData) != null) {
        nodeParentMap.put(newNodeData, parentData);
        // Create the new node's parent and point its next at that parent's own
        // chain; that way I get the parents as a linked list
        Node parent = Node.build(parentData);
        parent.next = parentListMap.get(parentData);
        parentListMap.put(newNodeData, parent);
    }
} else {
    // Code for the root
    nodeParentMap.put(newNodeData, parentData);
    parentListMap.put(newNodeData, null);
}
Here it takes N iterations to find the Nth parent.
Below is the code for case II:
if (parentData != 0) {
    if (nodeParentMap.get(parentData) != null) {
        nodeParentMap.put(newNodeData, parentData);
        // Here all the parents of a node are present in the ArrayList #parents,
        // so I can fetch any parent in O(1) because I know its index
        ArrayList<Integer> parents = new ArrayList<>();
        parents.add(parentData);
        parents.addAll(parentListMap.get(parentData));
        parentListMap.put(newNodeData, parents);
    }
} else {
    // Code for the root
    nodeParentMap.put(newNodeData, parentData);
    parentListMap.put(newNodeData, new ArrayList<>());
}
But in case II, when ArrayList#addAll is called, it copies the elements of the list passed to it. So, is there a way to execute ArrayList#addAll without calling System#arraycopy?
Thank you.

In general, you should not care. The difference will be unnoticeable unless you run this code millions of times. Write your code as cleanly as you can and make it show your intent. Do you actually have a performance issue? Have you profiled your code, and did the profiler show that you're spending a lot of time copying array elements?
Measure, don't guess. You need a way to tell there is an issue. And you need a way to tell whether it is gone after a code change.
Could you perhaps change your algorithm? If there's so much duplicate data and so much element copying, maybe a more efficient structure or algorithm would help. For example, you could use Iterables.concat() from Google Guava. The resulting code will be shorter, states your intent very cleanly, and does not copy anything: the returned Iterable just keeps references to the original collections and reads their elements lazily. Beware that if this is massively chained, you haven't actually helped yourself...
If after all this you still think you need to avoid the double array copy anyway, what stops you from doing this?
List<Integer> tempParents = parentListMap.get(parentData);
List<Integer> parents = new ArrayList<>(tempParents.size() + 1);
parents.add(parentData);
for (Integer i : tempParents) {
    parents.add(i);
}
Note that performance-wise, this code will generally be comparable to just calling addAll(): in ArrayList's overridden implementation of addAll() there is no iteration, just a hard array copy, which is intrinsified in the JVM and highly optimized. The version above is therefore only useful for short lists (probably), or to solve a memory issue rather than a performance one: the iterative version does not require any extra temporary memory, while the copying done inside addAll() does.
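To make the pre-sizing point concrete, here is a minimal sketch (prepend is a hypothetical helper name, not from the question): sizing the target list up front means addAll() triggers no growth copies, only the one bulk copy of the element references.

```java
import java.util.ArrayList;
import java.util.List;

class AddAllSketch {
    // Hypothetical helper: builds [head, tail...] with the backing array
    // allocated once up front, so addAll() never has to grow it.
    static List<Integer> prepend(int head, List<Integer> tail) {
        List<Integer> result = new ArrayList<>(tail.size() + 1); // pre-sized
        result.add(head);
        result.addAll(tail); // one bulk copy of the element references
        return result;
    }
}
```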

Related

Java Recursion - Alternative to passing-by-reference:

I'm migrating from C to Java and I'm having difficulties with recursion, especially because in Java you can't pass an argument by reference.
What I'm looking is not a solution/trick to force Java pass an argument by reference, but the recommended way to solve such a problem in Java.
Let's take the recursive node insertion in a binary tree:
void nodeInsert(Node n, int a) {
    if (n == null)
        n = new Node(a);
    ...
}
In C, by the end of the execution, the node n in the tree would point to the newly created node. In Java, however, n will still be null (because n is passed by value).
What is the suggested Java approach for such problems?
Some approaches I already tried:
Using a static object to keep track of the parent (issue complicates when using generics).
Passing the parent node as part of the function. It works but complicates the code a bit and doesn't look like a good solution.
Creating an additional member pointing to the parent node, but this is not a good solution either, as it increases the space required by O(n).
Any advice is welcome.
In Java, instead of using reference variables, we use return values and assign them to the variable that has to be changed.
Node nodeInsert(Node n, int a) {
    if (n == null) {
        n = new Node(a);
        return n;
    } else {
        // ...
        return nodeInsert(n, a); // this is how the recursion is done
    }
}
If you need more on recursion, http://www.toves.org/books/java/ch18-recurex/ will teach you properly.
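Putting the return-and-reassign idiom together, a runnable sketch for a binary search tree might look like this (BstSketch, insert, and contains are illustrative names, not from the question):

```java
class BstSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Instead of mutating the caller's reference, return the (possibly new)
    // subtree root and let the caller reassign it: root = insert(root, a);
    static Node insert(Node n, int a) {
        if (n == null) {
            return new Node(a); // where C code would have written through a Node**
        }
        if (a < n.value) {
            n.left = insert(n.left, a);
        } else if (a > n.value) {
            n.right = insert(n.right, a);
        }
        return n;
    }

    static boolean contains(Node n, int a) {
        if (n == null) return false;
        if (a == n.value) return true;
        return contains(a < n.value ? n.left : n.right, a);
    }
}
```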
A common way to implement this is to maintain the node relationships inside the node itself. Quite a lot of examples can be found in the implementations of various JDK data structures. So the Node is the container for the value and contains references to other nodes, depending on the data structure.
If you need a child->parent relationship between nodes, the Node class would look like
class Node<T> {
    T value;
    Node<T> parent;
}
In case of an insert, you create a new node, set its parent reference to the original node, and return the new Node as the result (this is optional, but not uncommon, so the caller has a handle on the new child):
Node<T> insert(Node<T> parent, T value) {
    Node<T> child = new Node<>();
    child.value = value;
    child.parent = parent;
    return child;
}
And yes, this adds a minor overhead of 4 bytes per Node (or 8 bytes on 64-bit JVMs without compressed pointers).
I propose the following solutions:
Implement a method in class Node that adds a child node. This makes use of the OO-possibility to encapsulate data and functionality together in a class.
Change nodeInsert to return the new node and add it to the parent in the caller (also mentioned in comments). The responsibility of nodeInsert is to create the node. This is a clear responsibility and the method signature shows what the result of the method is. If the creation is not more than new Node() it might not be worth to have a separate method for it.
You can pass a holder object that in turn references your new Node object
void nodeInsert(AtomicReference<Node> r, int a) {
    if (r.get() == null)
        r.set(new Node(a));
    ...
}
Or you could pass an array with space for one element.
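A self-contained version of the holder approach, for reference (Node here is a minimal stand-in for the question's class):

```java
import java.util.concurrent.atomic.AtomicReference;

class HolderSketch {
    static class Node {
        final int value;
        Node(int value) { this.value = value; }
    }

    // The callee can replace the Node the holder points at, and the
    // caller observes the change through the same holder object.
    static void nodeInsert(AtomicReference<Node> r, int a) {
        if (r.get() == null) {
            r.set(new Node(a));
        }
    }
}
```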
Months after posting this question, I realized yet another solution that is, in fact, already contemplated in Java design patterns but not mentioned here: the Null Object Pattern.
The downside is that each null occupies memory (in some cases, like large Red-Black trees, this could become significant).

Does a method call in a for loop declaration affect performance? [duplicate]

I am writing a game engine, in which a set of objects held in an ArrayList is iterated over using a for loop. Obviously, efficiency is rather important, so I was wondering about the efficiency of the loop.
for (String extension : assetLoader.getSupportedExtensions()) {
    // do stuff with the extension here
}
where getSupportedExtensions() returns an ArrayList of Strings. What I'm wondering is whether the method is called every time the loop iterates over a new extension. If so, would it be more efficient to do something like:
ArrayList<String> supportedExtensions = ((IAssetLoader<?>) loader).getSupportedExtensions();
for (String extension : supportedExtensions) {
    // stuff
}
? Thanks in advance.
By specification, the idiom
for (String extension : assetLoader.getSupportedExtensions()) {
    ...
}
expands into
for (Iterator<String> it = assetLoader.getSupportedExtensions().iterator(); it.hasNext();) {
    String extension = it.next();
    ...
}
Therefore the call you ask about occurs only once, at loop init time. It is the iterator object whose methods are being called repeatedly.
However, if you are honestly interested about the performance of your application, then you should make sure you're focusing on the big wins and not small potatoes like this. It is almost impossible to make a getter call stand out as a bottleneck in any piece of code. This goes double for applications running on HotSpot, which will inline that getter call and turn it into a direct field access.
No, the method assetLoader.getSupportedExtensions() is called only once before the first iteration of the loop, and is used to create an Iterator<String> used by the enhanced for loop.
The two snippets will have the same performance.
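One can see the single evaluation directly with a counting getter; this is a hedged sketch with made-up names, not the asker's code:

```java
import java.util.List;

class CallCountSketch {
    static int calls = 0;

    // Stand-in for assetLoader.getSupportedExtensions(), counting its calls.
    static List<String> getSupportedExtensions() {
        calls++;
        return List.of("png", "ogg", "ttf");
    }

    static int iterate() {
        int seen = 0;
        for (String extension : getSupportedExtensions()) { // evaluated once
            seen++;
        }
        return seen;
    }
}
```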
Direct cost.
Since, as people said before, the following
for (String extension : assetLoader.getSupportedExtensions()) {
    // stuff
}
transforms into
for (Iterator<String> it = assetLoader.getSupportedExtensions().iterator(); it.hasNext();) {
    String extension = it.next();
    // stuff
}
getSupportedExtensions() is called once and both of your code snippets have the same performance cost, but not the best performance possible to go through the List, because of...
Indirect cost
This is the cost of instantiating and using a new short-lived object, plus the cost of the next() method. The iterator() method prepares an instance of Iterator, so time must be spent instantiating that object and then (once it becomes unreachable) garbage-collecting it. The total indirect cost isn't much (about 10 instructions to allocate memory for the new object, a few instructions for the constructor, about 5 lines of ArrayList.Itr.next(), and removal of the object from Eden on a minor GC), but I personally prefer indexing (or even plain arrays):
ArrayList<String> supportedExtensions = ((IAssetLoader<?>) loader).getSupportedExtensions();
for (int i = 0; i < supportedExtensions.size(); i++) {
    String extension = supportedExtensions.get(i);
    // stuff
}
over iterators when I have to traverse the list frequently on the hot path of my application. Some other examples of standard Java code with hidden costs are certain String methods (substring(), trim(), etc.), NIO Selectors, and boxing/unboxing of primitives to store them in Collections.

Sorting Implementation, same test case

I have something like this (X stands for the different algorithms):
public class XAlgorithm {
    void sort(List l) { ... }
}
In the test class it looks as follows:
ArrayList array = new ArrayList(...); // original array

public static void main(String[] args) {
    AlgorithmsTest at = new AlgorithmsTest();
    at.testInsertSort();
    // when I add at.array.printAll() (a method printing all elements),
    // there are no changes to the original array, which is what I want
    at.testBubbleSort();
    at.testSelectSort();
    at.testShellSort();
}
void testBubbleSort() {
    ...
    ArrayList arrayBubble = new ArrayList(testBubble.sort(array));
    ...
}
The problem is that my result (time measured by System.currentTimeMillis()) is different when I launch, for example, the same algorithm twice in a row. It's also strange because the time is always greatest for the first algorithm in main, no matter which one it is, even though I copy the data in every method (by putting all the elements into a new array and then operating on that).
I even checked the array between every algorithm (see the comment in the code above) and it is correct, with no changes to it. So where is the problem?
Thanks in advance
Even though you stated you're making a copy of the array, it sounds like you're sorting in place and then making a copy of the array.
Therefore, the first time is going to take longest, but all subsequent runs have less work to do because the array is "sorted".
It also suggests that your sort algorithms have bugs in them, such that the first sort gets close (or is even right), but a subsequent sort then hits a corner case, causing a slight variation in the sorted array. I'd analyze the sort methods and make sure they're working as you intended.
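To rule out the sort-in-place effect, each timing run should sort its own fresh copy of the input. A minimal sketch, with Collections.sort standing in for the custom sort() methods in the question:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortTimingSketch {
    // Sorts a defensive copy, so the caller's list stays unsorted and
    // every algorithm under test starts from identical data.
    static long timeSort(List<Integer> original) {
        List<Integer> copy = new ArrayList<>(original);
        long start = System.nanoTime();
        Collections.sort(copy); // stand-in for a custom sort algorithm
        return System.nanoTime() - start;
    }
}
```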

Preventing allocation for ArrayList iterators in Java

So I am partway through writing my first game on Android, and after watching a lengthy presentation on optimising for games, I have been checking my allocations. I have managed to get rid of all in-game allocations apart from the ones made by ArrayList when it creates an implicit iterator for the for (Object o : m_arrayList) convention.
There are a fair few of these iterations/allocations, since all of my game objects, AI entities, etc. are stored in these lists for their ease of use.
So what are my options?
I could theoretically specify sensible upper bounds and use arrays, but I like the features of ArrayList, such as contains() and remove(), that keep code clean and simple.
Extend ArrayList and provide my own implementation of iterator() that returns a class member rather than allocating a new iterator each time it is used.
I would prefer option 2 for ease of use, but I had a little go at this and ran into problems. Does anyone have an example of what I described in option 2 above? I was having problems inheriting from a generic class (type clashes, apparently).
The second question to this then is are there any other options for avoiding these allocations?
And I guess as a bonus question, Does anyone know if ArrayList preallocates a number of memory slots for a certain amount (specified either in the ctor or as some shiftable value) and would never need to do any other allocations so long as you stay within those bounds? Even after a clear()?
Thanks in advance, sorry there is so much there but I think this information could be useful to a lot of people.
Use positional iteration.
for (int i = 0, n = arrayList.size(); i < n; ++i) {
    Object val = arrayList.get(i);
}
That's how it was done before Java 5.
For preallocation.
ArrayList arrayList = new ArrayList( numSlots );
or at runtime
arrayList.ensureCapacity( numSlots );
And for a bonus -> http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html
I'll answer the bonus question first: Yes, ArrayList does pre-allocate slots. It has a constructor that takes the desired number of slots as an argument, e.g. new ArrayList<Whatever>(1000). clear does not deallocate any slots.
Returning a shared iterator reference has a few problems. The main problem is that you have no way of knowing when the iterator should be reset to the first element. Consider the following code:
CustomArrayList<Whatever> list = ...
for (Whatever item : list) {
    doSomething();
}
for (Whatever item : list) {
    doSomethingElse();
}
The CustomArrayList class has no way of knowing that its shared iterator should be reset between the two loops. If you just reset it on every call to iterator(), then you'll have a problem here:
for (Whatever first : list) {
    for (Whatever second : list) {
        ...
    }
}
In this case you do not want to reset the iterator between calls.
@Alexander Progrebnyak's answer is probably the best way to iterate over a list without using an Iterator; just make sure you have fast random access (i.e. don't ever use a LinkedList).
I'd also like to point out that you are getting into some pretty heavy micro-optimization here. I'd suggest that you profile your code and find out if allocating iterators is a genuine problem before you invest much time in it. Even in games you should only optimize what needs optimizing, otherwise you can spend many, many days shaving a few milliseconds off a minute-long operation.
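The reset-on-every-iterator() failure mode described above can be demonstrated with a toy list; SharedIterList is hypothetical, written only to show the bug:

```java
import java.util.Iterator;

// Hypothetical list that reuses ONE iterator instance and resets it on
// every iterator() call -- exactly the scheme the answer warns about.
class SharedIterList implements Iterable<Integer> {
    private final int[] data;
    private int cursor;
    private final Iterator<Integer> shared = new Iterator<Integer>() {
        public boolean hasNext() { return cursor < data.length; }
        public Integer next() { return data[cursor++]; }
    };

    SharedIterList(int... data) { this.data = data; }

    public Iterator<Integer> iterator() {
        cursor = 0;    // reset the single shared iterator...
        return shared; // ...so an inner loop clobbers the outer loop's position
    }
}
```

With three elements, the nested loops below run the body only 3 times instead of 9, because finishing the inner loop exhausts the outer loop's iterator as well.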

Is ArrayList.size() method cached?

I was wondering, is the size() method that you can call on an existing ArrayList<T> cached?
Or is it preferable in performance critical code that I just store the size() in a local int?
I would expect that it is indeed cached, when you don't add/remove items between calls to size().
Am I right?
update
I am not talking about inlining or such things. I just want to know whether the method size() itself caches the value internally, or whether it computes the value every time it is called.
I don't think I'd say it's "cached" as such - but it's just stored in a field, so it's fast enough to call frequently.
The Sun JDK implementation of size() is just:
public int size() {
    return size;
}
Yes.
A quick look at the Java source would tell you the answer.
This is the implementation in OpenJDK version:
/**
 * Returns the number of elements in this list.
 *
 * @return the number of elements in this list
 */
public int size() {
    return size;
}
So it's as good as a method call is going to get. It's not very likely that HotSpot caches the value returned by this method, so if you're really THAT concerned, you can cache it yourself. Unless your profiling has shown that this is a bottleneck, though (not very likely), you should just concern yourself with readability rather than whether a simple method call that returns the value of a field is cached or not.
I don't know the answer for sure, but my guess would be: no. There is no way for a Java compiler, short of special casing ArrayList, to know that the functions you invoke will be non-mutating and that, as a result, the invocation of size() should return the same value. Therefore, I find it highly unlikely that a Java compiler will factor out repeated calls to size() and store them in a temporary value. If you need that level of optimization then you should store the value in a local variable yourself. Otherwise, yes, you will pay for the function invocation overhead associated with calling the size() method. Note, though, that the size() method is O(1) for an ArrayList (though the function call overhead is pretty hefty). Personally, I would factor out any calls to size() from loops and manually store them in a local where applicable.
Edit
Even though such an optimization cannot be performed by a Java compiler, it has been aptly pointed out that the JIT can inline the implementation of ArrayList.size() such that it only costs the same as a field access, without any additional method call overhead, so in effect the costs are negligible, although you might still save slightly by manually saving in a temporary (which could potentially eliminate a memory lookup and instead serve the variable out of a CPU register).
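The manual hoist described above is a one-liner; here is a small sketch (sum is an illustrative method, not from the question):

```java
import java.util.List;

class SizeHoistSketch {
    // Reads size() once instead of re-evaluating it in every loop test.
    static int sum(List<Integer> list) {
        int total = 0;
        int n = list.size(); // hoisted into a local
        for (int i = 0; i < n; i++) {
            total += list.get(i);
        }
        return total;
    }
}
```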
The obvious implementation of ArrayList would be to store the size internally in a field. I would be very surprised if it ever had to be computed, even after resizing.
Why would it need to be? ArrayList implements a List interface that is backed by an array, after all.
I would assume it to just have a size member, that is incremented when you insert things and decremented when you remove, and then it just returns that.
I haven't looked at more than the API docs now, though.
If caching the result of the size() method would noticeably improve performance (which it sometimes does - I regularly see ArrayList.size() as the top compiled method in my -Xprof output) then consider converting the whole list to an array for an even greater speedup.
Here's one trick that can work if you iterate over the list regularly but only rarely update it:
class FooProcessor {
    private Foo[] fooArray = null;
    private List<Foo> fooList = new ArrayList<Foo>();

    public void addFoo(Foo foo) {
        fooList.add(foo);
        fooArray = null; // invalidate the cached array
    }

    public void processAllFoos() {
        Foo[] foos = getFooArray();
        for (int i = 0; i < foos.length; ++i) {
            process(foos[i]);
        }
    }

    private Foo[] getFooArray() {
        if (fooArray == null) {
            Foo[] tmpArray = new Foo[fooList.size()];
            fooArray = fooList.toArray(tmpArray);
        }
        return fooArray;
    }
}
