How do I use a deep copy efficiently? - java

I am writing a class GenerateNeighbours in which an initial solution Sol is passed to the constructor. A number of "neighbours" of this solution are to be computed. In order to make sure that my initial solution does not change, I am making a deep copy of the initial solution every iteration.
Also, I am making use of an ArrayList S_Comp which was created using generateSComp(). At the start of each iteration, this version of S_Comp is to be used. Therefore, I am not calling generateSComp() each iteration, as I figured that this would be unnecessary and would take an excessive amount of time.
Since S_Comp changes within each iteration and I want to make sure that I start with the same S_Comp each iteration, I am also using a deep copy for this. However, I feel like my approach is not very efficient, since the running time of my program barely improved compared with calling generateSComp() every iteration.
Below you can find the code that I am talking about:
public GenerateNeighbours(DataFile testDataFile, ArrayList<Route> Sol, Random randomGen) throws IOException
{
    solution = (ArrayList<Route>) DeepCopy.copy(Sol);
    S_Comp = new ArrayList<MyNodesData>();
    S_Comp = generateSComp(testDataFile, solution, S_Comp);
    ArrayList<MyNodesData> originalS_Comp = (ArrayList<MyNodesData>) DeepCopy.copy(S_Comp);
    rewards = new ArrayList<Integer>();
    neighbours = new ArrayList<ArrayList<Route>>();

    //---------Generating neighborhood solutions---------
    for (int i = 0; i < parameterSet.getBeta(); i++)
    {
        solution = (ArrayList<Route>) DeepCopy.copy(Sol);                // <-- making a deep copy every iteration
        S_Comp = (ArrayList<MyNodesData>) DeepCopy.copy(originalS_Comp); // <-- making a deep copy every iteration
        // S_Comp = new ArrayList<MyNodesData>();
        // generateSComp(testDataFile, solution, S_Comp); // <-- should be unnecessary, since S_Comp is the same at the start of every iteration
        step1to6(i);
        rewards.add(getSolutionReward(solution));
        neighbours.add(solution);
    }
}
Since I am making two deep copies every iteration, I feel like my program is rather inefficient. Does anybody have a suggestion on how to use my deep copies more efficiently? Or is this just the way it should be done?
(For the DeepCopy class I used the one as suggested at http://javatechniques.com/blog/faster-deep-copies-of-java-objects/)
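One direction worth exploring (sketched below, not taken from the question) is to replace the serialization-based DeepCopy with hand-written copy constructors: serializing and deserializing the whole object graph on every call is typically far more expensive than copying fields directly. The Route fields used here (nodes, length) are hypothetical, since the real class is not shown.

import java.util.ArrayList;

// Rough sketch with hypothetical Route fields; adapt to the real class.
class Route {
    ArrayList<Integer> nodes = new ArrayList<>();
    double length;

    Route() { }

    Route(Route other) {                          // copy constructor: duplicates mutable state
        this.nodes = new ArrayList<>(other.nodes);
        this.length = other.length;
    }
}

class Copies {
    // Deep copy without serialization; usually much cheaper than an
    // ObjectOutputStream/ObjectInputStream round trip per call.
    static ArrayList<Route> deepCopy(ArrayList<Route> original) {
        ArrayList<Route> copy = new ArrayList<>(original.size());
        for (Route r : original) {
            copy.add(new Route(r));
        }
        return copy;
    }
}

Inside the loop you would then call Copies.deepCopy(Sol) instead of DeepCopy.copy(Sol), and the same idea applies to MyNodesData. Whether this helps noticeably depends on how much time step1to6(i) itself takes, so measure before and after.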

Related

Issue with addAll method

Whenever Collection#addAll is called, it creates a copy of the argument list and then attaches it to the collection on which addAll was called.
Below is the code for case I :
if (parentData != 0) {
    if (nodeParentMap.get(parentData) != null) {
        nodeParentMap.put(newNodeData, parentData);
        // Create the new node's parent and assign its next to its own parent;
        // that way I get the parents as a linked list.
        Node parent = Node.build(parentData);
        parent.next = parentListMap.get(parentData);
        parentListMap.put(newNodeData, parent);
    }
} else {
    // Code for root
    nodeParentMap.put(newNodeData, parentData);
    parentListMap.put(newNodeData, null);
}
Here it takes N iterations to find the Nth parent.
Below is the code for case II:
if (parentData != 0) {
    if (nodeParentMap.get(parentData) != null) {
        nodeParentMap.put(newNodeData, parentData);
        // Here all the parents of a node are kept in the ArrayList 'parents',
        // so that I can fetch any parent in O(1) since I know the index.
        ArrayList<Integer> parents = new ArrayList<>();
        parents.add(parentData);
        parents.addAll(parentListMap.get(parentData));
        parentListMap.put(newNodeData, parents);
    }
} else {
    // Code for root
    nodeParentMap.put(newNodeData, parentData);
    parentListMap.put(newNodeData, new ArrayList<>());
}
But in case II, when ArrayList#addAll is called, it creates a copy of the list passed and then attaches it. So, is there a way to get the effect of ArrayList#addAll without the copying done by System#arraycopy?
Thank you.
In general, you should not care. The difference will be unnoticeable unless you run this code millions of times. Write your code as cleanly as possible and make it show your intent. Do you actually have a performance issue? Have you profiled your code, and did the profiler show that you spend a lot of time copying array elements?
Measure, don't guess. You need a way to tell there is an issue. And you need a way to tell whether it is gone after a code change.
If there is so much duplicate data and so much element copying, could you perhaps use a more efficient structure or algorithm? For example, you could use Iterables.concat() from Google Guava. The resulting code is shorter, states your intent very cleanly and does not copy anything: the concatenated view only holds references to the original lists and fetches their elements lazily. Beware that if such views are chained massively, you haven't actually helped yourself...
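For illustration, a minimal sketch of what that could look like, assuming Guava is on the classpath and reusing the parentListMap and parentData names from case II (nothing is copied up front; the view reads the underlying lists lazily each time it is iterated):

import com.google.common.collect.Iterables;
import java.util.Collections;

Iterable<Integer> parents =
        Iterables.concat(Collections.singletonList(parentData),
                         parentListMap.get(parentData));

for (Integer p : parents) {
    // walks parentData followed by all of its ancestors, with no combined list ever built
}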
If after all this you still think you need to avoid the double array copy anyway, what stops you from doing this?
List<Integer> tempParents = parentListMap.get(parentData);
List<Integer> parents = new ArrayList<>(tempParents.size() + 1);
parents.add(parentData);
for (Integer i : tempParents) {
    parents.add(i);
}
Note that, performance-wise, this code will generally be comparable to just calling addAll(), since ArrayList's overridden implementation of addAll() does no iteration, just hard array copying, which is intrinsified in the JVM and highly optimized. The manual version above is therefore only useful for short lists (probably), or to address a memory concern rather than a performance one: the iterative version needs no extra temporary memory, while the copy made by addAll() does.
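To make the "no iteration, only array copying" point concrete, here is a simplified, self-contained re-implementation of the addAll() shape. It is not the real JDK source, just the same two-step pattern: one copy out of the argument collection, then one System.arraycopy into the backing array.

import java.util.Arrays;
import java.util.Collection;

class TinyList {
    private Object[] elementData = new Object[10];
    private int size;

    public boolean addAll(Collection<?> c) {
        Object[] a = c.toArray();                           // first copy: collection -> temporary array
        int numNew = a.length;
        if (size + numNew > elementData.length) {           // grow the backing array if needed
            elementData = Arrays.copyOf(elementData,
                    Math.max(size + numNew, elementData.length * 2));
        }
        System.arraycopy(a, 0, elementData, size, numNew);  // second copy, intrinsified by the JVM
        size += numNew;
        return numNew != 0;
    }
}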

Does a method call in a for loop declaration affect performance? [duplicate]

I am writing a game engine, in which a set of objects held in an ArrayList is iterated over using a for loop. Obviously, efficiency is rather important, so I was wondering about the efficiency of the loop.
for (String extension : assetLoader.getSupportedExtensions()) {
    // do stuff with the extension here
}
Where getSupportedExtensions() returns an ArrayList of Strings. What I'm wondering is whether the method is called every time the loop moves on to a new extension. If so, would it be more efficient to do something like:
ArrayList<String> supportedExtensions = ((IAssetLoader<?>) loader).getSupportedExtensions();
for (String extension : supportedExtensions) {
    // stuff
}
? Thanks in advance.
By specification, the idiom
for (String extension : assetLoader.getSupportedExtensions()) {
    ...
}
expands into
for (Iterator<String> it = assetLoader.getSupportedExtensions().iterator(); it.hasNext();)
{
    String extension = it.next();
    ...
}
Therefore the call you ask about occurs only once, at loop init time. It is the iterator object whose methods are being called repeatedly.
However, if you are honestly interested about the performance of your application, then you should make sure you're focusing on the big wins and not small potatoes like this. It is almost impossible to make a getter call stand out as a bottleneck in any piece of code. This goes double for applications running on HotSpot, which will inline that getter call and turn it into a direct field access.
No, the method assetLoader.getSupportedExtensions() is called only once before the first iteration of the loop, and is used to create an Iterator<String> used by the enhanced for loop.
The two snippets will have the same performance.
Direct cost.
Since, as people said before, the following
for (String extension : assetLoader.getSupportedExtensions()) {
    // stuff
}
transforms into
for (Iterator<String> it = assetLoader.getSupportedExtensions().iterator(); it.hasNext();) {
    String extension = it.next();
    // stuff
}
getSupportedExtensions() is called only once, and both of your code snippets have the same performance cost. It is, however, not the best possible way to go through the List, because of the...
Indirect cost
This is the cost of instantiating and using a new short-lived object, plus the cost of the next() method. The iterator() method prepares an instance of Iterator, so time has to be spent instantiating that object and then (once it becomes unreachable) garbage-collecting it. The total indirect cost isn't large (roughly 10 instructions to allocate memory for the new object, a few instructions in its constructor, about 5 lines of ArrayList.Itr.next(), and the removal of the object from Eden on a minor GC), but I personally prefer indexing (or even plain arrays):
ArrayList<String> supportedExtensions = ((IAssetLoader<?>) loader).getSupportedExtensions();
for (int i = 0; i < supportedExtensions.size(); i++) {
    String extension = supportedExtensions.get(i);
    // stuff
}
over iterators when I have to go through the list frequently on the main path of my application. Some other examples of standard Java code with hidden costs are some String methods (substring(), trim(), etc.), NIO Selectors, and the boxing/unboxing of primitives to store them in Collections.
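As a tiny, self-contained illustration of one such hidden cost (a sketch, not a benchmark): storing primitives in a collection boxes every value into an Integer object, while a plain int[] needs no per-element allocation at all.

import java.util.ArrayList;
import java.util.List;

public class BoxingCost {
    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) {
            boxed.add(i);            // autoboxing: each add() allocates an Integer (outside the small cached range)
        }

        int[] primitive = new int[1000000];
        for (int i = 0; i < primitive.length; i++) {
            primitive[i] = i;        // no per-element allocation
        }

        System.out.println(boxed.size() + " " + primitive.length);
    }
}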

Sorting Implementation, same test case

I have something like this (X stands for the different algorithms):
public class XAlgorithm {
    public List sort(List l) { ... }
}
In the test class it looks as follows:
ArrayList array = new ArrayList(...); // original array

public static void main(String[] args) {
    AlgorithmsTest at = new AlgorithmsTest();
    at.testInsertSort();
    // when I add at.array.printAll() here (a method printing all elements),
    // there are no changes to the original array, which is what I want
    at.testBubbleSort();
    at.testSelectSort();
    at.testShellSort();
}

public void testBubbleSort() {
    ...
    ArrayList arrayBubble = new ArrayList(testBubble.sort(array));
    ...
}
The problem is that my result (time measured with System.currentTimeMillis()) is different when I launch the same algorithm, for example, two times in a row. It's also strange because even when I do the copying in every method (by putting all the elements into a new array and then operating on that), it still behaves wrongly. The time is always greatest for the first algorithm in main, no matter which one it is.
I even checked the array between every algorithm (see the comment in the code above) and it is correct, with no changes to it, so where is the problem?
Thanks in advance
Even though you stated you're making a copy of the array, it sounds like you're sorting in place and then making a copy of the array.
Therefore, the first time is going to take longest, but all subsequent runs have less work to do because the array is "sorted".
It also suggests that your sort algorithms have bugs in them, such that the first sort gets close (or is right), but a subsequent sort then hits a corner case and produces a slight variation in the sorted array. I'd analyze my sort methods and make sure they're working as you intended.
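If in-place sorting is indeed the cause, the fix is simply to copy first and sort the copy, so the original array is never touched. A minimal sketch reusing the question's names (testBubble, array); the timing and printing details are assumed, not taken from the question:

public void testBubbleSort() {
    ArrayList arrayBubble = new ArrayList(array);   // defensive copy BEFORE sorting
    long start = System.currentTimeMillis();
    testBubble.sort(arrayBubble);                   // the sort only ever touches the copy
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("bubble sort: " + elapsed + " ms");
}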

Preventing allocation for ArrayList iterators in Java

So I am partway through writing my first game on Android and, after watching a lengthy presentation on optimising for games, I have been checking my allocations. I have managed to get rid of all in-game allocations apart from the ones made by ArrayList when it creates an implicit iterator for the for (Object o : m_arrayList) convention.
There are a fair few of these iterations/allocations, since all of my game objects, AI entities, etc. are stored in these lists for ease of use.
So what are my options?
Option 1: I could, theoretically, specify sensible upper bounds and use plain arrays, but I like the features of ArrayList such as contains and remove that keep the code clean and simple.
Option 2: Override ArrayList and provide my own implementation of iterator() that returns a class member rather than allocating a new iterator each time it is used.
I would prefer option 2 for ease of use, but I had a little go at this and ran into problems. Does anyone have an example of what I described in option 2 above? I was having problems inheriting from a generic class (type clashes, apparently).
The second question, then, is: are there any other options for avoiding these allocations?
And I guess, as a bonus question: does anyone know whether ArrayList preallocates a number of memory slots for a given capacity (specified either in the constructor or as some adjustable value) and then never needs to do any further allocations as long as you stay within those bounds? Even after a clear()?
Thanks in advance, sorry there is so much there but I think this information could be useful to a lot of people.
Use positional iteration.
for ( int i = 0, n = arrayList.size( ); i < n; ++i )
{
    Object val = arrayList.get( i );
}
That's how it was done before Java 5.
For preallocation.
ArrayList arrayList = new ArrayList( numSlots );
or at runtime
arrayList.ensureCapacity( numSlots );
And for a bonus -> http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html
I'll answer the bonus question first: Yes, ArrayList does pre-allocate slots. It has a constructor that takes the desired number of slots as an argument, e.g. new ArrayList<Whatever>(1000). clear does not deallocate any slots.
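If you want to verify this yourself, here is a quick diagnostic sketch. It peeks at the private elementData field of OpenJDK's ArrayList via reflection, so it is a hack for inspection only; recent JDKs may additionally need --add-opens java.base/java.util=ALL-UNNAMED.

import java.lang.reflect.Field;
import java.util.ArrayList;

public class ClearKeepsCapacity {
    public static void main(String[] args) throws Exception {
        ArrayList<Integer> list = new ArrayList<>(1000);
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        list.clear();

        // Look at the private backing array after clear().
        Field f = ArrayList.class.getDeclaredField("elementData");
        f.setAccessible(true);
        Object[] backing = (Object[]) f.get(list);
        System.out.println("size after clear():           " + list.size());     // 0
        System.out.println("backing length after clear(): " + backing.length);  // still 1000
    }
}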
Returning a shared iterator reference has a few problems. The main problem is that you have no way of knowing when the iterator should be reset to the first element. Consider the following code:
CustomArrayList<Whatever> list = ...
for (Whatever item : list) {
    doSomething();
}
for (Whatever item : list) {
    doSomethingElse();
}
The CustomArrayList class has no way of knowing that its shared iterator should be reset between the two loops. If you just reset it on every call to iterator(), then you'll have a problem here:
for (Whatever first : list) {
    for (Whatever second : list) {
        ...
    }
}
In this case you do not want to reset the iterator between calls.
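To make that concrete, here is a minimal sketch of the reset-on-iterator() variant of option 2 (the class and its names are made up for illustration). It does remove the per-loop allocation, but it breaks exactly the nested loop shown above: the inner loop resets the iterator the outer loop is still using, so the outer loop stops after its first element.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.NoSuchElementException;

class SharedIteratorList<E> extends ArrayList<E> {

    private final ReusableIterator it = new ReusableIterator();

    @Override
    public Iterator<E> iterator() {
        it.cursor = 0;   // reset on every request...
        return it;       // ...and always hand back the same instance
    }

    private class ReusableIterator implements Iterator<E> {
        int cursor;

        @Override
        public boolean hasNext() {
            return cursor < size();
        }

        @Override
        public E next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return get(cursor++);
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}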
#Alexander Progrebnyak's answer is probably the best way to iterate over a list without using an Iterator; just make sure you have fast random access (i.e. don't ever use a LinkedList).
I'd also like to point out that you are getting into some pretty heavy micro-optimization here. I'd suggest that you profile your code and find out if allocating iterators is a genuine problem before you invest much time in it. Even in games you should only optimize what needs optimizing, otherwise you can spend many, many days shaving a few milliseconds off a minute-long operation.

How bad is declaring arrays inside a for loop in Java?

I come from a C background, so I admit that I'm still struggling with letting go of memory management when writing in Java. Here's one issue that's come up a few times that I would love to get some elaboration on. Here are two ways to write the same routine, the only difference being when double[] array is declared:
Code Sample 1:
double[] array;
for (int i = 0; i < n; ++i) {
    array = calculateSomethingAndReturnAnArray(i);
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}
Code Sample 2:
for (int i = 0; i < n; ++i) {
    double[] array = calculateSomethingAndReturnAnArray(i);
    if (someFunctionOnArrays(array)) {
        // DO ONE THING
    } else {
        // DO SOME OTHER THING
    }
}
Here, private double[] calculateSomethingAndReturnAnArray(int i) always returns an array of the same length. I have a strong aversion to Code Sample 2 because it creates a new array for each iteration when it could just overwrite the existing array. However, I think this might be one of those times when I should just sit back and let Java handle the situation for me.
What are the reasons to prefer one of the ways over the other or are they truly identical in Java?
There's nothing special about arrays here, because you're not allocating storage for the array in the declaration; you're just introducing a new variable. It's equivalent to:
Object foo;
for (...) {
    foo = func(...);
}
In the case where you create the variable outside the loop, the variable (which holds the location of the thing it refers to) will only ever be allocated once. In the case where you create the variable inside the loop, the variable may be reallocated in each iteration, but my guess is that the compiler or the JIT will fix that in an optimization step.
I'd consider this a micro-optimization. If you're running into problems with this segment of your code, you should make decisions based on measurements rather than on the specs alone; if you're not running into issues with it, you should do the semantically correct thing and declare the variable in the scope that makes sense.
See also this similar question about best practices.
A declaration of a local variable without an initializing expression will do NO work whatsoever. The work happens when the variable is initialized.
Thus, the following are identical with respect to semantics and performance:
double[] array;
for (int i = 0; i < n; ++i) {
    array = calculateSomethingAndReturnAnArray(i);
    // ...
}
and
for (int i = 0; i < n; ++i) {
    double[] array = calculateSomethingAndReturnAnArray(i);
    // ...
}
(You can't even quibble that the first case allows the array to be used after the loop ends. For that to be legal, array has to have a definite value after the loop, and it doesn't unless you add an initializer to the declaration; e.g. double[] array = null;)
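For example, this refuses to compile unless the declaration gets an initializer such as double[] array = null;:

double[] array;
for (int i = 0; i < n; ++i) {
    array = calculateSomethingAndReturnAnArray(i);
}
// javac error: "variable array might not have been initialized" --
// the compiler cannot prove that the loop body ran even once.
System.out.println(array.length);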
To elaborate on @Mark Elliot's point about micro-optimization:
This is really an attempt to optimize rather than a real optimization, because (as I noted) it should have no effect.
Even if the Java compiler actually emitted some non-trivial executable code for double[] array;, the chances are that the time to execute would be insignificant compared with the total execution time of the loop body, and of the application as a whole. Hence, this is most likely to be a pointless optimization.
Even if this is a worthwhile optimization, you have to consider that you have optimized for a specific target platform; i.e. a particular combination of hardware and JVM version. Micro-optimizations like this may not be optimal on other platforms, and could in theory be anti-optimizations.
In summary, you are most likely wasting your time if you focus on things like this when writing Java code. If performance is a concern for your application, focus on the MACRO level performance; e.g. things like algorithmic complexity, good database / query design, patterns of network interactions, and so on.
Both create a new array for each iteration. They have the same semantics.
