Optimizing Java Array Copy - java

So for my research group I am attempting to convert some old C++ code to Java and am running into an issue where in the C++ code it does the following:
method(array+i, other parameters)
Now I know that Java does not support pointer arithmetic, so I got around this by copying the subarray from array+i to the end of array into a new array, but this causes the code to run horribly slowly (i.e., 100x slower than the C++ version). Is there a way to get around this? I saw someone mention a built-in method on here, but is that any faster?

Not only does your code become slower, it also changes the semantics of what is happening: when you make the call in C++, no array is copied, so any change the method makes to the array happens in the original, not in a throw-away copy.
To achieve the same effect in Java change the signature of your function as follows:
void method(array, offset, other parameters)
Now the caller has to pass the position in the array that the method should consider the "virtual zero" of the array. In other words, instead of writing something like
for (int i = 0 ; i != N ; i++)
...
you would have to write
for (int i = offset ; i != offset+N ; i++)
...
This would preserve the C++ semantics of passing an array to a function.
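As a rough illustration of the offset approach, here is a minimal sketch. The name scaleInPlace, the count parameter, and the doubling logic are made up for the example, since the question does not show what the original method does.

```java
import java.util.Arrays;

class OffsetDemo {
    // Hypothetical stand-in for the ported C++ function: doubles `count`
    // elements starting at `offset`, writing into the caller's array.
    static void scaleInPlace(int[] array, int offset, int count) {
        for (int i = offset; i != offset + count; i++) {
            array[i] *= 2;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        // Plays the role of the C++ call method(data + 2, ...): no copy is made.
        scaleInPlace(data, 2, 3);
        System.out.println(Arrays.toString(data)); // [1, 2, 6, 8, 10]
    }
}
```

Because the method writes through the original reference, the caller sees the changes, just as it would with the pointer version in C++.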

The C++ function probably relied on processing from the beginning of the array. In Java it should be configured to run from an offset into the array so the array doesn't need to be copied. Copying the array, even with System.arraycopy, would take a significant amount of time.
It could be defined as a Java method with something like this:
void method(<somearraytype> array, int offset, other parameters)
Then the method would start at the offset into the array, and it would be called something like this:
method(array, i, other parameters);

If you wish to pass a sub-array to a method, an alternative to copying the sub-array into a new array would be to pass the entire array with an additional offset parameter that indicates the first relevant index of the array. This would require changes in the implementation of method, but if performance is an issue, that's probably the most efficient way.

The right way to handle this is to refactor the method to take the signature
method(int[] array, int i, other parameters)
so that you pass the whole array (by reference), and then tell the method where to start its processing from. Then you don't need to do any copying.

Related

Can a Java array be passed to a C/C++ function which takes an array?

I am learning about JNI. Let us say I have a C/C++ library function that takes an int* array as input (we assume int is 4 bytes and signed, just as it is in Java), that is, an array passed as a pointer. Would it be possible to pass a Java int array to such a function through JNI without any copying (obviously we drop the length part of the Java array while doing so)? Is a direct ByteBuffer the only viable method for doing such things?
A direct ByteBuffer would be one way of avoiding copying, as you mention yourself.
If you pass a Java array you will need to call Get<Primitive>ArrayElements, which may or may not copy (or Get<Primitive>ArrayRegion, but that would make no sense since it always copies).
There's also GetPrimitiveArrayCritical which you can use if you only need access to the elements for a "short" amount of time, and don't need to perform any other JNI calls before releasing the elements. It is "more likely" than Get<Primitive>ArrayElements to not copy.
An example:
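jsize len = env->GetArrayLength(myIntArray);
// GetPrimitiveArrayCritical returns void*, so C++ needs an explicit cast.
jint *elements = static_cast<jint *>(env->GetPrimitiveArrayCritical(myIntArray, NULL));
if (elements != NULL) {
    // Use elements; make no other JNI calls until the release below.
    env->ReleasePrimitiveArrayCritical(myIntArray, elements, 0);
}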
See Oracle's JNI documentation.
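For the direct ByteBuffer route, the Java side might look something like the sketch below. The class and method names are invented for the example, and the native call itself is left out; on the native side you would typically obtain the address of the ByteBuffer with GetDirectBufferAddress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class DirectBufferDemo {
    // Allocates off-heap storage for `count` ints in the platform's byte
    // order, so native code sees the same layout as a plain int array.
    static ByteBuffer allocateNative(int count) {
        return ByteBuffer.allocateDirect(count * Integer.BYTES)
                .order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer bytes = allocateNative(4);
        IntBuffer ints = bytes.asIntBuffer(); // int view over the same memory
        ints.put(0, 42);
        System.out.println(bytes.isDirect() + " " + ints.get(0)); // true 42
    }
}
```

Setting the byte order before creating the int view matters, because the view keeps whatever order the ByteBuffer had at that moment.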

Strange Arrays.asList() behavior

I have this method:
//not related to the error, but included for reference
ArrayList<ArrayList<Color>> a = new ArrayList<ArrayList<Color>>();
void addColorToList(float[] j) //this array always length 3
{
float[] k = Arrays.copyOf(j, 3);
Arrays.sort(k);
//Error in the following line
a.get(Arrays.asList(j).indexOf(k[0])).add(new Color(j[0], j[1], j[2]));
}
and this error:
Exception in thread "AWT-EventQueue-1"
java.lang.ArrayIndexOutOfBoundsException: -1
I've determined that my code always calls a.get() with -1, because Arrays.asList(j).indexOf(k[0]) does not find the element. However, I cannot figure out why this doesn't work as I would expect it to. I tried printing out the result of Arrays.asList(j), but I'm not really sure what to make of the result: [[F@307af497]. Can anybody tell me what the issue is?
Let's start with this:
I tried printing out the result of Arrays.asList(j), but I'm not really sure what to make of the result: [[F@307af497].
You are printing a list using (effectively) the toString() method. So ...
The outer '[' and ']' are from the list formatting.
You have a list consisting of one element.
That element is an array of float. (The [F@307af497 is produced by Object.toString(), and [F is how the type float[] is rendered ...)
This is actually an important clue. Arrays.asList(float[]) is returning a "list of float[]" ...
But why?
Well, because that's what it is supposed to do!!
The Arrays.asList signature is <T> List<T> asList(T... a). That means it expects either an explicit T[] ... or a sequence of zero or more T instances, which the compiler will then wrap as a T[]. That array is then wrapped as a List<T> (roughly speaking).
The critical thing here is that the actual type T must be a reference type ... not a primitive type.
But your code seems to be expecting an overloaded method like this Arrays.asList(float ...) ... and expecting that that will give you your float[] wrapped as a List<Float>.
Unfortunately, there is no such overload for asList.
So what is actually happening is that:
your call is binding to Arrays.<float[]>asList(float[] ...)
the varargs is causing j to be wrapped in an array; i.e. equivalent to new float[][]{j}
the result is an instance of List<float[]> ... with one element.
So what is the solution?
In general ...
One solution would be to represent your floats as a Float[] rather than a float[]. Ideally, you would push this change back through the code that created the array in the first place, etcetera.
A second solution would be to convert the float[] to a Float[] before calling asList. The simple way to do that is with a simple loop. (There may also be a 3rd-party library for doing this.) The downsides are:
the conversion needs to happen each time you call this method which could be expensive if you call it a lot, and
there is no connection between the original array and the array that you have wrapped ... if you wanted to update the array through the list wrapper.
But in this case, the best solution is to simply replace this:
Arrays.asList(j).indexOf(k[0])
with a simple loop that iterates over the original float[] testing for an entry that matches k[0].
The moral of this story: you can easily shoot yourself in the foot by striving for an elegant solution in Java.
Often, dumb code is better. (Both faster, and more correct.)
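Here is a small sketch of both the surprise and the loop-based fix. The helper name indexOf is my own, and it compares with ==, so a NaN entry would never match.

```java
import java.util.Arrays;
import java.util.List;

class AsListDemo {
    // Plain loop replacement for Arrays.asList(values).indexOf(target);
    // returns -1 when the target is absent.
    static int indexOf(float[] values, float target) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] j = {0.5f, 0.1f, 0.9f};
        List<float[]> wrapped = Arrays.asList(j); // one element: the array itself
        System.out.println(wrapped.size());        // 1
        System.out.println(wrapped.indexOf(0.1f)); // -1
        System.out.println(indexOf(j, 0.1f));      // 1
    }
}
```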

When taking an element from an array in an object, does Java have to read the whole array?

For example given the following methods:
public double[] getCoord(){
return coord;
}
public double getCoord(int variable){
return coord[variable];
}
Would it be better to call
object.getCoord()[1]
or
object.getCoord(1)
and why?
Although there is no performance difference, the second method presents a far superior API, because Java arrays are always mutable. The first API lets your users write
object.getCoord()[1] = 12.345;
and modify the internals of your object behind your back. This is never a good thing: even non-malicious users could do things you never intended, simply by mistake.
In terms of performance, it doesn't matter. The first method returns a reference to the array, not a copy.
That said, the second method protects the array from being modified outside the class.
No, Java doesn't read the whole array when you use the subscript operator ([]). As for whether it is better to grab the array through the accessor and then index into it, or to call a method that does the indexing for you, the difference is probably negligible. Either way you still incur the (minimal) overhead of invoking a method and returning a result.
I am going to guess that #2 is marginally slower because a parameter has to be pushed onto the stack prior to the call to getCoord(int). Not much in it though.
Neither has to read the whole array.
Both are slower than direct array access, for example coord[1].
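To see the encapsulation point from the answers above in action, here is a minimal sketch. The class name CoordHolder and its two-element constructor are assumptions; only the two getCoord methods come from the question.

```java
class CoordHolder {
    private final double[] coord;

    CoordHolder(double x, double y) {
        this.coord = new double[] {x, y};
    }

    // Returns a reference to the internal array, not a copy.
    double[] getCoord() {
        return coord;
    }

    // Returns a single element; callers cannot write through this.
    double getCoord(int variable) {
        return coord[variable];
    }

    public static void main(String[] args) {
        CoordHolder object = new CoordHolder(1.0, 2.0);
        object.getCoord()[1] = 12.345;          // silently changes the internals
        System.out.println(object.getCoord(1)); // 12.345
    }
}
```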

Java - Efficient way to access an array

It's been some time since I last coded in Java, but I need a little hint here.
We have a simple function - note that this is C:
void update(double *source, double *target, int n) {
for(int i = 0; i < n; ++i)
target[i] = source[i] * i; // well, actually a bit more complicated, just some kind of calculation
}
So, now I need to recode this function in Java - efficiently. My problems are:
Java has of course no pointers, so how can I pass the arrays efficiently without having large amounts of memory copy operations due to call by value
Which data structure is the best to store the arrays
Note that source and target are large arrays, storing up to 1 million elements
In Java it's almost the same thing:
static void update(double[] source, double[] target, int n)
{
for (int i = 0; i < n; i++)
target[i] = source[i] * i;
}
You don't copy any memory. When you pass an array into this function, it's passing a reference to an array by value.
Java always passes method arguments by value. Arrays and instances of user-defined classes are reference types, though, so what gets passed by value is the reference to the array or object, never the array or object itself.
So if you have a class that looks like:
class Foo
{
int[] A; // For argument's sake, let's say this always contains 1 million items
}
and you have a function that you can call on it:
static void Bar(Foo f)
{
....
}
It only passes the reference to the Foo, it doesn't make a copy of the data at all.
An array is passed as a reference (more precisely, the value of the reference is passed), so no new copy of the array is made.
Code will be quite similar:
void update(double source[], double target[], int n)
{
for (int i = 0; i < n; i++)
target[i] = source[i] * i;
}
What do you mean by 'data structure for array'? An array is itself a data structure. You have to access each element anyway for the type of operation you are trying to do, so an array is a good choice, I guess. You may want to look at ArrayList.
As some others have already pointed out by-ref / by-value is a C/C++ thing and not applicable to Java.
Now unless you're doing some real native coding passing these arrays C/C++ to / fro Java:
Given that in the C code the array is passed as a pointer (void update(double *source, double *target, int n)), I assume its size is dynamic. If so, your signature in Java should be void update(List<Double> source, List<Double> target, int n). Let the caller decide if it's an ArrayList or Vector or LinkedList or ...
But if you're into some JNI (passing these arrays C/C++ to / fro Java) then perhaps we need to consider other aspects.
The Java Spec says that everything in Java is pass-by-value. There is no such thing as "pass-by-reference" in Java.
But, don't be fooled by this, the internal working is pretty complex, and you can actually manipulate the arrays the way you want.
Verbatim from Oracle's java Tutorials:
Reference data type parameters, such as objects, are also passed into
methods by value. This means that when the method returns, the
passed-in reference still references the same object as before.
However, the values of the object's fields can be changed in the
method, if they have the proper access level.
Java copies and passes the reference by value, not the object. Thus, method manipulation will alter the objects, since the references point to the original objects. But since the references are copies, swaps will fail.
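A quick sketch of that difference, with made-up method names fill and swap:

```java
class PassByValueDemo {
    // Writes through the copied reference: the caller's array changes.
    static void fill(double[] target, double value) {
        for (int i = 0; i < target.length; i++) {
            target[i] = value;
        }
    }

    // Swaps only the local copies of the references: the caller sees nothing.
    static void swap(double[] a, double[] b) {
        double[] tmp = a;
        a = b;
        b = tmp;
    }

    public static void main(String[] args) {
        double[] x = {1.0};
        double[] y = {2.0};
        fill(x, 7.0);
        swap(x, y);
        System.out.println(x[0] + " " + y[0]); // 7.0 2.0
    }
}
```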
The code to use is similar and straightforward:
void update(double source[], double target[], int n)
{
for (int i = 0; i < n; i++)
target[i] = source[i] * i;
}
For a better understanding of what I mentioned, have a look at this question: Is Java "pass-by-reference" or "pass-by-value"?
As to your question of data structures, use an array. Looking at your snippet, it is clear that you need random access, so just stick to good ol' arrays.
Java uses references for arrays (and other objects). The value of the reference, not the array itself, is passed in method calls, with cost similar to C pointers. If you don't need to expand them dynamically, simple arrays are the fastest data structure to use.
Otherwise, consider ArrayList<Double>. But these are much more expensive, in both speed and size, because each double is "boxed" in a Double object.
A third alternative is to use a relevant resizable list class from a library with high-performance primitive collections, like Trove's TDoubleArrayList.
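To make the boxing cost concrete, this is roughly what the same update looks like on lists. It is only a sketch for comparison with the double[] version, not a recommendation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BoxedUpdateDemo {
    // Same calculation as the array version, but every get() unboxes a
    // Double and every set() boxes the result into a Double again.
    static void update(List<Double> source, List<Double> target, int n) {
        for (int i = 0; i < n; i++) {
            target.set(i, source.get(i) * i);
        }
    }

    public static void main(String[] args) {
        List<Double> source = new ArrayList<>(Arrays.asList(1.5, 2.5, 3.5));
        List<Double> target = new ArrayList<>(Arrays.asList(0.0, 0.0, 0.0));
        update(source, target, 3);
        System.out.println(target); // [0.0, 2.5, 7.0]
    }
}
```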
A question you didn't ask, is whether Java will use any relevant SIMD features of your processor for a simple loop like this. And I'm glad you didn't, because I don't know. But I'm fairly confident that if it is smart enough to use them, it will only be for simple arrays.
Java uses call-by-object semantics, so there is no copying.

Calling a method n times: should I use a converted for-each loop or a traditional for loop?

Given the need to loop up to an arbitrary int value, is it better programming practice to convert the value into an array and for-each the array, or just use a traditional for loop?
FYI, I am calculating the number of 5 and 6 results ("hits") in multiple throws of 6-sided dice. My arbitrary int value is the dicePool which represents the number of multiple throws.
As I understand it, there are two options:
Convert the dicePool into an array and for-each the array:
public int calcHits(int dicePool) {
int[] dp = new int[dicePool];
for (Integer a : dp) {
// call throwDice method
}
}
Use a traditional for loop:
public int calcHits(int dicePool) {
for (int i = 0; i < dicePool; i++) {
// call throwDice method
}
}
My view is that option 1 is clumsy code and involves unnecessary creation of an array, even though the for-each loop is more efficient than the traditional for loop in Option 2.
At this point, speed isn't important (insert premature-optimization comment ;). What matters is how quickly you can understand what the code does, which is to call a method dicePool times.
The first method allocates an array of size dicePool and iterates through its values, which happens to run the loop body dicePool times (I'll pretend you meant int instead of Integer to avoid the unrelated autoboxing issue). This is potentially inefficient for the computer running the code, but more importantly it's inefficient for the human reading the code as it's conceptually distant from what you wanted to accomplish. Specifically, you force the reader to think about the new array you've just made, AND the value of the variable a, which will be 0 for every iteration of the loop, even though neither of those is related to your end goal.
Any Java programmer looking at the second method will realize that you're executing the loop body dicePool times with i 'counting up' to dicePool. While the latter part isn't especially important, the beginning is exactly what you meant to do. Using this common Java idiom minimizes the unrelated things a reader needs to think about, so it's the best choice.
When in doubt, go with simplicity. :D
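For completeness, a filled-in version of option 2 might look like this. The IntSupplier parameter is my own stand-in for the throwDice method, which the question doesn't show; it is assumed to return a value from 1 to 6.

```java
import java.util.Random;
import java.util.function.IntSupplier;

class DiceDemo {
    // Counts rolls of 5 or 6 over dicePool throws; `die` is a hypothetical
    // stand-in for throwDice and must return a value from 1 to 6.
    static int calcHits(int dicePool, IntSupplier die) {
        int hits = 0;
        for (int i = 0; i < dicePool; i++) {
            if (die.getAsInt() >= 5) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int hits = calcHits(10, () -> rng.nextInt(6) + 1);
        System.out.println(hits + " hits out of 10");
    }
}
```

Passing the die in as a parameter also makes the method easy to test with a fixed roll.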
Why would you need to allocate an array to loop over a variable that can be safely incremented and used without any need of allocation?
It sounds unnecessarily inefficient. You might need to allocate an array if you need to swap the order of the ints, but this is not the case. I would go for option 2 for sure.
The foreach is useful when you want to iterate over a collection, but creating a collection just to iterate over it when you don't need it makes no sense.
(2) is the obvious choice because there's no point in creating the array, based on your description. If there is, of course things change.
What makes you think that the for-each loop is more efficient?
Iterating over a set is very likely less efficient than a simple loop and counter.
It might help if you gave more context about the problem, specifically whether there's more to this question than choosing one syntax over the other. I am having trouble thinking of a problem to which #1 would be a better solution.
I wouldn't write the first one. It's not necessary to use the latest syntax in every setting.
Your instinct is a good one: if it feels and looks clumsy, it probably is.
Go with #2 and sleep at night.
