So, I've got a conceptual question. I've been working with JNI on Android for the purposes of doing low-level audio "stuff." I've done plenty of audio coding in C/C++, so I figured that this would not be much of a problem. I decided to use C++ in my "native" code (because who doesn't love OOP?). The issue I've encountered seems (to me) to be a strange one: when I create an object for processing audio in the C++ code, and I never pass this object to Java (nor the other way around), calling methods on this object seems to invoke garbage collection quite often. Since this is happening inside audio callbacks, the result is stuttering audio, and I get frequent messages along the lines of:
WAIT_FOR_CONCURRENT_GC blocked 23ms
However, when I perform the same operations by creating static functions (rather than invoking member methods on a member object) the performance of the app seems to be fine, and I no longer see the above log message.
Basically, is there any reason calling a static function should have better performance than calling member methods on a member object in native code? More specifically, are member objects, or limited scope variables that live entirely inside the native code of a JNI project involved in garbage collection? Is the C++ call stack involved in GC? Is there any insight anybody can give me on how C++ memory management meets Java memory management when it comes to JNI programming? That is, in the case that I'm not passing data between Java and C++, does the way I write C++ code affect Java memory management (GC or otherwise)?
Allow me to try to give an example. Bear with me, 'cause it's freaking long, and if you think you have insight you're welcome to stop reading here.
I have a couple of objects. One that is responsible for creating the audio engine, initializing output, etc. It is called HelloAudioJNI (sorry for not putting compile-able examples, but there's a lot of code).
class CHelloAudioJNI {
... omitted members ...
//member object pointers
COscillator *osc;
CWaveShaper *waveShaper;
... etc ...
public:
//some methods
void init(float fs, int bufferSize, int channels);
... blah blah blah ...
};
It follows that I have a couple more classes. The WaveShaper class looks like this:
class CWaveShaper : public CAudioFilter {
protected:
double *coeffs;
unsigned int order;//order
public:
CWaveShaper(const double sampleRate, const unsigned int numChannels,
double *coefficients, const unsigned int order);
double processSample(double input, unsigned int channel);
void reset();
};
Let's not worry about the CAudioFilter class for now, since this example is already quite long. The WaveShaper .cpp file looks like this:
CWaveShaper::CWaveShaper(const double sampleRate,
const unsigned int numChannels,
double *coefficients,
const unsigned int numCoeffs) :
CAudioFilter(sampleRate,numChannels), coeffs(coefficients), order(numCoeffs)
{}
double CWaveShaper::processSample(double input, unsigned int channel)
{
double output = 0;
double pow = input;
//zeroth order polynomial:
output = pow * coeffs[0];
//each additional iteration
for(int iteration = 1; iteration < order; iteration++){
pow *= input;
output += pow * coeffs[iteration];
}
return output;
}
void CWaveShaper::reset() {}
and then there's HelloAudioJNI.cpp. This is where we get into the meat of the issue. I create the member objects properly, using new inside the init function, thusly:
void CHelloAudioJNI::init(float samplerate, int bufferSize, int channels)
{
... some omitted initialization code ...
//wave shaper numero uno
double coefficients[2] = {1.0/2.0, 3.0/2.0};
waveShaper = new CWaveShaper(fs,outChannels,coefficients,2);
... some more omitted code ...
}
Ok everything seems fine so far. Then inside the audio callback we call some member methods on the member object like so:
void CHelloAudioJNI::processOutputBuffer()
{
//compute audio using COscillator object
for(int index = 0; index < outputBuffer.bufferLen; index++){
for(int channel = 0; channel < outputBuffer.numChannels; channel++){
double sample;
//synthesize
sample = osc->computeSample(channel);
//wave-shape
sample = waveShaper->processSample(sample,channel);
//convert to FXP and save to output buffer
short int outputSample = amplitude * sample * FLOAT_TO_SHORT;
outputBuffer.buffer[interleaveIndex(index,channel)] = outputSample;
}
}
}
This is what produces frequent audio interruptions and lots of messages about garbage collection. However, if I copy the CWaveShaper::processSample() function to the HelloAudioJNI.cpp immediately above the callback and call it directly instead of the member function:
sample = waveShape(sample, coeff, 2);
Then I get beautiful beautiful audio coming out of my android device and I do not get such frequent messages about garbage collection. Once again the questions are, are member objects, or limited scope variables that live entirely inside the native code of a JNI project involved in garbage collection? Is the C++ call stack involved in GC? Is there any insight anybody can give me on how C++ memory management meets Java memory management when it comes to JNI programming? That is, in the case that I'm not passing data between Java and C++, does the way I write C++ code affect Java memory management (GC or otherwise)?
There is no relationship between C++ objects and Dalvik's garbage collection. Dalvik has no interest in the contents of the native heap, other than for its own internal storage. All objects created from Java sources live on the "managed" heap, which is where garbage collection takes place.
The Dalvik GC does not examine the native stack; each thread known to the VM has a separate stack for the interpreter to use.
The only way C++ and managed objects are related is if you choose to create a relationship by pairing objects in some way (e.g. creating a new managed object from a C++ constructor, or deleting a native object from a Java finalizer).
You can use the "Allocation Tracker" feature of DDMS / ADT to see the most-recently created objects on the managed heap, and from where they are being allocated. If you run that during the GC flurry you should be able to tell what's causing it.
Also, the logcat messages show the process and thread IDs (from the command line use, adb logcat -v threadtime), which you should check to make sure that the messages are coming from your app, and also to see which thread the GC activity is occurring on. You can see the thread names in the "Threads" tab in DDMS / ADT.
CHelloAudioJNI::init(...) stores a pointer to a stack variable (double coefficients[2]) in waveShaper. When you access waveShaper->coeffs after coefficients has gone out of scope, BadThings(tm) happen.
Make a copy of the array in your CWaveShaper constructor (and don't forget to delete it in your destructor). Or use a std::vector<double> member, which handles the copy and cleanup for you.
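A minimal sketch of that fix in isolation (the class is trimmed down to just the coefficient handling; the base class and channel logic from the question are omitted): the constructor copies the caller's array into heap storage it owns, so the stack array in init() can safely go out of scope.

```cpp
#include <cassert>
#include <cstring>

// Trimmed-down sketch of CWaveShaper that owns a heap copy of the
// coefficients instead of aliasing the caller's (possibly stack) array.
class CWaveShaper {
    double* coeffs;       // heap-owned copy, freed in the destructor
    unsigned int order;
public:
    CWaveShaper(const double* coefficients, unsigned int numCoeffs)
        : coeffs(new double[numCoeffs]), order(numCoeffs)
    {
        std::memcpy(coeffs, coefficients, numCoeffs * sizeof(double));
    }
    ~CWaveShaper() { delete[] coeffs; }
    // Copying would double-free coeffs, so forbid it in this sketch.
    CWaveShaper(const CWaveShaper&) = delete;
    CWaveShaper& operator=(const CWaveShaper&) = delete;

    double processSample(double input) const {
        double pow = input;
        double output = pow * coeffs[0];
        for (unsigned int i = 1; i < order; i++) {
            pow *= input;
            output += pow * coeffs[i];
        }
        return output;
    }
};
```

With this version, `new CWaveShaper(coefficients, 2)` inside init() is safe even though coefficients lives on the stack.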
I have a problem while accessing DLL functions with multiple threads.
I am using my own compiled DLL. I call a DLL function from Java (JNA) with multiple Java threads.
The operation I am performing is image processing.
With this method I observe a small loss in frame-generation speed.
I am wondering if it is because of the threads' access to the DLL function.
Here is the function I am using :
__declspec(dllexport) int iterate(double z_r,double z_i,double c_r,double c_i,double maxIteration){
double tmp;
int i=0;
while(z_r*z_r + z_i*z_i < 4 && i <maxIteration){
tmp = z_r;
z_r = z_r*z_r - z_i*z_i + c_r;
z_i = 2*z_i*tmp + c_i;
i++;
}
return i;
}
The problem probably isn't that you are accessing the function from multiple threads; more likely it is the overhead of the external call itself. I don't know how big your values of, for example, maxIteration are, but it looks to me like this code snippet doesn't run for very long each time, just very often.
Especially when using JNA there's probably some serious overhead when invoking this method. So you should try to do more work at once, before returning to Java (and invoking the external method again...). This way the performance advantages you might have in C could make up for the overhead.
That said, however, it is not certain that this method would run faster written in C than written in Java. Without a citation at hand at the moment (I will try to find one), I heard in a lecture a few weeks ago that Java is supposed to be amazingly fast when it comes to simple arithmetic operations, and that is the only thing your method does. You should also check that you enabled compiler optimizations when compiling your C library.
Edit: This Wikipedia article states that Java has a performance for arithmetic operations similar to such programs written in C++. So the performance advantage might be slight and the overhead I mentioned before might decide in the end.
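To make the "do more work at once" advice concrete, here is a sketch (the function name and signature are made up, not from the question): instead of crossing the JNA boundary once per point, the native side iterates a whole batch of points per call and fills an output array.

```cpp
// Hypothetical batched variant of iterate(): one native call processes n
// points, so the Java<->native crossing cost is paid once per batch
// instead of once per pixel.
extern "C" void iterateBatch(const double* z_r, const double* z_i,
                             const double* c_r, const double* c_i,
                             int* out, int n, int maxIteration)
{
    for (int p = 0; p < n; p++) {
        double zr = z_r[p], zi = z_i[p];
        int i = 0;
        while (zr * zr + zi * zi < 4 && i < maxIteration) {
            double tmp = zr;
            zr = zr * zr - zi * zi + c_r[p];
            zi = 2 * zi * tmp + c_i[p];
            i++;
        }
        out[p] = i;  // same escape count the original returned per call
    }
}
```

On the Java side this maps naturally to JNA's array marshalling: one invocation per frame rather than one per point.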
I'm trying to understand what is the memory footprint of an object in Java. I read this and other docs on object and memory in Java.
However, when I'm using the sizeof Java library or visualvm, I get two different results, neither of which fits what I would expect according to the previous reference (http://www.javamex.com).
For my test, I'm using Java SE 7 Developer Preview on a 64-bits Mac with java.sizeof 0.2.1 and visualvm 1.3.5.
I have three classes, TestObject, TestObject2, TestObject3.
public class TestObject
{
}
public class TestObject2 extends TestObject
{
int a = 3;
}
public class TestObject3 extends TestObject2
{
int b = 4;
int c = 5;
}
My main class:
public class memoryTester
{
public static void main(String[] args) throws Throwable
{
TestObject object1 = new TestObject();
TestObject2 object2 = new TestObject2();
TestObject3 object3 = new TestObject3();
int sum = object2.a + object3.b + object3.c;
System.out.println(sum);
SizeOf.turnOnDebug();
System.out.println(SizeOf.humanReadable(SizeOf.deepSizeOf(object1)));
System.out.println(SizeOf.humanReadable(SizeOf.deepSizeOf(object2)));
System.out.println(SizeOf.humanReadable(SizeOf.deepSizeOf(object3)));
}
}
With java.SizeOf() I get:
{ test.TestObject
} size = 16.0b
16.0b
{ test.TestObject2
a = 3
} size = 16.0b
16.0b
{ test.TestObject3
b = 4
c = 5
} size = 24.0b
24.0b
With visualvm I have:
this (Java frame) TestObject #1 16
this (Java frame) TestObject2 #1 20
this (Java frame) TestObject3 #1 28
According to documentations I read over Internet, as I'm in 64-bits, I should have an object header of 16 bytes, ok for TestObject.
Then for TestObject2 I should add 4 bytes for the integer field, giving 20 bytes, and then add another 4 bytes of padding, giving a total size of 24 bytes for TestObject2. Am I wrong?
Continuing that way for TestObject3, I have to add 8 bytes more for the two integer fields which should give 32 bytes.
VisualVM seems to ignore padding, whereas java.sizeOf seems to miss 4 bytes, as if they were included in the object header. I can replace an integer by 4 booleans and it gives the same result.
Questions:
Why these two tools give different results?
Should we have padding?
I also read somewhere (I couldn't find the link again) that between a class and its subclass there can be some padding. Is that right? In that case, could an inheritance tree of classes have some memory overhead?
Finally, is there some Java spec/doc which details what Java is doing?
Thanks for your help.
Update:
To answer the comment of utapyngo: to get the size of the objects in visualvm, I create a heap dump, then in the "Classes" part I check the "size" column next to the "instances" column. The number of instances is 1 for each kind of object.
To answer the comment of Nathaniel Ford: I initialized each field and then did a simple sum with them in my main method to make use of them. It didn't change the results.
Yes, padding can happen. It is also possible for objects on the stack to be optimised out entirely. Only the JVM knows the exact sizes at any point in time, so techniques that approximate the size from within the Java language all tend to disagree; tools that attach to the JVM tend to be the most accurate. The three main techniques for implementing sizeOf within Java that I am aware of are:
serialize the object and return the length of those bytes (clearly wrong, but useful for relative comparisons)
reflection, with hard-coded size constants for each field found on an object; this can be tuned to be fairly accurate, but changes in the JVM, and whatever padding the JVM may or may not be doing, will throw it off
create loads of objects, run the GC, and compare changes in JVM heap size
None of these techniques are accurate.
If you are running on the Oracle JVM, v1.5 or later, then there is a way to read the size of an object straight out of the C structure used by the Java runtime. It is not a good idea for production, and if you get it wrong you can crash the JVM. But here is a blog post that you may find interesting if you wish to have a go at it: http://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
As for documentation on what Java is actually doing, that is JVM specific, version specific and potentially configuration specific too. Each implementation is free to handle objects differently. Even to the extent of optimising objects out entirely, for example, objects that are not passed out from the stack are free not to be allocated on the heap. Some JVMs may even manage to keep the object within the CPU registers entirely. Not your case here, but I include it as an example as to why getting the true size of Java objects is tricky.
So best to take any sizeOf values that you get with a pinch of salt and treat it as a 'guideline' measurement only.
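The padding discussed above is not specific to the JVM; the same alignment rules are easy to observe in native code. A small C++ sketch (exact sizes are implementation-defined; the comment assumes a typical ABI where int is 4-byte aligned):

```cpp
#include <cassert>
#include <cstddef>

// A char followed by an int: most ABIs insert 3 bytes of padding after
// 'tag' so that 'value' starts on a 4-byte boundary, making the struct
// larger than the sum of its fields.
struct Padded {
    char tag;
    int  value;
};
```

The JVM plays the same game with object headers and fields, which is one reason sizeOf tools that just sum field sizes come up short.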
Suppose that I have a C data-structure containing many data fields (>15):
struct MyData
{
int x;
float y;
...
};
In Java, I can store a pointer to a MyData as a long, and access members of the C data structure through JNI calls:
long mydata_p = MyDataJNI.alloc();
int x = MyDataJNI.getX( mydata_p );
float y = MyDataJNI.getY( mydata_p );
...
However, the calls to these functions are shockingly expensive (10x-100x the cost of an equivalent C function call). This is true even if the implementation of getX, getY, ... is as simple as:
return ((MyData*)mydata_p)->x;
Q1: Why are JNI calls so expensive? What else is going on other than a call to a function pointer? (For reference, I'm looking at JNI calls in Android.)
Q2: What is the fastest way to make all of the elements of my C structure available at the Java layer?
You want to move all of the data with a single JNI call. You can either copy it into a primitive byte array (and then extract Java primitives from that using some combination of ByteArrayInputStream and DataInputStream), or use a direct NIO ByteBuffer (ByteBuffer.allocateDirect), which makes more sense if you need to move data back and forth. The direct ByteBuffer is specifically designed and optimized for sharing memory between the VM and native code.
Instead of storing the results of the native call in an externally managed pointer, you could have it return the result in a Java object.
The easiest would be to memcpy the C structs straight into a byte[], and then on the Java side, wrap that array in a ByteBuffer and read it with getInt(), getFloat(), etc.
You could also return an array of Java objects constructed to reflect your structs, but the code would be an unbelievable mess, somewhat akin to constructing everything through reflection with explicit memory management.
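A sketch of the memcpy-into-bytes idea from the native side (the x/y fields mirror the hypothetical MyData from the question; fixed-width types are used so the Java reader knows the offsets): the whole struct is flattened with one copy, and the Java side would then read it back with ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder()).getInt(0) / .getFloat(4).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-layout mirror of the question's MyData struct.
struct MyData {
    int32_t x;   // offset 0 as seen from the Java side
    float   y;   // offset 4
};

// Native side of the single-call transfer: flatten the struct into the
// byte buffer that would be returned as a jbyteArray (or written into a
// direct ByteBuffer).
void packMyData(const MyData* d, unsigned char* out) {
    std::memcpy(out, d, sizeof(MyData));
}
```

One call moves every field at once, instead of one JNI getter per field.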
I am coding up something using the JNI Invocation API. A C program starts up a JVM and makes calls into it. The JNIEnv pointer is global to the C file. I have numerous C functions which need to perform the same operation on a given class of jobject. So I wrote helper functions which take a jobject and process it, returning the needed data (a C data type, for example an int status value). Is it safe to write C helper functions and pass jobjects to them as arguments?
i.e. (a simple example - designed to illustrate the question):
int getStatusValue(jobject jStatus)
{
return (*jenv)->CallIntMethod(jenv,jStatus,statusMethod);
}
int function1()
{
int status;
jobject aObj = (*jenv)->NewObject
(jenv,
aDefinedClass,
aDefinedCtor);
jobject j = (*jenv)->CallObjectMethod
(jenv,
aObj,
aDefinedObjGetMethod);
status = getStatusValue(j);
(*jenv)->DeleteLocalRef(jenv,aObj);
(*jenv)->DeleteLocalRef(jenv,j);
return status;
}
Thanks.
I'm not acquainted with the details of JNI, but one thing I noticed is this:
return (*jenv)->CallIntMethod(jenv,jStatus,statusMethod);
That looks like official JNI code, and it is taking a jobject as a parameter. If it works for JNI, there is no reason it can't work for your code.
All JNI objects are valid until the native method returns. As long as you don't store non-global JNI references between two JNI calls, everything should work.
The invocation of a jni function should work like this:
Java function call
create native local references
call native function
do your stuff
exit native function
release existing local references
return to java
Step 4 can contain any code; local references stay valid until step 6 if not released before.
If you want to store jni objects on the c side between two calls to a native java function you have to create global references and release them later. Not releasing a global reference leads to memory leaks as the garbage collector is unable to free the related java objects.
At the moment, I'm trying to create a Java application which uses CUDA functionality. The connection between CUDA and Java works fine, but I've got another problem and wanted to ask if my thoughts about it are correct.
When I call a native function from Java, I pass some data to it, the function calculates something and returns a result. Is it possible to let the first function return a reference (pointer) to this result, which I can pass to JNI and call another function that does further calculations with the result?
My idea was to reduce the overhead that comes from copying data to and from the GPU by leaving the data in GPU memory and just passing a reference to it so other functions can use it.
After trying for some time, I thought to myself that this shouldn't be possible, because pointers get deleted after the application ends (in this case, when the C function terminates). Is this correct? Or am I just too bad at C to see the solution?
Edit:
Well, to expand the question a little bit (or make it more clear): Is memory allocated by JNI native functions deallocated when the function ends? Or may I still access it until either the JNI application ends or I free it manually?
Thanks for your input :)
I used the following approach:
In your JNI code, create a struct that holds references to the objects you need. When you first create this struct, return its pointer to Java as a long. Then, from Java, you just call any method with this long as a parameter, and in C cast it to a pointer to your struct.
The structure will be on the heap, so it will not be cleared between different JNI calls.
EDIT: I don't think you can use long ptr = (long)&address; since address is a static variable. Use it the way Gunslinger47 suggested, i.e. create new instance of class or a struct (using new or malloc) and pass its pointer.
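The lifecycle of that pattern can be sketched in plain C++ (int64_t stands in for jlong here, and all the names are hypothetical): the first call heap-allocates the state and returns an opaque handle, later calls cast the handle back, and a final call frees it, since the garbage collector never will.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state kept alive between two native calls.
struct SessionState {
    int callCount;
};

// "First" native call: allocate on the heap, hand back an opaque handle.
int64_t createSession() {
    return (int64_t)(uintptr_t) new SessionState{0};
}

// "Later" native call: recover the pointer from the handle and use it.
int touchSession(int64_t handle) {
    SessionState* s = (SessionState*)(uintptr_t) handle;
    return ++s->callCount;
}

// Cleanup call: the native heap is not garbage collected, so Java must
// explicitly trigger the delete.
void destroySession(int64_t handle) {
    delete (SessionState*)(uintptr_t) handle;
}
```

Forgetting the cleanup call is the native-side equivalent of a memory leak; no finalizer will save you unless you wire one up yourself.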
In C++ you can use any mechanism you want to allocate/free memory: the stack, malloc/free, new/delete or any other custom implementation. The only requirement is that if you allocated a block of memory with one mechanism, you have to free it with the same mechanism, so you can't call free on a stack variable and you can't call delete on malloced memory.
JNI has its own mechanisms for allocating/freeing JVM memory:
NewObject/DeleteLocalRef
NewGlobalRef/DeleteGlobalRef
NewWeakGlobalRef/DeleteWeakGlobalRef
These follow the same rule, the only catch is that local refs can be deleted "en masse" either explicitly, with PopLocalFrame, or implicitly, when the native method exits.
JNI doesn't know how you allocated your memory, so it can't free it when your function exits. Stack variables will obviously be destroyed because you're still writing C++, but your GPU memory will remain valid.
The only problem then is how to access the memory on subsequent invocations, and then you can use Gunslinger47's suggestion:
JNIEXPORT jlong JNICALL Java_MyJavaClass_Function1(JNIEnv* env, jobject obj) {
    MyClass* pObject = new MyClass(...);
    return (jlong)pObject;
}
JNIEXPORT void JNICALL Java_MyJavaClass_Function2(JNIEnv* env, jobject obj, jlong lp) {
    MyClass* pObject = (MyClass*)lp;
    ...
}
While the accepted answer from @denis-tulskiy does make sense, I've personally followed the suggestions from here.
So instead of using a pseudo-pointer type such as jlong (or jint if you want to save some space on 32bits arch), use instead a ByteBuffer. For example:
MyNativeStruct* data; // Initialized elsewhere.
jobject bb = (*env)->NewDirectByteBuffer(env, (void*) data, sizeof(MyNativeStruct));
which you can later re-use with:
jobject bb; // Initialized elsewhere.
MyNativeStruct* data = (MyNativeStruct*) (*env)->GetDirectBufferAddress(env, bb);
For very simple cases, this solution is very easy to use. Suppose you have:
struct {
int exampleInt;
short exampleShort;
} MyNativeStruct;
On the Java side, you simply need to do:
public int getExampleInt() {
return bb.getInt(0);
}
public short getExampleShort() {
return bb.getShort(4);
}
Which saves you from writing lots of boilerplate code ! One should however pay attention to byte ordering as explained here.
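The byte-ordering caveat deserves emphasis: a Java ByteBuffer defaults to big-endian, while most native platforms are little-endian, so getInt(0) over raw native bytes can silently return byte-swapped values unless the buffer's order is set to ByteOrder.nativeOrder(). A tiny native-side check (a sketch, not part of the answer above):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reports the host byte order, i.e. the order in which the native structs
// behind a direct ByteBuffer are actually laid out in memory.
bool hostIsLittleEndian() {
    uint32_t one = 1;
    unsigned char firstByte;
    std::memcpy(&firstByte, &one, 1);
    return firstByte == 1;  // low byte stored first => little-endian
}
```

On a little-endian host, the Java reader must switch the buffer to little-endian (or nativeOrder()) before the getInt()/getShort() offsets above give the right values.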
Java wouldn't know what to do with a pointer, but it should be able to store a pointer from a native function's return value then hand it off to another native function for it to deal with. C pointers are nothing more than numeric values at the core.
Another contributor would have to tell you whether or not the pointed-to graphics memory would be cleared between JNI invocations, and if there would be any work-arounds.
I know this question was already officially answered, but I'd like to add my solution:
Instead of trying to pass a pointer, put the pointer in a Java array (at index 0) and pass that to JNI. JNI code can get and set the array element using GetIntArrayRegion/SetIntArrayRegion.
In my code, I need the native layer to manage a file descriptor (an open socket). The Java class holds an int[1] array and passes it to the native function. The native function can do whatever it wants with it (get/set) and put the result back in the array.
If you are allocating memory dynamically (on the heap) inside of the native function, it is not deleted. In other words, you are able to retain state between different calls into native functions, using pointers, static vars, etc.
Think of it a different way: what could you safely keep in a function call made from another C++ program? The same things apply here. When a function is exited, anything on the stack for that function call is destroyed; but anything on the heap is retained unless you explicitly delete it.
Short answer: as long as you don't deallocate the result you're returning to the calling function, it will remain valid for re-entrance later. Just make sure to clean it up when you're done.
It's best to do this exactly how Unsafe.allocateMemory does.
Create your object, then cast it to a (uintptr_t), which is a 32/64-bit unsigned integer:
return (uintptr_t) malloc(50);
void* f = (void*)(uintptr_t) the_jlong_value;
This is the only correct way to do it.
Here is the sanity checking Unsafe.allocateMemory does.
inline jlong addr_to_java(void* p) {
assert(p == (void*)(uintptr_t)p, "must not be odd high bits");
return (uintptr_t)p;
}
UNSAFE_ENTRY(jlong, Unsafe_AllocateMemory(JNIEnv *env, jobject unsafe, jlong size))
UnsafeWrapper("Unsafe_AllocateMemory");
size_t sz = (size_t)size;
if (sz != (julong)size || size < 0) {
THROW_0(vmSymbols::java_lang_IllegalArgumentException());
}
if (sz == 0) {
return 0;
}
sz = round_to(sz, HeapWordSize);
void* x = os::malloc(sz, mtInternal);
if (x == NULL) {
THROW_0(vmSymbols::java_lang_OutOfMemoryError());
}
//Copy::fill_to_words((HeapWord*)x, sz / HeapWordSize);
return addr_to_java(x);
UNSAFE_END