I'm using JavaCL to process images.
I keep getting
com.nativelibs4java.opencl.CLException$InvalidKernelArgs: InvalidKernelArgs
on the call to enqueueNDRange in this (part of a) function:
FloatBuffer outBuffer = ByteBuffer.allocateDirect(4*XYZ.length).order(context.getByteOrder()).asFloatBuffer();
CLFloatBuffer cl_outBuffer = context.createFloatBuffer(CLMem.Usage.Output, outBuffer, false);
CLFloatBuffer cl_inBuffer = context.createFloatBuffer(CLMem.Usage.Input,XYZ.length);
FloatBuffer inBuffer = cl_inBuffer.map(queue, CLMem.MapFlags.Write).put(XYZ);
inBuffer.rewind();
event = cl_inBuffer.unmap(queue, inBuffer);
XYZ2RGBKernel.setArgs(cl_inBuffer, XYZ.length/4,cl_outBuffer);
event = XYZ2RGBKernel.enqueueNDRange(queue, new int[]{XYZ.length/4}, event);
event = cl_outBuffer.read(queue, outBuffer, true, event);
XYZ is a pixel array with 4 floats per pixel (encoded as RGBARGBARGBA...).
The associated kernel header is :
__kernel void XYZ2RGB( __constant float3* inputXYZ,
int numberOfPixels,
__global float* output
)
I can't figure out why it doesn't work, since this call to enqueueNDRange:
CLFloatBuffer cl_Rbuffer = context.createFloatBuffer(CLMem.Usage.Input, R.length);
FloatBuffer R_buffer = cl_Rbuffer.map(queue, CLMem.MapFlags.Write).put(R);
R_buffer.rewind();
event = cl_Rbuffer.unmap(queue, R_buffer);
CLFloatBuffer cl_Gbuffer = context.createFloatBuffer(CLMem.Usage.Input, G.length);
FloatBuffer G_buffer = cl_Gbuffer.map(queue, CLMem.MapFlags.Write, event).put(G);
G_buffer.rewind();
event = cl_Gbuffer.unmap(queue, G_buffer);
CLFloatBuffer cl_Bbuffer = context.createFloatBuffer(CLMem.Usage.Input, B.length);
FloatBuffer B_buffer = cl_Bbuffer.map(queue, CLMem.MapFlags.Write, event).put(B);
B_buffer.rewind();
event = cl_Bbuffer.unmap(queue, B_buffer);
FloatBuffer outBuffer = ByteBuffer.allocateDirect(4*4*R.length).order(context.getByteOrder()).asFloatBuffer();
CLFloatBuffer cl_outBuffer = context.createFloatBuffer(CLMem.Usage.Output, outBuffer, false);
RGB2XYZKernel.setArgs(cl_Rbuffer, cl_Gbuffer, cl_Bbuffer, cl_outBuffer);
event = RGB2XYZKernel.enqueueNDRange(queue, new int[]{R.length}, event);
event = cl_outBuffer.read(queue, outBuffer, true, event);
With the associated kernel header :
__kernel void RGB2XYZ( __constant float* inputR,
__constant float* inputG,
__constant float* inputB,
__global float3* output)
works without any problem.
Before anyone asks: float3 and float4 behave the same here, because the OpenCL spec uses 4*sizeof(float) alignment for both, and I've tried switching between the two.
I also tried passing the input as float*, but it doesn't work either.
Both calls happen one after the other.
Update
I fixed it, after multiple hours:
__constant seems to have a size limit (couldn't find that in the specs though). Since XYZ is 4 times the size of R, G or B, it crashed at runtime.
I had issues with float3 afterwards. It seems the library I'm forced to use isn't up to date and doesn't support it well enough, so I switched to float4.
However, if any of you have more insight about the __constant size limit and related issues, let me know; I'm sure it will be handy for people who come across this thread.
__constant seems to have a size limit (couldn't find that in the specs though).
Limits depend on the device. Constant buffers have a per-buffer size limit (CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, min 64KB), and there is also a limit on how many constant arguments you can pass to a kernel (CL_DEVICE_MAX_CONSTANT_ARGS, min 8). Both AMD and Nvidia GPUs are usually close to the minimums, so the total amount of data that can be passed as __constant can be very small.
The point of "constant" memory is not to pass read-only input user data to kernels (as you seem to be using it); the point is to store algorithm-specific constants (lookup tables, matrix/polynomial/filter coefficients, etc). If you want to pass read-only input data, the usual way is to declare the kernel argument as __global const <type>* and create the corresponding buffer with CL_MEM_READ_ONLY.
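As a hedged sketch (mine, not from the original answer), the change on the JavaCL side is minimal, since CLMem.Usage.Input should already correspond to CL_MEM_READ_ONLY; the important part is the kernel declaration, shown here as a comment because the host code from the question stays the same:
// Kernel side (OpenCL C, shown as a comment): take the input as a read-only
// global pointer instead of __constant, which avoids the per-device
// CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE limit:
//
//   __kernel void XYZ2RGB( __global const float4* inputXYZ,
//                          int numberOfPixels,
//                          __global float* output )
//
// Host side (JavaCL), same calls as in the question; Usage.Input should map
// to CL_MEM_READ_ONLY:
CLFloatBuffer cl_inBuffer = context.createFloatBuffer(CLMem.Usage.Input, XYZ.length);
FloatBuffer inBuffer = cl_inBuffer.map(queue, CLMem.MapFlags.Write).put(XYZ);
inBuffer.rewind();
CLEvent event = cl_inBuffer.unmap(queue, inBuffer);
XYZ2RGBKernel.setArgs(cl_inBuffer, XYZ.length / 4, cl_outBuffer);
event = XYZ2RGBKernel.enqueueNDRange(queue, new int[]{ XYZ.length / 4 }, event);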
Related
I saved a tensorflow model using tf.saved_model.builder.SavedModelBuilder.
However, when I try to make predictions in Java, most of the time it returns the same results for fc8 (the AlexNet layer before softmax). In some cases it produces quite different results, which are most likely correct, so from that I assume the training is OK.
Has anyone else experienced this? Does anyone have an idea what's wrong?
my Java implementation:
Tensor image = constructAndExecuteGraphToNormalizeImage(imageBytes);
Tensor result = s.runner().feed("input_tensor", image).feed("Placeholder_1",t).fetch("fc8/fc8").run().get(0);
private static Tensor constructAndExecuteGraphToNormalizeImage(byte[] imageBytes) {
try (Graph g = new Graph()) {
TF.GraphBuilder b = new TF.GraphBuilder(g);
// Some constants specific to the pre-trained model at:
// https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
//
// - The model was trained with images scaled to 224x224 pixels.
// - The colors, represented as R, G, B in 1-byte each were converted to
// float using (value - Mean)/Scale.
final int H = 227;
final int W = 227;
final float mean = 117f;
final float scale = 1f;
// Since the graph is being constructed once per execution here, we can use a constant for the
// input image. If the graph were to be re-used for multiple input images, a placeholder would
// have been more appropriate.
final Output input = b.constant("input", imageBytes);
final Output output =
b.div(
b.sub(
b.resizeBilinear(
b.expandDims(
b.cast(b.decodeJpeg(input, 3), DataType.FLOAT),
b.constant("make_batch", 0)),
b.constant("size", new int[] {H, W})),
b.constant("mean", mean)),
b.constant("scale", scale));
try (Session s = new Session(g)) {
return s.runner().fetch(output.op().name()).run().get(0);
}
}
}
I am assuming that there is no random operation left in your graph, such as dropout. (Seems to be the case, since you often get the same results).
Alas, some operations in TensorFlow seem to be non-deterministic, such as reductions and convolutions. We have to live with the fact that TensorFlow's nets are stochastic beasts: their performance can be approached statistically, but their outputs are non-deterministic.
(I believe some other frameworks, such as Theano, go further than TensorFlow in offering deterministic operations.)
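If you want to rule out randomness in your own setup first, a quick check (a sketch, not from the answer; the [1][1000] output shape below is an assumption you should adapt to your model) is to feed the exact same input twice through the same session and compare the fc8 outputs:
// Run the identical feed twice through the existing Session `s` and compare.
Tensor image = constructAndExecuteGraphToNormalizeImage(imageBytes);
float[][] out1 = s.runner().feed("input_tensor", image).feed("Placeholder_1", t)
        .fetch("fc8/fc8").run().get(0).copyTo(new float[1][1000]);
float[][] out2 = s.runner().feed("input_tensor", image).feed("Placeholder_1", t)
        .fetch("fc8/fc8").run().get(0).copyTo(new float[1][1000]);
System.out.println("deterministic for this input: " + java.util.Arrays.deepEquals(out1, out2));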
First, I'm new to image processing on Android. I have a .cube file that was "Generated by Resolve" with LUT_3D_SIZE 33. I'm trying to use android.support.v8.renderscript.ScriptIntrinsic3DLUT to apply the lookup table to an image. I assume that I should use ScriptIntrinsic3DLUT and NOT android.support.v8.renderscript.ScriptIntrinsicLUT, correct?
I'm having problems finding sample code to do this so this is what I've pieced together so far. The issue I'm having is how to create an Allocation based on my .cube file?
...
final RenderScript renderScript = RenderScript.create(getApplicationContext());
final ScriptIntrinsic3DLUT scriptIntrinsic3DLUT = ScriptIntrinsic3DLUT.create(renderScript, Element.U8_4(renderScript));
// How to create an Allocation from .cube file?
//final Allocation allocationLut = Allocation.createXXX();
scriptIntrinsic3DLUT.setLUT(allocationLut);
Bitmap bitmapIn = selectedImage;
Bitmap bitmapOut = selectedImage.copy(bitmapIn.getConfig(),true);
Allocation aIn = Allocation.createFromBitmap(renderScript, bitmapIn);
Allocation aOut = Allocation.createTyped(renderScript, aIn.getType());
aOut.copyTo(bitmapOut);
imageView.setImageBitmap(bitmapOut);
...
Any thoughts?
Parsing the .cube file
First, you should parse the .cube file.
OpenColorIO shows how to do this in C++: it has parsers for LUT file formats like .cube, .lut, etc. For example, FileFormatIridasCube.cpp shows how to process a .cube file.
You can easily get the size from the LUT_3D_SIZE line. I contacted an image processing algorithm engineer.
This is what he said:
Generally in the industry a 17^3 cube is considered preview, 33^3 normal and 65^3 for highest quality output.
Note that in a .cube file, we can get 3*LUT_3D_SIZE^3 floats.
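For reference, here is a minimal, hypothetical sketch of that parsing step in Java (the method and variable names are mine; real .cube files may also contain TITLE, DOMAIN_MIN/DOMAIN_MAX and '#' comment lines):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper: reads LUT_3D_SIZE and the 3 * N^3 floats from a .cube file.
static float[] parseCubeFile(String cubeFile) throws IOException {
    int lutSize = 0;
    float[] cubeData = null;
    int idx = 0;
    try (BufferedReader br = new BufferedReader(new FileReader(cubeFile))) {
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            if (line.startsWith("LUT_3D_SIZE")) {
                lutSize = Integer.parseInt(line.split("\\s+")[1]);
                cubeData = new float[3 * lutSize * lutSize * lutSize];
            } else if (Character.isDigit(line.charAt(0)) || line.charAt(0) == '-') {
                // Each data line holds three floats for one LUT entry.
                for (String p : line.split("\\s+")) {
                    cubeData[idx++] = Float.parseFloat(p);
                }
            }
        }
    }
    return cubeData;
}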
The key point is what to do with this float array. We cannot copy it into the ScriptIntrinsic3DLUT Allocation as-is; we need to convert it first.
Handle the data in .cube file
As we know, at 8-bit depth each RGB component is an 8-bit int. R sits in the high 8 bits, G in the middle, and B in the low 8 bits, so a single 24-bit int can hold all three components.
In a .cube file, each data line contains 3 floats.
Please note: the blue component goes first! I reached this conclusion by trial and error (someone may be able to give a more accurate explanation).
Each float is the component value normalized to 255, so we calculate the real value from the three components like this:
// clamp(x, lo, hi) is equivalent to Math.max(lo, Math.min(hi, x))
int getRGBColorValue(float b, float g, float r) {
int bcol = (int) (255 * clamp(b, 0.f, 1.f));
int gcol = (int) (255 * clamp(g, 0.f, 1.f));
int rcol = (int) (255 * clamp(r, 0.f, 1.f));
return bcol | (gcol << 8) | (rcol << 16);
}
So we get one integer from each data line (which contains 3 floats), and finally an integer array whose length is LUT_3D_SIZE^3. This is the array we apply to the cube; a sketch of this conversion follows.
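Reusing getRGBColorValue from above and the hypothetical parseCubeFile helper sketched earlier (blue first, per the note above):
float[] cubeData = parseCubeFile("lut.cube");   // hypothetical helper from the sketch above
int entries = cubeData.length / 3;              // == LUT_3D_SIZE^3
int[] lut = new int[entries];
for (int i = 0; i < entries; i++) {
    lut[i] = getRGBColorValue(cubeData[3 * i],      // b (blue first, per the note above)
                              cubeData[3 * i + 1],  // g
                              cubeData[3 * i + 2]); // r
}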
ScriptIntrinsic3DLUT
RsLutDemo shows how to apply ScriptIntrinsic3DLUT.
RenderScript mRs;
Bitmap mBitmap;
Bitmap mLutBitmap;
ScriptIntrinsic3DLUT mScriptlut;
Bitmap mOutputBitmap;
Allocation mAllocIn;
Allocation mAllocOut;
Allocation mAllocCube;
...
int redDim, greenDim, blueDim;
int[] lut;
if (mScriptlut == null) {
mScriptlut = ScriptIntrinsic3DLUT.create(mRs, Element.U8_4(mRs));
}
if (mBitmap == null) {
mBitmap = BitmapFactory.decodeResource(getResources(),
R.drawable.bugs);
mOutputBitmap = Bitmap.createBitmap(mBitmap.getWidth(), mBitmap.getHeight(), mBitmap.getConfig());
mAllocIn = Allocation.createFromBitmap(mRs, mBitmap);
mAllocOut = Allocation.createFromBitmap(mRs, mOutputBitmap);
}
...
// get the expected lut[] from .cube file.
...
Type.Builder tb = new Type.Builder(mRs, Element.U8_4(mRs));
tb.setX(redDim).setY(greenDim).setZ(blueDim);
Type t = tb.create();
mAllocCube = Allocation.createTyped(mRs, t);
mAllocCube.copyFromUnchecked(lut);
mScriptlut.setLUT(mAllocCube);
mScriptlut.forEach(mAllocIn, mAllocOut);
mAllocOut.copyTo(mOutputBitmap);
Demo
I have finished a demo showing this approach.
You can view it on GitHub.
Thanks.
With a 3D LUT, yes, you have to use the core framework version, as there is no support library version of ScriptIntrinsic3DLUT at this time. Your 3D LUT allocation has to be created by parsing the file yourself; there is no built-in support for .cube files (or any other 3D LUT format).
I'm having trouble getting libjpeg-turbo to actually compress an image in my Java project. It writes a .jpg fine, but the result is always almost the same size as a 24-bit Windows .bmp of the same image. A 480x854 image turns into a 1.2 MB JPEG with the code snippet below; if I use GRAY subsampling it's 800 KB (and these are not fancy images to begin with, mostly a neutral background with some filled primary-color discs for a game I'm working on).
Here's the code I've got so far:
// for some byte[] src in RGB888 format, representing an image of dimensions
// 'width' and 'height'
try
{
TJCompressor tjc = new TJCompressor(
src,
width,
0, // "pitch" - scanline size
height,
TJ.PF_RGB // format
);
tjc.setJPEGQuality(75);
tjc.setSubsamp(TJ.SAMP_420);
byte[] jpg_data = tjc.compress(0);
new java.io.FileOutputStream(new java.io.File("/tmp/dump.jpg")).write(jpg_data, 0, jpg_data.length);
}
catch(Exception e)
{
e.printStackTrace(System.err);
}
I'm particularly having a hard time finding sample java usage documentation for this project; it mostly assumes a C background/usage. I don't understand the flags to pass to compress (nor do I really know the internals of the jpeg standard, nor do I want to :)!
Thanks!
Doh! And within 5 minutes of posting the question the answer hit me.
A hexdump of the result showed that the end of the file for these images was just lots and lots of 0s.
For anybody in a similar situation in the future: instead of using jpg_data.length (which is the size of the worst-case destination buffer, not of the compressed data), use TJCompressor.getCompressedSize() immediately after your call to TJCompressor.compress().
Final result becomes:
// for some byte[] src in RGB format, representing an image of dimensions
// 'width' and 'height'
try
{
TJCompressor tjc = new TJCompressor(
src,
width,
0, // "pitch" - scanline size
height,
TJ.PF_RGB // format
);
tjc.setJPEGQuality(75);
tjc.setSubsamp(TJ.SAMP_420);
byte[] jpg_data = tjc.compress(0);
int actual_size = tjc.getCompressedSize();
new java.io.FileOutputStream(new java.io.File("/tmp/dump.jpg")).
write(jpg_data, 0, actual_size);
}
catch(Exception e)
{
e.printStackTrace(System.err);
}
I have copy-pasted some code I found on Stack Overflow to convert the default camera preview from YUV into RGB format and then upload it to OpenGL for processing.
That worked fine, but the issue is that most of the CPU was busy converting the YUV images into RGB, and that became the bottleneck.
I want to upload the YUV image into the GPU and then convert it into RGB in a fragment shader.
I took the same Java YUV to RGB function I found which worked on the CPU and tried to make it work on the GPU.
It turned out to be quite a nightmare, since there are several differences between doing the calculations in Java and on the GPU.
First, the preview image comes in a byte[] in Java, but bytes are signed, so there may be negative values.
In addition, the fragment shader normally deals with [0..1] floating-point values instead of bytes.
I am sure this is solvable and I almost solved it. But I spent a few hours trying to figure out what I was doing wrong and couldn't make it work.
Bottom line, I ask for someone to just write this shader function and preferably test it. For me it would be a tedious monkey job since I don't really understand why this conversion works the way it is, and I just try to mimic the same function on the GPU.
This is a very similar function to what I used on Java:
Displaying YUV Image in Android
I did some of the work on the CPU, such as turning the 1.5*w*h-byte YUV format into w*h YUV values, as follows:
static public void decodeYUV420SP(int[] rgba, byte[] yuv420sp, int width,
int height) {
final int frameSize = width * height;
for (int j = 0, yp = 0; j < height; j++) {
int uvp = frameSize + (j >> 1) * width, u = 0, v = 0;
for (int i = 0; i < width; i++, yp++) {
int y = (int) yuv420sp[yp]+127;
if ((i & 1) == 0) {
v = (int)yuv420sp[uvp++]+127;
u = (int)yuv420sp[uvp++]+127;
}
rgba[yp] = 0xFF000000+(y<<16) | (u<<8) | v;
}
}
}
I added 127 because byte is signed.
I then loaded the rgba array into an OpenGL texture and tried to do the rest of the calculation on the GPU.
Any help would be appreciated...
I used this code from Wikipedia to calculate the conversion from YUV to RGB on the GPU:
private static int convertYUVtoRGB(int y, int u, int v) {
int r,g,b;
r = y + (int)(1.402f*v);
g = y - (int)(0.344f*u +0.714f*v);
b = y + (int)(1.772f*u);
r = r>255? 255 : r<0 ? 0 : r;
g = g>255? 255 : g<0 ? 0 : g;
b = b>255? 255 : b<0 ? 0 : b;
return 0xff000000 | (b<<16) | (g<<8) | r;
}
I converted the floats to 0.0..255.0 and then used the above code.
The part on the CPU was to rearrange the original YUV pixels into a YUV matrix (also shown on Wikipedia).
Basically I used the Wikipedia code and did the simplest float<->byte conversions to make it work out.
Small mistakes, like adding 16 to Y or not adding 128 to U and V, would give undesirable results, so you need to take care of that.
But it wasn't a lot of work once I used the Wikipedia code as the base.
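For what it's worth, here is a sketch (mine, not the original author's shader) of what the fragment-shader side of that conversion can look like, written as a Java string so it can be handed to GLES20.glShaderSource. It assumes the texture's r/g/b channels end up holding Y, U and V respectively (roughly what the CPU repacking above produces, with the +127 offset approximating a 0.5 center for U and V); the swizzle and offsets may need tweaking for your exact packing:
// GLSL fragment shader sketch (as a Java string for GLES20.glShaderSource).
// Assumes texture channels r/g/b carry Y/U/V with U and V centered around 0.5.
static final String YUV_TO_RGB_FRAGMENT_SHADER =
        "precision mediump float;\n" +
        "uniform sampler2D u_Texture0;\n" +
        "varying vec2 v_UV;\n" +
        "void main() {\n" +
        "    vec3 yuv = texture2D(u_Texture0, v_UV).rgb;\n" +
        "    float y = yuv.r;\n" +
        "    float u = yuv.g - 0.5;\n" +
        "    float v = yuv.b - 0.5;\n" +
        "    // BT.601 full-range coefficients, same as the Wikipedia formula above\n" +
        "    vec3 rgb = vec3(y + 1.402 * v,\n" +
        "                    y - 0.344 * u - 0.714 * v,\n" +
        "                    y + 1.772 * u);\n" +
        "    gl_FragColor = vec4(clamp(rgb, 0.0, 1.0), 1.0);\n" +
        "}\n";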
Converting on the CPU sounds easy, but I believe the question is how to do it on the GPU.
I did this recently in a project where I needed very fast QR code detection even when the camera is at a 45-degree angle to the surface where the code is printed, and it worked with great performance.
(The following code is trimmed to just the key lines; it assumes a solid understanding of both Java and OpenGL ES.)
Create a GL texture that will contain stored Camera image:
int[] txt = new int[1];
GLES20.glGenTextures(1,txt,0);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES,txt[0]);
GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
Pay attention that the texture type is not GL_TEXTURE_2D. This is important, since only the GL_TEXTURE_EXTERNAL_OES type is supported by the SurfaceTexture object, which will be used in the next step.
Setup SurfaceTexture:
SurfaceTexture surfTex = new SurfaceTexture(txt[0]);
surfTex.setOnFrameAvailableListener(this);
The above assumes that 'this' is an object that implements the onFrameAvailable function (i.e. SurfaceTexture.OnFrameAvailableListener):
public void onFrameAvailable(SurfaceTexture st)
{
surfTexNeedUpdate = true;
// this flag will be read in GL render pipeline
}
Setup camera:
Camera cam = Camera.open();
cam.setPreviewTexture(surfTex);
This Camera API is deprecated as of Android 5.0, so if you target that, you have to use the new camera2 (CameraDevice) API.
In your render pipeline, add the following block to check whether the camera has a frame available and, if so, update the surface texture with it. When the surface texture is updated, it fills in the GL texture that is linked to it.
if( surfTexNeedUpdate )
{
surfTex.updateTexImage();
surfTexNeedUpdate = false;
}
To bind the GL texture which is linked to the Camera via the SurfaceTexture, just do this in the rendering pipe:
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, txt[0]);
It goes without saying that you need to set the current active texture first.
In the GL shader program which will use the above texture in its fragment shader, you must have as the first line:
#extension GL_OES_EGL_image_external : require
Above is a must-have.
Texture uniform must be samplerExternalOES type:
uniform samplerExternalOES u_Texture0;
Reading a pixel from it is just like reading from a GL_TEXTURE_2D texture, and the UV coordinates are in the same range (from 0.0 to 1.0):
vec4 px = texture2D(u_Texture0, v_UV);
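Putting the pieces above together, a minimal fragment shader for this setup could look like the following sketch (written here as a Java string for GLES20.glShaderSource; the uniform and varying names are placeholders):
// Minimal external-OES fragment shader: extension directive, samplerExternalOES
// uniform, and an ordinary texture2D lookup, as described above.
static final String EXTERNAL_OES_FRAGMENT_SHADER =
        "#extension GL_OES_EGL_image_external : require\n" +
        "precision mediump float;\n" +
        "uniform samplerExternalOES u_Texture0;\n" +
        "varying vec2 v_UV;\n" +
        "void main() {\n" +
        "    gl_FragColor = texture2D(u_Texture0, v_UV);\n" +
        "}\n";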
Once you have your render pipeline ready to render a quad with above texture and shader, just start the camera:
cam.startPreview();
You should see the quad on your GL screen with the live camera feed. Now you just need to grab the image with glReadPixels:
GLES20.glReadPixels(0, 0, width, height, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, bytes);
The above line assumes that your FBO is RGBA, that bytes is a direct java.nio.ByteBuffer already allocated to the proper size (width * height * 4 bytes), and that width and height are the size of your FBO.
And voila! You have captured RGBA pixels from the camera instead of converting the YUV bytes received in the onPreviewFrame callback...
You can also use an RGB framebuffer object and avoid alpha if you don't need it.
It is important to note that the camera will call onFrameAvailable on its own thread, which is not your GL render pipeline thread, so you should not perform any GL calls in that function.
Renderscript was first introduced in February 2011. Since Android 3.0 Honeycomb (API 11), and certainly since Android 4.2 Jelly Bean (API 17), when ScriptIntrinsicYuvToRGB was added, the easiest and most efficient solution has been to use RenderScript for YUV to RGB conversion. I have recently generalized this solution to handle device rotation.
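A minimal sketch of that RenderScript route (the NV21 assumption, sizes and variable names are mine, not from the original answer):
// Convert an NV21 preview frame (byte[] nv21, dimensions width x height) to an
// RGBA Bitmap. `rs` is an existing RenderScript context; imports from
// android.renderscript (or android.support.v8.renderscript) are assumed.
Allocation in = Allocation.createTyped(rs,
        new Type.Builder(rs, Element.U8(rs)).setX(nv21.length).create(),
        Allocation.USAGE_SCRIPT);
Allocation out = Allocation.createTyped(rs,
        new Type.Builder(rs, Element.RGBA_8888(rs)).setX(width).setY(height).create(),
        Allocation.USAGE_SCRIPT);
ScriptIntrinsicYuvToRGB yuvToRgb = ScriptIntrinsicYuvToRGB.create(rs, Element.U8_4(rs));
in.copyFrom(nv21);
yuvToRgb.setInput(in);
yuvToRgb.forEach(out);
Bitmap bmp = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
out.copyTo(bmp);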
I'm working on a project that takes video input from a webcam and displays regions of motion to the user. My "beta" attempt at this project was to use the Java Media Framework to retrieve the webcam feed. Through some utility functions, JMF conveniently returns webcam frames as BufferedImages, which I built a significant amount of framework around to process. However, I soon realized that JMF isn't well supported by Sun/Oracle anymore, and some of the higher webcam resolutions (720p) are not accessible through the JMF interface.
I'd like to continue processing frames as BufferedImages, and use OpenCV (C++) to source the video feed. Using OpenCV's framework alone, I've found that OpenCV does a good job of efficiently returning high-def webcam frames and painting them to screen.
I figured it would be pretty straightforward to feed this data into Java and achieve the same efficiency. I just finished writing the JNI DLL to copy this data into a BufferedImage and return it to Java. However, I'm finding that the amount of data copying I'm doing is really hindering performance. I'm targeting 30 FPS, but it takes roughly 100 msec alone to even copy the data from the char array returned by OpenCV into a Java BufferedImage. Instead, I'm seeing about 2-5 FPS.
When returning a frame capture, OpenCV provides a pointer to a 1D char array. This data needs to be provided to Java, and apparently I don't have the time to copy any of it.
I need a better solution to get these frame captures into a BufferedImage. A few solutions I'm considering, none of which I think are very good (fairly certain they would also perform poorly):
(1) Override BufferedImage, and return pixel data from various BufferedImage methods by making native calls to the DLL. (Instead of doing the array copying at once, I return individual pixels as requested by the calling code). Note that calling code typically needs all pixels in the image to paint the image or process it, so this individual pixel-grab operation would be implemented in a 2D for-loop.
(2) Instruct the BufferedImage to use a java.nio.ByteBuffer to somehow directly access data in the char array returned by OpenCV. Would appreciate any tips as to how this is done.
(3) Do everything in C++ and forget Java. Well well, yes this does sound like the most logical solution, however I will not have time to start this many-month project from scratch.
As of now, my JNI code has been written to return the BufferedImage, however at this point I'm willing to accept the return of a 1D char array and then put it into a BufferedImage.
By the way... the question here is: What is the most efficient way to copy a 1D char array of image data into a BufferedImage?
Provided is the (inefficient) code that I use to source image from OpenCV and copy into BufferedImage:
JNIEXPORT jobject JNICALL Java_graphicanalyzer_ImageFeedOpenCV_getFrame
(JNIEnv * env, jobject jThis, jobject camera)
{
//get the memory address of the CvCapture device, the value of which is encapsulated in the camera jobject
jclass cameraClass = env->FindClass("graphicanalyzer/Camera");
jfieldID fid = env->GetFieldID(cameraClass,"pCvCapture","I");
//get the address of the CvCapture device
int a_pCvCapture = (int)env->GetIntField(camera, fid);
//get a pointer to the CvCapture device
CvCapture *capture = (CvCapture*)a_pCvCapture;
//get a frame from the CvCapture device
IplImage *frame = cvQueryFrame( capture );
//get a handle on the BufferedImage class
jclass bufferedImageClass = env->FindClass("java/awt/image/BufferedImage");
if (bufferedImageClass == NULL)
{
return NULL;
}
//get a handle on the BufferedImage(int width, int height, int imageType) constructor
jmethodID bufferedImageConstructor = env->GetMethodID(bufferedImageClass,"<init>","(III)V");
//get the field ID of BufferedImage.TYPE_INT_RGB
jfieldID imageTypeFieldID = env->GetStaticFieldID(bufferedImageClass,"TYPE_INT_RGB","I");
//get the int value from the BufferedImage.TYPE_INT_RGB field
jint imageTypeIntRGB = env->GetStaticIntField(bufferedImageClass,imageTypeFieldID);
//create a new BufferedImage
jobject ret = env->NewObject(bufferedImageClass, bufferedImageConstructor, (jint)frame->width, (jint)frame->height, imageTypeIntRGB);
//get a handle on the method BufferedImage.getRaster()
jmethodID getWritableRasterID = env->GetMethodID(bufferedImageClass, "getRaster", "()Ljava/awt/image/WritableRaster;");
//call the BufferedImage.getRaster() method
jobject writableRaster = env->CallObjectMethod(ret,getWritableRasterID);
//get a handle on the WritableRaster class
jclass writableRasterClass = env->FindClass("java/awt/image/WritableRaster");
//get a handle on the WritableRaster.setPixel(int x, int y, int[] rgb) method
jmethodID setPixelID = env->GetMethodID(writableRasterClass, "setPixel", "(II[I)V"); //void setPixel(int, int, int[])
//iterate through the frame we got above and set each pixel within the WritableRaster
jintArray rgbArray = env->NewIntArray(3);
jint rgb[3];
char *px;
for (jint x=0; x < frame->width; x++)
{
for (jint y=0; y < frame->height; y++)
{
px = frame->imageData+(frame->widthStep*y+x*frame->nChannels);
rgb[0] = abs(px[2]); // OpenCV returns BGR bit order
rgb[1] = abs(px[1]); // OpenCV returns BGR bit order
rgb[2] = abs(px[0]); // OpenCV returns BGR bit order
//copy jint array into jintArray
env->SetIntArrayRegion(rgbArray,0,3,rgb); //take values in rgb and move to rgbArray
//call setPixel() this is a copy operation
env->CallVoidMethod(writableRaster,setPixelID,x,y,rgbArray);
}
}
return ret; //return the BufferedImage
}
There is another option if you wish to make your code really fast and still use Java. The AWT windowing toolkit has a direct native interface you can use to draw to an AWT surface using C or C++. Thus, there would be no need to copy anything to Java, as you could render directly from the buffer in C or C++. I am not sure of the specifics on how to do this because I have not looked at it in a while, but I know that it is included in the standard JRE distribution. Using this method, you could probably approach the FPS limit of the camera if you wished, rather than struggling to reach 30 FPS.
If you want to research this further, I would start with the documentation for the AWT Native Interface (JAWT).
Happy Programming!
I would construct the RGB int array required by BufferedImage and then use a single call to
void setRGB(int startX, int startY, int w, int h, int[] rgbArray, int offset, int scansize)
to set the entire image data array at once. Or at least, large portions of it.
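A sketch of what that looks like on the Java side, assuming the native call hands back one packed BGR byte array per frame (grabFrameBGR, width and height are hypothetical/assumed names):
// Build the packed RGB int[] once and push it into the BufferedImage with a
// single setRGB call. grabFrameBGR() is a hypothetical native method returning
// width*height*3 bytes in BGR order (as OpenCV provides).
byte[] bgr = grabFrameBGR();
int[] pixels = new int[width * height];
for (int i = 0, p = 0; i < pixels.length; i++, p += 3) {
    int b = bgr[p]     & 0xFF;
    int g = bgr[p + 1] & 0xFF;
    int r = bgr[p + 2] & 0xFF;
    pixels[i] = (r << 16) | (g << 8) | b;          // TYPE_INT_RGB pixel layout
}
BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
img.setRGB(0, 0, width, height, pixels, 0, width); // one bulk copy, not per-pixel calls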
Without having timed it, I would suspect that it's the per-pixel calls to
env->SetIntArrayRegion(rgbArray,0,3,rgb);
env->CallVoidMethod(writableRaster,setPixelID,x,y,rgbArray);
which are taking the lion's share of the time.
EDIT: It will likely be the method invocations, rather than the manipulation of memory per se, that are taking the time. So build the data in your JNI code and copy it to the Java image in blocks, or in a single hit. Once you create and pin a Java int[] you can access it via native pointers. Then one call to setRGB will copy the array into your image.
Note: You do still have to copy the data at least once, but doing all pixels in one hit via 1 function call will be vastly more efficient than doing them individually via 2 x N function calls.
EDIT 2:
Reviewing my JNI code, I have only ever used byte arrays, but the principles are the same for int arrays. Use:
NewIntArray
to create an int array, and
GetIntArrayElements
to pin it and get a pointer, and when you are done,
ReleaseIntArrayElements
to release it, remembering to use the flag to copy data back to Java's memory heap.
Then, you should be able to use your Java int array handle to invoke the setRGB function.
Remember also that this is actually setting RGBA pixels, so 4 channels, including alpha, not just three (the RGB names in Java seem to predate alpha channel, but most of the so-named methods are compatible with a 32 bit value).
As a secondary consideration, if the only difference between the image data array returned by OpenCV and what is required by Java is the BGR vs RGB, then
px = frame->imageData+(frame->widthStep*y+x*frame->nChannels);
rgb[0] = abs(px[2]); // OpenCV returns BGR bit order
rgb[1] = abs(px[1]); // OpenCV returns BGR bit order
rgb[2] = abs(px[0]); // OpenCV returns BGR bit order
is a relatively inefficient way to convert them. Instead you could do something like:
uint32_t px = *(uint32_t*)(frame->imageData+(frame->widthStep*y+x*frame->nChannels));
javaArray[ofs]=((px&0x00FF0000)>>16)|(px&0x0000FF00)|((px&0x000000FF)<<16);
(note my C code is rusty, so this might not be entirely valid, but it shows what is needed).
Managed to speed up the process using an NIO ByteBuffer.
On the C++ JNI side...
JNIEXPORT jobject JNICALL Java_graphicanalyzer_ImageFeedOpenCV_getFrame
(JNIEnv * env, jobject jThis, jobject camera)
{
//...
IplImage *frame = cvQueryFrame(pCaptureDevice);
jobject byteBuf = env->NewDirectByteBuffer(frame->imageData, frame->imageSize);
return byteBuf;
}
and on the Java side...
void getFrame(Camera cam)
{
ByteBuffer frameData = cam.getFrame(); //NATIVE call
byte[] imgArray = new byte[frameData.capacity()];
frameData.get(imgArray); //although it seems like an array copy, this call returns very quickly
DataBufferByte frameDataBuf = new DataBufferByte(imgArray,imgArray.length);
//determine image sample model characteristics
int dataType = DataBuffer.TYPE_BYTE;
int width = cam.getFrameWidth();
int height = cam.getFrameHeight();
int pixelStride = cam.getPixelStride();
int scanlineStride = cam.getScanlineStride();
int[] bandOffsets = new int[] {2,1,0}; //BGR
//create a WritableRaster with the DataBufferByte
PixelInterleavedSampleModel pism = new PixelInterleavedSampleModel
(
dataType,
width,
height,
pixelStride,
scanlineStride,
bandOffsets
);
WritableRaster raster = new ImgFeedWritableRaster( pism, frameDataBuf, new Point(0,0) );
//create the BufferedImage
ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
ComponentColorModel cm = new ComponentColorModel(cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
BufferedImage newImg = new BufferedImage(cm,raster,false,null);
handleNewImage(newImg);
}
Using the java.nio.ByteBuffer, I can quickly address the char array returned by the OpenCV code without (apparently) doing much gruesome array copying.