I wanted to convert a TYPE_3BYTE_BGR BufferedImage in Java to YUV using the sws_scale function of FFmpeg through JNI. I first extract the data of my image from the BufferedImage as
byte[] imgData = ((DataBufferByte) myImage.getRaster().getDataBuffer()).getData();
byte[] output = processImage(imgData, 0);
Then I pass it to the processImage function which is a native function. The C++ side looks like this:
JNIEXPORT jbyteArray JNICALL Java_jni_JniExample_processData
(JNIEnv *env, jobject obj, jbyteArray data, jint index)
jboolean isCopy;
uint8_t *test = (uint8_t *)env->GetPrimitiveArrayCritical(data, &isCopy);
uint8_t *inData[1]; // RGB24 have one plane
inData[0] = test;
SwsContext * ctx = sws_getContext(width,height,AV_PIX_FMT_BGR24, (int)width, (int)width,
AV_PIX_FMT_YUV420P, 0, 0, 0, 0);
int lumaPlaneSize = width *height;
uint8_t *yuv[3];
yuv[0] = new uint8_t[lumaPlaneSize];
yuv[1] = new uint8_t[lumaPlaneSize/4];
yuv[2] = new uint8_t[lumaPlaneSize/4];
int inLinesize[1] = { 3*nvEncoder->width }; // RGB stride
int outLinesize[3] = { 3*width ,3*width ,3*width }; // YUV stride
sws_scale(ctx, inData, inLinesize, 0, height , yuv, outLinesize);
However, after running the code, I get the warning: [swscaler # 0x7fb598659480] Warning: data is not aligned! This can lead to a speedloss, and everything crashes on the last line. Am I passing the correct arguments to sws_scale (especially the strides)?
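There was a separate bug here: SwsContext * ctx = sws_getContext(width,height,AV_PIX_FMT_BGR24, (int)width, (int)width, AV_PIX_FMT_YUV420P, 0,NULL,NULL,NULL) passes width where the destination height belongs. sws_getContext takes the destination size as (dstW, dstH), so it should be changed to: SwsContext * ctx = sws_getContext(width,height,AV_PIX_FMT_BGR24, (int)width, (int)height, AV_PIX_FMT_YUV420P, 0,NULL,NULL,NULL)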

The first problem I see - wrong strides for output image:
yuv[0] = new uint8_t[lumaPlaneSize];
yuv[1] = new uint8_t[lumaPlaneSize/4];
yuv[2] = new uint8_t[lumaPlaneSize/4];
int inLinesize[1] = { 3*nvEncoder->width }; // RGB stride
int outLinesize[3] = { 3*width ,3*width ,3*width }; // YUV stride
//                     ^^^^^^^  ^^^^^^^  ^^^^^^^
The allocated planes are not large enough for the strides you pass. YUV420P uses one byte per sample in each plane, so the factor of 3 is wrong and leads to out-of-bounds writes, because the rescaler skips far too much space when it advances to the next line. In addition, the chroma width is half the luma width, so if you want tightly packed luma and chroma planes without gaps at the line ends, use:
int outLinesize[3] = { width , width / 2 , width / 2 }; // YUV stride
Allocation sizes remain the same.
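The arithmetic behind these strides can be sketched in plain Java, for example to size the arrays before crossing the JNI boundary. Yuv420Layout is a hypothetical helper, not part of FFmpeg or of any binding. It assumes tightly packed planes with no alignment padding, and it rounds odd dimensions up for the chroma planes, which is an assumption on my side for sizes the question does not cover.

```java
/** Hypothetical helper: tightly packed layout for a BGR24 source and a YUV420P destination. */
public final class Yuv420Layout {
    public final int bgrStride;     // bytes per BGR24 row (3 bytes per pixel)
    public final int lumaStride;    // bytes per Y row (1 byte per pixel)
    public final int chromaStride;  // bytes per U or V row (half the luma width)
    public final int lumaSize;      // bytes in the Y plane
    public final int chromaSize;    // bytes in each of the U and V planes

    public Yuv420Layout(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        bgrStride = 3 * width;
        lumaStride = width;
        chromaStride = (width + 1) / 2;       // assumption: round up for odd widths
        int chromaRows = (height + 1) / 2;    // assumption: round up for odd heights
        lumaSize = lumaStride * height;
        chromaSize = chromaStride * chromaRows;
    }

    public static void main(String[] args) {
        Yuv420Layout layout = new Yuv420Layout(640, 480);
        System.out.println("in stride   = " + layout.bgrStride);
        System.out.println("out strides = " + layout.lumaStride + ", "
                + layout.chromaStride + ", " + layout.chromaStride);
        System.out.println("plane sizes = " + layout.lumaSize + ", "
                + layout.chromaSize + ", " + layout.chromaSize);
    }
}
```

For even dimensions this gives the same numbers as above: strides of width, width / 2, width / 2 and plane sizes of lumaPlaneSize and lumaPlaneSize / 4.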

Looking at the source, in particular around line 321, you get that warning message if your system supports AVX2 instructions and the various pointers and sizes are not multiples of 16. The crash is probably occurring because the arrays you pass in, inData, inLineSize, and outLinesize, are not the right size. The pointer arrays need to have at least 3 elements, and the stride arrays need 4. Somewhere in sws_scale it is accessing inData[1] which is outside the bounds of your array resulting in a bad pointer.


How to efficiently display raw rgb int array in javafx?

I have an int array where each value stores a bit-packed RGB value (8 bits per channel), alpha is always 255 (opaque), and I want to display that in JavaFX.
My current approach is using a canvas like this:
GraphicsContext graphics = canvas.getGraphicsContext2D();
PixelWriter pw = graphics.getPixelWriter();
pw.setPixels(0, 0, width, height, PixelFormat.getIntArgbInstance(), pixels, 0, width);
However, before that I have to set the alpha component of each pixel by iterating over the array and OR'ing every value with a mask that turns the pixel from RGB into ARGB, like this:
for (int i = 0; i < pixels.length; i++) {
    pixels[i] = 0xFF000000 | pixels[i];
}
Is there a more efficient way to do this (as the pixels array is updated many times every second)?
I was hoping there would be an IntRgbInstance, but unfortunately there isn't (only ByteRgbInstance).
Other approaches i've tested:
Approach 1: Creating an IntBuffer that is filled like this:
IntBuffer buffer = IntBuffer.allocate(pixels.length * 4);
for (int pixel : pixels) {
    buffer.put(0xFF000000 | pixel);
}
And then generating a PixelBuffer that uses this buffer. The pixel buffer is then used as an input to this WritableImage constructor:
and then I display that WritableImage using an ImageView.
This, however, still didn't speed anything up (it was rather a bit slower), and I'm guessing that is because I have to construct a new WritableImage instance each time the pixels int array is updated.
Approach 2 (which didn't work for some reason, i.e. it displayed nothing on the screen): Creating a buffer the same way as above and using it in one of the setPixels() methods that take a buffer:
IntBuffer buffer = IntBuffer.allocate(pixels.length * 4);
for (int pixel : pixels) {
    buffer.put(0xFF000000 | pixel);
}
pw.setPixels(0, 0, width, height, PixelFormat.getIntArgbInstance(), buffer, width);
After a bit more research I found out that I don't need to create a new WritableImage instance each time the pixels array is updated; I can just use the updateBuffer method here:
So the code currently looks like this:
pb.updateBuffer(callback -> {
    buffer.clear(); // start writing at index 0 again on every update
    for (int pixel : pixels) {
        buffer.put(0xFF000000 | pixel);
    }
    return null;
});
where pb and buffer are created only once, like this:
IntBuffer buffer = IntBuffer.allocate(pixels.length * 4);
PixelBuffer<IntBuffer> pb = new PixelBuffer<>(width, height, buffer, PixelFormat.getIntArgbPreInstance());
view.setImage(new WritableImage(pb));
and this did indeed result in a nice speedup (close to 2x compared to my initial approach).
Maybe this is what you are looking for. You could create a PixelBuffer from an IntBuffer of your data.
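One detail worth checking when filling an IntBuffer this way: after a loop of relative put() calls the buffer's position sits at the end of the written data. That is a likely reason why Approach 2 in the question displayed nothing, and it is also why a reused buffer needs to be reset before every refill. The sketch below uses only java.nio; OpaqueArgb and fill are hypothetical names. Note that IntBuffer.allocate takes a count of ints, not bytes, so one element per pixel is enough.

```java
import java.nio.IntBuffer;

/** Hypothetical helper: refills a reusable IntBuffer with opaque ARGB pixels. */
public final class OpaqueArgb {

    /** Copies packed RGB values into dst as opaque ARGB and leaves dst ready to be read from index 0. */
    public static void fill(int[] rgb, IntBuffer dst) {
        dst.clear();                        // position = 0, limit = capacity
        for (int pixel : rgb) {
            dst.put(0xFF000000 | pixel);    // force alpha to 255
        }
        dst.flip();                         // limit = number written, position = 0
    }

    public static void main(String[] args) {
        int[] pixels = { 0x112233, 0x00FF00 };
        IntBuffer buffer = IntBuffer.allocate(pixels.length); // one int per pixel
        fill(pixels, buffer);
        System.out.printf("first pixel %08X, position %d, remaining %d%n",
                buffer.get(0), buffer.position(), buffer.remaining());
    }
}
```

Because alpha is always 255, the values are identical in the premultiplied and non-premultiplied ARGB formats, so the same buffer contents suit either pixel format.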

copying an image onto another with JOCL/OpenCL

My goal is to use the GPU for my brand-new Java project, which is to create a game and the game engine itself (I think it is a very good way to learn in depth how it works).
I was using multi-threading on the CPU with java.awt.Graphics2D to display my game, but I have observed on other PCs that the game was running below 40 FPS, so I have decided to learn how to use the GPU (I will still render all objects in a for loop and then draw the image on screen).
For that reason, following the OpenCL documentation and the JOCL samples, I started to code a small, simple test, which is to paint a texture onto the background image (let's assume that every entity has a texture).
This method is called in each render call and is given the background, the texture, and the position of the entity as arguments.
Both code listings below have been updated to follow ProjectPhysX's recommendations.
public static void XXX(final BufferedImage output_image, final BufferedImage input_image, float x, float y) {
cl_image_format format = new cl_image_format();
format.image_channel_order = CL_RGBA;
format.image_channel_data_type = CL_UNSIGNED_INT8;
//allocate ouput pointer
cl_image_desc output_description = new cl_image_desc();
output_description.buffer = null; //must be null for 2D image
output_description.image_depth = 0; //is only used if the image is a 3D image
output_description.image_row_pitch = 0; //must be 0 if host_ptr is null
output_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
output_description.num_mip_levels = 0; //must be 0
output_description.num_samples = 0; //must be 0
output_description.image_type = CL_MEM_OBJECT_IMAGE2D;
output_description.image_width = output_image.getWidth();
output_description.image_height = output_image.getHeight();
output_description.image_array_size = output_description.image_width * output_description.image_height;
cl_mem output_memory = clCreateImage(context, CL_MEM_WRITE_ONLY, format, output_description, null, null);
//set up first kernel arg
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(output_memory));
//allocates input pointer
cl_image_desc input_description = new cl_image_desc();
input_description.buffer = null; //must be null for 2D image
input_description.image_depth = 0; //is only used if the image is a 3D image
input_description.image_row_pitch = 0; //must be 0 if host_ptr is null
input_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
input_description.num_mip_levels = 0; //must be 0
input_description.num_samples = 0; //must be 0
input_description.image_type = CL_MEM_OBJECT_IMAGE2D;
input_description.image_width = input_image.getWidth();
input_description.image_height = input_image.getHeight();
input_description.image_array_size = input_description.image_width * input_description.image_height;
DataBufferInt input_buffer = (DataBufferInt) input_image.getRaster().getDataBuffer();
int input_data[] = input_buffer.getData();
cl_mem input_memory = clCreateImage(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, format, input_description, Pointer.to(input_data), null);
//loads the input buffer to the gpu memory
long[] input_origin = new long[] { 0, 0, 0 };
long[] input_region = new long[] { input_image.getWidth(), input_image.getHeight(), 1 };
int input_row_pitch = input_image.getWidth() * Sizeof.cl_uint; //the length of each row in bytes
clEnqueueWriteImage(commandQueue, input_memory, CL_TRUE, input_origin, input_region, input_row_pitch, 0, Pointer.to(input_data), 0, null, null);
//set up second kernel arg
clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(input_memory));
//set up third and fourth kernel args
clSetKernelArg(kernel, 2, Sizeof.cl_float, Pointer.to(new float[] { x }));
clSetKernelArg(kernel, 3, Sizeof.cl_float, Pointer.to(new float[] { y }));
//blocks until all previously queued commands are issued
//enqueue the program execution
long[] globalWorkSize = new long[] { input_description.image_width, input_description.image_height };
clEnqueueNDRangeKernel(commandQueue, kernel, 2, null, globalWorkSize, null, 0, null, null);
//transfer the output result back to host
DataBufferInt output_buffer = (DataBufferInt) output_image.getRaster().getDataBuffer();
int output_data[] = output_buffer.getData();
long[] output_origin = new long[] { 0, 0, 0 };
long[] output_region = new long[] { output_description.image_width, output_description.image_height, 1 };
int output_row_pitch = output_image.getWidth() * Sizeof.cl_uint;
clEnqueueReadImage(commandQueue, output_memory, CL_TRUE, output_origin, output_region, output_row_pitch, 0, Pointer.to(output_data), 0, null, null);
//free pointers
And here is the kernel source that runs on the device:
__kernel void drawImage(__write_only image2d_t dst_image, __read_only image2d_t src_image, float xoff, float yoff)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    int2 in_coords = (int2) { x, y };
    uint4 pixel = read_imageui(src_image, sampler, in_coords);
    pixel = -16184301;
    printf("%d, %d, %u\n", x, y, pixel);
    const int sx = get_global_size(0);
    const int sy = get_global_size(1);
    int2 out_coords = (int2) { ((int) xoff + x) % sx, ((int) yoff + y) % sy};
    write_imageui(dst_image, out_coords, pixel);
}
Without the call to write_imageui, the background is painted black; otherwise it is white.
At the moment, I am struggling to understand why pixel = 0 in the kernel, but I think someone familiar with JOCL would spot my error very quickly. I have stared at this code all day and I don't feel like I will ever catch the mistake myself, so I am asking for your help in reviewing it. I feel like an idiot for not being able to figure it out at this point.
Wrap the output coordinates into the image bounds with
const int sx = get_global_size(0);
const int sy = get_global_size(1);
int2 out_coords = (int2) { ((int) xoff + x) % sx, ((int) yoff + y) % sy };
to avoid errors or undefined behaviour. Right now you are writing into nirvana whenever coordinate + offset falls outside the image region. Also, there is no clEnqueueWriteImage before the kernel is called, so src_image on the GPU is undefined and may contain random values.
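In OpenCL C, only pointer kernel arguments need an address space qualifier such as global. Image arguments take an access qualifier instead, and scalars such as float are passed by value without any qualifier:
__kernel void drawImage(__write_only image2d_t dst_image, __read_only image2d_t src_image, float xoff, float yoff)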
Also, as someone who has written a graphics engine in Java, C++ and GPU-parallelized in OpenCL, let me give you some guidance: In the Java code, you probably use the painter's algorithm: make a list of all drawn objects with their approximate z-coordinates, sort the objects by z-coordinate and draw them back-to-front in a single for-loop. On the GPU, the painter's algorithm won't work, as you cannot parallelize it. Instead you have a list of objects (lines/triangles) in 3D space, and you parallelize over this list: each GPU thread rasterizes a single triangle, all threads at the same time, and they draw the pixels on the frame at the same time. To solve the drawing order problem, you use a z-buffer: an image consisting of a z-coordinate per pixel. During rasterization of the line/triangle, you calculate the z-coordinate for every pixel, and only if it is larger than the one previously in the z-buffer at that pixel do you draw the new color.
Regarding performance: java.awt.Graphics2D is very efficient in terms of CPU usage, you can do ~40k triangles per frame at 60fps. With OpenCL, expect ~30M triangles per frame at 60fps.
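The z-buffer test described above can be sketched on the CPU in a few lines of Java. ZBuffer and plot are hypothetical names, and the sketch follows the convention used in this answer, where a larger z wins. It is single-threaded; a GPU version where many threads hit the same pixel would additionally need an atomic compare-and-update, which is not shown here.

```java
import java.util.Arrays;

/** Hypothetical single-threaded z-buffer: keeps the color of the fragment with the largest z per pixel. */
public final class ZBuffer {
    private final int width;
    private final float[] depth;  // largest z drawn so far at each pixel
    private final int[] color;    // color belonging to that z

    public ZBuffer(int width, int height) {
        this.width = width;
        this.depth = new float[width * height];
        this.color = new int[width * height];
        Arrays.fill(depth, Float.NEGATIVE_INFINITY); // anything drawn first wins
    }

    /** Draws the fragment only if its z is larger than the stored one; returns true if it was drawn. */
    public boolean plot(int x, int y, float z, int argb) {
        int i = y * width + x;
        if (z <= depth[i]) {
            return false;         // hidden behind what is already there
        }
        depth[i] = z;
        color[i] = argb;
        return true;
    }

    public int colorAt(int x, int y) {
        return color[y * width + x];
    }

    public static void main(String[] args) {
        ZBuffer frame = new ZBuffer(4, 4);
        frame.plot(1, 1, 0.2f, 0xFFFF0000);   // red fragment
        frame.plot(1, 1, 0.9f, 0xFF00FF00);   // green fragment with larger z replaces it
        frame.plot(1, 1, 0.5f, 0xFF0000FF);   // blue fragment is rejected
        System.out.printf("%08X%n", frame.colorAt(1, 1));
    }
}
```

With this test the drawing order no longer matters, which is what allows every triangle to be rasterized independently.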

Translate Python function to apply mask to image into Java

I'm trying to translate the following Python function, that applies a mask to an image, into Java:
# Applies an image mask.
def region_of_interest(img, vertices):
    # defining a blank mask to start with
    mask = np.zeros_like(img)
    # defining a 3 channel or 1 channel color to fill the mask with depending on the input image
    if len(img.shape) > 2:
        channel_count = img.shape[2]  # i.e. 3 or 4 depending on your image
        ignore_mask_color = (255,) * channel_count
    else:
        ignore_mask_color = 255
    # filling pixels inside the polygon defined by "vertices" with the fill color
    cv2.fillPoly(mask, vertices, ignore_mask_color)
    # returning the image only where mask pixels are nonzero
    masked_image = cv2.bitwise_and(img, mask)
    return masked_image
So far, this is what I've got:
public static opencv_core.Mat applyMask(opencv_core.Mat image, opencv_core.MatVector vertices) {
    opencv_core.Mat mask = opencv_core.Mat.zeros(image.size(), opencv_core.CV_8U).asMat();
    opencv_core.Scalar color = new opencv_core.Scalar(image.channels()); // 3
    double[] colors = new double[] {
        255.0, 255.0, 255.0, 255.0,
        255.0, 255.0, 255.0, 255.0,
        255.0, 255.0, 255.0, 255.0};
    color.put(colors, 0, colors.length);
    opencv_imgproc.fillPoly(mask, vertices, color);
    opencv_core.Mat dst = new opencv_core.Mat();
    opencv_core.bitwise_and(image, mask, dst);
    return dst;
}
But, it isn't working. When I try invoking this method like in the following example:
opencv_core.MatVector points = new opencv_core.MatVector(
    new opencv_core.Mat(2, 3, opencv_core.CV_32F, new IntPointer(1, 2, 3, 4, 5, 6))
);
opencv_core.MatVector vertices = new opencv_core.MatVector(points);
opencv_core.Mat masked = LaneManager.applyMask(src, vertices);
(I'm assuming this is the right way to build a 2x3 matrix of three points with two coordinates each (1,2), (3, 4) and (5,6))
I get an exception:
java.lang.RuntimeException: std::bad_alloc
at org.bytedeco.javacpp.opencv_imgproc.fillPoly(Native Method)
I'm using OpenCV as provided by org.bytedeco.javacpp-presets:opencv-platform:3.2.0-1.3 via Maven Central.
I must admit that I'm at a loss here: What's the idiomatic Java way of doing the same thing as the Python function above?
Alright, I finally figured it out. If you define your coordinates with:
int[] points = new int[] {x1, y1, x2, y2, ...};
Then you can simply apply a mask with the following code:
opencv_core.Mat mask = new opencv_core.Mat(image.size(), image.type());
// Array of polygons where each polygon is represented as an array of points
opencv_core.Point polygon = new opencv_core.Point();
polygon.put(points, 0, points.length);
opencv_imgproc.fillPoly(mask, polygon, new int[] {points.length / 2}, 1, new opencv_core.Scalar(255, 255, 255, 0));
opencv_core.Mat masked = new opencv_core.Mat(image.size(), image.type());
opencv_core.bitwise_and(image, mask, masked);
Where image is the original image, and masked is the masked result.
The problem was that the original list of points wasn't defined properly.
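For comparison, the same mask-then-AND idea can be sketched with nothing but java.awt. This is not the OpenCV or JavaCPP API; PolygonMask and apply are hypothetical names. It takes the same flat {x1, y1, x2, y2, ...} point layout as above, fills the polygon with opaque white on an all-zero mask, and ANDs the mask with the image pixel by pixel.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical plain-AWT sketch of "fill a polygon mask, then bitwise AND". */
public final class PolygonMask {

    /** points = {x1, y1, x2, y2, ...}; returns a copy of image with everything outside the polygon set to 0. */
    public static BufferedImage apply(BufferedImage image, int[] points) {
        int n = points.length / 2;            // number of vertices
        int[] xs = new int[n];
        int[] ys = new int[n];
        for (int i = 0; i < n; i++) {
            xs[i] = points[2 * i];
            ys[i] = points[2 * i + 1];
        }

        int w = image.getWidth();
        int h = image.getHeight();
        // A new TYPE_INT_ARGB image starts out with every pixel equal to 0.
        BufferedImage mask = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = mask.createGraphics();
        g.setColor(Color.WHITE);              // 0xFFFFFFFF inside the polygon
        g.fillPolygon(xs, ys, n);
        g.dispose();

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out.setRGB(x, y, image.getRGB(x, y) & mask.getRGB(x, y));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(20, 20, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < 20; y++) {
            for (int x = 0; x < 20; x++) {
                src.setRGB(x, y, 0xFF123456);
            }
        }
        BufferedImage masked = apply(src, new int[] { 0, 0, 19, 0, 0, 19 });
        System.out.printf("inside %08X, outside %08X%n",
                masked.getRGB(3, 3), masked.getRGB(18, 18));
    }
}
```

The per-pixel getRGB/setRGB loop is kept for readability; it is not meant as a fast path.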
Maybe some whiz has the complete answer. Here is food for thought:
The Java API is a direct copy of the C++ API.
The error std::bad_alloc occurs when you fail to allocate the required storage space.
There are two C++ methods:
C++: void fillPoly(Mat& img, const Point** pts, const int* npts, int ncontours, const Scalar& color, int lineType=LINE_8, int shift=0, Point offset=Point() ), and
C++: void fillPoly(InputOutputArray img, InputArrayOfArrays pts, const Scalar& color, int lineType=LINE_8, int shift=0, Point offset=Point() )
You don't need to convert from Mat to InputArray; you can (and should) just pass a Mat object where an InputArray is requested.

How to use ScriptIntrinsic3DLUT with a .cube file?

First, I'm new to image processing in Android. I have a .cube file that was "Generated by Resolve" with LUT_3D_SIZE 33. I'm trying to apply this lookup table to an image. I assume that I should use the framework ScriptIntrinsic3DLUT and not a support-library version, correct?
I'm having problems finding sample code to do this so this is what I've pieced together so far. The issue I'm having is how to create an Allocation based on my .cube file?
final RenderScript renderScript = RenderScript.create(getApplicationContext());
final ScriptIntrinsic3DLUT scriptIntrinsic3DLUT = ScriptIntrinsic3DLUT.create(renderScript, Element.U8_4(renderScript));
// How to create an Allocation from .cube file?
//final Allocation allocationLut = Allocation.createXXX();
Bitmap bitmapIn = selectedImage;
Bitmap bitmapOut = selectedImage.copy(bitmapIn.getConfig(),true);
Allocation aIn = Allocation.createFromBitmap(renderScript, bitmapIn);
Allocation aOut = Allocation.createTyped(renderScript, aIn.getType());
Any thoughts?
Parsing the .cube file
First, what you should do is to parse the .cube file.
OpenColorIO shows how to do this in C++. It has some ways to parse the LUT files like .cube, .lut, etc.
For example, FileFormatIridasCube.cpp shows how to
process a .cube file.
You can easily get the size through
LUT_3D_SIZE. I have contacted an image processing algorithm engineer.
This is what he said:
Generally in the industry a 17^3 cube is considered preview, 33^3 normal and 65^3 for highest quality output.
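Note that a .cube file contains 3*LUT_3D_SIZE^3 floats: LUT_3D_SIZE^3 data lines with three floats each.
The key point is what to do with the float array. We cannot copy it as-is into the cube Allocation of ScriptIntrinsic3DLUT, because the script is created with Element.U8_4 and expects one packed value with 8 bits per channel for each LUT entry.
Before doing this we need to convert the float array.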
Handle the data in .cube file
As we know, each RGB component is an 8-bit int if it is 8-bit depth.
R is in the high 8-bit, G is in the middle, and B is in the low 8-bit. In this way, a 24-bit int can contain these
three components at the same time.
In a .cube file, each data line contains 3 floats.
Please note: the blue component goes first!
I reached this conclusion by trial and error. (Perhaps someone can give a more accurate explanation.)
Each float is the component's value as a fraction of 255, so we need to multiply it by 255 to get the real
value of each of the three components:
int getRGBColorValue(float b, float g, float r) {
    int bcol = (int) (255 * clamp(b, 0.f, 1.f));
    int gcol = (int) (255 * clamp(g, 0.f, 1.f));
    int rcol = (int) (255 * clamp(r, 0.f, 1.f));
    return bcol | (gcol << 8) | (rcol << 16);
}
So we can get an integer from each data line, which contains 3 floats.
And finally, we get the integer array, the length of which is LUT_3D_SIZE^3. This array is expected to be
applied to the cube.
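The parsing and packing steps above can be put together roughly as follows. This is a minimal sketch, not the code from the demo: CubeLut, parse and pack are hypothetical names, only LUT_3D_SIZE and the data lines are interpreted, and every other keyword line is ignored. The packing mirrors getRGBColorValue, so the first float of each line lands in the low byte; which color that float represents is taken on trust from the description above.

```java
/** Hypothetical helper: turns the text of a .cube file into one packed int per LUT entry. */
public final class CubeLut {
    public final int size;      // value of LUT_3D_SIZE
    public final int[] packed;  // size^3 entries, in file order

    private CubeLut(int size, int[] packed) {
        this.size = size;
        this.packed = packed;
    }

    /** Clamps v to [0, 1] and scales it to 0..255, like getRGBColorValue does. */
    static int to8Bit(float v) {
        return (int) (255 * Math.max(0f, Math.min(1f, v)));
    }

    /** First float goes to the low byte, second to the middle byte, third to the next byte up. */
    static int pack(float first, float second, float third) {
        return to8Bit(first) | (to8Bit(second) << 8) | (to8Bit(third) << 16);
    }

    public static CubeLut parse(String text) {
        int size = 0;
        int[] packed = null;
        int count = 0;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                       // blank line or comment
            }
            String[] parts = line.split("\\s+");
            if (parts[0].equals("LUT_3D_SIZE") && parts.length == 2) {
                size = Integer.parseInt(parts[1]);
                packed = new int[size * size * size];
            } else if (packed != null && parts.length == 3 && startsLikeNumber(parts[0])) {
                if (count == packed.length) {
                    throw new IllegalArgumentException("more data lines than LUT_3D_SIZE^3");
                }
                packed[count++] = pack(Float.parseFloat(parts[0]),
                        Float.parseFloat(parts[1]), Float.parseFloat(parts[2]));
            }
            // Any other keyword line (for example TITLE) is ignored in this sketch.
        }
        if (packed == null || count != packed.length) {
            throw new IllegalArgumentException("missing LUT_3D_SIZE or incomplete data");
        }
        return new CubeLut(size, packed);
    }

    private static boolean startsLikeNumber(String s) {
        char c = s.charAt(0);
        return Character.isDigit(c) || c == '-' || c == '+' || c == '.';
    }

    public static void main(String[] args) {
        String text = "LUT_3D_SIZE 2\n"
                + "0 0 0\n1 0 0\n0 1 0\n1 1 0\n"
                + "0 0 1\n1 0 1\n0 1 1\n1 1 1\n";
        CubeLut lut = parse(text);
        System.out.println(lut.size + "^3 = " + lut.packed.length + " entries");
    }
}
```

The resulting packed array has LUT_3D_SIZE^3 entries and plays the role of lut[] in the code that follows.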
RsLutDemo shows how to apply ScriptIntrinsic3DLUT.
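RenderScript mRs;
Bitmap mBitmap;
Bitmap mLutBitmap;
ScriptIntrinsic3DLUT mScriptlut;
Bitmap mOutputBitmap;
Allocation mAllocIn;
Allocation mAllocOut;
Allocation mAllocCube;
int redDim, greenDim, blueDim;
int[] lut;

if (mScriptlut == null) {
    mScriptlut = ScriptIntrinsic3DLUT.create(mRs, Element.U8_4(mRs));
}
if (mBitmap == null) {
    mBitmap = BitmapFactory.decodeResource(getResources(), /* resource id elided in the source */);
    mOutputBitmap = Bitmap.createBitmap(mBitmap.getWidth(), mBitmap.getHeight(), mBitmap.getConfig());
    mAllocIn = Allocation.createFromBitmap(mRs, mBitmap);
    mAllocOut = Allocation.createFromBitmap(mRs, mOutputBitmap);
}
// get the expected lut[] from .cube file; redDim, greenDim and blueDim are all LUT_3D_SIZE.
Type.Builder tb = new Type.Builder(mRs, Element.U8_4(mRs));
tb.setX(redDim).setY(greenDim).setZ(blueDim); // the cube Allocation must be three-dimensional
Type t = tb.create();
mAllocCube = Allocation.createTyped(mRs, t);
mAllocCube.copyFromUnchecked(lut);            // upload the packed LUT values
mScriptlut.setLUT(mAllocCube);                // attach the cube before running the script
mScriptlut.forEach(mAllocIn, mAllocOut);
mAllocOut.copyTo(mOutputBitmap);              // copy the result back into the output bitmap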
I have finished a demo to show the work.
You can view it on Github.
With a 3D LUT yes, you have to use the core framework version as there is no support library version of 3D LUT at this time. Your 3D LUT allocation would have to be created by parsing the file appropriately, there is no built in support for .cube files (or any other 3D LUT format).

Efficiently Implementing Java Native Interface Webcam Feed

I'm working on a project that takes video input from a webcam and displays regions of motion to the user. My "beta" attempt at this project was to use the Java Media Framework to retrieve the webcam feed. Through some utility functions, JMF conveniently returns webcam frames as BufferedImages, which I built a significant amount of framework around to process. However, I soon realized that JMF isn't well supported by Sun/Oracle anymore, and some of the higher webcam resolutions (720p) are not accessible through the JMF interface.
I'd like to continue processing frames as BufferedImages, and use OpenCV (C++) to source the video feed. Using OpenCV's framework alone, I've found that OpenCV does a good job of efficiently returning high-def webcam frames and painting them to screen.
I figured it would be pretty straightforward to feed this data into Java and achieve the same efficiency. I just finished writing the JNI DLL to copy this data into a BufferedImage and return it to Java. However, I'm finding that the amount of data copying I'm doing is really hindering performance. I'm targeting 30 FPS, but it takes roughly 100 msec alone to even copy the data from the char array returned by OpenCV into a Java BufferedImage. Instead, I'm seeing about 2-5 FPS.
When returning a frame capture, OpenCV provides a pointer to a 1D char array. This data needs to be provided to Java, and apparently I don't have the time to copy any of it.
I need a better solution to get these frame captures into a BufferedImage. A few solutions I'm considering, none of which I think are very good (fairly certain they would also perform poorly):
(1) Override BufferedImage, and return pixel data from various BufferedImage methods by making native calls to the DLL. (Instead of doing the array copying at once, I return individual pixels as requested by the calling code). Note that calling code typically needs all pixels in the image to paint the image or process it, so this individual pixel-grab operation would be implemented in a 2D for-loop.
(2) Instruct the BufferedImage to use a java.nio.ByteBuffer to somehow directly access data in the char array returned by OpenCV. Would appreciate any tips as to how this is done.
(3) Do everything in C++ and forget Java. Well well, yes this does sound like the most logical solution, however I will not have time to start this many-month project from scratch.
As of now, my JNI code has been written to return the BufferedImage, however at this point I'm willing to accept the return of a 1D char array and then put it into a BufferedImage.
By the way... the question here is: What is the most efficient way to copy a 1D char array of image data into a BufferedImage?
Provided is the (inefficient) code that I use to source image from OpenCV and copy into BufferedImage:
JNIEXPORT jobject JNICALL Java_graphicanalyzer_ImageFeedOpenCV_getFrame
(JNIEnv * env, jobject jThis, jobject camera)
{
    //get the memory address of the CvCapture device, the value of which is encapsulated in the camera jobject
    jclass cameraClass = env->FindClass("graphicanalyzer/Camera");
    jfieldID fid = env->GetFieldID(cameraClass,"pCvCapture","I");
    //get the address of the CvCapture device
    int a_pCvCapture = (int)env->GetIntField(camera, fid);
    //get a pointer to the CvCapture device
    CvCapture *capture = (CvCapture*)a_pCvCapture;
    //get a frame from the CvCapture device
    IplImage *frame = cvQueryFrame( capture );
    //get a handle on the BufferedImage class
    jclass bufferedImageClass = env->FindClass("java/awt/image/BufferedImage");
    if (bufferedImageClass == NULL)
    {
        return NULL;
    }
    //get a handle on the BufferedImage(int width, int height, int imageType) constructor
    jmethodID bufferedImageConstructor = env->GetMethodID(bufferedImageClass,"<init>","(III)V");
    //get the field ID of BufferedImage.TYPE_INT_RGB
    jfieldID imageTypeFieldID = env->GetStaticFieldID(bufferedImageClass,"TYPE_INT_RGB","I");
    //get the int value from the BufferedImage.TYPE_INT_RGB field
    jint imageTypeIntRGB = env->GetStaticIntField(bufferedImageClass,imageTypeFieldID);
    //create a new BufferedImage
    jobject ret = env->NewObject(bufferedImageClass, bufferedImageConstructor, (jint)frame->width, (jint)frame->height, imageTypeIntRGB);
    //get a handle on the method BufferedImage.getRaster()
    jmethodID getWritableRasterID = env->GetMethodID(bufferedImageClass, "getRaster", "()Ljava/awt/image/WritableRaster;");
    //call the BufferedImage.getRaster() method
    jobject writableRaster = env->CallObjectMethod(ret,getWritableRasterID);
    //get a handle on the WritableRaster class
    jclass writableRasterClass = env->FindClass("java/awt/image/WritableRaster");
    //get a handle on the WritableRaster.setPixel(int x, int y, int[] rgb) method
    jmethodID setPixelID = env->GetMethodID(writableRasterClass, "setPixel", "(II[I)V"); //void setPixel(int, int, int[])
    //iterate through the frame we got above and set each pixel within the WritableRaster
    jintArray rgbArray = env->NewIntArray(3);
    jint rgb[3];
    char *px;
    for (jint x=0; x < frame->width; x++)
    {
        for (jint y=0; y < frame->height; y++)
        {
            px = frame->imageData+(frame->widthStep*y+x*frame->nChannels);
            rgb[0] = abs(px[2]); // OpenCV returns BGR bit order
            rgb[1] = abs(px[1]); // OpenCV returns BGR bit order
            rgb[2] = abs(px[0]); // OpenCV returns BGR bit order
            //copy jint array into jintArray
            env->SetIntArrayRegion(rgbArray,0,3,rgb); //take values in rgb and move to rgbArray
            //call setPixel() this is a copy operation
            env->CallVoidMethod(writableRaster,setPixelID,x,y,rgbArray);
        }
    }
    return ret; //return the BufferedImage
}
There is another option if you wish to make your code really fast and still use Java. The AWT windowing toolkit has a direct native interface you can use to draw to an AWT surface using C or C++. Thus, there would be no need to copy anything to Java, as you could render directly from the buffer in C or C++. I am not sure of the specifics on how to do this because I have not looked at it in a while, but I know that it is included in the standard JRE distribution. Using this method, you could probably approach the FPS limit of the camera if you wished, rather than struggling to reach 30 FPS.
If you want to research this further I would start here and here.
Happy Programming!
I would construct the RGB int array required by BufferedImage and then use a single call to
void setRGB(int startX, int startY, int w, int h, int[] rgbArray, int offset, int scansize)
to set the entire image data array at once. Or at least, large portions of it.
Without having timed it, I would suspect that it's the per-pixel calls to
SetIntArrayRegion and setPixel
which are taking the lion's share of the time.
EDIT: It will be likely the method invocations rather than manipulation of memory, per se, that is taking the time. So build data in your JNI code and copy it in blocks or a single hit to the Java image. Once you create and pin a Java int[] you can access it via native pointers. Then one call to setRGB will copy the array into your image.
Note: You do still have to copy the data at least once, but doing all pixels in one hit via 1 function call will be vastly more efficient than doing them individually via 2 x N function calls.
Reviewing my JNI code, I have only ever used byte arrays, but the principles are the same for int arrays. Use:
NewIntArray()
to create an int array, and
GetIntArrayElements()
to pin it and get a pointer, and when you are done,
ReleaseIntArrayElements()
to release it, remembering to use the flag that copies the data back to Java's memory heap (a mode of 0 copies back and frees the native buffer).
Then, you should be able to use your Java int array handle to invoke the setRGB function.
Remember also that this is actually setting ARGB pixels, so 4 channels, including alpha, not just three (the RGB names in Java seem to predate the alpha channel, but most of the so-named methods work with a 32-bit value).
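On the Java side, the bulk approach might look like the sketch below. It assumes the native code hands back the raw frame as a byte[] in B, G, R order with widthStep bytes per row; BgrToImage and toImage are hypothetical names. Each byte is masked with 0xFF to read it as unsigned, since taking abs() of a signed byte would distort values above 127.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: packs a BGR byte frame into an int[] and sets it with one setRGB call. */
public final class BgrToImage {

    /** bgr holds rows of widthStep bytes, 3 bytes per pixel in B, G, R order. */
    public static BufferedImage toImage(byte[] bgr, int width, int height, int widthStep) {
        int[] argb = new int[width * height];
        for (int y = 0; y < height; y++) {
            int row = y * widthStep;               // rows may be padded beyond 3 * width
            for (int x = 0; x < width; x++) {
                int i = row + 3 * x;
                int b = bgr[i] & 0xFF;             // read bytes as unsigned values
                int g = bgr[i + 1] & 0xFF;
                int r = bgr[i + 2] & 0xFF;
                argb[y * width + x] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, width, height, argb, 0, width);  // one call for the whole frame
        return image;
    }

    public static void main(String[] args) {
        // Two pixels in one row, padded to a widthStep of 8 bytes.
        byte[] frame = { 10, 20, 30, (byte) 200, 100, 50, 0, 0 };
        BufferedImage image = toImage(frame, 2, 1, 8);
        System.out.printf("%08X %08X%n", image.getRGB(0, 0), image.getRGB(1, 0));
    }
}
```

The packing loop is plain array arithmetic with no method call per pixel, which is the part that matters for speed here.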
As a secondary consideration, if the only difference between the image data array returned by OpenCV and what is required by Java is the BGR vs RGB, then
px = frame->imageData+(frame->widthStep*y+x*frame->nChannels);
rgb[0] = abs(px[2]); // OpenCV returns BGR bit order
rgb[1] = abs(px[1]); // OpenCV returns BGR bit order
rgb[2] = abs(px[0]); // OpenCV returns BGR bit order
is a relatively inefficient way to convert them. Instead you could do something like:
uint32 px = frame->imageData+(frame->widthStep*y+x*frame->nChannels);
(note my C code is rusty, so this might not be entirely valid, but it shows what is needed).
Managed to speed up the process using an NIO ByteBuffer.
On the C++ JNI side...
JNIEXPORT jobject JNICALL Java_graphicanalyzer_ImageFeedOpenCV_getFrame
(JNIEnv * env, jobject jThis, jobject camera)
{
    IplImage *frame = cvQueryFrame(pCaptureDevice);
    jobject byteBuf = env->NewDirectByteBuffer(frame->imageData, frame->imageSize);
    return byteBuf;
}
and on the Java side...
void getFrame(Camera cam)
{
    ByteBuffer frameData = cam.getFrame(); //NATIVE call
    byte[] imgArray = new byte[frameData.remaining()];
    frameData.get(imgArray); //although it seems like an array copy, this call returns very quickly
    DataBufferByte frameDataBuf = new DataBufferByte(imgArray,imgArray.length);
    //determine image sample model characteristics
    int dataType = DataBuffer.TYPE_BYTE;
    int width = cam.getFrameWidth();
    int height = cam.getFrameHeight();
    int pixelStride = cam.getPixelStride();
    int scanlineStride = cam.getScanlineStride();
    int[] bandOffsets = new int[] {2,1,0}; //BGR
    //create a WritableRaster with the DataBufferByte
    PixelInterleavedSampleModel pism = new PixelInterleavedSampleModel(dataType, width, height, pixelStride, scanlineStride, bandOffsets);
    WritableRaster raster = new ImgFeedWritableRaster( pism, frameDataBuf, new Point(0,0) );
    //create the BufferedImage
    ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
    ComponentColorModel cm = new ComponentColorModel(cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
    BufferedImage newImg = new BufferedImage(cm,raster,false,null);
}
Using the java.nio.ByteBuffer, I can quickly address the char array returned by the OpenCV code without (apparently) doing much gruesome array copying.
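A self-contained variant of the Java side, using only standard classes, could look like this. It is a sketch rather than the code above: FrameWrapper and wrap are hypothetical names, and Raster.createInterleavedRaster stands in for the custom ImgFeedWritableRaster subclass, which is not shown here. The stride values are passed in as parameters instead of being read from a Camera object.

```java
import java.awt.Point;
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.nio.ByteBuffer;

/** Hypothetical helper: wraps BGR frame bytes in a BufferedImage without per-pixel work. */
public final class FrameWrapper {

    public static BufferedImage wrap(ByteBuffer frameData, int width, int height,
                                     int pixelStride, int scanlineStride) {
        byte[] pixels = new byte[frameData.remaining()];
        frameData.get(pixels);                        // one bulk copy out of the buffer
        DataBufferByte buffer = new DataBufferByte(pixels, pixels.length);

        int[] bandOffsets = { 2, 1, 0 };              // red, green, blue live at offsets 2, 1, 0 (BGR)
        WritableRaster raster = Raster.createInterleavedRaster(
                buffer, width, height, scanlineStride, pixelStride, bandOffsets, new Point(0, 0));

        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ComponentColorModel cm = new ComponentColorModel(
                cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        return new BufferedImage(cm, raster, false, null);  // the image shares the byte[]
    }

    public static void main(String[] args) {
        // Two BGR pixels standing in for a frame returned by the native call.
        ByteBuffer frame = ByteBuffer.wrap(new byte[] { 10, 20, 30, (byte) 200, 100, 50 });
        BufferedImage image = wrap(frame, 2, 1, 3, 6);
        System.out.printf("%08X %08X%n", image.getRGB(0, 0), image.getRGB(1, 0));
    }
}
```

The only copy is the bulk get() into the byte[]; the raster and the image then reference that array directly.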

