I was looking to convert a BufferedImage to its corresponding pixel value array. I found some code for that:
public static double[] createArrFromIm(BufferedImage im){
int imWidth = im.getWidth();
int imHeight = im.getHeight();
double[] imArr = new double[imWidth * imHeight];
im.getData().getPixels(0, 0, imWidth, imHeight, imArr);
return imArr;
}
The original author of this code block also provided some sample images that work perfectly with it. However, when I run the block against my own images (always 125x150), it throws an ArrayIndexOutOfBoundsException at the line:
im.getData().getPixels(0, 0, imWidth, imHeight, imArr);
This seems very arcane to me. Any help or suggestion would be much appreciated. Thanks.
As @FiReTiTi says, you should use the getRaster() method instead of the getData() method, unless you really want a copy of the image data.
However, that is not the cause of the exception. The problem is that your double array only allocates space for a single band (similarly, FiReTiTi's version works because he explicitly passes 0 as the last parameter, requesting only the first band). This is fine for single-band (grayscale) images, but I assume you use RGB, CMYK or another color model with multiple bands.
The fix is to multiply the allocated space by the number of bands, as below:
public static double[] createArrFromIm(BufferedImage im) {
int imWidth = im.getWidth();
int imHeight = im.getHeight();
int imBands = im.getRaster().getNumBands(); // typically 3 or 4, depending on RGB or ARGB
double[] imArr = new double[imWidth * imHeight * imBands];
im.getRaster().getPixels(0, 0, imWidth, imHeight, imArr);
return imArr;
}
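Note that getPixels returns the samples pixel-interleaved (all bands of the first pixel, then all bands of the next pixel, and so on). A small sketch of how to index the result, where x, y and b are hypothetical pixel coordinates and a band index:
// sample for pixel (x, y), band b (e.g. b = 0, 1, 2 for R, G, B):
double sample = imArr[(y * imWidth + x) * imBands + b];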
Do it using the raster:
public static double[] createArrFromIm(BufferedImage im){
int imWidth = im.getWidth();
int imHeight = im.getHeight();
double[] imArr = new double[imWidth * imHeight];
for (int y=0, nb=0 ; y < imHeight ; y++)
for (int x=0 ; x < imWidth ; x++, nb++)
imArr[nb] = im.getRaster().getSampleDouble(x, y, 0) ;
return imArr;
}
As pointed out by @haraldK, getData() also works, but it returns a copy of the raster, so it's slower.
There is a faster way using the DataBuffer, but then you have to manage the BufferedImage encoding yourself, because you have direct access to the pixel values.
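For illustration, here is a minimal sketch of that approach; it assumes the image is int-backed (e.g. BufferedImage.TYPE_INT_RGB or TYPE_INT_ARGB), otherwise the cast fails:
// Direct access to the backing buffer: no copy is made, but the values are
// packed pixels (e.g. 0xAARRGGBB for TYPE_INT_ARGB), not separate samples.
int[] pixels = ((java.awt.image.DataBufferInt) im.getRaster().getDataBuffer()).getData();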
And here is why what you did didn't work: im.getData().getPixels() returns the filled array, and if you pass null it allocates an array of the right size for you, so you cannot get the size wrong. The array type you pass just determines the return type. So if you want to use getData() (it's not the best option), you can let it do the allocation:
double[] imArr = im.getData().getPixels(0, 0, imWidth, imHeight, (double[]) null);
My goal is to use the GPU for my new Java project, which is to create a game and the game engine itself (I think it is a very good way to learn in depth how it works).
I was using multi-threading on the CPU with java.awt.Graphics2D to display my game, but I observed on other PCs that the game was running below 40 FPS, so I decided to learn how to use the GPU (I will still render all objects in a for loop and then draw the image on screen).
For that reason, following the OpenCL documentation and the JOCL samples, I started to code a small, simple test that paints a texture onto the background image (let's assume every entity has a texture).
This method is called on every render call, and it is given the background, the texture, and the position of the entity as arguments.
Both code blocks below have been updated to follow @ProjectPhysX's recommendations.
public static void XXX(final BufferedImage output_image, final BufferedImage input_image, float x, float y) {
cl_image_format format = new cl_image_format();
format.image_channel_order = CL_RGBA;
format.image_channel_data_type = CL_UNSIGNED_INT8;
//allocate output pointer
cl_image_desc output_description = new cl_image_desc();
output_description.buffer = null; //must be null for 2D image
output_description.image_depth = 0; //is only used if the image is a 3D image
output_description.image_row_pitch = 0; //must be 0 if host_ptr is null
output_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
output_description.num_mip_levels = 0; //must be 0
output_description.num_samples = 0; //must be 0
output_description.image_type = CL_MEM_OBJECT_IMAGE2D;
output_description.image_width = output_image.getWidth();
output_description.image_height = output_image.getHeight();
output_description.image_array_size = output_description.image_width * output_description.image_height;
cl_mem output_memory = clCreateImage(context, CL_MEM_WRITE_ONLY, format, output_description, null, null);
//set up first kernel arg
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(output_memory));
//allocate input pointer
cl_image_desc input_description = new cl_image_desc();
input_description.buffer = null; //must be null for 2D image
input_description.image_depth = 0; //is only used if the image is a 3D image
input_description.image_row_pitch = 0; //must be 0 if host_ptr is null
input_description.image_slice_pitch = 0; //must be 0 if host_ptr is null
input_description.num_mip_levels = 0; //must be 0
input_description.num_samples = 0; //must be 0
input_description.image_type = CL_MEM_OBJECT_IMAGE2D;
input_description.image_width = input_image.getWidth();
input_description.image_height = input_image.getHeight();
input_description.image_array_size = input_description.image_width * input_description.image_height;
DataBufferInt input_buffer = (DataBufferInt) input_image.getRaster().getDataBuffer();
int input_data[] = input_buffer.getData();
cl_mem input_memory = clCreateImage(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, format, input_description, Pointer.to(input_data), null);
//loads the input buffer to the gpu memory
long[] input_origin = new long[] { 0, 0, 0 };
long[] input_region = new long[] { input_image.getWidth(), input_image.getHeight(), 1 };
int input_row_pitch = input_image.getWidth() * Sizeof.cl_uint; //the length of each row in bytes
clEnqueueWriteImage(commandQueue, input_memory, CL_TRUE, input_origin, input_region, input_row_pitch, 0, Pointer.to(input_data), 0, null, null);
//set up second kernel arg
clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(input_memory));
//set up third and fourth kernel args
clSetKernelArg(kernel, 2, Sizeof.cl_float, Pointer.to(new float[] { x }));
clSetKernelArg(kernel, 3, Sizeof.cl_float, Pointer.to(new float[] { y }));
//blocks until all previously queued commands have completed
clFinish(commandQueue);
//enqueue the program execution
long[] globalWorkSize = new long[] { input_description.image_width, input_description.image_height };
clEnqueueNDRangeKernel(commandQueue, kernel, 2, null, globalWorkSize, null, 0, null, null);
//transfer the output result back to host
DataBufferInt output_buffer = (DataBufferInt) output_image.getRaster().getDataBuffer();
int output_data[] = output_buffer.getData();
long[] output_origin = new long[] { 0, 0, 0 };
long[] output_region = new long[] { output_description.image_width, output_description.image_height, 1 };
int output_row_pitch = output_image.getWidth() * Sizeof.cl_uint;
clEnqueueReadImage(commandQueue, output_memory, CL_TRUE, output_origin, output_region, output_row_pitch, 0, Pointer.to(output_data), 0, null, null);
//free pointers
clReleaseMemObject(input_memory);
clReleaseMemObject(output_memory);
}
And here is the kernel source that runs on the GPU.
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
__kernel void drawImage(__write_only image2d_t dst_image, __read_only image2d_t src_image, float xoff, float yoff)
{
const int x = get_global_id(0);
const int y = get_global_id(1);
int2 in_coords = (int2) { x, y };
uint4 pixel = read_imageui(src_image, sampler, in_coords);
pixel = -16184301;
printf("%d, %d, %u\n", x, y, pixel);
const int sx = get_global_size(0);
const int sy = get_global_size(1);
int2 out_coords = (int2) { ((int) xoff + x) % sx, ((int) yoff + y) % sy};
write_imageui(dst_image, out_coords, pixel);
}
Without the call to write_imageui, the background is painted black; otherwise it is white.
At the moment I am struggling a bit to understand why pixel = 0 in the kernel, but I think someone familiar with JOCL would spot my error very quickly. I have been staring at this code all day and don't feel like I will ever catch the mistake myself, so I am asking for your help reviewing it.
Try
const int sx = get_global_size(0);
const int sy = get_global_size(1);
int2 out_coords = (int2) { ((int) xoff + x) % sx, ((int) yoff + y) % sy };
to avoid errors or undefined behaviour. Right now you are writing into nirvana if the coordinate plus offset is outside the image region. Also, there is no clEnqueueWriteImage before the kernel is called, so src_image on the GPU is undefined and may contain random values.
Also note the OpenCL address-space rules for kernel parameters: buffer (pointer) arguments must be declared in global memory space, e.g. global float* data; image2d_t arguments instead take access qualifiers (__read_only/__write_only), and scalars such as xoff are simply passed by value, as your kernel already does.
Also, as someone who has written a graphics engine in Java and C++ and GPU-parallelized one in OpenCL, let me give you some guidance. In the Java code you probably use the painter's algorithm: make a list of all drawn objects with their approximate z-coordinates, sort the objects by z-coordinate and draw them back-to-front in a single for loop. On the GPU the painter's algorithm won't work, because you cannot parallelize it. Instead you have a list of objects (lines/triangles) in 3D space, and you parallelize over this list: each GPU thread rasterizes a single triangle, all threads at the same time, and they draw their pixels to the frame at the same time. To solve the drawing-order problem you use a z-buffer: an image holding a z-coordinate per pixel. During rasterization of a line/triangle you calculate the z-coordinate for every pixel, and only if it is larger than the one previously in the z-buffer at that pixel do you draw the new color.
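A minimal sketch of that z-buffer test in Java (width, frame and the sign convention are illustrative, not from any particular engine):
// one z value per pixel; reset to Float.NEGATIVE_INFINITY before each frame
float[] zBuffer = new float[width * height];
void plot(int x, int y, float z, int color) {
    int i = y * width + x;
    if (z > zBuffer[i]) { // this fragment is closer than what was drawn before
        zBuffer[i] = z;
        frame[i] = color; // frame is the int[] color buffer of the image
    }
}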
Regarding performance: java.awt.Graphics2D is very efficient in terms of CPU usage; you can do ~40k triangles per frame at 60 fps. With OpenCL, expect ~30M triangles per frame at 60 fps.
Now I am learning about images. I want to copy an image. I tried:
private BufferedImage mImage, mNewImage;
private int mWidth, mHeight;
private int[] mPixelData;
public void generate() {
try {
mImage = ImageIO.read(new File("D:\\Documents\\Pictures\\image.png"));
mWidth = mImage.getWidth();
mHeight = mImage.getHeight();
mPixelData = new int[mWidth * mHeight];
// get pixel data from image
for (int i = 0; i < mHeight; i++) {
for (int j = 0; j < mWidth; j++) {
int rgb = mImage.getRGB(j, i);
int a = rgb >>> 24;
int r = (rgb >> 16) & 0xff;
int g = (rgb >> 8) & 0xff;
int b = rgb & 0xff;
int newRgb = (a << 24 | r << 16 | g << 8 | b);
mPixelData[i * mWidth + j] = newRgb;
}
}
mNewImage = new BufferedImage(mWidth, mHeight, mImage.getType());
WritableRaster raster = (WritableRaster) mNewImage.getData();
raster.setPixels(0, 0, mWidth, mHeight, mPixelData);
File file = new File("D:\\Documents\\Pictures\\image2.png");
ImageIO.write(mNewImage, "png", file);
} catch (IOException e) {
e.printStackTrace();
}
}
But I got an exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 222748 at sun.awt.image.ByteInterleavedRaster.setPixels(ByteInterleavedRaster.java:1108)
The logic in your code is sane, but there are multiple minor issues with the code above, so I'll try to point them out one by one. :-)
Your mPixelData is in packed ARGB layout; this is the same layout as used by BufferedImage.TYPE_INT_ARGB. So you want to use this type, rather than the type of the original image. As you can see from your stack trace, the type of your raster is ByteInterleavedRaster, which is not compatible with your int[] pixels (another issue that may arise from using the original type is that it may be TYPE_CUSTOM, which can't be created using this constructor). So, first change:
mNewImage = new BufferedImage(mWidth, mHeight, BufferedImage.TYPE_INT_ARGB);
(Note: you will still get an IndexOutOfBoundsException after this change; I'll return to that later.)
BufferedImage.getData() will give you a copy of the pixel data, rather than a reference to the current data. So setting the pixels on this copy will have no effect on the data being written to disk later. Instead, use the getRaster() method, which does exactly what you want:
WritableRaster raster = mNewImage.getRaster();
The Raster.setPixels(x, y, w, h, pixels) method expects an array containing one sample per array element (A, R, G and B as separate samples). This means the length of your array is only one fourth of what the method expects, and this is finally the cause of the exception you see. Instead, as your array is in int-packed ARGB layout (which is the native layout of the type you now use), you should use the setDataElements method:
raster.setDataElements(0, 0, mWidth, mHeight, mPixelData);
Finally, I'd just like to point out that all the bit shifting in your loop simply unpacks each pixel into its components (A, R, G and B) and then packs them back together again... so newRgb == rgb in this case. But perhaps you are planning to add color manipulation here later, in which case it makes sense. :-)
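For example, a hypothetical manipulation (a simple luminance grayscale) would slot into the loop like this:
// integer approximation of 0.299*R + 0.587*G + 0.114*B (the weights sum to 256)
int gray = (r * 77 + g * 151 + b * 28) >> 8;
int newRgb = (a << 24) | (gray << 16) | (gray << 8) | gray;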
PS: If all you want is an exact copy of the original image, the fastest way to do it is probably:
ColorModel cm = mImage.getColorModel();
WritableRaster raster = (WritableRaster) mImage.getData(); // Here we want a copy of the original image
mNewImage = new BufferedImage(cm, raster, cm.isAlphaPremultiplied(), null);
I have three arrays: x, y and value.
For each pair (x, y), f(x, y) = value.
I did not understand how to use the BicubicSplineInterpolator class.
I need to find values for different x and y.
Here is a link to the class
http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/analysis/interpolation/BicubicSplineInterpolator.html
Thanks in advance.
From the docs, BicubicSplineInterpolator requires the data points to form a grid. Therefore you should provide an array of x values (length m), an array of y values (length n) and a matrix of function values (size m x n).
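A sketch of the expected shapes (the numbers are made up):
double[] x = {0, 1, 2, 3};       // m = 4 grid positions along x
double[] y = {0, 10, 20};        // n = 3 grid positions along y
double[][] f = new double[4][3]; // f[i][j] = value at (x[i], y[j]), an m x n grid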
I agree the docs are quite counter-intuitive. To make things worse, BicubicSplineInterpolator is buggy (see https://issues.apache.org/jira/browse/MATH-1166); use BicubicInterpolator instead, as documented in the changes report at http://commons.apache.org/proper/commons-math/changes-report.html.
The interpolation works with a regular grid (e.g. a 3x3 grid, and you have a value at each grid point).
You need to define the grid positions. (In this example we have 0, 128, 256 in both the x and y dimensions. These are just numbers, though; on x you could have e.g. temperature, and on y humidity, with different ranges.)
Then define a matrix with the actual values at each grid point and create an interpolator, which returns an interpolating function that can calculate the value at any (x, y).
import org.apache.commons.math3.analysis.BivariateFunction;
import org.apache.commons.math3.analysis.interpolation.BicubicInterpolator;
import org.apache.commons.math3.analysis.interpolation.BivariateGridInterpolator;

final double[] xValues = new double[] {0, 128, 256};
final double[] yValues = new double[] {0,128,256};
final double[][] fValues = new double[][] {{1, 0, 1},
{0, 0, 1},
{0, 0, 1}};
final BivariateGridInterpolator interpolator = new BicubicInterpolator();
final BivariateFunction function = interpolator.interpolate(xValues, yValues, fValues);
for (int y=0;y<255;y++) {
for (int x=0;x<255;x++) {
double value=function.value(x, y);
// do something with this
}
}
I've had this old graphics project lying around (written in Oberon), and since I wrote it as one of my first projects, it looks kind of chaotic.
So I decided that, since I'm bored anyway, I would rewrite it in Java.
Everything so far seems to work... until I try to rotate and/or do my eye-point transformation.
If I skip those operations, the image comes out just fine, but the moment I try any operation that requires multiplying a point with a transformation matrix, it all goes bad.
The eye-point transformation generates absurdly small numbers, with end coordinates like [-0.002027571306540029, 0.05938634628270456, -123.30022583847628].
This causes the resulting image to look empty, but if I multiply each point by 1000 it turns out it is just very, very small and, instead of being rotated, has just been translated in some (seemingly) random direction.
If I then ignore the eye point and simply focus on my rotations, the results are also pretty strange (note: the image auto-scales depending on the range of coordinates):
Setting xRotation to 90° makes the image very narrow and way too high (the resolution should be about 1000x1000 but is then 138x1000).
Setting yRotation to 90° makes it very wide (1000x138).
Setting zRotation to 90° simply seems to translate the image all the way to the right side of the screen.
What I have checked so far:
I have checked and re-checked my rotation matrices at least 15 times now, so they are (probably) correct.
A test multiplication of a vector with the identity matrix does return the original vector.
My matrices are initialized as identity matrices prior to being used as rotation matrices.
The angles in the files are in degrees but are converted to radians when read.
Having said that, I have two more notes:
A vector in this case is a simple 3-value array of doubles (representing the x, y and z values).
A matrix is a 4x4 array of doubles initialized as the identity matrix.
When transforming the points I do it in this order:
scale (multiplying by a scale factor)
rotate along x-axis
rotate along y-axis
rotate along z-axis
translate
do eye-point transformation
then, if the point isn't already on the z-plane, project it
like so:
protected void rotate() throws ParseException
{
Matrix rotate_x = Transformations.x_rotation(rotateX);
Matrix rotate_y = Transformations.y_rotation(rotateY);
Matrix rotate_z = Transformations.z_rotation(rotateZ);
Matrix translate = Transformations.translation(center.x(), center.y(), center.z());
for(Vector3D point : points)
{
point = Vector3D.mult(point, scale);
point = Vector3D.mult(point, rotate_x);
point = Vector3D.mult(point, rotate_y);
point = Vector3D.mult(point, rotate_z);
point = Vector3D.mult(point, translate);
point = Vector3D.mult(point, eye);
if(point.z() != 0)
{
point.setX(point.x()/(-point.z()));
point.setY(point.y()/(-point.z()));
}
checkMinMax(point);
}
}
Here's the code that initializes the transformation matrices, if you're interested:
public static Matrix eye_transformation(Vector3D eye)throws ParseException
{
double r = eye.length();
double teta = Math.atan2(eye.y(), eye.x());
double zr = eye.z()/r;
double fi = Math.acos(zr);
Matrix v = new Matrix();
v.set(0, 0, -Math.sin(teta));
v.set(0, 1, -Math.cos(teta) * Math.cos(fi));
v.set(0, 2, Math.cos(teta) * Math.sin(fi));
v.set(1, 0, Math.cos(teta));
v.set(1, 1, -Math.sin(teta) * Math.cos(fi));
v.set(1, 2, Math.sin(teta) * Math.sin(fi));
v.set(2, 1, Math.sin(fi));
v.set(2, 2, Math.cos(fi));
v.set(3, 2, -r);
return v;
}
public static Matrix z_rotation(double angle) throws ParseException
{
Matrix v = new Matrix();
v.set(0, 0, Math.cos(angle));
v.set(0, 1, Math.sin(angle));
v.set(1, 0, -Math.sin(angle));
v.set(1, 1, Math.cos(angle));
return v;
}
public static Matrix x_rotation(double angle) throws ParseException
{
Matrix v = new Matrix();
v.set(1, 1, Math.cos(angle));
v.set(1, 2, Math.sin(angle));
v.set(2, 1, -Math.sin(angle));
v.set(2, 2, Math.cos(angle));
return v;
}
public static Matrix y_rotation(double angle) throws ParseException
{
Matrix v = new Matrix();
v.set(0, 0, Math.cos(angle));
v.set(0, 2, -Math.sin(angle));
v.set(2, 0, Math.sin(angle));
v.set(2, 2, Math.cos(angle));
return v;
}
public static Matrix translation(double a, double b, double c) throws ParseException
{
Matrix v = new Matrix();
v.set(3, 0, a);
v.set(3, 1, b);
v.set(3, 2, c);
return v;
}
And here is the actual method that multiplies a point with a transformation matrix (note: NR_DIMS is defined as 3).
public static Vector3D mult(Vector3D lhs, Matrix rhs) throws ParseException
{
if(rhs.get(0, 3)!=0 || rhs.get(1, 3)!=0 || rhs.get(2, 3)!=0 || rhs.get(3, 3)!=1)
throw new ParseException("the matrix multiplication thingy just borked");
Vector3D ret = new Vector3D();
double[] vec = new double[NR_DIMS];
double[] temp = new double[NR_DIMS+1];
temp[0] = lhs.x;
temp[1] = lhs.y;
temp[2] = lhs.z;
temp[3] = lhs.infty? 0:1;
for (int i = 0; i < NR_DIMS; i++)
{
vec[i] = 0;
// Multiply the original vector with the i-th column of the matrix.
for (int j = 0; j <= NR_DIMS; j++)
{
vec[i] += temp[j] * rhs.get(j,i);
}
}
ret.x = vec[0];
ret.y = vec[1];
ret.z = vec[2];
ret.infty = lhs.infty;
return ret;
}
I've checked and re-checked this code against my old code (note: the old code works), and it's identical when it comes to the operations.
So I'm at a loss here. I did look around for similar questions, but they didn't really provide any useful information.
Thanks :)
Small addition:
If I ignore both the eye point and the rotations (so I only project the image), it comes out perfectly fine.
I can see that the image is complete, apart from the rotations.
Any more suggestions?
A few possible mistakes I can think of:
In the constructor of Matrix, you are not loading the identity matrix.
You are passing your angles in degrees instead of radians.
Does your eye-projection matrix project into a different range than you think? I mean, in OpenGL all projection matrices should project onto the rectangle [(-1,-1),(1,1)]. This rectangle represents the screen.
Mixing up premultiplication and postmultiplication. That is: I usually do matrix*vector, whereas in your code you seem to be doing vector*matrix, if I'm not mistaken (see the sketch after this list).
Mixing up columns and rows in your Matrix?
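To illustrate the pre-/postmultiplication point: the two conventions are transposes of each other, so mixing them up silently transposes every transformation. A quick numeric check with a non-symmetric matrix:
double[][] M = {{1, 2}, {3, 4}};
double[] v = {1, 0};
// column-vector convention, M * v: {1*1 + 2*0, 3*1 + 4*0} = {1, 3}
// row-vector convention, v * M:    {1*1 + 0*3, 1*2 + 0*4} = {1, 2}
// the two agree only if M is also transposed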
I'm going to take another look at your question tomorrow. Hopefully, one of these suggestions helps you.
EDIT: I overlooked that you had already checked the first two items.
Alright, I'm currently feeling like a complete idiot. The issue was a simple logic error.
The error sits in this part of the code:
for(Vector3D point : points)
{
point = Vector3D.mult(point, scale);
point = Vector3D.mult(point, rotate_x);
point = Vector3D.mult(point, rotate_y);
point = Vector3D.mult(point, rotate_z);
point = Vector3D.mult(point, translate);
point = Vector3D.mult(point, eye);
if(point.z() != 0)
{
point.setX(point.x()/(-point.z()));
point.setY(point.y()/(-point.z()));
}
checkMinMax(point);
}
I forgot that assigning to the for-each variable point only rebinds my local reference to a new Vector3D; it does not replace the element inside the list, so all the transformed points were being discarded.
So what I have done is simply write each transformed point back into the list, replacing the old entry, as sketched below.
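A sketch of that fix, assuming points is a List<Vector3D> (an indexed loop instead of the for-each):
for (int i = 0; i < points.size(); i++)
{
    Vector3D point = points.get(i);
    point = Vector3D.mult(point, scale);
    // ...the remaining transformations as before...
    points.set(i, point); // write the transformed point back into the list
}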
Problem solved.
I have 2 Mat objects, overlay and background.
How do I put my overlay Mat on top of my background Mat such that only the non-transparent pixels of the overlay Mat completely obscure the background Mat?
I have tried addWeighted(), which combines the two Mats, but both "layers" are still visible.
The overlay Mat has an alpha channel while the background Mat does not.
Each pixel in the overlay Mat is either fully transparent or fully opaque.
Both Mats are of the same size.
The function addWeighted() won't work, since it applies the same alpha value to all the pixels. To do exactly what you are saying, replacing the background only where the overlay is non-transparent, you can create a small function for that, like this:
cv::Mat blending(cv::Mat& overlay, cv::Mat& background){
//must have same size for this to work
assert(overlay.cols == background.cols && overlay.rows == background.rows);
cv::Mat result = background.clone();
for (int i = 0; i < result.rows; i++){
for (int j = 0; j < result.cols; j++){
cv::Vec4b pix = overlay.at<cv::Vec4b>(i,j);
if (pix[3] != 0){ // alpha 0 = fully transparent; anything else is drawn
result.at<cv::Vec3b>(i,j) = cv::Vec3b(pix[0], pix[1], pix[2]);
}
}
}
return result;
}
In OpenCV (as is the general convention), an alpha value of 0 means fully transparent and 255 means fully opaque, hence the pix[3] != 0 check above: only the non-transparent overlay pixels replace the background.
If you want to use the value of the alpha channel as a rate to blend, then change it a little to this:
cv::Mat blending(cv::Mat& overlay, cv::Mat& background){
//must have same size for this to work
assert(overlay.cols == background.cols && overlay.rows == background.rows);
cv::Mat result = background.clone();
for (int i = 0; i < result.rows; i++){
for (int j = 0; j < result.cols; j++){
cv::Vec4b pix = overlay.at<cv::Vec4b>(i,j);
double alphaRate = 1.0 - pix[3]/255.0;
result.at<cv::Vec3b>(i,j) = (1.0 - alphaRate) * cv::Vec3b(pix[0], pix[1], pix[2]) + result.at<cv::Vec3b>(i,j) * alphaRate;
}
}
return result;
}
Sorry for the code being in C++ and not in Java, but I think you can get the idea. Basically it is just a loop over the pixels, changing the pixels in the copy of the background to those of the overlay where they are not transparent.
* EDIT *
I will answer your comment with this edit, since it may take some space. The problem is how the OpenCV matrix works. For an image with alpha, the data is organized as an array like BGRA BGRA ... BGRA, and the basic operations like add and multiply work on matrices with the same dimensions. You can always separate the channels with split (this rewrites the matrix, so it may be slow), convert the alpha channel to a floating-point type (again, a rewrite) and then do the multiplications and additions on whole matrices. That should be faster, since OpenCV optimizes these functions... you can also do this on the GPU...
Something like this (note: cv::Mat's * operator is matrix multiplication, so the per-element blend has to use mul(), channel by channel):
cv::Mat blending(cv::Mat& overlay, cv::Mat& background){
std::vector<cv::Mat> channels, bgChannels, outChannels(3);
cv::split(overlay, channels);
cv::split(background, bgChannels);
cv::Mat alpha;
channels[3].convertTo(alpha, CV_32F, 1.0 / 255.0); //alpha in [0,1]
cv::Mat invAlpha = 1.0 - alpha;
for (int c = 0; c < 3; c++){ //per-element blend: alpha*overlay + (1-alpha)*background
cv::Mat over, back;
channels[c].convertTo(over, CV_32F);
bgChannels[c].convertTo(back, CV_32F);
cv::Mat blended = over.mul(alpha) + back.mul(invAlpha);
blended.convertTo(outChannels[c], CV_8U);
}
cv::Mat result;
cv::merge(outChannels, result);
return result;
}
Converting the channels to float first avoids mixing CV_8U and floating-point depths in a single expression. Whether this ends up faster than the explicit loop is something to measure... but it may be.
Also, the loop-based versions have no problem with threads, so they can be parallelized. Running this in release mode will greatly increase the speed too, since OpenCV's .at function does several asserts that are skipped in release mode. Not sure if this can be changed in Java, though...
I was able to port api55's edited answer to Java:
private void merge(Mat background, Mat overlay) {
List<Mat> backgroundChannels = new ArrayList<>();
Core.split(background, backgroundChannels);
List<Mat> overlayChannels = new ArrayList<>();
Core.split(overlay, overlayChannels);
// compute "alphaRate = 1 - overlayAlpha / 255"
Mat overlayAlphaChannel = overlayChannels.get(3);
Mat alphaRate = new Mat(overlayAlphaChannel.size(), overlayAlphaChannel.type());
Core.divide(overlayAlphaChannel, new Scalar(255), alphaRate);
Core.absdiff(alphaRate, new Scalar(1), alphaRate);
for (int i = 0; i < 3; i++) {
// compute "(1 - alphaRate) * overlay"
Mat overlayChannel = overlayChannels.get(i);
Mat temp = new Mat(alphaRate.size(), alphaRate.type());
Core.absdiff(alphaRate, new Scalar(1), temp);
Core.multiply(temp, overlayChannel, overlayChannel);
temp.release();
// compute "background * alphaRate"
Mat backgroundChannel = backgroundChannels.get(i);
Core.multiply(backgroundChannel, alphaRate, backgroundChannel);
// compute the merged channel
Core.add(backgroundChannel, overlayChannel, backgroundChannel);
}
alphaRate.release();
Core.merge(backgroundChannels, background);
}
It is a lot faster than the double nested loop calculation.
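For reference, a usage sketch (the file names are placeholders); the overlay must be loaded with Imgcodecs.IMREAD_UNCHANGED so its alpha channel is kept:
Mat background = Imgcodecs.imread("background.jpg", Imgcodecs.IMREAD_COLOR); // 3 channels (BGR)
Mat overlay = Imgcodecs.imread("overlay.png", Imgcodecs.IMREAD_UNCHANGED); // 4 channels (BGRA)
merge(background, overlay); // the blended result ends up in background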