Make a BufferedImage use less RAM?

I have a Java program that reads a JPEG file from the hard drive and uses it as the background image for various other things. The image itself is stored in a BufferedImage object like so:
BufferedImage background;
background = ImageIO.read(file);
This works great - the problem is that the BufferedImage object itself is enormous. For example, a 215k jpeg file becomes a BufferedImage object that's 4 megs and change. The app in question can have some fairly large background images loaded, but whereas the jpegs are never more than a meg or two, the memory used to store the BufferedImage can quickly exceed 100s of megabytes.
I assume all this is because the image is being stored in ram as raw RGB data, not compressed or optimized in any way.
Is there a way to have it store the image in ram in a smaller format? I'm in a situation where I have more slack on the CPU side than RAM, so a slight performance hit to get the image object's size back down towards the jpeg compression would be well worth it.

In one of my projects I down-sample the image on the fly as it is being read from an ImageInputStream. The down-sampling reduces the dimensions of the image to a required width and height without requiring expensive resizing computations or modification of the image on disk.
Because I down-sample the image to a smaller size, it also significantly reduces the processing power and RAM required to display it. For extra optimization, I render the buffered image in tiles also... But that's a bit outside the scope of this discussion. Try the following:
public static BufferedImage subsampleImage(
ImageInputStream inputStream,
int x,
int y,
IIOReadProgressListener progressListener) throws IOException {
BufferedImage resampledImage = null;
Iterator<ImageReader> readers = ImageIO.getImageReaders(inputStream);
if(!readers.hasNext()) {
throw new IOException("No reader available for supplied image stream.");
}
ImageReader reader = readers.next();
ImageReadParam imageReaderParams = reader.getDefaultReadParam();
reader.setInput(inputStream);
Dimension d1 = new Dimension(reader.getWidth(0), reader.getHeight(0));
Dimension d2 = new Dimension(x, y);
int subsampling = (int)scaleSubsamplingMaintainAspectRatio(d1, d2);
imageReaderParams.setSourceSubsampling(subsampling, subsampling, 0, 0);
reader.addIIOReadProgressListener(progressListener);
resampledImage = reader.read(0, imageReaderParams);
reader.removeAllIIOReadProgressListeners();
return resampledImage;
}
public static long scaleSubsamplingMaintainAspectRatio(Dimension d1, Dimension d2) {
long subsampling = 1;
if(d1.getWidth() > d2.getWidth()) {
subsampling = Math.round(d1.getWidth() / d2.getWidth());
} else if(d1.getHeight() > d2.getHeight()) {
subsampling = Math.round(d1.getHeight() / d2.getHeight());
}
return subsampling;
}
To get the ImageInputStream from a File, use:
ImageIO.createImageInputStream(new File("C:\\image.jpeg"));
As you can see, this implementation respects the image's original aspect ratio as well. You can optionally register an IIOReadProgressListener so that you can keep track of how much of the image has been read so far. This is useful for showing a progress bar if the image is being read over a network, for instance. It is not required, though; you can just specify null.
Why is this of particular relevance to your situation? It never holds the full-resolution image in memory, only as many pixels as are needed to display it at the desired resolution. It works really well for huge images, even those that are tens of MB on disk.

"I assume all this is because the image is being stored in ram as raw RGB data, not compressed or optimized in any way."
Exactly. A 1920x1200 JPG may fit in, say, 300 KB on disk, while in memory, in a typical RGB + alpha layout with 8 bits per component (hence 32 bits per pixel), it occupies:
1920 x 1200 x 32 / 8 = 9 216 000 bytes
so your 300 KB file becomes a picture needing just over 9 MB of RAM (note that depending on the type of images you're using from Java and depending on the JVM and OS this may sometimes be GFX-card RAM).
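The arithmetic above can be checked against a real image. This is a minimal sketch that measures only the raster's backing array (object headers and any accelerated copies are ignored); the class and method names are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;

final class ImageMemory {
    /** Approximate bytes held by the pixel buffer of an image. */
    static long pixelBytes(BufferedImage image) {
        DataBuffer buffer = image.getRaster().getDataBuffer();
        // Bits per array element: 8 for byte, 16 for short, 32 for int buffers.
        long bitsPerElement = DataBuffer.getDataTypeSize(buffer.getDataType());
        return (long) buffer.getSize() * buffer.getNumBanks() * bitsPerElement / 8;
    }

    public static void main(String[] args) {
        BufferedImage argb = new BufferedImage(1920, 1200, BufferedImage.TYPE_INT_ARGB);
        System.out.println(pixelBytes(argb) + " bytes"); // prints "9216000 bytes"
    }
}
```

Running the same method on the image returned by ImageIO.read shows what a particular JPEG actually costs once decoded.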
If you want to use a picture as the background of a 1920x1200 desktop, you probably don't need a picture bigger than that in memory (unless you want some special effect, like sub-rgb decimation / color anti-aliasing / etc.).
So you have two choices:
make your files less wide and less tall (in pixels) on disk
reduce the image size on the fly
I typically go with number 2 because reducing file size on hard disk means you're losing details (a 1920x1200 picture is less detailed than the "same" at 3840x2400: you'd be "losing information" by downscaling it).
Now, Java kinda sucks big time at manipulating pictures that big (from a performance point of view, a memory usage point of view, and a quality point of view [*]). Back in the day I'd call ImageMagick from Java to resize the picture on disk first, and then load the resized image (say, fitting my screen's size).
Nowadays there are Java bridges / APIs to interface directly with ImageMagick.
[*] There is NO WAY you're downsizing an image using Java's built-in API as fast and with a quality as good as the one provided by ImageMagick, for a start.

Do you have to use BufferedImage? Could you write your own Image implementation that stores the jpg bytes in memory, converts them to a BufferedImage as necessary, and then discards it?
This applied with some display aware logic (rescale the image using JAI before storing in your byte array as jpg), will make it faster than decoding the large jpg every time, and a smaller footprint than what you currently have (processing memory requirements excepted).
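A minimal sketch of that idea, assuming the standard ImageIO JPEG plugin is good enough for the encode and decode steps. CompressedImage is a hypothetical helper, not an existing class, and it does not implement java.awt.Image; it only shows the "keep the bytes, decode when painting" part.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Keeps only the compressed bytes on the heap and decodes on demand. */
final class CompressedImage {
    private final byte[] encoded;

    CompressedImage(byte[] encoded) {
        this.encoded = encoded.clone();
    }

    /** Encodes an image (for example after rescaling it for the display) as JPEG bytes. */
    static CompressedImage fromImage(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "jpg", out)) {
            throw new IOException("No JPEG writer accepted this image type");
        }
        return new CompressedImage(out.toByteArray());
    }

    int encodedSize() {
        return encoded.length;
    }

    /** Decodes a fresh BufferedImage; the caller drops the reference after painting. */
    BufferedImage decode() throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(encoded));
        if (image == null) {
            throw new IOException("Unsupported image data");
        }
        return image;
    }
}
```

The trade-off is the one the question accepts: every decode() costs CPU time, while the resident footprint stays close to the JPEG size.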

Use imgscalr:
http://www.thebuzzmedia.com/software/imgscalr-java-image-scaling-library/
Why?
Follows best practices
Stupid simple
Interpolation, Anti-aliasing support
So you aren't rolling your own scaling library
Code:
BufferedImage thumbnail = Scalr.resize(image, 150);
or
BufferedImage thumbnail = Scalr.resize(image, Scalr.Method.SPEED, Scalr.Mode.FIT_TO_WIDTH, 150, 100, Scalr.OP_ANTIALIAS);
Also, use image.flush() on your larger image after conversion to help with the memory utilization.

File size of the JPG on disk is completely irrelevant.
The pixel dimensions of the file are. If your image is 15 Megapixels expect it to require crap load of RAM to load a raw uncompressed version.
Re-size your image dimensions to be just what you need and that is the best you can do without going to a less rich colorspace representation.
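If a less rich colour representation is acceptable, one option is to redraw the image into a 16-bit buffer, which halves the pixel storage compared with a 32-bit ARGB image. A minimal sketch (the class and method names are illustrative; alpha is lost and colours are quantised to 5-6-5 bits):

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class CompactImages {
    /** Copies an image into a 16-bit 5-6-5 RGB buffer (no alpha, reduced colour depth). */
    static BufferedImage toRgb565(BufferedImage source) {
        BufferedImage compact = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_USHORT_565_RGB);
        Graphics2D g = compact.createGraphics();
        try {
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return compact;
    }
}
```

Whether the banding this introduces is visible depends on the picture; smooth gradients suffer most.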

You could copy the pixels of the image to another buffer and see if that occupies less memory than the BufferedImage object. Probably something like this:
BufferedImage background = new BufferedImage(
width,
height,
BufferedImage.TYPE_INT_RGB
);
int[] pixels = background.getRaster().getPixels(
0,
0,
imageBuffer.getWidth(),
imageBuffer.getHeight(),
(int[]) null
);

Related

How to handle huge data/images in RAM in Java?

Summary
I am reading a large binary file which contains image data.
Cumulative Count Cut analysis is performed on data [It requires another array with same size as the image].
The data is stretched between 0 to 255 stored in BufferedImage pixel by pixel, to draw the image on JPanel.
On this image, zooming is performed using AffineTransform.
Problems
Small Image(<.5GB)
1.1 When I am increasing the scale factor for performing zooming, after a
point exception is thrown:-
java.lang.OutOfMemoryError: Java heap space.
Below is the code used for zooming-
scaled = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
Graphics2D g2d = (Graphics2D)scaled.createGraphics();
AffineTransform transformer = new AffineTransform();
transformer.scale(scaleFactor, scaleFactor);
g2d.setTransform(transformer);
Large Image(>1.5GB)
While loading a huge image (>1.5GB), the same exception occurs as in 1.1. Even if the image is small enough to be loaded, I sometimes get the same error.
Solutions Tried
I tried using BigBufferedImage in place of BufferedImage to store the stretched data. BigBufferedImage image = BigBufferedImage.create(newCol,newRow, BufferedImage.TYPE_INT_ARGB);
But it couldn't be passed to g2d.drawImage(image, 0, 0, this);
because the repaint method of JPanel just stops for some reason.
I tried loading the image in low resolution, where a pixel is read and then a few columns and rows are skipped. But the image size varies, so I am unable to decide how many pixels to skip, i.e. how to choose the jump parameter.
MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY,0, inChannel.size());
buffer.order(ByteOrder.LITTLE_ENDIAN);
FloatBuffer floatBuffer = buffer.asFloatBuffer();
for (int i = 0, k = 0; i < nrow; i = i + jump) /* jump is the value to be skipped, nrow is height of image */
{
    for (int j = 0, l = 0; j < ncol; j = j + jump) // ncol is width of image
    {
        index = (i * ncol) + j;
        oneDimArray[(k * ncolLessRes) + l] = floatBuffer.get(index); // oneDimArray is initialised to size of Low Resolution image.
        l++;
    }
    k++;
}
The problem is to decide how many column and row to skip i.e what value of jump should be set.
I tried setting Xmx but image size varies and we cannot dynamically set the Xmx values.
Here are some values -
Image Size   Xmx     Xms     Problem
83Mb         512m    256m    working
83Mb         3096m   2048m   System hanged
3.84Gb       512m    256m    java.lang.OutOfMemoryError: Java heap space
3.84Gb       3096m   512m    java.lang.OutOfMemoryError: Java heap space
For this I tried finding memory allocated to program:-
try (BufferedWriter bw = new BufferedWriter(new FileWriter(dtaFile, true))) {
    Runtime runtime = Runtime.getRuntime();
    runtime.gc();
    double oneMB = Math.pow(2, 20);
    // Allocate a large array so that its effect on the heap counters is visible.
    long[] arr = IntStream.range(0, (int) (10.432 * Long.BYTES * Math.pow(2, 20))).asLongStream().toArray();
    runtime.gc();
    long freeMemory = runtime.freeMemory();
    long totalMemory = runtime.totalMemory();
    long usedMemory = totalMemory - freeMemory;
    long maxMemory = runtime.maxMemory();
    String fileLine = String.format(" %9.3f %9.3f %9.3f %9.3f%n", usedMemory / oneMB, freeMemory / oneMB, totalMemory / oneMB, maxMemory / oneMB);
    bw.write(fileLine);
}
The following results were obtained:
[Figure: Memory Allocation]
This approach failed because the available memory increases as per usage of my code. As a result it will not be useful for me to make a decision for jump.
Result Expected
A way to access the amount of available memory before the loading of the image so that I could use it to make decision on value of the jump. Is there any other alternative to decide jump value (i.e., how much I can lower the resolution?).

You can read the specific portion of an image, then scale it with reduced resolution for display purpose.
So in your case you can read the image in chunk (read image portions just like we read the data from db row by row)
For example:
// Define the portion / row size 50px or 100px
int rowHeight = 50;
int rowsToScan = imageHeight / rowHeight;
if(imageHeight % rowHeight > 0) rowsToScan++;
int x = 0;
int y = 0;
int w = imageWidth;
int h = rowHeight;
ArrayList<BufferedImage> scaledImagePortions = new ArrayList<>();
for(int i = 1; i <= rowsToScan; i++) {
// Read the portion of an image scale it
// and push the scaled version in lets say array
BufferedImage scalledPortionOfImage = this.getScaledPortionOfImage(img, x, y, w, h);
scaledImagePortions.add(scalledPortionOfImage);
y = (rowHeight * i);
}
// Create single image out of scaled images portions
Thread which can help you get a portion of an image: Read region from very large image file in Java
Thread which can help you scale the image (my quick search result :) ): how to resize Image in java?
Thread which can help you merge the buffered images: Merging two images
You can always tweak the snippets :)

OutOfMemoryError is self-explanatory: you are out of memory. That does not mean the physical RAM in your machine, but rather that the JVM has hit the upper memory allocation limit set by the -Xmx setting.
Your Xmx testing makes little sense, as you are trying to put a 3.8 GB image into a 512 MB memory block. It cannot work: you cannot put 10 liters of water in a 5-liter bottle. For memory usage you need at least the size of the image x3, as you are storing every pixel separately and each consists of 3 bytes (RGB). And that is just for the pure image data. On top of that come the whole app and its object-structure overhead, plus additional space required for computation, and probably plenty more that I didn't mention or am not even aware of.
You don't want to "dynamically set" -Xmx. Set it to the maximum possible value on your system (trial and error). The JVM will not take that much memory unless it needs it. With additional -X settings you can tell the JVM to free up unused memory, so you don't have to worry about unused memory being "frozen" by the JVM.
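Because the ceiling set by -Xmx does not move while the program runs, Runtime.maxMemory() gives a stable number to plan against, unlike freeMemory() and totalMemory(). The sketch below picks the smallest jump whose subsampled image fits a byte budget. The class name, the method name and the "one quarter of the heap" policy are assumptions for illustration, not something from the question.

```java
final class JumpChooser {
    /**
     * Smallest step s >= 1 such that an image subsampled by s in both
     * directions needs at most budgetBytes for its pixels.
     */
    static int chooseJump(long rows, long cols, int bytesPerPixel, long budgetBytes) {
        if (rows <= 0 || cols <= 0 || bytesPerPixel <= 0 || budgetBytes < bytesPerPixel) {
            throw new IllegalArgumentException("dimensions and budget must be positive");
        }
        int jump = 1;
        while (true) {
            long outRows = (rows + jump - 1) / jump; // ceil(rows / jump)
            long outCols = (cols + jump - 1) / jump; // ceil(cols / jump)
            if (outRows * outCols * bytesPerPixel <= budgetBytes) {
                return jump;
            }
            jump++;
        }
    }

    public static void main(String[] args) {
        // Hypothetical policy: spend at most a quarter of the -Xmx ceiling on pixel data.
        long budget = Runtime.getRuntime().maxMemory() / 4;
        System.out.println(chooseJump(40_000, 30_000, Float.BYTES, budget));
    }
}
```

The ceil divisions match a loop of the form `for (i = 0; i < nrow; i += jump)`, which visits ceil(nrow / jump) rows. Any extra arrays of the same size (such as the one used for the cumulative count cut) need to be counted in bytesPerPixel or taken out of the budget.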
I have never worked on image processing applications. Are Photoshop or Gimp capable of opening such big images and doing something useful with them? If so, maybe you should look there for clues about processing that much data.
If the point above is naive because you need this for scientific purposes (and that is not what Photoshop or Gimp are made for, unless you are a flat-earther :) ), you will need scientific-grade hardware.
One thing that comes to mind is not to read the image into memory at all but to process it on the fly. This could reduce memory consumption to the order of megabytes.
Take a closer look at the ImageReader API, as it suggests (readTile method) that it might be possible to read only an area of the image (e.g. for zooming in).
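As a rough illustration of reading only an area, the sketch below uses ImageReadParam.setSourceRegion rather than readTile, since it works for any rectangle. It only applies to formats that have an ImageIO reader plugin (PNG, JPEG and so on), so it would not read a raw float dump like the one in the question; RegionReader is a made-up name.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class RegionReader {
    /** Decodes only the given rectangle of the first image in the stream. */
    static BufferedImage readRegion(ImageInputStream input, Rectangle region) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
        if (!readers.hasNext()) {
            throw new IOException("No reader available for supplied image stream.");
        }
        ImageReader reader = readers.next();
        try {
            reader.setInput(input);
            ImageReadParam param = reader.getDefaultReadParam();
            param.setSourceRegion(region); // only this area ends up in the result
            return reader.read(0, param);
        } finally {
            reader.dispose();
        }
    }
}
```

An ImageInputStream for a file can be obtained with ImageIO.createImageInputStream(file); combining setSourceRegion with setSourceSubsampling on the same ImageReadParam gives a zoomed-out view of a region.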

Images in java: most efficient float representation?

I need to do some image manipulation in java. I am porting python code, which uses numpy arrays with dimensions cols, rows, channels; these are floating point. I know how to get RGB out of a BufferedImage and how to put it back; this question is about how to lay out the resulting float image.
Here are some of the options:
direct translation:
float[][][] img = new float[cols][rows][channels];
put channels first:
float[][][] img = new float[channels][cols][rows];
combine indexes:
float[] img = new float[rows*cols*channels];
img[ i * cols * channels + j * channels + k ] = ...;
Option 1 has the advantage that it reads the same as the original code; but it seems non-idiomatic for Java, and probably not fast.
Option 2 should be faster, if I understand how Java N-dimensional arrays work under the hood; at the cost of looking slightly odd. It seems this allocates channels*cols arrays of size rows, as opposed to option 1 which allocates rows*cols arrays of size channels (a very large number of tiny arrays = large overhead).
Option 3 seems to be closest to what the AWT and other Java code does; but it requires passing around the dimensions (they are not built into the array) and it is very easy to get the indexing wrong (especially when there is other index arithmetic going on).
Which of these is better and why? What are some of the other pros and cons? Is there an even better way?
UPDATE
I benchmarked options 1 and 2, on a non-trivial example of image processing which runs four different algorithms (in a 10x loop, so the VM gets to warm up). This is on OpenJDK 7 on Ubuntu, Intel i5 cpu. Surprisingly, there isn't much of a speed difference: option 2 is about 6% slower than option 1. There is a pretty large difference in amount of memory garbage-collected (using java -verbose:gc): option 1 collects 1.32 GB of memory during the entire run, while option 2 collects only 0.87 GB (not quite half, but then again not all images used are color). I wonder how much difference there will be in Dalvik?

BoofCV has float image types and the raw pixel data can manipulated directly. See the tutorial.
BoofCV provides several routines for quickly converting a BufferedImage into different BoofCV image types. Converting to and from BufferedImage with these routines is very fast.
Convert a BufferedImage to a multispectral float type image with BoofCV:
// bufferedImage is the source java.awt.image.BufferedImage
MultiSpectral<ImageFloat32> image =
ConvertBufferedImage.convertFromMulti(bufferedImage,null,true,ImageFloat32.class);
Access pixel value from the float image array:
float value = image.getBand(i).data[ image.startIndex + y*image.stride + x];
Another way to get and set the pixel value:
float f = image.getBand(i).get(x, y);
...
image.getBand(i).set(x, y, f);
Where i represents the index of the color channel.
Convert a BoofCV image back to BufferedImage:
BufferedImage output =
new BufferedImage(image.width, image.height, BufferedImage.TYPE_4BYTE_ABGR);
output = ConvertBufferedImage.convertTo(
image, output, true);

You are right, option 3 has a smaller memory footprint.
As for which performs better, you'd have to profile and/or benchmark the options.
Given your statement that row and column counts are large, I'd go with option 3, but wrap the array in a class that knows the dimensions, e.g. called Image.
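The wrapper suggested above might look roughly like this. It is a minimal sketch: the name FloatImage is used instead of Image only to avoid clashing with java.awt.Image, and the row-major, channel-interleaved layout is the one from option 3 in the question.

```java
/** Flat float storage (option 3) that carries its own dimensions. */
final class FloatImage {
    final int rows;
    final int cols;
    final int channels;
    private final float[] data;

    FloatImage(int rows, int cols, int channels) {
        if (rows <= 0 || cols <= 0 || channels <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.channels = channels;
        // multiplyExact fails loudly instead of silently overflowing for huge images.
        this.data = new float[Math.multiplyExact(Math.multiplyExact(rows, cols), channels)];
    }

    /** Same arithmetic as i * cols * channels + j * channels + k, with bounds checks. */
    int index(int row, int col, int channel) {
        if (row < 0 || row >= rows || col < 0 || col >= cols || channel < 0 || channel >= channels) {
            throw new IndexOutOfBoundsException(row + "," + col + "," + channel);
        }
        return (row * cols + col) * channels + channel;
    }

    float get(int row, int col, int channel) {
        return data[index(row, col, channel)];
    }

    void set(int row, int col, int channel, float value) {
        data[index(row, col, channel)] = value;
    }
}
```

Keeping the index arithmetic in one method addresses the "easy to get the indexing wrong" concern; hot loops can still compute a base offset once per row if the bounds checks show up in a profile.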

Option 3 is what BufferedImage uses in Java. It's good for memory, as Andreas said, but for image processing and information continuity it's not optimal.
The most practical would be:
float[][] img = new float[channels][cols*rows];
That way the channels are separated and can be processed independently. This representation would be optimal if you want to call native code.

Efficient way to send an image over socket in Java

I'm a bit of a Java noob, and I have read some basics about sockets and I can successfully send images over a socket using ImageIO, but I want to reduce the amount of data that is sent. Ultimately I want the image (screen capture) to be sent as fast as possible with the smallest possible file size.
Right now, I have ImageIO set up as such:
DataInputStream in=new DataInputStream(client.getInputStream());
DataOutputStream out = new DataOutputStream(client.getOutputStream());
ImageIO.write(captureImg(),"JPG",client.getOutputStream());
And the receiver:
BufferedImage img=ImageIO.read(ImageIO.createImageInputStream(server.getInputStream()));
File outputfile = new File("Screen"+(date.toString())+".jpg");
ImageIO.write(img, "jpg", outputfile);
In case you're wondering, this is my method that is used to take the image.
Rectangle screenRect = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
BufferedImage capture = new Robot().createScreenCapture(screenRect);
I have heard about Byte arrays, where you can send the bytes then draw the image at the other end. However I'm not sure if this is more efficient.
Any help would be greatly appreciated, please ask if you would like me to add any extra info or code for the byte array!
Thanks.
EDIT: Patrick:
ByteArrayOutputStream bScrn = new ByteArrayOutputStream();
ImageIO.write(captureImg(), "JPG", bScrn);
byte imgBytes[] = bScrn.toByteArray();
out.write((Integer.toString(imgBytes.length)).getBytes());
out.write(imgBytes,0,imgBytes.length);

There already has been an extensive discussion in the comments, but to summarize a few points that I find important:
You have a trade-off between several criteria:
Minimize network traffic
Minimize CPU load
Maximize image quality
You can reduce the network traffic with a high image compression. But this will increase the CPU load and might reduce the image quality.
Whether it reduces the image quality depends on the compression type: For JPG, you can make the image arbitrarily small, but the quality of the image will then be ... well, arbitrarily bad. For PNG, the image quality will stay the same (since it is a lossless compression), but the CPU load and the resulting image size may be greater.
The option of ZIPping the image data was also mentioned. It is true that ZIPping the JPG or PNG data of an image will hardly reduce the amount of data (because the data already is compressed). But compressing the raw image data can be a feasible option, as an alternative to JPG or PNG.
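For the "compress the raw image data" variant, a minimal sketch using java.util.zip could look like the following. The framing (width and height as two ints, then one int per pixel) is an assumption made for this example, not a standard format, and RawDeflate is a made-up name.

```java
import java.awt.image.BufferedImage;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

final class RawDeflate {
    /** Deflates width, height and the packed ARGB pixels of an image, row by row. */
    static byte[] deflatePixels(BufferedImage image) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new DeflaterOutputStream(bytes)))) {
            int w = image.getWidth();
            int h = image.getHeight();
            out.writeInt(w);
            out.writeInt(h);
            int[] row = new int[w];
            for (int y = 0; y < h; y++) {
                image.getRGB(0, y, w, 1, row, 0, w); // one scanline at a time
                for (int argb : row) {
                    out.writeInt(argb);
                }
            }
        } // closing the stream finishes the deflate block
        return bytes.toByteArray();
    }
}
```

The receiver would wrap its input in an InflaterInputStream and a DataInputStream and read the values back in the same order. Screens with large flat areas tend to shrink a lot this way; photographic content usually does not.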
Which compression technique (JPG, PNG or ZIP) is appropriate also depends on the image content: JPG is more suitable for "natural" images, like photos or rendered images. These can withstand a high compression without causing artefacts. For artificial images (like line drawings), it will quickly cause undesirable artefacts, particularly at sharp edges or when the image contains text. In contrast to that: When the image contains large areas with a single color, then a compression like PNG (or ZIP) can reduce the image size due to the "run length compression" nature of these compression methods.
I already made some experiments for such an image transfer quite a while ago, and implemented it in a way that easily allowed tweaking and tuning these parameters and switching between the different methods, and comparing the speed for different application cases. But off the top of my head, I cannot give a thorough summary of the results.
BTW: Depending on what you actually want to transfer, you could consider obtaining the image data with a different technique than Robot#createScreenCapture(Rectangle). This method is well-known to be distressingly slow. For example, when you want to transfer a Swing application, you could let your application directly paint into an image. Roughly with a pattern like
BufferedImage image = new BufferedImage(w,h,type);
Graphics g = image.getGraphics();
myMainFrame.paint(g);
g.dispose();
(This is only a sketch, to show the basic idea!)
Additionally, you could consider further options for increasing the "perceived speed" of such an image transfer. For example, you could divide your image into tiles, and transfer these tiles one after another. The receiver will possibly appreciate it if the image is at least partially visible as quickly as possible. This idea could be extended further, for example by detecting which tiles have really changed between two frames and transferring only these changed tiles. (This approach could be extended and implemented in a rather sophisticated way, by detecting the "minimum regions" that have to be transferred.)
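The changed-tile idea can be sketched as follows. This is a simple illustration under the assumption that both frames have the same size; TileDiff and changedTiles are made-up names, and a real implementation would probably compare raster data directly rather than going through getRGB.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TileDiff {
    /** Returns the tiles (clipped to the image bounds) whose pixels differ between two frames. */
    static List<Rectangle> changedTiles(BufferedImage previous, BufferedImage current, int tileSize) {
        int w = current.getWidth();
        int h = current.getHeight();
        if (tileSize <= 0 || previous.getWidth() != w || previous.getHeight() != h) {
            throw new IllegalArgumentException("frames must match and tileSize must be positive");
        }
        List<Rectangle> changed = new ArrayList<>();
        for (int y = 0; y < h; y += tileSize) {
            for (int x = 0; x < w; x += tileSize) {
                int tw = Math.min(tileSize, w - x);
                int th = Math.min(tileSize, h - y);
                int[] before = previous.getRGB(x, y, tw, th, null, 0, tw);
                int[] after = current.getRGB(x, y, tw, th, null, 0, tw);
                if (!Arrays.equals(before, after)) {
                    changed.add(new Rectangle(x, y, tw, th));
                }
            }
        }
        return changed;
    }
}
```

Each returned rectangle can then be cut out with BufferedImage.getSubimage and sent together with its x and y offset so the receiver knows where to paint it.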
However, for the case that you first want to play around with the most obvious tuning parameter: Here is a method that allows writing a JPG image with a quality value between 0.0 and 1.0 into an output stream:
public static void writeJPG(
BufferedImage bufferedImage,
OutputStream outputStream,
float quality) throws IOException
{
Iterator<ImageWriter> iterator =
ImageIO.getImageWritersByFormatName("jpg");
ImageWriter imageWriter = iterator.next();
ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
imageWriteParam.setCompressionQuality(quality);
ImageOutputStream imageOutputStream =
new MemoryCacheImageOutputStream(outputStream);
imageWriter.setOutput(imageOutputStream);
IIOImage iioimage = new IIOImage(bufferedImage, null, null);
imageWriter.write(null, iioimage, imageWriteParam);
imageOutputStream.flush();
}

fail reducing Image size using google app engine images java API

I want to reduce image size (in KB) when its size is larger than 1MB.
when I apply the resize transformation with smaller width and smaller height the size of the transformed image (in bytes) is larger than the orig image.
The funny (or sad) part is even when I invoke the resize with the same width and height as the orig (i.e. dimensions are not changed) the size "transformed" image is larger than the orig
final byte[] origData = .....;
final ImagesService imagesService = ImagesServiceFactory.getImagesService();
final Image origImage = ImagesServiceFactory.makeImage(origData);
System.out.println("orig dimensions is " + origImage.getWidth() + " X " + origImage.getHeight());
final Transform resize = ImagesServiceFactory.makeResize(origImage.getWidth(), origImage.getHeight());
final Image newImage = imagesService.applyTransform(resize, origImage);
final byte[] newImageData = newImage.getImageData();
//newImageData.length > origData.length :-(

Image coding has some special characteristics whose results you are observing. When you decode an image from its (file) representation, you generate a lot of pixels. The subsequent encoding only sees the pixels and knows nothing about the size of your original file. Therefore the encoding step is crucial to get right.
The common JPEG format has a quality setting because it does lossy compression. (PNG is lossless: its compression level changes the file size and encoding time, not the image quality.) In general, images with a lot of detail (sharp edges) should be compressed with high quality and blurry images with low quality; as you have probably seen, small images are usually more blurry and large images usually sharper.
Without going into the technical details, this means that you should set the quality level according to the nature of your image, which is also determined by the size of the input image. In other words, if you encode a blurry image as a big file, you are wasting space, since you would get about the same result using fewer bytes. But the encoder does not have this information, so you have to configure it using the correct quality setting.
Edit: In your case, manually set a low quality for encoding if you started with a small file (compared to the number of pixels), and a high quality if the opposite is true. Experiment a little; a single quality setting for all photos will probably be acceptable.

A pitfall I fell into was that I requested PNG output ... and the image size didn't change either. The image service silently ignored the quality parameter. According to a comment in the implementation, the quality parameter is considered only for JPEG.

Loading PNGs into OpenGL performance issues - Java & JOGL much slower than C# & Tao.OpenGL

I am noticing a large performance difference between Java & JOGL and C# & Tao.OpenGL when both loading PNGs from storage into memory, and when loading that BufferedImage (java) or Bitmap (C# - both are PNGs on hard drive) 'into' OpenGL.
This difference is quite large, so I assumed I was doing something wrong, however after quite a lot of searching and trying different loading techniques I've been unable to reduce this difference.
With Java I get an image loaded in 248ms and loaded into OpenGL in 728ms
The same on C# takes 54ms to load the image, and 34ms to load/create texture.
The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite, however the large difference is still there (and confusing).
On the Java side the code looks like this:
BufferedImage image = ImageIO.read(new File(fileName));
texture = TextureIO.newTexture(image, false);
texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR);
texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR);
The C# code uses:
Bitmap t = new Bitmap(fileName);
t.RotateFlip(RotateFlipType.RotateNoneFlipY);
Rectangle r = new Rectangle(0, 0, t.Width, t.Height);
BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID);
Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR);
t.UnlockBits(bd);
t.Dispose();
After quite a lot of testing I can only come to the conclusion that Java/JOGL is just slower here - PNG reading might not be as quick, or that I'm still doing something wrong.
Thanks.
Edit2:
I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it.
Edit3: Benchmark results for 5 variations.
I wrote a small benchmarking tool, the following results come from loading a set of 33 pngs, most are very wide, 5 times.
testStart: ImageIO.read(file) -> TextureIO.newTexture(image)
result: avg = 10250ms, total = 51251
testStart: ImageIO.read(bis) -> TextureIO.newTexture(image)
result: avg = 10029ms, total = 50147
testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage)
result: avg = 5343ms, total = 26717
testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage)
result: avg = 5534ms, total = 27673
testStart: TextureIO.newTexture(file)
result: avg = 10395ms, total = 51979
ImageIO.read(bis) refers to the technique described in James Branigan's answer below.
argbImage refers to the technique described in my previous edit:
img = ImageIO.read(file);
argbImg = new BufferedImage(img.getWidth(), img.getHeight(), TYPE_INT_ARGB_PRE);
g = argbImg.createGraphics();
g.drawImage(img, 0, 0, null);
texture = TextureIO.newTexture(argbImg, false);
Any more methods of loading (either images from file, or images to OpenGL) would be appreciated, I will update these benchmarks.

Short Answer
The JOGL texture classes do quite a bit more than necessary, and I guess that's why they are slow. I ran into the same problem a few days ago, and have now fixed it by loading the texture with the low-level API (glGenTextures, glBindTexture, glTexParameterf, and glTexImage2D). The loading time decreased from about 1 second to "no noticeable delay", but I haven't done any systematic profiling.
Long Answer
If you look into the documentation and source code of the JOGL TextureIO, TextureData and Texture classes, you notice that they do quite a bit more than just uploading the texture onto the GPU:
Handling of different image formats
Alpha premultiplication
I'm not sure which one of these is taking more time. But in many cases you know what kind of image data you have available, and don't need to do any premultiplication.
The alpha premultiplication feature is anyway completely misplaced in this class (from a software architecture perspective), and I didn't find any way to disable it. Even though the documentation claims that this is the "mathematically correct way" (I'm actually not convinced about that), there are plenty of cases in which you don't want to use alpha premultiplication, or have done it beforehand (e.g. for performance reasons).
After all, loading a texture with the low-level API is quite simple unless you need it to handle different image formats. Here is some scala code which works nicely for all my RGBA texture images:
val textureIDList = new Array[Int](1)
gl.glGenTextures(1, textureIDList, 0)
gl.glBindTexture(GL.GL_TEXTURE_2D, textureIDList(0))
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR)
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR)
val dataBuffer = image.getRaster.getDataBuffer // image is a java.awt.image.BufferedImage (loaded from a PNG file)
val buffer: Buffer = dataBuffer match {
case b: DataBufferByte => ByteBuffer.wrap(b.getData)
case _ => null
}
gl.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, image.getWidth, image.getHeight, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, buffer)
...
gl.glDeleteTextures(1, textureIDList, 0)

I'm not sure that it will completely close the performance gap, but you should be able to use the ImageIO.read method that takes a InputStream and pass in a BufferedInputStream wrapping a FileInputStream. This should greatly reduce the number of native file I/O calls that the JVM has to perform. It would look like this:
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis, 8192); //8K reads
BufferedImage image = ImageIO.read(bis);

Have you looked into JAI (Java Advanced Imaging) by any chance? It implements native acceleration for tasks such as PNG compression/decompression. The Java implementation of PNG decompression may be the issue here. Which version of the JVM are you using?
I work with applications which load and render thousands of textures, for this I use a pure Java implementation of DDS format - available with NASA WorldWind. DDS Textures load into GL faster since it is understood by the graphics card.
I appreciate your benchmarking and would like to use your experiments to test out DDS load times. Also tweak the memory available to JAI and JVM to allow loading of more segments and decompression.

Actually, I load my textures in JOGL like this:
TextureData data = TextureIO.newTextureData(stream, false, fileFormat);
Texture2D tex = new Texture2D(...); // contains glTexImage2D
tex.bind(g);
tex.uploadData(g, 0, data); // contains glTexSubImage2D
Loading textures this way bypasses the extra work of constructing a BufferedImage and interpreting it.
It's pretty fast for me.
You can profile it. I'm waiting for your results.

you can also try loading the Texture directly from a BufferedImage
There is an example here.
Using this you can see if the image load is taking the time, or the write to Create / Video Memory.
You may also want to think about making the image dimensions powers of 2 (16, 32, 64, 128, 256, 512, 1024, ...); some graphics cards cannot process non-power-of-2 sizes, and you will get blank textures on those cards.
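A small helper for checking and rounding up texture dimensions might look like this. It is only a sketch with made-up names; whether padding is needed at all depends on the OpenGL version and the card.

```java
final class TextureSizes {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Smallest power of two that is greater than or equal to n. */
    static int nextPowerOfTwo(int n) {
        if (n <= 0 || n > (1 << 30)) {
            throw new IllegalArgumentException("n must be between 1 and 2^30");
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }
}
```

For the 7200x255 sprite sheet from the question this gives 8192x256, which may in turn exceed the maximum texture size of older cards, one more reason to cut the sheet into smaller pieces.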
