I am very new to Java and am trying to use Mathematica's Java interface to access a file using memory mapping (in the hope of a performance improvement).
The Mathematica code I have is (I believe) equivalent to the following Java code (based on this):
import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
public class MainClass {
private static final int LENGTH = 8*100;
public static void main(String[] args) throws Exception {
MappedByteBuffer buffer = new FileInputStream("test.bin").getChannel().map(FileChannel.MapMode.READ_ONLY, 0, LENGTH);
buffer.load();
buffer.isLoaded(); // returns false, why?
}
}
I would like to use the array() method on buffer, so I am trying to load the buffer's contents into memory first using load(). However, even after load(), isLoaded() returns false, and buffer.array() throws an exception:
java.lang.UnsupportedOperationException
at java.nio.ByteBuffer.array(ByteBuffer.java:940)
Why doesn't the buffer load and how can I call the array() method?
My ultimate aim here is to get an array of doubles using asDoubleBuffer().array(). The method getDouble() does work correctly, but I was hoping to get this done in one go for good performance. What am I doing wrong?
As I am doing this from Mathematica, I'll post the actual Mathematica code I used too (equivalent to the above in Java):
Needs["JLink`"]
LoadJavaClass["java.nio.channels.FileChannel$MapMode"]
buffer = JavaNew["java.io.FileInputStream", "test.bin"]#getChannel[]#map[FileChannel$MapMode`READUONLY, 0, 8*100]
buffer#load[]
buffer#isLoaded[] (* returns False *)
According to the Javadoc:
"The content of a mapped byte buffer can change at any time, for example if the content of the corresponding region of the mapped file is changed by this program or another. Whether or not such changes occur, and when they occur, is operating-system dependent and therefore unspecified.
All or part of a mapped byte buffer may become inaccessible at any time, for example if the mapped file is truncated. An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time. It is therefore strongly recommended that appropriate precautions be taken to avoid the manipulation of a mapped file by this program, or by a concurrently running program, except to read or write the file's content."
To me that seems like too many conditions and too much undesirable behavior. Do you specifically need this class?
If you just need to read the file contents in the fastest way, give this a try:
FileChannel fChannel = new FileInputStream(f).getChannel();
byte[] barray = new byte[(int) f.length()];
ByteBuffer bb = ByteBuffer.wrap(barray);
bb.order(ByteOrder.LITTLE_ENDIAN);
fChannel.read(bb);
It works at speed almost equal to disk system test speed.
For doubles you can use a DoubleBuffer (with a double[] array of size f.length()/8, since a double takes 8 bytes) or just call the getDouble(int) method of ByteBuffer.
In Java:
final byte[] hb; // Non-null only for heap buffers
So the backing array is not implemented for MappedByteBuffer at all, but it is for HeapByteBuffer.
In Android:
/**
 * Child class implements this method to realize {@code array()}.
 *
 * @see #array()
 */
abstract byte[] protectedArray();
And again, it is not in MappedByteBuffer, but ByteArrayBuffer, for example, does implement the backing array.
@Override byte[] protectedArray() {
if (isReadOnly) {
throw new ReadOnlyBufferException();
}
return backingArray;
}
The point of a memory map is to be off heap. A backing array would be on the heap.
If you can get the FileChannel from a RandomAccessFile and then call map on the channel, you can also use the bulk get() method on the MappedByteBuffer to read into a byte[]. This copies the data from off heap back onto the heap without a separate read() call.
buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
byte[] b = new byte[buffer.limit()];
buffer.get(b);
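The same bulk-copy idea also covers the original goal of getting a double[]. The following is only a sketch under some assumptions: the class name MappedDoubles and the method readDoubles are made up for illustration, the file is assumed to hold raw 8-byte doubles in a known byte order, and the file is assumed to be small enough (under 2 GB) for a single mapping.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
class MappedDoubles {
    /** Maps the whole file read-only and copies its contents into a new double[]. */
    static double[] readDoubles(Path file, ByteOrder order) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(order);
            // A mapped buffer is a direct buffer: hasArray() is false and array() throws,
            // so the values are copied out with a bulk get instead.
            double[] values = new double[buffer.remaining() / Double.BYTES];
            buffer.asDoubleBuffer().get(values);
            return values;
        }
    }
}
```

A call such as MappedDoubles.readDoubles(Path.of("test.bin"), ByteOrder.LITTLE_ENDIAN) would return all values in one go; the byte order has to match whatever wrote the file.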
I have a Spring Boot application that converts XML to JSON. The conversion is done by calling another Java application internally and passing some information to it. The implementation of the called application is out of my control, and I cannot make any changes to it.
The called Java application requires an OutputStream, so I create a ByteArrayOutputStream and pass it to the method. After receiving the output, I convert the OutputStream to a String. During this conversion I get the warning Inefficient conversion from ByteArrayOutputStream.
I would like to know how to fix this warning. I researched a bit and found that we need to pass the size of the ByteArrayOutputStream, but in my case I do not know how large it can be because that depends on the size of the input XML. So I cannot predict it and set it.
Can someone please guide me on what I can do with the ByteArrayOutputStream in my Spring Boot application to fix the warning that I receive in the IntelliJ IDE:
Inefficient conversion from ByteArrayOutputStream
Following is my code sample:
final InputStream inputStream = new ByteArrayInputStream(xmlEvents.getBytes(StandardCharsets.UTF_8));
final var output = new ByteArrayOutputStream();
new Converter().convert(inputStream, new Handler<>(new Validator(), new StreamCollector(output)));
return new String(output.toByteArray());
I am getting the warning for the line:
new String(output.toByteArray())
The explanation for this warning is that
new String(output.toByteArray());
creates a byte[] from the contents of the ByteArrayOutputStream, then creates a String from the byte[]. That is doing an unnecessary copy of the data.
The fix suggested by IntelliJ is:
output.toString(StandardCharsets.UTF_8)
which creates the String in a single operation without creating an intermediate byte[]. This overload is available since Java 10.
How does it do this?
Well, toString() passes the ByteArrayOutputStream's internal byte[] buffer to the String constructor. By contrast, output.toByteArray() copies the buffer to a new byte[] ... so that the caller cannot interfere with the actual buffer's contents.
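To make the difference concrete, here is a small side-by-side sketch. The class and method names (OutputToString, viaByteArray, direct) are made up for illustration; both methods should return the same String, and only the second avoids the extra copy. It assumes Java 10 or later for toString(Charset).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for comparison only.
class OutputToString {
    /** Copies the stream's contents into a new byte[] first, then decodes that copy. */
    static String viaByteArray(ByteArrayOutputStream output) {
        return new String(output.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Decodes directly from the stream's internal buffer (Java 10+). */
    static String direct(ByteArrayOutputStream output) {
        return output.toString(StandardCharsets.UTF_8);
    }
}
```

Note that passing the charset explicitly also avoids depending on the platform default charset, which new String(output.toByteArray()) without a charset argument does.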
The usual problem in Java is that you have to resort to a hack to get memory-mapped files unmapped properly - see here for the 14-year-old bug report ;)
But on Android there seem to be no solutions in pure Java, only via the NDK. Is this true? If yes, are there any pointers to an open source solution with Android/Java bindings?
There is no hack available under Android.
But there are a few helpers and snippets which make the C-Java binding for mmap files easy/easier:
util-mmap, Apache License 2.0, here is an issue regarding Android support
Using Memory Mapped Files and JNI to communicate between Java and C++ programs or easier with tools like javacpp?
It looks like Tomcat has implemented a helper (jni.MMap) that is able to unmap/delete an mmap file
See the util-mmap in action, really easy:
public class MMapTesting {
    public static void main(String[] args) throws IOException {
        File file = new File("test");
        // first mapping: opened and closed by hand
        MMapBuffer writeBuffer = new MMapBuffer(file, 0, 1000, FileChannel.MapMode.READ_WRITE, ByteOrder.BIG_ENDIAN);
        writeBuffer.memory().intArray(0, 100).set(2, 234);
        // calls unmap under the hood
        writeBuffer.close();
        // here unmap is called automatically at the end of this try-with-resources block
        try (MMapBuffer buffer = new MMapBuffer(file, FileChannel.MapMode.READ_WRITE, ByteOrder.BIG_ENDIAN)) {
            System.out.println("length: " + buffer.memory().length());
            IntArray arr = buffer.memory().intArray(0, buffer.memory().length() / 8);
            // prints 234
            System.out.println(arr.get(2));
        }
    }
}
From the Android Developers website:
A direct byte buffer whose content is a memory-mapped region of a file.
Mapped byte buffers are created via the FileChannel.map method. This class extends the ByteBuffer class with operations that are specific to memory-mapped file regions.
A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected.
The content of a mapped byte buffer can change at any time, for example if the content of the corresponding region of the mapped file is changed by this program or another. Whether or not such changes occur, and when they occur, is operating-system dependent and therefore unspecified.
From what I understand of this text, there is no way to unmap the MappedByteBuffer using the Android Java SDK. It is only possible using the NDK, like you said.
Below is the code that I have written. I want to do a simple thing: store binary file data in a ByteBuffer.
File file = new File(fileName);
try {
ReadableByteChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect((int) file.length()); // allocateDirect takes an int, file.length() returns a long
// How can use buf.read to get all the contents?
} catch (Exception e){
}
I was wondering:
1. How can I use read to get all the data from the channel and store it in the ByteBuffer?
2. Is there a more elegant way to allocate the ByteBuffer, other than using a File object to get the length of the file?
I prefer to use memory mapping.
FileChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = channel.map(MapMode.READ_ONLY,0,channel.size());
If the file is greater than 2 GB, you have to have more than one mapping. On the plus side this takes around 10 ms regardless of size and doesn't use much heap or direct memory regardless of the size of the file.
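The multiple-mapping idea could look roughly like this. It is a sketch only: the class name ChunkedMapping, the method mapInChunks and the chunkSize parameter are made up for illustration, and in practice the chunk size would be something large such as 1 GB rather than the tiny values used for testing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class ChunkedMapping {
    /** Maps the file as consecutive read-only regions of at most chunkSize bytes each. */
    static List<MappedByteBuffer> mapInChunks(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += chunkSize) {
                // The last chunk may be shorter than chunkSize.
                long length = Math.min(chunkSize, size - offset);
                chunks.add(channel.map(FileChannel.MapMode.READ_ONLY, offset, length));
            }
        }
        // A mapping stays valid after the channel that created it has been closed.
        return chunks;
    }
}
```

A value that straddles a chunk boundary has to be assembled by the caller, so it helps to pick a chunk size that is a multiple of the record size.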
From the ReadableByteChannel Javadocs
read(ByteBuffer dst)
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
So ... channel.read(buf);
As for your second question, if you want to read the entire contents of the file into memory at once that seems like a reasonable approach.
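One detail worth spelling out: a single read call is allowed to return before the buffer is full, so reading in a loop is the safer pattern. Below is a minimal sketch; the class name ChannelReader and method readAll are made up for illustration, and it uses channel.size() so that no separate File object is needed for the length.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
class ChannelReader {
    /** Reads the whole file into a direct ByteBuffer that is ready for get() calls. */
    static ByteBuffer readAll(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for a single ByteBuffer: " + size);
            }
            ByteBuffer buf = ByteBuffer.allocateDirect((int) size);
            // read() may fill only part of the buffer, so loop until full or end of file.
            while (buf.hasRemaining() && channel.read(buf) != -1) {
                // keep reading
            }
            buf.flip(); // switch from filling the buffer to reading from it
            return buf;
        }
    }
}
```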
I have a collection of objects:
Map<BufferedImage, Map<ImageTransform, Set<Point>>> map
I want to write those to a file, and then be able to read them back into the same structure.
I can't just write the collection as it is, because BufferedImage doesn't implement the Serializable (nor the Externalizable) interface. So I need to use the methods from the ImageIO class to write the image.
ImageTransform is a custom object that implements Serializable. So, I believe the value part of my map collection, should be writeable as it is.
Here is what I do to write to the file:
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file));
for (BufferedImage image : map.keySet()) {
ImageIO.write(image, "PNG", out); // write the image to the stream
out.writeObject(map.get(image)); // write the 'value' part of the map
}
Here is what I do to read back from the file:
ObjectInputStream in = new ObjectInputStream(new FileInputStream(file));
while(true) {
try {
BufferedImage image = ImageIO.read(in);
Map<ImageTransform, Set<Point>> value =
(Map<ImageTransform, Set<Point>>) in.readObject(); // marker
map.put(image, value);
} catch (IOException ioe) {
break;
}
}
However, this doesn't work. I get a java.io.OptionalDataException at marker.
java.io.OptionalDataException
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1300)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
My question is, firstly, is the writing concept correct? Is ImageIO#write good for this case, or should I think about storing the int[] array returned by BufferedImage#getRGB? Is the array more compact (as in, does it take up less space in the file)?
Secondly, how should I be reading the objects back from the file? How do I know when EOF is reached? Why doesn't the above work?
I hope the info provided is enough, if you need more info on something, please tell me.
Thanks in advance.
It's not working because ObjectOutputStream and ObjectInputStream write/expect a certain file format, which is violated when you write an image out of order. To use object streams successfully you will need to observe the contract that they specify.
To do this you will need to create a holding class, and use this class as the key to your map instead of BufferedImage. This holding class should implement Serializable and three methods (not in any actual interface) that mark the class as needing special handling during reading and writing. The method signatures must be exactly as specified or serialization won't work.
For more information have a look at the documentation on ObjectOutputStream.
public class ImageHolder implements Serializable {
BufferedImage image;
public ImageHolder(BufferedImage image) {
this.image = image;
}
private void readObject(ObjectInputStream stream)
throws IOException, ClassNotFoundException {
image = ImageIO.read(stream);
}
private void writeObject(ObjectOutputStream stream)
throws IOException {
ImageIO.write(image, "PNG", stream);
}
private void readObjectNoData() throws ObjectStreamException {
// leave image as null
}
}
And then serialisation should be as simple as outputStream.writeObject(map), though you will need to check that the implementing class of ImageTransform is serialisable too.
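A variant of the holder idea, shown here as a sketch rather than as the method above: instead of letting ImageIO read directly from the object stream, it first encodes the image into a byte[] and writes a length prefix, so that exactly the image bytes are consumed when reading back. The class name PngImageHolder and its accessor are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import javax.imageio.ImageIO;

// Hypothetical holder class; swaps in a length-prefixed byte[] for the image data.
class PngImageHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: the image is written by hand in writeObject below.
    private transient BufferedImage image;

    PngImageHolder(BufferedImage image) {
        this.image = image;
    }

    BufferedImage getImage() {
        return image;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", png)) {
            throw new IOException("No PNG writer available");
        }
        out.writeInt(png.size()); // length prefix
        png.writeTo(out);         // encoded image bytes
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        byte[] png = new byte[in.readInt()];
        in.readFully(png); // consume exactly the bytes that were written
        image = ImageIO.read(new ByteArrayInputStream(png));
    }
}
```

If the holder is used as a map key, keep in mind that it inherits identity-based equals() and hashCode(), so a deserialized holder is a different key from the original instance.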
One way to 'cheat' and only have a single object to serialize is to add the group of objects to an expandable, serializable list. Then serialize the list.
BTW - I would tend to use XMLEncoder over serialized Objects because they can be restored in later JVMs. There is no such guarantee for serialized Objects.
@Ivan c00kiemon5ter V Kanak: "I'm trying to keep the file as small in size as possible, .."
That is often wasted effort, given disk space is so cheap.
".. so I guess Serialization is better for that."
Don't guess. Measure.
".. I'll try using a List and see how that goes."
Cool. Note that if using the XMLEncoder, I'd recommend Zipping it in most cases. That would reduce the file size of the cruft of XML. This situation is different in storing images.
Image formats typically incorporate compression of a type that is not conducive to being further compressed by Zip. That can be side-stepped by storing the XML compressed, and the images as 'raw' in separate entries in the Zip. OTOH I think you'll find the amount of bytes saved by compressing the XML alone is not worth the effort - given the final file size of the image entries.
I am using ByteBuffer to transfer data with Java NIO. The same message can be sent to multiple receivers. The message format is "message header + message content". A straightforward way is to allocate a new byte buffer for each receiver, but this is not efficient.
My question is whether Java has something for ByteBuffer similar to pointers in C/C++, so that I can use one buffer to hold the message content and concatenate it with different headers. That would be more efficient.
Thanks.
In Java you can use a GatheringByteChannel (which you are most probably dealing with already). It allows you to have one static buffer containing the header and an individual buffer for each client holding the varying contents. For some material to get started you might want to check out this blog post:
http://javaol.wordpress.com/2011/05/06/java-nio-scatter-gather/
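Applied to the question's layout (one shared content buffer, a different header per receiver), a gathering write could be sketched as follows. The class name GatheringSend and the method send are made up for illustration, and the loop assumes a blocking channel; with a non-blocking channel you would wait for write readiness instead of spinning.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

// Hypothetical helper, not part of any library.
class GatheringSend {
    /** Writes header followed by content without first copying them into one buffer. */
    static void send(GatheringByteChannel channel, ByteBuffer header, ByteBuffer sharedContent)
            throws IOException {
        // duplicate() shares the bytes but has its own position and limit,
        // so sharedContent itself is left untouched for the next receiver.
        ByteBuffer[] message = { header, sharedContent.duplicate() };
        while (message[0].hasRemaining() || message[1].hasRemaining()) {
            channel.write(message);
        }
    }
}
```

SocketChannel and FileChannel both implement GatheringByteChannel, so the same call works for either; the content bytes exist only once no matter how many receivers there are.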
I use a single ByteBuffer to send to multiple receivers.
ByteBuffer bb = ByteBuffer.allocateDirect(LARGE_BUFFER);
bb.clear();
bb.position(START_OF_CONTENT /* 1024 */);
appendContentTo(bb);
int endOfContent = bb.position();
bb.limit(endOfContent);
for(Connection conn: connections) {
bb.position(START_OF_CONTENT);
/* prepend header BEFORE the position and move the position back */
conn.prependHeader(bb);
conn.write(bb);
}
This way, you can use the same ByteBuffer for every connection. There is only ever one copy of the content.
An example of what conn.prependHeader() might look like
public void prependHeader(ByteBuffer bb) {
// bb starts at the start of the content.
int pos = bb.position();
// it would be better if a byte[] wasn't required. This is just an example
byte[] header = getHeaderAsBytes();
bb.position(bb.position()-header.length);
bb.put(header);
// bb starts at the start of the header.
bb.position(bb.position()-header.length);
}