Long-time reader, first-time poster.
I'm having a bit of trouble reading data quickly from a set of binary files. ByteBuffers and MappedByteBuffers offer the performance I require, but they seem to need an initial run to warm up. I'm not sure if that makes sense, so here's some code:
int BUFFERSIZE = 864;
int DATASIZE = 33663168;
int pos = 0;
// Open a file channel to get at the data
FileChannel channel = new RandomAccessFile(new File(myFile), "r").getChannel();
// Map DATASIZE bytes of the channel, starting at pos
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, pos, DATASIZE);
// Set endianness
mbb.order(ByteOrder.nativeOrder());
ArrayList<Double> ndt = new ArrayList<Double>();
// Read a double every BUFFERSIZE bytes, convert it and add it to the list
// (cnst and stt are scaling values defined elsewhere)
while (pos < DATASIZE) {
    double xf = mbb.getDouble(pos);
    ndt.add(xf * cnst * 1000d + stt);
    pos += BUFFERSIZE;
}
// Return the list
return ndt;
So this takes about 7 seconds to run, but if I then run it again it finishes in 10 ms. It seems to need some sort of initial run to set up the correct behaviour. I've found that doing something simple like this first works:
channel = new RandomAccessFile(new File(myFile), "r").getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect(DATASIZE);
channel.read(buf);
channel.close();
This takes around 2 seconds, and if I then run through the MappedByteBuffer procedure it returns the data in 10 ms. I just cannot figure out how to get rid of that initialisation step and read the data in 10 ms the first time. I've read all sorts of things about 'warming up', the JIT and the JVM, but all to no avail.
So, my question is: is it possible to get the 10 ms performance straight away, or do I need some sort of initialisation step? If so, what is the fastest way to do it?
The code is intended to run over around a thousand fairly large files, so speed is important.
Many thanks.
I just cannot figure out how to get rid of that initialisation step and read the data in 10 ms the first time
You can't. The data has to be read from the disk at least once, and that takes longer than 10 ms. The 10 ms figure is for all the other runs, when the data is already in memory: the first pass over the mapping faults every page in from disk, and subsequent passes just hit the OS page cache.
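If you want to pay that disk cost up front rather than inside your getDouble() loop, one best-effort option is MappedByteBuffer.load(), which asks the OS to fault the mapped pages into physical memory. A minimal sketch, reusing myFile and DATASIZE from the question:

// Best-effort prefetch: map the region and ask the OS to page it in now,
// so the later getDouble() loop runs against a warm page cache.
FileChannel channel = new RandomAccessFile(new File(myFile), "r").getChannel();
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, DATASIZE);
mbb.load(); // a hint only: pages can still be evicted under memory pressure

This does not make the disk read free; it only moves it earlier. For a thousand files, the usual trick is to overlap that prefetch with work, e.g. load the next file while processing the current one.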
I am trying to encrypt a file (txt, pdf, doc) using Google Tink's streaming AEAD encryption; below is the Java code I am trying to execute. But all I get is a 1 KB encrypted output file and no errors. Whether the input file is 2 MB or more than 10 MB, the output file is always 1 KB. I am unable to figure out what could be going wrong; can someone please help?
TinkConfig.register();
final int chunkSize = 256;
KeysetHandle keysetHandle = KeysetHandle.generateNew(
        StreamingAeadKeyTemplates.AES128_CTR_HMAC_SHA256_4KB);
// 2. Get the primitive.
StreamingAead streamingAead = keysetHandle.getPrimitive(StreamingAead.class);
// 3. Use the primitive to encrypt some data and write the ciphertext to a file.
FileChannel ciphertextDestination =
        new FileOutputStream("encyptedOutput.txt").getChannel();
String associatedData = "Tinks34";
WritableByteChannel encryptingChannel =
        streamingAead.newEncryptingChannel(ciphertextDestination, associatedData.getBytes());
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
InputStream in = new FileInputStream("FileToEncrypt.txt");
while (in.available() > 0) {
    in.read(buffer.array());
    System.out.println(in);
    encryptingChannel.write(buffer);
}
encryptingChannel.close();
in.close();
System.out.println("completed");
This is all about understanding ByteBuffer and how it operates. Let me explain.
in.read(buffer.array());
This writes data into the underlying array, but since the array is decoupled from the state of the original buffer, the buffer's position is not advanced. That's bad, because the next call:
encryptingChannel.write(buffer);
will now think that the position is 0. The limit hasn't changed either and is therefore still set to the capacity: 256. So the write operation writes all 256 bytes, whatever the read actually returned, and sets the position to the limit, 256.
Now the read operation still fills the underlying byte array, which is still 256 bytes in size, so all subsequent reads work fine. But every subsequent write will assume there are no bytes to be written, because the position stays at 256.
To fill a ByteBuffer properly, read through the channel API instead, e.g. FileChannel.read(ByteBuffer). Then you need to flip the buffer before writing the data you just read. Finally, after writing, you need to clear the buffer (resetting the position, and the limit, though the limit only changes on the last, partial read) to prepare it for the next read operation. So the order is commonly read, flip, write, clear for instances of Buffer.
Don't mix channels and I/O streams; it will make your life unnecessarily complicated, and learning how to use ByteBuffer is hard enough all by itself.
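For illustration, a minimal sketch of the fixed loop, staying channel-only and following the read, flip, write, clear cycle (the setup and file names are the ones from the question):

// Channel-only copy loop: read fills the buffer and advances its position,
// flip readies it for writing, clear readies it for the next read.
FileChannel source = new FileInputStream("FileToEncrypt.txt").getChannel();
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
while (source.read(buffer) != -1) {
    buffer.flip();                       // limit = bytes just read, position = 0
    while (buffer.hasRemaining()) {
        encryptingChannel.write(buffer); // write() may consume only part of the buffer
    }
    buffer.clear();                      // position = 0, limit = capacity
}
source.close();
encryptingChannel.close();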
I am dealing with input streams of unknown size that I need to serialize to a byte[] for fail-safe behavior.
I have this code right now, based on IOUtils, but with 5-50 different threads possibly running it, I don't know how reliable it is.
try (final ByteArrayOutputStream output = new ByteArrayOutputStream()) {
    long free_memory = Runtime.getRuntime().freeMemory() / 5;
    final byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
    long count = 0;
    int n = 0;
    while (-1 != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
        free_memory -= n;
        if (free_memory < DEFAULT_BUFFER_SIZE) {
            free_memory = Runtime.getRuntime().freeMemory();
            if (free_memory < (DEFAULT_BUFFER_SIZE * 10)) {
                throw new IOException("JVM is low on Memory.");
            }
            free_memory = free_memory / 5;
        }
    }
    output.flush();
    return output.toByteArray();
}
I want to catch an OOM error before it becomes a problem and kills the thread, and I don't want to save the stream to a file. Is there a better way of making sure you don't use too much memory?
(I'm using Java 8)
To answer your question: given that multiple threads are running the same code, this is a very unreliable approach.
The code asks the system how much memory is available with Runtime.getRuntime().freeMemory(), but that value is obsolete the instant the call returns, because other threads may have consumed more memory in the meantime. So the IOException that is supposed to be thrown when some not-so-obvious threshold of remaining memory is crossed may or may not actually fire, and either way it proves nothing.
The data is captured in a ByteArrayOutputStream, which grows (and copies) its internal buffer each time the end is reached. That growth is not governed by the 'how much memory is left' check, so again, multiple threads can be resizing their buffers at the same time, and any of them can fail.
The most fail-safe approach is to store the data on disk, thus making a copy. If the data comes from an outside streaming source you can use Files.copy(InputStream, Path, ...). If what you get is a file, you can use the file-to-file variant of copy, which I believe delegates the work to the OS.
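As an example, a minimal sketch that spools the stream to a temporary file instead of holding it on the heap (spool is an illustrative name; input is the stream from the question):

// Spool the unknown-size stream to disk; the heap is never at risk,
// and the temp file can be streamed back or moved later.
Path spool = Files.createTempFile("stream-", ".tmp");
try (InputStream in = input) {
    Files.copy(in, spool, StandardCopyOption.REPLACE_EXISTING);
}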
During my internship I encountered the piece of code shown below (it turns out to be the standard idiom for this task).
InputStream input = new BufferedInputStream(url.openStream());
OutputStream output = new FileOutputStream(file);
byte[] data = new byte[1024];
int total = 0;
int count;
while ((count = input.read(data)) != -1) {
    total += count;
    output.write(data, 0, count);
}
So here are my questions. Assume the data is 2050 bytes long.

1. What is the reason for using the constant 1024?

2. Having taken a computer networks class, I can relate some of that knowledge here. Assuming we have a fast connection, will we read 1024 bytes on every iteration? That is, will count be 1024, 1024, 2 across the iterations, or is something like 1000, 1000, 50 possible?

3. If we have a really slow connection, is it possible that read() will try to fill the 1024-byte buffer, even if that takes minutes?
What is the reason for using the constant 1024?

None. It's arbitrary; I use 8192. The code you posted will work with any size >= 1.

Assuming we have a fast connection, will we read 1024 bytes on every iteration?

No. On each iteration you will get an exception, end of stream, or at least 1 byte.

So will count be 1024, 1024, 2 across the iterations, or is something like 1000, 1000, 50 possible?

Anything >= 1 byte per iteration is possible, unless an exception or end of stream occurs.

If we have a really slow connection, is it possible that read() will try to fill the 1024-byte buffer, even if that takes minutes?

No. It will block only until it has read at least one byte, or an exception or end of stream occurs.
This is all stated in the Javadoc.
I/O operations are expensive, so it is generally recommended to batch them; in your case the batch is 1 KB, and you can make it larger or smaller depending on your requirements.
Remember that read() is a blocking call, so if the buffer is too big you might get the impression that your program is not moving.
You should not read byte by byte either, because that means far too many I/O operations and the program will spend all its time in I/O; the buffer size should depend on the rate at which you can process the data.
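If you do want a different batch size, both the stream's internal buffer and your own array can be sized explicitly. A small sketch of the same loop with a bigger buffer (8192 is just a common choice, not a magic number):

// A larger buffer means fewer I/O operations per byte processed,
// at the cost of a little more memory.
InputStream input = new BufferedInputStream(url.openStream(), 8192);
byte[] data = new byte[8192];
int count;
while ((count = input.read(data)) != -1) {
    output.write(data, 0, count);
}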
How can I make this piece of code extremely quick?
It reads a raw image using a RandomAccessFile (in) and writes it to a file as text using a DataOutputStream (out):
final int WORD_SIZE = 4;
byte[] singleValue = new byte[WORD_SIZE];
long position = 0;

for (int i = 1; i <= 100000; i++) {
    out.writeBytes(i + " ");
    for (int j = 1; j <= 17; j++) {
        in.seek(position);
        in.read(singleValue);
        String str = Integer.toString(ByteBuffer.wrap(singleValue)
                .order(ByteOrder.LITTLE_ENDIAN).getInt());
        out.writeBytes(str + " ");
        position += WORD_SIZE;
    }
    out.writeBytes("\n");
}
The inner for loop writes 17 values per row, so the output file gets a new line every 17 elements.
Thanks
I assume that the reason you are asking is that this code runs really slowly. If that is the case, then one reason is that each seek and read call does a system call: a RandomAccessFile has no buffering. (I'm guessing that singleValue is a byte[] of length 1.)
So the way to make this go faster is to step back and think about what the code is actually doing. If I understand it correctly, it reads every 4th byte in the file, converts those bytes to decimal numbers, and outputs them as text, 17 to a line. You could do that easily using a BufferedInputStream, like this:
int b = bis.read(); // read a byte
bis.skip(3); // skip 3 bytes.
(with a bit of error checking ....). If you use a BufferedInputStream like this, most of the read and skip calls will operate on data that has already been buffered, and the number of syscalls drops to 1 for every N bytes, where N is the buffer size.
UPDATE - my guess was wrong. You are actually reading alternate words, so ...
bis.read(singleValue);
bis.skip(4);
Every 100000 offsets I have to jump 200000 and then do it again till the end of the file.
Use bis.skip(800000) to do that. It should do a big skip by moving the file position without actually reading any data. One syscall at most. (For a FileInputStream, at least.)
You can also speed up the output side by a roughly equivalent amount by wrapping the DataOutputStream around a BufferedOutputStream; a combined sketch follows.
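Putting the two halves together, a rough sketch of the buffered version (the file names are placeholders, and it assumes the contiguous 17-words-per-line layout; add bis.skip(...) calls wherever your real stride requires them):

// Buffered input turns most 4-byte reads into memory copies instead of
// syscalls; buffered output does the same for the small writeBytes() calls.
BufferedInputStream bis = new BufferedInputStream(new FileInputStream("raw.img"));
DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("out.txt")));
byte[] word = new byte[4];
for (int i = 1; i <= 100000; i++) {
    out.writeBytes(i + " ");
    for (int j = 1; j <= 17; j++) {
        bis.read(word); // usually served from the buffer, not the disk
        out.writeBytes(ByteBuffer.wrap(word)
                .order(ByteOrder.LITTLE_ENDIAN).getInt() + " ");
    }
    out.writeBytes("\n");
}
out.close();
bis.close();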
What is the fastest way to fill up a pre-allocated ByteBuffer in Java?
I first set the size of the byte buffer with allocateDirect(); this only needs to be done once. After that, I need to keep refilling it (recycling it) as fast as possible with new data, which arrives as a byte[] array roughly every 5 ms, and without eating memory, since I have already pre-allocated the buffer.
At the moment I use the put() method, which on my system takes around 100 ms to complete.
Is there another way to fill up the byte buffer? Would the wrap() function run faster, without re-allocating the array?
I would hope you mean byte[] not Byte[]
A put() is the fastest way to copy a byte[] into a ByteBuffer. An even faster way is to write into the ByteBuffer in the first place and not use a byte[] at all.
If the copy is taking 100 ms, perhaps you are copying too much data. In the test below, it copies 1 MB in about 128 microseconds.
ByteBuffer bb = ByteBuffer.allocateDirect(1024 * 1024);
byte[] bytes = new byte[bb.capacity()];
int runs = 50000;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
    bb.clear();
    bb.put(bytes);
}
long time = System.nanoTime() - start;
System.out.printf("Average time to copy 1 MB was %.1f us%n", time / runs / 1e3);
prints
Average time to copy 1 MB was 128.9 us
wrap() should be much faster because, if I understand the docs correctly, it does not copy the byte array at all; it just sets the new buffer's fields (capacity/limit/position) to point at the existing array.
Of course, wrap() only makes sense if you already have a byte[] containing the desired data from somewhere.
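For completeness, a tiny sketch of the wrap() path; the data source and output channel here are hypothetical stand-ins. One caveat: wrap() gives you a heap buffer backed by that array, not a direct buffer, so it avoids the copy but gives up whatever benefit you were getting from allocateDirect():

// wrap() reuses the existing array as the buffer's backing store:
// no copy is made; position = 0 and limit = capacity = bytes.length.
byte[] bytes = receiveFrame();           // hypothetical source of the incoming byte[]
ByteBuffer view = ByteBuffer.wrap(bytes);
channel.write(view);                     // the consumer reads straight out of the array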