What is the fastest way to fill up a pre-allocated ByteBuffer in Java?
I first set the size of the byte buffer with allocateDirect(); this only needs to be done once. After that, I need to fill it up continuously (recycling it) as fast as possible with new data, which arrives as a byte[] array around every 5 ms, and without eating memory, since I have already pre-allocated the byte buffer.
At the moment I use the put() method, which on my system takes around 100 ms to complete.
Is there another way to fill up the byte buffer? Does the wrap() function run faster, without re-allocating the array?
I would hope you mean byte[], not Byte[].
A put() is the fastest way to copy a byte[] into a ByteBuffer. An even faster way is to write into the ByteBuffer in the first place and not use a byte[] at all.
If the copy is taking 100 ms, perhaps you are copying too much data. In the test below, it copies 1 MB in about 128 microseconds.
ByteBuffer bb = ByteBuffer.allocateDirect(1024 * 1024);
byte[] bytes = new byte[bb.capacity()];
int runs = 50000;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
    bb.clear();    // recycle the buffer: position = 0, limit = capacity
    bb.put(bytes); // bulk copy of the whole array
}
long time = System.nanoTime() - start;
System.out.printf("Average time to copy 1 MB was %.1f us%n", time / runs / 1e3);
prints
Average time to copy 1 MB was 128.9 us
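To illustrate the second point (skipping the byte[] entirely), here is a minimal sketch, assuming the data arrives through a ReadableByteChannel such as a SocketChannel; the channel then deposits bytes straight into the pre-allocated direct buffer, and no Java-side array copy happens at all:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

class DirectFill {
    // Fill the pre-allocated buffer straight from the channel; the data
    // never passes through an intermediate byte[].
    static void fill(ReadableByteChannel channel, ByteBuffer bb) throws IOException {
        bb.clear();                                  // recycle: position = 0, limit = capacity
        while (bb.hasRemaining() && channel.read(bb) != -1) {
            // read() writes directly into the buffer
        }
        bb.flip();                                   // ready to be consumed
    }
}
Whether this helps depends on where your byte[] comes from; if another API hands you the array, a put() copy is unavoidable.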
wrap() should be much faster because, if I understand the doc correctly, it does not copy the byte array; it just sets the member variables (capacity/limit/position).
Of course, wrap() only makes sense if you already have the byte[] with the desired data from somewhere.
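For example, a minimal sketch (the million-byte array is just a stand-in for whatever data you already have):
import java.nio.ByteBuffer;

public class WrapDemo {
    public static void main(String[] args) {
        byte[] data = new byte[1024 * 1024];
        // wrap() creates a view over the existing array: no copy, O(1)
        ByteBuffer view = ByteBuffer.wrap(data);
        // The buffer shares storage with the array, so a write through
        // one is visible through the other.
        data[0] = 42;
        System.out.println(view.get(0)); // prints 42
    }
}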
I am trying to encrypt a file (txt, pdf, doc) using Google Tink's streaming AEAD encryption; below is the Java code I am trying to execute. All I get is a 1 KB encrypted output file and no errors. Whether the input file is 2 MB or more than 10 MB, the output file is always 1 KB. I am unable to figure out what could be going wrong; can someone please help?
TinkConfig.register();
final int chunkSize = 256;
KeysetHandle keysetHandle = KeysetHandle.generateNew(
        StreamingAeadKeyTemplates.AES128_CTR_HMAC_SHA256_4KB);
// 2. Get the primitive.
StreamingAead streamingAead = keysetHandle.getPrimitive(StreamingAead.class);
// 3. Use the primitive to encrypt some data and write the ciphertext to a file.
FileChannel ciphertextDestination =
        new FileOutputStream("encyptedOutput.txt").getChannel();
String associatedData = "Tinks34";
WritableByteChannel encryptingChannel =
        streamingAead.newEncryptingChannel(ciphertextDestination, associatedData.getBytes());
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
InputStream in = new FileInputStream("FileToEncrypt.txt");
while (in.available() > 0) {
    in.read(buffer.array());
    System.out.println(in);
    encryptingChannel.write(buffer);
}
encryptingChannel.close();
in.close();
System.out.println("completed");
This is all about understanding ByteBuffer and how it operates. Let me explain.
in.read(buffer.array());
This writes data into the underlying array, but since the array is decoupled from the state of the original buffer, the buffer's position is not advanced. That is a problem, because the next call:
encryptingChannel.write(buffer);
will now think that the position is 0. The limit hasn't changed either, so it is still set to the capacity: 256. That means the result of the write operation is to write 256 bytes and set the position to the limit (the capacity).
The next read operation still works on the underlying byte array, which is still 256 bytes in size, so all subsequent reads proceed fine. However, every subsequent write will assume there are no bytes to be written, as the position remains at 256.
To use the ByteBuffer correctly, read through the channel API instead: FileChannel.read(ByteBuffer) advances the buffer's position for you. Then you need to flip the buffer before writing the data you read. Finally, after writing, you need to clear the buffer (resetting the position, and the limit, though the limit only changes on the last read) to prepare it for the next read. So the order is commonly read, flip, write, clear for instances of Buffer.
Don't mix Channels and I/O streams; that will make your life unnecessarily complicated, and learning how to use ByteBuffer is hard enough all by itself.
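A minimal sketch of that read/flip/write/clear cycle applied to the question's code (variable names taken from the question; a FileChannel replaces the InputStream so that each read advances the buffer's position):
FileChannel source = new FileInputStream("FileToEncrypt.txt").getChannel();
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
while (source.read(buffer) != -1) {   // read: fills the buffer, advances position
    buffer.flip();                    // flip: limit = position, position = 0
    while (buffer.hasRemaining()) {   // a write may not consume everything at once
        encryptingChannel.write(buffer);
    }
    buffer.clear();                   // clear: position = 0, limit = capacity
}
source.close();
encryptingChannel.close();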
During my internship I encountered the code snippet shown below (it turns out to be the standard snippet for this task).
InputStream input = new BufferedInputStream(url.openStream());
OutputStream output = new FileOutputStream(file);
byte[] data = new byte[1024];
int total = 0;
int count;
while ((count = input.read(data)) != -1) {
    total += count;
    output.write(data, 0, count);
}
So here are my questions. Assume the data is 2050 bytes.
1. What is the reason for using the 1024 constant?
2. Having taken a Computer Networks class, I can relate some of that knowledge to this matter. Assuming we have a fast connection, will we read 1024 bytes at every iteration? That is, will count be 1024, 1024, 2 across the iterations, or is 1000, 1000, 50 also possible?
3. If we have a really slow connection, is it possible that the read() method will try to fill the 1024-byte buffer, even if that takes minutes?
What is the reason for using the 1024 constant?
None. It's arbitrary. I use 8192. The code you posted will work with any size >= 1.
Assuming we have a fast connection, will we read 1024 bytes at every iteration?
No; on each iteration you will get an exception, end of stream, or at least 1 byte.
So will count be 1024, 1024, 2 across the iterations, or is 1000, 1000, 50 also possible?
Anything >= 1 byte per iteration is possible, unless an exception or end of stream occurs.
If we have a really slow connection, is it possible that the read() method will try to fill the 1024-byte buffer, even if that takes minutes?
No. It will block until it reads at least one byte, or an exception or end of stream occurs.
This is all stated in the Javadoc.
I/O operations are expensive, so it is generally recommended to batch them; in your case the batch is 1 KB, and you can make it larger or smaller depending on your requirements.
You have to remember that read() is a blocking call, so if the buffer is too big you might get the impression that your program is not moving.
You should not read byte by byte either, because that means far more I/O operations and the program will spend all its time in I/O; the size should depend on the rate at which you can process the data.
Long-time reader, first-time poster.
I'm having a bit of trouble reading data quickly from a set of binary files. ByteBuffers and MappedByteBuffers offer the performance I require, but they seem to need an initial run to warm up. I'm not sure if that makes sense, so here's some code:
int BUFFERSIZE = 864;
int DATASIZE = 33663168;
int pos = 0;
// Open a file channel to get at the data
FileChannel channel = new RandomAccessFile(new File(myFile), "r").getChannel();
// Map DATASIZE bytes of the channel starting at pos
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, pos, DATASIZE);
// Set endianness
mbb.order(ByteOrder.nativeOrder());
ArrayList<Double> ndt = new ArrayList<Double>();
// Read doubles from the MappedByteBuffer, convert, and add to the list
// (cnst and stt are defined elsewhere; code trimmed for the question)
double xf;
while (pos < DATASIZE) {
    xf = mbb.getDouble(pos);
    ndt.add(xf * cnst * 1000d + stt);
    pos += BUFFERSIZE;
}
// Return the list
return ndt;
So this takes about 7 seconds to run, but if I then run it again it finishes in 10 ms. It seems to need some sort of initial run to set up the correct behaviour. I've found that doing something simple like this works:
channel = new RandomAccessFile(new File(mdfFile), "r").getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect(DATASIZE);
channel.read(buf);
channel.close();
This takes around 2 seconds and if I then run through the MappedByteBuffer procedure it returns the data in 10ms. I just cannot figure out how to get rid of that initialisation step and read the data in 10ms first time. I've read all sorts of things about 'warming up', JIT and the JVM but all to no avail.
So, my question is, is it possible to get the 10 ms performance straight away or do I need to do some sort of initialisation? If so, what is the fastest way to do this please?
The code is intended to run through around a thousand quite large files so speed is quite important.
Many thanks.
I just cannot figure out how to get rid of that initialisation step and read the data in 10ms first time
You can't. The data does have to be read from the disk, and that takes longer than 10 ms. The 10 ms is for all the other times, when the data is already in memory (the OS file cache).
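If you want to pay that cost up front rather than inside your timed loop, one option is MappedByteBuffer.load(), which asks the OS to fault the mapped pages into physical memory. A sketch, reusing the channel and DATASIZE from the question; note load() is only a best-effort hint, and it cannot make the disk itself any faster:
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, DATASIZE);
mbb.load(); // best-effort: pre-fault the mapping into RAM before the timed loop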
I am trying to copy a file using the following code:
1:
int data = 0;
byte[] buffer = new byte[4096];
while ((data = bufferedInputStream.read()) != -1) {
    bufferedOutputStream.write(data);
}
2:
byte[] buffer = new byte[4096];
while (bufferedInputStream.read(buffer) != -1) {
    bufferedOutputStream.write(buffer);
}
The actual size of the file is 3892028 bytes (on Windows). The file is uploaded by the user through Struts2 file upload, and the uploaded file's size exactly matches the Windows size. But when I try to copy the uploaded file from the temporary folder, the copied file varies in size, and the time taken also varies (though the time difference is negligible). Please find the readings below.
Scenario                     Time taken   Original size   Copied size
Without buffer (Code 1)      77           3892028         3891200
Buffer size 1024 (Code 2)    17           3892028         3891200
Buffer size 4096 (Code 2)    18           3892028         3891200
Buffer size 10240 (Code 2)   14           3892028         3901440
Buffer size 102400 (Code 2)   9           3892028         3993600
If I increase the buffer size further, the time taken increases again, though the difference is negligible. So my questions are:
Why does the file size change?
Are there any subtle consequences of this size variation?
What is the best way to accomplish this functionality (copying a file)?
I don't know what is going on underneath. Thanks for any suggestions.
Edit: I have flush() and close() method calls.
Note: I have trimmed my code to make it simpler.
The problem is that BufferedInputStream.read(byte[]) reads as much as it can into the buffer, so if the stream has only 1 byte left, only the first byte of the array is filled. BufferedOutputStream.write(byte[]), however, writes all the given bytes to the stream, meaning it will still write the full 4096 bytes: 1 byte from the current iteration and 4095 stale bytes from the previous one.
What you need to do is save the number of bytes that were read, and then write that same number.
Example:
int lastReadCnt = 0;
byte[] buffer = new byte[4096];
while ((lastReadCnt = bufferedInputStream.read(buffer)) != -1) {
    bufferedOutputStream.write(buffer, 0, lastReadCnt);
}
References:
Java 6: InputStream: read(byte[],int,int)
Java 6: OutputStream: write(byte[],int,int)
Why does the file size change?
You forgot to `flush()` (and `close()`):
bufferedOutputStream.flush();
Also, you should pass the number of bytes read to the write method:
bufferedOutputStream.write(buffer, 0, bytesRead);
What is the best way to accomplish this functionality (copying a file)?
FileUtils.copyFile()
IOUtils.copy()
Both are from Apache Commons IO.
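Usage is a one-liner either way; a sketch with placeholder file names (and note that since Java 7 the JDK's own Files.copy is an alternative that needs no extra dependency):
import java.io.File;
import java.nio.file.*;
import org.apache.commons.io.FileUtils;

// Apache Commons IO
FileUtils.copyFile(new File("source.dat"), new File("dest.dat"));

// Plain JDK, Java 7+
Files.copy(Paths.get("source.dat"), Paths.get("dest.dat"),
        StandardCopyOption.REPLACE_EXISTING);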
Is this:
ByteBuffer buf = ByteBuffer.allocate(1000);
...the only way to initialize a ByteBuffer?
What if I have no idea how many bytes I need to allocate..?
Edit: More details:
I'm converting one image file format to a TIFF file. The problem is that the starting file can be any size, but I need to write the TIFF data in little-endian order. So I'm reading everything I'm eventually going to write to the TIFF into the ByteBuffer first, so I can put it all in little-endian, and then I'll write it to the output file. I guess since I know how long the IFDs and headers are, and I can probably figure out how many bytes are in each image plane, I can just use multiple ByteBuffers throughout the process.
The types of places that you would use a ByteBuffer are generally the types of places that you would otherwise use a byte array (which also has a fixed size). With synchronous I/O you often use byte arrays, with asynchronous I/O, ByteBuffers are used instead.
If you need to read an unknown amount of data using a ByteBuffer, consider using a loop with your buffer and append the data to a ByteArrayOutputStream as you read it. When you are finished, call toByteArray() to get the final byte array.
Any time you aren't absolutely sure of the size (or maximum size) of a given input, reading in a loop (possibly using a ByteArrayOutputStream, but otherwise just processing the data as a stream as it is read) is the only way to handle it. Without some sort of loop, any remaining data will of course be lost.
For example:
final byte[] buf = new byte[4096];
int numRead;
// Use try-with-resources to auto-close streams.
try(
final FileInputStream fis = new FileInputStream(...);
final ByteArrayOutputStream baos = new ByteArrayOutputStream()
) {
while ((numRead = fis.read(buf)) > 0) {
baos.write(buf, 0, numRead);
}
final byte[] allBytes = baos.toByteArray();
// Do something with the data.
}
catch( final Exception e ) {
// Do something on failure...
}
If you instead wanted to write Java ints, or other things that aren't raw bytes, you can wrap your ByteArrayOutputStream in a DataOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
while (thereAreMoreIntsFromSomewhere()) {
    int someInt = getIntFromSomewhere();
    dos.writeInt(someInt);
}
byte[] allBytes = baos.toByteArray();
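To read the values back, the mirror-image classes work the same way; a sketch assuming the allBytes array from above:
DataInputStream dis = new DataInputStream(new ByteArrayInputStream(allBytes));
while (dis.available() >= 4) { // an int is 4 bytes
    int someInt = dis.readInt();
    // process someInt...
}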
Depends.
Library
Converting file formats tends to be a solved problem for most problem domains. For example:
Batik can transcode between various image formats (including TIFF).
Apache POI can convert between office spreadsheet formats.
Flexmark can generate HTML from Markdown.
The list is long. The first question should be, "What library can accomplish this task?" If performance is a consideration, your time is likely better spent optimising an existing package to meet your needs than writing yet another tool. (As a bonus, other people get to benefit from the centralised work.)
Known Quantities
Reading a file? Allocate file.size() bytes (see the sketch after this list).
Copying a string? Allocate string.length() bytes.
Copying a TCP packet? Allocate 1500 bytes, for example.
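For the first case, a sketch with a placeholder path (this assumes the file fits in a single int-sized buffer, i.e. under 2 GB):
Path path = Paths.get("image.dat");
ByteBuffer buf = ByteBuffer.allocate((int) Files.size(path));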
Unknown Quantities
When the number of bytes is truly unknown, you can do a few things:
Make a guess.
Analyze example data sets you expect to buffer; use the average length.
Example
Java's StringBuffer, unless otherwise instructed, uses an initial buffer size that holds 16 characters. Once those 16 characters are filled, a new, longer array is allocated and the original 16 characters are copied into it. If the StringBuffer had an initial size of 1024 characters, the reallocation would not happen as early or as often.
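A sketch of the same idea with StringBuilder (StringBuffer behaves the same way):
StringBuilder small = new StringBuilder();     // default capacity 16: will grow and copy as it fills
StringBuilder sized = new StringBuilder(1024); // pre-sized: no early reallocations
for (int i = 0; i < 1000; i++) {
    sized.append('x'); // stays within the initial array
}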
Optimization
Either way, this is probably a premature optimization. Typically you would allocate a set number of bytes when you want to reduce the number of internal memory reallocations that get executed.
It is unlikely that this will be the application's bottleneck.
The idea is that it's only a buffer - not the whole of the data. It's a temporary resting spot for data as you read a chunk and process it (possibly writing it somewhere else). So allocate yourself a big enough "chunk" and it normally won't be a problem.
What problem are you anticipating?