Nature of InputStream.read() method against URL stream - java

During my internship I encountered the piece of code shown below (it turns out to be the standard idiom for this task).
InputStream input = new BufferedInputStream(url.openStream());
OutputStream output = new FileOutputStream(file);
byte[] data = new byte[1024];
int total = 0;
int count;
while ((count = input.read(data)) != -1) {
    total += count;
    output.write(data, 0, count);
}
So here are my questions. Assume the data is 2,050 bytes.
What is the reason for using the 1024 constant?
Since I took a computer networks class, I can relate some of my knowledge to this matter. Assuming we have a fast connection, will we read 1024 bytes at every iteration? So will the count variable be 1024, 1024, 2 over the iterations, or is 1000, 1000, 50 also possible?
If we have a really slow connection, is it possible that the read() method will try to fill the 1024-byte buffer, even if that takes minutes?

What is the reason for using the 1024 constant?
None. It's arbitrary. I use 8192. The code you posted will work with any size >= 1.
Assuming we have a fast connection, will we read 1024 bytes at every iteration?
No. On each iteration you will get at least 1 byte, or end of stream, or an exception.
So will the count variable be 1024, 1024, 2 over the iterations, or is 1000, 1000, 50 also possible?
Anything >= 1 byte per iteration is possible unless an exception or end of stream occurs.
If we have a really slow connection, is it possible that the read() method will try to fill the 1024-byte buffer, even if that takes minutes?
No. It will block only until it has read at least one byte, or an exception or end of stream occurs.
This is all stated in the Javadoc.
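If you need each chunk to be exactly 1024 bytes (rather than however many bytes read() happened to return), you have to loop yourself. A minimal sketch, not from the original answer; the helper name readFully is mine:
// Sketch: keep calling read() until the buffer is full or the stream ends,
// since a single read() may return anywhere between 1 and data.length bytes.
static int readFully(InputStream in, byte[] data) throws IOException {
    int filled = 0;
    while (filled < data.length) {
        int count = in.read(data, filled, data.length - filled);
        if (count == -1) {
            break;   // end of stream before the buffer was full
        }
        filled += count;
    }
    return filled;   // number of bytes actually placed in data
}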

I/O operations are expensive, so it is generally recommended to batch them; in your case the batch size is 1 KB, and you can change that to more or less depending on your requirements.
You have to remember that read() is a blocking call, so if the buffer is too big you might get the impression that your program is not moving.
You should also not read byte by byte, because that means far too many I/O operations and the program will spend all its time in I/O; the buffer size should depend on the rate at which you can process the data.

Related

Google TINK - Streaming AEAD Always returning an output file of 1 KB

I am trying to encrypt a file (txt, pdf, doc) using Google Tink streaming AEAD encryption; below is the Java code I am trying to execute. But all I get is a 1 KB encrypted output file and no errors. Whether the input file is 2 MB or more than 10 MB, the output file is always 1 KB. I am unable to figure out what could be going wrong; can someone please help?
TinkConfig.register();
final int chunkSize = 256;
KeysetHandle keysetHandle = KeysetHandle.generateNew(
        StreamingAeadKeyTemplates.AES128_CTR_HMAC_SHA256_4KB);
// 2. Get the primitive.
StreamingAead streamingAead = keysetHandle.getPrimitive(StreamingAead.class);
// 3. Use the primitive to encrypt some data and write the ciphertext to a file.
FileChannel ciphertextDestination =
        new FileOutputStream("encyptedOutput.txt").getChannel();
String associatedData = "Tinks34";
WritableByteChannel encryptingChannel =
        streamingAead.newEncryptingChannel(ciphertextDestination, associatedData.getBytes());
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
InputStream in = new FileInputStream("FileToEncrypt.txt");
while (in.available() > 0) {
    in.read(buffer.array());
    System.out.println(in);
    encryptingChannel.write(buffer);
}
encryptingChannel.close();
in.close();
System.out.println("completed");
This is all about understanding ByteBuffer and how it operates. Let me explain.
in.read(buffer.array());
This reads data into the buffer's backing array, but because writing through the array does not update the buffer's state, the buffer's position is not advanced. That is a problem for the next call:
encryptingChannel.write(buffer);
will now think that the position is 0. The limit hasn't changed either and is therefore still set to the capacity: 256. That means the result of the write operation is to write 256 bytes and set the position to the limit.
The read operation still targets the underlying byte array, which is still 256 bytes in size, so all subsequent reads work fine. However, every subsequent write operation will assume that there are no bytes to be written, because the position remains at 256.
To use the ByteBuffer properly you can read through the channel API instead, e.g. FileChannel.read(ByteBuffer), which does advance the buffer's position. Then you need to flip the buffer before writing the data that was just read. Finally, after writing you need to clear the buffer, resetting its position (and limit, though that only changes on the last read) to prepare it for the next read. So the order is commonly read, flip, write, clear for instances of Buffer.
Don't mix Channels and I/O streams; it will make your life unnecessarily complicated, and learning how to use ByteBuffer is hard enough all by itself.
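For illustration, a minimal sketch of that read, flip, write, clear loop (my own rewrite, not code from the answer; it reuses the streamingAead, chunkSize, associatedData and file names from the question):
// Sketch of the read/flip/write/clear pattern described above.
try (FileChannel source = new FileInputStream("FileToEncrypt.txt").getChannel();
     FileChannel sink = new FileOutputStream("encyptedOutput.txt").getChannel();
     WritableByteChannel encryptingChannel =
             streamingAead.newEncryptingChannel(sink, associatedData.getBytes())) {
    ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
    while (source.read(buffer) != -1) {        // read: fills the buffer and advances its position
        buffer.flip();                         // flip: limit = position, position = 0, ready for writing
        while (buffer.hasRemaining()) {
            encryptingChannel.write(buffer);   // write: drains the buffer
        }
        buffer.clear();                        // clear: position = 0, limit = capacity, ready for reading
    }
}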

MappedByteBuffer slow on initial run

long time reader, first time poster.
I'm having a bit of trouble reading data quickly from a set of binary files. ByteBuffers and MappedByteBuffers offer the performance I require, but they seem to need an initial run to warm up. I'm not sure if that makes sense, so here's some code:
int BUFFERSIZE = 864;
int DATASIZE = 33663168;
int pos = 0;
// Open File channel to get data
FileChannel channel = new RandomAccessFile(new File(myFile), "r").getChannel();
// Set MappedByteBuffer to read DATASIZE bytes from channel
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, pos, DATASIZE);
// Set Endianness
mbb.order(ByteOrder.nativeOrder());
ArrayList<Double> ndt = new ArrayList<Double>();
// Read doubles from MappedByteBuffer, perform conversion and add to arraylist
while (pos < DATASIZE) {
    xf = mbb.getDouble(pos);
    ndt.add(xf * cnst * 1000d + stt);
    pos += BUFFERSIZE;
}
// Return arraylist
return ndt;
So this takes about 7 seconds to run, but if I then run it again it takes 10 ms. It seems to need some sort of initial run to set up the correct behaviour. I've found that doing something simple like this works:
channel = new RandomAccessFile(new File(mdfFile), "r").getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect(DATASIZE);
channel.read(buf);
channel.close();
This takes around 2 seconds and if I then run through the MappedByteBuffer procedure it returns the data in 10ms. I just cannot figure out how to get rid of that initialisation step and read the data in 10ms first time. I've read all sorts of things about 'warming up', JIT and the JVM but all to no avail.
So, my question is, is it possible to get the 10 ms performance straight away or do I need to do some sort of initialisation? If so, what is the fastest way to do this please?
The code is intended to run through around a thousand quite large files so speed is quite important.
Many thanks.
I just cannot figure out how to get rid of that initialisation step and read the data in 10ms first time
You can't. The data does have to be read from the disk. That takes longer than 10ms. The 10ms is for all the other times when it's already in memory.
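If the goal is simply to pay the disk cost up front rather than inside the conversion loop, one option (my suggestion, not part of the original answer) is to pre-fault the mapping before iterating:
// Sketch (assumption, not from the answer): ask the OS to bring the mapped file
// into physical memory before the conversion loop runs.
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, DATASIZE);
mbb.load();   // best-effort hint; the data still has to come off the disk once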

Java ByteBuffer put vs wrap

What is the fastest way to fill up a pre-allocated ByteBuffer in Java?
I first set the size of the byte buffer with allocateDirect(); this only needs to be done once. After that, I need to fill it up continuously (recycling it) as fast as possible with new data, which arrives as a byte[] array around every 5 ms, without eating memory since I have already pre-allocated the byte buffer.
At the moment, I use the put() instruction, which in my system takes around 100 ms to complete.
Is there another way to fill up the byte buffer? Does the wrap() function run faster, without re-allocating the array?
I would hope you mean byte[] not Byte[]
A put() is the fastest way to copy a byte[] into a ByteBuffer. An even faster way is to write into the ByteBuffer in the first place and not use a byte[] at all.
If the copy is taking 100ms, perhaps you are copying too much data. In this test it copies 1 MB in 128 micro-seconds.
ByteBuffer bb = ByteBuffer.allocateDirect(1024 * 1024);
byte[] bytes = new byte[bb.capacity()];
int runs = 50000;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
    bb.clear();
    bb.put(bytes);
}
long time = System.nanoTime() - start;
System.out.printf("Average time to copy 1 MB was %.1f us%n", time / runs / 1e3);
prints
Average time to copy 1 MB was 128.9 us
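To illustrate the "write into the ByteBuffer in the first place" point, a minimal sketch under the assumption that the data comes in through a channel (the file name is a placeholder):
// Sketch: skip the intermediate byte[] entirely by reading straight into the direct buffer.
ByteBuffer bb = ByteBuffer.allocateDirect(1024 * 1024);
try (FileChannel source = new FileInputStream("incoming.dat").getChannel()) {
    bb.clear();
    while (source.read(bb) != -1 && bb.hasRemaining()) {
        // keep filling until the buffer is full or the source is exhausted
    }
    bb.flip();   // buffer is now ready for consumers, with no copy from a byte[]
}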
wrap() should be much faster because, if I understand the doc correctly, it does not copy the byte array but just sets the member variables (capacity/limit/position).
wrap() only makes sense if you already have a byte[] with the desired data from somewhere, of course.
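For contrast, a minimal sketch of the two approaches (the name of the incoming byte[] is a placeholder):
byte[] incoming = new byte[1024 * 1024];        // data that arrives every ~5 ms (placeholder)

// put(): copies the bytes into the pre-allocated (here direct) buffer
ByteBuffer direct = ByteBuffer.allocateDirect(incoming.length);
direct.clear();                                 // recycle the same buffer each time
direct.put(incoming);
direct.flip();                                  // ready to be consumed downstream

// wrap(): no copy at all, the ByteBuffer is just a view over the existing array
ByteBuffer view = ByteBuffer.wrap(incoming);    // always a heap buffer, never direct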

How to determine the buffer size for BufferedOutputStream's write method

I am trying to copy a file using the following code:
1:
int data = 0;
byte[] buffer = new byte[4096];
while ((data = bufferedInputStream.read()) != -1) {
    bufferedOutputStream.write(data);
}
2:
byte[] buffer = new byte[4096];
while (bufferedInputStream.read(buffer) != -1) {
    bufferedOutputStream.write(buffer);
}
The actual size of the file is 3892028 bytes (on Windows). The file is uploaded by the user through Struts 2 file upload, and the uploaded file size is exactly the same as reported by Windows. When I try to copy the uploaded file from the temporary folder, the copied file varies in size, and the time taken also varies (the difference is negligible). Please find the readings below.
Without using buffer (Code 1): time taken 77, actual size 3892028, copied size 3891200
Buffer size 1024 (Code 2): time taken 17, actual size 3892028, copied size 3891200
Buffer size 4096 (Code 2): time taken 18, actual size 3892028, copied size 3891200
Buffer size 10240 (Code 2): time taken 14, actual size 3892028, copied size 3901440
Buffer size 102400 (Code 2): time taken 9, actual size 3892028, copied size 3993600
If I increase the buffer size further, the time taken increases again, though the difference is negligible. So my questions are:
Why does the file size change?
Are there any subtle consequences of this size variation?
What is the best way to accomplish this functionality (copying a file)?
I don't know what is going on underneath. Thanks for any suggestions.
Edit: I do have flush() and close() method calls.
Note: I have trimmed my code to make it simpler.
The problem is that BufferedInputStream.read(byte[]) reads as much as it can into the buffer, so if the stream has only 1 byte left, only the first byte of the array is filled. However, BufferedOutputStream.write(byte[]) writes all of the given bytes to the stream, meaning it will still write the full 4096 bytes: 1 byte from the current iteration and 4095 leftover bytes from the previous iteration.
What you need to do is save the number of bytes that were read and then write exactly that many.
Example:
int lastReadCnt = 0;
byte[] buffer = new byte[4096];
while ((lastReadCnt = bufferedInputStream.read(buffer)) != -1) {
    bufferedOutputStream.write(buffer, 0, lastReadCnt);
}
References:
Java 6: InputStream: read(byte[],int,int)
Java 6: OutputStream: write(byte[],int,int)
Why does the file size change?
You forgot to `flush()` (and `close()`):
bufferedOutputStream.flush()
You should also pass the number of bytes read to the write method:
bufferedOutputStream.write(buffer, 0, bytesRead);
What is the best way to accomplish this functionality (copying a file)?
FileUtils.copyFile()
IOUtils.copy()
Both are from Apache Commons IO.
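For example, a minimal usage sketch of those helpers (the file names are placeholders, and commons-io must be on the classpath):
// Copy a file directly with Commons IO
FileUtils.copyFile(new File("upload.tmp"), new File("upload.dat"));

// Or copy between streams; IOUtils.copy handles the read/write loop and partial reads for you
try (InputStream in = new FileInputStream("upload.tmp");
     OutputStream out = new FileOutputStream("upload.dat")) {
    IOUtils.copy(in, out);
}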

Can calling available() for a BufferedInputStream lead me astray in this case?

I am reading an arbitrary-size file in blocks of 1021 bytes, where the final block of the file is <= 1021 bytes. At the moment, I am doing this using a BufferedInputStream wrapped around a FileInputStream, with code that looks (roughly) like the following (where reader is the BufferedInputStream and this runs in a loop):
int availableData = reader.available();
int datalen = (availableData >= 1021)
        ? 1021
        : availableData;
reader.read(bufferArray, 0, datalen);
However, from reading the API docs, I note that available() only gives an "estimate" of the available size before the call would 'block'. Printing the value of availableData on each iteration seems to give the expected values, starting with the file size and slowly decreasing until it is <= 1021. Given that this is a local file, am I wrong to expect this to be a correct value? Is there a situation where available() would give an incorrect answer?
EDIT: Sorry, additional information. The BufferedInputStream is wrapped around a FileInputStream. From the source code for a FIS, I think I'm safe to rely on available() as a measure of how much data is left in the case of a local file. Am I right?
The question is pointless. Those four lines of code are entirely equivalent to this:
reader.read(buffer, 0, 1021);
without the timing-window problem you have introduced between the available() call and the read. Note that this code is still incorrect as you are ignoring the return value, which can be -1 at EOS, or else anything between 1 and 1021 inclusive.
It doesn't give the estimated size, it gives the remaining bytes that can be read. It's not an estimate with BufferedInputStream.
Returns the number of bytes that can be read from this input stream without blocking.
You should pass available() directly into the read() call if you want to avoid blocking, but remember to return if the return value is 0 or -1. available() might throw an exception on buffer types that don't support the operation.
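As a minimal sketch of the loop both answers point towards (variable names follow the question; this is not code from either answer):
byte[] bufferArray = new byte[1021];
int datalen;
while ((datalen = reader.read(bufferArray, 0, bufferArray.length)) != -1) {
    // process bufferArray[0 .. datalen - 1]; datalen can be anything from 1 to 1021
}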
