How Buffer Streams works internally in Java - java

I'm reading about Buffer Streams. I searched about it and found many answers that clear my concepts but still have little more questions.
After searching, I have come to know that, Buffer is temporary memory(RAM) which helps program to read data quickly instead hard disk. and when Buffers empty then native input API is called.
After reading little more I got answer from here that is.
Reading data from disk byte-by-byte is very inefficient. One way to
speed it up is to use a buffer: instead of reading one byte at a time,
you read a few thousand bytes at once, and put them in a buffer, in
memory. Then you can look at the bytes in the buffer one by one.
I have two confusion,
1: How/Who data filled in Buffers? (native API how?) as quote above, who filled thousand bytes at once? and it will consume same time. Suppose I have 5MB data, and 5MB loaded once in Buffer in 5 Seconds. and then program use this data from buffer in 5 seconds. Total 10 seconds. But if I skip buffering, then program get direct data from hard disk in 1MB/2sec same as 10Sec total. Please clear my this confusion.
2: The second one how this line works
BufferedReader inputStream = new BufferedReader(new FileReader("xanadu.txt"));
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
Thanks.

As for the performance of using buffering during read/write, it's probably minimal in impact since the OS will cache too, however buffering will reduce the number of calls to the OS, which will have an impact.
When you add other operations on top, such as character encoding/decoding or compression/decompression, the impact is greater as those operations are more efficient when done in blocks.
You second question said:
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
I believe your thinking is wrong. Yes, technically the FileReader will write data to a buffer, but the buffer is not defined by the FileReader, it's defined by the caller of the FileReader.read(buffer) method.
The operation is initiated from outside, when some code calls BufferedReader.read() (any of the overloads). BufferedReader will then check it's buffer, and if enough data is available in the buffer, it will return the data without involving the FileReader. If more data is needed, the BufferedReader will call the FileReader.read(buffer) method to get the next chunk of data.
It's a pull operation, not a push, meaning the data is pulled out of the readers by the caller.

All the stuff is done by a private method named fill() i give you for educational purpose, but all java IDE let you see the source code yourself :
private void fill() throws IOException {
int dst;
if (markedChar <= UNMARKED) {
/* No mark */
dst = 0;
} else {
/* Marked */
int delta = nextChar - markedChar;
if (delta >= readAheadLimit) {
/* Gone past read-ahead limit: Invalidate mark */
markedChar = INVALIDATED;
readAheadLimit = 0;
dst = 0;
} else {
if (readAheadLimit <= cb.length) {
/* Shuffle in the current buffer */
// here copy the read chars in a memory buffer named cb
System.arraycopy(cb, markedChar, cb, 0, delta);
markedChar = 0;
dst = delta;
} else {
/* Reallocate buffer to accommodate read-ahead limit */
char ncb[] = new char[readAheadLimit];
System.arraycopy(cb, markedChar, ncb, 0, delta);
cb = ncb;
markedChar = 0;
dst = delta;
}
nextChar = nChars = delta;
}
}
int n;
do {
n = in.read(cb, dst, cb.length - dst);
} while (n == 0);
if (n > 0) {
nChars = dst + n;
nextChar = dst;
}
}

Related

Java Open BufferedWriter only once but rewrite content many times

I am running a long running operation, Say 100k jobs. i want to update the progress of it in a file once every 100 such jobs are completed.
i am opening the file using bufferedWriter with append mode as false. Writing it and then closing it. this is done once every 100 jobs are completed. So the file open and close would have happened 1000 times. Can i optimise it further by opening and closing the file only once?
public static void writeMetaData(String writeDir, JSONObject jsonObject) throws Exception {
String filePath = writeDir.concat("/").concat("metadata.txt");
BufferedWriter metaDataWriter = Files.newBufferedWriter(Paths.get(filePath), StandardCharsets.UTF_8, StandardOpenOption.TRUNCATE_EXISTING);
metaDataWriter.write(jsonObject.toString());
IOUtils.closeQuietly(metaDataWriter);
}
for(int i =0 ; i < 100000; i++) {
// do Something;
if(i % 100 == 0) {
writeMetaData(writeDir, jsonObject);
}
}
File should only have a single line.
Expected file content after 100 jobs:
progress: 100
Expected file content after 200 jobs:
progress: 200
Can this be optimised further?
First of all, an expression like writeDir.concat("/").concat("metadata.txt") is reducing readability and performance. A straight-forward writeDir + "/" + "metadata.txt" will provide better performance. But since you’re constructing a string merely for constructing a Path, it’s even more straight-forward not to do the Path’s job in your code but rather use Paths.get(writeDir, "metadata.txt").
You can not rewind a BufferedWriter but you can rewind a FileChannel. Therefore, to keep the channel open and rewind it when needed, you have to construct a new writer after rewinding:
public static void writeMetaData(FileChannel ch, JSONObject jsonObj) throws IOException {
ch.position(0);
if(ch.size() > 0) ch.truncate(0);
Writer w = Channels.newWriter(ch, StandardCharsets.UTF_8.newEncoder(), 8192);
w.write(jsonObj.toString());
w.flush();
}
try(FileChannel ch = FileChannel.open(Paths.get(writeDir, "metadata.txt"),
StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
for(int i = 0; i < 100000; i++) {
// do Something;
if(i % 100 == 0) {
writeMetaData(ch, jsonObject);
}
}
}
It’s important that the use of the Writer ends with flush() to force the write of all buffered data, but not close() as that would also close the underlying channel. Note that this code does not wrap the writer into a BufferedWriter; encoding text as UTF-8 is already a buffered operation and by requesting a larger buffer for the encoder, matching BufferedWriter’s default buffer size, we get the same effect of buffering without the copying overhead.
Since writing is not an end in itself, there’s a question left regarding your reading side. If the reader is trying to read the data in some intervals, there’s the risk of overlapping with the write, getting incomplete data.
You could use
public static void writeMetaData(FileChannel ch, JSONObject jsonObj) throws IOException {
try(FileLock lock = ch.lock()) {
ch.position(0);
if(ch.size() > 0) ch.truncate(0);
Writer w = Channels.newWriter(ch, StandardCharsets.UTF_8.newEncoder(), 8192);
w.write(jsonObj.toString());
w.flush();
}
}
to lock the file during the write. But depending on the system, file locking might not be mandatory but only affect readers also trying to get a read lock.
When you use JDK 11 or newer, you may consider using
for(int i = 0; i < 100000; i++) {
// do Something;
if(i % 100 == 0) {
Files.writeString(Paths.get(writeDir, "metadata.txt"), jsonObject.toString());
}
}
which clearly wins on simplicity (yes, that’s the complete code, no additional method required). The default options do already include the desired StandardCharsets.UTF_8 and StandardOpenOption.TRUNCATE_EXISTING.
While it does open and close the file internally, it has some other performance tweaks which may compensate. Especially in the likely case that the string consists of ASCII characters only, as the implementation will simply write the string’s internal array directly to the file then.
A Stream does not allow to go back and rewrite content. A way to achieve what you want is using a RandomAccessFile.
Its setLength() method will truncate the file if you pass 0.
Here is a simple example:
import java.io.*;
public class Test
{
public static void updateFile(RandomAccessFile raf, String content) throws IOException
{
raf.setLength(0);
raf.write(content.getBytes("UTF-8"));
}
public static void main(String[] args) throws IOException
{
try(RandomAccessFile raf = new RandomAccessFile("metadata.txt", "rw"))
{
updateFile(raf, "progress: 100");
updateFile(raf, "progress: 200");
}
}
}
File operations are typically buffered by the underlying kernel, and so you're unlikely to see much of a performance benefit by keeping an open file descriptor for this kind of low throughput application.
Keeping your code as a single operation that gets to leave no state after it finishes, rather than designing it as a continuous rewindable stream, makes for an elegant, simple, and unless you've specifically requested synchronous IO, then also sufficiently performant implementation that gets to benefit from the optimizations of all of the layers that sit beneath it.
When you do get measurable impedance to performance by this, which I suspect you never will, you could use the RandomAccessFile API, or go unnecessarily lower level by using FileChannel as others already specified.
I think you shouldn't compromise the simplicity/elegance of your design for this kind of micro-optimization, which in the grand scheme of things, is guaranteed to be insignificant (one tiny write operation per 100 jobs processed).

Is reading/writing in an array more efficient than reading/writing a char/byte one by one?

try(FileReader reader = new FileReader("input.txt")) {
int c;
while ((c = reader.read()) != -1)
System.out.print((char)c);
} catch (Exception ignored) { }
In this code, I read a char by char. Is it more efficient in someway to read a into an array of chars at once? In other words, is there any kind of optimization that happens when reading in arrays?
For example in this code, I have an array of char called arr and I read into it until there is noting left to read. Is it more efficient?
try(FileReader reader = new FileReader("input.txt")) {
int size;
char[] arr = new char[100];
while ((size = reader.read(arr)) != -1)
for (int i = 0; i < size; i++)
System.out.print(arr[i]);
} catch (Exception ignored) { }
The question applies for both reading/writing both chars/bytes.
Depends on the reader. The answer can be yes, though. Whatever Reader or InputStream is the actual 'raw' driver (the one that isn't just wrapping another reader or inputstream, but the one that is actually talking to the OS to get the data) - it may well implement the single-character read() method by asking the OS to read a single character.
In the end, you have a disk, and disks return data in blocks. So if you ask for 1 byte, you have 2 options as a computer:
Ask the disk for the block that contains the byte that is to be read. Store the block in memory someplace for a while. Return one byte; for the next few moments, if more requests for bytes come in from the same block, return from the stored data in memory and don't bother asking the disk at all. NOTE: This requires memory! Who allocates it? How much memory is okay? Tricky questions. OSes tend to give low level tools and don't like just picking values for any of these questions.
Ask the disk for the block that contains the byte that is to be read. Find the 1 byte needed from within this block. Ignore the rest of the data, return just that one byte. If in a few moments another byte from that block is asked for... ask the disk, again, for the whole block, and repeat this routine.
Which of the two models you get depends on many factors: For example: What kind of disk is it, what OS do you have, what underlying java reader are you using. But it is plausible you end up in this second mode and that is, as you can probably tell, usually incredibly slow, because you end up reading the same block 4000+ times instead of only once.
So, how to fix this? Well, java doesn't really know what the OS is doing either, so the safest bet is to let java do the caching. Then you have no dependencies on whatever the OS is doing.
You could write it yourself, so instead of:
for (int i = in.read(); i != -1; i = in.read()) {
processOneChar((char) i);
}
you could do:
char[] buffer = new char[4096];
while (true) {
int r = in.read(buffer);
if (r == -1) break;
for (int i = 0; i < r; i++) processOneChar(buffer[i]);
}
more code, but now the second scenario (the same block is read off the disk a ton of times) can no longer occur; you have given the OS the freedom to return to you up to 4096 chars worth of data.
Or, use a java builtin: BufferedX:
BufferedReader br = new BufferedReader(in);
for (int i = br.read(); i != -1; i = br.read()) {
processOneChar((char) i);
}
The implementation of BufferedReader guarantees that java will take care of making some reasonably sized buffer to avoid rereads of the same block off of disk.
NB: Note that the FileReader constructor you are using should not be used. It uses platform default encoding (anytime you convert bytes to characters, encoding is involved), and platform default is a recipe for untestable bugs, which are very bad. Use new FileReader(file, StandardCharsets.UTF_8) instead, or better yet, use the new API:
Path p = Paths.get("C:/file.txt");
try (BufferedReader br = Files.newBufferedReader(p)) {
for (int i = br.read(); i != -1; i = br.read()) {
processOneChar((char) i);
}
}
Note that this:
Defaults to UTF-8, because the Files API defaults to UTF-8 unlike most places in the VM.
Makes a bufferedreader immediately, no need to make it yourself.
Properly manages the resource (ensures it is closed regardless of how this code exits, be it normally or be exception), by using an ARM block.
Because a BufferedX is involved, no risk of the 'read the same block a lot' performance hole.
NB: The same logic applies when writing; disks such as SSDs can only write a whole block at a time. Now it's not just slow as molasses to write, you're also ruining your disk, as they get a limited number of writes.

Java - download a file through network with a buffer

i want to read from a network stream and write the bytes to a file, directly.
But every time i run the program very few bytes are written to the file actually.
Java:
InputStream in = uc.getInputStream();
int clength=uc.getContentLength();
byte[] barr = new byte[clength];
int offset=0;
int totalwritten=0;
int i;
int wrote=0;
OutputStream out = new FileOutputStream("file.xlsx");
while(in.available()!=0) {
wrote=in.read(barr, offset, clength-offset);
out.write(barr, offset, wrote);
offset+=wrote;
totalwritten+=wrote;
}
System.out.println("Written: "+totalwritten+" of "+clength);
out.flush();
That's because available() doesn't do what you think it does. Read its API documentation. You should simply read until the number of bytes read, returned by read(), is -1. Or even simpler, use Files.copy():
Files.copy(in, new File("file.xlsx").toPath());
Using a buffer that has the size of the input stream also pretty much defeats the purpose of using a buffer, which is to only have a few bytes in memory.
If you want to reimplement copy(), the general pattern is the following:
byte[] buffer = new byte[4096]; // number of bytes in memory
int numberOfBytesRead;
while ((numberOfBytesRead = in.read(buffer)) >= 0) {
out.write(buffer, 0, numberOfBytesRead);
}
You're using .available() wrong. From Java documentation:
available() returns an estimate of the number of bytes that can be read
(or skipped over) from this input stream without blocking by the next
invocation of a method for this input stream
That means that the first time your stream is slower than your file writing speed (very soon in all probability) the while ends.
You should either prepare a thread that waits for the input until it has read all the expected content length (with a sizable timeout, of course) or just block your program in the wait, if user interaction is not a big deal.

Why doesn't InputStream fill the array fully?

Dude, I'm using following code to read up a large file(2MB or more) and do some business with data.
I have to read 128Byte for each data read call.
At the first I used this code(no problem,works good).
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
for(int idx=0;idx<128;idx++){
read=is.read(); if(read==-1){return;}//end of stream
buff[idx]=(byte)read;
}
process_data(buff);
}
Then I tried this code which the problems got appeared(Error! weird responses sometimes)
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
//ERROR! java doesn't read 128 bytes while it's available
if((read=is.read(buff,0,128))==128){process_data(buff);}else{return;}
}
The above code doesn't work all the time, I'm sure that number of data is available, but reads(read) 127 or 125, or 123, sometimes. what is the problem?
I also found a code for this to use DataInputStream#readFully(buff:byte[]):void which works too, but I'm just wondered why the seconds solution doesn't fill the array data while the data is available.
Thanks buddy.
Consulting the javadoc for FileInputStream (I'm assuming since you're reading from file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value gives you how many bytes was actually read. The reason you may be reading less than 128 bytes could be due to a slow drive/implementation-defined behavior.
For a proper read sequence, you should check that read() does not equal -1 (End of stream) and write to a buffer until the correct amount of data has been read.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Infinite loop
while(true){
read_total = 0;
// Repeatedly perform reads until break or end of stream, offsetting at last read position in array
while((read = is.read(buf, read_total, buf.length - offset)) != -1){
// Gets the amount read and adds it to a read_total variable.
read_total = read_total + read;
// Break if it read_total is buffer length (128)
if(read_total == buf.length){
break;
}
}
if(read_total != buf.length){
// Incomplete read before 128 bytes
}else{
process_data(buf);
}
}
Edit:
Don't try to use available() as an indicator of data availability (sounds weird I know), again the javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key there is estimate, don't work with estimates.
Since the accepted answer was provided a new option has become available. Starting with Java 9, the InputStream class has two methods named readNBytes that eliminate the need for the programmer to write a read loop, for example your method could look like
public static void some_method( ) throws IOException {
InputStream is = new FileInputStream(args[1]);
byte[] buff = new byte[128];
while (true) {
int numRead = is.readNBytes(buff, 0, buff.length);
if (numRead == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff, numRead);
}
}
or the slightly simpler
public static void some_method( ) throws IOException {
InputStream is = new FileInputStream(args[1]);
while (true) {
byte[] buff = is.readNBytes(128);
if (buff.length == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff);
}
}

Limit the content available from a Java NIO Channel (File or Socket)

I'm pretty new to NIO and wanted to implement some feature with it, instead of typical Streams (which can do all sort of things).
What I'm not sure I can get is reading from a file into a buffer and limiting the content that I will transfer. Let's say from position 100 to 200 (even if file length is 1000). It also would be nice to do on network sockets.
I know that NIO keeps things basic to leverage OS capabilities that's why I'm not sure it can be done.
I was thinking that a tricky way to do it would be a 'LimitedReadChannel' that when it's should return less than the available buffer size it uses another byte-buffer and then transfer to the original one (1). But seems more tricky than necessary. I also don't want to use anything related to streams because it would defeat the purpose of using NIO.
(1) So far....
LimitedChannel.read(buffer) {
if (buffer.available?? > contentLeft) {
delegateChannel.read(smallerBuffer);
// transfer from smallerBuffer to buffer
} else {
delegateChannel.read(buffer);
}
}
I've found that Buffers admit to ask for the current limit or set a new one. So that wrapper channel (the one that limits the effective number of bytes read) could modify the buffer limit to avoid reading more...
Something like:
// LimitedChannel.java
// private int bytesLeft; // remaining amount of bytes to read
public int read(ByteBuffer buffer) {
if (bytesLeft <= 0) {
return -1;
}
int oldLimit = buffer.limit();
if (bytesLeft < buffer.remaining()) {
// ensure I'm not reading more than allowed
buffer.limit(buffer.position() + bytesLeft);
}
int bytesRead = delegateChannel.read(buffer);
bytesLeft -= bytesRead;
buffer.limit(oldLimit);
return bytesRead;
}
Anyway not sure if this already exists somewhere. It's difficult to find documentation about this use case...

Categories

Resources