What is the difference between FileInputStream and BufferedInputStream in Java? [closed] - java

What is the difference between FileInputStream and BufferedInputStream in Java?

Key differences:
BufferedInputStream is buffered, but FileInputStream is not.
A BufferedInputStream reads from another InputStream, but a FileInputStream reads from a file [1].
In practice, this means that every call to FileInputStream.read() will perform a syscall (expensive), whereas most calls to BufferedInputStream.read() will return data from the buffer. In short, if you are doing "small" reads, putting a BufferedInputStream into your stream stack will improve performance.
For most purposes and use cases, that is all that is relevant.
There are a few other things (like mark / reset / skip), but these are rather specialist.
For more detailed information, read the javadocs and the source code.
[1] Or more precisely, from some object that 1) has a name in the operating system's "file system" namespace, and 2) that the operating system allows you to read as a sequence of bytes. This may encompass devices, named pipes, and various other things that one might not consider as "files". It is also worth noting that there are some kinds of things that definitely cannot be read using a FileInputStream.
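To make the "most calls are served from the buffer" point concrete, here is a minimal sketch. It does not measure real syscalls: it uses an in-memory ByteArrayInputStream as a stand-in for a file, and CountingInputStream is a hypothetical helper (not a JDK class) that counts how often the underlying source is asked for data.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {

    /** Hypothetical helper: counts how often the wrapped stream is asked for data. */
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads one byte at a time until end of stream and returns the byte count. */
    static int drainByteByByte(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Calls that reach the source when it is read directly, one byte at a time. */
    static int unbufferedCalls(byte[] data) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteByByte(source);
        return source.calls;
    }

    /** Calls that reach the source when a BufferedInputStream sits in between. */
    static int bufferedCalls(byte[] data) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = new BufferedInputStream(source)) {
            drainByteByByte(in);
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20_000];
        System.out.println("unbuffered calls: " + unbufferedCalls(data));
        System.out.println("buffered calls:   " + bufferedCalls(data));
    }
}
```

The unbuffered count is one call per byte plus a final call that reports end of stream. The buffered count depends on the buffer size, but should be a handful of bulk reads rather than thousands of single-byte reads.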

You can search for this or read the Javadocs:
public class FileInputStream
extends InputStream
A FileInputStream obtains input bytes from a file in a file system. What files are available depends on the host environment.
FileInputStream is meant for reading streams of raw bytes such as image data. For reading streams of characters, consider using FileReader.
For more details: https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html.
public class BufferedInputStream
extends FilterInputStream
A BufferedInputStream adds functionality to another input stream, namely the ability to buffer the input and to support the mark and reset methods. When the BufferedInputStream is created, an internal buffer array is created. As bytes from the stream are read or skipped, the internal buffer is refilled as necessary from the contained input stream, many bytes at a time. The mark operation remembers a point in the input stream and the reset operation causes all the bytes read since the most recent mark operation to be reread before new bytes are taken from the contained input stream.
For more details: https://docs.oracle.com/javase/7/docs/api/java/io/BufferedInputStream.html.
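The mark and reset behaviour described in the Javadoc can be illustrated with a small sketch. The class and method names are made up for this example, and the data is an in-memory byte array rather than a real file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {

    /** Reads up to n bytes and returns them as an ASCII string. */
    static String readAscii(InputStream in, int n) throws IOException {
        byte[] chunk = new byte[n];
        int read = in.read(chunk, 0, n);
        return read <= 0 ? "" : new String(chunk, 0, read, StandardCharsets.US_ASCII);
    }

    /** Reads a 2-byte header, then reads the next 3 bytes twice using mark/reset. */
    static String[] peekThenReread(byte[] data) throws IOException {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            String header = readAscii(in, 2);
            in.mark(16);                     // remember this position (valid for 16 bytes)
            String first = readAscii(in, 3);
            in.reset();                      // go back to the marked position
            String again = readAscii(in, 3); // the same 3 bytes are returned again
            return new String[] {header, first, again};
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "HEADERBODY".getBytes(StandardCharsets.US_ASCII);
        for (String part : peekThenReread(data)) {
            System.out.println(part);        // prints HE, ADE, ADE on separate lines
        }
    }
}
```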

1,2c1,2
< public class FileInputStream
< extends InputStream
---
> public class BufferedInputStream
> extends FilterInputStream
4,8c4,11
< A FileInputStream obtains input bytes from a file in a file system. What files
< are available depends on the host environment.
<
< FileInputStream is meant for reading streams of raw bytes such as image data.
< For reading streams of characters, consider using FileReader.
---
> A BufferedInputStream adds functionality to another input stream-namely, the
> ability to buffer the input and to support the mark and reset methods. When the
> BufferedInputStream is created, an internal buffer array is created. As bytes
> from the stream are read or skipped, the internal buffer is refilled as
> necessary from the contained input stream, many bytes at a time. The mark
> operation remembers a point in the input stream and the reset operation causes
> all the bytes read since the most recent mark operation to be reread before new
> bytes are taken from the contained input stream.

Related

Why BufferedWriter is writing the data into the file partially? [duplicate]

I am trying to write a json file using this code:
File f = new File("words_3.json");
if (!f.exists()) {
    f.createNewFile();
}
if (fileWriter == null)
    fileWriter = new BufferedWriter(new FileWriter(f));
while (scanner.hasNext()) {
    String text = scanner.nextLine();
    fileWriter.append(text);
    System.out.println("writing : "+text);
}
The System.out.println() statement shows all the text in the terminal.
When I check the output file, I see that only 1300 lines have been written, while there are more than 2000 lines available.
The data that you're writing to an output stream isn't guaranteed to reach its destination immediately.
BufferedWriter is a so-called high-level stream: it decorates an underlying stream that deals with a particular destination of data, such as FileWriter (there could be a few more streams in between them), by buffering the text output and providing the convenience method newLine().
BufferedWriter maintains a buffer (an array of characters) with a default size of 8192. When the buffer gets full, it hands the contents to the underlying low-level stream, in this case a FileWriter, which takes care of encoding the characters into bytes.
When that is done, the JVM hands the data to the operating system via a FileOutputStream (because under the hood character streams are built on top of byte streams).
So, the data written to the buffer will appear in the file in chunks:
when the buffer gets full;
and when the stream is closed.
Javadoc for method close() says:
Closes the stream, flushing it first.
I.e. before releasing the resource, close() invokes flush(), which forces the cached data to be passed on to its destination.
If no exception occurs, everything that was written into the stream is guaranteed to reach the destination when the stream is closed.
You can also call flush() in your code, but it has to be applied with great caution. It is probably most appropriate when you deal with large amounts of critical data that is useful even when partially written (so that in case of an exception you lose less information). Misusing flush() can significantly reduce performance.
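A minimal sketch of the usual fix: open the writer in a try-with-resources block so that close(), and with it the final flush, always runs. The class and method names are hypothetical, and Files.newBufferedWriter is used here instead of the asker's new BufferedWriter(new FileWriter(f)) so that the charset is explicit.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class WriteAllLines {

    /** Writes every line to the target file, one per line. */
    static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        } // close() runs here, even if an exception was thrown, and flushes the buffer first
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            lines.add("line " + i);
        }
        Path target = Files.createTempFile("words", ".txt");
        writeLines(target, lines);
        System.out.println("lines on disk: " + Files.readAllLines(target).size());
        Files.delete(target);
    }
}
```

With this structure no explicit flush() call is needed: whatever is still sitting in the buffer is written out when the block exits.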

How do an InputStream, InputStreamReader and BufferedReader work together in Java?

I am studying Android development (I'm a beginner in programming in general) and learning about HTTP networking and saw this code in the lesson:
private String readFromStream(InputStream inputStream) throws IOException {
    StringBuilder output = new StringBuilder();
    if (inputStream != null) {
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, Charset.forName("UTF-8"));
        BufferedReader reader = new BufferedReader(inputStreamReader);
        String line = reader.readLine();
        while (line != null) {
            output.append(line);
            line = reader.readLine();
        }
    }
    return output.toString();
}
I don't understand exactly what InputStream, InputStreamReader and BufferedReader do. All of them have a read() method, and BufferedReader also has readLine(). Why can't I use only the InputStream, or only add the InputStreamReader? Why do I need to add the BufferedReader? I know it has to do with efficiency but I don't understand how.
I've been researching and the documentation for the BufferedReader tries to explain this but I still don't get who is doing what:
In general, each read request made of a Reader causes a corresponding
read request to be made of the underlying character or byte stream. It
is therefore advisable to wrap a BufferedReader around any Reader
whose read() operations may be costly, such as FileReaders and
InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each
invocation of read() or readLine() could cause bytes to be read from
the file, converted into characters, and then returned, which can be
very inefficient.
So, I understand that the InputStream can only read one byte, the InputStreamReader a single character, and the BufferedReader a whole line and that it also does something about efficiency which is what I don't get. I would like to have a better understanding of who is doing what, so as to understand why I need all three of them and what the difference would be without one of them.
I've researched a lot here and elsewhere on the web and don't seem to find any explanation about this that I can understand, almost all tutorials just repeat the documentation info. Here are some related questions that maybe begin to explain this but don't go deeper and solve my confusion: Q1, Q2, Q3, Q4. I think it may have to do with this last question's explanation about system calls and returning. But I would like to understand what is meant by all this.
Could it be that the BufferedReader's readLine() calls the InputStreamReader's read() method which in turn calls the InputStream's read() method? And the InputStream returns bytes converted to int, returning a single byte at a time, the InputStreamReader reads enough of these to make a single character and converts it to int and returns a single character at a time, and the BufferedReader reads enough of these characters represented as integers to make up a whole line? And returns the whole line as a String, returning only once instead of several times? I don't know, I'm just trying to get how things work.
Lots of thanks in advance!
This "Streams in Java concepts and usage" link gives a very nice explanation.
Streams, Readers, Writers, BufferedReader, BufferedWriter: these are the terms you will deal with in Java. They are the classes Java provides to operate on input and output. It is really worth knowing how they are related and how they are used. This post will explore streams in Java and the related classes in detail. So let us start:
Let us define each of these at a high level, then dig deeper.
Streams
Used to deal with byte level data
Reader/Writer
Used to deal with character-level data. They also support various character encodings.
BufferedReader/BufferedWriter
Used to increase performance. Data to be read is buffered in memory for quick access.
While these are for taking input, corresponding classes exist for output as well. For example, where an InputStream is meant to read a stream of bytes, an OutputStream helps in writing a stream of bytes.
InputStreams
Java provides many types of InputStream. Each connects to a distinct data source, such as a byte array or a file.
For example, FileInputStream connects to a file data source and can be used to read bytes from a file, while ByteArrayInputStream can be used to treat a byte array as an input stream.
OutputStream
This helps in writing bytes to a data source. For almost every InputStream there is a corresponding OutputStream, wherever it makes sense.
UPDATE
What is Buffered Stream?
Here I'm quoting from Buffered Streams, Java documentation (With a technical explanation):
Buffered Streams
Most of the examples we've seen so far use unbuffered I/O. This means
each read or write request is handled directly by the underlying OS.
This can make a program much less efficient, since each such request
often triggers disk access, network activity, or some other operation
that is relatively expensive.
To reduce this kind of overhead, the Java platform implements buffered
I/O streams. Buffered input streams read data from a memory area known
as a buffer; the native input API is called only when the buffer is
empty. Similarly, buffered output streams write data to a buffer, and
the native output API is called only when the buffer is full.
Sometimes I lose my hair reading technical documentation, so here I quote a more humane explanation from https://yfain.github.io/Java4Kids/:
In general, disk access is much slower than the processing performed
in memory; that’s why it’s not a good idea to access the disk a
thousand times to read a file of 1,000 bytes. To minimize the number
of times the disk is accessed, Java provides buffers, which serve as
reservoirs of data.
In reading File with FileInputStream then BufferedInputStream, the
class BufferedInputStream works as a middleman between FileInputStream
and the file itself. It reads a big chunk of bytes from a file into
memory (a buffer) in one shot, and the FileInputStream object then
reads single bytes from there, which are fast memory-to-memory
operations. BufferedOutputStream works similarly with the class
FileOutputStream.
The main idea here is to minimize disk access. Buffered streams are
not changing the type of the original streams — they just make reading
more efficient. A program performs stream chaining (or stream piping)
to connect streams, just as pipes are connected in plumbing.
InputStream, OutputStream, byte[], ByteBuffer are for binary data.
Reader, Writer, String, char are for text, internally Unicode, so that all scripts in the world may be combined (say Greek and Arabic).
InputStreamReader and OutputStreamWriter form a bridge between both. If you have some InputStream and know that its bytes are actually text in some encoding (Charset), then you can wrap the InputStream:
try (InputStreamReader reader =
new InputStreamReader(stream, StandardCharsets.UTF_8)) {
... read text ...
}
There is a constructor without a Charset, but it is not portable, as it uses the default platform encoding.
On older Android versions StandardCharsets may not exist; use the charset name "UTF-8" instead.
The derived classes FileInputStream and BufferedReader each add something to their parent class, InputStream and Reader respectively.
A FileInputStream is for input from a file, and a BufferedReader uses a memory buffer, so the actual physical reading is not done character by character (which would be inefficient). With new BufferedReader(otherReader) you add buffering to your original reader.
All this understood, there is the utility class Files with methods like newBufferedReader(Path, Charset) which add additional brevity.
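As a minimal sketch of that shortcut, the following reads a text file line by line with Files.newBufferedReader(Path, Charset). The class name ReadWithFiles and the method readLines are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadWithFiles {

    /** Returns all lines of a UTF-8 text file, without line terminators. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // One call replaces new BufferedReader(new InputStreamReader(new FileInputStream(...), charset)).
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, "alpha\ncaf\u00e9\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(file).size() + " lines read");
        Files.delete(file);
    }
}
```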
I have read lots of articles on this very topic. I hope this might help you in some way.
Basically, the BufferedReader maintains an internal buffer.
During a read operation, it reads characters from the underlying reader in bulk and stores them in its internal buffer.
Each subsequent read is then served to the program from that internal buffer.
This reduces the number of round trips between the program and the file or disk, and is therefore more efficient.
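To see what each of the three layers contributes, here is a small sketch that reads the same UTF-8 bytes at each level. The class and method names are invented for illustration, and an in-memory byte array stands in for the network stream from the question.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LayersDemo {

    /** Layer 1: InputStream.read() yields raw bytes as ints in the range 0..255. */
    static List<Integer> rawBytes(byte[] data) throws IOException {
        List<Integer> out = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                out.add(b);
            }
        }
        return out;
    }

    /** Layer 2: InputStreamReader decodes bytes into characters using a charset. */
    static List<Character> chars(byte[] data) throws IOException {
        List<Character> out = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.add((char) c);
            }
        }
        return out;
    }

    /** Layer 3: BufferedReader buffers characters and adds readLine(). */
    static List<String> lines(byte[] data) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // "h", e-acute (two bytes in UTF-8), newline
        byte[] data = "h\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(rawBytes(data));       // prints [104, 195, 169, 10]
        System.out.println(chars(data).size());   // prints 3
        System.out.println(lines(data).size());   // prints 1
    }
}
```

The same four bytes come back as four ints, three characters, or one line, depending on which layer you read from. Note that readLine() strips the line terminator, which is why the readFromStream method in the question concatenates all lines without separators.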

Does an OutputStreamWriter without buffering exist?

I need to convert a stream of char into a stream of bytes, i.e. I need an adapter from a java.io.Writer interface to a java.io.OutputStream, supporting any valid Charset which I will have as a configuration parameter.
However, the java.io.OutputStreamWriter class has a hidden secret: the sun.nio.cs.StreamEncoder object it delegates to underneath creates an 8192 byte (8KB) buffer, even if you don't ask it to.
The problem is, at the OutputStream end I have inserted a wrapper that needs to count the amount of bytes being written, so that it immediately stops execution of the source system once a specific amount of bytes has been output. And if OutputStreamWriter is creating an 8K buffer, I simply get notified of the amount of bytes generated too late because they will only reach my counter when the buffer is flushing (so there will be already more than 8,000 already-generated bytes waiting for me at the OutputStreamWriter buffer).
So the question is, is there anywhere in the Java runtime a Writer -> OutputStream bridge that can run unbuffered?
I would really, really hate to have to write this myself :(...
NOTE: hitting flush() on the OutputStreamWriter for each write is not a valid alternative. That brings a large performance penalty (there's a synchronized block involved at the StreamEncoder).
NOTE 2: I understand it might be necessary to keep a small char overflow at the bridge in order to compute surrogates. It's not that I need to stop the execution of the source system in the very moment it generates the n-th byte (that would not be possible given bytes can come to me in the form of a larger byte[] in a write call). But I need to stop it asap, and waiting for an 8K, 2K or even 200-byte buffer to flush would simply be too late.
As you have already detected, the StreamEncoder used by OutputStreamWriter has a buffer size of 8KB, and there is no interface to change that size.
But the following snippet gives you a way to obtain a Writer for an OutputStream which internally also uses a StreamEncoder, but now with a user-defined buffer size:
String charSetName = ...
CharsetEncoder encoder = Charset.forName(charSetName).newEncoder();
OutputStream out = ...
int bufferSize = ...
WritableByteChannel channel = Channels.newChannel(out);
Writer writer = Channels.newWriter(channel, encoder, bufferSize);
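A hedged sketch of how this could be tried out with a byte counter. CountingOutputStream is a hypothetical stand-in for the asker's counting wrapper, and UTF-8 and the requested buffer size of 32 are assumptions for the demo. How many bytes reach the counter before flush() is an implementation detail of the JDK (it may, for instance, round very small sizes up), so the code only prints that value rather than relying on it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class SmallBufferWriterDemo {

    /** Hypothetical byte counter standing in for the asker's counting wrapper. */
    static final class CountingOutputStream extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Writes `chars` ASCII characters and reports the byte count before and after flush(). */
    static long[] countsBeforeAndAfterFlush(int chars, int bufferSize) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        Writer writer = Channels.newWriter(Channels.newChannel(counter), encoder, bufferSize);
        for (int i = 0; i < chars; i++) {
            writer.write('x');           // one byte per character in UTF-8
        }
        long before = counter.count;     // bytes that reached the counter without a flush
        writer.flush();
        long after = counter.count;      // all bytes have been pushed through now
        writer.close();
        return new long[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        long[] counts = countsBeforeAndAfterFlush(100, 32);
        System.out.println("before flush: " + counts[0]);
        System.out.println("after flush:  " + counts[1]);
    }
}
```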

How does InputStream read() in Java determine the number of bytes to read?

http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read()
The doc says "Reads some number of bytes from the input stream and stores them into the buffer array b.".
How does InputStream read() in Java determine that number of bytes?
The buffer array has a defined length, call it n. The read(byte[] b) method will read between 1 and n bytes. It will block until at least one byte is available, unless EOF is detected.
I think the confusion comes from what "read" means.
read() returns to you the next byte in the InputStream or -1 if there are no more bytes left.
However, depending on the implementation of the particular InputStream you are using, more than one byte may have to be read from the underlying source in order to give you the next byte:
1. If your InputStream is buffered, then the entire buffer length might be read into memory just to tell you what the next byte is. However, subsequent calls to read() might not need to read the underlying source again until the in-memory buffer is exhausted.
2. If your InputStream is reading a zipped file, then several bytes may have to be read from the underlying source to unzip your data in order to return the next unzipped byte.
Layers of InputStreams wrapping other InputStreams, such as new GZIPInputStream(new BufferedInputStream(new FileInputStream(file))), will use #1 and #2 above depending on the layer.
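The return values of the two read variants can be checked with a short sketch. The class and method names are made up for this example. readFully is a hypothetical helper showing the usual loop for when you need the array filled, since a single read(byte[]) call is allowed to return fewer bytes than the array holds.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadSemanticsDemo {

    /** read() returns the next byte as an int in 0..255, or -1 at end of stream. */
    static int[] singleByteReads() throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0xFF});
        return new int[] {in.read(), in.read()};           // 255, then -1
    }

    /** read(byte[]) reports how many bytes it stored, which may be less than the array length. */
    static int[] arrayReads() throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[5]);
        byte[] buffer = new byte[8];
        return new int[] {in.read(buffer), in.read(buffer)}; // 5, then -1
    }

    /** Keeps reading until the array is full or the stream ends; returns the bytes stored. */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break;                                      // end of stream reached early
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(singleByteReads()[0]);           // prints 255
        System.out.println(arrayReads()[0]);                // prints 5
        System.out.println(readFully(new ByteArrayInputStream(new byte[5]), new byte[8])); // prints 5
    }
}
```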

What is the result of buffering a buffered stream in java?

Was writing the javadoc for :
/**
* ...Buffers the input stream so do not pass in a BufferedInputStream ...
*/
public static void meth(InputStream is) throws IOException {
BufferedInputStream bis = new BufferedInputStream(is,
INPUT_STREAM_BUFFER_SIZE);
// rest omitted
}
But is it really a problem to pass a buffered input stream in ? So this :
InputStream is = new BufferedInputStream(new FileInputStream("C:/file"), SIZE);
meth(is);
would buffer the is into bis - or would java detect that is is already buffered and set bis = is ? If yes, would different buffer sizes make a difference ? If no, why not ?
NB : I am talking about input streams but actually the question is valid for output streams too
But is it really a problem to pass a buffered input stream in ?
Not really. There is potentially a small overhead in doing this, but it is negligible compared with the overall cost of reading input.
If you look at the code of BufferedInputStream, (e.g. the read1 method) you will see that block reads are implemented to be efficient when buffered streams are stacked.
[Re the example code:] would java detect that is is already buffered and set bis = is ?
No.
If no, why not ?
Because Java (the language, the compiler) generally doesn't understand the semantics of Java library classes. And in this case, since the benefit of such an optimization would be negligible, it is not worthwhile implementing.
Of course, you are free to write your meth method to do this kind of thing explicitly ... though I predict that it will make little difference.
I do not quite get why in read1 they "bother" to copy to the input buffer only if the requested length is less than the buf.length (or if there is a marked position in the input stream)
I assume that you are referring to this code (in read1):
if (len >= getBufIfOpen().length && markpos < 0) {
return getInIfOpen().read(b, off, len);
}
The first part is saying that if the user is asking for less than the stream's configured buffer size, we don't want to short-circuit the buffering. (Otherwise, we'd have the problem that doing a read(byte[], int, int) with a small requested length would be pessimal.)
The second part is to do with the way that mark / reset is implemented. Instead of using mark / reset on the underlying stream (which may or may not be supported), a BufferedInputStream uses the buffer to implement it. What you are seeing is part of that logic. (You can work the details for yourself ... reading the comments in the source code.)
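If you do want meth to avoid stacking a second buffer, a minimal sketch of the explicit check could look like this. The helper name ensureBuffered is hypothetical. Note that it keeps an existing BufferedInputStream as it is, so the requested size is ignored in that case.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class BufferOnce {

    /** Returns the stream unchanged if it is already buffered; otherwise wraps it. */
    static InputStream ensureBuffered(InputStream in, int size) {
        if (in instanceof BufferedInputStream) {
            return in;                            // already buffered: reuse, size is ignored
        }
        return new BufferedInputStream(in, size);
    }

    public static void main(String[] args) {
        InputStream plain = new ByteArrayInputStream(new byte[16]);
        InputStream buffered = new BufferedInputStream(plain, 1024);
        System.out.println(ensureBuffered(buffered, 4096) == buffered);                  // prints true
        System.out.println(ensureBuffered(plain, 4096) instanceof BufferedInputStream);  // prints true
    }
}
```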
If you buffer the stream twice then it will use more memory and be slower than if you only did so once, but it will still work.
It's certainly worth documenting that your stream does buffering so that users will know they don't need to do so themselves.
Generally it's best to discourage rather than actively prevent this sort of misuse.
The answer is no, Java would not detect the double buffering.
It is up to the user to avoid this problem. The BufferedInputStream has no way of knowing whether the InputStream you pass into the constructor is buffered or not.
Here is the source code for the BufferedInputStream constructor:
public BufferedInputStream(InputStream in, int size) {
super(in);
if (size <= 0) {
throw new IllegalArgumentException("Buffer size <= 0");
}
buf = new byte[size];
}
EDIT
From the comments: is it a problem to double buffer a stream?
The short answer is yes.
The idea of buffering is to increase speed so that data is spooled into memory and written out (usually to very slow IO) in chunks. If you double buffer you spool data into memory and then flush that data back into memory somewhere else. This certainly has a cost in terms of speed...
