I am sorry if this is a duplicate, but I was not able to find a definitive answer as to what the best practice is for each type.
I would like to know what the appropriate conditions are that define when to use BufferedReader vs FileReader or BufferedInput/OutputStream vs FileInput/OutputStream? Is there a formula of sorts that will always tell you what is appropriate?
Should I just always use buffered?
Thanks
Use a buffer if the stream is going to have lots of small accesses. Use unbuffered if you are going to have relatively few, relatively large accesses.
The only time you should use unbuffered I/O is when the delay and aggregation imposed by buffering is inappropriate to your application.
"Is there a formula of sorts that will always tell you what is appropriate?"
If there was, it would already be in the libraries and would not be a design decision that you would have to make.
Since there's no pat answer, you have to make the design decision yourself; you have to actually think about it.
Or, you can try both options and see which is "better" based on your unique problem and your unique criteria.
Most standard I/O libraries are buffered. That's a hint that most I/O benefits from buffering. But not all. Games, for instance, need unbuffered access to the game controls.
Keep in mind also that the BufferedReader provides you with a convenience readLine() method that allows you to read your content one line at a time.
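As a sketch of that convenience, here is the usual line-by-line reading idiom (the file name is just a placeholder):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLines {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the underlying FileReader automatically
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            // readLine() returns null at end of stream
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

A plain FileReader gives you no readLine() at all; you'd have to assemble lines from individual read() calls yourself.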
I suggest you use Buffered* if it makes your application go faster, otherwise I wouldn't bother with it. i.e., try it with realistic data to see whether it helps.
Related
I'm trying to find out, what are the methods mark() and reset() of BufferedReader really useful for?
I understand what they are doing, but for going back and forth in some text I never used them - usually I solve this problem by reading either a sequence of chars or the whole line in an array or StringBuilder and go back and forth through it.
I believe there must be some reason why these methods are present in the BufferedReader and other Reader implementations supporting it but I'm unable to make an assumption why.
Does the usage of mark() & reset() provide some benefit compared to reading the data into our own array and navigating through it?
I've searched through the codebase of one of my large projects (mainly a Java backend using Spring Boot) with lots of dependencies on the classpath. The only thing the mark & reset methods were used for (in only a very few libraries) was skipping an optional BOM character at the beginning of a text file. And even for this simple use case, I find it a bit contrived to do it that way.
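For reference, the BOM-skipping pattern I found looks roughly like this (class and method names are my own, not from any particular library):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class BomSkipper {
    private static final char UTF8_BOM = '\uFEFF';

    // Wraps a reader and consumes a leading BOM character, if one is present.
    public static BufferedReader skipBom(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        reader.mark(1);              // remember this position; 1 char of lookahead is enough
        int first = reader.read();
        if (first != UTF8_BOM) {
            reader.reset();          // no BOM: rewind so the caller sees the first char
        }
        return reader;
    }
}
```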
Also, I searched other tutorials and Stack Overflow (e.g. What are mark and reset in BufferedReader?) and couldn't find any explanation of why one would actually solve these kinds of problems using mark & reset. All code examples only explain what the methods do on "hello world" examples (jumping from one position in the stream back to a previous position for no particular reason). Nowhere could I find any explanation of why someone should actually use them over other approaches that sound more elegant and aren't really any worse in performance.
I haven't used them myself, but a case that springs to mind is where you want to copy the data into a structure that needs to be sized correctly.
When reading streams and copying data into a target data structure (perhaps after parsing it), you always have the problem that you don't know how big to make your target in advance. The mark/rewind feature lets you mark, read the stream, parse it quickly to calculate the size, reset, allocate the memory, and then re-parse copying the data this time. There are of course other ways of doing it (e.g., using your own dynamic buffer), but if your code is already centered around the Reader concept then mark/reset lets you stay with that.
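A minimal sketch of that two-pass idea (note the caveat: mark()'s read-ahead limit has to be large enough to cover everything you intend to re-read, which I've exposed here as a hypothetical maxChars parameter):

```java
import java.io.BufferedReader;
import java.io.IOException;

public class TwoPassRead {
    // First pass: count lines to learn the size. Second pass: copy into
    // an exactly-sized array, re-reading from the marked position.
    public static String[] readAllLines(BufferedReader reader, int maxChars) throws IOException {
        reader.mark(maxChars);           // the buffer must hold everything we will re-read
        int count = 0;
        while (reader.readLine() != null) {
            count++;
        }
        reader.reset();                  // rewind to the marked position
        String[] lines = new String[count];
        for (int i = 0; i < count; i++) {
            lines[i] = reader.readLine();
        }
        return lines;
    }
}
```

That read-ahead limit is also why this technique doesn't scale to arbitrarily large streams: the buffered data has to fit in memory anyway.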
That said, even BufferedReader's own readLine method doesn't use this technique (it creates a StringBuffer internally).
A few weeks ago, I was searching for a way to extract some specific value from a file and stumbled on this question which introduced me to the Stream Object.
My first instinct was to investigate if this object would help with other file operations, such as replacing several placeholders with corresponding values for which I used BufferedReader and FileWriter. I failed miserably at producing any working code, but since then I began taking interest on articles which covered the subject, so I could understand the intended use of Stream.
On the way, I stumbled upon Optional, came to a good understanding of it, and can now identify the cases where I am comfortable using Optional while keeping my code clean and understandable. However, I can't say the same for Stream, not to mention that it may not have provided the performance gain I imagined it would, and will still need a finally clause in cases where IO is involved.
Here is the main issue I've been trying to wrap my head around, keeping in mind that I have mostly worked on single-threaded programming until now: when is it preferred to use a Stream, aside from parallel processing?
Is it to do an operation in bulk on a specific subset of a big collection of data, where Collection would have been used when trying to access and manipulate specific objects of the said collection? Although it seems to be the intended use, I'm still not sure that the example I linked at the beginning of my question is your typical use case.
Or is it only a construct used to make the code smaller thanks to lambda expressions, at the sacrifice of readability? (Nothing against lambdas if used correctly, but most of the examples of Stream usage I saw were quite illegible, which didn't help my general understanding.)
I've always referred to the description on the Java 8 Streams API page to help me decide between a Collection and a Stream:
However, [the Streams API] has many benefits. First, the Streams API makes use of several techniques such as laziness and short-circuiting to optimize your data processing queries.
Both a Stream and a Collection can be used to apply a computation on every single element of a dataset before storing it. However, I've found Streams useful if my pipeline includes several distinct filter/sort/map operations for each data element, as the Stream API can optimize these calculations behind the scenes and has parallelization support built in as well.
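A small sketch of the kind of multi-stage pipeline I mean (the data and the particular filter/map/sort steps are made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

public class PipelineDemo {
    public static void main(String[] args) {
        List<String> names = List.of("carol", "alice", "dave", "bob");

        // Several distinct operations chained into one lazily-evaluated pipeline;
        // swapping stream() for parallelStream() would parallelize it.
        List<String> result = names.stream()
                .filter(n -> n.length() <= 4)   // keep only short names
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());

        System.out.println(result);             // [BOB, DAVE]
    }
}
```

The equivalent Collection-based code would need an intermediate list per step, or one loop doing all three jobs at once.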
I agree that readability can be affected both positively and negatively by using a Stream - you're correct that some Stream examples are completely unreadable, and I don't think that readability should be the key decision point for using a Stream over something else.
If you're truly optimizing for performance on a large dataset, consider using a toolset that's purpose-built for massive datasets instead.
I am reading from an InputStream and writing what I read into an OutputStream. I also check a few things along the way. For example, if I read an & (ampersand), I need to write "&amp;" instead.
My code works, but now I wonder if I have written it in the most efficient way (which I doubt).
I read byte by byte. (but this is because I need to do odd modifications)
Can somebody who's done this suggest the fastest way?
Thanks
If you are using BufferedInputStream and BufferedOutputStream then it is hard to make it faster.
BTW if you are processing the input as characters as opposed to bytes, you should use readers/writers with BufferedReader and BufferedWriter.
The code should be reading/writing characters with Readers and Writers. For example, if it's in the middle of a UTF-8 sequence, or it gets the second half of a UCS-2 character that happens to have the byte value of an ampersand, then it's going to damage the data it's attempting to copy. Code usually lives longer than you would expect it to, and somebody might pick it up later and use it in a situation where this could really matter.
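A minimal character-based sketch of the ampersand-escaping copy described in the question (still reading one char at a time, which the buffering makes cheap):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class EscapeCopy {
    // Copies characters from in to out, replacing '&' with the entity "&amp;".
    public static void copyEscaping(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        int c;
        while ((c = reader.read()) != -1) {   // read() returns -1 at end of stream
            if (c == '&') {
                writer.write("&amp;");
            } else {
                writer.write(c);
            }
        }
        writer.flush();                        // push any buffered output through
    }
}
```

Because the Reader decodes bytes into characters first, a multi-byte sequence can never be mistaken for an ampersand.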
As far as being faster or slower, using a BufferedReader will probably help the most. If you're writing to the file system, a BufferedWriter won't make much of a difference, because the operating system will buffer writes for you and it does a good job. If you're writing to a StringWriter, then buffering will make no difference (may even make it slower), but otherwise buffering your writes ought to help.
You could rewrite it to process arrays, and that might make it faster; you can still do your odd modifications with arrays. But you will have to write more complicated code to handle the boundary conditions, and that also needs to be a factor in the decision.
Measure, don't guess, and be wary of opinions from people who aren't informed of all the details. Ultimately, it's up to you to figure out if it's fast enough for this situation. There is no single answer, because all situations are different.
I would prefer to use BufferedReader for reading input and BufferedWriter for output. Using Regular Expressions for matching your input can make your code short and also improve your time complexity.
I am trying to solve a problem in one of the programming contests. The actual problem is to sort a given list of numbers. I am using an algorithm with O(n log n) complexity, and that is the maximum level of optimization I can do with the algorithm. From the forum I understood that I need faster I/O, for which I should create a new reader class. I/O is from and to the standard I/O.
I want to know how to create a reader class (instead of using the standard Reader classes)?
Thanks in Advance!
This question really seems like a "barking up the wrong tree" kind of question. I find it unlikely that you'd be able to subclass Reader and make it run faster, given that you don't know how to do it. If there were an obvious way, wouldn't it already be in Java?
If I/O speed is the problem, perhaps it's the method you're using. There are several different types of Readers, and several algorithms to use them. For example, do you read the whole file at once then parse it, or do you read one line at a time? Some of these options may not even be possible depending on the type of file, size of the file, and other conditions.
If you're trying to solve a problem for a programming contest, solving the actual problem should be all that's required. You shouldn't have to create your own Reader class unless that's a part of the problem being described. Besides, you mention that you're getting your direction from a forum. How do you know they even know what they're talking about?
So, I feel like you're doing something wrong here that's outside the scope of the question you asked.
I have a BufferedReader looping through a file. When I hit a specific case, I would like to continue looping using a different instance of the reader but starting at this point.
Any ideas for a recommended solution? Create a separate reader, use the mark function, etc.?
While waiting for your answer to my comment, I'm stuck with making assumptions.
If it's the linewise input you value, you may be as pleasantly surprised as I was to discover that RandomAccessFile now (since 1.4 or 1.5) supports the readLine method. Of course RandomAccessFile gives you fine-grained control over position.
If you want buffered IO, you may consider wrapping a reader around a CharBuffer, or maybe a ByteBuffer wrapped around a file mapped using the nio API. This gives you the ability to treat a file as memory, with fine control of the read pointer. And because the data is all in memory, buffering is included free of charge.
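A sketch of the memory-mapped approach, assuming the whole file fits comfortably in memory (the class and method names here are illustrative, not from any library):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    // Maps the whole file into memory and decodes it as UTF-8.
    public static String readAll(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);   // the buffer's position acts as a freely movable read pointer
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

Calling buffer.position(n) before reading would let you jump anywhere in the file, which is the fine-grained control mentioned above.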
Have you looked at BufferedReader's mark method? Used in conjunction with reset it might meet your needs.
If you keep track of how many characters you've read so far, you can create a new BufferedReader and use skip.
As Noel has pointed out, you would need to avoid using BufferedReader.readLine(), since readLine() will discard newlines and make your character count inaccurate. You probably shouldn't count on readLine() never getting called if anyone else will ever have to maintain your code.
If you do decide to use skip, you should write your own buffered Reader which will give you an offset counting the newlines.
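The skip-based reopening could be sketched like this (class and method names are made up; note that skip() is allowed to skip fewer characters than requested, so it has to be called in a loop):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class SkipResume {
    // Opens a fresh reader on the same file and skips past charsAlreadyRead characters.
    public static BufferedReader resumeAt(File file, long charsAlreadyRead) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        long remaining = charsAlreadyRead;
        while (remaining > 0) {
            long skipped = reader.skip(remaining);  // may skip fewer chars than asked
            if (skipped <= 0) {
                break;  // end of stream reached before the target position
            }
            remaining -= skipped;
        }
        return reader;
    }
}
```

As noted above, the count has to be in characters actually consumed, so a readLine()-based first pass would throw it off by one or two chars per line ending.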