As I understand it, an InputStream, once consumed, cannot be reused or re-read, because it has no content left.
However, with PushbackInputStream we can unread bytes back into the input stream.
Can we use PushbackInputStream safely to make an InputStream consumable repeatedly?
Yes, within its pushback limit, and you can do the same with BufferedInputStream, using its mark() and reset() methods and an adequate buffer. But if you have a solution that involves re-reading input, you are designing it wrong. A compiler can proceed left to right along the input file without ever backtracking: so can you.
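As a minimal sketch of what "within its pushback limit" means, the helper below reads a few bytes and pushes them back so the next read sees them again. The class name PushbackDemo, the peek helper and the sample bytes are made up for illustration; only PushbackInputStream and its read/unread methods come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class PushbackDemo {

    /** Reads up to n bytes, then pushes them back so the caller can read them again. */
    static String peek(PushbackInputStream in, int n) throws IOException {
        byte[] head = new byte[n];
        int read = in.read(head, 0, n);
        if (read <= 0) {
            return ""; // nothing available (end of stream or n == 0)
        }
        // unread() throws an IOException if the pushback buffer
        // cannot hold this many bytes.
        in.unread(head, 0, read);
        return new String(head, 0, read, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for a real stream.
        InputStream source = new ByteArrayInputStream(
                "GIF89a...".getBytes(StandardCharsets.US_ASCII));
        // The second constructor argument is the pushback limit in bytes.
        try (PushbackInputStream in = new PushbackInputStream(source, 8)) {
            System.out.println(peek(in, 6));
            System.out.println(peek(in, 6)); // same bytes again, they were pushed back
        }
    }
}
```

Note that this only covers the last few bytes read, up to the limit chosen at construction time; it does not make the whole stream re-readable.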
It seems to me that InputStream and OutputStream are ambiguous names for I/O.
InputStream can be thought of as "to input into a stream", and OutputStream can be thought of as "get output of a stream".
After all, we read from an "input" stream, but shouldn't you be reading from an "output"?
What was the rationale behind choosing these two names and what is a good way to remember Input/Output stream without confusing one for the other?
The streams are named not for how you use them inside your code but for what they accomplish. An InputStream accomplishes reading input from somewhere outside your program (the console, a file, etc.), whereas an OutputStream accomplishes writing an output to somewhere else (again, console, file, etc.). Your Java code is only the intermediary in this scenario: In order to make use of the input, you have to read it from the stream, and in order to produce an output, you first have to write something to the stream.
The problem with the naming is only that streams by design always have something that goes in and something that comes out - you can always read and write on/with any stream. All you have to remember is that they are named for the more important task they do: interacting with something outside your code.
Think of your program/code as the Actor.
When the Actor wants to read something in, it seeks a handle to an InputStream, because it is this stream that will provide the input. Hence you read from it.
When the Actor wants to write something out, it seeks a handle to an OutputStream and then starts writing to the handle, which will do the rest. Likewise, you write to it.
I hope this answers the question. I just visualize my code as the classic stick-figure Actor, and InputStream and OutputStream as the entities with which it interacts.
I have the following piece of code in Java:
HttpURLConnection con = (HttpURLConnection)new URL(url).openConnection();
con.connect();
InputStream stream = con.getInputStream();
BufferedReader file = new BufferedReader(new InputStreamReader(stream));
At this point, I read the file from start to end while searching for something:
while (true)
{
    String line = file.readLine();
    if (line == null)
        break;
    // Search for something...
}
Now I want to search for something else within the file, without opening another URL connection.
For reasons unrelated to this question, I wish to avoid searching for both things "in a single file-sweep".
Questions:
Can I rewind the file with reset?
If yes, should I apply it on the InputStream object, on the BufferedReader object or on both?
If no, then should I simply close the file and reopen it?
If yes, should I apply it on the InputStream object, on the BufferedReader object or on both?
If no, how else can I sweep the file again, without reading through the URL connection again?
You can rewind the file with reset(), provided that you have mark()'ed the position you want to rewind to. These methods should be invoked on the decorator, i.e. BufferedReader.
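A rough sketch of the mark()/reset() approach on a BufferedReader follows. The class and method names are hypothetical, and the read-ahead limit is an assumption you have to choose yourself: it must be larger than everything read between mark() and reset(), otherwise reset() may fail with an IOException. For a remote file of unknown size, that effectively means buffering the whole content.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class MarkResetDemo {

    /** Reads to the end of the reader and counts the lines containing needle. */
    static int countMatches(BufferedReader reader, String needle) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(needle)) {
                count++;
            }
        }
        return count;
    }

    /** Sweeps the same input twice, rewinding with reset() in between. */
    static int[] searchTwice(Reader source, int readAheadLimit,
                             String first, String second) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        // Remember the start; readAheadLimit (in chars) must exceed the content size.
        reader.mark(readAheadLimit);
        int firstCount = countMatches(reader, first);
        reader.reset(); // back to the marked position
        int secondCount = countMatches(reader, second);
        return new int[] {firstCount, secondCount};
    }
}
```

With the code from the question, the Reader passed in would be the `new InputStreamReader(stream)` that is already being wrapped.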
However, you may want to reconsider your design, as you can easily read the whole file into some data structure (even a list of strings, or some stream backed by a string) and use the data multiple times.
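The "read it into a data structure" idea could look roughly like this. It is only a sketch: the class and method names are invented for illustration, and it assumes the file is small enough to hold in memory as a list of lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class LinesInMemory {

    /** Reads every line once; the returned list can be searched any number of times. */
    static List<String> readAllLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** One "sweep" over the cached lines; no I/O is involved any more. */
    static boolean containsText(List<String> lines, String needle) {
        for (String line : lines) {
            if (line.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

Each later search is then just another call to containsText on the same list, with no second connection to the URL.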
Use the following methods:
mark
skip
reset
You can do it only if markSupported() returns true. Note that a reader or stream wrapper often does not add this functionality itself but delegates it to the wrapped input stream (buffering classes such as BufferedReader and BufferedInputStream are the exception), so always call markSupported() and keep in mind that it can return false for streams that do not support this feature.
For example, this really can happen for URL-based streams: think about how you could reset a stream that originates from a remote server. It would require the client side to cache all the content that has already been downloaded.
I usually end up using something like InputStreamSource to make re-reading convenient. When I'm dealing with connections, I find it useful to use an in-memory or on-disk spooling strategy for re-reading. Use a threshold for choosing storage location, "tee" into the spool on first read, and re-read from the spool on subsequent reads.
Edit: I also found Guava's ByteSource and CharSource, which have the same purpose.
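The spooling strategy described above might be sketched as follows. This is a simplified illustration, not the InputStreamSource or Guava API: SpooledSource and its methods are hypothetical names, and instead of "teeing" during the first read it copies the whole stream up front, keeping it in memory below a threshold and spilling to a temporary file above it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class SpooledSource implements Closeable {
    private final byte[] memory; // non-null when the content stayed under the threshold
    private final Path file;     // non-null when the content was spilled to disk

    private SpooledSource(byte[] memory, Path file) {
        this.memory = memory;
        this.file = file;
    }

    /** Consumes in once, storing it in memory or in a temp file. */
    static SpooledSource spool(InputStream in, int threshold) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            head.write(chunk, 0, n);
            if (head.size() > threshold) {
                // Too large for memory: move what we have to disk, then stream the rest.
                Path tmp = Files.createTempFile("spool", ".bin");
                try (OutputStream out = Files.newOutputStream(tmp)) {
                    head.writeTo(out);
                    while ((n = in.read(chunk)) != -1) {
                        out.write(chunk, 0, n);
                    }
                }
                return new SpooledSource(null, tmp);
            }
        }
        return new SpooledSource(head.toByteArray(), null);
    }

    /** Opens a fresh stream over the spooled content; may be called repeatedly. */
    InputStream openStream() throws IOException {
        return memory != null ? new ByteArrayInputStream(memory) : Files.newInputStream(file);
    }

    boolean isOnDisk() {
        return file != null;
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            Files.deleteIfExists(file); // remove the temporary spool file
        }
    }
}
```

Every call to openStream() starts at the beginning of the content, so re-reading never touches the original connection again.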
I have a Java app that fetches a relatively small .zip file using a URL, saves it in a temp directory, unzips it onto the local machine and makes some changes to one of the files. This all works great.
However, I am accessing the .zip file via a BufferedInputStream in the following way:
URL url = new URL("http://somedomain.com/file.zip");
InputStream is = new BufferedInputStream(url.openStream(), 1024);
My concern is that this app will actually be used to transfer very large zip files and I was wondering if a BufferedInputStream is actually the best way to do this, or whether I would just end up throwing some type of OutOfMemoryException?
So my question is, will a BufferedInputStream be suitable for this job, or should I be going about it in a completely different way?
BufferedInputStream does not load the whole file into memory; it only uses an internal buffer, in your case 1024 bytes (1 KB). The buffer does not grow with the size of the file (it can only grow if you call mark() with a read limit larger than the buffer). You could increase the value if you are not going to have many streams open at once.
Edit: what you may be thinking of is a ByteArrayOutputStream, where all the data is kept in memory.
It depends on what you do with the content you read. If you read everything into memory, it will fail for very large files. If you write it to another stream, it will be fine. Use BufferedInputStream.
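To illustrate "write it to another stream", here is a minimal copy loop. The class and method names are hypothetical, and the 8192-byte chunk size is an arbitrary choice. Memory use stays at one chunk plus the stream's internal buffer, regardless of how large the download is.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamCopy {

    /** Copies in to out chunk by chunk and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // the only memory that holds file content
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Downloads url into target without ever holding the whole file in memory. */
    static long download(URL url, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(url.openStream());
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out);
        }
    }
}
```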
From the official Java Tutorials - Buffered Streams:
The Java platform implements buffered I/O streams. Buffered input
streams read data from a memory area known as a buffer; the native
input API is called only when the buffer is empty. Similarly, buffered
output streams write data to a buffer, and the native output API is
called only when the buffer is full.
There is another great SUN article.
So the answer is: BufferedInputStream is suitable for this kind of job in the sense of performance.
And yes, the memory consumption isn't so much dependent on the type of the input stream....
I'm trying to read a java.io.InputStream multiple times, starting from the top of the stream.
Obviously, for streams that return true from markSupported(), I can try to use mark(availableBytes) and then reset() to read the stream again from the top.
Most streams do not support mark, and those that do (e.g. java.io.BufferedInputStream) copy data into a temporary byte array, which is not nice in terms of memory consumption, etc.
If my method receives a java.io.InputStream as a parameter, can I close it and then somehow reopen it, to reset the same original stream to the top so I can read it again?
I couldn't find any way to do this apart from writing the original InputStream into memory (yuck!) or into a temporary file and then opening a new InputStream on that temporary location whenever I need to read the stream from the top again.
You can close it, but the only way to reopen the same stream to the same data without creating an explicit copy of the data somewhere is to determine what concrete type of InputStream you are dealing with (easy) and what that stream was initialized to point at (may be easy, hard, or impossible depending upon the stream type and its interface), and then to add code that creates a new instance of the concrete stream type with the original source input (not difficult, but also not very maintainable, and easily broken if someone creates a custom InputStream implementation that you don't know how to handle).
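One way to sidestep the type detection, if you control the method signature, is to accept something that knows how to open the stream rather than the stream itself. This is a swapped-in alternative, not what the answer above describes, and the StreamOpener interface and Reopen class are hypothetical names used only for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;

final class Reopen {

    /** Hypothetical callback: each call opens a fresh stream on the same source. */
    @FunctionalInterface
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    /** One full pass over the source, counting occurrences of a byte value. */
    static int countBytes(StreamOpener source, int target) throws IOException {
        int count = 0;
        try (InputStream in = source.open()) { // a new stream for every pass
            int b;
            while ((b = in.read()) != -1) {
                if (b == target) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

A caller with a file could pass `() -> Files.newInputStream(path)`; a second call to countBytes simply reopens the file, so no copy of the data is kept by the method.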
Good day,
Currently, we are using ByteArrayInputStream for our reset-able InputStream. My problem with it is that it consumes a lot of memory (it loads all the bytes it represents in memory unlike some other InputStream implementations).
My question then is: is there any lighter implementation of InputStream which supports mark() and reset()?
I tried searching in commons-io as well, but I fail to see any.
Thanks,
Franz
Would using a BufferedInputStream work for you? Without knowing where the original data is coming from (e.g., why you have a ByteArrayInputStream to begin with), it is a bit hard to answer your question.
I most often use a PushbackInputStream when parsing data and needing to go back and re-read some of it. Here is an explanation:
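As a sketch of why BufferedInputStream can be lighter here: it only has to retain the bytes read since the last mark(), up to the read limit, not the whole content. The class and helper names below are illustrative, and the example assumes you only need to rewind over a bounded prefix such as a header.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

final class BufferedMarkDemo {

    /** Reads up to n bytes, then rewinds so the stream is positioned where it started. */
    static byte[] peek(BufferedInputStream in, int n) throws IOException {
        in.mark(n); // the mark stays valid as long as at most n bytes are read
        byte[] head = new byte[n];
        int filled = 0;
        while (filled < n) {
            int r = in.read(head, filled, n - filled);
            if (r == -1) {
                break; // stream ended before n bytes were available
            }
            filled += r;
        }
        in.reset(); // back to the marked position
        return Arrays.copyOf(head, filled);
    }
}
```

If the whole stream has to be re-read from the top, the read limit must cover the whole content, and the memory advantage over ByteArrayInputStream disappears.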
http://tutorials.jenkov.com/java-io/pushbackinputstream.html
There is also a PushbackReader should you need a character based stream instead.