Read from a BufferedReader more than once in Java

I have the following piece of code in Java:
HttpURLConnection con = (HttpURLConnection)new URL(url).openConnection();
con.connect();
InputStream stream = con.getInputStream();
BufferedReader file = new BufferedReader(new InputStreamReader(stream));
At this point, I read the file from start to end while searching for something:
while (true)
{
    String line = file.readLine();
    if (line == null)
        break;
    // Search for something...
}
Now I want to search for something else within the file, without opening another URL connection.
For reasons unrelated to this question, I wish to avoid searching for both things "in a single file-sweep".
Questions:
Can I rewind the file with reset?
If yes, should I apply it on the InputStream object, on the BufferedReader object or on both?
If no, then should I simply close the file and reopen it?
If yes, should I apply it on the InputStream object, on the BufferedReader object or on both?
If no, how else can I sweep the file again, without reading through the URL connection again?

You can rewind the file with reset(), provided that you have mark()'ed the position you want to rewind to. These methods should be invoked on the decorator, i.e. BufferedReader.
However, you may want to reconsider your design, as you can easily read the whole file into some data structure (even a list of strings, or some stream backed by a string) and use the data multiple times.
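A minimal sketch of the "read it once, reuse it" idea from this answer. The class name LineCache and its methods are hypothetical, and a StringReader stands in for the InputStreamReader around the URL connection's stream; it assumes the content fits comfortably in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: caches all lines so they can be searched repeatedly.
class LineCache {
    // Reads every line from the source exactly once and keeps them in memory.
    static List<String> readAllLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Counts the cached lines containing the given text; callable any number of times.
    static long countMatches(List<String> lines, String needle) {
        return lines.stream().filter(l -> l.contains(needle)).count();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: new InputStreamReader(con.getInputStream())
        List<String> lines = readAllLines(new StringReader("alpha\nbeta\nalpha beta\n"));
        System.out.println(countMatches(lines, "alpha")); // first sweep
        System.out.println(countMatches(lines, "beta"));  // second sweep, no second download
    }
}
```

Each search is a separate pass over the list, so the "two sweeps" requirement is kept while the connection is read only once.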

Use the following methods:
mark
skip
reset
You can do this only if markSupported() returns true. Note that not every reader or stream implements it: BufferedReader supports mark itself by buffering up to the read-ahead limit you pass to mark(), but many readers and input streams do not, so always call markSupported() and keep in mind that it can return false.
For example, this can really happen for URL-based streams: think about how you could reset a stream that originates from a remote server. It would require the client side to cache all the content that has already been downloaded.
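As an illustration of mark() and reset(), here is a small sketch on a BufferedReader. The class and method names are hypothetical, and a StringReader replaces the network stream. The read-ahead limit passed to mark() bounds how much may be read before reset() is allowed to fail, so rewinding a whole download this way means buffering the whole download.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class for mark/reset on a BufferedReader.
class MarkResetDemo {
    // Marks the start, reads one line, rewinds, and reads the same line again.
    static String[] readFirstLineTwice(Reader source, int readAheadLimit) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        if (!reader.markSupported()) {
            throw new IOException("mark/reset not supported by this reader");
        }
        reader.mark(readAheadLimit);   // remember the current position
        String first = reader.readLine();
        reader.reset();                // go back to the marked position
        String again = reader.readLine();
        return new String[] { first, again };
    }

    public static void main(String[] args) throws IOException {
        String[] result = readFirstLineTwice(new StringReader("first\nsecond\n"), 1024);
        System.out.println(result[0] + " / " + result[1]);
    }
}
```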

I usually end up using something like InputStreamSource to make re-reading convenient. When I'm dealing with connections, I find it useful to use an in-memory or on-disk spooling strategy for re-reading. Use a threshold for choosing storage location, "tee" into the spool on first read, and re-read from the spool on subsequent reads.
Edit: I also found Guava's ByteSource and CharSource, which serve the same purpose.
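The spooling idea could be sketched roughly as follows. SpooledSource is a hypothetical class, not the Spring or Guava type mentioned above, and it simplifies the strategy: it spools the whole stream eagerly in the constructor instead of tee-ing during the first read. Content up to the threshold stays in memory; anything larger goes to a temp file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical re-readable source backed by memory or a temp file.
class SpooledSource {
    private byte[] inMemory; // set when the content fits under the threshold
    private Path onDisk;     // set when the content was spilled to a temp file

    SpooledSource(InputStream in, int thresholdBytes) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        OutputStream file = null;
        byte[] buf = new byte[8192];
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                if (file == null && memory.size() + n > thresholdBytes) {
                    // Too large for memory: move what we have to a temp file and continue there.
                    onDisk = Files.createTempFile("spool", ".bin");
                    file = Files.newOutputStream(onDisk);
                    memory.writeTo(file);
                }
                if (file != null) {
                    file.write(buf, 0, n);
                } else {
                    memory.write(buf, 0, n);
                }
            }
        } finally {
            if (file != null) {
                file.close();
            }
        }
        if (onDisk == null) {
            inMemory = memory.toByteArray();
        }
    }

    // Returns a fresh stream positioned at the start of the spooled content.
    InputStream openStream() throws IOException {
        return onDisk != null ? Files.newInputStream(onDisk) : new ByteArrayInputStream(inMemory);
    }

    boolean isOnDisk() {
        return onDisk != null;
    }
}
```

Every call to openStream() starts from the top, so each search gets its own pass. The sketch never deletes the temp file; real code would need some cleanup policy.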

Related

InputStream and OutputStream - How to differentiate ambiguity

It seems to me that InputStream and OutputStream are ambiguous names for I/O.
InputStream can be thought of as "to input into a stream", and OutputStream can be thought of as "get output of a stream".
After all, we read from an "input" stream, but shouldn't you be reading from an "output"?
What was the rationale behind choosing these two names and what is a good way to remember Input/Output stream without confusing one for the other?
The streams are named not for how you use them inside your code but for what they accomplish. An InputStream accomplishes reading input from somewhere outside your program (the console, a file, etc.), whereas an OutputStream accomplishes writing an output to somewhere else (again, console, file, etc.). Your Java code is only the intermediary in this scenario: In order to make use of the input, you have to read it from the stream, and in order to produce an output, you first have to write something to the stream.
The problem with the naming is only that streams by design always have something that goes in and something that comes out - you can always read and write on/with any stream. All you have to remember is that they are named for the more important task they do: interacting with something outside your code.
Think of your program/code as the Actor.
When the Actor wants to read something in, it seeks a handle to an InputStream, because this is the stream that will provide the input. Hence you read from it.
When the Actor wants to write something out, it seeks a handle to an OutputStream and then starts writing to the handle, which will do the rest. Likewise, you write to it.
I hope this answers the question. I just visualize my code as the classic stick-figure Actor, and InputStream and OutputStream as the entities it interacts with.
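To make the direction concrete, here is a small sketch in which the program sits in the middle: bytes come in through an InputStream and go out through an OutputStream. The class name is hypothetical, and in-memory streams stand in for a file or the console.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the program reads its input and writes its output.
class StreamDirection {
    // Data flows INTO the program via 'in' and OUT of the program via 'out'.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) { // read from the input stream
            out.write(buf, 0, n);          // write to the output stream
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("some input".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(in, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```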

When are readers/writers/streams identified as being open?

I am creating an abstract binding class for a Reader and Writer where the user doesn't have to reference each one individually.
Example: I have a FileStream which inside of it houses both a FileReader and FileWriter.
The question I have refers to optimizing the class. I know I can't have two streams opened simultaneously due to concurrency, however I need to initialize them somewhere without having data leaks all over the place.
Are streams/readers/writers classified as being open, as soon as you initialize them, or are the 'pipes' only opened once the first read/write begins? I'm looking at the JavaDoc and don't see anything here about when the streams actually open up...
For those who do not understand what I am asking (ignoring try-catch blocks):
// does my reader become OPEN here?
BufferedReader br = new BufferedReader(new FileReader("foobar.txt"));
// or here, now that I have performed the first operation.
br.readLine();
They are open as soon as you construct them. There is no 'open' operation, so they are already open.
Discussion:
new FileInputStream(...) and new FileOutputStream(...) open the file, which is why their constructors can throw FileNotFoundException (an IOException). Practically every other input or output stream extends FilterInput/OutputStream, with a FileInput/OutputStream as its delegate (including socket input/output streams as a matter of fact). The FileInput/OutputStream is created first in any such stack, ergo it is already open before the decorator streams, ergo they are already open too.
ByteArrayInput/OutputStreams and StringReader/Writer don't need opening at all.
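One way to observe this is to construct a FileReader for a file that does not exist: the failure surfaces in the constructor, before any read call. A small sketch follows; the class name and the missing path are made up for illustration.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical demo: shows that the file is opened by the constructor.
class OpenOnConstruct {
    // Returns true if merely constructing a FileReader for the path fails,
    // i.e. the open happens at construction and not on the first read.
    static boolean failsAtConstruction(String path) throws IOException {
        try (FileReader reader = new FileReader(path)) {
            return false; // constructed, and therefore opened, successfully
        } catch (FileNotFoundException e) {
            return true;  // thrown by the constructor; no read() was ever called
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed not to exist on the machine running this.
        System.out.println(failsAtConstruction("/no/such/dir/hypothetical-missing-file.txt"));
    }
}
```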
Alternative solution: forget about re-inventing the wheel.
Java has a class that is specifically designed to allow for reading and writing to the same file: java.io.RandomAccessFile.
So, if you have to wrap around... Use that class, instead of combining two other things that were never intended to be combined!
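A minimal sketch of reading and writing through a single RandomAccessFile handle. The class and method names are hypothetical, and the example assumes ASCII text, since writeBytes() writes only the low byte of each char.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo: one handle used for both writing and reading.
class RandomAccessDemo {
    // Writes a line, seeks back to the start, and reads the line again.
    static String writeThenRead(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);            // start from an empty file
            raf.writeBytes(text + "\n"); // ASCII only: low byte of each char
            raf.seek(0);                 // rewind to the beginning
            return raf.readLine();       // returns the line without its terminator
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("raf-demo", ".txt");
        System.out.println(writeThenRead(file, "hello"));
        file.delete();
    }
}
```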

Stream chaining in Java

Is it bad style to keep the references to streams "further down" a filter chain, and use those lower level streams again, or even to swap one type of stream for another? For example:
OutputStream os = new FileOutputStream("file");
PrintWriter pw = new PrintWriter(os);
pw.print("print writer stream");
pw.flush();
pw = null;
DataOutputStream dos = new DataOutputStream(os);
dos.writeBytes("dos writer stream");
dos.flush();
dos = null;
os.close();
If so, what are the alternatives if I need to use the functionality of both streams, e.g. if I want to write a few lines of text to a stream, followed by binary data, or vice versa?
This can be done in some cases, but it's error-prone. You need to be careful about buffers and stuff like the stream headers of ObjectOutputStream.
"if I want to write a few lines of text to a stream, followed by binary data, or vice versa?"
For this, all you need to know is that you can convert text to binary data and back but always need to specify an encoding. However, it is also error-prone because people tend to use the API methods that use the platform default encoding, and of course you're basically implementing a parser for a custom binary file format - lots of things can go wrong there.
All in all, if you're creating a file format, especially when mixing text and binary data, it's best to use an existing framework like Google Protocol Buffers.
If you have to do it, then you have to do it. So if you're dealing with an external dependency that you don't have control over, you just have to do it.
I think the bad style is the fact that you would need to do it. If you had to send binary data across sometimes, and text across at others, it would probably be best to have some kind of message object and send the object itself over the wire with Serialization. The data overhead isn't too much if structured properly.
I don't see why not. I mean, the implementations of the various stream classes should protect you from writing invalid data. So long as you're reading it back the same way, and your code is otherwise understandable, I don't see why that would be a problem.
Style doesn't always mean you have to do it the way you've seen others do it. So long as it's logical, and someone reading the code would see what (and why) you're doing it without you needing to write a bunch of comments, then I don't see what the issue is.
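Since you're flushing in between, it's probably fine. But it might be cleaner to use one OutputStream and just use os.write(string.getBytes()); to write the strings, ideally with an explicit charset passed to getBytes so the result does not depend on the platform default.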
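A sketch of the "one stream, explicit encoding" approach suggested in these answers. The length-prefixed layout and the class name MixedContent are assumptions made for this example, not a standard format; byte-array streams stand in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: text and binary data written through a single stream.
class MixedContent {
    // Writes a length-prefixed UTF-8 string followed by a binary int.
    static byte[] write(String text, int number) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] encoded = text.getBytes(StandardCharsets.UTF_8); // explicit charset
            out.writeInt(encoded.length); // lets the reader know where the text ends
            out.write(encoded);
            out.writeInt(number);
        }
        return bytes.toByteArray();
    }

    // Reads the data back in the same order; returns { String text, Integer number }.
    static Object[] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] encoded = new byte[in.readInt()];
            in.readFully(encoded);
            String text = new String(encoded, StandardCharsets.UTF_8);
            int number = in.readInt();
            return new Object[] { text, number };
        }
    }
}
```

Because only one wrapper touches the underlying stream, there is no second buffer that could reorder or hold back bytes.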

(Java) File redirection (both ways) within Runtime.exec?

I want to execute this command:
/ceplinux_work3/myName/opt/myCompany/ourProduct/bin/EXECUTE_THIS -p cepamd64linux.myCompany.com:19021/ws1/project_name < /ceplinux_work3/myName/stressting/Publisher/uploadable/00000.bin >> /ceplinux_work3/myName/stressting/Publisher/stats/ws1.project_name.19021/2011-07-22T12-45-20_PID-2237/out.up
But it doesn't work because EXECUTE_THIS requires an input file via redirect, and simply passing this command to Runtime.exec doesn't work.
Side note: I searched all over on how to solve this before coming here to ask. There are many questions/articles on the web regarding Runtime.exec and Input/Output redirect. However, I cannot find any that deal with passing a file to a command and outputting the result to another file. Plus, I am totally unfamiliar with Input/Output streams, so I have a hard time putting all the info out there together for my specific situation.
That said, any help is much appreciated.
P.S. If there are multiple ways to do this, I prefer whatever is fastest in terms of throughput.
Edit: As discussed in my last question, I CANNOT change this to a bash call because the program must wait for this process to finish before proceeding.
Unless you are sending a file name to the standard input of the process, there is no distinction of whether the data came from a file or from any other data source.
You need to write to the OutputStream given by Process.getOutputStream(). The data you write to it you can read in from a file using a FileInputStream.
Putting that together might look something like this:
Process proc = Runtime.getRuntime().exec("...");
OutputStream standardInputOfChildProcess = proc.getOutputStream();
InputStream dataFromFile = new FileInputStream("theFileWithTheData.dat");
byte[] buff = new byte[1024];
for ( int count = -1; (count = dataFromFile.read(buff)) != -1; ) {
    standardInputOfChildProcess.write(buff, 0, count);
}
I've left out a lot of details; this is just to give the gist of it. You'll want to close things safely, you might want to consider buffering, and you need to worry about the pitfalls of Runtime.exec().
Edit
Writing the output to a file is similar. Obtain a FileOutputStream pointing to the output file and write the data you read from Process.getInputStream() to that OutputStream. The major caveat here is that you must do this operation in a second thread, since accessing two blocking streams from the same thread can lead to deadlock (see the Runtime.exec() pitfalls mentioned above).
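As an alternative to copying the streams by hand, ProcessBuilder (Java 7 and later) can attach files directly to the child's standard input and output. This is a different technique from the one described in this answer. The class name and the use of cat in main are assumptions for illustration; cat presumes a Unix-like system.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: file redirection in both directions via ProcessBuilder.
class RedirectDemo {
    // Runs the command with stdin taken from inputFile and stdout appended to
    // outputFile, waits for the process to finish, and returns its exit code.
    static int run(List<String> command, File inputFile, File outputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectInput(inputFile);                                     // like "< inputFile"
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(outputFile)); // like ">> outputFile"
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);               // stderr goes to the parent's stderr
        Process process = pb.start();
        return process.waitFor(); // blocks until the child exits
    }

    public static void main(String[] args) throws Exception {
        File in = File.createTempFile("redirect-in", ".txt");
        File out = File.createTempFile("redirect-out", ".txt");
        Files.write(in.toPath(), "payload\n".getBytes(StandardCharsets.UTF_8));
        // "cat" simply echoes stdin to stdout; assumed to be available.
        System.out.println("exit code: " + run(Arrays.asList("cat"), in, out));
    }
}
```

For the command in the question, the list would hold the executable path and each argument as separate elements, with the two files passed as the input and output. Since the operating system handles the redirection, no extra copying thread is needed, and waitFor() covers the requirement to wait for the process to finish.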

Can I close/reopen InputStream to mimic mark/reset for input streams that do not support mark?

I'm trying to read java.io.InputStream multiple times starting from the top of the stream.
Obviously for streams that return true to markSupported() I can try and use mark(availableBytes) and then reset() to read stream again from the top.
Most streams do not support mark, and those that do (e.g. java.io.BufferedInputStream) copy data into a temporary byte array, which is not nice in terms of memory consumption, etc.
If my method receives java.io.InputStream as a parameter can I close it and then somehow reopen it to reset same original stream to the top so I can read it again?
I couldn't find any way to do this trick apart from writing the original InputStream into memory (yuck!) or a temporary file and then opening a new InputStream to those temporary locations if I need to read the stream from the top again.
You can close it, but the only way to reopen the same stream to the same data without creating an explicit copy of the data somewhere is to determine what concrete type of InputStream you are dealing with (easy), what that stream was initialized to point at (may be easy, hard, or impossible depending upon the stream type and its interface), and then adding code to instantiate a new instance of the concrete stream type with the original source input (not difficult, but also not very maintainable and easily breakable if someone creates a custom InputStream implementation that you don't know how to handle).
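For comparison, the mark/reset route mentioned in the question can be applied to an arbitrary InputStream by wrapping it in a BufferedInputStream only when it lacks mark support. This is a sketch with hypothetical names; it still buffers up to the given limit in memory, and readNBytes(int) assumes Java 11 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: adding mark/reset to a stream that lacks it.
class Rewindable {
    // Returns a stream that supports mark/reset, wrapping the original only if needed.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to 'limit' bytes, rewinds with reset(), and reads them again.
    static byte[][] readTwice(InputStream source, int limit) throws IOException {
        InputStream in = markable(source);
        in.mark(limit);                       // reset() is guaranteed only within 'limit' bytes
        byte[] first = in.readNBytes(limit);
        in.reset();                           // back to the marked position
        byte[] second = in.readNBytes(limit);
        return new byte[][] { first, second };
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8));
        byte[][] result = readTwice(source, 5);
        System.out.println(new String(result[0], StandardCharsets.UTF_8)
                + " / " + new String(result[1], StandardCharsets.UTF_8));
    }
}
```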
