Concurrently consume stdout from an external process - java

Is there a thread-safe way to concurrently consume the stdout from an external process, using ProcessBuilder in Java 1.6?
Background: I need to invoke pbzip2 to unzip large files to stdout and to process each line as the file is decompressed (pbzip2 utilizes multiple CPUs, unlike other implementations).
The logical approach is to create a child thread to loop over the InputStream (i.e. stdout; don't you just love the naming?), as follows:
while ((line = reader.readLine()) != null) {
    // do stuff
}
However, unzipping is slow, so what I really need is for the reader.readLine method to quietly wait for the next line(s) to become available, instead of exiting.
Is there a good way to do this?

You should be able to wrap your input stream with an InputStreamReader and BufferedReader. You can then call readLine() and that will block as required.
Note that you should have a corresponding reader for the stderr. You don't have to do anything with it, but you will need to consume the stderr stream, otherwise your spawned process may well block. See this answer for links etc.
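A minimal sketch of that arrangement, kept to Java 6 syntax to match the question (it assumes pbzip2's bzip2-style -dc flags and line-oriented output; the file name is a placeholder):

import java.io.*;

public class Unzipper {
    public static void main(String[] args) throws IOException {
        final Process process =
                new ProcessBuilder("pbzip2", "-dc", "large-file.bz2").start();

        // Drain stderr on its own thread so the child can never block writing to it.
        new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader err = new BufferedReader(
                            new InputStreamReader(process.getErrorStream()));
                    while (err.readLine() != null) { /* discard */ }
                } catch (IOException ignored) { }
            }
        }).start();

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            // readLine() blocks until a full line (or end-of-stream) arrives
            System.out.println(line); // do stuff
        }
        reader.close();
    }
}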

You more or less have the solution yourself. You just create a new thread which reads the next line in a loop from the stream of your external process and processes that line.
readLine() will block and wait until an entire new line is available. If you're on a multicore/multiprocessor machine, your external process can happily continue unzipping while your thread processes a line. At least it can continue until the OS pipe buffers become full.
Just note that if your processing is slower than the unzipping, you'll block the unzipping, and at that point it becomes a memory vs. speed trade-off. For example, you could create one thread that does nothing but read lines (so unzipping is never blocked), buffer them in an in-memory queue, and have another thread, or even several, consume that queue, as sketched below.
readLine method to quietly wait for the next line(s) to become available, instead of exiting
And that's exactly what readLine does: it will just block until a whole line is available.
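A sketch of the queue arrangement mentioned above, using a bounded ArrayBlockingQueue so memory use stays capped (the capacity of 10000 is an arbitrary illustration; stderr handling is omitted here, see the earlier sketch):

import java.io.*;
import java.util.concurrent.*;

public class PipelinedReader {
    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10000);
        final Process process =
                new ProcessBuilder("pbzip2", "-dc", "large-file.bz2").start();

        // Producer: does nothing but read lines, so unzipping is never blocked
        // by slow processing (until the bounded queue fills).
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader reader = new BufferedReader(
                            new InputStreamReader(process.getInputStream()));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        queue.put(line); // blocks only when the queue is full
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        // Consumer: processes lines at its own pace.
        while (producer.isAlive() || !queue.isEmpty()) {
            String line = queue.poll(100, TimeUnit.MILLISECONDS);
            if (line != null) {
                // do stuff with line
            }
        }
    }
}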

Yes.
I have written some code that kicks off a time-consuming job (ffmpeg) in a Process (spawned by ProcessBuilder), which in turn kicks off my OutputStreamReader class, an extension of Thread that consumes the stdio and does some magic with it.
The catch (for me) was redirecting the error stream. Here is my code snippet:
pb.redirectErrorStream(true);
proc = pb.start();
err = new MyOutputStreamReader(this, proc.getInputStream()); // extension of Thread
err.start();
int exitCode = proc.waitFor();
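MyOutputStreamReader is the answerer's own class; purely as an illustration, a stripped-down, hypothetical version of such a Thread extension might look like this (the first constructor argument from the snippet above is omitted):

import java.io.*;

// Hypothetical, simplified stand-in for the answer's MyOutputStreamReader.
class OutputConsumer extends Thread {
    private final BufferedReader reader;

    OutputConsumer(InputStream in) {
        this.reader = new BufferedReader(new InputStreamReader(in));
    }

    public void run() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // "do some magic" with each line of the merged stdout/stderr here
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}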

Related

Why does ProcessBuilder in Java hang after 5 mins?

I have a command-line program that takes more than 5 minutes to run. When I invoke it with ProcessBuilder, it works fine as long as the command completes its job within 5 minutes.
But the process hangs if it takes more than 5 minutes, and it makes no progress until I kill my process.
Process p = new ProcessBuilder("myprogram", "with", "parameter").start();
p.waitFor();
Please let me know if the above question is unclear.
The problem might be that the command "myprogram" produces some output, and you are not reading it. This means that the process is blocked as soon as the buffer is full and waits for your process to continue reading. Your process in turn waits for the other process to finish (which it won't, because it is waiting for your process, ...). This is a classic deadlock situation.
You need to continually read from the process's input stream to ensure that it doesn't block.
The Javadoc for Class Process says:
Because some native platforms only provide limited buffer size for
standard input and output streams, failure to promptly write the input
stream or read the output stream of the subprocess may cause the
subprocess to block, and even deadlock.
Failing to drain the Process's input stream (which is piped from the output stream of the subprocess) may lead to the subprocess blocking.
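Concretely, a minimal sketch of the fix, keeping the asker's placeholder command: merge stderr into stdout and drain everything before calling waitFor().

import java.io.*;

public class DrainThenWait {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("myprogram", "with", "parameter");
        pb.redirectErrorStream(true); // fold stderr into stdout: one stream to drain
        Process p = pb.start();

        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        while (r.readLine() != null) {
            // consume (or simply discard) the output so the pipe buffer never fills
        }
        int exitCode = p.waitFor(); // safe now: the output has been fully drained
        System.out.println("exit code: " + exitCode);
    }
}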

BufferedReader.read() hangs when running a perl script using Runtime.exec()

I'm trying to run a Perl script from Java code and read its output with the following code:
String cmd = "/var/tmp/./myscript";
Process process = Runtime.getRuntime().exec(cmd);
BufferedReader stdin = new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while ((line = stdin.readLine()) != null) {
    System.out.println(line);
}
But the code always hangs on the readLine().
I tried using
stdin.read();
instead, but that also hangs.
I tried modifying the cmd to
cmd = "perl /var/tmp/myscript";
and also to
String[] cmd = {"perl", "/var/tmp/myscript"};
but it still hangs.
I tried reading the stdin in a separate thread. I tried reading both stdin and stderr in separate threads. Still no luck.
I know there are many questions here dealing with Process.waitFor() hanging due to not reading the streams, as well as with BufferedReader.read() hanging. I tried all the suggested solutions; still no luck.
Of course, running the same script on the CLI itself writes output to the standard output (console) and exits with exit code 0.
I'm running on CentOS 6.6.
Any help will be appreciated.
I presume that when run directly from the command line, the script runs to completion, producing the expected output, and terminates cleanly. If not, then fix your script first.
The readLine() invocation hanging almost surely means that neither a line terminator nor end-of-file is encountered. In other words, the method is blocked waiting for the script. Perhaps the script produces no output at all under these conditions, but does not terminate. This might happen, for instance, if it expects to read data from its own standard input before it proceeds. It might also happen if it is blocked on output to its stderr.
In the general case, you must read both a Process's stdout and its stderr, in parallel, via the InputStreams provided by getInputStream() and getErrorStream(). You should also handle the OutputStream provided by getOutputStream(), either by feeding it the needed standard input data (also in parallel with the reading) or by closing it. You can substitute closing the process's streams for reading them if the particular process you are running does not emit data to those streams, and you normally should close the Process's OutputStream when you have no more data for it.

You need to read the two InputStreams even if you don't care about what you read from them, as the process may block or fail to terminate if you do not. This is tricky to get right, but easier to do for specific cases than it is to write generalized support for. And anyway, there's ProcessBuilder, which goes some way toward an easier general-purpose interface.
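For instance, a sketch of that general pattern, under the assumption that the script needs no standard input (so stdin is closed up front) and that its output is line-oriented:

import java.io.*;

public class RunScript {
    public static void main(String[] args) throws Exception {
        Process process = Runtime.getRuntime().exec(new String[]{"perl", "/var/tmp/myscript"});
        process.getOutputStream().close(); // the script reads no stdin: close it up front

        Thread outGobbler = gobble(process.getInputStream(), System.out);
        Thread errGobbler = gobble(process.getErrorStream(), System.err);
        outGobbler.start();
        errGobbler.start();

        int exit = process.waitFor();
        outGobbler.join();
        errGobbler.join();
        System.out.println("script exited with " + exit);
    }

    // Reads every line from 'in' and echoes it to 'sink' until end-of-stream.
    private static Thread gobble(final InputStream in, final PrintStream sink) {
        return new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader r = new BufferedReader(new InputStreamReader(in));
                    String line;
                    while ((line = r.readLine()) != null) {
                        sink.println(line);
                    }
                } catch (IOException ignored) { }
            }
        });
    }
}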
Try using ProcessBuilder like so:
String cmd = "/var/tmp/./myscript";
ProcessBuilder perlProcessBuilder = new ProcessBuilder(cmd);
perlProcessBuilder.redirectOutput(ProcessBuilder.Redirect.PIPE);
Process process = perlProcessBuilder.start();
BufferedReader stdin = new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while ((line = stdin.readLine()) != null) {
    System.out.println(line);
}
From the ProcessBuilder Javadoc:
public ProcessBuilder redirectOutput(ProcessBuilder.Redirect destination)
Sets this process builder's standard output destination. Subprocesses subsequently started by this object's start() method send their standard output to this destination.
If the destination is Redirect.PIPE (the initial value), then the standard output of a subprocess can be read using the input stream returned by Process.getInputStream(). If the destination is set to any other value, then Process.getInputStream() will return a null input stream.
Parameters:
destination - the new standard output destination
Returns:
this process builder
Throws:
IllegalArgumentException - if the redirect does not correspond to a valid destination of data, that is, has type READ
Since:
1.7

Reuse an InputStream to a Process in Java

I am using ProcessBuilder to input and receive information from a C++ program, using Java. After starting the process once, I would like to be able to input new strings, and receive their output, without having to restart the entire process. This is the approach I have taken thus far:
public void getData(String sentence) throws InterruptedException, IOException {
    InputStream stdout = process.getInputStream();
    InputStreamReader isr = new InputStreamReader(stdout);
    OutputStream stdin = process.getOutputStream();
    OutputStreamWriter osr = new OutputStreamWriter(stdin);
    BufferedWriter writer = new BufferedWriter(osr);
    BufferedReader reader = new BufferedReader(isr);
    writer.write(sentence);
    writer.close();
    String ch = reader.readLine();
    preprocessed = "";
    while (ch != null) {
        preprocessed = preprocessed + "~" + ch;
        ch = reader.readLine();
    }
    reader.close();
}
Each time I want to send an input to the running process, I call this method. However, there is an issue: the first time I send an input, it works fine, and the output is received perfectly. But the second time I call it, I receive the error
java.io.IOException: Stream closed
which is unexpected, as everything is theoretically recreated when the method is called again. Moreover, removing the line that closes the BufferedWriter results in the code halting at the following line, as if the BufferedReader were waiting for the BufferedWriter to be closed.
One final thing - even when I create a NEW BufferedWriter and instruct the method to use that when called for the second time, I get the same exception, which I do not understand at all.
Is there any way this can be resolved?
Thanks a lot!
Your unexpected IOException happens because when Readers and Writers are closed, they close their underlying streams in turn.
When you call your method the first time, everything appears to work. But you close the writer, which closes the process output stream, which closes stdin from the perspective of the process. Not sure what your C++ binary looks like, but probably it just exits happily when it's done with all its input.
So subsequent calls to your method don't work.
There's a separate but similar issue on the Reader side. You call readLine() until it returns null, meaning the Reader has reached the end of the stream. But this only happens when the process is completely done with its stdout.
You need some way of identifying when you're done processing a unit of work (whatever you mean by "sentence") without waiting for the whole entire stream to end. The stream has no concept of the logical pause between outputs. It's just a continuous stream. Reader and Writer are just a thin veneer to buffer between bytes and characters but basically work the same as streams.
Maybe the outputs could have delimiters. Or you could send the length of each chunk of output before actually sending the output and distinguish outputs that way. Or maybe you know in advance how long each response will be?
You only get one shot through streams. So they will have to outlive this method. You can't be opening and closing streams if you want to avoid restarting your process every time. (There are other ways for processes to communicate, e.g. sockets, but that's probably out of scope.)
On an orthogonal note, appending to a StringBuilder is generally more efficient than a big loop of string concatenations when you're accumulating your output.
You might also have some thread check process.exitValue() or otherwise make sure the process is working as intended.
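As a sketch of the delimiter idea: keep both streams open for the life of the process, flush() instead of close(), and agree on a terminator. The END line here is a hypothetical convention that the C++ program would have to print after each response:

import java.io.*;

public class ProcessSession {
    private final BufferedWriter writer;
    private final BufferedReader reader;

    ProcessSession(Process process) {
        this.writer = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
        this.reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
    }

    // One request/response exchange; the streams stay open between calls.
    public String getData(String sentence) throws IOException {
        writer.write(sentence);
        writer.newLine();
        writer.flush(); // flush, but do NOT close: closing would end the process's stdin

        StringBuilder preprocessed = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null && !line.equals("END")) {
            preprocessed.append('~').append(line); // StringBuilder beats String concatenation
        }
        return preprocessed.toString();
    }
}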
Don't keep trying to create and close your Streams, because once you close it, it's closed for good. Create them once, then in your getData(...) method use the existing Streams. Only close your Streams or their wrapping classes when you're fully done with them.
Note that you should open and close the Streams in the same method, and thus may need additional methods or classes to help you process the Streams. Consider creating a Runnable class for this and then reading from the Streams in another Thread. Also don't ignore the error stream, as that may be sending key information that you will need to fully understand what's going on here.

What are the concerns regarding simultaneous reads and writes to a file?

Consider the following scenario:
Process 1 (Writer) continuously appends a line to a file (sharedFile.txt)
Process 2 (Reader) continuously reads a line from sharedFile.txt
My questions are: in Java, is it possible that:
the Reader process somehow crashes the Writer process (i.e. breaks the Writer)?
the Reader somehow knows when to stop reading the file purely based on the file's state (the Reader doesn't know whether others are writing to the file)?
To demonstrate:
Process One (Writer):
...
while (!done) {
    String nextLine; // produce the line
    writeLine(nextLine);
    ...
}
...
Process Two (Reader):
...
while (hasNextLine()) {
    String nextLine = readLine();
    ...
}
...
NOTE: the Writer process has priority, so nothing must interfere with it.
Since you are talking about processes, not threads, the answer depends on how the underlying OS manages open file handles:
On every OS I'm familiar with, a Reader will never crash a Writer process, since the Reader's file handle only allows reading. On Linux, the system calls a Reader can potentially invoke on the underlying OS (open(2) with the O_RDONLY flag, lseek(2), and read(2)) are known not to interfere with the syscalls the Writer invokes, such as write(2).
The Reader most likely won't know when to stop reading on most OSes. More precisely, on some read attempt it will receive zero as the number of bytes read and will treat this as an EOF (end of file). At that very moment, a Writer may be preparing to append data to the file, but the Reader has no way of knowing it.
If you need a way for two processes to communicate via files, you can do it using extra files that pass meta-information between Readers and Writers, such as whether a Writer is currently running. Introducing some structure into the file can be useful too (for example, every Writer appends a byte to the file indicating that a write is in progress).
For very fast non-blocking I/O you may want to consider memory-mapped files, via Java's MappedByteBuffer.
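To illustrate the EOF behavior described above, here is a "tail -f"-style sketch of a Reader that treats end-of-file as "no new data yet" rather than "done" (it assumes the Writer only ever appends complete lines, and the file name matches the scenario):

import java.io.*;

public class FollowingReader {
    public static void main(String[] args) throws IOException, InterruptedException {
        BufferedReader reader = new BufferedReader(new FileReader("sharedFile.txt"));
        while (true) {
            String line = reader.readLine();
            if (line == null) {
                Thread.sleep(200); // EOF here only means "nothing new yet"; poll again
            } else {
                System.out.println(line); // process the line
            }
        }
    }
}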
The code will not crash. However, the reader will terminate when the end of the file is reached, even though the writer may still be writing. You will have to synchronize somehow!
Concerns:
Your reader thread can read a stale value even when you think another writer thread has updated the variable's value.
Even when writing to a file, if there is no synchronization, the reader may see a different value than what was written.
Java file I/O and plain files were not designed for simultaneous writes and reads. Either your reader will overtake your writer, or your reader will never finish.
JB Nizet provided the answer in his comment. You use a BlockingQueue to hold the writer's data while you're reading it. Either the queue will empty, or the reader will never finish. The BlockingQueue methods give you the means to detect either situation, as sketched below.
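A sketch of the BlockingQueue approach, using a sentinel value (a "poison pill") so the reader can detect that the writer is finished. Both sides here are threads in a single JVM, which is the setting BlockingQueue is designed for:

import java.util.concurrent.*;

public class WriterReaderQueue {
    private static final String POISON_PILL = "\u0000EOF\u0000"; // arbitrary sentinel

    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();

        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 100; i++) {
                        queue.put("line " + i);
                    }
                    queue.put(POISON_PILL); // signal "no more data"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        writer.start();

        while (true) {
            String line = queue.take(); // blocks until data is available
            if (line.equals(POISON_PILL)) {
                break; // the writer is finished; no busy-waiting, no guesswork
            }
            System.out.println(line);
        }
    }
}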

Java hangs when trying to close a ProcessBuilder OutputStream

I have the following Java code to start a ProcessBuilder, open an OutputStream, have the process write a string to an OutputStream, and then close the OutputStream. The whole thing hangs indefinitely when I try to close the OutputStream. This only happens on Windows, never on Mac or Linux.
Some of the related questions seem to be close to the same problem I'm having, but I haven't been able to figure out how to apply the answers to my problem, as I am a relative newbie with Java. Here is the code. You can see I have put in a lot of println statements to try to isolate the problem.
System.out.println("GenMic trying to get the input file now");
System.out.flush();
OutputStream out = child.getOutputStream();
try {
System.out.println("GenMic getting ready to write the input file to out");
System.out.flush();
out.write(intext.getBytes()); // intext is a string previously created
System.out.println("GenMic finished writing to out");
System.out.flush();
out.close();
System.out.println("GenMic closed OutputStream");
System.out.flush();
} catch (IOException iox) {
System.out.println("GenMic caught IOException 2");
System.out.flush();
String detailedMessage = iox.getMessage();
System.out.println("Exception: " + detailedMessage);
System.out.flush();
throw new RuntimeException(iox);
}
And here is the output when this chunk is executed:
GenMic trying to get the input file now
GenMic getting ready to write the input file to out
GenMic finished writing to out
You need to make sure that the streams returned by getInputStream() and getErrorStream() are drained on individual threads, and that these threads are different from the one on which you close the stream returned by getOutputStream().
Basically, it is a requirement to have at least 3 threads per subprocess if you want to manipulate and examine its stdin, stdout and stderr. One of the threads, depending on your circumstances, may be your current execution thread (the one on which you create the ProcessBuilder).
When that happened to me it was because I hadn't read everything from the stream being written to by the process.
The API docs for the java.lang.Process class say:
The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (getOutputStream(), getInputStream(), getErrorStream()). The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
I would try calling getInputStream() on the Process instance and writing a loop to read one byte at a time until it reaches EOF. And I'd do the same thing with getErrorStream() just in case the process is writing to stderr.
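That byte-at-a-time drain might look like the following sketch; in practice you would run one such loop per stream, each on its own thread:

import java.io.*;

public class StreamDrainer {
    // Drain a stream one byte at a time until end-of-file (read() returns -1).
    static void drain(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            // optionally inspect or log the byte; the point is simply to consume
            // it so the child process never blocks on a full pipe buffer
        }
    }
}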
Do you have anything reading from stdout/stderr of the process ?
It's quite likely the process tries to output something but gets blocked, since no one is reading the output. That means your out.flush() or out.close() blocks because the process can't get around to processing the input, since it is blocked doing output.
