Understanding I/O blocking - java

I have a binary Linux executable which prints some bytes to stdout. I consume these bytes from a Java application like this:
String[] cmd; // initialized elsewhere with the command and its arguments
Process p = Runtime.getRuntime().exec(cmd);
InputStream is = p.getInputStream();
int r = is.read();
while (r != -1) {
    System.out.println(r);
    r = is.read(); // 1
}
But after running for some time, the read at // 1 blocks forever (a deadlock). I created a thread dump and noticed that
"pool-2-thread-1#627" prio=5 tid=0xd nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
at java.io.FileInputStream.read(FileInputStream.java:255)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x2c0> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
It's blocked in a native method:
private native int readBytes(byte[] var1, int var2, int var3) throws IOException;
What's a possible reason this method is blocked forever? Maybe this is platform-specific. I'm using Ubuntu 16.04.

You don't need to look at the internal methods of the standard library for an answer. The docs of InputStream.read() specify, in part, that
If no byte is available because the end of the stream has been
reached, the value -1 is returned. This method blocks until input data
is available, the end of the stream is detected, or an exception is
thrown.
Perhaps the key point of confusion is the meaning of "the end of the stream is detected." This does not mean that no more bytes are available to read right now -- that wouldn't be consistent with the specification that the method blocks until data is available, and in practice it would work out poorly in many situations. Rather, the end of the stream is detected when the system receives some sort of signal that no more bytes will ever be available from it. Generally, that means that the stream has been closed on the source end, and all bytes drained from it at the destination end.
In your particular case, if the external process remains running, does not write anything further to its standard output, yet does not close its standard output, then Java (or a native consumer) will never see end-of-stream on that process's output.
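To make that distinction concrete, here is a minimal sketch using a PipedInputStream/PipedOutputStream pair as a stand-in for the process pipe (this is an illustration I've added, not the asker's setup): read() returns -1 only once the source end has actually been closed.

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class EofDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream src = new PipedOutputStream();
        PipedInputStream dst = new PipedInputStream(src); // stand-in for the process pipe
        src.write(7);
        System.out.println(dst.read()); // prints 7
        // No more data is available right now, but another read() here would
        // BLOCK, not return -1, because the source end is still open.
        src.close();                    // now the stream has a real end
        System.out.println(dst.read()); // prints -1: end of stream detected
    }
}
```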

Your InputStream is waiting for the process to either supply another byte, or close the stream (by finishing).
One reason the process might be doing neither of those things, is that it's written to stderr, and is blocking on that write.
Possibly there is a difference in environment between your shell and the environment used by Java, that means the command is resulting in output to stderr (e.g. the program is not in $PATH; you don't have adequate permissions; your CWD isn't what you expect, etc.)
A quick and dirty way to find out if this is the problem, is to redirect stderr to stdout -- either with ProcessBuilder.redirectErrorStream() or by using 2>&1 on the UNIX side of things.
If this is the case, then it is a form of deadlock -- the Java program is waiting for the native program to write to one pipe. The native program is waiting for the Java program to read from another.
The clean way to handle it, which you might choose to do in the longer term, is to use getErrorStream() and handle its output. This isn't exactly trivial, because in a single-threaded program using blocking reads, you can never know which stream is going to have data. You can either do blocking reads in separate threads, or use NIO to handle both inputs in a non-blocking manner.
By the way - note that the Java docs advise ProcessBuilder.start() over Runtime.exec(), since Java 1.5.
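As a sketch of the quick-and-dirty diagnostic described above, assuming a POSIX sh is available (the echo commands here are just stand-ins for the real binary):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MergedStreams {
    public static void main(String[] args) throws Exception {
        // Merge stderr into stdout so nothing the child writes is missed,
        // and the child cannot block on an undrained stderr pipe.
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", "echo to-stdout; echo to-stderr 1>&2");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // output from both streams arrives here
            }
        }
        System.out.println("exit: " + p.waitFor());
    }
}
```

If error messages show up once the streams are merged, you have found the deadlock described above.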

There are five different I/O models under Linux. They are, respectively:
Blocking I/O
Non-blocking I/O
I/O multiplexing
Signal-driven I/O
Asynchronous I/O
java.io.FileInputStream.readBytes() uses the first type, blocking I/O: the application waits until the kernel returns data.

Related

How to deal with a slow consumer in traditional Java NIO?

So, I've been brushing up my understanding of traditional Java non-blocking API. I'm a bit confused with a few aspects of the API that seem to force me to handle backpressure manually.
For example, the documentation on WritableByteChannel.write(ByteBuffer) says the following:
Unless otherwise specified, a write operation will return only after
writing all of the requested bytes. Some types of channels,
depending upon their state, may write only some of the bytes or
possibly none at all. A socket channel in non-blocking mode, for
example, cannot write any more bytes than are free in the socket's
output buffer.
Now, consider this example taken from Ron Hitchens' book, Java NIO.
In the piece of code below, Ron is trying to demonstrate how we could implement an echo response in a non-blocking socket application (for context here's a gist with the full example).
// Use the same byte buffer for all channels. A single thread is
// servicing all the channels, so no danger of concurrent access.
private ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

protected void readDataFromSocket(SelectionKey key) throws Exception {
    var channel = (SocketChannel) key.channel();
    buffer.clear(); // empty buffer
    int count;
    while ((count = channel.read(buffer)) > 0) {
        buffer.flip(); // make buffer readable
        // Send data; don't assume it goes all at once
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        // WARNING: the above loop is evil. Because
        // it's writing back to the same nonblocking
        // channel it read the data from, this code
        // can potentially spin in a busy loop. In real life
        // you'd do something more useful than this.
        buffer.clear(); // empty buffer
    }
    if (count < 0) {
        // Close channel on EOF, invalidates the key
        channel.close();
    }
}
My confusion is on the while loop writing into output channel stream:
// Send data; don't assume it goes all at once
while (buffer.hasRemaining()) {
    channel.write(buffer);
}
It really confuses me how NIO is helping me here. Certainly the code may not be blocking, as per the description of WritableByteChannel.write(ByteBuffer), because if the output channel cannot accept any more bytes because its buffer is full, this write operation does not block; it just writes nothing, returns, and the buffer remains unchanged. But -- at least in this example -- there is no easy way to use the current thread for something more useful while we wait for the client to process those bytes. For all that matters, if I only had one thread, the other requests would be piling up in the selector while this while loop wastes precious CPU cycles "waiting" for the client buffer to open some space. There is no obvious way to register for readiness in the output channel. Or is there?
So, assuming that instead of an echo server I was trying to implement a response that needed to send a big number of bytes back to the client (e.g. a file download), and assuming that the client has a very low bandwidth or the output buffer is really small compared to the server buffer, the sending of this file could take a long time. It seems as if we need to use our precious cpu cycles attending other clients while our slow client is chewing our file download bytes.
If we have readiness in the input channel, but not on the output channel, it seems this thread could be using precious CPU cycles for nothing. It is not blocked, but it is as if it were since the thread is useless for undetermined periods of time doing insignificant CPU-bound work.
To deal with this, Hitchens' solution is to move this code to a new thread --which just moves the problem to another place--. Then I wonder, if we had to open a thread every time we need to process a long running request, how is Java NIO better than regular IO when it comes to processing this sort of requests?
It is not yet clear to me how I could use traditional Java NIO to deal with these scenarios. It is as if the promise of doing more with fewer resources were broken in a case like this. What if I were implementing an HTTP server and I could not know how long it would take to service a response to the client?
It appears as if this example is deeply flawed and a good design of the solution should consider listening for readiness on the output channel as well, e.g.:
registerChannel(selector, channel, SelectionKey.OP_WRITE);
But what would that solution look like? I've been trying to come up with that solution, but I don't know how to achieve it appropriately.
I'm not looking for other frameworks like Netty, my intention is to understand the core Java APIs. I appreciate any insights anyone could share, any ideas on what is the proper way to deal with this back pressure scenario just using traditional Java NIO.
NIO's non-blocking mode enables a thread to request reading data from a channel, and only get what is currently available, or nothing at all, if no data is currently available. Rather than remain blocked until data becomes available for reading, the thread can go on with something else.
The same is true for non-blocking writing. A thread can request that some data be written to a channel, but not wait for it to be fully written. The thread can then go on and do something else in the meantime.
What threads spend their idle time on when not blocked in IO calls, is usually performing IO on other channels in the meantime. That is, a single thread can now manage multiple channels of input and output.
So I think you need to rely on the design of the solution to handle this issue, perhaps using a design pattern: the Task or Strategy patterns are good candidates, and depending on the framework or the application you are using, you can decide on the solution.
But in most cases you don't need to implement it yourself, as it's already implemented in Tomcat, Jetty, etc.
Reference: Non-blocking IO
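To address the question's "Or is there?" directly: there is -- SelectionKey.OP_WRITE is precisely the readiness event for the output side. The sketch below is my own minimal loopback example (not from Hitchens' book; the class name is made up). Instead of spinning in a write loop, it switches the key's interest set from OP_READ to OP_WRITE when there is pending output, and lets the selector say when the socket can take more bytes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class WriteReadinessEcho {
    static volatile String lastEcho; // what the demo client got back

    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Plain blocking client on another thread, just to exercise the server.
        Thread client = new Thread(() -> {
            try (Socket s = new Socket("127.0.0.1", port)) {
                s.getOutputStream().write("hello".getBytes(StandardCharsets.UTF_8));
                s.shutdownOutput();
                byte[] buf = new byte[64];
                int n = s.getInputStream().read(buf);
                lastEcho = new String(buf, 0, n, StandardCharsets.UTF_8);
            } catch (IOException e) { e.printStackTrace(); }
        });
        client.start();

        boolean done = false;
        while (!done) {
            selector.select();
            var keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    // Attach a per-connection buffer holding unsent output.
                    ch.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer pending = (ByteBuffer) key.attachment();
                    if (ch.read(pending) == -1) {
                        // Input finished: stop reading and ask to be woken when
                        // the socket can accept bytes, instead of spinning on write().
                        pending.flip();
                        key.interestOps(SelectionKey.OP_WRITE);
                    }
                } else if (key.isWritable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer pending = (ByteBuffer) key.attachment();
                    ch.write(pending); // writes only as much as the socket will take
                    if (!pending.hasRemaining()) {
                        ch.close(); // fully drained; a real server would switch back to OP_READ
                        done = true;
                    }
                }
            }
        }
        client.join();
        server.close();
        System.out.println("echoed: " + lastEcho);
    }
}
```

A real server would keep the pending buffer per connection exactly like this, but would toggle back and forth between OP_READ and OP_WRITE as output queues up and drains, so the single thread stays free to service other keys in between.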

How to set a timeout when reading from a Java RandomAccessFile

I am writing to and reading from a Linux file in java, which in reality is a communication port to a hardware device. To do this I use RandomAccessFile (I'll explain why later) and it works well in most cases. But sometimes a byte is lost and then my routine blocks indefinitely since there is no timeout on the read method.
To give some more details on the file: it is a USB receipt printer that creates a file called /dev/usb/lp0 and though I can use a cups driver to print, I still need the low level communication through this file to query the status of the printer.
The reason I use RandomAccessFile is that I can have the same object for both reading and writing.
I tried to make a version with InputStream and OutputStream instead (since that would allow me to use the available() method to implement my timeout). But when I first open the InputStream and then the OutputStream I get an exception when opening the OutputStream since the file is occupied.
I tried writing with the OutputStream and then closing it before opening the InputStream to read, but then I lose some or all of the reply before it has opened the InputStream.
I tried switching to channels instead (Files.newByteChannel()). This also allows me to have just one object, and the documentation says it only reads the bytes available and returns the count (which also allows me to implement a timeout). But it blocks in the read method anyway when there is nothing to read, despite what the documentation says.
I also tried a number of ways to implement timeouts on the RandomAccessFile using threads.
The first approach was to start a separate thread at the same time as starting to read, and if the timeout elapsed in the thread I closed the file from the thread, hoping that this would unlock the read() operation with an exception, but it didn't (it stayed blocked).
I also tried to do the read in a separate thread and brutally kill it with the deprecated Thread.stop() once the time had elapsed. This worked one time, but it was not possible to reopen the file again after that.
The only solution I have made work is to have a separate thread that continuously calls read, and whenever it gets a byte it puts it in a LinkedBlockingQueue, which I can read from with a timeout. This approach works, but the drawback is that I can never close the file (again for the same reasons explained above, I can't unblock a blocked read). And my application requires that I sometimes close this connection to the hardware.
Anyone who can think of a way to read from a file with timeout that would work in my case (that allows me to have both a read and a write access open to the file at the same time)?
I am using Java 8, by the way.
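For reference, the working approach described above (a reader thread feeding a LinkedBlockingQueue that is polled with a timeout) can be sketched roughly like this. The PipedInputStream stands in for the RandomAccessFile on /dev/usb/lp0, and the class name is made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimeoutReader {
    private final LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

    public TimeoutReader(InputStream in) {
        Thread t = new Thread(() -> {
            try {
                int b;
                while ((b = in.read()) != -1) queue.put(b); // blocks here, not in the caller
            } catch (IOException | InterruptedException ignored) {
                // stream closed or reader shut down
            }
        });
        t.setDaemon(true); // so a permanently blocked read() cannot keep the JVM alive
        t.start();
    }

    /** Returns the next byte, or null if none arrived within the timeout. */
    public Integer read(long timeout, TimeUnit unit) throws InterruptedException {
        return queue.poll(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out); // stand-in for the device file
        TimeoutReader reader = new TimeoutReader(in);
        out.write('A');
        System.out.println("got: " + reader.read(500, TimeUnit.MILLISECONDS));     // got: 65
        System.out.println("timeout: " + reader.read(100, TimeUnit.MILLISECONDS)); // timeout: null
    }
}
```

This does not solve the asker's remaining problem -- the daemon flag only hides, rather than unblocks, the thread stuck in read() when the file must be closed -- but it shows the timeout mechanism itself.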

Timeout on opening and reading from a named pipe in Java

In my current design I have a named pipe which can be sequentially written to by an unspecified number of writer processes. There is only one reader implemented in Scala, but for the sake of simplicity we can assume it's implemented in Java. The operating system is Linux >= 2.6.
The reader needs to:
re-open the pipe after each writer sends its input
read all the input from pipe until EOF which indicates that the writer closed its end of the named pipe
The main difficulty here is that the reader needs to be able to cancel both open and read calls after a given timeout. Reaching the timeout indicates that all writers have done their job and the reader can safely exit.
In C, I would:
first call open(file_name, O_NONBLOCK) and poll for the pipe to be open for writing
poll for reading in non-blocking mode or change file descriptor to blocking mode and use select()
What is the most straightforward way to accomplish this in Java? I've looked at the classic IO and NIO, but the more I try, the more complex the design becomes, and it still doesn't do exactly what I want.

Executed C binary from java and reading from output stream of the process

OK, so I am trying to read the output of a C binary from Java code, and I am unable to figure out whether the communication channel is blocking or non-blocking.
The setup is such:
A java class (A.java) is run
A.java runs a c binary (B.o) using Runtime.getRuntime().exec("B.o"). At this point I have the Process object (returned by Runtime.exec)
A.java reads from the input stream of the Process object using a bufferedreader
A.java outputs the data read from the input stream to a file (output.txt)
The B.o binary simply prints random lines using the printf function.
Now, if I run the above setup, I receive all the data sent by B.o flawlessly. Then, to test the blocking/non-blocking behavior, I changed A.java to sleep for 5 milliseconds after every read from the input stream of B.o's Process object. As it turned out, I then wasn't receiving the complete data sent by B.o. This indicates that the communication channel being used is non-blocking (as per my weak understanding).
Then, just to make sure, I started looking at the Java source code to see if I was right. I have found the following so far:
Every call to Runtime.getRuntime().exec(...) ends up in the forkAndExec() method in ProcessImpl_md.c. In ProcessImpl_md.c the command is executed, a process is created, and pipes are set up for communication (using the pipe function call in C). I can't find anywhere in the source code where the pipes are being set to non-blocking mode (as my test seemed to indicate). I am assuming the pipes are blocking by default.
I know this is a very bad way to check what I want to check. I am way out of my depth here and I am just head-banging uselessly, I think.
Can anyone point me in the right direction or tell me:
Are the pipes of a process created through the Java runtime API blocking or non-blocking?
When I make A.java sleep after reading from the input stream, why is not all the data received? (The assumption being that the pipe is blocking.)
Is there any non-programmatic way (i.e., without my having to change the Java source code, etc.) to figure out whether the pipes of a process are blocking or non-blocking?
Thank you.
EDIT: (added code)
The following is not the actual (or even compilable) code, but it shows what I am trying to do.
Source of "B.o":
#include <stdio.h>

int main(int argc, char *argv[]) {
    int a = 0;
    for (; a < 9000000; a++) {
        printf("%s", argv[1]);
    }
    return 0;
}
Source of "A.java":
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class A {
    public static void main(String[] args) throws Exception {
        Process p = Runtime.getRuntime().exec("./B.o");
        BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
        int a = 0;
        while (br.readLine() != null) {
            a++;
            Thread.sleep(5); // data missed if this line not commented out
        }
        br.close();
        System.out.println(a);
    }
}
PLEASE CHECK MY ANSWER. USELESS QUESTION BY ME.
Whether the communication channels between Java and the external program (there are three, one from Java to native, and two coming back) are operating in blocking or non-blocking mode is not directly relevant to whether all data will be successfully transmitted across each. Likewise, delays between read requests are not directly relevant to whether all data will be successfully transmitted, regardless of blocking vs. non-blocking I/O in your particular implementation of java.lang.Process.
Really, your efforts to probe blocking vs. non-blocking inter-process I/O are futile, because the I/O interface provided to your Java program is based on InputStream and OutputStream, which provide only for blocking I/O. Even if non-blocking I/O were involved at some low level of the implementation, I can't think of any way for your program to detect that.
With respect to your specific questions, however:
Are the pipes of a process created through the Java runtime API blocking or non-blocking?
They could be either, but they are more likely blocking because that better matches the interface presented to the Process user.
When I make A.java sleep after reading from the input stream, why is not all the data received? (The assumption being that the pipe is blocking.)
I can only speculate, but the problem is likely in the external program. Possibly it goes to sleep when its output buffer fills, and nothing happens to wake it up. It might help to invoke myProcess.getOutputStream().close() if your Java program is not sending data to the external program. It's in any case a good idea to close that stream once you've written to it everything you're ever going to write.
Is there any non-programmatic way (i.e., without my having to change the Java source code, etc.) to figure out whether the pipes of a process are blocking or non-blocking?
Potentially you could run the VM under strace or connect a native debugger to it, and analyze the VM's behavior that way. If you mean to do this from inside Java then the answer is a resounding "NO". Your Java program will see blocking behavior under all circumstances because the contracts of InputStream and OutputStream demand it.
I was making a big blunder and was completely off base. I'm posting this answer to clear things up (though I would like to delete the question altogether). I wanted to know whether the communication channels between Java code and a C binary it launches are blocking or non-blocking. And I mentioned that data was missing when I made my Java code sleep after reading from the input stream of the created process (of the C code). The data wasn't missing because of that. I had actually put a timer in the Java code, after which the C binary's process was terminated. And since the pipes are blocking, the Java side wasn't able to receive all the data before the timer expired. I was misinterpreting this loss of data to mean that the pipes were non-blocking. I confirmed this by running strace on the created C binary process: there were no EAGAIN errors on the write syscalls. My bad, but thank you very much to all for taking the time to respond.

Concurrently consume stdout from an external process

Is there a thread-safe way to concurrently consume the stdout from an external process, using ProcessBuilder in Java 1.6?
Background: I need to invoke pbzip2 to unzip large files to stdout and to process each line as the file is decompressed (pbzip2 utilizes multiple CPUs, unlike other implementations).
The logical approach is to create a child thread to loop over the InputStream (i.e. stdout; don't you just love the naming?), as follows:
while ((line = reader.readLine()) != null)
{
    // do stuff
}
However, unzipping is slow, so what I really need is for the reader.readLine method to quietly wait for the next line(s) to become available, instead of exiting.
Is there a good way to do this?
You should be able to wrap your input stream with an InputStreamReader and BufferedReader. You can then call readLine() and that will block as required.
Note that you should have a corresponding reader for the stderr. You don't have to do anything with it, but you will need to consume the stderr stream, otherwise your spawned process may well block. See this answer for links etc.
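A minimal sketch of such a stderr consumer, assuming a POSIX sh for the demo command (the real process would be pbzip2, and the class name is made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class StderrGobbler {
    /** Drains a stream on a daemon thread so the child can't block on a full stderr pipe. */
    static Thread drain(InputStream in) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                while (r.readLine() != null) { /* discard, or log somewhere */ }
            } catch (IOException ignored) { }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Demo command that writes to both streams.
        Process p = new ProcessBuilder("sh", "-c", "echo err 1>&2; echo out").start();
        drain(p.getErrorStream()); // consume stderr concurrently
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println("stdout: " + line); // do stuff with each line
            }
        }
        p.waitFor();
    }
}
```

If you don't care about keeping the two streams separate, ProcessBuilder.redirectErrorStream(true) achieves the same unblocking effect with one reader.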
You more or less have the solution yourself. You just create a new thread which reads the next line in a loop from the stream of your external process and processes that line.
readLine() will block and wait until an entire new line is available. If you're on a multi-core/multi-processor machine, your external process can happily continue unzipping while your thread processes a line. At least, unzipping can continue until the OS pipes/buffers become full.
Just note that if your processing is slower than the unzipping, you'll block the unzipping, and at that point it becomes a memory-vs-speed issue. E.g., you could create one thread that does nothing but read lines (so unzipping will not block) and buffer them up in a queue in memory, and another thread -- or even several -- that consumes that queue.
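The reader-thread-plus-queue idea above can be sketched like this: a bounded ArrayBlockingQueue is the memory-vs-speed knob, and the producer loop stands in for the thread calling readLine() on the process output (the class name and line counts are made up for the demo):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueuedLines {
    // Sentinel compared by identity, so it can never collide with real input.
    private static final String EOF = new String("<eof>");

    static int run() throws InterruptedException {
        // Bounded queue: if processing is slower than decompression, the
        // producer blocks on put() instead of buffering without limit.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        // Producer stands in for the thread reading lines from the process.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 1000; i++) queue.put("line " + i);
                queue.put(EOF); // signal end of stream
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();

        int processed = 0;
        String line;
        while ((line = queue.take()) != EOF) {
            processed++; // "do stuff" with the line here
        }
        return processed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("processed " + run() + " lines");
    }
}
```

Swapping ArrayBlockingQueue for an unbounded LinkedBlockingQueue trades away the backpressure: the producer never blocks, but memory use is then limited only by how far the consumer falls behind.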
readLine method to quietly wait for
the next line(s) to become available,
instead of exiting
And that's exactly what readLine should do: it will just block until a whole line is available.
Yes.
I have written some code that kicks off a time-consuming job (ffmpeg) in a Process (spawned by a ProcessBuilder), and it in turn kicks off my OutputStreamReader class, an extension of Thread, that consumes the stdout and does some magic with it.
The catch (for me) was redirecting the error stream. Here is my code snippet:
pb.redirectErrorStream(true);
proc = pb.start();
err = new MyOutputStreamReader(this, proc.getInputStream()); // extension of Thread
err.start();
int exitCode = proc.waitFor();
