In Java (or clojure) I would like to spin up an external process and consume its stdout as a stream. Ideally, I would like to consume the process' output stream every time that the external process flushes it, but am not sure how that can be accomplished, and how it can be accomplished without blocking.
When consuming a Java ProcessPipeInputStream for a shelled-out process (for example a Unix ProcessPipeInputStream), I find the inherited InputStream methods a bit low-level to work with, and I am not sure whether there is a way to consume from the stream each time the producer side flushes, or otherwise in a non-blocking fashion.
Many code examples block on the output stream in an infinite loop, thereby hogging a thread for the listening. My hope is that this blocking behavior can be avoided altogether.
Bottom line:
Is there a non-blocking way to be notified on an input stream, every time that the producing side of it flushes?
You need to create a separate thread to consume from such a stream, allowing the rest of your program to do whatever it is meant to be doing in parallel.
class ProcessOutputReader implements Runnable {

    private final InputStream processOutput;

    public ProcessOutputReader(final InputStream processOutput) {
        this.processOutput = processOutput;
    }

    @Override
    public void run() {
        try {
            int nextByte;
            while ((nextByte = processOutput.read()) != -1) {
                // do whatever you need to do byte-by-byte.
                processByte(nextByte);
            }
        } catch (final IOException ex) {
            // the stream was closed or the process died; handle as appropriate.
        }
    }
}
class Main {

    public static void main(final String[] args) {
        final Process proc = ...;
        final ProcessOutputReader reader = new ProcessOutputReader(proc.getInputStream());
        final Thread processOutputReaderThread = new Thread(reader);
        processOutputReaderThread.setDaemon(true); // allow the VM to terminate if this is the only thread still active.
        processOutputReaderThread.start();
        ...
        // if you want to wait for the whole process output to be processed at some point you can do this:
        try {
            processOutputReaderThread.join();
        } catch (final InterruptedException ex) {
            // you need to decide how to recover if your wait was interrupted.
        }
    }
}
If, instead of processing byte-by-byte, you want to deal with each flush as a single piece... I'm not sure it is 100% guaranteed that you can capture each flush of the external process. After all, the process's own I/O framework (Java, C, Python, etc.) may handle the "flush" operation differently, and what you end up receiving may be multiple blocks of bytes for any given flush in that external process.
In any case you can attempt to do that by using the InputStream's available method like so:
@Override
public void run() {
    try {
        int nextByte;
        while ((nextByte = processOutput.read()) != -1) {
            final int available = processOutput.available();
            byte[] block = new byte[available + 1];
            block[0] = (byte) nextByte;
            final int actuallyAvailable = processOutput.read(block, 1, available);
            if (actuallyAvailable < available) {
                if (actuallyAvailable == -1) {
                    block = new byte[] { (byte) nextByte };
                } else {
                    block = Arrays.copyOf(block, actuallyAvailable + 1);
                }
            }
            // do whatever you need to do on that block now.
            processBlock(block);
        }
    } catch (final IOException ex) {
        // the stream was closed or the process died; handle as appropriate.
    }
}
I'm not 100% sure of this, but I think you cannot trust that available() will return a guaranteed lower bound on the number of bytes you can retrieve without blocking, nor that the next read operation will return that number of bytes if requested; that is why the code above checks the actual number of bytes read (actuallyAvailable).
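If the goal is to keep the rest of the program from ever blocking on the pipe, one option (a minimal sketch of my own, not part of the answer above; the class and method names are made up) is to let the reader thread push each block onto a BlockingQueue, so any other thread can poll for output without waiting:

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueingProcessOutputReader implements Runnable {

    private final InputStream processOutput;
    private final BlockingQueue<byte[]> blocks = new LinkedBlockingQueue<byte[]>();

    QueueingProcessOutputReader(final InputStream processOutput) {
        this.processOutput = processOutput;
    }

    /** Non-blocking: returns the next block, or null if nothing has arrived yet. */
    byte[] pollBlock() {
        return blocks.poll();
    }

    @Override
    public void run() {
        try {
            int nextByte;
            while ((nextByte = processOutput.read()) != -1) {
                final int available = processOutput.available();
                byte[] block = new byte[available + 1];
                block[0] = (byte) nextByte;
                final int actuallyRead = processOutput.read(block, 1, available);
                if (actuallyRead < available) {
                    // trim to what was actually read (or to the single first byte)
                    block = Arrays.copyOf(block, Math.max(actuallyRead, 0) + 1);
                }
                blocks.offer(block);
            }
        } catch (final IOException ex) {
            // the process closed its end; decide how to surface this
        }
    }
}

Start it on a daemon thread exactly like ProcessOutputReader above, and call pollBlock() from wherever the output is needed; only the reader thread ever blocks on the pipe.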
Related
I'm using a 3rd party library, which has a method:
secureSend(int channel, byte[] data);
This method sends my binary data to the library, and if the data is larger than 64K, the method splits it to 64K chunks and sends them in order.
This method is marked as blocking, so it won't return immediately. Therefore it is also advised to spawn a thread for each use of this function:
new Thread(new Runnable() {
    public void run() {
        library.secureSend(channel, mydata);
    }
}).start();
If I'm trying to send larger data (>1Mb), it will take about 30 seconds. This is fine.
However sometimes I need to interrupt the sending because there is a higher priority data to send.
Currently, if I spawn a new thread calling secureSend, it will have to wait, as the library operates in a FIFO manner, i.e. it will first finish the previous sends.
I decompiled the library's class files, and secureSend has the following pseudo algorithm:
public synchronized void secureSend(int c, byte[] data) {
    try {
        local_data = data;
        HAS_MORE_DATA_TO_SEND = (local_data.length > 0);
        while (HAS_MORE_DATA_TO_SEND) {
            HAS_MORE_DATA_TO_SEND = sendChunk(...); // calculates offset and length, and returns whether there is more; operates on local_data!
        }
    } catch (IOException ex) {}
}
I've tried to interrupt the thread (I stored a reference to it), but it didn't help.
The library spends a lot of time in that while loop. However, it is also vulnerable to IOException: the catch block silently ends the send.
My question: can I somehow interrupt/kill/abort this function call? Maybe by somehow causing an IOException inside that thread? Is this at all possible?
I am creating a RandomAccessFile object to write to a file (on an SSD) from multiple threads. Each thread tries to write a direct byte buffer at a specific position within the file, and I ensure that the position at which one thread writes does not overlap with another thread's:
file_.getChannel().write(buffer, position);
where file_ is an instance of RandomAccessFile and buffer is a direct byte buffer.
For the RandomAccessFile object, since I'm not using fallocate to allocate the file, and the file's length is changing, will this utilize the concurrency of the underlying media?
If it is not, is there any point in using the above function without calling fallocate while creating the file?
I did some testing with the following code:
public class App {

    public static CountDownLatch latch;

    public static void main(String[] args) throws InterruptedException, IOException {
        File f = new File("test.txt");
        RandomAccessFile file = new RandomAccessFile("test.txt", "rw");
        latch = new CountDownLatch(5);
        for (int i = 0; i < 5; i++) {
            Thread t = new Thread(new WritingThread(i, (long) i * 10, file.getChannel()));
            t.start();
        }
        latch.await();
        file.close();
        InputStream fileR = new FileInputStream("test.txt");
        byte[] bytes = IOUtils.toByteArray(fileR);
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(bytes[i]);
        }
    }

    public static class WritingThread implements Runnable {

        private long startPosition = 0;
        private FileChannel channel;
        private int id;

        public WritingThread(int id, long startPosition, FileChannel channel) {
            super();
            this.startPosition = startPosition;
            this.channel = channel;
            this.id = id;
        }

        private ByteBuffer generateStaticBytes() {
            ByteBuffer buf = ByteBuffer.allocate(10);
            byte[] b = new byte[10];
            for (int i = 0; i < 10; i++) {
                b[i] = (byte) (this.id * 10 + i);
            }
            buf.put(b);
            buf.flip();
            return buf;
        }

        @Override
        public void run() {
            Random r = new Random();
            while (r.nextInt(100) != 50) {
                try {
                    System.out.println("Thread " + id + " is Writing");
                    this.channel.write(this.generateStaticBytes(), this.startPosition);
                    this.startPosition += 10;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            latch.countDown();
        }
    }
}
So far what I've seen:
Windows 7 (NTFS partition): Run linearly (aka one thread writes and when it is over, another one gets to run)
Linux Parrot 4.8.15 (ext4 partition) (Debian based distro), with Linux Kernel 4.8.0: Threads intermingle during the execution
Again as the documentation says:
File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.
So I'd suggest first giving it a try and seeing whether the OS(es) you are going to deploy to (and possibly the filesystem type) support parallel execution of FileChannel.write calls.
Edit: As pointed out, the above does not mean that threads can write concurrently to the file; it is actually the opposite, as the write call behaves according to the contract of WritableByteChannel, which clearly specifies that only one thread at a time can write to a given channel:
If one thread initiates a write operation upon a channel then any other thread that attempts to initiate another write operation will block until the first operation is complete
As the documentation states, and as Adonis already mentions, a write can only be performed by one thread at a time. You won't achieve performance gains through concurrency; moreover, you should only worry about performance if it's an actual issue, because writing concurrently to a disk may actually degrade your performance (probably less so for SSDs than for HDDs).
The underlying media is in most cases (SSD, HDD, network) effectively single-threaded - in fact, there is no such thing as a thread at the hardware level; threads are nothing but an abstraction.
In your case the media is an SSD.
While the SSD internally may write data to multiple modules concurrently (it may reach a level of parallelism where writes are as fast as, or even outperform, reads), the internal mapping data structures are a shared resource and therefore contended, especially on frequent updates such as concurrent writes. Nevertheless, updates to this data structure are quite fast and therefore nothing to worry about unless they become a problem.
But apart from this, those are just internals of the SSD. On the outside you communicate over a Serial ATA interface, thus one byte at a time (actually packets in a Frame Information Structure, FIS). On top of this sits an OS/filesystem that again has its own, probably contended, data structures and/or applies its own optimizations such as write-behind caching.
Further, since you know what your media is, you can optimize specifically for it, and SSDs are really fast when one single thread writes a large piece of data.
Thus, instead of using multiple threads for writing, you may create a large in-memory buffer (possibly a memory-mapped file) and write concurrently into this buffer. The memory itself is not contended, as long as you ensure each thread accesses its own region of the buffer. Once all threads are done, you write this one buffer to the SSD (not needed if using a memory-mapped file).
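For illustration only (the class name, sizes, and file name are placeholders of mine), a minimal sketch of that idea: each thread fills its own slice of one direct buffer, and a single sequential write pushes the whole buffer to disk at the end.

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.CountDownLatch;

public class SingleWriteDemo {

    public static void main(String[] args) throws Exception {
        final int regionSize = 10;
        final int threads = 5;
        final ByteBuffer shared = ByteBuffer.allocateDirect(regionSize * threads);
        final CountDownLatch done = new CountDownLatch(threads);

        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                // each thread touches only its own slice, so there is no contention on the buffer
                ByteBuffer slice = shared.duplicate();
                slice.position(id * regionSize).limit((id + 1) * regionSize);
                for (int i = 0; i < regionSize; i++) {
                    slice.put((byte) (id * 10 + i));
                }
                done.countDown();
            }).start();
        }

        done.await();
        try (RandomAccessFile file = new RandomAccessFile("test.txt", "rw");
             FileChannel channel = file.getChannel()) {
            shared.rewind();
            channel.write(shared); // one sequential write of the whole buffer
        }
    }
}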
See also this good summary about developing for SSDs:
A Summary – What every programmer should know about solid-state drives
The point of doing pre-allocation (or, to be more precise, file_.setLength(), which actually maps to ftruncate) is that resizing the file may cost extra cycles and you may want to avoid that. But again, this may depend on the OS/filesystem.
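If you do want to avoid that resizing cost, a small sketch of the setLength() pre-allocation mentioned here (the size value is a placeholder):

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class PreallocateDemo {

    public static void main(String[] args) throws Exception {
        long expectedSize = 5L * 10; // placeholder: 5 threads x 10 bytes each
        try (RandomAccessFile file = new RandomAccessFile("test.txt", "rw")) {
            file.setLength(expectedSize); // roughly ftruncate(): the file is sized once, up front
            FileChannel channel = file.getChannel();
            // ... hand 'channel' to the writer threads as in the test code above
        }
    }
}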
I am reading a 77MB file inside a Servlet; in the future this will be 150GB. This file is not written using any nio package facilities, it is just written using BufferedWriter.
Now this is what I need to do.
Read the file line by line. Each line is a "hash code" of a text. Separate it into pieces of 3 characters (3 characters represent 1 word). It could be long, it could be short, I don't know.
After reading the line, convert it into real words. We have a Map of words and Hashes so we can find the words.
Up to now I used BufferedReader to read the file. It is slow and not good for huge files like 150GB. It took hours to complete the entire process even for this 77MB file. Because we can't keep the user waiting for hours, it should finish within seconds. So we decided to load the file into memory. First we thought about loading every single line into a LinkedList, so memory could hold it. But you know, memory cannot hold such a big amount. After a big search, I decided mapping the file into memory would be the answer. Memory is far faster than disk, so we could read the file super fast too.
Code:
public class MapRead {

    public MapRead() {
        try {
            File file = new File("E:/Amazon HashFile/Hash.txt");
            FileChannel c = new RandomAccessFile(file, "r").getChannel();
            MappedByteBuffer buffer = c.map(FileChannel.MapMode.READ_ONLY, 0, c.size()).load();
            for (int i = 0; i < buffer.limit(); i++) {
                System.out.println((char) buffer.get());
            }
            System.out.println(buffer.isLoaded());
            System.out.println(buffer.capacity());
        } catch (IOException ex) {
            Logger.getLogger(MapRead.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
But I could not see any "super fast" behavior. And I need it line by line. I have a few questions to ask.
You read my description and you know what I need to do. I have done the first step for that, so is that correct?
Is the way I map correct? I mean, it seems no different from reading it the normal way. So does this hold the "entire" file in memory first (let's say using a technique called mapping)? Then do we have to write other code to access that memory?
How do I read line by line, super "fast"? (If I have to load/map the entire file into memory first, even if that takes hours, and can then access it at super speed in seconds, I am totally fine with that too.)
Is reading files in Servlets good? (Because it is being accessed by a number of people, and only one IO stream will be opened at once. In this case this servlet will be accessed by thousands at once.)
Update
This is how my code looks after I updated it with SO user Luiggi Mendoza's answer.
public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor (BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ( (line = linesToProcess.take()) != null) {
                System.out.println(line); //This is not happening
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
public class BigFileReader implements Runnable {

    private final String fileName;
    int a = 0;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        try {
            //Scanner did not work. I had to use BufferedReader
            BufferedReader br = new BufferedReader(new FileReader(new File("E:/Amazon HashFile/Hash.txt")));
            String str = "";
            while ((str = br.readLine()) != null) {
                // System.out.println(a);
                a++;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        BigFileWholeProcessor b = new BigFileWholeProcessor();
        b.processFile("E:/Amazon HashFile/Hash.txt");
    }
}
I am trying to print the file in BigFileProcessor. What I understood is this:
User enters the file name.
That file gets read by BigFileReader, line by line.
After each line, BigFileProcessor gets called. That is, assume BigFileReader reads the first line; now BigFileProcessor is called. Once BigFileProcessor completes processing that line, BigFileReader reads line 2. Then BigFileProcessor gets called again for that line, and so on.
Maybe my understanding of this code is incorrect. How should I process the lines anyway?
I would suggest using multiple threads here:
One thread will take care to read every line of the file and insert it into a BlockingQueue in order to be processed.
Another thread(s) will take the elements from this queue and process them.
To implement this multi-threaded work, it would be better to use the ExecutorService interface and pass it Runnable instances, each implementing one of these tasks. Remember to have only a single task reading the file.
You could also manage a way to stop reading if the queue reaches a specific size, e.g. if the queue has 10000 elements then wait until its size is down to 8000, then continue reading and filling the queue. One simple way to approximate this is sketched below.
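A bounded queue gives roughly that back-pressure (the capacity value is a placeholder of mine, not the exact 10000/8000 hysteresis described): the reader's put() blocks once the queue is full and resumes as soon as the processor drains some lines. In the processFile method shown further down, the queue would simply be created with a capacity:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// put() now blocks the reader once 10000 lines are waiting to be processed
BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>(10000);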
Reading files in Servlets is good ?
I would recommend never doing heavy work in a servlet. Instead, fire an asynchronous task (e.g. via a JMS call), and process your file in that external agent.
A brief sample of the above explanation to solve the problem:
public class BigFileReader implements Runnable {

    private final String fileName;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        //since it is a sample, I avoid the management of how many lines you have read
        //and that stuff, but it should not be complicated to accomplish
        try {
            Scanner scanner = new Scanner(new File(fileName));
            while (scanner.hasNext()) {
                try {
                    linesRead.put(scanner.nextLine());
                } catch (InterruptedException ie) {
                    //handle the exception...
                    ie.printStackTrace();
                }
            }
            scanner.close();
        } catch (FileNotFoundException fnfe) {
            //handle the exception...
            fnfe.printStackTrace();
        }
    }
}
public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor (BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ( (line = linesToProcess.take()) != null) {
                //do what you want/need to process this line...
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
NIO won't help you here. BufferedReader is not slow. If you're I/O bound, you're I/O bound -- get faster I/O.
Mapping the file into memory can help, but only if you're actually using the memory in place, rather than just copying all of the data out of the big byte array that you get back. The primary advantage of mapping the file is that it keeps the data out of the Java heap, and away from the garbage collector.
Your best performance will come from working on the data in place, and not copying it into the heap if you can avoid it.
Some of your performance may be impacted by object creation. For example, if you were trying to load your data into the LinkedList, you'd be creating (likely) millions of nodes for the List itself, plus the objects wrapping your data (even if they're just Strings).
Creating Strings based on your memory-mapped array can be quite efficient, as the String will simply wrap the data, not copy it. But you'll have to be UTF-aware if you're working with something other than ASCII (as bytes are not characters in Java).
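As a rough illustration of working on the data in place (the helper class and method names are mine, not from this answer), you can decode just the region you need from the mapped buffer on demand, instead of pulling the whole file onto the heap:

import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.charset.StandardCharsets;

final class MappedSlices {

    // Decode only the region you need, when you need it; the rest of the file
    // stays in the mapped region, outside the Java heap.
    static String sliceToString(MappedByteBuffer map, int start, int length) {
        ByteBuffer slice = map.duplicate(); // own position/limit, same underlying memory
        slice.position(start).limit(start + length);
        return StandardCharsets.UTF_8.decode(slice).toString();
    }
}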
Also, if you're loading large things with lots of objects, ensure that you have free space in your heap for them. And by free space, I mean actual room. You can have a 500MB heap, as specified by -Xmx, but the ACTUAL heap will not be that large initially; it will grow to that limit.
Assuming you have sufficient memory in the first place, you can do this via -Xms, which will pre-allocate the heap to a desired size, or you can simply do a quick byte[] buf = new byte[400 * 1024 * 1024], to make a huge allocation, force the GC, and stretch the heap.
What you don't want to be doing is allocating a million objects and having the VM GC every 10000 or so as it grows. Pre-allocating other data structures is also helpful (notably ArrayLists; LinkedLists not so much).
Divide the file into smaller parts. For this you'll need access to seekable reads so you can fast-forward to other parts of the file.
For each part, spawn multiple worker threads, each with its own copy of the hash lookup table. Let completed threads join a collector thread, which will write completed chunks in order and signal processing completion.
It will be better to stream file chunks rather than loading all of them into memory; a rough sketch follows.
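A rough sketch of that split-and-seek idea (the chunk size, thread count, file name, and the processChunk() hook are placeholders; note that chunk boundaries here are not aligned to line breaks, which a real version would have to handle, e.g. by extending each chunk to the next newline):

import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedReader {

    public static void main(String[] args) throws Exception {
        final String path = "E:/Amazon HashFile/Hash.txt";
        final long chunkSize = 64L * 1024 * 1024; // 64 MB per worker
        final long fileSize;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            fileSize = f.length();
        }

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (long start = 0; start < fileSize; start += chunkSize) {
            final long offset = start;
            final long length = Math.min(chunkSize, fileSize - start);
            pool.execute(() -> {
                try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
                    f.seek(offset); // fast-forward to this worker's chunk
                    byte[] chunk = new byte[(int) length];
                    f.readFully(chunk);
                    // processChunk(offset, chunk); // hypothetical per-worker hash lookup
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}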
Sub-processes in Java are very expensive. Each process is usually supported by a number of threads:
a thread to host the process (with JDK 1.6 on Linux)
a thread to read/print/ignore the input stream
another thread to read/print/ignore the error stream
one more thread to handle timeouts, monitor, and kill the sub-process from your application
the business-logic thread, which blocks until the sub-process returns
The number of threads gets out of control if you have a thread pool forking sub-processes to do tasks. As a result, there may be more than double the number of concurrent threads at peak.
In many cases we fork a process just because nobody is able to write the JNI needed to call a native function missing from the JDK (e.g. chmod, ln, ls), to trigger a shell script, and so on.
Some threads can be saved, but some should keep running to prevent the worst case (a buffer overrun on the input stream).
How can I reduce the overhead of creating sub-process in Java to the minimum?
I am thinking of NIO for the stream handles, combining and sharing threads, lowering background thread priority, and re-using processes. But I have no idea whether these are possible or not.
JDK 7 will address this issue and provide a new API in ProcessBuilder, redirectOutput/redirectError, to redirect stdout/stderr.
However, the bad news is that they forgot to provide a "Redirect.toNull", which means you will end up writing something like: if (*nix) /dev/null else if (win) NUL.
It is unbelievable that an NIO.2 API for Process is still missing; but I think redirectOutput plus NIO.2's AsynchronousChannel will help.
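For what it's worth, a small sketch of the platform-dependent redirect described above (the command is a placeholder; later JDKs eventually added Redirect.DISCARD for exactly this):

import java.io.File;

public class DiscardOutput {

    public static void main(String[] args) throws Exception {
        boolean windows = System.getProperty("os.name").toLowerCase().contains("win");
        File devNull = new File(windows ? "NUL" : "/dev/null");
        ProcessBuilder pb = new ProcessBuilder("some-command"); // placeholder command
        pb.redirectOutput(ProcessBuilder.Redirect.to(devNull)); // stdout discarded by the OS
        pb.redirectError(ProcessBuilder.Redirect.to(devNull));  // stderr discarded by the OS
        pb.start().waitFor();
    }
}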
I have created an open source library that allows non-blocking I/O between java and your child processes. The library provides an event-driven callback model. It depends on the JNA library to use platform-specific native APIs, such as epoll on Linux, kqueue/kevent on MacOS X, or IO Completion Ports on Windows.
The project is called NuProcess and can be found here:
https://github.com/brettwooldridge/NuProcess
To answer the question in your title (I don't fully understand the description), I assume you mean discarding the shell subprocess output; check these SO questions:
platform-independent /dev/null output sink for Java
Is there a Null OutputStream in Java?
Or you can close stdout and stderr for the command being executed under Unix:
command > /dev/null 2>&1
You don't need any extra threads to run a subprocess in java, although handling timeouts does complicate things a bit:
import java.io.IOException;
import java.io.InputStream;

public class ProcessTest {

    public static void main(String[] args) throws IOException {
        long timeout = 10;
        ProcessBuilder builder = new ProcessBuilder("cmd", "a.cmd");
        builder.redirectErrorStream(true); // so we can ignore the error stream
        Process process = builder.start();
        InputStream out = process.getInputStream();

        long endTime = System.currentTimeMillis() + timeout;
        while (isAlive(process) && System.currentTimeMillis() < endTime) {
            int n = out.available();
            if (n > 0) {
                // out.skip(n);
                byte[] b = new byte[n];
                out.read(b, 0, n);
                System.out.println(new String(b, 0, n));
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
            }
        }
        if (isAlive(process)) {
            process.destroy();
            System.out.println("timeout");
        } else {
            System.out.println(process.exitValue());
        }
    }

    public static boolean isAlive(Process p) {
        try {
            p.exitValue();
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }
}
You could also play with reflection as in Is it possible to read from a InputStream with a timeout? to get a NIO FileChannel from Process.getInputStream(), but then you'd have to worry about different JDK versions in exchange for getting rid of the polling.
NIO won't work, since when you create a process you can only access plain streams (InputStream/OutputStream), not a Channel.
You can have 1 thread read multiple InputStreams.
Something like,
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class MultiSwallower implements Runnable {

    private List<InputStream> streams = new CopyOnWriteArrayList<InputStream>();

    public void addStream(InputStream s) {
        streams.add(s);
    }

    public void removeStream(InputStream s) {
        streams.remove(s);
    }

    public void run() {
        byte[] buffer = new byte[1024];
        while (true) {
            boolean sleep = true;
            for (InputStream s : streams) {
                try {
                    //available tells you how many bytes you can read without blocking
                    while (s.available() > 0) {
                        //do what you want with the output here
                        s.read(buffer, 0, Math.min(s.available(), 1024));
                        sleep = false;
                    }
                } catch (IOException e) {
                    //stream closed or process gone; stop watching it
                    removeStream(s);
                }
            }
            if (sleep) {
                //if nothing is available now, sleep
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }
}
You can pair the above class with another class that waits for the Processes to complete, something like,
import java.io.InputStream;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ProcessWatcher implements Runnable {

    private MultiSwallower swallower = new MultiSwallower();
    private ConcurrentMap<Process, InputStream> processes = new ConcurrentHashMap<Process, InputStream>();

    public ProcessWatcher() {
    }

    public void startThreads() {
        new Thread(this).start();
        new Thread(swallower).start();
    }

    public void addProcess(Process p) {
        swallower.addStream(p.getInputStream());
        processes.put(p, p.getInputStream());
    }

    @Override
    public void run() {
        while (true) {
            for (Process p : processes.keySet()) {
                try {
                    //will throw if the process has not completed
                    p.exitValue();
                    InputStream s = processes.remove(p);
                    swallower.removeStream(s);
                } catch (IllegalThreadStateException e) {
                    //process not completed, ignore
                }
            }
            //wait before checking again
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
As well, you don't need one thread per error stream if you use ProcessBuilder.redirectErrorStream(true), and you don't need a thread for the process's stdin either; you can simply ignore it if you are not writing anything to the process.
Since you mention, chmod, ln, ls, and shell scripts, it sounds like you're trying to use Java for shell programming. If so, you might want to consider a different language that is better suited to that task such as Python, Perl, or Bash. Although it's certainly possible to create subprocesses in Java, interact with them via their standard input/output/error streams, etc., I think you will find a scripting language makes this kind of code less verbose and easier to maintain than Java.
Have you considered using a single long-running helper process written in another language (maybe a shell script?) that will consume commands from java via stdin and perform file operations in response?
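A rough sketch of that helper-process idea (the helper script path and its one-command-per-line protocol are made up for illustration):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class FileOpsHelper {

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("/usr/local/bin/fileops.sh"); // hypothetical helper
        pb.redirectErrorStream(true);
        Process helper = pb.start();

        BufferedWriter commands = new BufferedWriter(
                new OutputStreamWriter(helper.getOutputStream()));
        BufferedReader replies = new BufferedReader(
                new InputStreamReader(helper.getInputStream()));

        // one command per line; the helper answers one line per command (assumed protocol)
        commands.write("chmod 644 /tmp/report.txt");
        commands.newLine();
        commands.flush();
        System.out.println("helper said: " + replies.readLine());
    }
}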
My goal is to create (or use an existing) InputStream implementation (say, MergeInputStream) that will try to read from multiple InputStreams and return the first result. After that it will release the lock and stop reading from all InputStreams until the next mergeInputStream.read() call. I was quite surprised that I didn't find any such tool. The thing is: none of the source InputStreams is really finite (not a file, for example, but System.in, a socket or such), so I cannot use SequenceInputStream. I understand that this will probably require some multi-threading mechanism, but I have absolutely no idea how to do it. I tried to google it but with no result.
The problem of reading input from multiple sources and serializing them into one stream is preferably solved using SelectableChannel and Selector. This however requires that all sources are able to provide a selectable channel. This may or may not be the case.
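For the case where the sources really are selectable channels (sockets, pipes), a minimal sketch of that Selector approach; note it cannot be applied to System.in or arbitrary InputStreams, which is exactly the caveat above:

import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectMerge {

    public static void readFirstAvailable(SocketChannel... sources) throws Exception {
        Selector selector = Selector.open();
        for (SocketChannel ch : sources) {
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
        }
        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (true) {
            selector.select(); // blocks until at least one source is readable
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                SocketChannel ch = (SocketChannel) key.channel();
                buf.clear();
                int n = ch.read(buf);
                if (n > 0) {
                    // hand buf's first n bytes to whoever called read() on the merged stream
                } else if (n == -1) {
                    key.cancel(); // this source is exhausted
                }
            }
        }
    }
}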
If selectable channels are not available, you could choose to solve it with a single thread by letting the read-implementation do the following: For each input stream is, check if is.available() > 0, and if so return is.read(). Repeat this procedure until some input stream has data available.
This method, however, has two major drawbacks:
Not all implementations of InputStream implement available() in such a way that it returns 0 if and only if read() would block. The result is, naturally, that data may not be read from such a stream even though is.read() would return a value. Whether or not this should be considered a bug is questionable, as the documentation merely states that it should return an "estimate" of the number of bytes available.
It uses a so-called "busy loop", which basically means that you'll either need to put a sleep in the loop (which adds read latency) or hog the CPU unnecessarily.
Your third option is to deal with the blocking reads by spawning one thread per input stream. This, however, requires careful synchronization and some overhead if you have a very large number of input streams to read from. The code below is a first attempt to solve it. I'm by no means certain that it is sufficiently synchronized, or that it manages the threads in the best possible way.
import java.io.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class MergedInputStream extends InputStream {

    AtomicInteger openStreamCount;
    BlockingQueue<Integer> buf = new ArrayBlockingQueue<Integer>(1);
    InputStream[] sources;

    public MergedInputStream(InputStream... sources) {
        this.sources = sources;
        openStreamCount = new AtomicInteger(sources.length);
        for (int i = 0; i < sources.length; i++)
            new ReadThread(i).start();
    }

    public void close() throws IOException {
        String ex = "";
        for (InputStream is : sources) {
            try {
                is.close();
            } catch (IOException e) {
                ex += e.getMessage() + " ";
            }
        }
        if (ex.length() > 0)
            throw new IOException(ex.substring(0, ex.length() - 1));
    }

    public int read() throws IOException {
        if (openStreamCount.get() == 0)
            return -1;
        try {
            return buf.take();
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    private class ReadThread extends Thread {

        private final int src;

        public ReadThread(int src) {
            this.src = src;
        }

        public void run() {
            try {
                int data;
                while ((data = sources[src].read()) != -1)
                    buf.put(data);
            } catch (IOException ioex) {
            } catch (InterruptedException e) {
            }
            openStreamCount.decrementAndGet();
        }
    }
}
I can think of three ways to do this:
Use non-blocking I/O (API documentation). This is the cleanest solution.
Multiple threads, one for each merged input stream. The threads would block on the read() method of the associated input stream, then notify the MergeInputStream object when data becomes available. The read() method in MergedInputStream would wait for this notification, then read data from the corresponding stream.
Single thread with a busy loop. Your MergeInputStream.read() method would need to loop, checking the available() method of every merged input stream. If no data is available, sleep a few ms. Repeat until data becomes available in one of the merged input streams.