Java: merging InputStreams

My goal is to create (or find an existing) InputStream implementation (say, MergeInputStream) that will try to read from multiple InputStreams and return the first result. After that it will release the lock and stop reading from all InputStreams until the next mergeInputStream.read() call. I was quite surprised that I didn't find any such tool. The thing is: none of the source InputStreams is really finite (not a file, for example, but System.in, a socket or such), so I cannot use SequenceInputStream. I understand that this will probably require some multi-threading mechanism, but I have absolutely no idea how to do it. I tried to google it, but with no result.

The problem of reading input from multiple sources and serializing them into one stream is preferably solved using SelectableChannel and Selector. This however requires that all sources are able to provide a selectable channel. This may or may not be the case.
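A minimal sketch of the selector approach, assuming the sources happen to be already-connected SocketChannels (class and method names here are illustrative, not from the question):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.List;

public class SelectorMergeSketch {
    // Reads from whichever source has data ready, one select() round at a time.
    public static void readLoop(List<SocketChannel> sources) throws IOException {
        try (Selector selector = Selector.open()) {
            for (SocketChannel ch : sources) {
                ch.configureBlocking(false);
                ch.register(selector, SelectionKey.OP_READ);
            }
            ByteBuffer buf = ByteBuffer.allocate(4096);
            int open = sources.size();
            while (open > 0) {
                selector.select(); // blocks until at least one channel is readable
                for (SelectionKey key : selector.selectedKeys()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    buf.clear();
                    if (ch.read(buf) == -1) {
                        key.cancel(); // this source is exhausted
                        open--;
                    } else {
                        buf.flip(); // hand the bytes in buf to the consumer here
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }
}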
If selectable channels are not available, you could solve it with a single thread by letting the read implementation do the following: for each input stream is, check whether is.available() > 0, and if so return is.read(). Repeat this procedure until some input stream has data available (a sketch of this loop follows the list of drawbacks below).
This method, however, has two major drawbacks:
Not all implementations of InputStream implement available() in such a way that it returns 0 if and only if read() would block. The result, naturally, is that data may not be read from such a stream even though is.read() would return a value. Whether or not this is to be considered a bug is questionable, as the documentation merely states that it should return an "estimate" of the number of bytes available.
It uses a so-called "busy loop", which basically means that you'll either need to put a sleep in the loop (which adds read latency) or hog the CPU unnecessarily.
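Here is a bare-bones sketch of that polling approach (my own illustration; note it never signals end-of-stream, and it is subject to both drawbacks above):
import java.io.IOException;
import java.io.InputStream;

public class PollingMergeInputStream extends InputStream {
    private final InputStream[] sources;

    public PollingMergeInputStream(InputStream... sources) {
        this.sources = sources;
    }

    @Override
    public int read() throws IOException {
        while (true) {
            for (InputStream is : sources) {
                if (is.available() > 0) {   // drawback 1: available() may lie
                    return is.read();
                }
            }
            try {
                Thread.sleep(10);           // drawback 2: latency vs. CPU trade-off
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while polling", e);
            }
        }
    }
}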
Your third option is to deal with the blocking reads by spawning one thread for each input stream. This however will require careful synchronization and possibly some overhead if you have a very high number of input streams to read from. The code below is a first attempt to solve it. I'm by no means certain that it is sufficiently synchronized, or that it manages the threads in the best possible way.
import java.io.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class MergedInputStream extends InputStream {

    AtomicInteger openStreamCount;
    BlockingQueue<Integer> buf = new ArrayBlockingQueue<Integer>(1);
    InputStream[] sources;

    public MergedInputStream(InputStream... sources) {
        this.sources = sources;
        openStreamCount = new AtomicInteger(sources.length);
        for (int i = 0; i < sources.length; i++)
            new ReadThread(i).start();
    }

    public void close() throws IOException {
        String ex = "";
        for (InputStream is : sources) {
            try {
                is.close();
            } catch (IOException e) {
                ex += e.getMessage() + " ";
            }
        }
        if (ex.length() > 0)
            throw new IOException(ex.substring(0, ex.length() - 1));
    }

    public int read() throws IOException {
        if (openStreamCount.get() == 0)
            return -1;
        try {
            return buf.take();
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    private class ReadThread extends Thread {
        private final int src;

        public ReadThread(int src) {
            this.src = src;
        }

        public void run() {
            try {
                int data;
                while ((data = sources[src].read()) != -1)
                    buf.put(data);
            } catch (IOException ioex) {
            } catch (InterruptedException e) {
            }
            openStreamCount.decrementAndGet();
        }
    }
}

I can think of three ways to do this:
Use non-blocking I/O (the java.nio channels and selectors API). This is the cleanest solution.
Multiple threads, one for each merged input stream. The threads would block on the read() method of the associated input stream, then notify the MergeInputStream object when data becomes available. The read() method in MergedInputStream would wait for this notification, then read data from the corresponding stream.
Single thread with a busy loop. Your MergeInputStream.read() method would need to loop, checking the available() method of every merged input stream. If no data is available, sleep a few ms. Repeat until data becomes available in one of the merged input streams.

Related

ReadWriteLock for concurrency access to file

I have a class which implements read and write operations on a file in a concurrent environment. I know BufferedInputStream and BufferedWriter are synchronized, but in my case read and write operations can be used simultaneously. Now I use ReentrantReadWriteLock, but I'm not confident that my solution is correct.
public class FileSource {

    private final File file;
    private final ReadWriteLock lock;

    public FileSource(final File file) {
        if (Objects.isNull(file)) {
            throw new IllegalArgumentException("File can't be null!");
        }
        this.file = file;
        this.lock = new ReentrantReadWriteLock();
    }

    public String getContent() {
        final Lock readLock = lock.readLock();
        readLock.lock();
        final StringBuilder sb = new StringBuilder();
        try (final BufferedInputStream in =
                     new BufferedInputStream(
                             new FileInputStream(file))) {
            int data;
            while ((data = in.read()) != -1) {
                sb.append((char) data);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            readLock.unlock();
        }
        return sb.toString();
    }

    public void saveContent(final String content) {
        final Lock writeLock = lock.writeLock();
        writeLock.lock();
        try (BufferedWriter out =
                     new BufferedWriter(
                             new FileWriter(file))) {
            out.write(content);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            writeLock.unlock();
        }
    }
}
Is ReentrantReadWriteLock the correct solution in this case, or do I need to use ReentrantLock or something else (and why)?
This discussion is not about class design (such as whether File should be a field, whether to pass the File directly to the methods, whether to use the nio package, etc.). It shouldn't be a utility class. The method signatures and File as a field must stay unchanged. It is about potential concurrency problems with File and InputStream/OutputStream.
RRWL is fine here. Of course, if some code makes a new FileSource("/foo/bar.txt") and then some other code also makes a separate new FileSource("/foo/bar.txt"), those two wrappers will be falling all over themselves and will cause things to go pear-shaped; I assume you have some external mechanism to ensure this cannot happen. If you don't, some take on ConcurrentHashMap and its concurrency methods (such as computeIfAbsent; don't use the plain-jane get/put for this) can help you out.
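A minimal sketch of that registry idea (the class and method names here are mine, not from the question; in practice you would probably canonicalize the path first):
import java.io.File;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class FileSources {
    private static final ConcurrentMap<String, FileSource> INSTANCES = new ConcurrentHashMap<>();

    // Returns the single FileSource for a given path, creating it at most once,
    // even when called concurrently from many threads.
    public static FileSource forPath(String path) {
        return INSTANCES.computeIfAbsent(path, p -> new FileSource(new File(p)));
    }

    private FileSources() {}
}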
Note that your exception handling is bad. Exception messages should not end in punctuation (think about it: without this rule, 80% of all exception messages would end in an exclamation mark, which would make log files a fun exercise), and in general, if you ever write catch (Exception e) { e.printStackTrace(); }, you go to that special place reserved for people who talk in movie theaters and people who write that.
I'd say a method called saveContent is justified in throwing some checked exception; after all, it's rather obvious that it can fail, and code calling it can feasibly be expected to take some action if it does.
If you just can't get there, the proper ¯\_(ツ)_/¯ "I dunno" catch-block handler is throw new RuntimeException("uncaught", e); — not e.printStackTrace();. The latter logs to an uncontrollable place, shaves off useful information and, crucially, keeps on running as if nothing is wrong, silently ignoring the fact that a save call just failed, whereas the former preserves everything and will in fact abort code execution. It makes it hard to recover, but at least it's easier than e.printStackTrace, and if you want it to be easier, then make a special exception. Or just throw that IOException unmolested (you get way shorter code to boot!).
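For illustration, the "just throw that IOException" option from the last sentence would look roughly like this for the write side (a sketch only; note that it does change the method signature, which the answer argues is worth doing):
public void saveContent(final String content) throws IOException {
    final Lock writeLock = lock.writeLock();
    writeLock.lock();
    try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
        out.write(content); // any failure now reaches the caller instead of being swallowed
    } finally {
        writeLock.unlock();
    }
}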
Another insidious bug in this code is that it uses the platform-default charset encoding to read your file, which is very rarely what you want.
The new Files API can also read the entire file in one go, which saves you a ton of code on that read op. As you upgrade your code, you get the benefit of the Files API's unique take on charset encodings: unlike most other places in the Java libraries, java.nio.file.Files will assume you meant UTF-8 if you fail to specify an encoding (instead of the platform default, i.e. the thing you cannot test for that will blow up in production and waste a week of your time chasing after it).
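As a sketch, the read side then shrinks to something like this (Java 11+, where Files.readString exists and defaults to UTF-8; lock and file come from the class above, and the signature gains throws IOException):
public String getContent() throws IOException {
    final Lock readLock = lock.readLock();
    readLock.lock();
    try {
        return Files.readString(file.toPath()); // UTF-8 by default, not the platform charset
    } finally {
        readLock.unlock();
    }
}
On older JDKs, new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8) gives the same behavior.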

How to consume a process' stdout as a stream, without blocking?

In Java (or Clojure) I would like to spin up an external process and consume its stdout as a stream. Ideally, I would like to consume the process's output stream every time the external process flushes it, but I'm not sure how that can be accomplished, or how it can be accomplished without blocking.
When consuming a Java ProcessPipeInputStream from a shelled-out process (for example a Unix process), I find the inherited InputStream methods a bit low-level to work with, and I'm not sure whether there's a non-blocking way to consume from the stream every time the producer side flushes, or otherwise in a non-blocking fashion.
Many code examples block on the output stream in an infinite loop, thereby hogging a thread for the listening. My hope is that this blocking behavior can be avoided altogether.
Bottom line:
Is there a non-blocking way to be notified on an input stream, every time that the producing side of it flushes?
You need to create a separate Thread that consumes from such a stream, allowing the rest of your program to do whatever it is meant to be doing in parallel.
class ProcessOutputReader implements Runnable {
    private InputStream processOutput;

    public ProcessOutputReader(final InputStream processOutput) {
        this.processOutput = processOutput;
    }

    @Override
    public void run() {
        try {
            int nextByte;
            while ((nextByte = processOutput.read()) != -1) {
                // do whatever you need to do byte-by-byte.
                processByte(nextByte);
            }
        } catch (IOException e) {
            // decide how to report a failure of the process pipe
            e.printStackTrace();
        }
    }
}
class Main {
    public static void main(final String[] args) {
        final Process proc = ...;
        final ProcessOutputReader reader = new ProcessOutputReader(proc.getInputStream());
        final Thread processOutputReaderThread = new Thread(reader);
        processOutputReaderThread.setDaemon(true); // allow the VM to terminate if this is the only thread still active
        processOutputReaderThread.start();
        ...
        // if you want to wait for the whole process output to be processed at some point you can do this:
        try {
            processOutputReaderThread.join();
        } catch (final InterruptedException ex) {
            // you need to decide how to recover if your wait was interrupted.
        }
    }
}
If instead of processing byte-by-byte you want to deal with each flush as a single piece... I'm not sure you can be 100% guaranteed to capture each process flush. After all, the external process's own IO framework (Java, C, Python, etc.) may handle the "flush" operation differently, and perhaps what you end up receiving is multiple blocks of bytes for any given flush in that external process.
In any case you can attempt to do that by using the InputStream's available method like so:
@Override
public void run() {
    try {
        int nextByte;
        while ((nextByte = processOutput.read()) != -1) {
            final int available = processOutput.available();
            byte[] block = new byte[available + 1];
            block[0] = (byte) nextByte;
            final int actuallyAvailable = processOutput.read(block, 1, available);
            if (actuallyAvailable < available) {
                if (actuallyAvailable == -1) {
                    block = new byte[] { (byte) nextByte };
                } else {
                    block = Arrays.copyOf(block, actuallyAvailable + 1);
                }
            }
            // do whatever you need to do with that block now.
            processBlock(block);
        }
    } catch (IOException e) {
        // decide how to report a failure of the process pipe
        e.printStackTrace();
    }
}
I'm not 100% sure of this, but I think one cannot trust that available() will return a guaranteed lower bound on the number of bytes you can retrieve without blocking, nor that the next read operation will actually return that number of bytes if so requested; that is why the code above checks the actual number of bytes read (actuallyAvailable).

Reading Really big Files With Java

I am reading a 77 MB file inside a servlet; in the future this will be 150 GB. This file is not written using any kind of nio package thing, it is just written using BufferedWriter.
Now this is what I need to do.
Read the file line by line. Each line is a "hash code" of a text. Separate it into pieces of 3 chars (3 chars represent 1 word). It could be long, it could be short, I don't know.
After reading the line, convert it into real words. We have a Map of words and hashes, so we can find the words.
Up to now, I have used BufferedReader to read the file. It is slow and not good for huge files like 150 GB. It took hours to complete the entire process even for this 77 MB file. Because we can't keep the user waiting for hours, it should be done within seconds. So we decided to load the file into memory. First we thought about loading every single line into a LinkedList, so the memory could hold it. But you know, memory cannot hold such a big amount. After a big search, I decided mapping the file into memory would be the answer. Memory is much faster than disk, so we could read the file super fast too.
Code:
public class MapRead {

    public MapRead() {
        try {
            File file = new File("E:/Amazon HashFile/Hash.txt");
            FileChannel c = new RandomAccessFile(file, "r").getChannel();
            MappedByteBuffer buffer = c.map(FileChannel.MapMode.READ_ONLY, 0, c.size()).load();
            for (int i = 0; i < buffer.limit(); i++) {
                System.out.println((char) buffer.get());
            }
            System.out.println(buffer.isLoaded());
            System.out.println(buffer.capacity());
        } catch (IOException ex) {
            Logger.getLogger(MapRead.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
But I could not see any "super fast" thing. And I need it line by line. I have a few questions to ask.
You read my description and you know what I need to do. I have done the first step for that, so is it correct?
Is the way I map correct? I mean, is it any different from reading it the normal way? Does it hold the "entire" file in memory first (using the mapping technique)? Then do we have to write other code to access that memory?
How do I read line by line, super "fast"? (If I have to load/map the entire file into memory first for hours and then access it at super speed in seconds, I am totally fine with that too.)
Is reading files in servlets good? (Because it is accessed by a number of people, and only one IO stream will be opened at once. In this case the servlet will be accessed by thousands at once.)
Update
This is how my code looks after I updated it following SO user Luiggi Mendoza's answer.
public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor(BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ((line = linesToProcess.take()) != null) {
                System.out.println(line); //This is not happening
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
public class BigFileReader implements Runnable {

    private final String fileName;
    int a = 0;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        try {
            //Scanner do not work. I had to use BufferedReader
            BufferedReader br = new BufferedReader(new FileReader(new File("E:/Amazon HashFile/Hash.txt")));
            String str = "";
            while ((str = br.readLine()) != null) {
                // System.out.println(a);
                a++;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        BigFileWholeProcessor b = new BigFileWholeProcessor();
        b.processFile("E:/Amazon HashFile/Hash.txt");
    }
}
I am trying to print the file in BigFileProcessor. What I understood is this:
The user enters a file name.
That file gets read by BigFileReader, line by line.
After each line, BigFileProcessor gets called. That means: assume BigFileReader reads the first line. Now BigFileProcessor is called. Once BigFileProcessor completes the processing for that line, BigFileReader reads line 2. Then BigFileProcessor gets called again for that line, and so on.
Maybe my understanding of this code is incorrect. How should I process the line anyway?
I would suggest using multiple threads here:
One thread will take care of reading every line of the file and inserting it into a BlockingQueue in order to be processed.
Other thread(s) will take the elements from this queue and process them.
To implement this multi-threaded work, it would be better to use the ExecutorService interface and pass Runnable instances, each of which implements one of the tasks. Remember to have only a single task reading the file.
You could also manage a way to stop reading if the queue reaches a specific size, e.g. if the queue has 10000 elements then wait until its size is down to 8000, then continue reading and filling the queue.
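If a simple cap on queue size is enough (rather than the exact 10000/8000 hysteresis described above), a bounded queue gives you that for free, since put() blocks the reading thread once the queue is full:
// The reader's put() blocks once 10,000 lines are waiting,
// so the file is only read as fast as it is being processed.
BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>(10000);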
Reading files in Servlets is good ?
I would recommend never doing heavy work in a servlet. Instead, fire an asynchronous task, e.g. via a JMS call, and process your file in that external agent.
A brief sample of the above explanation to solve the problem:
public class BigFileReader implements Runnable {

    private final String fileName;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        //since it is a sample, I avoid managing how many lines you have read
        //and that stuff, but it should not be complicated to accomplish
        try {
            Scanner scanner = new Scanner(new File(fileName));
            while (scanner.hasNext()) {
                try {
                    linesRead.put(scanner.nextLine());
                } catch (InterruptedException ie) {
                    //handle the exception...
                    ie.printStackTrace();
                }
            }
            scanner.close();
        } catch (FileNotFoundException fnfe) {
            //handle the exception...
            fnfe.printStackTrace();
        }
    }
}
public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor(BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ((line = linesToProcess.take()) != null) {
                //do what you want/need to process this line...
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
NIO won't help you here. BufferedReader is not slow. If you're I/O bound, you're I/O bound -- get faster I/O.
Mapping the file in to memory can help, but only if you're actually using the memory in place, rather than just copying all of the data out of the big byte array that you get back. The primary advantage of mapping the file is that it keeps the data out of the java heap, and away from the garbage collector.
Your best performance will come from working on the data in place, and not copying it in to the heap if you can.
Some of your performance may be impacted by the object creation. For example, if you were trying to load your data in to the LinkedList, you're creating (likely) millions of nodes for the List itself, plus the object surrounding your data (even if they're just strings).
Creating Strings based on your memory mapped array can be quite efficient, as the String will simply wrap the data, not copy it. But you'll have to be UTF aware if you're working with something other than ASCII (as bytes are not characters in Java).
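For instance, line-by-line access over the mapped buffer could look roughly like this (my own sketch, assuming a single-byte encoding such as ASCII; it copies only one line at a time into the heap and hands it to the caller instead of collecting everything):
import java.nio.MappedByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class MappedLines {
    // Walks the mapped region and emits one String per '\n'-terminated line.
    static void forEachLine(MappedByteBuffer buffer, Consumer<String> action) {
        int lineStart = 0;
        for (int i = 0; i < buffer.limit(); i++) {
            if (buffer.get(i) == '\n') {
                byte[] lineBytes = new byte[i - lineStart];
                buffer.position(lineStart);
                buffer.get(lineBytes); // copy just this line out of the mapping
                action.accept(new String(lineBytes, StandardCharsets.US_ASCII));
                lineStart = i + 1;
            }
        }
    }
}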
Also if you're loading in large things, with lots of objects, ensure that you have free space in your heap for them. And by free space, I mean actual room. You can have a 500MB heap, as specified by -Xmx, but the ACTUAL heap will not be that large initially, it will grow to that limit.
Assuming you have sufficient memory in the first place, you can do this via -Xms, which will pre-allocate the heap to a desired size, or you can simply do a quick byte[] buf = new byte[400 * 1024 * 1024], to make a huge allocation, force the GC, and stretch the heap.
What you don't want to be doing is allocating a million objects and have the VM GC every 10000 or so as it grows. Pre-allocating other data structures is also helpful (notably ArrayLists, LinkedLists not so much).
Divide the file into smaller parts. For this you'll need to have access to seekable reads, so you can fast-forward to other parts of the file.
For each part, spawn multiple worker threads, each with its own copy of the hash lookup table. Let completed threads join a collector thread, which will write completed chunks in order and signal the completion of the processing.
It will be better to stream file chunks rather than loading all of them into memory.
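A rough sketch of one such worker (names are mine; in practice chunk boundaries would still have to be aligned to line breaks, and results handed to the collector rather than printed):
import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkWorker implements Runnable {
    private final String fileName;
    private final long start;
    private final long length;

    ChunkWorker(String fileName, long start, long length) {
        this.fileName = fileName;
        this.start = start;
        this.length = length;
    }

    public void run() {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(start);                       // fast-forward to this worker's part of the file
            byte[] chunk = new byte[(int) length]; // one chunk, not the whole file
            raf.readFully(chunk);
            // decode lines from chunk and look them up in this worker's own hash table
        } catch (IOException e) {
            e.printStackTrace();                   // report to the collector in real code
        }
    }
}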

Most efficient way to create InputStream from OutputStream

This page: http://blog.ostermiller.org/convert-java-outputstream-inputstream
describes how to create an InputStream from an OutputStream:
new ByteArrayInputStream(out.toByteArray())
Other alternatives are to use piped streams and new threads, which is cumbersome.
I do not like the idea of copying many megabytes to a new in-memory byte array.
Is there a library that does this more efficiently?
EDIT:
Following the advice from Laurence Gonsalves, I tried PipedStreams and it turned out they are not that hard to deal with.
Here's the sample code in Clojure:
(defn #^PipedInputStream create-pdf-stream [pdf-info]
  (let [in-stream (new PipedInputStream)
        out-stream (PipedOutputStream. in-stream)]
    (.start (Thread. #(;; Here you write into out-stream
                       )))
    in-stream))
If you don't want to copy all of the data into an in-memory buffer all at once, then you're going to have to have the code that uses the OutputStream (the producer) and the code that uses the InputStream (the consumer) either alternate in the same thread, or operate concurrently in two separate threads. Having them operate in the same thread is probably much more complicated than using two separate threads, is much more error-prone (you'll need to make sure that the consumer never blocks waiting for input, or you'll effectively deadlock), and would necessitate having the producer and consumer running in the same loop, which seems way too tightly coupled.
So use a second thread. It really isn't that complicated. The page you linked to has a reasonable example. Here's a somewhat modernized version, which also closes the streams:
try (PipedInputStream in = new PipedInputStream()) {
    new Thread(() -> {
        try (PipedOutputStream out = new PipedOutputStream(in)) {
            writeDataToOutputStream(out);
        } catch (IOException iox) {
            // handle IOExceptions
        }
    }).start();
    processDataFromInputStream(in);
}
There is another open source library called EasyStream that deals with pipes and threads in a transparent way.
That isn't really complicated if everything goes well. Problems arise when (looking at Laurence Gonsalves's example)
class1.putDataOnOutputStream(out);
throws an exception.
In that example the thread simply completes and the exception is lost, while the outer InputStream might be truncated.
EasyStream deals with exception propagation and other nasty problems I've been debugging for about one year. (I'm the maintainer of the library: obviously my solution is the best one ;) )
Here is an example on how to use it:
final InputStreamFromOutputStream<String> isos = new InputStreamFromOutputStream<String>() {
    @Override
    public String produce(final OutputStream dataSink) throws Exception {
        /*
         * Call your application function that produces the data here.
         * WARNING: we're in another thread here, so this method shouldn't
         * write any class field or make assumptions on the state of the outer class.
         */
        return produceMydata(dataSink);
    }
};
There is also a nice introduction where all the other ways to convert an OutputStream into an InputStream are explained. It's worth a look.
A simple solution that avoids copying the buffer is to create a special-purpose ByteArrayOutputStream:
public class CopyStream extends ByteArrayOutputStream {

    public CopyStream(int size) {
        super(size);
    }

    /**
     * Get an input stream based on the contents of this output stream.
     * Do not use the output stream after calling this method.
     * @return an {@link InputStream}
     */
    public InputStream toInputStream() {
        return new ByteArrayInputStream(this.buf, 0, this.count);
    }
}
Write to the above output stream as needed, then call toInputStream to obtain an input stream over the underlying buffer. Consider the output stream as closed after that point.
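Usage then looks something like this (a small illustration of the pattern above):
byte[] data = "example".getBytes(StandardCharsets.UTF_8);
CopyStream out = new CopyStream(data.length);
out.write(data, 0, data.length);      // produce into the stream as usual
InputStream in = out.toInputStream(); // reads the same backing buffer, no copy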
I think the best way to connect an InputStream to an OutputStream is through piped streams, available in the java.io package, as follows:
// 1 - Define the stream buffer size
private static final int PIPE_BUFFER = 2048;

// 2 - Create a PipedInputStream with that buffer
public PipedInputStream inPipe = new PipedInputStream(PIPE_BUFFER);

// 3 - Create a PipedOutputStream and bind it to the PipedInputStream object
public PipedOutputStream outPipe = new PipedOutputStream(inPipe);

// 4 - PipedOutputStream is an OutputStream, so you can write data to it
//     in any way suitable to your data, for example:
while (condition) {
    outPipe.write(mByte);
}

/* Congratulations :D. Step 4 will write data to the PipedOutputStream,
   which is bound to the PipedInputStream, so after filling the buffer
   this data is available in the inPipe object. Start reading it to
   clear the buffer so it can be filled again by the PipedOutputStream. */
In my opinion there are two main advantages of this approach:
1 - There is no additional consumption of memory except for the buffer.
2 - You don't need to handle data queuing manually
I usually try to avoid creating a separate thread because of the increased chance of deadlock, the increased difficulty of understanding the code, and the problems of dealing with exceptions.
Here's my proposed solution: a ProducerInputStream that creates content in chunks by repeated calls to produceChunk():
public abstract class ProducerInputStream extends InputStream {

    private ByteArrayInputStream bin = new ByteArrayInputStream(new byte[0]);
    private ByteArrayOutputStream bout = new ByteArrayOutputStream();

    @Override
    public int read() throws IOException {
        int result = bin.read();
        while ((result == -1) && newChunk()) {
            result = bin.read();
        }
        return result;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int result = bin.read(b, off, len);
        while ((result == -1) && newChunk()) {
            result = bin.read(b, off, len);
        }
        return result;
    }

    private boolean newChunk() {
        bout.reset();
        produceChunk(bout);
        bin = new ByteArrayInputStream(bout.toByteArray());
        return (bout.size() > 0);
    }

    public abstract void produceChunk(OutputStream out);
}
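A hypothetical subclass might produce, say, a fixed number of text lines, one chunk per line (this example is mine, not from the answer; the empty chunk is what signals end-of-stream to the base class):
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LineProducerStream extends ProducerInputStream {
    private int line = 0;

    @Override
    public void produceChunk(OutputStream out) {
        if (line >= 10) {
            return; // empty chunk => newChunk() returns false => read() reports EOF
        }
        try {
            out.write(("line " + line++ + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e); // a ByteArrayOutputStream won't actually throw
        }
    }
}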

RAII in Java... is resource disposal always so ugly?

I just played with the Java file system API, and came up with the following function, used to copy binary files. The original source came from the Web, but I added try/catch/finally clauses to be sure that, should something go wrong, the buffered streams would be closed (and thus my OS resources freed) before quitting the function.
I trimmed down the function to show the pattern:
public static void copyFile(FileOutputStream oDStream, FileInputStream oSStream) throws etc...
{
    BufferedInputStream oSBuffer = new BufferedInputStream(oSStream, 4096);
    BufferedOutputStream oDBuffer = new BufferedOutputStream(oDStream, 4096);
    try
    {
        try
        {
            int c;
            while ((c = oSBuffer.read()) != -1) // could throw an IOException
            {
                oDBuffer.write(c); // could throw an IOException
            }
        }
        finally
        {
            oDBuffer.close(); // could throw an IOException
        }
    }
    finally
    {
        oSBuffer.close(); // could throw an IOException
    }
}
As far as I understand it, I cannot put the two close() calls in the same finally clause, because the first close() could well throw, and then the second would not be executed.
I know C# has the Dispose pattern that would have handled this with the using keyword.
I know even better that the equivalent C++ code would have been something like this (using a Java-like API):
void copyFile(FileOutputStream & oDStream, FileInputStream & oSStream)
{
    BufferedInputStream oSBuffer(oSStream, 4096);
    BufferedOutputStream oDBuffer(oDStream, 4096);

    int c;
    while ((c = oSBuffer.read()) != -1) // could throw an IOException
    {
        oDBuffer.write(c); // could throw an IOException
    }

    // I don't care about resources, as RAII handles them for me
}
Am I missing something, or do I really have to produce ugly and bloated code in Java just to handle exceptions in the close() method of a buffered stream?
(Please, tell me I'm wrong somewhere...)
EDIT: Is it me, or when updating this page, did I see both the question and all the answers decrease by one point within a couple of minutes? Is someone enjoying himself a little too much while remaining anonymous?
EDIT 2: McDowell offered a very interesting link I felt I had to mention here:
http://illegalargumentexception.blogspot.com/2008/10/java-how-not-to-make-mess-of-stream.html
EDIT 3: Following McDowell's link, I stumbled upon a proposal for Java 7 of a pattern similar to the C# using pattern: http://tech.puredanger.com/java7/#resourceblock . My problem is explicitly described there. Apparently, even with the Java 7 proposal, the problems remain.
The try/finally pattern is the correct way to handle streams in most cases for Java 6 and lower.
Some are advocating silently closing streams. Be careful doing this for these reasons: Java: how not to make a mess of stream handling
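For reference, the pre-Java-7 try/finally shape referred to above looks roughly like this (a sketch using java.io types; see the linked article for the caveats about a failing close() masking the original exception):
static void copy(File source, File target) throws IOException {
    InputStream in = new FileInputStream(source);
    try {
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close(); // runs even if the copy loop throws
        }
    } finally {
        in.close(); // runs even if opening or closing the output throws
    }
}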
Java 7 introduces try-with-resources:
/** transcodes text file from one encoding to another */
public static void transcode(File source, Charset srcEncoding,
                             File target, Charset tgtEncoding)
        throws IOException {
    try (InputStream in = new FileInputStream(source);
         Reader reader = new InputStreamReader(in, srcEncoding);
         OutputStream out = new FileOutputStream(target);
         Writer writer = new OutputStreamWriter(out, tgtEncoding)) {
        char[] buffer = new char[1024];
        int r;
        while ((r = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, r);
        }
    }
}
AutoCloseable types will be automatically closed:
public class Foo {
    public static void main(String[] args) {
        class CloseTest implements AutoCloseable {
            public void close() {
                System.out.println("Close");
            }
        }
        try (CloseTest closeable = new CloseTest()) {}
    }
}
There are issues, but the code you found lying about on the web is really poor.
Closing the buffered streams closes the streams underneath. You really don't want to do that. All you want to do is flush the output stream. Also, there's no point in specifying that the underlying streams are for files. Performance suffers because you are copying one byte at a time (actually, if you use java.nio you can use transferTo/transferFrom, which is a bit faster still). While we are about it, the variable names suck too. So:
public static void copy(
    InputStream in, OutputStream out
) throws IOException {
    byte[] buff = new byte[8192];
    for (;;) {
        int len = in.read(buff);
        if (len == -1) {
            break;
        }
        out.write(buff, 0, len);
    }
}
If you find yourself using try-finally a lot, then you can factor it out with the "execute around" idiom.
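A sketch of that "execute around" idiom (the interface and method names here are illustrative, not from the answer):
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// The try/finally lives in one place; callers only supply the body.
interface InputStreamAction {
    void use(InputStream in) throws IOException;
}

final class Streams {
    static void withInput(File file, InputStreamAction action) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
            action.use(in);
        } finally {
            in.close();
        }
    }
}
A caller then writes Streams.withInput(file, in -> { ... }) (or an anonymous class pre-Java 8) and never touches close() directly.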
In my opinion: Java should have some way of closing resources at the end of a scope. I suggest adding private as a unary postfix operator to close at the end of the enclosing block.
Unfortunately, this type of code tends to get a bit bloated in Java.
By the way, if one of the calls to oSBuffer.read or oDBuffer.write throws an exception, then you probably want to let that exception permeate up the call hierarchy.
Having an unguarded call to close() inside a finally-clause will cause the original exception to be replaced by one produced by the close()-call. In other words, a failing close()-method may hide the original exception produced by read() or write(). So, I think you want to ignore exceptions thrown by close() if and only if the other methods did not throw.
I usually solve this by including an explicit close-call, inside the inner try:
try {
    while (...) {
        read...
        write...
    }
    oSBuffer.close(); // exception NOT ignored here
    oDBuffer.close(); // exception NOT ignored here
} finally {
    silentClose(oSBuffer); // exception ignored here
    silentClose(oDBuffer); // exception ignored here
}

static void silentClose(Closeable c) {
    try {
        c.close();
    } catch (IOException ie) {
        // Ignored; caller must have this intention
    }
}
Finally, for performance, the code should probably work with buffers (multiple bytes per read/write). Can't back that by numbers, but fewer calls should be more efficient than adding buffered streams on top.
Yes, that's how Java works. There is an inversion of control: the user of the object has to know how to clean up the object, instead of the object cleaning up after itself. This unfortunately leads to a lot of cleanup code scattered throughout your Java code.
C# has the "using" keyword to automatically call Dispose when an object goes out of scope. Java has no such thing.
For common IO tasks such as copying a file, code such as that shown above is reinventing the wheel. Unfortunately, the JDK doesn't provide any higher level utilities, but apache commons-io does.
For example, FileUtils contains various utility methods for working with files and directories (including copying). On the other hand, if you really need to use the IO support in the JDK, IOUtils contains a set of closeQuietly() methods that close Readers, Writers, Streams, etc. without throwing exceptions.
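For example (a small sketch using commons-io; FileUtils.copyFile, FileUtils.openInputStream and IOUtils.closeQuietly are all part of that library):
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

public class CommonsIoExample {
    public static void main(String[] args) throws IOException {
        // One-line whole-file copy instead of a hand-rolled loop.
        FileUtils.copyFile(new File("in.dat"), new File("out.dat"));

        // closeQuietly() swallows any IOException thrown by close().
        InputStream in = FileUtils.openInputStream(new File("in.dat"));
        try {
            // ... read from in ...
        } finally {
            IOUtils.closeQuietly(in);
        }
    }
}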
