I want to understand the conceptual difference between channels and streams, as I keep seeing classes such as SocketChannel and FileChannel alongside the Socket and File I/O streams.
As far as I know, I/O streams must be accessed sequentially, i.e. they are a sequence of bytes that can be read from and written to. You can also use a buffered stream to increase the efficiency of I/O.
So, compared to Streams, are "Channels" a totally new concept or just a wrapper over Streams?
If we say "a stream is a sequence of bytes", then what is a channel in that sense, if the two are different?
Neither. Channels are not wrappers around streams (unless you explicitly wrap a stream via Channels.newChannel(InputStream) or Channels.newChannel(OutputStream)) and they are not a “totally new concept”.
Depending on the particular type, a channel still represents a sequence of bytes that may be read or written sequentially. The fact that you can translate between these APIs via factory methods in the Channels class shows that there is a relationship.
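To illustrate the relationship, here is a minimal sketch of bridging the two APIs with the Channels factory methods. The class and method names (ChannelBridgeSketch, copyViaChannels) and the tiny buffer size are made up for the example; only Channels.newChannel and the channel/buffer calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper class, only for illustration.
final class ChannelBridgeSketch {

    // Copies the given bytes from a stream-backed channel into another stream-backed channel.
    static byte[] copyViaChannels(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(data));
             WritableByteChannel out = Channels.newChannel(sink)) {
            ByteBuffer buf = ByteBuffer.allocate(8); // deliberately small to force several rounds
            while (in.read(buf) != -1) {
                buf.flip();                  // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    out.write(buf);          // still a sequential byte sequence underneath
                }
                buf.clear();                 // make the buffer ready for the next read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

The data still flows sequentially from source to sink; only the API shape (buffers instead of byte arrays, interfaces instead of abstract classes) differs.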
But the NIO API addresses certain design issues which could not be fixed by refactoring the old stream classes (in a compatible way). E.g. the base types are interfaces now, which allows certain channels to implement multiple types, like ReadableByteChannel and WritableByteChannel at the same time. Further, there is no method for reading a single byte, which is a good way to get rid of the “You can use BufferedStream to increase efficiency” myth. If an insufficient buffer size is the cause of an I/O performance bottleneck, you solve it by providing a larger buffer in the first place, rather than wrapping a stream or channel into another, forcing it to copy all data between buffers. Consequently, there is no BufferedChannel.
Certain implementations like FileChannel offer additional methods allowing random access to the underlying resource, in addition to the sequential access. That way, you can use a uniform interface, instead of dealing with entirely different APIs, like with the RandomAccessFile/InputStream/OutputStream relationship.
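As a rough sketch of that random access, the following reads a slice from the middle of a file using the positional read(ByteBuffer, long) method of FileChannel. The class name, the temp file and the ASCII decoding are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, only for illustration.
final class FileChannelPositionSketch {

    // Writes content to a temp file, then reads `length` bytes starting at `offset`.
    static String readAt(byte[] content, long offset, int length) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("channel-sketch", ".bin");
            Files.write(tmp, content);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(length);
                long pos = offset;
                while (buf.hasRemaining()) {
                    // Absolute read: does not move the channel's own position.
                    int n = ch.read(buf, pos);
                    if (n < 0) {
                        break; // reached end of file before the buffer was full
                    }
                    pos += n;
                }
                buf.flip();
                return StandardCharsets.US_ASCII.decode(buf).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.toFile().delete(); // best-effort cleanup
            }
        }
    }
}
```

The same channel could also be read sequentially with read(ByteBuffer), which is the uniform interface mentioned above.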
Further, a lot of previously missing I/O features were added when NIO was introduced. Some of them could have been implemented atop the old classes without problems, but the designers clearly favored using the new API for all of them, where these features could be considered in the design right from the start.
But generally, as said, a channel isn’t an entirely new concept compared to streams.
Related
I wanted to pipe an OutputStream to an InputStream such that every time I write to my OutputStream those bytes become available in my InputStream.
Reading the JDK documentation I found PipedInputStream and PipedOutputStream, which seemed a good alternative, e.g.
PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);
However, the documentation explicitly states the two streams must be used in separate threads.
Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.
Is there another easy way to pipe these two streams to run in the same thread using some other features available in Java?
I suppose the major issue here is buffering the data being written to the output, since while we're writing, those bytes get buffered somewhere to be consumed later by the input reading part.
However I'm working with discrete amounts of data, of the kind that would easily fit in memory. So for me buffering a few bytes is not a big concern. I'm more interested in finding a simple pattern to do this.
That being the case, I thought I could easily do this manually by writing everything to a ByteArrayOutputStream, then getting the bytes from it and reading them again through a ByteArrayInputStream.
However this piping scenario seems such a natural use case that I was wondering if there's another simpler way to pipe two streams in a single-threaded application, e.g.
output.pipe(input);
message.writeTo(output);
process(input);
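For reference, the manual ByteArrayOutputStream/ByteArrayInputStream approach described above could look roughly like this. It is a sketch that assumes the whole payload fits in memory and Java 9 or later (for readAllBytes); the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
final class InMemoryPipeSketch {

    // Writes the message to an in-memory output, then reads it back in the same thread.
    static String roundTrip(String message) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        output.write(payload, 0, payload.length); // stands in for message.writeTo(output)

        // toByteArray() copies the buffered bytes; the reader sees a snapshot.
        try (InputStream input = new ByteArrayInputStream(output.toByteArray())) {
            return new String(input.readAllBytes(), StandardCharsets.UTF_8); // stands in for process(input)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because all writing finishes before reading starts, there is no blocking and therefore no risk of the single-thread deadlock that the piped streams warn about.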
I'm currently using Java sockets in a client-server application with OutputStream and not BufferedOutputStream (and the same for input streams).
The client and server exchange serialized objects (writeObject() method).
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
And when do I have to flush, or should I not call flush() at all?
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
Actually, it probably doesn't make sense (see note 1).
The object stream implementation internally wraps the stream it has been given with a private class called BlockDataOutputStream that does buffering. If you wrap the stream yourself, you will have two levels of buffering ... which is likely to make performance worse (see note 2).
And when do I have to flush, or should I not call flush() at all?
Yes, flushing is probably necessary. But there is no universal answer as to when to do it.
On the one hand, if you flush too often, you generate extra network traffic.
On the other hand, if you don't flush when it is needed, the server can stall waiting for an object that the client has written but not flushed.
You need to find the compromise between these two syndromes ... and that depends on your application's client/server interaction patterns; e.g. whether the message patterns are synchronous (e.g. message/response) or asynchronous (e.g. message streaming).
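For the synchronous message/response case, one possible pattern is to flush once per request, right before waiting for the reply. The sketch below simulates the wire with in-memory streams; the class RequestFlushSketch and its methods are hypothetical, and a real client would pass socket.getOutputStream() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper class, only for illustration.
final class RequestFlushSketch {
    private final ObjectOutputStream out;

    RequestFlushSketch(OutputStream raw) {
        try {
            out = new ObjectOutputStream(raw);
            // Push the stream header out so a peer's ObjectInputStream constructor does not block.
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One flush per request: the peer can read the whole message before we wait for a reply.
    void send(Serializable request) {
        try {
            out.writeObject(request);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates the receiving side by decoding what reached the in-memory "wire".
    static Object roundTrip(Serializable request) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        new RequestFlushSketch(wire).send(request);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For asynchronous streaming, the flush in send() could instead be moved to the end of a batch, at the cost of delaying delivery of the earlier messages.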
1 - To be certain on this, you would need to do some forensic testing to 1) measure the system performance, and 2) determine what syscalls are made and when network packets are sent. For a general answer, you would need to repeat this for a number of use-cases. I'd also recommend looking at the Java library code yourself to confirm my (brief) reading.
2 - Probably only a little bit worse, but a well designed benchmark would pick up a small performance difference.
UPDATE
After writing the above, I found this Q&A - Performance issue using Javas Object streams with Sockets - which seems to suggest that using BufferedInputStream / BufferedOutputStream helps. However, I'm not certain whether the performance improvement that was reported is 1) real (i.e. not a warmup artefact) and 2) due to the buffering. It could be just due to adding the flush() call. (Why: because the flush could cause the network stack to push the data sooner.)
I think these links might help you:
What is the purpose of flush() in Java streams?
The flush method flushes the output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
How java.io.Buffer* stream differs from normal streams?
Internally a buffer array is used and instead of reading bytes individually from the underlying input stream enough bytes are read to fill the buffer. This generally results in faster performance as less reads are required on the underlying input stream.
http://www.oracle.com/technetwork/articles/javase/perftuning-137844.html
As a means of starting the discussion, here are some basic rules on how to speed up I/O: 1. Avoid accessing the disk. 2. Avoid accessing the underlying operating system. 3. Avoid method calls. 4. Avoid processing bytes and characters individually.
So using buffered streams usually speeds up the I/O process, as fewer read() calls are made on the underlying stream in the background.
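To make the "fewer reads" claim concrete, here is a small sketch that counts how many read calls reach the underlying stream when a consumer reads one byte at a time, with and without a BufferedInputStream. CountingSource and underlyingReads are made-up names, and the in-memory source stands in for a file or socket.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, only for illustration.
final class BufferedReadCountSketch {

    // Counts every read call that reaches the wrapped stream.
    static final class CountingSource extends FilterInputStream {
        int readCalls;

        CountingSource(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            readCalls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            readCalls++;
            return super.read(b, off, len);
        }
    }

    // Reads `size` bytes one at a time and reports how many calls hit the source.
    static int underlyingReads(int size, boolean buffered) {
        CountingSource source = new CountingSource(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // consume byte by byte, as naive code would
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.readCalls;
    }
}
```

With a real file or socket behind the source, each of those underlying calls would typically be a system call, which is where the savings come from. Code that already reads into a large byte array gains little from the extra wrapper.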
I use MappedByteBuffers to achieve thread safety between readers and writers of a file via volatile variables (the writer updates its position and readers read the writer's position). This is a file upload system and the incoming file is a stream, if that matters. There are more tricks, obviously (sparse files, power-of-two mapping growth), but it all boils down to that.
I can't find a faster way to write to a file while concurrently reading the same file without caching it completely in memory (which I cannot do due to its sheer size).
Is there any other method of IO that guarantees visibility within the same process for readers to written bytes? MappedByteBuffer makes its guarantees, indirectly, via the Java Memory Model, and I'd expect any other solution to do the same (read: non platform specific and more).
Is this the fastest way? Am I missing something in the docs?
I did some tests quite a few years ago on what was then decent hardware, and MappedByteBuffer was about 20% faster than any other I/O technique. It does have the disadvantage for writing that you need to know the file size in advance.
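A minimal sketch of writing and reading through mappings of the same file, assuming the size is known up front as noted above. The names are invented for the example, and it does not attempt the volatile-position handshake between threads from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, only for illustration.
final class MappedWriteSketch {

    // Writes data through a READ_WRITE mapping and reads it back through a READ_ONLY mapping.
    static byte[] writeThenRead(byte[] data) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("mapped-sketch", ".bin");
            // Pre-size the file: the mapped region must be known in advance.
            Files.write(tmp, new byte[data.length]);
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer writer = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
                writer.put(data);
                writer.force(); // ask the OS to write the mapped pages to the file

                MappedByteBuffer reader = ch.map(FileChannel.MapMode.READ_ONLY, 0, data.length);
                byte[] back = new byte[data.length];
                reader.get(back);
                return back;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.toFile().delete(); // best-effort; may fail while a mapping is alive on some platforms
            }
        }
    }
}
```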
Which of the two would be the best choice and in which circumstance?
Clearly there is no sense in using a file channel for a very small file. Besides that, what are the pros and cons of the two input/output means?
Thanks a lot in advance.
FileChannel has many features missing in java.io: it is interruptible, it can move its position within the file, it can lock a file, etc. It can also be faster than old I/O, especially when it uses direct byte buffers. Here is an explanation from the ByteBuffer API:
A byte buffer is either direct or non-direct. Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations.
If you need none of the above features, go with streams; you'll get shorter code.
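As a sketch of the direct buffer usage, the following reads a file sequentially through a FileChannel into a buffer from ByteBuffer.allocateDirect and counts the bytes. The 64 KiB buffer size, the class name and the demo helper are arbitrary choices for the example; whether this is actually faster than a stream depends on the platform and workload.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, only for illustration.
final class DirectBufferReadSketch {

    // Reads the whole file through a direct buffer and returns the number of bytes seen.
    static long countBytes(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // allocated outside the Java heap
            long total = 0;
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // a real consumer would flip() and process the bytes first
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temp file of the given size and counts its bytes.
    static long demo(int size) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("direct-sketch", ".bin");
            Files.write(tmp, new byte[size]);
            return countBytes(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.toFile().delete(); // best-effort cleanup
            }
        }
    }
}
```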
Is it more efficient to flush the OutputStream after each individual invocation of ObjectOutputStream#writeObject rather than flushing the stream after a sequence of object writes? (Example: write object and flush 4 times, or write 4 times and then just flush once?)
How does ObjectOutputStream work internally?
Is it somehow better sending four Object[5] (flushing each one) than a Object[20], for example?
It is not better. In fact it is probably worse, from a performance perspective. Each of those flushes will force the OS-level TCP/IP stack to send the data "right now". If you just do one flush at the end, you should save on system calls, and on network traffic.
If you haven't done this already, inserting a BufferedOutputStream between the Socket OutputStream and the ObjectOutputStream will make a much bigger difference to performance. This allows the serialized data to accumulate in memory before being written to the socket stream. This potentially saves many system calls and could improve performance by orders of magnitude ... depending on the actual objects being sent.
(The representation of four Object[5] objects is larger than one Object[20] object, and that results in a performance hit in the first case. However, this is marginal at most, and tiny compared with the flushing and buffering issues.)
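To see the effect of flushing once versus flushing after every object, here is a sketch that counts the write calls reaching the bottom of an ObjectOutputStream / BufferedOutputStream chain. CountingSink is a made-up stand-in for the socket's output stream, so this only counts calls; it does not measure real network cost.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, only for illustration.
final class FlushCountSketch {

    // Stand-in for a socket stream: discards the data and counts write calls.
    static final class CountingSink extends OutputStream {
        int writeCalls;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    // Serializes `objects` small objects and reports how many writes reached the sink.
    static int underlyingWrites(int objects, boolean flushEach) {
        CountingSink sink = new CountingSink();
        try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(sink))) {
            for (int i = 0; i < objects; i++) {
                out.writeObject(Integer.valueOf(i));
                if (flushEach) {
                    out.flush(); // forces the buffered bytes down to the sink right now
                }
            }
            out.flush(); // final flush; does nothing extra if the buffer is already empty
            return sink.writeCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions (a handful of small objects that fit in the buffer), flushing once should reach the sink with a single write, while flushing each time produces one write per object. On a real socket each of those writes is a system call and possibly a separate packet.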
How does this stream work internally?
That is too general a question to answer sensibly. I suggest that you read up on serialization starting with the documents on this page.
No, it shouldn't matter, unless you have reason to believe the net link is likely to go down, and partial data is useful. Otherwise it just sounds like a way to make the code more complex for no reason.
If you look at the one and only public constructor of ObjectOutputStream, you note that it requires an underlying OutputStream for its instantiation.
When and how you flush your ObjectOutputStream depends entirely on the type of stream you are wrapping. (And in considering all this, do keep in mind that not all extensions of OutputStream are guaranteed to respect your request to flush. It is entirely implementation dependent, as spelled out in the 'contract' of the javadocs.)
But certainly we can reason about it and even pull up the code and see what is actually done.
If the underlying OutputStream must utilize the OS services for devices (such as the disk, or the network interface in the case of sockets), then the behavior of flush() is entirely OS dependent. For example, you may grab the output stream of a socket and then instantiate an ObjectOutputStream to write serialized objects to the net. In that case the TCP/IP implementation of the host OS is in charge.
What is more efficient?
Well, if your object stream is wrapping a ByteArrayOutputStream, you are potentially looking at a series of reallocations and System.arraycopy() calls. I say potentially, since the ByteArrayOutputStream implementation doubles its buffer size on each internal resize, so it is very unlikely that writing n (small) objects and flushing each time will result in n reallocations. (Where n is assumed to be a reasonably small number.)
But if you are wrapping a network stream, you must keep in mind that network writes are very expensive. It makes much more sense, if your protocol allows it, to chunk your writes (to fill the send buffer) and just flush once.