Difference Between DataInputStream/DataOutputStream Class & InputStream/OutputStream Class

Whenever I use the HttpConnection class in Java ME, Android, or BlackBerry, I use the DataInputStream/DataOutputStream classes for reading and writing data to and from a remote server. However, there are other classes, such as InputStream/OutputStream, that can be used for the same purpose. I saw a question regarding the InputStream/OutputStream classes with HttpConnection, so I would like to know from the experts: what are the differences between the two?

A DataInputStream/DataOutputStream is an InputStream/OutputStream. InputStream and OutputStream are the most generic I/O streams you can use, and they are the base classes of all byte streams in Java. With them you can read and write raw bytes only. DataInputStream reads formatted binary data: instead of just unformatted bytes, you can read byte, int, double, float, and short values, strings in a modified UTF-8 encoding, and any mixture of that data. The same can be said for DataOutputStream, except that it writes these higher-level data types.
A DataInputStream/DataOutputStream holds a reference to an underlying InputStream/OutputStream. It reads raw bytes from (or writes them to) that stream and interprets them as the data types mentioned above.
Reading strings from a DataInputStream isn't a good idea, though, because it makes unchangeable assumptions about the character encoding of the underlying InputStream. It's better to use a Reader, which properly applies a character encoding to the underlying byte stream. That's why DataInputStream/DataOutputStream is of limited use. Typically it's better to exchange textual data between processes, because it's easiest to make a server and a client agree on how to parse it. Exchanging binary data requires a lot of bit twiddling to make sure each process is talking the same language. It's easy if you have two Java processes using DataInputStream/DataOutputStream, but if you ever want to add a new client that isn't written in Java, you'll have a harder time reusing the format. Not impossible, just harder.
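To make the layering described above concrete, here is a minimal sketch that wraps in-memory byte streams with DataOutputStream and DataInputStream. The class name DataStreamRoundTrip and the fields id, price, and name are made up for illustration; over HTTP you would wrap the connection's streams instead of the byte-array streams used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
final class DataStreamRoundTrip {

    // Encodes an int, a double and a string using DataOutputStream's fixed binary layout.
    static byte[] encode(int id, double price, String name) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(raw)) {
            out.writeInt(id);       // 4 bytes, big-endian
            out.writeDouble(price); // 8 bytes
            out.writeUTF(name);     // 2-byte length prefix, then modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    // Decodes the fields in exactly the order they were written.
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + " " + price + " " + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode(42, 9.5, "widget");
        // 4 + 8 + 2 + 6 = 20 bytes on the wire
        System.out.println(bytes.length + " bytes: " + decode(bytes));
    }
}
```

Note that the reader has to know the field order in advance; nothing in the byte stream describes it. That is the agreement problem the answer refers to when it recommends textual formats for non-Java clients.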

DataOutputStream can only handle basic types.
It can only write primitive types and Strings (and DataInputStream can only read them). DataInput/OutputStream generally performs better because it is much simpler.
ObjectInput/OutputStream can read/write any serializable object type as well as primitives. It is less efficient but much easier to use if you want to send complex data.
With the ObjectOutputStream class, instances of a class that implements Serializable can be written to the output stream, and can be read back with ObjectInputStream.
I would assume that Object*Stream is the best choice until you know that its performance is an issue.
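As a rough sketch of the ObjectOutputStream/ObjectInputStream round trip described above: the Customer class below is a hypothetical example type, and the byte-array streams stand in for whatever socket or file stream you would really use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
final class ObjectStreamRoundTrip {

    // Hypothetical value type; it only needs to implement Serializable.
    static final class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Writes any serializable object to a byte array.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(raw)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Customer copy = (Customer) deserialize(serialize(new Customer("Ada", 36)));
        System.out.println(copy.name + " " + copy.age);
    }
}
```

Compared with DataOutputStream, there is no field-by-field code to write, but the byte format is specific to Java serialization.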

DataOutputStream makes sure the data is formatted in a platform-independent way. This is its big benefit.
Use a plain OutputStream only if you transfer raw binary data.
There is no significant performance difference between the two.

Related

Java I/O stream definition and subtypes

I'm new to I/O in Java, and read in one of the posts on this site that:
All streams behave in the same manner, even if the actual physical devices to which they are linked differ. Thus, the same I/O classes and methods can be applied to any type of device
Quoted from: Stream definition
What I can't wrap my head around is how all streams (take the different byte stream subtypes, for example: BufferedInputStream, FilterInputStream, ObjectInputStream, etc.) can behave in the same manner and be connected to any physical device, when they are implemented as different classes that supposedly offer varying functionality and accommodate different sources/destinations. For example, can I use ObjectInputStream or FileOutputStream to read from and write to the console? Different streams, different devices, and all streams can be connected to all devices: I'm at a loss here.

The quote does not say that you can connect any stream to any device, as you are saying. There are different implementations of InputStream and OutputStream that connect to specific devices - for example, FileInputStream connects to a file on the filesystem, and ByteArrayInputStream connects to a byte array in memory.
The main idea that the quote is explaining is that all those different kinds of streams are all extensions of the classes InputStream and OutputStream, so that you can do all the common operations on streams using any of the specific kinds of streams, regardless of where the specific kind of stream reads or writes data from or to.
Some streams are wrappers around other streams, adding specific functionality. For example, BufferedOutputStream adds buffering to an underlying stream. This is often useful because for some streams, writing in blocks is more efficient than writing byte by byte - BufferedOutputStream collects bytes that you write into a buffer, which is then written to the underlying stream as one block. ObjectOutputStream is another wrapper, which adds the functionality to convert serializable Java objects to bytes which can be written to an underlying stream.
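A small sketch of both points made above, that code written against InputStream works with any concrete stream, and that a wrapper such as BufferedInputStream can be layered on top without the caller noticing. The class and method names are invented for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
final class AnyStreamDemo {

    // Only uses the common InputStream API, so it works with any kind of input stream.
    static int countBytes(InputStream in) {
        int count = 0;
        try {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5};

        // A stream connected to a byte array in memory.
        System.out.println(countBytes(new ByteArrayInputStream(data)));

        // The same source behind a buffering wrapper; countBytes is unchanged.
        System.out.println(countBytes(new BufferedInputStream(new ByteArrayInputStream(data))));
    }
}
```

Passing a FileInputStream or a socket's input stream to countBytes would work the same way; only the construction of the stream differs.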

You cannot use every stream for every device. According to the definition in your question (emphasis mine),
All streams behave in the same manner.
So you can use every stream in the same way, which means every stream has the same methods, since they all inherit from java.io.OutputStream or java.io.InputStream.
So it does not matter whether you want to write to the console, a file, or a network socket; you can, for example, always write a byte array to the device.
Nonetheless, there are different implementations which handle writing this byte array differently.

What's the difference between BufferedInputStream and java.nio.Buffer?

We can get a BufferedInputStream by decorating a FileInputStream. The Channel obtained from FileInputStream.getChannel() can also read content into a Buffer.
So what's the difference between BufferedInputStream and java.nio.Buffer? That is, when should I use BufferedInputStream, and when should I use java.nio.Buffer and java.nio.Channel?

Getting started with new I/O (NIO), an article excerpt:
A stream-oriented I/O system deals with data one byte at a time. An
input stream produces one byte of data, and an output stream consumes
one byte of data. It is very easy to create filters for streamed data.
It is also relatively simple to chain several filters together so that
each one does its part in what amounts to a single, sophisticated
processing mechanism. On the flip side, stream-oriented I/O is often
rather slow.
A block-oriented I/O system deals with data in blocks. Each operation
produces or consumes a block of data in one step. Processing data by
the block can be much faster than processing it by the (streamed)
byte. But block-oriented I/O lacks some of the elegance and simplicity
of stream-oriented I/O.

These classes were written at different times for different packages.
If you are working with classes in the java.io package, use BufferedInputStream.
If you are using java.nio, use ByteBuffer.
If you are not using either, you could use a plain byte[]. ByteBuffer has some useful methods for working with primitives, so you might use it for that.
It is unlikely there will be any confusion, because in general you will only use one when you have to, in which case only that one will compile.

I think we use BufferedInputStream to wrap an InputStream to make it work like block-oriented I/O. But when dealing with a lot of data, it actually takes more time than real block-oriented I/O (Channel), though it is still faster than the unwrapped InputStream.
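To compare the two styles discussed in this thread side by side, here is a sketch that sums the bytes of a file once through a BufferedInputStream and once through a FileChannel with an explicit ByteBuffer. The class name, the helper writeTempFile, and the 8192-byte buffer size are assumptions made for the example; it makes no claim about which variant is faster on a given system.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class StreamVersusChannel {

    // java.io style: the buffer is hidden inside BufferedInputStream.
    static long sumWithStream(Path file) {
        long sum = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                sum += b; // read() already returns 0..255
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    // java.nio style: the caller owns the ByteBuffer and fills it block by block.
    static long sumWithChannel(Path file) {
        long sum = 0;
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling to draining
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF; // get() returns a signed byte
                }
                buffer.clear(); // make the buffer ready for the next block
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    // Helper for the demo only: writes the bytes to a temporary file.
    static Path writeTempFile(byte[] data) {
        try {
            Path file = Files.createTempFile("stream-demo", ".bin");
            file.toFile().deleteOnExit();
            return Files.write(file, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile(new byte[] {1, 2, (byte) 200});
        // Both calls compute the same sum over the same file.
        System.out.println(sumWithStream(file));
        System.out.println(sumWithChannel(file));
    }
}
```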

What is meant by Streams w.r.t Java IO

I am having a hard time visualizing what exactly a stream means in terms of I/O. I imagine stream as a continuous flow of data coming from a file, socket or any other data source. Is that correct? But then I get confused about how our Java programs react to a stream, because when we write Java code such as:
Customer getCustomer(Customer customer)
Doesn't the above java code expects the whole object to be present before it gets processed?
Now let's say we are reading from a stream, something like
FileInputStream in = new FileInputStream("abc.txt");
in.read();
Doesn't in.read() expects the whole file to be present in memory to be processed. If it is, then how come it is a stream? Why do we call them streams? Do they process data as it is read?
A similar confusion when reading through hadoop streams, looks like they have a different meaning altogether.

The word "stream" is used for different things in different contexts. But you're specifically asking about streams in I/O, i.e. InputStream and OutputStream.
I imagine stream as a continuous flow of data coming from a file, socket or any other data source. Is that correct?
Yes, a stream is a source of a sequence of bytes, which may come from a file, socket etc.
About getCustomer: You need to have a Customer object to pass to that method. But calling methods, passing objects and getting objects returned really does not have anything to do with streams.
Doesn't in.read() expects the whole file to be present in memory to be processed.
No. FileInputStream is an object which represents the stream. It's the thing that knows how to read bytes from the file.
Streams are not a fundamentally special kind of object. It's not like that there are classes, objects and streams. Streams are just a concept that is implemented using the standard Java OO programming features (classes and objects).
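One way to see that read() does not need the whole source in memory is a stream that produces its bytes on demand. The sketch below uses a made-up CountingInputStream that claims to be ten billion bytes long, far more than would fit in a typical heap, and reads only the first five bytes from it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
final class LazyStreamDemo {

    // Hypothetical stream that generates `length` bytes on demand: 0, 1, 2, ... wrapping at 256.
    static final class CountingInputStream extends InputStream {
        private final long length;
        private long position = 0;

        CountingInputStream(long length) {
            this.length = length;
        }

        @Override
        public int read() {
            if (position >= length) {
                return -1; // end of stream
            }
            return (int) (position++ % 256);
        }
    }

    // Pulls at most n bytes from the stream, one at a time, and sums them.
    static long sumFirst(InputStream in, int n) {
        long sum = 0;
        try {
            for (int i = 0; i < n; i++) {
                int b = in.read();
                if (b == -1) {
                    break;
                }
                sum += b;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Only five bytes are ever produced: 0 + 1 + 2 + 3 + 4 = 10
        System.out.println(sumFirst(new CountingInputStream(10_000_000_000L), 5));
    }
}
```

A FileInputStream behaves analogously from the caller's point of view: each read() asks for the next byte or bytes, and the rest of the file stays on disk until it is requested.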

Doesn't the above java code expects the whole object to be present before it gets processed?
Yes. But it's not a stream.
Doesn't in.read() expects the whole file to be present in memory to be processed.
No.
If it is
It isn't.
then how come it is a stream?
It is.
Why do we call them streams? Do they process data as it is read?
Yes.
A similar confusion
There is no confusion here, except your own confusion when comparing method calls with I/O streams, which is comparing apples with oranges.
when reading through hadoop streams, looks like they have a different meaning altogether.
Very possibly.

An I/O Stream represents an input source or an output destination. A stream can represent many different kinds of sources and destinations, including disk files, devices, other programs, and memory arrays.
Streams support many different kinds of data, including simple bytes, primitive data types, localized characters, and objects. Some streams simply pass on data; others manipulate and transform the data in useful ways.
No matter how they work internally, all streams present the same simple model to programs that use them: A stream is a sequence of data.
In Java there are two kinds of I/O streams, byte streams and character streams; they differ in how the data is transferred between source and destination.
Hope this answers your question. Please let me know if you need any further information.

From here..
A Stream is a free flowing sequence of elements. They do not hold any storage as that responsibility lies with collections such as arrays, lists and sets. Every stream starts with a source of data, sets up a pipeline, processes the elements through a pipeline and finishes with a terminal operation. They allow us to parallelize the load that comes with heavy operations without having to write any parallel code. A new package java.util.stream was introduced in Java 8 to deal with this feature.
Streams adhere to the common Pipes and Filters software pattern. A pipeline is created with data events flowing through it, with various intermediate operations applied to individual events as they move through the pipeline. The stream is said to be terminated when the pipeline is closed off by a terminal operation. Please keep in mind that a stream's source is expected to remain unmodified: attempts to modify the underlying collection during the pipeline will typically raise a ConcurrentModificationException.
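For the java.util.stream sense of the word, a pipeline as described above (source, intermediate operations, terminal operation) might look like the following sketch. The class name, method name, and sample data are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of any library.
final class PipelineDemo {

    // Source: the list. Intermediate operations: filter, map, sorted. Terminal operation: collect.
    static List<String> shortNamesUpperCased(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 4)
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("bob", "alexander", "eve", "ada");
        System.out.println(shortNamesUpperCased(names)); // prints [ADA, BOB, EVE]
    }
}
```

This kind of stream is unrelated to InputStream and OutputStream; the two only share the name.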

How do I read a file without any buffering in Java?

I'm working through the problems in Programming Pearls, 2nd edition, Column 1. One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file. Since Java is the language I'm the most familiar with, I've decided to use it even though the author seems to have had C and C++ in mind.
Since I'm pretending memory is limited for the purpose of the problem I'm working on, I'd like to make sure the process of reading the file has no buffering at all.
I thought InputStreamReader would be a good solution, until I read this in the Java documentation:
To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
Ideally, only the bytes that are necessary would be read from the stream -- in other words, I don't want any buffering.

One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file.
This implies that you need to read the file as bytes (not characters).
Assuming that you do have a genuine requirement to read from a file without buffering, then you should use the FileInputStream class. It does no buffering. It reads (or attempts to read) precisely the number of bytes that you asked for.
If you then need to convert those bytes to characters, you could do this by applying the appropriate String constructor to a byte[]. Note that for multibyte character encodings such as UTF-8, you would need to read sufficient bytes to complete each character. Doing that without the possibility of read-ahead is a bit tricky, and entails "knowledge" of the character encoding you are reading.
(You could avoid that knowledge by using a CharsetDecoder directly. But then you'd need to use the decode method that operates on Buffer objects, and that is a bit complicated too.)
For what it is worth, Java makes a clear distinction between stream-of-byte and stream-of-character I/O. The former is supported by InputStream and OutputStream, and the latter by Reader and Writer. The InputStreamReader class is a Reader that adapts an InputStream. You should not consider using it for an application that wants to read data byte-wise.
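A minimal sketch of the byte-wise approach suggested above, applied to the bit-array problem from the question. It assumes the file contains ASCII decimal numbers of at most seven digits separated by newlines; that format, the class name, and the method name are assumptions, not something stated in the question. The method takes any InputStream, so for the real exercise you would pass a FileInputStream, which issues one read per call and does no read-ahead of its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical demo class, not part of any library.
final class BitArrayReader {

    // 7-digit numbers range over 0 .. 9,999,999, so 10,000,000 bits are needed (about 1.25 MB).
    static final int MAX = 10_000_000;

    // Assumes ASCII digits, with any non-digit byte (such as '\n') ending the current number.
    static BitSet readNumbers(InputStream in) {
        BitSet present = new BitSet(MAX);
        int current = 0;
        boolean inNumber = false;
        try {
            int b;
            while ((b = in.read()) != -1) { // one byte per call, no character decoding
                if (b >= '0' && b <= '9') {
                    current = current * 10 + (b - '0');
                    inNumber = true;
                } else if (inNumber) {
                    present.set(current);
                    current = 0;
                    inNumber = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (inNumber) {
            present.set(current); // last number without a trailing newline
        }
        return present;
    }

    public static void main(String[] args) {
        byte[] sample = "0000042\n1234567\n".getBytes(StandardCharsets.US_ASCII);
        BitSet present = readNumbers(new ByteArrayInputStream(sample));
        System.out.println(present.get(42) + " " + present.get(1234567) + " " + present.get(43));
    }
}
```

Because the digits are plain ASCII, no Reader or String conversion is needed at all, which sidesteps the multibyte-encoding concern.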

When to use byte array & when byte buffer?

What is the difference between a byte array and a byte buffer?
Also, in what situations should one be preferred over the other?
[My use case is a web application being developed in Java.]

There are actually a number of ways to work with bytes. And I agree that it's not always easy to pick the best one:
the byte[]
the java.nio.ByteBuffer
the java.io.ByteArrayOutputStream (in combination with other streams)
the java.util.BitSet
The byte[] is just a primitive array, just containing the raw data. So, it does not have convenient methods for building or manipulating the content.
A ByteBuffer is more like a builder around a byte[]. Unlike a plain array, it has convenient helper methods (e.g. the put(byte) method). It's not that straightforward in terms of usage. (Most tutorials are way too complicated or of poor quality, but this one will get you somewhere. Want to take it one step further? Then read about the many pitfalls.)
You could be tempted to say that a ByteBuffer does for byte[] what a StringBuilder does for String. But there is a specific difference/shortcoming of the ByteBuffer class. Although it may appear that a ByteBuffer resizes automatically while you add elements, it actually has a fixed capacity. When you instantiate it, you already have to specify the maximum size of the buffer.
That's one of the reasons why I often prefer to use ByteArrayOutputStream: it resizes automatically, just like an ArrayList does, and it has a toByteArray() method. Sometimes it's practical to wrap it in a DataOutputStream. The advantage is that you get some additional convenience calls (e.g. writeShort(int) if you need to write 2 bytes).
BitSet comes in handy when you want to perform bit-level operations. You can get/set individual bits, and it has logical operator methods like xor(). (The toByteArray() method was only introduced in Java 7.)
Of course depending on your needs you can combine all of them to build your byte[].
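The fixed-capacity point and the growing ByteArrayOutputStream alternative can be illustrated with a short sketch. The class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of any library.
final class ByteContainersDemo {

    // Tries to put capacity + 1 bytes into a ByteBuffer; returns true if that overflows.
    static boolean overflowsWhenFull(int capacity) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        try {
            for (int i = 0; i <= capacity; i++) {
                buffer.put((byte) i);
            }
            return false;
        } catch (BufferOverflowException e) {
            return true; // the buffer did not grow
        }
    }

    // Writes `count` 2-byte values; the underlying ByteArrayOutputStream grows as needed.
    static byte[] buildGrowing(int count) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(raw)) {
            for (int i = 0; i < count; i++) {
                out.writeShort(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(overflowsWhenFull(4));     // true
        System.out.println(buildGrowing(100).length); // 200
    }
}
```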

ByteBuffer is part of the new I/O package (NIO) that was developed for fast throughput of file-based data. Specifically, Apache is a very fast web server (written in C) because it reads bytes from disk and puts them on the network directly, without shuffling them through various buffers. It does this through memory-mapped files, which early versions of Java did not have. With the advent of NIO, it became possible to write a web server in Java that is as fast as Apache. When you want very fast file-to-network throughput, then you want to use memory-mapped files and ByteBuffer.
Databases typically use memory-mapped files, but this type of usage is seldom efficient in Java. In C/C++, it's possible to load up a large chunk of memory and cast it to the typed data you want. Due to Java's security model, this isn't generally feasible, because you can only convert to certain native types, and these conversions aren't very efficient. ByteBuffer works best when you are just dealing with bytes as plain byte data -- once you need to convert them to objects, the other java io classes typically perform better and are easier to use.
If you're not dealing with memory mapped files, then you don't really need to bother with ByteBuffer -- you'd normally use arrays of byte. If you're trying to build a web server, with the fastest possible throughput of file-based raw byte data, then ByteBuffer (specifically MappedByteBuffer) is your best friend.
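For reference, mapping a file into memory with FileChannel.map looks roughly like the sketch below. It only reads the mapped bytes and sums them; the class name and the temp-file helper are assumptions for the demo, and a real server would hand the buffer to a network channel rather than loop over it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of any library.
final class MappedFileDemo {

    // Maps the whole file read-only and sums its bytes without an explicit read() call.
    static long sumMapped(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF; // get() returns a signed byte
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo only: writes the bytes to a temporary file.
    static Path writeTempFile(byte[] data) {
        try {
            Path file = Files.createTempFile("mapped-demo", ".bin");
            file.toFile().deleteOnExit();
            return Files.write(file, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile(new byte[] {10, 20, 30});
        System.out.println(sumMapped(file)); // 60
    }
}
```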

These two articles may help you: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly and http://evanjones.ca/software/java-bytebuffers.html
