I am confused by the following statement that appears here:
The basic read() method of the InputStream class reads a single
unsigned byte of data and returns the int value of the unsigned byte.
This is a number between 0 and 255. If the end of stream is
encountered, it returns -1 instead; and you can use this as a flag to
watch for the end of stream.
Since one byte can represent only 256 distinct values, I fail to see how it can represent 0 to 255 and -1. Can someone please comment on what I am missing here?
The return type of InputStream#read() is an int, where the value can be read as a byte if it falls in the range of 0-255.
Although the read() operation just reads a byte, it actually returns an int, so there is no problem.
Only values in the range 0-255 are returned, though, aside from the special -1 end-of-stream value.
It returns an int, not a byte, so although it normally contains only 0-255, it can also hold a value outside that range (the -1 end-of-stream marker).
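To make the "257 possible results" point concrete, here is a minimal sketch that feeds four bytes through a ByteArrayInputStream (standing in for any InputStream) and records every value read() returns, including the final -1. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

class ReadRangeDemo {
    // Collects every value returned by read(), including the terminating -1.
    static List<Integer> readAll(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> results = new ArrayList<>();
        int value;
        do {
            value = in.read();   // 0..255 for a data byte, -1 at end of stream
            results.add(value);
        } while (value != -1);
        return results;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};
        System.out.println(readAll(data)); // [0, 127, 128, 255, -1]
    }
}
```

The four data bytes come back as 0, 127, 128 and 255, and the end of the stream is the separate int value -1, which no data byte can produce.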
Related
Why does InputStream#read() return an int and not a byte?
Because a byte can only hold -128 to 127, while read() should return 0 to 255 (and -1 when there's no byte left, i.e. EOF). Even if it returned a byte, there would be no room to represent EOF.
A more interesting question is why it doesn't return short.
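A small sketch of the "no room for EOF" problem: if read() had to squeeze its result into a byte, a legitimate 0xFF data byte and the -1 sentinel would narrow to the same value. The asByte helper is hypothetical; it just models what a byte-returning read() would be forced to do.

```java
class ByteCollisionDemo {
    // Hypothetical narrowing: what a byte-returning read() would have to hand back.
    static byte asByte(int readResult) {
        return (byte) readResult; // keeps only the low 8 bits
    }

    public static void main(String[] args) {
        byte dataByte = asByte(255);  // a legitimate 0xFF data byte
        byte eofMarker = asByte(-1);  // the end-of-stream sentinel
        System.out.println(dataByte + " " + eofMarker); // -1 -1
        System.out.println(dataByte == eofMarker);      // true: indistinguishable
    }
}
```

Any return type wider than 8 bits, short included, avoids this collision; the answers below discuss why int was chosen.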
It returns an int because when the stream can no longer be read, it returns -1.
If it returned a byte, then -1 could not be returned to indicate a lack of input, because -1 is a valid byte. In addition, you could not return a value above 127 or below -128, because Java only handles signed bytes.
Many times when one is reading a file, you want the unsigned bytes for your processing code. To get values between 128 and 255 you could use a short, but by using an int you will align the memory registers with your data bus more efficiently. As a result, you don't really lose any information by using an int, and you probably gain a bit of performance. The only downside is the cost of the memory, but odds are you won't be hanging on to that int for long (as you will process it and turn it into a char or byte[]).
So it can return -1. It must do that when there are no more bytes to read.
You can't have it return a byte sometimes AND -1 for EOF/nobyte/whatever, so it returns an int ;)
As the Javadoc for InputStream#read says, "The value byte is returned as an int in the range 0 to 255." That is to say, the signed byte value [-128, 127] is mapped to the int value [0, 255], which leaves -1 free to represent the end of the stream.
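The mapping described above can be checked exhaustively. This sketch (class name is illustrative) pushes each of the 256 possible byte values through ByteArrayInputStream.read() and confirms the result equals both `b & 0xFF` and `Byte.toUnsignedInt(b)` (the latter needs Java 8+).

```java
import java.io.ByteArrayInputStream;

class UnsignedMappingDemo {
    // Reads a single byte back through InputStream.read().
    static int readOne(byte b) {
        return new ByteArrayInputStream(new byte[] {b}).read();
    }

    public static void main(String[] args) {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            int r = readOne(b);
            // read() returns the same value as masking off the sign extension
            if (r != (b & 0xFF) || r != Byte.toUnsignedInt(b)) {
                throw new AssertionError("mismatch for " + b);
            }
        }
        System.out.println(readOne((byte) -128) + " " + readOne((byte) -1)); // 128 255
    }
}
```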
Because EOF (end of file or generally end of data) can't be represented using char.
Appending to BalusC answer:
Not a byte, so as to allow [0, 255] as the main range plus, additionally, -1 as the EOF result.
int is used to match the result to the machine word size (speed is one of the main requirements for I/O operations, so they should work as fast as possible!).
An exception is not used to signal EOF because exceptions are significantly slower!
When I searched for the difference between InputStream and Reader, I found this answer:
InputStream: Byte-Base ( read byte by byte )
Reader: Character-Base ( read char by char )
I pasted the character á into a file. Its ASCII (or maybe some other charset's) value on my OS is 225, but a byte's MAX_VALUE is 127. When I use FileInputStream to just read(), why does it return 225? How is it able to read more than one byte, given that read() reads only one byte or character at a time?
Or, what is the actual difference between InputStream and Reader?
á does indeed have a unicode value of 225 (that's its code point, and is unrelated to its encoding). When you cast that down to a byte, you'll get -31. But if you take a careful look at the docs for InputStream.read, you'll see:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255.
(emphasis added) The read method returns an int, not a byte, but that int essentially represents an unsigned byte. If you cast that int down to a char, you'll get back to á. If you cast that int down to a byte, it'll wrap down to -31.
A bit more detail:
á has a unicode value of 225.
chars in Java are represented as UTF-16, which for 225 has a binary representation of 00000000 11100001
if you cast that down to a byte, it'll drop the high byte, leaving you with 11100001. This has a value of -31 if treated as a signed byte, but 225 if treated as unsigned.
InputStream.read returns an int so that it can represent the stream's end as -1. But if the int is non-negative, then only its bottom 8 bits can be set (decimal values 0-255).
When you cast that int down to a byte, Java will drop all but the lowest 8 bits -- leaving you again with 11100001
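The steps above can be traced in a few lines. This sketch assumes the file stored á as the single byte 0xE1 (as in ISO-8859-1 or Windows-1252); a ByteArrayInputStream stands in for the FileInputStream, and the class name is illustrative.

```java
import java.io.ByteArrayInputStream;

class AcuteADemo {
    // Returns the first value read() produces for the given bytes.
    static int readFirst(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    public static void main(String[] args) {
        // Assumption: the file stores á as the single Latin-1 byte 0xE1.
        int r = readFirst(new byte[] {(byte) 0xE1});
        System.out.println(r);                         // 225
        System.out.println(Integer.toBinaryString(r)); // 11100001
        System.out.println((char) r == '\u00E1');      // true: cast to char gives á
        System.out.println((byte) r);                  // -31: cast to byte keeps the low 8 bits
    }
}
```

If the file had been saved as UTF-8 instead, á would occupy two bytes and read() would return two separate values.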
The difference is that an InputStream will read the contents of the file as is, with no interpretation: the raw bytes.
A Reader on the other hand will use a CharsetDecoder to process the byte input and turn it into a sequence of chars instead. And the way it will process the byte input will depend on the Charset used.
And this is not a 1 <-> 1 relationship!
Also, forget about "ASCII values"; Java doesn't use ASCII, it uses Unicode, and a char is in fact a UTF-16 code unit. It was a full code point when Java began, but then Unicode defined code points outside the BMP and Java had to adapt: code points above U+FFFF are now represented using a surrogate pair, i.e. two chars.
See here for a more detailed explanation.
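A minimal sketch of the "not 1 <-> 1" point, assuming the text is encoded as UTF-8: the InputStream hands back the two raw bytes of á, while an InputStreamReader decoding UTF-8 hands back one char. Note that Reader.read() also returns an int for the same reason as InputStream.read(): it returns a char value in 0..65535, or -1 at the end of the stream. Class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamVsReaderDemo {
    // Raw bytes: InputStream.read() with no decoding.
    static List<Integer> rawBytes(byte[] encoded) {
        ByteArrayInputStream in = new ByteArrayInputStream(encoded);
        List<Integer> out = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            out.add(b);
        }
        return out;
    }

    // Decoded chars: Reader.read() returns a UTF-16 code unit (0..65535) or -1.
    static List<Integer> decodedChars(byte[] encoded) {
        List<Integer> out = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(encoded), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00E1".getBytes(StandardCharsets.UTF_8); // á encoded as UTF-8
        System.out.println(rawBytes(utf8));     // [195, 161]  (0xC3 0xA1)
        System.out.println(decodedChars(utf8)); // [225]       (U+00E1)
    }
}
```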
InputStream.read() returns an int. That is a value between 0 and 255, or -1 at the end of the stream.
Byte.MAX_VALUE is 127, but Byte.MIN_VALUE is -128, which is binary 10000000. Java does not support unsigned integer primitives (apart from char), so the most significant bit of a byte is always the sign bit.
All subclasses of InputStream return an int representing the value of the byte read. The same goes for all OutputStream subclasses, whose write method takes an int as its argument instead of a byte.
Two questions:
1- Why is that?
2- If I want to write the byte 10110101 to an output stream, how can I programmatically convert it to an int before passing it into write? Likewise, when I receive an int from an input stream, how can I convert it to a byte?
InputStream.read() returns an int because that's an easy way to differentiate between valid data (values in range 0..255) and end-of-file (-1). That's covered here.
OutputStream.write() takes an int because otherwise you'd have to cast the value from InputStream.read().
If you have a byte in the range -128..-1 and want to convert it to an int in the range 128..255, you use a mask:
byte b = (byte)0xCD;
int i = b & 0xFF;
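For the other direction asked about in question 2, a sketch covering both conversions plus a round trip through a ByteArrayOutputStream. No explicit conversion is needed before write(): a byte widens to an int automatically, and OutputStream.write(int) writes only the eight low-order bits and ignores the high 24. Helper names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class ByteIntConversionDemo {
    // int from read() (0..255) -> byte: a plain cast keeps the low 8 bits.
    static byte toByte(int readValue) {
        return (byte) readValue;
    }

    // byte -> int in 0..255: mask off the sign extension.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Writes one value and returns the bytes the stream actually stored.
    static byte[] writeOne(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the low 8 bits are written
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte b = (byte) 0b10110101;   // 0xB5, -75 as a signed byte
        byte[] written = writeOne(b); // b widens to the int -75 automatically
        int readBack = new ByteArrayInputStream(written).read();
        System.out.println(written[0]);            // -75
        System.out.println(readBack);              // 181
        System.out.println(toUnsigned(b));         // 181
        System.out.println(toByte(readBack) == b); // true
    }
}
```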
Because they need to return -1 for end of stream (EOS). As a side effect, this results in returning unsigned values (even though bytes are signed in Java).
This page says that it is so that the method can return -1 when it wants to indicate that there are no more bytes to be read.
But a byte ranges from -128 to 127, right? And wouldn't it make more sense for the return type of read() to be byte since it returns a byte?
Thank you for your time.
The reason it returns the value as an int is that it needs to return a value between 0 and 255, as well as be able to indicate when there are no more bytes to read from the file. By using an int, you can return the full range of non-negative unsigned values 0-255 and also indicate when the file is complete. It couldn't do this with only the 256 distinct values of a byte, half of which are negative because Java's byte type is signed.
Sure, but the Javadocs for the array overload, read(byte[] b), go on to say:
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
Hopefully more than 127 bytes can be read from a stream at a time.
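The quoted return value belongs to the array overload, where the int is a count of bytes rather than a byte value, so it can indeed exceed 127 (or 255). A minimal sketch of the usual loop, assuming a positive buffer size (a zero-length buffer makes read(byte[]) return 0 forever); the class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadDemo {
    // Sums the counts returned by read(byte[]) until it reports -1.
    static long countBytes(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // n is a count, not a byte value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[1000]);
        System.out.println(countBytes(in, 512)); // 1000 (512 + 488)
    }
}
```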
A byte of data is an unsigned value with a range from 0 to 255, while a byte in java is defined to range from -128 to 127, which doesn't make sense when reading binary data. read() returns an integer to allow it to use all of the non-negative values to represent valid data, and a negative value to signal end of data.
In general, a function should indicate an error condition or exception using a different mechanism from the one it uses to return data. In the simplest case, it can return a value that cannot be used to represent valid data, to ensure its meaning is unambiguous.
Q: Wouldn't it make more sense for the return type of read() to be byte?
A: No, because "byte" can't return the whole range 0..255 (unsigned), and "short" is just a PITA.
The FileInputStream class makes it possible to read the contents of a file as a stream of bytes. Here is a simple example:
InputStream input = new FileInputStream("c:\\data\\input-text.txt");
int data = input.read();
while(data != -1) {
//do something with data...
doSomethingWithData(data);
data = input.read();
}
input.close();
Note: The proper exception handling has been skipped here for the sake of clarity. To learn more about correct exception handling, go to Java IO Exception Handling.
The read() method of a FileInputStream returns an int which contains the byte value of the byte read. If the read() method returns -1, there is no more data to read in the stream, and it can be closed. That is, -1 as an int value, not -1 as a byte value. There is a difference here!
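To show why the int/byte distinction matters, here is a sketch of the loop above in Java 7+ try-with-resources form, next to a deliberately buggy variant that narrows to byte before checking for -1. It assumes a ByteArrayInputStream so it runs without a file; for the real file you would pass in a FileInputStream instead. Method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class EndOfStreamDemo {
    // Correct: compare the int from read() with -1 before narrowing it.
    static int countWithIntCheck(InputStream input) {
        int count = 0;
        try (InputStream in = input) {
            int data;
            while ((data = in.read()) != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Buggy on purpose: narrowing to byte first turns a 0xFF data byte into -1.
    static int countWithByteCheck(InputStream input) {
        int count = 0;
        try (InputStream in = input) {
            byte data;
            while ((data = (byte) in.read()) != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = {0x41, (byte) 0xFF, 0x42}; // 'A', 0xFF, 'B'
        System.out.println(countWithIntCheck(new ByteArrayInputStream(bytes)));  // 3
        System.out.println(countWithByteCheck(new ByteArrayInputStream(bytes))); // 1
    }
}
```

The byte-based loop stops early at the 0xFF byte, because (byte) 255 is -1.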