Unclear use of int variable - official Oracle example Using Byte Streams [duplicate]

Why does InputStream#read() return an int and not a byte?

Because a byte can only hold values from -128 to 127, while read() must return values from 0 to 255 (and -1 when there's no byte left, i.e. EOF). Even if it returned a byte, there would be no room to represent EOF.
A more interesting question is why it doesn't return short.

It returns an int because when the stream can no longer be read, it returns -1.
If it returned a byte, then -1 could not be returned to indicate a lack of input because -1 is a valid byte. In addition, you could not return values above 127 or below -128 because Java bytes are signed.
Many times when reading a file, you want unsigned bytes for your processing code. To get values between 128 and 255 you could use a short, but an int matches the machine word size better, and the JVM widens byte and short values to int for arithmetic anyway. As a result, you don't lose any information by using an int, and you probably gain a bit of performance. The only downside is the extra memory, but odds are you won't hold on to that int for long (you will process it and turn it into a char or store it in a byte[]).
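To make the range argument concrete, here is a minimal sketch that records every value read() returns for a small in-memory stream. The class name ReadRangeDemo and the helper readAll are made up for illustration; ByteArrayInputStream stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

final class ReadRangeDemo {
    // Collects every value returned by read(), including the final -1 (hypothetical helper).
    static List<Integer> readAll(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> values = new ArrayList<>();
        int value;
        do {
            value = in.read();   // 0..255 for data, -1 at end of stream
            values.add(value);
        } while (value != -1);
        return values;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};
        System.out.println(readAll(data)); // prints [0, 127, 128, 255, -1]
    }
}
```

The byte 0xFF, which is -1 as a Java byte, comes back as 255, so it cannot be confused with the end-of-stream marker.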

So it can return -1. It must do that when there are no more bytes to read.
You can't have it return a byte sometimes AND -1 for EOF/nobyte/whatever, so it returns an int ;)

As the Javadoc for InputStream#read says, "The value byte is returned as an int in the range 0 to 255." That is to say, the byte value [-128, 127] is mapped to an int value [0, 255], which leaves -1 free to represent the end of the stream.
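The mapping from a signed byte to the 0 to 255 range can be sketched with a bit mask. The class and method names below are illustrative only; Byte.toUnsignedInt is the standard-library equivalent available since Java 8.

```java
final class UnsignedByteDemo {
    // Maps a signed byte (-128..127) to the 0..255 value that read() would report.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte stored = (byte) 0xFF;                          // -1 as a Java byte
        int asReturned = toUnsigned(stored);                // 255
        System.out.println(stored + " -> " + asReturned);   // prints -1 -> 255
        System.out.println(Byte.toUnsignedInt(stored) == asReturned); // prints true
        System.out.println((byte) asReturned == stored);    // prints true: the cast recovers the byte
    }
}
```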

Because EOF (end of file or generally end of data) can't be represented using char.

Appending to BalusC answer:
It is not a byte, so that [0; 255] is available for data and -1 is additionally available as the EOF result.
int is used to fit the result to a machine word (speed is one of the main requirements for I/O operations, so they should work as fast as possible).
An exception is not used because exceptions are significantly slower.

Related

Why does InputStream read() return an int and not a short?

I was reading the byte stream tutorial and noticed the following statement:
Notice that read() returns an int value. If the input is a stream of
bytes, why doesn't read() return a byte value? Using a int as a return
type allows read() to use -1 to indicate that it has reached the end
of the stream.
The given reason for using an int is that EOF can be identified by -1 (which seems shallow).
The next bigger primitive type is short, and it also supports -1, so why not use it?
From what I gather, the reasons to use int are:
1. int is preferred for performance.
2. An int variable holds a character value in its last 16 bits (from the character stream tutorial).
3. Other, more abstract streams would need to read more than just one byte (something that I guess happens with character streams).
Are my reasons correct? Am I missing something (like error correction)?

The most important reason to prefer int over short is that short is kind of a second-class citizen: all integer literals, as well as all arithmetical operations, are int-typed so you've got short->int promotion happening all over the place. Plus there is very little or no argument against the usage of int.
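The promotion described above is easy to see in a small sketch. The class name and the add helper are invented for this example.

```java
final class ShortPromotionDemo {
    static short add(short a, short b) {
        // a + b has type int, so storing it in a short needs an explicit cast.
        // short sum = a + b;   // would not compile: possible lossy conversion from int to short
        return (short) (a + b);
    }

    public static void main(String[] args) {
        short a = 30_000;
        short b = 10_000;
        int wide = a + b;            // arithmetic is done in int
        short narrow = add(a, b);    // the cast wraps around: 40000 - 65536
        System.out.println(wide);    // prints 40000
        System.out.println(narrow);  // prints -25536
    }
}
```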

There is only one scenario where using short will give you an advantage: large arrays of short. To be sure, you can use them only when it is clear that the numbers to be stored fit the bounds.
In all other cases, it makes no real difference whether you have short or int. For example:
class A {
    short s;
    double d;
}
will not use less memory than:
class B {
    int s;
    double d;
}
because of alignment. So while the first one has only 10 bytes of net data, compared to 12 for the second, an allocated object will still be aligned to some 8-byte boundary. Even if it is only a 4-byte boundary, the memory usage will be the same.

This is an interesting question :-). It is true that they had to use a signed integer type to represent EOF, but the preference for int over short is probably just about performance.
As I found on a different StackOverflow thread where this was discussed, the Java VM would automatically use int internally even if the definition used short.
The Java documentation states that short should be used in large arrays and in situations where memory really matters (source: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html). That is apparently not the case here, because we always get just one value.

Purpose of byte type in Java

I read this line in the Java tutorial:
byte: The byte data type is an 8-bit signed two's complement integer. It has
a minimum value of -128 and a maximum value of 127 (inclusive). The
byte data type can be useful for saving memory in large arrays, where
the memory savings actually matters. They can also be used in place of
int where their limits help to clarify your code; the fact that a
variable's range is limited can serve as a form of documentation.
I don't clearly understand the bold line. Can somebody explain it for me?

A byte has a (signed) range from -128 to 127, whereas an int has a (also signed) range of −2,147,483,648 to 2,147,483,647.
What it means is that if the values you use will always fall within that range, declaring the variable as byte tells anyone reading your code that the value will always be between -128 and 127, without you having to document it.
Still, proper documentation is always key, and you should use byte this way only for readability, not as a replacement for documentation.

If you're using a variable whose maximum value is 127, you can use byte instead of int so that others know, without reading any later if conditions that check the boundaries, that this variable can only hold a value between -128 and 127.
So it's kind of self-documenting code, as mentioned in the text you're citing.
Personally, I do not recommend this kind of "documentation": the fact that a variable can only hold a maximum value of 127 doesn't reveal its real purpose.

Integers in Java are stored in 32 bits; bytes are stored in 8 bits.
Let's say you have an array with one million entries. Yikes! That's huge!
int[] foo = new int[1000000];
Now, for each of these integers in foo, you use 32 bits or 4 bytes of memory. In total, that's 4 million bytes, or 4MB.
Remember that an integer in Java is a whole number between -2,147,483,648 and 2,147,483,647 inclusively. What if your array foo only needs to contain whole numbers between, say, 1 and 100? That's a whole lot of numbers you aren't using, by declaring foo as an int array.
This is when byte becomes helpful. Bytes store whole numbers between -128 and 127 inclusively, which is perfect for what you need! But why choose bytes? Because they use one-fourth of the space of integers. Now your array is wasting less memory:
byte[] foo = new byte[1000000];
Now each entry in foo takes up 8 bits or 1 byte of memory, so in total, foo takes up only 1 million bytes or 1MB of memory.
That's a huge improvement over using int[] - you just saved 3MB of memory.
Clearly, you wouldn't want to use this for arrays that hold numbers that would exceed 127, so another way of reading the bold line you mentioned is, Since bytes are limited in range, this lets developers know that the variable is strictly limited to these bounds. There is no reason for a developer to assume that a number stored as a byte would ever exceed 127 or be less than -128. Using appropriate data types saves space and informs other developers of the limitations imposed on the variable.
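The arithmetic in this answer can be checked with a short sketch. It counts element payload only and ignores the per-array object header, whose size depends on the JVM; the class and method names are made up for illustration.

```java
final class ArrayFootprintDemo {
    // Payload bytes only; the array object header is JVM-specific and not counted here.
    static long payloadBytes(int length, int bytesPerElement) {
        return (long) length * bytesPerElement;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long asInts = payloadBytes(n, Integer.BYTES);  // 4 bytes per int
        long asBytes = payloadBytes(n, Byte.BYTES);    // 1 byte per byte
        System.out.println(asInts);            // prints 4000000
        System.out.println(asBytes);           // prints 1000000
        System.out.println(asInts - asBytes);  // prints 3000000
    }
}
```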

I imagine one can use byte for anything dealing with actual bytes.
Also, the parts (red, green and blue) of colors commonly have a range of 0-255 (although byte is technically -128 to 127, but that's the same amount of numbers).
There may also be other uses.
The general objection I have to using byte (and probably why it isn't seen as often as it could be) is that a lot of casting is needed. For example, whenever you do arithmetic on a byte (except with compound assignment operators such as +=), it is automatically promoted to int (even byte + byte), so you have to cast the result if you want to put it back into a byte.
A very elementary example:
FileInputStream::read returns a byte wrapped in an int (or -1). This can be cast to a byte to make it clearer. I'm not supporting this example as such (because, at the moment, I don't really see the point of doing the below), just saying something similar may make sense.
It could also have returned a byte in the first place (and possibly thrown an exception if end-of-file). This may have been even clearer, but the way it was done does make sense.
FileInputStream file = new FileInputStream("Somefile.txt");
int val;
while ((val = file.read()) != -1)
{
    byte b = (byte)val;
    // ...
}
If you don't know much about FileInputStream, you may not know what read returns, so you see an int and you may assume the valid range is the entire range of int (-2^31 to 2^31-1), or possibly the range of a char (0-65535) (not a bad assumption for file operations), but then you see the cast to byte and you give that a second thought.
If the return type were to have been byte, you would know the valid range from the start.
Another example:
One of Color's constructors could have taken three bytes instead of three ints, since the range of each component is limited to 0-255.
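The alternative mentioned in this answer, returning a byte and signaling end of file with an exception, does exist in the standard library: DataInputStream.readByte() returns a signed byte and throws EOFException at the end of the stream. A minimal sketch, with an invented class name and helper, using an in-memory stream in place of a file:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ReadByteDemo {
    // Sums signed bytes until readByte() reports end of stream via EOFException.
    static int sumSignedBytes(byte[] data) {
        int sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                sum += in.readByte();   // returns -128..127, never an EOF marker
            }
        } catch (EOFException endOfStream) {
            // Expected: this is how readByte() signals that no bytes are left.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumSignedBytes(new byte[]{1, 2, (byte) 0xFF})); // prints 2
    }
}
```

Here the full byte range is usable because the end of the stream travels on a separate channel, at the cost of using an exception for ordinary control flow.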

It means that knowing that a value is explicitly declared as a very small number might help you recall the purpose of it.
Go for real docs when you have to create a documentation for your code, though, relying on datatypes is not documentation.

An int covers 2 to the 32nd power (4,294,967,296) distinct values, from -2,147,483,648 to 2,147,483,647. This is a huge range, and if you are scoring a test that is out of 100, you are wasting that extra space when all of your numbers are between 0 and 100. It takes more memory and hard disk space to store ints, and in serious data-driven applications this translates to money wasted if you are not using the extra range that ints provide.

The byte data type is generally used when you handle data as streams, either from a file or from the network, because networks and files work on the concept of bytes.
Example: FileOutputStream accepts a byte array as the input parameter of write(byte[]).

end of stream in JAVA

I am confused by the following statement that appears here
The basic read() method of the InputStream class reads a single
unsigned byte of data and returns the int value of the unsigned byte.
This is a number between 0 and 255. If the end of stream is
encountered, it returns -1 instead; and you can use this as a flag to
watch for the end of stream.
Since one byte can represent up to 256 integers, I fail to see how it can represent 0 to 255 and also -1. Can someone please comment on what I am missing here?

The return type of InputStream#read() is an int, where the value can be read as a byte if it falls in the range of 0-255.

Although the read() operation just reads a byte it actually returns an int so there is no problem.
Just values in range 0-255 are returned though, aside from the special -1 end of stream value.

It returns an int, not a byte, so though it normally will only contain 0-255, it can contain other values.

Why does the read() in FileInputStream return an integer? [duplicate]

This page says that it is so that the method can return -1 when it wants to indicate that there are no more bytes to be read.
But a byte ranges from -128 to 127, right? And wouldn't it make more sense for the return type of read() to be byte since it returns a byte?
Thank you for your time.

The reason it returns the value as an int is that it needs to return a value between 0 and 255, as well as being able to indicate when there are no more bytes to read from the file. By using an int, you can return the full range of unsigned values 0-255, as well as indicate when the file is complete. It wouldn't be able to provide this with only the 256 distinct values of a byte, half of which are negative in Java.

Sure, but the JavaDocs go on to say..
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
Hopefully more than 127 bytes can be read from a stream at a time.
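The quoted Javadoc describes the buffer-filling overloads of read, whose return value is a count rather than a data byte. A minimal sketch, assuming a 300-byte in-memory source and a 1024-byte buffer (both sizes and the class name are arbitrary choices for illustration):

```java
import java.io.ByteArrayInputStream;

final class BulkReadDemo {
    // Performs two bulk reads and returns both results (hypothetical helper).
    static int[] readTwice(int available, int bufferSize) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[available]);
        byte[] buffer = new byte[bufferSize];
        int first = in.read(buffer, 0, buffer.length);   // number of bytes copied into buffer
        int second = in.read(buffer, 0, buffer.length);  // -1: nothing left to read
        return new int[]{first, second};
    }

    public static void main(String[] args) {
        int[] results = readTwice(300, 1024);
        System.out.println(results[0]); // prints 300
        System.out.println(results[1]); // prints -1
    }
}
```

A count of 300 would not fit in a byte, which is one more reason the return type is int.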

A byte of data is an unsigned value with a range from 0 to 255, while a byte in java is defined to range from -128 to 127, which doesn't make sense when reading binary data. read() returns an integer to allow it to use all of the non-negative values to represent valid data, and a negative value to signal end of data.
In general, a function should indicate an error condition or exception using a different mechanism from the one it uses to return data. In the simplest case, it can return a value that cannot be used to represent valid data, to ensure its meaning is unambiguous.

Q: wouldn't it make more sense for the return type of read() to be
byte?
A: No, because "byte" can't return the whole range 0..255 (unsigned), and "short" is just a PITA.

The FileInputStream class makes it possible to read the contents of a file as a stream of bytes. Here is a simple example:
InputStream input = new FileInputStream("c:\\data\\input-text.txt");
int data = input.read();
while (data != -1) {
    // do something with data...
    doSomethingWithData(data);
    data = input.read();
}
input.close();
Note: The proper exception handling has been skipped here for the sake of clarity. To learn more about correct exception handling, go to Java IO Exception Handling.
The read() method of a FileInputStream returns an int which contains the byte value of the byte read. If the read() method returns -1, there is no more data to read in the stream, and it can be closed. That is, -1 as int value, not -1 as byte value. There is a difference here!
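The difference between -1 as an int and -1 as a byte shows up as a real bug when the result of read() is narrowed to byte before the end-of-stream check. The sketch below uses invented names and an in-memory stream to contrast the two loops.

```java
import java.io.ByteArrayInputStream;

final class EofCastDemo {
    // Correct: compare the int against -1 before any narrowing to byte.
    static int countCorrect(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Buggy: narrowing first turns the data byte 0xFF (255) into -1, which looks like EOF.
    static int countBuggy(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        byte b;
        while ((b = (byte) in.read()) != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = {1, (byte) 0xFF, 2};
        System.out.println(countCorrect(data)); // prints 3
        System.out.println(countBuggy(data));   // prints 1: stops early at the 0xFF byte
    }
}
```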
