Purpose of byte type in Java

Purpose of byte type in Java - java

I read this line in the Java tutorial:
byte: The byte data type is an 8-bit signed two's complement integer. It has
a minimum value of -128 and a maximum value of 127 (inclusive). The
byte data type can be useful for saving memory in large arrays, where
the memory savings actually matters. They can also be used in place of
int where their limits help to clarify your code; the fact that a
variable's range is limited can serve as a form of documentation.
I don't clearly understand the bold line. Can somebody explain it for me?

Byte has a (signed) range from -128 to 127, where as int has a (also signed) range of −2,147,483,648 to 2,147,483,647.
What it means is that since the values you're going to use will always be between that range, by using the byte type you're telling anyone reading your code this value will be at most between -128 to 127 always without having to document about it.
Still, proper documentation is always key and you should only use it in the case specified for readability purposes, not as a replacement for documentation.

If you're using a variable which maximum value is 127 you can use byte instead of int so others know without reading any if conditions after, which may check the boundaries, that this variable can only have a value between -128 and 127.
So it's kind of self-documenting code - as mentioned in the text you're citing.
Personally, I do not recommend this kind of "documentation" - only because a variable can only hold a maximum value of 127 doesn't reveal it's really purpose.

Integers in Java are stored in 32 bits; bytes are stored in 8 bits.
Let's say you have an array with one million entries. Yikes! That's huge!
int[] foo = new int[1000000];
Now, for each of these integers in foo, you use 32 bits or 4 bytes of memory. In total, that's 4 million bytes, or 4MB.
Remember that an integer in Java is a whole number between -2,147,483,648 and 2,147,483,647 inclusively. What if your array foo only needs to contain whole numbers between, say, 1 and 100? That's a whole lot of numbers you aren't using, by declaring foo as an int array.
This is when byte becomes helpful. Bytes store whole numbers between -128 and 127 inclusively, which is perfect for what you need! But why choose bytes? Because they use one-fourth of the space of integers. Now your array is wasting less memory:
byte[] foo = new byte[1000000];
Now each entry in foo takes up 8 bits or 1 byte of memory, so in total, foo takes up only 1 million bytes or 1MB of memory.
That's a huge improvement over using int[] - you just saved 3MB of memory.
Clearly, you wouldn't want to use this for arrays that hold numbers that would exceed 127, so another way of reading the bold line you mentioned is, Since bytes are limited in range, this lets developers know that the variable is strictly limited to these bounds. There is no reason for a developer to assume that a number stored as a byte would ever exceed 127 or be less than -128. Using appropriate data types saves space and informs other developers of the limitations imposed on the variable.

I imagine one can use byte for anything dealing with actual bytes.
Also, the parts (red, green and blue) of colors commonly have a range of 0-255 (although byte is technically -128 to 127, but that's the same amount of numbers).
There may also be other uses.
The general opposition I have to using byte (and probably why it isn't seen as often as it can be) is that there's lots of casting needed. For example, whenever you do arithmetic operations on a byte (except X=), it is automatically promoted to int (even byte+byte), so you have to cast it if you want to put it back into a byte.
A very elementary example:
FileInputStream::read returns a byte wrapped in an int (or -1). This can be cast to an byte to make it clearer. I'm not supporting this example as such (because I don't really (at this moment) see the point of doing the below), just saying something similar may make sense.
It could also have returned a byte in the first place (and possibly thrown an exception if end-of-file). This may have been even clearer, but the way it was done does make sense.
FileInputStream file = new FileInputStream("Somefile.txt");
int val;
while ((val = file.read()) != -1)
{
byte b = (byte)val;
// ...
}
If you don't know much about FileInputStream, you may not know what read returns, so you see an int and you may assume the valid range is the entire range of int (-2^31 to 2^31-1), or possibly the range of a char (0-65535) (not a bad assumption for file operations), but then you see the cast to byte and you give that a second thought.
If the return type were to have been byte, you would know the valid range from the start.
Another example:
One of Color's constructors could have been changed from 3 int's to 3 byte's instead, since their range is limited to 0-255.

It means that knowing that a value is explicitly declared as a very small number might help you recall the purpose of it.
Go for real docs when you have to create a documentation for your code, though, relying on datatypes is not documentation.

An int covers the values from 0 to 4294967295 or 2 to the 32nd power. This is a huge range and if you are scoring a test that is out of 100 then you are wasting that extra spacce if all of your numbers are between 0 and 100. It just takes more memory and harddisk space to store ints, and in serious data driven applications this translates to money wasted if you are not using the extra range that ints provide.

byte data types are generally used when you want to handle data in the forms of streams either from file or from network. Reason behind this is because network and files works on the concept of byte.
Example: FileOutStream always takes byte array as input parameter.

Related

Java - How can I create a variable with the required size in Java?

Consider this situation. I am having a thousand int variables in my program. We know that an int variable occupies 2 bytes of memory space in Java. So, the total amount of space occupied by my variables would be 2000 bytes. But, the problem is that, the value in every variable occupies only half of the space it has. So, the actual required space used would be 1000 bytes, which means that 1000 bytes goes waste. How can I address this problem? Is there a way to do address this problem in Java?

Your data fits in a byte, so just use byte instead of int (which is 4 bytes).

Checking the space required by values every time they are stored would mean creating an additional useless overhead.
If you know which size your values will be, it is up to you to properly chose their type while developing. If you don't know how much space your value might occupy, you have to choose the largest possible one.
During runtime, if a value is an int for example, this means it can possibly be between -2^31 and (2^31)-1. If you know this value will never be this size, you can use smaller sized primitives such as byte or short.
We could check and allocate the exact amount of memory required for each element in memory but it would take more time. On the contrary, using a fixed amount of memory takes up more space evidently but is faster, this is how Java works. Unfortunately, we can't do without a compromise.

An int in Java is 32 bit (4 bytes) long
The value held in the int does not affect the int size in memory, e.g. if your int has the value 0x55
So, in order to occupy 2000 bytes in memory, you will need 500 ints, if you create an int array, there will be overhead of the array itself, so it will take up more memory.

int data type is a 32-bit signed two's complement integer. And hence occupies 4 bytes of memory. For int:
Minimum value is - 2,147,483,648.(-2^31)
Maximum value is 2,147,483,647(inclusive).(2^31 -1)
If your values are not going to be so large then you can use either byte or short as per your requirement.
Byte data type is an 8-bit signed two's complement integer.
Minimum value is -128 (-2^7)
Maximum value is 127 (inclusive)(2^7 -1)
Short data type is a 16-bit signed two's complement integer.
Minimum value is -32,768 (-2^15)
Maximum value is 32,767 (inclusive) (2^15 -1)
You could also use a linked list if you really are not sure on how many elements are going to be in a large list.
The principal benefit of a linked list over a conventional array is that the list elements can easily be inserted or removed without reallocation or reorganization of the entire structure because the data items need not be stored continuously in memory or on disk, while an array has to be declared in the source code, before compiling and running the program. Linked lists allow insertion and removal of nodes at any point in the list, and can do so with a constant number of operations if the link previous to the link being added or removed is maintained during list traversal.
In general use a linked list for int data type or heavier data types as they actually incur a per-element overhead of at least 1 pointer size, so if your elements are small, you're actually worse off.

See the Oracle docs:
int: By default, the int data type is a 32-bit signed two's complement
integer
So it means that int occupies 4 bytes in memory.
And
byte: The byte data type is an 8-bit signed two's complement integer
So in your case you can change your datatype to byte instead of int.

What Primitive To Use In Java

I am a little confused on when to use what primitives. If I am defining a number, how do I know what to use byte, short, int, or long? I know they are different bytes, but does that mean that I can only use one of them for a certain number?
So simply, my question is, when do I use each of the four primitives listed above?

If I am lets say defining a number, how do I know what to use byte, short, int, or long?
Depending on your use-case. There's no huge penalty to using an int as opposed to a short, unless you have billions of numbers. Simply consider the range a variable might use. In most cases it is reasonable to use int, whose range is -2,147,483,648 to 2,147,483,647, while long handles numbers in the range of +/- 9.22337204*1018. If you aren't sure, long won't specifically hurt.
The only reasons you might want to use byte specifically is if you are storing byte data such as parts of a file, or are doing something like network communication or serialization where the number of bytes is important. Remember that Java's bytes are also signed (-128 to 127). Same for short--might be useful to save 2GB of memory for a billion-element array, but not specifically useful for much other than, again, serialization with a specific byte alignment.
does that mean that I can only use one of them for a certain number?
No, you can use any that is large enough to handle that number. Of course, decimal values need a double or a float--double is usually ideal due to its higher precision and few drawbacks. Some libraries (e.g. 3D drawing) might use floats however. Remember that you can cast numeric types (e.g. (byte) someInt or (float) functionReturningADouble();

A byte can be a number from -128 to 127
A short can be a number from -32,768 to 32,767
An int can be from -2^32 to 2^32 - 1
A long is -2^63 to 2^63 - 1
Usually unless space is an issue an int is fine.

In order to use a primitive you need to know the data range of the number stored in it. This depends heavily on the thing that you are modeling:
Use byte when your number is in the range [-128..127]
Use short when your number is in the range [-32768..32767]
Use int when your number is in the range [-231..231-1]
Use long when your number is in the range [-263..263-1]
In addition, byte data type is used for representing "raw" binary data, in which case it is not used like a number.
When you want an unlimited range, use BigInteger. This flexibility comes at a cost: operations on BigInteger can be orders of magnitude more expensive than identical operations on primitives.

Do you know what values your variable will potentially hold?
If it will be between -128 and 127, use a byte.
If it will be between -32,768 and 32,767, use a short.
If it will be between -2^32 and 2^32 - 1, use an int.
If it will be between -2^63 and 2^63 - 1, use a long.
Generally an int will be fine for most cases, but you might use a byte or a short if memory is a concern, or a long if you need larger values.
Try writing a little test program. What happens when you put a value that's too large in one of these types of variables?
Source: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
See also: the year 2038 problem

The general guidance should definitely be: If you don't have any good reason to pick a specific primitive, pick int.
This has at least three advantages:
Every kind of arithmetic or logical operation will return an int. Hence if you use anything smaller you will have to cast a lot to actually assign values back.
It's the accepted standard for "some number". If you pick anything else, other developers will wonder about the reasons.
It will be the most efficient, although it's likely that the performance difference from using anything else will be minuscule.
That said if you have an array of millions of elements yes you definitely should figure out what primitive fits the input range - I refer you to the other answers for a summary of how large each one is.

#dasblinkenlight
When you are working for large project, you need to optimize the code for each requirement. Apparently you would reduce the memory occupied by the project.
Hence using int, short, long, byte plays a vital role in terms of optimising the memory. When the interger holds a 1 digit value then you declare the integer with int or short, but you can still choose long,float and double technically but to become a good programmer you can try to follow standards.
Byte - Its 8 bit. So it can hold a value from -128 to 127.
short: Its 32 bit. so -32,768 and a maximum value of 32,767
Long- Its 64 bit. -2 (power of) 63 and a maximum value of 2(power of)63-1
float and double- for large decimal values.
Boolean for Yes or No decesions- can hold a value of 1 or 0
Char for characters.
Hope this answer helps you.

Why do bools take less memory in an array?

On the Oracle website, it says that bools take 32 bits on the stack, but 8 in an array. I'm having trouble understanding why it is that they would take less in a group than in as singles. How are they stored, and what difference does it make? If arrays of bools are more efficient, why has that technology not been transferred over to singles?
Also, why not 1 bit?
And what is the difference between how a 64 but system and a 32 bit system stores these?
Thanks!

A boolean value can be stored as a single binary digit, but our computers group values as a convenience. The smallest unit practically dealt with is a byte, the next largest being a word. A byte is, in modern hardware, always 8 bits. 32 bits has emerged as the standard for a word. Even our 64 bit computers can deal in 32 bit words effectively. It is much more convenient to store a bool in whatever unit comes naturally than as a single bit. In an array, the natural unit would be a byte, since you can address any byte in memory. On the stack, which is a word stack, the natural unit is a word. You could stuff bools into bytes and words and work on pulling them out again bit by bit, literally, but that's less efficient than storing them in bytes or words because modern memories are large, so CPU speed is more of a concern. You wouldn't want to waste all of the time it takes to pack bits in compactly, so we waste memory instead, since it is more expendable.

Look, when it comes to stack, one must keep in mind that speed is the most important thing. For example, consider the following:
void method(int foo, boolean bar, String name) ....
Then the stack just after entering the method looks like this:
|-other variables-|-...-|-name-|-bar-|-foo-|---- return address etc. --
^
stack pointer
These are all quantities on a word boundary, symbolized by |. Sure, the JVM could (theoretically, but see below) store the boolean in a single byte. But one must keep in mind that 32bit loads may be slower when they don't address word boundaries. Depending on architecture, it may be impossible to go through a pointer that does not live on a word boundary. Or it may be impossible use the quantity in a floating point instruction, etc. etc.
In addition, the byte code format can only address the n-th word on the stack. If this were not so, addresses relative to the stack pointer would have to be specified in bytes and this would mean that almost any stack access would have two bits that are irrelevant most of the time, as the majority of arguments will be words (int, float or reference) or double words (long, double).
What is never possible is to use 1 single bit for booleans. Why? Because bits are not directly addressable. The smallest addressable unit is the byte.
You can still store 32 booleans in an int if you feel that you should save on memory.

Because of the way CPUs work, all operations are done in 32 bits. If you have a single bool, the only realistic thing a compiler can do is zero out the rest of the 24 bits and save that to the stack, since it's not practical to scan your java file for other bools to and store them all in the same 32 bit memory block.
If you have an array of bools, it's simple to just reference them in blocks of 4, so it's only 8 bits per bool.
Note that this only applies to 32 bit applications/machines.

I want to store a number between 0 and 100 in a variable - should I use an INT or a BYTE and why?

I know this is a n00b question but I want to define a variable with a value between 0 and 100 in JAVA. I can use INT or BYTE as data types - but which one would be the best to use. And why?(benefits?)

Either int or byte would work for storing a number in that range.
Which is best depends on what you are aiming to do.
If you need to store lots of these numbers in an array, then a byte[] will take less space than an int[] with the same length.
(Surprisingly) a byte variable takes the same amount of space as a int ... due to the way that the JVM does the stack frame / object frame layout.
If you are doing lots of arithmetic, etc using these values, then int is more convenient than byte. In Java, arithmetic and bitwise operations automatically promote the operands to int, so when you need to assign back to a byte you need to use a (byte) type-cast.

Use byte datatype to store value between -128 to 127

I think since you have used 0.00 with decimals, this means you want to allow decimals.
The best way to do this is by using a floating point type: double or float.
However, if you don't need decimals, and the number is always an integer, you can use either int or byte. Byte can hold anything up to +127 or down to -128, and takes up much less space than an int.
with 32-bit ints on your system, your byte would take up 1/4 of the space!
The disadvantages are that you have to be careful when doing arithmetic on bytes - they can overflow, and cause unexpected results if they go out of range.
If you are using basic IO functions, like creating sounds or reading old file formats, byte is invaluable. But watch out for the sign convention: negative has the high-order bit set. However when doing normal numerical calculations, I always use int, since it allows easy extension to larger values if needed, and 32-bit cpus there is minimal speed cost really. Only thing is if you are storing a large quantity of them.

Citing Primitive Data Types:
byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matters. They can also be used in place of int where their limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.
int: The int data type is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive). For integral values, this data type is generally the default choice unless there is a reason (like the above) to choose something else. This data type will most likely be large enough for the numbers your program will use, but if you need a wider range of values, use long instead.

Well it depends on what you're doing... out of the blue like that I would say use int because it's a standard way to declare an integer. But, if you do something specific that require memory optimization than maybe byte could be better in that case.

Technically, neither one is appropriate--if you need to store the decimal part, you want a float.
I'd personally argue that unless you're actually getting into resource-constrained issues (embedded system, big data, etc.) your code will be slightly clearer if you go with int. Good style is about being legible to people, so indicating that you're working with integers might make more sense than prematurely optimizing and thus forcing future maintainers of your code (including yourself) to speculate as to the true nature of that variable.

what data structure should I use to store binary codes in java?

I have binary codes as 10111 , 100011, 11101111 etc. now what data structure should I use to store these codes so that minimum size is required to store them?
I can't use string array as size required will be more as compared than storing the decimal equivalent of above binary codes.

java.util.BitSet is designed for that if the length is not fixed.

Depending on the length of the codes, simply use int or long.

If they'll be short, use byte, int, long (depending on how short).
If they'll be a bit longer, use an array of bytes, ints, or longs. For instance, if you need to store a 256-bit code, you can do that in a long[4].
If the length of the codes you need to store varies widely, you might consider either a class with a length member giving the number of bits and a byte, int, long, byte[], int[], or long[] member for storing them (depending on sizes and what kind of granularity you want). Or if you're really trying to pack as much in as you can, you can set aside some of the bits from your storage area to hold the number of bits in the code.

Each binary code could be split in portions by 8 bits (a byte). It would be necessary to define how to handle the tail if it is less than 8 bits. Byte arrays should work here nicely to store each portion.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.