A while ago I read that a Java byte, which is 8 bits, is stored internally as an int. I can't seem to find any info online that confirms this.
Thank you for taking the time to answer my question!
What about a C++ char? Is it stored as 8 bits or 32 bits?
How a byte (or any other Java value, for that matter) is stored is not specified by the JLS or the JVMS (the closest you'll find is the abstract specification on the level of the JVM, but that still doesn't say how it's stored natively). It is usually stored in the way which is most appropriate to the hardware architecture at hand, and that is usually 32 bits (or even 64).
Well, if you look at how methods are represented in a class file, you will notice that method parameters are loaded from the method frame onto the operand stack with the same bytecode instruction (iload), whether they are bytes, ints, booleans, shorts or chars. This implies that they all take the same size within a method frame, which is usually 32 bits.
As for storing bytes on the heap, most JVM implementations choose to store a byte in 32 bits, while byte arrays are stored with 8 bits per array entry. This is, however, not specified in the JLS or the JVMS. If you wanted to implement your own JVM, you could use any number of bits to store a byte and still pass the Java TCK compatibility tests.
So to speak: what you read is not a guaranteed truth, but it is still correct most of the time.
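To see the point about the shared load instruction for yourself, here is a minimal sketch. The class and method names (SlotDemo, sum) are made up for illustration. Compile it with javac and inspect it with `javap -c SlotDemo`; with the standard javac you should see every parameter read with an iload variant, regardless of its declared type.

```java
// Hypothetical class for illustration only; the names are not from the original answer.
final class SlotDemo {
    // All five parameters are narrower than or equal to int. In the compiled
    // bytecode each of them is read with an iload instruction (iload_0 .. iload 4).
    static int sum(byte b, short s, char c, boolean flag, int i) {
        int f = flag ? 1 : 0;      // booleans are handled as ints in bytecode
        return b + s + c + f + i;  // operands are promoted to int before adding
    }

    public static void main(String[] args) {
        // 1 + 2 + 65 ('A') + 1 + 4
        System.out.println(sum((byte) 1, (short) 2, 'A', true, 4)); // prints 73
    }
}
```

This only shows what the bytecode looks like. How the JIT-compiled machine code lays out those values is still up to the JVM.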
I'm trying to figure out how Java structures/allocates memory for objects. (Yes this is implementation specific. I'm using the Oracle 1.7 runtime for this.) I did some work on this here and here and the results are confusing.
First off, in both referenced links, when I allocated an array of objects, the equivalent of new Object[10000], it used 4 bytes per object. On a 32-bit system this makes perfect sense. But I'm on a 64-bit system so what's going on here?
Is Java limited to a 32-bit address space even on 64-bit systems?
Is Java limited to a 32-bit address space per array and each array object then has a pointer to where the elements are?
Something else?
Second, I compared the memory footprint of 8 booleans vs. a byte as the variables in a class. The 8 booleans requires 24 bytes/object or 3 bytes/boolean. The single byte approach requires 10 bytes/object.
What is going on with 3 bytes/boolean? I would understand 4 (making each an int) and definitely 1 (making each a byte). But 3?
And what's with expanding a byte to 10 bytes? I would understand 8 where it's expanding a byte to a native int (I'm on a 64-bit system). But what's with the other 2 bytes?
And in the case of different ways to create a RGB class it gets really weird.
For a class composed of 3 byte variables it uses 24 bytes/instance. Expensive but understandable as each uses an int.
So I tried where each class is a single int with the RGB stored in parts of the int (using bit shifting). And it's still 24 bytes/instance. Why on earth is Java taking 3 ints for storage? This makes no sense.
But the weirdest case is where the class has a single variable of "byte[] color = new byte[3];" Note that the byte array is allocated, so it's not just a null pointer. This approach takes less memory than the other two approaches. How???
Any guidance as to what is going on here is appreciated. I have a couple of classes that get allocated a lot and the flyweight pattern won't work (these objects have their values changed all over the place).
And an associated question: does the order of declaring variables matter? Back in the old days when I did C++ programming, declaring "int, byte, int, byte" used 4 ints' worth of space while "int, int, byte, byte" used 3.
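For readers following along, the "single int with bit shifting" variant described above could look roughly like the sketch below. The class name PackedRgb and the 0x00RRGGBB layout are assumptions, not the asker's actual code. It makes no claim about the per-instance footprint, which depends on the JVM's object header and alignment rather than on the field alone.

```java
// Hypothetical sketch of an RGB value packed into one int as 0x00RRGGBB.
final class PackedRgb {
    private int rgb;

    PackedRgb(int r, int g, int b) {
        set(r, g, b);
    }

    // Each channel is masked to 8 bits before being shifted into place.
    void set(int r, int g, int b) {
        rgb = ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    int red()   { return (rgb >>> 16) & 0xFF; }
    int green() { return (rgb >>> 8) & 0xFF; }
    int blue()  { return rgb & 0xFF; }
}
```

Usage would be something like `new PackedRgb(255, 128, 7).green()`, which returns 128.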
There are far too many questions here but I'll address two of them.
Is Java limited to a 32-bit address space even on 64-bit systems?
32-bit Java is limited to a 32-bit address space. 64-bit Java is not.
Is Java limited to a 32-bit address space per array and each array object then has a pointer to where the elements are?
A Java array is indexed by an int which is 32 bits.
Understood that you are on a 64bit system but are you running a 32bit or 64bit jvm? The JVM itself has an influence on many optimizations that will affect the results of your testing. There are many JVMs (although admittedly only a few are popular) and some may have better, or different, optimizations than others.
On the Oracle website, it says that bools take 32 bits on the stack, but 8 in an array. I'm having trouble understanding why they would take less space in a group than as single values. How are they stored, and what difference does it make? If arrays of bools are more efficient, why has that technology not been transferred over to singles?
Also, why not 1 bit?
And what is the difference between how a 64 bit system and a 32 bit system store these?
Thanks!
A boolean value can be stored as a single binary digit, but our computers group values as a convenience. The smallest unit practically dealt with is a byte, the next largest being a word. A byte is, in modern hardware, always 8 bits. 32 bits has emerged as the standard for a word. Even our 64 bit computers can deal in 32 bit words effectively. It is much more convenient to store a bool in whatever unit comes naturally than as a single bit. In an array, the natural unit would be a byte, since you can address any byte in memory. On the stack, which is a word stack, the natural unit is a word. You could stuff bools into bytes and words and work on pulling them out again bit by bit, literally, but that's less efficient than storing them in bytes or words because modern memories are large, so CPU speed is more of a concern. You wouldn't want to waste all of the time it takes to pack bits in compactly, so we waste memory instead, since it is more expendable.
Look, when it comes to stack, one must keep in mind that speed is the most important thing. For example, consider the following:
void method(int foo, boolean bar, String name) ....
Then the stack just after entering the method looks like this:
|-other variables-|-...-|-name-|-bar-|-foo-|---- return address etc. --
^
stack pointer
These are all quantities on a word boundary, symbolized by |. Sure, the JVM could (theoretically, but see below) store the boolean in a single byte. But one must keep in mind that 32bit loads may be slower when they don't address word boundaries. Depending on the architecture, it may be impossible to go through a pointer that does not live on a word boundary. Or it may be impossible to use the quantity in a floating point instruction, etc. etc.
In addition, the byte code format can only address the n-th word on the stack. If this were not so, addresses relative to the stack pointer would have to be specified in bytes and this would mean that almost any stack access would have two bits that are irrelevant most of the time, as the majority of arguments will be words (int, float or reference) or double words (long, double).
What is never possible is to use 1 single bit for booleans. Why? Because bits are not directly addressable. The smallest addressable unit is the byte.
You can still store 32 booleans in an int if you feel that you should save on memory.
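A minimal sketch of that idea, assuming a hypothetical BitFlags class (the name and methods are mine, not from any library). The standard library's java.util.BitSet does the same thing for an arbitrary number of bits.

```java
// Hypothetical helper that packs 32 boolean flags into a single int.
final class BitFlags {
    private int bits; // bit i holds flag i, for i in 0..31

    void set(int index, boolean value) {
        checkIndex(index);
        if (value) {
            bits |= (1 << index);   // turn the bit on
        } else {
            bits &= ~(1 << index);  // turn the bit off
        }
    }

    boolean get(int index) {
        checkIndex(index);
        return (bits & (1 << index)) != 0;
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= 32) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Note the trade-off mentioned in the answers: every read and write now costs a shift and a mask, which is the price paid for the saved memory.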
Because of the way CPUs work, operations are typically done on 32 bit words. If you have a single bool, the only realistic thing a compiler can do is zero out the remaining 24 bits and save that to the stack, since it's not practical to scan your Java file for other bools and store them all in the same 32 bit memory block.
If you have an array of bools, it's simple to just reference them in blocks of 4, so it's only 8 bits per bool.
Note that this only applies to 32 bit applications/machines.
In Java, is a local variable allocated a maximum memory space of 32 bits? If it is, what happens if I use a local variable of data type long (64 bits) in a method in my java code? In what way would memory be allocated to this variable?
Whenever I googled for an answer, I only got explanations of the Java memory areas, which explain where a local variable gets its memory (in the frame of the method concerned, on the stack; that much I already know). That is not a relevant response to my query.
The original VM specification is actually really messed up with regard to local variables. Each local variable is reserved a "slot" on the stack (simply an index number) and each slot is supposed to hold 4 bytes. So each variable is mapped to one "slot". But variables that occupy more than 4 bytes (double, long) need to occupy two consecutive slots. References do occupy one slot, however, although they may be 8 bytes on a 64 bit VM. There was no 64 bit VM when this was specified, hence the specification assumed 32 bit references.
In practice, I'm pretty sure any current VM will remap the stack slots as it sees fit, and the actual size reserved on the stack will also be decided by the VM. So all that remains is a peculiar slot allocation scheme in the bytecode. All that "slot" stuff is purely on the bytecode level; the VM doesn't need to physically adhere to the slot layout the bytecode specifies.
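The slot numbering is easy to observe. Below is a small sketch with a made-up class (LocalSlots). Compile it and run `javap -c LocalSlots`: with the standard javac, the long parameter takes slots 1 and 2, so the last int ends up in slot 3.

```java
// Hypothetical class for illustration only.
final class LocalSlots {
    // Static method, so there is no `this` in slot 0.
    // Slot layout in the bytecode: a -> 0, b -> 1 and 2, c -> 3.
    // javap shows: iload_0, i2l, lload_1, ladd, iload_3, i2l, ladd, lreturn
    static long combine(int a, long b, int c) {
        return a + b + c;
    }

    public static void main(String[] args) {
        System.out.println(combine(1, 2L, 3)); // prints 6
    }
}
```

This is only the bytecode-level view; as said above, the VM is free to lay out the real stack frame differently.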
Take a look into the bytecode specification: http://docs.oracle.com/javase/specs/jvms/se5.0/html/Overview.doc.html#17257
JVMs usually word-align local variables on the stack, which means that they take up 32 bits on a 32 bit JVM (except for longs and doubles, which will take up 64 bits) and that they will take up 64 bits on a 64 bit JVM. The JVM is allowed to pack the variables so that they take up less space (e.g. putting 4 bytes in a 32 bit word rather than putting 4 bytes in 4 separate words), but this is slower than having all of the variables be word aligned since the processor will have to unpack them before using them.
A single byte takes up four bytes of space inside the Java virtual machine (32 bit processor).
Yes, we can use an array of bytes, which would occupy only the amount of space it actually needs. But I want to use a single byte, not an array of bytes.
So, is there any type in Java to represent an 8 bit datum?
A single byte can be allocated more than a single byte of storage, for memory alignment reasons.
Do not worry about the target processor. An array of 10000 bytes will be stored in approximately 10000 bytes of space.
is there any type in Java to represent an 8 bit datum.
Yes, it is called byte.
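A quick sketch of what "8 bit datum" means at the language level, whatever the VM does with the storage. The class name ByteFacts and the helper toUnsigned are hypothetical; Byte.SIZE, Byte.MIN_VALUE and Byte.MAX_VALUE are standard constants.

```java
// Hypothetical demo class; only the java.lang.Byte constants are real API.
final class ByteFacts {
    // Reinterpret the 8 bits of a (signed) byte as a value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);      // 8 (bits of value range)
        System.out.println(Byte.MIN_VALUE); // -128
        System.out.println(Byte.MAX_VALUE); // 127

        byte b = (byte) 200;                // narrowing keeps only the low 8 bits
        System.out.println(b);              // -56
        System.out.println(toUnsigned(b));  // 200
    }
}
```

So the value always behaves as 8 bits, even if the slot or field holding it happens to be wider.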
How much a single byte actually needs only depends on the Java VM.
It's up to the implementation (JVM) how to deal with the internal types. I guess any JVM on an 8bit machine uses 1 byte for the type byte - on 32bit or 64bit machines this might not always be the case, as you noticed :)
If you use byte then Java will use the most efficient method to store it. Might be 8 bits, might be 64 bits, but whatever it is it's for a good reason. Don't fight the compiler, it knows better than you.
A byte does represent an 8 bit datum. Why do you care how many bytes an implementation of a vm uses to store it?
If int is enough for a field, and if I use long for some reason, would that cost me more memory?
In Java, yes, a long is 8 bytes wide and an integer is 4 bytes wide. This Java tutorial goes over the primitive data types. If you multiply the number of allocations by a certain amount (say, if you're allocating five million of these variables), the difference becomes more than negligible. For the average usage, however, it doesn't matter as much.
(You're already using Java, memory's kind of all over the place anyway.)
In native languages, there's a performance consideration; a 32-bit value can be held in a single register on a 32-bit architecture but not a 64-bit value; on 64-bit architectures, obviously, it can. I'm not sure what kind of optimization Java does on its native integers, but this might be true in its runtime as well. There's alignment issues to worry about, as well -- you see this more with using shorts and bytes, though.
Best practice would be to use the type you need. If the value will never exceed 2^31 - 1, don't use a long.
Assuming from your previous questions that you mean to ask this in the scope of Java, the int data type is four bytes and the long data type is eight bytes.
However, whether the difference in size actually means a difference in memory usage depends on the situation.
If it's a local variable, it's allocated on the stack. As the stack is already allocated, using more stack space will not use more memory, provided of course that you don't exhaust the stack.
If it is a member of a class, it will depend on how the members are aligned. Sometimes members are not stacked compactly in memory, but padding is used so that some members start at an even address. If you for example have a byte and an int in a class, there will likely be three bytes of padding between them so that the int starts at the next address divisible by four.
int is 32 bit and long is 64 bit. long takes twice as much memory (which is pretty insignificant for most applications).
long in Java is 64bit, and int is 32bit, so obviously longs use more memory (8 bytes instead of 4 bytes).
ifwdev guessed correctly. Java defines an int as a 32-bit signed integer, and a long as a 64-bit signed integer. If you declare a variable as a long, then yes, it will take twice as much memory as the same variable declared as an int. In general, int is typically the "default" numeric type, even for values that could be contained in smaller types like a short. Unless you have a particular reason for requiring values greater than 2^31-1, use an int.
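To make the 2^31-1 limit concrete, here is a small sketch of what happens at the boundary. The class name IntRange and its two helper methods are made up; Math.addExact is a standard method (Java 8 and later) that throws instead of wrapping.

```java
// Hypothetical demo class showing int overflow versus long arithmetic.
final class IntRange {
    // Plain int addition silently wraps around on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Widening one operand to long first keeps the full result.
    static long wideAdd(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1)); // -2147483648
        System.out.println(wideAdd(Integer.MAX_VALUE, 1));     // 2147483648
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

If your values can realistically get near that boundary, that is the "particular reason" to reach for long.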
...would that cost me more memory?
You'll be using twice as much memory.
Before worrying about if you're using more memory or not, you should profile.
To use 1 megabyte extra of ram using long rather than int you'll have to declare: 262,144 long variables (or use them indirectly in your program ).
So if for some reason you declare one or two long variables when ints should be used, you'll be using 4 or 8 bytes more memory. Not too much to worry about (I mean, there might be worse memory problems in your app).
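The 262,144 figure is just 1 MiB divided by the 4 extra bytes per value. A tiny sketch of the arithmetic follows; the class and method names are made up, and it assumes each value really occupies exactly Integer.BYTES or Long.BYTES, as in a primitive array. Fields and locals may be padded differently by the JVM.

```java
// Hypothetical helper reproducing the back-of-the-envelope calculation.
final class ExtraMemory {
    // How many int-to-long swaps does it take to use `extraBytes` more memory,
    // assuming 4 bytes per int and 8 bytes per long with no padding?
    static long valuesForExtraBytes(long extraBytes) {
        int extraPerValue = Long.BYTES - Integer.BYTES; // 8 - 4 = 4
        return extraBytes / extraPerValue;
    }

    public static void main(String[] args) {
        long oneMebibyte = 1024L * 1024L;
        System.out.println(valuesForExtraBytes(oneMebibyte)); // prints 262144
    }
}
```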
Taken from the Java Tutorial here's the definition of int and long
int: The int data type is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive). For integral values, this data type is generally the default choice unless there is a reason (like the above) to choose something else. This data type will most likely be large enough for the numbers your program will use, but if you need a wider range of values, use long instead.
long: The long data type is a 64-bit signed two's complement integer. It has a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive). Use this data type when you need a range of values wider than those provided by int.
But remember: "Premature optimization is the root of all evil" according to Donald Knuth ( according to me Copy/Paste is the root of all evil though )
If you know your data will fit in a specific data type (say short int in C), the only reason to use a bigger one is performance right? And if that's your goal, regardless of how marginal your performance gain is, as a general rule of thumb you want to use a size that matches your architecture's size (so for a normal 32-bit target system, you'd use a 32-bit type).
If you target more than one system, you can use a data type that matches the most often used one.