How is a reference to a Java object implemented? - java

Is a pointer simply used to implement a Java reference variable, or how is it really implemented?
Below are the lines from the Java Language Specification:
4.3.1 Objects: An object is a class instance or an array. The reference values (often just references) are pointers to these objects, and a special null reference, which refers to no object.
Does that mean it is pointer all the time?

In modern JVMs, references are implemented as an address.
Going back to the first version of HotSpot (and a bit earlier for the "classic VM"), references were implemented as handles, that is, a fixed pointer to a pointer. The first pointer never changes for any particular object, but the second pointer is updated whenever the object data itself is moved. Obviously this extra indirection costs performance, but it makes a GC easier to write.
In the latest builds of JDK 7 there is support for "compressed oops". I believe BEA JRockit has had this for some time. Moving to 64-bit systems requires twice as much memory, and hence bandwidth, for addresses. "Compressed oops" take advantage of the least significant three or four bits of an address always being zero (because objects are 8- or 16-byte aligned). 32 bits of data are shifted left by three or four bits, allowing a 32 GB or 64 GB heap instead of 4 GB.
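The shift arithmetic behind compressed oops can be sketched like this. It is a simplified model, not HotSpot's code: the heap base address and the 3-bit shift (8-byte alignment) are assumptions chosen for illustration, and `encode`/`decode` are hypothetical names.

```java
// Simplified model of "compressed oops": a 64-bit address is stored in 32 bits
// by dropping the low bits that alignment guarantees are zero.
// HEAP_BASE and SHIFT are illustrative assumptions, not real JVM values.
public class CompressedOops {
    static final long HEAP_BASE = 0x8_0000_0000L; // hypothetical start of the heap
    static final int SHIFT = 3;                   // 8-byte alignment -> 3 zero bits

    // Compress a full address into an unsigned 32-bit value held in an int.
    static int encode(long address) {
        long offset = address - HEAP_BASE;
        if ((offset & ((1L << SHIFT) - 1)) != 0) {
            throw new IllegalArgumentException("address is not aligned");
        }
        long narrow = offset >>> SHIFT;
        if (narrow > 0xFFFF_FFFFL) {
            throw new IllegalArgumentException("address outside compressible range");
        }
        return (int) narrow;
    }

    // Expand the 32-bit value back into a full address.
    static long decode(int narrow) {
        return HEAP_BASE + (Integer.toUnsignedLong(narrow) << SHIFT);
    }

    // Largest heap reachable with 32 bits shifted left by `shift`.
    static long maxHeapBytes(int shift) {
        return (1L << 32) << shift;
    }

    public static void main(String[] args) {
        long addr = HEAP_BASE + 0x1234_5678L * 8;
        System.out.println(decode(encode(addr)) == addr); // true
        System.out.println(maxHeapBytes(3) >> 30);        // 32 (GB)
        System.out.println(maxHeapBytes(4) >> 30);        // 64 (GB)
    }
}
```

With a shift of 0 the same formula gives the familiar 4 GB limit of plain 32-bit references.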

You can actually go and get the source code from here: http://download.java.net/jdk6/source/
The short answer to your question is: yes, there is a pointer to a memory location for your Java variables (and a little extra). However, this is a gigantic oversimplification. There are many, many C++ objects involved in moving Java variables around in the VM. If you want to get your hands dirty, take a look at the hotspot\src\share\vm\oops directory.
In practice none of this matters when developing in Java, though, as you have no direct way of working with it (and you wouldn't want to anyway: the JVM is optimized for various processor architectures).

The answer is going to depend on each JVM implementation, but the best way to think of it is as a handle: a value that the JVM can use to look up the memory location of the referenced object, in a table or some similar structure. That way the JVM can move objects around in memory during garbage collection without updating every pointer that refers to them.
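A toy model of the handle idea, assuming a simulated heap (an `Object[]`) and a handle table (a list of addresses). None of these names exist in a real JVM; the point is only that moving an object touches one table entry while every copy of the handle stays valid.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of handle-based references (hypothetical names, not a real JVM API).
// A handle is an index into a table; the table entry holds the object's
// current simulated address, so moving an object only updates one entry.
public class HandleTable {
    private final List<Integer> addresses = new ArrayList<>(); // handle -> address
    private final Object[] heap;                               // simulated memory

    HandleTable(int heapSize) {
        heap = new Object[heapSize];
    }

    // Place an object at a simulated address and return a new handle for it.
    int allocate(Object value, int address) {
        heap[address] = value;
        addresses.add(address);
        return addresses.size() - 1;
    }

    // Dereferencing takes two steps: handle -> address -> object.
    Object deref(int handle) {
        return heap[addresses.get(handle)];
    }

    // What a compacting GC might do: copy the object, then patch one table entry.
    void move(int handle, int newAddress) {
        int oldAddress = addresses.get(handle);
        if (oldAddress == newAddress) {
            return;
        }
        heap[newAddress] = heap[oldAddress];
        heap[oldAddress] = null;
        addresses.set(handle, newAddress);
    }

    int addressOf(int handle) {
        return addresses.get(handle);
    }

    public static void main(String[] args) {
        HandleTable table = new HandleTable(16);
        int h = table.allocate("hello", 10);
        int copyOfH = h;                           // a second "variable" holding the same handle
        table.move(h, 2);                          // the object moves; handles do not change
        System.out.println(table.deref(copyOfH));  // hello
        System.out.println(table.addressOf(h));    // 2
    }
}
```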

A primitive type is always passed by value, whereas a variable of a class type is actually a reference variable for the object.
Consider a primitive type:
int i=0;
Now suppose the value of this variable is stored at memory address 2068.
Every time you pass this variable as a parameter, a new copy of the value is created, because Java passes by value, not by reference.
Now consider a variable of a class type:
MyClass C1 = new MyClass();
This creates an object of type MyClass and a variable named C1.
The variable C1 contains the address of the memory location of the object it is linked to. So basically, C1 points to the object created by new MyClass().
Local variables of primitive type are stored on the stack, and objects on the heap (a primitive field of an object lives inside that object, on the heap).
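A small sketch of the difference described above. `MyClass` is borrowed from the answer and given a made-up `value` field; both methods receive copies, but for the object the copy is a reference to the same object.

```java
// Both parameters are copies of the caller's values. For the int, the copy is
// the number itself; for MyClass, the copy is the reference, so the caller's
// variable and the parameter locate the same object.
class MyClass {
    int value; // hypothetical field, added for illustration
}

public class PassByValueDemo {
    static void changePrimitive(int i) {
        i = 42;          // changes only the local copy
    }

    static void changeObject(MyClass c) {
        c.value = 42;    // follows the copied reference to the shared object
    }

    public static void main(String[] args) {
        int i = 0;
        changePrimitive(i);
        System.out.println(i);         // 0

        MyClass c1 = new MyClass();
        changeObject(c1);
        System.out.println(c1.value);  // 42
    }
}
```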

Does that mean it is pointer all the time?
Yes, but it can't be manipulated the way you normally would in C.
Bear in mind that, since Java is a different programming language that relies on its VM, this concept (pointer) should be used only as an analogy to better understand the behavior of such artifacts.

Related

Java references and primitives

In Java, when we assign an object to a variable of the matching class type, the variable only contains a reference to the memory location where the object is stored.
Is the case same with Primitive data types as well?
I mean, in int i = 10;, does i store the address of the memory location where the value 10 is stored?
PS: In sharp contrast, C++ actually stores the objects and not the references, right? Unless we use pointers and reference variables, right?
In Java, everything is stored by value. The value of an Object type, in contrast to a primitive, is the reference. Note that the wrapper types (like Integer) cache instances for small values (-128 to 127 for Integer).
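The wrapper caching can be observed with `==`, which compares references. The JLS boxing rules and the `Integer.valueOf` documentation guarantee cached instances for -128 to 127; outside that range identity is not guaranteed, so the second line below is only the usual result with default JVM settings.

```java
// Integer.valueOf (and autoboxing) returns cached instances for -128..127, so
// == compares equal references there. Outside that range == is not reliable;
// compare values with equals() instead.
public class IntegerCacheDemo {
    static boolean sameInstance(int v) {
        Integer a = Integer.valueOf(v);
        Integer b = Integer.valueOf(v);
        return a == b; // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));   // true: guaranteed by the cache
        System.out.println(sameInstance(1000));  // usually false, but not guaranteed
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
    }
}
```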
Indeed, in Java, primitives are always handled by value and objects are always handled by reference. Note however that these are the semantics; i.e., what the meaning of Java code is supposed to be. A particular implementation of Java (i.e., a JVM) is free to manage memory however it likes internally, as long as it appears to obey the correct semantics for anything that can be observed (i.e., output of the program).
And your PS remark is also correct.

Difference between reference and pointer [duplicate]

This question already has answers here:
Is Java "pass-by-reference" or "pass-by-value"?
I’ve read a lot of articles about how “pass-by-reference” doesn’t exist in Java since a copy of the value of the reference is passed, hence “pass-by-copy-of-reference-value”.
The articles also say a reference value is a pointer.
(So pointers do exist in Java.)
Some other articles say: Java has no pointers.
So which is correct?
How does a pointer differ from a reference (or reference value), and do they exist in Java?
They aren't like C pointers. There's no pointer arithmetic allowed.
Java has only one mechanism for passing parameters: pass by value in all cases. For primitives, the value is passed. For objects, the reference to the object on the heap is passed.
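The classic way to check this is a swap method. If Java passed objects by reference, swapping the parameters would swap the caller's variables; because only copies of the reference values are passed, nothing changes for the caller. The class and method names are illustrative.

```java
// A swap through parameters: the method receives copies of the two reference
// values, so reassigning the parameters cannot affect the caller's variables.
public class SwapDemo {
    static void swap(StringBuilder x, StringBuilder y) {
        StringBuilder tmp = x;
        x = y;   // reassigns the local copy only
        y = tmp; // reassigns the local copy only
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder("a");
        StringBuilder b = new StringBuilder("b");
        swap(a, b);
        System.out.println(a + " " + b); // a b  (not swapped)
    }
}
```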
A pointer is a reference type; it refers to something. What you're basically asking is: "Does Java have Dobermans? Because some articles say it has dogs."
As noted in the Wikipedia entry for Pointer:
A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages support some type of pointer, although some have more restrictions on their use than others.
It goes on to say this about Java specifically:
Unlike C, C++, or Pascal, there is no explicit representation of pointers in Java. Instead, more complex data structures like objects and arrays are implemented using references. The language does not provide any explicit pointer manipulation operators. It is still possible for code to attempt to dereference a null reference (null pointer), however, which results in a run-time exception being thrown. The space occupied by unreferenced memory objects is recovered automatically by garbage collection at run-time.
Looking up Reference you find:
In computer science, a reference is a value that enables a program to indirectly access a particular datum, such as a variable or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference.
A reference is distinct from the data itself. Typically, a reference is the physical address of where the data is stored in memory or in the storage device. For this reason, a reference is often called a pointer or address, and is said to point to the data. However a reference may also be the offset (difference) between the datum's address and some fixed "base" address, or an index into an array.
Java chose to use the broader term "reference" instead of "pointer" because of the differences between Java and C. (Thus creating a Sisyphean situation where we have to keep explaining that Java is pass-by-value.)
You don't have a C pointer, you have a Java Reference. This has nothing to do with a C++ reference, or pass-by-reference.
Because Java is pass-by-value it is similar to using a C pointer in that when you pass it to a method, the value (e.g. memory address) is copied.
It is right to say both. :)
Java has no pointers, since Java has simplified pointers into references.
Object o = new Object();
Here o looks like an object, but o is actually a pointer.
Basically, pointers and references are the same thing; they point to (refer to) something else in memory. However, you cannot do integer arithmetic on references. You may find some pages of this slide deck useful:
http://www.cis.upenn.edu/~matuszek/cit594-2005/Lectures/15-pointers-and-references.ppt
You have to get your head around the different, but related concepts of types, variables and objects. If we ignore for now the fundamental types like int and only consider class types, then in Java there are variables, which are "named things", and objects. Both variables and objects have a type. However, a variable of type T is not an object; rather, it is a mechanism for locating an object of type T, and for informing the runtime that this object is in use. A variable may at any point not locate any object, in which case it is null, or it may, and in that case the very existence of the variable keeps the object alive.
Let's repeat: Variables have names. Objects don't have names. Variables are not objects.
When you pass a variable as an argument into a function call, the corresponding function parameter becomes a duplicate of the argument, so that there are now two variables which both locate the same object. When you assign one variable to another, you make the left-hand variable locate the same object (possibly null) as the right-hand variable, relinquishing the object it previously located, if any. But no objects are affected by this; the objects exist in some unrelated, unprobeable plane of existence.
Also, variables have a deterministic lifetime, which is determined by their scope (essentially block-local or static-global). The lifetime of variables is only non-deterministically related to the lifetime of objects, and the lifetime of objects cannot be controlled directly.
That's the type system and object model of Java (for class types) in a nutshell. It's up to you what you want to label this; it makes sense to say that "variables are references", since that's what they do, but you might as well stop trying to compare this to other languages and just say "variables", which is clear enough within the context of Java. Variables are variables, objects are objects, neither one is ever the other, and you need the former to talk about the latter.
In Java, a reference is a pointer, usually one that isn't null. That's why it's called NullPointerException, not NullReferenceException. "The reference values (often just references) are pointers to these objects, and a special null reference, which refers to no object. "
Java pointers/references are akin to Pascal pointers, not to C or C++ pointers, in that they are very strongly typed and do not support address arithmetic.

How are class declarations and definitions stored in object-oriented languages (C++) after compilation?

I understand how the memory is organised for C programs (the stack, the heap, function calls, etc.).
Now, I really don't understand how all these things work in object-oriented languages (to be more specific, C++).
I understand that whenever I use the new keyword, the space for the object is allocated on the heap.
Some of my basic questions regarding this are:
1) Are class definitions stored somewhere in memory during execution of the program?
2) If yes, then where and how are they stored? If no, then how are the functions dispatched at run time (in the case of virtual/non-virtual functions)?
3) When an object is allocated memory, what details about the object are stored in it? (Which class it belongs to, the member functions, the public/private variables/functions, etc.)
So basically, can someone please explain how the object oriented code gets converted after/during compilation so that these O.O.P. features are implemented?
I am comfortable with Java/C++. So you can explain the logic with either of the languages since both have quite distinct features.
Also, please add any reference links so that I can read it from there too, just in case some further doubts arise!
Thanks!
1) Are class definitions stored somewhere in memory during execution of the program?
In C++, no. In Java, yes.
2) If yes, then where and how is it stored. If no, then how are the functions dispatched at run time (in case of the virtual/non-virtual functions).
In C++, calls to non-virtual functions are replaced by the compiler with the actual static address of the function; calls to virtual functions work through a virtual table. new is translated to memory allocation (the compiler knows the precise size) followed by a call to the (statically-determined) constructor. A field access is translated by the compiler to accessing memory in a statically-known offset from the beginning of the object.
It's similar in Java - in particular, a virtual table is used for virtual calls - except that field access can be done symbolically.
3) When an object is allocated memory, what details about the object are stored in it? (Which class it belongs to, the member functions, the public/private variables/functions, etc.)
In C++ - no metadata is stored (well, with the exception of some bits needed for RTTI). In Java you get type information and visibility for all members and a few other things - you can check out the Java class file definition for more information.
So basically, can someone please explain how the object oriented code gets converted after/during compilation so that these O.O.P. features are implemented?
As you can see from my answers above, it really depends on the language.
In a language like C++, the heavy lifting is done by the compiler, and the resulting code has very little to do with object-oriented concepts; in fact, the typical target language for a C++ compiler (native binary code) is untyped.
In a language like Java, the compiler targets an intermediate representation which usually contains a lot of extra details - type information, member visibility, etc. This is also what enables reflection in those sorts of languages.
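On the Java side, the retained metadata is directly visible through reflection. A minimal sketch, assuming a made-up `Point` class: the running program asks an object for its class and reads field names, types and modifiers that a C++ binary would normally not keep. `getDeclaredFields` does not guarantee any particular order, so the output order may vary.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Java keeps class metadata (field names, types, modifiers) at run time, and
// the reflection API reads it. Point is a hypothetical example class.
public class ReflectionDemo {
    static class Point {
        private int x;
        public int y;
    }

    // Describe each declared field as "modifiers type name", one per line.
    static String describe(Class<?> cls) {
        StringBuilder sb = new StringBuilder();
        for (Field f : cls.getDeclaredFields()) {
            sb.append(Modifier.toString(f.getModifiers()))
              .append(' ').append(f.getType().getName())
              .append(' ').append(f.getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object obj = new Point();
        // The object itself knows its class at run time.
        System.out.println(obj.getClass().getSimpleName()); // Point
        System.out.print(describe(obj.getClass()));         // field order not guaranteed
    }
}
```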
Are class definitions stored somewhere in memory during execution of the program?
Definitions are not preserved - at least not in the sense of maintaining the information you have at compile time.
When an object is allocated memory, what details about the object are stored in it? (Which class it belongs to, the member functions, the public/private variables/functions, etc.)
During compilation, things like references to fields are transformed into dereferences of pointers with a fixed offset. For example, a->first might be translated as something like *(a + 4), a->second as *(a + 8), and so on. The actual numbers will depend on the sizes of the previous fields, the target architecture, etc.
Similar things apply for the size of the objects (for purposes of allocation and deallocation).
In short, the sizes of the objects and the offsets of their fields are known at compile time and they are replaced in the actual binary.
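The fixed-offset translation can be modeled in Java with a raw `ByteBuffer` standing in for the object's memory. This is only an analogy: the struct, the offsets 4 and 8, and the 12-byte size are hypothetical values matching the example above, not any real compiler's layout.

```java
import java.nio.ByteBuffer;

// Models how a compiler turns field accesses into memory accesses at fixed
// offsets. The "object" is a raw block of bytes for a hypothetical struct;
// FIRST, SECOND and SIZE are illustrative values, not taken from a real ABI.
// Bytes 0..3 stand in for whatever precedes `first` (another field, say).
public class FieldOffsets {
    static final int FIRST = 4;   // a->first  becomes roughly *(a + 4)
    static final int SECOND = 8;  // a->second becomes roughly *(a + 8)
    static final int SIZE = 12;   // object size, also fixed at compile time

    // "new" for the hypothetical struct: reserve SIZE bytes.
    static ByteBuffer allocate() {
        return ByteBuffer.allocate(SIZE);
    }

    static void setFirst(ByteBuffer a, int v)  { a.putInt(FIRST, v); }
    static int getFirst(ByteBuffer a)          { return a.getInt(FIRST); }
    static void setSecond(ByteBuffer a, int v) { a.putInt(SECOND, v); }
    static int getSecond(ByteBuffer a)         { return a.getInt(SECOND); }

    public static void main(String[] args) {
        ByteBuffer a = allocate();
        setFirst(a, 10);
        setSecond(a, 20);
        System.out.println(getFirst(a) + getSecond(a)); // 30
    }
}
```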
If no, then how are the functions dispatched at run time (in case of the virtual/non-virtual functions).
Things like virtual method calls are typically translated in a similar way as fields, since they too can be considered "fields" of a hidden data structure (called vtable) of that class. A pointer to the vtable of a given class is stored in every object (of that class), if it has virtual methods.
The correct implementations of non-virtual methods are known at compile time and thus these methods can be "linked" on the spot without the use of a vtable.
Details may differ, but generally for each C++ class we have:
a set of its methods, each method is just a function, and
virtual methods table: an array where each element refers to a method of this class or one of its superclasses
An object without virtual methods is just a structure like in C. As soon as a virtual method is declared, object gets a hidden field which refers to the virtual table (below, vmt).
Invocation of non-virtual method obj.m(arg) is converted to invocation of the C-like function m$(obj, arg), where m$ is some artificial identifier generated by the C++ compiler to distinguish the method named m from identically named methods in other classes.
Invocation of virtual method obj.m(arg) is converted to (obj->vmt[N])(obj, arg), that is, actual function is taken from the object's virtual table. Each method has its own number in the table. This number is known at compile time and hardcoded into the invocation instruction sequence.
No other information is saved/used in runtime for ordinary execution. More information can be kept for debugging purposes.
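The (obj->vmt[N])(obj, arg) scheme can be mimicked in Java to make the moving parts explicit. This is a toy model, not how any compiler or JVM is written: `Obj`, `Method`, the two tables and slot `GREET` are hypothetical, and the hidden `this` becomes an explicit first argument.

```java
// Toy model of virtual dispatch: each "class" has a table of functions, each
// "object" holds a hidden pointer to its class's table, and a call site uses a
// slot number fixed at compile time: (obj->vmt[N])(obj, arg).
// All names are hypothetical; Java itself hides this machinery.
public class VtableModel {
    // A "method" receives the hidden this pointer as its first argument.
    interface Method {
        String invoke(Obj self, String arg);
    }

    // An "object": a pointer to its class's vtable plus its ordinary fields.
    static final class Obj {
        final Method[] vmt;
        final String name;

        Obj(Method[] vmt, String name) {
            this.vmt = vmt;
            this.name = name;
        }
    }

    static final int GREET = 0; // slot number chosen "at compile time"

    // One vtable per class; the Dog table overrides slot GREET.
    static final Method[] ANIMAL_VMT = {
        (self, arg) -> self.name + " makes a sound at " + arg
    };
    static final Method[] DOG_VMT = {
        (self, arg) -> self.name + " barks at " + arg
    };

    // What a compiler would emit for obj.greet(arg).
    static String callGreet(Obj obj, String arg) {
        return obj.vmt[GREET].invoke(obj, arg);
    }

    public static void main(String[] args) {
        Obj a = new Obj(ANIMAL_VMT, "Generic");
        Obj d = new Obj(DOG_VMT, "Rex");
        System.out.println(callGreet(a, "you")); // Generic makes a sound at you
        System.out.println(callGreet(d, "you")); // Rex barks at you
    }
}
```

The call site is identical for both objects; only the table reached through the object's hidden pointer differs.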
Look at the C++ standard to see which requirements all compilers must share. The C++ standard governs some details of how objects are laid out in memory, and these limitations are shared among compilers. However, the remaining details are left to the implementation. Here are the traits I've found common beyond what the standard requires.
A simple object without inheritance or static fields is laid out like you see it. C++ mandates that memory is byte-addressed, but that doesn't mean the data will be packed byte by byte. Each member is aligned according to the compiler's rules for its type (depending on architecture and other factors), so padding can appear between members: for example, a char followed by an int typically leaves three unused bytes before the int. There is no metadata for an object other than a pointer to the virtual function table, if it's needed. When you get to inheritance and multiple inheritance, it becomes more complicated.
Functions are stored separately from the object, and the type you cast the object to determines which functions you'll call; those functions treat the object as whatever type they expect it to be. This all works because, in reality, each member function has a hidden this pointer as its first argument. There are no run-time checks to make sure you're referring to the right object type. If you cast an object to an unrelated type and call a function on it, that function can read or write the wrong memory and crash. There's no type safety on a C-style cast; avoid them.
Then you have the virtual function table, which holds pointers to the functions for the object's actual type. But again, the layout of each table and the slot each call uses are decided at compile time.
When you get to a language that has reflection, this changes drastically.
Type metadata is stored for runtime use, and there are type checks at runtime. You'll get exceptions for calling the wrong method on the wrong type.
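In Java, for instance, every reference cast is checked against the object's real class at run time, so the unchecked C-style cast failure mode turns into an exception. A minimal sketch; `tryCast` is a hypothetical helper.

```java
// Every cast on a reference is checked at run time against the object's actual
// class (which the JVM keeps), so a bad cast throws ClassCastException instead
// of silently reinterpreting memory.
public class CheckedCastDemo {
    static String tryCast(Object o) {
        try {
            String s = (String) o; // checked against o's runtime class
            return "ok: " + s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCast("text"));             // ok: text
        System.out.println(tryCast(Integer.valueOf(1))); // ClassCastException
    }
}
```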

Where is "null" in memory

In Java, you cannot state an array's size in its declaration:
int[5] scores; //bad
I'm told this is because the JVM does not allocate space in memory until an object is initialized.
If you have an instance array variable (auto initialized with a default value of null), does that variable point to a place in the heap indicating null?
A null reference is literally a zero-value. The operating system prevents any program from accessing the zero address in memory, so the JVM will proactively check and make sure that a reference value isn't zero before allowing you to access it. This lets the JVM give you a nice NullPointerException rather than a "The program has performed an Illegal Operation" crash.
So you could say that the variable "points to" an invalid heap location. Or you could just say the variable doesn't "point to" anything. At that point it's just a question of semantics.
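A short sketch of what this looks like from the program's side: a field of array type starts out null, no array object exists yet, and dereferencing it produces a NullPointerException rather than reading some "null object" in the heap. The field and helper names are illustrative.

```java
// A reference-typed field defaults to null: no array object exists yet.
// Dereferencing null makes the JVM throw NullPointerException.
public class NullDemo {
    static int[] scores; // no array allocated yet: the reference is null

    static String lengthOrError(int[] arr) {
        try {
            return "length " + arr.length;
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(scores == null);         // true
        System.out.println(lengthOrError(scores));  // NullPointerException
        scores = new int[5];
        System.out.println(lengthOrError(scores));  // length 5
    }
}
```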
No, because in the JVM there's no need for that.
If you're in a native language (C and C++, for instance), NULL is a pointer with a zero value, and that points to the memory base address. Obviously that's not a valid address, but you "can" dereference it anyway, especially in a system without protected memory, like old MS-DOS or small systems built on embedded processors. Not that it would be a valid address: usually that location contains interrupt vectors and you shouldn't touch them. And of course in any modern OS that will raise a protection fault.
But in the JVM a reference to an object is more like a handle (i.e. an index in a table) and null is an 'impossible' value (an index that is outside the domain of the table), so it can't be dereferenced and doesn't occupy space in such table.
I think this post will answer your question: Java - Does null variable require space in memory
In Java, null is just a value that a reference (which is basically a restricted pointer) can have. It means that the reference refers to nothing. In this case you still consume the space for the reference. This is 4 bytes on 32-bit systems or 8 bytes on 64-bit systems. However, you're not consuming any space for the class that the reference points to until you actually allocate an instance of that class to point the reference at.
Edit: As far as the String, a String in Java takes 16 bits (2 bytes) for each character, plus a small amount of book-keeping overhead, which is probably undocumented and implementation specific.
(remember to upvote the answer in the link if it helps you out)
Nope...you'll have a null (0x00) reference in the object's variable.
I would argue that "int[5] scores; //bad" is not due to memory allocation.
Notice that when you declare something, you are typically really declaring:
Type ReferenceName = new Type()
Observe these two examples:
int[] scores = new int[5];
JLabel label = new JLabel();
The types (on the left-hand side) are int[] and JLabel, which have nothing to do with memory allocation (except for a pointer), while the new instances (on the right-hand side, requiring memory allocation) are int[5], requiring space for 5 ints, and JLabel(), requiring no arguments to call the constructor, but enough memory for a JLabel.
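The same point in runnable form: the declared type int[] carries no length, and the same variable can refer to arrays of different sizes over time, because the size belongs to the array object created by `new`. The helper name `allocate` is illustrative.

```java
// The declared type int[] says nothing about length; the length belongs to the
// array object created by `new`. One variable can refer to arrays of different
// sizes over its lifetime.
public class ArrayTypeDemo {
    static int[] allocate(int n) {
        return new int[n]; // memory for n ints is reserved only here
    }

    public static void main(String[] args) {
        int[] scores = allocate(5);
        System.out.println(scores.length); // 5
        scores = allocate(10);             // same variable, a different object
        System.out.println(scores.length); // 10
        System.out.println(scores[0]);     // 0: int elements default to zero
    }
}
```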

Reference type of JVM

In some Java literature, the statement
"The reference type of the Java virtual machine is cleverly named reference"
is widely popular. However, authors tend not to explain why such a statement is valid. Another thing that would help me understand this better:
What does "the reference type of the JVM" mean? Does the JVM represent itself in the heap?
I would greatly appreciate an explanation of this statement.
Thank you,
Ashmawy
The word you're looking for here is irony:
the use of words to convey a meaning that is the opposite of its literal meaning
The use of "clever" in that sentence is clearly ironic. "The reference type of the Java virtual machine is given the clearly really stupidly obvious name 'reference'" is another way to read that sentence.
I think the "cleverly" part relates to the fact that a reference type is typically called a pointer, which would require the reader to learn two terms. The JVM terminology simply uses the term reference for this.
There's also a historical context.
When Java was introduced, its biggest competitor was C++. C++'s main problem was that it was deemed to be too difficult. Java initially positioned itself as the easy alternative to C++. It had a syntax very close to C++, but all the difficult stuff (operator overloading, templates, multiple pass-by mechanisms, etc.) was removed from the language.
And now comes the catch...
Java was initially marketed as not having pointers. The rationale for saying this was that pointers were deemed the most difficult thing of C++, so if Java would not have them, it had to be a simpler language.
The clever part thus comes from simply inventing another term for 'pointer'. Call them reference and you can state Java does not have pointers (but references).
This has led to many debates and caused a good amount of confusion, especially since C++ already had the term 'reference' and uses it for something else (though conceptually a little related). The debate usually centers around two camps: one claims Java indeed does not have pointers, since you can't do pointer arithmetic with them and they don't directly represent memory addresses, while the other states that you don't have to be able to do arithmetic with a pointer to call it a pointer.
Put differently, whether it was clever to use the term reference is still open for debate.
This becomes clearer when the whole paragraph is taken into context:
The reference type of the Java virtual machine is cleverly named reference. Values of type reference come in three flavors: the class type, the interface type, and the array type. All three types have values that are references to dynamically created objects. The class type's values are references to class instances. The array type's values are references to arrays, which are full-fledged objects in the Java virtual machine. The interface type's values are references to class instances that implement an interface. One other reference value is the null value, which indicates the reference variable doesn't refer to any object.
(Taken from http://javadeveloper-jayaprakash-m.blogspot.com/)
I would assume from this that the "cleverly named" bit is referring to the fact that the references come in three different types and the JVM can distinguish between each one.
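The three flavors from the quoted paragraph can be seen from Java code: a class-typed variable, an interface-typed variable (which still refers to an instance of some implementing class), an array-typed variable, and null. `kind` is a hypothetical helper that inspects what a reference actually points to.

```java
import java.util.ArrayList;
import java.util.List;

// Reference values can point to class instances (class type), to instances of
// classes that implement an interface (interface type), or to arrays (array
// type); null refers to no object. The helper `kind` is illustrative.
public class ReferenceFlavors {
    static String kind(Object ref) {
        if (ref == null) {
            return "null";
        }
        if (ref.getClass().isArray()) {
            return "array";
        }
        return "class instance of " + ref.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        String classTyped = "hi";                          // class type
        List<Integer> interfaceTyped = new ArrayList<>();  // interface type
        int[] arrayTyped = new int[3];                     // array type
        Object nothing = null;                             // the null reference
        System.out.println(kind(classTyped));      // class instance of String
        System.out.println(kind(interfaceTyped));  // class instance of ArrayList
        System.out.println(kind(arrayTyped));      // array
        System.out.println(kind(nothing));         // null
    }
}
```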
Or maybe it is just a way to express the different approach the JVM designers took to memory management.
If you remember, in C/C++ one has the freedom to allocate memory for a variable either on the local stack or on the global heap. In C++ it is possible to allocate an object on a method's local stack and then pass the entire object as a parameter to other methods.
The Java designers took this freedom away from developers. You just cannot create objects on the local stack, only on the global heap. So every variable of a class, interface, or array type is indeed a reference to some memory address in the heap, and you cannot pass an object by value, only a reference to it.
If you don't have a choice, then you don't even need to think about what kind of variable you have: value type or reference type.
