What's the size cost of Java inheritance?

There are various articles out there on the interwebs that try to empirically estimate the overhead of a java.lang.Object in particular JVM implementations. For example, I've seen the size overhead of a bare Object estimated at 8 bytes in some JVMs.
What I would like to know is whether a typical JVM implementation of the extends relationship introduces incremental size overhead at every level of the class hierarchy. In other words, suppose you have a class hierarchy with N levels of subclasses. Is the overhead of the in-memory representation of a class instance O(1) or O(N)?
I imagine it is O(1) because although the size of some of the hidden fluffy stuff you need to be a Java Object (vtable, chain of classes) will grow as the inheritance hierarchy grows, they grow per-class, not per-instance, and the JVM implementation can store constant-size pointers to these entities in a constant-size header attached to every Object.
So in theory, the overhead directly attached to the in-memory representation of any Java object should be O(1) for inheritance depth N. Does anyone know if it's true in practice?

When in doubt, look at the source (well, a source; each JVM is free to choose how to do it, as the standard does not mandate any internal representation). So I had a look, and found the following comment within the implementation of JDK 7-u60's hotspot JVM:
// A Klass is the part of the klassOop that provides:
// 1: language level class object (method dictionary etc.)
// 2: provide vm dispatch behavior for the object
// Both functions are combined into one C++ class. The toplevel class "Klass"
// implements purpose 1 whereas all subclasses provide extra virtual functions
// for purpose 2.
// One reason for the oop/klass dichotomy in the implementation is
// that we don't want a C++ vtbl pointer in every object. Thus,
// normal oops don't have any virtual functions. Instead, they
// forward all "virtual" functions to their klass, which does have
// a vtbl and does the C++ dispatch depending on the object's actual type.
The way I read it, this means that, for this (very popular) implementation, object instances store only a pointer to their class. The per-instance cost of that class having a longer or shorter inheritance chain is effectively zero. The classes themselves do take up space in memory, though (but only once per class). Run-time efficiency of deep inheritance chains is another matter.
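If you want to check this on your own JVM, here is a minimal sketch using the OpenJDK JOL tool (assuming the jol-core library, org.openjdk.jol, is on the classpath; the class names are made up for the example):

import org.openjdk.jol.info.ClassLayout;

public class DepthProbe {
    static class A { }              // extends Object implicitly
    static class B extends A { }    // one level deeper, still no fields
    static class C extends B { }    // three levels of empty subclasses

    public static void main(String[] args) {
        // On HotSpot both printouts show the same instance size: just the
        // object header (mark word + class pointer), regardless of depth.
        System.out.println(ClassLayout.parseInstance(new A()).toPrintable());
        System.out.println(ClassLayout.parseInstance(new C()).toPrintable());
    }
}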

The JVM specification states
The Java Virtual Machine does not mandate any particular internal
structure for objects.
So the specification does not care how you do it. But...
In some of Oracle’s implementations of the Java Virtual Machine, a
reference to a class instance is a pointer to a handle that is itself
a pair of pointers: one to a table containing the methods of the
object and a pointer to the Class object that represents the type of
the object, and the other to the memory allocated from the heap for
the object data.
So in typical Oracle implementations it is O(1) for methods. The method data lives in the method area, which is per class.
The Java Virtual Machine has a method area that is shared among all
Java Virtual Machine threads. The method area is analogous to the
storage area for compiled code of a conventional language or analogous
to the "text" segment in an operating system process. It stores
per-class structures such as the run-time constant pool, field and
method data, and the code for methods and constructors, including the
special methods (§2.9) used in class and instance initialization and
interface initialization.
Also, about method entries
The method_info structures represent all methods declared by this
class or interface type, including instance methods, class methods,
instance initialization methods (§2.9), and any class or interface
initialization method (§2.9). The methods table does not include items
representing methods that are inherited from superclasses or
superinterfaces.
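You can see the same per-class rule from plain Java with reflection (the class names Parent, Child and MethodTables below are mine): getDeclaredMethods() mirrors the class file's own methods table and omits inherited methods, while getMethods() walks the hierarchy at lookup time.

import java.lang.reflect.Method;

class Parent {
    public void inheritedMethod() { }
}

class Child extends Parent {
    public void ownMethod() { }
}

public class MethodTables {
    public static void main(String[] args) {
        // Only ownMethod() is printed: inherited methods are not duplicated
        // in Child's own method table.
        for (Method m : Child.class.getDeclaredMethods()) {
            System.out.println("declared: " + m.getName());
        }
        // inheritedMethod() (and Object's public methods) show up here because
        // the lookup walks the superclasses, not because they are stored again.
        for (Method m : Child.class.getMethods()) {
            System.out.println("visible:  " + m.getName());
        }
    }
}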

An instance generally requires the following data, although it's up to the implementation exactly what to do:
the instance fields of the class and its parent classes, which I assume you don't mean to include in the term "overhead"
some means to lock the object
if the garbage collector relocates objects, then some means to record the original hash of the object (for Object.hashCode)
some means to access type information
As you guessed in your question, in a "normal" Java implementation the type information is stored per-class, not per instance. Part of the definition of "type" is that two instances of the same class necessarily have the same type information, so there's no obvious reason not to share it. So you would expect that per-instance overhead to be constant, not to depend on the class hierarchy.
That is to say, adding extra empty classes or interfaces to a class should not increase the size of its instances. I don't think either the language or the JVM spec actually guarantees this, though, so don't make too many assumptions about what a "non-normal" Java implementation is allowed to do.
As an aside, the second and third things on my list can be combined via cunning trickery, so that both of them together are a single pointer. The article you link to refers to references taking 4 bytes, so the 8 bytes it comes up with for an object is one pointer to type information, one field containing either a hashcode or a pointer to a monitor, and probably some flags in the lowest 2 bits of one or both of those pointer fields. Object will (you'd expect) be larger on a 64-bit Java.
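If you want to poke at those per-instance bytes yourself, here is a minimal sketch of a premain agent using java.lang.instrument (the class name SizeAgent and the jar/manifest setup are my assumptions; getObjectSize reports an implementation-specific shallow size):

import java.lang.instrument.Instrumentation;

public class SizeAgent {
    private static volatile Instrumentation inst;

    // Called by the JVM before main() when started with -javaagent:sizeagent.jar
    // (the jar's manifest needs a Premain-Class: SizeAgent entry).
    public static void premain(String args, Instrumentation instrumentation) {
        inst = instrumentation;
    }

    // Shallow, implementation-specific size of one object, header included.
    public static long sizeOf(Object o) {
        return inst.getObjectSize(o);
    }
}

Comparing sizeOf(new Object()) with the size of an instance of a deep but field-free subclass should show the constant header described above; on a 64-bit HotSpot with compressed oops you would typically see 16 bytes for both, though that number is not guaranteed.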

Double and Integer, which extend Number, which extends Object, do not show O(N) behavior; that is, an Integer is not 3X the size of an Object, so I think the answer is O(1). See, e.g., this old SO question.

in theory, the overhead directly attached to the in-memory representation of any Java object should be O(1) for inheritance depth N. Does anyone know if it's true in practice?
It can't be O(1) unless there are zero instance members at every level. Every instance member requires space per instance.
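In other words, it is the accumulated fields, not the depth itself, that grow the instance. A tiny illustration (the class and field names are mine):

class Level0 { int a; }                    // instance carries a
class Level1 extends Level0 { int b; }     // instance carries a and b
class Level2 extends Level1 { int c; }     // instance carries a, b and c

// A Level2 instance is bigger than a Level0 instance because of the three
// int fields it must store, not because the extends chain is three deep;
// three empty levels would add nothing per instance.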

Related

Reduce visibility of classes and methods

TL;DR: Given bytecode, how can I find out what classes and what methods get used in a given method?
In my code, I'd like to programmatically find all classes and methods having too-generous access qualifiers. This should be done based on an analysis of inheritance, static usage, and also hints I provide (e.g., using some home-brew annotation like @KeepPublic). As a special case, unused classes and methods will get found.
I just did something similar though much simpler, namely adding the final keyword to all classes where it makes sense (i.e., it's allowed and the class won't get proxied by e.g., Hibernate). I did it in the form of a test, which knows about classes to be ignored (e.g., entities) and complains about all needlessly non-final classes.
For all classes of mine, I want to find all methods and classes it uses. Concerning classes, there's this answer using ASM's Remapper. Concerning methods, I've found an answer proposing instrumentation, which isn't what I want just now. I'm also not looking for a tool like ucdetector which works with Eclipse AST. How can I inspect method bodies based on bytecode? I'd like to do it myself so I can programmatically eliminate unwanted warnings (which are plentiful with ucdetector when using Lombok).
Looking at the usage on a per-method basis, i.e. by analyzing all instructions, has some pitfalls. Besides method invocations, there might be method references, which will be encoded using an invokedynamic instruction, having a handle to the target method in its bsm arguments. If the byte code hasn’t been generated from ordinary Java code (or stems from a future version), you have to be prepared to possibly encounter ldc instructions pointing to a handle which would yield a MethodHandle at runtime.
Since you already mentioned “analysis of inheritance”, I just want to point out the corner cases, i.e. for
// foo/A.java
package foo;
class A {
    public void method() {}
}
class B extends A implements bar.If {
}

// bar/If.java
package bar;
public interface If {
    void method();
}
it’s easy to overlook that A.method() has to stay public.
If you stay conservative, i.e. when you can't find out whether B instances will ever end up as targets of If.method() invocations elsewhere in your application and therefore have to assume that it is possible, you won't find much to optimize. I think you need at least inlining of bridge methods and of the synthetic inner/outer class accessors to identify unused members across inheritance relationships.
When it comes to class references, there are indeed even more possibilities, which makes a per-instruction analysis error-prone. They may occur not only as the owner of member access instructions, but also in new, checkcast, instanceof and array-specific instructions, annotations, exception handlers and, even worse, within signatures, which may occur at member references, annotations, local variable debugging hints, etc. The ldc instruction may refer to classes, producing a Class instance, which is actually used in ordinary Java code, e.g. for class literals, but, as said, there's also the theoretical possibility of producing MethodHandles that may refer to an owner class but also have a signature bearing parameter types and a return type, or of producing a MethodType representing a signature.
You are better off analyzing the constant pool; however, that's not offered by ASM. To be precise, a ClassReader has methods to access the pool, but they are actually not intended to be used by client code (as their documentation states). Even there, you have to be aware of pitfalls. Basically, the contents of a CONSTANT_Utf8_info entry bears a class or signature reference if a CONSTANT_Class_info, or the descriptor index of a CONSTANT_NameAndType_info or a CONSTANT_MethodType_info, points to it. However, declared members of a class have direct references to CONSTANT_Utf8_info pool entries to describe their signatures, see Methods and Fields. Likewise, annotations don't follow the pattern and have direct references to CONSTANT_Utf8_info entries of the pool, assigning a type or signature semantic to them, see enum_const_value and class_info_index…
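For the per-instruction part, here is a hedged ASM sketch (assuming ASM 9 on the classpath; the class name foo.B and the class UsageScanner are illustrative) that collects invoked methods, including the invokedynamic and ldc cases mentioned above. Keep in mind it still misses the constant-pool-only and signature references discussed in the last paragraph.

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.Handle;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

public class UsageScanner {
    public static void main(String[] args) throws IOException {
        Set<String> referencedMethods = new TreeSet<>();

        ClassReader reader = new ClassReader("foo.B"); // class to analyze, by name
        reader.accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                return new MethodVisitor(Opcodes.ASM9) {
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String mname,
                                                String mdesc, boolean isInterface) {
                        // Ordinary invoke* instructions.
                        referencedMethods.add(owner + "." + mname + mdesc);
                    }

                    @Override
                    public void visitInvokeDynamicInsn(String indyName, String indyDesc,
                                                       Handle bsm, Object... bsmArgs) {
                        // Method references show up as Handle bootstrap arguments.
                        for (Object arg : bsmArgs) {
                            if (arg instanceof Handle) {
                                Handle h = (Handle) arg;
                                referencedMethods.add(h.getOwner() + "." + h.getName() + h.getDesc());
                            }
                        }
                    }

                    @Override
                    public void visitLdcInsn(Object value) {
                        // ldc can also push Handle constants yielding MethodHandles.
                        if (value instanceof Handle) {
                            Handle h = (Handle) value;
                            referencedMethods.add(h.getOwner() + "." + h.getName() + h.getDesc());
                        }
                    }
                };
            }
        }, 0);

        referencedMethods.forEach(System.out::println);
    }
}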

Can the JVM JIT specialize non-overridden methods in sub-classes?

Well, that title alone can't get the idea across, but basically what I mean is: given some method m() in a class Base, which is not overridden in some subclass Derived, are the JIT compilers in current JVMs1 capable of "specializing"0 m() anyway when it makes sense, or will all derived classes that inherit and don't override Base.m() share the same compiled code?
This specialization makes sense when the derived class defines something that makes m() much simpler. For example, and for the purposes of discussion, let's say m() calls another member function n(), and in the derived class n() is defined such that when n() is inlined into m() the latter is greatly simplified.
To be concrete, consider the two non-abstract methods in the following class (they are both m()-type methods, while the abstract methods are the corresponding n() methods):
public abstract class Base {
    abstract int divisor();
    abstract boolean isSomethingEnabled();

    int divide(int p) {
        return p / divisor();
    }

    Object doSomething() {
        if (isSomethingEnabled()) {
            return slowFunction();
        } else {
            return null;
        }
    }

    // Assumed helper, elided in the original question: stands in for expensive work.
    Object slowFunction() {
        return new Object();
    }
}
Both rely on abstract methods. Let's say you now have a Derived like this:
public class Derived extends Base {
    final int divisor() {
        return 2;
    }
    final boolean isSomethingEnabled() {
        return false;
    }
}
Now the effective behavior of the divide() and doSomething() methods is very simple: divide() is not a full division by an arbitrary number but simply a halving that can be done with bit operations, and doSomething() always returns null. I assume that when the JIT goes to compile divide() or doSomething(), if Derived is the only subclass, all is good: there exists (currently) only one possible implementation of the two abstract calls, class hierarchy analysis (CHA) will kick in and inline the only possible implementations, and all is good.
In the more general case that other derived classes exist, however, it isn't clear to me if the JVM will only compile one2 version of the methods in Base with an invokevirtual call to the abstract methods, or if it is smart enough to say, "Hey, even though Derived doesn't override divisor() I should compile a version specifically for it 'cause it's going to be much simpler".
Of course, even without specialized recompilation, aggressive inlining often makes it work out fine anyway (i.e., when you call divide() on a class that is known, or even just likely, to be a Derived, inlining is likely to give you the good implementation anyway), but, equally, there are plenty of cases where such inlining isn't done.
0 By specializing I don't mean anything specific beyond compiling another version of the function appropriate to some restricted domain, in the same sense that, say, inlining is a form of specialization to a specific call site, or in the same way that most compiled functions are somewhat specialized to the current context (e.g., loaded classes, assumptions about nullness, etc.).
1 In particular, when one says "Can the JVM blah, blah?" one is usually talking about HotSpot; I'm mostly asking about HotSpot too, but also about whether any other JVM can do this.
2 OK, sure, you might have several versions of a function: for on-stack replacement, for different compiler levels, when deoptimization occurs, etc.
The HotSpot JVM has at most one current, entrant version of a compiled method. This is obvious from the one-to-one relationship between Method and nmethod entities in the source code. However, there can be multiple non-entrant previous versions (e.g. nmethods compiled at a lower tier, and OSR stubs).
This single compiled version is often optimized for the most common case, based on run-time profiling. For example, when, during profiling of Base.doSomething(), the JIT sees that isSomethingEnabled() is always invoked on a Derived instance (even if there are more subclasses), it will optimize the call for the fast case, leaving an uncommon trap for the slow one. After this optimization doSomething() will look like
if (this.getClass() != Derived.class) {
    uncommon_trap(); // this causes deoptimization
}
return null;
Profile data is collected separately for each branch and for each call site. This makes it possible to optimize (specialize) one part of a method for one receiver and another part for a different receiver.
If two different receivers were detected during profiling, the JIT can inline both callees, guarded by a type check.
A virtual call with more than two receivers will be compiled using a vtable lookup.
To see the method profile data, use the -XX:+PrintMethodData option available in debug builds of the JVM.
No, my understanding is that the JVM would not specialize a method on its own but rather optimize the base class function if it finds during profile optimization that divisor() often resolves to a certain method.
Have you tried turning on the diagnostic output to see what happens?
Rather than trying to guess what the JIT is doing, you can take a peek
at what’s happening by turning on java command line flags:
-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining (from Java JIT compiler inlining)
According to the OpenJDK Wiki:
Methods are often inlined. This increases the compiler's "horizon" of optimization.
Static, private, final, and/or "special" invocations are easy to inline.
Virtual (and interface) invocations are often demoted to "special" invocations, if the class hierarchy permits it. A dependency is registered in case further class loading spoils things.
Virtual (and interface) invocations with a lopsided type profile are compiled with an optimistic check in favor of the historically common type (or two types).
That is, for the two most frequent receiver types, the derived methods would be inlined into their caller (if small enough, which should be the case here), and unreachable branches pruned.
Also, if the base method is small enough for inlining into its caller, it will be optimized for that caller's two most frequent receiver types.
That is, the Hotspot JVM specializes code if it is small enough for inlining for the two most frequent receiver types of that call site.
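A hedged way to watch this happen (Base and Derived are the classes from the question; OtherDerived, the loop, and the class name CallSiteDemo are mine) is to drive one hot call site with one or two receiver types and run it with the -XX:+PrintInlining flags quoted above:

public class CallSiteDemo {
    public static void main(String[] args) {
        // With only Derived in the array the call site is monomorphic; with two
        // types it is bimorphic and both callees can be inlined behind a type
        // check; three or more distinct types fall back to vtable dispatch.
        Base[] receivers = { new Derived(), new OtherDerived() };
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += receivers[i % receivers.length].divide(i); // hot, profiled call site
        }
        System.out.println(sum);
    }
}

// Illustrative second subclass, not part of the original question.
class OtherDerived extends Base {
    final int divisor() { return 3; }
    final boolean isSomethingEnabled() { return true; }
}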
The JVM doesn't define or redefine types. It interprets running implementations of behaviors. It's the compiler, i.e., the source language, that deals in types. The JVM is the low level, the "metal" of the Java universe. Types and their instances are instructions to create a series of observable events influenced by inputs. That series of inputs and observable events over time constitutes what computer scientists call the "semantics" of a program.
It's up to the JVM to figure out ways to carry out those instructions while preserving the semantics. It will, at times, destroy a class structure entirely, in effect. Conceptually a class instance lives in heap memory with labeled attributes. For some period of time, until the semantics prohibit it because of some state change, the JVM might keep two active values in registers, not even in RAM, and ignore the whole rest of the defined class. Is that "specializing" the method?
No, it is not. There is no new definition, no new set of Java-level instructions, no ephemeral type in the JVM. There's just a temporary, compiled and optimized way in the moment to fulfill the instructions. When the optimization no longer works, or matters as much, the JVM even reverts to interpreting bytecode. And bytecode won't contain any new types, either. It's an assembly language, and redefining what the high-level code demands is above its pay grade.
Bottom line, the only types in the program are those mandated in the source code, not the bytecode or JVM. All the semantics come from the source.

State of an object which doesn't have any attributes

We all know the state of an object is the value of its attributes (instance variables), but if a class doesn't have any attributes (and no inherited attributes either), what would be the state of an object of such a class?
There is a word for such objects - stateless.
There is no such thing as a Java class without a parent class (apart from java.lang.Object itself); if none is declared, the default parent, java.lang.Object, is used.
At a minimum, every instance of a class has two attributes: a reference address and a Class type. Note that not every class can be instantiated. There is also some space used in the ClassLoader, and any String(s) may (or may not) be interned. The actual implementation might vary slightly with the specific version of the JDK and the run-time platform, and additional optimizations can be added by the JIT. However, as a Java developer you are not responsible for this memory management, and I would be wary of premature optimization.
First, any class we write in Java extends Object by default if the developer does not write an extends clause, so each and every class definitely has a parent, at the very least Object.
Second, if you don't put any attributes in your class, it still inherits all non-private instance variables of its parent classes; so at most it has the state its parents provide (which for Object is none), and that state serves no particular purpose.
An object with no data members and no links to other objects is a stateless object, and in this form it can hardly be of any use.
This kind of class can nevertheless be useful because of its methods. It can be...
a base for further inheritance. It declares/defines some methods that can be inherited by derived classes. Such a class will probably be abstract, having no instances at all (although that is not a requirement).
a service class. It can define methods which by nature do not belong to concrete objects but are used by other objects, like all-purpose mathematical operations or a service that returns the current time. These methods can be static, so again no instances are needed.
We call those objects stateless. As the name suggests, they have no state.
Referring to other answers/comments, even though every Java object implicitly extends Object, mind that Object has no fields. So even though every object has a runtime address and class attributes, for all practical purposes you can still consider some objects stateless.
Next, it is definitely not true that stateless objects serve no purpose! You can use stateless objects for:
1) Grouping functions with similar functionality, similar to java.lang.Math, which groups mathematical functions.
2) Passing functionality as a parameter, e.g. Comparator<T> can be used to sort objects that do not implement Comparable<T>, and it definitely needs no state.
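A minimal sketch of point 2), with made-up names:

import java.util.Arrays;
import java.util.Comparator;

public class StatelessExample {
    // No fields at all: the comparator is pure behavior and safe to share.
    static class ByLength implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    }

    public static void main(String[] args) {
        String[] words = { "pear", "fig", "banana" };
        Arrays.sort(words, new ByLength());
        System.out.println(Arrays.toString(words)); // [fig, pear, banana]
    }
}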
Stateless objects are somewhat similar to immutable objects: their state can never change, and therefore they are always thread-safe.
You may also want to see JEE Stateless Session Beans, which differentiate between conversational state and instance state.

What is the main difference in object creation between Java and C++?

I'm preparing for an exam in Java and one of the questions which was on a previous exam was:"What is the main difference in object creation between Java and C++?"
I think I know the basics of object creation, for example how constructors are called, what initialization blocks do in Java, and what happens when the constructor of one class calls a method of another class which isn't constructed yet, and so on, but I can't find anything obvious. The answer is supposed to be one or two sentences, so I don't think a description of the whole object creation process in Java is what they had in mind.
Any ideas?
What is the main difference in object creation between Java and C++?
Unlike Java, in C++ objects can also be created on the stack.
For example in C++ you can write
Class obj; //object created on the stack
In Java you can write
Class obj; //obj is just a reference(not an object)
obj = new Class();// obj refers to the object
In addition to the other excellent answers, one thing is very important, and usually ignored, forgotten, or misunderstood (which explains why I detail the process below):
In Java, methods are virtual, even when called from the constructor (which could lead to bugs)
In C++, virtual methods are not virtual when called from the constructor (which could lead to misunderstanding)
What?
Let's imagine a Base class, with a virtual method foo().
Let's imagine a Derived class, inheriting from Base, which overrides the method foo()
The difference between C++ and Java is:
In Java, calling foo() from the Base class constructor will call Derived.foo()
In C++, calling foo() from the Base class constructor will call Base.foo()
Why?
The "bugs" for each languages are different:
In Java, calling any method in the constructor could lead to subtle bugs, as the overridden virtual method could try to access a variable which was declared/initialized in the Derived class.
Conceptually, the constructor’s job is to bring the object into existence (which is hardly an ordinary feat). Inside any constructor, the entire object might be only partially formed – you can know only that the base-class objects have been initialized, but you cannot know which classes are inherited from you. A dynamically-bound method call, however, reaches “forward” or “outward” into the inheritance hierarchy. It calls a method in a derived class. If you do this inside a constructor, you call a method that might manipulate members that haven’t been initialized yet – a sure recipe for disaster.
Bruce Eckel, http://www.codeguru.com/java/tij/tij0082.shtml
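A short Java sketch of the trap (the names are mine): the Derived field is still only zero-initialized when the overriding method runs from the base constructor.

class Base {
    Base() {
        foo(); // dynamic dispatch: in Java this runs Derived.foo()
    }
    void foo() {
        System.out.println("Base.foo");
    }
}

class Derived extends Base {
    int answer = 42; // assigned only after Base's constructor has returned

    @Override
    void foo() {
        System.out.println("Derived.foo, answer = " + answer);
    }
}

public class ConstructorDispatch {
    public static void main(String[] args) {
        new Derived(); // prints "Derived.foo, answer = 0", not 42
    }
}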
In C++, one must remember that a virtual call won't work as expected, as only the method of the currently constructed class will be called. The reason is to avoid accessing data members or even methods that do not exist yet.
During base class construction, virtual functions never go down into derived classes. Instead, the object behaves as if it were of the base type. Informally speaking, during base class construction, virtual functions aren't.
Scott Meyers, http://www.artima.com/cppsource/nevercall.html
Besides heap/stack issues I'd say: C++ constructors have initialization lists while Java uses assignment. See http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.6 for details.
I would answer: C++ allows creating an object anywhere: on the heap, on the stack, or as a member. Java forces you to allocate objects on the heap, always.
In Java, the Java Virtual Machine (JVM) that executes Java code might1 have to track all objects being created (or, to be exact, references to them) so that the memory allocated for them can later be freed automatically by garbage collection when the objects are no longer referenced.
EDIT: I'm not sure whether this can be attributed to object creation in the strict sense, but it surely happens somewhere between creation and assignment to a variable, even without an explicit assignment (when you create an object without assigning it, the JVM has to auto-release it some time after that, as there are no more references).
In C++, only objects created on the stack are released automatically (when they go out of scope), unless you use some mechanism that handles this for you.
1: Depending on the JVM's implementation.
There is one main design difference between constructors in C++ and Java. Other differences follow from this design decision.
The main difference is that the JVM first initializes all members to zero, before starting to execute any constructor. In C++, member initialization is part of the constructor.
The result is that during execution of a base class constructor, in C++ the members of the derived class haven't been initialized yet! In Java, they have been zero-initialized.
Hence the C++ rule, which is explained in paercebal's answer, that virtual calls made from a constructor do not descend into a derived class. Otherwise uninitialized members could be accessed.
Assuming that C++ uses malloc() when new is called, then that might be what they are looking for. (I do not know C++, so here I can be very wrong.)
Java's memory model allocates a chunk of memory when it needs it, and each new uses part of this pre-allocated area. This means that a new in Java is just setting a pointer to a memory segment and bumping the free pointer, while a new in C++ (granted it uses malloc in the background) may result in a system call.
This makes objects cheaper to create in Java than in languages using malloc, at least when there is no initialization occurring.
In short - creating objects in Java is cheap - don't worry about it unless you create loads of them.

Does having more methods in a class mean that an object uses more memory at runtime?

Say I have one class, ClassBig, with 100 methods inside, and a second class, ClassSmall, with only 10 methods.
When I have objects at runtime
ClassBig big = new ClassBig();
ClassSmall small = new ClassSmall();
Does the larger class take up more memory space?
If both classes contained an identical method, does the larger class take longer to execute it?
The in-memory representation of an instance of a class is mainly just its internal state plus a pointer to an in-memory representation of the class itself. The internal representation of an instance method has one more argument than you specified in the class definition - the implicit this reference. This is how we can store only one copy of the instance method, rather than a new copy for every instance.
So a class with more methods will take up more memory than a class with fewer methods (the code has to go somewhere), but an instance of a class with more methods will use the same amount of memory, assuming the classes have the same data members.
Execution time will not be affected by the number of other methods in the class.
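To convince yourself, the JOL-based check from the first answer on this page can be reused (class names are illustrative; jol-core is assumed on the classpath):

import org.openjdk.jol.info.ClassLayout;

class SmallApi {
    int value;
    int get() { return value; }
}

class BigApi {
    int value;                        // identical instance state
    int get() { return value; }
    int plusOne() { return value + 1; }
    int plusTwo() { return value + 2; }
    // imagine dozens more methods; none of them adds per-instance data
}

public class MethodCountDemo {
    public static void main(String[] args) {
        // Both printouts report the same instance size: header plus one int.
        System.out.println(ClassLayout.parseInstance(new SmallApi()).toPrintable());
        System.out.println(ClassLayout.parseInstance(new BigApi()).toPrintable());
    }
}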
