My question is when how does the class info gets loaded during runtime?
When someone calls instanceof is that considered RTTI or reflection? Or it depends on the actual situation?
The term "RTTI" is a C++-specific term referring to the functionality of the core language that allows the program to determine the dynamic types of various objects at runtime. It usually refers to the dynamic_cast or typeid operators, along with the associated std::type_info object produced by typeid.
The term reflection, on the other hand, is a generic term used across programming languages to refer to the ability of a program to inspect and modify its objects, types, etc. at runtime.
The term I've heard applied to instanceof is type introspection and instanceof is sometimes referred to as object introspection, as the program is allowed to look at the running types to determine what course of action to take. I think this is a weaker term than reflection, as it doesn't allow for elaborate introspection on the fields or methods of an object, but I don't think it would be technically incorrect to call the use of the instanceof operator reflection.
As to your other question - how does class information get loaded at runtime? - that's really up to the JVM implementation. The ClassLoader type is ultimately responsible for loading classes into the system, but the JVM can interpret this however it wants to. I once built a prototype JVM in JavaScript, and internally all reflection calls just queried the underlying JS data structures I had in place to represent classes, fields, and methods. I would imagine that the HotSpot JVM does something totally different, but it's pretty much implementation-defined.
Hope this helps!
In short, the true difference between RTTI and reflection is that with RTTI, the compiler opens and examines the .class file at compile time. With reflection, the .class file is unavailable at compile time; it is opened and examined by the runtime environment.
Related
Java's Dynamic Proxy Docs describe these constructors as the following:
A dynamic proxy class is a class that implements a list of interfaces specified at runtime such that a method invocation through one of the interfaces on an instance of the class will be encoded and dispatched to another object through a uniform interface. Thus, a dynamic proxy class can be used to create a type-safe proxy object for a list of interfaces without requiring pre-generation of the proxy class.
Now, while everything in this sentence is accurate. All of this information is in fact present at compile time. In Java, when you are creating a proxy, you specify the exact interface you want to proxy in your code.
Now the first thing that really confuses me, is why the byte code here needs to be generated at run time? All of this information is present at the compile time... (and you don't even have to deal with type erasure)
P.S: I am not sure if this is still the case, basing this on a quite dated accepted answer here: How does Java's Dynamic Proxy actually work?)
The next step in getting this work is the type checking/type inference. I am not sure how Java actually handles this, but you need to be able to use Proxy<A> interchangeably with A. In order to pull this of you need the following:
∀ method m ∈ A, m ∈ Proxy<A>
Which means that
you need your proxy to have the same structure
you need some kind of delegation (i.e. dynamic dispatch).
Once you start writing out the inference rules, this gives us something very familiar. Structural Typing
Now Java doesn't have structural typing but one could easily add the few inference rules (i.e. Typescript) especially given Java has boxed primitives (Scala for example doesn't which makes it very difficult to introduce structural inference).
The Actual Question
Reflection is hard, not safe and not super performant. My question is why and how Java's proxies use reflection? It seems like most of this functionality could be implemented using other features already present in the language.
I've heard that in Kotlin creation of the new objects is cheap. How is Kotlin memory aspect of object creation is different from Java? Is there difference in cost of creating objects from the data class and class?
I think you mean Kotlin targeting JVM, so I will tell you about this target.
Kotlin uses the same bytecode as Java, so performance in general is the same (some operations can be less or more optimized in Kotlin(thanks to compiler or stdlib) in compare to Java).
Data classes are just normal classes with additionally generated toString(), equals(), hashCode() and clone() methods, so they have the same performance as normal classes.
I would expect no difference if Java and Kotlin are both compiled for same target VM - it should make no difference which source code produces the same bytecode.
As for data class, Hiosdra correctly pointed out that this is just a syntax sugar to tell the compiler to derive some standard methods useful for data-holding classes (see the documentation).
Since every class in java is a subclass of the Object, and variables in java are not objects themselves but instead are object references, why does java make type specification compulsory, when the Object type could be made implicit? The only time it seems necessary is when using the simple data types.
If a variable is of type Object, the compiler will not let you use the variable as any other type (unless you cast it).
This is called type safety.
For example:
Object str = "abc";
s.toUpppercase(); //Compiler error
Well. Java does it because.. this is how the language was defined.
this boils down to what was considered good practice when the language was designed (nearly 20 years ago), and also with complier ease of development.
Scala, a language which is closely related to Java (runs on the same JVM), does not require explicit type identifiers in most cases.
the downside is the scala compiler is much slower (for this among other reasons).
The answer is probably Java is an object oriented language used in large projects and created by reasonable company. Having such strong typing decrease amount of potential bugs, that you are able to remove on compilation level.
btw.
In the .NET product you have such thing as var x. But it can be used only locally in method body. But this is only a compiler sugar for developers. So Java is only a example of strong typed language.
Java is a strongly typed language. The types are needed to be able to compile code and validate types compatibility at compile time.
Weakly typed languages do not have types. For example when you say var x in JavaScript you just define variable. Then value of any type may be assigned there. This means that if for example your code has bug and you assign string to this variable and then try to divide this variable by 2 (y = x / 2) the script will just fail at runtime. Java will not allow you to compile such code.
There is the principle: the bug costs x during development, x*10 during QA and x*100 if it arrives to production. Compiler and strongly typed languages allow to decrease number of (stupid) bugs that arrive to QA and therefore make software development easier, faster and cheaper.
Some classes in the standard Java API are treated slightly different from other classes. I'm talking about those classes that couldn't be implemented without special support from the compiler and/or JVM.
The ones I come up with right away are:
Object (obviously) as it, among other things doesn't have a super class.
String as the language has special support for the + operator.
Thread since it has this magical start() method despite the fact that there is no bytecode instruction that "forks" the execution.
I suppose all classes like these are in one way or another mentioned in the JLS. Correct me if I'm wrong.
Anyway, what other such classes exist? Is there any complete list of "glorified classes" in the Java language?
There are a lot of different answers, so I thought it would be useful to collect them all (and add some):
Classes
AutoBoxing classes - the compiler only allows for specific classes
Class - has its own literals (int.class for instance). I would also add its generic typing without creating new instances.
String - with it's overloaded +-operator and the support of literals
Enum - the only class that can be used in a switch statement (soon a privilege to be given to String as well). It does other things as well (automatic static method creation, serialization handling, etc.), but those could theoretically be accomplished with code - it is just a lot of boilerplate, and some of the constraints could not be enforced in subclasses (e.g. the special subclassing rules) but what you could never accomplish without the priviledged status of an enum is include it in a switch statement.
Object - the root of all objects (and I would add its clone and finalize methods are not something you could implement)
References: WeakReference, SoftReference, PhantomReference
Thread - the language doesn't give you a specific instruction to start a thread, rather it magically applies it to the start() method.
Throwable - the root of all classes that can work with throw, throws and catch, as well as the compiler understanding of Exception vs. RuntimeException and Error.
NullPointerException and other exceptions such as ArrayIndexOutOfBounds which can be thrown by other bytecode instructions than athrow.
Interfaces
Iterable - the only interface that can be used in an enhanced for loop
Honorable mentions goes to:
java.lang.reflect.Array - creating a new array as defined by a Class object would not be possible.
Annotations They are a special language feature that behaves like an interface at runtime. You certainly couldn't define another Annotation interface, just like you can't define a replacement for Object. However, you could implement all of their functionality and just have another way to retrieve them (and a whole bunch of boilerplate) rather than reflection. In fact, there were many XML based and javadoc tag based implementations before annotations were introduced.
ClassLoader - it certainly has a privileged relationship with the JVM as there is no language way to load a class, although there is a bytecode way, so it is like Array in that way. It also has the special privilege of being called back by the JVM, although that is an implementation detail.
Serializable - you could implement the functionality via reflection, but it has its own privileged keyword and you would spend a lot of time getting intimate with the SecurityManager in some scenarios.
Note: I left out of the list things that provide JNI (such as IO) because you could always implement your own JNI call if you were so inclined. However, native calls that interact with the JVM in privileged ways are different.
Arrays are debatable - they inherit Object, have an understood hierarchy (Object[] is a supertype of String[]), but they are a language feature, not a defined class on its own.
Class, of course. It has its own literals (a distinction it shares with String, BTW) and is the starting point of all that reflection magic.
sun.misc.unsafe is the mother of all dirty, spirit-of-the-language-breaking hacks.
Enum. You're not allowed to subclass it, but the compiler can.
Many things under java.util.concurrent can be implemented without JVM support, but they would be a lot less efficient.
All of the Number classes have a little bit of magic in the form of Autoboxing.
Since the important classes were mentioned, I'll mention some interfaces:
The Iterable interface (since 1.5) - it allows an object to participate in a foreach loop:
Iterable<Foo> iterable = ...;
for (Foo foo : iterable) {
}
The Serializable interface has a very special meaning, different from a standard interface. You can define methods that will be taken into account even though they are not defined in the interface (like readResolve()). The transient keyword is the language element that affects the behaviour of Serializable implementors.
Throwable, RuntimeException, Error
AssertionError
References WeakReference, SoftReference, PhantomReference
Enum
Annotation
Java array as in int[].class
java.lang.ClassLoader, though the actual dirty work is done by some unmentioned subclass (see 12.2.1 The Loading Process).
Not sure about this. But I cannot think of a way to manually implement IO objects.
There is some magic in the System class.
System.arraycopy is a hook into native code
public static native void arraycopy(Object array1, int start1,
Object array2, int start2, int length);
but...
/**
* Private version of the arraycopy method used by the jit
* for reference arraycopies
*/
private static void arraycopy(Object[] A1, int offset1,
Object[] A2, int offset2, int length) {
...
}
Well since the special handling of assert has been mentioned. Here are some more Exception types which have special treatment by the jvm:
NullPointerException
ArithmeticException.
StackOverflowException
All kinds of OutOfMemoryErrors
...
The exceptions are not special, but the jvm uses them in special cases, so you can't implement them yourself without writing your own jvm. I'm sure that there are more special exceptions around.
Most of those classes isn't really implemented with 'special' help from the compiler or JVM. Object does register some natives which poke around the internal JVM structures, but you can do that for your own classes as well. (I admit this is subject to semantics, "calls a native defined in the JVM" can be considered as special JVM support.)
What /is/ special is the behaviour of the 'new', and 'throw' instructions in how they initialise these internal structures.
Annotations and numbers are pretty much all-out freaky though.
This question is inspired from Joel's "Making Wrong Code Look Wrong"
http://www.joelonsoftware.com/articles/Wrong.html
Sometimes you can use types to enforce semantics on objects beyond their interfaces. For example, the Java interface Serializable does not actually define methods, but the fact that an object implements Serializable says something about how it should be used.
Can we have UnsafeString and SafeString interfaces/subclasses in, say Java, that are used in much of the same way as Joel's Hungarian notation and Java's Serializable so that it doesn't just look bad--it doesn't compile?
Is this feasible in Java/C/C++ or are the type systems too weak or too dynamic?
Also, beyond input sanitization, what other security functions can be implemented in this manner?
The type system already enforces a huge number of such safety features. That is essentially what it's for.
For a very simple example, it prevents you from treating a float as an int. That's one aspect of safety -- it guarantees that the type you're working on are going to behave as expected. It guarantees that only string methods are called on a string. Assembly doesn't have that safeguard, for example.
It's also the job of the type system to ensure that you don't call private functions on a class. That's another safety feature.
Java's type system is too anemic to enforce a lot of interesting constraints effectively, but in many other languages (including C++), the type system can be used to enforce far more wide-ranging rules.
In C++, template metaprogramming gives you a lot of tools for prohibiting "bad" code. For example:
class myclass : boost::noncopyable {
...
};
enforces at compile-time that the class can not be copied. The following will produce compile errors:
myclass m;
myclass m2(m); // copy construction isn't allowed
myclass m3;
m3 = m; // assignment also not allowed
Likewise, we can ensure at compile-time that a template function only gets called on types which fulfill certain criteria (say, they must be random-access iterators, while bilinear ones aren't allowed, or they must be POD types, or they must not be any kind of integer type (char, short, int, long), but all other types should be legal.
A textbook example of template metaprogramming in C++ implements a library for computing physical units. It allows you to multiply a value of type "meter" with another value of the same type, and automatically determines that the result must be of type "square meter". Or divide a value of type "mile" with a value of type "hour" and get a unit of type "miles per hour".
Again, a safety feature that prevents you from getting your types mixed up and accidentally getting your units mixed up. You'll get a compile error if you compute a value and try to assign it to the wrong type. trying to divide, say, liters by meters^2 and assigning the result to a value of, say, kilograms, will result in a compile error.
Most of this requires some manual work to set up, certainly, but the language gives you the tools you need to basically build the type-checks you want. Some of this could be better supported directly in the language, but the more creative checks would have to be implemented manually in any case.
Yes you can do such thing. I don't know about Java, but in C++ it isn't customary and there is no support for this, so you have to do some manual work. It is customary in some other languages, Ada for example, which have the equivalent of a typedef which introduces a new type which can't be converted implicitly into the orignal one (this new type "inherits" some basic operations from the one it is created, so it stays usefull).
BTW, in general inheritance isn't a good way to introduce the new types, as even if there is no implicit conversion in one way, there is one in the other one.
You can do a certian amount of this out of the box in Ada. For example, you can make integer types that cannot implcitily interoperate with each other, and Ada enumerations are not compatible with any integer type. You can still convert between them, but you have to explicitly do it, which calls attention to what you are doing.
You could do the same with present-day C++, but you'd have to wrap all your integers and enums in classes, which is just way too much work for something that should be simple (or better yet, the default way of doing things).
I understand the next version of C++ is going to fix at least the enumeration issue.
In C++, I suppose you could use typedef to create a synonym for a primitive type. Your synonym could imply something about the content of that variable, replacing the function of the apps hungarian notation.
Intellisense will report the synonym you used during declaration, so if you don't like using actual hungarian, it does save you from scrolling about (or using Go To Definition).
I guess you are thinking of something along the lines of Perl's "tainting" analysis.
In Java, it should be possible to use custom annotations and an annotation processor to implement this. Not necessarily easy though.
You can't have a UnsafeString subclass of String in Java, since java.lang.String is final.
In general, you cannot provide any kind of security on the source level - if you want to protect against evil code, you must do that on the binary level (e.g. Java bytecode). That's why private/protected can't be used as a security mechanism in C++: it is possible to bypass that with pointer manipulations.