Casting and Generics, Any performance difference? - java

I am coding in Android a lot lately, Though I am comfortable in JAVA, but missing some
ideas about core concepts being used there.
I am interested to know whether any performance difference is there between these 2 codes.
First Method:
//Specified as member variable.
ArrayList <String> myList = new ArrayList <String>();
and using as String temp = myList.get(1);
2nd Method:
ArrayList myList = new ArrayList(); //Specified as member variable.
and using
String temp1 = myList.get(1).toString();
I know its about casting. Does the first method has great advantage over the second,
Most of the time in real coding I have to use second method because arraylist can take different data types, I end up specifying
ArrayList <Object> = new ArrayList <Object>();
or more generic way.

In short, there's no performance difference worth worrying about, if it exists at all. Generic information isn't stored at runtime anyway, so there's not really anything else happening to slow things down - and as pointed out by other answers it may even be faster (though even if it hypothetically were slightly slower, I'd still advocate using generics.) It's probably good to get into the habit of not thinking about performance so much on this level. Readability and code quality are generally much more important than micro-optimisations!
In short, generics would be the preferred option since they guarantee type safety and make your code cleaner to read.
In terms of the fact you're storing completely different object types (i.e. not related from some inheritance hierarchy you're using) in an arraylist, that's almost definitely a flaw with your design! I can count the times I've done this on one hand, and it was always a temporary bodge.

Generics aren't reified, which means they go away at runtime. Using generics is preferred for several reasons:
It makes your code clearer, as to which classes are interacting
It keeps it type safe: you can't accidentally add a List to a List
It's faster: casting requires the JVM to test type castability at runtime, in case it needs to throw a ClassCastException. With Generics, the compiler knows what types things must be, and so it doesn't need to check them.

There is a performance difference in that code:
The second method is actually slower.
The reason why:
Generics don't require casting/conversion (your code uses a conversion method, not a cast), the type is already correct. So when you call the toString() method, it is an extra call with extra operations that are unnecessary when using the method with generics.
There wouldn't be a problem with casting, as you are using the toString() method. But you could accidentally add an incorrect object (such as an array of Strings). The toString() method would work properly and not throw an exception, but you would get odd results.

As android is used for Mobiles and handheld devices where resources are limited you have to be careful using while coding.
Casting can be overhead if you are using String data type to store in ArrayList.
So in my opinion you should use first method of being specific.

There is no runtime performance difference because of "type erasure".
But if you are using Java 1.5 or above, you SHOULD use generics and not the weakly typed counterparts.
Advantages of generics --
* The flexibility of dynamic binding, with the advantage of static type-checking. Compiler-detected errors are less expensive to repair than those detected at runtime.
* There is less ambiguity between containers, so code reviews are simpler.
* Using fewer casts makes code cleaner.

Related

Why Wrapper class like Boolean in java is immutable?

I can't see the reason why the Boolean wrapper classes were made Immutable.
Why the Boolean Wrapper was not implemented like MutableBoolean in Commons lang which actually can be reset.
Does anyone have any idea/understanding about this ? Thanks.
Because 2 is 2. It won't be 3 tomorrow.
Immutable is always preferred as the default, especially in multithreaded situations, and it makes for easier to read and more maintainable code. Case in point: the Java Date API, which is riddled with design flaws. If Date were immutable the API would be very streamlined. I would know Date operations would create new dates and would never have to look for APIs that modify them.
Read Concurrency in Practice to understand the true importance of immutable types.
But also note that if for some reason you want mutable types, use AtomicInteger AtomicBoolean, etc. Why Atomic? Because by introducing mutability you introduced a need for threadsafety. Which you wouldn't have needed if your types stayed immutable, so in using mutable types you also must pay the price of thinking about threadsafety and using types from the concurrent package. Welcome to the wonderful world of concurrent programming.
Also, for Boolean - I challenge you to name a single operation that you might want to perform that cares whether Boolean is mutable. set to true? Use myBool = true. That is a re-assignment, not a mutation. Negate? myBool = !myBool. Same rule. Note that immutability is a feature, not a constraint, so if you can offer it, you should - and in these cases, of course you can.
Note this applies to other types as well. The most subtle thing with integers is count++, but that is just count = count + 1, unless you care about getting the value atomically... in which case use the mutable AtomicInteger.
Wrapper classes in Java are immutable so the runtime can have only two Boolean objects - one for true, one for false - and every variable is a reference to one of those two. And since they can never be changed, you know they'll never be pulled out from under you. Not only does this save memory, it makes your code easier to reason about - since the wrapper classes you're passing around you know will never have their value change, they won't suddenly jump to a new value because they're accidentally a reference to the same value elsewhere.
Similarly, Integer has a cache of all signed byte values - -128 to 127 - so the runtime doesn't have to have extra instances of those common Integer values.
Patashu is the closest. Many of the goofy design choices in Java were because of the limitations of how they implemented a VM. I think originally they tried to make a VM for C or C++ but it was too hard (impossible?) so made this other, similar language. Write one, run everywhere!
Any computer sciency justification like those other dudes spout is just after-the-fact folderal. As you now know, Java and C# are evolving to be as powerful as C. Sure, they were cleaner. Ought to be for languages designed decade(s) later!
Simple trick is to make a "holder" class. Or use a closure nowadays! Maybe Java is evolving into JavaScript. LOL.
Boolean or any other wrapper class is immutable in java. Since wrapper classes are used as variables for storing simple data, those should be safe and data integrity must be maintained to avoid inconsistent or unwanted results. Also, immutability saves lots of memory by avoiding duplicate objects. More can be found in article Why Strings & Wrapper classes are designed immutable in java?

In Java, should I use ArrayList<Long> or long[] ?

I am writing a program which accepts 400 numbers of type long and will modify some of them depending on conditions at runtime and I want to know whether to use ArrayList<Long> or long[].
Which will be faster to use? I am thinking of using long[] because size is fixed.
When the size is fixed, long[] is faster, but it allows a less maintainable API, because it does not implement the List interface.
Note a long[] is faster for 2 reasons:
Uses primitive longs and not box object Longs (also enables better cache performace, since the longs are allocated contigously and the Longs aren't guaranteed to)
An array is much simpler and more efficient DS.
Nevertheless, for simpler maintainability - I would have used a List<Long>, unless performace is very critical at this part of the program.
If you use this collection very often in a tight loop - and your profiler says it is indeed a bottleneck - I would then switch to a more efficient long[].
As far as the speed goes, it almost does not matter for a list of 400 items. If you need to grow your list dynamically, ArrayList<Long> is better; if the size is fixed, long[] may be better (and a bit faster, although again, you would probably not notice the difference in most situations).
There are few things which haven't been mentioned in other answers:
A generic collection is in actuality a collection of Objects, or better said, this is what Java compiler will make of it. This is while long[] will always remain what it is.
A consequence of the first bullet point is that if you do something that eventually puts something other then Long into your collection, in certain situations the compiler will let it through (because Java type system is unsound, i.e. as an example, it will allow you to upcast and then re-cast to a completely disparate type).
A more general consequence of these two is that Java generics are half-baked and in some less trivial cases such as reflection, serialization etc. may "surprise" you. It is in fact safer to use plain arrays, then generics.
package tld.example;
import java.util.List;
import java.util.ArrayList;
class Example {
static void testArray(long[] longs) {
System.out.println("testArray");
}
static void testGeneric(List<Long> longs) {
System.out.println("testGeneric");
}
#SuppressWarnings("unchecked")
public static void main(String... arguments) {
List<Long> fakeLongs = new ArrayList<Long>();
List<Object> mischiefManaged = (List<Object>)(Object)fakeLongs;
mischiefManaged.add(new Object());
// this call succeeds and prints the value.
// we could sneak in a wrong type into this function
// and it remained unnoticed
testGeneric(fakeLongs);
long[] realLongs = new long[1];
// this will fail because it is not possible to perform this cast
// despite the compiler thinks it is.
Object[] forgedLongs = (Object[])(Object)realLongs;
forgedLongs[0] = new Object();
testArray(realLongs);
}
}
This example is a bit contrived because it is difficult to come up with a short convincing example, but trust me, in less trivial cases, when you have to use reflection and unsafe casts this is quite a possibility.
Now, you have to consider that besides what is reasonable, there is a tradition. Every community has a set of it's customs and tradition. There are a lot of superficial beliefs, such as those voiced here eg. when someone claims that implementing List API is an unconditional goodness, and if this does not happen, then it must be bad... This is not just a dominating opinion, this is what overwhelming majority of Java programmers believe in. After all, it doesn't matter that much, and Java, as a language has a lot more of other shortcomings... so, if you want to secure your job interview or simply avoid conflicts with other Java programmers, then use Java generics, no matter the reason. But if you don't like it - well, perhaps just use some other language ;)
long[] is both much faster and takes much less memory. Read "Effective Java" Item 49: "Prefer primitive types to boxed primitives"
I think better to use ArrayList because it is much much easier to maintain in long runs. In future if the size of your array gets increased beyond 400 then maintaining a long[] has lot of overhead whereas ArrayList grows dynamically so you don't need to worry about increased size.
Also, deletion of element is handled in much better way by ArrayList than static arrays (long[]) as they automatically reorganize the elements so that they still appear as ordered elements.
Static arrays are worse at this.
Why not use one of the many list implementations that work on primitive types? TLongArrayList for instance. It can be used just like Javas List but is based on a long[] array. So you have the advantages of both sides.
HPPC has a short overview of some libraries: https://github.com/carrotsearch/hppc/blob/master/ALTERNATIVES.txt

API Design for Idiot-Proof Iteration Without Generics

When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for Item i : items), and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have like:
class Schedule {
private Appointment[] internalArray;
public Appointment[] appointments() {
return (Appointment[]) internalArray.clone();
}
}
people can do like:
for (Appointment a : schedule.appointments()) {
a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwitting cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
cloning the array once, and iterating over it using the enhanced for loop
iterating over an ArrayList using
the enhanced for loop
iterating over an unmodifyable
ArrayList (from
Collections.unmodifyableList) using
the enhanced for loop
iterating over the array the bad way (cloning it repeatedly in the length check
and when getting each indexed item).
For 10 objects, the relative speeds (doing multiple repeats and taking the median) were like:
1,000
1,300
1,300
5,000
For 100 objects:
1,300
4,900
6,300
85,500
For 1000 objects:
6,400
51,700
56,200
7,000,300
For 10000 objects:
68,000
445,000
651,000
655,180,000
Rough figures for sure, but enough to convince me of two things:
Cloning, then iterating is definitely
not a performance issue. In fact
it's consistently faster than using a
List. (this is why Java's
enum.values() method returns a
defensive copy of an array instead of
an immutable list.)
If you repeatedly call the method,
repeatedly cloning the array unnecessarily,
performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.
clone() is fast but not what I would describe as super faster.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
public void onAppointment(Appointment appointment);
}
class Schedule {
public void forEachAppointment(AppointmentHandler ah) {
for(Appointment a: internalArray)
ah.onAppointment(a);
}
}
Since you can't really have it both ways, I would suggest that you create a pre generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect the usage of Generics and Iterable and all the newer languange features.
I think the usage of arrays should be non-existent. It does not make for an easy to use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.
As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.

Is there any runtime cost for Casting in Java?

Would there be any performance differences between these two chunks?
public void doSomething(Supertype input)
{
Subtype foo = (Subtype)input;
foo.methodA();
foo.methodB();
}
vs.
public void doSomething(Supertype input)
{
((Subtype)input).methodA();
((Subtype)input).methodB();
}
Any other considerations or recommendations between these two?
Well, the compiled code probably includes the cast twice in the second case - so in theory it's doing the same work twice. However, it's very possible that a smart JIT will work out that you're doing the same cast on the same value, so it can cache the result. But it is having to do work at least once - after all, it needs to make a decision as to whether to allow the cast to succeed, or throw an exception.
As ever, you should test and profile your code if you care about the performance - but I'd personally use the first form anyway, just because it looks more readable to me.
Yes. Checks must be done with each cast along with the actual mechanism of casting, so casting multiple times will cost more than casting once. However, that's the type of thing that the compiler would likely optimize away. It can clearly see that input hasn't changed its type since the last cast and should be able to avoid multiple casts - or at least avoid some of the casting checks.
In any case, if you're really that worried about efficiency, I'd wonder whether Java is the language that you should be using.
Personally, I'd say to use the first one. Not only is it more readable, but it makes it easier to change the type later. You'll only have to change it in one place instead of every time that you call a function on that variable.
I agree with Jon's comment, do it once, but for what it's worth in the general question of "is casting expensive", from what I remember: Java 1.4 improved this noticeably with Java 5 making casts extremely inexpensive. Unless you are writing a game engine, I don't know if it's something to fret about anymore. I'd worry more about auto-boxing/unboxing and hidden object creation instead.
Acording to this article, there is a cost associated with casting.
Please note that the article is from 1999 and it is up to the reader to decide if the information is still trustworthy!
In the first case :
Subtype foo = (Subtype)input;
it is determined at compile time, so no cost at runtime.
In the second case :
((Subtype)input).methodA();
it is determined at run time because compiler will not know. The jvm has to check if it can converted to a reference of Subtype and if not throw ClassCastException etc. So there will be some cost.

Does Java casting introduce overhead? Why?

Is there any overhead when we cast objects of one type to another? Or the compiler just resolves everything and there is no cost at run time?
Is this a general things, or there are different cases?
For example, suppose we have an array of Object[], where each element might have a different type. But we always know for sure that, say, element 0 is a Double, element 1 is a String. (I know this is a wrong design, but let's just assume I had to do this.)
Is Java's type information still kept around at run time? Or everything is forgotten after compilation, and if we do (Double)elements[0], we'll just follow the pointer and interpret those 8 bytes as a double, whatever that is?
I'm very unclear about how types are done in Java. If you have any reccommendation on books or article then thanks, too.
There are 2 types of casting:
Implicit casting, when you cast from a type to a wider type, which is done automatically and there is no overhead:
String s = "Cast";
Object o = s; // implicit casting
Explicit casting, when you go from a wider type to a more narrow one. For this case, you must explicitly use casting like that:
Object o = someObject;
String s = (String) o; // explicit casting
In this second case, there is overhead in runtime, because the two types must be checked and in case that casting is not feasible, JVM must throw a ClassCastException.
Taken from JavaWorld: The cost of casting
Casting is used to convert between
types -- between reference types in
particular, for the type of casting
operation in which we're interested
here.
Upcast operations (also called
widening conversions in the Java
Language Specification) convert a
subclass reference to an ancestor
class reference. This casting
operation is normally automatic, since
it's always safe and can be
implemented directly by the compiler.
Downcast operations (also called
narrowing conversions in the Java
Language Specification) convert an
ancestor class reference to a subclass
reference. This casting operation
creates execution overhead, since Java
requires that the cast be checked at
runtime to make sure that it's valid.
If the referenced object is not an
instance of either the target type for
the cast or a subclass of that type,
the attempted cast is not permitted
and must throw a
java.lang.ClassCastException.
For a reasonable implementation of Java:
Each object has a header containing, amongst other things, a pointer to the runtime type (for instance Double or String, but it could never be CharSequence or AbstractList). Assuming the runtime compiler (generally HotSpot in Sun's case) cannot determine the type statically a some checking needs to be performed by the generated machine code.
First that pointer to the runtime type needs to be read. This is necessary for calling a virtual method in a similar situation anyway.
For casting to a class type, it is known exactly how many superclasses there are until you hit java.lang.Object, so the type can be read at a constant offset from the type pointer (actually the first eight in HotSpot). Again this is analogous to reading a method pointer for a virtual method.
Then the read value just needs a comparison to the expected static type of the cast. Depending upon instruction set architecture, another instruction will need to branch (or fault) on an incorrect branch. ISAs such as 32-bit ARM have conditional instruction and may be able to have the sad path pass through the happy path.
Interfaces are more difficult due to multiple inheritance of interface. Generally the last two casts to interfaces are cached in the runtime type. IN the very early days (over a decade ago), interfaces were a bit slow, but that is no longer relevant.
Hopefully you can see that this sort of thing is largely irrelevant to performance. Your source code is more important. In terms of performance, the biggest hit in your scenario is liable to be cache misses from chasing object pointers all over the place (the type information will of course be common).
For example, suppose we have an array of Object[], where each element might have a different type. But we always know for sure that, say, element 0 is a Double, element 1 is a String. (I know this is a wrong design, but let's just assume I had to do this.)
The compiler does not note the types of the individual elements of an array. It simply checks that the type of each element expression is assignable to the array element type.
Is Java's type information still kept around at run time? Or everything is forgotten after compilation, and if we do (Double)elements[0], we'll just follow the pointer and interpret those 8 bytes as a double, whatever that is?
Some information is kept around at run time, but not the static types of the individual elements. You can tell this from looking at the class file format.
It is theoretically possible that the JIT compiler could use "escape analysis" to eliminate unnecessary type checks in some assignments. However, doing this to the degree you are suggesting would be beyond the bounds of realistic optimization. The payoff of analysing the types of individual elements would be too small.
Besides, people should not write application code like that anyway.
The byte code instruction for performing casting at runtime is called checkcast. You can disassemble Java code using javap to see what instructions are generated.
For arrays, Java keeps type information at runtime. Most of the time, the compiler will catch type errors for you, but there are cases where you will run into an ArrayStoreException when trying to store an object in an array, but the type does not match (and the compiler didn't catch it). The Java language spec gives the following example:
class Point { int x, y; }
class ColoredPoint extends Point { int color; }
class Test {
public static void main(String[] args) {
ColoredPoint[] cpa = new ColoredPoint[10];
Point[] pa = cpa;
System.out.println(pa[1] == null);
try {
pa[0] = new Point();
} catch (ArrayStoreException e) {
System.out.println(e);
}
}
}
Point[] pa = cpa is valid since ColoredPoint is a subclass of Point, but pa[0] = new Point() is not valid.
This is opposed to generic types, where there is no type information kept at runtime. The compiler inserts checkcast instructions where necessary.
This difference in typing for generic types and arrays makes it often unsuitable to mix arrays and generic types.
In theory, there is overhead introduced.
However, modern JVMs are smart.
Each implementation is different, but it is not unreasonable to assume that there could exist an implementation that JIT optimized away casting checks when it could guarantee that there would never be a conflict.
As for which specific JVMs offer this, I couldn't tell you. I must admit I'd like to know the specifics of JIT optimization myself, but these are for JVM engineers to worry about.
The moral of the story is to write understandable code first. If you're experiencing slowdowns, profile and identify your problem.
Odds are good that it won't be due to casting.
Never sacrifice clean, safe code in an attempt to optimize it UNTIL YOU KNOW YOU NEED TO.

Categories

Resources