JodaTime Chronology caching - java

I have successfully written a new Chronology that represents my company's fiscal calendar, based on JodaTime. I referred to the JodaTime source code quite a bit to figure out what I needed to do. One of the things I noticed in the BasicChronology class was the use of the inner class YearInfo to cache 'firstDayOfYearMillis' - the number of milliseconds since 1970-01-01 (ISO). I figured that if it was enough of a performance bottleneck for JodaTime to cache it, I should probably cache it in my chronology too.
When I did so, though, I made some modifications. Specifically, I moved the getYearInfo method into the YearInfo inner class, as well as making it static. I also moved the array used to store the cached values into the inner class as well. Full definition of the modified class is like this:
/**
 * Caching class for first-day-of-year millis.
 */
private static final class YearInfo {

    /**
     * Cache setup for first-day-of-year milliseconds.
     */
    private static final int CACHE_SIZE = 1 << 10;
    private static final int CACHE_MASK = CACHE_SIZE - 1;
    private static final transient YearInfo[] YEAR_INFO_CACHE = new YearInfo[CACHE_SIZE];

    /**
     * Storage variables for cache.
     */
    private final int year;
    private final long firstDayMillis;
    private final boolean isLeapYear;

    /**
     * Create the stored year information.
     *
     * @param inYear The year to store info about.
     */
    private YearInfo(final int inYear) {
        this.firstDayMillis = calculateFirstDayOfYearMillis(inYear);
        this.isLeapYear = calculateLeapYear(inYear);
        this.year = inYear;
    }

    /**
     * Get year information.
     *
     * @param year The given year.
     *
     * @return Year information.
     */
    private static YearInfo getYearInfo(final int year) {
        YearInfo info = YEAR_INFO_CACHE[year & CACHE_MASK];
        if (info == null || info.year != year) {
            info = new YearInfo(year);
            YEAR_INFO_CACHE[year & CACHE_MASK] = info;
        }
        return info;
    }
}
My question is... What are the performance or design implications of my changes? I've already concluded that my changes should be thread-safe (given answers about final member variables). But why was the original implementation done the way it was, and not like this? I get why most of the methods that are used in an effectively static way aren't declared static (given the subclasses of BasicChronology), but I'll admit that my OO design skills are a little rusty (having spent the last two years using RPG).
So... thoughts?

Regarding correctness: by making YEAR_INFO_CACHE static, you've introduced a potential (if minor) memory leak, since the cached objects stay reachable for the lifetime of the class. There are a few ways to tell whether your static references matter in practice, e.g. do a back-of-the-envelope approximation of how large the cache will grow based on what you know about the data; profile the heap during/after a load test of your application; etc.
You're caching such small objects that you can probably cache a lot of them without a problem. Still, if you find that the cache needs to be bounded, you have a few options, such as an LRU cache (a minimal sketch follows) or a cache based on soft references instead of direct (strong) references. But again, I emphasize that for your particular situation, implementing either of these might be a waste of time.
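If the cache ever did need to be bounded, the LRU option could be built on java.util.LinkedHashMap; a minimal sketch, assuming the YearInfo type from the code above (the class name is hypothetical, and unlike the racy-but-safe array above, this is not thread-safe without external synchronization):
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache; not part of Joda-Time.
final class YearInfoLruCache extends LinkedHashMap<Integer, YearInfo> {
    private static final int MAX_ENTRIES = 1 << 10; // mirrors CACHE_SIZE above

    YearInfoLruCache() {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, YearInfo> eldest) {
        return size() > MAX_ENTRIES; // evict the least-recently-used entry
    }
}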
To explain the theoretical problem with static references, I'll refer to other posts, rather than reproducing them here:
1. Are static fields open for garbage collection?
2. Can using too many static variables cause a memory leak in Java?
Also regarding correctness: the code is thread-safe not merely because the references are final, but because YearInfo's fields are all final (so instances are safely published even through the unsynchronized array), and because the YearInfo values created by multiple threads for a given cache position must be equal, so it doesn't matter which one ends up in the cache.
Regarding design, all of the YearInfo related stuff in the original Joda code is private, so the YearInfo details including caching are well encapsulated. This is a good thing.
Regarding performance, the best thing to do is profile your code and see what's using a significant amount of CPU. For profiling, you want to see whether the time spent in this code matters in the context of your entire application. Run your app under load, and check if this particular part of the code matters. If you don't see a performance problem in this code even without the YearInfo cache, then it's probably not a good use of time to work on / worry about that cache. Here is some information about how to do the check:
1. Performance profiler for a java application
2. How to find CPU-intensive class in Java?
That said, the flip side also holds -- if what you've got is working, then leave it as is!

I wrote the original code that caches into YearInfo objects. Your solution to encapsulate more logic into the YearInfo class is perfectly fine and should perform just as well. I designed the YearInfo based on intent -- I wanted a crude data pair and nothing more. If Java supported structs I would have used one here.
As for the cache design itself, it was based on profiling results to see if it had any impact. In most places, Joda-Time lazily computes field values, and caching them for later did improve performance. Because this particular cache is fixed in size, it cannot leak memory. The maximum amount of memory it consumes is 1024 YearInfo objects, which is about 20k bytes.
Joda-Time is full of specialized caches like this, and all of them showed measurable performance improvement. I cannot say how effective these techniques are anymore, since they were written and tested against JDK 1.3.

Related

Java: Getter and setter faster than direct access?

I tested the performance of a Java ray tracer I'm writing, using VisualVM 1.3.7 on my Linux netbook. I measured with the profiler.
For fun I tested whether there's a difference between using getters and setters and accessing the fields directly. The getters and setters are standard boilerplate with no extra logic.
I didn't expect any difference. But the code with direct field access was slower.
Here's the sample I tested in Vector3D:
public float dot(Vector3D other) {
    return x * other.x + y * other.y + z * other.z;
}
Time: 1542 ms / 1,000,000 invocations
public float dot(Vector3D other) {
    return getX() * other.getX() + getY() * other.getY() + getZ() * other.getZ();
}
Time: 1453 ms / 1,000,000 invocations
I didn't test it in a micro-benchmark, but in the ray tracer itself. The way I tested the code:
1. I started the program with the first version and set it up. The ray tracer isn't running yet.
2. I started the profiler and waited a while after initialization was done.
3. I started the ray tracer.
4. When VisualVM showed enough invocations, I stopped the profiler and waited a bit.
5. I closed the ray tracer program.
6. I replaced the first version with the second and repeated the steps above after compiling.
I ran at least 20,000,000 invocations of each version. I closed every program I didn't need. I set my CPU governor to performance, so the CPU clock was at maximum the whole time.
How is it possible that the second code is 6% faster?
I did some micro-benchmarking with lots of JVM warm-up and found that the two approaches take exactly the same execution time.
This happens because the JIT compiler inlines the getter down to a direct field access, making the two versions effectively identical compiled code.
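For what it's worth, a harness such as JMH makes this kind of comparison far more trustworthy, since it handles warm-up and dead-code elimination for you. A minimal sketch, assuming the Vector3D class from the question (with an (x, y, z) constructor, both dot variants, and standard getters):
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class DotProductBenchmark {
    private Vector3D a;
    private Vector3D b;

    @Setup
    public void setup() {
        a = new Vector3D(1f, 2f, 3f);
        b = new Vector3D(4f, 5f, 6f);
    }

    @Benchmark
    public float directFields() {
        return a.dot(b); // version reading x, y, z directly
    }

    @Benchmark
    public float viaGetters() {
        // After JIT inlining this typically compiles to the same machine code
        return a.getX() * b.getX() + a.getY() * b.getY() + a.getZ() * b.getZ();
    }
}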
Thank you all for helping me answer this question. In the end, I found the answer.
First, Bohemian is right: with PrintAssembly I checked the assumption, and indeed, although the bytecodes differ, the generated assembly code is identical.
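For reference, PrintAssembly requires the hsdis disassembler plugin and the diagnostic-options flag; the invocation looks something like this (the main class name is just a placeholder):
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly RayTracerMain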
So masterxilo is right: the profiler has to be the culprit. But masterxilo's guess about timing fences and extra instrumentation code can't be true; both versions are identical in the end.
So the question remains: how can the second version appear 6% faster in the profiler?
The answer lies in the way VisualVM measures: before you start profiling, it needs calibration data. This is used to subtract the overhead time caused by the profiler itself.
Although the calibration data is correct, the final calculation of the measurement is not. VisualVM sees the method invocations in the bytecode, but it doesn't see that the JIT compiler removes these invocations while optimizing.
So it subtracts overhead that no longer exists. And that's how the difference appears.
In case you have not taken a course in statistics: there is always variance in program performance, no matter how well the program is written. The reason these two methods seem to run at approximately the same rate is that the accessor methods do only one thing: they return a particular field. Because nothing else happens in the accessor, both approaches do essentially the same thing. However, regarding encapsulation - how well a programmer hides internal data (fields or attributes) from the user - a major rule is not to reveal internal data. Making a field public means that any other class can access it, and that can be dangerous. That is why I always recommend that Java programmers use accessor and mutator methods, so that fields do not fall into the wrong hands.
In case you were curious about how to access a private field: you can use reflection, which lets you reach into the data of a particular class and even mutate it if you really must. As a frivolous example, suppose you knew that the java.lang.String class contains a private field of type char[] (that is, a char array), as it does in the JDKs of this era. It is hidden from the user, so you cannot access the field directly. (By the way, the method java.lang.String.toCharArray() accesses it for you.) If you wanted to access each character consecutively and store each one into a collection (for the sake of simplicity, why not a java.util.List?), then here is how to use reflection in this case:
/**
 * Iterates through each character in a <code>String</code> and places each of them
 * into a <code>java.util.List</code> of type <code>Character</code>.
 *
 * @param str The <code>String</code> to extract from.
 * @param list The list to store each character into. (The caller supplies the
 *             list, so this method does not have to pick a <code>List</code> implementation.)
 */
public static void extractStringData(String str, List<Character> list)
        throws IllegalAccessException, NoSuchFieldException {
    java.lang.reflect.Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);
    char[] data = (char[]) value.get(str);
    for (char ch : data) {
        list.add(ch);
    }
}
As a side note, be aware that reflection carries a real performance cost. If there is a field, method, or inner or nested class that you absolutely must access this way (which is highly unlikely anyway), then use reflection - but only then. Much of the overhead comes from the access checks and extra indirection that reflection performs on every call. I am glad to have helped!

java garbage collection and temporary objects

I'm a C++ developer by trade, but I've been doing a bit of Java lately. The project I'm working on was written by a developer who is long gone, and I keep finding places where he worked around the garbage collector by doing weird things.
Case in point: he implemented his own string class to avoid slowdowns from GC.
This section of the app takes a large binary file format and exports it to CSV. This means building up a string for each line in the file (millions of them). In order to avoid those temporary String objects, he made a string class that just has a large array of bytes that he reuses.
/**
 * HACK
 * A quick and dirty string builder implementation optimized for GC.
 * Using String.format causes the application to grind to a halt when
 * more than a couple of string operations are performed, due to the number
 * of temporary objects allocated while formatting strings for drawing or logging.
 */
Does this actually help? Is this really needed? Is it better than just declaring a String object outside the loop and setting it inside the loop?
The app also has a hash map containing doubles for the values. The keys in the map are fairly static, but the values change often. Afraid of GC churn from boxed doubles, he made a MyDouble class to use as the value type for the hash map:
/**
 * This is a mutable Double wrapper class created to avoid GC issues.
 */
public class MyDouble implements Serializable {

    private static final long serialVersionUID = C.SERIAL_VERSION_UID;

    public double d;

    public MyDouble(double d) {
        this.d = d;
    }
}
This is crazy and completely unnecessary... right?
It's true that string concatenation can be a bottleneck in Java, because Strings are immutable: each concatenation creates a new String object (only compile-time constant strings end up in the string pool - see string interning). Either way, it can certainly lead to problems.
However, your predecessor is not the first person to encounter this, and the standard way to deal with concatenating many Strings in Java is to use a StringBuilder, as sketched below.
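A minimal sketch of that approach for the CSV export case, reusing one builder across lines; the Record type and its getters are invented here for illustration:
import java.io.IOException;
import java.io.Writer;

// Reuse a single StringBuilder: the backing array grows once and is then
// recycled via setLength(0), instead of allocating a new buffer per record.
static void exportCsv(Iterable<Record> records, Writer out) throws IOException {
    StringBuilder line = new StringBuilder(256);
    for (Record r : records) {
        line.setLength(0); // reset length, keep capacity
        line.append(r.getId()).append(',')
            .append(r.getName()).append(',')
            .append(r.getValue()).append('\n');
        out.append(line); // CharSequence overload avoids a toString() copy
    }
}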
When a double (or any primitive, for that matter) is used as a local variable, it's kept on the stack, and the memory it occupies is released along with the stack frame (I'm not sure whether stack frames are subject to GC or handled directly by the JVM as it runs). If, however, the double is a field of an object, it's stored on the heap and will be collected when the object containing it is collected.
Without seeing how the double values are being used, it's hard to say for sure, but it's more than likely that the use of the Map has increased the GC load.
In summary: yes, IMHO this is certainly, as you say, 'crazy and completely unnecessary'. These sorts of premature optimizations only serve to complicate the code base, making it more prone to bugs and harder to maintain. The golden rule should almost always be: build the simplest thing that works, profile it, and then optimize.

Given that HashMaps in jdk1.6 and above cause problems with multi-threading, how should I fix my code

I recently raised a question on Stack Overflow, then found the answer. The initial question was: What mechanisms other than mutexes or garbage collection can slow my multi-threaded Java program?
I discovered to my horror that HashMap was modified between JDK1.6 and JDK1.7. It now has a block of code that causes all threads creating HashMaps to synchronize.
The line of code in JDK1.7.0_10 is
/** A randomizing value associated with this instance that is applied to the hash code of keys to make hash collisions harder to find. */
transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
Which ends up calling
protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}
Looking in other JDKs, I find this isn't present in JDK1.5.0_22, or JDK1.6.0_26.
The impact on my code is huge. It makes it so that when I run on 64 threads, I get less performance than when I run on 1 thread. A jstack dump shows that most of the threads are spending most of their time spinning in that loop in Random.
So I seem to have some options:
1. Rewrite my code so that I don't use HashMap, but use something similar
2. Somehow mess around with rt.jar and replace the HashMap inside it
3. Mess with the classpath somehow, so each thread gets its own version of HashMap
Before I start down any of these paths (all look very time-consuming and potentially high-impact), I wondered if I have missed an obvious trick. Can any of you Stack Overflow people suggest which is the better path, or perhaps identify a new idea?
Thanks for the help
I am the original author of the patch which appeared in 7u6, CR#7118743: Alternative Hashing for String with Hash-based Maps.
I'll acknowledge right up front that the initialization of hashSeed is a bottleneck, but it is not one we expected to be a problem, since it only happens once per HashMap instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?
Regardless, we will probably investigate switching to ThreadLocalRandom rather than Random and possibly some variant of lazy initialization as suggested by cambecc.
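For context, ThreadLocalRandom sidesteps the contention because each thread owns its own generator state; a one-line sketch:
import java.util.concurrent.ThreadLocalRandom;

// No shared AtomicLong to spin on: each thread has its own seed.
int seed = ThreadLocalRandom.current().nextInt();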
EDIT 3
A fix for the bottleneck has been pushed to the JDK7 update mercurial repo:
http://hg.openjdk.java.net/jdk7u/jdk7u-dev/jdk/rev/b03bbdef3a88
The fix will be part of the upcoming 7u40 release and is already available in IcedTea 2.4 releases.
Near final test builds of 7u40 are available here:
https://jdk7.java.net/download.html
Feedback is still welcome. Send it to http://mail.openjdk.java.net/mailman/listinfo/core-libs-dev to be sure it gets seen by the OpenJDK devs.
This looks like a "bug" you can work around. There is a property that disables the new "alternative hashing" feature:
jdk.map.althashing.threshold = -1
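For example, the property can be passed as a system property when launching the JVM (the jar name is a placeholder):
java -Djdk.map.althashing.threshold=-1 -jar myapp.jar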
However, disabling alternative hashing is not sufficient because it does not turn off the generation of a random hash seed (though it really should). So even if you turn off alt hashing, you still have thread contention during hash map instantiation.
One particularly nasty way of working around this is to forcefully replace the instance of Random used for hash seed generation with your own non-synchronized version:
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Random;

// Create an instance of "Random" having no thread synchronization.
Random alwaysOne = new Random() {
    @Override
    protected int next(int bits) {
        return 1;
    }
};

// Get a handle to the static final field sun.misc.Hashing.Holder.SEED_MAKER.
Class<?> clazz = Class.forName("sun.misc.Hashing$Holder");
Field field = clazz.getDeclaredField("SEED_MAKER");
field.setAccessible(true);

// Convince Java the field is not final.
Field modifiers = Field.class.getDeclaredField("modifiers");
modifiers.setAccessible(true);
modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);

// Set our custom instance of Random into the field.
field.set(null, alwaysOne);
Why is it (probably) safe to do this? Because alt hashing has been disabled, causing the random hash seeds to be ignored. So it doesn't matter that our instance of Random isn't in fact random. As always with nasty hacks like this, please use with caution.
(Thanks to https://stackoverflow.com/a/3301720/1899721 for the code that sets static final fields).
--- Edit ---
FWIW, the following change to HashMap would eliminate the thread contention when alt hashing is disabled:
- transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
+ transient final int hashSeed;
  ...
  useAltHashing = sun.misc.VM.isBooted() &&
          (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
+ hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;
  init();
A similar approach can be used for ConcurrentHashMap, etc.
There are lots of apps out there that create a transient HashMap per record in big data applications - parsers and serializers, for example. Putting any synchronization into unsynchronized collection classes is a real gotcha. In my opinion, this is unacceptable and needs to be fixed ASAP. The change introduced in 7u6, CR#7118743, should be reverted or fixed without requiring any synchronization or atomic operations.
Somehow this reminds me of the colossal mistake of making StringBuffer, Vector, and Hashtable synchronized in JDK 1.1/1.2. People paid dearly for years for that mistake. No need to repeat that experience.
Assuming your usage pattern is reasonable, you'll want to use your own version of HashMap.
That piece of code is there to make hash collisions a lot harder to cause, preventing attackers from creating performance problems (details). Assuming this problem is already dealt with in some other way, I don't think you need the synchronization at all. However, regardless of whether you use synchronization, it seems you would want your own version of HashMap so you don't depend so much on what the JDK happens to provide.
So either you write something similar yourself and point to that, or you override a class in the JDK. To do the latter, you can override the bootstrap classpath with the -Xbootclasspath/p: parameter. Doing so will, however, "contravene the Java 2 Runtime Environment binary code license" (source).
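For illustration, the prepend form of that option looks like this (the jar and class names are placeholders):
java -Xbootclasspath/p:patched-hashmap.jar -cp . MyApp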

Java: Best approach to have a long list of variables needed all the time without consuming memory?

I wrote an abstract class to contain all the rules of the application, because I need them almost everywhere. Most of what it contains are static final variables, something like this:
public abstract class appRules
{
    public static final boolean IS_DEV = true;

    public static final String CLOCK_SHORT_TIME_FORMAT = "something";
    public static final String CLOCK_SHORT_DATE_FORMAT = "something else";
    public static final String CLOCK_FULL_FORMAT = "other thing";

    public static final int USERNAME_MIN = 5;
    public static final int USERNAME_MAX = 16;

    // etc.
}
The class is big and contains LOTS of such variables.
My Question:
1. Doesn't making these variables static mean they are sitting in memory all the time?
2. Do you suggest that, instead of an abstract class, I use an instantiable class with non-static variables (just public final), so I instantiate the class and use the variables only when I need them?
3. Or is what I am doing a completely wrong approach, and would you suggest something else?
Given modern machines and RAM capacities, you'd have to have many thousands of rules (if not millions) to make any noticeable difference, both performance- and memory-wise.
So the question is not whether that's going to hog your system: it's not.
The question is whether that's a good practice.
I have of course used this pattern myself, so I understand its usefulness. The main drawback, however, is that it makes your code hard to test. Since there is no easy way to set these values differently for unit tests (as opposed to property files, where you just put a different file on the classpath), it will be very hard to test the functionality of individual modules without wiring up the whole application - but that depends on what you keep in those constants.
I guess I'd try to split things up: have one constants class per module or package, and initialize those constants from property files:
import java.io.InputStream;
import java.util.Properties;

private static final String CONSTANT_FOO;
private static final String CONSTANT_BAR;

static {
    try {
        Properties props = new Properties();
        InputStream is = MyConstantClass.class
                .getResourceAsStream("my.module.properties");
        props.load(is);
        // you'll actually want to close this in a finally block, but I'm lazy
        is.close();
        CONSTANT_FOO = props.getProperty("constants.foo");
        CONSTANT_BAR = props.getProperty("constants.bar");
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
}
That way your code gets more testable and configurable, while still enjoying the benefits of global configuration constants.
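For completeness, the matching resource file my.module.properties might contain entries like the following (values are placeholders):
# my.module.properties - keys must match those read in the static initializer
constants.foo=some foo value
constants.bar=some bar value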
Doesn't making these variables static mean they are sitting in memory all the time?
Yes. But unless you have a really, really large number of constants like that, it isn't going to make much difference. Besides, if you really need them to be named constants, you cannot really improve on your current approach.
Do you suggest that, instead of an abstract class, I use an instantiable class with non-static variables (just public final), so I instantiate the class and use the variables only when I need them?
No. That won't take any less memory. The JVM has to keep the String objects corresponding to the literals anyway, so that it can assign them to the (non-static) variables whenever an instance of the class is created.
Add to this the possibility that you may create multiple instances of the class, which uses more space and consumes more CPU cycles.
Or is what I am doing a completely wrong approach?
I don't think so.
@Peter Lawrey points out that there are theoretical limits on the number of static fields that a class can have. The limiting factors include:
1. the size of the bytecode for the <clinit> pseudo-method (64K bytes) - JVM spec 4.7.3
2. the number of fields in a class (64K) - JVM spec 4.5
3. the number of string literals in a class (64K) - JVM spec 4.4.3
However, this is unlikely to be a practical problem. I cannot imagine a program really needing so many constants that these limits come into play. Besides, you could just split the constants over multiple classes.
Break the list of variables into groups based on the sub-modules of your application, and place each set of variables in a separate class. Instantiate each class and store its object in the session.
Whenever you need a variable, fetch the object from the session. If you are not using any variables from an object, release the object from the session.
I hope it won't take much of your server's memory.
I think it's not that much overhead. Thinking practically: let's say you have 3,000 rules, and each rule is a String of 1,000 characters. At two bytes per char, that's roughly 6 MB of character data. Is that too big?
I would say it's all right even if you have 30,000 rules. Don't worry much about memory usage here; this is not the kind of thing that causes an out-of-memory error.
I think it's OK memory-wise. If there are large objects involved, lazy initialization can be a way to save memory, or java.util.prefs.Preferences can be an option.
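A minimal sketch of the Preferences option (the keys and defaults here are illustrative):
import java.util.prefs.Preferences;

// Per-user preference node for this class's package; values persist
// between runs in a platform-specific backing store.
Preferences prefs = Preferences.userNodeForPackage(appRules.class);
int usernameMin = prefs.getInt("username.min", 5); // 5 is the fallback value
prefs.putInt("username.max", 16);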
If I were you, I would use the name RuleUtils instead of appRules, and make the class final, since abstract classes should be designed for extension.
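That suggestion would look something like this (a sketch, with constants copied from the question):
// A final utility holder: cannot be extended or instantiated.
public final class RuleUtils {

    private RuleUtils() {} // prevent instantiation

    public static final boolean IS_DEV = true;
    public static final int USERNAME_MIN = 5;
    public static final int USERNAME_MAX = 16;
}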
It's worth putting the amount of memory you are using into perspective. Are you targeting a mobile device, a PC, or a server? For a server you can buy a machine with 24 GB for around £1,800. That's about £75 per GB, or 8 pence per MB.
If your time costs the company £50 per hour, it's not worth spending even one minute to save 10 MB.
On minimum wage, it's not worth spending one minute to save 1 MB on a PC or server.

Java Profiling: Private Property Getter has Large Base Time

I'm using TPTP to profile some slow-running Java code and I came across something interesting. One of my private property getters has a large Base Time value in the Execution Time Analysis results. To be fair, this property is called many, many times, but I never would have guessed a property like this would take very long:
public class MyClass {

    private int m_myValue;

    public int GetMyValue() {
        return m_myValue;
    }
}
Ok so there's obviously more stuff in the class, but as you can see there is nothing else happening when the getter is called (just return an int). Some numbers for you:
1. About 30% of the calls in the run are to the getter (I'm working to reduce this)
2. About 25% of the base time of the run is spent in this getter
3. Average base time is 0.000175 s
For comparison, I have another method in a different class that uses this getter:
private boolean FasterMethod(MyClass instance, int value) {
    return instance.GetMyValue() > m_localInt - value;
}
Which has a much lower average base time of 0.000018s (one order of magnitude lower).
What's the deal here? I assume there is something that I don't understand or something I'm missing:
1. Does returning a local primitive really take longer than returning a calculated value?
2. Should I look at a metric other than Base Time?
3. Are these results misleading, and do I need to consider some other profiling tool?
Edit 1: Based on some suggestions below, I marked the method as final and re-ran the test, but I got the same results.
Edit 2: I installed a demo version of YourKit to re-run my performance tests, and the YourKit results look much closer to what I was expecting. I will continue to test YourKit and report back what I find.
Edit 3: Changing to YourKit seems to have resolved my issue. I was able to use YourKit to determine the actual slow points in my code. There are some excellent comments and posts below (upvoted appropriately), but I'm accepting the first person to suggest YourKit as "correct." (I am not affiliated with YourKit in any way / YMMV)
If possible, try using another profiler (the NetBeans one works well). This may be hard to do depending on how your code is set up.
It is possible that, just as with many other tools, a different profiler will give different information.
Does returning a local primitive really take longer than returning a calculated value?
Returning an instance variable takes longer than returning a local variable (how much is VM-dependent). But the getter you have is simple, so it should be inlined, making it as fast as accessing a public instance variable (which, again, is slower than accessing a local variable).
But you don't have a local value here ('local' meaning a variable in the method, as opposed to a field in the class). What exactly do you mean by 'local'?
Should I look at a metric other than Base Time?
I have not used the Eclipse tools, so I am not sure how it works... it might make a difference if it is a tracing or a sampling profiler (the two can give different results for things like this).
Are these results misleading, and do I need to consider some other profiling tool?
I would consider another tool, just to see if the result is the same.
Edit based on comments:
If it is a sampling profiler, what happens, essentially, is that every n time units the program is sampled to see where it is. If you are calling one method far more than the other, it will show up as being in the method that is called more (it is simply more likely that that method is running at sample time).
A tracing profiler adds code to your program (a process known as instrumentation) to essentially log what is going on.
Tracing profilers are slower but more accurate; they also require that the program be changed (the instrumentation process), which could potentially introduce bugs (not that I have heard of it happening... but I am sure it does, at least while a profiler is being developed).
Sampling profilers are faster but less accurate (they just estimate how often a line of code is executed).
So, if Eclipse uses a sampling profiler, you could see what you consider to be strange behaviour. Changing to a tracing profiler would show more accurate results.
If Eclipse uses a tracing profiler, then changing profilers should show the same result (however, the new profiler may make it more obvious to you what is going on).
It does sound slightly misleading - perhaps the profiler is inhibiting some optimizations?
Just for kicks, try making the method final, which will make it easier to inline. That may well be the difference between the property and FasterMethod. In real use, HotSpot will inline even virtual methods until the first time they're overridden (IIRC).
EDIT: Responding to Brian's comment: Yes, it's usually the case that making something final won't help performance (although it may be a good thing in terms of design :) because HotSpot will normally work out whether it can inline based on whether the method is actually overridden. I was suggesting that this profiler may have interfered with that.
EDIT: I've now managed to reproduce the way that HotSpot "undoes" optimisation of classes which haven't been extended yet (or methods which haven't been overridden). This was harder to do for the server VM than the client, but I've done it :)
public class Test
{
    public static void main(String[] args)
        throws Exception
    {
        final long iterations = 1000000000L;
        Base b = new Base();
        // Warm up HotSpot
        time(b, 1000);
        // Before we load Derived
        time(b, iterations);
        // Load Derived and use it quickly
        // (Just loading is enough to make the client VM
        // undo its optimizations; the server VM needs more effort)
        Base d = (Base) Class.forName("Derived").newInstance();
        time(d, 1);
        // Time it again with Base
        time(b, iterations);
    }

    private static void time(Base b, long iterations)
    {
        long total = 0;
        long start = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++)
        {
            total += b.getValue();
        }
        long end = System.currentTimeMillis();
        System.out.println("Time: " + (end - start));
        System.out.println("Total: " + total);
    }
}

class Base
{
    public int getValue() { return 1; }
}

class Derived extends Base
{
    @Override
    public int getValue() { return 2; }
}
That sounds very peculiar. You're not calling an overriding method by mistake, are you?
I would be tempted to download a demo version of YourKit. It's trivial to set up, and it should give an indication as to what's really occurring. If both TPTP and YourKit agree, then something peculiar is happening (and I know that's not a lot of help!)
Something that used to make a lot of difference to the performance of these sorts of methods (although this may be to some extent historical) is the size of the calling method. HotSpot (and serious rivals) will happily inline small methods (some may choke on synchronized/try-finally). However, if the calling method is large, it may not. This was particularly a problem with old versions of the HotSpot C1/client compiler, which had a really bad register allocation algorithm (it now has one that is both quite good and fast).
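If you want to see what HotSpot is deciding here, its inlining thresholds and decisions can be inspected with diagnostic flags; for example (the main class is a placeholder, and defaults vary by JVM version):
java -XX:+PrintFlagsFinal -version | grep Inline
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyApp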
