
At this year's JAX I attended a talk by Charles Nutter. He explained some of the optimizations Java's just-in-time compiler (the JIT) performs. After hearing the talk, I began experimenting with the JIT myself, running some benchmarks and playing with the VM's parameters.

Before explaining what the JIT does, it is interesting to do a little experiment. Consider the following class:

public class PerformanceTest2 {
    public static void main(String[] args) {
        for (int outer = 1; outer <= 100; outer++) {
            long start = System.nanoTime();
            testPerformance();
            long duration = System.nanoTime() - start;
            System.out.println("Loop # " + outer + " took " + (duration / 1000.0d) + " µs");
        }
    }

    private static void testPerformance() {
        long sum = 0;
        for (int i = 1; i <= 5000; i++) {
            sum = sum + random(i);
        }
    }

    private static int random(int i) {
        int x = (int) (i * 2.3d / 2.7d); // This is a simulation
        int y = (int) (i * 2.36d);       // of time-consuming
        return x % y;                    // business logic.
    }
}

This class consists of two nested loops and measures the time each pass of the outer loop takes. The result is an impressive example of the JVM's optimizations:

Our particular example shows the JIT's behavior very clearly. In most cases the first two iterations are comparatively slow. Beginning with the third iteration, there is a tremendous speedup. Until the tenth repetition (or, in some runs, the ten-thousandth repetition) each repetition is faster than the previous one. After that, things don't change much. Each iteration is up to 50 times as fast as the first iteration.[1] Nonetheless, the JIT never stops trying to improve the code. After playing around a lot with the benchmark code and the JVM parameters[2], I was able to generate a smoother diagram that shows the ongoing optimization:

Let's optimize the Java code manually. An optimization that comes to mind immediately is inlining. When you call a method, the parameters are pushed onto the stack (to name just one operation), so it seems evident that getting rid of the method call should improve performance. The optimized code might look like this:

public class PerformanceTest3 {
    public static void main(String[] args) {
        for (int outer = 1; outer <= 100; outer++) {
            long start = System.nanoTime();
            long sum = 0;
            for (int i = 1; i <= 5000; i++) {
                int x = (int) (i * 2.3d / 2.7d); // This is a simulation
                int y = (int) (i * 2.36d);       // of time-consuming
                sum = sum + x % y;               // business logic.
            }
            long duration = System.nanoTime() - start;
            System.out.println("Loop # " + outer + " took " + (duration / 1000.0d) + " µs");
        }
    }
}

What about the performance charts?

[Charts: optimized version (left) vs. naive version (right)]

Well, this is not exactly what I had hoped to see when I prepared the benchmarks. I wasn't aware that Java's OSR compilation mechanism works that well. I expected the left-hand graphic to be all blue. Nonetheless, there are enough differences between the two graphics to illustrate the point:

  • In the long run, there is hardly any performance difference between the naive and the optimized version. In many cases, the optimized version even performs worse. Just compare the size of the blue areas.

  • The optimized version needs more time to speed up than the naive version.

  • In the long run, the manually optimized version is no faster than the naive version.

  • During the first few iterations, the optimized version is indeed faster than the naive version.

  • There are at least two major speedups in the right-hand diagram. The left-hand diagram stays at an almost constant performance level until the algorithm suddenly gets 10 or 20 times faster.

The reason for the mediocre success of our optimization is that the virtual machine performs the same steps we did. The default behavior of the JIT is to optimize method calls:

  • If a method is frequently called, and if it is short enough, it will be inlined.

  • Once a method has been called 10,000 times, it is compiled to machine code. From then on, calls run the compiled version instead of the interpreted one.

  • If there is no method call, but the program spends a lot of time inside a method, that method is compiled as well. Afterwards the program is stopped, the variables are transferred to the machine-code version of the method, and the machine-code version continues. This tricky operation is called On Stack Replacement (OSR). Usually, the JIT tries every other optimization before resorting to OSR.
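You can watch these compilation and OSR events live with the standard HotSpot flag -XX:+PrintCompilation. A minimal sketch (the class and method names are my own, modeled on the benchmark above):

```java
// ObserveJit.java
// Run with:  java -XX:+PrintCompilation ObserveJit
// In the JIT log, lines marked with "%" are OSR compilations.
public class ObserveJit {

    // Same shape as the "business logic" above: small and frequently
    // called, so it is a prime candidate for compilation and inlining.
    private static int work(int i) {
        int x = (int) (i * 2.3d / 2.7d);
        int y = (int) (i * 2.36d);
        return x % y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int outer = 1; outer <= 100; outer++) {
            for (int i = 1; i <= 5000; i++) {
                sum += work(i);
            }
        }
        // Keep the result alive so the loop isn't optimized away entirely.
        System.out.println(sum);
    }
}
```

Without the flag, the program simply prints the accumulated sum; with it, HotSpot additionally logs each method as it gets compiled, so you can see work() being compiled (and, depending on thresholds, main() being OSR-compiled).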

My conclusion is not to waste time on local optimizations. In most cases, we can't beat the JIT. A sensible (yet counter-intuitive) optimization is to split long methods into several methods that can be optimized individually by the JIT.
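A sketch of that counter-intuitive advice, applied to our example: starting from the hand-inlined PerformanceTest3, we move the hot inner loop back into its own small method (the class name is mine). The small method crosses the invocation threshold quickly and is compiled as a unit, without the VM having to fall back on OSR for a long main():

```java
// Sketch: the long main() of PerformanceTest3, split so that the hot
// loop lives in its own small method that the JIT can compile and
// optimize individually.
public class PerformanceTest3Split {

    private static long hotLoop() {
        long sum = 0;
        for (int i = 1; i <= 5000; i++) {
            int x = (int) (i * 2.3d / 2.7d); // This is a simulation
            int y = (int) (i * 2.36d);       // of time-consuming
            sum = sum + x % y;               // business logic.
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int outer = 1; outer <= 100; outer++) {
            long start = System.nanoTime();
            long sum = hotLoop();
            long duration = System.nanoTime() - start;
            System.out.println("Loop # " + outer + " took " + (duration / 1000.0d) + " µs");
        }
    }
}
```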

Rather, focus on global optimizations. In most cases, they have much more impact. Our example can be rewritten like so:

public class PerformanceTest4 {
    public static void main(String[] args) {
        for (int outer = 1; outer <= 100; outer++) {
            long start = System.nanoTime();
            long sum = 10647704;
            long duration = System.nanoTime() - start;
            System.out.println("Loop # " + outer + " took " + (duration / 1000.0d) + " µs");
        }
    }
}

The program delivers exactly the same result, but the iterations are so fast I can't even measure them on my PC. Each iteration takes less than 0.3 µs; that is, it is at least 20 times quicker than the original, JIT-optimized version. And the JIT didn't even need to compile anything!
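If you want to convince yourself that the hard-coded constant really matches what the original loop computes, a one-shot run of the "business logic" does it (Java's double arithmetic is deterministic for these values, so the result should be reproducible; the class name is mine):

```java
// Recomputes the sum that PerformanceTest4 hard-codes, using exactly
// the same arithmetic as PerformanceTest2's random() method.
public class PerformanceTest4Check {
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 1; i <= 5000; i++) {
            int x = (int) (i * 2.3d / 2.7d); // identical "business logic"
            int y = (int) (i * 2.36d);       // as in PerformanceTest2
            sum = sum + x % y;
        }
        // Compare the printed value with the 10647704 hard-coded
        // in PerformanceTest4.
        System.out.println(sum);
    }
}
```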

  • Charles Nutter uploaded his slides here.

  • You can watch a video of another nice explanation of the JIT here if you've got a spare hour (57 minutes, to be precise).

  1. Notice that usually even the first iteration is twice as fast as in purely interpreted mode. You can verify this by running with the VM parameter -Xint, which keeps the JVM in the interpreter.
  2. -XX:CompileThreshold=120000 instead of the default 10000, forcing the VM to wait 12 times as long as usual before it starts compiling.