Common wisdom says that Java runs on bytecode. Most developers also know there's something like a JIT compiler, but few developers are aware that the JIT compiler generates real machine code. Let alone that we can see it.
Displaying which methods "run hot"
From a bird's eye perspective, showing the Assembly code is simply a matter of adding a few JVM parameters and copying the hsdis library into the JRE folder. In reality, it's a bit tricky to get and copy hsdis. But fear not: even I managed to do that at a time long before I knew what I was doing. We'll come back to getting hsdis in a minute.
Let's start simple. In the first step, we are only interested in which methods are compiled. That gives a good overview of which parts of the program need to be optimized. Let's assume we've got a benchmark class with a main method:
```java
public class Benchmark {

    public int testMethodProcedurally(MyState state) {
        // ... the code we actually want to benchmark ...
        return 0; // placeholder so the sketch compiles
    }

    public void mainTest() {
        MyState s = new MyState();
        for (int i = 0; i < 1000000; i++) {
            testMethodProcedurally(s);
        }
    }

    public static void main(String... args) {
        new Benchmark().mainTest();
    }
}
```
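The MyState class isn't spelled out here. If you want a self-contained example to play with, a trivial placeholder like the one below will do - it's just an assumption, any small object carrying some mutable state works:

```java
// Hypothetical placeholder - the article doesn't show the real MyState.
// Any small mutable object the benchmarked method can read and write is fine.
public class MyState {
    public int value;
}
```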
After compiling, you can start the program like so:

```
java -Xbatch -XX:-TieredCompilation -XX:+PrintCompilation Benchmark
```

- -Xbatch forces the JIT compiler to run in the main thread of the JVM. By default, both compiler and optimizer run in parallel to your application, so they can do their magic without slowing your program down. That's great, but multithreading also means that sometimes the console messages are printed in parallel. In other words: sometimes the Assembly code and the other diagnostic messages get mixed up, making it hard to read. -Xbatch fixes that.
- -XX:-TieredCompilation is another flag you shouldn't use in production. The JVM has an interpreter and several compilers. Each compiler takes longer than its predecessor, but yields faster code. Usually, the JVM starts with the interpreter because it doesn't require any start-up time. The compilers are only activated when a method runs hot. We don't want to see every intermediate optimization step. -XX:-TieredCompilation forces the JVM to skip the intermediate steps, so all we see is the final, most optimized Assembly code.
- -XX:+PrintCompilation prints a summary of which methods have been compiled. It doesn't print the Assembly code itself. That's ideal for gaining a quick overview of which methods run "hot". These are precisely the methods that benefit from optimizing.
Watching the JIT compiler manipulating your code
Running the Java program with the -XX:+PrintCompilation flag prints something like this on the console:
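The numbers, byte counts, and spacing below are invented and the exact formatting depends on your JVM version, but the lines look roughly like this:

```
    152    1    b        java.lang.String::hashCode (55 bytes)
    166    2    b        java.lang.String::equals (81 bytes)
    [...]
   1247   87    b        Benchmark::testMethodProcedurally (38 bytes)
   1305   88 %  b        Benchmark::mainTest @ 11 (27 bytes)
   1420   87    b        Benchmark::testMethodProcedurally (38 bytes)   made not entrant
```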
The output shows every method that's been compiled. Most of these methods are internal methods of Java or your framework, especially at the beginning. I omitted most of them for the sake of brevity. In our example, only the last three lines have something to do with the code we've written ourselves. Everything else is third-party code we're just using.
Understanding the output of -XX:+PrintCompilation
The second column is simply a running index. The first column shows when the method was compiled: it's the number of milliseconds since the start of the program.
The third column consists of a number of flags, as described in Steve Colebourne's article:
- If the third column contains a "%", the method has been replaced "on stack". It ran so long that the JVM decided it pays to replace the interpreted code with machine code without waiting for the method to terminate.
- After that, there may be an exclamation mark. This means the method contains a try-catch block.
- "b" means "blocking compiler".
- "s" indicates that the method is declared as a synchronized method.
- "n" refers to native methods.
- "*" means a native wrapper is generated.
If we hadn't switched off tiered compilation, we'd see a digit before the method name. This digit indicates which compiler has been used, as Alexandra Ohja explains in her blog.
The last column contains the method name and the length of the method. Note that this isn't the length of the machine code. It's the length of the byte code. The resulting machine code is several times as long. That's one of the reasons why it's a good idea to use byte code instead of compiling everything in advance: the size of the CPU cache is limited, so it pays to replace only a small part of the compact byte code with super-fast machine code. Thing is, the machine code is only fast as long as it fits into the CPU cache. After that, it's one or two orders of magnitude slower. As a side effect, the JVM can watch the code in action, allowing for optimizations that go far beyond anything an ahead-of-time compiler can reasonably provide. The JVM can do certain optimizations because it already knows they'll pay off. An AOT compiler can't do that, because it doesn't have a clue how the code behaves at runtime. Of course, there are also AOT compilers that run a profiler - but this is fairly advanced stuff. The beauty of the Java approach is that it delivers decent performance without demanding much from the programmer.
The last line ends with "made not entrant". This means the JVM found out it compiled too aggressively. The compiled method is taken out of service.
Printing the Assembly code
To reveal the actual Assembly code, you need to start the Java application with the PrintAssembly option:
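PrintAssembly is a diagnostic flag, so it has to be unlocked first. A typical invocation - using the Benchmark class from above - looks something like this:

```
java -Xbatch -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Benchmark
```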
This makes the JIT compiler print the Assembly code to the console each time it compiles a method. If we hadn't added the parameters -XX:-TieredCompilation and -Xbatch, we might see the output of several compilations taking place in independent threads interspersed, and we might see the result of multiple compilations of the same method.
But before we can see the machine code, we have to add a DLL to our Java installation. (If you're using a Mac, it's a file called hsdis-amd64.dylib, and UNIX users deal with a *.so file. But you know what I mean.) :)
How to build or get the hsdis library
For some legal reason[1], it's difficult to get the precompiled library. Usually you manage to find a copy of hsdis, but you never know if the source is trustworthy. It's better to compile the library yourself.
OSX users can follow this recipe by Chris Newland. There's also a fairly complete list of build instructions for Windows, Linux, and Mac OSX. You need a bit of patience and endurance to find out which version of bin-utils to use. bin-utils-2.29 introduced a breaking change that was only solved with Java 12. If you're using Java 11 or GraalVM 19, you'll need bin-utils-2.28.
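Roughly speaking, the build boils down to the steps below. Take this as a sketch only: the directory of the hsdis sources, the binutils path, and the target folder inside your JDK are examples and depend on your JDK version and installation.

```
# Sketch only - adjust the paths and the binutils version to your setup.
# The hsdis sources ship with the OpenJDK source tree
# (e.g. under src/utils/hsdis in JDK 11).
cd jdk11u/src/utils/hsdis
make BINUTILS=/path/to/binutils-2.28 ARCH=amd64
# Copy the resulting hsdis-amd64.dylib (or .so / .dll) next to the JVM's
# server library, e.g. into $JAVA_HOME/lib/server/ on JDK 11.
```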
Update June 28, 2020: I've prepared a shell script for OSX. The AdoptOpenJDK 11 version is here and the AdoptOpenJDK 8 version is here. All you have to do is to download the correct version of bin-utils and to modify the paths to match your installation.
How to read the Assembly code?
By now, you should see the Assembly code. Yay! But as it turns out, it's surprisingly difficult to read this code.
First, you should limit the code that's been printed to the method you're interested in. Unfortunately, I didn't manage to find out which JVM parameters are required. If you know them, please leave a comment!
Second, it pays to disable the advanced compiler levels. Each optimization step makes the code more sophisticated.
What remains is difficult enough to grasp. The machine code consists of a lot of unexpected stuff, such as safepoints, deoptimizations, and so on. A good primer might be the blog of Jean-Philippe Bempel. Of course, there's also my own series of articles on how Java translates to Assembly language.
Dig deeper
Introduction to how the compiler works (scroll down, the JIT is only part of the article)
Down the rabbit hole (the famous talk of Charles Nutter on what the compiler does with your code when you're looking away)
What does tiered compilation do?
Understanding Java JIT Compilation with JITWatch, Part 1
- The licenses of binutils and OpenJDK are incompatible. Thus many people believe it's ok to use them for personal use, but it's illegal to distribute the precompiled hsdis library.↩