
They say Truffle is GraalVM's interpreter framework. What an understatement! True, Truffle allows you to write an interpreter without the pain. But what's really cool about Truffle is that your interpreter runs at full speed. Truffle uses some clever magic to generate highly optimized machine code. It's a compiler-compiler. Let's have a look at this remarkable feat.

Remarkable performance

The JavaScript interpreter of GraalVM is written with Truffle. The resulting performance is striking, especially taking into account that GraalVM is the new kid on the block. The interpreter requires its warmup time, but after that, the peak performance is roughly the same as Google's V8 compiler.

Truth be told, I wonder if Truffle is really "the new kid on the block." The oldest academic paper on Truffle I've found is from 2012. So I suppose Truffle is at least seven years old. Oracle's TruffleRuby repository even dates back to 2001 - but I suspect it wasn't Truffle at the time. The GraalVM repository itself goes back to 2011. Be that as it may, that's enough developer experience piled up to build trust. Truffle is a mature and sophisticated piece of software.

Back to topic. I reckon not everybody has busied themselves with compilers and interpreters, so let's add a short explanation.

Wait - what's the difference between an interpreter and a compiler?

Putting it in a nutshell, the difference between a compiler and an interpreter is that the compiler runs in advance, whereas the interpreter works on-the-fly. In the Java universe, traditional compilers are often called "ahead-of-time" (AOT) compilers. Both tools examine your program, convert it into machine code, and execute it. The compiler generates the entire machine code before running the program. That takes some time. If you're working on an enterprise-size program, brace yourself to wait a minute or two. However, once the machine code is ready, it's blazing fast.

The interpreter reads your program line-by-line and generates the machine code of the line on-the-fly. After that, it executes that line before proceeding to the next line. Traditionally, it forgets about the previous line, so loops are translated time and again to machine code.[1] As a result, simple interpreters run programs 100 to 1,000 times slower than compiled code.
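To make the repeated-dispatch cost concrete, here's a minimal tree-walking interpreter sketch. All the names below (Node, Lit, Add, TinyInterpreter) are invented for illustration; the point is that every call to eval() walks the tree again, paying the virtual-dispatch cost on every single evaluation:

```java
// A minimal tree-walking interpreter (invented names, for illustration only).
interface Node { int eval(); }

class Lit implements Node {
    final int value;
    Lit(int value) { this.value = value; }
    public int eval() { return value; }
}

class Add implements Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    // Virtual dispatch happens here on every single evaluation.
    public int eval() { return left.eval() + right.eval(); }
}

class TinyInterpreter {
    public static void main(String[] args) {
        Node program = new Add(new Lit(30000), new Lit(10000));
        // The "loop": the same tree is walked anew on each pass.
        for (int i = 0; i < 3; i++) {
            System.out.println(program.eval()); // prints 40000 each time
        }
    }
}
```

A compiler would translate the tree to machine code once; this interpreter re-dispatches on the tree shape every time, which is exactly the overhead the article describes.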

Just-in-time compilers

During the last few decades, the advent of just-in-time (JIT) compilers blurred this picture. Nowadays, languages with JIT compilers are often faster than AOT-compiled languages. They start slow, but after a while, they pick up speed, even overtaking the AOT compiler and reaching much better peak performance. GraalVM offers both a JIT compiler and an AOT compiler for Java. Most benchmarks I've seen indicate the JIT compiler is roughly twice as fast. For the moment, you can think of the JIT compiler as an interpreter with a smart caching strategy. We'll come back to this in a minute.
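The "interpreter with a smart caching strategy" idea can be sketched in a few lines. Everything below is invented (ToyJit, HOT_THRESHOLD, and call() are not real GraalVM APIs): a function is interpreted until it becomes hot, then a "compiled" version is cached and reused on every later call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// A cartoon of a JIT compiler (all names invented): interpret until hot,
// then cache a "compiled" version and take the fast path from then on.
class ToyJit {
    static final int HOT_THRESHOLD = 2;
    final Map<String, Integer> callCounts = new HashMap<>();
    final Map<String, IntUnaryOperator> compiledCache = new HashMap<>();

    int call(String name, IntUnaryOperator interpreted, int arg) {
        IntUnaryOperator compiled = compiledCache.get(name);
        if (compiled != null) {
            return compiled.applyAsInt(arg); // fast path: "machine code"
        }
        int count = callCounts.merge(name, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            // Pretend this is compilation: from now on, skip interpretation.
            compiledCache.put(name, interpreted);
        }
        return interpreted.applyAsInt(arg); // slow path: interpret
    }

    public static void main(String[] args) {
        ToyJit jit = new ToyJit();
        for (int i = 0; i < 3; i++) {
            System.out.println(jit.call("double", x -> x * 2, 21)); // prints 42
        }
    }
}
```

A real JIT additionally profiles argument types and branch behavior before compiling, but the split between a slow interpreted path and a cached hot path is the same.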

The peak performance comes at a price: the JIT compiler needs more memory than the AOT compiler, and it starts slow. If you're writing an Amazon Lambda service, you're primarily interested in fast cold start times, so the AOT compiler is your friend.

Why should I write an interpreter?

The performance penalty sounds bad. Why should anybody deliberately write such a slow interpreter?

Well, it's easy. You can cobble together an interpreter for a simple language in half a day. It's also straightforward to debug and maintain such an interpreter. Every programmer and their grandmother can do it.

Compilers tend to be much more difficult. Modern compilers such as the V8 JavaScript engine or the HotSpot JVM are highly sophisticated beasts consisting of millions of lines. A plethora of optimizations make the complexity go through the roof. Is there a way to write an efficient compiler without the pain?

The best of both worlds

Enter GraalVM. At its heart, GraalVM is a JVM engine running Java bytecode with a JIT compiler. Can we use this JIT compiler to improve our interpreter?

Truffle does just that. It takes your interpreter, uses the program as input, and runs it line-by-line. At first, that's just a slow interpreter, until tree rewriting and the JIT compiler kick in. The latter converts your interpreter into machine code.

But that's only the beginning. The JIT compiler generates the code for both running your program and your interpreter, opening a window of opportunity for optimizations. Just one example: the JIT compiler only generates machine code that's really needed. It doesn't generate machine code for methods that are never called. If your program doesn't use floating-point numbers, the machine code won't contain floating-point code. A large part of the interpreter doesn't make it into the machine code. It's as if it'd never been written.

Speculative optimization

The JIT compiler optimizes the machine code aggressively. It inlines small methods after a while. That's why getters and setters are cheap. Your IDE tells you they're there, but in reality, direct variable assignments replace them.[2] Escape analysis detects variables only used locally, eliminating access to the slow main memory in favor of fast CPU registers. Even the "else" part of an if-statement is effectively dead code if it's never called. I'll demonstrate the latter below.
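Here's a hedged sketch of what getter inlining plus escape analysis buy you (Point, manhattan(), and EscapeDemo are made-up names, not GraalVM code). After the getters are inlined into direct field reads, the two Point objects never escape manhattan(), so the JIT may keep the coordinates in registers and skip the heap allocations entirely:

```java
// Invented example: candidates for inlining and escape analysis.
class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    int getX() { return x; }
    int getY() { return y; }
}

class EscapeDemo {
    static int manhattan(int ax, int ay, int bx, int by) {
        Point a = new Point(ax, ay);
        Point b = new Point(bx, by);
        // After inlining, getX()/getY() become direct field reads; after
        // escape analysis, a and b may never be allocated on the heap at all.
        return Math.abs(a.getX() - b.getX()) + Math.abs(a.getY() - b.getY());
    }

    public static void main(String[] args) {
        System.out.println(manhattan(1, 2, 4, 6)); // prints 7
    }
}
```

Whether the JIT actually performs these transformations depends on profiling and heuristics; the sketch shows the opportunity, not a guarantee.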

A large part of these optimizations builds on hope and heuristics. If the JIT compiler guesses wrong, it has to "deoptimize" the code. Usually, that's only a small penalty.

Partial Evaluation and Futamura projections

At this point, things get magical. When we run our program with Truffle and our interpreter, we're dealing with three entities that can be considered input:

  • the source code of your interpreter
  • the source code of the program to run
  • the data the program is working on

Of these three, only the last may change. Both the interpreter and the program remain unchanged. So the JIT compiler can run a lot of speculative optimizations without ever guessing wrong. Guessing wrong is expensive because the JIT compiler has to throw away the compiled and optimized machine code and start from scratch.

That's the idea of the Futamura projection (aka partial evaluation). A real-world program uses only a small part of the interpreter. The rest is dead code. Remove it, perform optimizations like escape analysis and inlining, and you end up with surprisingly efficient machine code.
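As a toy illustration of the first Futamura projection (all names below are invented), here's a tiny interpreter partially evaluated by hand with respect to one fixed program. Only the data remains variable; the residual method is what Truffle and the JIT aim to produce automatically:

```java
// Hand-done partial evaluation (invented example, not Truffle code).
class Futamura {
    // A "program" in a toy language: a list of opcodes applied to x.
    static final char[] PROGRAM = { '+', '*' }; // meaning: (x + 1) * 2

    // The general interpreter: dispatches on every opcode, every time.
    static int interpret(char[] program, int x) {
        for (char op : program) {
            switch (op) {
                case '+': x = x + 1; break;
                case '*': x = x * 2; break;
            }
        }
        return x;
    }

    // The residual program: interpret() specialized to PROGRAM.
    // All dispatch is gone; only data-dependent arithmetic remains.
    static int residual(int x) {
        return (x + 1) * 2;
    }

    public static void main(String[] args) {
        System.out.println(interpret(PROGRAM, 20)); // prints 42
        System.out.println(residual(20));           // prints 42
    }
}
```

The interpreter loop and the switch are dead weight once the program is fixed; partial evaluation folds them away, leaving code as if a compiler had translated the program directly.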


Truffle takes this idea to a new level. It's an API providing everything you need to write an interpreter, plus a framework allowing you to leverage partial evaluation. When you're writing the interpreter, you can give hints to Truffle where and when to optimize code. There are several examples of how to do this in the paper on Practical Partial Evaluation for High-Performance Dynamic Language Runtimes.

Let's have a look at actual source code, inspired by the source of Truffle's JavaScript engine. Please note that all the code in this article is pseudocode. I've eliminated as much boilerplate code as possible.

I assume every language has integer values and addition. To define the addition, you write something like this:

@NodeInfo(shortName = "+")
public class AddNode extends Node {
    @Specialization
    int doInt(int a, int b) {
        return a + b;
    }
}

That was easy. Let's have a look at what Truffle does with our code. A simple interpreter framework would simply call the method doInt(). Maybe like so:

if (currentNode instanceof AddNode) {
    ((AddNode) currentNode).doInt(currentNode.leftChild(), currentNode.rightChild());
}

At this point, I can only guess, but from what I've read between the lines of the Truffle documents, Truffle is a bit more clever. It seems to extract the bytecode of the method to generate the code of the program to be executed.[3] So (I guess) instead of calling the method doInt() it compiles the JavaScript line console.log(30000 + 10000) into this Java code:

int temp = 30000 + 10000;
System.out.println(temp);

Later, when the JIT compiler of the GraalVM kicks in, it probably runs two optimizations. First, it detects the temporary variable is never used again, so let's inline it:

System.out.println(30000 + 10000);

Second, since both numbers are constants, the JIT compiler can do the addition at compile time. (I'm not sure it does, but that's a common optimization when writing a compiler, so I consider it likely.) After running all these optimizations, the code finally looks like this:

System.out.println(40000);

Declaring optimization and deoptimization hints

That's not what we'd expect from a JavaScript program. We've forgotten to consider integer overflow. That's something that exists in Java, but not in JavaScript, so we have to improve our interpreter a bit. Java's Math.addExact() method raises an ArithmeticException when an overflow occurs. So we can define two methods. The first method is the default method. The second method is the fallback method, only used after the first overflow has happened:

@NodeInfo(shortName = "+")
public abstract class AddNode extends Node {
    @Specialization(rewriteOn = ArithmeticException.class)
    int doInt(int a, int b) {
        return Math.addExact(a, b);
    }

    @Specialization
    int|double doIntOverflow(int a, int b) {
        double result = (double) a + (double) b;
        if (result <= Integer.MAX_VALUE && result >= Integer.MIN_VALUE) {
            return (int) result;
        }
        return result;
    }
}

Again, please note this is pseudocode. I've borrowed the union type from TypeScript. The type int|double indicates the method returns either an int or a double. From what I've read, after running all the optimizations of Truffle and the JIT compiler of GraalVM, that's pretty much what's really going on. However, a look at the real implementation reveals that in reality java.lang.Object is used, plus a lot of autoboxing. At runtime, the JIT compiler detects that the autoboxing isn't needed. Following all this black magic, I've got the impression that the TypeScript metaphor int|double is a good match, even if it's not entirely true.

Truffle's approach to deoptimization

Let's have a look at what the code generated by Truffle looks like:

try {
    int result = Math.addExact(30000, 10000);
    System.out.println(result);
} catch (ArithmeticException e) {
    deoptimize();
}

As long as the addition is called with small numbers, the integer version of our code is called. When the JIT compiler kicks in, it can only inline the method addExact(). Still, the resulting code is reasonably fast. Only when the exception is raised is the code replaced by the second method:

int|double temp;
double result = (double) 30000 + (double) 10000;
if (result <= Integer.MAX_VALUE && result >= Integer.MIN_VALUE) {
    temp = (int) result;
} else {
    temp = result;
}
System.out.println(temp);

This time, the JIT compiler can optimize much more. I guess it ends up with this version:

System.out.println(40000);

Effectively dead code

Oops. In this particular case, our optimization did more harm than good. So let's add a second example. Imagine an if statement that almost never executes the else part:

if (dayOfWeek !== 'Sunday') {
    console.log("It's a business day.");
} else {
    // 500 lines of convoluted code
}

Let's skip a few details. If you're using the annotation @PEFinal as described in the Partial evaluation... paper I've already mentioned above, Truffle compiles your code to this:

if (!dayOfWeek.equals("Sunday")) {
    System.out.println("It's a business day.");
} else {
    deoptimize();
}

As long as you run your program only during the week, the 500 convoluted lines never make it to the bytecode run by Truffle. That, in turn, means they are never compiled to machine code, so your CPU cache isn't jammed with useless code. That, in turn, guarantees your code runs at full speed.

More to the point, it runs even faster than a traditional ahead-of-time compiler could manage. The AOT compiler can't deoptimize the code, so it has to compile the 500 lines, filling your CPU cache with useless clutter. Deoptimization is the sword of Damocles hanging over most JIT compilers, but in most cases, taking the risk pays off.

Wrapping it up

From a technical point of view, it's correct to call Truffle a framework allowing you to write an interpreter. But still, that's a bit unfair. If you don't mind a little warmup time, Truffle generates code that's almost as good as the code a carefully hand-crafted compiler generates. The JavaScript implementation written with Truffle is more or less on par with Google's V8 engine.

At the same time, it relieves you from having to know much about compilers. That's a remarkable achievement. Roughly fifty years after professor Yoshihiko Futamura invented and described his Futamura projections, a remarkably effective implementation of his idea has landed in the Java universe.

  1. Putting it in a nutshell, that's the same reason why calling a virtual method is slow.
  2. At least after a while, when the JIT compiler kicks in.
  3. I've also read hints that Truffle uses method handles. However, I can't see how tree rewriting works with method handles. So I'll use byte code extraction as a working hypothesis.