
Graal – Towards the Holy Grail of Polyglot Programming

Anybody ordered a new JVM compiler? Whoever ordered it – now it’s there, and it’s exciting! But wait: is it a compiler? No, it’s not. It’s more. Currently, it’s half a dozen compilers, plus a framework making it easy to write your own compiler.

In a nutshell, this is what the Graal project promises:

  • It’s a compiler generating faster Java code than ever. Or it will be, in a few months. The version bundled with Java 10 is marked experimental, probably for a good reason, such as a performance penalty in some applications.
  • It’s a compiler everybody can understand and modify. People like you and me can contribute improvements.
  • It brings polyglot programming to a whole new level.
  • Plus, it’s a great opportunity for minor or new languages. It’s never been so easy to create a new language with a decent performance from day one.
  • Finally, it brings AOT compilation to the Java world. That’s a big plus in cloud environments like AWS Lambda, because it takes the scare out of cold starts after hibernation.

OK, you get it: I’m excited. Let’s examine Graal piece by piece. And let’s have a look at the current state of the art.

A word of caution

Graal, SubstrateVM, and Truffle are fairly new to me, so most of this article isn’t backed up by hands-on expertise. Instead, I collected information from blogs, conference talk videos, and slides. If you’re more familiar with Graal than I am, don’t hesitate to leave a comment pointing out whatever I got wrong.

What is Graal?

Graal is a plugin to the JVM, so let’s start with the JVM first. Among other things, Java 9 implements JEP 243, which sort of removes the JIT compiler from the JVM. The JIT compiler we’re used to is still there, but now it’s “only” the default implementation. You can replace it with a different implementation using arcane JVM parameters (-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler, plus the path to the jar files containing the new compiler). Chris Seaton has written a great blog post containing all the gory details, so suffice it to say it’s possible, and it’s easy, too.
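
If you want to check from inside a running program whether the switch was actually passed, you can look at the JVM’s input arguments. The following is only a small sketch: the class JitCheck and its helper method are names I made up, and the check only inspects the command line. It doesn’t prove that Graal is really compiling your code.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of Graal or the JDK.
class JitCheck {
    // Returns true if the given JVM arguments ask for a JVMCI compiler.
    static boolean requestsJvmciCompiler(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseJVMCICompiler");
    }

    public static void main(String[] args) {
        // The arguments the JVM itself was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVMCI compiler requested: " + requestsJvmciCompiler(jvmArgs));
    }
}
```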

So now the JIT compiler converting the Java bytecode isn’t baked into the JVM anymore. It’s a plugin. You can replace it with another plugin. One of these plugins is the Graal compiler (and most likely, it’s the first and – for a long time – the most important plugin).

The JIT compiler has been rewritten from scratch!

Graal, in turn, is exciting because it’s a complete rewrite of the JIT compiler. Traditionally, the low-level code of the Java virtual machine has been written in C++. That was a good decision as long as Java was an interpreted language. But it has puzzled me ever since the JIT compiler was added to Java. One of the best practices of compiler construction is to write the compiler in the language to be compiled.

Sounds weird? It is.

But it makes a lot of sense once you take into account that the first few generations of the compiler are written in an already-established language before switching to the target language. So you write a little bootstrap compiler, implement a couple of features of the new language, and as soon as the language is powerful enough to write a compiler, you replace the original compiler with a compiler written in the new language.

The rationale behind this is that writing the compiler in the new language is a great test case. Plus, it’s a great motivation to optimize the performance of the language.

In the case of Java, it took a tad longer to get there.

It’s Java, dude!

Writing the Java compiler in Java has other benefits, too. Now every Java programmer can experiment with the compiler. C++ is considered a hard language, so few people took the trouble to download the source code to their own computer. I guess implementing the compiler in Java itself is going to result in countless pull requests improving the compiler. In other words, it allows for crowdsourcing. That’s why I believe the performance of Java programs compiled by Graal is soon going to overtake Java applications compiled by the traditional HotSpot+JRockit compiler.

Of course, there’s also the combined weight of the history of the existing compiler. Like any other piece of software, it became more and more difficult to maintain over the years. In the software industry, it’s often a good idea to start over again. It’s a pity that starting over is almost never an option. No matter how high the cost of maintenance is, management always compares it to the money they’ve already invested in the product. From this perspective, it’s a lucky circumstance that Oracle decided to invest time and money to write a new compiler.

Graal operates on an AST

There’s a second reason why Graal might perform better than HotSpot. The HotSpot compiler operates on the Java bytecode. Graal takes a slightly different approach. The Java part of Graal also operates on the Java bytecode, but internally, Graal works on an abstract syntax tree (aka AST). That’s a fairly generic representation of the program code. Every modern programming language naturally translates into an AST. Plus, if my sources are right, it’s easier to optimize the AST than the underlying bytecode. (Note that I’m not entirely convinced: I always assumed every optimizer first converts the code to an AST).

Graal does not necessarily operate on Java bytecode

Be that as it may, an important message is that the input of Graal is not necessarily the Java bytecode. The Truffle project is an alternative source of input. Truffle bypasses the bytecode, converting human-readable source code directly to the AST representation. Graal is a compiler translating this AST to machine code. In other words, it’s a great engine you can simply use.

That’s great news for everybody planning to create a new language. They don’t have to create the compiler from scratch. All they have to do is write a parser translating the source code to a Graal AST. After that, they can use the infrastructure provided by Graal to translate the language to machine code. Most likely, they have to add or tweak a few things, but Graal has been designed with extensibility in mind.

That’s also great news for dynamically-typed languages like JavaScript or Ruby. In the past, these languages had a hard time adapting to the Java virtual machine. It simply didn’t know dynamic types. A memory cell represents either an integer, or a string, or a method, but there’s no way a single memory location can represent any of them as needed. The only way to circumvent this limitation is to use objects.

Graal, in contrast, has been written with dynamic languages in mind. It does so by making the type information optional in the AST. If the compiler can infer type information from the context, it can emit optimized code. Otherwise, it emits fairly generic code. See this presentation (slides 62 – 66) for more details.
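
To make the idea of optional type information a bit more tangible, here’s a toy sketch in plain Java. It doesn’t use the real Truffle or Graal API; AddNode and its states are names I made up for illustration. The node starts without any type information, specializes itself to integer addition as long as it only sees integers, and falls back to a generic path as soon as another type shows up.

```java
// Toy illustration only: this class is hypothetical and not the Truffle API.
class AddNode {
    enum State { UNINITIALIZED, INT, GENERIC }

    State state = State.UNINITIALIZED;

    Object execute(Object left, Object right) {
        boolean bothInts = left instanceof Integer && right instanceof Integer;
        if (state == State.UNINITIALIZED) {
            // First execution: remember what kind of values we have seen.
            state = bothInts ? State.INT : State.GENERIC;
        } else if (state == State.INT && !bothInts) {
            // The optimistic assumption was wrong: fall back for good.
            state = State.GENERIC;
        }
        if (state == State.INT) {
            // Fast path: a compiler could turn this into a plain machine-level add.
            return (Integer) left + (Integer) right;
        }
        return addGeneric(left, right);
    }

    // Slow path, loosely modelled on JavaScript's "+":
    // numbers are added, everything else is concatenated.
    private static Object addGeneric(Object left, Object right) {
        if (left instanceof Number && right instanceof Number) {
            return ((Number) left).doubleValue() + ((Number) right).doubleValue();
        }
        return String.valueOf(left) + right;
    }

    public static void main(String[] args) {
        AddNode add = new AddNode();
        System.out.println(add.execute(4, 5) + " " + add.state);   // prints "9 INT"
        System.out.println(add.execute("4", 5) + " " + add.state); // prints "45 GENERIC"
    }
}
```

The point of the design is that the fast path contains no type checks a compiler couldn’t hoist or remove, while the generic path stays correct for everything else.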

Truffle

Now for the part that confused me at first. How can Graal bypass class files for languages like JavaScript or Ruby? Doesn’t the JVM work solely on class files?

The Truffle framework solves this problem. It’s a framework for implementing languages as simple interpreters. It allows you to easily convert your language to an AST that can be compiled by Graal.

There are already implementations for JavaScript, Ruby, and R. So you can run a JavaScript application like so:

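import org.graalvm.polyglot.*;

// The class name is just a placeholder for this example
public class HelloPolyglot {
    public static void main(String[] args) {
        // Create a polyglot context and evaluate a JavaScript snippet in it
        try (Context context = Context.create()) {
            context.eval("js", "print(4+5);");
        }
    }
}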

That’s the version they show you at conferences. But it’s just the beginning. You can run entire scripts from the command line, pretty much the way you start these scripts with node.js today. I haven’t seen a demo running – say – an Electron application in the GraalVM yet, but in theory, you should be able to do so. When I tried to create a multi-file project, GraalVM told me it doesn’t know require.js. However, I don’t know if it’s simply a configuration error, or if it’s because the JavaScript part of GraalVM is still in its early days.

Excursion: is the module system a mandatory part of a JavaScript engine?

Update March 27, 2018: Since writing the article, it dawned on me that the error concerning require.js is on my side indeed. Well, partially, at least. Originally, require.js was simply a library you could add to your application to benefit from a module system. This works fine in the browser: simply add another <script src="scripts/require.js"> tag to your HTML file. However, GraalVM is running on the server, so there’s no HTML file. The obvious solution is to use a tool like Webpack to merge your entire JavaScript application into a single file.

There’s a “but”: Currently, you have to install Node.js to run Webpack. So the Graal JavaScript engine isn’t a full-blown replacement for Node.js yet. It might become such a replacement in the future, but currently, the goals of the project seem to be more down-to-earth.

There’s a second “but”: Node.js does have a built-in module system. Plus, the current ECMAScript versions define a module system for JavaScript. In the long run, I don’t consider supporting the require() method optional.

To cut a long story short, I expected too much of Graal’s JavaScript engine, but only by a margin. On the other hand, chances are the features I’m currently missing will be added in the future. Implementing them doesn’t sound difficult to me.

Implementing your own language

I guess you’ve already got an idea how to implement your own language. Use Truffle as a framework to translate the source code to an AST, and use Graal to compile the AST to machine code. Chances are you can use AST nodes that already have a compiler. If you really have to define your own AST node type, it’s easy to add it to the Graal compiler and to write a machine code generator for the new node type, as Chris Seaton demonstrates. At least it’s easy compared to the traditional “create everything from scratch” approach.
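
To illustrate the “parser plus executable AST” idea, here’s a toy interpreter in plain Java. Mind you, it’s only a sketch of the general approach: it doesn’t use the Truffle API, and every name in it (TinyLanguage, Node, Parser, and so on) is made up for this example. The parser turns a tiny arithmetic language into an AST, and every node knows how to execute itself.

```java
// Toy illustration only: plain Java, not the Truffle API. All names are made up.
class TinyLanguage {
    // Every AST node knows how to execute itself.
    interface Node {
        long execute();
    }

    static final class Literal implements Node {
        final long value;
        Literal(long value) { this.value = value; }
        public long execute() { return value; }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public long execute() { return left.execute() + right.execute(); }
    }

    static final class Mul implements Node {
        final Node left, right;
        Mul(Node left, Node right) { this.left = left; this.right = right; }
        public long execute() { return left.execute() * right.execute(); }
    }

    // Recursive-descent parser for input like "4+5*2":
    // digits, '+' and '*' only, no spaces or parentheses.
    static final class Parser {
        private final String src;
        private int pos = 0;

        Parser(String src) { this.src = src; }

        // sum := product ('+' product)*
        Node sum() {
            Node node = product();
            while (peek('+')) { pos++; node = new Add(node, product()); }
            return node;
        }

        // product := number ('*' number)*
        Node product() {
            Node node = number();
            while (peek('*')) { pos++; node = new Mul(node, number()); }
            return node;
        }

        Node number() {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            if (start == pos) throw new IllegalArgumentException("Number expected at " + pos);
            return new Literal(Long.parseLong(src.substring(start, pos)));
        }

        boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }

        boolean atEnd() { return pos == src.length(); }
    }

    // Turns source text into an AST.
    static Node parse(String source) {
        Parser parser = new Parser(source);
        Node root = parser.sum();
        if (!parser.atEnd()) throw new IllegalArgumentException("Unexpected input: " + source);
        return root;
    }

    public static void main(String[] args) {
        Node ast = parse("4+5*2");
        System.out.println(ast.execute()); // prints 14
    }
}
```

In a real Truffle language the nodes would extend the framework’s node classes, which is what allows Graal to compile the tree instead of merely interpreting it.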

OK, that sounds a bit abstract, so here’s an example. The GraalVM provides AST nodes emitting machine code. Local variables can be stored in slow main memory or in a fast CPU register. However, there are only a few registers. So it’s a high art to map local variables to CPU registers. Parameters are a similar story. Traditionally, parameters are simply stored on the stack, but that’s a slow memory access. Sometimes it’s possible to replace the stack with CPU registers. Graal comes with a fairly sophisticated algorithm implementing the register mapping. You can just use it. Before Graal, you’d have to implement it yourself. That’s why many languages simply use the stack, which saves a lot of development time but costs a lot of performance.
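
To give you a feeling for what such a register mapping involves, here’s a heavily simplified sketch of linear-scan allocation, one of the classic textbook approaches. I’m not claiming this is what Graal does internally. The class LinearScan, the register names, and the live intervals are all made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified linear-scan register allocation. Hypothetical example code, not Graal's implementation.
class LinearScan {
    // A variable is live from instruction 'start' to instruction 'end' (inclusive).
    static final class Interval {
        final String name;
        final int start, end;
        Interval(String name, int start, int end) {
            this.name = name; this.start = start; this.end = end;
        }
    }

    // Maps each variable to a register ("r0", "r1", ...) or to "stack" if it is spilled.
    static Map<String, String> allocate(List<Interval> intervals, int registerCount) {
        List<Interval> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingInt((Interval i) -> i.start));

        Map<String, String> location = new LinkedHashMap<>();
        Deque<String> free = new ArrayDeque<>();
        for (int r = 0; r < registerCount; r++) free.add("r" + r);
        // Intervals currently holding a register, kept sorted by end point.
        List<Interval> active = new ArrayList<>();

        for (Interval current : sorted) {
            // Give back the registers of intervals that ended before this one starts.
            for (Iterator<Interval> it = active.iterator(); it.hasNext(); ) {
                Interval old = it.next();
                if (old.end < current.start) {
                    free.add(location.get(old.name));
                    it.remove();
                }
            }
            if (!free.isEmpty()) {
                location.put(current.name, free.poll());
                active.add(current);
            } else if (!active.isEmpty() && active.get(active.size() - 1).end > current.end) {
                // Spill the active interval that lives longest and reuse its register.
                Interval victim = active.remove(active.size() - 1);
                location.put(current.name, location.get(victim.name));
                location.put(victim.name, "stack");
                active.add(current);
            } else {
                // The current interval itself lives longest (or there are no registers).
                location.put(current.name, "stack");
            }
            active.sort(Comparator.comparingInt((Interval i) -> i.end));
        }
        return location;
    }

    public static void main(String[] args) {
        List<Interval> intervals = Arrays.asList(
                new Interval("a", 0, 9), new Interval("b", 1, 3),
                new Interval("c", 2, 8), new Interval("d", 4, 6));
        // With two registers, "a" ends up on the stack; the others share r0 and r1.
        System.out.println(allocate(intervals, 2));
    }
}
```

Even this toy version has to juggle sorting, expiry, and a spill heuristic, which gives you an idea of why getting it for free is attractive.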

Ahead of time compiler for Java

Another nice feature is the AOT compiler that’s already being shipped with Java 9 and 10. At first glance, the idea of an ahead-of-time compiler for Java sounds a bit odd. We’ve been told year after year that an AOT compiler can never beat the performance of the JIT compiler. The JIT compiler has the opportunity to watch the behavior of the running code. So it can run a lot of optimizations an AOT compiler can never do.

The disadvantage of a JIT compiler is that it starts in interpreter mode. That’s a performance penalty of a couple of seconds when the JVM starts. In traditional environments, that’s not a big deal because the server is running continuously for days and weeks. However, environments like AWS Lambda switch off the virtual machine when it’s not needed. So Lambda users frequently encounter a cold start. That’s why JavaScript is currently a better choice for AWS Lambda than Java: Node.js starts a lot faster.

The AOT compiler promises to change that. It allows you to precompile your application (or parts of it), taking the sting out of the cold-start phase.

If you want to experiment with the AOT compiler, have a look at the JEP 295 page, which has a lot of hints on how to use it.

State of the art in Java 10

As far as I can see, the AOT compiler and the new Java compiler are the only parts of Graal that are shipped with the standard OpenJDK 10.

If you want to experiment with polyglot programming, download GraalVM. It’s tested with Java 8, 9, 10, and 11, but the only public distribution is a Java 8 build.

After unpacking the tar.gz file, simply add the path to the JavaScript engine to the global path and run a JavaScript file like so:

# macOS
export PATH=<your installation path>/graalvm-0.32/Contents/Home/bin/:$PATH
REM Windows
set PATH=<your installation path>\graalvm-0.32\bin\;%PATH%
js demo.js

As mentioned above, I’ve only managed to run single-file projects so far. For some unknown reason, I didn’t manage to use the module system of JavaScript. However, there’s also a Maven artifact for creating projects with Graal and Truffle. The source code it generates mentions require.js, so maybe it was just a configuration issue. Or it may be a current design decision, as I’ve mentioned above: tools like Webpack allow you to bundle large projects into a single JavaScript file, so there’s no pressing reason to support more than single-file applications.

Wrapping it up

Graal is an all-new rewrite of the JIT compiler, and it’s exciting for various reasons. It makes it easier for Java programmers to examine and optimize the compiler because they don’t have to learn another language besides Java. It’s been written with languages other than Java in mind. It opens new opportunities for developers and researchers writing their own programming language. And it brings an AOT compiler to Java. In the long run, this may (and probably will) result in even better Java performance and easier integration with other programming languages, such as JavaScript, R, and Python, to name just a few. In the case of R and Python, even the still-experimental Graal version of today (March 2018) already emits faster code than the established compilers.


Dig deeper

Download page from Oracle labs
Polyglot programming on the JVM with Graal by Akihiro Nishikawa
Learning to use wholly GraalVM with a large hands-on section
Understanding how Graal works – a tour de force through Graal, teaching you no less than how to tweak your compiler or how to write your own language

