
Most of the research around GraalVM is about optimizing performance. Even better, much of it revolves around how to make optimizing performance a walk in the park. Our previous article covered several of GraalVM's optimization strategies. Now let's have a look at how to implement such an optimizer. Many articles mention the role of the abstract syntax tree, or AST for short. Almost everything the optimizer does has to do with this tree.

Truth be told, using an AST is pretty much what every optimizing compiler does. It's not limited to GraalVM. It's also what Oracle's HotSpot compiler does, or Google's V8 engine, or your average C compiler. Abstract syntax trees are a recurring topic in compiler technology.

So let's have a look at the tree and what to do with it. You'll learn something about GraalVM, and you'll learn something about optimizing compilers in general. We've also decided to add some background information, so you'll even pick up some insight about human languages along the way.

Level one: detecting units of meaning

Let's start with human language. English, in particular. Did you know human languages share many properties with programming languages?

Have a look at this sentence:

The cat eats a mouse.

What do you see?

Yeah, sure, it's a simple sentence. Bear with us. It's not that simple when you're trying to process it with a computer. Mind you, we humans have had a lot of time to get acquainted with spoken language. Scientific proof is scarce and uncertain, but from what we've learned, we'd estimate human language to be roughly one hundred thousand years old, give or take. It took nature four billion years to get to this point. So maybe this simple sentence is not as simple as it seems.

Linguistics - the scientific key to parsing the sentence

Chomsky hierarchy by J. Finkelstein, published on Wikimedia under a CC BY-SA 3.0 license

In fact, there's a lot of science covering this topic. Maybe you've heard about Noam Chomsky in your computer science classes. That's funny, because Chomsky's ideas are about natural languages. He probably didn't care much about computers: he developed most of his theories in the 1950s and early 1960s. It's just that programming languages are a much better match for his theories. Most programming languages fall into the context-free category. Human languages are much more complicated. You can consider them context-free or recursively enumerable - the outer bubbles in the onion picture on the right-hand side - but only if you don't look too closely. Spoken language is much more chaotic. Just think of all the unfinished sentences we all utter.

Lexical analysis

At first glance, our example sentence is just a string of characters. However, that's not what you see when you're reading. You're seeing groups of characters separated by white space. Even if you can't read, we're pretty sure you can't avoid recognizing this pattern. The white spaces may not mean anything to illiterate people, but it's almost impossible to ignore them.

Computer scientists call this process "splitting the sentence into tokens." Or tokenization, for short. There's even a special branch of computer science dedicated to breaking a sentence down into tokens. These programs are usually called "lexers." In the Java world, you'll usually meet ANTLR when you need to write a lexer. That's a tool that generates a lexer for you. Mind you: lexical analysis is such a common task that there are programs to generate lexers.

Simplifying things a bit, we've reduced the sentence to three tokens:
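Just to give you an idea, here's a minimal tokenizer sketch in Java - our own toy code, not what ANTLR generates. It simply splits on white space and strips the punctuation:

import java.util.Arrays;
import java.util.List;

// Toy lexer: split on white space, drop punctuation, normalize the case.
final class ToyLexer {
    static List<String> tokenize(String sentence) {
        return Arrays.stream(sentence.split("\\s+"))
                .map(word -> word.replaceAll("[^\\p{L}]", ""))  // strip "." and friends
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .toList();
    }
}

ToyLexer.tokenize("The cat eats a mouse.") yields [the, cat, eats, a, mouse]. A lexer for natural language would go one step further, dropping the articles and reducing "eats" to its stem "eat" - which is roughly how we arrive at the three tokens above.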

Grammar makes the difference

Awesome. Now we've got three individual units of meaning: cat, eat, mouse. But what does that mean? Do the cat and the mouse eat peacefully side-by-side? Does the mouse eat the cat?

Of course not. English sentences aren't just an arbitrary sequence of words. They follow certain rules. In computer science, such a rule is called a "production rule." The most important one is the "subject-predicate-object" rule. So in English, it's always obvious who's eating whom:
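Spelled out as a tiny Java sketch - with our own toy names, nothing official - the rule and the three tokens fit together like this:

import java.util.List;

// One production rule, informally: sentence -> subject predicate object
record ToySentence(String subject, String predicate, String object) {}

final class ToyParser {
    static ToySentence parse(List<String> tokens) {
        if (tokens.size() != 3) {
            throw new IllegalArgumentException("expected exactly three tokens");
        }
        return new ToySentence(tokens.get(0), tokens.get(1), tokens.get(2));
    }
}

ToyParser.parse(List.of("cat", "eat", "mouse")) yields ToySentence[subject=cat, predicate=eat, object=mouse] - and now it's clear who eats whom.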

Side quest: grammars in real life

In other languages, people may read the sentence differently. For example, if you stress the words correctly in the German language - something you can't do in print - it's perfectly legal to reverse the word order of the sentence. German speakers usually do this to stress that it's the mouse that's being eaten. Maybe the speaker is fond of the mouse for some reason. In such a situation, they can reverse the word order. The font size of the diagram hints at the voice-only stress disambiguating the sentence:

The mouse eats the cat.

Granted, this sentence looks completely odd if you're a native speaker of English. Plus, it's only a coarse approximation of how Germans really pronounce the sentence. Just believe us. There are languages with an incredibly flexible grammar. Mind you: more flexible also means more difficult to cope with. Everybody who's ever tried to learn classical Latin knows what we're talking about. The general tendency of human languages is to get simpler over time. So the reduced grammar of English is probably the future of human language, and that's good news. Nonetheless, if a native speaker of German hears this sentence, they'll probably decipher it like this:

Obviously, grammar makes a difference. It puts the words - or tokens - into relation. Languages like English force you to put the words in a certain order to express yourself. Other languages, like Latin, allow you to permute your words freely within a sentence. That doesn't come for free: Latin disambiguates the sentence using inflections and conjugations. But in every language, it's possible to analyze the sentence and to arrange the words in a tree-like hierarchy.

What does this have to do with programming languages?

Almost every programming language has an extremely boring grammar. Usually it's impossible to reverse the word order in a line of code. That's a big difference from human languages. There's no Latin in computer science. All popular languages are like English: a simple but versatile grammar, apart from Scala and Groovy. Even so, programming languages switch between reading left-to-right and right-to-left all the time. How do you implement the difference between these two expressions?

System.out.println(3 + 4 * 5);
System.out.println(3 * 4 + 5);

Well, you've already seen the tree. Operator precedence is just a matter of creating a slightly different tree. The operator with the lowest precedence is always the parent node. It's really that simple! Our two examples translate to these trees:
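As a minimal sketch - our own toy node classes in Java, not GraalVM's internals - the two trees could be written down like this:

// Toy AST nodes for arithmetic expressions.
sealed interface Expr permits Num, Add, Mul {}
record Num(int value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}

final class PrecedenceDemo {
    // 3 + 4 * 5: "+" has the lower precedence, so it becomes the root of the tree.
    static final Expr FIRST = new Add(new Num(3), new Mul(new Num(4), new Num(5)));

    // 3 * 4 + 5: "+" is the root again, but this time the "*" subtree is its left child.
    static final Expr SECOND = new Add(new Mul(new Num(3), new Num(4)), new Num(5));
}

Evaluating such a tree bottom-up gets the precedence right automatically: the multiplication sits deeper in the tree, so it's computed first.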

Introducing the abstract syntax tree

That's pretty much the idea of the syntax tree in computer science. Or abstract syntax tree, aka AST, as it's often called. Computer languages tend to be more restrictive than human languages, so the general principle is that every line of your source code can be arranged in a tree. That's precisely what your compiler does. Even your IDE does the same if it supports syntax highlighting and displays errors while you type.

That's a humbling thought. Modern IDEs are so much more than simple text editors. Much of their magic requires a deep understanding of the structure of your program. The difference between compilers and IDEs has become blurry during the last decade. Syntax highlighting is easy. But many of the advanced features we all love have to do with parsers, lexers, compilers, and ASTs. Just think of refactorings and "quick fixes". Today, I saw a "quick fix" in IntelliJ covering four lines. It offered to replace a complex if-then-else statement, consisting of several expressions, with a lambda. I didn't like the proposal - functional programming isn't automatically better than procedural programming - but I was deeply impressed by the algorithm of the IDE.

Drawing an abstract syntax tree isn't limited to single lines. You can draw your entire application as a tree, even if it consists of a hundred thousand lines. That's a powerful feature. We'll see in a minute how your compiler can optimize individual lines of your source code using the syntax tree. However, the optimization algorithm isn't limited to individual lines. It targets the entire tree. Your optimizing compiler may prune the tree of your application to a tiny bush if that's all that's needed.

GraalVM and HotSpot improvements

But who's to decide what's needed or not?

Traditional compilers - e.g. compilers from the pre-Java age - run before the program is executed. That's why they're called ahead-of-time compilers, aka AOT compilers. All they can do is read your source code. They don't have a clue how the code is run. They don't know which lines are run frequently, which lines are run rarely, and they can't even detect dead code. So they compile every line of your application to machine code, whether it's needed or not, because it might be used - even if it's only run once a year.

The invention of just-in-time (aka JIT) compilers changed that. JIT compilers kick in after observing how your application is used for a while. That allows for a wide range of aggressive optimizations. For example, Truffle allows language designers to compile the then part of an if statement without compiling the else part. More often than not, the else part is never called. So this bet pays off. And when the else part is called for the first time - or often enough to justify the effort of compiling the code - the optimizer can still replace the AST with a different, optimized version of the syntax tree. In the meantime, the optimizer performs its magic on the low-hanging fruit. More to the point: the time the compiler saves by not compiling the else part of an if statement can be used by the optimizer.
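To make that tangible, here's a minimal sketch in plain Java - emphatically not the real Truffle API, all names are our own - of an if node whose else branch starts out as a cheap placeholder and is only materialized the first time it's actually reached:

import java.util.function.Supplier;

// Toy AST nodes; the names are our own invention.
abstract class ToyNode {
    abstract Object execute();
}

final class ConstantNode extends ToyNode {
    private final Object value;
    ConstantNode(Object value) { this.value = value; }
    @Override Object execute() { return value; }
}

// Placeholder for a branch that hasn't been worth compiling yet.
final class LazyBranch extends ToyNode {
    private final Supplier<ToyNode> compiler; // stands in for "compile this branch now"
    private ToyNode resolved;                 // stays null until the branch is first reached

    LazyBranch(Supplier<ToyNode> compiler) { this.compiler = compiler; }

    @Override Object execute() {
        if (resolved == null) {
            // In a real JIT, this is where the optimized code gets invalidated
            // ("deoptimization") and the branch finally gets compiled.
            resolved = compiler.get();
        }
        return resolved.execute();
    }
}

final class IfNode extends ToyNode {
    private final ToyNode condition, thenBranch, elseBranch;

    IfNode(ToyNode condition, ToyNode thenBranch, ToyNode elseBranch) {
        this.condition = condition;
        this.thenBranch = thenBranch;
        this.elseBranch = elseBranch; // typically a LazyBranch
    }

    @Override Object execute() {
        return ((Boolean) condition.execute()) ? thenBranch.execute() : elseBranch.execute();
    }
}

As long as the else branch is never reached, nothing is ever compiled for it. The first time it is reached, the placeholder resolves itself - which is exactly the moment the optimizer has to replace its previously optimized tree.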

Are you still with us? We've been a bit fast. Let's dissect our bold claim step-by-step.

A simple, static example

Let's start with something straightforward. Have a look at this Java statement. How much time does it take to execute this statement?

if (6 + 3 == 5) {
    System.out.println("Hello world");
}

As we've mentioned above, this statement is re-phrased as an abstract syntax tree. Something like this:

If you were to write an application optimizing this statement, what would you do?

Tree rewriting

Well, the bottom-most nodes of the tree are constants. They'll never change. So why don't you perform the operation of the parent node at compile time?

Doing so, you add the numbers 6 and 3 and replace the addition node with its result. A small subtree is replaced by a single node. That's called tree rewriting. After doing so, we've got this tree:

Tree rewriting can be done on all levels

Now let's have a look at the comparison 9 == 5.

Again, the child nodes - 9 and 5 - are constants. So we can evaluate the comparison at compile time. It's always false:

Now, what use is an if statement if its condition is always false? In our case, there's no else branch, so we know for sure the then part of the if statement is never called. Let's get rid of it!

All that's left is - nothing. We've used the NOP operation of assembly language to symbolize this. In reality, your precious if statement is simply removed from memory.
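Here's the whole rewriting sequence condensed into a small, self-contained Java sketch. The class names are our own toy illustration, not GraalVM's internals; each fold() call replaces a subtree with a simpler one, until the if statement collapses into a Nop:

interface Node { Node fold(); }

record Constant(Object value) implements Node {
    public Node fold() { return this; }
}

record Add(Node left, Node right) implements Node {
    public Node fold() {
        Node l = left.fold(), r = right.fold();
        if (l instanceof Constant cl && cl.value() instanceof Integer a
                && r instanceof Constant cr && cr.value() instanceof Integer b) {
            return new Constant(a + b);                        // 6 + 3  ->  9
        }
        return new Add(l, r);
    }
}

record Equals(Node left, Node right) implements Node {
    public Node fold() {
        Node l = left.fold(), r = right.fold();
        if (l instanceof Constant cl && r instanceof Constant cr) {
            return new Constant(cl.value().equals(cr.value())); // 9 == 5  ->  false
        }
        return new Equals(l, r);
    }
}

record If(Node condition, Node thenBranch) implements Node {
    public Node fold() {
        Node c = condition.fold();
        if (c instanceof Constant k && k.value() instanceof Boolean b) {
            return b ? thenBranch.fold() : new Nop();           // condition known at compile time
        }
        return new If(c, thenBranch.fold());
    }
}

record Nop() implements Node {
    public Node fold() { return this; }
}

Folding new If(new Equals(new Add(new Constant(6), new Constant(3)), new Constant(5)), someThenBranch) first turns the addition into 9, then the comparison into false, and finally the whole statement into a Nop - exactly the pruning described above.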

Benefits of immutability

Tree rewriting works best if the nodes are constants. That's one of the reasons why both the immutability pattern and functional programming have become popular in recent years. Both approaches help your compiler do a great job.

In contrast, using side effects is a catastrophe, at least from the point of view of your compiler. It can't optimize much because it always has to take the side effects into account.
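A tiny, hypothetical example of why that is. Both methods return the same value, but only the first one gives the compiler a free hand:

final class SideEffects {
    static int counter = 0;

    // Pure: always returns 9. The whole "nine() == 5" check folds to false,
    // and the call itself can be removed.
    static int nine() {
        return 6 + 3;
    }

    // Impure: it also returns 9, but it bumps a global counter on the way.
    // Even if the compiler proves the branch below dead, it must preserve
    // the counter update - the side effect ties its hands.
    static int nineWithSideEffect() {
        counter++;
        return 6 + 3;
    }

    static void demo() {
        if (nine() == 5) { System.out.println("dead code, free to remove"); }
        if (nineWithSideEffect() == 5) { System.out.println("dead code, but the side effect has to stay"); }
    }
}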

Aggressive optimization

Enter the HotSpot compiler. And its younger, bolder sibling, the GraalVM compiler.

These compilers optimize your code even if the optimization might be wrong. Or rather: they optimize your code even if there's a small possibility that the optimization is wrong. For example, you've probably written code like this on a daily basis:

public String readFile(String filename) {
    if (null != filename) {
        return File.readFile(filename);
    } else {
        throw new MissingFilenameException();
    }
}

Will you ever call this method with a null filename? Unlikely. So why should the compiler compile the else statement at all?

A similar topic is classes implementing an interface. In real life, 99% of your interfaces are implemented by exactly one class, at least in production. Your test code may look different. So it's a good idea to simply assume there's exactly one implementation. This allows the compiler to generate much faster code.

The advantage of a just-in-time compiler is that it can recognize such patterns. So it can generate optimized code for the "good case".

Here's the catch: even if the profiler has observed that the bad case is never called, there's no way to predict the future. The else statement might be called every once in a while. Sometimes people write two classes implementing an interface.

So the compiler adds a guard. Before executing the optimized "good case" assembly code, it checks whether the assumptions are still true. If not, the JIT compiler throws away the optimized machine code instead of executing it. It re-compiles the code using its updated knowledge of the world, or it falls back to the interpreter or the C1 compiler.

No matter which option the HotSpot compiler or the GraalVM compiler chooses, it prefers to use the slow-but-safe route instead of cutting corners. The art of the compiler team is to implement clever guards which don't cost much time but still detect "bad cases" reliably.
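Here's what such a guard boils down to, written as plain Java for readability - the real compilers emit machine code for this, and all names below are our own. The speculation is the one from above: so far, every implementation of the interface that this call site has seen was one and the same class:

interface Shape {
    double area();
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

final class SpeculativeCallSite {
    // Speculation: every Shape observed so far has been a Circle.
    static double areaOptimized(Shape s) {
        if (s instanceof Circle c) {                  // the guard
            return Math.PI * c.radius * c.radius;     // devirtualized, inlined fast path
        }
        return deoptimize(s);                         // assumption broken
    }

    // Stand-in for "throw away the compiled code and fall back to the
    // interpreter or a less aggressive compiler", then take the generic path.
    static double deoptimize(Shape s) {
        return s.area();
    }
}

As long as the guard holds, the check costs a single type test. The moment a second implementation shows up, the fallback path takes over and the speculatively compiled code is discarded.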

We'd love to cover partial compilation in more detail. However, this is already a long article, so we simply point you to the academic paper covering partial evaluation in Truffle and our own article about the compiler-compiler called Truffle.

Wrapping it up

Computer languages are like human languages: they follow rules, just a bit stricter. That's great if you're writing a compiler. Knowing the rules, you can start implementing algorithms to optimize common patterns. There are many real-life examples. We've shown how to optimize an apparently useless if statement down to nothing. In real life, that's an exotic use case, but aggressive compilers like GraalVM can still exploit the idea successfully. For example, there's no need to compile the else part of your if statement if it's virtually never called. The nasty word being "virtually". It forces you to add a guard, plus implement an expensive algorithm to de-optimize your code.

Even so, this extra effort pays off more often than not, provided your heuristics are tuned to real-life applications. Among other things, the secret to success is that Java bytecode - like the intermediate representations of many other platforms, such as .NET - has been optimized with memory footprint in mind. It's easy on your CPU's L1 cache. The optimized assembly code tends to be much heavier, threatening to spill over into the L2 cache. So it's a good idea not to compile code you don't need, because the non-compiled version still fits into the L1 cache.

How cool is that?

About the co-author

Karine Vardanyan is working on her master's degree at the Technical University of Darmstadt, Germany. Until recently, she worked at OPITZ CONSULTING, where she met Stephan. She's interested in artificial intelligence, chatbots, and all things Java.


Dig deeper

Chomsky hierarchy

Partial evaluation in Truffle

