How I miss you, dear SSA
To recap, I recently succeeded in getting inlining to work in the Higgs JIT compiler. I went a little further and implemented some very basic logic to adjust the topmost stack frame. This allows replacing the currently running function with an optimized version in which code has been inlined. It’s a limited form of On-Stack Replacement (OSR) which gets me most of the win without too much complexity. Higgs is now able to inline code in the vast majority of functions. This isn’t sufficient to make inlining worthwhile, however. I needed to enable the JIT to optimize the inlined code in the context of caller functions. This is where I ran into problems.
The Intermediate Representation (IR) used by Higgs is centered around the interpreter. It’s what’s commonly referred to as a 3-address representation. Instructions operate on mutable stack slots which are used both as inputs and outputs. The Abstract Syntax Tree (AST) gets directly translated into this IR and is immediately ready to be executed by the interpreter. My original idea was that the translation from source code to IR should be quick, and the IR should be in a format that’s reasonably quick for the interpreter to execute. Doing a naive JIT compilation of this IR to machine code isn’t difficult and provides very significant speedups. Where it gets problematic, however, is if one wants to do even basic optimizations on the IR.
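To make this concrete, here is roughly what a 3-address, stack-slot IR might look like for a simple expression. The mnemonics and slot names here are hypothetical, invented for illustration, not Higgs’s actual opcodes:

```
// JavaScript source:  var x = a + b * 2;
// (a lives in slot s0, b in slot s1, x in slot s5)

set_int  s2, 2         // s2 = 2
mul      s3, s1, s2    // s3 = b * 2
add      s4, s0, s3    // s4 = a + b * 2
mov      s5, s4        // x  = a + b * 2
```

Note that nothing prevents a later instruction from overwriting any of these slots, which is exactly what makes this representation convenient for an interpreter but awkward for an optimizer.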
Properly optimizing the IR with things as simple as constant propagation requires being able to do algebraic substitutions. This is where things get hairy quickly. The fact that stack slots are mutable means that we can’t easily do substitutions. We don’t know where all the values of a given slot may be used. We don’t know where it may be defined. We have no information about how values flow from one instruction to another. Acquiring this information requires a separate analysis pass. Such an analysis, which would produce use-def and def-use chains, isn’t very difficult to implement. The problem is that the information it produces is invalidated as soon as we transform the IR, so we may have to re-run the analysis between successive optimization passes.
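To see why mutability blocks substitution, here is a small Python sketch using a toy instruction encoding of my own invention, not Higgs’s actual data structures. A naive constant-propagation pass that treats slots as if they were immutable propagates a stale constant into the second use of `s1`:

```python
# Toy 3-address instructions: (op, dest_slot, arg1, arg2).
# Args are slot names or literal constants. This is NOT Higgs's IR,
# just an illustration of the mutable-slot problem.
code = [
    ("const", "s1", 5, None),     # s1 = 5
    ("add",   "s2", "s1", "s1"),  # s2 = s1 + s1  (= 10)
    ("const", "s1", 7, None),     # s1 = 7  -- the slot is mutated!
    ("add",   "s3", "s1", "s1"),  # s3 = s1 + s1  (= 14)
]

def run(code):
    """Tiny evaluator: returns the final contents of every slot."""
    env = {}
    for op, dest, a, b in code:
        va = env.get(a, a)  # resolve slot names to their current values
        vb = env.get(b, b)
        env[dest] = va if op == "const" else va + vb
    return env

def naive_const_prop(code):
    """Record the first constant written to each slot and substitute it
    into every use -- sound only if slots are never reassigned."""
    first = {}
    for op, dest, a, _ in code:
        if op == "const" and dest not in first:
            first[dest] = a
    return [(op, dest, first.get(a, a), first.get(b, b))
            for op, dest, a, b in code]

print(run(code)["s3"])                    # 14: the correct result
print(run(naive_const_prop(code))["s3"])  # 10: wrong -- missed the mutation
```

Making the pass correct requires knowing where each slot is redefined and which uses each definition reaches, which is precisely the use-def/def-use information that has to be recomputed after every transformation.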
Functional programming buffs may begin to see the fundamental problem here. The 3-address IR works with mutable, “impure” values. It’s a very literal IR, aimed at running code on a Virtual Machine (VM) in which we have virtual registers with actual concrete addresses. The information we want for optimization, on the other hand, is rather symbolic. We want to work with values that flow from one operation to the next and are ideally immutable, so that we can perform algebraic substitutions and other simplifications, in the mathematical sense. Fortunately for us, this problem has already been solved in the compiler world. There is an IR out there called SSA, which stands for Static Single Assignment form, and it has some very nice properties which make working with it much more enjoyable.
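Here is a minimal Python sketch of why SSA makes this easy, again with a toy tuple encoding of my own rather than any real compiler’s data structures. Since each value is defined exactly once, a single forward pass can propagate and fold constants without any use-def analysis and without fear of a later write invalidating the substitution:

```python
# The same kind of computation in SSA form: each value (v1, v2, ...) is
# defined exactly once, so "this slot was reassigned" is impossible by
# construction. A toy encoding for illustration, not Higgs's actual IR.
# Control-flow merges would additionally need phi nodes, omitted here.
ssa = [
    ("const", "v1", 5, None),     # v1 = 5
    ("add",   "v2", "v1", "v1"),  # v2 = v1 + v1
    ("const", "v3", 7, None),     # v3 = 7  (a new value, not a mutation)
    ("add",   "v4", "v3", "v3"),  # v4 = v3 + v3
]

def ssa_const_prop(ssa):
    """Constant propagation and folding on SSA: because every value has
    a single definition, substituting a known constant is always sound."""
    consts = {}  # value name -> known constant
    out = []
    for op, dest, a, b in ssa:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "const":
            consts[dest] = a
        elif op == "add" and isinstance(a, int) and isinstance(b, int):
            consts[dest] = a + b
            op, a, b = "const", a + b, None  # fold the add away
        out.append((op, dest, a, b))
    return out

for instr in ssa_const_prop(ssa):
    print(instr)  # every add folds to a constant in one simple pass
```

The immutability does the heavy lifting: the optimizer never has to ask “is this still the same value here?”, because in SSA it always is.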
I’ve already worked with SSA, in the Tachyon compiler, and found it rather pleasant to use. I thought I was simplifying things, reducing overhead by going with a more literal IR, but I’ve now concluded that this was a mistake. I decided to bite the bullet and begin the tedious work of refactoring Higgs to use an SSA-based IR instead. This is a big piece of work because the IR is probably the most central component of Higgs: the interpreter, the JIT, and the analyses all work with it. Right now, I’m working on the biggest piece of the puzzle, the ~2500 lines of code that build the IR from the AST. The code I’ve refactored so far makes a lot more sense, probably because it decouples the problem of extracting the core language semantics out of the AST from that of assigning stack slots to values in a semi-efficient manner. The result is overall simpler code.