Higgs: the First JIT Compiler of its Kind

October 31, 2013

I submitted a conference paper to Compiler Construction 2014 (part of ETAPS) just two weeks ago. This paper details my recent work on compilation with the basic block versioning algorithm, an alternative to tracing. This work was briefly explained in my DConf 2013 presentation, as well as the talk I gave at Strange Loop, which should hopefully be made available online on InfoQ within a few weeks. I unfortunately won’t know if my paper makes it into CC 2014 until mid-late December, but in the meantime, while keeping my fingers crossed, I’m already planning the third iteration of the Higgs JIT compiler.

Basic block versioning can compile multiple versions of a given basic block based on context information (e.g.: type information) accumulated during compilation. It’s a formalized algorithm for tail duplication which allows a compiler to eliminate type tests, hoist type tests out of loops, and duplicate code paths where multiple types are possible at run time. This approach is similar to tracing JIT compilation in that, like tracing, it’s preoccupied with accumulating type information along linear sequences of code, and using this accumulated information to specialize code farther along, with the important difference that we’re working at a lower granularity, that of basic blocks, instead of traces. I believe that the paper on basic block versioning my advisor and I submitted is an original and interesting contribution, but I’m not stopping there, I already have many improvements planned.

The basic block versioning algorithm as implemented in the current version of Higgs (as available on GitHub) works at the intraprocedural level, and one function at a time. Type information does not currently propagate past function call boundaries. This means that Higgs must do aggressive inlining of function calls in order to eliminate type tests with any amount of success. The runtime library primitives are written as JavaScript functions, which must also be inlined for performance, but inlining, as is currently done in Higgs, is unfortunately rather expensive. The code for most JavaScript primitives must deal with many possible input types, and so primitives end up being relatively large functions. Inlining large functions takes time, but most of the code being inlined is actually useless as most JavaScript operators are not used in a polymorphic way.

The next iteration of Higgs will try to solve this problem by moving away from method-based compilation. The new Higgs JIT will lazily and incrementally compile functions one basic block at a time. Parts of functions that are not executed will never be compiled. I intend to push this approach further, and inline function calls incrementally as well, which means that in many cases, only small parts of functions will actually be inlined. There is significant potential for compilation time savings. There is also the potential for making better inlining and optimization decisions, through smart use of profiling during incremental compilation. Basic blocks, when first compiled, can be made to collect profiling information. These same blocks can then trigger recompilation later to produce more efficient code.

Won’t incremental compilation result in a horrible mess of machine code, you ask? I devised a scheme where blocks can be recompiled, garbage collected and relocated so as to make the machine code more efficient. I won’t get into all the details just yet, but I’ve started examining the technical challenges and I’m quite convinced this is possible. Not only is it possible, it will mesh with basic block versioning in very interesting ways, and open up many possibilities. As far as I know, Higgs is going to be the first compiler to feature incremental lazy compilation of functions and incremental inlining (if anyone has references to prior art, please let me know). Even if Higgs isn’t the first at this though, the combination of lazy incremental compilation and basic block versioning will make it a truly unique beast.

A special thanks to Marc Feeley, Paul Khuong, Luke Wagner, Benjamin Cerat, Tommy Everett and Erinn Di Staulo for helping me review my CC 2014 paper prior to submission.

From → Compilers, Higgs

15 Comments

Peter Goodman permalink

Very cool! Your basic block versioning sounds very familiar to something I do in a similar context: JIT-based dynamic binary translation. I use versions of basics blocks to track contextual state, e.g. one version of a basic block for inside a critical section, one version outside.

Having multiple versions of a basic block gets me runtime code specialization (not my term, of course). I achieve this using instrumentation policies. E.g. One policy is responsible for instrumenting code that will execute in a particular context. Policies can switch between themselves over control-flow instructions. This is similar to saying that a policy names a program state, and that the program is in that state if the instruction pointer is in a basic block instrumented by that policy. Combin this with the return addresses on the call stack and you get something like a pushdown automaton!

Anyway, great post, and if you’re interested in a similar but different approach, then check out https://github.com/Granary/granary (not that the code is particularly approachable).

Reply
- Maxime Chevalier-Boisvert permalink
  
  Very cool. I believe I’ve heard of your project before, and there are indeed similarities. Will you be publishing about your work? If so, I should probably cite it.
  
  Reply
  - Peter Goodman permalink
    
    We recently published some related stuff at HotDep’13, called Behave or Be Watched: Debugging with Behavioral Watchpoints, but no conference papers yet. I was hoping to send something to VEE’14 (deadline was a week ago or so), but our evaluation sucked. Thus, we have nothing directly talking about policy/context-driven instrumentation. Conclusion: no citation burden for anyone yet. If your paper is accepted/published first then perhaps I’ll be citing you ;-)
    
    Reply
    - Maxime Chevalier-Boisvert permalink
      
      Fingers crossed. I’ll know by December 20th!
Andrew Price (@AndrewAPrice) permalink

Hi Maxime, I’m really interested in integrating basic block versioning into my own virtual machine (I came up with a language similar but not identical to Javascript.)

The method I’m thinking of is slightly different to yours, I assume that upon compiling a basic block (it will be compiled on demand the first time it is executed), we know all the type of all possible variables and therefore are able to optimize all for that those specific types.

This assumes that all non-inlined function calls (where we store the returned value) and property property reads would be considered the start of new basic blocks, as it’s a point where we aren’t guaranteed to know the type at compile-time.

My second thoughts are on serializing types. Upon entering a basic block, I need to test the incoming types – I want this to be as fast as possible (a single comparison if possible), and I only want to test the types being used (there may be a type passing through a basic block that isn’t immediately needed or used)

In my interpreter, values are 9 bytes – 1 byte for the type, and 8 bytes (64 bits) for the actual value. My JIT will only target 64-bit architectures, so I want to store the actual value as a 64-bit register, without having to pass the type around.

There are 11 types in my language (nulls, booleans, unsigned ints, signed ints, floats, objects, arrays, memory buffers, function pointers, strings) – so the type can be stored in 4 bits (leaving 5 reserved/internal types).

One 64-bit register can store 16 4-bit types. So we can serialize the types of up to 16 values in a register right before jumping to another basic block:

mov rax, TypeOfVar1 | (TypeOfVar2 << 4) | (TypeOfVar3 << 8)
jmp next_basic_block

Then at the start of the next basic block, we can check: (pseudo-assembly)
if(rax == ) jmp compiled_version_1
if(rax == ) jmp compiled_version_2
; ect for each time
jmp compile_new_version

If we have more than 16 values to pass between basic blocks (which hopefully will be rare) we can use additional registers.

If values are just passing through (let’s say 4 values are passing through the basic block, but only 2 are going to be touched), then there’s no need to compare those values (otherwise we’d generate multiple versions of a basic block that contain exactly identical operations) we could mask only the types we want to check:

rax &= mask_of_values_we_care_about_checking;
if(rax == ) jmp compiled_version_1
if(rax == ) jmp compiled_version_2
; ect for each time
jmp compile_new_version

I’d like to know your thoughts on this, thanks!

Reply
- Maxime Chevalier-Boisvert permalink
  
  What you’re suggesting isn’t what I implemented in Higgs. Not saying that what you’re proposing is a bad idea, you have a clever encoding scheme which could be fairly efficient, but you seem to be introducing extra instructions to check types, extra dynamic branching, which might slow things down. It seems you want to try to capture all the types of live values at once, but this is only efficient if there are a limited number of type combinations entering a given block or function. You should also consider that encoding all the types of values in a bit vector, say before each call to a function, is going to be somewhat slow.
  
  In Higgs, I use the type tests that are part of the language semantics to drive the versioning. I don’t add new dynamic dispatch based on types. This doesn’t guarantee that we have a type for every live value, only the ones whose type has been tested. The idea is that we only need to test the type of a value once, and then we can propagate the information we gained, avoid testing this type repeatedly. Higgs with basic block versioning will execute strictly less type tests dynamically than it would have without basic block versioning.
  
  We wrote up a technical report last year about the way it works in Higgs:
  
  Click to access bbv_paper.pdf
  
  Also just submitted a paper with more up to date results and improvements on this technique. Hopefully it’s accepted. I’ll be able to share it with you in a month or so.
  
  Best of luck with your JIT compiler. Let me know how it goes.
  
  Reply
  - Andrew Price (@AndrewAPrice) permalink
    
    Thanks for the reply. My motivation with encoding types (statically compile it into a single register that can be compared) is not having to pass around the type with the value (except on function returns and accessing properties) because every type is explicitly known based on the version for the basic block you’re executing.
    
    The problem that is plaguing me is figuring out how to test if a value is of a type without encoding it within the value itself (sacrificing bits of 64-bit register) or passing around two values (the type and the value – sacrificing a dedicated register or stack position)?
    
    Reply
    - Maxime Chevalier-Boisvert permalink
      
      That’s what I do. I have a word and a type stack. I only use the type stack for values whose type isn’t known. As soon as you know the type, you can avoid loading and storing it. Most other JavaScript JITs use some scheme where they use the lower bits of words to encode type information.
    - Andrew Price (@AndrewAPrice) permalink
      
      I wouldn’t need to calculate the type encoding at run-time since the compiler knows all types and can simply hardcode it into the generated machine code if needed. If we know all types at all times, then we know at the end of a basic block which version of the next basic block we need to jump to – so we can compile that specific version of the basic block ahead of time (while compiling the previous basic block) and hardcode in a jump directly to it, eliminating the need to do any type checking.
      
      There are two special cases when we don’t know a type while compiling a basic block; 1) reading a property and 2) calling a function (and using the result). So function calls and property reads split a basic block into two. At the end of these special cases, we only need to test the type of what we just received, then compiling or jumping to the correct version of the basic block for that type. The only time we need to pass the type encoding is in function calls, so the first basic block of the callee can figure out which version to compile/run.
      
      The main difference I see is that you’re doing the type checking when you first do something with a variable, whereas I’m doing the type checking when I first receive a value (either as a function calling parameter, looking up a property, or returning from a function). I honestly think your version of basic block versioning will be more efficient than mine – you have far less function call overheads, and types that aren’t used and just passing through are never typed checked.
      
      I haven’t figured out how to do an upper bound to the number of versions a basic block may have (I may not have an upper bound and see if this is actually an issue). I just need a different name to call it than “context-driven basic block versioning”. :)
    - Peter Goodman permalink
      
      Your approach is similar to what I do in Granary and Granary+: all information related to the instrumentation of the next block is known at the end of the current block, and so only needs to be passed to the translator. Eventually the edge code is made unreachable because of runtime code patching. Applying this same technique to a dynamic language, modulo field accesses, function call/return, and global variable accesses seems reasonable.
      
      In terms of finding the next block based some some type info, I suggest checking out the paper describing the PIN binary translator: it seems like the way of chaining a few test and jump stubs together would work reasonably well.
      
      In terms of version explosion, one potential solution is to do dynamic code “deduplication” / modularisation via function call/return: find short sequences of straight-line code that can be re-used by making them into functions. Another approach is compensation code. Another approach is to handle flushing the code cache at fine granularities.