Positive Results for Higgs

April 10, 2014

Great news! The Higgs GitHub repository has 204 stars and 22 forks at the time of this writing. Even better news, however, is that I’ve reached a point where Basic Block Versioning (BBV), the compiler optimization technique I’ve been implementing in Higgs as part of my PhD thesis, finally delivers measurable speedups on benchmark programs. Although I was able to demonstrate a few months back that BBV eliminates dynamic type checks quite effectively, one important problem remained: this didn’t immediately translate into improvements in running time. Many optimizations are required to make dynamic languages fast, and the quality of the code generated by Higgs was just too poor: removing 50% or more of the type tests had no noticeable impact on performance. It was like removing one drop from a bucket of inefficiency.
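
For readers who haven’t seen BBV before, here is a rough sketch of the core idea in Python. This is not the Higgs implementation: the toy instruction format, the `specialize` function and the `Versioner` class are all made up for illustration, and the failure path of each type test is left out. The point is only that a block gets compiled once per incoming type context, and type tests already answered by that context are dropped.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy IR, invented for this sketch:
#   ("is_int", var)      dynamic type test on var
#   ("add", dst, a, b)   any other instruction is copied through unchanged
Instr = Tuple[str, ...]
# A type context is a set of (variable, type) facts known on block entry.
TypeCtx = FrozenSet[Tuple[str, str]]


def specialize(block: List[Instr], ctx: TypeCtx) -> List[Instr]:
    """Return a copy of the block with type tests made redundant by ctx removed."""
    known: Dict[str, str] = dict(ctx)
    out: List[Instr] = []
    for instr in block:
        if instr[0] == "is_int":
            var = instr[1]
            if known.get(var) == "int":
                continue  # the type is already known, so the test is dropped
            out.append(instr)
            # Simplification: only the success path of the test is followed.
            known[var] = "int"
        else:
            out.append(instr)
    return out


class Versioner:
    """Keeps one specialized version of a block per incoming type context."""

    def __init__(self, block: List[Instr]) -> None:
        self.block = block
        self.versions: Dict[TypeCtx, List[Instr]] = {}

    def get_version(self, ctx: TypeCtx) -> List[Instr]:
        # Versions are generated lazily, the first time a context is seen.
        if ctx not in self.versions:
            self.versions[ctx] = specialize(self.block, ctx)
        return self.versions[ctx]


block = [("is_int", "a"), ("is_int", "b"), ("add", "c", "a", "b"), ("is_int", "a")]
v = Versioner(block)
generic = v.get_version(frozenset())
typed = v.get_version(frozenset({("a", "int"), ("b", "int")}))
print(len(generic), len(typed))
```

In this toy, the version built with no type information keeps two of the three tests, while the version built for a context where `a` and `b` are known integers keeps none. Each extra version is also extra machine code, which is where the code size question comes from.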

Recently, after measuring that property access took more than 30% of the execution time in some benchmarks, I decided to implement inline caching in Higgs. This provided noticeable speedups, but still didn’t make BBV win out in terms of running time. In fact, the results on most benchmarks were worse with versioning enabled. I concluded that the issue was probably largely one of code size. Generating multiple versions of some code paths increases code size, which can hurt instruction cache performance. Modern CPUs have large L2 and L3 caches, up to multiple megabytes in size, but the L1 instruction cache of the Core i7 is still just a puny 32 kilobytes. This cache is easily filled up, especially if the code you’re generating is rather bloated. Hence, I decided that my next focus should be to optimize the size of the machine code generated by Higgs.
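
To give an idea of what inline caching buys you, here is a minimal sketch of a monomorphic inline cache in Python. It is an illustration under assumptions, not how Higgs does it: the `Shape`, `Obj` and `GetPropSite` names are hypothetical, and a real JIT would patch the cached shape and slot index directly into the generated machine code rather than store them in an object.

```python
class Shape:
    """Hypothetical hidden class: maps property names to slot indices."""

    def __init__(self, slots):
        self.slots = dict(slots)


class Obj:
    """An object is a shape plus a flat array of slot values."""

    def __init__(self, shape, values):
        self.shape = shape
        self.values = list(values)


class GetPropSite:
    """One property-read site in the program, with a single-entry cache."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_slot = -1
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        # Fast path: one identity comparison, then a direct indexed load.
        if obj.shape is self.cached_shape:
            self.hits += 1
            return obj.values[self.cached_slot]
        # Slow path: look the name up, then remember the answer for next time.
        self.misses += 1
        slot = obj.shape.slots[self.name]
        self.cached_shape = obj.shape
        self.cached_slot = slot
        return obj.values[slot]


point = Shape({"x": 0, "y": 1})
a = Obj(point, [1, 2])
b = Obj(point, [3, 4])
site = GetPropSite("y")
first = site.get(a)   # miss: fills the cache
second = site.get(b)  # hit: same shape as a
print(first, second, site.hits, site.misses)
```

The dictionary lookup only happens on a miss. As long as a given site keeps seeing objects with the same shape, every later access takes the fast path.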

I started by implementing a very naive on-the-fly register allocator. The first version spilled values very conservatively and produced many redundant writes and spills whose only purpose was to keep the Garbage Collector (GC) from ever seeing false pointers (a flaw which has now been fixed). My main concern was to get Higgs to pass all its tests with the register allocator, and only then to optimize it. After many hours of debugging, I got it all working, and to my surprise, despite all the redundant operations, the unoptimized register allocator produced much smaller code and much better performance. The improvement was large enough, in fact, that BBV finally won out in terms of running time on most benchmarks. I was extremely surprised to find that on loop microbenchmarks, the naive register allocator produced code that ran more than 10 times faster than similar code loading and storing values on the stack. My conclusion is that when it comes to memory accesses, it’s really the reads that kill performance, as the CPU doesn’t have to stall execution for redundant writes.

At this point, I’ve completed a few optimizations to the register allocator. Most of the redundant moves have been eliminated, but spilling still remains to be made less conservative. I’ve also stumbled upon a rather interesting finding: in some cases, versioning doesn’t increase code size, but actually decreases it. Sometimes, the code it produces is more optimized and more compact than the original. So far, the results are extremely encouraging, but I still believe I can improve on them with additional tweaks. These results are coming in just in time for my presentation in May at DConf 2014. I’ll also be submitting a new publication to a conference in early June.

The performance of the machine code generated by Higgs is slowly but surely becoming more competitive. As I see it, the main flaw Higgs has in terms of usability at this point is that compilation is rather slow. Compilation time now largely dominates execution time on most benchmarks. This is something I haven’t had much time to look into. The codebase has become complex enough that I can’t pinpoint where the major inefficiencies might be without some digging. If you’re interested in contributing to this project, profiling and optimizing parsing, analysis and code generation is something we could definitely use help with.

In other news, Tommy Everett has delivered an updated version of his FFI library, and is now working on a library of 2D graphics bindings for Higgs, which he used to create a Pong clone in JavaScript. Stay tuned!