Upcoming talks about Higgs
I’ll be giving a talk about my research at mloc.js in Budapest on February 13th. This will be the first of my conference talks entirely focused on Higgs and my current research. The talk will be filmed and should appear online on InfoQ and Ustream. I’ll also be putting the slides online after the talk. In other news, I received an invitation from Andrei Alexandrescu to present at DConf again this year. I’m quite looking forward to it, as I had a very positive experience at DConf 2013, where I met many very knowledgeable systems engineers (I may also have a crush on the SF Bay Area).
For those who would like to see some results of my current research right away, I’ve made available a paper which was recently submitted to CC 2014, but unfortunately rejected. This paper explains the technique of basic block versioning in depth, and gives results comparing it to a traditional type propagation analysis. The main reason stated for rejection was that we did not provide comparisons in terms of overall running time (due to limitations of Higgs). I still believe the results are quite encouraging and make a convincing case for basic block versioning. My PhD advisor and I will be submitting an improved version of this paper in a few months.
New JIT online
The lazy incremental JIT I’ve been working on is now in the master branch. It’s far from complete, but it’s reached the crucial milestone where it passes all of the Higgs tests and runs all of the benchmarks. There is currently no type propagation and no register allocation, but I’ve already started implementing incremental inlining of primitive calls. The inlining is currently done in a rather unsophisticated manner which doesn’t eliminate all call overhead, but this will be fine-tuned.
Performance-wise, the new JIT is comparable to the old one. It’s faster on most of the larger benchmarks, probably because there is no longer an interpreter and the new JIT generates machine code faster, meaning less warmup time is required before reaching full speed. I will be working on reintegrating the full power of basic block versioning and improving performance over the next few months, and I expect to exceed the performance of the old JIT on all benchmarks within a few weeks.
Following our recent announcement that we were looking for contributors, we’ve started to build a Higgs wiki and created a #higgsjs channel on freenode IRC. It seems that this work has paid off as we’ve gotten several pull requests for bug fixes and improvements.
Notably, zimbabao has sent us bug fixes for the runtime library, the regex engine and the standard library. Thanks to him, the SunSpider crypto-aes benchmark finally works properly. He even fixed bugs I wasn’t aware of! Óscar Toledo has helped us find and track down two parser bugs and two semantic bugs using his nanochess JS1K program. I’ve also taken the initiative of fixing several minor semantic bugs to make Higgs respect the ES5 specification more closely.
My number one, Tom Brasington, has completed a new test framework which makes it easy to add new functionality and regression tests: just include a JS file in the source/tests directory and send us a pull request. He’s also set up the Travis CI build system to notify us if a build fails. Tom is currently working on an improved, more convenient version of our FFI library. We have plans to build bindings to SDL and network APIs, among other things.
Should you wish to help, we’re always looking to make Higgs more compliant with the JS specification, add useful new features and grow our collection of tests. If you find a bug in Higgs, please open an issue and/or send us a pull request with a failing test. You’re also welcome to write useful libraries to be included with Higgs. You can help us make Higgs insanely great.
We’ve been told that real-time ray tracing is coming soon. It can produce superb renderings with previously unattained levels of realism. It works by simulating light, and leads to much more elegant architectures. It’s the future, but unfortunately, it’s been coming soon for several years now. Like thorium reactors, it’s one of those technologies that just seems to make so much sense, but that isn’t getting adopted, probably because we already have something that works “well enough”, and we aren’t yet willing to invest the effort required to switch to a radically different technology. There’s too much inertia; we haven’t yet reached the tipping point. Still, the potential is there, and with the advent of GPGPU, real-time global illumination is very close to becoming a reality in the world of gaming, as demonstrated by the Brigade renderer.
I should point out that ray tracing, path tracing, and global illumination are not the same thing. Global illumination is a simulation of the entire light transport within a scene, which accounts for the ways in which light can illuminate objects indirectly, through several reflections between multiple surfaces. It produces superior renderings because it accounts for phenomena such as soft shadows, caustics, and color bleeding. Path tracing is an extended form of ray tracing which achieves global illumination, usually by tracing inverted light rays from the eye to the light sources in a scene. This is done through nondeterministic, approximate sampling, because there is an infinite number of paths light rays could take through a scene.
The way images are currently generated with path tracing is to send many rays through each pixel of the image. These rays then randomly scatter through the scene in ways that obey the properties of the materials they encounter, and eventually reach light sources. The problem with this is that the generated images have significant visual noise (grain), and it takes a large number of rays per pixel (easily 256 or more) to eliminate this noise. As you can imagine, this takes a tremendous amount of computational power. Generating a 4 megapixel image could easily require tracing a billion rays, with each ray bouncing several times through the scene. This is very difficult to achieve at real-time framerates, even with today’s GPUs.
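Roughly, the per-pixel loop looks like the sketch below (a toy version in D, which is what I write Higgs in; tracePath stands in for all the real intersection and material sampling code, and every name here is illustrative rather than taken from an actual renderer):

```d
import std.random;
import std.stdio;

struct Vec3
{
    double x = 0, y = 0, z = 0;
    Vec3 opBinary(string op : "+")(Vec3 o) { return Vec3(x + o.x, y + o.y, z + o.z); }
    Vec3 opBinary(string op : "*")(double s) { return Vec3(x * s, y * s, z * s); }
}

// Radiance carried by one randomly scattered path through the pixel at (u, v).
// The real work (ray-scene intersection, BRDF sampling, hitting light sources)
// is omitted; this stub just stands in for it.
Vec3 tracePath(double u, double v, ref Random rng)
{
    return Vec3(uniform01(rng), uniform01(rng), uniform01(rng));
}

// Estimate a pixel's color by averaging many random paths through it.
// The noise only goes down roughly with the square root of spp, which is why
// hundreds of samples per pixel are needed for a clean image.
Vec3 renderPixel(int px, int py, int width, int height, int spp, ref Random rng)
{
    Vec3 sum;
    foreach (s; 0 .. spp)
    {
        // Jitter the sample position within the pixel to avoid aliasing.
        double u = (px + uniform01(rng)) / width;
        double v = (py + uniform01(rng)) / height;
        sum = sum + tracePath(u, v, rng);
    }
    return sum * (1.0 / spp); // Monte Carlo average over spp paths
}

void main()
{
    auto rng = Random(42);
    writeln(renderPixel(256, 256, 512, 512, 256, rng));
}
```

Multiply that inner loop by a few million pixels and it becomes obvious where the computational cost comes from.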
Lately, I started thinking that maybe there’s a more efficient way to do this. Tracing hundreds of rays per pixel seems enormously wasteful. If there’s one thing I learned from my computer science education, it’s that algorithmic optimizations can pay off orders of magnitude more than clever little code tweaks. We’re rendering each pixel independently, when in fact we know that the colors of pixels are rarely independent variables, because most images contain structured patterns. Maybe path tracing ought to take a hint from the world of image compression, and try to generate images using less information, fewer samples. It’s not just about the amount of data or raw computational power you have; using it intelligently matters as well. More concretely, I’ve been wondering if it might be possible to take inspiration directly from JPEG compression, and trace fewer rays to estimate DCT coefficients for multi-pixel patches instead of individual pixels. I’m speculating here, but I believe such an approach might be able to produce less noisy images from far fewer samples (say, 20 to 50 times fewer).
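Again, this is pure speculation on my part, but the underlying intuition is easy to demonstrate: for smooth image patches, a handful of low-frequency DCT coefficients already captures most of the structure, which is exactly what JPEG exploits. A small sketch of that reconstruction step (illustrative code, not part of any renderer):

```d
import std.math : cos, sqrt, PI;
import std.stdio;

enum N = 8; // patch size, as in JPEG

// Value of the 2D DCT-II basis function for frequency (u, v) at pixel (x, y).
double dctBasis(int u, int v, int x, int y)
{
    double cu = (u == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
    double cv = (v == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
    return cu * cv
        * cos((2 * x + 1) * u * PI / (2.0 * N))
        * cos((2 * y + 1) * v * PI / (2.0 * N));
}

// Rebuild an NxN patch from only the coefficients with u, v < maxFreq.
// For smooth patches, those few low-frequency coefficients carry most of the
// signal, which is why estimating them directly might need far fewer ray
// samples than estimating every pixel independently.
double[N][N] reconstruct(const double[N][N] coeffs, int maxFreq)
{
    double[N][N] patch;
    foreach (y; 0 .. N)
        foreach (x; 0 .. N)
        {
            double sum = 0;
            foreach (v; 0 .. maxFreq)
                foreach (u; 0 .. maxFreq)
                    sum += coeffs[v][u] * dctBasis(u, v, x, y);
            patch[y][x] = sum;
        }
    return patch;
}

void main()
{
    double[N][N] coeffs;
    foreach (ref row; coeffs) row[] = 0;
    coeffs[0][0] = 8;                      // DC term only
    auto patch = reconstruct(coeffs, 1);   // reconstructs a flat gray patch
    writeln(patch[0][0]);                  // prints 1
}
```

The open question, of course, is how you would estimate those coefficients from a small number of ray samples scattered across the patch; that part is still hand-waving on my end.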
The Higgs Roadmap
How You Can Help
There are a number of areas where help is needed on the Higgs project, for example:
- Trying out Higgs, finding and reporting bugs
- Finding new benchmarks for Higgs
- Writing useful libraries
- Repackaging existing JS libraries
- Profiling to improve compilation time
- Implementing support for ES6 features
We’ve opened several issues on GitHub detailing specific tasks you could contribute to. Some of these are tagged as “easy” because they don’t require much knowledge of the existing Higgs system, and are a good starting point for newcomers. If you’re interested in contributing, we encourage you to take a look at the issues list, as well as the How to Contribute page. I’d also like to say that if you have a specific idea or feature you’d like to implement in Higgs, we’re open to that too. You’re welcome to pitch us your own ideas.
More about Higgs
We’ve started to document Higgs on the Higgs wiki. For instructions on how to install Higgs, see the Quickstart page. We also now have an IRC channel (#higgsjs on freenode) where you can reach us on most days.
I will be giving a talk at mloc.js in mid-February. The talk will detail some of the novel compilation techniques I’m implementing in Higgs, such as basic block versioning and incremental compilation. It will be filmed and made available on InfoQ. In the meantime, you can watch my talks from Strange Loop, DConf and Air Mozilla.
I’m not any kind of data compression expert, but I find the topic quite interesting. There’s something fascinating about the idea of data compression and its relationship to machine learning. What I mean to say is that to compress a data stream efficiently, you should ideally capture something fundamental about the structure of the data being encoded. The ideal compression algorithm should somehow be able to “learn” and “understand” the core nature of the data being compressed, so that it can encode it in a very compact way, with as little redundancy as possible. The compression algorithms in use today do this at a rather superficial level, seeking to shorten the encoding of repeated patterns of bits. LZW, for example, builds a dictionary (a simple grammar) as it encodes a binary data stream, and reuses pieces from this dictionary, when useful, to shorten the encoding. This simple yet clever strategy is remarkably effective, and quite difficult to improve upon.
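Roughly speaking, the encoding side of LZW looks like this toy sketch (simplified, and not any particular implementation):

```d
import std.stdio;
import std.string : representation;

// Minimal LZW encoder over bytes: builds a dictionary of previously seen
// sequences and emits dictionary codes instead of repeating them.
ushort[] lzwEncode(const(ubyte)[] input)
{
    ushort[string] dict;
    foreach (i; 0 .. 256)
        dict["" ~ cast(char)i] = cast(ushort)i; // single-byte entries

    ushort[] output;
    string w;
    ushort nextCode = 256;

    foreach (b; input)
    {
        string wc = w ~ cast(char)b;
        if (wc in dict)
        {
            w = wc;            // keep extending the current match
        }
        else
        {
            output ~= dict[w]; // emit the code for the longest known prefix
            if (nextCode < ushort.max)
                dict[wc] = nextCode++; // grow the dictionary as we go
            w = "" ~ cast(char)b;
        }
    }
    if (w.length)
        output ~= dict[w];
    return output;
}

void main()
{
    auto codes = lzwEncode("TOBEORNOTTOBEORTOBEORNOT".representation);
    writeln(codes.length, " codes for 24 input bytes");
}
```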
Image compression is something I find particularly interesting. Images we typically see around, photos in particular, are not at all random. There’s structure: edges, shapes, visible patterns and repetitions. Real-world objects, letters, people, curves, textures, all of these things could in theory be mathematically approximated in a way that’s much more compact than an array of millions of pixel samples. Our brain is somehow able to recover the underlying structure of the world from images, and transform millions of samples into high-level semantic entities. I’ve seen people use genetic algorithms to approximate images using colored triangles. This got me thinking: what if you had something more expressive than triangles? Could you use genetic algorithms to evolve boolean functions to encode the pixels of an image as a function of their x/y coordinates?
I did a number of experiments trying to come up with boolean functions to approximate one-bit-per-pixel black and white images using mutations and hill climbing, but got surprisingly poor results. There are a number of issues. For one, evaluating complex boolean functions on millions of pixels is very time consuming. Another problem is that there is a huge number of candidate functions to evaluate, and very few of them actually resemble the output we want. It’s like looking for a needle in a galaxy. I think an approach using neural networks working with floating-point values would be more likely to pan out. There are training algorithms for neural networks, which would be a better starting point than completely random mutations. Furthermore, floating-point outputs are probably better for learning than boolean outputs, as there are multiple gradations of how wrong a given output is, rather than all or nothing, but this is a topic for another blog post.
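For the curious, the experiments looked roughly like the following sketch (a much-simplified reconstruction, not the actual code I used): represent a candidate function as a small boolean circuit over the bits of the x/y coordinates, score it by how many pixels it gets wrong, and hill-climb by mutating one gate at a time.

```d
import std.random;
import std.stdio;

enum numGates = 64; // size of the candidate circuit (arbitrary)

// Each gate combines two earlier signals: either coordinate bits or the
// outputs of previous gates. op: 0=AND, 1=OR, 2=XOR, 3=NAND.
struct Gate { ubyte op; ushort a, b; }

bool evalCircuit(const Gate[] gates, uint x, uint y)
{
    // Signals 0..7 are the bits of x, 8..15 the bits of y, gates follow.
    bool[16 + numGates] sig;
    foreach (i; 0 .. 8)
    {
        sig[i]     = ((x >> i) & 1) != 0;
        sig[8 + i] = ((y >> i) & 1) != 0;
    }
    foreach (g, gate; gates)
    {
        bool a = sig[gate.a], b = sig[gate.b];
        bool r;
        switch (gate.op)
        {
            case 0: r = a && b; break;
            case 1: r = a || b; break;
            case 2: r = a != b; break;
            default: r = !(a && b); break;
        }
        sig[16 + g] = r;
    }
    return sig[16 + gates.length - 1]; // the last gate is the output
}

// Number of pixels where the circuit disagrees with the target image.
size_t error(const Gate[] gates, const bool[][] img)
{
    size_t err = 0;
    foreach (y, row; img)
        foreach (x, pix; row)
            if (evalCircuit(gates, cast(uint)x, cast(uint)y) != pix)
                ++err;
    return err;
}

Gate randomGate(size_t g, ref Random rng)
{
    // Inputs may only come from signals defined before this gate.
    return Gate(cast(ubyte)uniform(0, 4, rng),
                cast(ushort)uniform(0, 16 + g, rng),
                cast(ushort)uniform(0, 16 + g, rng));
}

// Hill climbing: perturb one random gate, keep the change if it doesn't hurt.
Gate[] hillClimb(const bool[][] img, size_t iters, ref Random rng)
{
    auto best = new Gate[numGates];
    foreach (g, ref gate; best)
        gate = randomGate(g, rng);
    auto bestErr = error(best, img);

    foreach (i; 0 .. iters)
    {
        auto trial = best.dup;
        auto g = uniform(0, numGates, rng);
        trial[g] = randomGate(g, rng);
        auto err = error(trial, img);
        if (err <= bestErr) { best = trial; bestErr = err; }
    }
    return best;
}

void main()
{
    // Tiny 64x64 target: a filled circle, just to exercise the code.
    auto img = new bool[][](64, 64);
    foreach (y; 0 .. 64)
        foreach (x; 0 .. 64)
            img[y][x] = (x - 32) * (x - 32) + (y - 32) * (y - 32) < 400;

    auto rng = Random(1234);
    auto circuit = hillClimb(img, 1_000, rng);
    writeln("pixels wrong: ", error(circuit, img), " / ", 64 * 64);
}
```

Even on a toy target like this, the search tends to plateau quickly, which matches the poor results I saw on real images.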
After the horrible failure of my boolean image approximation attempt, I sobbed while sipping on one ounce of whisky, but then, I had an idea. Simple approaches that exploit fundamental properties of a problem space tend to do quite well in practice. Maybe something resembling the LZW algorithm would work better. Maybe I could exploit the self-similarity of images (recurring patterns) in a very simple way. I started thinking about an algorithm to encode images using tiles: split the image into NxN tiles (e.g. 4×4 or 8×8) and encode each tile either as a direct listing of pixel values, or as a reference to another part of the image. The idea is to save space by reusing image information when a similar enough tile already exists. I used the mean squared error (MSE) of tile pixel values to decide when tiles are similar enough to allow reuse. Someone has probably already come up with a similar image compression algorithm before, but my google-fu has not revealed anything obvious.
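In rough terms, the encoder looks something like the sketch below. This is a stripped-down version for a grayscale image whose dimensions are multiples of the tile size; the real code also handles color, lets backreferences rescale the referenced values, and searches everything compressed so far rather than just the rows above the current tile.

```d
import std.stdio;
import std.algorithm : count;

enum T = 4;          // tile size (4x4)
enum maxMSE = 100.0; // reuse threshold: lower means more source tiles, higher quality

struct TileCode
{
    bool isRef;          // true: copy a T x T region of the output built so far
    int refX, refY;      // top-left of the referenced region (if isRef)
    ubyte[T * T] pixels; // raw source pixels (if !isRef)
}

// Mean squared error between a region of image a and a region of image b.
double mse(const ubyte[][] a, int ax, int ay, const ubyte[][] b, int bx, int by)
{
    double sum = 0;
    foreach (dy; 0 .. T)
        foreach (dx; 0 .. T)
        {
            double d = cast(double)a[ay + dy][ax + dx] - cast(double)b[by + dy][bx + dx];
            sum += d * d;
        }
    return sum / (T * T);
}

// Encode a grayscale image as a list of tile codes, reusing regions of the
// already-reconstructed output whenever a close enough match exists.
TileCode[] encode(const ubyte[][] src)
{
    int h = cast(int)src.length;
    int w = cast(int)src[0].length;
    auto outImg = new ubyte[][](h, w); // what the decoder would reconstruct
    TileCode[] codes;

    for (int ty = 0; ty < h; ty += T)
    for (int tx = 0; tx < w; tx += T)
    {
        // Brute-force search; for simplicity only regions strictly above the
        // current tile row are considered, so every candidate is fully
        // reconstructed already.
        bool found = false;
        int bestX = 0, bestY = 0;
        double bestErr = maxMSE;
        for (int sy = 0; sy + T <= ty; ++sy)
        for (int sx = 0; sx + T <= w; ++sx)
        {
            double e = mse(outImg, sx, sy, src, tx, ty);
            if (e < bestErr) { bestErr = e; bestX = sx; bestY = sy; found = true; }
        }

        TileCode code;
        if (found)
        {
            // Backreference: the decoder copies pixels it has already produced.
            code.isRef = true;
            code.refX = bestX;
            code.refY = bestY;
            foreach (dy; 0 .. T)
                foreach (dx; 0 .. T)
                    outImg[ty + dy][tx + dx] = outImg[bestY + dy][bestX + dx];
        }
        else
        {
            // No good match: store the source pixels verbatim.
            foreach (dy; 0 .. T)
                foreach (dx; 0 .. T)
                {
                    code.pixels[dy * T + dx] = src[ty + dy][tx + dx];
                    outImg[ty + dy][tx + dx] = src[ty + dy][tx + dx];
                }
        }
        codes ~= code;
    }
    return codes;
}

void main()
{
    // Synthetic 64x64 test image: vertical bands, so plenty of self-similarity.
    auto img = new ubyte[][](64, 64);
    foreach (y; 0 .. 64)
        foreach (x; 0 .. 64)
            img[y][x] = cast(ubyte)((x / 8) * 32);

    auto codes = encode(img);
    writeln(codes.count!(c => !c.isRef), " source tiles out of ", codes.length);
}
```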
The tile-based algorithm, which Tom has codenamed übersmüsh (usm for short), performs surprisingly well. It’s not comparable to JPEG, which is a very sophisticated format layering many levels of clever tricks to get an encoding as compact as possible, but it does much better than I would have expected. Using 4×4 tiles to compress a 512×512 version of the classic Lenna image, I was able to generate something visually quite similar to the original by supplying only 610 tiles of source pixel data out of 16384. This means that only 3.7% of the tiles in the compressed image come from the source image; the rest are all backreferences to x/y regions of the compressed image. For additional flexibility, I allowed backreferences to rescale the color values of the tiles being referenced, which greatly reduces the number of source tiles needed.
There are some issues, such as the fact that we probably use too few tiles for the background and too many in some detailed regions. This can probably be improved by tweaking or replacing the heuristic deciding when to use source tiles or backreferences. I can think of several other ideas to improve on the algorithm as a whole:
- Pre-filtering of random noise to make tiles more reusable. The Lenna image has quite a bit of high-frequency noise, as most photographs probably do, due to the nature of image sensors in cameras.
- Reducing the size of source pixel data using chroma subsampling. This may also have the benefit of making tiles more similar to each other, and thus more reusable.
- Avoiding supplying whole source tiles and instead supplying only enough source pixels to get an acceptable error threshold. The rest of the pixels would always come from a backreference.
- Using bigger tiles when possible (e.g.: 16×16 or 8×8) and resorting to a recursive subdivision scheme only when necessary. This would reduce the number of tiles we need to encode.
One important weakness right now is the difficulty of encoding the tile reference table compactly. The code I wrote is also unoptimized and very slow: it does a brute-force search over all possible previous pixel coordinates to find the closest-matching 4×4 tile in what has been compressed so far. This currently takes over an hour on my Core 2 Quad. The algorithm could probably be made much faster with a few simple tweaks to reduce the candidate set size. For example, the tiles could be sorted or hashed based on some metric, and only a few candidate tiles examined, instead of hundreds of thousands. The source code is written in D. It’s not the cleanest, but if there’s interest I might just clean it up and make it available.
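Concretely, the kind of pruning I have in mind would look something like this (again just a sketch, reusing the grayscale tile setup from the sketch above): bucket previously reconstructed tiles by a coarse feature such as their quantized mean intensity, and only run the full MSE test against candidates from nearby buckets.

```d
import std.typecons : Tuple;

enum T = 4; // tile size, as before

alias Pos = Tuple!(int, "x", int, "y");

// Coarse 4-bit key for a tile: its quantized mean intensity (0..15).
int tileKey(const ubyte[][] img, int x, int y)
{
    int sum = 0;
    foreach (dy; 0 .. T)
        foreach (dx; 0 .. T)
            sum += img[y + dy][x + dx];
    return sum / (T * T * 16);
}

struct TileIndex
{
    Pos[][16] buckets; // candidate tile positions, grouped by key

    // Called as tiles get written to the reconstructed output.
    void register(const ubyte[][] outImg, int x, int y)
    {
        buckets[tileKey(outImg, x, y)] ~= Pos(x, y);
    }

    // Candidates worth running the full MSE test on: tiles whose key is close
    // to that of the tile being encoded. This scans a few short lists instead
    // of hundreds of thousands of positions.
    Pos[] candidates(const ubyte[][] src, int tx, int ty)
    {
        int k = tileKey(src, tx, ty);
        Pos[] result;
        foreach (nk; [k - 1, k, k + 1])
            if (nk >= 0 && nk < cast(int)buckets.length)
                result ~= buckets[nk];
        return result;
    }
}
```

A key this coarse would obviously miss some good matches, so in practice you would want a smarter feature (or several), but even this would cut the search down enormously.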
Happy new year to all :)
Edit: Challenge Accepted
A friend suggested I try übersmüsh on a different kind of image: a screenshot of Facebook containing text and large flat areas. I was curious to see how well it would do, and so I accepted his challenge. I quickly hacked the algorithm to look for matches only within restricted-size windows (close in position to the current tile). This probably reduces its effectiveness somewhat, but was needed to get a reasonable execution time (minutes instead of multiple hours).
A total of 3354 source tiles were used out of 49000 tiles in total, which is about 6.8% of the source image data. Predictably, the algorithm does very well in flat areas. There is some amount of reuse in text, but this could be better if I hadn’t reduced the search window size and had instead implemented a better search heuristic. Some of the ad pictures, namely the ones in the bottom-right corner, show very poor reuse, probably because my mean-squared-error heuristic is a poor approximation of what error is visually acceptable.
It’s been about a month since I started refactoring the Higgs JIT to implement a lazy/incremental compilation strategy. The new JIT will combine the basic block versioning scheme I previously implemented with lazy compilation of basic block versions and, eventually, incremental inlining of function calls. Higgs will also no longer have an interpreter: everything will be JIT-compiled on first execution. This is a natural extension of basic block versioning which I believe will open up many interesting possibilities, such as the ability to recompile or invalidate code at the level of individual basic blocks. This means, for example, that function call sites, if they are executed often enough, could be recompiled to inline the callee without recompiling the entire caller function. Other possible uses include making optimistic assumptions about the types of values such as global variables, and invalidating only the blocks that make these assumptions if those assumptions are violated.
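To give a feel for the idea, here is a conceptual sketch (nothing like Higgs’s actual data structures, just the general shape): each basic block can have several compiled versions, one per incoming type context, each generated lazily the first time execution reaches it, and each invalidatable on its own.

```d
import std.stdio;

alias TypeContext = string; // e.g. "n:int32,a:object" (purely illustrative)
alias CompiledCode = void delegate();

final class BlockVersioner
{
    // (block id -> (type context -> generated code))
    CompiledCode[TypeContext][string] versions;

    // Return the version of a block specialized for the given type context,
    // compiling it on demand the first time it is requested.
    CompiledCode getVersion(string blockId, TypeContext ctx,
                            CompiledCode delegate(string, TypeContext) compile)
    {
        if (auto perBlock = blockId in versions)
        {
            if (auto code = ctx in *perBlock)
                return *code;
        }

        auto code = compile(blockId, ctx);
        if (auto perBlock = blockId in versions)
            (*perBlock)[ctx] = code;
        else
            versions[blockId] = [ctx: code];
        return code;
    }

    // Throw away just the versions of one block, e.g. when an optimistic
    // assumption about a global's type is violated; the rest of the function's
    // generated code is left untouched.
    void invalidate(string blockId)
    {
        versions.remove(blockId);
    }
}

void main()
{
    auto versioner = new BlockVersioner;

    // Stand-in for the real compiler: just records what it was asked to build.
    CompiledCode fakeCompile(string blockId, TypeContext ctx)
    {
        writeln("compiling ", blockId, " for context ", ctx);
        return { writeln("running ", blockId, " (", ctx, ")"); };
    }

    versioner.getVersion("loop_header", "i:int32", &fakeCompile)();   // compiles, then runs
    versioner.getVersion("loop_header", "i:int32", &fakeCompile)();   // cache hit, just runs
    versioner.getVersion("loop_header", "i:float64", &fakeCompile)(); // new specialized version
    versioner.invalidate("loop_header"); // e.g. an assumption was violated
}
```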
This is a significant refactoring, but it’s not the first time I’ve rebuilt the JIT, and I’ve gotten fairly efficient at it. I already have lazy code generation, most of the runtime library, and the Higgs REPL working. Many of the basic unit tests are working and the crucial bits are in place, but there are still many important parts missing (register allocation, garbage collection, exception handling, and more). Since I want Higgs to eventually be considered a serious and usable compiler, I decided to create a dev-maxime development branch and avoid feature regressions on the master branch. The new JIT will make it to the master branch once all tests and benchmarks work, which should hopefully happen within two months or so.
The most difficult aspect of this work so far has been coming up with a coherent and efficient design for incremental code generation. I decided to sketch things out and experiment right away with the parts of the implementation I thought would be most problematic, so that I would find out about potential problems early on. Much exploratory programming was involved, but the key design decisions have been made. Machine code is generated in one pass and directly appended to an executable heap. It’s generated in multiple small fragments which all contain position-independent code. The fragments may end with branches referring to other fragments. Each fragment keeps a closure that knows how to write its final end branch, so that these branches can be rewritten when fragments are relocated.
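Here is the shape of that idea in miniature (only a sketch; the real code generator and assembler in Higgs are considerably more involved): the last few bytes of each fragment are reserved for its final branch, and the fragment carries a closure that can re-emit that branch for whatever addresses the fragment and its target end up occupying.

```d
import std.stdio;

final class Fragment
{
    ubyte[] code;    // position-independent body; last bytes reserved for the branch
    size_t heapPos;  // current offset of the fragment in the executable heap
    Fragment target; // fragment the final branch jumps to (null if it falls through)

    // Closure that (re)writes the final branch, given where the fragment ends
    // and where its target currently starts.
    void delegate(ubyte[] heap, size_t fragEnd, size_t targetPos) writeBranch;
}

// Lay the fragments out end to end, then let each one rewrite its own branch.
void relocate(Fragment[] frags, ubyte[] heap)
{
    size_t pos = 0;
    foreach (f; frags)
    {
        heap[pos .. pos + f.code.length] = f.code[];
        f.heapPos = pos;
        pos += f.code.length;
    }
    foreach (f; frags)
        if (f.writeBranch !is null && f.target !is null)
            f.writeBranch(heap, f.heapPos + f.code.length, f.target.heapPos);
}

// Example closure: patch in a 5-byte x86-64 "jmp rel32" as the last instruction.
void setRel32Jump(Fragment f)
{
    f.writeBranch = delegate(ubyte[] heap, size_t fragEnd, size_t targetPos)
    {
        int rel = cast(int)(targetPos - fragEnd); // rel32 is relative to the next instruction
        size_t at = fragEnd - 5;
        heap[at] = 0xE9; // jmp rel32 opcode
        foreach (i; 0 .. 4)
            heap[at + 1 + i] = cast(ubyte)((rel >> (8 * i)) & 0xFF);
    };
}

void main()
{
    auto a = new Fragment;
    auto b = new Fragment;
    a.code = new ubyte[](8); // 3 bytes of "real" code plus 5 reserved for the jmp
    b.code = new ubyte[](4);
    a.target = b;
    setRel32Jump(a);

    auto heap = new ubyte[](64);
    relocate([a, b], heap);
    writefln("fragment a at %s, b at %s, jmp opcode: %02x", a.heapPos, b.heapPos, heap[3]);
}
```

The point of keeping the branch-writing logic in a closure is that the rest of the fragment never needs to be touched when code moves; only the final branch gets re-emitted.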
In other news, I’m still waiting to hear whether my paper about basic block versioning has been accepted at Compiler Construction 2014; I should have the answer by December 20th. I did get some very positive news since my last blog post, however. I’ve received (and accepted) an invitation to present Higgs and my research work at the mloc.js conference, which takes place mid-February in Budapest. Like Strange Loop, this conference will be filmed and the videos will be made available online. I’ve also received an invitation to present at Web Directions Code 2014. I’m very much looking forward to sharing my latest research findings.
My thesis advisor has recently shown more openness to the idea that I could do an industry internship before the end of my PhD, which seems like it would be a good move career-wise. This would probably occur around December 2014 or January 2015, after the next round of conference deadlines. I’ve been approached by people from Apple (the LLVM team) and IBM (JIT and dynamic language runtimes). Back at DConf 2013, I also had the opportunity to meet Andrei Alexandrescu and other members of the HipHop VM team. They seemed very interested in my work on basic block versioning and in getting me to come do an internship at Facebook. Most likely, I’ll try approaching people at Mozilla and Facebook to see if I could work on either IonMonkey or HipHop VM as part of an internship.
When I started my PhD, I originally wanted to do research in machine learning. Unfortunately, after only a few months, I discovered that field wasn’t really for me. I found it to be overly theoretical, mathematical and dry for my taste. My creativity wasn’t really there. Instead, I found myself thinking back to my MSc project, and having several ideas about how to improve upon the design of McVM. I decided to follow my passion, change advisors and do a PhD in compiler design instead. Compilers are not really a glamorous CS subfield, but I think this may have turned out to be a good career choice. There are clearly more jobs in compilers than there are compiler experts.
I always carry a book in my backpack so I have something to do while sitting on the bus, the subway or in a waiting room. It took me almost a year to read this one, but I was determined to get through it. I wanted to know more about the mythical figure that is Alan Turing, often cited as the father of computer science. I’ll begin by saying that Alan Turing: The Enigma is a remarkably well-researched book, with all references provided for the reader. It’s obvious that many years of work went into its making. The author, Andrew Hodges, is a mathematician, which makes him particularly well-placed to discuss the many creative ideas of Alan Turing.
The book of course discusses in detail Alan Turing’s seminal On Computable Numbers paper, as well as his involvement in the breaking of German cryptography (and the Enigma machine) during World War II, but also paints a very intimate and human picture of Alan Turing, a picture which includes character flaws and vulnerabilities. Alan Turing was an introvert, a man of few friends, stubborn, often dismissive, and probably quite lonely at times. He seems to have struggled to understand societal norms, and had the boldness (or perhaps foolishness) to be quite open about his homosexuality decades before it became acceptable to do so. His eventual fall from grace is a cruel tragedy which he never saw coming.
This book also reveals many interesting facts about Turing’s life. Contrary to what I had assumed based on my introduction to Turing machines in computer science classes, Alan Turing was not just a pure theoretical mathematician. He was also interested in relativity and quantum physics. He dabbled in chemistry as a hobby. He understood electronics well enough to design complex analog circuits, and was capable of using a soldering iron to assemble them as well. Alan Turing also played a crucial role in the design of an actual digital computer: the ACE (Automatic Computing Engine). Finally, while working at the University of Manchester, Turing had the opportunity to program a working computer to test his theories about morphogenesis (his last published works) in simulations.
I would recommend this book to anyone who’s curious to know more about Alan Turing, or just interested in reading a good biography. If you’re cheap like me, you can acquire this book used on Amazon or AbeBooks for only a few dollars after shipping.
I own a small Revlon-branded makeup mirror with a lightbulb and reflector inside. I find it fairly convenient, as I can apply makeup while sitting at my desk and listening to music or watching videos online. Unfortunately, it has issues. It’s not very bright, and so doesn’t really give you an accurate image of your finished look. It also generates a fair bit of heat, which makes it uncomfortable to use in the summer. Recently, I found out that 12V LED light strips in many colors are available on eBay for cheap (usually less than $10 for a 5 meter roll, which is about 15 feet). I figured that with a cool-white LED strip, a little dimmer switch, some wire, some glue, and a source of 12V DC power, I could turn the vertical wall mirror I own into a giant lit makeup mirror.
I acquired a 5 meter cool-white LED strip with 60 LEDs per meter. The LEDs are of the “5050” variety, which are a bit more expensive but much brighter than the alternative “3528” kind. The total cost of the strip after shipping was about $9. The dimmer I bought is a digital part: it controls brightness through pulse-width modulation (PWM), and is specifically made for 12V LED systems. I found it on eBay for less than $2 shipped. The total cost of all the parts for this project, including the glue, wire and clamps, is about $30. It’s important to note, though, that I still have leftover LED strip, glue, wire and of course the clamps to use on other projects.
Once the strip segments were glued down, I soldered wires in the corners of the mirror, connecting the strips to each other and to the dimmer. I applied epoxy glue over the connections to solidify and insulate them. I also added a long wire at the back of the mirror, going from the dimmer’s input to a female DC power jack, the kind one would connect to a typical wall-wart power adapter.
Assembly was more time-consuming than I expected: it took about 4 hours of work in total, spread over 3 days, since I needed to leave time for the glue to solidify. I started by drilling a hole on the side so I could route the dimmer’s power input wires to the back of the mirror, and I then glued the dimmer over this hole. The second step was cutting the LED strips to length. These can only be cut every 3 LEDs, so it wasn’t possible to make them go exactly all the way around the mirror, covering the whole perimeter; I needed to leave gaps and connect the multiple strip segments with wire.
The LED strips found on eBay have a “self-adhesive” backing: you’re supposed to be able to just peel off the protective tape and stick them to things. In my experience, this adhesive backing is very much worthless; the strips will peel off immediately, so some other kind of glue is needed. I decided to go with epoxy, which is sold at the dollar stores here. I also purchased two adjustable clamps to hold the strips in place while the epoxy solidifies. Epoxy glues have several advantages: they solidify quickly, the resin they form is extremely solid, and they are not electrically conductive. In my experience, epoxy can solidify in as little as 15-20 minutes.
The total current draw of the mirror at maximum brightness is less than two amps. I hooked it up to the 12V 15A DC power distribution box I had already built for other LED lighting shenanigans. Your bedroom probably doesn’t come with one of these, but fortunately you can get by with a good old 12V 2A DC wall-wart, which you can find on eBay for about $5 shipped. Just make sure to get one that fits your country’s electrical outlets.
The finished product, once wired up, can put out quite a bit of light: about 3000 lumens, which is more than a 100W incandescent lightbulb. Fortunately, the dimmer allows tuning down the brightness to the point where you can look at the LEDs directly without visual discomfort. I usually just leave it on maximum brightness, however. I’ve found the lights are not blinding if you get up close to the mirror to look at your face. The cool-white color also seems to have a pleasant daylight-like glow. I’m actually considering buying more of this strip just to make my bedroom brighter.