Inefficient Numerical Operators

April 2, 2012

My Ph.D. work involves the design and implementation of a JavaScript JIT compiler. As a result, I’m quite familiar with the intricacies (and the many inefficiencies) of the language. The ECMAScript 5 specification specifies, among other thing, how each JavaScript primitive operator should be implemented. That is, it provides algorithms, in English prose, to specify the behavior of addition, subtraction, multiplication, division, and all the others. Unfortunately for JavaScript (and all its users), these algorithms are long, complex and unintuitive. The algorithms in question involve dynamic dispatch, conversions from strings to integers and other nasty surprises.

As a programmer, I truly wish JavaScript’s operators had simpler semantics. All JavaScript arithmetic operators can take any type as input, they will never throw an exception, and will possibly convert strings to integers. Any Vulcan would surely frown when realizing that in JavaScript:

'4' + '4' is '44'
But...
'4' * '4' is 16

null + 1 is 1
But...
undefined + 1 is NaN

{} + [] is 0
But...
[] + {} is NaN

The rules behind JavaScript’s operators are unintuitive and unnecessarily complex. Let’s all be honest here. There is absolutely no need for arithmetic on values that aren’t numbers, an exception should be thrown if this ever occurs, and JavaScript would have been better off with a dedicated string concatenation operator, as Perl has. However, lack of intuitiveness isn’t the only problem here.

As a compiler writer, and someone who studies type analyses (i.e.: type inference), I find that the bigger problem is inefficiency. More complex algorithms behind primitive operators means using these operators is, in general, more expensive. Dynamic type checks need to be added to handle multiple possible cases, which is inefficient. Some of these tests can be optimized away, but some can’t. We’re all paying a cost for JavaScript features we hardly ever use or need.

Furthermore, the fact that all JavaScript operators can take any type as input and never throw exceptions means that type inference of JavaScript is slightly harder. If JavaScript operators were restricted to numbers, a type inference engine could take advantage of the fact that an expression such as x + y would imply that x and y must be numbers. Hence, subsequent uses of those variables could be optimized. This is obviously not the case: in JavaScript, x + y doesn’t guarantee anything about x and y, because the programmer could be performing an addition on booleans or objects or arrays or strings or any other types.

Another gripe I have with JavaScript is that all numbers are assumed to be 64-bit floating-point values (doubles). Integer arithmetic is still faster than floating-point arithmetic, and so all modern JavaScript JITs, behind the scenes, will try to represent numbers as integers when possible, and fall back to floating-point values when an overflow occurs or a fractional value is needed. This incurs performance costs because it means much of JavaScript’s arithmetic operations need to include conditional tests checking for overflow. Again, some of these tests can be optimized away, but many can’t. In general, proving that numerical values won’t overflow in a type analysis only works in a limited number of cases.

Why did the creator of JavaScript make this choice? Probably because he presumed it would be simpler for new programmers. It’s a common belief that integer types, integer overflow and modular arithmetic in C are hard concepts to understand. Floating-point numbers are of course necessary, but they are certainly not easier to understand than integers. Integer values wrapping around on overflow might seem a little puzzling to some people, but I don’t see how dealing with NaN, infinities, negative zero, rounding and precision errors is more intuitive.

I believe that a better design would have been to have distinct integer and floating-point numerical types in the language. The better choice, for performance, would be to have integers operations be closed over integers. That is, adding, subtracting, multiplying and dividing integer values should only ever yield integers. If a conversion to floating-point is needed (which I believe is relatively rare), it should be done explicitly. Here are some examples of what this could look like:

var x = 1;   // Integer
var y = 2;   // Integer
var z = 4.0; // Floating-point

var v0 = x + y        // v0 === 3
var v1 = x + y + z;   // v1 === 7.0
var v2 = x / y        // v2 === 0
var v3 = x / z        // v3 === 0.25
var v4 = Float(x) + y // v4 === 3.0

Such a design would give programmers access to both 64-bit integers and 64-bit floating-point values while making it possible for a type inference analysis to eliminate most type checks on arithmetic operations. Some might claim this would make the language more confusing, but I don’t think this is true. This system wouldn’t be any more complex than Python or Scheme’s numerical tower. It would actually give programmers more control over what their code is doing, without requiring unwieldy type annotations.

The more I learn about JavaScript, the more I think I want to design my own dynamic language someday. I strongly believe that there is a way to design a dynamic language that is efficient, practical, and suitable for both low-level systems programming as well as rapid prototyping. In the end, it all comes down to compromises in the design of the language. In my opinion, simpler semantics often means more intuitive and easier to optimize.

From → JavaScript, Programming, Programming Languages, Zeta

7 Comments

ProGrammar permalink

> The more I learn about JavaScript, the more I think I want to design my own dynamic language someday. I strongly believe that there is a way to design a dynamic language that is efficient, practical, and suitable for both low-level systems programming as well as rapid prototyping.

This sounds a lot like Go (http://golang.org). Go uses type inference to automatically determine what type should be used (http://golang.org/doc/go_faq.html#types) and has a dynamic feel despite being a compiled language.

Reply
- Maxime Chevalier permalink
  
  I haven’t really looked into Go. It’s statically typed though, isn’t it? You couldn’t have the same variable take two different types on either side of a branch?
  
  Reply
  - gnuvince permalink
    
    Go is indeed statically typed. If you wanted to have “different types” in your sides of a branch, you could use interfaces; the static type (in Java parlance) of the variable would be the same, but if the values are of types that implement that interface, you’d be fine. Not as flexible as dynamic typing though.
    
    Reply
Olivia permalink

A few decent points on this site and genuinely didnt have a idea
regarding any of this previously so thanks for
the insight

Reply