Why 0.1 + 0.2 = 0.30000000000000004
How do you know whether two expressions are equivalent?
As humans, this can be a rather simple task: we apply the formulas we know to convert one expression to another. But for computers, they may not be aware of all the general nuances and complexities that come with math expressions. Therefore, many checkers typically choose the brute force approach: that is, plug in numbers and check whether the returned values are the same. Rather simplistic, but it gets the job done well in a number of cases.
Let's take a look at an example:
Take a moment to convince yourself that due to the Power of a Quotient rule. Therefore if we were to create a new equation:
We should see that for all values of . So, when we look at a graph:
...hm, well that's weird. The graph seems to stop arbitrarily at while increasing exponentially around . And what's with that giant black bar at to ?
Well, my dear readers, welcome to the wonderful world of floating point precision errors.
Hmm? What's a floating point precision error, you ask? Well, let me show you through a simple example.
Evaluate:
Did you think the answer is ? Well, I'm sorry to tell you that you're wrong. It's actually . Don't believe me? Open up a JavaScript console and try it yourself.
To understand why, let's look at the binary representations of these numbers. JavaScript stores decimal values as double-precision floating-point numbers following the IEEE 754 standard. This means that a float is made up of 8 bytes, or 64 bits.
The IEEE 754 standard stores floating point numbers as a sort-of scientific notation with three parts: a 1 bit sign (positive or negative), a 52 bit fraction or significand, and a 11 bit exponent all in base 2.
We can get the bytes of a number with a neat little coding trick:
function floatToBinaryString(value) { var output = ''; // Get the bytes in reverse order for (const part of new Uint8Array(new Float64Array([value]).buffer, 0, 8)) { // Convert to string var partString = (part >>> 0).toString(2); // Pad any missing zeros while (partString.length < 8) partString = '0' + partString; // Prefix to place in proper order output = partString + output; } return output;}Using this, the binary representations of our two numbers are:
// Values are displayed in the correct order// In the form '<Sign> <Exponent> <Fraction>'0.1 = 0 01111111011 10011001100110011001100110011001100110011001100110100.2 = 0 01111111100 1001100110011001100110011001100110011001100110011010Now at a glance, you can tell something is weird, especially with that fraction part. It shouldn't be that complicated to store or in base 2, right?
Why don't we try evaluating these binary values manually then?
The sign is rather straight forward (0 if positive and 1 if negative). The same goes for the fraction: the sum of all that have a 1, where is the bit number starting from 1, plus 1. The exponent, on the other hand, is calculated differently to allow for negative values. This is done by performing the standard base 2 to base 10 conversion, then subtracting , where is the number of bits in the exponent (in this case ). Then all of these parts are multiplied together.
That was probably not explained that well, so let me show you the code:
// Outputs the signfunction binaryStringToSign(value) { // Gets the sign value from the first part return value[0] === '0' ? 1 : -1;}
// Outputs the 2^n exponentfunction binaryStringToExponent(value) { var output = 0; // Compute initial exponent for (var i = 0; i < 11; i++) { const bit = parseInt(value[i + 1]); output += bit * 2 ** (11 - 1 - i); } // Apply bias output -= (2 ** (11 - 1)) - 1; // Return exponent to raise '2' to return output;}
// Outputs the 2^n exponents to sum together and add 1function binaryStringToFractionParts(value) { const output = []; // Compute fraction parts for (var i = 0; i < 52; i++) { const bit = parseInt(value[i + 12]); if (bit === 1) output.push((-(i + 1))); } return output;}With all this, we can add up the values in our accurate calculator of choice (e.g., Wolfram Alpha):
Well, that's pretty much and , but it's not exactly those values. It's just a really accurate approximation, leaving some room for precision error.
But wait. Adding those two numbers together, we get:
Which is definitely not . Though, surprisingly enough, if we put that number into the JavaScript console, it does equal that value. So, there's even more precision bugs to work out.
Adding two floating-point numbers together require the exponents to be the same before adding the fraction component. First, let's convert this to a more convenient form to perform the operations on:
In this case, the exponent for is while it's for . Well, that's easy enough to deal with. We can increase the exponent for by 1 by shifting the fraction component 1 to the left.
Now all that's left to do is add the two fraction parts together:
Which we can rewrite as:
Shift to match proper scientific notation:
And round since there are now 53 bits representing the exponent. We round up if the 52nd bit is 1, and down if 0:
Which leaves us with the final number:
0.3 = 0 01111111101 0011001100110011001100110011001100110011001100110100And sure enough, evaluating the floating-point precisely gives us:
Where we can now see that pesky rounding error. It's all a game of approximation math. And with those precision issues, .
However, that graph hold a lot more math limitations than you think. But that's a story for next week.