What you’re looking for
You’ve probably seen the joke claiming that JavaScript doesn’t know math because of the expression below.
0.1 + 0.2 === 0.3
This expression evaluates to false. In fact, it returns false in many programming languages: I tested it in JavaScript, Python, Ruby, Elixir, and Erlang, and got the same result – that’s not true for COBOL, though.
That’s the reason why you shouldn’t use float or double for storing monetary values (or any other kind of value where you need precision): floating point numbers have rounding errors, so we get these well-known but weird-looking behaviors. It’s not a bug in your favorite language; it’s the nature of computers. In this post, we’ll understand why it happens by looking at what’s under the hood.
You may not have enough time to dig into that right now. That’s okay, this section is intended to summarize it for you.
This behavior happens because not all fractional numbers can be represented in binary, and hence in a machine. Take the 0.1 from this classical example: it’s not a real 0.1. Odds are high – due to the IEEE 754 standard – that it’s something more like 0.10000000149....
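You can check this yourself. Here’s a quick probe in Python (note that Python floats are double precision, so the exact digits differ from the single-precision value above):

```python
from decimal import Decimal

# Decimal(0.1) exposes the exact binary value stored for the literal 0.1
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004
```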
On the other hand, every single integer can be represented in binary. So, as Martin Fowler suggests: make your application work with the most fundamental unit of a specific metric (e.g. cents for monetary values) and store values in that unit.
So, instead of storing 10.12, you store 1012 cents. You can perform all needed operations using integers, and when it comes to displaying values on a screen, you format them – as in the sketch below.
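A minimal sketch of that idea (the variable names are just illustrative):

```python
# Store 10.12 as 1012 cents; all arithmetic stays in integers
price_cents = 1012
tax_cents = 87

total_cents = price_cents + tax_cents  # exact: integer addition has no rounding

# Format only at display time
print(f"${total_cents // 100}.{total_cents % 100:02d}")  # $10.99
```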
That’s it: use integers and you should be fine. If you were looking for that answer, there you go.
But this post goes further, and if you keep reading, you’ll embark on a journey through how numbers are represented in computers and a little bit of how computers do math.
Stay with me, and let’s continue.
Notations used in this article
Since we’ll be switching between binary and decimal, we need to establish some notation to avoid ambiguity. All the binary numbers in this post (except the bit patterns in IEEE 754) will be denoted as d₁d₂d₃…₂, where d₁, d₂, d₃, and so on are binary digits. Example: 1100₂.
We’ll also talk about scientific notation, and to denote the power operation we’ll use the notation AeB (A to the power of B), where A and B are numbers. Example: 10e4 (10 to the power of 4).
The IEEE 754 Standard
As mentioned in the previous section, the number you think is 0.1 might not be a real 0.1 due to the IEEE 754 standard.
It’s the dominant standard for representing floating point numbers in modern computers. It has two variants: single and double precision.
In this post, we’ll discuss and show examples of single precision only, since it’s enough for understanding the standard – double precision is just a matter of more bits. So every time I refer to IEEE 754, I mean its single-precision variant.
IEEE 754 represents a floating point number as a sequence of 32 bits. These 32 bits are divided into:
- a 1-bit sign;
- an 8-bit exponent;
- a 23-bit mantissa.
Does it sound familiar somehow?
Scientific notation
The parallel we can draw here is with scientific notation, because that’s essentially what the standard uses. You might recall that any number in scientific notation is written with a base-10 power and can be broken into three parts:
- integer part;
- fractional part;
- exponent.
It also fulfills the following rules:
- the integer part must be between 0 and 10 (not inclusive);
- every time you move the floating point to the right, you decrement the exponent;
- every time you move the floating point to the left, you increment the exponent.
The shape of a number in such a notation is shown below:
i.f x 10 ^ exp
i - integer part
f - the fractional part
exp - the exponent
Example: we want to get 123.4 into scientific notation.
In fact, this number can be written as 123.4 x 10e0. Since the integer part is greater than 9, it breaks the first rule. To comply with that rule, we can move the floating point to the left twice, making the number look like 1.234 x 10e0.
We moved the floating point twice to the left, so we have to increment the exponent twice. Our final number is 1.234 x 10e2.
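For the curious, here’s a hedged Python sketch of this normalization (the function name is my own, and the division itself can introduce tiny floating point artifacts – fittingly for this article):

```python
import math

def to_scientific(x: float) -> tuple[float, int]:
    """Normalize x into the form i.f x 10^exp with 1 <= i.f < 10."""
    exp = math.floor(math.log10(abs(x)))  # how many places to move the point
    return x / 10**exp, exp

print(to_scientific(123.4))  # roughly (1.234, 2) -> 1.234 x 10e2
```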
If you got the idea, then you’re good. IEEE 754 is all about scientific notation*.
* in IEEE 754 we have a biased exponent
Computers, nevertheless, don’t use base-10 powers – they don’t have ten fingers on their hands. They don’t even have hands. They use base-2 powers instead.
It’s very similar though. The shape of a binary number in scientific notation is shown below.
1.f x 2 ^ exp
f - fractional
exp - exponent
So given the binary number 0.0011₂, we can represent it in scientific notation by moving the floating point to the right until we reach the pattern 1.f. In this case, we have to move it three times to the right, so our number in scientific notation is 1.1₂ x 2e-3.
It’s worth noticing that the rules for the exponent are the same. We moved the floating point three times to the right, so we decremented the exponent three times.
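You can see this normalized 1.f form directly in Python: float.hex() prints it, with the fraction in hexadecimal. 0.0011₂ is 0.1875 in decimal:

```python
# 0.0011(2) = 0.1875; float.hex() shows the normalized 1.f x 2^exp form
print((0.1875).hex())  # 0x1.8000000000000p-3 -> 1.5 x 2^-3 = 1.1(2) x 2e-3
```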
Representing numbers in IEEE 754
We already know the representation layout and that the standard uses scientific notation, so it’s time to see some examples of how we can represent numbers in it.
The algorithm for getting a number into IEEE 754 is as follows:
- convert the number to the binary system;
- apply scientific notation;
- add the bias, which is 127, to the calculated exponent – we’ll discuss the bias later in this post;
- convert the result of the last step into binary.
By doing that you’ll already have:
- the 23-bit mantissa, which will be the fractional part of the number in scientific notation;
- the 8-bit exponent, which will be the sum of the bias and the exponent calculated by moving the floating point.
You just need the 1-bit sign, which you already have as well: this bit will be 0 if the number is positive and 1 otherwise.
Time to practice
Let’s take it easy and represent the decimal number 9.75₁₀ in the IEEE 754 standard.
First, we convert it into a binary number. As we’re working with a non-integer number, we have to follow two different approaches. One approach is applied to the integer part of the number and the other to the fractional part.
For the integer part, it’s trivial: we apply successive divisions by two, collect the remainders, and assemble the binary number by reading the remainders and the last quotient in the reverse order they were obtained.
9/2 -> remainder = 1; quotient = 4;
4/2 -> remainder = 0; quotient = 2;
2/2 -> remainder = 0; quotient = 1;
9 (10) = 1001 (2)
NOTE: Inside code blocks, you’ll see the base denoted in parentheses. (10) means a decimal number whereas (2) means a binary number.
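Here’s a small Python sketch of the successive-division method (the helper name is mine):

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary via successive divisions by 2."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2                    # continue with the quotient
    return "".join(reversed(digits)) or "0"

print(int_to_binary(9))  # 1001
```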
To convert the fractional part, we repeatedly multiply it by two and take the integer part of the result as the next binary digit.
0.75 x 2 = 1.5 -> our binary digit is 1; fractional is greater than 0 so it goes to the next iteration.
0.5 x 2 = 1.0 -> 1 is our binary digit and we stop since there are only zeroes in the fractional part.
0.75 (10) = 0.11 (2)
So 9.75₁₀ is equal to 1001.11₂.
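And a sketch of the multiply-by-two method for the fractional part (again illustrative; we cap the number of digits because, as we’ll see, some fractions recur forever):

```python
def fraction_to_binary(f: float, max_bits: int = 23) -> str:
    """Convert a fraction in [0, 1) to binary by repeated multiplication by 2."""
    digits = []
    while f > 0 and len(digits) < max_bits:
        f *= 2
        digits.append("1" if f >= 1 else "0")  # the integer part is the next digit
        f -= int(f)                            # keep only the fractional part
    return "".join(digits)

print(fraction_to_binary(0.75))  # 11
print(fraction_to_binary(0.1))   # 00011001100110011001100 (recurring, cut at 23 bits)
```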
The next step is to turn this number into scientific notation:
1.00111 (2) x 2e3 // We moved the floating point three times to the left
Now we have to add our exponent to the bias: 3 + 127 = 130. Our final exponent in binary is 10000010.
The sign bit is 0 since we’re representing a positive number. So, the decimal number 9.75 is represented in IEEE 754 as:
0 10000010 00111000000000000000000
Note that only the fractional part goes in the mantissa. The number is in scientific notation, so we just assume there’s a 1 in front of it, and we have all 23 bits for expressing the mantissa – we’ll discuss how this makes it harder to represent zero.
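We can double-check the result by packing the number as a single-precision float and dumping its raw bits – a sketch using Python’s standard struct module (the helper name is mine):

```python
import struct

def float_to_ieee754(x: float) -> str:
    """Show the 32 IEEE 754 single-precision bits of x as sign|exponent|mantissa."""
    [raw] = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as a uint32
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(float_to_ieee754(9.75))  # 0 10000010 00111000000000000000000
```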
Converting from IEEE 754
To get back the decimal number we just converted in the last section – and any other number represented in this standard – we can apply the following formula:
(-1)^s * (1 + f) * 2^(exp-127)
s - sign bit
f - mantissa
exp - exponent
NOTE: s, f, and exp must all already be in decimal.
To convert 0 10000010 00111000000000000000000 back, we just need to perform the conversions and apply the formula.
(-1)e0 x (1 + 0.21875) x 2e(130 - 127)
1 x (1.21875) x 2e3
1.21875 x 2e3
1.21875 x 8 = 9.75 # it worked!!
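The formula translates almost line by line into Python. A sketch that handles normal numbers only (the special values discussed later are ignored):

```python
def ieee754_to_float(bits: str) -> float:
    """Decode 'sign exponent mantissa' bits of a normal single-precision number."""
    bits = bits.replace(" ", "")
    s = int(bits[0])             # sign bit
    exp = int(bits[1:9], 2)      # biased exponent
    f = int(bits[9:], 2) / 2**23 # mantissa as a fraction
    return (-1)**s * (1 + f) * 2**(exp - 127)

print(ieee754_to_float("0 10000010 00111000000000000000000"))  # 9.75
```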
What’s the issue with 0.1?
As mentioned at the beginning of this post, not all fractional numbers can be represented in binary. Indeed, a fractional number has a finite binary representation only if its denominator (in lowest terms) is a power of 2.
In the previous section, we converted 0.75₁₀ to binary and it worked fine. That’s because 0.75 is in fact 3/4, and 4 is a power of 2. That’s not the case for 0.1, though: it’s 1/10, and 10 is not a power of two.
The issue starts when we try to convert 0.1 to binary: we end up with a recurring number.
0.1 (10) = ~0.0001100110011001100...(2) # it goes forever repeating 1100
This recurring number is a problem since we have a limited amount of bits to represent it. So we do what the standard does: keep the 23 bits that fit and round to the nearest representable value.
0.000110011001100110011...(2) in scientific notation is 1.10011001100110011001101 x 2e-4 # the 24th fractional bit would be 1, so the mantissa rounds up
We moved the floating point four times to the right so our exponent will be given by the following sum.
-4 + 127 = 123 (10) = 01111011 (2)
So 0.1₁₀ in IEEE 754 is 0 01111011 10011001100110011001101.
It seems okay, but when we convert it back to decimal we get:
1 * (1 + 0.60000002384185791016) * 2^(-4)
= 1.60000002384185791016 * 0.0625
= 0.100000001490116119384765625
There we go: our 0.1₁₀ is not what we expect! Even worse, errors like this propagate when we perform operations using floating point numbers.
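You can reproduce this from Python, whose floats are double precision, by round-tripping 0.1 through single precision with struct:

```python
import struct

# Pack 0.1 into single precision, then read back the value actually stored
[stored] = struct.unpack(">f", struct.pack(">f", 0.1))
print(stored)  # 0.10000000149011612
```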
Special values and the bias
Scientific notation is the basis of IEEE 754, but unlike the notation we know, it has a bias – something we don’t see in high school.
But the bias is a good idea: it allows us to represent a range of exponents from negative to positive ones, and it makes operations simpler. If we had a sign bit for the exponent instead, we would need an extra step to get the exponent’s value, and we’d lose a bit to it.
127 is a good choice because it balances the range. The minimum value is -126 and the maximum is 127.
NOTE: exponent 0 (00000000₂) and exponent 255 (11111111₂) have special meanings.
IEEE 754 also has something that could be considered a downside: it has two zeroes – even though there are some materials discussing why that might be important, as with complex numbers.
The other special values are:
- positive infinity, for representing overflow;
- negative infinity, for representing underflow;
- NaN, for representing not meaningful numbers (e.g. 0/0)
- zeroes
All these special values can be reached with certain sequences of bits.
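Reusing the bit-dumping sketch from earlier, we can peek at some of these patterns:

```python
import struct

def float_to_ieee754(x: float) -> str:
    [raw] = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(float_to_ieee754(float("inf")))  # 0 11111111 000... (exponent all ones, mantissa zero)
print(float_to_ieee754(float("nan")))  # exponent all ones, mantissa nonzero
print(float_to_ieee754(0.0))           # 0 00000000 000... (positive zero)
print(float_to_ieee754(-0.0))          # 1 00000000 000... (negative zero)
```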
How COBOL handles it
Preparing this post was a great opportunity to validate some things. I heard a bunch of times that banks use COBOL because it has high precision, it’s good at math, and such.
Out of curiosity, I ran a simple COBOL program to perform 0.1 + 0.2 == 0.3. Well, I’m not a COBOL developer, so I asked ChatGPT to write it.
IDENTIFICATION DIVISION.
PROGRAM-ID. CHECK-SUM.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 NUM1 PIC 9V9(1) VALUE 0.1.
01 NUM2 PIC 9V9(1) VALUE 0.2.
01 RESULT PIC 9V9(1) VALUE 0.
01 ERROR-MSG PIC X(40) VALUE SPACES.
PROCEDURE DIVISION.
MAIN-PROCEDURE.
COMPUTE RESULT = NUM1 + NUM2.
IF RESULT = 0.3
DISPLAY "0.1 + 0.2 is equal to 0.3."
ELSE
MOVE "0.1 + 0.2 is not equal to 0.3." TO ERROR-MSG
DISPLAY ERROR-MSG
END-IF
STOP RUN.
You can run this code online at https://www.jdoodle.com/execute-cobol-online/. Also, blame ChatGPT if it doesn’t work.
The console shows 0.1 + 0.2 is equal to 0.3. COBOL really knows math and can handle the 0.1 + 0.2 sum.
The point is: there are other ways of representing fractional numbers, and COBOL does it using fixed point numbers rather than floating point ones. In summary, it has a defined amount of bits for representing the number, and the point remains in the same position forever.
Also, I’ve found that COBOL performs a kind of decimal-based operation, so under the hood it’s effectively doing something like 10 + 20 == 30.
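Python ships similar decimal-based arithmetic in its standard decimal module, which lets us mimic the spirit of COBOL’s behavior – note the values must be built from strings, or we’d inherit the binary error:

```python
from decimal import Decimal

# Decimal arithmetic works digit by digit in base 10, much like fixed point decimals
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```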
I’ve also found a great post discussing the migration of a system from COBOL to Java – links in the references section – which didn’t work out, at least not in the situation described in that post.
The problem is not Java or COBOL. The whole point is how they handle things.
How integers solve the problem
IEEE 754 is great for fractional numbers, but it’s not suitable for integers – it would be overkill. We can do something simpler, since integers can easily be converted to binary. We only need a way to represent negative numbers.
Assuming we have hardware that can handle 3-bit numbers (for simplicity’s sake), how can we represent negatives?
We can take inspiration from the floating point standard discussed here and use a sign bit. We would have:
Binary | Decimal |
---|---|
000 | 0 |
001 | 1 |
010 | 2 |
011 | 3 |
100 | -0 |
101 | -1 |
110 | -2 |
111 | -3 |
The MSB (most significant bit) is used just to identify the sign. That seems reasonable, but we have some downsides:
- two zeroes – this makes operations more complex, since we would need to check signs even in simple operations;
- we’re wasting a precious bit.
A note on how computers do math
Computers are just a bunch of circuits we can represent as logic gates.
Summing is a trivial task that can easily be achieved using the full-adder circuit, which has a carry-in and a carry-out bit. Thus, to produce a multi-bit sum, we can just chain a couple of full-adders.
Subtracting is a little bit different, though. Instead of carrying digits, we have to borrow them from the next position to perform the operation.
In decimal, it’s the same effect as performing 100 - 1: we have to borrow from the next columns to perform the operation.
Operations like that are not convenient for circuits.
Let’s try something more efficient.
Two’s complement to the rescue
To begin, let’s take a look at what two’s complement is.
Two’s complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with an equivalent negative value, using the binary digit with the greatest place value as the sign to indicate whether the binary number is positive or negative. It is used in computer science as the most common method of representing signed (positive, negative, and zero) integers on computers,[1] and more generally, fixed point binary values. When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.
Source: wikipedia
This definition doesn’t clarify a lot, huh? Let’s break it down into smaller pieces.
The main statement: "It’s used in computer science as the most common method of representing signed integers on computers".
That makes a lot of sense since we’re trying to find a better solution for representing signed integers (see the last section).
It also says: "When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.".
That might be confusing. Will we still have a sign bit?
The answer is no, because two’s complement does it differently. Let’s take a look at how we can represent integers in our 3-bit machine using two’s complement.
Binary | Decimal |
---|---|
000 | 0 |
001 | 1 |
010 | 2 |
011 | 3 |
100 | -4 |
101 | -3 |
110 | -2 |
111 | -1 |
We still use the MSB as the sign bit. But unlike the last proposal, here it plays a role in the calculations.
The representation of any number in two’s complement is as follows:
-2^2 * d1 + 2^1 * d2 + 2^0 * d3
Where d1 is the most significant bit. Thus, when it’s 1, it makes the number negative.
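This formula generalizes to any width: the MSB simply carries a negative weight. A minimal Python sketch (the helper name is mine):

```python
def from_twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's-complement signed integer."""
    value = int(bits, 2)
    # The MSB carries a negative weight: subtract 2^n if it is set
    return value - (1 << len(bits)) if bits[0] == "1" else value

print(from_twos_complement("101"))  # -3
print(from_twos_complement("011"))  # 3
```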
The advantages of this approach are:
- a single zero, which makes math operations simpler;
- a sign bit that, unlike before, actually takes part in the number’s value;
- easier conversion from positive numbers to negative ones;
- subtraction behaves like addition – very good for our full-adder circuits.
Converting from positive to negative
The algorithm for converting numbers from positive to negative and vice versa is simple. For a given binary number n, we proceed as follows:
- invert all bits;
- add 1 to the result.
Example: Convert 3 to -3
3 (10) = 011 (2)
# After inverting all bits we get:
100 (2)
# We then add 1
100 + 1 = 101 (2)
We started with 011₂ and ended up with 101₂, which is in fact -3 (look at the table). It works. You can try doing the same thing from negative to positive.
It’s worth noticing that the MSB always gets inverted: if it was 1, it turns into 0, and vice versa. If that doesn’t happen, the operation has failed – you were trying to get a number out of range.
In this 3-bit machine, that would happen if you tried to get +4 from -4. If you perform the operations on 100₂, you’ll end up with the same value, 100₂ – the most significant bit didn’t change. The conclusion is that we ran into an overflow.
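Here’s the invert-and-add-one negation on our 3-bit machine as a Python sketch (names are mine):

```python
def negate_3bit(bits: str) -> str:
    """Two's-complement negation: invert all bits, add 1, keep 3 bits."""
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    result = (int(inverted, 2) + 1) & 0b111  # the mask discards any carry-out
    return f"{result:03b}"

print(negate_3bit("011"))  # 101 (3 -> -3)
print(negate_3bit("100"))  # 100 (MSB unchanged: +4 doesn't fit, overflow)
```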
Math operations
The fascinating thing about two’s complement is how it turns subtraction into addition, and addition is what our chips are good at.
Actually, it makes sense. Performing the subtraction 3 - 2 is the same as the addition 3 + (-2) – note that 4 wouldn’t even fit in our 3-bit machine. Let’s try it.
0 1 1
1 1 0
_________
1 0 0 1 # Our machine is a 3-bit one, so we throw the carry-out (the extra MSB) away
The result is 001₂, which is indeed 1. So our operation went well – it works like a charm.
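In Python, the whole trick is one line – add the bit patterns and mask away the carry-out:

```python
# 3 - 2 as two's-complement addition on a 3-bit machine
result = (0b011 + 0b110) & 0b111  # 110 is -2; the mask drops the carry-out
print(f"{result:03b}")            # 001 -> 1, so 3 - 2 = 1
```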
Why does it work? We’ll discuss this in a future article.
Summary
We’re done. This article started with the classical example of 0.1 + 0.2 == 0.3 returning false, and we walked through some important concepts that explain why that happens. That’s the main idea of this article: critical thinking.
I want you to look at stuff like that and doubt it. I want you to feel uncomfortable for not knowing what’s going on. I hope you have enjoyed the things you’ve read here, and I hope you start using integers (or any high-level decimal structure) over float and double for monetary values.
Finally, if you are a COBOL developer and noticed some weird assumptions on my part, let me know so I can fix them.
That’s all for today. See ya.
References
- https://stackoverflow.com/questions/32805087/how-is-overflow-detected-in-twos-complement
- https://www.quora.com/How-does-the-CPU-know-we-are-using-ones-complement-or-twos-complement-for-representing-negative-numbers
- https://medium.com/the-technical-archaeologist/is-cobol-holding-you-hostage-with-math-5498c0eb428b