Be cool. Don’t use float/double for storing monetary values

A tour of how hardware deals with numbers and math

What you’re looking for

You’ve probably seen the joke that JavaScript doesn’t know math because of the expression below.

0.1 + 0.2 === 0.3

This expression evaluates to false. In fact, it returns false in many programming languages. I tested it in JavaScript, Python, Ruby, Elixir, and Erlang, and I got the same result – that’s not true for COBOL though.
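
In a Python session, for instance, the surprise looks like this:

>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False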

That’s the reason why you shouldn’t use float or double for storing monetary values (or any other kind of value where you need precision): floating point numbers have rounding errors, so we get these well-known but kind of weird behaviors. It’s not a bug in your favorite language; it’s the nature of computers. In this post, we’ll understand why it happens by looking at what’s under the hood.

You may not have enough time to dig into that right now. That’s okay, this section is intended to summarize it for you.

This behavior happens because not all fractional numbers can be represented in binary, hence in a machine. Let’s take 0.1 from this classic example. It’s not a real 0.1. Odds are high – due to the IEEE 754 standard – that it’s more like 0.10000000149....
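
You can inspect the stored value yourself. Here’s a quick Python sketch – note that Python floats are double precision, so the digits differ from the single-precision value quoted above, but the idea is the same:

from decimal import Decimal

# Decimal(0.1) reveals the exact binary value the float actually stores.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625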

On the other hand, every integer (within range) can be represented exactly in binary. So, as Martin Fowler suggests: make your application work with the most fundamental unit of a specific metric (e.g. cents for monetary values) and store values in that unit.

So, instead of storing 10.12, you store 1012 cents. You can perform all needed operations using integers, and when it comes to displaying on a screen, you format it.
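
A minimal sketch of the idea in Python – the helper name here is hypothetical, not from any particular library:

def to_cents(units, cents):
    # 10.12 becomes the integer 1012 -- no fractional binary involved.
    return units * 100 + cents

price = to_cents(10, 12)                    # 1012
total = price * 3                           # integer math is exact: 3036
print(f'{total // 100}.{total % 100:02d}')  # 30.36 -- formatted only for display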

That’s it. Use integers and you should be fine. If you were looking for that answer, there you go.

But this post goes beyond that, and if you keep reading, you’ll embark on a journey through how numbers are represented in computers and a little bit of how computers do math.

Stay with me, and let’s continue.

Notations used in this article

Since we’ll be switching between binary and decimal, we need to establish some notation to avoid ambiguity. All the binary numbers in this post (except the ones in IEEE 754) will be denoted as d₁d₂d₃...₂, where d₁, d₂, d₃, and so on are binary digits. Example: 1100₂.

We’ll also talk about scientific notation, and to denote the power operation we’ll use the notation AeB (A to the power of B), where A and B are numbers. Example: 10e4 (10 to the power of 4).

The IEEE 754 Standard

As mentioned in the previous section, the number you think is 0.1 might not be a real 0.1 due to the IEEE 754 standard.

It’s the dominant standard for representing floating point numbers in modern computers. It has two main variants: single and double precision.

In this post, we’ll discuss and give examples of single precision only, since it’s enough for understanding the standard – double precision is just a matter of more bits. So every time I refer to IEEE 754, I mean its single-precision variant.

IEEE 754 represents any single-precision floating point number as a sequence of 32 bits. These 32 bits are divided into:

  • 1-bit sign;
  • 8-bit exponent;
  • 23-bit mantissa.

Does it sound familiar somehow?

Scientific notation

The parallel we can draw here is with scientific notation, because that’s what the standard uses. You might recall that any number in scientific notation is written as a power of 10 and can be broken into three parts:

  • integer part;
  • fractional part;
  • exponent.

It also fulfills the following rules:

  • the integer part must be at least 1 and less than 10;
  • each time you move the point to the right, you have to decrement the exponent;
  • each time you move the point to the left, you increment the exponent.

The shape of a number in such a notation is shown below:


i.f x 10 ^ exp

i - integer part
f - fractional part
exp - exponent

Example: we want to write 123.4 in scientific notation.

In fact, this number can be written as 123.4 x 10e0. Since the integer part is greater than 9, it breaks the first rule. To satisfy it, we can move the point to the left twice, making the number look like 1.234 x 10e0.

We moved the point twice to the left, so we have to increment the exponent twice. Our final number will be 1.234 x 10e2.

If you got the idea, then you’re good. IEEE 754 is all about scientific notation*.

* in IEEE 754 we have a biased exponent

Computers, nevertheless, don’t use base-10 powers – they don’t have ten fingers on two hands. They don’t even have hands. They use base-2 powers instead.

It’s very similar though. The shape of a binary number in scientific notation is shown below.


1.f x 2 ^ exp

f - fractional
exp - exponent

So given the binary number 0.0011₂, we can represent it in scientific notation by moving the point to the right until we reach the pattern 1.f. In this case, we have to move it three times to the right, so our number in scientific notation will be 1.1₂ x 2e-3.

It’s worth noticing that the rules for the exponent are the same: we moved the point three times to the right, so we decremented the exponent three times.

Representing numbers in IEEE 754

We already know the representation layout and that the standard uses scientific notation, so it’s time to see some examples of how we can represent numbers in it.

The algorithm for getting a number into IEEE 754 is as follows:

  1. convert the number to the binary system;
  2. apply scientific notation;
  3. add the bias, which is 127, to the calculated exponent – we’ll discuss the bias later in this post;
  4. convert the sum from the last step into binary.

By doing that, you’ll already have:

  • the 23-bit mantissa that will be the fractional part of the number in scientific notation;
  • the 8-bit exponent, which is the sum of the bias and the exponent calculated by moving the point.

You just need the 1-bit sign, which you already have as well. This bit will be 0 if the number is positive and 1 otherwise.
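
If you want to see those three fields programmatically, here’s a small Python sketch using the standard struct module (single precision, big-endian; the function name is made up for illustration):

import struct

def ieee754_fields(x):
    # Pack as a single-precision float and read back the raw 32 bits.
    bits = int.from_bytes(struct.pack('>f', x), 'big')
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased
    mantissa = bits & 0x7FFFFF      # 23 bits, fractional part only
    return sign, exponent, mantissa

s, e, m = ieee754_fields(0.15625)
print(s, f'{e:08b}', f'{m:023b}')  # 0 01111100 01000000000000000000000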

Time to practice

Let’s take it easy and represent the decimal number 9.75₁₀ in the IEEE 754 standard.

First, we convert it into a binary number. As we’re working with a non-integer number, we have to follow two different approaches: one for the integer part of the number and another for the fractional part.

For the integer part, it’s trivial: we apply successive divisions by 2, collect the remainders, and build the binary number by reading the last quotient and the remainders in the reverse order they were produced.


9/2 -> remainder = 1; quotient = 4;
4/2 -> remainder = 0; quotient = 2;
2/2 -> remainder = 0; quotient = 1;

9 (10) = 1001 (2)

NOTE: Inside blocks, you’ll see the base denoted in parentheses: (10) means a decimal number, whereas (2) means a binary number.

To convert the fractional part, we repeatedly multiply it by two and take the integer part of the result as the next binary digit.


0.75 x 2 = 1.5 -> our binary digit is 1; the fractional part is still greater than 0, so we go to the next iteration.

0.5 x 2 = 1.0 -> 1 is our binary digit, and we stop since there are only zeroes left in the fractional part.

0.75 (10) = 0.11 (2)

So 9.75₁₀ is equal to 1001.11₂.
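
Here are both procedures in a single Python sketch – the function name is made up for illustration:

def to_binary(value, max_frac_bits=23):
    integer, fraction = int(value), value - int(value)
    # Integer part: successive divisions by 2, remainders read in reverse.
    int_bits = ''
    while integer > 0:
        integer, remainder = divmod(integer, 2)
        int_bits = str(remainder) + int_bits
    # Fractional part: multiply by 2, take the integer digit each time.
    frac_bits = ''
    while fraction > 0 and len(frac_bits) < max_frac_bits:
        fraction *= 2
        frac_bits += str(int(fraction))
        fraction -= int(fraction)
    return (int_bits or '0') + ('.' + frac_bits if frac_bits else '')

print(to_binary(9.75))  # 1001.11
print(to_binary(0.1))   # 0.00011001100110011001100 -- it never terminates; we truncate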

The next step is to turn this number into scientific notation:


1.00111 (2) x 2e3 // We moved the floating point three times to the left

Now we have to add the bias to our exponent. We have 3 + 127 = 130. Our final exponent in binary is 10000010.

The sign bit is 0 since we’re representing a positive number. So, the decimal number 9.75 is represented in IEEE 754 as:


0 10000010 00111000000000000000000

Note that only the fractional part goes in the mantissa. The number is in scientific notation, so we can just assume there’s a 1 in front of the point, and we have all 23 bits for expressing the fractional digits – we’ll discuss later how this makes it harder to represent zero.

Converting from IEEE 754

To get back the decimal number we just converted in the last section – and any other number represented in this standard – we can apply the following formula:


(-1)^s * (1 + f) * 2^(exp-127)

s - sign bit
f - mantissa
exp - exponent

NOTE: s, f, and exp must all be in decimal already.

To convert 0 10000010 00111000000000000000000 we just need to perform the conversions and apply the formula.


(-1)e0 x (1 + 0.21875) x 2e(130 - 127)
1 x (1.21875) x 2e3
1.21875 x 2e3
1.21875 x 8 = 9.75 # it worked!!
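
The same formula as a quick Python sketch (the function name is hypothetical):

def decode_ieee754(sign, exponent, mantissa):
    # (-1)^s * (1 + f) * 2^(exp - 127), where f = mantissa / 2^23
    fraction = mantissa / 2**23
    return (-1)**sign * (1 + fraction) * 2.0**(exponent - 127)

print(decode_ieee754(0, 0b10000010, 0b00111000000000000000000))  # 9.75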

What’s the issue with 0.1?

As mentioned at the beginning of this post, not all fractional numbers can be represented in binary. In fact, a fraction can be represented exactly in binary (with a finite number of digits) only if its denominator, in lowest terms, is a power of 2.

In the previous section, we converted 0.75₁₀ to binary and it worked fine. That’s because 0.75 is in fact 3/4, and 4 is a power of 2. That’s not the case for 0.1 though: it’s 1/10, and 10 is not a power of two.

The issue starts when we try to convert 0.1 to binary. We end up with a recurring number.


0.1 (10) = ~0.0001100110011001100...(2) # it goes forever repeating 1100

This recurring number is a problem since we have a limited amount of bits to represent it. But we can try our best.


0.000110011001100... in scientific notation is 1.10011001100110011001101 x 2e-4 # the recurring digits don't fit in 23 bits, so IEEE 754 rounds to the nearest representable value -- here the last bit rounds up

We moved the point four times to the right, so our exponent will be given by the following sum.

-4 + 127 = 123 (10) = 01111011 (2)

So 0.1₁₀ in IEEE 754 is 0 01111011 10011001100110011001101.

It seems okay, but when we convert it back to decimal we get:


1 * (1 + 0.60000002384185791016) * 2^(-4)
= 1.60000002384185791016 * 0.0625
= 0.100000001490116119384765625

There we go: our 0.1₁₀ is not what we expect! Even worse, errors like that propagate when we perform operations on floating point numbers.
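
You can watch the error accumulate with a quick Python demo (double precision here, but the effect is the same):

total = 0.0
for _ in range(10):
    total += 0.1    # each addition carries the representation error along
print(total)        # 0.9999999999999999
print(total == 1.0) # False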

Special values and the bias

Scientific notation is the basis of IEEE 754, but unlike the notation we know, it has a bias – something we don’t see in high school.

But the bias is a good idea. It allows us to represent a range of exponents from negative to positive while keeping the exponent field unsigned, which makes operations simpler.

If we had a sign bit in the exponent instead, we would need an extra step to get the exponent’s value, and we’d lose a bit to it.

127 is a good choice because it balances the range: the minimum exponent is -126 and the maximum is 127.

NOTE: exponent 0 (00000000₂) and exponent 255 (11111111₂) have special meanings.

IEEE 754 also has what can be seen as a downside: two zeroes (+0 and -0) – even though there are materials discussing why a signed zero might be important, for example when working with complex numbers.

The other special values are:

  • positive infinity, for representing overflow in the positive direction;
  • negative infinity, for representing overflow in the negative direction;
  • NaN, for representing results that aren’t meaningful numbers (e.g. 0/0);
  • zeroes.

All these special values are encoded with specific bit patterns using those reserved exponents: an all-ones exponent with a zero mantissa means infinity, an all-ones exponent with a nonzero mantissa means NaN, and an all-zeroes exponent with a zero mantissa means zero (the sign bit gives us +0 and -0).
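
We can peek at those patterns in Python with the same struct trick from earlier (a sketch; the exact NaN mantissa payload may vary by platform):

import struct

def show_bits(x):
    b = int.from_bytes(struct.pack('>f', x), 'big')
    s = f'{b:032b}'
    return f'{s[0]} {s[1:9]} {s[9:]}'  # sign, exponent, mantissa

print(show_bits(float('inf')))   # 0 11111111 00000000000000000000000
print(show_bits(float('-inf')))  # 1 11111111 00000000000000000000000
print(show_bits(float('nan')))   # 0 11111111 10000000000000000000000 (typically)
print(show_bits(-0.0))           # 1 00000000 00000000000000000000000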

How COBOL handles it

Preparing this post was a great opportunity to validate some things. I’ve heard many times that banks use COBOL because it has high precision, it’s good at math, and so on.

Out of curiosity, I ran a simple COBOL program to perform 0.1 + 0.2 == 0.3. Well, I’m not a COBOL developer, so I asked ChatGPT to write it.

IDENTIFICATION DIVISION.
PROGRAM-ID. CHECK-SUM.

DATA DIVISION.
WORKING-STORAGE SECTION.
01  NUM1        PIC 9V9(1) VALUE 0.1.
01  NUM2        PIC 9V9(1) VALUE 0.2.
01  RESULT      PIC 9V9(1) VALUE 0.
01  ERROR-MSG   PIC X(40) VALUE SPACES.

PROCEDURE DIVISION.
MAIN-PROCEDURE.
    COMPUTE RESULT = NUM1 + NUM2.
    IF RESULT = 0.3
        DISPLAY "0.1 + 0.2 is equal to 0.3."
    ELSE
        MOVE "0.1 + 0.2 is not equal to 0.3." TO ERROR-MSG
        DISPLAY ERROR-MSG
    END-IF
    STOP RUN.

You can run this code online at https://www.jdoodle.com/execute-cobol-online/. Also, blame ChatGPT if it doesn’t work.

The console shows 0.1 + 0.2 is equal to 0.3. COBOL really knows math and can handle the 0.1 + 0.2 sum.

The point is: there are other ways of representing fractional numbers, and COBOL does it using fixed-point numbers rather than floating-point ones. In summary, there’s a defined number of digits for representing the number, and the point stays in the same position forever.

I’ve also found that COBOL performs a kind of decimal-based operation, so under the hood it’s effectively doing something like 10 + 20 == 30 with scaled integers.
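
Other languages expose decimal-based arithmetic too. For instance, Python’s standard decimal module (constructing from strings – otherwise you’d inherit the float error):

from decimal import Decimal

# Exact decimal arithmetic -- no binary fractions involved.
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True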

I’ve also found a great post discussing the migration of a system from COBOL to Java – links in the references section – which didn’t work out, at least not in the situation described in that post.

The problem is not Java or COBOL. The whole point is how they handle things.

How integers solve the problem

IEEE 754 is great for fractional numbers, but it’s not suitable for integers – it would be overkill. We can do something simpler, since integers can easily be converted to binary. We only need a way to represent negative numbers.

Assuming we have hardware that can handle 3-bit numbers (for simplicity’s sake), how can we represent negatives?

We can take inspiration from the floating point standard discussed here and use a sign bit. We would have:

Binary  Decimal
000      0
001      1
010      2
011      3
100     -0
101     -1
110     -2
111     -3

The MSB (most significant bit) is used just for identifying the sign. That seems reasonable, but we have some downsides:

  • two zeroes – this makes operations more complex, since we would need to check signs even in simple operations;
  • we’re wasting a precious bit.

A note on how computers do math

Computers are just a bunch of circuits we can represent as logic gates.

Addition is a trivial task that can be easily achieved using the full-adder circuit, which has a carry-in and a carry-out bit. Thus, to produce a multi-bit sum, we can just chain a few full-adders.
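
A full-adder is simple enough to sketch with Python’s bitwise operators (the names here are illustrative):

def full_adder(a, b, carry_in):
    # sum = a XOR b XOR carry_in; carry_out = majority(a, b, carry_in)
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def add_3bit(x, y):
    # Ripple-carry: chain full-adders from the least significant bit up.
    carry, out = 0, []
    for a, b in zip(reversed(x), reversed(y)):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return list(reversed(out))  # the final carry-out is discarded (3-bit machine)

print(add_3bit([0, 1, 0], [0, 0, 1]))  # [0, 1, 1] -> 2 + 1 = 3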

Subtracting is a little different though. Instead of carrying digits, we have to borrow them from the next position to perform the operation.

In decimal, it’s the same effect as performing 100 - 1: we have to borrow from the next columns to perform the operation.

Borrow-based operations like that are not as convenient for circuits.

Let’s try something more efficient.

Two’s complement to the rescue

To begin, let’s take a look at what two’s complement is.

Two’s complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with an equivalent negative value, using the binary digit with the greatest place value as the sign to indicate whether the binary number is positive or negative. It is used in computer science as the most common method of representing signed (positive, negative, and zero) integers on computers, and more generally, fixed point binary values. When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.

Source: Wikipedia

This definition doesn’t clarify a lot, huh? Let’s break it down into smaller pieces.

The main statement: "It’s used in computer science as the most common method of representing signed integers on computers".

That makes a lot of sense since we’re trying to find a better solution for representing signed integers (see the last section).

It also says: "When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.".

That might be confusing. Will we still have a sign bit?

The answer is no, because two’s complement does it differently. Let’s take a look at how we can represent integers in our 3-bit machine using two’s complement.

Binary  Decimal
000      0
001      1
010      2
011      3
100     -4
101     -3
110     -2
111     -1

We still use the MSB as the sign bit. But unlike the previous proposal, here it plays a role in the calculations.

The value of any number in our 3-bit two’s complement machine is given by:


-2^2 * d1 + 2^1 * d2 + 2^0 * d3

Where d1 is the most significant bit. Thus, when it’s 1, it makes the number negative.
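
Here’s that weighting as a small Python sketch (the function name is made up):

def twos_complement_value(bits):
    # The MSB carries a negative weight, -2^(n-1); the other bits are positive.
    n = len(bits)
    weights = [-(2**(n - 1))] + [2**i for i in range(n - 2, -1, -1)]
    return sum(w * b for w, b in zip(weights, bits))

for b in ([0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 1]):
    print(b, '->', twos_complement_value(b))  # 3, -4, -3, -1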

The advantages of this approach are:

  • a single zero, which makes math operations simpler;
  • a sign bit that, unlike before, actually takes part in the number’s value;
  • easy conversion between positive and negative numbers;
  • subtraction acts like addition – very good for our full-adder circuits.

Converting from positive to negative

The algorithm for converting numbers from positive to negative and vice-versa is simple. Let’s take a look.

For a given binary number n, we can proceed as follows:

  • invert all bits;
  • add 1 to the result.

Example: Convert 3 to -3


3 (10) = 011 (2)

# After inverting all bits we get:

100 (2)

# We then add 1

100 + 1 = 101 (2)

We started with 011₂ and ended up with 101₂, which is in fact -3 (look at the table). It works. You can try doing the same thing from negative to positive.

It’s worth noticing that the MSB always gets inverted (zero is the only exception: it maps back to itself). If it was 1, it turns into 0, and vice versa. If that doesn’t happen for a nonzero number, the operation has failed: you’re trying to get a number that’s out of range.

In this 3-bit machine, that would happen if you tried to get a +4 from -4. If we perform the operations on 100₂, we end up with the same value, 100₂. The most significant bit didn’t change. The conclusion is: we ran into an overflow.
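
The whole procedure fits in a couple of lines of Python (a sketch for our hypothetical 3-bit machine):

def negate_3bit(x):
    # Invert all bits, add 1, and keep only the lowest 3 bits.
    return (~x + 1) & 0b111

print(f'{negate_3bit(0b011):03b}')  # 101 -> -3
print(f'{negate_3bit(0b100):03b}')  # 100 -> MSB unchanged: overflow, -4 has no +4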

Math operations

The fascinating thing about two’s complement is how it turns subtraction into addition – and addition is exactly what our chips are good at.

Actually, it makes sense. Performing the subtraction 3 - 2 is the same as performing the addition 3 + (-2). Let’s try it, using 011₂ for 3 and 110₂ for -2 (see the table).


   0 1 1
   1 1 0
_________
 1 0 0 1 # Our machine is a 3-bit one, so we throw the carry-out (this result's MSB) away

The result is 001₂, which is indeed 1. So our operation went well. It works like a charm.
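
We can double-check it with plain integer bit operations in Python, mirroring our 3-bit machine:

x = 0b011                  # 3
y = (~0b010 + 1) & 0b111   # -2 in two's complement -> 110
result = (x + y) & 0b111   # add, then discard the carry-out
print(f'{result:03b}')     # 001 -> 1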

Why does it work? We’ll discuss this in a future article.

Summary

We’re done. This article started with the classic example of 0.1 + 0.2 == 0.3 returning false, and we walked through some important concepts that tell us why it happens. And that’s the main idea of this article: critical thinking.

I want you to look at stuff like that and question it. I want you to feel uncomfortable for not knowing what’s going on. I hope you’ve enjoyed what you’ve read here, and I hope you start using integers (or any high-level decimal structure) over float and double for monetary values.

Finally, if you are a COBOL developer and noticed some weird assumptions from my side, let me know so I can fix this.

That’s all for today. See ya.

References
