Const Correctness (for non C++ programmers)

What we can learn from C++ about dealing with mutability

Intro

Even though nowadays I mostly work with Javascript/Typescript, I’ve spent a long time programming with C++ and as you’ve probably heard already, C++ is an absolute beast of a language, full of interesting (and sometimes dangerous) constructs and quirks.

And while there are lots of things that I don’t miss, like dealing with pointers, memory errors, cryptic compiler messages and a whole set of intricacies, there are other functionalities that I believe would be a great addition to other programming languages.

Today I’m going to talk about one of them, const correctness.

In C++ there’s this const keyword that is used to prevent mutations in certain contexts and const correctness is about using it systematically.

Some other languages also have some kind of mechanism that is used to prevent data from being mutated, like JS which also has const, Typescript that on top of const has readonly, Java that has final, among others…

But the way const works in C++ is kinda unique, mainly for two reasons:

  1. Constness is checked during compile time, so it doesn’t have any kind of performance overhead.
  2. It can be applied not only to variables, but also to function arguments, return values and even class methods.

These characteristics enable us not only to express, but to actively enforce (with the aid of the compiler), which things can or cannot be mutated and in which context.
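As a rough sketch of the first point: constness is part of a variable's type, so the compiler can inspect and enforce it before the program ever runs. The names `maxRetries`, `attempts`, `remainingAttempts` and `example` are made up for illustration; `static_assert` and `std::is_const` are standard C++11 facilities.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical variables, for illustration only.
const int maxRetries = 3;
int attempts = 1;

// Both checks are evaluated by the compiler and generate no code.
static_assert(std::is_const<decltype(maxRetries)>::value,
              "maxRetries is const");
static_assert(!std::is_const<decltype(attempts)>::value,
              "attempts is not const");

// Reading a const variable is always allowed.
int remainingAttempts()
{
  return maxRetries - attempts;
}

void example()
{
  assert(remainingAttempts() == 2);

  attempts = 3;      // Fine: attempts is mutable.
  // maxRetries = 5; // Compiler error: maxRetries is const.

  assert(remainingAttempts() == 0);
}
```

If either `static_assert` were false, the program would simply fail to compile, which is the same mechanism that rejects the commented-out assignment.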

Primer on C++

(If you already know C++ you may skip this section)

As I’m assuming that the reader has little to no exposure to C++, I’ll make a very brief introduction to the language so that we are all on the same page.

But before we start, a short disclaimer: this post is aimed at non C++ practitioners, so I’ll intentionally avoid talking about pointers as much as I can, since they are mainly a C/C++ feature and wouldn’t really contribute much to the discussion (the same goes for move semantics).

C++ is part of the C-family, just like Java, C#, JS, Objective-C, PHP and others, so if you are familiar with any of these languages, C++ syntax will feel familiar to you.

One important thing about C++ is that it is statically typed, meaning that every variable must be declared with a type and the correct usage of types is checked at compile time.

This is how we declare a variable in C++:

// We're saying that there is a variable named
// `value` whose type is `int`.
int value;

// There is no `let`, or `var` and
// the type always comes before the
// identifier, so the pattern is:
// Type Identifier

These are the main data types:

int integer;
char character;
float floatingPoint;
double doublePrecision;
bool boolean;

We can create records using the struct keyword:

// Here we're creating a new data type
struct Point
{
  float x;
  float y;
};

int main()
{
  // We can declare a variable of this type
  // just like we did with primitives
  Point p;

  // And we can access its members with
  // dot notation.
  p.x = 10;
  p.y = 20;
}

Functions:

int timesTwo(int value)
{
  return value * 2;
}

// The pattern is:
// ReturnType FunctionIdentifier (ParamType ParamIdentifier [,...])

Classes:

class Point
{
public:
  float getX()
  {
    // The `this` keyword refers to the
    // object instance, but to access
    // the class members using `this`, we need
    // to use `->` instead of `.`.
    return this->x;
  }

  float getY()
  {
    return this->y;
  }

  void setX(float value)
  {
    this->x = value;
  }

  void setY(float value)
  {
    this->y = value;
  }

private:
  float x;
  float y;
};

Last but not least, it’s really important to mention that the assignment operator (AKA =) probably works in a slightly different way than what you’d expect, especially if you come from a language with a Garbage Collector.

Usually, in garbage collected languages, variables can store either a value or a reference depending on their data type, where primitives are stored by value and everything else is stored as a reference.

In C++, however, everything is stored as a value, regardless of its data type:

// Needed for std::cout and std::endl.
#include <iostream>

struct Point
{
  float x;
  float y;
};

int main()
{
  Point p1;
  p1.x = 10;
  p1.y = 10;

  Point p2;
  p2.x = 20;
  p2.y = 20;

  // When we assign p2 to p1, we are
  // copying each member of p2 to its
  // corresponding member in p1.
  // So this assignment DOES NOT make
  // p1 refer to the same object/memory location
  // as p2.
  // p1 still refers to the same object it did
  // previously, but now its VALUES are
  // the same as p2's.
  p1 = p2;

  // This is how we output stuff
  // to the console, don't mind the
  // weird syntax.
  std::cout << p1.x << std::endl;
  std::cout << p1.y << std::endl;

  //Logs:
  // 20
  // 20

  // We could even check that
  // p1 and p2 refer to different addresses
  // and that those remain the same after
  // assignment, by using the "&" operator,
  // which gives us the numeric address
  // of the stored value.

  std::cout << &p1 << std::endl;
  std::cout << &p2 << std::endl;

  // This would log, for example:
  // 0x7ffd634f5dd8
  // 0x7ffd634f5de0

  // Also, as p1 and p2 refer to different
  // objects, changing one does not affect
  // the other, even after assignment.

  p2.x = 30;
  p2.y = 30;

  std::cout << p1.x << std::endl;
  std::cout << p1.y << std::endl;
  //Logs:
  // 20
  // 20
}

And this is all the information we need to start our discussion.

Use Cases

I believe that the best way of grasping what const correctness is, is to see it in action by examining use cases.

Const Primitives

Our first use case is preventing a primitive from being mutated:

int main()
{
  const int value = 10;

  // Compiler error!
  value = 20;
}

This is a very simple use case and the same thing can be achieved using JS/TS const or Java final, for example.

Const Records

Here is where things start to get interesting, because applying const to an object not only prevents us from using the assignment operator, but also propagates to all of its members, which prevents us from mutating them as well.

struct Point
{
  float x;
  float y;
};

int main()
{
  // We use this bracket notation
  // to initialize structs at 
  // the same time we're declaring them
  const Point obj1 = { 10, 10 };
  const Point obj2 = { 20, 20 };

  // Compiler error!
  obj1 = obj2;

  // Also compiler error!
  obj1.x = 20;
}

In JS/TS, where variables that hold objects actually store references to them, the const keyword only prevents us from reassigning a variable to another object, but it still allows us to modify the object's members.

const obj = {
  a: 10
};

// Error
obj = {
  a: 20
};

// Legal
obj.a = 20;

Const Arguments/Parameters

Before we talk about applying const to function parameters, we first need to talk about the difference between pass by value and pass by reference.

In C++, functions are pass by value by default, which means that parameters do not refer to the actual data that was passed as arguments, but rather to copies of it that are made right before the function executes.

int timesTwo(int value)
{
  // Here `value` is a copy of the 
  // data that is passed as an argument
  // to this function, so modifying it
  // has no effect on the "original" data.
  value *= 2;
  return value;
}

int main()
{
  int value = 10;
  timesTwo(value);

  std::cout << value << std::endl;
  //Logs
  // 10
}

But we can also use pass by reference by prepending the parameter identifiers with the & (ampersand) operator.

When we pass by reference, the function arguments will now refer to the actual data that was passed to them.

void timesTwo(int &value)
{
  // Here `value` refers to the
  // actual data that was passed
  // as an argument, so modifying it
  // DOES AFFECT the "original" data.
  value *= 2;
}

int main()
{
  int value = 10;
  timesTwo(value);

  std::cout << value << std::endl;
  // Logs
  // 20
}
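Both rules apply to objects as well. When an object is passed by value, the function receives a copy:

struct Point
{
  float x;
  float y;
};

Point scale(Point point, float factor)
{
  // `point` is a copy, so the caller's
  // object is left untouched.
  point.x *= factor;
  point.y *= factor;
  return point;
}

int main()
{
  Point p;
  p.x = 10;
  p.y = 10;

  scale(p, 2);

  std::cout << p.x << std::endl;
  std::cout << p.y << std::endl;
  // Logs
  // 10
  // 10
}

Passing by reference avoids the copy, but it also lets the function mutate the caller's data. Marking the reference parameter as const keeps the first property and removes the second:

void print(const Point &point)
{
  // Ok, reading is allowed.
  std::cout << point.x << std::endl;

  // Compiler error!
  point.x = 0;
}

Const Methods

Class methods can also be marked as const, by placing the keyword after the parameter list. A const method is not allowed to modify the object it is called on:

class Point
{
public:
  float getX() const
  {
    return this->x;
  }

  float getY() const
  {
    return this->y;
  }

  void setX(float value)
  {
    this->x = value;
  }

  void setY(float value)
  {
    this->y = value;
  }

private:
  float x;
  float y;
};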

One interesting consequence of const methods is that whenever an instance of a class is const in a given context, we’re only able to call its const methods. Otherwise, we’d be able to circumvent its constness.

int main()
{
  // A const object must be initialized when it
  // is declared; `{}` zero-initializes it.
  const Point p{};

  // Compiler error!
  p.setX(10);

  // Ok!
  p.getX();
}
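The same restriction applies when an object reaches a function through a parameter declared as `const Point &`, that is, a reference through which the object cannot be modified. The sketch below assumes a class shaped like the `Point` above, with a constructor added for brevity; `manhattanLength` and `example` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

class Point
{
public:
  // Hypothetical constructor: copies the arguments into the members.
  Point(float x, float y) : x(x), y(y) {}

  // Const methods: callable on const and non-const objects alike.
  float getX() const { return this->x; }
  float getY() const { return this->y; }

  // Non-const method: only callable on non-const objects.
  void setX(float value) { this->x = value; }

private:
  float x;
  float y;
};

// `point` is a const reference, so only const methods may be called on it.
float manhattanLength(const Point &point)
{
  // point.setX(0); // Compiler error: setX is not a const method.
  return std::fabs(point.getX()) + std::fabs(point.getY());
}

void example()
{
  Point p(3, -4);
  assert(manhattanLength(p) == 7); // p is passed without being copied

  p.setX(1); // Fine: p itself is not const.
  assert(manhattanLength(p) == 5);
}
```

Note that `p` is not const in `example`; it is only seen as const inside `manhattanLength`, which is what "const in a given context" means here.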

Const Return Value

In much the same way we’re able to apply const to function parameters, we’re also able to apply it to the function’s return value.

To understand why and how this may be useful, we first need to mention that, just as there is pass by value and pass by reference, there is also return by value and return by reference.

Whenever we return by value we’re making a copy of whatever we’re returning inside the function body, which is the default behavior.

Returning by reference, on the other hand, yields the very same object returned in the function’s body.

Usually that is a bad idea. Local variables declared inside a function body live on the stack, so they are destroyed as soon as the function returns. If you return a reference to one of them, you’ll be returning a reference to an object that is already dead.

int &timesTwo(int value)
{
  int doubledValue = value * 2;

  // By the time the function returns,
  // doubledValue won't exist anymore,
  // so returning a reference to it
  // is a mistake, and most compilers
  // will warn about it.
  return doubledValue;
}

However, returning by reference does make sense if whatever you’re returning outlives the function scope, like if you’re returning an attribute from within an object.

class Point
{
public:
  float &getX()
  {
    return this->x;
  }

  float &getY()
  {
    return this->y;
  }

private:
  float x;
  float y;
};
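A usage sketch may help here. It assumes that `Point` also gets a constructor (not part of the class above) so the object can be initialized, and `example` is a hypothetical name. Because `getX` returns a reference to the attribute itself, the caller can both read and write through it.

```cpp
#include <cassert>

class Point
{
public:
  // Hypothetical constructor, added so the example can initialize the object.
  Point(float x, float y) : x(x), y(y) {}

  float &getX() { return this->x; }
  float &getY() { return this->y; }

private:
  float x;
  float y;
};

void example()
{
  Point p(1, 2);

  // `xRef` is another name for p's private `x`, not a copy of it.
  float &xRef = p.getX();
  xRef = 10;
  assert(p.getX() == 10);

  // The returned reference can even be assigned to directly.
  p.getY() = 20;
  assert(p.getY() == 20);

  // Storing the result in a plain `float` makes a copy instead,
  // so changing the copy leaves the object untouched.
  float yCopy = p.getY();
  yCopy = 0;
  assert(p.getY() == 20);
}
```

In other words, a non-const reference return hands callers full write access to a private attribute, which is the problem a const reference return addresses.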

Using const with return by value usually makes no sense, as the returned data is a copy created solely to hold the return value of the called function. It does make sense when paired with return by reference, though: we avoid expensive copying while still preventing callers from making changes to the referenced data.

class Container
{
public:
  // Provides readonly access to `content` without
  // the overhead of making a copy.
  // Notice that we also mark the method as const,
  // as even though we're returning a reference
  // to the object's internal attribute, it 
  // is a const reference.
  const VeryHeavyObject &getContent() const
  {
    return this->content;
  }

private:
  VeryHeavyObject content;
};
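One way to check that no copy is made is to compare addresses with the `&` operator shown in the primer. This is only a sketch: the post leaves `VeryHeavyObject` unspecified, so the definition below is a stand-in assumed to be costly to copy, and `example` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the post's unspecified type; assumed to be costly to copy.
struct VeryHeavyObject
{
  std::vector<int> data = std::vector<int>(1000000, 0);
};

class Container
{
public:
  const VeryHeavyObject &getContent() const { return this->content; }

private:
  VeryHeavyObject content;
};

void example()
{
  Container container;

  // Both calls refer to the very same object: no copy is made.
  const VeryHeavyObject &first = container.getContent();
  const VeryHeavyObject &second = container.getContent();
  assert(&first == &second);

  // Reading through the const reference is fine...
  assert(first.data.size() == 1000000);
  // ...but writing is rejected at compile time:
  // first.data[0] = 42; // Compiler error!

  // Dropping the `&` on the caller's side copies all the data.
  VeryHeavyObject copy = container.getContent();
  assert(&copy != &first);
}
```

The last step shows that the caller still decides whether to keep a reference or take a copy; the const reference return only guarantees that the original cannot be changed through it.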

Obviously this approach has some caveats, like leaking the class implementation, but we won’t delve into such discussions as this is out of scope for this post.

Const Propagation

Though we’ve already seen how const propagates through objects/class members, I’d like to emphasize this point as it’s one of the key concepts in const correctness.

Whenever you have an object that is marked as const, be it in its declaration, or due to being passed as a const reference, or being returned as a const reference, or in the form of this inside a const method, there are two things that happen:

  1. All of its attributes also become const.
  2. Only methods marked as const are accessible.

Which means that const is propagated throughout the entire object chain:

// Each struct must be defined before
// it is used as a member type.
struct C
{
  int attr;
};

struct B
{
  C c;
};

struct A
{
  B b;
};

void foo(const A &a)
{
  // Compiler error!
  // As `a` is const, all of its
  // attributes are also const,
  // so `a.b` is const as well,
  // which means that `a.b.c` is also const,
  // and so is `a.b.c.attr`.
  a.b.c.attr = 10;
}
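The propagation can also be observed without triggering an error, by asking the compiler for the type of the nested attribute. The sketch below reuses the same three structs; `readAttr`, `writeAttr` and `example` are hypothetical names, and the type inspection relies on the standard `decltype`, `std::remove_reference` and `std::is_const` facilities.

```cpp
#include <cassert>
#include <type_traits>

struct C { int attr; };
struct B { C c; };
struct A { B b; };

int readAttr(const A &a)
{
  // decltype((expr)) yields the type of the expression as a reference;
  // after removing the reference we can ask whether it is const.
  using AttrType = std::remove_reference<decltype((a.b.c.attr))>::type;
  static_assert(std::is_const<AttrType>::value,
                "attr is const when reached through a const A");

  return a.b.c.attr; // Reading is still allowed.
}

int writeAttr(A &a, int value)
{
  // Through a non-const reference, the same attribute is not const.
  using AttrType = std::remove_reference<decltype((a.b.c.attr))>::type;
  static_assert(!std::is_const<AttrType>::value,
                "attr is mutable when reached through a non-const A");

  a.b.c.attr = value;
  return a.b.c.attr;
}

void example()
{
  A a{}; // `{}` zero-initializes every member.
  assert(writeAttr(a, 10) == 10);
  assert(readAttr(a) == 10);
}
```

The struct definitions are identical in both functions; only the way `a` is received changes, and that alone decides whether `attr` can be written.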

Conclusion

C++ const is a very powerful keyword that allows us to express lots of different semantics regarding immutability:

  1. Declaring immutable variables:

// Can never be changed.
const Class obj;

  2. Protecting callers’ data against changes while avoiding unnecessary copying:

// Signals that obj won't be mutated.
void foo(const Class &obj)
{
  // Do something
}

  3. Protecting callee’s data against changes while avoiding unnecessary copying:

class Class
{
public:
  const Data &getData() const
  {
    return this->data;
  }

private:
  Data data;
};

  4. Informing clients which operations mutate the object and which don’t:

class Class
{
public:
  void modifyWithMutation(int val)
  {
    this->val = val;
  }

  // Returns a modified copy and leaves
  // the original object untouched.
  Class modifyImmutably(int val) const
  {
    Class newObject;
    newObject.val = val;

    return newObject;
  }

private:
  int val;
};

And all of these semantics are enforced by the compiler at compile time!

I’m a huge fan of const correctness and I wish more languages had it. Even though that is not the case, I think it’s still useful to know and understand the concept, as it makes us think about how we deal with mutability and in which contexts data may or may not be mutated.

As I mentioned earlier, I’m mostly working with Javascript/Typescript nowadays, which does not have const correctness (Typescript has readonly, but it doesn’t guarantee correctness). What I do to remedy that is, whenever the codebase allows it, to document the constness of things, especially of function parameters and class methods.

This way, even though I don’t have the compiler to help me enforce constness, I can, through discipline, achieve const correctness.

Further Reading

https://isocpp.org/wiki/faq/const-correctness
https://www.cprogramming.com/tutorial/const_correctness.html
