Data Bugs on Rails

Bugs are commonplace in any kind of software system. Most applications contain plenty of them, for more reasons than a single blog post could ever account for.

But what can possibly go wrong when code looks good, behaves correctly and is thoroughly tested? One possible answer is: data bugs.

Applications are usually a mixture of code and data, and the latter may very well come from external sources such as databases. By Murphy’s law, things can and will go wrong when code unexpectedly deals with bad data.

A common type of data bug

Let’s examine a common type of data bug that happens in Rails applications, though it could just as well happen in any other language or framework.

Suppose you have the following controller code which implements login functionality, nothing too fancy:

class SessionsController < ApplicationController
  def new
  end

  def create
    account = Account.find_by(email: params[:email])

    if authenticate(account, params[:password])
      flash[:notice] = t('sessions.login_success')
      redirect_to dashboard_path
    else
      flash[:error] = t('sessions.login_error')
      render :new
    end
  end
end

This is pretty standard Rails code, and there’s nothing particularly outstanding about it, but it does the job well.

Now a new requirement comes in: you must store the last remote IP whenever a user logs in. “That’s easy!”, you might say. Adding one more line to your simple create session routine will do the job:

def create
  account = Account.find_by(email: params[:email])

  if authenticate(account, params[:password])
    account.update! last_ip: request.remote_ip

    flash[:notice] = t('sessions.login_success')
    redirect_to dashboard_path
  else
    flash[:error] = t('sessions.login_error')
    render :new
  end
end

You wrote tests for all of this code, and your new feature, although very simple, looks good. You had the best of intentions when you chose Active Record’s update! method, because it raises an exception when something goes wrong. That seemingly fits like a glove, because you want your routine to fail in exceptional situations, right?

When it all goes wrong

Later that same day your feature gets deployed, and what you did not anticipate ends up happening. Weird exceptions immediately begin to pop up in your error tracking application. You take a good look at the error messages, and one of them says:

ActiveRecord::RecordInvalid: Validation failed: First name can’t be blank

You take a deep breath and ask yourself: What’s going on?

It turns out the answer is very simple: one of the users who attempted to log in does not have a first name.

How did this ever happen? That question, in turn, does not have such a simple answer.

What about Rails validations?

Your app implements presence and length validations on the first_name field, so how did bad data slip through?

class Account < ApplicationRecord
  validates :first_name, presence: true, length: { minimum: 3 }
end

Unfortunately, Rails validations are not reliable gatekeepers and do not guarantee the absence of bad data in your application, especially if it’s been running for a considerable period of time and its database holds huge volumes of data.

Data bugs can bite you in the ass, because it’s unlikely that your regular tests will ever catch them. Additionally, writing a test for a failure case that should not exist adds pollution, and feels like doing something wrong.

For the record, there are many ways in which Rails validations can be bypassed:

  • Rails itself allows circumventing validations with methods such as update_column. Active Record’s API is quite rich, and designed to serve all sorts of needs; I’d venture to say that bypassing validations is a dubious one.

  • Rails does not prevent other apps from connecting to your database and making unintended changes.

  • A DBA or even yourself can still connect to the database and intentionally or unintentionally promote a bad change.

And while you may think that none of these bullet points will ever apply to your application, this is something that can easily slip out of your control, especially if your app has been out there for a while and has lots of users, developers and maintainers.

Rails validations can make for a great user experience, but they definitely don’t avoid these sorts of problems.

The easy but sloppy hot fix

The easy but sloppy hot fix to get your feature up and running in production is the following:

account.update_column :last_ip, request.remote_ip

That works, essentially because the update_column method bypasses validations and callbacks. This code does solve your feature’s problem, but does it solve your data problem as well?

As you may have already guessed, the answer is no. All sorts of weird bugs are potentially out in the wild, as long as you keep letting invalid data into your database.
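To make the contrast between the two methods concrete outside of a Rails app, here’s a plain-Ruby stand-in. FakeAccount is entirely hypothetical; it merely mimics Active Record’s contract, where update! validates and raises while update_column writes straight through:

```ruby
# Hypothetical stand-in mimicking Active Record's contract:
# update! validates and raises, update_column skips validation.
class FakeAccount
  attr_accessor :first_name, :last_ip

  def valid?
    !first_name.nil? && first_name.length >= 3
  end

  # Like update!: assigns attributes, then raises if the record is invalid.
  def update!(attrs)
    attrs.each { |name, value| public_send("#{name}=", value) }
    raise "Validation failed: First name can't be blank" unless valid?
    true
  end

  # Like update_column: writes directly, never consulting valid?.
  def update_column(name, value)
    public_send("#{name}=", value)
    true
  end
end

account = FakeAccount.new                       # bad data: no first_name
account.update_column(:last_ip, "203.0.113.7")  # succeeds despite bad data
begin
  account.update!(last_ip: "203.0.113.7")       # raises, as in production
rescue RuntimeError => e
  puts e.message  # prints "Validation failed: First name can't be blank"
end
```

The hot fix works for the same reason the stand-in’s update_column does: the write never consults the validations at all.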

The hard fix

With validations at the database level, no one will ever get to mess with your data, no matter how hard they try. That’s essentially nipping the problem in the bud.

Only a few RDBMSs provide decent validation capabilities, and one of them is PostgreSQL. Although MySQL is a common choice for information systems, it does not have good support in that regard. There’s no reason not to use PostgreSQL in its place, and there are plenty of other arguments for it that I won’t cover here.

The hard fix to your bug begins with updating all the first names in your accounts table with valid, compliant data. For instance, you can write a script or Rake task that populates blank or NULL fields with a default placeholder value. You must also take into account that first_name requires at least 3 characters to be considered valid.
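To pin down exactly which rows need fixing, the criteria can be expressed as a plain-Ruby predicate. This is a hypothetical helper, mirroring the WHERE clause of the SQL below: nil or shorter than 3 characters counts as invalid.

```ruby
# Mirrors the backfill criteria: NULL/nil or fewer than 3 characters.
def needs_first_name_fix?(first_name)
  first_name.nil? || first_name.length < 3
end

needs_first_name_fix?(nil)      # => true
needs_first_name_fix?("")       # => true
needs_first_name_fix?("Al")     # => true
needs_first_name_fix?("Alice")  # => false
```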

You’ll be better off with a SQL command, for it’s much faster and more efficient than retrieving and updating Active Record objects in batches:

class FixAccountsFirstNameWithValidData < ActiveRecord::Migration
  def up
    execute %{
      UPDATE accounts
        SET first_name = 'Missing first name'
        WHERE first_name IS NULL OR LENGTH(first_name) < 3
    }
  end

  def down
    fail ActiveRecord::IrreversibleMigration
  end
end

In a real world application, you would probably take additional precautions when running this script, such as warning users and asking them to edit their first names.

After that step you can add a NOT NULL constraint to your first_name field, if it doesn’t have one already. These kinds of constraints greatly simplify your data model and queries, and also guarantee that you’ll always have first_name values as strings. Your database migration will probably end up looking like this:

class AddNotNullConstraintToAccountFirstName < ActiveRecord::Migration
  def change
    change_column_null :accounts, :first_name, false
  end
end

And this is the equivalent SQL code:

ALTER TABLE accounts ALTER COLUMN first_name SET NOT NULL;

This really guarantees your first_name column won’t ever be NULL, and avoids headaches down the road.

Check constraints

But wait, you’re not done yet! Now you can leverage a great PostgreSQL feature called check constraints, and make sure first_name will always be present, i.e., not an empty string, and at least 3 characters long. It’s worth remembering that NOT NULL constraints don’t prevent empty strings; they only prevent NULL values. Your final migration will probably look like this:

class AddFirstNameLengthConstraintToAccounts < ActiveRecord::Migration
  def up
    execute %{
      ALTER TABLE accounts
        ADD CONSTRAINT accounts_first_name_check
        CHECK (char_length(first_name) >= 3)
    }
  end

  def down
    execute %{
      ALTER TABLE accounts
        DROP CONSTRAINT accounts_first_name_check
    }
  end
end

And that’s all there is to it. Now watch what happens when someone tries to wreak havoc on your database, no matter by which means:

INSERT INTO accounts(first_name) VALUES('');
-- ERROR:  23514: new row for relation "accounts" violates check constraint "accounts_first_name_check"

That’s actually a very good thing. Now you can finally sleep at night, knowing your first_name field won’t ever be the root cause of a weird data bug, and that your data will be highly validated and consistent.

There’s a limitation with check constraints, though: you’ll need to use triggers should you want your constraint to reference data from other tables. Triggers allow you to use powerful built-in languages such as PL/pgSQL.
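As a rough sketch of that approach, here’s a hypothetical migration adding a trigger that rejects accounts belonging to an inactive organization. The organizations table, the organization_id column and the active flag are all assumptions made up for illustration; they don’t come from the example schema above.

class AddAccountOrganizationActiveTrigger < ActiveRecord::Migration
  def up
    execute %{
      CREATE FUNCTION check_organization_active() RETURNS trigger AS $$
      BEGIN
        IF NOT EXISTS (
          SELECT 1 FROM organizations
            WHERE id = NEW.organization_id AND active
        ) THEN
          RAISE EXCEPTION 'organization % is not active', NEW.organization_id;
        END IF;
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER accounts_organization_active
        BEFORE INSERT OR UPDATE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE check_organization_active();
    }
  end

  def down
    execute %{
      DROP TRIGGER accounts_organization_active ON accounts;
      DROP FUNCTION check_organization_active();
    }
  end
end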

A good practice

If you are starting a new application with PostgreSQL, it’s very easy to use check constraints. They even support complex features, such as POSIX regular expressions! Check out the documentation to learn more.
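For instance, here’s a hypothetical migration using a case-insensitive POSIX regex to enforce a very loose email shape. The constraint name and pattern are assumptions, and real-world email validation is far more nuanced than this; the point is only to show the regex operator in a check constraint.

class AddEmailFormatCheckToAccounts < ActiveRecord::Migration
  def up
    execute %{
      ALTER TABLE accounts
        ADD CONSTRAINT accounts_email_check
        CHECK (email ~* '^[^@]+@[^@]+$')
    }
  end

  def down
    execute %{
      ALTER TABLE accounts
        DROP CONSTRAINT accounts_email_check
    }
  end
end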

Now you may be saying:

But I will end up having duplicated validation logic in Rails and also in the database!

Not necessarily. There’s an interesting gem out there called mv-postgresql, along with its parent gem, mv-core. In its own words:

Define validations directly in DB as PostgreSQL constraints and integrate them into your model transparently

This gem currently supports a limited set of database constraints within migrations using Ruby, but you should check it out regardless. The killer feature is that database constraints can optionally bubble up to the model layer as ActiveModel::Validations.

Be aware that, because of potential validation duplication, there’s a chance for inconsistency to creep into your app when using check constraints. You can circumvent this particular problem by writing tests for your constraints, although that’s a bit more involved and won’t be part of this blog post.

Wrap up

Having validation logic right where the data lives makes a lot of sense. Unfortunately, Rails doesn’t support this out of the box, so you’ll probably still have to use SQL commands and duplicate some of your validation logic. Nevertheless, you should still use check constraints by means of database migrations.

You can prevent most kinds of data bugs by being conscientious about your data. That’s a big leap toward a robust and reliable application.