Migrating from Paperclip to ActiveStorage

My first challenge as an intern

First of all, I’m no expert in programming. In fact, I started with Ruby on Rails about a month and a half ago, when I began my internship at Codeminer 42.

So that’s me: an intern trying to do things I’ve never done before, relying on online documentation I had just googled and on my colleagues’ efforts to teach me.

Step 1: Read the documentation

That’s right: as an intern who hadn’t got a clue about what to do, my first step was to google for documentation. There I found the official migration docs and a really great article.

The article, as it says, is a supplement to the official Paperclip migration guide. Both of them shaped the way I did the migration.

Step x.5: Do what the docs tell you and watch it fail

Actually, this can happen at any step, hence the x in x.5. In other words, this step cuts across all the others.

As you follow the guide and work on the changes, errors will pop up, because that’s how life is. Even if you’ve read the entire docs and made every change as instructed, you will probably still get some errors. But that’s OK. Errors teach us humility and make us pay attention to the details.

If you can’t solve an error by yourself within 30 minutes, ask for help. There’s no shame in it. Your teammates can help you out and open your mind to new possibilities.

Step 2: Install Active Storage and migrate the database

First, you need to install Active Storage, or you won’t be able to do anything. Assuming you are on Rails 5.2, here is the command you need to run in the terminal:

rails active_storage:install

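This generates a migration; running rails db:migrate then creates the Active Storage tables (active_storage_blobs and active_storage_attachments) in your app. They start out empty, and you have to fill them with the data from Paperclip. For that, I used a rake task which runs the following code. Note that it assumes PostgreSQL, since it relies on LASTVAL() and on the pg driver’s prepared statements: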

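require 'digest'
require 'net/http'

class MigrateToActiveStorage
  def perform
    # PostgreSQL-specific: LASTVAL() returns the id generated by the most
    # recent INSERT in this session, i.e. the blob row created just before.
    get_blob_id = 'LASTVAL()'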

    ActiveRecord::Base.connection.raw_connection.prepare("active_storage_blob_statement",<<-SQL)
      INSERT INTO active_storage_blobs (
        key, filename, content_type, metadata, byte_size, checksum, created_at
      ) VALUES ($1, $2, $3, '{}', $4, $5, $6)
    SQL

    ActiveRecord::Base.connection.raw_connection.prepare("active_storage_attachment_statement",<<-SQL)
      INSERT INTO active_storage_attachments (
        name, record_type, record_id, blob_id, created_at
      ) VALUES ($1, $2, $3, #{get_blob_id}, $4)
    SQL

    models = ActiveRecord::Base.descendants.reject(&:abstract_class?)

    models.each do |model|
      attachments = model.column_names.map do |c|
        if c =~ /(.+)_file_name$/
          $1
        end
      end.compact

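      model.find_each do |instance|
        attachments.each do |attachment|
          # Skip records that have no file for this attachment.
          next if instance.send("#{attachment}_file_name").blank?

          make_active_storage_records(instance, attachment, model)
        end
      end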
    end
  end

  private

  def make_active_storage_records(instance,attachment,model)
    blob_key = key(instance, attachment)
    filename = instance.send("#{attachment}_file_name")
    content_type = instance.send("#{attachment}_content_type")
    file_size = instance.send("#{attachment}_file_size")
    file_checksum = checksum(instance.send(attachment))
    created_at = instance.updated_at.iso8601

    blob_values = [blob_key, filename, content_type, file_size, file_checksum, created_at]

    ActiveRecord::Base.connection.raw_connection.exec_prepared(
      "active_storage_blob_statement",
      blob_values
    )

    blob_name = attachment
    record_type = model.name
    record_id = instance.id

    attachment_values = [blob_name, record_type, record_id, created_at]
    ActiveRecord::Base.connection.raw_connection.exec_prepared(
      "active_storage_attachment_statement",
      attachment_values
    )
  end

  def key(instance, attachment)
    # SecureRandom.uuid
    # Alternatively:
    instance.send("#{attachment}").path
  end

  def checksum(attachment)
    # local files stored on disk:
    # url = "#{Rails.root}/public/#{attachment.path}"
    # Digest::MD5.base64digest(File.read(url))

    # remote files stored on another person's computer:
    url = attachment.url
    Digest::MD5.base64digest(Net::HTTP.get(URI(url)))
  end
end

This code goes through every model that has Paperclip attachments and fills the Active Storage tables with references to the existing Paperclip files. In my project, Paperclip is configured to use remote storage (Amazon S3). If you are using local storage instead, replace the remote version in the checksum method with the commented-out local one.
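For local storage, the commented-out variant reads each file fully into memory before hashing it. Here is a minimal sketch of an alternative that streams the file from disk instead, assuming the files are readable at a local path. The helper name local_checksum is hypothetical and not part of Paperclip or Active Storage:

```ruby
require "digest"

# Hypothetical helper: base64-encoded MD5 of a file on disk, in the same
# format the checksum method above produces with Digest::MD5.base64digest.
# Digest::MD5.file reads the file in chunks rather than loading it whole.
def local_checksum(path)
  Digest::MD5.file(path).base64digest
end
```

Inside the task, this could be called with the attachment’s full local path in place of the File.read version.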

Once that’s done, you can open the Rails console and check whether the ActiveStorage::Attachment and ActiveStorage::Blob records were created correctly. If so, you are ready to go to the next step.
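If some records are missing, a useful first check is whether the attachment was detected at all. The task finds attachments by scanning each model’s column names for Paperclip’s _file_name suffix. Below is a standalone sketch of just that step, so it can be tried on a list of column names; paperclip_attachment_names is a hypothetical helper name:

```ruby
# Hypothetical helper: given a model's column names, return the names of
# its Paperclip attachments, using the same *_file_name convention as the
# task above.
def paperclip_attachment_names(column_names)
  # String#[] with a regex and a group index returns the captured text,
  # or nil when the column does not match.
  column_names.map { |column| column[/\A(.+)_file_name\z/, 1] }.compact
end
```

In the console, something like paperclip_attachment_names(User.column_names) should list the attachments of a User model (User being just an example model name).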

Step 3: Create a separate branch

This is kind of tricky because you need 2 separate pull requests: one to fill the Active Storage tables (as shown above) and another to replace Paperclip with Active Storage throughout the codebase.

Spoiler alert: in the next step, we will change the models to use Active Storage instead of Paperclip. But if you look closely at the code for the above rake task, you will notice that it calls methods on Paperclip’s attachment objects, such as path and url:

instance.send("#{attachment}").path

Once the models declare their attachments through Active Storage, that call no longer returns a Paperclip attachment, and running the rake task will cause an error. That’s why you need to merge and execute the rake task through a separate PR before changing the models, otherwise things won’t work.

Step 4: Change the models and views

It’s very simple. Just do as the official docs say and you shouldn’t have any problems. I won’t repeat them here; let’s stay DRY.

Note that Active Storage stores only the original picture and resizes it on the fly when a variant is requested, as opposed to eagerly at upload time. That’s why the model simply declares has_one_attached, and the resizing options are left to the views.

Step 5: Migrate the attachments

Yeah, you’ve changed your models and your views. You’ve filled the Active Storage tables. Now you can rest, right? No way. If you look closely, the old attachments are still stored under the Paperclip path, so you aren’t completely Paperclip-free. You must therefore move the attachments to the Active Storage path. This requires running the following code through a rake task:

require 'open-uri'

class MigrateData
  def perform
    models = ActiveRecord::Base.descendants.reject(&:abstract_class?)

    models.each do |model|
      attachments = model.column_names.map do |c|
        if c =~ /(.+)_file_name$/
          $1
        end
      end.compact

      attachments.each do |attachment|
        migrate_data(attachment,model)
      end
    end
  end

  private

  def migrate_data(attachment,model)
    model.where.not("#{attachment}_file_name": nil).find_each do |instance|
      bucket = ENV['AWS_BUCKET']
      name = instance.send("#{attachment}_file_name")
      content_type = instance.send("#{attachment}_content_type")
      id = instance.id

      url = "https://s3.amazonaws.com/#{bucket}/uploads/#{attachment.pluralize}/#{id}/original/#{name}"

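      instance.send(attachment).attach(
        # Kernel#open no longer opens URLs as of Ruby 3.0, so open the
        # parsed URI through open-uri instead.
        io: URI.parse(url).open,
        filename: name,
        content_type: content_type
      )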
    end
  end
end

This code copies the Paperclip files to the Active Storage path. The duplication is important because it keeps your data safe: you change the references without risking the original files. If you are using a storage service other than Amazon S3, just change the url assignment in migrate_data. If the files are on your local disk, change that line to use local file paths and open them as regular files.
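One detail to watch when building that URL is the file name: names with spaces or other reserved characters need to be percent-encoded, or the download can fail. Here is a minimal sketch of the URL construction pulled into a helper, mirroring the layout used in migrate_data. The helper name and its keyword arguments are hypothetical, and the layout itself depends on how Paperclip was configured in your project:

```ruby
require "erb"

# Hypothetical helper mirroring the URL layout from migrate_data above.
# `folder` is the pluralized attachment name (for example "avatars").
def paperclip_source_url(bucket:, folder:, id:, filename:)
  # Percent-encode the file name so that spaces and other reserved
  # characters do not produce an invalid URL.
  escaped = ERB::Util.url_encode(filename)
  "https://s3.amazonaws.com/#{bucket}/uploads/#{folder}/#{id}/original/#{escaped}"
end
```

For instance, a file named "my photo.png" ends up as my%20photo.png at the end of the URL.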

After that, check that the files are attached under the Active Storage path and that the migration ran without errors. You should then be ready for the next step.

Step 6: Remove Paperclip

If you got the previous steps right, this won’t be a problem. Just remove the Paperclip gem from your Gemfile, run bundle install, and check if everything is still working. If you have tests, don’t hesitate to run them!

Step 7: Deploy to staging!

Remember:

  • Merge the first branch and execute the migrations and the rake task for MigrateToActiveStorage.

  • Merge the second branch and execute the rake task for MigrateData.
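The rake tasks themselves are not shown in this post. A minimal sketch of how the two classes could be wrapped is below; the file name and the task names are hypothetical. It also eager-loads the app first, on the assumption that models may not be loaded yet when a rake task starts, in which case ActiveRecord::Base.descendants would miss them:

```ruby
# lib/tasks/paperclip_migration.rake (hypothetical file name)
require "rake" # only needed if this file is loaded outside rake/rails

namespace :paperclip do
  desc "First branch: fill the Active Storage tables from Paperclip data"
  # Depending on :environment loads the Rails app before the task runs.
  task fill_tables: :environment do
    # Load every model so ActiveRecord::Base.descendants can see them.
    Rails.application.eager_load!
    MigrateToActiveStorage.new.perform
  end

  desc "Second branch: copy the files to the Active Storage path"
  task copy_files: :environment do
    Rails.application.eager_load!
    MigrateData.new.perform
  end
end
```

With these names, the first deploy would run rails db:migrate followed by rails paperclip:fill_tables, and the second would run rails paperclip:copy_files.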

Once you have verified the copied files, delete the old S3 attachment folder and voilà! You’ve successfully migrated to Active Storage.

Final considerations

After I got everything working in staging, I noticed the model validations were missing. Unfortunately, Active Storage (as of Rails 5.2) doesn’t provide built-in attachment validations, which calls for a workaround. But there is a problem: Active Storage stores the attached file before the model validations run, when it should be the other way around. I found 3 solutions to this:

  1. Implement a model callback to delete the attachment after validation fails.

  2. Validate the attachment at the controller level. This feels out of place and is thus not ideal.

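  3. Rails 6.0 will store newly uploaded files when the record is saved rather than when they are assigned (see the Rails commit in the references), so model validations can run first. Until then, stick with Paperclip and problem solved.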
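Whichever option is chosen, the check itself is small. Here is a minimal sketch in plain Ruby, assuming only that the attached blob responds to content_type and byte_size (the same values stored in the blobs table in Step 2). The helper name, the allowed types, and the size limit are all hypothetical:

```ruby
# Hypothetical helper: returns a list of problems with an attached file.
# An empty list means the file is acceptable. `blob` is any object that
# responds to content_type and byte_size.
def attachment_errors(blob, allowed_types: %w[image/png image/jpeg], max_bytes: 5 * 1024 * 1024)
  errors = []
  errors << "has an unsupported content type" unless allowed_types.include?(blob.content_type)
  errors << "is larger than #{max_bytes} bytes" if blob.byte_size > max_bytes
  errors
end
```

A model callback (option 1) or a controller (option 2) could call this and remove the attachment whenever the list is not empty.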

It may come as a surprise that I chose the third option. It was the simplest solution for my project. And even though my work was put on hold, it was a nice experience. It taught me a lot about staging, deployment, rake tasks, file storage, and so on.

When Rails 6.0 comes out, I will be ready for it.

References

https://blog.carbonfive.com/2018/06/25/safely-migrating-from-paperclip-to-active-storage/
https://gorails.com/episodes/migrate-from-paperclip-to-rails-active-storage
https://github.com/thoughtbot/paperclip/blob/master/MIGRATING.md
https://github.com/rails/rails/commit/e8682c5bf051517b0b265e446aa1a7eccfd47bf7

Thanks to Luan Gonçalves Barbosa, Thiago Araújo Silva, Halan Pinheiro, and Maychell Oliveira.
