Towards Minimal, Idiomatic, and Performant Ruby Code

Debunking some popular Ruby idioms

I try to embrace a particular way of working with code: it should be minimal, idiomatic, and performant by default. Sometimes it is necessary to trade performance for readability, or readability for performance, or to refuse the trade altogether when it is not worth it. It is possible to strike a good balance among all of these factors, as long as the goal stays ingrained in the small decisions we make while coding.

We will go over some specific examples that deserve a straight “no” by default, because most of the time they do not represent a worthwhile trade. If you are not already into this mindset, consider getting into it.

Minimal: Defining unnecessary methods

Let’s start with “attribute readers”, also known as “getters” in other programming languages. Take this code, for example:

class Person
  attr_reader :first_name, :last_name

  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
  end

  def name
    "#{first_name} #{last_name}"
  end
end

Assuming that all clients of this code just use the name method, the first_name and last_name readers should not really exist. They are nothing but slop unnecessarily polluting the public interface of our object.

As the designer of an object’s interface, you should definitely care about which methods are meant to be publicly exposed — and if you have control over the call sites, it is best to publish only the methods that consumers actually use. This kind of minimalism pays off and tends to make a codebase well-rounded and easier to evolve over time.
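
For illustration, imagine that every call site looks roughly like this (a hypothetical consumer, not taken from the original code):

# Only the composed name is ever needed, so exposing first_name
# and last_name adds nothing to the public interface.
person = Person.new("Thiago", "Araújo")
puts person.name # => "Thiago Araújo"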

I have frequently stumbled upon this kind of sloppiness while refactoring: searching for occurrences of such methods across the codebase, only to find out they are used nowhere but internally. As you can see, this is also about communication.

Let’s go further:

All code should exist for a good reason

And how about making the readers private? It is a popular technique, but is it any good? Let’s see:

class Person
  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
  end

  def name
    "#{first_name} #{last_name}"
  end

  private

  attr_reader :first_name, :last_name
end

Well, this change introduces a subtle problem: more code. That may be a bad idea if it does not help with readability. Let’s run our file in the Ruby interpreter with the -W flag:

$ ruby -W person.rb
person.rb:10: warning: private attribute?
person.rb:10: warning: private attribute?

That’s right, we asked for Ruby’s opinion and it warned us against our redundancy. We are adding extra lines of code to no substantial benefit, as we could have just referenced the instance variables directly:

class Person
  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
  end

  def name
    "#{@first_name} #{@last_name}"
  end
end

As you may already know, attr_reader dynamically defines a method which returns an instance variable of the same name:

def first_name
  @first_name
end

What if there is a typo in the ivar?

Let’s simulate that scenario with some code:

# Instead of "Thiago Araújo" this would return " Araújo"
def name
  "#{@firs_name} #{@last_name}"
end

This version is semantically incorrect due to a typo in @firs_name, but to the surprise of many it does not fail with a NameError. Why? Because uninitialized instance variables evaluate to nil in Ruby, and nil.to_s results in an empty string.
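
You can check this behavior in a quick IRB session (nothing specific to our class):

# An instance variable that was never assigned evaluates to nil,
# and nil interpolates as an empty string.
@never_assigned            # => nil
"name: #{@never_assigned}" # => "name: "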

To be honest, to date I have only seen this kind of carelessness in sloppy code with no tests, not to mention that it only goes unnoticed when interpolating ivars within strings.

The following example, in turn, provides a more useful error message:

# person.rb:8:in `name': undefined method `capitalize' for nil:NilClass (NoMethodError)
def name
  "#{@firs_name.capitalize} #{@last_name.capitalize}"
end

As opposed to a NameError in the case of a private reader, which is even more precise:

# person.rb:8:in `name': undefined local variable or method `firs_name' for #<Person:0x007fd0f30719c8> (NameError)
def name
  "#{firs_name.capitalize} #{last_name.capitalize}"
end

Fortunately, both errors point to the most relevant piece of information for solving the problem: the line number — so a slightly improved error message, in case a typo ever sneaks in, does not sound like a worthwhile trade to me.

The error may surface somewhere else within the class if we pass the nil ivar to another method, but it is still easy to fix.
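
For instance, as a hedged sketch (not code from our class), the nil might only blow up once it reaches a helper method:

def name
  format_name(@firs_name, @last_name)
end

def format_name(first, last)
  # The NoMethodError on nil is raised here, one method away from the typo,
  # but the backtrace still points at a line inside this class.
  "#{first.capitalize} #{last.capitalize}"
end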

But I want to provide an extension point

A private getter may provide an extension point in case we require custom logic for the attribute in the future. Consider the following hypothetical change:

def first_name
  "#{salutation} #{@first_name}"
end

Heads up if you spot “in case…” or “in the future…” amidst a justification for code to exist. Why create such an extension point for a private attribute? Remember: the internals are under your control, hence I advise you to postpone this kind of “preventive” behavior until really necessary.

Truth be told, using a private reader won’t make your code any DRYer if you are worried about referencing the ivar more than once. Changing an ivar reference into a method is just one search and replace away.

DRY is more about the big picture

There is a beautiful, minimal simplicity in using ivars directly, not to mention that they are easier to spot than barewords. Just try to keep ivar assignments in the initializer, and your class will be easier to maintain.

Idiomatic: Converting blocks to procs

The following each method leverages a very common Ruby idiom:

class CustomCollection
  include Enumerable

  def initialize(collection)
    @collection = collection
  end

  def each(&block)
    @collection.each(&block)
  end

  # Picture more methods...
end

CustomCollection.new([1, 2, 3, 4, 5]).each do |i|
  puts i
end

It seems innocuous, right? Well, not so much. There is a performance hit there which should not be ignored. It turns out the each method converts a block to a proc object behind the scenes:

collection = CustomCollection.new([1, 2, 3, 4, 5])

# This block is transformed into a proc
collection.each { |item| do_something(item) }

And what is the problem? Well, a proc is a full-blown object, whereas a block is one of the few Ruby constructs that is not an object — actually, it is tuned to be performant and it does not have a callable interface nor does it respond to any methods.
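
A quick way to see the conversion (a throwaway capture method, defined just for illustration):

def capture(&block)
  block
end

# The &block parameter wraps whatever block we pass in a Proc instance.
p capture { |i| i * 2 }.class # => Proc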

There is a cost involved in this conversion, but we can easily avoid it by using an old’n’trusty block:

def each
  @collection.each { |item| yield item }
end

Let’s run a benchmark to compare both alternatives:

require 'benchmark/ips'

class CustomCollection
  def initialize(collection)
    @collection = collection
  end

  def each_block
    @collection.each { |item| yield item }
  end

  def each_block_to_proc(&block)
    @collection.each(&block)
  end
end

Benchmark.ips do |x|
  collection = CustomCollection.new([1, 2, 3, 4, 5])

  x.report 'block' do
    collection.each_block { |item| }
  end

  x.report 'block to proc' do
    collection.each_block_to_proc { |item| }
  end

  x.compare!
end

These results show that converting a block to a proc is about 1.44 times slower:

Warming up --------------------------------------
               block    79.756k i/100ms
       block to proc    59.273k i/100ms
Calculating -------------------------------------
               block      1.268M (± 5.4%) i/s
       block to proc    878.862k (± 6.9%) i/s

Comparison:
               block:  1268320.3 i/s
       block to proc:   878862.3 i/s - 1.44x  slower

The numbers by themselves may or may not impress you, but one thing to keep in mind regardless is: the code works harder. If there were a substantial benefit to this I would just say “OK, no big deal”. But is there? Is the alternative problematic in any way?

To be fair, there is still a syntactic advantage to using a &block argument: the delegation is transparent. For instance, the following example breaks if we do not pass a block:

def each
  @collection.each { |item| yield item }
end
custom_collection.rb:9:in `block in each': no block given (yield) (LocalJumpError)

But it should have returned an enumerator, right? If not given a block, any idiomatic Ruby iterator is meant to return an enumerator.
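
You can see this built-in behavior on any core iterator:

# Array#each without a block returns an enumerator instead of raising.
p [1, 2, 3].each # => #<Enumerator: [1, 2, 3]:each>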

So, why does the block-to-proc alternative return an enumerator? Because it simply forwards whatever block it receives, and since the method it delegates to happens to be Array#each, we get this behavior for free. It works like this:

class CustomCollection
  def initialize(collection)
    @collection = collection
  end

  # We pass no block. There is nothing to convert,
  # so block comes in as nil.
  def each(&block)
    # Here &nil has the effect of passing no block at all.
    @collection.each(&block)
  end
end

#<Enumerator:0x007fd1f1d61738>
puts CustomCollection.new([1, 2, 3]).each

Fortunately, improving the yield-based version takes just one more line of code:

def each
  return to_enum(__callee__) unless block_given?

  @collection.each { |item| yield item }
end

Now our method behaves the same as the block-to-proc delegation: it returns an enumerator if we pass no block to it.
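
As a side note, if the iterator took arguments they would also need to be forwarded to to_enum. Here is a hedged sketch with a hypothetical each_in_batches method that is not part of our class:

def each_in_batches(size)
  # Forward the argument so the enumerator can replay the call later.
  return to_enum(__callee__, size) unless block_given?

  @collection.each_slice(size) { |batch| yield batch }
end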

And here is an equivalent alternative to the to_enum version:

def each
  if block_given?
    @collection.each { |item| yield item }
  else
    @collection.each
  end
end

It has a few more lines, but it is still readable, elegant, maintainable, and faster than our &block option. Now we can chain iterators to perform complex transformations:

collection.each.with_object([]).with_index do |(item, memo), index|
  # Do something useful...
end

When possible, make your code behave like a native citizen

And when is it good to use a &block argument? When the code within the block needs to be stored for later use, because a block falls out of scope once the method exits:

def store_callback(&proc)
  @i_will_be_used_later = proc
end
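
Here is a minimal sketch of how that might play out, assuming a hypothetical Callbacks class built around the method above:

class Callbacks
  def store_callback(&proc)
    @i_will_be_used_later = proc
  end

  def run
    # The proc has outlived the store_callback call and can be invoked now.
    @i_will_be_used_later.call if @i_will_be_used_later
  end
end

callbacks = Callbacks.new
callbacks.store_callback { puts "called later" }
callbacks.run # prints "called later"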

Performant: Method objects emulating first-class functions

Ruby does not have first-class functions, so how do we compensate for that? That’s right, by extracting method objects. Here is a contrived example, tailored specifically to illustrate the point:

class NumberListDoubler
  def initialize(numbers)
    @numbers = numbers
  end

  def call
    @numbers.map(&method(:multiply_by_two))
  end

  private

  def multiply_by_two(n)
    n * 2
  end
end

This object takes an array of numeric values and returns a new one with each number multiplied by two. Notice how call converts the multiply_by_two method into an object that can be passed over to map.
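
Just to make the mechanics visible (an illustrative snippet, not part of the class itself):

doubler = NumberListDoubler.new([1, 2, 3])
m = doubler.method(:call) # => #<Method: NumberListDoubler#call>
m.call                    # => [2, 4, 6]
m.to_proc.class           # => Proc, which is what & hands over to map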

When I see code like this written in Ruby, I automatically assume it is trying to be concise and reduce verbosity. However, does it read well? Does it meet that goal? I don’t think so. The &method call is noisy and does not look like idiomatic Ruby — it goes against the nature of the language and how it wants to be used in cases like this.

Let’s face it: Ruby is not JavaScript, so you should not pass functions around indiscriminately. We can improve this code with “blocks”, an idiomatic Ruby feature that is ideal for this situation:

class NumberListDoubler
  def initialize(numbers)
    @numbers = numbers
  end

  def call
    @numbers.map { |n| multiply_by_two(n) }
  end

  private

  def multiply_by_two(n)
    n * 2
  end
end

This is considerably more expressive in my opinion. It is not point-free like the &method alternative, but you can use a splat if you need more resilience:

def call
  @numbers.map { |*args| multiply_by_two(*args) }
end

But we have not reached the bottom line yet. There is something more important than style at play: extracting a method object comes at a cost, and as a wary programmer you should be aware of it.

It turns out this code extracts a method object and then converts it to a proc so it can be passed as a block (hence the & character). Let’s run a benchmark comparing “block” versus “method object”:

require 'benchmark/ips'

class NumberDoubler
  def initialize(numbers)
    @numbers = numbers
  end

  def call_with_block
    @numbers.map { |n| multiply_by_two(n) }
  end

  def call_with_method
    @numbers.map(&method(:multiply_by_two))
  end

  def multiply_by_two(n)
    n * 2
  end
end

Benchmark.ips do |x|
  list = NumberDoubler.new([1, 2, 3, 4, 5])

  x.report 'with block' do 
    list.call_with_block
  end

  x.report 'with method' do
    list.call_with_method
  end

  x.compare!
end

Now onto the results:

Calculating -------------------------------------
          with block    46.214k i/100ms
         with method    21.740k i/100ms
-------------------------------------------------
          with block    806.459k (± 7.0%) i/s -      4.021M
         with method    285.775k (± 7.4%) i/s -      1.435M

Comparison:
          with block:   806459.0 i/s
         with method:   285775.4 i/s - 2.82x slower

As you can see, the method object version is 2.82 times slower than the block version. This may not seem like a big deal if the code only runs a few times, but it can add up over the course of a real-world program. If we can avoid it, why not?

Ruby is slow, why should I care?

It is not that slow, and I would dare to say it has acceptable performance for a dynamic language. Still, the “Ruby is slow anyway” mindset is dangerous and may gradually turn a fast program into a slow one. As wary programmers, we should find the sweet spot between performance and readability, and avoid unnecessary work whenever we can.

Avoid doing more work if the trade is not worth it

There are still nice use cases for Object#method, and most of them involve reflection and metaprogramming. We can ask a method for its source location:

# ["number_doubler.rb", 22]
p list.method(:call).source_location

Or for its arity:

# 0
puts list.method(:call).arity
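
Or, to pick one more example, for the names and kinds of its parameters (multiply_by_two here is the method from the benchmark class above):

# [[:req, :n]]
p list.method(:multiply_by_two).parameters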

Conclusion

I could have closed with abstract, general prose, but instead I chose to present content that is directly relevant to the theme. That said, there is a lot more ground to cover!

I hope this post challenges you to think more deeply about how your code looks and works under the hood, and to stop thinking “Ruby is slow, I don’t care”. Given the opportunity, I hope to go over more examples in the near future.

UPDATE 1, 2017–03–10: As Paul Annesley and Benjamin Fleischer pointed out on Twitter, the private attribute warning has been removed as of Ruby 2.3. I did not mean to imply that private readers are wrong or invalid, but rather to make a compelling argument for why I think they are superfluous.

Ruby is about freedom, so I think the team’s decision to remove the warning makes sense. Matz mentioned they did not think of self as a receiver for attribute readers, which implies the warning was meant to prevent them from being accidentally hidden beneath the private interface. Having self as a receiver is a semantically valid case, though in my opinion it is usually not worth it.

UPDATE 2, 2017–03–10: It is important to make clear that the benchmarks in this post are MRI-specific, so they may or may not apply to other Ruby implementations, as Michael Kohl pointed out in the comments.
