A Day In The Life of a Ruby Enumerator

Let's see what is under the hoods

I spent some time trying to elaborate a good introduction for this post. I wanted to start with a great quote or something like that but I couldn’t. So let’s cut to the chase.

Understanding the internals of Ruby’s enumerator is a great way to learn many concepts like fibers and blocks using an already familiar concept.

This post is the second in a series about the internals of Ruby’s Enumerator. The first post is about Custom Enumerators and you can read it here.

Basic components of an Enumerator

First, let’s recall the two main components that make Enumerator possible: the Enumerator and Enumerator::Yielder classes.

enum = Enumerator.new do |y|
  y << 1
  y << 2
  y << 3
end

enum.each do |x|
  puts x
end

In the example above, enum is an instance of Enumerator, and y is an instance of Enumerator::Yielder.

Control Interleaving (Producer and Consumer)

The main concept behind Enumerators is control interleaving. This interleaving happens between a producer and a consumer. Think of it like a stream where the data flows from one place to another.

In the last section’s example, we have both the producer and consumer working. Let’s identify them. For that, let’s imagine the Enumerator as that tennis machine that throws balls. It produces balls. The tennis player is the consumer.

Tenis ball machine

You don’t need to know much about tennis — I don’t either, their scores, for instance, don’t make sense to me. Let’s rewrite the last example to reproduce this.

tenis_machine = Enumerator.new do |thrower|
  thrower << 'ball 1'
  thrower << 'ball 2'
  thrower << 'ball 3'
end

tenis_player = -> ball {   puts "Tennis player receiving the ball: #{ball}" }

tenis_machine.each(&tenis_player)

The Enumerator itself is the tennis machine, it throws (yields) balls. To ‘receive’ (consume) the ball we use each – think of each method as the button that turns the machine on.

To wrap up:

  • Enumerator::Yielder produces values but does it lazily; only when the consumer asks for them using a method like each it starts producing the values.
  • To actually consume the values a block must be provided.

Hence, the block passed in the Enumerator.new is the producer, which produces values using the Yielder. The block passed to each – and similar methods – is the consumer.

Getting Technical

Let’s look at the Yielder implementation to make things clearer. I know this class is harder to infer since we don’t need to care much about it when working with custom enumerators but it’s simple.

class Traversor::Yielder
    def initialize(&yielder_block)
        @yielder_block = yielder_block
    end

    def yield(value)
        @yielder_block.call(value)
    end

    alias << yield
end

That’s it. It takes a block and every time the method << (yield) is called it calls the block with the value passed as an argument. See the example below:

yielder = Traversor::Yielder.new do |x|
    puts x
end

yielder << 1
yielder << 2
yielder << 3

It looks good! Every time we call << the block is called — and in this case, it prints the value. We can give a name to this block and make this code closer to the Enumerator implementation:

each_block = Proc.new do |x|
    puts x
end

yielder = Traversor::Yielder.new(&each_block)

yielder << 1
yielder << 2
yielder << 3

I hope it gives a clue about which block will be executed in which part. But if not yet, no worries. Let’s move on. We are about to connect the pieces.

Let’s recall the Enumerator.new interface. It takes a block and such a block is called with an instance of Yielder. This tells us that the Enumerator wraps the producer block in a Yielder instance.

class Traversor
    def initialize(&block)
        @block = block
    end

    def each(&each_block)
        yielder = Traversor::Yielder.new(&each_block)

        @block.call(yielder)
    end
end

Believe me, this works. You just made a simple Enumerator. You can use it like this:

enum = Traversor.new do |y|
    y << 1
    y << 2
    y << 3
end

enum.each do |x|
    puts x
end

More useful methods

To prove this is a good enough Enumerator, let’s implement some more methods. Very common methods.

class Traversor
    # omitted code

    def map
        return self unless block_given?

        result = []

        each do |x|
            result << yield(x)
        end
    end

    def select
        return self unless block_given?

        result = []

        each do |x|
            result << x if yield(x)
        end
    end
end

NOTE: This code claims for an isolation of parts that change and parts that don’t. Maybe some metaprogramming can be useful here to avoid repeating code.

NOTE: Have you noticed how methods like map and select are implemented? They are implemented using the each method. That’s why it’s so simple to implement enumerators using Enumerable module. An each method is all you need*.

Traversor Limitations

Our Enumerator is good enough. It covers the most cases. But there are some limitations. We can’t have next or rewind methods on
the consumer side. In summary, we can’t have external iterators.

Our Traversor uses blocks to control the interleaving of the producer and the consumer. Thanks to lazy evaluation it’s working as expected. For external iterators though, we need to have a way to pause and resume the execution.

To achieve the full Enumerator behavior we need to use some construct that allows us to pause and resume the execution.
We need some concurrency mechanism.

On concurrency mechanisms, we have four options: processes, threads, fibers, and reactors. Let’s see which one fits best
for our case.

  • Processes: too heavy for this task. Also, IPC is not that trivial.
  • Threads: too complex. We need to deal with synchronization. Also, we may face issues since we don’t have control over scheduling.
  • Reactors: too new. Also, it’s an actor-model. It’s supposed to solve problems that we don’t have.
  • Fibers: just right. They are lightweight and we have full control over scheduling.

The official Enumerator doc mentions Fibers:

Note that enumeration sequence by next, next_values, peek and peek_values do not affect other non-external enumeration methods, unless the underlying iteration method itself has side-effect, e.g. IO#each_line.

Moreover, implementation typically uses fibers so performance could be slower and exception stacktraces different than expected.

We’re on the right path. Let’s dig into fibers.

Fibers

Fibers allow us to do cooperative concurrency in a way that’s simple and efficient.

Think of a simple way of doing concurrency. Well, Fibers are simpler than that. To get this straight, let’s see an example:

require 'fiber'

puts "Root fiber: #{Fiber.current}\n\n"

f = Fiber.new do
  puts "Fiber says hello"
  puts "I am fiber #{Fiber.current}"

  Fiber.yield 10 # gets the control back to the caller

  puts "Fiber says goodbye"
end

pp f
value = f.resume # fiber execution starts till the first yield
puts "Value received from Fiber is: #{value}"
pp f

puts "\n------- \n\n"

value = f.resume
pp f
puts "Value received from Fiber is: #{value.inspect}"

In this simple script we can see a few things:

  • There’s a root fiber. It’s the main fiber that runs the script. You can see it on irb.
  • We create a new fiber using Fiber.new. The block passed to Fiber.new is the fiber’s body.
  • When resume is called on a fiber, it starts running the fiber’s body until it reaches a Fiber.yield. The control is then passed back to the caller.
  • A Fiber has states: created, resumed, suspended, and terminated. When a fiber is resumed it’s running. When it’s suspended it’s waiting for a resume call. When it’s terminated it’s done.
  • If resume is called and there’s no Fiber.yield in the fiber’s body, the fiber is terminated, and nil is returned to the caller.

The output of this script is:

Root fiber: #<Fiber:0x000000014381a0a8 (resumed)>

#<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (created)>
Fiber says hello
I am fiber #<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (resumed)>
Value received from Fiber is: 10
#<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (suspended)>

-------

Fiber says goodbye
#<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (terminated)>
Value received from Fiber is: nil

Very useful for us. We can use fibers to control the interleaving of the producer and the consumer when using external enumerators. The Yielder will execute inside of a fiber so it produces a value and then switches the execution to the consumer fiber.

The Complete Enumerator

We will make the following changes to our Enumerator:

  • There will be a fiber dedicated to external iteration. This fiber will be (lazily) created when next method is called.
    • In this case, the producer block will be executed inside of a fiber.
  • The Yielder will use Fiber to yield values when no block is provided
  • There will be a rewind method that will clean up the consumer fiber so the external iteration can start over.
require 'fiber'

class Traversor
    def initialize(&block)
        @block = block
    end

    def each(&each_block)
        return self unless block_given?

        yielder = Traversor::Yielder.new(&each_block)

        @block.call(yielder)
    end

    def rewind
        @yielder_fib = nil
    end

    def next
        value = yielder_fib.resume

        raise StopIteration unless value && yielder_fib.alive?

        value
    end

    private

    def yielder_fib
        @yielder_fib ||= Fiber.new do
            @block.call(Traversor::Yielder.new)
        end
    end
end

class Traversor::Yielder
    def initialize(&yielder_block)
        @yielder_block = yielder_block
    end

    def yield(value)
        return @yielder_block.call(value) if @yielder_block

        Fiber.yield(value)
    end

    alias << yield
end

Our class in action

We can use our class as we do using Enumerator.

finite_enum = Traversor.new do |y|
    y << 1
    y << 2
    y << 3
    y << 4
end

infinite_enum = Traversor.new do |y|
    i = 0
    loop do
        y << i
        i += 1
    end
end

finite_enum.each do |x|
  puts x
end

puts infinite_enum.next # 0
puts infinite_enum.next # 1
puts infinite_enum.next # 2

infinite_enum.rewind

puts infinite_enum.next # 0
puts infinite_enum.next # 1
puts infinite_enum.next # 2

Conclusion

Isn’t it exciting? We just created a class that mimics Enumerator behavior and we learned:

  • Producer/consumer pattern
  • Blocks and lazy evaluation
  • Cooperative concurrency

Such concepts are relevant not only in Ruby but in many other technologies and subjects of software development.

References

We want to work with you. Check out our "What We Do" section!

Edigleysson Silva

I own a computer

View all posts by Edigleysson Silva →