I spent some time trying to elaborate a good introduction for this post. I wanted to start with a great quote or something like that but I couldn’t. So let’s cut to the chase.
Understanding the internals of Ruby’s enumerator is a great way to learn many concepts like fibers and blocks using an already familiar concept.
This post is the second in a series about the internals of Ruby’s Enumerator. The first post is about Custom Enumerators and you can read it here.
Basic components of an Enumerator
First, let’s recall the two main components that make Enumerator possible: the Enumerator
and Enumerator::Yielder
classes.
enum = Enumerator.new do |y|
y << 1
y << 2
y << 3
end
enum.each do |x|
puts x
end
In the example above, enum
is an instance of Enumerator
, and y
is an instance of Enumerator::Yielder
.
Control Interleaving (Producer and Consumer)
The main concept behind Enumerators is control interleaving. This interleaving happens between a producer and a consumer. Think of it like a stream where the data flows from one place to another.
In the last section’s example, we have both the producer and consumer working. Let’s identify them. For that, let’s imagine the Enumerator as that tennis machine that throws balls. It produces balls. The tennis player is the consumer.
You don’t need to know much about tennis — I don’t either, their scores, for instance, don’t make sense to me. Let’s rewrite the last example to reproduce this.
tenis_machine = Enumerator.new do |thrower|
thrower << 'ball 1'
thrower << 'ball 2'
thrower << 'ball 3'
end
tenis_player = -> ball { puts "Tennis player receiving the ball: #{ball}" }
tenis_machine.each(&tenis_player)
The Enumerator itself is the tennis machine, it throws (yields) balls. To ‘receive’ (consume) the ball we use each
– think of each
method as the button that turns the machine on.
To wrap up:
Enumerator::Yielder
produces values but does it lazily; only when the consumer asks for them using a method likeeach
it starts producing the values.- To actually consume the values a block must be provided.
Hence, the block passed in the Enumerator.new
is the producer, which produces values using the Yielder. The block passed to each
– and similar methods – is the consumer.
Getting Technical
Let’s look at the Yielder
implementation to make things clearer. I know this class is harder to infer since we don’t need to care much about it when working with custom enumerators but it’s simple.
class Traversor::Yielder
def initialize(&yielder_block)
@yielder_block = yielder_block
end
def yield(value)
@yielder_block.call(value)
end
alias << yield
end
That’s it. It takes a block and every time the method <<
(yield) is called it calls the block with the value passed as an argument. See the example below:
yielder = Traversor::Yielder.new do |x|
puts x
end
yielder << 1
yielder << 2
yielder << 3
It looks good! Every time we call << the block is called — and in this case, it prints the value. We can give a name to this block and make this code closer to the Enumerator implementation:
each_block = Proc.new do |x|
puts x
end
yielder = Traversor::Yielder.new(&each_block)
yielder << 1
yielder << 2
yielder << 3
I hope it gives a clue about which block will be executed in which part. But if not yet, no worries. Let’s move on. We are about to connect the pieces.
Let’s recall the Enumerator.new interface. It takes a block and such a block is called with an instance of Yielder. This tells us that the Enumerator wraps the producer block in a Yielder instance.
class Traversor
def initialize(&block)
@block = block
end
def each(&each_block)
yielder = Traversor::Yielder.new(&each_block)
@block.call(yielder)
end
end
Believe me, this works. You just made a simple Enumerator. You can use it like this:
enum = Traversor.new do |y|
y << 1
y << 2
y << 3
end
enum.each do |x|
puts x
end
More useful methods
To prove this is a good enough Enumerator, let’s implement some more methods. Very common methods.
class Traversor
# omitted code
def map
return self unless block_given?
result = []
each do |x|
result << yield(x)
end
end
def select
return self unless block_given?
result = []
each do |x|
result << x if yield(x)
end
end
end
NOTE: This code claims for an isolation of parts that change and parts that don’t. Maybe some metaprogramming can be useful here to avoid repeating code.
NOTE: Have you noticed how methods like map
and select
are implemented? They are implemented using the each
method. That’s why it’s so simple to implement enumerators using Enumerable module. An each
method is all you need*.
Traversor Limitations
Our Enumerator is good enough. It covers the most cases. But there are some limitations. We can’t have next
or rewind
methods on
the consumer side. In summary, we can’t have external iterators.
Our Traversor uses blocks to control the interleaving of the producer and the consumer. Thanks to lazy evaluation it’s working as expected. For external iterators though, we need to have a way to pause and resume the execution.
To achieve the full Enumerator behavior we need to use some construct that allows us to pause and resume the execution.
We need some concurrency mechanism.
On concurrency mechanisms, we have four options: processes, threads, fibers, and reactors. Let’s see which one fits best
for our case.
- Processes: too heavy for this task. Also, IPC is not that trivial.
- Threads: too complex. We need to deal with synchronization. Also, we may face issues since we don’t have control over scheduling.
- Reactors: too new. Also, it’s an actor-model. It’s supposed to solve problems that we don’t have.
- Fibers: just right. They are lightweight and we have full control over scheduling.
The official Enumerator doc mentions Fibers:
Note that enumeration sequence by next, next_values, peek and peek_values do not affect other non-external enumeration methods, unless the underlying iteration method itself has side-effect, e.g. IO#each_line.
Moreover, implementation typically uses fibers so performance could be slower and exception stacktraces different than expected.
We’re on the right path. Let’s dig into fibers.
Fibers
Fibers allow us to do cooperative concurrency in a way that’s simple and efficient.
Think of a simple way of doing concurrency. Well, Fibers are simpler than that. To get this straight, let’s see an example:
require 'fiber'
puts "Root fiber: #{Fiber.current}\n\n"
f = Fiber.new do
puts "Fiber says hello"
puts "I am fiber #{Fiber.current}"
Fiber.yield 10 # gets the control back to the caller
puts "Fiber says goodbye"
end
pp f
value = f.resume # fiber execution starts till the first yield
puts "Value received from Fiber is: #{value}"
pp f
puts "\n------- \n\n"
value = f.resume
pp f
puts "Value received from Fiber is: #{value.inspect}"
In this simple script we can see a few things:
- There’s a root fiber. It’s the main fiber that runs the script. You can see it on
irb
. - We create a new fiber using
Fiber.new
. The block passed toFiber.new
is the fiber’s body. - When
resume
is called on a fiber, it starts running the fiber’s body until it reaches aFiber.yield
. The control is then passed back to the caller. - A Fiber has states:
created
,resumed
,suspended
, andterminated
. When a fiber isresumed
it’s running. When it’ssuspended
it’s waiting for aresume
call. When it’sterminated
it’s done. - If
resume
is called and there’s noFiber.yield
in the fiber’s body, the fiber isterminated
, andnil
is returned to the caller.
The output of this script is:
Root fiber: #<Fiber:0x000000014381a0a8 (resumed)>
#<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (created)>
Fiber says hello
I am fiber #<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (resumed)>
Value received from Fiber is: 10
#<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (suspended)>
-------
Fiber says goodbye
#<Fiber:0x0000000143819d38 /tmp/fibers.rb:5 (terminated)>
Value received from Fiber is: nil
Very useful for us. We can use fibers to control the interleaving of the producer and the consumer when using external enumerators. The Yielder will execute inside of a fiber so it produces a value and then switches the execution to the consumer fiber.
The Complete Enumerator
We will make the following changes to our Enumerator:
- There will be a fiber dedicated to external iteration. This fiber will be (lazily) created when
next
method is called.- In this case, the producer block will be executed inside of a fiber.
- The Yielder will use Fiber to yield values when no block is provided
- There will be a
rewind
method that will clean up the consumer fiber so the external iteration can start over.
require 'fiber'
class Traversor
def initialize(&block)
@block = block
end
def each(&each_block)
return self unless block_given?
yielder = Traversor::Yielder.new(&each_block)
@block.call(yielder)
end
def rewind
@yielder_fib = nil
end
def next
value = yielder_fib.resume
raise StopIteration unless value && yielder_fib.alive?
value
end
private
def yielder_fib
@yielder_fib ||= Fiber.new do
@block.call(Traversor::Yielder.new)
end
end
end
class Traversor::Yielder
def initialize(&yielder_block)
@yielder_block = yielder_block
end
def yield(value)
return @yielder_block.call(value) if @yielder_block
Fiber.yield(value)
end
alias << yield
end
Our class in action
We can use our class as we do using Enumerator.
finite_enum = Traversor.new do |y|
y << 1
y << 2
y << 3
y << 4
end
infinite_enum = Traversor.new do |y|
i = 0
loop do
y << i
i += 1
end
end
finite_enum.each do |x|
puts x
end
puts infinite_enum.next # 0
puts infinite_enum.next # 1
puts infinite_enum.next # 2
infinite_enum.rewind
puts infinite_enum.next # 0
puts infinite_enum.next # 1
puts infinite_enum.next # 2
Conclusion
Isn’t it exciting? We just created a class that mimics Enumerator behavior and we learned:
- Producer/consumer pattern
- Blocks and lazy evaluation
- Cooperative concurrency
Such concepts are relevant not only in Ruby but in many other technologies and subjects of software development.
References
- https://ruby-doc.org/core-3.0.2/Enumerator.html
- https://ruby-doc.org/core-3.0.2/Fiber.html
- https://www.visuality.pl/posts/concurrency-and-parallelism-in-ruby-processes-threads-fibers-and-ractors
We want to work with you. Check out our "What We Do" section!