Ruby Enumerators: A Point of View You’ve Never Had

When and how to use custom Enumerators

The 80-20 rule is empirically observed in many human phenomena and activities and it is no different in programming.

By understanding a portion of the features of ReactJS, for example, you can produce web applications that work well. By understanding 20% of the git commands, you can work daily without facing major problems.

In code, this 20% generally consists of the abstractions we use every day. They solve most problems because those problems are the most common ones, which is why an abstraction for them already exists.

Things become complex when you need to implement something like iframes with dynamic height, which cannot be solved with pure ReactJS.

It is important to tap into the 80% of knowledge that you haven’t explored yet. This knowledge can help you solve specific problems that will likely significantly impact your application and business. Many tools, including programming languages and their features, follow the 80-20 rule.

This post starts a series on Enumerators in Ruby. You’ve probably already used Enumerators through common methods like each, map, and select – don’t worry, I know the last two are technically from the Enumerable mixin. Since we’re not dealing with the trivial here, I hope you already know a little about these methods I mentioned. This post will show you how and when to use Custom Enumerators.

Table of contents

  1. The Iterator Pattern
  2. Generating Sequences
  3. Use cases for Custom Enumerators
  4. Full Example
  5. Conclusion
  6. References

The Iterator Pattern

The documentation for Ruby’s Enumerator class says the following:

A class which allows both internal and external iteration.

This refers to a well-known pattern implemented in languages such as Java, Python, and C#. The pattern consists of an Iterator and an Iterable. The Iterable is the collection that will be iterated. The Iterator is the object that knows how to iterate through that collection.

This is a useful pattern for encapsulation and for maintaining uniformity in how collections are iterated. In some languages, such as JavaScript, adhering to the iterator protocol lets you, for example, use for...of on any object that implements it.

Here is an example of an iterator that iterates over a string.

class StringIterator
  def initialize(text)
    @iterable = text
    @index = 0
  end

  def next
    raise StopIteration if @index >= @iterable.length

    value = @iterable[@index]
    @index += 1

    value
  end
end

# using
str_it = StringIterator.new("Hello")
puts str_it.next # H
puts str_it.next # e
puts str_it.next # l
puts str_it.next # l
puts str_it.next # o
puts str_it.next # raises StopIteration

The code above is only meant to give you an idea of how an external iterator works. An iterator this simple is easy to get from the Enumerator class.

Rewriting the above code with Enumerator, we have:

str_it = "Hello".each_char # returns an Enumerator that can be iterated using the next method as shown above
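To make the external side of this concrete, here is a small sketch of the Enumerator methods that drive it: next, peek and rewind. The string "Hi" and the variable names are only illustrative.

```ruby
# String#each_char without a block returns an Enumerator
str_it = "Hi".each_char

first  = str_it.next # "H"
peeked = str_it.peek # "i" (peek looks ahead without advancing)
second = str_it.next # "i"

# Once the sequence is exhausted, next raises StopIteration
finished = begin
  str_it.next
  false
rescue StopIteration
  true
end

# rewind moves the enumerator back to the beginning
str_it.rewind
again = str_it.next # "H"

puts [first, peeked, second, finished, again].inspect
# => ["H", "i", "i", true, "H"]
```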

Generating Sequences

One thought you might have at first is that Enumerators are about collections, in the sense that they only deal with finite lists like arrays.

The Enumerator, however, does more than that. That is why I’d like to establish the Enumerator here as a Sequence Generator. For some reason, the iterate/iterator terminology gives me the idea of something finite. But an Enumerator can generate infinite sequences.

I want you to think of enumerators as sequence generators for things like a sequence of natural numbers. This is quite easy to do. We can use the Range type for this:

natural_seq = (1..Float::INFINITY).to_enum

puts natural_seq.next # => 1
puts natural_seq.next # => 2
puts natural_seq.next # => 3

We have an infinite sequence!

You may be wondering: for Range, it’s very simple. There’s no infinite collection saved because there’s no need for it. To represent a range, I only need to save the boundaries.

This couldn’t be more correct. That’s why I mentioned that we have an infinite sequence. I want to give you this idea of sequence generation. On demand, really.
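As a sketch of what "on demand" means in practice, the infinite range can be combined with lazy, so that later steps only run for as many elements as are actually requested. The even_squares name is just for illustration.

```ruby
natural_seq = (1..Float::INFINITY)

# lazy defers the work: select and map are applied only to as many
# elements as first(3) ends up needing
even_squares = natural_seq.lazy.select(&:even?).map { |n| n * n }.first(3)

puts even_squares.inspect # => [4, 16, 36]
```

Without lazy, select would try to walk the whole infinite range and never return.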

That was very simple. Let’s complicate things a little more by generating a less predictable infinite sequence, one that I can’t express with a simple Range.

A Fibonacci Sequence

As you may know, the Fibonacci sequence starts with 0 and 1, and from the third number onwards each number is defined as the sum of the previous two. The sequence becomes: 0 1 1 2 3 5 8…

This is not a sequence that can be defined with a simple Range. To represent this, we can use the Enumerator class with a block.

Note that here << is a method of the Enumerator::Yielder class that hands a value to the consumer, just like the yielder's yield method. It is not to be confused with the << of arrays, which appends an element.

fib = Enumerator.new do |yielder|
  a = 0
  b = 1

  loop do
    yielder << a

    a, b = b, a + b
  end
end

We have our sequence, in which each step is defined by this sum, wrapped in an enumerator. Yes, an enumerator is also what you get back when you call to_enum on an object.

A hand-written range enumerator would look something like this:
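A short usage sketch may help here. It repeats the Fibonacci definition so that it runs on its own, then consumes it both internally (take) and externally (next). The variable names first_ten and pulled are only illustrative.

```ruby
fib = Enumerator.new do |yielder|
  a = 0
  b = 1

  loop do
    yielder << a

    a, b = b, a + b
  end
end

# Internal iteration: take stops the infinite loop after ten values
first_ten = fib.take(10)
puts first_ten.inspect # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# External iteration: each call to next resumes the block where it stopped
pulled = [fib.next, fib.next, fib.next]
puts pulled.inspect # => [0, 1, 1]
```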

range_enum = Enumerator.new do |yielder|
  step = 1
  range_end = Float::INFINITY
  current = 1

  loop do
    yielder << current

    current += step

    break if current > range_end
  end
end

# this is equivalent to

# range_enum = (1..Float::INFINITY).to_enum

As we’ve seen, defining enumerators by passing a block is much more flexible, since we decide how the elements are generated.

Let’s say, for example, that we want to generate only the odd Fibonacci numbers. We would just need to modify the part that yields the item.

odd_fib = Enumerator.new do |yielder|
    a = 0
    b = 1

    loop do
        yielder << a if a.odd?

        a, b = b, a + b
    end
end
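As an alternative sketch, the same filtering can be layered on top of the plain Fibonacci enumerator with lazy, instead of being written into the block. This is one possible design, not the only one, and the Fibonacci definition is repeated so the snippet runs on its own.

```ruby
fib = Enumerator.new do |yielder|
  a = 0
  b = 1

  loop do
    yielder << a

    a, b = b, a + b
  end
end

# lazy.select filters on demand; a plain fib.select(&:odd?) would never
# return, because it would try to consume the whole infinite sequence
odd_fib = fib.lazy.select(&:odd?)

first_odds = odd_fib.first(5)
puts first_odds.inspect # => [1, 1, 3, 5, 13]
```

Filtering inside the block keeps everything in one place, while the lazy version lets the same fib enumerator be reused with different filters.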

Use cases for Custom Enumerators

One case is the generation of infinite sequences. In fact, it is one of the few examples of custom enumerators in the documentation.

Generating a Fibonacci sequence, however, is not very practical. So let’s think about a general use case, such as a log parser.

In Kubernetes, the Container Runtime Interface (CRI) sits between the kubelet and the container runtime. It is the protocol that defines how the kubelet should interact with the container runtime.

What does this have to do with enumerators? At first, nothing. Except for the fact that CRI logs follow a specific format. They are prefixed with a timestamp, an indicator of which log stream it is, and an identifier of whether the log is full (F) or partial (P). For example:

2023-10-06T00:17:09.669794202Z stdout F A log message
2023-10-06T00:17:09.669794202Z stdout P Winx when we hold hands
2023-10-06T00:17:09.669794202Z stdout P We become powerful.
2023-10-06T00:17:09.669794202Z stdout F Because together we are invincible

NOTE: For simplicity’s sake we will only deal with stdout logs.
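To make the format easier to picture, here is a minimal sketch that splits one CRI line into its parts. The CRI_LINE pattern and the parse_cri_line helper are hypothetical names made up for this illustration, and the pattern is only as strict as the examples above require.

```ruby
# Hypothetical pattern: timestamp, stream name, F/P tag, then the message
CRI_LINE = /\A(?<time>\S+) (?<stream>stdout|stderr) (?<tag>[FP]) (?<message>.*)\z/

# Returns a hash describing the line, or nil when it does not match
def parse_cri_line(line)
  match = CRI_LINE.match(line)
  return nil unless match

  {
    time: match[:time],
    stream: match[:stream],
    partial: match[:tag] == 'P',
    message: match[:message]
  }
end

parsed = parse_cri_line('2023-10-06T00:17:09.669794202Z stdout P Winx when we hold hands')
puts parsed[:partial] # => true
puts parsed[:message] # => Winx when we hold hands
```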

How can we create a log parser that provides us with an Enumerator allowing iteration over the aggregated logs? Partial logs should be grouped until a complete log is found.

The following is an example of a simple implementation of a CRI log parser that returns an Enumerator that allows us to iterate over the logs in an aggregated way.

logs = [
  '2023-10-06T00:17:09.669794202Z stdout F A log message',
  '2023-10-06T00:17:09.669794202Z stdout P Winx when we hold hands ',
  '2023-10-06T00:17:09.669794202Z stdout P We have become powerful. ', 
  '2023-10-06T00:17:09.669794202Z stdout F Because together we are invincible.'
]

class CRIParserEnumerator
    def initialize(logs)
        @logs = logs
    end

    def to_enum
        Enumerator.new do |yielder|
            current_log = ''

            for log in @logs
                parsed = log.split(/stdout (F|P) /).last
                current_log += parsed

                if log.match?(/stdout F/)
                    yielder << current_log
                    current_log = ''
                end
            end
        end
    end
end

parser_enum = CRIParserEnumerator.new(logs).to_enum
parser_enum.each_with_index do |log, index|
    puts "======= Log #{index + 1} =======\n\n#{log}\n"
end

The output of this code would be something like:

======= Log 1 =======

A log message
======= Log 2 =======

Winx when we hold hands We have become powerful. Because together we are invincible.

Interesting, isn’t it?

At the beginning of the post I mentioned I expect you to have a bit of knowledge of the Enumerator class and the Enumerable mixin.

This is where that knowledge comes into play. You may have noticed that there’s no need for Enumerator.new in the last code snippet. The same result can be achieved by using the each method to iterate through the logs array inside our to_enum method.

I promised that this post would get you into the 80% of unexplored knowledge, and it will. But first we must look at the most common way of using Enumerators in Ruby: internal iterators.

Internal Iterators

In the previous example, we stopped using the next method. After creating an Enumerator with to_enum, we iterate over the logs using the each_with_index method.

# a bunch of stuff omitted...
parser_enum = CRIParserEnumerator.new(logs).to_enum
parser_enum.each_with_index do |log, index|
    puts "======= Log #{index + 1} =======\n\n#{log}\n"
end

each_with_index is a way of iterating an Enumerator while letting the enumerator generate the items we receive in the provided block. The iteration is driven internally, which is where the name internal iterator comes from.

We inverted the control. Instead of actively asking for a new item by calling the next method, we just define a block and become a sort of listener.

I won’t call you. You call me.
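The two styles can be put side by side in a small sketch. The letters enumerator and the pulled and pushed arrays are illustrative names only.

```ruby
# Array#each without a block returns an Enumerator
letters = %w[a b c].each

# External iteration: the caller pulls each value with next.
# Kernel#loop rescues StopIteration, so the loop ends cleanly.
pulled = []
loop do
  pulled << letters.next
end

# Internal iteration: the caller hands over a block and the
# enumerator pushes every value into it
pushed = []
letters.each { |letter| pushed << letter }

puts pulled.inspect # => ["a", "b", "c"]
puts pushed.inspect # => ["a", "b", "c"]
```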

Let’s bring this concept to the CRIParserEnumerator class. Since @logs is an array, we can use an internal iterator to loop through it. No for-loop needed. Let’s just define our own each and include Enumerable. The implementation then becomes something like this:

class CRIParserEnumerator
    include Enumerable

    def initialize(logs)
        @logs = logs
    end

    def each
        current_log = ''

        @logs.each do |log|
            parsed = log.split(/stdout (F|P) /).last
            current_log += parsed

            if log.match?(/stdout F/)
                yield current_log
                current_log = ''
            end
        end
    end
end

parser_enum = CRIParserEnumerator.new(logs)
parser_enum.each_with_index do |log, index|
    puts "======= Log #{index + 1} =======\n\n#{log}\n"
end

This makes the code more declarative and simpler to understand, and it is what you will most commonly see in Ruby implementations.

Notice how:

  • We no longer need to call Enumerator.new inside our class;
  • The to_enum method is no longer necessary, although it still exists because every object inherits it (Kernel#to_enum).
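Both points can be seen in a tiny sketch. Countdown is a hypothetical class invented for this illustration: it defines only each, and Enumerable plus the inherited to_enum provide the rest.

```ruby
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # The only method we write ourselves
  def each
    @from.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(3)

doubled = countdown.map { |n| n * 2 } # provided by Enumerable
evens   = countdown.select(&:even?)   # provided by Enumerable

# The inherited to_enum wraps each in an Enumerator for external iteration
external = countdown.to_enum
first = external.next

puts doubled.inspect # => [6, 4, 2]
puts evens.inspect   # => [2]
puts first           # => 3
```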

I know this is awesome: we are back in familiar territory, where we can just call each on arrays and be happy. Life is not always that simple.

Let’s say our program needs to get these logs through polling. We can’t define a simple each method because we don’t have the full list of logs up front. By passing a block to the Enumerator, we can add the extra logic that enables this kind of buffering.

We’re almost there. Let’s see the final implementation of CRIParserEnumerator, which polls to fetch logs.

Full Example

For this example, we will use objects from the LogBucket class. It is a class I created to simulate a service that fetches logs on demand. A call to the fetch method fetches logs simulating an HTTP call. Each call returns zero or more logs.

The implementation of this service can be found in this gist.
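If you want to run the example without the gist, a stand-in along these lines should be enough. To be clear, this is not the author's LogBucket: FakeLogBucket, its SCRIPT constant and the seed parameter are all assumptions made for this sketch. It only mimics the behaviour described above, where each fetch returns zero or more CRI-formatted lines.

```ruby
# Hypothetical stand-in for the LogBucket service (not the original gist)
class FakeLogBucket
  # Tag and text of the lines it cycles through forever
  SCRIPT = [
    ['P', 'Winx when we hold hands '],
    ['P', 'We become powerful '],
    ['F', 'Because together we are invincible']
  ].freeze

  def initialize(seed: 42)
    @random = Random.new(seed) # seeded so that runs are repeatable
    @position = 0
  end

  # Returns zero, one or two CRI-formatted lines per call
  def fetch
    Array.new(@random.rand(0..2)) { next_line }
  end

  private

  def next_line
    tag, text = SCRIPT[@position % SCRIPT.size]
    @position += 1
    timestamp = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%NZ')

    "#{timestamp} stdout #{tag} #{text}"
  end
end

bucket = FakeLogBucket.new
3.times { p bucket.fetch }
```

Because the script cycles forever, a consumer that asks for ten aggregated logs will eventually get them, even though some calls return nothing.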

Calls to this service will return results like the one shown below.

2024-08-14T02:38:46.282585000ZZ stdout P I know you'll want to be
2024-08-14T02:38:46.282681000ZZ stdout P one of us
2024-08-14T02:38:46.282698000ZZ stdout P Winx when we hold hands
2024-08-14T02:38:46.282710000ZZ stdout P We become powerful
2024-08-14T02:38:46.282723000ZZ stdout F Because together we are invincible

Remember that our parser needs to aggregate the logs. In the sample above, you would have to accumulate the three calls to create a message that can be displayed.

This polling is something that cannot be known in advance; it needs to be done on demand. This is where the Enumerator with the block is necessary.

class CRIParserEnumerator
    include Enumerable

    def each
        e = Enumerator.new do |yielder|
            current_log = ''

            loop do
                logs = bucket_service.fetch

                logs.each do |log|
                    parsed = log.split(/stdout (F|P) /).last
                    current_log += parsed

                    if log.match?(/stdout F/)
                        yielder << current_log
                        current_log = ''
                    end
                end
            end
        end

        return e unless block_given?

        e.each { |log| yield log }
    end

    private

    def bucket_service
        @bucket_service ||= LogBucket.new
    end
end

parser = CRIParserEnumerator.new
parser.take(10).each_with_index do |log, index|
    puts "\n" if index > 0
    puts "======= Log #{index + 1} =======\n\n#{log}\n"

    sleep 1
end

The block was necessary because we have an infinite loop that polls logs. After fetching, the logs are aggregated and yielded.

It may not seem like it, but the block only runs as far as the consumer demands, which is why an infinite loop does not freeze the program: take(10) stops the iteration after the tenth log. When an Enumerator is consumed externally with next, Ruby goes one step further and runs the block in a Fiber, which gives us a bit of concurrency: the block pauses at each yielded value and resumes on the next call.

The use of internal iterators is interesting because the clients that use the method don’t need to be changed. A custom Enumerator lets us change how the logs are obtained, for instance to improve the application’s performance, without affecting any client, because everything happens behind the same Enumerable interface.

The image below shows the above code running. Some logging has been added to show the difference between what was returned by the service (in blue) and what was aggregated by the Enumerator (in white).
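A small sketch can show this pausing and resuming with external iteration. The events array is only there to record the order in which things happen, and all names are illustrative.

```ruby
events = []

numbers = Enumerator.new do |yielder|
  n = 0

  loop do
    n += 1
    events << "producing #{n}"
    yielder << n # the block is suspended here until the next value is requested
  end
end

events << 'before first next'
numbers.next
events << 'between calls'
numbers.next

puts events
# before first next
# producing 1
# between calls
# producing 2
```

Nothing is produced until the first call to next, and the infinite loop advances exactly one step per call.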

Demo log parser

Note how messages in white only appear when they are aggregated.

Conclusion

In this post, we have covered a particular use case for Enumerators, which is when more complex logic is required for sequence generation.

During the post, we mentioned the use of Fibers by the Enumerator. In the next post in this series, we will explore the inner workings of the Enumerator by implementing our own Enumerator from scratch.

References
