What Puma, Falcon, and Pitchfork teach you about Ruby concurrency

by Edy Silva on May 27, 2026 13 min


I went on a tour of Ruby concurrency while preparing a RubyConf talk on Ractors. Going in, I knew Puma, Falcon, and Unicorn. Coming out, I’d added Pitchfork to that list – and noticed something I hadn’t seen before: every Ruby concurrency primitive has at least one production web server built on top of it.

Puma is the case for threads. Pitchfork is the case for processes. Falcon is the case for fibers. All three answer the same question (how do I serve the next request before this one finishes?), and they answer it in three completely different ways.

Except Ractors. The fourth Ruby concurrency primitive – the only one that gives you actually parallel Ruby code on multiple cores inside one process – has no production web server built on it. Not one. By the end of this post, you’ll understand exactly why, and what Ruby 4.0 starts to change.

What a web server actually does

A web server is the thing that sits between the internet and your Ruby app. Requests come in, the server hands them to your code, your code returns a response and the server sends it back. Puma, Falcon, and Pitchfork all do exactly that job.

Figure: A web server sits between client devices and a Ruby application, taking HTTP requests in and sending responses back.

The interesting question – the one this post is really about – is what happens when more than one client wants service at the same time. The simplest possible server can only handle one request at a time: until your code returns, nothing else gets through. Everyone else waits in line.
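To make "one request at a time" concrete, here is a minimal sketch of that simplest server using only Ruby's standard socket library. `handle_slowly` and `serve_sequentially` are hypothetical helpers written for this example, and the HTTP handling is deliberately naive (the request line and headers are read and ignored).

```ruby
require 'socket'

# Hypothetical stand-in for a slow app: read the request, wait, answer.
def handle_slowly(conn)
  while (line = conn.gets) && line != "\r\n" # drain the request line and headers
  end
  sleep 0.2 # pretend the app takes 200 ms
  conn.write("HTTP/1.1 200 OK\r\nContent-Length: 3\r\nConnection: close\r\n\r\nok\n")
end

# Accept a connection, handle it completely, close it, and only then accept
# the next one. Meanwhile, other clients wait in the kernel's accept backlog.
def serve_sequentially(server, max_requests)
  max_requests.times do
    conn = server.accept
    handle_slowly(conn)
    conn.close
  end
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = server.addr[1]
server_thread = Thread.new { serve_sequentially(server, 3) }

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
clients = 3.times.map do
  Thread.new do
    TCPSocket.open('127.0.0.1', port) do |sock|
      sock.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
      sock.read # read until the server closes the connection
    end
  end
end
responses = clients.map(&:value)
server_thread.join
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("%d responses in %.1fs", responses.size, elapsed)
```

Three clients that connect at the same moment get their answers one after another, so the total comes out at roughly three times the per-request delay.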

What "parallel" actually means in Ruby

Before we look at the three servers, there’s one fact that shapes every decision they make.

Ruby – the CRuby implementation you’re almost certainly running – has a Global VM Lock, the GVL. The rule is simple and brutal: only one thread executes Ruby bytecode at a time, per process. Spawn 100 threads. The OS will happily schedule all of them. But inside the Ruby VM, exactly one is running Ruby code at any given instant. The rest wait their turn.
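A quick way to see the GVL is to time pure-Ruby, CPU-bound work run sequentially and then split across threads. This is a rough sketch: `busy` is a made-up workload and the timings depend on your machine, but on CRuby the threaded run should not come out meaningfully faster.

```ruby
def now = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Made-up CPU-bound workload: pure Ruby arithmetic, no I/O, so the thread
# never gives up the GVL voluntarily.
def busy(n)
  (1..n).reduce(0) { |acc, i| acc ^ (i * i) }
end

WORK = 1_000_000
busy(WORK) # warm-up run, not timed

t0 = now
4.times { busy(WORK) }
sequential = now - t0

t0 = now
4.times.map { Thread.new { busy(WORK) } }.each(&:join)
threaded = now - t0

puts format("sequential: %.2fs  4 threads: %.2fs", sequential, threaded)
```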

So, given the GVL, how do you actually run more than one thing at a time in Ruby? Three answers.

First, the GVL releases on I/O. The moment a thread blocks on a socket read, a database query, a file read – anything that calls into the kernel and waits – it drops the GVL and another thread picks it up. Threads buy you concurrency on I/O, even though they don’t buy you parallelism on CPU.
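The I/O case looks very different. In this sketch, `sleep` stands in for a socket read or a database query: like real blocking I/O, it releases the GVL while it waits, so the five waits overlap.

```ruby
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 5.times.map do |i|
  Thread.new do
    sleep 0.2 # stand-in for blocking I/O; the GVL is released while waiting
    i
  end
end
results = threads.map(&:value) # Thread#value joins and returns the block's result

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("%d waits finished in %.2fs", results.size, elapsed)
```

The total lands near 0.2 s rather than the 1 s that five back-to-back waits would take.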

Second, the GVL is per process. Two processes have two independent GVLs. They actually run in parallel, on two cores, no contention. That’s why fork is a serious concurrency strategy in Ruby and not just an old Unix trick – if you want two CPU-bound things running at the same time, two processes is how you get there.
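Here is a minimal sketch of that: fork two workers, run the same CPU-bound loop in each, and send the results back over pipes. It assumes a Unix-like system (`fork` is unavailable on Windows), and `busy` is a made-up workload.

```ruby
# Made-up CPU-bound workload.
def busy(n)
  (1..n).reduce(0) { |acc, i| (acc + i * i) % 1_000_003 }
end

WORK = 1_000_000

# Each forked child is a separate process with its own GVL, so on a machine
# with two free cores both loops run at the same time.
readers = 2.times.map do
  reader, writer = IO.pipe
  fork do
    reader.close
    writer.write(busy(WORK).to_s) # send the result back to the parent
    writer.close
    exit!(0) # leave without running at_exit handlers inherited from the parent
  end
  writer.close # the parent only reads
  reader
end

results = readers.map do |reader|
  value = reader.read.to_i
  reader.close
  value
end
Process.waitall # reap both children

p results
```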

Third, fibers don’t even need the OS to schedule them. A fiber is a coroutine – a tiny unit of execution that pauses and resumes in pure Ruby, with no OS thread of its own. Thousands of fibers can live inside one thread, sharing one GVL, trading control cooperatively. Since Ruby 3.0, a fiber scheduler can automatically suspend a fiber on I/O and resume another – which is how you handle ten thousand idle connections without spawning ten thousand threads.
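Here is the cooperative part in its barest form, using only Ruby's built-in `Fiber` class and no scheduler: two fibers share one thread and hand control back and forth explicitly with `Fiber.yield` and `Fiber#resume`. A fiber scheduler performs the same kind of hand-off automatically, triggered by I/O instead of by hand-written calls.

```ruby
log = []

ping = Fiber.new do
  3.times do |i|
    log << "ping #{i}"
    Fiber.yield # pause here; control returns to whoever called #resume
  end
end

pong = Fiber.new do
  3.times do |i|
    log << "pong #{i}"
    Fiber.yield
  end
end

# A hand-written round-robin "scheduler": resume each fiber in turn.
3.times do
  ping.resume
  pong.resume
end

puts log.join(", ")
# prints: ping 0, pong 0, ping 1, pong 1, ping 2, pong 2
```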

Figure: Side-by-side comparison of how Ruby achieves concurrency with threads, processes, and fibers, showing the GVL in each model.

Keep this picture in your head:

Parallelism on CPU -> multiple processes (or Ractors, more on those later)
Concurrency on I/O -> threads are enough
Concurrency on many idle connections -> fibers do it cheaper than threads

Puma, Falcon, and Pitchfork each took one of these patterns and built a web server around it. None of them is "more advanced" than the others – they’re three different shapes. Let’s walk through them in the same order, starting with the supposedly heaviest concurrency unit: the OS process. Pitchfork’s whole reason for existing is to make that label less true than it sounds.

Pitchfork – fork the world

The oldest answer in Unix: when you need to do another thing, copy the process.

The Unicorn family – Unicorn, then Pitchfork – boots one master process, opens the listening socket, and then forks N child workers. Each child inherits the listen socket. Each child loops on accept. The kernel decides which child wakes up for each new connection.

The whole "concurrency" is the OS scheduler running N copies of Ruby on N cores. No threads. No fibers. No shared mutable state. Each worker is an isolated, predictable Ruby VM that handles one request at a time, and no other in-flight request can touch its state. The GVL doesn’t matter because each worker has its own GVL.

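require 'socket'

# `app` (a Rack app) and `handle_request` (read the request, call the app,
# write the response) are placeholders; neither is shown here.
WORKERS = 4
listener = TCPServer.new(4000) # opened once by the master, inherited by every child

WORKERS.times do
  fork do
    loop do
      conn = listener.accept # all children block here; the kernel wakes one per connection
      handle_request(conn, app)
      conn.close
    end
  end
end

Process.waitall # the master just waits on its children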

The strength is isolation: workers can’t corrupt each other’s state because they don’t share state. A worker leaking memory? Kill it, fork another. A worker hung on a slow query? Kill it, fork another. This is why Shopify, GitHub, and many other large Rails monoliths ran on Unicorn for a decade.
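The "kill it, fork another" loop is small enough to sketch. This is an illustrative master, not Unicorn's or Pitchfork's code: `spawn_worker` is a hypothetical helper whose workers deliberately exit after a moment, so the master has something to replace. Unix-only, since it relies on `fork`.

```ruby
# Hypothetical worker: serves "requests" for a moment, then dies, standing in
# for a leak, a crash, or a hang that a timeout killed.
def spawn_worker
  fork do
    sleep 0.05
    exit!(1)
  end
end

worker_count = 2
workers = {} # pid => worker slot
worker_count.times { |slot| workers[spawn_worker] = slot }

respawns = 0
while respawns < 4
  pid, status = Process.wait2 # block until any child exits
  slot = workers.delete(pid)
  puts "worker #{slot} (pid #{pid}) exited with status #{status.exitstatus}; forking a replacement"
  workers[spawn_worker] = slot
  respawns += 1
end

Process.waitall # the last generation exits on its own; reap it before leaving
```

The master never touches request state; its whole job is noticing that a child is gone and forking a fresh one into the same slot.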

The catch is memory. fork is supposed to be cheap thanks to copy-on-write – parent and child share physical memory pages until one of them writes, so N workers should cost a lot less than N independent processes. In practice, the shared memory bleeds away under real traffic. Pages get touched, and each worker drifts toward holding its own full copy of the heap.
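You can watch copy-on-write pages turn private on Linux. The sketch below is Linux-only and assumes `/proc/self/smaps` is readable: the parent fills a 32 MB string, forks, and the child measures its private dirty memory before and after writing one byte into every page of that string. `private_dirty_kb` is a helper written for this example.

```ruby
# Sum of Private_Dirty across all mappings of this process, in kB (Linux only).
def private_dirty_kb
  File.read("/proc/self/smaps").scan(/^Private_Dirty:\s+(\d+) kB/).flatten.sum(&:to_i)
end

PAGE_SIZE = 4096
big = "x" * (32 * 1024 * 1024) # ~32 MB, written by the parent before the fork

reader, writer = IO.pipe
pid = fork do
  reader.close
  before = private_dirty_kb # the string's pages are still shared with the parent
  (0...big.bytesize).step(PAGE_SIZE) { |i| big.setbyte(i, 121) } # touch every page
  after = private_dirty_kb  # each touched page has now been copied into this child
  writer.write("#{before} #{after}")
  writer.close
  exit!(0) # skip at_exit handlers inherited from the parent
end
writer.close
before_kb, after_kb = reader.read.split.map(&:to_i)
reader.close
Process.wait(pid)

puts "child private memory grew by ~#{(after_kb - before_kb) / 1024} MB"
```

Nothing in the child "allocated" 32 MB; it only wrote one byte per page, and that was enough to turn the shared pages into private copies.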

This is exactly the problem Pitchfork was built to solve, and the solution is unusually clever. The trick is called refork.

Instead of forking every new worker from the cold master, Pitchfork lets a worker warm up – serve real traffic, fill its caches, settle into the shape it’s going to keep – and then promotes that warm worker into a "mold." The next worker is forked from the mold, not from the master. It’s born pre-warmed, and most of its memory is still genuinely shared with the mold it came from.
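A toy illustration of why a reforked worker is born warm, under heavy simplification: here a single process warms a hypothetical cache itself and then forks, whereas in Pitchfork the warmed process is a worker that has been promoted to the mold. `fork_and_report` is a helper invented for this sketch; it forks a child that reports how much of the cache it can already see. Unix-only.

```ruby
# Forks a child that reports how many cache entries it can already see,
# without doing any warm-up of its own.
def fork_and_report(cache)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(cache.size.to_s)
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  count = reader.read.to_i
  reader.close
  Process.wait(pid)
  count
end

cache = {}                    # hypothetical app cache; the process boots cold
cold = fork_and_report(cache) # forked before warm-up: a cold worker

50_000.times { |i| cache[i] = i.to_s * 3 } # stand-in for warming up on real traffic
warm = fork_and_report(cache)               # forked after warm-up: born warm

puts "cold worker sees #{cold} cache entries; reforked worker sees #{warm}"
# prints: cold worker sees 0 cache entries; reforked worker sees 50000
```

The reforked child gets the warm entries for free, and until something writes to them, the pages holding them stay shared with the process it was forked from.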

Figure: Two-panel diagram. On the left, traditional fork creates cold workers with mostly dirty/unique memory; on the right, Pitchfork warms up a worker, promotes it into a "mold", and reforks new workers from the mold so they inherit shared memory.

At Shopify scale, this is engineering gold. Every dirty memory page across thousands of Rails servers eventually becomes a number on an invoice. Reforking shrinks the fleet enough that the trick was worth building from scratch – a Rails monolith that used to need a small army of boxes now fits on a fraction of the same fleet. (Shopify wrote up the impact on their monolith.)

You pay for the cleverness in complexity. The master coordinates a mold lifecycle. The refork dance has its own failure modes. There are decisions to make about when a worker should become the mold and how often to rotate them. But the underlying shape – one master, N workers, each its own isolated Ruby – is still the simple, predictable process model that’s been working since the 1970s.

Puma – thread the world

Pitchfork’s only concurrency unit is the process – one worker, one request at a time. The second answer keeps the same forked-worker shape (one process per core, Nate Berkopec’s standard recommendation) but stacks threads inside each worker: now one Ruby process can handle many in-flight requests at once. This is the model most Rails apps actually run.
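In Puma's own configuration DSL, that shape comes down to a worker count and a thread count. The file below is an illustrative `config/puma.rb`; the values are examples, not a recommendation.

```ruby
# config/puma.rb (illustrative values)
require "etc"

workers Etc.nprocessors # clustered mode: fork one worker process per CPU core
threads 16, 16          # min and max threads in each worker's pool
preload_app!            # boot the app in the master before forking, so workers
                        # start out sharing its memory pages copy-on-write
port 4000
```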

Inside each worker, Puma is the canonical thread-pool server:

One acceptor thread loops on server.accept and pushes each connection onto a queue.
A pool of worker threads each loop on queue.pop, handle the Rack request, and write the response back to the client.
The queue is bounded (a SizedQueue here), so when the pool falls behind, the acceptor blocks instead of piling up connections. That's backpressure.

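require 'socket'

# `app` (a Rack app) and `handle_request` are placeholders; neither is shown here.
server = TCPServer.new(4000)
QUEUE = SizedQueue.new(64) # `<<` blocks once 64 connections are waiting: backpressure
POOL_SIZE = 16

# Acceptor
acceptor = Thread.new do
  loop { QUEUE << server.accept }
end

# Workers
POOL_SIZE.times do
  Thread.new do
    loop do
      conn = QUEUE.pop # blocks until the acceptor hands over a connection
      handle_request(conn, app)
      conn.close
    end
  end
end

acceptor.join # keep the main thread alive; without this the process exits immediately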

Figure: Puma's production architecture. A master process forks one worker per CPU core, and each worker runs an internal pool of threads.

Lightweight. Cheap to spawn. Inside one worker, all threads share the loaded app, so there is no per-thread copy of Rails in memory. Each thread adds its own stack, but 16 threads still cost far less RAM than 16 processes.

What makes threads work for web requests is simple: most of a request is I/O. Rails spends most of its time waiting on PostgreSQL, Redis, or some external HTTP call – and the GVL releases for every one of those waits. While a Pitchfork worker sits blocked on one slow query, a Puma worker with 16 threads can have 16 different requests parked on I/O at the same time. The thread pool is doing exactly the workload it was designed for.
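The sketch below simulates that with the same queue-plus-pool shape as the snippet above. `sleep` stands in for a database wait (both release the GVL); the request IDs and the 200 ms figure are made up.

```ruby
POOL_SIZE = 16
queue = SizedQueue.new(64) # an acceptor pushing here would block if the pool fell behind
done = Queue.new

pool = POOL_SIZE.times.map do
  Thread.new do
    while (request_id = queue.pop) # pop returns nil once the queue is closed and empty
      sleep 0.2                    # stand-in for a 200 ms database query; releases the GVL
      done << request_id
    end
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
16.times { |id| queue << id }        # 16 simulated requests arrive at once
finished = 16.times.map { done.pop } # wait for all 16 responses
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

queue.close # tell the pool to shut down
pool.each(&:join)
puts format("16 requests on 16 threads: %.2fs (one at a time: 16 x 0.2 s = 3.2 s)", elapsed)
```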

The thread-pool model stops paying off when each connection needs a dedicated thread for a long time. WebSockets. Long-polling. Server-sent events. Each one of those holds a thread for minutes or hours, doing almost nothing – the connection lives long, even if there’s no real work happening on it. Each worker’s thread pool fills up with idle connections, and once every worker × every thread is taken, the next client waits.
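Shrinking the numbers makes the ceiling easy to see. In this sketch, a pool of 2 threads receives 3 long-lived "connections", each holding its thread for 0.3 s (a stand-in for an open WebSocket); the pool size and timings are arbitrary.

```ruby
def now = Process.clock_gettime(Process::CLOCK_MONOTONIC)

pool_size = 2
queue = Queue.new
t0 = now
start_times = {}
lock = Mutex.new

pool = pool_size.times.map do
  Thread.new do
    while (conn = queue.pop)
      lock.synchronize { start_times[conn] = now - t0 }
      sleep 0.3 # the connection stays open, holding this thread the whole time
    end
  end
end

3.times { |conn| queue << conn } # three long-lived clients arrive together
queue.close
pool.each(&:join)

start_times.sort.each { |conn, t| puts format("connection %d got a thread at %.1fs", conn, t) }
```

Connections 0 and 1 start immediately; connection 2 only gets a thread once one of the first two lets go.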

You can crank up the thread count – but threads aren’t free. Each one has its own stack (~1 MB by default), and each one competes on the GVL even when it’s just waking up to check a socket. There’s a ceiling, and it’s lower than you’d want for a chat server.
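CRuby exposes the default stack sizes it uses for threads and fibers through `RubyVM::DEFAULT_PARAMS` (CRuby-specific; the values can be overridden with environment variables such as `RUBY_THREAD_VM_STACK_SIZE`). These are reserved sizes, not necessarily resident memory, so treat the printout as an upper bound per thread or fiber.

```ruby
params = RubyVM::DEFAULT_PARAMS # CRuby only

%i[thread_vm_stack_size thread_machine_stack_size
   fiber_vm_stack_size fiber_machine_stack_size].each do |key|
  puts format("%-26s %6d KB", key, params.fetch(key) / 1024)
end
```

On a typical 64-bit build, a fiber's default VM stack is a fraction of a thread's.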

Falcon – fiber the world

Threads work great until your connections start lasting minutes or hours instead of milliseconds. Every WebSocket client past the thread-pool ceiling is a client waiting in line – that’s where Puma’s shape runs out. Falcon’s answer is to throw threads out entirely.

Falcon, like Puma, boots a fleet of forked workers by default – one per CPU core. The interesting part is what happens inside one worker: one OS thread, one event loop, and every accepted connection becomes a fiber.

Samuel Williams’ async gem implements the scheduler that makes this work. Falcon is built on top of it. The shape inside one worker:

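require 'async'
require 'socket'

# `app` (a Rack app) and `handle_request` are placeholders; neither is shown here.
Async do |task|
  server = TCPServer.new('0.0.0.0', 4000)

  loop do
    conn = server.accept # under the fiber scheduler this suspends the fiber, not the thread
    task.async do        # one child task (fiber) per connection
      handle_request(conn, app)
    ensure
      conn.close
    end
  end
end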

Figure: Falcon's production architecture. A master process forks one worker per CPU core, and each worker runs an event loop that spawns a fiber per connection.

That task.async do spawns a fiber per connection. When a fiber hits I/O, the scheduler suspends it and runs the next ready one. The kernel’s epoll/kqueue does the waiting; Ruby just walks the ready set.
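To demystify "the scheduler suspends it and runs the next ready one", here is a toy fiber scheduler built on Ruby's `Fiber.set_scheduler` and `Fiber.schedule` hooks. It is not how async works internally: `ToyScheduler` is invented for this sketch, assumes Ruby 3.1 or newer, and only understands `sleep`, which it uses as a stand-in for I/O.

```ruby
# A toy fiber scheduler, for illustration only. Real schedulers such as the
# one in the async gem also wait on sockets through epoll/kqueue.
class ToyScheduler
  def initialize
    @sleeping = {} # fiber => monotonic time at which it may run again
  end

  # Fiber.schedule calls this: create a non-blocking fiber and start it.
  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)
    fiber.resume
    fiber
  end

  # A non-blocking fiber called `sleep`: park it and hand control back.
  def kernel_sleep(duration = nil)
    raise NotImplementedError, "this toy needs a sleep duration" unless duration
    @sleeping[Fiber.current] = now + duration
    Fiber.yield
  end

  # Required by Fiber.set_scheduler, but not supported by this toy.
  def io_wait(io, events, timeout) = raise(NotImplementedError)
  def block(blocker, timeout = nil) = raise(NotImplementedError)
  def unblock(blocker, fiber) = raise(NotImplementedError)

  # Called when the scheduler is replaced or the thread exits: run the
  # event loop until every parked fiber has finished.
  def close
    until @sleeping.empty?
      fiber, wake_at = @sleeping.min_by { |_, time| time }
      delay = wake_at - now
      sleep(delay) if delay.positive? # the loop itself runs on the blocking main fiber
      @sleeping.delete(fiber)
      fiber.resume
    end
  end

  private

  def now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

log = []
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

Fiber.set_scheduler(ToyScheduler.new)
3.times do |i|
  Fiber.schedule do
    log << "start #{i}"
    sleep 0.2 # stand-in for waiting on a socket; suspends only this fiber
    log << "done #{i}"
  end
end
Fiber.set_scheduler(nil) # closes the toy scheduler, which drains its loop

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts log.inspect
puts format("three 0.2 s waits took %.2fs on one thread", elapsed)
```

All three fibers start before any of them finishes, and the three waits overlap on a single thread.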

That’s the answer to the workload Puma struggles with. Ten thousand WebSocket clients each sending a heartbeat every 30 seconds? Falcon yawns. Long-polling, SSE, anything mostly-idle – idle fibers cost almost nothing.

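Where Falcon stops paying off is CPU-bound work. Forking gives parallelism across cores, but inside a worker only one fiber executes Ruby at a time, and a fiber is never preempted: it keeps the thread until it hits I/O or finishes. Puma’s threads are preempted, because the VM periodically hands the GVL to another runnable thread, so a CPU-heavy request slows its neighbors down without stopping them. In Falcon, a single heavy request stalls every other connection on that worker until it’s done.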
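The sketch below shows that starvation with plain `Fiber` objects and a hand-written round-robin loop (no scheduler, no gems). The "I/O" fibers yield after every step, as a real fiber would when it waits on a socket; the CPU-bound one never reaches a point where it yields.

```ruby
log = []

# A fiber that "does I/O": it hands control back after every step.
io_fiber = ->(name) do
  Fiber.new do
    2.times do |i|
      log << "#{name} step #{i}"
      Fiber.yield # pretend we're waiting on a socket
    end
  end
end

# A CPU-bound fiber: no I/O, so it never yields.
cpu_fiber = Fiber.new do
  (1..2_000_000).sum { |i| i * i }
  log << "cpu done"
end

ready = [io_fiber.("a"), cpu_fiber, io_fiber.("b")]
until ready.empty?
  fiber = ready.shift
  fiber.resume
  ready << fiber if fiber.alive? # requeue fibers that yielded
end

puts log.inspect
# prints: ["a step 0", "cpu done", "b step 0", "a step 1", "b step 1"]
```

Fiber `b` takes its first step only after the whole CPU job has finished.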

The other thing Falcon asks of you is awareness. A blocking gem whose C extension doesn’t hook into the fiber scheduler freezes the whole worker – every other fiber stops with it. mysql2 used to bite people here before it grew scheduler-aware patches. The ecosystem is mostly fixed now, but the failure mode is real.

Three shapes, side by side

That’s the tour. Three servers, three primitives, three completely different shapes:

Server	Concurrency unit	How it serves more than one request	What it solves
Pitchfork	OS process	The kernel runs N copies of Ruby on N cores. Each worker is an isolated VM with its own GVL and serves one request at a time.	CPU parallelism with hard isolation between requests
Puma	OS thread	A pool of threads inside each worker process. The GVL serializes Ruby execution but is released on I/O, so threads can wait on databases and sockets concurrently.	I/O-bound web requests that spend most of their time waiting on the database, Redis, or HTTP calls
Falcon	Fiber + event loop	Thousands of fibers cooperatively scheduled by one event loop on one OS thread per worker. The kernel does the waiting via epoll/kqueue; Ruby just walks the ready set.	Thousands of mostly-idle, long-lived connections: real-time, chat, streaming

The GVL plays a completely different role in each. Pitchfork sidesteps it: every worker has its own GVL. Puma works around it by leaning on the fact that I/O releases the lock. Falcon makes it almost irrelevant, since only one fiber per worker is ever running Ruby at a time anyway.

Same Ruby. Same Rack. Three completely different ways to serve more than one request.

Why Ractors aren’t on this list

The fourth Ruby concurrency primitive is Ractors – the only way to run truly parallel Ruby in one process. Multiple Ractors, multiple cores, no shared GVL. They’ve been in the language since 3.0, and Ruby 4.0 finally makes them efficient and ergonomic enough to consider seriously.
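For a feel of the API, here is a minimal sketch that splits a CPU-bound sum across four Ractors. It assumes CRuby 3.0 or newer (expect an "experimental" warning); `ractor_result` is a small helper written here to cover the API change between Ruby 3.x (`Ractor#take`) and Ruby 4.0 (`Ractor#value`).

```ruby
# Returns a finished Ractor's block value on both older and newer Rubies.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

N = 2_000_000
step = N / 4
chunks = 4.times.map { |k| (k * step + 1)..((k + 1) * step) }

# Each Ractor runs under its own lock, so on a multi-core machine the
# chunks can be computed in parallel inside this one process.
ractors = chunks.map do |range|
  Ractor.new(range) { |r| r.sum { |i| i * i } } # `range` is passed in, not captured
end
total = ractors.sum { |ractor| ractor_result(ractor) }

puts total
```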

On paper, Ractors should simplify a lot. Pitchfork’s whole reason for existing is the engineering needed to keep forked workers sharing memory: the warm-mold promotion, the refork dance, all of it. Ractors would give you the same multi-core parallelism inside a single process, with no refork acrobatics required. So why hasn’t anyone built a Ractor-based web server?

Two reasons.

What Ractors require: strict isolation. Nothing mutable can be shared between Ractors; everything passed across has to be immutable (shareable) or copied. That’s the safety guarantee that lets each Ractor run under its own lock instead of one process-wide GVL.

A Rails app is full of mutable shared state: connection pools, class-level config, caches, gem-internal singletons. Making any one of those Ractor-safe is hard. Making the whole stack safe is a rewrite of the entire app. (byroot wrote the best deep dive on this; rails/rails#51543 tracks the practical side.)
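Here is that isolation rule in a few lines, as a sketch assuming CRuby 3.0 or newer. `CONFIG` plays the role of mutable class-level config; `FROZEN_CONFIG` and the `ractor_result` helper (which covers `Ractor#take` in 3.x versus `Ractor#value` in 4.0) are names made up for the example.

```ruby
# Mutable, process-wide config, the kind Rails apps and gems are full of.
CONFIG = { pool_size: 5 }
# The same data, deep-frozen so it can be shared between Ractors.
FROZEN_CONFIG = Ractor.make_shareable({ pool_size: 5 })

# Returns a finished Ractor's block value on both older and newer Rubies.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# 1. A non-main Ractor may not read a constant that points at a mutable object.
checker = Ractor.new do
  CONFIG[:pool_size]
rescue Ractor::IsolationError => e
  e.class.name
end
blocked = ractor_result(checker)

# 2. The deep-frozen version can be read from any Ractor.
shared = ractor_result(Ractor.new { FROZEN_CONFIG[:pool_size] })

# 3. Mutable arguments are deep-copied into the Ractor; its changes never reach the caller.
list = [1, 2, 3]
copied = ractor_result(Ractor.new(list) { |l| l << 4 })

puts "blocked=#{blocked} shared=#{shared} copied=#{copied} original=#{list}"
```

Every connection pool, cache, and config hash in a real app would need the treatment from step 2 or a redesign, which is why the whole stack is the hard part.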

What Ractors would actually buy you: in-process parallelism. N cores running Ruby in parallel, without forking N copies of the app – CPU parallelism and memory savings, in the same package. Both wins are real. Neither pays off for a web server.

Memory savings fail on cloud pricing. Memory and CPU come bundled in fixed ratios, typically 2 to 4 GB per vCPU. A typical Rails worker fleet uses far less memory than the cloud provider sells you with the CPU; most of the RAM in your bill is just sitting there. Memory savings only matter if they let you buy a smaller machine, and Rails apps are nowhere near that limit.
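The argument is easy to check with back-of-the-envelope arithmetic. Every number below is hypothetical (8 vCPUs, 2 GB per vCPU, a 0.5 GB worker), and `binding_constraint` is a helper written for this example; plug in your own instance shape and measured worker memory.

```ruby
# Which runs out first on one machine: cores or memory?
def binding_constraint(vcpus:, gb_per_vcpu:, worker_gb:)
  ram_gb = vcpus * gb_per_vcpu
  workers_by_cpu = vcpus                      # one worker per core
  workers_by_ram = (ram_gb / worker_gb).floor # how many workers would fit in RAM
  limit = workers_by_ram < workers_by_cpu ? :memory : :cpu
  { ram_gb: ram_gb, workers_by_cpu: workers_by_cpu, workers_by_ram: workers_by_ram, limit: limit }
end

result = binding_constraint(vcpus: 8, gb_per_vcpu: 2, worker_gb: 0.5)
puts result
```

With these numbers, the RAM would fit 32 workers but the cores only justify 8, so CPU is the binding constraint: shrinking each worker's memory raises `workers_by_ram` without changing the size of the machine you need.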

On the CPU side, Pitchfork already gives you N parallel cores via fork. Ractors would let you skip the refork acrobatics, sure – but they’d replace them with a full-app rewrite for Ractor safety. You’d be trading one engineering complexity for a much worse one.

You’d be rewriting half your app to claim savings you can’t spend, on a problem (in-process parallelism) Pitchfork already solves.

That doesn’t mean Ractors are dead. Background jobs, isolated batch work, parts of an app you can carve off and own – they fit well there. That’s what I’ll be exploring at RubyConf in November: what shifts when Ruby 4.0 makes Ractors cheap enough to use seriously, and where they start to make sense even if no one builds a Ractor-based server on top of them.

In the meantime, if you’ve been thinking of Ruby concurrency as "threads vs processes vs fibers," try thinking of it as Pitchfork vs Puma vs Falcon. The primitives are the language. The servers are the choices people actually made. That’s where the lessons are.
