Optimizing Rails API JSON generation

Lower your JSON rendering time with the help of PostgreSQL

JSON on Rails

JSON has become a ubiquitous data transfer format for web APIs. Whether you’re working on a third-party integration, or endpoints to serve data for rich clients (single page application, mobile application, or whatnot), chances are this data is flowing through the wire in a JSON payload.

Ruby’s dynamic nature makes hash maps and arrays the perfect structures to represent JSON data within the virtual machine. Their literal representations also make it very straightforward to write nested data common in JSON objects. So much so that Ruby’s standard library includes powerful tools to parse JSON documents into their in-memory data structures and the other way around.

require 'json'

user = {
  full_name: "Dexter Morgan",
  aliases: [ "The Bay Harbor Butcher" ],
  profession: "Blood Splatter Analyst",
  relatives: [
    { name: "Harry Morgan", relationship: "Adoptive father" },
    { name: "Debra Morgan", relationship: "Adoptive sister" },
    { name: "Rita Morgan", relationship: "Spouse" }
  ]
}
user_json = user.to_json 
# => converts the user Hash into a String containing the JSON document
user_from_json = JSON.parse(user_json, symbolize_names: true) 
# => converts the JSON document string into a Ruby object made out of Arrays, Hashes, and basic types.
user_from_json == user
# => true

Rails builds on top of this idea by implementing the #to_json method to ActiveRecord classes, which more often than not is where the data you’re sending through the wire comes from.

# models/person.rb
class Person
  has_many :pets
end

# models/pet.rb
class Pet
  belongs_to :owner, class_name: 'Person'
end

Person.find_by(name: 'Jon Snow').to_json
# => converts the model attributes to a JSON string, e.g.
# "{ \"id\": 1, \"full_name\": \"Json Snow\", \"created_at\": \"...\", \"updated_at\": \"...\" }"
Person.find_by(name: 'Jon Snow').as_json
# => returns the model attributes as a Hash object, e.g.
# { "id" => 1, "full_name" => "Json Snow", "created_at" => "...", updated_at => "..." }"

Things start to get more complicated when we need to compose data originating from multiple associated models. For that reason, Rails also provides tools to create involved JSON documents using view templates, similar to what we do with .html.erb files.

# controllers/people_controller.rb
class PeopleController < ApplicationController
  def index
    @people = Person.all
  end
end
# views/people/index.json.jbuilder
json.array! @people do |person|
  json.id person.id
  json.name person.full_name
  json.pets person.pets do |pet| # I'm aware of the N+1 problem here. Bear with me.
    json.id pet.id
    json.name pet.name
  end
end

Pretty simple, right?

The Issue

So you implemented an endpoint that sends a list of all people from the database exactly like the previous example in a couple of minutes, deployed it to production, and it works just fine. After some time, though, you notice the server is taking a lot more time to reply to that request and is timing out. You dig a little bit and find out that your database now has hundreds of thousands of people and pet records – which is hopefully a sign that your service is successfully attracting users.

Most programmers’ intuition will tell them that the database is taking too long to answer the queries, and they’d be partially right. In the view template above, I already mentioned an example of the N+1 problem, easily solvable by changing the controller code to @people = Person.includes(:pets).all, and that’d be correct.

For demonstration purposes, I created the application above, populated the database with 100.000 people, each associated with one to four pets at random. Without the includes call in the controller, the server took 4+ minutes to reply. After adding it, the time dropped to ~13 seconds. Way better, but 13 seconds is still too much, especially considering how simple the JSON structure is. Imagine if I had to incorporate details of previous appointments the pet had with the doctor or the history of orders the customer made in the pet shop. That would mean even more queries would hit the database, growing further the response time.

Optimizing the queries is something you should do anyway, but it won’t solve the real issue in this case. Let’s take a look at the rails log for the request that took 13 seconds:

Rendered people/index.json.jbuilder (Duration: 13169.9ms | Allocations: 12435170)
Completed 200 OK in 13175ms (Views: 13171.2ms | ActiveRecord: 1012.5ms | Allocations: 12436714)

Since we’re using the includes method, ActiveRecord makes only two queries, and the database took about 1s (ActiveRecord: 1012.5ms) to reply. The log message provides other useful information: most of the time was consumed rendering the index.json.builder template, and a bazillion allocations happened. But why does Rails report the view rendering duration as almost the entire request time? If rendering took 13s and fetching the data took 1s, shouldn’t the total time be close to 14s?

How do ActiveRecord and rendering work?

When you write @people = Person.all in your controller, ActiveRecord won’t touch the database just yet. This method call produces an instance of the ActiveRecord::Relation class, which describes the query you want to perform, and it’ll be realized only when the code tries to access data in the @people object. In this example, that’ll occur when iterating over @people in the index.json.jbuilder template. Rails starts rendering the template and needs to wait until the data arrives from the database, so ActiveRecord’s time is included in the view time.

To better understand where time is being spent, we’ll update the controller to @people = Person.includes(:pets).all.to_a. Now we’re realizing the query early, still inside the controller, and passing an array of models to the renderer. After this change, the times logged are:

Rendered people/index.json.jbuilder (Duration: 7896.5ms | Allocations: 9417904)
Completed 200 OK in 13279ms (Views: 7897.6ms | ActiveRecord: 1105.6ms | Allocations: 12436005)

The request duration is still in the same ballpark as before, as is the ActiveRecord time, which makes sense, and the Views rendering time shrunk considerably. It’s still very high, but at least it’s more realistic. But wait! This doesn’t add up. If Rails is taking ~8s to render the view and ~1s to load the data, what’s adding the extra ~4s?

ActiveRecord does a lot more than just retrieving data from the database. It creates the SQL query based on the ActiveRecord::Relation for the appropriate database driver, orchestrates the acquisition of a connection from the pool, worries about SQL injection, and several other operations intertwined with glue code. Running the following commands in a Rails console, we can roughly mimic and benchmark the ActiveRecord behavior for this controller:

require 'benchmark'

Benchmark.measure do
  conn = ActiveRecord::Base.connection       # acquires a connection from the pool
  people_sql = Person.all.to_sql             # creates the SQL query ...
  people = conn.execute(people_sql)          # ... and executes it
  ids = people.map {|p| p["id"]}             # iterates over the results, collecting ids
  pets_sql = Pet.where(owner_id: ids).to_sql # creates the second SQL query ...
  conn.execute(pets_sql)                     # ... and executes it
end.real

This snippet results in something between 1s and 1.2s, matching what Rails reported as ActiveRecord time before. There’s an essential step missing in it, though: instantiating model objects.

The instruction conn.execute(people_sql) will return what is called a result set – an array of hashes in the format [{"id": 1, "full_name": "John Doe", "created_at": ..., "updated_at": ...}, ...] – which will then be used to instantiate the model objects. The same happens for the conn.execute(pets_sql) call and lastly ActiveRecord will iterate over all Person instances created to associate the proper Pet instances.

This overly long section’s goal was to show you that reading the database’s records is not the most expensive task this controller performs; transforming it is. The driver converts the database reply into a result set, ActiveRecord converts them into model instances, which are converted into a JSON document by the view template. The last two conversions make up the bulk of the request time. We could employ several tactics to make those operations faster. Replacing Jbuilder with some other template rendering library, adding some cache, avoiding model instantiation by using the pluck method, to name a few — all of which have their merits — but I’m going to present a different solution that aims to remove as many data transformations from the application code as possible.

PostgreSQL to the rescue

Our controller job is to get some data from the database and deliver it as a JSON document. Wouldn’t it be nice if the database could reply with the data we need already formatted as JSON? Of course, it would and, of course, it can. I’ll use PostgreSQL for the demonstrations from this point on, but it’s possible to transpose the same ideas to several modern relational databases.

Since version 9.2, the PostgreSQL database supports the JSON data type. The most apparent use for it is to store JSON documents in a column of that type:

create table "requests"(
    "id" serial primary key,
    "method" varchar,
    "path" varchar,
    "payload" json
);
insert into "requests" ("method", "path", "payload")
values ('POST', '/people', '{"full_name": "Jane Doe"}'::json);

select payload from "requests" where id = 1;
/*           payload           */
/* --------------------------- */
/*  {"full_name": "Jane Doe"}  */

 select payload->'full_name' as full_name from "requests" where id = 1;
/*   full_name  */
/* ------------ */
/*  "Jane Doe"  */

Rails integrates fully with Postgre’s JSON data type. With the default model definition for the table above, we’re ready to manipulate the data.

class Request < ApplicationRecord; end

req = Request.last

req.payload # => { "full_name" => "Jane Doe" }
req.payload["address"] = "0 Fools Street"
req.save # => saves the Hash back as JSON in the payload column

However, you don’t need a JSON column in order to create JSON objects in PostgreSQL queries:

/* you could create the JSON object by concatenating string, but that would be lame and error prone */
select '{"name": "Dexter Morgan"}'::json; 
/* PostgreSQL provides several functions to create JSON data. Some examples are */
select row_to_json((1,"John"));
select json_build_object("full_name", "Dexter Morgan");
select json_build_array(1, 2, 3, 4)

With that in mind, let’s create a SQL query that returns people records as JSON objects:

select row_to_json(p.*) as doc
from people as p;
                                                             doc
----------------------------------------------------------------------------------------------------------------------------
 {"id":1,"full_name":"Jon Snow","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
 {"id":2,"full_name":"April O'Neil","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
 {"id":3,"full_name":"Dooley","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
 {"id":4,"full_name":"Will Rodman","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
 {"id":5,"full_name":"Jennifer Grogan","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
 ...

That works smoothly, but looking at the original Jbuilder template, we notice we only need the id and full_name columns, and the latter should be called name instead. I could write a subselect with a projection filtering and renaming the column to accomplish this (I’ll leave that as an exercise to you). Instead, I’ll use the json_build_object function.

 select
     json_build_object(
        'id', p.id,
        'name', p.full_name
     ) as doc
from people as p;
                   doc
----------------------------------------
 {"id" : 1, "name" : "Jon Snow"}
 {"id" : 2, "name" : "April O'Neil"}
 {"id" : 3, "name" : "Dooley"}
 {"id" : 4, "name" : "Will Rodman"}
 {"id" : 5, "name" : "Jennifer Grogan"}
 ...

We are almost there! Now we need to attach, somehow, a list of the associated pets to each person. That’s where we introduce some joins and aggregations.

select
    json_build_object(
        'id' , people.id, 
        'name', people.full_name, 
        'pets', array_to_json(
            array_agg(
                json_build_object(    
                    'id', pets.id, 
                    'name', pets.name
                )
            )
        )
    ) as doc 
from people 
left join pets on pets.owner_id = people.id
group by people.id;

Finally, we have the response in the right format:

                                                                                          doc
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"id" : 1, "name" : "Jon Snow", "pets" : [{"id" : 1, "name" : "Ghost"}]}
 {"id" : 2, "name" : "April O'Neil", "pets" : [{"id" : 2, "name" : "Leonardo"},{"id" : 3, "name" : "Michelangelo"},{"id" : 5, "name" : "Raphael"},{"id" : 4, "name" : "Donatello"}]}
 {"id" : 3, "name" : "Dooley", "pets" : [{"id" : 6, "name" : "Jerry Lee"}]}
 {"id" : 4, "name" : "Will Rodman", "pets" : [{"id" : 7, "name" : "Caesar"}]}
 {"id" : 5, "name" : "Jennifer Grogan", "pets" : [{"id" : 8, "name" : "Marley"}]}
 ...

And we are done! Well, not quite.

Connecting the query to Rails

We have a SQL query that delivers the rows organized as we require, but we still need to connect this to our Rails application. One could suggest directly slapping this entire SQL as a String into some Ruby file (or maybe a .sql file read by such class) and perform the raw query using ActiveRecord. I think there’s a better way.

One beauty of ActiveRecord is that it examines the database schema to define the fields accessible by the model classes (some would argue that this is one of its downsides, but that’s a whole different topic). All we need is a table with the previous query result and a new Rails model pointing to it. It doesn’t even need to be an actual table; anything that looks like one – a relation in relational database parlance – will do. Let’s turn this query into a view then, so we don’t need to worry about refreshing the results whenever we add or update something in the people and pets tables.

create view json_people as
select
    people.id as person_id, /* this is going to be useful later */
    json_build_object(
        'id' , people.id,
        'name', people.full_name,
        'pets', array_to_json(
            array_agg(
                json_build_object(
                    'id', pets.id,
                    'name', pets.name
                )
            )
        )
    ) as doc
from people
left join pets on pets.owner_id = people.id
group by people.id;

Rails convention over configuration philosophy allows us to quickly connect a model to this view by naming the class appropriately.

class JsonPerson < ApplicationRecord; end

JsonPerson.first.doc
# => {"id"=>1, "name"=>"Jon Snow", "pets"=>[{"id"=>1, "name"=>"Ghost"}]}

Neat! With just that, we can access the doc column from the view as a JSON representation built of hashes and arrays. Rewriting the controller action to make use of that, we get:

def index
  people = JsonPerson.all.to_a
  render json: people
end

Please note that it now renders the JSON calling the render method directly from the action instead of using a Jbuilder template. It’s safe to delete the template altogether. Did this change bring us any performance gain?

Evaluating the performance and further improving it

Completed 200 OK in 6223ms (Views: 6221.1ms | ActiveRecord: 501.9ms | Allocations: 11204662)

We’re down to ~6s! Less than 50% of the original request time (I’m not even counting the N+1 version taking more than 4 minutes; that was just silly). There are still a lot of allocations happening, though. That’s the hint from where we’ll shave off a lot more time.

You see, the database has the notion of a JSON data type. It knows that the doc column is of that type. However, at the transport level, JSON is always serialized as text. The ActiveRecord database driver will receive the data as a String, but since it reads the database schema, it also knows the column is supposed to be a JSON and converts it to its Ruby representation. Finally, the controller transforms this representation back into a String to send in the HTTP response.

Hence the high number of allocations. The driver allocates a result set with at least as many objects as the number of records returned by the query. Then, ActiveRecord instantiates an even more significant number of hashes and arrays.

Adding a cast to the view query, changing the column type to text or varchar, is a possible solution, yet it’s useful to have the data as a structured object in Rails. Besides, we still want to clump all objects together into a list of JSON objects, not a list of Strings representing JSON objects. The idea is to tell the database to merge all rows into a single JSON array and tell ActiveRecord to treat the entire thing as text. I’ll do that by declaring a new method that makes the mentioned cast in the JsonPerson model.

# models/json_person.rb
def self.to_json_array
  projection = Arel.sql('array_to_json(array_agg(json_people.doc))::text as arr_text')
  pluck(projection).first
end

Using strings to build SQL queries is something I usually oppose. One could use Arel to construct this portion instead, but I don’t want to overcomplicate the example.

Be aware that this method returns a single result containing a string with the whole JSON we need to send, not Ruby Hashes or Arrays. Because of that, we should update the controller action to use render :plain and ensure we pass in "application/json" as the content-type option.

# controllers/people_controller.rb
def index
  prople = JsonPerson.to_json_array
  render plain: people, content_type: 'application/json'
end

The final result

Time to see how it fares against the previous iterations of this controller.

Rendered text template (Duration: 0.1ms | Allocations: 26)
Completed 200 OK in 566ms (Views: 0.4ms | ActiveRecord: 562.1ms | Allocations: 1008

Yes, I swear I’m not lying. In about half a second, this request gives us the same 15MB of JSON as the original request that took 13s, without any caching, recreating it from live data in the database on the fly. Ain’t that cool?

Being the observant reader you are, you might have already noticed that this solution is not very flexible. Right now, it always includes all rows present in the json_people table in the resulting JSON array. What if we need to fetch only the customers named “Jon”? That’s the reason I included the person_id column in the json_people view. From ActiveRecord’s perspective, this “table” works like any other table, and naming the column person_id enables us to create an association to the people table with minimal configuration.

# models/json_person.rb
class JsonPerson < ApplicationRecord
  belongs_to :person

  scope :by_name, lambda { |name|
    joins(:person)
      .where('people.full_name ilike ?', "%#{name}%")
  }

  def self.to_json_array
    projection = Arel.sql('array_to_json(array_agg(json_people.doc))::text as arr_text')
    pluck(projection).first 
  end
end

As seen in the snippet above, adding the association comes with all the perks Rails syntax grants. Here’s the new controller action that uses the created scope:

# controllers/people_controller.rb
def index
  scope = JsonPerson.all
  scope = scope.by_name(params[:n]) if params[:n].present?
  people = scope.to_json_array

  render plain: people, content_type: 'application/json'
end

Final Thoughts

Ruby and Rails are fantastic tools for web development, but not without their shortcomings. Processing large amounts of data can quickly grow memory-intensive and very time-consuming, which for a web application can become unbearable. We usually work around it by deferring the workload to a background job or chunking the data into smaller manageable pieces (pagination), however, sometimes those options aren’t available.

The examples presented shouldn’t be interpreted as a strict guide on using the database to produce JSON data. You can enhance several things, e.g., moving pets data from json_people view into another one adding modularity and reusability, and configuring your Rails application to keep track of schema changes in a SQL file rather than a Ruby file, allowing it to ensure the views exist.

Hopefully, this article will serve as food for thought on better using the tools at your disposal.

We are hiring new talents. Do you want to work with us? become@codeminer42.com