JSON has become a ubiquitous data transfer format for web APIs. Whether you’re working on a third-party integration, or endpoints to serve data for rich clients (single page application, mobile application, or whatnot), chances are this data is flowing through the wire in a JSON payload.
Ruby’s dynamic nature makes hash maps and arrays the perfect structures to represent JSON data within the virtual machine. Their literal representations also make it very straightforward to write nested data common in JSON objects. So much so that Ruby’s standard library includes powerful tools to parse JSON documents into their in-memory data structures and the other way around.
require 'json'
user = {
  full_name: "Dexter Morgan",
  aliases: ["The Bay Harbor Butcher"],
  profession: "Blood Splatter Analyst",
  relatives: [
    { name: "Harry Morgan", relationship: "Adoptive father" },
    { name: "Debra Morgan", relationship: "Adoptive sister" },
    { name: "Rita Morgan", relationship: "Spouse" }
  ]
}
user_json = user.to_json
# => converts the user Hash into a String containing the JSON document
user_from_json = JSON.parse(user_json, symbolize_names: true)
# => converts the JSON document string into a Ruby object made out of Arrays, Hashes, and basic types.
user_from_json == user
# => true
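The symbolize_names: true option in the snippet above is what makes the round trip compare equal. A quick sketch of what happens without it:

```ruby
require 'json'

user = { full_name: "Dexter Morgan" }
user_json = user.to_json              # JSON serializes Symbol keys as strings

# Parsed without options, keys come back as Strings, so the round trip
# does NOT compare equal to the original symbol-keyed Hash.
from_json = JSON.parse(user_json)
from_json == user        # => false
from_json["full_name"]   # => "Dexter Morgan"

# symbolize_names: true converts the keys back to Symbols on the way in.
JSON.parse(user_json, symbolize_names: true) == user  # => true
```

Keep this in mind whenever you compare parsed JSON against Ruby literals: JSON itself has no Symbol type, so the key class is decided at parse time.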
Rails builds on top of this idea by adding the #to_json method to ActiveRecord models, which, more often than not, are where the data you're sending through the wire comes from.
# models/person.rb
class Person < ApplicationRecord
  has_many :pets
end

# models/pet.rb
class Pet < ApplicationRecord
  belongs_to :owner, class_name: 'Person'
end
Person.find_by(full_name: 'Jon Snow').to_json
# => converts the model attributes to a JSON string, e.g.
# "{ \"id\": 1, \"full_name\": \"Jon Snow\", \"created_at\": \"...\", \"updated_at\": \"...\" }"
Person.find_by(full_name: 'Jon Snow').as_json
# => returns the model attributes as a Hash object, e.g.
# { "id" => 1, "full_name" => "Jon Snow", "created_at" => "...", "updated_at" => "..." }
Things start to get more complicated when we need to compose data originating from multiple associated models. For that reason, Rails also provides tools to create involved JSON documents using view templates, similar to what we do with .html.erb files.
# controllers/people_controller.rb
class PeopleController < ApplicationController
  def index
    @people = Person.all
  end
end
# views/people/index.json.jbuilder
json.array! @people do |person|
  json.id person.id
  json.name person.full_name
  json.pets person.pets do |pet| # I'm aware of the N+1 problem here. Bear with me.
    json.id pet.id
    json.name pet.name
  end
end
Pretty simple, right?
The Issue
So you implemented an endpoint that sends a list of all people from the database exactly like the previous example in a couple of minutes, deployed it to production, and it works just fine. After some time, though, you notice the server is taking a lot more time to reply to that request and is timing out. You dig a little bit and find out that your database now has hundreds of thousands of people and pet records – which is hopefully a sign that your service is successfully attracting users.
Most programmers’ intuition will tell them that the database is taking too long to answer the queries, and they’d be partially right. In the view template above, I already pointed out an example of the N+1 problem, easily solvable by changing the controller code to @people = Person.includes(:pets).all.
For demonstration purposes, I created the application above and populated the database with 100,000 people, each associated with one to four pets at random. Without the includes call in the controller, the server took 4+ minutes to reply. After adding it, the time dropped to ~13 seconds. Way better, but 13 seconds is still too much, especially considering how simple the JSON structure is. Imagine if I had to incorporate details of previous appointments the pet had with the doctor, or the history of orders the customer made in the pet shop. That would mean even more queries hitting the database, growing the response time further.
Optimizing the queries is something you should do anyway, but it won’t solve the real issue in this case. Let’s take a look at the Rails log for the request that took 13 seconds:
Rendered people/index.json.jbuilder (Duration: 13169.9ms | Allocations: 12435170)
Completed 200 OK in 13175ms (Views: 13171.2ms | ActiveRecord: 1012.5ms | Allocations: 12436714)
Since we’re using the includes method, ActiveRecord makes only two queries, and the database took about 1s (ActiveRecord: 1012.5ms) to reply. The log message provides other useful information: most of the time was consumed rendering the index.json.jbuilder template, and a bazillion allocations happened. But why does Rails report the view rendering duration as almost the entire request time? If rendering took 13s and fetching the data took 1s, shouldn’t the total time be close to 14s?
How do ActiveRecord and rendering work?
When you write @people = Person.all in your controller, ActiveRecord won’t touch the database just yet. This method call produces an instance of the ActiveRecord::Relation class, which describes the query you want to perform, and it’ll be realized only when the code tries to access data in the @people object. In this example, that’ll occur when iterating over @people in the index.json.jbuilder template. Rails starts rendering the template and needs to wait until the data arrives from the database, so ActiveRecord’s time is included in the view time.
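To make the deferred-execution idea concrete, here is a toy pure-Ruby sketch. LazyQuery is a hypothetical class for illustration only, not ActiveRecord internals; it only captures the "store the query now, run it on first access" behavior:

```ruby
# A toy sketch of deferred execution, very loosely mimicking how an
# ActiveRecord::Relation postpones its database query.
class LazyQuery
  attr_reader :executed

  def initialize(&block)
    @block = block     # the "query" is stored here, not executed
    @executed = false
  end

  def to_a
    @executed = true   # the first data access triggers the work
    @block.call
  end
end

relation = LazyQuery.new { [{ id: 1, full_name: "Jon Snow" }] }
relation.executed  # => false -- nothing has run yet, just like Person.all
rows = relation.to_a   # the "query" is realized here
relation.executed  # => true
```

In the real framework, the trigger is any method that needs records (each, to_a, first, and friends), which is why the query fires inside the view instead of the controller.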
To better understand where time is being spent, we’ll update the controller to @people = Person.includes(:pets).all.to_a. Now we’re realizing the query early, still inside the controller, and passing an array of models to the renderer. After this change, the times logged are:
Rendered people/index.json.jbuilder (Duration: 7896.5ms | Allocations: 9417904)
Completed 200 OK in 13279ms (Views: 7897.6ms | ActiveRecord: 1105.6ms | Allocations: 12436005)
The request duration is still in the same ballpark as before, as is the ActiveRecord time, which makes sense, and the Views rendering time shrank considerably. It’s still very high, but at least it’s more realistic. But wait! This doesn’t add up. If Rails is taking ~8s to render the view and ~1s to load the data, what’s adding the extra ~4s?
ActiveRecord does a lot more than just retrieving data from the database. It creates the SQL query based on the ActiveRecord::Relation for the appropriate database driver, orchestrates the acquisition of a connection from the pool, worries about SQL injection, and performs several other operations intertwined with glue code. Running the following commands in a Rails console, we can roughly mimic and benchmark the ActiveRecord behavior for this controller:
require 'benchmark'

Benchmark.measure do
  conn = ActiveRecord::Base.connection       # acquires a connection from the pool
  people_sql = Person.all.to_sql             # creates the SQL query ...
  people = conn.execute(people_sql)          # ... and executes it
  ids = people.map { |p| p["id"] }           # iterates over the results, collecting ids
  pets_sql = Pet.where(owner_id: ids).to_sql # creates the second SQL query ...
  conn.execute(pets_sql)                     # ... and executes it
end.real
This snippet results in something between 1s and 1.2s, matching what Rails reported as ActiveRecord time before. There’s an essential step missing in it, though: instantiating model objects.
The instruction conn.execute(people_sql) will return what is called a result set – an array of hashes in the format [{"id" => 1, "full_name" => "John Doe", "created_at" => ..., "updated_at" => ...}, ...] – which will then be used to instantiate the model objects. The same happens for the conn.execute(pets_sql) call, and lastly ActiveRecord will iterate over all Person instances created to associate the proper Pet instances.
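A rough pure-Ruby sketch of that missing step. PersonModel and the hard-coded result set are hypothetical stand-ins; real ActiveRecord instantiation also performs type casting, attribute tracking, and association wiring, which is why it costs far more than this:

```ruby
# Bare stand-in for a model class, for illustration only.
PersonModel = Struct.new(:id, :full_name)

# What the driver hands back: one Hash per row (a made-up sample).
result_set = [
  { "id" => 1, "full_name" => "John Doe" },
  { "id" => 2, "full_name" => "Jane Doe" }
]

# The missing step: at least one more object allocated per row, layered
# on top of the hashes the driver already built.
people = result_set.map { |row| PersonModel.new(row["id"], row["full_name"]) }
people.first.full_name  # => "John Doe"
```

Multiply that per-row cost by 100,000 people plus their pets and the allocation counts in the log start to look less mysterious.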
This overly long section’s goal was to show you that reading the database’s records is not the most expensive task this controller performs; transforming them is. The driver converts the database reply into a result set, ActiveRecord converts that into model instances, and the view template converts those into a JSON document. The last two conversions make up the bulk of the request time. We could employ several tactics to make those operations faster – replacing Jbuilder with some other template rendering library, adding some cache, avoiding model instantiation by using the pluck method, to name a few – all of which have their merits, but I’m going to present a different solution that aims to remove as many data transformations from the application code as possible.
PostgreSQL to the rescue
Our controller’s job is to get some data from the database and deliver it as a JSON document. Wouldn’t it be nice if the database could reply with the data we need already formatted as JSON? Of course it would and, of course, it can. I’ll use PostgreSQL for the demonstrations from this point on, but it’s possible to transpose the same ideas to several modern relational databases.
Since version 9.2, PostgreSQL supports the JSON data type. The most apparent use for it is to store JSON documents in a column of that type:
create table "requests" (
  "id" serial primary key,
  "method" varchar,
  "path" varchar,
  "payload" json
);

insert into "requests" ("method", "path", "payload")
values ('POST', '/people', '{"full_name": "Jane Doe"}'::json);
select payload from "requests" where id = 1;
/* payload */
/* --------------------------- */
/* {"full_name": "Jane Doe"} */
select payload->'full_name' as full_name from "requests" where id = 1;
/* full_name */
/* ------------ */
/* "Jane Doe" */
Rails integrates fully with PostgreSQL’s JSON data type. With the default model definition for the table above, we’re ready to manipulate the data.
class Request < ApplicationRecord; end
req = Request.last
req.payload # => { "full_name" => "Jane Doe" }
req.payload["address"] = "0 Fools Street"
req.save # => saves the Hash back as JSON in the payload column
However, you don’t need a JSON column in order to create JSON objects in PostgreSQL queries:
/* you could create the JSON object by concatenating strings, but that would be lame and error-prone */
select '{"name": "Dexter Morgan"}'::json;

/* PostgreSQL provides several functions to create JSON data. Some examples are: */
select row_to_json(row(1, 'John'));
select json_build_object('full_name', 'Dexter Morgan');
select json_build_array(1, 2, 3, 4);
With that in mind, let’s create a SQL query that returns people records as JSON objects:
select row_to_json(p.*) as doc
from people as p;
doc
----------------------------------------------------------------------------------------------------------------------------
{"id":1,"full_name":"Jon Snow","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
{"id":2,"full_name":"April O'Neil","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
{"id":3,"full_name":"Dooley","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
{"id":4,"full_name":"Will Rodman","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
{"id":5,"full_name":"Jennifer Grogan","created_at":"2021-02-18T01:21:28.139544","updated_at":"2021-02-18T01:21:28.139544"}
...
That works smoothly, but looking at the original Jbuilder template, we notice we only need the id and full_name columns, and the latter should be called name instead. I could write a subselect with a projection filtering and renaming the column to accomplish this (I’ll leave that as an exercise to you). Instead, I’ll use the json_build_object function.
select
  json_build_object(
    'id', p.id,
    'name', p.full_name
  ) as doc
from people as p;
doc
----------------------------------------
{"id" : 1, "name" : "Jon Snow"}
{"id" : 2, "name" : "April O'Neil"}
{"id" : 3, "name" : "Dooley"}
{"id" : 4, "name" : "Will Rodman"}
{"id" : 5, "name" : "Jennifer Grogan"}
...
We are almost there! Now we need to attach, somehow, a list of the associated pets to each person. That’s where we introduce some joins and aggregations.
select
  json_build_object(
    'id', people.id,
    'name', people.full_name,
    'pets', array_to_json(
      array_agg(
        json_build_object(
          'id', pets.id,
          'name', pets.name
        )
      )
    )
  ) as doc
from people
left join pets on pets.owner_id = people.id
group by people.id;
Finally, we have the response in the right format:
doc
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"id" : 1, "name" : "Jon Snow", "pets" : [{"id" : 1, "name" : "Ghost"}]}
{"id" : 2, "name" : "April O'Neil", "pets" : [{"id" : 2, "name" : "Leonardo"},{"id" : 3, "name" : "Michelangelo"},{"id" : 5, "name" : "Raphael"},{"id" : 4, "name" : "Donatello"}]}
{"id" : 3, "name" : "Dooley", "pets" : [{"id" : 6, "name" : "Jerry Lee"}]}
{"id" : 4, "name" : "Will Rodman", "pets" : [{"id" : 7, "name" : "Caesar"}]}
{"id" : 5, "name" : "Jennifer Grogan", "pets" : [{"id" : 8, "name" : "Marley"}]}
...
And we are done! Well, not quite.
Connecting the query to Rails
We have a SQL query that delivers the rows organized as we require, but we still need to connect it to our Rails application. One could suggest slapping this entire SQL query as a String into some Ruby class (or maybe a .sql file read by such a class) and performing the raw query using ActiveRecord. I think there’s a better way.
One beauty of ActiveRecord is that it examines the database schema to define the fields accessible by the model classes (some would argue that this is one of its downsides, but that’s a whole different topic). All we need is a table with the previous query result and a new Rails model pointing to it. It doesn’t even need to be an actual table; anything that looks like one – a relation in relational database parlance – will do. Let’s turn this query into a view then, so we don’t need to worry about refreshing the results whenever we add or update something in the people and pets tables.
create view json_people as
select
  people.id as person_id, /* this is going to be useful later */
  json_build_object(
    'id', people.id,
    'name', people.full_name,
    'pets', array_to_json(
      array_agg(
        json_build_object(
          'id', pets.id,
          'name', pets.name
        )
      )
    )
  ) as doc
from people
left join pets on pets.owner_id = people.id
group by people.id;
Rails’ convention-over-configuration philosophy allows us to quickly connect a model to this view by naming the class appropriately.
class JsonPerson < ApplicationRecord; end
JsonPerson.first.doc
# => {"id"=>1, "name"=>"Jon Snow", "pets"=>[{"id"=>1, "name"=>"Ghost"}]}
Neat! With just that, we can access the doc column from the view as a JSON representation built of hashes and arrays. Rewriting the controller action to make use of that, we get:
def index
  people = JsonPerson.all.to_a
  render json: people
end
Please note that the action now renders the JSON by calling the render method directly instead of using a Jbuilder template. It’s safe to delete the template altogether. Did this change bring us any performance gain?
Evaluating the performance and further improving it
Completed 200 OK in 6223ms (Views: 6221.1ms | ActiveRecord: 501.9ms | Allocations: 11204662)
We’re down to ~6s! Less than 50% of the original request time (I’m not even counting the N+1 version taking more than 4 minutes; that was just silly). There are still a lot of allocations happening, though. That’s the hint from where we’ll shave off a lot more time.
You see, the database has the notion of a JSON data type. It knows that the doc column is of that type. However, at the transport level, JSON is always serialized as text. The ActiveRecord database driver will receive the data as a String, but since it reads the database schema, it also knows the column is supposed to be JSON and converts it to its Ruby representation. Finally, the controller transforms this representation back into a String to send in the HTTP response.
Hence the high number of allocations. The driver allocates a result set with at least as many objects as the number of records returned by the query. Then, ActiveRecord instantiates an even greater number of hashes and arrays.
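The wasted round trip can be sketched in plain Ruby with the standard json library. The wire_text literal below is a made-up sample row in the shape of the json_people view's doc column:

```ruby
require 'json'

# What the database sends over the wire: the JSON document, already text.
wire_text = '{"id":1,"name":"Jon Snow","pets":[{"id":1,"name":"Ghost"}]}'

# ActiveRecord, knowing the column type, parses it into Hashes and Arrays...
doc = JSON.parse(wire_text)

# ...and the controller's `render json:` serializes it right back to text.
response_body = doc.to_json

# Both strings carry the same document; the intermediate objects were
# allocated only to be discarded.
JSON.parse(response_body) == JSON.parse(wire_text)  # => true
```

Every hash and array in that intermediate representation is an allocation we pay for without gaining anything, since nothing inspects or modifies the data between parse and serialize.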
Adding a cast to the view query, changing the column type to text or varchar, is a possible solution, yet it’s useful to have the data as a structured object in Rails. Besides, we still want to clump all objects together into a single JSON array, not a list of Strings representing JSON objects. The idea is to tell the database to merge all rows into a single JSON array and tell ActiveRecord to treat the entire thing as text. I’ll do that by declaring a new method in the JsonPerson model that performs the mentioned cast.
# models/json_person.rb
def self.to_json_array
  projection = Arel.sql('array_to_json(array_agg(json_people.doc))::text as arr_text')
  pluck(projection).first
end
Using strings to build SQL queries is something I usually oppose. One could use Arel to construct this portion instead, but I don’t want to overcomplicate the example.
Be aware that this method returns a single result containing a string with the whole JSON we need to send, not Ruby Hashes or Arrays. Because of that, we should update the controller action to use render :plain and ensure we pass in "application/json" as the content-type option.
# controllers/people_controller.rb
def index
  people = JsonPerson.to_json_array
  render plain: people, content_type: 'application/json'
end
The final result
Time to see how it fares against the previous iterations of this controller.
Rendered text template (Duration: 0.1ms | Allocations: 26)
Completed 200 OK in 566ms (Views: 0.4ms | ActiveRecord: 562.1ms | Allocations: 1008)
Yes, I swear I’m not lying. In about half a second, this request gives us the same 15MB of JSON as the original request that took 13s, without any caching, recreating it from live data in the database on the fly. Ain’t that cool?
Being the observant reader you are, you might have already noticed that this solution is not very flexible. Right now, it always includes all rows present in the json_people view in the resulting JSON array. What if we need to fetch only the customers named "Jon"? That’s the reason I included the person_id column in the json_people view. From ActiveRecord’s perspective, this "table" works like any other table, and naming the column person_id enables us to create an association to the people table with minimal configuration.
# models/json_person.rb
class JsonPerson < ApplicationRecord
  belongs_to :person

  scope :by_name, lambda { |name|
    joins(:person)
      .where('people.full_name ilike ?', "%#{name}%")
  }

  def self.to_json_array
    projection = Arel.sql('array_to_json(array_agg(json_people.doc))::text as arr_text')
    pluck(projection).first
  end
end
As seen in the snippet above, adding the association comes with all the perks Rails syntax grants. Here’s the new controller action that uses the created scope:
# controllers/people_controller.rb
def index
  scope = JsonPerson.all
  scope = scope.by_name(params[:n]) if params[:n].present?
  people = scope.to_json_array
  render plain: people, content_type: 'application/json'
end
Final Thoughts
Ruby and Rails are fantastic tools for web development, but not without their shortcomings. Processing large amounts of data can quickly grow memory-intensive and very time-consuming, which for a web application can become unbearable. We usually work around it by deferring the workload to a background job or chunking the data into smaller, manageable pieces (pagination); however, sometimes those options aren’t available.
The examples presented shouldn’t be interpreted as a strict guide on using the database to produce JSON data. You can enhance several things, e.g., moving the pets data from the json_people view into another view to add modularity and reusability, or configuring your Rails application to keep track of schema changes in a SQL file rather than a Ruby file, allowing it to ensure the views exist.
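On that last point, a minimal sketch of the relevant setting: Rails can dump the schema as db/structure.sql instead of db/schema.rb via the schema_format configuration, so database objects like the json_people view are captured and recreated by schema loads. YourApp is a placeholder for your application's module name:

```ruby
# config/application.rb
# With :sql, `rails db:migrate` dumps db/structure.sql, which preserves
# views, triggers, and other objects db/schema.rb cannot represent.
module YourApp
  class Application < Rails::Application
    config.active_record.schema_format = :sql
  end
end
```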
Hopefully, this article will serve as food for thought on better using the tools at your disposal.