Modeling data with DynamoDB

In this article, we will model and query a simple e-commerce domain using the NoSQL database DynamoDB, covering its main concepts along the way. Before we start modeling our domain, we will look at the concept and process of data modeling itself, to get some theoretical background for what comes next.

Data Modeling, a short walk back in time

According to the book "Database Modeling and Design," data modeling can be defined as the logical component of database design. It is important enough to have been studied since the 1960s, when Charles Bachman (American winner of the 1973 Turing Award) formalized schema diagrams, using rectangles to denote record types and arrows to represent the relationships between those records.

Besides Bachman, other software engineers were very influential in the data modeling process: Peter Chen, and the duo Grady Booch and James Rumbaugh. In 1976, Chen presented the entity-relationship (ER) approach. Similarly to Bachman, he uses rectangles to represent records, and adds diamonds to represent the various types of relationships, which are differentiated by numbers or letters placed on the lines connecting the diamonds to the rectangles. In 1997, Booch and Rumbaugh introduced the Unified Modeling Language (UML), which would become the standard graphical language for specifying and documenting large-scale software systems.

In summary, data modeling is primarily aimed at simplicity and readability. Approaches such as ER and UML help capture the requirements of our domain in a way that is understandable both to us, the developers, and to those we need to generate value for, our end customers.

What are we going to model?

As mentioned before, we will model a simple e-commerce domain consisting of three entities: User, Order, and Order Item. But first, we need to ask questions of our domain that will help us model our entities, their relationships, and their attributes. For this article, we'll work with these three questions:

  1. What are the orders of a particular user?
  2. Which items belong to which order?
  3. Which addresses belong to a given user?
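Before bringing in any database, the three questions above can be sketched as plain functions over in-memory data. This is only an illustration: the field names (`userId`, `orderId`, and so on) and the sample records are assumptions made for the example, not the schema used later in the article.

```typescript
// Hypothetical in-memory shapes for the three entities (illustrative only).
interface Address { street: string; postalCode: string; state: string; }
interface User { id: string; fullName: string; addresses: Address[]; }
interface Order { id: string; userId: string; status: "PENDING" | "COMPLETED"; }
interface OrderItem { id: string; orderId: string; productName: string; price: number; quantity: number; }

// 1. What are the orders of a particular user?
function ordersOfUser(orders: Order[], userId: string): Order[] {
  return orders.filter((order) => order.userId === userId);
}

// 2. Which items belong to which order?
function itemsOfOrder(items: OrderItem[], orderId: string): OrderItem[] {
  return items.filter((item) => item.orderId === orderId);
}

// 3. Which addresses belong to a given user?
function addressesOfUser(users: User[], userId: string): Address[] {
  const user = users.find((candidate) => candidate.id === userId);
  return user ? user.addresses : [];
}

// Made-up sample data to exercise the three questions.
const sampleUsers: User[] = [
  { id: "u1", fullName: "Ada Example", addresses: [{ street: "1 Main St", postalCode: "00000", state: "NY" }] },
];
const sampleOrders: Order[] = [
  { id: "o1", userId: "u1", status: "PENDING" },
  { id: "o2", userId: "u1", status: "COMPLETED" },
  { id: "o3", userId: "u2", status: "PENDING" },
];
const sampleItems: OrderItem[] = [
  { id: "i1", orderId: "o1", productName: "Keyboard", price: 50, quantity: 1 },
  { id: "i2", orderId: "o2", productName: "Mouse", price: 20, quantity: 2 },
];

console.log(ordersOfUser(sampleOrders, "u1").map((order) => order.id));
console.log(itemsOfOrder(sampleItems, "o1").map((item) => item.productName));
console.log(addressesOfUser(sampleUsers, "u1"));
```

Each function corresponds to one access pattern. Listing access patterns up front like this matters more in DynamoDB than in SQL, because keys and indexes are chosen to serve them.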

With these questions in hand, we can model our domain more clearly and draw up an entity-relationship diagram, shown in the figure below.

Figure 1

Modeling our data using DynamoDB

What is DynamoDB

Amazon DynamoDB is a non-relational database service (NoSQL) that provides fast, predictable performance with built-in, fully managed scalability, so you don’t have to worry about provisioning, hardware installation and configuration, replication, software patching, or cluster scaling. In addition, DynamoDB offers "encryption at rest", eliminating the operational complexities involved in protecting sensitive data.

Key concepts

Similarly to the relational model, DynamoDB is organized into one or more tables. However, these tables do not have a fixed schema, so new items with different attributes can be added without the rigidity of data normalization. Items in a table can contain attributes of familiar types, such as string, number, boolean, list, or map.

Persisting data requires specifying two important fields: the Partition Key (PK) and the Sort Key, also called Range Key (SK). The latter is optional. It is worth mentioning that the PK, once defined, can no longer be changed.

DynamoDB hashes the PK value to decide which of its storage nodes (partitions) will hold each item (Figure 2). This makes it easy to scale the database to growing demand by adding new partitions and moving data between them.
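The idea of hashing a key to pick a partition can be sketched in a few lines. This is a simplified illustration only: DynamoDB's internal hash function and partition management are not part of its public API, so the FNV-1a hash, the `partitionFor` helper, and the fixed partition count below are all assumptions made for the example.

```typescript
// FNV-1a, a simple and well-known 32-bit hash, used here only as a stand-in.
// It hashes UTF-16 code units, which match byte values for ASCII keys.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a partition key to one of `partitionCount` hypothetical storage nodes.
function partitionFor(partitionKey: string, partitionCount: number): number {
  return fnv1a(partitionKey) % partitionCount;
}

// The same key always lands on the same partition.
const hypotheticalPartitionCount = 4;
for (const id of ["user-1", "user-2", "user-3"]) {
  console.log(id, "->", partitionFor(id, hypotheticalPartitionCount));
}
```

Because the mapping depends only on the key value, any node can work out where an item lives without consulting a central directory. That is why a PK with many distinct values spreads load more evenly.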

Figure 2

The PK can optionally operate in conjunction with the SK, forming a composite key. Using both together allows data with the same PK but different SK to be persisted in the same partition. We will see in the next section how both can work together.
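A rough in-memory sketch of a composite key follows. Items that share a PK are kept together and ordered by SK, which makes lookups by SK prefix straightforward. The `ItemCollections` class and its method names are hypothetical and only mimic the behavior described above. They are not a DynamoDB or Dynamoose API.

```typescript
// An item is identified by the pair (pk, sk); other attributes are free-form.
interface Item {
  pk: string;
  sk: string;
  [attribute: string]: unknown;
}

// Hypothetical in-memory stand-in: one group per PK value, ordered by SK.
class ItemCollections {
  private readonly byPk = new Map<string, Item[]>();

  put(item: Item): void {
    const existing = this.byPk.get(item.pk) ?? [];
    // An item with the same full key (pk + sk) is replaced, not duplicated.
    const items = existing.filter((other) => other.sk !== item.sk);
    items.push(item);
    items.sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
    this.byPk.set(item.pk, items);
  }

  // All items sharing a PK whose SK starts with `skPrefix`, in SK order.
  query(pk: string, skPrefix: string = ""): Item[] {
    const items = this.byPk.get(pk) ?? [];
    return items.filter((item) => item.sk.indexOf(skPrefix) === 0);
  }
}

const collections = new ItemCollections();
collections.put({ pk: "user-1", sk: "ORDER#2", total: 10 });
collections.put({ pk: "user-1", sk: "ORDER#1", total: 5 });
collections.put({ pk: "user-1", sk: "PROFILE", fullName: "Ada Example" });
collections.put({ pk: "user-2", sk: "ORDER#9", total: 1 });

console.log(collections.query("user-1", "ORDER#").map((item) => item.sk));
```

Here the PK selects the group and the SK narrows the result within it, so one request can fetch a whole family of related items.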

Global (GSI) vs Local (LSI) Secondary Indexes

Being a NoSQL database, DynamoDB is not designed around ad hoc queries that filter on arbitrary attributes, such as the following SELECT with a WHERE clause:

SELECT * FROM Users WHERE email='username@email.com';

This SQL query is made possible by a query optimizer, which evaluates the available indexes to see whether any of them can fulfill the query. Is it possible to do the same in DynamoDB? Yes, using the Scan operation, but Scan has to read every item in the table to find the records whose field of interest (in this case, email) matches. Moreover, since every item in the table is read, AWS charges for reading the entire table and not just for the items returned, which makes the service more expensive to use.
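The cost difference can be shown with a small in-memory comparison that counts how many items each strategy has to read. This is a sketch under simplifying assumptions: `scanByEmail`, `buildEmailIndex`, and `queryByEmail` are hypothetical helpers, and real DynamoDB billing is based on read capacity units and data size, not a plain item count.

```typescript
interface UserRecord { id: string; email: string; }
interface LookupResult { matches: UserRecord[]; itemsRead: number; }

// Scan-style lookup: every item is read, whether or not it matches.
function scanByEmail(table: UserRecord[], email: string): LookupResult {
  let itemsRead = 0;
  const matches: UserRecord[] = [];
  for (const item of table) {
    itemsRead++;
    if (item.email === email) matches.push(item);
  }
  return { matches, itemsRead };
}

// Index-style lookup: the data is organized by email ahead of time...
function buildEmailIndex(table: UserRecord[]): Map<string, UserRecord[]> {
  const index = new Map<string, UserRecord[]>();
  for (const item of table) {
    const bucket = index.get(item.email) ?? [];
    bucket.push(item);
    index.set(item.email, bucket);
  }
  return index;
}

// ...so a lookup only touches the items that actually match.
function queryByEmail(index: Map<string, UserRecord[]>, email: string): LookupResult {
  const matches = index.get(email) ?? [];
  return { matches, itemsRead: matches.length };
}

// 1000 made-up users; look for exactly one of them.
const userTable: UserRecord[] = [];
for (let i = 0; i < 1000; i++) {
  userTable.push({ id: `user-${i}`, email: `user${i}@email.com` });
}
const emailIndex = buildEmailIndex(userTable);

console.log("scan read:", scanByEmail(userTable, "user42@email.com").itemsRead);
console.log("index read:", queryByEmail(emailIndex, "user42@email.com").itemsRead);
```

Both lookups return the same single user, but the scan reads all 1000 items while the indexed lookup reads one.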

To work around this, DynamoDB supports two types of indexes: the Global Secondary Index (GSI) and the Local Secondary Index (LSI). An LSI can only be defined when the table is created, whereas a GSI can also be added to, or removed from, an existing table.

The GSI is an index type whose PK and optional SK can be different from the primary key of the main table. It is called "global" because queries on this index can reach data across all partitions of the main table. You can think of this index as a separate table (which increases the storage cost) that contains attributes copied from the main table, as you can see in Figure 3 below.

Figure 3

The LSI, illustrated in Figure 4 below, is on the other hand a type of index that must have the same PK as the main table but a different SK. It's called "local" because each partition of an LSI is scoped to the same PK value as the base table. This allows you to query the data under a given PK in a different sort order, based on the alternative SK attribute.

Figure 4

An LSI allows the query operation to retrieve either multiple items that share a PK value but have different SK values, or a single item with a specific PK and SK value.
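The difference between the two index types can be sketched with plain maps. In the example, the GSI-like view regroups items by a different attribute across the whole table, while the LSI-like view keeps the same PK grouping and only changes the ordering. The `Row` shape, the chosen attributes (`status`, `createdAt`), and both builder functions are assumptions made for illustration, not DynamoDB APIs.

```typescript
interface Row { pk: string; sk: string; status: string; createdAt: string; }

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// GSI-like view: regroup every row by another attribute (here `status`),
// regardless of which base-table PK the row belongs to.
function buildGlobalIndex(rows: Row[]): Map<string, Row[]> {
  const index = new Map<string, Row[]>();
  for (const row of rows) {
    const bucket = index.get(row.status) ?? [];
    bucket.push(row);
    index.set(row.status, bucket);
  }
  return index;
}

// LSI-like view: keep the base-table PK, but order each group by a
// different attribute (here `createdAt`) instead of the original SK.
function buildLocalIndex(rows: Row[]): Map<string, Row[]> {
  const index = new Map<string, Row[]>();
  for (const row of rows) {
    const bucket = index.get(row.pk) ?? [];
    bucket.push(row);
    index.set(row.pk, bucket);
  }
  index.forEach((bucket) => bucket.sort((a, b) => compareStrings(a.createdAt, b.createdAt)));
  return index;
}

const baseRows: Row[] = [
  { pk: "USER#1", sk: "ORDER#b", status: "PENDING", createdAt: "2023-03-01" },
  { pk: "USER#1", sk: "ORDER#a", status: "COMPLETED", createdAt: "2023-05-01" },
  { pk: "USER#2", sk: "ORDER#c", status: "PENDING", createdAt: "2023-01-15" },
];

const byStatus = buildGlobalIndex(baseRows);
const byUserAndDate = buildLocalIndex(baseRows);

// PENDING orders come from two different base-table PKs.
console.log((byStatus.get("PENDING") ?? []).map((row) => row.pk));
// USER#1's orders, oldest first rather than in SK order.
console.log((byUserAndDate.get("USER#1") ?? []).map((row) => row.sk));
```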

Taking our Diagram off the paper

To implement our diagram, we'll use TypeScript together with the Dynamoose library (inspired by Mongoose, the JS/TS library for MongoDB). Dynamoose is a wrapper around the official JS/TS SDK, @aws-sdk/client-dynamodb, that makes it easy to build schemas and query our database.

Let's first define our User schema as follows:

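// Imports assumed by all the schema snippets in this article.
import * as crypto from "crypto";
import * as dynamoose from "dynamoose";

export const User = dynamoose.model(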
  "User",
  new dynamoose.Schema({
    id: { type: String, hashKey: true, default: () => crypto.randomUUID() },
    SK: {
      type: String,
      rangeKey: true,
      required: true,
      default: () => new Date().toISOString(),
    },
    addresses: {
      type: Array,
      schema: [
        {
          type: Object,
          schema: {
            street: String,
            postalCode: String,
            state: String,
          },
        },
      ],
    },
    fullName: String,
    email: String,
    birthDate: Date,
  })
);

If you are familiar with Mongoose, you may notice a similar syntax in our schema definition. In it, we defined the basic attributes for a User: fullName, email, birthDate, and addresses which is a collection/array of objects containing the fields street, postalCode, and state. In addition to these attributes, two essential ones were also defined: id and SK.

The id represents the PK, which at the implementation level is called the hashKey. Its role is to determine which DynamoDB partition will hold the record of a new User.

The SK defines where, within the partition, the new user item will be stored. In the specific case of the User schema, the SK defaults to the current date and time.

Now let's define the schema for Order:

export const Order = dynamoose.model("Order", {
  id: { type: String, hashKey: true, default: () => crypto.randomUUID() },
  SK: {
    // USER#<id>
    type: String,
    rangeKey: true,
    index: { rangeKey: "id", type: "global", name: "order_index" },
  },
  status: String,
});

We have three attributes: id, the order status (which can be PENDING or COMPLETED), and an SK with a global secondary index defined on it, which stores the reference to the user who owns the order. We will use this format: USER#<id>.

Finally, to finish our schema modeling, we will define the last one, OrderItem, as follows:

export const OrderItem = dynamoose.model(
  "OrderItem",
  {
    id: { type: String, hashKey: true, default: () => crypto.randomUUID() },
    SK: {
      // ORDER#<order_id>
      type: String,
      rangeKey: true,
      index: { rangeKey: "id", type: "global", name: "item_order_index" },
    },
    productName: String,
    price: Number,
    quantity: Number,
  },
  { update: true }
);

In this schema, we have simple attributes related to a given order item, such as productName, price, and quantity. In addition, the SK, again with a global secondary index defined on it, stores the reference to the order the item belongs to. For that, we will use this format: ORDER#<order_id>.
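The two prefixed reference formats noted in the schema comments, `USER#<id>` and `ORDER#<order_id>`, are easy to mistype when written by hand. One option is to centralize them in small helpers. The function names below are hypothetical and not part of Dynamoose. This is just one possible way to keep the formats consistent.

```typescript
const USER_PREFIX = "USER#";
const ORDER_PREFIX = "ORDER#";

// SK value stored on an Order: a reference to the user who owns it.
function orderSortKey(userId: string): string {
  return `${USER_PREFIX}${userId}`;
}

// SK value stored on an OrderItem: a reference to the order it belongs to.
function orderItemSortKey(orderId: string): string {
  return `${ORDER_PREFIX}${orderId}`;
}

// Split "USER#abc" into { entity: "USER", id: "abc" }.
// Returns null when there is no "#" separator or the entity part is empty.
function parseSortKey(sk: string): { entity: string; id: string } | null {
  const separator = sk.indexOf("#");
  if (separator <= 0) return null;
  return { entity: sk.slice(0, separator), id: sk.slice(separator + 1) };
}

console.log(orderSortKey("c72b41f5-32c0-429b-8b82-942709f53cc9"));
console.log(parseSortKey(orderItemSortKey("some-order-id")));
```

With helpers like these, the code that writes an Order and the code that later queries by `USER#...` build the key the same way.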

And now, to finish on a high note, thanks to the easy-to-use Dynamoose API, we will define our single table, which will store data following the schemas we modeled earlier.

new dynamoose.Table("SingleTable", [User, Order, OrderItem]);

At first glance, it may seem strange to keep all records in a single table, even more so if you come from the SQL world. But this approach brings some benefits: because all the records live in the same place, related data can be retrieved without JOIN-like operations, which keeps the query cost lower than in traditional relational databases (Postgres, MySQL, etc.). Of course, this advantage is not a silver bullet, and you should always consider the best approach for your project and team. This blog post covers some of the downsides of the approach: The What, Why, and When of Single-Table Design with DynamoDB

Answering our questions

From here on, we assume you already have the local version of DynamoDB installed on your machine, plus a Node project with the Dynamoose library installed. If you want to create new records on your machine, just access this Gist and run it.

Once our database is populated, as shown in the figure below, we can run queries based on the questions presented at the beginning of the article.

To perform our queries we will also use the Dynamoose API to make our work easier.

  1. What are the orders of a particular user?
await Order.query("SK").eq("USER#c72b41f5-32c0-429b-8b82-942709f53cc9")
    .using("order_index")
    .exec();

To get the orders for a user, we query the SK attribute for the value USER#..., using the GSI we defined on it: order_index.

  2. Which items belong to which order?
await OrderItem.query("SK").eq("ORDER#g62b41f5-32c0-429b-8f82-342709f53cc9")
    .using("item_order_index")
    .exec();

As with the first question, to get the items of a given order, we query the SK attribute for the value ORDER#..., using the GSI we defined on it: item_order_index.

  3. Which addresses belong to a given user?
await User.query("id").eq("c72b41f5-32c0-429b-8b82-942709f53cc9").exec();

The answer to this question is the simplest: we just query by the PK, in this case the id of the user we are interested in.
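The query above returns whole user items, so the addresses still have to be read from the result. A small sketch of that last step follows. The `UserItem` shape mirrors the User schema defined earlier, while `addressesFrom` is a hypothetical helper and the sample result is made up. It accepts a list because a query by PK alone can, in principle, return several items when the table also has an SK.

```typescript
interface Address { street: string; postalCode: string; state: string; }
interface UserItem { id: string; SK: string; fullName?: string; addresses?: Address[]; }

// Collect the addresses of every item returned by the query.
// Items without an `addresses` attribute simply contribute nothing.
function addressesFrom(queryResult: UserItem[]): Address[] {
  const addresses: Address[] = [];
  for (const user of queryResult) {
    for (const address of user.addresses ?? []) {
      addresses.push(address);
    }
  }
  return addresses;
}

// Made-up stand-in for the array a query by user id could return.
const exampleResult: UserItem[] = [
  {
    id: "c72b41f5-32c0-429b-8b82-942709f53cc9",
    SK: "2023-01-01T00:00:00.000Z",
    fullName: "Ada Example",
    addresses: [
      { street: "1 Main St", postalCode: "00000", state: "NY" },
      { street: "2 Side Ave", postalCode: "11111", state: "CA" },
    ],
  },
];

console.log(addressesFrom(exampleResult).map((address) => address.street));
```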

Closing remarks

After reading this, we have a good initial understanding of how DynamoDB can be useful for projects that do not want to worry about database configuration and prefer a ready-made Amazon infrastructure that, with a few clicks, provides the entire environment for storing our data out of the box. Furthermore, DynamoDB can be recommended for domains whose entities have few relationships, even more so if we follow the single-table approach. However, if our project domain has many relationships and several subdomains, a traditional SQL database is usually the better option, even though DynamoDB allows us to have more than one table, since it tends to offer better data organization and better query performance for that kind of domain.
