Lean4: Crafting in an Uncharted Territory

Overcoming Challenges and Innovating

Armed with an unknown weapon called Lean4, I ventured into an arena known as ‘Rinha de Backends‘, a challenge to determine (for fun) the fastest API for a simple specification. At first, it was just a joke — "What if I build a backend using a language nobody uses?" — soon it turned into an odd quest. In this blog post, we will dive deeply into this quest, how I solved it, technical details, and results.

The beginning of the journey

This journey began unexpectedly, just two days before the submission deadline for the event, and it all started as a joke between me and my girlfriend about crafting a backend in some strange language like Agda or Coq. I initially dismissed it, thinking I wouldn’t have time to complete it. However, upon reconsideration, I reevaluated the idea, and then I started the journey to make an API in an odd programming language called Lean4.

What is Lean4?

In a world full of programming languages, Lean4 caught my attention with unique and powerful concepts that, although incomplete and unstable, demonstrate its capacity to handle real-world programming tasks through incredible features.

Lean4 is a cutting-edge, purely functional programming language primarily used by mathematicians to rigorously prove properties about all kinds of things that can be modeled as mathematical objects. It relates to programming because of a correspondence between mathematics and programming that is called Curry-Howard Isomorphism which asserts that constructive mathematics can be expressed in terms of programs and their types. This feature enables something that is not present in mainstream programming languages, which is the ability to manipulate types as first-class citizens; in other words, you can get types as arguments, compute types using functions, and return types as normal values, like integers, i.e.

-- This function receives a boolean and returns a Type. If the Boolean is true, it returns `String`,
-- otherwise, it returns `Int`
def IntOrString : Bool -> Type
  | true  => String
  | false => Int

-- This function returns an `IntOrString` of `x`. In it, `x` is the value of the boolean
-- we are passing as an argument. If the `x` is `true`, `IntOrString true` evaluates
-- to `String` and we need to return the String "return a string", otherwise the `IntOrString false`
-- evaluates to `Int`.
def choose : (x: Bool) -> (IntOrString x)
  | true  => "return a string"
  | false => 1

In addition to this feature, the powerful concept of "Functional But In Place" was an important rule when I chose the language. This concept enables the programmer to use mutation in a purely functional programming language through a new technique for garbage collection called "Perceus Garbage Collection". It’s almost the same idea as described in the talk "Outperforming Imperative with Pure Functional Languages" by Richard Feldman. This technique also decreases the amount of allocation the language does by using the drop-reuse optimization (if the reference count of the object is equal to 1), i.e.:

-- This is an example of a simple function `map` of a List of `a` to a List of `b` using a function that transforms from `a` to `b` that is called `f`
def map (f: a -> b) : List a -> List b
  | Cons x xs => Cons (f x) (map f xs)
  | Nil       => Nil

This function will be optimized to mutability due to the reuse of the constructor Cons. So, a function that usually creates a new list, allocating many new resources and dropping old resources, can now be automatically optimized to use mutation instead.

The Nonexistent Ecosystem of Lean4

While Lean4 is an innovative and powerful programming language with a unique set of features, it faces a notable drawback: a lack of an established ecosystem. As I was starting the project, I tried to ask a question in the official Chat of the language about frameworks for the web. To my surprise, the response directed me towards a socket bindings library, while not aligning with my initial expectations, at least provided me the foundation I needed to build: An HTTP/1.1 server.

Following that, I started to read RFC 2616, a document detailing the inner workings of the HTTP/1.1 protocol. Soon I realized it was too much to do in only two days and I moved to another strategy: C++ bindings. Lean is a language that compiles to C++, so it’s really easy to do C++ bindings in Lean4. My quest for the fastest C HTTP parser led me to the discovery of PicoHttpParser. Since the HTTP/1.1 protocol specifies text-based communication, I required a high-performance parser that minimized string data manipulation (given its computational cost). PicoHTTPparser was exactly what I was trying to find.

With the C++ bindings, I forged a library called ‘Lina’. This library is a fast HTTP/1.1 parser crafted for my needs in the "Rinha de Backend" (so it should not, by all means, be used in production). "Lina" was the first building block that I fused with sockets to make an actual HTTP server. This endeavor gave rise to a library called "Melp", which almost does not use any of the fancy types that I explained earlier in this article but provides a simple low-level abstraction for HTTP servers.

"Melp", however, was too low level. To develop fancier code, I started to develop something like the Node.js’ "Express" library to Lean4, which I called "Ash". This library implements JSON support using a parser combinator library that I developed (called "Soda"), which is optimized to use mutability in a pure way and an Ash.App Monad that helps with a simple DSL for writing routes just like express:

def app: Ash.App Unit := do
  get "/pessoas" $ λ conn => do
    match conn.query.find? "t" with
    | none       => conn.badRequest "Bad Request"
    | some query => conn.ok (← findLike query db)

  get "/contagem-pessoas" $ λ conn => do
    let count ← countPeople db
    conn.ok s!"{count}"

This serves as example showcasing how the "Ash" library works with the main application that is the one crafted for "Rinha de Backend". The library provides access to the Ash.App monad, the main component of the app function. Within the do block of this function, we use the get function to add a new route into the Router component of the Ash.App monad.

def app: IO Unit := do
    let server ← Server.new

    let server := server.onConnection $ λ conn => do
        if conn.method == "GET" then
            if conn.data.path == "/pessoas"
                then do
                    match conn.query.find? "t" with
                    | none       => conn.badRequest "Bad Request"
                    | some query => conn.ok (← findLike query db)
            else if conn.data.path == "/contagem-pessoas" then do
                let count ← countPeople db
                conn.ok s!"{count}"

    server.start "127.0.0.1" "8080"

This second approach is a lot harder to understand and more nested. It’s not as automatic and simple as the other one, it needs some manual validation using ifs that is done automatically with the "Ash" library.

In the inner workings of almost all of these libraries, I used arrays, which are somewhat unconventional in functional languages. In Functional programming, we want to preserve immutability, and this does not work very well with an array; we would have to copy the entire array all the time to push something into the structure. In Lean4, due to the concept of "Functional But in Place", it’s different: we can purely mutate arrays if the count of references is equal to one (because if it’s not shared, we will not have visible side effects, thus it’s ‘purely functional’), i.e.

-- It looks like it is returning a new copy of the array but it's returning the same array with the
-- mutation.
let arr := #[]
let arr := arr.push 1
let arr := arr.push 2
arr

The last building block of my quest was a PostgreSQL driver, a vital component of my project. To my surprise, It took me more time than I was expecting because of problems compiling Lean4 with the default C++ compiler. In the end, I conducted a trial and error process, testing all available C++ compilers. Ultimately, it was clang++ that successfully compiled Lean4 with PostgreSQL, bringing my project to reality. The driver is ridiculously small, it is only 100 lines of Lean4 and 150 of C++.

With all the essential libraries in place, I just needed to implement the application and that really was the easy part. This phase, in contrast to the earlier challenges, did not consume a significant amount of time or effort.

Troubles during the implementation

The initial attempt at running the Ash library with the Pgsql driver led to random segfaults. This issue was happening approximately five hours before the submission deadline, leaving me feeling quite discouraged. At that point, I contemplated giving up. However, after long research, I connected the fact that Ash was using a Task construction that automatically parallelizes the code inside of it with a single PostgreSQL connection, and that would lead to data racing issues.

I would never have believed that I would have to use a Mutex in Lean4, but it turned out to be the solution to the problem.

def execCursor (s: Connection) (query: String) (a: Array String) : IO Result := do
  s.mutex.lock
  let res ← execCursorRaw s.conn str a
  s.mutex.unlock
  pure res

This code is used to safely execute a query represented by the parameter query. To make sure that everything will run correctly, we lock and unlock the Mutex around the raw query function to execute a cursor.

The main code

The application consists of only three files on a structure that I don’t recommend for any serious project:

Main.lean: That describes the executable part of the application and initiating the application. It includes a router and the main function, which is responsible for establishing both the connection with PostgreSQL and starting the application. The main function is used as the application’s entry point, setting up the components and starting the application with the specified configuration.

def main : IO Unit := do
  -- Read the environment from environment variables,
  -- but if they are not set, use the default values.
  let env <- readEnvironment

  -- Connects to the database using the environment variables.
  let conn ← Pgsql.connect $ env.postgres.toConnectionString
  let app := app conn
  IO.println s!"INFO: Database connection set up"

  -- Run the application with the environment variables host and port.
  app.run env.host env.port do
    IO.println s!"INFO: Server running on {env.host}:{env.port}"

rinha/Rinha/Environment.lean: This module plays a crucial role in specifying the application’s settings. It defines the Environment structure, which encapsulates essential information such as the host, port, and PostgreSQL connection. Additionally, it provides some functions that are useful for retrieving information from the environment variables into these structures, i.e.,

/--
Structure that represents all the environment variables needed
for running the application. 
-/
structure Environment where
  host : String
  port : String
  postgres : PostgreSQLEnvironment

...

/--
Reads the environment variables and returns the environment
data type for easier access.
-/
def readEnvironment : IO Environment := do
  let port := Option.getD (← IO.getEnv "PORT") "8000"
  let host := Option.getD (← IO.getEnv "HOST") "0.0.0.0"
  let postgres ← readPostgreSQLEnvironment
  return {host, port, postgres}

rinha/Rinha/Entities/Person.lean: This module serves as the main point of the application, defining the primary model for both the logic and the retrieval of data from the database. It introduces a collection of data types useful for delegating the responsibility of ensuring the application’s correctness from runtime to compile time by the usage of the type system. This approach is made possible by the usage of the "new-type" pattern, which makes it harder to forget or forge invariants about some data by creating a new type for a simple piece of data like a username or an email. In some other languages like Rust, it’s easy to use the new-type pattern, but in Lean4, we can make it even more correct by the integration of proofs inside the type. With these proofs, we can ensure that each type will be correct-by-construction, which aligns with what a theorem prover can offer.

structure Username where
  data : String
  prop_in_bounds : data.length <= 32
  deriving Repr

In the example provided, the Username type contains two essential components:

  1. The data field of type String that encapsulates a simple string as a new type.
  2. The prop_in_bounds field is not a typical "validation" but rather a type and a proposition (in the mathematical sense) at the same time! The only value that can satisfy this type is a proof that the data.length <= 32 is indeed less than or equal to 32 characters. This approach ensures that any instance of Username is correct by design. It becomes impossible to create an invalid Username containing more than 32 characters.

The following code snippet showcases a critical aspect of the system:

def String.toUsername? (data : String) : Option Username :=
  match data.length.decLe 32 with
  | Decidable.isTrue p  => some {data, prop_in_bounds := p}
  | Decidable.isFalse _ => none

The creation of a proof validating that the data’s length is less than 32 characters is done by the usage of the Decidable data structure that carries a value of the type data.length <= 32. These structures are generated by functions like decLe, which, in essence, determines if a number is smaller than it’s argument and subsequently provides a proof of this property in the isTrue variant; otherwise, it creates an isFalse with the inverse proof.

Towards the end of the module, you will find some code that handles some database queries, similar to what is done in other languages. The example below simply executes a SQL query and then tries to transform a result into a validated structure.

def findById (id : String) (conn : Connection) : IO (Option Person) := do
  let query := "SELECT * FROM users WHERE id = $1;"
  let result ← exec conn query #[id]
  match result with
  | Except.error _ => return none
  | Except.ok rs   => return rs.get? 0 
                  >>= FromResult.fromResult

Putting everything together

The main application can be summarized from flux of the "contagem-pessoas" route. The entire application starts from the main function, which initializes both the database connection and the Ash framework to handle requests.

-- The entrypoint of the whole application
def main : IO Unit := do
  -- Connects to the database using the environment variables.
  let conn ← Pgsql.connect $ env.postgres.toConnectionString
  let app := app conn

  -- Starts the main application
  app.run env.host env.port do
    IO.println s!"INFO: Server running on {env.host}:{env.port}"

The app function, initiated from the main function, is responsible for defining each of the application’s routes. In this example, our focus will be on the "contagem-pessoas" route.

-- This function defines the router of the application using the "Ash" library.
def app (db: Pgsql.Connection) : Ash.App Unit := do
  ...

  -- Registers the "contagem-pessoas" route.
  get "/contagem-pessoas" $ λ conn => do
    let count ← countPeople db
    conn.ok s!"{count}"

When this function is called from a HTTP request on this route, it calls the countPeople function. This function directly interacts with the database by using the exec function from the Pgsql library. The result is a ResultSet which the first entry provides a count of registered users in the database.

-- Makes the query using the Pgsql library.
def countPeople (conn : Connection) : IO Nat := do
  -- Executes a query
  let result ← exec conn "SELECT COUNT(id) FROM users;" #[]
  -- Checks if it returned an error
  match result with
  | Except.error _ => return 0
  | Except.ok rs =>
    match rs.get? 0 with
    | some res => return (res.get "count").get!
    | none     => return 0

The Contest

When the contest began, I had modest expectations, hoping to secure at least the 30th position. When I heard my name, in the 4th place, a rush of excitement filled me. I couldn’t help but wonder how it got faster than Golang and other Rust implementations. My initial thought was that my girlfriend configured Nginx so well that it made it fast. This was a temporary triumph for purely functional programming language.

Results from "rinha-de-backend"

This triumph was short-lived, however, because another user found a critical bug with our implementation. This flaw allowed invalid users to be registered as valid, all because I forgot to add date validation. Besides that, It was a great and fun experience. It allowed me to exercise some abilities that I have never tested because I had no reason to do so. I’m not sure if I’m going to maintain these libraries to make something production-ready, but I’m certainly going to use these libraries, when more mature, in some projects just for fun.

References

We want to work with you. Check out our "What We Do" section!