Integrating Ruby with Rust with FFI

Transferring numbers, strings and JSON data between Ruby and Rust

Ruby is a great programming language for productivity, but sometimes you need to develop something that is performant or has a low memory footprint. Rust is an exceptional language for that, due to its type safety, memory safety, and modern language features. In this article, we’ll talk about how we can integrate Ruby with Rust by using FFI (Foreign Function Interface). We’ll look into:

  • How to use FFI:
    • A simple implementation that only passes numbers around
    • How to send and receive strings
    • How to send and receive JSON
  • How to handle panics when using FFI
  • How to deploy the application with Docker

You can also check out the final result in this repository: https://github.com/rhian-cs/cm42-blogpost-example-rust-in-ruby

Let’s get started!

Why Rust?

Rust is a performant, safe, and modern programming language. It is suitable both for low-level applications that need to interact directly with hardware as well as high-level applications that need to be highly performant.

It stands out among low-level languages because it implements modern features and tooling from high-level languages, such as closures, pattern matching, safe concurrency, and an official package manager.

It also catches the eyes of developers who use high-level languages, since it has all the benefits of a low-level language while still having a compiler that helps developers make little to no mistakes when it comes to memory management and concurrency.

About Ruby Performance

Ruby is a great language due to its easy-to-read syntax, mature ecosystem, and productivity benefits.

However, it lacks the necessary performance for some types of problems. While this can be somewhat mitigated by implementing concurrency with threads, Ruby* still has a Global Interpreter Lock (GIL). The GIL is a mechanism that ensures only one thread can execute Ruby code at a time. This has some advantages in a single-threaded context, but it prevents the interpreter from executing Ruby code in a truly parallel manner.

Mind you, I/O-bound operations such as network calls, database calls, and filesystem access will still benefit from the use of concurrency. But CPU-bound operations, such as heavy number crunching or in-memory string manipulation will in practice run sequentially and will take a toll on the system’s performance.

Luckily, we can use Rust to solve these types of problems.

*This only applies to CRuby (MRI). Other Ruby VM implementations such as JRuby and Rubinius don’t have a GIL and thus support true parallelism, but they may have some downsides depending on your use case.

How to Integrate with Rust? Here are 3 ways

The service approach: If you have a web application that uses Ruby (such as a Rails app) you can simply add another web server to your infrastructure, written in Rust. The Actix Web framework is a great choice for this approach. Actix Web is comparable to Ruby’s Sinatra library, due to it being mostly an HTTP server and not a full-fledged application framework.

The worker approach: Alternatively, instead of relying on synchronous HTTP requests to exchange information, you could use Rust as a worker that would execute tasks asynchronously from a job queue. To do that, you could use Faktory, a background job system created by Mike Perham, the creator of Sidekiq. Thanks to Faktory being language-agnostic, you don’t need to use the same language for your app and its workers. This means you could enqueue a job from Ruby and consume it in Rust. Then, you could deliver a response by enqueueing a job from Rust and consuming it in Ruby.

The native approach: Last but not least, instead of adding a separate service/machine to your web infrastructure, you could simply call a Rust function from within Ruby by leveraging FFI. "What?", I hear you say. Yes! This is the approach we’ll be deep-diving into today. With a little boilerplate, we can easily get Ruby and Rust talking to each other directly within the same process.

What is FFI?

FFI stands for Foreign Function Interface. It’s the mechanism that allows a program written in one language to call functions from compiled code written in a completely different language. If you use Ruby, you might be familiar with the following message when installing some Ruby gems:

Building native extensions. This could take a while...

This message is shown whenever a gem has native dependencies that need to be compiled before installing the gem. These gems call into native code under the hood, which is the same idea behind FFI. Some gems that use this mechanism are mysql2 and fast_excel, because they need to interact with native C code.

In the case of Ruby and Rust, they both provide first-class support for FFI with C-style functions. Now what does the C programming language have to do with all of this? You see (no pun intended), C can be perceived as the "lingua franca" of programming languages. Instead of having an adapter specifically for Rust in Ruby, or for Ruby in Rust, both languages can talk to each other via a common interface (by declaring and consuming C-style function declarations).

Integrating with Rust using FFI

Let’s do the simplest possible integration using FFI. By the end of this section it will have the following file structure:

├── adder
│   ├── Cargo.lock
│   ├── Cargo.toml
│   ├── src
│   │   └── lib.rs
│   └── target/
└── ruby
    ├── Gemfile
    ├── Gemfile.lock
    └── main.rb

Create a new Rust library (let’s call it adder):

cargo new adder --lib

Update the adder/src/lib.rs file and modify the preexisting add function:

#[no_mangle]
pub extern "C" fn add(left: u32, right: u32) -> u32 {
    left + right
}

This will expose the "add" function for use in our FFI library. Some notes:

  • #[no_mangle] will ensure our add function is visible and actually named add in our final binary. We can’t easily locate this function in the binary without this configuration;
  • extern "C" specifies that the signature of this function should be C-compatible. It also shows us some warnings if we use unsupported types (such as directly using String);
  • I’ve also changed the argument types and return type to u32 for simplicity. u32 means "unsigned 32-bit integer".
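One detail worth keeping in mind with u32: in a debug build, left + right panics if the sum overflows, while a release build wraps around silently. The following is a minimal sketch of a variant that makes the overflow behavior explicit. The name add_saturating is hypothetical (it is not part of the example repository), and clamping to u32::MAX is just one possible policy.

```rust
// Hypothetical variant of `add` (not in the article's repository).
// Instead of overflowing, it clamps the result to u32::MAX.
#[no_mangle]
pub extern "C" fn add_saturating(left: u32, right: u32) -> u32 {
    left.saturating_add(right)
}
```

On the Ruby side it would be attached the same way as add, with two :uint32 arguments and a :uint32 return value.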

Inside adder/Cargo.toml, specify the crate-type to be a "C Dynamic Library". Reference: https://doc.rust-lang.org/cargo/reference/cargo-targets.html#library

 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]                   # <--- add this
crate-type = ["cdylib"] # <--- add this

[dependencies]

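This will ensure that when we build the library it will generate a shared library that we can use inside other programs. On Linux this is a ".so" (shared object) file; macOS and Windows produce ".dylib" and ".dll" files instead. The rest of this article assumes Linux.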

Now build the app. Let’s build it in release mode. Inside adder/, run:

cargo build --release

This will create a libadder.so file inside adder/target/release/. That’s our library.
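The file name depends on the platform the library is built for. As a small sketch, Rust's standard library exposes the prefix and suffix for dynamic libraries on the current target; the helper name library_file_name below is hypothetical and only illustrates how the name is composed.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

// Hypothetical helper: builds the platform-specific file name of a dynamic library.
// On Linux, DLL_PREFIX is "lib" and DLL_SUFFIX is ".so".
pub fn library_file_name(crate_name: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, crate_name, DLL_SUFFIX)
}
```

Calling library_file_name("adder") on Linux gives libadder.so, the file we'll load from Ruby.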

Now it’s time to call the add function from Ruby. Create a ruby/ directory.

Create a Gemfile inside the ruby/ directory:

source 'https://rubygems.org'

gem 'ffi', '~> 1.16'

Create the ruby/main.rb file:

require 'ffi'

module Adder
  extend FFI::Library
  ffi_lib '../adder/target/release/libadder.so'
  attach_function :add, [:uint32, :uint32], :uint32
end

puts Adder.add(5, 3)

Then, inside the ruby/ directory, run the file:

ruby main.rb

You can now see the result: 8!

So, what’s happening behind the scenes?

Representation of the process running the program

Our application runs in a single process. We’ve written Ruby code that will execute in the Ruby VM. The moment the Ruby VM creates the Adder module, it looks for that shared object we created earlier, loads it, and checks if the functions we’ve declared exist.

After that, when we call add, the Ruby VM will temporarily yield execution to that native code, and then we get our result back.

Isn’t that amazing?

Also, if you’re curious, uint32_t add(uint32_t, uint32_t) is just the signature of our function in C: it expects two unsigned 32-bit integers and returns one value of that same type.

Transferring Text Data using FFI

Passing numbers around in FFI is easy since they’re very small and can be cheaply copied over to the other program. However, when we pass larger or more complex structures things are not as simple.

Change adder/src/lib.rs:

use std::ffi::{CStr, CString}; // <-- add this

// ...

// add this function
#[no_mangle]
pub extern "C" fn process_request(raw_string_ptr: *const i8) -> *const i8 {
    let request = unsafe { CStr::from_ptr(raw_string_ptr) }.to_str().unwrap(); // We'll handle errors properly later in the article

    let response = format!("You've requested: {request}");

    let response = CString::new(response).unwrap();
    response.into_raw()
}

// and this function
#[no_mangle]
pub unsafe extern "C" fn deallocate_ptr(ptr: *mut i8) {
    if ptr.is_null() {
        return;
    }

    unsafe {
        let _ = CString::from_raw(ptr);
    };
}

Now there’s quite a lot going on here. Let’s break it down.

At a high level, we’re declaring two functions: process_request and deallocate_ptr. process_request expects a raw pointer to a C-style string and returns a raw pointer to a C-style string. After the caller has used or copied the string, they’ll need to deallocate it with deallocate_ptr.

Now let’s take a look at each line inside the process_request function:

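  • it expects a raw_string_ptr parameter, which is a *const i8. This means it is a raw pointer to one or more i8 values. This is used because it matches the type that CStr::from_ptr expects, *const c_char, on platforms where c_char is an alias for i8 (such as x86-64 Linux). On targets where c_char is u8 (such as 64-bit ARM Linux), use std::ffi::c_char instead so the code compiles everywhere.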
  • unsafe { CStr::from_ptr(raw_string_ptr) } – We’re casting the raw pointer to a CStr. CStr represents a borrowed C-style string. This means Rust won’t try to deallocate it (since it’s owned by Ruby and should be deallocated by Ruby’s garbage collector). To understand why this is unsafe refer to the "Safety" section in the from_ptr documentation: https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
  • unsafe { CStr::from_ptr(raw_string_ptr) }.to_str().unwrap() – We’re just converting that CStr to a regular &str so we can use it in our program
  • let response = format!("You've requested: {request}") – This is just the code that represents our "CPU-bound heavy lifting" but for simplicity, we’re just creating a new string.
  • let response = CString::new(response).unwrap(); – We’re now creating a CString to house our response. A CString represents an owned C-style string.
  • response.into_raw() – Finally, we’re converting our CString into a raw pointer. The into_raw function ensures Rust won’t deallocate the string data in response at the end of the function, which is important because we’re going to use this value in Ruby.
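The reading half of process_request can be sketched on its own. The helper below is hypothetical (read_request is not in the example repository) and rests on two assumptions: that the caller may pass a null pointer, and that we would rather get None than a panic when the bytes are not valid UTF-8. It uses std::ffi::c_char rather than i8 so it compiles regardless of the platform's char signedness.

```rust
use std::ffi::{c_char, CStr};

// Hypothetical helper (not in the article's repository): copies the incoming
// C string into an owned String, or returns None instead of panicking.
//
// Safety: `ptr` must be null or point to a valid NUL-terminated string.
pub unsafe fn read_request(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // Borrow the bytes owned by the caller; Rust never frees them.
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    // Copy the text only if it is valid UTF-8.
    borrowed.to_str().ok().map(String::from)
}
```

Returning an owned String keeps the sketch free of lifetime questions, at the cost of one extra copy.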

Let’s take a look at each line inside the deallocate_ptr function:

  • if ptr.is_null() – We’re just checking if it’s a null pointer so we don’t try to deallocate it
  • let _ = CString::from_raw(ptr) – We’re now converting this raw pointer back into an owned CString. Because the _ pattern doesn’t bind the value, the CString is dropped right away, and thanks to Rust’s ownership system that drop deallocates the memory.
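To see the whole ownership hand-off in one place, here is a minimal sketch that plays both roles in Rust. The function names (make_response, free_response, round_trip) are hypothetical stand-ins: round_trip does what the Ruby caller would do, copying the text and then giving the pointer back.

```rust
use std::ffi::{c_char, CStr, CString};

// Allocates the response on Rust's side and hands ownership to the caller.
fn make_response(request: &str) -> *mut c_char {
    let response = format!("You've requested: {request}");
    // This sketch assumes `request` contains no interior NUL bytes.
    CString::new(response).expect("interior NUL byte").into_raw()
}

// Takes ownership back; dropping the CString frees the allocation.
unsafe fn free_response(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}

// Plays the role of the Ruby caller: copy the text, then return the pointer.
pub fn round_trip(request: &str) -> String {
    let ptr = make_response(request);
    let copy = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    unsafe { free_response(ptr) };
    copy
}
```

The important property is that the memory is freed by the same side that allocated it. The caller only ever reads from the pointer.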

Whew! That was a lot. Let’s call this from Ruby now and see if it really works.

Change ruby/main.rb:

require 'ffi'

module Adder
  extend FFI::Library
  ffi_lib '../adder/target/release/libadder.so'
  attach_function :add, [:uint32, :uint32], :uint32
  attach_function :process_request, [:string], :strptr # declare this function
  attach_function :deallocate_ptr, [:pointer], :void   # and this function
end

# ...

# add these calls
result_str, result_ptr = Adder.process_request("a cup of coffee ☕")
puts result_str
Adder.deallocate_ptr(result_ptr)

We’ve now attached the two functions we created earlier: process_request and deallocate_ptr. At a low level, we’re passing and receiving pointers, but the FFI library provides some types to make it easier to call this code from Ruby.

  • The :string type makes FFI automatically convert the Ruby string into a pointer to a C-style string
  • The :strptr type is a helper to ensure the function returns both the returned data as a string as well as a raw pointer to the string, so it can be deallocated
  • The :pointer type is just a raw pointer
  • The :void type specifies that the function doesn’t return anything

We’re calling the process_request function, then we’re using the response string (with puts) and finally we’re deallocating the pointer to avoid any memory leaks.

Let’s go through how memory is shared and modified when running this code.

Part of the memory is owned by Ruby, meaning Rust shouldn’t directly interfere with it. And some other part of memory is owned by Rust, which Ruby shouldn’t interfere with.

The memory table is a simplified version of how memory works and the memory addresses are made up.

Let’s start at the beginning, in Ruby:

Memory Explanation - Part 1

Now let’s see what data is received in Rust’s end:

Memory Explanation - Part 2

This is how Ruby processes the response:

Memory Explanation - Part 3

And this is how Rust deallocates the memory:

Memory Explanation - Part 4

Transferring JSON using FFI

Transferring text data can be pretty useful, but plain text data is not as simple to parse, so what if we use JSON instead?

Let’s work with a new example, say, a "greeter" action in our Rust app. Ruby will send a GreetRequest formatted as JSON, and Rust will respond with a GreetResponse, also represented as JSON.

Rust has an awesome library called serde that handles serialization and deserialization, all while being extremely type-safe.

Add the following dependencies to adder/Cargo.toml:

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

Update your imports in adder/src/lib.rs:

use serde::{Deserialize, Serialize};
use serde_json::json;
use std::{
    error::Error,
    ffi::{CStr, CString},
};

Add the following structs:

#[derive(Deserialize)]
struct GreetRequest {
    name: String,
    age: u32,
}

#[derive(Serialize)]
struct GreetResponse {
    message: String,
}

The derive blocks add the necessary serialization and deserialization functions to our structures. This means we can easily parse a JSON string into a GreetRequest struct and encode a GreetResponse struct into a JSON string.

Create the greet function. We’ll use the Result of this function in our existing process_request function in a bit.

fn greet(raw_string_ptr: *const i8) -> Result<*mut i8, Box<dyn Error>> {
    let request_str = unsafe { CStr::from_ptr(raw_string_ptr) }.to_str()?;
    let request = serde_json::from_str::<GreetRequest>(request_str)?;

    let response = GreetResponse {
        message: format!("Hi, {}! You're {} years old.", request.name, request.age),
    };

    let response_json = serde_json::to_value(response)?;
    Ok(encode_json_into_ptr(response_json)?)
}

We’re extending what we were doing before. We’re reading from the raw pointer as a string, then we’re parsing JSON from that. After that, we’re creating our GreetResponse struct. Then we convert it into a generic JSON value so that we can call encode_json_into_ptr.

Note the function return type: Result<*mut i8, Box<dyn Error>>. This means the function can either return a raw pointer or a generic error. This is useful because we can use the ? operator to simplify error handling even for more specific error types.
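To illustrate how ? funnels different error types into a single Box<dyn Error>, here is a small sketch that uses only the standard library. The function encode_age is hypothetical and unrelated to the greeter; each ? below converts a different concrete error type.

```rust
use std::error::Error;
use std::ffi::CString;

// Hypothetical example: three different error types, one return type.
pub fn encode_age(input: &[u8]) -> Result<CString, Box<dyn Error>> {
    let text = std::str::from_utf8(input)?; // Utf8Error
    let age: u32 = text.trim().parse()?; // ParseIntError
    let message = CString::new(format!("age={age}"))?; // NulError
    Ok(message)
}
```

The caller only sees a Box<dyn Error> and can call to_string() on it, which is exactly what we need in order to report the error back to Ruby.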

Create the encode_json_into_ptr function below:

fn encode_json_into_ptr(json: serde_json::Value) -> Result<*mut i8, Box<dyn Error>> {
    let string = json.to_string();
    let cstr = CString::new(string)?;
    Ok(cstr.into_raw())
}

This will convert a JSON Value to a string and then convert it to a raw pointer.

Let’s update the process_request function to call our greet function and handle potential errors.

#[no_mangle]
pub extern "C" fn process_request(raw_string_ptr: *const i8) -> *const i8 {
    match greet(raw_string_ptr) {
        Ok(response_ptr) => response_ptr,
        Err(err) => {
            let response_json = json! ({ "error": err.to_string() });
            encode_json_into_ptr(response_json).unwrap()
        }
    }
}

Let’s go back to our Ruby code.

In the ruby/main.rb file, import the json module:

require 'json'

Then, update how the Ruby code interacts with our Rust library:

request = {
  name: 'John Doe',
  age: 18
}

result_str, result_ptr = Adder.process_request(request.to_json)
response = JSON.parse(result_str)
Adder.deallocate_ptr(result_ptr)

puts response

This code creates a hash, serializes it into JSON, reads the response, parses that as JSON, and gives us a hash as a result. We’re then deallocating the pointer and outputting the result:

{"message"=>"Hi, John Doe! You're 18 years old."}

Let’s test out our error handling. What if we give Rust invalid values?

Instead of just running the Ruby code, open an IRB console with:

irb -r ./main

Then, play around with the function calls:

irb(main)> JSON.parse Adder.process_request('')[0]
=> {"error"=>"EOF while parsing a value at line 1 column 0"}

irb(main)> JSON.parse Adder.process_request({}.to_json)[0]
=> {"error"=>"missing field `name` at line 1 column 2"}

irb(main)> JSON.parse Adder.process_request({ name: 'John Doe' }.to_json)[0]
=> {"error"=>"missing field `age` at line 1 column 19"}

irb(main)> JSON.parse Adder.process_request({ name: 'John Doe', age: '18' }.to_json)[0]
=> {"error"=>"invalid type: string \"18\", expected u32 at line 1 column 29"}

irb(main)> JSON.parse Adder.process_request({ name: 'John Doe', age: 18 }.to_json)[0]
=> {"message"=>"Hi, John Doe! You're 18 years old."}

And there you have it! Ruby code calls Rust code and the users on both sides only have to worry about the JSON interface. You may just want to encapsulate it in a class so that you only worry about passing and receiving hashes on Ruby’s side.

Dealing With Potential Rust Panics

In Rust, a panic means that something went terribly wrong in the program and that it cannot continue executing. You can panic! explicitly, but there are other ways that a program could panic, such as (but not limited to):

  • Dividing an integer by zero
  • Using unwrap() on a None or Err(_) value
  • Using expect() on a None or Err(_) value

Well-designed Rust programs rarely panic. However, if a beginner Rust developer is creating an app, they might use unwrap or expect without proper care, which could cause a panic.
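For reference, most panicking operations have a non-panicking counterpart. The two helpers below are hypothetical and just sketch the idea: checked_div for integer division, and ok_or_else in place of unwrap() or expect().

```rust
// Integer division: checked_div returns None instead of panicking on zero.
pub fn safe_divide(dividend: u32, divisor: u32) -> Option<u32> {
    dividend.checked_div(divisor)
}

// Instead of unwrap()/expect(), turn the None case into an Err the caller can handle.
pub fn first_word(text: &str) -> Result<&str, String> {
    text.split_whitespace()
        .next()
        .ok_or_else(|| "input has no words".to_string())
}
```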

The major concern with panics in our case is: what happens to our Ruby program after Rust panics?

When a panic escapes from Rust code that was called through FFI, it cannot unwind into the Ruby VM’s stack frames, so the process is halted with a SIGABRT signal. This means our Ruby program has no time to clean up any resources and will stop immediately.

If we’re calling Rust code, for example, from Sidekiq, this means Sidekiq will abort immediately and won’t push back currently running jobs to the queue. This will make us lose jobs in Sidekiq.

Let’s simulate what would happen in a panic. In adder/src/lib.rs, temporarily remove the ? operator and replace it with a call to unwrap():

 fn greet(raw_string_ptr: *const i8) -> Result<*mut i8, Box<dyn Error>> {
     let request_str = unsafe { CStr::from_ptr(raw_string_ptr) }.to_str()?;
-    let request = serde_json::from_str::<GreetRequest>(request_str)?;
+    let request = serde_json::from_str::<GreetRequest>(request_str).unwrap();

Try calling Rust with an empty string ('') instead of a valid request. You’ll get an error like this:

thread '<unnamed>' panicked at src/lib.rs:38:69:
called `Result::unwrap()` on an `Err` value: Error("EOF while parsing a value", line: 1, column: 0)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)

This aborts the process and begin/rescue cannot stop this.

Rust panics are not meant to be used as a regular error-handling mechanism. However, we need a safety net when dealing with FFI. For that purpose, Rust provides the catch_unwind function.

Add the following import:

 use std::{
     error::Error,
     ffi::{CStr, CString},
+    panic::catch_unwind,
 };

Then, update the code to wrap our greet call with catch_unwind:

#[no_mangle]
pub extern "C" fn process_request(raw_string_ptr: *const i8) -> *const i8 {
    let result = catch_unwind(|| greet(raw_string_ptr));

    match result {
        Ok(Ok(response_ptr)) => response_ptr,
        Ok(Err(err)) => {
            let response_json = json! ({ "error": err.to_string() });
            encode_json_into_ptr(response_json).unwrap()
        }
        Err(_) => {
            let response_json = json! ({ "error": "Rust Panic!" });
            encode_json_into_ptr(response_json).unwrap()
        }
    }
}

This will return:

  • The expected JSON response if everything goes well
  • An appropriate error message if an expected error occurs
  • A generic error message if an unexpected error occurs
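The three outcomes can be sketched without serde. In the example below, risky is a hypothetical stand-in for greet, and panic_message is an optional extra that tries to recover the panic text instead of returning a fixed string; it assumes the payload is a &str or a String, which is what panic! normally produces.

```rust
use std::any::Any;
use std::panic::catch_unwind;

// Stand-in for `greet`: returns Err on bad input, panics on empty input.
fn risky(input: &str) -> Result<u32, String> {
    if input.is_empty() {
        panic!("empty input");
    }
    input.parse::<u32>().map_err(|err| err.to_string())
}

// Panic payloads are usually a &str or a String; anything else gets a generic label.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(text) = payload.downcast_ref::<&str>() {
        (*text).to_string()
    } else if let Some(text) = payload.downcast_ref::<String>() {
        text.clone()
    } else {
        "unknown panic".to_string()
    }
}

pub fn call_guarded(input: &str) -> String {
    match catch_unwind(|| risky(input)) {
        Ok(Ok(number)) => format!("ok: {number}"),
        Ok(Err(err)) => format!("error: {err}"),
        Err(payload) => format!("panic: {}", panic_message(&*payload)),
    }
}
```

Note that the default panic hook still prints the panic message to stderr; catch_unwind only stops the unwinding.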

One last thing about panics. The documentation on catch_unwind says:

Note that this function might not catch all panics in Rust. A panic in Rust is not always implemented via unwinding, but can be implemented by aborting the process as well. This function only catches unwinding panics, not those that abort the process.

This is probably not a big issue, but if you want to be fully protected from panics, you could spawn a new process just to run the Rust code, although that might have a performance impact on your application. You could spawn a separate process in Ruby like this:

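reader, writer = IO.pipe
pid = fork do
  # The child has its own copy of memory, so it sends the result back through a pipe
  writer.write(Adder.process_request(...)[0])
end
writer.close
result = reader.read
Process.wait(pid)
status = $? # Process.wait populates the global variable `$?` after the process is finished. This contains information about the process

raise StandardError.new(status.inspect) unless status.success?
puts result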

But that might be too much; it depends on your use case. If you want to learn more about Ruby processes, check out this post by Maciej Mensfeld.

How can we deploy this?

The simplest way to deploy this is to compile the code when deploying the app. If you’re using Docker that’s even easier to do, thanks to multi-stage builds.

Before dockerizing our app, let’s just update ruby/main.rb so that we can set the path to our library via an environment variable:

 module Adder
   extend FFI::Library
-  ffi_lib '../adder/target/release/libadder.so'
+  ffi_lib ENV.fetch('LIBADDER_SO_PATH') { '../adder/target/release/libadder.so' }
   attach_function :add, [:uint32, :uint32], :uint32
   attach_function :process_request, [:string], :strptr
   attach_function :deallocate_ptr, [:pointer], :void

Create the Dockerfile:

#### Stage 1: Rust ####
FROM rust:1-bookworm AS rust

WORKDIR /adder
COPY adder .

RUN cargo build -r

#### Stage 2: Ruby ####
FROM ruby:3-bookworm

# Copy the built artifact from the other container
COPY --from=rust /adder/target/release /opt/adder

WORKDIR /app
COPY ruby .

# Install dependencies using Bundler
RUN bundle install

ENV LIBADDER_SO_PATH=/opt/adder/libadder.so

# Start your Ruby/Rails application
CMD ["irb", "-r", "/app/main"]

A multi-stage build is a way to use multiple images in a single Dockerfile, where each stage is responsible for a separate thing. In our case, we first compile our Rust code and generate an "artifact", the .so file. Then, in the Ruby stage, we copy that .so file to a relevant location so we can use it. The Rust stage is not part of the final image, since it’s no longer needed once the artifact has been copied, which reduces the image size.

Notice how we’re using bookworm as the Debian release for both stages. This is because we’re dealing with .so files, which are native system dependencies. Using the same OS version reduces the likelihood of compatibility issues.

Finally, build and run the app like this:

docker build -t adder .
docker run --rm -it adder

Wrapping Up

Rust is a great programming language but it can be hard to know how to tie it together with our existing Ruby apps. Hopefully, this guide will help you get started with your own implementation!

If you have any questions, check out the repository (and feel free to open an issue!) containing an example implementation here: https://github.com/rhian-cs/cm42-blogpost-example-rust-in-ruby

Thanks, and see you next time!
