Copycatting netcat: from Node.JS to C

Let’s get to the heart of streams, how they work and some classical problems.

Mastering web development is a cruel and endless journey. Part of this endeavor is understanding the dystopian world all the magic sits on — at the end of the day, everything comes down to bytes.

Let’s get down to some low-level concepts. By the end of this article, I’ll propose that you write your own web server from scratch, in pure C.

Just 3 lines of code in Node.JS

Using netcat to open a stream is simple. To listen for connections, just type nc -l <port>, and to connect to it, open a second terminal (which can be on a second computer) and type nc <host> <port>.

Enjoy your stream.

No big deal. An echo server can be built in Node.JS in a second. This snippet will give us exactly the same behavior:

var net = require('net');
var server = net.createServer(function(socket) {
    socket.pipe(socket);
});
server.listen(1337); // any free port will do

Wait, there is a .pipe() in there. Is it the same concept of pipe we use in linux, when we need to concatenate processes? Yes. In Unix, everything is a file descriptor, so streaming from/to standard I/O, network sockets or files on the disk is just a matter of redirecting.

Dissecting Node.JS pipe

So, let’s see what behavior the pipe method implements under the hood. Take a look at this code:

var net = require('net');
var server = net.createServer(function(socket) {
    socket.on('readable', function() {
        var chunk = null;
        while (null !== (chunk = socket.read())) {
            socket.write(chunk);
        }
    });
});
server.listen(1337); // any free port will do

Whenever a client sends some data to our socket, a readable event is fired, and socket.read() is called repeatedly until it returns null, that is, until the internal buffer is completely drained.

This code exposes two interesting points about piping in Node.JS. First, it explicitly takes advantage of the non-blocking nature of events. Second, it assembles data chunk by chunk. Let’s see how this can be useful.

Uploading large files

Letting your clients upload large objects can be rough on your server.

Most programs deal only with objects entirely loaded in memory. Let’s modify our code a little bit to keep our server’s resources from being drained.

const CHUNK_SIZE = 1024;

var net = require('net');
var server = net.createServer(function(socket) {
    socket.on('readable', function() {
        var chunk = null;
        while (null !== (chunk = socket.read(CHUNK_SIZE))) {
            socket.write(chunk);
        }
    });
});
server.listen(1337); // any free port will do

This guarantees that each read() call hands us at most CHUNK_SIZE bytes at a time. (One subtlety: read(size) returns null until size bytes are available, so the final, smaller tail of a stream needs extra care in a real server.)

The building blocks in C

Node.JS features an awesome abstraction of low-level I/O. At this point, we’re just one inch shy of another well-known abstraction, which will give us more understanding of how network communication works: Berkeley Sockets.

This site is a great resource for code examples, where you’ll see these functions with a more detailed context. I’ll use this (free!) guide as a reference.

First, let’s see the function we need for declaring a socket.

#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);

We’ll use some macros to shape our socket. For the domain, you can choose either PF_INET or PF_INET6, to pick between IPv4 and IPv6. For the type and protocol we’ll go with SOCK_STREAM and IPPROTO_TCP, simply because TCP is designed to provide exactly this kind of reliable, stream-like connection.

Now let’s use the file descriptor we’ve just gotten from the OS. As we’re building a server, let’s bind that socket to a port:

int bind(int sockfd, struct sockaddr *my_addr, int addrlen);

The first argument is the return value of the socket function (in C, file descriptors are represented by plain integers). The second is a struct containing an address family, an address and a port; the third is the size of this struct.

We now have the equivalent of a net.createServer call in Node.JS. One listen() call later, clients will be waiting patiently in line to be accepted. Let’s accept() them!

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

sockfd is the listening socket, while addr is a struct that will help us identify the client we’re accepting. Now, the interesting part: accept() returns a brand new file descriptor, so we can talk to this client in private. The rule is simple: for every new client, a new socket.

Awesome. Waiting for a client to arrive and be accepted is a blocking operation by default. Past that, it’s time to talk to this client using those two functions:

int send(int sockfd, const void *msg, int len, int flags);
int recv(int sockfd, void *buf, int len, int flags);

send()ing and recv()ing look very much alike. You just need the file descriptor you got when accepting the client, a pointer to the first byte you want to send (or to the buffer where received bytes should be written), and how many bytes you want to send (or, at most, receive).

Just to draw a parallel between this and what we’ve made with Node.JS — see that we have control over the width of our stream, the same way we did with CHUNK_SIZE!

In summary, this is how our echo server in C will look, as pseudocode:

socket();
bind();
listen();
for(;;)
{
  accept();
  while(recv()) send();
}

The Heartbleed bug

What if I told you that you’re now able to understand arguably the worst vulnerability found since commercial traffic began to flow on the Internet?

Heartbleed logo.

Heartbeat is an extension to the TLS protocol, implemented in OpenSSL, and it acts pretty much like an echo server: when one side receives a HeartbeatRequest message, it should send back an exact copy of the received payload in a HeartbeatResponse message. If the contents match, the secure connection is kept alive.

I’ll give you a clue about what the bug is. Read the following xkcd cartoon and take another look at the send() and recv() interfaces from the previous section.

Exactly. It is perfectly possible that, by mistake, or by induced mistake, you send more bytes than your actual buffer has. Let’s illustrate this:

char* payload = "are you alive";  /* 13 characters plus NUL: 14 bytes */
int payload_size = 16381;         /* ...but we claim to be sending ~16 KB */

send(client_sock, payload, payload_size, 0);  /* reads far past the buffer */

This is known as a buffer over-read, a class of bug commonly associated with languages like C and C++. Take a deep breath and thank Node.JS for making this kind of exploit much less likely in your apps.

But, to be fair to C, there are tools like Valgrind that can detect this kind of memory error; we just need to test our programs properly! By the way, if you want to read more about this bug, check its own site and this neat pdf by IBM.

Writing your own webserver

New powers ahead.

Great job! Now that you know the anatomy behind socket connections, implementing an application protocol like HTTP can be fairly straightforward.

You may implement a small subset of the HTTP/1.0 protocol, for example: parse a GET request header, along with any path parameters it may contain, and return a proper response following what was asked, whether that’s plain text or a binary file.

Some questions will naturally arise.

How would you fork (or thread) your C application to accept and handle multiple clients? What’s the practical difference between simply forking/threading and having an event loop like in Node.JS? What would be the overhead of streaming with HTTP(S)? How do we improve TTFB (Time to First Byte)? What does the architecture behind CDNs look like?

The search for those answers will certainly make you a better web developer. 😉

Thanks to Bruno Konrad.