Build your own: libffi

By: Andrew Halle Repo: byo-libffi

Part of build-your-own

How does code written in one language (say, Python) call code written in a different language (say, C)? It's clear that this is necessary (e.g. for performance reasons) but, if you're anything like me, it seems like black magic that this should be possible. How can this possibly work?

A Foreign Function Interface (FFI) is the means by which code written in one language calls code written in another language.

libffi is a low-level library that implements the calling convention for a particular platform. In this post I'll re-implement a subset of the functionality provided by libffi and prove that it works by replacing the real libffi and showing that the Python interpreter still works.

The Basics

How does a function actually get called? On x86, we have the call instruction, which has the following definition:

Saves procedure linking information on the stack and branches to the called procedure specified using the target operand.

We can see this in action via a simple example (Compiler Explorer is a great tool for looking at compiler output).

This simple example reveals a lot about what the compiler does for us. Let's look closer at this snippet in particular.

Why is the compiler putting the arguments to our add function into registers? Why these registers, in this order? The answer is the calling convention mentioned earlier. The calling convention is the contract between compilers that allows separate compilation to happen. The add function can assume that its first argument will be in edi and its second argument will be in esi (more on this later). Moreover, the calling convention allows the calling code (in this case, the main function) to assume that, after a function is called, its return value will be in eax.

When a dynamic language like Python needs to call into some already compiled C code, how does it marshal arguments into the appropriate registers? This is the job of libffi, and it's what we'll implement in the following sections.

Calling Code Loaded at Runtime

In order to call native code, the Python interpreter must load in a previously compiled shared library (a .so file on Linux). This is accomplished via the functions dlopen and dlsym. dlopen loads a shared object and returns a void* handle to that object. The dlsym function takes a handle to a shared object and a string symbol to search for in that shared object. As an example, to call an add function in a shared library libadd.so you can write the following C code:

This code opens the shared object (resolving all of its symbols immediately because of the flag RTLD_NOW), grabs a pointer to the add function, and casts that void pointer to a pointer to a function taking two ints and returning an int (notice the typedef at the top of the file).

The typedef at the top of the file gives the compiler the information it needs to generate code to call this function. Our first goal will be to call this function without the typedef, so we'll have to write the code to call this function ourselves; the compiler can't help us.
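For comparison, here is roughly what the same load-and-look-up sequence looks like from the Python side. This is a minimal sketch, assuming a POSIX system where ctypes.CDLL wraps dlopen and attribute access on the result wraps dlsym. It calls abs from the C library rather than add from libadd.so so that it runs without building anything. Setting argtypes and restype plays the role that the typedef plays in C.

```python
import ctypes
import ctypes.util

# CDLL wraps dlopen. find_library may return None, in which case CDLL(None)
# opens the running program itself, which already has the C library loaded.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Attribute access wraps dlsym: this looks up the symbol "abs".
c_abs = libc.abs

# The Python analogue of the typedef: describe the signature int abs(int).
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

result = c_abs(-5)
print(result)
```

Underneath, ctypes hands the function pointer and the described signature to libffi, which is the piece this post replaces.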

Calling One Function

Let's start by defining a struct to hold some information for us, namely the address of the function to call, and the arguments we're going to pass into that function.

If this were a real library, storing the arguments inside the callable would be a bad idea, because we might want to call this function with different arguments. However, it will suffice for now.

We'd like to be able to write code like this:

What about this mysterious function runtime_call? runtime_call needs to do a few things:

Since we need direct register access (and the ability to issue a raw call) we'll need to write this function in assembly (I use the NASM assembler).

Compiler Explorer can do most of the work for us here. This example:

clearly parallels what we're trying to do. And indeed, if we copy the assembly output into a file runtime-call.s, build it with nasm -f elf64 runtime-call.s, and link the resulting object file with our main.c, we'll successfully call this function! (This state of the code is given by commit ca946232a8545bdb7389be7159abf504d3f5a168 of the repo for this post.)

Calling Any Function

Okay, the last section was kind of cheating. We only allowed for one possible signature of the function we might call, so we could copy the compiler output to put the arguments in the right place. We haven't done anything yet! In this section, we'll make our runtime_call function generic enough to handle functions that take any number of ints as arguments, and return a single int.

(The leap from here to supporting functions that take and return values of different types is not trivial, but I think it's also not very instructive. In order to actually finish this post, I decided to stop here and restrict my libffi to only work with functions of this type. For more information on supporting functions that take arbitrary types of arguments, see one of the links in the Resources.)

For this section, we'll actually need to figure out what the calling convention on our platform is (I'm using Linux, and include a Vagrantfile for a Linux VM in the repo). We'll be implementing the calling convention for 64-bit Linux, which is known officially as the "System V AMD64 ABI" (see this link for more information).

We're interested in the following aspects of this ABI (the following points are quotes from the above link)

On x86-64 Linux, the first six function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, respectively. The seventh and subsequent arguments are passed on the stack, about which more below. The return value is passed in register %rax.

That link uses AT&T syntax for registers, while Compiler Explorer and NASM default to using Intel syntax.

If we imagine we have the following high-level functions (Python syntax for clarity):

then we could implement this ABI in the following way
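A minimal sketch of that idea, modelling the registers as a dictionary and the stack as a list. The overwrite_register name comes from the discussion here. push_stack and place_arguments are names assumed for illustration. Pushing the stack arguments in reverse order (so the seventh argument ends up on top) is how the System V ABI lays them out, and this sketch ignores stack alignment entirely.

```python
# Integer argument registers, in the order the ABI assigns them.
INT_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

registers = {}  # simulated register file
stack = []      # simulated stack; the end of the list is the top


def overwrite_register(name, value):
    """Load a value into the named (simulated) register."""
    registers[name] = value


def push_stack(value):
    """Hypothetical helper: push a value onto the simulated stack."""
    stack.append(value)


def place_arguments(args):
    """Put the first six args in registers and the rest on the stack."""
    registers.clear()
    stack.clear()
    # Arguments seven and up are pushed last-first, so arg 7 ends on top.
    for value in reversed(args[len(INT_ARG_REGISTERS):]):
        push_stack(value)
    # zip stops at the shorter sequence, so at most six registers are used.
    for name, value in zip(INT_ARG_REGISTERS, args):
        overwrite_register(name, value)


place_arguments(list(range(1, 11)))
print(registers)
print(stack)
```

With ten arguments, the values 1 through 6 land in rdi through r9, and 7 through 10 land on the stack with 7 on top.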

There's one complication to writing this function in assembly. The overwrite_register function takes a string description of the register and loads a value into it. We don't want to actually resort to string comparisons to determine the register we're loading into. Instead, we'll include the instructions for loading all 6 arguments into registers in the runtime_call function, and then jump over the ones we don't want to execute. Since assembly executes instructions sequentially, this means we have to list out these instructions in reverse order: if we have fewer than 6 arguments, we have to jump over the instructions which load the later arguments that we don't actually have.
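The jump-over trick can be modelled in a few lines. This is a sketch of the control flow only, not the assembly from the repo: the list stands in for six load instructions written in reverse order, and the computed entry index stands in for the jump target. load_register_args is a name assumed for illustration.

```python
# The six loads as they would appear in the instruction stream:
# the load for argument 6 comes first, the load for argument 1 comes last.
LOADS_IN_REVERSE = ["r9", "r8", "rcx", "rdx", "rsi", "rdi"]


def load_register_args(args):
    """Simulate jumping into the middle of the reversed load sequence."""
    n = min(len(args), 6)
    regs = {}
    entry = 6 - n  # skip the loads for arguments we don't have
    for slot in range(entry, 6):
        arg_index = 5 - slot  # slot 5 loads argument 0 (rdi)
        regs[LOADS_IN_REVERSE[slot]] = args[arg_index]
    return regs


print(load_register_args([10, 20, 30]))
```

With three arguments, execution enters at the load for rdx and falls through the loads for rsi and rdi, so rcx, r8 and r9 are never touched.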

The interesting part of the assembly is listed below (look in the repo for the full function):

Some notes about this assembly:

Our C code will also have to change to support an arbitrary number of arguments. We'll implement this by changing the structure to include a pointer to the args, and a couple of functions to initialize this structure with a malloc'd array and add args to it.

We can now write C code like the following:

where the add function has been separately compiled to have the appropriate number of arguments.

And that's it! We can now call dynamically loaded functions at runtime, without specifying their signatures at compile time. All that's left to do is make this compatible with the real libffi, and we should be able to use our library with Python.

Writing a Compatibility Layer

First, let me prove that the Python interpreter does in fact use libffi to accomplish this task. Let us consider the test program

where libadd.so is our previously compiled add function, in this case, with 10 arguments.

Running this in our VM (vagrant up; vagrant ssh; python test.py) produces the correct answer of 55.
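To try a ten-argument call through ctypes without building libadd.so, one option is to let ctypes manufacture the C function from a Python callable. This is a stand-in written for illustration, not the post's test.py, and it relies on the closure support in the real libffi, so it is only meant to be run against an unmodified system.

```python
import ctypes

# Signature: int f(int, int, int, int, int, int, int, int, int, int).
TEN_INT_FN = ctypes.CFUNCTYPE(ctypes.c_int, *([ctypes.c_int] * 10))


def py_add(*values):
    # Stand-in for the compiled add function in libadd.so.
    return sum(values)


# Wrapping a Python callable yields a real C function pointer, so the call
# below is marshalled like a call into a shared library would be.
add = TEN_INT_FN(py_add)

result = add(*range(1, 11))
print(result)
```

As with the real test, the first six arguments travel in registers and the last four on the stack.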

Now, let's look for libffi.so (we know from the libffi docs that we link with libffi using -lffi so the file is probably called libffi.so per convention). If we run

we'll see the output

So if we compile our libffi.so and replace these files, we can inject our version of libffi into the system.

NOTE: don't actually do this on your system. libffi is incredibly important for a lot of things (even LLVM links with libffi), so if you remove it on your actual computer, a lot of important things will stop working. I've included a Vagrantfile for testing, but you can do this on any virtual machine.

I've restructured our implementation of libffi into ffi.h and ffi.c, which we can now compile into libffi.so with

If we replace the system libffi (again, not on your actual system) with our compiled artifact, and try to run the test python program, we'll get the following error

Okay, now we're getting somewhere. This error is happening because Python is loading in our libffi.so and looking for symbols that don't currently exist, but should exist according to the API offered by libffi. Let's look at the man pages for libffi

Clearly we'll need a struct ffi_cif (which is the equivalent of our callable) and functions ffi_prep_cif (the equivalent of our init function) and ffi_call (the equivalent of our runtime_call). Note, I didn't learn all this just from the output above, I had to read the other man pages for libffi (listed in the SEE ALSO).

One interesting thing to note: I don't see ffi_closure_alloc anywhere in this API. As far as I can tell, these are optional symbols included by libffi for offering closure-type functionality. I think I got incredibly lucky in that they don't actually have to do anything for the simple case we've already implemented; they just have to exist. So, we can define empty functions with these names (going back and forth with Python to see what it complains about missing) to move forward.

The resulting code is:

When we now run our test script, we get

Okay, at this point, we have to actually look at the header file ffi.h provided by libffi to figure out what value we're supposed to return from ffi_prep_cif to indicate success, and what ffi_prep_cif and ffi_call are supposed to do.

ffi_prep_cif is supposed to set up the ffi_cif object with the types the function is supposed to take. Since we've restricted ourselves to functions that only take integers and return an integer, we're only interested in n_args.

ffi_call receives a pointer to an array of void*, which are the arguments we're passing to the function, and a void* indicating where to put the result. At this point, you may notice our callable struct is not exactly mappable to ffi_cif because the callable struct stores its arguments, whereas ffi_cif does not. We can get around this in a hacky way (if we were writing an actual library, I would actually fix this). We'll do the following: when ffi_prep_cif is called, store the number of arguments we're expecting. When ffi_call is called, grab n_args from the callable, then re-init it, and add the arguments using our previously defined function. Then, call runtime_call to actually invoke the function.
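The control flow of that workaround can be modelled in Python. This is only a sketch: the real layer is C, the real ffi_prep_cif also takes an ABI and type descriptions that are dropped here, and one-element lists stand in for the void* pointers. The Callable and Cif classes and their fields are assumptions made for illustration. FFI_OK being 0 matches libffi's ffi_status enum.

```python
FFI_OK = 0  # success value of libffi's ffi_status enum


class Callable:
    """Model of the callable struct: a function plus its stored arguments."""

    def __init__(self):
        self.fn = None
        self.args = []


def init(c, fn):
    c.fn = fn
    c.args = []


def add_arg(c, value):
    c.args.append(value)


def runtime_call(c):
    # The real version is the assembly routine; here Python makes the call.
    return c.fn(*c.args)


class Cif:
    """Model of ffi_cif, reduced to the argument count and our callable."""

    def __init__(self):
        self.n_args = 0
        self.callable = Callable()


def ffi_prep_cif(cif, n_args):
    cif.n_args = n_args  # remember how many arguments to expect
    return FFI_OK


def ffi_call(cif, fn, rvalue, avalue):
    n_args = cif.n_args
    init(cif.callable, fn)           # re-init, discarding old arguments
    for slot in avalue[:n_args]:
        add_arg(cif.callable, slot[0])  # "dereference" each argument pointer
    rvalue[0] = runtime_call(cif.callable)


cif = Cif()
status = ffi_prep_cif(cif, 3)
result = [None]
ffi_call(cif, lambda a, b, c: a + b + c, result, [[1], [2], [3]])
print(status, result[0])
```

Because ffi_call re-initializes the callable on every call, the same Cif can be reused with different arguments, which is what the real API expects.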

The final code is below:

When we compile this, move it to the path normally occupied by libffi, and run our test script, it works! We've successfully written an implementation of libffi that's API-compatible with the real libffi and supports a subset of its functionality.

Wrapping Up

At this point, we're done. We've accomplished our goal of calling functions at runtime without specifying their type signatures at compile time. We've made our clone API-compatible with the real libffi through a pattern I quite like: we defined our own API that makes sense to us, then wrote a compatibility layer on top of it. And we've (if you're at all like me) become extremely grateful that high-quality implementations of this functionality already exist. A new language that wants to offer a foreign function interface doesn't have to write its own implementation, and because libffi is written with more platforms in mind than just x86, the same approach works across many platforms.

Resources / Bibliography

I made use of the following links/resources while writing this post: