Always Processing

Rust API Bindings: Core Foundation Memory Management and Mutability

Two dogs sitting in a room of computer boxes. Is the content of each box unique, or do some boxes have shared contents?

The design patterns used by Core Foundation for memory management and mutability fit surprisingly well in idiomatic Rust. This post shares an overview of how I reached this conclusion the hard way.

As I’m designing Rust API bindings for Core Foundation, I want the user-facing API to match The Rust Standard Library as closely as possible, and memory management is a crucial area whose design significantly impacts the API surface. There are (at least) two critical differences between Core Foundation and The Rust Standard Library in their approach to memory management:

  1. All Core Foundation objects are allocated on the heap and are reference counted. Rust values, by contrast, can be stack-allocated, heap-allocated and uniquely owned, or heap-allocated with shared ownership.

  2. Core Foundation uses distinct types for immutable and mutable objects (e.g., CFString and CFMutableString), while Rust expresses mutability on a single type through bindings and references (&T versus &mut T).

The following summarizes my exploration in this space, my design goals for memory management, and how I achieved them.

Ad Hoc Approach

For many C APIs, wrapping a pointer in a tuple struct and implementing Drop is sufficient to provide an idiomatic Rust API for a foreign interface. Consider the following example of this approach for CFString:

struct String(*const __CFString);

impl String {
  fn len(&self) -> CFIndex {
    unsafe { CFStringGetLength(self.0) }
  }
}

impl Drop for String {
  fn drop(&mut self) {
    unsafe { CFRelease(self.0.cast()) }
  }
}

This approach is straightforward, but it is not a zero-cost abstraction. Each time the Core Foundation object pointer is required, for example, in the len method, the compiler must emit a dereference of the tuple struct &self to load the Core Foundation pointer value. This indirection is unavoidable because we must define a type to implement Drop, though it is negligible in practice.

Although Core Foundation is a C-based API, many types have logical subclasses. If we were to add Rust API bindings for CFMutableString with this approach, it would require defining a new, independent type. Adding an implementation of Deref would enable the logical subclass to gain all the methods of its logical superclass through deref coercion, and the resulting Rust API would still be reasonably idiomatic.
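The Deref idea can be sketched as follows. This is a minimal illustration rather than the bindings themselves: the cf module is a hypothetical pure-Rust stand-in for the C API so the sketch runs anywhere, and the names CfString, CfMutableString, and length_via_deref are assumptions made for the example.

```rust
use std::ops::Deref;

// Hypothetical stand-ins so the sketch runs without Core Foundation. Each
// function imitates the real call named in its comment, with a simplified
// signature.
mod cf {
    pub struct Object(Vec<u16>);

    // Imitates CFStringCreateMutable.
    pub fn string_create_mutable() -> *mut Object {
        Box::into_raw(Box::new(Object(Vec::new())))
    }
    // Imitates CFStringGetLength.
    pub unsafe fn string_get_length(s: *const Object) -> isize {
        let object = unsafe { &*s };
        object.0.len() as isize
    }
    // Imitates CFStringAppend, taking a &str instead of a second object.
    pub unsafe fn string_append(s: *mut Object, text: &str) {
        let object = unsafe { &mut *s };
        object.0.extend(text.encode_utf16());
    }
    // Imitates CFRelease for an object whose retain count is one.
    pub unsafe fn release(s: *const Object) {
        drop(unsafe { Box::from_raw(s as *mut Object) });
    }
}

/// Logical superclass: the tuple struct wrapper with Drop.
pub struct CfString(*const cf::Object);

impl CfString {
    pub fn len(&self) -> isize {
        unsafe { cf::string_get_length(self.0) }
    }
}

impl Drop for CfString {
    fn drop(&mut self) {
        unsafe { cf::release(self.0) }
    }
}

/// Logical subclass: a separate type that wraps the superclass binding, so
/// the single Drop implementation above releases the object.
pub struct CfMutableString(CfString);

impl CfMutableString {
    pub fn new() -> Self {
        CfMutableString(CfString(cf::string_create_mutable()))
    }

    pub fn append(&mut self, text: &str) {
        // Casting away const is acceptable only because this wrapper is
        // always constructed from a mutable object.
        unsafe { cf::string_append((self.0).0 as *mut cf::Object, text) }
    }
}

impl Deref for CfMutableString {
    type Target = CfString;

    fn deref(&self) -> &CfString {
        &self.0
    }
}

/// Calls a superclass method on the subclass through deref coercion.
pub fn length_via_deref() -> isize {
    let mut s = CfMutableString::new();
    s.append("Hello");
    s.len() // resolves to CfString::len through Deref
}
```

Wrapping the superclass binding, rather than the raw pointer, is one way to get a &CfString out of the subclass without a transmute; other layouts are possible.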

While this is a well-trodden path[1], I wanted to find a design that:

  • Is a true zero-cost abstraction.

  • Shows, through the type system, that Core Foundation objects are heap-allocated (e.g., Box).

  • Combines Rust’s mutable references with Core Foundation’s mutable types.

Box for Core Foundation

I started exploring the design space by building equivalents of the standard library’s Box and Arc types, knowing unique ownership (à la Box) would be part of the mutability story and that shared pointers (like Arc) are a fundamental part of programming on Apple platforms.

Through this exercise, I immediately achieved my first two design goals. Consider the following sample code illustrating the approach for CFString:

struct Box<T>(NonNull<T>);

impl<T> Deref for Box<T> {
  type Target = T;

  #[inline]
  fn deref(&self) -> &Self::Target {
    unsafe { self.0.as_ref() }
  }
}

struct String;

impl String {
  fn new() -> Box<Self> { /* ... */ }

  fn len(&self) -> CFIndex {
    let cf: *const _ = self;
    unsafe { CFStringGetLength(cf.cast()) }
  }
}

Like the approach in the previous section, a tuple struct wraps the raw Core Foundation object instance pointer. But this wrapper has three essential differences.

First, the type name, Box<T>, signals to the reader that T is heap-allocated and that the instance of T is uniquely owned.

Second, it implements Deref to T, the Rust type implementing the API bindings, which is crucial in making the abstraction zero-cost. When the box is dereferenced by the compiler, for example, to call the len method, the box returns the Core Foundation pointer value as a reference to T. The reference value (i.e., &self) is bitwise identical to the Core Foundation pointer value and can be passed directly through to the C API.

The compiler dereferences the tuple struct in each approach, so why is one approach a zero-cost abstraction while the other is not?

In the previous section, &self is a reference to the tuple struct. The compiler must load the Core Foundation pointer value from the tuple struct’s field through the reference. An illustration in C may help clarify:

struct String { const struct __CFString *s; };

CFIndex String_len(const struct String *self) {
  return CFStringGetLength(self->s);
  // dereference+member access ^^ is an abstraction cost
}

struct String s = /* ... */;
CFIndex length = String_len(&s);

In this section, &self is the pointer value itself. Dereferencing the box is a logical operation rather than a physical one; it is effectively a member access on the box. To illustrate in C:

struct Box__CFString { const struct __CFString *v; };

CFIndex String_len(const struct String *self) {
  return CFStringGetLength((const struct __CFString *)self);
}

struct Box__CFString s = /* ... */;
CFIndex length = String_len(s.v);
//        member access here ^ is zero cost
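The same point can be checked in Rust. The sketch below is an assumption-laden illustration, not the actual bindings: CfBox and CfString are hypothetical names, the u64 allocation stands in for an object returned by a Create function, and the box has no Drop implementation to keep the example short. It confirms that the reference produced by Deref carries the same address as the stored pointer and that the box is pointer-sized.

```rust
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical owning pointer. `repr(transparent)` gives it the layout of
/// the raw pointer it wraps.
#[repr(transparent)]
pub struct CfBox<T>(NonNull<T>);

impl<T> Deref for CfBox<T> {
    type Target = T;

    #[inline]
    fn deref(&self) -> &T {
        // Reinterprets the stored pointer as a reference; nothing is loaded
        // from the object.
        unsafe { self.0.as_ref() }
    }
}

/// Zero-sized binding type: a `&CfString` is the object pointer.
pub struct CfString;

impl CfString {
    /// The address a real binding would pass to the C API.
    pub fn as_raw(&self) -> *const u8 {
        let cf: *const CfString = self;
        cf.cast()
    }
}

/// Returns true when the address seen by the binding equals the address of
/// the object the box was created from.
pub fn deref_preserves_pointer() -> bool {
    // Stand-in for an object returned by a Create function.
    let object: *mut u64 = Box::into_raw(Box::new(0u64));
    let boxed = CfBox(NonNull::new(object).expect("non-null").cast::<CfString>());

    // Auto-deref calls CfBox::deref before CfString::as_raw.
    let same = boxed.as_raw() == object as *const u8;

    // CfBox has no Drop in this sketch, so free the stand-in directly.
    drop(unsafe { Box::from_raw(object) });
    same
}
```

Because the binding type is zero-sized here, the sketch only compares addresses and never reads the object through the reference.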

Finally, the separation of the type bindings (e.g., String) from the memory management facility (e.g., Box<T>) enables idiomatic, zero-cost use of references to the Core Foundation type bindings. Consider potential bindings for CFArrayGetValueAtIndex. With the approach in this section, the function binding can simply cast the pointer into a reference with the array’s lifetime.

With the approach in the previous section, the bindings for this function could:

  • Return a new binding instance for the value, retaining and releasing the object. The new binding instance does not have a lifetime associated with the array, so the retain is necessary to guarantee that the object lives at least as long as the binding instance. In many cases, however, the retain/release is unnecessary overhead.

  • Use an intermediate type to associate a lifetime with the binding instance and sidestep its retain/release. This is effectively the same as the function binding for the approach in this section, but it requires more ceremony to eliminate the retain/release correctly.
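A borrowed element accessor for the approach in this section might look like the sketch below. Everything in it is an assumption made for illustration: the cf module is a pure-Rust stand-in for CFArrayGetCount and CFArrayGetValueAtIndex, the array is assumed to hold only strings, and the binding types wrap the stand-in objects (instead of being opaque) so the example stays sound without the framework.

```rust
use std::ffi::c_void;

// Hypothetical stand-ins for the Core Foundation objects and calls.
mod cf {
    use std::ffi::c_void;

    pub struct StringObject(pub Vec<u16>);
    pub struct ArrayObject(pub Vec<*const c_void>);

    // Imitates CFArrayGetCount.
    pub unsafe fn array_get_count(a: *const ArrayObject) -> isize {
        let array = unsafe { &*a };
        array.0.len() as isize
    }
    // Imitates CFArrayGetValueAtIndex: the result is borrowed, not retained.
    pub unsafe fn array_get_value_at_index(a: *const ArrayObject, index: isize) -> *const c_void {
        let array = unsafe { &*a };
        array.0[index as usize]
    }
}

// Binding types, transparent over the stand-in objects.
#[repr(transparent)]
pub struct CfString(cf::StringObject);

#[repr(transparent)]
pub struct CfArray(cf::ArrayObject);

impl CfString {
    pub fn len(&self) -> usize {
        (self.0).0.len()
    }
}

impl CfArray {
    /// Borrows the element at `index`, assuming every element is a string.
    pub fn get(&self, index: usize) -> Option<&CfString> {
        let cf: *const CfArray = self;
        let count = unsafe { cf::array_get_count(cf.cast()) };
        if (index as isize) >= count {
            return None;
        }
        let value = unsafe { cf::array_get_value_at_index(cf.cast(), index as isize) };
        // The elided lifetime ties the reference to `&self`, so no retain or
        // release is needed to keep the element alive.
        unsafe { value.cast::<CfString>().as_ref() }
    }
}

/// Builds a two-element array and reads three indices through `get`.
pub fn element_lengths() -> Vec<Option<usize>> {
    let make = |text: &str| Box::into_raw(Box::new(cf::StringObject(text.encode_utf16().collect())));
    let (a, b) = (make("ab"), make("hello"));
    let raw = Box::into_raw(Box::new(cf::ArrayObject(vec![a as *const c_void, b as *const c_void])));

    let lengths: Vec<Option<usize>> = {
        let array: &CfArray = unsafe { &*raw.cast::<CfArray>() };
        (0..3).map(|i| array.get(i).map(CfString::len)).collect()
    };

    // Free the stand-in objects once no borrowed reference remains.
    unsafe {
        drop(Box::from_raw(raw));
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
    }
    lengths
}
```

The borrow checker then rejects any use of the returned reference after the array is gone, which is the guarantee the retain would otherwise provide at run time.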

Arc for Core Foundation

It took more exploration and trial and error to identify an approach to achieve my third design goal of combining Rust’s mutable references with Core Foundation’s mutable types.

At some point, I asked, "Why do immutable objects need exclusive ownership?" I was eventually able to convince myself that "They don’t!" Looking back, I don’t know why this wasn’t more obvious. Rust’s documentation for its Arc type clearly states:

You cannot generally obtain a mutable reference to something inside an Arc.

With that insight, the rule for choosing the appropriate smart pointer type was reasonably straightforward: Is the Core Foundation object a mutable type that is uniquely owned through the raw pointer (i.e., a Create or Copy function returned the pointer)? If yes, use Box<T>; otherwise, use Arc<T>.

My implementations of Box<T> and Arc<T> for Core Foundation are virtually identical, with the primary difference being that Box<T> also implements DerefMut, AsMut, and BorrowMut.
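To make the split concrete, here is a minimal sketch of the two pointer types side by side. It is not my actual implementation: the names CfBox, CfArc, Object, and CfString::constant are hypothetical, the retain count is simulated in pure Rust, AsMut and BorrowMut are omitted, and retaining on Clone is an assumption modeled on the standard library's Arc.

```rust
use std::cell::Cell;
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Hypothetical hook standing in for CFRetain/CFRelease bookkeeping.
pub trait Object {
    fn retain_count(&self) -> &Cell<usize>;
}

/// Safety: `ptr` must point to a live object allocated by `Box`.
unsafe fn release<T: Object>(ptr: NonNull<T>) {
    let remaining = unsafe { ptr.as_ref() }.retain_count().get() - 1;
    if remaining == 0 {
        drop(unsafe { Box::from_raw(ptr.as_ptr()) });
    } else {
        unsafe { ptr.as_ref() }.retain_count().set(remaining);
    }
}

/// Unique owner: offers both shared and mutable access.
pub struct CfBox<T: Object>(NonNull<T>);

impl<T: Object> Deref for CfBox<T> {
    type Target = T;
    fn deref(&self) -> &T { unsafe { self.0.as_ref() } }
}

impl<T: Object> DerefMut for CfBox<T> {
    fn deref_mut(&mut self) -> &mut T { unsafe { self.0.as_mut() } }
}

impl<T: Object> Drop for CfBox<T> {
    fn drop(&mut self) { unsafe { release(self.0) } }
}

/// Shared owner: offers shared access only, and cloning retains.
pub struct CfArc<T: Object>(NonNull<T>);

impl<T: Object> Deref for CfArc<T> {
    type Target = T;
    fn deref(&self) -> &T { unsafe { self.0.as_ref() } }
}

impl<T: Object> Clone for CfArc<T> {
    fn clone(&self) -> Self {
        let count = self.retain_count();
        count.set(count.get() + 1);
        CfArc(self.0)
    }
}

impl<T: Object> Drop for CfArc<T> {
    fn drop(&mut self) { unsafe { release(self.0) } }
}

/// Stand-in for a Core Foundation string object (hypothetical layout).
pub struct CfString {
    retain_count: Cell<usize>,
    units: Vec<u16>,
}

impl Object for CfString {
    fn retain_count(&self) -> &Cell<usize> { &self.retain_count }
}

impl CfString {
    fn allocate(text: &str) -> NonNull<Self> {
        let object = CfString { retain_count: Cell::new(1), units: text.encode_utf16().collect() };
        NonNull::from(Box::leak(Box::new(object)))
    }
    /// Mutable and uniquely owned, like the result of a Create function.
    pub fn new() -> CfBox<Self> { CfBox(Self::allocate("")) }
    /// Immutable, so shared ownership is enough.
    pub fn constant(text: &str) -> CfArc<Self> { CfArc(Self::allocate(text)) }
    pub fn len(&self) -> usize { self.units.len() }
    pub fn append(&mut self, other: &CfString) { self.units.extend_from_slice(&other.units) }
}

/// Returns (retain count while shared, retain count after drop, length).
pub fn ownership_demo() -> (usize, usize, usize) {
    let hello = CfString::constant("Hello");
    let alias = hello.clone();
    let while_shared = hello.retain_count().get();
    drop(alias);
    let after_drop = hello.retain_count().get();

    let mut owned = CfString::new();
    owned.append(&hello); // &mut via DerefMut on CfBox, & via Deref on CfArc
    (while_shared, after_drop, owned.len())
}
```

Calling append on a CfArc<CfString> does not compile in this sketch, because CfArc never hands out a mutable reference.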

The combination of reference counting and mutability in the smart pointer types[2] fulfilled my design goals and resulted in surprisingly idiomatic Rust code.

impl String {
  fn append(&mut self, s: &String) {
    let cf: *mut _ = self;
    let s: *const _ = s;
    unsafe { CFStringAppend(cf.cast(), s.cast()) };
  }
}

fn main() {
  let mut s: Box<String> = String::new();
  s.append(cfstr!("Hello, World!"));
  println!("{s}");
}

1. The "tuple struct wrapper with Drop" is the approach used by the Servo Core Foundation bindings.
2. Although I approached this as a clean-room design, similar prior art exists. In 2014, long before I heard of Rust, GitHub user SSheldon differentiated between owned and shared references in an Objective-C smart pointer.