Objective-C Internals: Release
Although release is "just" the logical inverse of retain, its implementation is much more complex, primarily due to the ARM synchronization model. This post explores the unique aspects of the release implementation (relative to retain), focusing on the memory ordering requirements on ARM.
Objective-C manages memory using a reference counting approach. This post looks at the release operation, which removes a reference count from an object instance. The previous post covered the retain implementation, which adds a reference count to an object instance. The retain and release implementations are similar because they are inverse operations. I’ll refer back to the retain post where the discussion is similar, so this post can focus on the unique aspects of release.
Entry Points
There are two interfaces for reference counting operations: the long-standing NSObject API and a compiler-private API used by ARC, both of which call into a core implementation. The following two subsections will examine each interface’s release implementation, and the next section will discuss the core implementation.
NSObject
Like -[NSObject retain], -[NSObject release] is trivial: it simply calls _objc_rootRelease() to release self.
runtime/NSObject.mm
lines 2544-2546
- (void)release {
    _objc_rootRelease(self);
}
The term root, as discussed in the retain post, indicates the operation is occurring via a -release message received by the root class in the object’s class hierarchy.
Next, the _objc_rootRelease function, which is also trivial, calls objc_object::rootRelease().
runtime/NSObject.mm
lines 1883-1889
void _objc_rootRelease(id obj) {
    ASSERT(obj);
    obj->rootRelease();
}
Finally, objc_object::rootRelease() calls an overload of rootRelease.
runtime/objc-object.h
lines 729-733
bool objc_object::rootRelease() {
    return rootRelease(true, RRVariant::Fast);
}
The overload called here is the core implementation, which has two parameters:
- performDealloc specifies whether the release operation should deallocate the object instance if the retain count reaches zero. The runtime always passes true for this parameter unless the _objc_rootReleaseWasZero() SPI[1] (system programming interface for first-party use, as opposed to application programming interface for third-party use) is performing the release.
- variant provides context about the call path, enabling the core implementation to elide unnecessary work. Releases performed through NSObject use RRVariant::Fast to skip the check for whether the class has a custom reference counting implementation, because the operation occurring through the root class is, by definition, not custom.
Automatic Reference Counting
When ARC is enabled, the compiler performs reference counting operations through a compiler-private API added for ARC as a performance optimization (also discussed in the retain post).
runtime/NSObject.mm
lines 1780-1786
void objc_release(id obj) {
    if (_objc_isTaggedPointerOrNil(obj)) return;
    return obj->release();
}
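The nil/tagged-pointer filter at the top of objc_release() can be modeled in a few lines. This is a hypothetical sketch rather than the runtime’s code: it assumes, as on arm64, that the most significant pointer bit marks a tagged pointer (other platforms use a different tag bit), and isTaggedPointerOrNil(), modelObjcRelease(), and heapReleases are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: as on arm64, tagged pointers are marked by
// the most significant bit. Other platforms use a different tag bit.
constexpr std::uintptr_t kTagMask = std::uintptr_t(1) << (sizeof(std::uintptr_t) * 8 - 1);

// Hypothetical stand-in for _objc_isTaggedPointerOrNil().
bool isTaggedPointerOrNil(const void *ptr) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(ptr);
    return bits == 0 || (bits & kTagMask) == kTagMask;
}

// Counts how many calls fall through to the heap-object release path.
int heapReleases = 0;

// Hypothetical model of objc_release(): nil and tagged pointers return early.
void modelObjcRelease(const void *obj) {
    if (isTaggedPointerOrNil(obj)) return;  // nothing to release
    heapReleases++;                         // stands in for obj->release()
}
```

A tagged pointer encodes its value in the pointer bits themselves, so there is no heap object whose retain count could change; filtering it out first keeps the common heap path free of that concern.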
If the object pointer is neither nil nor a tagged pointer (the same check retain performs), it references an object on the heap, and the function calls objc_object::release() to perform the release operation.
runtime/objc-object.h
lines 709-716
inline void objc_object::release() {
    ASSERT(!isTaggedPointer());
    rootRelease(true, RRVariant::FastOrMsgSend);
}
This function calls the core implementation (though root in rootRelease is a misnomer at this point) with:
- true for performDealloc, for the same reason discussed above in the NSObject entry point.
- RRVariant::FastOrMsgSend for variant. No introspection, whether direct (see rootRelease below) or indirect (via a message send, see NSObject above), has occurred, so it’s not yet known whether the object’s class overrides any of the reference counting methods (hence the function’s name does not contain the term root). The MsgSend part of the variant instructs the core implementation to do the introspection necessary to determine whether the object’s class overrides the reference counting methods. If it does, the core implementation performs the release operation by sending the object a -release message (which may re-enter the runtime via -[NSObject release]).
rootRelease
The objc_object::rootRelease(bool, RRVariant) function is on the larger side, so we’ll analyze it piece by piece.
runtime/objc-object.h
line 744
if (slowpath(isTaggedPointer())) return false;
Although the ARC entry point checks for a tagged pointer, the NSObject entry point does not. It’s not immediately apparent to me why the NSObject implementation doesn’t perform this check, but it has to happen somewhere, and in this version of the runtime, it’s here.
Next, the runtime loads the object’s isa value[2].
runtime/objc-object.h
lines 746-750
bool sideTableLocked = false;
isa_t newisa, oldisa;
oldisa = LoadExclusive(&isa().bits);
If the compiler-private API was the entry point for the release operation, the runtime must check whether the class overrides any reference counting methods[3].
runtime/objc-object.h
lines 752-764
if (variant == RRVariant::FastOrMsgSend) {
    // These checks are only meaningful for objc_release()
    // They are here so that we avoid a re-load of the isa.
    if (slowpath(oldisa.getDecodedClass(false)->hasCustomRR())) {
        ClearExclusive(&isa().bits);
        if (oldisa.getDecodedClass(false)->canCallSwiftRR()) {
            swiftRelease.load(memory_order_relaxed)((id)this);
            return true;
        }
        ((void(*)(objc_object *, SEL))objc_msgSend)(this, @selector(release));
        return true;
    }
}
If a class has a custom reference counting implementation, the runtime fulfills the ARC-initiated release operation by sending the object a -release message (or, when canCallSwiftRR() returns true, by calling the swiftRelease function pointer directly). Note the object may then call -[NSObject release], but this code block will not execute again, as the variant will be RRVariant::Fast.
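The variant hand-off can be sketched as a small C++ model. Only the RRVariant names come from the runtime; Object, arcRelease(), nsobjectRelease(), and overriddenRelease() are hypothetical stand-ins for the ARC entry point, the NSObject entry point, and a subclass override that ends with a call to super. The model shows why the custom check runs only once: the override re-enters through the root class with Fast.

```cpp
#include <cassert>

// Hypothetical model of how the variant gates the custom-RR check. Only the
// enum's names come from the runtime; the object model is a simplification.
enum class RRVariant { Fast, FastOrMsgSend, Full };

struct Object {
    bool hasCustomRR = false;   // does the class override -release?
    unsigned retainCount = 1;   // simplified retain count
    int overrideCalls = 0;      // how many times the override ran
};

bool rootRelease(Object *obj, RRVariant variant);

// Models -[NSObject release]: the root class is never custom, so Fast.
bool nsobjectRelease(Object *obj) {
    return rootRelease(obj, RRVariant::Fast);
}

// Models a subclass's -release override that ends with [super release].
void overriddenRelease(Object *obj) {
    obj->overrideCalls++;
    nsobjectRelease(obj);
}

// Models objc_release(): nothing is known about the class yet.
bool arcRelease(Object *obj) {
    return rootRelease(obj, RRVariant::FastOrMsgSend);
}

// Returns true when the count reaches zero (the runtime would deallocate).
bool rootRelease(Object *obj, RRVariant variant) {
    // Only FastOrMsgSend pays for the introspection check.
    if (variant == RRVariant::FastOrMsgSend && obj->hasCustomRR) {
        overriddenRelease(obj);  // stands in for sending -release
        return true;             // the runtime also returns true here; callers ignore it
    }
    obj->retainCount--;
    return obj->retainCount == 0;
}
```

For a class with an override, arcRelease() runs the override exactly once, and the decrement happens in the nested Fast call without a second round of introspection.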
Continuing to the next block.
runtime/objc-object.h
lines 766-773
if (slowpath(!oldisa.nonpointer)) {
    // a Class is a Class forever, so we can perform this check once
    // outside of the CAS loop
    if (oldisa.getDecodedClass(false)->isMetaClass()) {
        ClearExclusive(&isa().bits);
        return false;
    }
}
Class objects are never deallocated and do not require reference counting. So, if the object is a class object, the function returns false without performing any further work.
Compare and Swap Loop
The compare-and-swap loop is the heart of the release implementation. It starts, perhaps unexpectedly, with a goto label. The Full Variant subsection below discusses this function’s use of goto.
runtime/objc-object.h
lines 775-777
retry:
do {
    newisa = oldisa;
The loop first sets newisa to the current isa value (i.e., oldisa), which the following steps will update to reflect the decremented retain count.
Then, the loop checks if the object instance has a non-pointer isa. If it does not, the retain count is recorded in a side table[4]. This check is performed in the loop because if this thread loses a compare-and-swap, it could be due to another thread mutating the object in a way that removed its use of a non-pointer isa.
runtime/objc-object.h
lines 778-781
    if (slowpath(!newisa.nonpointer)) {
        ClearExclusive(&isa().bits);
        return sidetable_release(sideTableLocked, performDealloc);
    }
Next, the loop checks to see if it lost another race.
runtime/objc-object.h
lines 782-788
    if (slowpath(newisa.isDeallocating())) {
        ClearExclusive(&isa().bits);
        if (sideTableLocked) {
            ASSERT(variant == RRVariant::Full);
            sidetable_unlock();
        }
        return false;
    }
An object may be deallocating while a thread is attempting to release it in (at least) two scenarios:
- Another thread released the object, causing it to deallocate, usually due to a race condition that occurs when a process concurrently reads from and writes to a strong, nonatomic property. Everything about this scenario is undefined behavior.
- Logic in -dealloc causes a release to be performed (e.g., the -dealloc implementation passes self to a clean-up routine where the ARC compiler emits a retain/release pair). This scenario is not a race condition like the case above because the release occurs on the same thread that is executing the deallocation.
If the _objc_rootReleaseWasZero() SPI performed the release, the return value of false indicates the caller should not initiate deallocation, as the object is already deallocating. The return value is otherwise unused by the NSObject and ARC entry points.
Finally, we get to the actual decrement.
runtime/objc-object.h
lines 791-797
    // don't check newisa.fast_rr; we already called any RR overrides
    uintptr_t carry;
    newisa.bits = subc(newisa.bits, RC_ONE, 0, &carry); // extra_rc--
    if (slowpath(carry)) {
        // don't ClearExclusive()
        goto underflow;
    }
Recall the non-pointer isa is a bit field with three variants. The value of RC_ONE is the bit that represents a retain count of one when viewing the bit field as an integer. The retain count is stored in the most significant bits of the isa, so subtracting RC_ONE underflows, producing a carry, if all of the retain count bits are zero (discussed in the following subsection). Otherwise, newisa contains the decremented retain count and is ready to be written back to the object instance.
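The decrement and its carry can be reproduced with plain 64-bit arithmetic. This sketch assumes an arm64-style layout in which extra_rc occupies the top 19 bits, so RC_ONE is 1 << 45; those constants, extraRC(), and decrementExtraRC() are illustrative, and a portable comparison stands in for the carry-reporting subc() helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumed arm64-style layout, for illustration only: extra_rc occupies the
// top 19 bits of the isa, so a retain count of one is bit 45.
constexpr unsigned RC_SHIFT = 45;
constexpr std::uint64_t RC_ONE = std::uint64_t(1) << RC_SHIFT;

// Reads the inline retain count field from the assumed layout.
std::uint64_t extraRC(std::uint64_t bits) { return bits >> RC_SHIFT; }

// Portable stand-in for subc(bits, RC_ONE, 0, &carry): a borrow out of the
// top bit happens exactly when every extra_rc bit is zero.
std::uint64_t decrementExtraRC(std::uint64_t bits, bool *borrow) {
    *borrow = bits < RC_ONE;
    return bits - RC_ONE;  // wraps modulo 2^64 when *borrow is true
}
```

Because the count sits at the top of the word, the low bits (nonpointer, class pointer, flags) pass through the subtraction untouched, and the only way to borrow is to run out of retain count bits.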
runtime/objc-object.h
line 798
} while (slowpath(!StoreReleaseExclusive(&isa().bits, &oldisa.bits, newisa.bits)));
If the value at &isa() matches the value at &oldisa, the compare-and-swap operation succeeds and writes the value of newisa to &isa(), and the loop ends.
Otherwise, the value at &isa() has changed since this thread loaded it into oldisa. The compare-and-swap operation fails and writes the new value at &isa() to &oldisa. The loop continues until the thread wins a compare-and-swap operation or another thread changes the object state to activate one of the above return paths.
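The loop’s shape maps onto std::atomic::compare_exchange_weak, which, like StoreReleaseExclusive(), writes the value it found back into its expected argument when it fails. The following is a simplified model that assumes the same arm64-style RC_ONE (1 << 45) and ignores the side table bit and the other isa fields; storeReleaseExclusive(), releaseFastPath(), and Outcome are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t RC_ONE = std::uint64_t(1) << 45;  // assumed layout

// Modeled on StoreReleaseExclusive(): on failure, `expected` receives the
// value currently stored, so the loop can retry without a separate load.
bool storeReleaseExclusive(std::atomic<std::uint64_t> &word,
                           std::uint64_t &expected, std::uint64_t desired) {
    return word.compare_exchange_weak(expected, desired,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}

enum class Outcome { Decremented, ReachedZero, Underflow };

// Simplified fast path: ignores the side table bit and other isa fields.
Outcome releaseFastPath(std::atomic<std::uint64_t> &isa) {
    std::uint64_t oldisa = isa.load(std::memory_order_relaxed);
    std::uint64_t newisa;
    do {
        if (oldisa < RC_ONE) return Outcome::Underflow;  // extra_rc is already 0
        newisa = oldisa - RC_ONE;                        // recomputed each attempt
    } while (!storeReleaseExclusive(isa, oldisa, newisa));
    return newisa < RC_ONE ? Outcome::ReachedZero : Outcome::Decremented;
}
```

The weak form is a natural fit here: like a store exclusive on an LL/SC machine, it may fail spuriously, which is harmless inside a retry loop that recomputes newisa from the refreshed oldisa.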
After the loop ends, the runtime checks if the retain count is zero. If it is, it deallocates the object instance.
runtime/objc-object.h
lines 800-801
if (slowpath(newisa.isDeallocating()))
    goto deallocate;
Otherwise, the object has a positive retain count. If necessary, the runtime will release the side table lock. The function ends by returning false, indicating to the _objc_rootReleaseWasZero() SPI that the object should not deallocate.
runtime/objc-object.h
lines 803-808
if (variant == RRVariant::Full) {
    if (slowpath(sideTableLocked)) sidetable_unlock();
} else {
    ASSERT(!sideTableLocked);
}
return false;
The Full Variant
If the retain count underflows the bits in the non-pointer isa, the runtime reverts the changes to newisa. Then, it checks whether any retain counts previously overflowed to the side table.
runtime/objc-object.h
lines 810-816
underflow:
// newisa.extra_rc-- underflowed: borrow from side table or deallocate
newisa = oldisa; // abandon newisa to undo the decrement
if (slowpath(newisa.has_sidetable_rc)) {
If no retain counts overflowed to the side table, no retain counts remain, so the release deallocates the object instance, though I don’t think this can happen in practice as it would imply an over-release. Either there are retain counts in the side table, or the retain count reached zero and the above code path deallocated the object instance. In my opinion, it would be cleaner if the runtime trapped in this case, as the process will likely crash when -dealloc gets called for a second time.
If retain counts did previously overflow to the side table, the runtime checks whether this function invocation has the Fast or FastOrMsgSend variant. If so, it stops its attempt at the release operation and passes the buck to rootRelease_underflow().
runtime/objc-object.h
lines 817-820
    if (variant != RRVariant::Full) {
        ClearExclusive(&isa().bits);
        return rootRelease_underflow(performDealloc);
    }
rootRelease_underflow() immediately calls back into objc_object::rootRelease(bool, RRVariant) with the Full variant.
runtime/NSObject.mm
lines 1379-1383
NEVER_INLINE uintptr_t objc_object::rootRelease_underflow(bool performDealloc) {
    return rootRelease(performDealloc, RRVariant::Full);
}
I speculated in the retain post that the purpose of this function is to provide a frame in stack traces to help Apple engineers troubleshoot release crashes in the runtime, as the interplay of side table locking (which uses a non-reentrant spin lock) can be challenging to reason about.
If the retain count decrement underflows with the Full variant, the runtime acquires the side table lock if it does not already hold it.
runtime/objc-object.h
lines 822-832
    // Transfer retain count from side table to inline storage.
    if (!sideTableLocked) {
        ClearExclusive(&isa().bits);
        sidetable_lock();
        sideTableLocked = true;
        // Need to start over to avoid a race against the nonpointer -> raw pointer transition.
        oldisa = LoadExclusive(&isa().bits);
        goto retry;
    }
Acquiring the side table lock may cause the thread to suspend, so the runtime first clears its exclusive monitor on the isa address, which is required to use the exclusive monitor correctly. From the ARM Architecture Reference Manual (emphasis mine):
The exclusives support a single outstanding exclusive access for each processor thread that is executed. … If the target address of an STREX (store exclusive) is different from the preceding LDREX (load exclusive) in the same thread of execution, behavior can be unpredictable. As a result, an LDREX/STREX pair can only be relied upon to eventually succeed if they are executed with the same address. Where a context switch… might change the thread of execution, a CLREX instruction… must be executed to avoid unwanted effects…
After obtaining the side table lock, the runtime reloads the isa value and starts the compare-and-swap loop again to perform the decrement. A reload of the isa is necessary because another thread may have changed the isa while this thread was waiting to acquire the side table lock.
Finally, if the decrement underflows again, the runtime, now holding the side table lock, can safely take retain counts from the side table.
runtime/objc-object.h
lines 834-835
    // Try to remove some retain counts from the side table.
    auto borrow = sidetable_subExtraRC_nolock(RC_HALF);
sidetable_subExtraRC_nolock() returns a SidetableBorrow struct (borrow in the sense of taking the value of higher digits in a subtraction operation, not as in leasing the values from the side table), which has two fields:
- borrowed: The number of retain counts taken from the side table.
- remaining: The number of retain counts remaining in the side table.
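The borrow interface can be modeled with an ordinary map, assuming the side table is simply a mapping from object address to overflowed retain count. SideTable and subExtraRC() are hypothetical; only the borrowed and remaining fields mirror the SidetableBorrow struct described above, and the real side tables are organized and locked differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical side table: overflowed retain counts keyed by object address.
// The runtime's real tables differ; this only mirrors the borrow interface.
using SideTable = std::unordered_map<const void *, std::size_t>;

// Same two fields the post describes for SidetableBorrow.
struct SidetableBorrow {
    std::size_t borrowed;   // retain counts taken from the side table
    std::size_t remaining;  // retain counts left in the side table
};

// Hypothetical analogue of sidetable_subExtraRC_nolock(): take up to `limit`
// counts for `obj`. The caller is assumed to hold the side table lock.
SidetableBorrow subExtraRC(SideTable &table, const void *obj, std::size_t limit) {
    auto it = table.find(obj);
    if (it == table.end() || it->second == 0) return {0, 0};  // nothing to borrow
    std::size_t taken = std::min(it->second, limit);
    it->second -= taken;  // the entry stays; clearing it is a separate step
    return {taken, it->second};
}
```

Keeping the emptied entry in place mirrors the separate clearing step the runtime performs later, once it knows the inline store succeeded.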
The runtime first checks whether all the retain counts have been removed from the side table to perform additional bookkeeping later. Then, it checks whether the side table returned any retain counts. If the side table is empty, no retain counts remain, so the release will deallocate the object instance.
runtime/objc-object.h
lines 837-839
    bool emptySideTable = borrow.remaining == 0; // we'll clear the side table if no refcounts remain there
    if (borrow.borrowed > 0) {
If the side table returned retain counts for the object instance, the runtime attempts to update the non-pointer isa with the retain counts taken from the side table.
runtime/objc-object.h
lines 840-846
        // Side table retain count decreased.
        // Try to add them to the inline count.
        bool didTransitionToDeallocating = false;
        newisa.extra_rc = borrow.borrowed - 1; // redo the original decrement too
        newisa.has_sidetable_rc = !emptySideTable;
        bool stored = StoreReleaseExclusive(&isa().bits, &oldisa.bits, newisa.bits);
The borrow.borrowed field contains the retain counts taken from the side table. The runtime subtracts one from the count (recall this is the code path for an underflow, so the release accounting has not yet occurred) and stores the value in the non-pointer isa’s extra_rc field. It then updates the has_sidetable_rc bit to reflect whether the side table still has overflowed retain counts for the object instance.
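Setting extra_rc to borrowed - 1 and updating has_sidetable_rc amounts to a few mask operations. This sketch assumes an arm64-style layout (has_sidetable_rc at bit 44, a 19-bit extra_rc above it); applyBorrow() is a hypothetical helper, not a runtime function.

```cpp
#include <cassert>
#include <cstdint>

// Assumed arm64-style layout (illustrative): has_sidetable_rc is bit 44 and
// extra_rc is the 19 bits above it.
constexpr unsigned RC_SHIFT = 45;
constexpr std::uint64_t RC_MASK = ~std::uint64_t(0) << RC_SHIFT;
constexpr std::uint64_t SIDETABLE_BIT = std::uint64_t(1) << 44;

// Rebuild the isa after a borrow: extra_rc = borrowed - 1 installs the
// borrowed counts and performs the pending decrement in one step.
std::uint64_t applyBorrow(std::uint64_t oldisa, std::uint64_t borrowed, bool emptySideTable) {
    assert(borrowed > 0 && borrowed - 1 < (std::uint64_t(1) << 19));
    std::uint64_t bits = oldisa & ~RC_MASK;        // clear extra_rc
    bits |= (borrowed - 1) << RC_SHIFT;            // extra_rc = borrowed - 1
    if (emptySideTable) bits &= ~SIDETABLE_BIT;    // has_sidetable_rc = 0
    else bits |= SIDETABLE_BIT;                    // has_sidetable_rc = 1
    return bits;
}
```

Folding the pending decrement into the borrowed amount means the release needs only one successful store to account for both the transfer and the release itself.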
It then attempts to store the new isa value. The store may fail, which is handled by the next code block.
runtime/objc-object.h
lines 848-863
        if (!stored && oldisa.nonpointer) {
            // Inline update failed.
            // Try it again right now. This prevents livelock on LL/SC architectures
            // where the side table access itself may have dropped the reservation.
            uintptr_t overflow;
            newisa.bits = addc(oldisa.bits, RC_ONE * (borrow.borrowed-1), 0, &overflow);
            newisa.has_sidetable_rc = !emptySideTable;
            if (!overflow) {
                stored = StoreReleaseExclusive(&isa().bits, &oldisa.bits, newisa.bits);
                if (stored) {
                    didTransitionToDeallocating = newisa.isDeallocating();
                }
            }
        }
If placing the retain counts taken from the side table into the non-pointer isa fails, the runtime immediately tries again. When the store exclusive fails, the runtime’s StoreReleaseExclusive() function reloads the current value into oldisa, so oldisa holds the most recent value. The runtime adds the retain counts taken from the side table, minus one for this release operation, to that value, updates the bit tracking whether the object instance has retain counts in the side table, and then attempts to store the updated isa again. This retry is likely fewer than 32 instructions (meeting ARM’s recommendation; see aside below) and is more likely to succeed than the general path.
If the store is successful, the runtime sets didTransitionToDeallocating to true if the retain count has reached zero. But this can never happen in practice, as adding RC_HALF - 1 retain counts to the non-pointer isa just succeeded.
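The retry’s overflow check can be modeled the same way as the decrement. addWithOverflow() is a portable stand-in for the runtime’s addc(), retryAddBorrowed() is a hypothetical name, and the layout (extra_rc in the top 19 bits, RC_ONE = 1 << 45) is an illustrative assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned RC_SHIFT = 45;                              // assumed layout
constexpr std::uint64_t RC_ONE = std::uint64_t(1) << RC_SHIFT;

// Portable stand-in for addc(): add and report whether the sum wrapped.
std::uint64_t addWithOverflow(std::uint64_t lhs, std::uint64_t rhs, bool *overflow) {
    std::uint64_t sum = lhs + rhs;  // unsigned addition wraps modulo 2^64
    *overflow = sum < lhs;          // a wrapped sum is smaller than either input
    return sum;
}

// The retry: add borrowed - 1 on top of whatever extra_rc the latest isa holds,
// since another thread may have retained the object in the meantime.
std::uint64_t retryAddBorrowed(std::uint64_t oldisa, std::uint64_t borrowed, bool *overflow) {
    return addWithOverflow(oldisa, RC_ONE * (borrowed - 1), overflow);
}
```

Unlike the first attempt, which overwrites extra_rc, the retry adds to the freshly observed value, so retains that other threads performed in the meantime are preserved; the cost is that the sum can now exceed the field’s capacity.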
If the retry did not succeed (e.g., adding RC_HALF - 1 retain counts overflowed), the runtime aborts this transaction by clearing the exclusive monitor, putting the retain counts back into the side table, and reloading the non-pointer isa before jumping back to the start of the compare-and-swap loop. It does, however, still hold the side table lock.
runtime/objc-object.h
lines 865-872
        if (!stored) {
            // Inline update failed. Put the retains back in the side table.
            ClearExclusive(&isa().bits);
            sidetable_addExtraRC_nolock(borrow.borrowed);
            oldisa = LoadExclusive(&isa().bits);
            goto retry;
        }
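The put-the-retains-back step can be isolated in a small model where a stale expected value makes the store fail. Model and tryBorrowOnce() are hypothetical, the caller is assumed to hold the side table lock, and compare_exchange_strong is used so that only a genuine change to the inline count causes a failure; on LL/SC hardware the runtime’s store can also fail because the reservation was lost.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model: the inline count as an atomic word and the overflow
// count as a plain field the caller is assumed to guard with a lock.
struct Model {
    std::atomic<std::uint64_t> inlineRC{0};
    std::uint64_t sideTableRC = 0;
};

// One borrow attempt after an underflow: take up to `chunk` counts, try to
// publish borrowed - 1 inline, and put the counts back if the store fails.
bool tryBorrowOnce(Model &m, std::uint64_t expectedInline, std::uint64_t chunk) {
    std::uint64_t borrowed = chunk < m.sideTableRC ? chunk : m.sideTableRC;
    if (borrowed == 0) return false;
    m.sideTableRC -= borrowed;
    std::uint64_t expected = expectedInline;
    if (m.inlineRC.compare_exchange_strong(expected, expectedInline + borrowed - 1,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        return true;
    }
    m.sideTableRC += borrowed;  // store failed: put the retains back
    return false;
}
```

For example, if another thread retains the object after this thread observed an inline count of zero, the compare-exchange fails, the side table gets its counts back, and the caller starts over from the reloaded value, just as the runtime jumps back to retry.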
If either of the store attempts succeeds, and the side table does not have any additional retain counts for the object instance, the runtime removes the entry for the object instance from the side table.
runtime/objc-object.h
lines 874-876
        // Decrement successful after borrowing from side table.
        if (emptySideTable)
            sidetable_clearExtraRC_nolock();
Finally, if necessary, the runtime releases its side table lock and returns false to indicate the retain count did not reach zero (which is only used by the _objc_rootReleaseWasZero() SPI). The retain count cannot reach zero on this path (see above), so the release operation ends here after a successful side table update.
runtime/objc-object.h
lines 878-882
        if (!didTransitionToDeallocating) {
            if (slowpath(sideTableLocked)) sidetable_unlock();
            return false;
        }
    }
Otherwise, execution continues to the deallocation logic.
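The control flow of this subsection can be traced end to end in one model: a lock-free decrement, a hand-off to a separate underflow frame that re-enters with Full, taking the lock and reloading before retrying, and borrowing from the side table with a put-back on failure. It is a hypothetical simplification, not the runtime’s code. The packed word (bit 0 as has_sidetable_rc), CHUNK (standing in for RC_HALF), and std::mutex (standing in for the side table’s spin lock) are all assumptions, guard.owns_lock() plays the role of sideTableLocked, and the exclusive monitor has no direct C++ equivalent.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical model; none of these types are the runtime's.
enum class RRVariant { Fast, Full };

constexpr std::uint64_t SIDE = 1;   // bit 0: has_sidetable_rc
constexpr std::uint64_t ONE = 2;    // inline retain count of one (bits 1..63)
constexpr std::uint64_t CHUNK = 4;  // counts borrowed per underflow

struct Obj {
    std::atomic<std::uint64_t> isa{ONE};  // retain count 1, no side table
    std::uint64_t sideRC = 0;             // guarded by `lock`
    std::mutex lock;
};

bool release(Obj &o, RRVariant variant);

// Like rootRelease_underflow(): a separate frame that re-enters with Full.
bool releaseUnderflow(Obj &o) { return release(o, RRVariant::Full); }

// Returns true when this release dropped the last reference.
bool release(Obj &o, RRVariant variant) {
    std::unique_lock<std::mutex> guard(o.lock, std::defer_lock);
    std::uint64_t oldisa = o.isa.load(std::memory_order_relaxed);
    for (;;) {                                            // "retry:"
        if (oldisa >= ONE) {                              // ordinary decrement
            std::uint64_t newisa = oldisa - ONE;
            if (o.isa.compare_exchange_weak(oldisa, newisa,
                                            std::memory_order_release,
                                            std::memory_order_relaxed))
                return newisa == 0;                       // count 0, no side table
            continue;                                     // oldisa was refreshed
        }
        if (oldisa == 0) return false;                    // already deallocating
        // Inline count is 0 but the side table holds counts: underflow.
        if (variant != RRVariant::Full) return releaseUnderflow(o);
        if (!guard.owns_lock()) {
            guard.lock();                                 // may block
            oldisa = o.isa.load(std::memory_order_relaxed);  // reload, then retry
            continue;
        }
        std::uint64_t borrowed = o.sideRC < CHUNK ? o.sideRC : CHUNK;
        assert(borrowed > 0);
        o.sideRC -= borrowed;
        std::uint64_t newisa = (borrowed - 1) * ONE | (o.sideRC ? SIDE : 0);
        if (o.isa.compare_exchange_strong(oldisa, newisa,
                                          std::memory_order_release,
                                          std::memory_order_relaxed))
            return newisa == 0;
        o.sideRC += borrowed;                             // put the retains back
        oldisa = o.isa.load(std::memory_order_relaxed);
    }
}
```

The unique_lock releases the mutex on every return path, which is the job the runtime’s explicit sidetable_unlock() calls perform before each return.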
Deallocate
If the retain count reaches zero (or underflows and the object instance is not storing retain counts in a side table), the runtime deallocates[5] the object.
runtime/objc-object.h
lines 888-901
deallocate:
// Really deallocate.
ASSERT(newisa.isDeallocating());
ASSERT(isa().isDeallocating());
if (slowpath(sideTableLocked)) sidetable_unlock();
__c11_atomic_thread_fence(__ATOMIC_ACQUIRE);
if (performDealloc) {
    ((void(*)(objc_object *, SEL))objc_msgSend)(this, @selector(dealloc));
}
return true;
First, the runtime releases the side table lock, if necessary.
Next, it issues an acquire fence. I presume this is an atomic-fence synchronization, but it’s not clear to me what release operation this fence synchronizes with. The only potentially contentious read after the fence is of the isa in the message send a few lines down, but this thread just set the isa.
I would expect writes to the isa from another thread to be undefined behavior because the retain count is zero. Such a write, though, could change the class, which would be visible to this thread because of the fence. So, as far as I can tell, the fence’s only potential effect is on the message send in quite rare and bizarre circumstances.
The fence is probably an artifact from a previous implementation that is no longer relevant, a last-minute change to "fix" a memory ordering problem just before a release, or an unnecessary addition; or I am misunderstanding the behavior.
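For comparison, the general reference-counting idiom (used, for example, by Rust’s Arc) offers one possible reading, stated here as an assumption rather than a description of Apple’s intent: each successful StoreReleaseExclusive() is a release-ordered write, and an acquire fence after the final decrement makes the writes other threads performed on the object before their releases visible to the thread about to run -dealloc. The sketch below shows that idiom with std::atomic; Counted and releaseRef() are hypothetical names, and this is not the runtime’s code.

```cpp
#include <atomic>
#include <cassert>

// The general idiom: decrements use release ordering, and only the thread
// that drops the last reference issues an acquire fence before destruction.
struct Counted {
    std::atomic<long> refs{1};
    int payload = 0;
    bool destroyed = false;
};

// Returns true if this call dropped the last reference.
bool releaseRef(Counted &obj) {
    if (obj.refs.fetch_sub(1, std::memory_order_release) != 1) {
        return false;  // other references remain
    }
    // Synchronizes with the release-ordered decrements of other threads, so
    // their earlier writes to the object are visible from here on.
    std::atomic_thread_fence(std::memory_order_acquire);
    obj.destroyed = true;  // stands in for sending -dealloc
    return true;
}
```

Placing the acquire on a fence rather than on every decrement keeps the common, non-final releases cheaper; only the thread that will tear the object down pays for the stronger ordering.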
Then, finally, if the release didn’t occur through the _objc_rootReleaseWasZero() SPI, a -dealloc message is sent to the object.
The release operation ends by returning true, indicating the retain count has reached zero, which is ignored by every caller except the _objc_rootReleaseWasZero() SPI.
Epilogue
When I decided to cover retain and release separately, I thought the release post would be significantly shorter than retain, but it’s 20% longer!
When the retain count overflows, the runtime keeps half of the count in the non-pointer isa and then adds half to the side table. It can quickly perform the critical write to the non-pointer isa and follow up with the expensive write to the side table. In contrast, when the retain count underflows, the runtime must take retain counts from the side table to gain the information necessary to perform the critical write to the non-pointer isa. The expensive read between the exclusive load and store increases the probability that the store may fail, which creates a unique corner case the release operation must handle. Writing is a great way to learn.
With that lesson learned, I have my fingers crossed that the next post on autorelease will be more straightforward!
[1] … true if the release results in a retain count of zero, enabling system frameworks implementing root classes to perform cleanup work before deallocating the root class instance. Safari 10.1 (circa March 2017) added a use of the SPI, though the change was later reverted in Safari 11.1.
[2] … LoadExclusive function.