Dec 12 2014

Swift Method Dispatching — a summary of my talk at Swift Warsaw

Two weeks ago, I attended Swift Warsaw as a speaker, holding a presentation on “Swift Runtime — Swift Method Dispatching”. I promised to summarise the talk in an article, so hereby I keep my promise.

Swift Method Dispatching #

When announcing Swift, Apple described it as much faster than Objective-C. There are a number of comparisons on the web juxtaposing the speed of both languages. In my opinion, the way Swift dispatches method invocations has the biggest impact on its performance.

Leaving the assessment of method dispatch performance aside, let’s focus on understanding how it works in Swift.

Before looking into Swift method dispatching, it is worth briefly recalling how method dispatching works in Objective-C.

Objective-C Method Dispatching #

During the compilation process, Clang translates each method invocation into a call to the objc_msgSend function (or one of its variations), passing an object, method selector and parameters as arguments.

For example, the [object message: param] invocation gets translated into the following call: objc_msgSend(object, @selector(message:), param).

Basically, the objc_msgSend family of functions handles method dispatching in Objective-C. These functions are implemented in assembly language and are open source. Let’s not go deeper into Apple sources, but instead try to understand how the objc_msgSend function works by looking at the following pseudo-code:

id objc_msgSend ( id obj, SEL op, ... )
{
    Class c = object_getClass(obj);
    IMP imp = CacheLookup(c, op);
    if (!imp) {
        imp = class_getMethodImplementation(c, op);
    }
    jump imp(obj, op, ...);
}

First, it obtains a class of the passed obj object (this runs in constant time).

    Class c = object_getClass(obj);

Then, it tries to look for a method implementation in a class cache.

    IMP imp = CacheLookup(c, op);

Finding a selector in the cache is a relatively fast operation, about as fast as a hash map look-up. However, on a cache miss, the program has to take a slower path and call class_getMethodImplementation, which looks for the implementation by scanning the selectors defined in the class and its ancestors, up to the root of the inheritance tree.

    if (!imp) {
        imp = class_getMethodImplementation(c, op);
    }

This method is relatively slow.

Finally, objc_msgSend cleans a stack frame and jumps directly to the method implementation, passing the object, method selector and parameters (that is why objc_msgSend is often called a trampoline rather than a function).

    jump imp(obj, op, ...);
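To make the fast path and the slow path concrete, here is a minimal Swift sketch of the same idea. It is a toy model, not the real runtime: ToyClass, ToyIMP, send and the slowLookups counter are hypothetical names invented for illustration, selectors are plain strings, and the receiver is reduced to a name.

```swift
// Toy model of Objective-C style dispatch (hypothetical names throughout).
typealias ToyIMP = (String) -> String   // receiver name in, result out

final class ToyClass {
    let name: String
    let superclass: ToyClass?
    var methods: [String: ToyIMP] = [:]   // selectors defined by this class
    var cache: [String: ToyIMP] = [:]     // per-class method cache
    var slowLookups = 0                   // counts cache misses, for illustration

    init(name: String, superclass: ToyClass? = nil) {
        self.name = name
        self.superclass = superclass
    }

    // Slow path: scan this class, then its ancestors up to the root.
    func findImplementation(_ selector: String) -> ToyIMP? {
        var current: ToyClass? = self
        while let cls = current {
            if let imp = cls.methods[selector] { return imp }
            current = cls.superclass
        }
        return nil
    }
}

func send(_ cls: ToyClass, _ receiver: String, _ selector: String) -> String? {
    // Fast path: hash map look-up in the class cache.
    if let imp = cls.cache[selector] { return imp(receiver) }
    // Slow path, then remember the result for the next call.
    cls.slowLookups += 1
    guard let imp = cls.findImplementation(selector) else { return nil }
    cls.cache[selector] = imp
    return imp(receiver)
}

let agentClass = ToyClass(name: "Agent")
agentClass.methods["punch"] = { receiver in "\(receiver) punches" }
let smithClass = ToyClass(name: "Smith", superclass: agentClass)

let first = send(smithClass, "smith", "punch")    // miss: walks up to Agent
let second = send(smithClass, "smith", "punch")   // hit: served from the cache
```

The second send never reaches findImplementation, so slowLookups stays at 1 after both calls. The real objc_msgSend additionally jumps to the implementation instead of calling it, which this sketch does not attempt to model.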

Before looking at the Swift method dispatching, let’s get familiar with two important notions.

Name Mangling #

Swift allows a programmer to define many entities with the same name, e.g. the same class names in different modules or the same method names in different classes. Moreover, Swift allows method overloading. When combining object files, the linker resolves external references by symbol name. That is why the compiler needs to produce unique symbol names, which it does by encoding every entity name. This process of name encoding is called name mangling.

Let’s take a look at an example of Swift method signatures and corresponding mangled symbol names:

Matrix.AIProgram.speak (Matrix.AIProgram)(Swift.String) -> ()
Matrix.AIProgram.speak (Matrix.AIProgram)(Swift.Int) -> ()

They are defined in the same class and have the same name, so this is a simple example of method overloading. It is worth noting that a Swift method takes the object instance as its first argument and uses it as the self pointer (you can read more about this on Ole Begemann’s blog).

Now, take a look at the symbol names that result from mangling the above signatures:

_TFC6Matrix9AIProgram5speakfS0_FSST_
_TFC6Matrix9AIProgram5speakfS0_FSiT_

The first method takes a String parameter, encoded as SS in the first symbol. The second method takes an Int parameter, encoded as Si in the second symbol.
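For reference, source code along the following lines would produce such a pair of overloads. This is a sketch under assumptions: the class body is not shown in the talk, and the methods here return a String (instead of nothing, as in the signatures above) only so that the chosen overload is observable. Note also that the mangled names quoted in this article come from a 2014 compiler; newer toolchains use a different mangling scheme, so the exact symbols will differ.

```swift
// Two overloads that share a base name. The compiler must give them
// distinct symbols, which is what name mangling provides.
class AIProgram {
    func speak(_ text: String) -> String { return "String overload: \(text)" }
    func speak(_ number: Int) -> String { return "Int overload: \(number)" }
}

let program = AIProgram()
let spoken = program.speak("hello")   // the String overload is picked at compile time
let counted = program.speak(42)       // the Int overload is picked at compile time
```

Overload resolution happens at compile time based on the argument type, so each call site already refers to one specific symbol.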

Obviously, none of this should be a concern in everyday development work. But when the need arises, a symbol can easily be demangled with this handy command, passing the symbol as a parameter:

xcrun swift-demangle SYMBOL_NAME

If you would like to read more about name mangling, there is no better article than the one on the NSBlog.

Before diving into Swift method dispatching, let’s take a look at one more thing: how the Swift compiler translates source code into a binary executable.

Compilation #

Swift compilation consists of several phases.

First, the Swift front end translates Swift code into high-level, platform-agnostic SIL (Swift Intermediate Language). Anyone can examine SIL using the swiftc -emit-sil command. Second, the SIL optimizer takes SIL and optimises it at a high level of abstraction, and the result is lowered to LLVM IR (Intermediate Representation), which is low-level yet still largely platform-agnostic. IR can be examined using the swiftc -emit-ir command. Then, the IR is optimised. In the last phase, the code generator uses the optimised IR to generate machine code. Anyone can view the output of this phase in assembly language using the swiftc -S command.

Again, all you need to be aware of is that there are a couple of phases and that you can use an early phase output to analyse a program code. If you wish to read more about Swift compilation, please refer to John Siracusa’s article.

Now, let’s get to the point…

Virtual Method Table #

Virtual Method Table is a mechanism used in Swift and many other languages to support run time method binding. We are going to investigate what it looks like in Swift.

Let’s take the following two classes:

// module Test

class Agent {
  func punch() {}
  func kick() {}
  func jump() {}
  final func block() {}
}

class Smith : Agent {
  override func jump() {}
  func laugh() {}
}

They model an Agent class able to perform some basic Kung Fu moves: punch, kick, jump and block. For the sake of this example, let’s assume that the Agent’s defence mechanism cannot be overridden (the block method is declared as final). Then, we have a Smith subclass, which overrides the jump method; let’s assume that Smith jumps a bit differently than other agents. Furthermore, Smith can laugh (evilly, but still).

Now, let’s take a look at the following snippet from the generated SIL code:

sil_vtable Agent {
  #Agent.punch!1: _TFC4Test5Agent5punchfS0_FT_T_  // Test.Agent.punch (Test.Agent)() -> ()
  #Agent.kick!1: _TFC4Test5Agent4kickfS0_FT_T_  // Test.Agent.kick (Test.Agent)() -> ()
  #Agent.jump!1: _TFC4Test5Agent4jumpfS0_FT_T_  // Test.Agent.jump (Test.Agent)() -> ()
  #Agent.init!initializer.1: _TFC4Test5AgentcfMS0_FT_S0_  // Test.Agent.init (Test.Agent.Type)() -> Test.Agent
}

sil_vtable Smith {
  #Agent.punch!1: _TFC4Test5Agent5punchfS0_FT_T_  // Test.Agent.punch (Test.Agent)() -> ()
  #Agent.kick!1: _TFC4Test5Agent4kickfS0_FT_T_  // Test.Agent.kick (Test.Agent)() -> ()
  #Agent.jump!1: _TFC4Test5Smith4jumpfS0_FT_T_  // Test.Smith.jump (Test.Smith)() -> ()
  #Agent.init!initializer.1: _TFC4Test5SmithcfMS0_FT_S0_  // Test.Smith.init (Test.Smith.Type)() -> Test.Smith
  #Smith.laugh!1: _TFC4Test5Smith5laughfS0_FT_T_  // Test.Smith.laugh (Test.Smith)() -> ()
}

A bit more transparent view may be of help:

sil_vtable Agent {
  #Agent.punch!1: Test.Agent.punch
  #Agent.kick!1: Test.Agent.kick
  #Agent.jump!1: Test.Agent.jump
  #Agent.init!initializer.1: Test.Agent.init
}

sil_vtable Smith {
  #Agent.punch!1: Test.Agent.punch
  #Agent.kick!1: Test.Agent.kick
  #Agent.jump!1: Test.Smith.jump
  #Agent.init!initializer.1: Test.Smith.init
  #Smith.laugh!1: Test.Smith.laugh
}

The above shows that the SIL virtual method table is basically a dictionary that maps method names to their implementations (function pointers). In the Agent class vtable, every method maps to its original implementation from the same class. The vtable of the Smith subclass starts with all the methods defined in the Agent class, and finishes with methods declared in the Smith subclass. The jump method was overridden in the Smith subclass and so it is clearly visible in the second vtable that the Agent.jump method name maps to the Test.Smith.jump implementation.

The block method mapping cannot be seen — in fact, no mapping is necessary, because the method is declared as final and it has only one well-known implementation.
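The visible effect of these tables can be checked with a small runnable variant of the two classes. This is only a sketch: the method bodies and returned strings are invented here so that the chosen implementation can be observed, and they are not part of the original example.

```swift
class Agent {
    func punch() -> String { return "Agent.punch" }
    func kick() -> String { return "Agent.kick" }
    func jump() -> String { return "Agent.jump" }
    final func block() -> String { return "Agent.block" }
}

class Smith: Agent {
    override func jump() -> String { return "Smith.jump" }
    func laugh() -> String { return "Smith.laugh" }
}

// The static type is Agent, but the instance is a Smith.
let agent: Agent = Smith()
let moves = [agent.punch(), agent.kick(), agent.jump(), agent.block()]
```

Even though the variable is declared as Agent, jump resolves to the Smith implementation at run time, while punch and kick resolve to the inherited Agent implementations, exactly as the Smith vtable describes.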

Let’s get into the details by skipping IR and looking directly at the assembly. Here is a code snippet from the output of the swiftc -S command:

.globl  __TMdC4Test5Agent
.quad   __TFC4Test5AgentD
.quad   __TWVBo
.quad   __TMmC4Test5Agent
/* ... */
.quad   __TFC4Test5Agent5punchfS0_FT_T_
.quad   __TFC4Test5Agent4kickfS0_FT_T_
.quad   __TFC4Test5Agent4jumpfS0_FT_T_
.quad   __TFC4Test5AgentcfMS0_FT_S0_

.globl  __TMdC4Test5Smith
.quad   __TFC4Test5SmithD
.quad   __TWVBo
.quad   __TMmC4Test5Smith
/* ... */
.quad   __TFC4Test5Agent5punchfS0_FT_T_
.quad   __TFC4Test5Agent4kickfS0_FT_T_
.quad   __TFC4Test5Smith4jumpfS0_FT_T_
.quad   __TFC4Test5SmithcfMS0_FT_S0_
.quad   __TFC4Test5Smith5laughfS0_FT_T_

The above snippet shows some similarity to SIL vtables. The first line presents .globl __TMdC4Test5Agent — a declaration of a global symbol for the direct type metadata of the Agent class, followed only by a set of pointers. Then, there is .globl __TMdC4Test5Smith — a declaration of a global symbol for the direct type metadata of the Smith subclass, followed by another set of pointers. Basically, direct type metadata has been defined as an array of pointers, so… what happened to the dictionary-like structure of the vtable? Let’s look at the test method code to see what is going on:

func agentKungFuTest(agent : Agent)
{
  agent.jump()
  agent.punch()
  agent.kick()
  agent.block()
}

The above method is used by the Matrix to make a simple test of the Agent’s Kung Fu abilities. It takes an Agent instance and invokes its jump, punch, kick and block methods. Now, let’s skip SIL and IR and go directly to the assembly of this method:

.globl  __TF4Test15agentKungFuTestFCS_5AgentT_
movq  (%rdi), %rcx
callq *0x58(%rcx)
callq *0x48(%rcx)
callq *0x50(%rcx)
callq __TFC4Test5Agent5blockfS0_FT_T_
retq

I must warn you that I have cleaned up the above listing a little. I have removed the lines that run in constant time and are not significant for this investigation (that is, ARC code and local variable handling). But this is still impressive: Swift code maps almost directly to processor instructions! So what is left? The first line contains just a global function symbol declaration. Look at the second line. rdi is a register in the Intel x86-64 architecture that usually holds the value of a function’s first argument. In our case, the first argument is a pointer to an Agent-compatible instance. A Swift instance is a structure whose very first field is a pointer to its metadata, so the assembly code can obtain a pointer to the class metadata simply by dereferencing the instance pointer.

In the test function assembly, there are also three function calls to some computed addresses and one function call to a well-known address. Remember that the block method has been marked final, so there is no need to use vtable — a direct call to its implementation is sufficient.

To clarify the code above, let’s look at it in the form of a pseudo-code:

void agentKungFuTest(void *agent) {
  void *metadata = *agent
  void (*jump)(void*) = *(metadata + 0x58)
  jump(agent)
  void (*punch)(void*) = *(metadata + 0x48)
  punch(agent)
  void (*kick)(void*) = *(metadata + 0x50)
  kick(agent)
  Test.Agent.block(agent)
}

It does not compile, but it makes the situation clearer. In the second line, the argument pointer is dereferenced and cast to a metadata pointer (metadata will serve as the vtable). Then, the code adds an offset of 0x58 to the metadata pointer and loads the function pointer stored at that address. Wait a minute… let’s look back at the direct type metadata in the assembly! (I have added some example file offsets in the left column):

.globl  __TMdC4Test5Agent
0x000021F0  .quad   __TFC4Test5AgentD
0x000021F8  .quad   __TWVBo
0x00002200  .quad   __TMmC4Test5Agent                  // metadata ptr
/* ... */
0x00002248  .quad   __TFC4Test5Agent5punchfS0_FT_T_    // +0x48
0x00002250  .quad   __TFC4Test5Agent4kickfS0_FT_T_     // +0x50
0x00002258  .quad   __TFC4Test5Agent4jumpfS0_FT_T_     // +0x58
0x00002260  .quad   __TFC4Test5AgentcfMS0_FT_S0_       // +0x60

.globl  __TMdC4Test5Smith
0x000023F0  .quad   __TFC4Test5SmithD
0x000023F8  .quad   __TWVBo
0x00002400  .quad   __TMmC4Test5Smith                  // metadata ptr
/* ... */
0x00002448  .quad   __TFC4Test5Agent5punchfS0_FT_T_    // +0x48
0x00002450  .quad   __TFC4Test5Agent4kickfS0_FT_T_     // +0x50
0x00002458  .quad   __TFC4Test5Smith4jumpfS0_FT_T_     // +0x58
0x00002460  .quad   __TFC4Test5SmithcfMS0_FT_S0_       // +0x60
0x00002468  .quad   __TFC4Test5Smith5laughfS0_FT_T_    // +0x68

It may appear strange, and it is probably an implementation detail, but an object’s metadata pointer points to the third element of its direct type metadata. Don’t be concerned about it; it is irrelevant to this investigation. What matters is that the agentKungFuTest() function reads the pointer at an offset of 0x58 bytes from that place, that is:

__TFC4Test5Smith4jumpfS0_FT_T_ - Test.Smith.jump in case of a Smith instance
__TFC4Test5Agent4jumpfS0_FT_T_ - Test.Agent.jump in case of an Agent instance

Look at the agentKungFuTest pseudo-code:

  void (*jump)(void*) = *(metadata + 0x58)
  jump(agent)

For a Smith instance, the jump variable points to the Test.Smith.jump implementation, and this implementation is simply called.

Let’s take a look at the metadata again. It becomes obvious that the dictionary-like structure of the vtable has not been lost. It has just morphed into a form in which the mapping key is an offset in the metadata. Something like this:

__TMdC4Test5Agent
0x48 => _TFC4Test5Agent5punchfS0_FT_T_
0x50 => _TFC4Test5Agent4kickfS0_FT_T_
0x58 => _TFC4Test5Agent4jumpfS0_FT_T_
0x60 => _TFC4Test5AgentcfMS0_FT_S0_

__TMdC4Test5Smith
0x48 => _TFC4Test5Agent5punchfS0_FT_T_
0x50 => _TFC4Test5Agent4kickfS0_FT_T_
0x58 => _TFC4Test5Smith4jumpfS0_FT_T_
0x60 => _TFC4Test5SmithcfMS0_FT_S0_
0x68 => _TFC4Test5Smith5laughfS0_FT_T_
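The offset-keyed mapping can be imitated in plain Swift with an array indexed by slot number. This is a hypothetical model for illustration only: ToyVTable, Slot and Method are invented names, the slot indices stand in for the byte offsets, and the initializer slot is left out to keep the sketch short.

```swift
// Toy vtable: slot numbers play the role of the metadata offsets.
typealias Method = () -> String

enum Slot: Int {
    case punch = 0, kick, jump, laugh
}

struct ToyVTable {
    var entries: [Method]

    // Dispatch is a plain indexed load followed by a call.
    func call(_ slot: Slot) -> String {
        return entries[slot.rawValue]()
    }
}

let agentTable = ToyVTable(entries: [
    { "Agent.punch" },
    { "Agent.kick" },
    { "Agent.jump" },
])

// A subclass table starts as a copy of the parent table...
var smithTable = agentTable
// ...an override replaces the entry in an existing slot...
smithTable.entries[Slot.jump.rawValue] = { "Smith.jump" }
// ...and a newly declared method is appended at the end.
smithTable.entries.append({ "Smith.laugh" })
```

Because inherited methods keep their slot in every subclass table, a caller that only knows the Agent layout can use the same slot for any Agent-compatible table. Asking agentTable for the laugh slot would be out of bounds in this model, which loosely mirrors the fact that the Agent vtable has no such entry.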

Optimizations #

All the SIL and assembly code listings in this article were produced without the optimization -O compiler flag. This is because the purpose was to find out how Swift dispatches methods in the worst case scenario. But you should be aware that the use of the -O flag can drastically change the final machine code:

non-final methods can be invoked by direct function calls when the compiler knows the exact class of the target instance,
the compiler can even skip the function call and inline the method implementation in place of the method invocation,

so the final code can be even faster :)
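The two cases can be illustrated with the sketch below. It is an assumption-laden example: Smith is marked final here (it is not in the original classes), the function names are invented, and whether a given call is actually devirtualized or inlined depends on the compiler version and flags, so the comments describe what the optimizer is allowed to do rather than what it is guaranteed to do.

```swift
class Agent {
    func jump() -> String { return "Agent.jump" }
}

final class Smith: Agent {            // final: no subclass can override jump further
    override func jump() -> String { return "Smith.jump" }
}

// The exact class is visible here, so an optimizing build may call
// Smith.jump directly or inline it (not guaranteed).
func smithJumps() -> String {
    let smith = Smith()
    return smith.jump()
}

// Only the static type Agent is known here, so the call generally goes
// through the vtable unless the optimizer can prove more.
func anyAgentJumps(_ agent: Agent) -> String {
    return agent.jump()
}

let direct = smithJumps()
let viaTable = anyAgentJumps(Smith())
```

Both calls return the same result; only the way the implementation is reached may differ between an unoptimized and an optimized build.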

Summary #

Let’s wrap up! That was a long journey through the depths of Swift. You saw that Swift uses vtables for method dispatching. Because of that, method dispatching in Swift is simpler and faster than Objective-C message sending, which in turn saves battery. Unfortunately, in the case of a regular app, the speed gain will probably be insignificant, unless the app does some complex computations.

By its very definition, vtable dispatch has one big disadvantage — it lacks dynamism so commonly used by Objective-C programmers and in Cocoa frameworks. If you decide to code in Swift, you will probably end up mixing in some Objective-C.

