When I hear “hello-world”, I imagine a trivial one-line program. However, what actually
happens under the hood is far from trivial: memory allocation, register and stack
manipulation, and interop with an OS kernel, among other things. So I figured that going a couple of
layers down the stack to bare assembly in order to build a minimal working program (and a couple of
more complex ones) might be a fun exercise.
This is the first post about assembly on MacOS X; it covers “getting started”, calling conventions and system calls. I plan to write another post with more practical examples later on.
Preliminaries
First things first. I’ll be using nasm and hence Intel’s assembly dialect. The code will be
written for 64-bit [1] MacOS X. Porting the code to Linux should be fairly straightforward: for
programs like the ones in this post, the main differences are the system call numbers, the object
file format and the name of the entry point.
nasm can be installed easily with a package manager like brew:
brew install nasm
The second part of the toolchain is the linker, ld, which becomes available after installing
the Xcode command line tools:
xcode-select --install
To make sure everything’s set, nasm -v and ld -v should produce legible output along these lines:
NASM version 2.14 compiled on Nov 8 2018
@(#)PROGRAM:ld PROJECT:ld64-409.12
BUILD 17:47:51 Sep 25 2018
An awkward no-op in assembly, or how to trigger a bus error
Let’s start with a program that literally does nothing. As a first cut, we could try to only define a code segment with a global symbol that will be an entry point to our program and a single “do-nothing” instruction:
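A minimal sketch of such a file, assuming the entry symbol is named _main to match the -e _main linker flag and the file name used in the build script of this section:

```nasm
; do-nothing-incomplete.asm
; Sketch only: a code section, one global symbol, one instruction.
SECTION .text
GLOBAL _main            ; exported so the linker can use it as the entry point

_main:
    nop                 ; do nothing; note that nothing ends the program here
```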
We can use the following script to compile and link this program:
nasm -f macho64 -o build64/do-nothing-incomplete.o do-nothing-incomplete.asm && \
ld -static -e _main -o bin/do-nothing-incomplete -macosx_version_min 10.13.0 \
build64/do-nothing-incomplete.o
nasm -f macho64 tells NASM to produce a 64-bit Mach-O object file, which ld -static then turns
into a statically linked executable; -e _main tells the linker the name of the symbol that will
serve as the entry point to the program.
During execution, however, the program fails with an error:
[1] 70339 bus error bin/do-nothing-incomplete
This may seem odd as we don’t access memory. So where did this bus error come from?
A short session in the debugger reveals the following:
(lldb) r
Process 70526 launched: 'bin/do-nothing-incomplete' (x86_64)
Process 70526 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=2, address=0x2000)
frame #0: 0x0000000000002000 do-nothing-incomplete
-> 0x2000: add al, byte ptr [rax]
0x2002: add byte ptr [rax], al
0x2004: add eax, dword ptr [rcx]
0x2006: adc byte ptr [rax], al
Target 0: (do-nothing-incomplete) stopped.
(lldb) dis -s 0x1fff
do-nothing-incomplete`main:
0x1fff <+0>: nop
0x2000: add al, byte ptr [rax]
0x2002: add byte ptr [rax], al
0x2004: add eax, dword ptr [rcx]
0x2006: adc byte ptr [rax], al
0x2008: add byte ptr [rax], dl
0x200a: add byte ptr [rax], al
0x200c: add byte ptr [rax], al
0x200e: add byte ptr [rax], al
(lldb) p/x $rax
(unsigned long) $0 = 0x0000000000000000
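dis -s 0x1fff shows our nop instruction followed by some other instructions. Since we didn’t
explicitly signal to the OS that the program is ready to terminate with an exit system call, the
processor carried on with whatever was next in memory. The bytes at 0x2000 decode as
add al, byte ptr [rax], which with rax being zero would read the byte at address zero. Note,
though, that lldb reports 0x2000 itself as the faulting address, which suggests the access was
rejected as soon as the processor tried to fetch an instruction from that page; either way, the
result is a bus error.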
The cause of the problem is clear. But where did those other instructions come from? Let’s peek
into the structure of our executable with otool. The command otool -l prints the load commands,
which are pretty much a table of contents of our binary:
Load command 2
cmd LC_SEGMENT_64
cmdsize 72
segname __LINKEDIT
vmaddr 0x0000000000002000
vmsize 0x0000000000001000
fileoff 4096
filesize 64
maxprot 0x00000007
We can see that the __LINKEDIT segment starts at address 0x2000. It holds data for the linker,
which is not something that should be interpreted and executed as code. Thus a well-behaved
program needs to signal its exit to the OS, and we, in turn, need to talk a bit about system calls
and calling conventions.
System calls and SysV ABI
As long as a program is doing its own thing, it can use whatever registers it wants and do whatever it wants with the stack. But collaboration with the OS and other libraries requires a common set of rules that all parties need to adhere to. Such a set of rules is called an “Application Binary Interface”, or ABI for short.
Modern 64-bit Linux and MacOS X systems follow the System V ABI [2]. The set of rules is very extensive, but for this post we’ll only need the following:
- For regular calls, integer arguments are passed in registers rdi, rsi, rdx, rcx, r8, r9, in that order.
- For system calls, register r10 is used instead of rcx, while the values in registers rcx and r11 are clobbered by the kernel. The number of the system call to invoke is passed in rax.
- An integer result of the call is returned in rax, or in rax and rdx together, depending on the size of the returned value.
- Registers rbx, rbp, rsp, r12-r15 are preserved across function calls. In other words, when you want to use these registers in your subroutine, you need to save them on the stack and restore their values before returning to the caller.
- The stack should be 16-byte aligned before the call.
The last rule is not strictly necessary for all calls. You can write and call your own subroutine and most likely things will be fine without stack alignment. However, the situation is different when calling external subroutines, which may in turn call operations that require memory alignment of their operands; we’ll get back to it later.
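To make the register rules concrete, here is a small sketch of a subroutine that takes three integer arguments, uses a callee-saved register and returns its result in rax. The file name and the add_three subroutine are made up for illustration; the last three instructions terminate the program through the exit system call that is introduced later in this post.

```nasm
; add-three.asm (hypothetical file name), an illustrative sketch only
SECTION .text
GLOBAL start

; add_three(a, b, c): hypothetical subroutine, returns a + b + c in rax.
; Arguments arrive in rdi, rsi, rdx. rbx is callee-saved, so it is
; saved on the stack before use and restored before returning.
add_three:
    push rbx                ; save the caller's rbx
    mov  rbx, rdi           ; rbx = a
    add  rbx, rsi           ; rbx = a + b
    add  rbx, rdx           ; rbx = a + b + c
    mov  rax, rbx           ; an integer result is returned in rax
    pop  rbx                ; restore the caller's rbx
    ret

start:
    mov  rdi, 10            ; first argument
    mov  rsi, 20            ; second argument
    mov  rdx, 12            ; third argument
    call add_three          ; rax = 10 + 20 + 12 = 42
    mov  rdi, rax           ; use the result as the exit code
    mov  rax, 0x0200_0001   ; exit syscall number
    syscall
```

Using rbx here is purely for demonstration: a scratch register such as rax would do the same job without the push and pop.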
We know how to pass arguments and invoke a system call with a particular number [3]. What is missing is a list of system calls for the OS X kernel.
I couldn’t find such a list in the official documentation. There is a list for 32-bit [4], but
unfortunately it doesn’t work out of the box. Luckily, someone on the internet wrote that the
syscall number for 64-bit assembly is number-in-the-list + 0x0200_0000 (the upper byte selects the
syscall class, and 2 stands for BSD/Unix calls). Thus the exit syscall, which is number 1 in the
list, will have number 0x0200_0001 in assembly.
A proper noop in assembly
Now that we know how to make system calls, we can properly trigger an exit and return a meaningful
exit code of 42.
SECTION .text
GLOBAL start ; start is the default name of the entry point
start:
mov rdi, 42 ; first parameter "exit code" = 42
mov rax, 0x0200_0001 ; exit syscall number in rax = 0x02000000 + 1
syscall
This time around we don’t need to specify the name of the entry point to the linker, as we went with the default. The code compiles and links without issues:
nasm -f macho64 -o build64/do-nothing-complete.o do-nothing-complete.asm && \
ld -static -o bin/do-nothing-complete build64/do-nothing-complete.o
and running the resulting binary followed by echo $? prints the expected exit code:
42
Actually, we could omit explicitly providing an exit code and just invoke the system call with
whatever is in rdi at the moment. This way it’s possible to make the “do-nothing” program even
smaller.
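A sketch of what such a file might look like, assuming it is the previous program with the instruction that sets rdi removed (the file name matches the build commands in this section):

```nasm
; do-nothing-minimal.asm
; Sketch only: exit with whatever value happens to be in rdi.
SECTION .text
GLOBAL start                ; default entry point name, no -e flag needed

start:
    mov rax, 0x0200_0001    ; exit syscall number = 0x02000000 + 1
    syscall                 ; rdi is left untouched
```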
nasm -f macho64 -o build64/do-nothing-minimal.o do-nothing-minimal.asm && \
ld -static -o bin/do-nothing-minimal build64/do-nothing-minimal.o
And although the SysV ABI states in Section 3.4.1 that the state of rdi is undefined at process
start, in practice the kernel zeroes it out, so we get a proper 0 exit code:
0
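The same rules extend to system calls that take several arguments. As a sketch, here is the hello-world this post opened with, built on the write system call. It assumes that write is number 4 in the syscall list mentioned earlier and that file descriptor 1 is standard output; the file name, labels and constant names are made up for illustration.

```nasm
; hello.asm (hypothetical file name), an illustrative sketch only
SYSCALL_UNIX equ 0x0200_0000        ; offset added to numbers from the list
SYS_EXIT     equ SYSCALL_UNIX + 1   ; exit is number 1 in the list
SYS_WRITE    equ SYSCALL_UNIX + 4   ; assumption: write is number 4

SECTION .data
msg:    db "hello, world", 10       ; the text followed by a newline
len     equ $ - msg                 ; length of the message in bytes

SECTION .text
GLOBAL start

start:
    mov rax, SYS_WRITE              ; syscall number goes in rax
    mov rdi, 1                      ; 1st argument: file descriptor (stdout)
    lea rsi, [rel msg]              ; 2nd argument: address of the buffer
    mov rdx, len                    ; 3rd argument: number of bytes to write
    syscall                         ; rcx and r11 are clobbered here

    mov rax, SYS_EXIT
    xor rdi, rdi                    ; exit code 0
    syscall
```

It can be assembled and linked with the same nasm and ld invocations as do-nothing-complete, substituting the file name.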
What’s next
So far, I’ve covered only statically linked, self-contained executables. Although interesting, they are not very practical (to the extent one can consider programming in assembly practical). I plan to write one more post covering the following topics:
- dynamic linking with C runtime,
- using external symbols from linked libraries,
- 16-byte stack alignment, SSE instructions and segfaults,
- accessing argc and argv,
- jumps, loops, etc.
But that’s for 2019.
Footnotes
[1] The fact that it’s 64-bit is important, as the calling conventions for 32-bit code on MacOS are quite different.
[2] The latest SysV ABI PDFs can be found on this page on GitHub.
[3] In 64-bit assembly a syscall is triggered with the syscall instruction.
[4] Relevant lines from this source file.