Friday, September 28, 2012

Assembly Language: Linux vs OS X

I have had a Mac Mini for roughly a month and have learned enough to do fairly well with 64-bit assembly language programming on it.  Both Linux and OS X use the System V AMD64 Application Binary Interface, which makes 64-bit assembly programming very similar on the two systems.  By contrast, Windows uses the Microsoft x64 ABI.  Despite the shared ABI, there are a couple of differences that make it difficult to write portable assembly language programs for Linux and OS X.


Naming conventions



The first difference between Linux and OS X is fairly superficial.  Under OS X the C compiler prepends an underscore to every global name, so assembly code must use the prefixed form, while under Linux the names are the same in C and assembly.  This means that you can use main as the name of your main routine in both C and assembly under Linux, while in assembly under OS X you must use _main.  This is a trivial difference which could be accommodated with macros, allowing portable coding.
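
One way such a macro might look is sketched below in NASM syntax.  The macro name CNAME is made up for illustration, and the sketch assumes the assembler defines __OUTPUT_FORMAT__ (NASM does) and that OS X objects are built with -f macho64.

```nasm
; CNAME is a hypothetical helper, not part of any assembler or library.
; It adds the leading underscore only when assembling for OS X (macho64).
%ifidn __OUTPUT_FORMAT__, macho64
    %define CNAME(x) _ %+ x     ; OS X: main -> _main
%else
    %define CNAME(x) x          ; Linux (elf64): main -> main
%endif

        global  CNAME(main)
        section .text
CNAME(main):
        xor     eax, eax        ; return 0 from main
        ret
```

Assembled with -f elf64 this should export main, and with -f macho64 it should export _main, so the same source serves both systems.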

Interestingly, both operating systems also allow a more fundamental entry point: _start under Linux and start under OS X.  For a C program the C library provides this _start (or start) function, which calls main (or _main).  The curious part is that here the underscore appears in the Linux name, while for every other function it appears in the OS X name.


RIP relative addressing


The x86-64 architecture introduced a new addressing mode called "RIP relative", in which the 32-bit displacement field of an instruction is interpreted as a signed offset from the current instruction pointer.  Because data and code are normally loaded close to each other, a moderately small offset is enough to reach data anywhere in a large virtual address space.

For Linux RIP relative addressing is not important, at least for an ordinary executable that is not position independent.  The addresses of the data and text segments all fit in 32 bits, allowing the 32-bit field to store the actual address of a data item or an instruction.  These small addresses make the coding simpler.

By contrast OS X loads the data and text segments at or above 0x100000000 (4 GB).  Thus the virtual addresses of data items and instructions do not fit into 32 bits, and RIP relative addressing overcomes this problem.  There are multiple ways to request RIP relative addressing.  One is to mark an individual address: yasm accepts "wrt rip" added to the address field of an instruction, and nasm uses the "rel" keyword, as in [rel a].  The method I prefer is to use "default rel", which makes RIP relative the default addressing mode.

The addition of "default rel" almost takes care of the difference.  For simple data access it is adequate.  For example

     mov    rax,[a]

will move the quadword at address a to register rax in both Linux and OS X.
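
As a minimal sketch of how this fits into a complete source file (NASM syntax; the value stored at a is just for illustration), the directive goes once at the top and applies to every memory operand that follows.  The explicit rel form on a single instruction is shown for comparison.

```nasm
        default rel             ; make [label] operands RIP relative by default

        section .data
a:      dq      42              ; illustrative quadword value

        section .text
        global  main            ; on OS X this would be _main
main:
        mov     rax, [a]        ; RIP relative because of "default rel"
        mov     rdx, [rel a]    ; RIP relative requested for this operand only
        xor     eax, eax        ; return 0
        ret
```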

There are problems beyond that issue.  Under Linux it is possible to use

     mov     rax, a

to move the address of a to register rax.  This does not work under OS X because the address of a does not fit in the 32 bit address field.

The solution to this issue is portable.  You can use "load effective address" rather than move:

    lea     rax, [a]

This instruction does the job properly on both systems.

A more difficult issue is that RIP relative addressing can't be combined with indexing: the encoding allows only RIP plus a 32-bit displacement, with no base or index register.  So if you attempt

    mov     rax, [data+rdi*8]

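to index an array of quadwords, it works with Linux but not OS X.  The portable version takes one more instruction, loading the array's address into a register first and then indexing from that register: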

    lea     rbx, [data]         ; rbx = address of data (RIP relative with "default rel")
    mov     rax, [rbx+rdi*8]    ; rax = quadword at index rdi
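
The two-instruction pattern above can be extended into a complete routine.  The following is only a sketch in NASM syntax: the function name sum_data and the array contents are invented for illustration, and it uses rdx rather than rbx as the base register because the System V ABI requires a function to preserve rbx for its caller.

```nasm
        default rel

        section .data
data:   dq      10, 20, 30, 40  ; illustrative array of four quadwords

        section .text
        global  sum_data        ; hypothetical name; _sum_data on OS X
; long sum_data(long n): returns data[0] + ... + data[n-1]
sum_data:
        lea     rdx, [data]     ; rdx = address of data (RIP relative)
        xor     eax, eax        ; running sum = 0
        xor     ecx, ecx        ; index i = 0
.next:
        cmp     rcx, rdi        ; stop when i >= n
        jge     .done
        add     rax, [rdx+rcx*8] ; sum += data[i]
        inc     rcx
        jmp     .next
.done:
        ret
```

Only the lea touches the label, so the indexed access inside the loop needs no RIP relative encoding and should assemble the same way for both systems.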

Overall it is still possible to adopt the OS X style and achieve nearly portable assembly programming.  A careful assembly coder could develop a set of macros which hide the underscores in names and, by always coding for RIP relative addressing, achieve portable assembly coding.

The System V AMD64 ABI and the Microsoft x64 ABI differ in how registers are used for function calls, which adds further differences.  For example, the first integer argument arrives in rdi under System V but in rcx under the Microsoft ABI.  Perhaps a clever person could still write portable assembly code, but I doubt that the benefit would be worth the effort.
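
To give a feel for what such an effort would involve, here is a sketch in NASM syntax that hides only the first two integer argument registers behind macros.  The names ARG1, ARG2 and add2 are invented for illustration, and the sketch assumes Windows objects are built with -f win64.

```nasm
; ARG1 and ARG2 are hypothetical names chosen for this sketch.
%ifidn __OUTPUT_FORMAT__, win64
    %define ARG1 rcx            ; Microsoft x64: first two integer arguments
    %define ARG2 rdx
%else
    %define ARG1 rdi            ; System V AMD64: first two integer arguments
    %define ARG2 rsi
%endif

        section .text
        global  add2            ; hypothetical function; _add2 on OS X
; int64_t add2(int64_t x, int64_t y): returns x + y
add2:
        lea     rax, [ARG1+ARG2]
        ret
```

This covers only the simplest case.  The Microsoft ABI also expects the caller to reserve 32 bytes of stack space for the callee and treats rsi and rdi as registers the callee must preserve, neither of which these macros address.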


1 comment:

Anonymous said...

Nasm has a handy --prefix option, so you can write your assembly in the linux fashion and for osx targets, provide --prefix _ to take care of the naming convention differences.

Thanks for this blog though, I'm finding it interesting learning the subtle (sometimes not so!) differences between the different operating systems from an assembly perspective.

Thanks
Aaron.