Naming conventions
The first difference between Linux and OS X is fairly superficial. Under OS X, global names in assembly language carry an underscore prefix relative to their C names, while under Linux the names are identical in C and assembly. So under Linux you can use main as the name of your main routine in both C and assembly, while under OS X the assembly name must be _main. This is a trivial difference that macros could hide, allowing portable coding.
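A minimal sketch of such a macro follows. It assumes the build defines a symbol named MACOS when targeting OS X (for example with -DMACOS on the assembler command line); both MACOS and the macro name SYM are made up for this illustration.

```nasm
; Assumption: MACOS is defined by the build when targeting OS X.
; SYM is a hypothetical helper name, not something the assembler provides.
%ifdef MACOS
    %define SYM(name) _ %+ name     ; OS X: prepend an underscore
%else
    %define SYM(name) name          ; Linux: use the C name unchanged
%endif

        global  SYM(main)

        section .text
SYM(main):
        xor     eax, eax            ; return 0 from main
        ret
```

NASM also offers a --prefix option that prepends a string to every global and extern symbol, which can serve the same purpose without macros.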
Interestingly, both operating systems also allow a more fundamental entry point, named _start under Linux and start under OS X. For a C program the C library provides the _start (or start) function, which calls main (or _main). The curious part is that here the underscore appears on the Linux name, whereas for every other function it appears on the OS X name.
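To illustrate what bypassing the C library looks like, here is a small Linux-only sketch that supplies its own _start and exits directly through the kernel. It assumes the object file is linked without the C runtime (for example with ld rather than gcc). The OS X counterpart would use the entry name start and different system call numbers, so it is not shown.

```nasm
; Linux x86-64 only: a program whose entry point is _start, with no C library.
        global  _start

        section .text
_start:
        mov     eax, 60             ; Linux x86-64 system call number for exit
        xor     edi, edi            ; exit status 0
        syscall                     ; does not return
```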
RIP relative addressing
The x86-64 architecture introduced a new addressing mode called "RIP relative", in which the 32 bit address field of an instruction is interpreted as a signed offset from the instruction pointer. Because data and code are placed at nearby addresses, moderately small offsets are enough to reach them even when the virtual addresses themselves are large.
For Linux, RIP relative addressing is not important, at least for executables linked at a fixed address (non-PIE). The addresses in the data and text segments all fit in 32 bits, so the 32 bit field can store the actual address of a data item or an instruction. These small addresses make the coding simpler.
By contrast, OS X places the data and text segments above the 4 GB boundary (by default a 64 bit executable is loaded at 0x100000000). Thus the virtual addresses of data items and instructions do not fit into 32 bits, and RIP relative addressing overcomes this problem. There are multiple ways to request RIP relative addressing. One is to add "wrt rip" to an address field of an instruction. The method I prefer is to use "default rel", which makes RIP relative addressing the default mode.
The addition of "default rel" almost takes care of the difference. For simple data access it is adequate. For example
mov rax,[a]
will move the quadword at address a to register rax in both Linux and OS X.
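As a sketch, a complete source file using this directive might look like the following. The stored value and the routine name load_a are assumptions made for the example, and under OS X the global name would need the underscore prefix.

```nasm
        default rel                 ; [label] operands become RIP relative

        section .data
a:      dq      42                  ; example value, chosen arbitrarily

        section .text
        global  load_a              ; hypothetical name; _load_a under OS X
load_a:
        mov     rax, [a]            ; RIP relative because of "default rel"
        mov     rdx, [rel a]        ; explicitly RIP relative, whatever the default
        ret                         ; returns the quadword stored at a
```

The rel keyword on the second load is the per-instruction form, useful when a file does not set "default rel".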
There are problems beyond simple data access, however. Under Linux it is possible to use
mov rax, a
to move the address of a to register rax. This does not work under OS X because the address of a does not fit in the 32 bit address field.
The solution to this issue is portable. You can use "load effective address" rather than move:
lea rax, [a]
This instruction does the job properly on both systems.
A more difficult issue is that RIP relative addressing cannot be combined with an index register. So if you attempt
mov rax, [data+rdi*8]
to index an array of quadwords, it works with Linux but not OS X. The remedy takes one extra instruction: load the base address of the array into a register with lea, then index from that register:
lea rbx, [data]
mov rax, [rbx+rdi*8]
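The same two-step pattern extends to a loop. The sketch below sums a small array using only RIP relative and register-based addressing. The array contents and the routine name sum_data are assumptions for illustration, and it uses rdx for the base address instead of rbx so that no callee-saved register has to be preserved.

```nasm
        default rel

        section .data
data:   dq      10, 20, 30, 40      ; example contents, chosen arbitrarily
count   equ     ($ - data) / 8      ; number of quadwords in data

        section .text
        global  sum_data            ; hypothetical name; _sum_data under OS X
; long sum_data(void): returns the sum of the quadwords in data
sum_data:
        lea     rdx, [data]         ; RIP relative load of the base address
        xor     eax, eax            ; sum = 0
        xor     ecx, ecx            ; index = 0
.next:
        add     rax, [rdx+rcx*8]    ; base register plus scaled index
        inc     rcx
        cmp     rcx, count
        jb      .next               ; loop while index < count
        ret
```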
Overall it is still possible to adopt the OS X style and achieve nearly portable assembly programming. A careful assembly coder could develop a set of macros that hide the underscores in names and, by always coding for RIP relative addressing, write code that assembles unchanged for both systems.
Windows is another matter. Linux and OS X both follow the System V AMD64 ABI, while Windows follows the Microsoft x64 ABI, and the two assign registers differently in function calls, which makes for more differences. Perhaps a clever person could still write portable assembly code, but I doubt that the benefit would be worth the effort.
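To give a sense of the gap, the first two integer arguments arrive in rdi and rsi under System V but in rcx and rdx under the Microsoft ABI. The sketch below hides that one difference behind macros. WIN64, ARG1, ARG2 and add2 are all names invented for this example.

```nasm
; Assumption: WIN64 is defined by the build when targeting Windows.
%ifdef WIN64
    %define ARG1 rcx                ; Microsoft x64: first integer argument
    %define ARG2 rdx                ; Microsoft x64: second integer argument
%else
    %define ARG1 rdi                ; System V AMD64: first integer argument
    %define ARG2 rsi                ; System V AMD64: second integer argument
%endif

        section .text
        global  add2                ; hypothetical name; _add2 under OS X
; long add2(long x, long y): returns x + y
add2:
        lea     rax, [ARG1+ARG2]    ; integer return value goes in rax on both ABIs
        ret
```

This covers only argument registers. The Microsoft ABI also requires the caller to reserve shadow space on the stack and treats rsi and rdi as callee-saved, so real code would need considerably more than two macros.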
1 comment:
Nasm has a handy --prefix option, so you can write your assembly in the linux fashion and for osx targets, provide --prefix _ to take care of the naming convention differences.
Thanks for this blog though, I'm finding it interesting learning the subtle (sometimes not so!) differences between the different operating systems from an assembly perspective.
Thanks
Aaron.