Tuesday, October 21, 2014

Rational names for registers

Having published a book on 64-bit assembly language for Linux and OS X, which are fairly similar, and a different one for Windows, which is quite different, I have been trying to synthesize from that experience a better way to write assembly language.

One of my problems is remembering which registers are used for parameters. Linux/OS X uses 6 and Windows uses 4, and the lists do not line up: rdi, rsi, rdx, rcx, r8 and r9 versus rcx, rdx, r8 and r9. Even rcx, which appears in both lists, is the fourth parameter on Linux/OS X but the first on Windows. The registers are mostly general purpose, but a few instructions imply particular registers, like movsb, which moves the byte pointed at by rsi to the location in rdi.

Another of my problems has been remembering which registers must be saved and restored if you use them and which are scratch registers. The registers to save are rbx and r12-r15 on all systems, plus rdi and rsi on Windows (rbp and rsp are also preserved, but the stack frame code takes care of them). Why should it be such a hassle?

I have decided on, and implemented, a plan to rename the registers based on their normal usage. I have named the rax register the accumulator register, or acc. I have left rsp and rbp as they are, though I played around with renaming rbp to rfp, the "frame pointer register", rather than the "base pointer register". The rest are parameter registers, scratch registers or save registers.

So I have used %assign in an include file named "hal.inc" to define new names for the general purpose registers. All of them follow a few simple patterns, unlike Intel's names, which use eax for the lower half of rax but r8d for the lower half of r8. Every HAL name takes a letter after it to select a part of the register. You can use accq to mean the quadword use of acc, or you can omit the q. accd means the double-word part of acc, accw the word part, accb the low byte and, for the few registers that have one, adding an h gives the high byte. I don't think I've had a use for the high byte of a register yet.
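The post does not show hal.inc itself, so here is only a minimal sketch of what the acc entries could look like. It uses %define, because NASM's %assign evaluates its argument as a numeric expression and cannot hold a register name. The function name hal_acc_demo is hypothetical.

```nasm
; Sketch of the accumulator entries in a hal.inc-style include file.
; Every HAL name is the base name plus a one-letter size suffix,
; whereas Intel's names switch pattern (eax/ax/al but r8d/r8w/r8b).
%define acc   rax     ; quadword, the q may be omitted
%define accq  rax     ; quadword (64 bits)
%define accd  eax     ; double word (32 bits)
%define accw  ax      ; word (16 bits)
%define accb  al      ; low byte (8 bits)
%define acch  ah      ; high byte of the low word (only a few registers have one)

        section .text
        global  hal_acc_demo
hal_acc_demo:
        mov     accd, 1000      ; writing eax also zeroes the upper half of rax
        mov     accb, 7         ; replaces only al, the rest of rax is unchanged
        ret
```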

The parameter registers are par1, par2, par3 and par4 on all systems, plus par5 and par6 on Linux/OS X. For those who want to write portable code, you can test %ifdef WINDOWS to see which convention is in effect. So I still need to remember that Linux and OS X use 6 registers for parameters, though I can forget the sequence of crazy names. I can zero one of the parameter registers with "xor par3d, par3d" and this works on all the operating systems, because writing a 32-bit register also clears the upper half of the 64-bit register. In fact, when I write test programs my functions generally have no more than 4 parameters, so the code is almost identical across systems. There is a hangup with functions receiving a variable number of parameters, like printf and scanf: Linux/OS X expects al to hold an upper bound on the number of vector registers used for arguments, while Windows requires floating-point arguments to be duplicated in the matching integer registers.
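Here is a sketch of how the parameter names might be chosen with %ifdef WINDOWS, assuming WINDOWS is defined on the command line (for example nasm -DWINDOWS). Only the q and d forms are written out; the w and b forms would follow the same pattern. The function add3 is a hypothetical example whose source is identical under both conventions.

```nasm
; Sketch: choose parameter register names by calling convention.
; Assemble with "nasm -DWINDOWS ..." to select the Windows x64 mapping.
%define acc   rax
%ifdef WINDOWS
  %define par1  rcx
  %define par1d ecx
  %define par2  rdx
  %define par2d edx
  %define par3  r8
  %define par3d r8d
  %define par4  r9
  %define par4d r9d
%else                         ; System V AMD64 ABI (Linux and OS X)
  %define par1  rdi
  %define par1d edi
  %define par2  rsi
  %define par2d esi
  %define par3  rdx
  %define par3d edx
  %define par4  rcx
  %define par4d ecx
  %define par5  r8
  %define par5d r8d
  %define par6  r9
  %define par6d r9d
%endif

        section .text
        global  add3
; int64_t add3(int64_t a, int64_t b, int64_t c)
; The same source works for both conventions; only the %defines differ.
add3:
        mov     acc, par1
        add     acc, par2
        add     acc, par3
        ret
```

On OS X a C-callable symbol also needs a leading underscore (_add3), which this sketch leaves out.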

The scratch registers are scr1 and scr2. It turned out that there are 2 of these, r10 and r11, on both Windows and Linux/OS X. I really didn't know that until I categorized all the registers and started naming them. It is really nice to be able to use scr1 or scr2 in a function, as long as you don't expect them to keep their values across a call to another function. The name "scratch" seems about right to me.
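A small sketch of the scratch registers in use. The function swap_int64 is hypothetical; it is a leaf function (it calls nothing), so scr1 and scr2 keep their values for its whole body and nothing needs to be saved or restored. The block carries its own minimal parameter mapping so it assembles on its own.

```nasm
; Sketch: scr1 and scr2 are r10 and r11 under both conventions.
%define scr1  r10
%define scr2  r11
%ifdef WINDOWS
  %define par1 rcx
  %define par2 rdx
%else
  %define par1 rdi
  %define par2 rsi
%endif

        section .text
        global  swap_int64
; void swap_int64(int64_t *p, int64_t *q)
swap_int64:
        mov     scr1, [par1]          ; scr1 = *p
        mov     scr2, [par2]          ; scr2 = *q
        mov     [par1], scr2          ; *p = old *q
        mov     [par2], scr1          ; *q = old *p
        ret                           ; no save registers used, nothing to restore
```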

The save registers must be copied to memory before you use them in a function and restored from memory before you return, or you risk confusing the calling functions. Your program might do some strange things if you leave a bogus value in a save register. The names are sav1, sav2, ..., sav5, and Windows adds sav6 and sav7: having 2 fewer parameter registers leaves it with 2 more save registers.
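The post does not say which physical register each sav name maps to, so the order below (rbx, r12-r15, then rsi and rdi on Windows) is an assumption. The hypothetical sum_array saves with push and pop rather than the frame macro, just to show the save-before-use, restore-before-return rule in isolation.

```nasm
; Hypothetical sav mapping: the order is assumed, not taken from hal.inc.
%define sav1  rbx
%define sav1d ebx
%define sav2  r12
%define sav3  r13
%define sav4  r14
%define sav5  r15
%ifdef WINDOWS
  %define sav6 rsi            ; callee-saved only under Windows x64
  %define sav7 rdi
  %define par1 rcx
  %define par2 rdx
%else
  %define par1 rdi
  %define par2 rsi
%endif

        section .text
        global  sum_array
; int64_t sum_array(const int64_t *data, int64_t n)
; A leaf like this could use scr1/scr2 instead; sav1/sav2 are used here
; only to demonstrate the save/restore rule.
sum_array:
        push    sav1                  ; save the caller's values first
        push    sav2
        xor     eax, eax              ; running sum = 0
        xor     sav1d, sav1d          ; i = 0 (clears all of sav1)
        mov     sav2, par2            ; n
.top:   cmp     sav1, sav2
        jge     .done
        add     rax, [par1+sav1*8]    ; sum += data[i]
        inc     sav1
        jmp     .top
.done:  pop     sav2                  ; restore in reverse order
        pop     sav1
        ret
```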

Using my frame macro (discussed in the previous blog entry), if I want to write a function using a lot of registers I would start with

myfunction:
.sav1    equ    local1
.sav2    equ    local2
.sav3    equ    local3
.sav4    equ    local4
         push   rbp
         mov    rbp, rsp
         frame  3, 4, 3        ; prepare a frame with 4 locals
         mov    [rbp+.sav1], sav1
         mov    [rbp+.sav2], sav2
         mov    [rbp+.sav3], sav3
         mov    [rbp+.sav4], sav4

From this point forward I am safe using registers sav1-sav4 and coding is a bit easier.  I do need to restore them before returning using

         mov    sav1, [rbp+.sav1]
         mov    sav2, [rbp+.sav2]
         mov    sav3, [rbp+.sav3]
         mov    sav4, [rbp+.sav4]
         leave
         ret

I have chosen "hal", which stands for "Human Assembly Language", as the file extension for my programs with renamed registers. Ebe now supports HAL by presenting the HAL registers in a special register window, grouped by their normal use. There is also a window showing the translation from Intel to HAL names for the infrequent occasions when you really must use rsi and rdi, or other registers implied by an instruction.

Some of my functions use quite a few registers, and I have found it convenient to give those registers names that reflect the data they hold. I created an alias macro to help with this. A macro was needed because each register has several names, one for each portion of the register an instruction can use. In this macro I had to leave out the name without the "q" so that my unalias macro could remove the alias.
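The post does not show the alias macro, so this is a hypothetical sketch of one way to get a similar effect in NASM, not the author's macro. It pastes a prefix onto the alias name and uses %xdefine, which expands the HAL name at definition time, so unalias only has to %undef each generated name. It skips the h prefix, defines only the sav1 names it needs, and the name alias_demo is made up.

```nasm
; Only suffixed names are defined here, so "sav1" stays a plain token
; when it is passed to alias.
%define sav1q  rbx
%define sav1d  ebx
%define sav1w  bx
%define sav1b  bl

; alias Name, reg  ->  qName, dName, wName, bName
%macro alias 2
  %xdefine q%1 %{2}q          ; e.g. qSum becomes rbx right away
  %xdefine d%1 %{2}d
  %xdefine w%1 %{2}w
  %xdefine b%1 %{2}b
%endmacro

; unalias Name  ->  removes every prefixed name created by alias
%macro unalias 1
  %undef q%1
  %undef d%1
  %undef w%1
  %undef b%1
%endmacro

        section .text
        global  alias_demo
alias_demo:
        alias   Count, sav1
        push    qCount                ; push rbx (a save register)
        mov     dCount, 5             ; mov ebx, 5
        mov     rax, qCount           ; mov rax, rbx
        pop     qCount                ; pop rbx
        unalias Count
        ret
```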

Here are some sample aliases

        alias  Sum, sav1
        alias  N, sav2
        alias  I, sav3
        alias  Data, sav4
 
Now to use these aliases you must prefix them with q, d, w, b or h to indicate the desired portion of the register. I have decided to use a prefix and names with an initial capital to make them easier to read. I realize that I'm inconsistent, since the HAL register names use a suffix, but it looks better to me. Using these aliases I might write a loop like

.top    mov    acc, [qData+qI*8]   ; get Data[I]
        add    qSum, acc
        inc    dI                  ; a 32-bit increment also clears the top of qI
        cmp    dI, dN
        jl     .top

It would be possible to simplify the code a little, but it does show that assembly code reads much more naturally with better names for the registers. On my to-do list is adding code to Ebe to show the aliased register names in the HAL register window. Without that the code looks good, but debugging involves a translation step, which I intend to eliminate real soon now.

I have also implemented fpalias and fpunalias to do the same sort of thing for the floating point registers. I used them throughout the code which computed correlation using AVX instructions. The registers' purposes changed from the data summing phase to the correlation computation phase, so I aliased the registers once at the start and then again later. The code is far easier to read, though it still takes some planning to write code using all 16 registers. The new code uses ySum_x to represent the ymm register which holds the sum of x values. I used 2 sets of registers in the summing phase, so I also had ySum_x2. With 4 doubles in each ymm register, that gave me 8 partial sums of x values and a handful of others. This adds one more thing to my to-do list.
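To illustrate why two ymm accumulators give 8 partial sums (each ymm register holds 4 doubles), here is a hypothetical sum_x function. It uses plain %define in place of fpalias, assumes n is a positive multiple of 8, and needs a CPU with AVX. At the end the partial sums are folded down to one double in xmm0, where both conventions return a double.

```nasm
; Hypothetical sketch: %define stands in for fpalias.
%ifdef WINDOWS
  %define par1 rcx            ; pointer to x
  %define par2 rdx            ; n, assumed a positive multiple of 8
%else
  %define par1 rdi
  %define par2 rsi
%endif
%define scr1    r10
%define scr1d   r10d
%define ySum_x  ymm0          ; 4 partial sums of x
%define ySum_x2 ymm1          ; 4 more partial sums of x

        section .text
        global  sum_x
; double sum_x(const double *x, int64_t n)
sum_x:
        vxorpd  ySum_x, ySum_x, ySum_x          ; clear 4 partial sums
        vxorpd  ySum_x2, ySum_x2, ySum_x2       ; clear 4 more
        xor     scr1d, scr1d                    ; i = 0
.top:   vaddpd  ySum_x,  ySum_x,  [par1+scr1*8]     ; add x[i..i+3]
        vaddpd  ySum_x2, ySum_x2, [par1+scr1*8+32]  ; add x[i+4..i+7]
        add     scr1, 8
        cmp     scr1, par2
        jl      .top
        vaddpd  ySum_x, ySum_x, ySum_x2         ; fold 8 partial sums to 4
        vextractf128 xmm1, ySum_x, 1            ; upper 2 of the 4
        vaddpd  xmm0, xmm0, xmm1                ; 2 partial sums
        vhaddpd xmm0, xmm0, xmm0                ; total in the low double of xmm0
        vzeroupper                              ; avoid AVX/SSE transition costs
        ret
```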

I am pretty pleased with the HAL register renaming along with the frame macro.  I am hoping that some other people agree with me that it is OK to learn Intel/AMD assembly language without the strange names.

