Having published one book on 64-bit assembly language for Linux and OS X, which are fairly similar, and a different one for Windows, which is quite different, I have been trying to synthesize from my experience a better way to write assembly language.
One of my problems is remembering which registers are used for parameters. Linux/OS X uses 6 and Windows uses 4, and the two lists are not the same: rdi, rsi, rdx, rcx, r8 and r9 versus rcx, rdx, r8 and r9. The registers are mostly general purpose, but in a few instances certain registers are implied by an instruction, like movsb, which moves a byte from the location pointed at by rsi to the location pointed at by rdi.
Another of my problems has been remembering which registers must be saved and restored if you need to use them and which are scratch registers. The save collection is rbx and r12-r15 for all systems, and you add rdi and rsi for Windows. Why should it be such a hassle?
I have decided and implemented a plan where I have renamed the registers based on their normal usage. The rax register I have named the accumulator register or acc. I have left rsp and rbp as is, though I played around with renaming rbp as rfp to be the "frame pointer register" rather than the "base register". The rest are either parameter registers, scratch registers or save registers.
So I have used %define in an include file named "hal.inc" to define new names for the general purpose registers. The names follow a few simple patterns, unlike the Intel names, where eax is the lower half of rax but r8d is the lower half of r8. Each name takes a letter suffix to select a part of the register. You can use accq to mean the quadword use of acc, or you can omit the q. accd means the double-word part of acc, accw means the word part, accb means the low byte and, for a few registers, adding an h gives the high byte. I don't think I've had a use for the high byte of a register yet.
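The naming pattern can be sketched with NASM single-line macros. This is a minimal sketch, assuming hal.inc is built from plain %define lines; the actual file may be organized differently, and only the acc family is shown.

```nasm
; Hypothetical excerpt: one %define per name in the acc family.
%define acc  rax        ; bare name means the full quadword
%define accq rax        ; quadword (64 bits)
%define accd eax        ; double-word (low 32 bits)
%define accw ax         ; word (low 16 bits)
%define accb al         ; low byte
%define acch ah         ; high byte of the low word

        mov     accd, 1         ; assembles as: mov eax, 1
        movzx   accd, accb      ; assembles as: movzx eax, al
```

Because the names are text substitutions, the assembler sees ordinary Intel register names and the generated code is unchanged.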
The parameter registers are par1, par2, par3 and par4 for all systems, and you also have par5 and par6 for Linux/OS X. For those who want to write portable code, you can test using %ifdef WINDOWS to see which protocol is in effect. So I still need to remember that Linux and OS X have 6 registers used for parameters, though I can forget the sequence of crazy names. I can zero out one of the parameter registers using "xor par3d, par3d" and this works for all the operating systems. In fact, when I am writing test programs I generally have no more than 4 parameters for my functions, and the code is almost identical across systems. There is a hangup with functions receiving a variable number of parameters, like printf and scanf.
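The protocol test might look something like the following. It is only a sketch: it assumes the WINDOWS symbol is supplied on the command line (for example with nasm's -D option) and it defines just the quadword and double-word names, so the real hal.inc is certainly more complete.

```nasm
; Hypothetical sketch of protocol-dependent names; the real hal.inc may differ.
; Assemble with -DWINDOWS to select the Windows mapping.
%ifdef WINDOWS
  %define par1  rcx
  %define par1d ecx
  %define par2  rdx
  %define par2d edx
  %define par3  r8
  %define par3d r8d
  %define par4  r9
  %define par4d r9d
%else                           ; Linux and OS X
  %define par1  rdi
  %define par1d edi
  %define par2  rsi
  %define par2d esi
  %define par3  rdx
  %define par3d edx
  %define par4  rcx
  %define par4d ecx
  %define par5  r8
  %define par5d r8d
  %define par6  r9
  %define par6d r9d
%endif

        xor     par3d, par3d    ; edx on Linux/OS X, r8d on Windows
%ifndef WINDOWS
        xor     par5d, par5d    ; par5 and par6 exist only on Linux/OS X
%endif
```

Writing to the double-word name also clears the upper half of the 64 bit register, so the xor zeroes all of par3 on either system.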
The scratch registers are scr1 and scr2. It turned out that there were 2 of these and they are r10 and r11 for Windows and Linux/OS X. I really didn't know that until I categorized them all and started naming them. It is really nice to be able to use scr1 or scr2 in a function as long as you don't call a function and expect to retain the values. The name "scratch" seems about right to me.
The save registers are registers which need to be moved to memory locations before you use them in a function and later you have to restore their values from memory or risk confusing the calling functions. Your program might do some strange things if you leave a bogus value in a save register. The names are sav1, sav2, ..., sav5 and for Windows there are also sav6 and sav7. Having 2 fewer registers for parameters ended with 2 more save registers.
Using my frame macro (discussed in the previous blog entry) if I want to write a function using a lot of registers I would start with
myfunction:
.sav1   equ     local1          ; local slots that hold the saved registers
.sav2   equ     local2
.sav3   equ     local3
.sav4   equ     local4
        push    rbp
        mov     rbp, rsp
        frame   3, 4, 3         ; prepare a frame with 4 locals
        sub     rsp, frame_size ; the frame macro generates no instructions
        mov     [rbp+.sav1], sav1
        mov     [rbp+.sav2], sav2
        mov     [rbp+.sav3], sav3
        mov     [rbp+.sav4], sav4
From this point forward I am safe using registers sav1-sav4 and coding is a bit easier. I do need to restore them before returning using
        mov     sav1, [rbp+.sav1]
        mov     sav2, [rbp+.sav2]
        mov     sav3, [rbp+.sav3]
        mov     sav4, [rbp+.sav4]
        leave
        ret
I have chosen "hal" as the extension for my programs with renamed registers which stands for "Human Assembly Language". Ebe now supports HAL by presenting the HAL registers in a special register window with the registers grouped by their normal use. There is also a window which shows the translation from Intel to HAL for the infrequent occasions when you really must use rsi and rdi or whatever registers for an instruction with implied registers.
I have some functions which use quite a few registers, and I have found it convenient to give the registers names which reflect the data they hold. I created an alias macro to assist with this. A macro was needed because each register has several names, one for each portion of the register that an instruction can use. The macro deliberately does not define a bare name without the "q", so that my unalias macro can remove the alias.
Here are some sample aliases
alias Sum, sav1
alias N, sav2
alias I, sav3
alias Data, sav4
Now to use these aliases you must prefix them with q, d, w, b or h to indicate the desired portion of the register. I have decided to use a prefix and names with an initial capital to make them easier to read. I realize this is inconsistent with the suffixes used for the HAL register names, but it looks better to me. Using these aliases I might write a loop like
.top:   mov     acc, [qData+qI*8]       ; get Data[i]
        add     qSum, acc
        inc     dI
        cmp     dI, dN
        jl      .top
It would be possible to simplify the code a little, but it does show that the assembly code can look more realistic with better names for the registers. On my to-do list is to add code to ebe to show the aliased register names in the HAL register window. Without that the code looks good, but the debugging involves a translation step which I intend to eliminate real soon now.
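An alias macro along these lines could be sketched as follows. This is not the macro shipped with ebe: the macro bodies are my guess at one way to get the behavior described above, the mapping of sav1 to rbx is an assumption, and the h form is left out because most registers have no high-byte name.

```nasm
; Assumed HAL names for one save register (sav1 taken to be rbx).
%define sav1q rbx
%define sav1d ebx
%define sav1w bx
%define sav1b bl

; Hypothetical alias: defines qNAME, dNAME, wNAME and bNAME for a HAL register.
%macro alias 2
  %define q%1 %2q
  %define d%1 %2d
  %define w%1 %2w
  %define b%1 %2b
%endmacro

; Hypothetical unalias: removes the four names again.
%macro unalias 1
  %undef q%1
  %undef d%1
  %undef w%1
  %undef b%1
%endmacro

        alias   Sum, sav1
        add     qSum, 8         ; assembles as: add rbx, 8
        unalias Sum
```

Since every alias carries a size prefix, unalias knows exactly which names to remove, which fits the decision not to define a bare name.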
I have also implemented fpalias and fpunalias to do the same sort of thing for the floating point registers. I used them all in the code which computed correlation using AVX instructions. The purpose of the registers changed from the data summing phase to the correlation computation phase, so I aliased the registers once at the start and then again later. The code is far easier to read, though it still requires some planning to write code using all 16 registers. The new code uses ySum_x to represent the ymmX register which holds the sum of x values. I used 2 sets of registers in the summing phase, so I also had ySum_x2. That gave me 8 partial sums of x values and a handful of others. This adds one more thing to my to-do list.
I am pretty pleased with the HAL register renaming along with the frame macro. I am hoping that some other people agree with me that it is OK to learn Intel/AMD assembly language without the strange names.
Tuesday, October 21, 2014
Displaying stack frames in ebe
It is now possible to display the current function's relevant stack data in a stack frame window in ebe. To do this, use the new frame macro at the start of a function. For example, suppose you want to use 3 local variables in your main function and call some function with 4 parameters. Then you should start main like this:
main:
        push    rbp
        mov     rbp, rsp
        frame   2, 3, 4         ; main has 2 parameters, will use 3 locals, and
                                ; will call functions with a maximum of 4 parameters
        sub     rsp, frame_size
The frame macro does not generate any instructions. However it does determine frame_size, which could be 0 in functions with no local variables. Ebe keeps track of the lines after the frame macro and builds a table in the stack frame window from that point forward in the code. It will be accurate only after the subtraction of frame_size.
With Linux and OS X the frames are generally pretty simple. Also the use of 6 register parameters means that a lot of functions will require no space on the stack for parameters to the current function or to call functions.
Here is the stack from using 2, 3, and 4 for Linux:
mov [rbp+local1], rdi
mov [rbp+local2], rsi
You may have noticed the last row in the table, which could have been local4. This space is there so that the functions which you call can keep the stack on 16 byte boundaries. The value of frame_size will always be a multiple of 16.
Now suppose you have a function which receives 9 parameters, has 2 locals and the maximum number of parameters for called functions is 3. Then the current function will have 3 parameters on the stack. The frame call would be
frame 9, 2, 3
The stack frame for this function would look like this under Linux or OS X.
Here we see an empty row at the top of the table which the calling function had to prepare for. Before making the call it placed parameters 7, 8 and 9 on the stack as shown as currPar7-9. You can also see the addressing for parameter 7 is [rbp+16] or more simply stated as [rbp+currPar7].
Similarly you can prepare to call a function with 9 parameters from a function with 2 locals and 2 parameters using
frame 2, 2, 9
Again you see an empty row in the table above newPar9 so that rsp will remain a multiple of 16. You can place values into these stack-based parameters using [rsp+newPar7]. Note that the recommendation is to use rsp for the new parameters since they are basically at positive offsets from rsp.
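The arithmetic behind the padding rows in these examples can be sketched as a small macro. This is not the frame macro from ebe: frame_sketch is a hypothetical name, it covers only the Linux/OS X case, and it assumes frame_size counts 8 bytes per local plus 8 bytes per outgoing parameter beyond the sixth, rounded up to a multiple of 16.

```nasm
; Hypothetical stand-in for the size calculation, Linux/OS X only.
; Arguments: current parameters, locals, maximum parameters of called functions.
; The first argument does not affect the size in this sketch.
%macro frame_sketch 3
  %if %3 > 6
    %assign frame_size (%2 + %3 - 6) * 8        ; locals plus stack-passed new parameters
  %else
    %assign frame_size %2 * 8                   ; locals only
  %endif
  %assign frame_size (frame_size + 15) / 16 * 16  ; round up to a multiple of 16
%endmacro

        frame_sketch 2, 3, 4    ; frame_size = 32: 3 locals plus one padding slot
        frame_sketch 2, 2, 9    ; frame_size = 48: 2 locals, 3 new parameters, one padding slot
```

Under these assumptions the padding slot in each case is the empty row seen in the corresponding table.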
The main goal for ebe is to make assembly programming easier without sacrificing the main benefit from learning assembly which is to learn how the computer works. This explains why I chose to not implement a function macro which could have generated the standard stack frame code. If you really want it to be easier, maybe C is the answer.
I have also been working on some register re-naming tricks which can make it easier to write more complex functions, but that discussion will be for another day.