Thursday, December 06, 2018

Switching to lldb for OS X

I have had extensive problems getting gdb to run properly on OS X, but I have generally been successful.  Recently I installed Mojave in a VM and ran into an issue that I didn't resolve.  Running gdb now requires modifying SIP (System Integrity Protection) which can only be done by booting into Recovery Mode.  I have yet to boot this VM into Recovery Mode, so gdb doesn't work.

It is ridiculous to tell people trying to learn assembly language the number of steps required to change SIP, create a certificate and codesign gdb in order to run ebe, so I have decided to bite the bullet and configure ebe to use either gdb or lldb.

Step 1 is to create a Debugger class, an abstract class with pure virtual functions, so that I can have both GDB and LLDB subclasses and, from the other parts of ebe, use a single pointer to Debugger to refer to whichever debugger I am using.  After a little relearning it turned out to be pretty easy to create a superclass for GDB and start working on the LLDB class.
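
A minimal sketch of that kind of design might look like the following.  This is not the actual ebe code: the method names (name, breakCommand) and the makeDebugger helper are made up for illustration.  The command strings are the ordinary gdb and lldb commands for setting a breakpoint by file and line.

```cpp
#include <memory>
#include <string>

// Hypothetical interface; the method names are illustrative only.
class Debugger {
public:
    virtual ~Debugger() = default;
    virtual std::string name() const = 0;
    // Text of the command that sets a breakpoint at a source line.
    virtual std::string breakCommand(const std::string& file, int line) const = 0;
};

class GDB : public Debugger {
public:
    std::string name() const override { return "gdb"; }
    std::string breakCommand(const std::string& file, int line) const override {
        return "break " + file + ":" + std::to_string(line);
    }
};

class LLDB : public Debugger {
public:
    std::string name() const override { return "lldb"; }
    std::string breakCommand(const std::string& file, int line) const override {
        return "breakpoint set --file " + file + " --line " + std::to_string(line);
    }
};

// The rest of the program only ever sees a pointer to Debugger.
std::unique_ptr<Debugger> makeDebugger(bool useLldb) {
    if (useLldb) return std::make_unique<LLDB>();
    return std::make_unique<GDB>();
}
```

The caller picks the debugger once with makeDebugger and from then on never needs to know which one it got.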

I think I have had the same start-up trials with gdb and lldb: they won't simply accept a stream of text as input.  You have to leave a little time between commands.  This is my best assessment.  The truth might be a bit different.  Anyway, to be sure of getting all the output from a command I always follow an lldb command with "script print '<LLDB>\n'" which will print "<LLDB>" on a line by itself.  For some strange reason I can't simply read the output from a command; I have to use a Qt function to wait for output to be ready.  The docs claim that it will wait up to the specified number of milliseconds, but I think it always waits the full amount of time.  Maybe I'm wrong on that.  Anyway after a certain amount of time of waiting, I go ahead and send the script command to lldb.  Unfortunately the command occasionally seems to disappear.  My solution is to wait up to about 4 seconds and then issue the script command a second time.
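
The marker trick can be sketched without any Qt at all.  The function below is only an illustration under my own assumptions: takeReply and kMarker are hypothetical names, and the buffer is assumed to be filled elsewhere with whatever text has arrived from lldb so far.  It returns a command's output only once the whole marker line has been seen.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Marker line that the extra "script print" command is expected to produce.
const std::string kMarker = "<LLDB>";

// If buffer holds a complete marker line, return the text before it and
// remove that text and the marker line from buffer.  Otherwise return
// nullopt so the caller knows to keep waiting for more output.
std::optional<std::string> takeReply(std::string& buffer) {
    std::size_t searchFrom = 0;
    while (true) {
        std::size_t pos = buffer.find(kMarker, searchFrom);
        if (pos == std::string::npos) return std::nullopt;
        std::size_t end = pos + kMarker.size();
        bool atLineStart = (pos == 0) || buffer[pos - 1] == '\n';
        bool atLineEnd = end < buffer.size() && buffer[end] == '\n';
        if (atLineStart && atLineEnd) {
            std::string reply = buffer.substr(0, pos);
            buffer.erase(0, end + 1);   // drop the reply and the marker line
            return reply;
        }
        searchFrom = pos + 1;           // marker text inside a line; keep looking
    }
}
```

A caller would append each newly read chunk to the buffer and call takeReply again, so a marker that arrives split across two reads is still handled.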

I don't like solutions like this, but sometimes it seems to be the best I can do.  I expect that perhaps a slower computer (or maybe faster) will exhibit different patterns of failing to execute my script command, so I have attempted to put in a little cushion in my times for waiting.

There is another issue with lldb.  If you set breakpoints by address and then run the program, the breakpoints are unresolved and don't work.  They do work if you do that process again.  So, in ebe, you would have to click on the alien icon once to "prime the pump" and click again to start debugging.  I have tried this and it is quick.   There is another way to start a program using "process launch --stop-at-entry" which will start it and stop at an entry point before main.  This takes about 3 seconds, but after that breakpoints can be set and a "continue" command will get the debugging started.  In ebe, you just click on the alien icon once and after a small delay it just works.  I prefer the "just works" with a small delay.  I don't wish to explain to people that you have to click twice.  If only I knew why lldb requires 3 seconds longer to find an entry point and set a breakpoint there and run the program...  A computer could do a lot in 3 seconds.  It makes no sense to me.
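
As a rough sketch of that start-up sequence, the function below builds the list of lldb commands in the order described: launch and stop at the entry point, set the breakpoints by address, then continue.  The function name launchCommands is hypothetical and how ebe actually sends the commands is not shown; "breakpoint set --address" is the standard lldb form for an address breakpoint.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Build the lldb commands for one debugging session start.
std::vector<std::string> launchCommands(const std::vector<std::uint64_t>& addresses) {
    std::vector<std::string> cmds;
    // Start the program and stop before main so breakpoints can resolve.
    cmds.push_back("process launch --stop-at-entry");
    for (std::uint64_t a : addresses) {
        std::ostringstream os;
        os << "breakpoint set --address 0x" << std::hex << a;
        cmds.push_back(os.str());
    }
    // Run on to the first breakpoint.
    cmds.push_back("continue");
    return cmds;
}
```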

Anyway I have debugging working with the next line highlighted, registers, floating point registers, back trace, assembly data, and stack displayed.  Left to do is a collection of data for C/C++ in the data window.  Assembly debugging seems pretty solid.  I would prefer rock solid, but I don't know how to fix my issues with lldb.  I think the same issues have always existed with gdb, but my gdb kludge has been pretty successful if not rock solid.

Monday, December 29, 2014

Designing the EZTable class

I had a request for displaying changes in the data highlighted while debugging in ebe.  The idea is that in the data window, the assembly data window, the register window, the floating point register window and the stack frame window each changed item would be displayed in a highlighted color (I chose red).  The first attempt to solve this problem was to derive EbeTable from QTableWidget and derive EbeTableItem from QTableWidgetItem.  This was mostly successful, but had some issues.

Sunday, December 28, 2014

Ebe changes

I have recently added changes to the display of the data in the assembly data window, the register window, the floating point register window and the stack frame window.  The first change is that whenever a value changes it is displayed in red, making it a little easier to see the changes as you step through your code.

The second change is that it is now possible to use the "Edit settings" dialog to change the width of the display in the register window and the assembly data window.  You can choose to display the registers in 2 columns rather than 4, which makes it more convenient if the register window is docked on the left.  For the assembly data window the choice is from 8, 16 or 32 columns.  The number of columns is the number of chars displayed on one row.  Other types will be adjusted to match.  If you drag the assembly data window to the bottom of the window, 32 columns makes pretty good sense.

I am currently working to replace the data window for high level languages with a similar capability.  When it is done the data will be displayed in a regular pattern and the updates will be in red.  Ultimately I would like to provide a scrolling feature for displaying arrays.  At the start the arrays need to be fairly small to be convenient.  I really don't like the method of selecting indexes for the data window display.

Tuesday, October 21, 2014

Rational names for registers

Having published a book on 64 bit assembly language for Linux and OS X, which are fairly similar, and a different one for Windows, which is quite different, I have been trying to synthesize from my experience a better way to write assembly language.

One of my problems is remembering which registers are used for parameters.  Linux/OS X uses 6 and Windows uses 4 and they are not nearly the same:  rdi, rsi, rdx, rcx, r8 and r9 vs rcx, rdx, r8 and r9.  The registers are generally general purpose, but there are a few instances where certain registers are implied by an instruction like movsb, which moves a byte pointed at by rsi to the location pointed at by rdi.

Another of my problems has been remembering which registers must be saved and restored if you need to use them and which are scratch registers.  The save collection is rbx and r12-r15 for all systems and you add in rdi and rsi for Windows.  Why should it be such a hassle?

I have decided on and implemented a plan where I have renamed the registers based on their normal usage.  The rax register I have named the accumulator register or acc.  I have left rsp and rbp as is, though I played around with renaming rbp as rfp to be the "frame pointer register" rather than the "base register".  The rest are either parameter registers, scratch registers or save registers.

So I have used %assign in an include file named "hal.inc" to define new names for the general purpose registers.  All of them follow some simple patterns unlike the use of eax to mean the lower half of rax versus r8d for the lower half of r8.  All of them use a letter after the name to establish a part of the register.  You can use accq to mean the quadword use of acc or you can omit the q.  accd means the double-word part of acc, accw means the word part of acc,  accb means the byte portion of the register and, for a few cases, adding an h gives the high byte of the register.  I don't think I've had a use for the high byte of a register yet.
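
To show what the regular suffixes have to hide, here is a small sketch that turns a 64-bit Intel register name plus a HAL-style part letter (q, d, w, b or h) into the real Intel name.  The function intelName is a made-up helper for illustration and is not part of hal.inc; it only handles the 16 general purpose registers.  With acc standing for rax, accd corresponds to intelName("rax", 'd').

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// part: 'q' = 64 bits, 'd' = 32 bits, 'w' = 16 bits,
//       'b' = low byte, 'h' = high byte (rax, rbx, rcx, rdx only).
std::string intelName(const std::string& reg64, char part) {
    if (reg64.size() < 2 || reg64[0] != 'r')
        throw std::invalid_argument("expected a 64-bit register name");
    if (part == 'q') return reg64;

    // r8..r15 already follow a regular pattern: just append the letter.
    if (std::isdigit(static_cast<unsigned char>(reg64[1]))) {
        if (part == 'd' || part == 'w' || part == 'b') return reg64 + part;
        throw std::invalid_argument("no such part for this register");
    }

    // The older registers: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp.
    if (reg64.size() != 3)
        throw std::invalid_argument("unknown register");
    const std::string core = reg64.substr(1);      // "ax", "si", ...
    const bool hasHighByte = core[1] == 'x';       // ax, bx, cx, dx
    switch (part) {
        case 'd': return "e" + core;               // eax, esi
        case 'w': return core;                     // ax, si
        case 'b': return hasHighByte ? std::string(1, core[0]) + "l"  // al
                                     : core + "l";                    // sil
        case 'h': if (hasHighByte) return std::string(1, core[0]) + "h"; break;
        default: break;
    }
    throw std::invalid_argument("no such part for this register");
}
```

Four different rules are needed on the Intel side for what HAL expresses with one trailing letter.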

The parameter registers are par1, par2, par3, par4 for all systems and you also have par5 and par6 for Linux/OS X.  For those who want to write portable code, you can test using %ifdef WINDOWS to see which protocol is in effect.  So I still need to remember that Linux and OS X have 6 registers used for parameters, though I can forget the sequence of crazy names.  I can zero out one of the parameter registers using "xor par3d, par3d" and this works for all the operating systems.  In fact if I am writing test programs I generally have no more than 4 parameters for my functions and the code is almost identical.  There is a hangup with functions receiving a variable number of parameters like printf and scanf.

The scratch registers are scr1 and scr2.  It turned out that there were 2 of these and they are r10 and r11 for Windows and Linux/OS X.  I really didn't know that until I categorized them all and started naming them.  It is really nice to be able to use scr1 or scr2 in a function as long as you don't call a function and expect to retain the values.  The name "scratch" seems about right to me.

The save registers are registers which need to be moved to memory locations before you use them in a function and later you have to restore their values from memory or risk confusing the calling functions.  Your program might do some strange things if you leave a bogus value in a save register.  The names are sav1, sav2, ..., sav5 and for Windows there are also sav6 and sav7.  Having 2 fewer registers for parameters ended with 2 more save registers.
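
The whole renaming can be summarized as a lookup table.  The sketch below is an illustration, not the contents of hal.inc: the Abi enum and halRegisters function are hypothetical names.  The parameter and scratch assignments follow the lists given above.  The order in which rbx, r12-r15 (and rdi, rsi on Windows) are assigned to sav1-sav7 is my assumption, since only the membership of the save set is stated.

```cpp
#include <map>
#include <string>

enum class Abi { LinuxOsx, Windows };

// Map each HAL register name to the Intel register it stands for.
std::map<std::string, std::string> halRegisters(Abi abi) {
    std::map<std::string, std::string> m = {
        {"acc", "rax"},
        {"scr1", "r10"}, {"scr2", "r11"},
        // Assumed order for the save registers common to all systems.
        {"sav1", "rbx"}, {"sav2", "r12"}, {"sav3", "r13"},
        {"sav4", "r14"}, {"sav5", "r15"},
    };
    if (abi == Abi::Windows) {
        m["par1"] = "rcx"; m["par2"] = "rdx";
        m["par3"] = "r8";  m["par4"] = "r9";
        // Two fewer parameter registers, two more save registers
        // (which of rdi/rsi is sav6 is an assumption).
        m["sav6"] = "rdi"; m["sav7"] = "rsi";
    } else {
        m["par1"] = "rdi"; m["par2"] = "rsi"; m["par3"] = "rdx";
        m["par4"] = "rcx"; m["par5"] = "r8";  m["par6"] = "r9";
    }
    return m;
}
```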

Using my frame macro (discussed in the previous blog entry) if I want to write a function using a lot of registers I would start with

myfunction:
.sav1    equ    local1
.sav2    equ    local2
.sav3    equ    local3
.sav4    equ    local4
         push   rbp
         mov    rbp, rsp
         frame  3, 4, 3        ; prepare a frame with 4 locals
         sub    rsp, frame_size ; frame generates no code, so reserve the space
         mov    [rbp+.sav1], sav1
         mov    [rbp+.sav2], sav2
         mov    [rbp+.sav3], sav3
         mov    [rbp+.sav4], sav4

From this point forward I am safe using registers sav1-sav4 and coding is a bit easier.  I do need to restore them before returning using

         mov    sav1, [rbp+.sav1]
         mov    sav2, [rbp+.sav2]
         mov    sav3, [rbp+.sav3]
         mov    sav4, [rbp+.sav4]
         leave
         ret

I have chosen "hal" as the extension for my programs with renamed registers which stands for "Human Assembly Language".  Ebe now supports HAL by presenting the HAL registers in a special register window with the registers grouped by their normal use.  There is also a window which shows the translation from Intel to HAL for the infrequent occasions when you really must use rsi and rdi or whatever registers for an instruction with implied registers.

I have some functions which use quite a few registers and I have found it convenient to give them names which reflect the data they represent.  I created an alias macro to assist with this.  This was somewhat necessary since registers have different names to reflect the portion of the register being used in an instruction.  In this macro it became necessary to not provide a name without the "q" so that I could use my unalias macro to remove the alias.

Here are some sample aliases

        alias  Sum, sav1
        alias  N, sav2
        alias  I, sav3
        alias  Data, sav4
 
Now to use these aliases you must prefix them with q, d, w, b or h to indicate the desired portion of the register.  I have decided to use a prefix and names with an initial capital to make them easier to read.  I realize that I'm inconsistent but it looks better to me.  Using these aliases I might write a loop like

.top    mov    acc, [qData+qI*8]   ; get Data[i]
        add    qSum, acc
        inc    dI
        cmp    dI, dN
        jl     .top
   
It would be possible to simplify the code a little, but it does show that the assembly code can look more realistic with better names for the registers.  On my to-do list is to add code to ebe to show the aliased register names in the HAL register window.  Without that the code looks good, but the debugging involves a translation step which I intend to eliminate real soon now.
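
The translation step that the debugger display would need is simple to sketch.  The function below is an illustration only (resolveAlias is a hypothetical name and this is not how the NASM macros work internally): it strips the leading q, d, w, b or h from an aliased name, looks up the alias, and returns the HAL register name with the part letter moved to the end.

```cpp
#include <map>
#include <optional>
#include <string>

// Turn a prefixed alias such as "dI" into a HAL register such as "sav3d".
// Returns nullopt for an unknown prefix or an unknown alias.
std::optional<std::string> resolveAlias(
        const std::map<std::string, std::string>& aliases,
        const std::string& name) {
    if (name.size() < 2) return std::nullopt;
    const char part = name[0];
    if (std::string("qdwbh").find(part) == std::string::npos)
        return std::nullopt;
    auto it = aliases.find(name.substr(1));
    if (it == aliases.end()) return std::nullopt;
    return it->second + part;   // HAL puts the part letter after the name
}
```

With the sample aliases above, "dI" resolves to sav3d and "qData" to sav4q.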

I have also implemented fpalias and fpunalias to do the same sort of thing for the floating point registers.  I used them all in the code which computed correlation using AVX instructions and their purpose changed from the data summing phase to the correlation computation phase, so I aliased the registers once at the start and then again later.  The code is far easier to read, though it still requires some planning to write code using all 16 registers.  The new code uses ySum_x to represent the ymmX register which holds the sum of x values.  I used 2 sets of registers in the summing phase so I also had ySum_x2.  That gave me 8 partial sums of x values and a handful of others.  This adds one more thing to my to-do list.

I am pretty pleased with the HAL register renaming along with the frame macro.  I am hoping that some other people agree with me that it is OK to learn Intel/AMD assembly language without the strange names.


Displaying stack frames in ebe

It is now possible to display the current function's relevant stack data in a stack frame window in ebe.  To do this use the new frame macro at the start of a function.  For example let's suppose you want to use 3 local variables in your main function and call some function with 4 parameters.  Then you should start main like this

main:
      push   rbp
      mov    rbp, rsp
      frame  2, 3, 4   ; main has 2 parameters, will use 3 locals, and
                       ; will call functions with a maximum of 4 parameters
      sub    rsp, frame_size
 
The frame macro does not generate any instructions.  However it does determine frame_size, which could be 0 in functions with no local variables.  Ebe keeps track of the lines after the frame macro and builds a table in the stack frame window from that point forward in the code.  It will be accurate only after the subtraction of frame_size.

With Linux and OS X the frames are generally pretty simple.  Also the use of 6 register parameters means that a lot of functions will require no space on the stack for parameters to the current function or to call functions.

Here is the stack frame from using 2, 3, and 4 for Linux:

You can see that local1 is on the third line of the table and it is addressed using [rbp-8].  Alternatively you can simply use [rbp+local1].  Let's assume you want to save the 2 parameters to main in locals 1 and 2:

      mov    [rbp+local1], rdi
      mov    [rbp+local2], rsi
 
You may have noticed the last row in the table which could have been local4.  This space is there so that the functions which you call can keep the stack on 16 byte boundaries.  The value of frame_size will always be a multiple of 16.
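
The arithmetic behind frame_size can be sketched as follows for Linux and OS X.  This is my reading of the behavior described here, not the macro's source: the function frameSize is hypothetical, and it assumes 8 bytes per local, 8 bytes for each parameter beyond the 6 passed in registers, and rounding up to a multiple of 16.

```cpp
// Number of parameters passed in registers on Linux and OS X.
constexpr int kRegisterParams = 6;

// Bytes to subtract from rsp for a function with the given number of
// locals that calls functions taking at most maxCallParams parameters.
constexpr int frameSize(int locals, int maxCallParams) {
    const int stackParams =
        maxCallParams > kRegisterParams ? maxCallParams - kRegisterParams : 0;
    const int bytes = 8 * (locals + stackParams);
    return (bytes + 15) / 16 * 16;   // round up to a multiple of 16
}

// frame 2, 3, 4: three locals need 24 bytes, padded to 32.
static_assert(frameSize(3, 4) == 32, "3 locals plus one padding slot");
// No locals and no stack parameters: nothing to reserve.
static_assert(frameSize(0, 0) == 0, "empty frame");
```

The padding slot is the extra row in the table: 3 locals take 24 bytes, and the fourth 8-byte slot brings the total to 32.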

Now suppose you have a function which receives 9 parameters, has 2 locals and the maximum number of parameters for called functions is 5.  Then the current function will have 3 parameters on the stack.  The frame call would be

      frame  9, 2, 5

The stack frame for this function would look like this under Linux or OS X.


Here we see an empty row at the top of the table which the calling function had to prepare for.  Before making the call it placed parameters 7, 8 and 9 on the stack as shown as currPar7-9.  You can also see the addressing for parameter 7 is [rbp+16] or more simply stated as [rbp+currPar7].

Similarly you can prepare to call a function with 9 parameters from a function with 2 locals and 2 parameters using

      frame  2, 2, 9

This will generate a stack frame like this one for Linux and OS X


Again you see an empty row in the table above newPar9 so that rsp will remain a multiple of 16.  You can place values into these stack-based parameters using [rsp+newPar7].  Note that the recommendation is to use rsp for the new parameters since they are basically at positive offsets from rsp.
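
The three kinds of offsets can be summarized in a few lines.  The functions below are hypothetical helpers, not part of ebe.  They start from the two values given above, local1 at [rbp-8] and currPar7 at [rbp+16], and assume the usual Linux/OS X layout of one 8-byte slot per item with newPar7 at [rsp].

```cpp
// Offset of local n (1, 2, ...) relative to rbp.
constexpr int localOffset(int n) { return -8 * n; }

// Offset of incoming stack parameter n (7, 8, ...) relative to rbp.
// The saved rbp is at [rbp] and the return address at [rbp+8].
constexpr int currParOffset(int n) { return 16 + 8 * (n - 7); }

// Offset of outgoing stack parameter n (7, 8, ...) relative to rsp.
constexpr int newParOffset(int n) { return 8 * (n - 7); }

static_assert(localOffset(1) == -8, "local1 is [rbp-8]");
static_assert(currParOffset(7) == 16, "currPar7 is [rbp+16]");
```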

The main goal for ebe is to make assembly programming easier without sacrificing the main benefit from learning assembly which is to learn how the computer works.  This explains why I chose to not implement a function macro which could have generated the standard stack frame code.  If you really want it to be easier, maybe C is the answer.

I have also been working on some register re-naming tricks which can make it easier to write more complex functions, but that discussion will be for another day.