Tuesday, September 10, 2013

Linux Binary Compatibility

I have had some issues with lack of binary compatibility of Linux.  I can build ebe using a lot of shared libraries and if people have those libraries installed it is highly likely that the binary will work properly.  Their computer would need matching versions of a moderately large collection of .so files.

To avoid requiring that ebe users install a bunch of libraries I have developed a script named "build_package" which uses ldd to determine the shared objects used by ebe and it builds a directory named "ebe.d" with ebe copied in as "ebe.exe" and all the shared objects.  The idea is that on the target system ebe would be started using a shell script named "ebe" which sets LD_LIBRARY_PATH to ebe.d allowing all the shared libraries from my system to be used.
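The build_package script itself is not shown here, so the following is only a rough C++ sketch of the one step it depends on: pulling the resolved library paths out of the text that ldd prints.  The function name lddPaths is made up for illustration, and the sketch assumes the usual "name => /path (address)" layout of ldd output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extract the resolved library paths from ldd output.
// Lines without "=> /path" (linux-vdso, the dynamic loader, or libraries
// reported as "not found") are skipped.
std::vector<std::string> lddPaths(const std::string &lddOutput)
{
    std::vector<std::string> paths;
    std::istringstream in(lddOutput);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type arrow = line.find("=> ");
        if (arrow == std::string::npos) continue;
        std::string::size_type start = arrow + 3;
        // The path ends just before the " (0x...)" load address.
        std::string::size_type end = line.find(" (", start);
        std::string path = (end == std::string::npos)
                               ? line.substr(start)
                               : line.substr(start, end - start);
        if (!path.empty() && path[0] == '/') paths.push_back(path);
    }
    return paths;
}
```

Each returned path would be a candidate for copying into ebe.d next to ebe.exe.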

This works with the same version of Ubuntu (13.04) for me, but fails on 12.04.  Interestingly on my virtualbox 12.04 system ebe.exe works if I allow all the libraries to be loaded normally.  So I tried to match the strategy of firefox which is a fairly successful program.  It is loaded by a script though it does not use LD_LIBRARY_PATH.  Firefox does use some special shared libraries but these apparently are loaded using dlopen rather than automatically.  They don't appear when you use ldd.

So I started removing shared libraries.  After removing a few of them I started getting messages about incompatible Qt versions.  Apparently Qt knows that the Qt shared libraries I copied to 12.04 are a different version from the libraries I installed previously.  I did not find an environment variable which pointed to the older libraries, so I assume that Qt does some form of internal consistency checking.  This may be something I can get around, but I don't have a clue about how to dodge this test now.

I considered briefly forcing ebe users to install from source, but the requirements are fairly large.  I was trying to make ebe accessible to novices.  Novices shouldn't have to face a long list of instructions just to get their IDE installed.  I had the experience of installing a copy of ebe modified to use Qt5 recently.  It took probably an hour of piddling around to figure out the packages required to get ebe to build.  Maybe I missed the obvious qt5-dev package, but I tried to find the obvious.  I wouldn't call myself a novice though I know I can make some really stupid mistakes.

So I am left with providing a separate install package for each of several versions of Linux.  What a waste of effort!  I am sure there is a better way, but I have spent more time looking for a better way than  it would take me to provide 3 versions of my install package for a year.

To reduce the effort I will start using rsync to update my downloadable files on sourceforge.  I had been using their web interface on each computer.  I will still have to do a fair amount of work on each revision, so I need to make less frequent updates to the binary installers.  I can still push source code changes regularly since that is so easy, but the installers I will try to update perhaps once a month.

I also tried to use an authorized_keys file on sourceforge to make it quicker to do the rsync, but that didn't work.  It didn't really matter since it takes a few seconds to type in my password and an hour or two to prepare the binary packages on about 5 computers.

Thursday, August 29, 2013

Using Qt Linguist to Translate EBE's Messages to Multiple Languages

Over the past week I have been using Qt's language translation facility to translate all the words and phrases used in ebe into multiple languages.  Qt is designed for easy use of multiple languages.  If you make a good habit of using their tr function with every string, then retrofitting languages is fairly easy.

Consider the use of QMessageBox::warning to present warning messages.  Here is one of my calls:

    QMessageBox::warning(this,tr("Error"),
       tr("The first index can't be\n greater than the last."),
       QMessageBox::Ok, QMessageBox::Ok);

You see that there are 2 calls to tr in the call to the warning function.  If you have not implemented any other languages, tr("Error") will return "Error".  If you have set up your program with other languages, then tr will use the active language to rapidly find a translation for "Error".  In the case of French, it might return "Erreur".  So for each language you must prepare a translation for each string embedded in a call to tr.
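The fallback behavior can be pictured with a small stand-alone sketch.  This is plain C++ and not Qt's implementation; the Catalog class and its members are hypothetical names used only to show that a missing or empty translation returns the original string.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a translation catalog.  This is not how Qt
// stores translations internally; it only models the lookup rule.
class Catalog
{
public:
    void add(const std::string &source, const std::string &translation)
    {
        table[source] = translation;
    }

    // Return the translation, or the source string itself when the
    // catalog has no non-empty entry for it.
    std::string tr(const std::string &source) const
    {
        std::map<std::string, std::string>::const_iterator it = table.find(source);
        if (it == table.end() || it->second.empty()) return source;
        return it->second;
    }

private:
    std::map<std::string, std::string> table;
};
```

With an empty catalog tr("Error") gives back "Error", and after add("Error", "Erreur") it gives "Erreur".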

The file which contains translations is an XML file with the extension "ts".  You need to add a TRANSLATIONS variable to your Qt project file (ebe.pro for ebe).  Here is my current value:

    TRANSLATIONS = ebe_fr.ts ebe_sp.ts ebe_sv.ts \
                   ebe_de.ts ebe_pt.ts ebe_hi.ts \
                   ebe_zh.ts ebe_ru.ts ebe_ar.ts \
                   ebe_bn.ts ebe_in.ts ebe_ja.ts

The project file is processed by the lupdate program to determine all the translation files and also the location and content of each tr call.  Each of the files named has a 2 character language code like "fr" for "French" which is used by the Qt lupdate program to determine the language.  If a particular .ts file does not exist it will be created with an empty translation for each tr string.  If a string is repeated within a source file, it exists as one entry in the .ts file with the line numbers for each occurrence listed with the string.  If a file already exists it is updated with changed information while retaining existing translations.

The translation process consists of replacing the empty translations with reasonable strings from the appropriate language.  This can be done with an editor (provided the editor allows entering the proper characters and accents for the language).  A better choice is to use the Qt Linguist program.  This allows you to move through the .ts file without the bother of keeping the XML format straight.  With Linguist you can type in translations or copy them from other sources.  In my case I have been using Google Translate, though many editors will allow editing in multiple languages.

So far I have translations for French, German, Hindi, Arabic, Swedish, Russian, Chinese, Spanish and Portuguese.  You can see from the TRANSLATIONS variable that I anticipate translations for Bengali, Indonesian and Japanese.  At that point I will have covered the top 10 most used languages of the world with a start toward covering more of Europe.

My main problem is the inconsistent quality of Google Translate's translations.  I really need knowledgeable people to repair my broken translations.  I have had a volunteer from Canada repair my French file and I have volunteers working on Spanish, Portuguese, Hindi, Arabic and Chinese.  I need help with Russian, Indonesian, Bengali, Japanese, German and Swedish.

In general this points out to me the need to recruit volunteers to the ebe project.  I desperately need help with documentation.  I need to complete HTML files detailing how to use ebe.  Ultimately it would be wonderful to have all the HTML files translated into the dozen or so languages selected for ebe.  There are also needs to write sample code and HTML files in the ebe library.  Finally ebe is not nearly perfect and I need a few good programmers to add features and work on the ebe C++ code.  This is an open source project so it needs volunteers.

Long term changes include adding a function/class database to provide call information within the editor and adding lessons to the system.  The goal is to make ebe into a great tool for teaching and lessons seem important.  I would also welcome any bright ideas which could further the cause.

Wednesday, July 24, 2013

Syntax highlighting for Fortran and Assembly

I finally got around to writing classes for Fortran and Assembly source code highlighting.  It was fairly easy to convert the existing Highlighter class into a base Highlighter class and a derived CppHighlighter class.  Then I copied the code to produce FortranHighlighter and AsmHighlighter.

I found a collection of Fortran 20xx keywords online which I used to replace the C++ keywords in the constructor for FortranHighlighter.  Fortran essentially throws away white space so keywords like "end do" become "enddo".  For my convenience I generated the keywords "end", "do" and "enddo" to make it work properly.  Then there was a minor bit of coding to handle Fortran comments and strings properly.
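As a rough illustration of the keyword expansion, here is a sketch in plain C++ (no Qt).  The function name expandKeywords is hypothetical, and the real FortranHighlighter constructor may build its list differently.

```cpp
#include <set>
#include <sstream>
#include <string>

// Expand a keyword list so that a two-word keyword such as "end do"
// contributes "end", "do" and the run-together form "enddo".
// Single-word keywords pass through unchanged.
std::set<std::string> expandKeywords(const std::set<std::string> &keywords)
{
    std::set<std::string> result;
    for (std::set<std::string>::const_iterator it = keywords.begin();
         it != keywords.end(); ++it) {
        std::istringstream words(*it);
        std::string word, joined;
        while (words >> word) {
            result.insert(word);   // each separate word
            joined += word;        // the form with white space removed
        }
        if (!joined.empty()) result.insert(joined);
    }
    return result;
}
```

Given the two entries "end do" and "program", the result holds "do", "end", "enddo" and "program".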

I had previously stored all the x86-64 instructions in src/assembly/instructions and this file is read into a set of strings named instructions.  So I only had to change the test for keywords to be testing for a string in the set to manage keywords properly for AsmHighlighter.  I had to simplify the comment code a little since Assembly comments all begin with ';' and go to the end of the line.  I left the string handling as it was.

As I did previously I implemented state machines for lexical analysis using gotos.  It is so easy and pretty clear when done nicely.   Now I have 70 gotos in highlighter.cpp.  It's a cheap thrill.
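For readers who have not seen the style, here is a minimal sketch of a goto-based state machine.  It is not taken from highlighter.cpp; the function name commentStart is made up, and it handles only one small task: finding where a ';' comment starts in a line of assembly while ignoring semicolons inside quoted strings.

```cpp
#include <string>

// Return the index where a ';' comment starts in one line of assembly,
// or text.size() if there is none.  Each label below is one state.
std::string::size_type commentStart(const std::string &text)
{
    std::string::size_type i = 0;
    char quote = 0;

code:                                   // state: ordinary code
    if (i == text.size()) return i;
    if (text[i] == ';') return i;       // comment begins here
    if (text[i] == '"' || text[i] == '\'') {
        quote = text[i];                // remember which quote opened
        i++;
        goto in_string;
    }
    i++;
    goto code;

in_string:                              // state: inside a quoted string
    if (i == text.size()) return i;     // unterminated string
    if (text[i] == quote) {
        i++;
        goto code;
    }
    i++;
    goto in_string;
}
```

Each state reads one character and jumps to the next state, so the control flow mirrors the state diagram directly.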

Sunday, February 17, 2013

Coping well with the lack of line numbers

One of my goals in the Qt version of ebe is to transparently support OS X assembly language as well as Linux assembly language.  There are several basic problems with using yasm under OS X.

  • OS X uses rip-relative addressing.
  • Global functions use an underscore prefix.
  • There is no debug information provided by yasm for use in gdb.

The first two problems are fairly easy to cope with.  First the rip-relative addressing only matters when you attempt to index an array or access a structure component in the data segment.  For those cases if you use load-effective-address to get the address of the array or struct into a register, this works on both Linux and OS X.

The second problem can be solved using macros.  I have prepared a set of macros which I automatically prefix to each assembly file (using yasm's -P option).  The macros add "default rel" under OS X to establish rip-relative addressing and translate each of about 350 function names to have prefixes including main, scanf, printf, ...   So the source code can use main without worrying about the need for an underscore.  There is also a cname macro which can turn any name into a macro which will have an underscore prefix under OS X and not under Linux.  So the set of macros takes care of the first 2 issues.

The lack of debugging support is not as total as it could be.  You can still find globals and addresses using nm and within gdb, but there is no way to set a breakpoint by using a line number and when gdb stops after a next instruction command it won't tell you the next line of the function to execute.

My original solution to this was to inspect the listing file and determine relative addresses for each line and then query by either gdb or nm to determine an actual address such as &main.  This is a fair amount of code and more code means a higher probability of error.

My next solution to the lack of line numbers is to make my own.  I am now generating a debug asm file from the original with each original line preceded by a generated label with a line number in the label.  To properly handle local labels, my generated labels need to be local labels, so I need to create a global label to stick at the start of the file.

So a file would start with

ebe_debug:           ; generated global label
.filename_line_1:    ; generated local label with a line number
...                  ; whatever was originally on line 1
.filename_line_2:
...                  ; whatever was originally on line 2

So with a file with 100 lines there would be 201 lines in the debug asm file and the generated instructions are the same.  Now it is possible to extract all the address information using nm for the executable and issue breakpoint commands like:

break main.filename_line_8

By choosing a nice solution I managed to get a bonus: gdb reports the location when it finishes a command with something ending in "in main.filename_line_9", so I can readily determine the filename and the line number for the next line to execute and highlight it in ebe.
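The generation step is simple enough to sketch in a few lines of C++.  This is an illustration only, not the code in ebe; the function name addLineLabels and its parameters are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: interleave generated labels with the original
// source lines.  name is the file name used inside each label.
std::vector<std::string> addLineLabels(const std::vector<std::string> &source,
                                       const std::string &name)
{
    std::vector<std::string> out;
    out.push_back("ebe_debug:");              // generated global label
    for (std::vector<std::string>::size_type i = 0; i < source.size(); i++) {
        std::ostringstream label;
        label << "." << name << "_line_" << (i + 1) << ":";
        out.push_back(label.str());           // generated local label
        out.push_back(source[i]);             // original line, unchanged
    }
    return out;
}
```

A 100 line source produces 1 + 2*100 = 201 lines, matching the count given above.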

This solution was fairly good.  It was simple, but it interfered with using macros among other issues.  For repeat macros there would be multiple occurrences of the same label.  After using this solution for a while, I reverted back to the solution requiring analysis of the listing file to determine relative addresses which are later translated to program addresses using addresses of globals in the program.  This worked smoothly.
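The address translation amounts to one line of arithmetic.  The sketch below assumes that the listing offsets for a line and for a global are measured from the same starting point; the function and parameter names are made up for illustration.

```cpp
#include <cstdint>

// lineOffset and globalOffset are relative addresses from the listing
// file; globalAddress is the address of that same global in the program
// (for example the value of &main reported by gdb or nm).
std::uint64_t lineAddress(std::uint64_t globalAddress,
                          std::uint64_t globalOffset,
                          std::uint64_t lineOffset)
{
    // Unsigned arithmetic wraps, so lines before the global also work.
    return globalAddress + (lineOffset - globalOffset);
}
```

For example, if main is at offset 0x10 in the listing and at 0x400500 in the program, a line at listing offset 0x1c lands at 0x40050c.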

I think the total solution might require identifying labels well.  Unfortunately yasm allows labels with or without colons.  With a colon a label is obviously a label.  Without a colon an instruction looks about the same.  I rounded up a relatively complete collection of x86-64 instructions and yasm pseudo-ops to store in a QSet<QString> to identify when the first word on a command is an instruction or a label.  There are roughly 1600 names in the set.  It is a truly arcane instruction set, but fortunately you can learn a fairly small subset and do a fairly good job.
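Here is a sketch of the test on the first word, written with std::unordered_set in place of QSet<QString> so that it stands alone without Qt.  The enum, the function name classify and the lower-casing of the word are my assumptions for illustration, not the code in ebe.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

enum FirstWord { Empty, Label, Instruction };

// Decide whether the first word on a line is an instruction or a label.
// instructions is assumed to hold lower-case instruction and pseudo-op names.
FirstWord classify(const std::string &line,
                   const std::unordered_set<std::string> &instructions)
{
    std::string::size_type i = 0;
    while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i]))) i++;
    std::string::size_type start = i;
    while (i < line.size() && !std::isspace(static_cast<unsigned char>(line[i]))
           && line[i] != ':' && line[i] != ';') i++;
    std::string word = line.substr(start, i - start);
    if (word.empty()) return Empty;                      // blank or comment-only line
    if (i < line.size() && line[i] == ':') return Label; // a colon settles it
    for (std::string::size_type k = 0; k < word.size(); k++)
        word[k] = static_cast<char>(std::tolower(static_cast<unsigned char>(word[k])));
    return instructions.count(word) ? Instruction : Label;
}
```

With "mov" in the set, "    mov rax, 5" is classified as an instruction while "count dq 0" is classified as a label.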

Left to deal with are handling data items with global and local labels.  I need to review this again to see if I really must identify the labels in the source.  So far I have identified the range of each global label which would determine which global label is appropriate for a variable identified by a local label.  I think this is necessary, but I hope to simplify this too.  

Wednesday, January 30, 2013

Retirement == Freedom

I have been retired for 7 or 8 months now and I think I am starting to get used to the new life.  Recently I have started feeling more energetic about programming.  It is different now.  In the past I was either working on a project with a specific goal in mind or generally too busy to focus on new programming projects.

Now don't get me wrong, I have done some interesting, fun things as parts of several projects during the last 5 years at USM.  The difference is one of degree.  Over the years I would explore interesting things for fun, but the unpaid projects were usually somewhat small.  Now whatever I decide to do is my choice and I can generally decide when to work on it.

"Toy Box" for ebe

Since about November 19, 2012 I have been working steadily rewriting the ebe integrated development environment in C++ using the Qt toolkit for the GUI controls.  The new ebe is better in many ways than the older one which was written in Python.

One feature I have been considering was to allow beginners to enter simple expressions and evaluate them the same way that the compiler would do.  I searched for C and C++ interpreters thinking that an interpreter might make a good choice for speedy computations.  During my search I came across a Python program which does live calls to g++ to "interpret" the C++ as it is entered.  I had to wonder about the speed of the compilations.

I started by compiling hello.c and hello.cpp to see how long it took.  On my Core i7 desktop it took about 0.1 seconds to compile hello.cpp while hello.c took about 0.06 seconds.   I copied hello.c to hello.cpp and it still required 0.06 seconds compiled with g++.  That meant that using printf would be superior for my needs over cout.  Compilations could be done in 0.06 seconds and executing the small program took about 0.002 seconds.  So perhaps I could get 16 compilations per second.

I knew enough about the C++ typeid to figure out the type of an expression, but in order to capture the data precisely I wanted to dump the data for the expression as hexadecimal.  That would allow me to capture floats and doubles exactly as they were computed in the expression.  It seemed a shame to write one program to determine the type of an expression and a second program to dump the value of the expression in hex.  So I kept searching.

Eventually I ran across the g++ typeof operator.  This would allow me to declare a variable of the exact type as an expression.   Consider the declaration below
     typeof((a+b+c)/3.0) x;
This declaration determines that the type of the expression is double and declares x as a double.  With this interesting feature it was easy to write code to declare a variable of the right type to assign the value of an expression and then, using sizeof, I could dump each byte of the variable.
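The idea can be sketched as follows.  The sketch uses decltype, the standard C++11 counterpart of g++'s typeof, so that it compiles outside of GNU mode; the helper names hexBytes and dumpAverage are made up, and the code ebe actually generates is not shown here.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Dump the bytes of any value as hexadecimal, in memory order.
template <typename T>
std::string hexBytes(const T &value)
{
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < sizeof(T); i++) {
        std::snprintf(buf, sizeof buf, "%02x", bytes[i]);
        out += buf;
    }
    return out;
}

// The expression from the text, with a, b and c as sample ints.
std::string dumpAverage(int a, int b, int c)
{
    decltype((a + b + c) / 3.0) x = (a + b + c) / 3.0;   // x is a double
    return hexBytes(x);
}
```

The dump has two hex digits per byte, so a double yields 2*sizeof(double) characters, and the byte order follows the machine's memory layout.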



Here you can see the "Toy Box" after defining 3 int variables and evaluating some expressions.  Line 1 of the lower table shows the original type, format and result for the expression in column 1.  In the others I have used the Format combobox to select an alternative format.  This seems like a useful tool to use when learning a language.  I feel sure that a lot of teachers would enjoy illustrating how C/C++ evaluates expressions using the toybox.  This is far superior to using an interpreter which would probably approximate the syntax and the behavior.  This tool uses the compiler so what you see is what you get.

Tuesday, January 08, 2013

A simple solution to unbuffering for a debugger

A problem which I have run into recently is the fact that pipes in Windows are not recognized as interactive and thus the C standard library routines do full buffering.  I am using two pipes to implement the stdin, stdout and stderr for C/C++ programs being debugged.  My luck was great with Linux and OS X, not so good with Windows.  On Windows I had to do fflush(stdout) to see the text printed by printf (and also by cout when using \n rather than endl).  The problem exists in the library code which inspects the attached file descriptors (or handles) and decides that this is a batch type operation and buffering perhaps 4096 bytes is perfect.

The solution I came up with is to mix C and C++.  It works just fine with gcc compiling a C program and g++ compiling a C++ class file and finally linking the 2 object files with g++.  The trick is to take advantage of the fact that C++ constructors for static objects are called before main is called.  Fortunately they are called after the library has prepared all the I/O routines for normal use.

Here is my C++ code:


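#include <cstdio>

// A static object of this class is constructed before main is called.
class __UnBuffer
{
public:
    __UnBuffer();
    int x;
};

// The constructor runs after the library has prepared stdio, so it can
// switch stdout to unbuffered mode.
__UnBuffer::__UnBuffer()
{
    printf("Here we go\n");
    setbuf(stdout,NULL);    // a NULL buffer makes stdout unbuffered
}

// The one static object whose construction triggers the unbuffering.
__UnBuffer _unBuffer;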

The constructor is called for the one __UnBuffer object which sets the buffering to unbuffered for stdout and also prints a message.  This message was printed first in  my test and the data made it through the pipe immediately.  Surprisingly I got some 1 byte messages which I will have to accommodate in the Qt ebe, but at least I have a simple solution which works well.

I ran into other folks who were writing GUI debuggers and they seemed to have difficulties.  I am not sure how they coped, but I didn't read a solution from any of those people.  Hopefully they will run into my solution and get past this stupid issue.

Friday, January 04, 2013

Ebe with toolbars

To make ebe more colorful I have added 3 toolbars.  After adding them I think they may actually be useful, though I was only trying to make it look better.


Thursday, January 03, 2013

New and improved ebe

Recently I decided to bite the bullet and rewrite the ebe integrated development environment using Qt.  I was getting quite tired of coping with the large number of windows used in the original ebe.  I also needed to update ebe so that classes and structs could be properly displayed while debugging.  It was a fairly interesting adventure.


You can see how it looks using Qt.  The various windows are now implemented using a main widget for the editing and a collection of dock widgets.  The dock widgets can be moved around and resized.  They can even be moved out of the main window if desired.  Perhaps more useful is the ability to drop dock widgets on top of each other forming a collection of tabs.  In the image above the terminal, project and back trace dock widgets are stacked making it simple to select one of the 3 to display.

The data window is now a dock widget containing a "tree" widget which has little "arrows" to the left of items which can be expanded.  You can expand class objects and arrays to drill down into the data of a program.  The tree widget turned out to be exactly what was needed to display the data.  This simplified the effort involved in revising the data display.

The terminal was previously an xterm window under Linux and a cmd.exe window under Windows.  These separate windows complicated the management of fonts.  Now you can increase or decrease the fonts in all the widgets using Control+ or Control-.

Qt had some extra capabilities which I had not expected.  There is a QCompleter class designed to aid in implementing word completion for a variety of widgets.  I used this to make editing easier.  There is a QSyntaxHighlighter class designed for implementing syntax highlighting.  I also used Qt's QTabWidget to manage editing a collection of files as tabs in the source frame.  It allows you to open all the files in a project when you start ebe.

Overall I found it easier to implement this version of ebe compared to the previous version based on Python.  It requires more effort to get started coding with Qt, but the rather large class collection ends up saving time.  It was also nice for me to use C++ which I have more experience with.

I think the program would look better with a collection of toolbars with colorful icons.  I can't think of appropriate icons for use in debugging, but I will probably add a toolbar of file/edit operations so it doesn't look so boring.