Friday, November 04, 2011

Portable Book Format

Now that I have published Kindle and Nook versions of my assembly book I have some experience with book formatting.  LaTeX produced an excellent printed book, but it required a huge number of changes to convert the book into HTML for conversion to Kindle (mobi) format.  Fortunately, the Nook format (epub) is similar enough to mobi that the Calibre program produced an acceptable epub book automatically from the mobi book.

However, I now have 2 versions of the book to maintain, which is not desirable.  In the long run it will be better to find or create a single format for a book which can be converted automatically to multiple formats.

Docbook
I started looking for an existing solution to try to save time.  There have been numerous posts on the Internet describing the problems experienced by others trying to produce technical ebooks and printed books from a common source.  The problem seems to be basically unsolved.

The first possible solution I discovered was the docbook xml format.  The docbook specification is a verbose method of specifying an XML file which can be translated into PDF using dblatex and into epub using dbtoepub.  The results are quite good except for mathematical formulas.

Mathematics rendered with docbook is fairly crude compared to the excellent rendering from LaTeX.  The epub and mobi formats are limited in equation rendering features.  Basically, in-line equations done using subscripts and superscripts are possible, but not much more.  On the other hand, my needs are generally not too demanding.  I had about 5 stand-alone equations in my book which I chose to include as images obtained from LaTeX.  The rest were done somewhat crudely using HTML sup and sub tags.

My efforts with docbook indicated that complex mathematics was not a real option with docbook.  I would still need to capture some equations as images from LaTeX.  This is not very difficult but it is a tedious process which I would prefer to automate.
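
The image capture step could be scripted along these lines.  This is only a sketch under assumptions: it assumes the latex and dvipng programs are installed and on the PATH, and the names equation_document and render_equation are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path

# Minimal LaTeX wrapper.  \pagestyle{empty} suppresses the page number so
# that tight cropping leaves only the equation.  EQUATION is a placeholder.
TEMPLATE = r"""\documentclass{article}
\usepackage{amsmath}
\pagestyle{empty}
\begin{document}
\[ EQUATION \]
\end{document}
"""


def equation_document(equation: str) -> str:
    """Return a complete LaTeX document containing one displayed equation."""
    return TEMPLATE.replace("EQUATION", equation)


def render_equation(equation: str, png_path: str, dpi: int = 150) -> None:
    """Render one equation to a PNG (assumes latex and dvipng are installed)."""
    target = str(Path(png_path).resolve())
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "eq.tex").write_text(equation_document(equation))
        # latex writes eq.dvi next to eq.tex
        subprocess.run(["latex", "-interaction=nonstopmode", "eq.tex"],
                       cwd=tmp, check=True)
        # dvipng crops the page tightly around the equation
        subprocess.run(["dvipng", "-D", str(dpi), "-T", "tight",
                        "-o", target, "eq.dvi"],
                       cwd=tmp, check=True)
```

Calling render_equation(r"a_i^2 + b_i^2 = c_i^2", "eq1.png") would then produce one image per stand-alone equation, which can be referenced from the HTML or docbook source.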

Asciidoc
About the same time that I discovered docbook I also discovered asciidoc, an alternative with a wiki-like syntax which is translated into docbook format and subsequently into epub and pdf.  Asciidoc is quite good at what it does.  However, it is not great at equation rendering.  It can use mathml, though figuring out how to use it with mathml is quite a challenge.

I spent several hours working with asciidoc and produced nice looking PDF and epub books (very short books).  It was fairly promising, but eventually I decided that I needed more control and it seemed that docbook would be more successful.

Editing Docbook Files
I studied several options for editing docbook files.  There are some nice commercial editors and I had one option immediately available with the conglomerate XML editor.  It had a nice option for starting a new book and looked promising - for a few minutes.  It crashed when I made a typing error.  I studied it a little longer and decided that it was not mature enough for my needs.

I then turned to vim.  I found a few vim files which helped with editing docbook files and started tweaking my .vimrc file to make it easier to produce mathematics in docbook format.  I kept reading more and more about docbook.  It has some capability to produce mathml, though I still was handicapped by a lack of documentation.

The XML format was quite verbose:  <mathphrase>a<subscript>i</subscript><superscript>2</superscript></mathphrase> to produce a_i^2 (a with subscript i and superscript 2).  It was not going to be pretty.
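
One way to cope with this verbosity is to generate the markup rather than type it.  Here is a minimal sketch using Python's standard xml.etree.ElementTree module.  The helper name mathphrase is my own invention, and it only handles a base symbol with an optional subscript and superscript.

```python
import xml.etree.ElementTree as ET


def mathphrase(base, sub=None, sup=None):
    """Build a docbook <mathphrase> for a symbol with optional scripts."""
    elem = ET.Element("mathphrase")
    elem.text = base
    if sub is not None:
        ET.SubElement(elem, "subscript").text = sub
    if sup is not None:
        ET.SubElement(elem, "superscript").text = sup
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(elem, encoding="unicode")


print(mathphrase("a", "i", "2"))
```

This prints the same element shown above, so a short call replaces roughly eighty characters of hand-typed XML.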

In addition to equation rendering issues I still had a few more unresolved problems.  Chief among these was the issue of rendering left and right pages with different size margins to work well in a printed book.


The Plan
After much effort I have decided to write my own translator to produce LaTeX and HTML (or docbook)  from a common format.  My plan is to use either LaTeX or superscripts and subscripts for in-line math and use either LaTeX or images produced from LaTeX for stand-alone equations.  I can produce excellent results with this plan and I don't think I could do as well with either asciidoc or docbook.
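
To make the idea concrete, here is a rough sketch of the in-line math part of such a translator.  It is not a finished design.  It assumes a made-up source convention in which in-line math is written LaTeX-style between dollar signs, and all of the function names are hypothetical.  For LaTeX output the math passes through untouched, while for HTML it becomes sub and sup tags.

```python
import html
import re

# In-line math is assumed to be delimited by dollar signs: $a_i^2$
INLINE_MATH = re.compile(r"\$([^$]+)\$")
# One subscript or superscript: _x, ^2, _{ij}, ^{n+1}
SCRIPT = re.compile(r"([_^])(?:\{([^{}]*)\}|(\w))")
# A few characters that must be escaped in LaTeX prose (not a full list)
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "#": r"\#"}


def math_to_html(expr):
    """Convert simple subscripts and superscripts to HTML sub/sup tags."""
    def repl(m):
        tag = "sub" if m.group(1) == "_" else "sup"
        body = m.group(2) if m.group(2) is not None else m.group(3)
        return f"<{tag}>{body}</{tag}>"
    return SCRIPT.sub(repl, html.escape(expr))


def escape_latex(prose):
    """Escape the handful of special characters listed above."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in prose)


def translate(text, prose_fn, math_fn):
    """Apply prose_fn outside $...$ and math_fn to the math inside."""
    parts, pos = [], 0
    for m in INLINE_MATH.finditer(text):
        parts.append(prose_fn(text[pos:m.start()]))
        parts.append(math_fn(m.group(1)))
        pos = m.end()
    parts.append(prose_fn(text[pos:]))
    return "".join(parts)


def to_html(text):
    return translate(text, html.escape, math_to_html)


def to_latex(text):
    # The source math is already LaTeX, so it only needs its delimiters back.
    return translate(text, escape_latex, lambda expr: f"${expr}$")


source = "The term $a_i^2$ is small & positive"
print(to_html(source))
print(to_latex(source))
```

The first line of output is the HTML form with sub and sup tags and an escaped ampersand.  The second is the LaTeX form with the math left intact.  Stand-alone equations would need a separate rule, for example passing them straight to LaTeX or swapping in an image for the ebook formats.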
