MOBI Format

Mobipocket file format

The Mobipocket format (.mobi) is a common format for eBook readers. In particular, it is the primary native format for Amazon's Kindle. Amazon uses a variant of the format with DRM (Digital Rights Management) features added. Amazon provides a tool called kindlegen which converts a human-readable set of files into a single .mobi file. The inputs consist of an OPF package file, an NCX navigation file, a cover image, an XHTML table of contents, and one XHTML file per chapter.

Thus the process of weaving to the MOBI format looks like this:

weaving implementation +≡
weaving toc
weaving ncx
weaving cover
weaving opf
weaving chapter xhtml
: weave ( -- )
    weave-opf
    weave-ncx
    weave-cover
    weave-toc
    weave-chapters
;

OPF files

The OPF file provided to kindlegen is the primary input file. In fact, it is the file listed as an argument when running kindlegen from the command line.

We will assume a single OPF file which will be generated into the "meaning" of a reserved atom.

weaving opf +≡
atom" ~~~OPF" constant atom-opf

We will append .opf to the document base name to select the output file.

weaving opf +≡
: opf-filename ( -- A )
    doc-base @ atom" .opf" atom+ ;
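
The same base-plus-extension pattern is used for the .ncx and .html outputs. As a rough sketch in plain standard Forth, without the program's atoms, the concatenation can be pictured with a small fixed buffer. The words name-buf, name!, name+ and name@ and the base name book are hypothetical and exist only for this illustration. The real atom+ is defined elsewhere and may work quite differently.

```forth
create name-buf 64 allot        \ scratch buffer, assumed large enough here
variable name-len

: name! ( addr u -- )           \ start the name with the given string
    dup name-len !  name-buf swap move ;
: name+ ( addr u -- )           \ append a string to the name
    dup >r  name-buf name-len @ + swap move  r> name-len +! ;
: name@ ( -- addr u )           \ the name built so far
    name-buf name-len @ ;

s" book" name!  s" .opf" name+  name@ type cr    \ prints book.opf
```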

Weaving the OPF file starts by changing the focus chunk to the reserved OPF atom.

weaving opf +≡
weaving opf manifest chapters
weaving opf chapter itemref
: weave-opf
    atom-opf documentation-chunk ! doc!

Emitting the opf header.

weaving opf +≡
.d| <?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
unique-identifier="BookId">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:opf="http://www.idpf.org/2007/opf">
|.d

Add in metadata fields about the document in general: title, ISBN, author, subject, date, and description.

weaving opf +≡
    .d{ <dc:title>} title @ doc+=$ .d{ </dc:title>} .dcr
    .d{ <dc:language>en-us</dc:language>} .dcr
    .d{ <meta name="cover" content="My_Cover"/> } .dcr
    .d{ <dc:identifier id="BookId" opf:scheme="ISBN">}
    isbn @ doc+=$ .d{ </dc:identifier>} .dcr
    .d{ <dc:creator>} author @ doc+=$ .d{ </dc:creator>} .dcr
    .d{ <dc:publisher>} author @ doc+=$ .d{ </dc:publisher>} .dcr
    .d{ <dc:subject>} subject @ doc+=$ .d{ </dc:subject>} .dcr
    .d{ <dc:date>} doc-date @ doc+=$ .d{ </dc:date>} .dcr
    .d{ <dc:description>} description @ doc+=$ .d{ </dc:description>} .dcr
.d|
</metadata>

Then add the manifest, which lists every file in the book: the NCX, the table of contents, each chapter, and the cover image.

weaving opf +≡
<manifest>
   <item id="My_Table_of_Contents" media-type="application/x-dtbncx+xml"
   href="|.d ncx-filename doc+=$ .d| "/>
  <item id="toc" media-type="application/xhtml+xml" href="|.d
    toc-filename doc+=$ .d{ "></item>}
    chapters @ begin dup while
        dup chapter-filename opf-chapter ->next
    repeat drop
    .d{ <item id="My_Cover" media-type="image/gif"} .dcr
    .d{  href="} cover-filename doc+=$ .d{ "/>} .dcr
    .d{ </manifest>}

One entry per chapter.

weaving opf manifest chapters +≡
: opf-chapter ( A -- )
    .d{ <item id="}
    dup doc+=$
    .d{ " media-type="application/xhtml+xml" href="}
    doc+=$
    .d{ "></item>} .dcr
;

Then list the TOC and each chapter again in the spine, which fixes the reading order.

weaving opf +≡
    .d{ <spine toc="My_Table_of_Contents"><itemref idref="toc"/>}
    chapters @ begin dup while
        dup chapter-filename opf-chapter' ->next
    repeat drop
   .d{ </spine>}

Each itemref in the spine looks like this.

weaving opf chapter itemref +≡
: opf-chapter' ( A -- )
    .d{ <itemref idref="} doc+=$ .d{ "/>} .dcr ;
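
Each spine itemref refers back to a manifest item through its id, which is why both words are fed the same chapter filename. The following is a minimal sketch in standard Forth that prints to the terminal instead of accumulating into the documentation chunk. The words q, .name, .item and .itemref and the base name book are assumptions made for this illustration only.

```forth
: q ( -- ) [char] " emit ;                  \ emit one double quote

: .name ( n -- )                            \ e.g. book_0000.html
    ." book_" s>d <# # # # #s #> type ." .html" ;

: .item ( n -- )                            \ one manifest entry
    ." <item id=" q dup .name q
    ."  media-type=" q ." application/xhtml+xml" q
    ."  href=" q .name q ." ></item>" cr ;

: .itemref ( n -- )                         \ the matching spine entry
    ." <itemref idref=" q .name q ." />" cr ;

0 .item       \ prints <item id="book_0000.html" media-type="application/xhtml+xml" href="book_0000.html"></item>
0 .itemref    \ prints <itemref idref="book_0000.html"/>
```

Using the filename as the id keeps the two lists in step without extra bookkeeping, provided the document base name is a legal start for an XML id.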

Finally the guide can just consist of the table of contents.

weaving opf +≡
.d|
<guide>
  <reference type="toc" title="Table of Contents"
   href="|.d toc-filename doc+=$ .d| "></reference>
</guide>
</package>
|.d

Then write out the file.

weaving opf +≡
   documentation means opf-filename file!
;

NCX files

The NCX file lists each chapter again to define the navigation points in the document.

As with the OPF, accumulate into the "meaning" of a reserved atom.

weaving ncx +≡
atom" ~~~NCX" constant atom-ncx

Output to the document base with .ncx appended.

weaving ncx +≡
: ncx-filename ( -- A )
    doc-base @ atom" .ncx" atom+ ;

We then can write to the reserved atom.

weaving ncx +≡
weaving ncx chapter
: weave-ncx
    atom-ncx documentation-chunk ! doc!

Writing out the ncx header.

weaving ncx +≡
.d| <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
"http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/"
 version="2005-1" xml:lang="en-US">
<head>
<meta name="dtb:uid" content="BookId"/>
<meta name="dtb:depth" content="2"/>
<meta name="dtb:totalPageCount" content="0"/>
<meta name="dtb:maxPageNumber" content="0"/>
</head>

Include a few fields such as title and author.

weaving ncx +≡
|.d
.d{ <docTitle><text>} title @ doc+=$
.d{ </text></docTitle>} .dcr
.d{ <docAuthor><text>} author @ doc+=$
.d| </text></docAuthor>

Then the main navmap.

weaving ncx +≡
  <navMap>
    <navPoint class="toc" id="toc" playOrder="1">
      <navLabel>
        <text>Table of Contents</text>
      </navLabel>

Add in the table of contents.

weaving ncx +≡
     <content src="|.d toc-filename doc+=$ .d| "/>
     </navPoint>
|.d

And each chapter.

weaving ncx +≡
    chapters @ begin dup while
    dup weave-ncx-chapter ->next repeat drop

A chapter looks like this.

weaving ncx chapter +≡
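: weave-ncx-chapter ( chapter -- )
   .d{ <navPoint class="chapter" id="}
    dup chapter-filename doc+=$
    .d{ " playOrder="}
    \ playOrder must be a positive integer. The TOC navPoint takes 1, so
    \ chapters start at 2 (assuming chapter numbers start at 0).
    dup chapter-number 2 + s>d <# #s #> atom doc+=$
    .d{ "><navLabel><text>}
    dup chapter-name doc+=$
    .d{ </text></navLabel><content src="}
    chapter-filename doc+=$
    .d{ "/></navPoint>}
;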
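
NCX readers expect playOrder to be a positive integer giving the reading sequence of each navPoint. The sketch below shows one way to number the chapters, assuming the table of contents navPoint keeps playOrder 1 and chapter numbers start at 0. The words play-order and .play-orders are hypothetical helpers, not part of this program.

```forth
\ Map a zero-based chapter number to its playOrder value.
\ The offset of 2 leaves playOrder 1 for the table of contents.
: play-order ( chapter# -- n )  2 + ;

\ Print the playOrder values for the first count chapters.
: .play-orders ( count -- )
    0 ?do  i play-order 0 .r  space  loop ;

3 .play-orders cr     \ prints 2 3 4
```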

Then close out the file and write it.

weaving ncx +≡
    .d{ </navMap></ncx>}
    documentation means ncx-filename file!
;

Table of contents

The table of contents is an XHTML file like the chapters. XHTML is HTML written as strictly well-formed XML. We use a subset that is constrained by MOBI's limitations.

We will accumulate the table of contents to a reserved atom.

weaving toc +≡
atom" ~~~TOC" constant atom-toc

And write this to a filename based on the document base with the .html extension added.

weaving toc +≡
: toc-filename ( -- A )
    doc-base @ atom" .html" atom+ ;

We change the focus chunk to the TOC.

weaving toc +≡
weaving toc chapter
: weave-toc
    atom-toc documentation-chunk ! doc!

Then write out the header for the TOC.

weaving toc +≡
.d| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Table of Contents</title></head>
<body>
<div>
  <h1><b>TABLE OF CONTENTS</b></h1>
|.d

Then write out each chapter.

weaving toc +≡
    chapters @ begin dup while
    dup weave-toc-chapter ->next repeat drop

Where a chapter looks like this.

weaving toc chapter +≡
: weave-toc-chapter ( chapter -- )
    .d{ <h4><b><a href="}
    dup chapter-filename doc+=$
    .d{ ">}
    chapter-name doc+=$
    .d{ </a></b></h4>} .dcr
 ;

Then close out the TOC and write it out.

weaving toc +≡
    .d{ </div></body></html>} .dcr

    documentation means toc-filename file!
;

Chapter HTML

Each chapter is accumulated into a linked list of chapters.

chapter implementation +≡
variable slide-chapter
variable chapter-count
linked-list chapters

Accessors for chapters are provided.

chapter implementation +≡
: chapter-name ( chp -- A )
    cell+ @ ;
: chapter-text ( chp -- A )
    cell+ @ means ;
: chapter-number ( chp -- n )
    2 cells + @ ;
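
The accessors imply a node layout of three cells: a link, the chapter name, and the chapter number. As a rough sketch of that layout and of the traversal loop used throughout the weaving words, here is a standalone version in standard Forth. It is an assumption about how linked-list, chain and ->next might behave, not their actual definitions, and the numbers 111 and 222 merely stand in for name atoms.

```forth
variable chapters    0 chapters !           \ head of the list, 0 while empty
variable last-link   chapters last-link !   \ cell the next node is linked from

: add-chapter ( name number -- )
    align here >r          \ address of the new node
    0 ,                    \ cell 0: link, 0 marks the end of the list
    swap , ,               \ cell 1: name, cell 2: number
    r@ last-link @ !       \ hook the node onto the head or the previous node
    r> last-link ! ;       \ the new node's link cell is patched next time

: ->next         ( chp -- chp' ) @ ;
: chapter-name   ( chp -- x )    cell+ @ ;
: chapter-number ( chp -- n )    2 cells + @ ;

: .chapter-numbers ( -- )
    chapters @ begin dup while
        dup chapter-number .  ->next
    repeat drop ;

111 0 add-chapter   222 1 add-chapter
.chapter-numbers cr      \ prints 0 1
```

Appending at the tail keeps the chapters in the order they were declared, which is the order the manifest, spine and navMap need.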

Chapters are output to the base document name, followed by an underscore, then a zero-padded chapter number of at least four digits, and .html at the end.

chapter implementation +≡
atom" .html" constant .html
: chapter-filename ( chp -- A )
     chapter-number s>d <# # # # #s #> atom
     doc-base @ atom" _" atom+ swap .html atom+ atom+ ;
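
The zero padding comes from the pictured numeric output sequence: three # words force three digits, and #s then converts whatever remains, always producing at least one more digit. A minimal standalone sketch, where pad4 is a hypothetical name used only here:

```forth
\ Convert n to a string of at least four digits, padded with zeros.
: pad4 ( n -- addr u )
    s>d <# # # # #s #> ;   \ three forced digits, then the remaining ones

7 pad4 type cr         \ prints 0007
123 pad4 type cr       \ prints 0123
12345 pad4 type cr     \ prints 12345
```

Fixed-width numbers make the chapter files sort in reading order in a directory listing, at least up to 9999 chapters.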

A raw chapter can be either normal or for slides. It is added to the list of chapters.

chapter implementation +≡
chapter implementation finish
: raw-chapter ( -- )
     chapter-finish
     parse-cr
     chapter-count @   1 chapter-count +!
     over 2 chapters chain
     dup documentation-chunk ! doc!

Then the XHTML header is written.

chapter implementation +≡
.d| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
<head>
|.d

Then potentially slide show JavaScript.

chapter implementation +≡
slide-chapter @ if
slide show logic
then

Then some CSS.

chapter implementation +≡
.d|
<style type="text/css">
  div.chunk {
    margin: 0em 0.5em;
  }
  pre {
    margin: 0em 0em;
  }
|.d

Potentially with a page break for slide shows.

chapter implementation +≡
slide-chapter @ if
.d|
  div.section {
    page-break-before: always;
  }
|.d
then

Finally chapter headings.

chapter implementation +≡
.d|
</style>
<title>|.d
    dup doc+=$
    .d{ </title></head>}
    slide-chapter @ if .d{ <body onload="Load()">} else .d{ <body>} then
    .d{ <div class="section"><h1>}
    doc+=$
    .d{ </h1><p>}

    feed
;

Each chapter also has a short footer.

chapter implementation finish +≡
: chapter-finish   .d{ </p></div></body></html>} ;

This allows us to construct each chapter.

weaving chapter xhtml +≡
: weave-chapter ( chapter -- )
    dup chapter-text swap chapter-filename file! ;
: weave-chapters
    chapters @ begin dup while
    dup weave-chapter ->next repeat drop ;