Courses/EPUB/day1

From Publication Station

day 1

what is an EPUB?

EPUB is:

A format for representing electronic text and images.

Other formats that allow for the representation of electronic text and images – ebooks for short – are PDF, Mobi, AZW, Comic Book Archive, HTML, LaTeX, etc.

The EPUB format was developed by the International Digital Publishing Forum (IDPF); since 2017 it has been maintained by the W3C.

EPUB characteristics

  • a zip archive, renamed .epub instead of .zip
  • based on web technologies: HTML, XML, CSS, JavaScript, and TTF and WOFF fonts
  • currently has 2 versions: EPUB 2 and EPUB 3
    • EPUB 3 introduced HTML5 (video, audio, canvas tags), elaborate CSS, and scripting (JavaScript)
    • For the most part, EPUB 3 is backward compatible with EPUB 2
    • Most e-readers are only prepared for EPUB 2; however, with some limitations, they can read EPUB 3
    • The Readium reader is meant as "a reference system for rendering EPUB 3"

reflowable format

EPUBs represent a paradigm shift away from the page-centered representations of text.

The page is no longer the unit or canvas. The concept of the page is simply not present in EPUBs.

Instead the unit is the screen, whose size changes from device to device.

As EPUBs are meant to be read on screens, they need to adapt and reflow themselves to any screen resolution or orientation.

http://www.freesoftwaremagazine.com/files/nodes/3396/fig_zoo_of_screen_sizes.jpg Source:http://www.freesoftwaremagazine.com/

e-reading devices

  • E-ink (a type of Electronic paper): Kindle, Nook, Kobo.
  • LCD: smartphones, tablets, desktop and laptop computers

Source: From Print to Ebooks: A Hybrid Publishing Toolkit for the Arts

Note: Kindle devices don't support EPUB; they read only Amazon's proprietary ebook formats, AZW or Mobi.

e-reading software

  • PC:
  • iOS: iBooks
  • Android: Aldiko, Moon+ Reader

sources of EPUBs

changing EPUBs

EPUBs have one advantage over other ebook formats, and over other digital media such as audio, video, or images.

EPUBs are fully changeable.


The files contained in an EPUB are, with the exception of media such as images and fonts, plain text files in XHTML, CSS or XML format.

These files can be opened, edited and saved with any text editor (as opposed to a word processor).

Changes can be made to the:

  • the text content of the book
  • the images used in the book
  • the metadata
  • the presentation layout: CSS (including fonts)
  • the order of the book: the EPUB's spine

Once all changes are done, the source files can be bundled once more into a zip archive with the extension .epub

changing ePubs in Calibre

Calibre nowadays comes with an Edit book function that:

  • exposes the ePub's source files
  • allows the user to introduce changes to the ePub
  • can also be used to create ePubs from scratch

Such ease in manipulating ebooks is something that will probably fuel many future discussions on the nature of the book:

  • is a version of a book still the same book?
  • will there be any possibility to distinguish the original from its versions?
  • can this feature be used to enrich and expand what the book is?

EPUB anatomy

a zip archive

An EPUB file is a ZIP archive, with the filename extension '.epub' instead of '.zip'.

"[EPUB] is a publication format, and as such it specifies and documents a host of things that publications need to include—content documents, style sheets, images, media, scripts, fonts, and more" (Garrish, 2013)

It is a web-site in a box.
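Because an EPUB is an ordinary ZIP archive, any ZIP library can look inside one. The following is a minimal sketch using Python's standard zipfile module. The tiny archive it builds in memory is demo data, and the helper names build_demo_epub and list_epub_entries are made up for this illustration.

```python
import io
import zipfile


def build_demo_epub() -> bytes:
    """Build a tiny in-memory archive laid out like an EPUB (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/ch001.xhtml", "<html/>")
    return buffer.getvalue()


def list_epub_entries(epub_file) -> list[str]:
    """Return the names of all files stored in the EPUB, in archive order."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# A path to a real .epub file could be passed instead of the in-memory demo.
entries = list_epub_entries(io.BytesIO(build_demo_epub()))
for name in entries:
    print(name)
```

With a real book, passing the path of the .epub file to list_epub_entries should list its source files in the same way.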

Zip EPUB.png


inside the .epub archive

Inside EPUB.svg


META-INF folder

Usually contains only the container.xml file.

This file points reading systems to the package document (content.opf).

container.xml source:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
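As a rough illustration of how a reading system might follow this pointer, the sketch below parses the container.xml shown above with Python's standard library and reads the full-path attribute. The function name find_package_path is hypothetical; real reading systems may do this differently.

```python
import xml.etree.ElementTree as ET

# The container.xml from the example above, copied as a string.
CONTAINER_XML = """<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Prefix "c" is an arbitrary local alias for the container namespace.
NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml: str) -> str:
    """Return the path of the package document declared in container.xml."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]


package_path = find_package_path(CONTAINER_XML)
print(package_path)
```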


mimetype

A one-line file that indicates the type of application the .epub is:

application/epub+zip

It informs the reading system that the file it is reading is in the application/epub+zip format.
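The check below is a small sketch of how the mimetype file could be inspected with Python's zipfile module. It assumes the usual EPUB container rules, which are not spelled out on this page: mimetype is the first entry of the archive, is stored uncompressed, and contains exactly the string application/epub+zip. The names mimetype_problems and demo_archive are invented for the example.

```python
import io
import zipfile

EXPECTED = b"application/epub+zip"


def mimetype_problems(epub_file) -> list[str]:
    """Return a list of problems found with the mimetype entry (empty if none)."""
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        infos = archive.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry in the archive")
            return problems
        if infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if archive.read("mimetype") != EXPECTED:
            problems.append("mimetype has unexpected content")
    return problems


def demo_archive(names):
    """Build an in-memory archive with the given entry order (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            data = EXPECTED if name == "mimetype" else b"<placeholder/>"
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


good = mimetype_problems(demo_archive(["mimetype", "OEBPS/content.opf"]))
bad = mimetype_problems(demo_archive(["OEBPS/content.opf", "mimetype"]))
print(good)
print(bad)
```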


OEBPS folder

Container for all of the EPUB's contents.

This folder is not required, but it is commonly used in EPUBs.

Sub-folders can be created inside it: e.g. for images, fonts or content documents.


content.opf

Often called the package document, content.opf is the epicenter of every EPUB.

If an EPUB is a website in a box, content.opf is the file that organizes all the stuff in the box.

Skeleton of a content.opf file:

<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
  <metadata>
    ...
  </metadata>
  <manifest>
    ...
  </manifest>
  <spine>
    ...
  </spine>
  <guide>
    ...
  </guide>
</package>

The content.opf file is divided into different sections:


metadata

Contains all the metadata describing the publication in question, such as title, author, publisher, description, date, identifier (e.g. an ISBN), and language.

A rich set of metadata helps reading systems sort and display the EPUB.

In a larger context, such as ebook repositories, rich and accurate metadata is essential for the books to be discovered, read, and reused.

 
 <metadata>
<dc:identifier id="epub-id-1">urn:isbn:978-90-822345-4-1</dc:identifier>
<meta refines="#epub-id-1" property="identifier-type" scheme="onix:codelist5">01</meta>
<dc:title id="epub-title-1">Work Title</dc:title>
<meta refines="#epub-title-1" property="title-type">main</meta>
<dc:publisher>Great Publisher</dc:publisher>    
<dc:date id="epub-date">2014-12</dc:date>
<dc:language>en-US</dc:language>

<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>

<dc:contributor id="epub-contributor-1">Monty Cantsin</dc:contributor>
<meta refines="#epub-contributor-1" property="role" scheme="marc:relators">edt</meta>

<dc:contributor id="epub-contributor-5">Alan Smithee</dc:contributor>
<meta refines="#epub-contributor-5" property="role" scheme="marc:relators">dsr</meta>

<dc:rights>Creative Commons Attribution, NonCommercial, ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)</dc:rights>

<dc:subject>electronic publishing, e-publishing, ebooks</dc:subject>

<dc:description>Electronic publishing has become an essential medium for the field of contemporary arts and design.</dc:description>
</metadata>


Dublin Core terms (advanced)

The metadata uses Dublin Core terms.

Dublin Core terms constitute a metadata scheme that is intended to provide a simple and limited vocabulary for describing web resources, hence its adoption in EPUB.

<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>

Inside the ePub, Dublin Core structures the metadata by pairing a DC term with a value.

To produce more accurate metadata, the DC terms are refined with a <meta> element.

In the example above:

  • the DC term is "creator", and it has the value Luther Blisset
  • the <meta scheme="marc:relators"> element specifies that this creator is an author: "aut"
  • "creator" is one of the Dublin Core terms
  • "aut" is one of the MARC Relator terms made available by the Library of Congress. In conjunction with Dublin Core terms, they provide a more detailed description.

The combination of the more generic DC terms, with the more specific MARC Relator terms can provide a very rich, detailed, and yet generic set of title level metadata.

Common DC terms used are: creator, title, identifier, language, description, subject, publisher


Note: in order to use the Dublin Core terms, their namespace must be declared, either in the package tag:

<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">

Or the metadata tag:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">


manifest

All the resources (files) used in the EPUB are declared in the manifest.

If a resource is not declared in the manifest, it is not considered part of the EPUB.

<manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="stylesheet.css" id="style" media-type="text/css" />
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
    <item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
    <item href="UbuntuMono-B.ttf" id="UbuntuMono-B_ttf" media-type="application/x-font-truetype" />
</manifest>


spine

Stipulates the reading order of the different EPUB content documents (XHTML or SVG files).

  <spine toc="ncx">
    <itemref idref="cover_xhtml" linear="yes" />
    <itemref idref="title_page_xhtml" linear="yes" />
    <itemref idref="nav" linear="no" />
    <itemref idref="ch001_xhtml" />
    <itemref idref="ch002_xhtml" />
  </spine>

Note: linear="no" marks a content document as auxiliary: it is not part of the main reading order, so reading systems may skip it when the reader pages through the ePub.
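The relation between manifest and spine can be sketched in a few lines: each itemref in the spine carries an idref that must match the id of a manifest item, and that item's href is the file to open. The example below uses a shortened package document based on the samples above; reading_order is a made-up helper name, and treating a missing linear attribute as "yes" follows the default in the EPUB specification.

```python
import xml.etree.ElementTree as ET

# Shortened package document based on the manifest and spine samples above.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
    <item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
  </manifest>
  <spine>
    <itemref idref="cover_xhtml" linear="yes" />
    <itemref idref="nav" linear="no" />
    <itemref idref="ch001_xhtml" />
    <itemref idref="ch002_xhtml" />
  </spine>
</package>"""

NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str) -> list[tuple[str, bool]]:
    """Return (href, is_linear) for every spine entry, in spine order."""
    root = ET.fromstring(opf_xml)
    # Index the manifest by id so spine idrefs can be resolved to files.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.iterfind("opf:manifest/opf:item", NS)
    }
    order = []
    for itemref in root.iterfind("opf:spine/opf:itemref", NS):
        idref = itemref.get("idref")
        if idref not in hrefs:
            raise ValueError(f"spine refers to unknown manifest id: {idref}")
        order.append((hrefs[idref], itemref.get("linear", "yes") != "no"))
    return order


order = reading_order(OPF)
for href, is_linear in order:
    print(href, "linear" if is_linear else "non-linear")
```

A spine entry whose idref has no matching manifest item raises an error here, which is one quick way to catch a file that was added to the spine but forgotten in the manifest.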


guide

The guide is deprecated in EPUB 3; however, you might still come across it.

It often points to the nav file.

Navigation - table of contents

The navigation document is a hyperlinked table of contents that allows the reader to quickly reach the different sections of an ePub.

In EPUB 2 the navigation document was toc.ncx, written in XML.

EPUB 3 uses a file, usually named nav.xhtml, which is a slightly simpler document, based on HTML, with common tags such as

<ol>, <li>, <nav>

In content.opf's manifest the navigation file is referred to as:

<item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />


For backward compatibility reasons (most ereaders are only prepared to read EPUB 2), the two files might both be present in the same ePub publication.

A nav.xhtml:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</title>
    <link rel="stylesheet" type="text/css" href="stylesheet.css" />
  </head>
  <body>
    <nav epub:type="toc">
      <h1 id="toc-title">From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</h1>
      <ol class="toc">
        <li id="toc-li-1">
          <a href="ch001.xhtml">Colophon</a>
        </li>
        <li id="toc-li-2">
          <a href="ch002.xhtml">1 Introduction</a>
          <ol class="toc">
            <li id="toc-li-3">
              <a href="ch003.xhtml">Industry promises vs. reality</a>
            </li>
            <li id="toc-li-4">
              <a href="ch004.xhtml">What this Toolkit provides</a>
            </li>
          </ol>
        </li>
        <li id="toc-li-8">
          <a href="ch008.xhtml">2 The basics</a>
          <ol class="toc">
            <li id="toc-li-9">
              <a href="ch009.xhtml">Layout and structure of a text</a>
            </li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>
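Since nav.xhtml is plain XHTML, its nested <ol> lists can be walked with any XML parser. The sketch below reads a shortened version of the navigation document above and returns each entry with its nesting depth. The helper names walk and read_toc are assumptions of this example, not part of any EPUB tool.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the nav.xhtml above.
NAV = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc">
      <ol class="toc">
        <li><a href="ch001.xhtml">Colophon</a></li>
        <li>
          <a href="ch002.xhtml">1 Introduction</a>
          <ol class="toc">
            <li><a href="ch003.xhtml">Industry promises vs. reality</a></li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>"""

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def walk(ol, depth=0):
    """Collect (depth, label, href) for every link in a nested <ol>."""
    entries = []
    for li in ol.findall(XHTML + "li"):
        link = li.find(XHTML + "a")
        if link is not None:
            entries.append((depth, link.text, link.get("href")))
        child = li.find(XHTML + "ol")
        if child is not None:
            entries.extend(walk(child, depth + 1))
    return entries


def read_toc(nav_xml: str):
    """Find the <nav epub:type="toc"> element and flatten its list."""
    root = ET.fromstring(nav_xml)
    for nav in root.iter(XHTML + "nav"):
        if "toc" in nav.get(EPUB_TYPE, "").split():
            ol = nav.find(XHTML + "ol")
            return walk(ol) if ol is not None else []
    return []


toc = read_toc(NAV)
for depth, label, href in toc:
    print("  " * depth + f"{label} -> {href}")
```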


content document .xhtml

Content documents are the place-holders for text and images in an ePub.

In EPUB 3, content documents are XHTML5 (HTML5 written as XML), with the .xhtml file extension.

What you find in an ePub content document is essentially what you find behind a webpage: HTML code.

EPUB 3 allows the use of HTML5, including elements such as <section>, <article>, <nav>, <header>, <footer>, <aside>, <video>, <audio> and Mathematical Markup Language (MathML). However, keep in mind that some of these tags won't work with many of the current ereaders, which are only prepared to render EPUB 2 ePubs.


As a rule of thumb, it is good to chunk the content into several documents, one for each section or subsection of a book. The reason is that ereaders are often slow and take their time to render content, so dividing the content into smaller pieces makes their job easier.

XML requirements

As content documents are HTML written as XML, they obey the stricter XML rules:

  • Every element has an opening and a closing tag.
  • Empty elements such as <br> or <img> must be self-closing: <br />
  • Element and attribute names are case-sensitive and always lowercase.
  • Namespaces must be declared (for embedded MathML, SVG, EPUB elements and attributes, etc.).
  • Ampersands (&) must be escaped as &amp;.
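A quick way to see these rules in action is to feed small fragments to an XML parser, which rejects anything that is not well-formed. The sketch below uses Python's built-in parser; is_well_formed is a hypothetical helper, and passing this check says nothing about whether a document is valid EPUB, only that it is well-formed XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Each pair shows a fragment browsers tolerate as HTML and its XML-safe form.
samples = [
    "<p>Line one<br>Line two</p>",      # unclosed empty element
    "<p>Line one<br />Line two</p>",    # self-closing form
    "<p>Text & images</p>",             # bare ampersand
    "<p>Text &amp; images</p>",         # escaped ampersand
]

results = [is_well_formed(sample) for sample in samples]
for sample, ok in zip(samples, results):
    print("ok " if ok else "bad", sample)
```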

Example of a content document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8" />
    <meta content="pandoc" name="generator" />
    <title>Acknowledgments</title>
    <link href="stylesheet.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">
      <h1>Acknowledgments</h1>
      <p>In fall 2012, as part of my graduate studies, I began studying Instagram user self-portraits. The users whose images I discuss in this notebook may or may not have knowledge that I critiqued their selfies, as I did not interview users about their images. My study is assumptive, and my work is influenced by the research of Marshall McLuhan and Vilém Flusser. Thank you to the users who kindly gave me permission to republish their images.</p>
    </section>
  </body>
</html>


semantic inflection

It is possible to enrich the content documents with semantic inflection. In other words, to attach to an element information about its purpose or nature.

Semantic inflections use the epub:type attribute to describe the content contained within that element. The terms that can be used in these inflections can be found at http://www.idpf.org/epub/vocab/structure/

In the previous example epub:type appears in <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">. It informs us, and the reading system that the current section belongs to the book's frontmatter and constitutes the acknowledgments.
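As an illustration of how software can read these inflections, the sketch below collects every element carrying an epub:type attribute from a shortened version of the content document above. The function name semantic_inflections is made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the acknowledgments content document above.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">
      <h1>Acknowledgments</h1>
    </section>
  </body>
</html>"""

# ElementTree spells namespaced attributes as {namespace}localname.
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def semantic_inflections(xhtml: str):
    """Return (tag, id, [terms]) for every element with an epub:type attribute."""
    root = ET.fromstring(xhtml)
    found = []
    for element in root.iter():
        value = element.get(EPUB_TYPE)
        if value:
            tag = element.tag.split("}")[-1]  # drop the namespace prefix
            found.append((tag, element.get("id"), value.split()))
    return found


inflections = semantic_inflections(DOC)
for tag, element_id, terms in inflections:
    print(tag, element_id, terms)
```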


images

Images can be either raster (jpg or png) or SVG. However, it is safer to opt for a raster image, as it is more likely that all ereaders will render it.

They need to be included in the package document's (.opf) manifest:

<manifest>
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
    <item id="file1_jpg" href="media/file1.jpg" media-type="image/jpeg" />
    ...
</manifest>


cover

A cover is essentially an image, either a raster image (jpg or png) or an SVG. However, it is safer to opt for a raster image.

Within the package document (.opf), the cover image is declared in the manifest as an item with properties="cover-image", so that reading systems identify the image in question as the cover of the ePub.

<item id="cover-img" href="cover.jpg" media-type="image/jpeg" properties="cover-image"/>
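A reading system could locate the cover roughly as sketched below: scan the manifest for the item whose properties attribute includes cover-image and take its href. The package document is a minimal stand-in built around the item above, and find_cover_href is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in package document containing the cover item shown above.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="cover-img" href="cover.jpg" media-type="image/jpeg" properties="cover-image"/>
    <item id="ch001_xhtml" href="ch001.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
</package>"""

NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_href(opf_xml: str):
    """Return the href of the manifest item flagged as cover image, or None."""
    root = ET.fromstring(opf_xml)
    for item in root.iterfind("opf:manifest/opf:item", NS):
        # properties is a space-separated list, so split before comparing.
        if "cover-image" in item.get("properties", "").split():
            return item.get("href")
    return None


cover_href = find_cover_href(OPF)
print(cover_href)
```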

If you want the cover to also appear as the first content the reader sees when she starts the book, make sure to include a content document that includes the image:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <link rel="stylesheet" type="text/css" href="stylesheet.css" />
</head>
<body>
<div id="cover-image">
<img src="media/cover.png" alt="cover image" />
</div>
</body>
</html>


style

CSS style sheets need to be declared in the package document's (.opf) manifest and linked to in every content document's <head>:

<link rel="stylesheet" type="text/css" href="stylesheet.css" />


fonts

WOFF and OTF fonts can also be used, although few ereaders will interpret them correctly.

To include custom fonts, they need to be declared in the package document's (.opf) manifest, as a resource used in the publication:

<manifest>
    <item href="OpenSans-Regular.ttf" id="OpenSans-Regular_ttf" media-type="application/x-font-truetype" />
    <item href="OpenSans-LightItalic.ttf" id="OpenSans-LightItalic_ttf" media-type="application/x-font-truetype" />
    ...
</manifest>

zip all the files into an EPUB

If you are currently in the folder where all the content of the EPUB is, you can run on the command line:

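zip -X0 book.epub mimetype
zip -Xr9 book.epub * -x mimetype
  • zip is the command that creates the archive
  • book.epub is the destination EPUB file
  • mimetype is added first and on its own, so that reading systems can tell from the start of the archive how the ebook is formatted
  • -0 stores mimetype uncompressed, as the EPUB specification requires; -X leaves out extra file attributes
  • * is a wildcard that stands for every file and folder; -x mimetype keeps that file from being added a second time
  • -r (recursive) makes sure all the files within sub-folders are included in the zip; -9 applies maximum compression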
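The same packaging step can also be done without the zip command. The sketch below uses Python's zipfile module and follows the usual EPUB container rule that mimetype is written first and uncompressed, with every other file added afterwards. The function name pack_epub and the throwaway folder it packs are assumptions made for the demonstration.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_epub(source_dir, epub_path) -> None:
    """Zip the contents of source_dir into epub_path, mimetype first."""
    source = Path(source_dir)
    target = Path(epub_path)
    with zipfile.ZipFile(target, "w") as archive:
        # mimetype goes in first and is stored without compression.
        archive.write(source / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            relative = path.relative_to(source).as_posix()
            if not path.is_file() or relative == "mimetype":
                continue
            if path.resolve() == target.resolve():
                continue  # never add the archive to itself
            archive.write(path, relative, compress_type=zipfile.ZIP_DEFLATED)


# Demo: pack a throwaway folder laid out like an EPUB, then inspect the result.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "book"
    (source / "META-INF").mkdir(parents=True)
    (source / "OEBPS").mkdir()
    (source / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (source / "META-INF" / "container.xml").write_text("<container/>", encoding="utf-8")
    (source / "OEBPS" / "content.opf").write_text("<package/>", encoding="utf-8")
    target = Path(tmp) / "book.epub"
    pack_epub(source, target)
    with zipfile.ZipFile(target) as archive:
        packed = [(info.filename, info.compress_type) for info in archive.infolist()]

for name, compress_type in packed:
    print(name, "stored" if compress_type == zipfile.ZIP_STORED else "deflated")
```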

Validate

Check the health of your EPUB at http://validator.idpf.org/

experiment

Find interesting written or visual content, which you consider suitable for an ebook.

The content can be in any digital form - word document, HTML webpage, plain-text, etc.

Examples of interesting content recontextualization:

  • Kenneth Goldsmith, Uncreative Writing: Transcribing Project Runway: a chat-room log transcribing the action of Project Runway
  • Report from the Desert, published by Grey Scale Press: a book made from an abandoned blog
  • Silvio Lorusso and Sebastian Schmieg: 56 Broken Kindle Screens


Bibliography

D. P. T. Collective. From Print to Ebooks: A Hybrid Publishing Toolkit for the Arts. Vol. 1. Institute of Network Cultures, 2014. http://networkcultures.org/blog/publication/from-print-to-ebooks-a-hybrid-publishing-toolkit-for-the-arts/.
Garrish, Matt, and Markus Gylling. EPUB 3 Best Practices. Vol. 1. O’Reilly Media, Inc., 2013.
McGuire, Hugh, and Brian O’Leary. Book: A Futurist’s Manifesto. Vol. 1. O’Reilly Media, 2012. http://book.pressbooks.com/.



ePubs and designers

Is there anything for designers in the EPUB format?

Instead of thinking of the EPUB design process as similar to designing a paper book, think of it as closer to developing a webpage:

  • content must be reflowable: it must adapt itself to the dimensions of the reading device.
  • design is no longer specific to the physical characteristics of one container (the ereader), but generic, in the sense that it has to adapt to several containers.
  • when the rendering changes from container to container, the structuring of the text cannot rely on visual structuring. Structure has to be semantic: the document has to be semantically tagged so that the text's different elements (headings, footnotes, body text) are explicit. If that is the case, even if an ereader changes the rendering of your (visual) design, the semantic tagging (or design) will persist. The ereader will render a heading as a heading, not simply as a piece of body text.