Difference between revisions of "Courses/EPUB"

From Publication Station

Revision as of 11:21, 14 January 2015

What is ePub?

In this workshop you will come to grips with ePub, a simple but rich and increasingly popular ebook format.

You will learn about the ideas that underlie this publishing format, and will get your hands dirty by opening, tweaking, and putting back together existing ePubs.

As the cherry on top of the cake, you will experiment with small programs that turn ePubs into gif animations, word maps, or remixed versions of themselves.

day 1

what is an EPUB?

EPUB is:

A format for representing electronic text and images.

Other formats that allow for the representation of electronic text and images – ebooks for short – are PDF, Mobi, AZW, Comic Book Archive, HTML, LaTeX, etc.

The EPUB format is developed by the International Digital Publishing Forum (IDPF).

EPUB characteristics

  • a zip archive, renamed .epub instead of .zip
  • based on web technologies: HTML, XML, CSS, JavaScript, and TTF and WOFF fonts
  • currently has 2 versions: EPUB 2 and EPUB 3
    • EPUB 3 introduced HTML5 (video, audio, canvas tags), elaborate CSS, and scripting (JavaScript)
    • For the most part, EPUB 3 is backward compatible with EPUB 2
    • Most e-readers are only prepared for EPUB 2; however, with some limitations, they can read EPUB 3
    • Readium reader is meant as "a reference system for rendering EPUB 3"

reflowable format

EPUBs represent a paradigm shift away from the page-centered representations of text.

The page is no longer the unit or canvas. The concept of the page is simply not present in EPUBs.

Instead the unit is the screen, whose size changes from device to device.

As EPUBs are meant to be read on screens, they need to adapt and reflow themselves to any screen resolution or orientation.

http://www.freesoftwaremagazine.com/files/nodes/3396/fig_zoo_of_screen_sizes.jpg Source:http://www.freesoftwaremagazine.com/

e-reading devices

  • E-ink (a type of Electronic paper): Kindle, Nook, Kobo.
  • LCD: smartphones, tablets, desktop and laptop computers

Source: From Print to Ebooks: A Hybrid Publishing Toolkit for the Arts

Note: Kindle devices don't support EPUB; they read only Amazon's proprietary ebook formats, AZW and Mobi.

e-reading software

  • PC:
  • iOS: iBooks
  • Android: Aldiko, Moon Reader

sources of EPUBs

changing EPUBs

EPUBs have one advantage over other ebook formats, and other digital media such as audio, video, or images.

EPUBs are fully changeable.


The files contained in an EPUB are, with the exception of images and fonts, plain-text files in XHTML, CSS, or XML formats.

These files can be opened, edited, and saved with any text editor (as opposed to a word processor).

Changes can be made to the:

  • the text content of the book
  • the images used in the book
  • the metadata
  • the presentation layout: CSS (including fonts)
  • the order of the book: the EPUB's spine

Once all changes are done, the source files can be bundled once more into a zip archive with the extension .epub
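The rebundling step can be sketched in Python with the standard-library zipfile module. This is a minimal sketch rather than part of the workshop material: the helper name pack_epub and the folder layout are assumptions. It follows the EPUB container rule that the mimetype file is the first entry of the archive and is stored uncompressed.

```python
import zipfile
from pathlib import Path


def pack_epub(source_dir, epub_path):
    """Zip the unpacked EPUB in source_dir into epub_path (hypothetical helper).

    epub_path should live outside source_dir, otherwise the archive
    would try to include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype file goes first and is stored without compression.
        archive.write(source / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            # Archive names are relative to the folder, with forward slashes.
            name = path.relative_to(source).as_posix()
            if path.is_file() and name != "mimetype":
                archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

Calling pack_epub("my-book", "my-book.epub") on a folder that contains mimetype, META-INF and the content files should produce an archive that reading software can open, provided the files themselves are valid.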

changing ePubs in Calibre

Calibre nowadays comes with an Edit book function that:

  • exposes the ePub's source-files
  • allows the user to introduce changes to the ePub
  • can also be used to create ePubs from scratch

This ease of manipulating ebooks will probably fuel many future discussions on the nature of the book.

  • is a version of a book still the same book?
  • will there be any possibility to distinguish the original from its versions?
  • can this feature be used to enrich and expand what the book is?

EPUB anatomy

a zip archive

An EPUB file is a ZIP archive, with the filename extension '.epub' instead of '.zip'.

"[EPUB] is a publication format, and as such it specifies and documents a host of things that publications need to include—content documents, style sheets, images, media, scripts, fonts, and more" (Garrish, 2013)

It is a web-site in a box.

Zip EPUB.png


inside the .epub archive

Inside EPUB.svg


META-INF folder

Typically contains only the container.xml file.

This file points reading devices to the content.opf file.

container.xml source:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
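How a reading system might follow this pointer can be sketched in Python with the standard-library ElementTree parser. The function name find_package_path is an assumption made for illustration; the namespace and the full-path attribute are taken from the container.xml source above.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml):
    """Return the full-path of the first rootfile declared in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


sample = b"""<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

print(find_package_path(sample))  # OEBPS/content.opf
```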


mimetype

A one-line file that indicates the type of application the .epub archive is:

application/epub+zip

The mimetype file tells the reading device what kind of file it is reading: an application in epub+zip format.
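A quick check of the mimetype entry can be sketched in Python. The helper name has_valid_mimetype is hypothetical. The checks assume the EPUB container rule that mimetype is the first entry in the archive, stored uncompressed, and contains exactly the string shown above.

```python
import zipfile


def has_valid_mimetype(epub_file):
    """Return True if the archive starts with a stored 'mimetype' entry
    whose content is exactly application/epub+zip."""
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        if not entries:
            return False
        first = entries[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and archive.read(first) == b"application/epub+zip")
```

The function accepts a path or a file-like object, so has_valid_mimetype("book.epub") can be used on any ePub you downloaded or rebuilt yourself.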

OEBPS folder

Container for all of the EPUB's contents.

This folder is not required, but it is commonly used in EPUB archives.

Sub folders can be created inside it: e.g. for the storage of images, or fonts.

content.opf

Often called the package document, content.opf is the epicenter of every EPUB.

If an EPUB is a website in a box, content.opf is the file that organizes all the stuff in the box.

Structure followed by content.opf:

<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
   <metadata>
      ...
   </metadata>
   <manifest>
      ...
   </manifest>
   <spine>
      ...
   </spine>
   <guide>
    ...
   </guide>
</package>

The content.opf file is divided into different sections:

metadata

Contains all the metadata describing the publication in question, such as title, author, publisher, description, date, identifier (e.g. an ISBN), and language.

A rich set of metadata helps reading systems sort and display the EPUB.

In larger contexts, such as extensive ebook repositories, rich and accurate metadata is essential for the books to be discovered, read, and reused.

 
 <metadata>
<dc:identifier id="epub-id-1">urn:isbn:978-90-822345-4-1</dc:identifier>
<meta refines="#epub-id-1" property="identifier-type" scheme="onix:codelist5">01</meta>
<dc:title id="epub-title-1">Work Title</dc:title>
<meta refines="#epub-title-1" property="title-type">main</meta>
<dc:publisher>Great Publisher</dc:publisher>    
<dc:date id="epub-date">2014-12</dc:date>
<dc:language>en-US</dc:language>

<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>

<dc:contributor id="epub-contributor-1">Monty Cantsin</dc:contributor>
<meta refines="#epub-contributor-1" property="role" scheme="marc:relators">edt</meta>

<dc:contributor id="epub-contributor-5">Alan Smithee</dc:contributor>
<meta refines="#epub-contributor-5" property="role" scheme="marc:relators">dsr</meta>

<dc:rights>Creative Commons Attribution, NonCommercial, ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)</dc:rights>

<dc:subject>electronic publishing, e-publishing, ebooks</dc:subject>

<dc:description>Electronic publishing has become an essential medium for the field of contemporary arts and design.</dc:description>
</metadata>
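Reading these values back out of a package document can be sketched in Python. This is an illustration under stated assumptions: the function name dublin_core_terms is hypothetical, and the sample is a shortened package document rather than a complete content.opf.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"


def dublin_core_terms(opf_xml):
    """Return a list of (term, value) pairs for every dc: element."""
    root = ET.fromstring(opf_xml)
    pairs = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}localname" in ElementTree.
        if element.tag.startswith("{" + DC + "}"):
            term = element.tag.split("}", 1)[1]
            pairs.append((term, (element.text or "").strip()))
    return pairs


sample = b"""<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         unique-identifier="epub-id-1" version="3.0">
  <metadata>
    <dc:title id="epub-title-1">Work Title</dc:title>
    <dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
    <dc:language>en-US</dc:language>
  </metadata>
</package>"""

for term, value in dublin_core_terms(sample):
    print(term, "=", value)
```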

Dublin Core terms

The metadata uses Dublin Core terms

Dublin Core terms constitute a metadata scheme intended to provide a simple and limited vocabulary for describing web resources, hence its adoption in EPUB.

<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>

Inside the ePub, Dublin Core structures the metadata by using a DC term and giving it a value.

To produce more accurate metadata, the DC terms are refined with a <meta> element.

In the example case above:

  • the DC term is "creator", and it has the value Luther Blisset
  • the <meta scheme="marc:relators"> specifies that this creator is an author: "aut"
  • "creator" is one of the Dublin Core terms
  • "aut" is one of the MARC Relator terms made available by the Library of Congress. In conjunction with Dublin Core terms, they provide a more detailed description.

The combination of the more generic DC terms with the more specific MARC Relator terms can provide a very rich, detailed, and yet generic set of title-level metadata.
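The link between a DC term and the <meta> that refines it can be followed programmatically. The Python sketch below is only an illustration: the function name people_with_roles is an assumption, and it handles only the role refinement shown in the example.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}


def people_with_roles(opf_xml):
    """Map each dc:creator / dc:contributor value to its MARC relator code."""
    root = ET.fromstring(opf_xml)
    # Index <meta property="role"> elements by the id they refine.
    roles = {}
    for meta in root.iterfind(".//opf:meta[@property='role']", NS):
        roles[meta.get("refines", "").lstrip("#")] = (meta.text or "").strip()
    people = {}
    for tag in ("dc:creator", "dc:contributor"):
        for person in root.iterfind(".//" + tag, NS):
            people[person.text] = roles.get(person.get("id"))
    return people


sample = b"""<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" version="3.0">
  <metadata>
    <dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
    <meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>
    <dc:contributor id="epub-contributor-1">Monty Cantsin</dc:contributor>
    <meta refines="#epub-contributor-1" property="role" scheme="marc:relators">edt</meta>
  </metadata>
</package>"""

print(people_with_roles(sample))
# {'Luther Blisset': 'aut', 'Monty Cantsin': 'edt'}
```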

Common DC terms used are: creator, title, identifier, language, description, subject, publisher.


Note: in order to use the Dublin Core terms, their namespace must be declared, either in the package tag:

<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">

Or the metadata tag:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">

manifest

All the resources (files) used in the EPUB are declared in the manifest.

If a resource is not declared in the manifest, it is not part of the publication, and reading systems may ignore it.

<manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="stylesheet.css" id="style" media-type="text/css" />
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
    <item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
    <item href="UbuntuMono-B.ttf" id="UbuntuMono-B_ttf" media-type="application/x-font-truetype" />
</manifest>
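Whether the manifest and the archive agree can be checked with a small script. The Python sketch below is an assumption-laden illustration: the function name compare_manifest is hypothetical, hrefs are assumed to be plain relative paths (no percent-encoding), and the list of archive names would normally come from zipfile.ZipFile(...).namelist().

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def compare_manifest(opf_xml, opf_path, archive_names):
    """Return (missing, undeclared): manifest items absent from the archive,
    and archive files that the manifest does not declare."""
    root = ET.fromstring(opf_xml)
    # Manifest hrefs are relative to the folder that holds the .opf file.
    base = posixpath.dirname(opf_path)
    declared = {
        posixpath.normpath(posixpath.join(base, item.get("href")))
        for item in root.iterfind("opf:manifest/opf:item", OPF_NS)
    }
    # These files belong to the container itself, not to the manifest.
    container_files = {"mimetype", opf_path}
    present = {name for name in archive_names
               if not name.endswith("/")
               and not name.startswith("META-INF/")
               and name not in container_files}
    return sorted(declared - present), sorted(present - declared)


sample = b"""<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item href="stylesheet.css" id="style" media-type="text/css" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
  </manifest>
</package>"""

names = ["mimetype", "META-INF/container.xml", "OEBPS/content.opf",
         "OEBPS/stylesheet.css", "OEBPS/ch001.xhtml", "OEBPS/notes.txt"]
print(compare_manifest(sample, "OEBPS/content.opf", names))
# (['OEBPS/media/file0.png'], ['OEBPS/notes.txt'])
```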

spine

Stipulates the order of the different EPUB content documents (xhtml or SVG files).

  <spine toc="ncx">
    <itemref idref="cover_xhtml" linear="yes" />
    <itemref idref="title_page_xhtml" linear="yes" />
    <itemref idref="nav" linear="no" />
    <itemref idref="ch001_xhtml" />
    <itemref idref="ch002_xhtml" />
  </spine>

Note: 'linear="no"' marks a content document as auxiliary: it is left out of the default linear reading order, although it can still be reached through links.
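The spine only lists ids, so the actual file order is obtained by looking each idref up in the manifest. A minimal Python sketch of that lookup follows; the function name reading_order and its include_non_linear parameter are assumptions, and an itemref without a linear attribute is treated as linear="yes", which is the default.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml, include_non_linear=False):
    """Return the hrefs of the content documents in spine order."""
    root = ET.fromstring(opf_xml)
    hrefs = {item.get("id"): item.get("href")
             for item in root.iterfind("opf:manifest/opf:item", OPF_NS)}
    order = []
    for itemref in root.iterfind("opf:spine/opf:itemref", OPF_NS):
        if itemref.get("linear", "yes") == "no" and not include_non_linear:
            continue
        href = hrefs.get(itemref.get("idref"))
        if href is not None:  # skip idrefs that the manifest does not declare
            order.append(href)
    return order


sample = b"""<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
    <item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
  </manifest>
  <spine>
    <itemref idref="cover_xhtml" linear="yes" />
    <itemref idref="nav" linear="no" />
    <itemref idref="ch001_xhtml" />
    <itemref idref="ch002_xhtml" />
  </spine>
</package>"""

print(reading_order(sample))
# ['cover.xhtml', 'ch001.xhtml', 'ch002.xhtml']
```

Reordering the itemref elements in the spine is therefore enough to change the order of the book, without touching the content documents themselves.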

guide

The guide is deprecated in EPUB 3; however, you might still come across it.

It often points to the nav file.

Navigation - table of contents

The Navigation document is a hyper-linked table of contents that allows the reader to quickly reach the different sections of an ePub.

In EPUB 2 the navigation document was toc.ncx, written in XML.

EPUB 3 uses a file, usually named nav.xhtml, which is a slightly simpler document, based on HTML, with common tags such as

<ol>, <li>, <nav>

In content.opf's manifest the navigation file is declared as:

<item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />


For backward compatibility reasons (most ereaders are only prepared to read EPUB 2), the two files might be present in the same ePub publication.

A nav.xhtml:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</title>
    <link rel="stylesheet" type="text/css" href="stylesheet.css" />
  </head>
  <body>
    <nav epub:type="toc">
      <h1 id="toc-title">From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</h1>
      <ol class="toc">
        <li id="toc-li-1">
          <a href="ch001.xhtml">Colophon</a>
        </li>
        <li id="toc-li-2">
          <a href="ch002.xhtml">1 Introduction</a>
          <ol class="toc">
            <li id="toc-li-3">
              <a href="ch003.xhtml">Industry promises vs. reality</a>
            </li>
            <li id="toc-li-4">
              <a href="ch004.xhtml">What this Toolkit provides</a>
            </li>
         </ol>
        </li>
        <li id="toc-li-8">
          <a href="ch008.xhtml">2 The basics</a>
          <ol class="toc">
            <li id="toc-li-9">
              <a href="ch009.xhtml">Layout and structure of a text</a>
            </li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>
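Because the navigation document is plain XHTML, its nested lists can be walked with a few lines of Python. The sketch below is illustrative only: the function name toc_entries is an assumption, and the sample is a shortened, well-formed version of a nav.xhtml.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def toc_entries(nav_xml):
    """Return (depth, label, href) for every link in the toc nav."""
    root = ET.fromstring(nav_xml)
    entries = []

    def walk(ol, depth):
        for li in ol.findall(XHTML + "li"):
            link = li.find(XHTML + "a")
            if link is not None:
                label = "".join(link.itertext()).strip()
                entries.append((depth, label, link.get("href")))
            # A nested <ol> inside an <li> holds the subsections.
            for child in li.findall(XHTML + "ol"):
                walk(child, depth + 1)

    for nav in root.iter(XHTML + "nav"):
        if "toc" in nav.get(EPUB_TYPE, "").split():
            for ol in nav.findall(XHTML + "ol"):
                walk(ol, 0)
    return entries


sample = b"""<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc">
      <ol class="toc">
        <li><a href="ch001.xhtml">Colophon</a></li>
        <li><a href="ch002.xhtml">1 Introduction</a>
          <ol class="toc">
            <li><a href="ch003.xhtml">Industry promises vs. reality</a></li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>"""

for depth, label, href in toc_entries(sample):
    print("  " * depth + label, "->", href)
```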

content document .xhtml

Content documents hold the text and images of an ePub.

In EPUB 3 content documents are XHTML5 (HTML5 + XML), with the .xhtml file extension.

What you find in an ePub content document is essentially what you find behind a webpage: HTML code.

EPUB 3 allows the use of HTML5, including elements such as <section>, <article>, <nav>, <header>, <footer>, <aside>, <video>, <audio>, and Mathematical Markup Language (MathML). However, keep in mind that some of these tags won't work on many current ereaders, which are only prepared to render EPUB 2 ePubs.


As a rule of thumb, it is good to chunk content into several documents, one for each section or subsection of a book. The reason is that ereaders are often slow and take their time to render the content. Dividing the content into smaller pieces makes their job easier.


XML requirements

XML requires you to make sure that:

  • Every element has an opening and a closing tag.
  • Empty tags such as <br> in HTML must be self-closing: <br />
  • Element and attribute names are case-sensitive and always lowercase.
  • Namespaces must be declared (e.g., for embedded MathML, SVG, EPUB elements and attributes, etc.).
  • Ampersands (&) must be escaped as &amp;.
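Several of these rules can be tested by running a content document through any XML parser. The Python sketch below uses the standard-library ElementTree parser; the function name is_well_formed is an assumption. It only checks well-formedness (closed tags, escaped ampersands, declared namespaces), not whether the markup is valid EPUB.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Fish &amp; chips<br /></p>"))  # True
print(is_well_formed("<p>Fish & chips<br></p>"))        # False
```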

Example content document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8" />
    <meta content="pandoc" name="generator" />
    <title>Acknowledgments</title>
    <link href="stylesheet.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">
      <h1>Acknowledgments</h1>
      <p>In fall 2012, as part of my graduate studies, I began studying Instagram user self-portraits. The users whose images I discuss in this notebook may or may not have knowledge that I critiqued their selfies, as I did not interview users about their images. My study is assumptive, and my work is influenced by the research of Marshall McLuhan and Vilém Flusser. Thank you to the users who kindly gave me permission to republish their images.</p>
      <p>I would like to recognize my professors, Anne-Marie Oliver, Barry Sanders, Joan Handwerg, Marie-Pierre Hasne, and Elie Charpentier, for their guidance and wisdom.</p>
      <p>Langdon Herrick, thank you for your feedback and numerous hours of assistance.</p>
      <p>Evangelina Owens, thank you for your never-ending words of encouragement and advice, as well as your countless hours of support.</p>
      <p>I am very grateful to Miriam Rasch and the Institute of Network Culture’s staff for publishing my research. I greatly appreciate their hard work and efforts.</p>
      <p>This notebook is dedicated to my family, who provide only love and support.</p>
    </section>
  </body>
</html>

An XHTML content document is essentially a simple webpage. However, there are a few additions, such as the XML declaration and the namespaces in the html element.

semantic inflection

These additions make semantic inflection possible: in other words, attaching to an element information about its purpose or nature. Semantic inflections use the epub:type attribute to describe the content contained within that element. The terms that can be used in these inflections can be found at http://www.idpf.org/epub/vocab/structure/

In the previous example epub:type appears in <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">. It informs us, and the reading system, that the current section belongs to the book's frontmatter and constitutes the acknowledgments.
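How a program might read these inflections can be sketched in Python. The function name semantic_inflections is hypothetical, and the sample is a shortened content document. The attribute is looked up through the epub namespace declared on the html element.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def semantic_inflections(xhtml):
    """Return (element name, [epub:type terms]) for each inflected element."""
    root = ET.fromstring(xhtml)
    found = []
    for element in root.iter():
        value = element.get(EPUB_TYPE)
        if value:
            name = element.tag.split("}", 1)[-1]
            # epub:type holds a space-separated list of terms.
            found.append((name, value.split()))
    return found


sample = b"""<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">
      <h1>Acknowledgments</h1>
    </section>
  </body>
</html>"""

print(semantic_inflections(sample))
# [('section', ['acknowledgments', 'frontmatter'])]
```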


images

Images can be either raster images (jpg or png) or SVG; however, it is safer to opt for a raster format.

They need to be included in the manifest of the package document (.opf):

<manifest>
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
    <item id="file1_jpg" href="media/file1.jpg" media-type="image/jpeg" />
    ...
</manifest>

cover

A cover is essentially an image, either a raster image (jpg or png) or an SVG; however, it is safer to opt for a raster format.

Within the package document (.opf), the cover is declared within the manifest as an item with properties="cover-image", so that reading systems identify the image as the cover of the ePub.

<item id="cover-img" href="cover.jpg" media-type="image/jpeg" properties="cover-image"/>

If you want the cover to also appear inline, when you start the book, make sure to include a content document, just with the image:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <link rel="stylesheet" type="text/css" href="stylesheet.css" />
</head>
<body>
<div id="cover-image">
<img src="media/cover.png" alt="cover image" />
</div>
</body>
</html>

style

If you want to use a CSS stylesheet, make sure each content document links to it:

<link rel="stylesheet" type="text/css" href="stylesheet.css" />


fonts

WOFF and OTF fonts can also be used, although few ereaders will interpret them correctly.

To include custom fonts, declare them in the manifest of the package document (.opf) as resources used in the publication:

<manifest>
    <item href="OpenSans-Regular.ttf" id="OpenSans-Regular_ttf" media-type="application/x-font-truetype" />
    <item href="OpenSans-LightItalic.ttf" id="OpenSans-LightItalic_ttf" media-type="application/x-font-truetype" />
    ...
</manifest>
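Writing manifest entries by hand gets tedious when a book has many resources, so they can also be generated. The Python sketch below is one possible approach, not a standard tool: the function name manifest_items is an assumption, the id scheme imitates the ids used in the examples on this page, and the media types are copied from those same examples. Unknown file extensions raise a KeyError, and file names starting with a digit would need a different id, since XML ids cannot start with a digit.

```python
import posixpath
from xml.sax.saxutils import quoteattr

# Media types as they appear in the manifest examples on this page.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".ncx": "application/x-dtbncx+xml",
    ".ttf": "application/x-font-truetype",
}


def manifest_items(hrefs):
    """Return one <item> line per href, e.g. for pasting into <manifest>."""
    lines = []
    for href in hrefs:
        stem, ext = posixpath.splitext(posixpath.basename(href))
        item_id = stem + "_" + ext.lstrip(".")
        lines.append("<item href=%s id=%s media-type=%s />" % (
            quoteattr(href), quoteattr(item_id),
            quoteattr(MEDIA_TYPES[ext.lower()])))
    return lines


for line in manifest_items(["media/file0.png", "OpenSans-Regular.ttf"]):
    print(line)
# <item href="media/file0.png" id="file0_png" media-type="image/png" />
# <item href="OpenSans-Regular.ttf" id="OpenSans-Regular_ttf" media-type="application/x-font-truetype" />
```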



experiment

Find interesting written or visual content, which you consider suitable for an ebook.

The content can be in any digital form - word document, HTML webpage, plain-text, etc.

Examples of interesting content recontextualization:

  • Kenneth Goldsmith, Uncreative Writing: Transcribing Project Runway, a chat-room log transcribing the action of Project Runway
  • Report from the Desert, published by Grey Scale Press: a book made from an abandoned blog
  • Silvio Lorusso and Sebastian Schmieg: 56 Broken Kindle Screens


Bibliography

D. P. T. Collective. From Print to Ebooks: A Hybrid Publishing Toolkit for the Arts. Vol. 1. Institute of Network Cultures, 2014. http://networkcultures.org/blog/publication/from-print-to-ebooks-a-hybrid-publishing-toolkit-for-the-arts/.
Garrish, Matt, and Markus Gylling. EPUB 3 Best Practices. Vol. 1. O’Reilly Media, Inc., 2013.
McGuire, Hugh, and Brian O’Leary. Book: A Futurist’s Manifesto. Vol. 1. O’Reilly Media, 2012. http://book.pressbooks.com/.



ePubs and designers

Is there any satisfaction for designers in the EPUB format?

Rather than being similar to designing a paper book, the design process of an EPUB resembles the development of a webpage.

  • content must be reflowable - must adapt itself to the dimensions of the reading device.
  • design is no longer specific to the physical characteristics of the container – ereader –, but generic, in the sense that it has to adapt to several containers.
  • when the renderings change from container to container, the structuring of the text cannot rely on visual structuring. Structure has to be semantic: the document has to be semantically tagged so that the text's different elements – headings, footnotes, body text – are explicit. If that is the case, even if e-readers change the rendering of your (visual) design, the semantic tagging (or design) will persist. The ereader will render a heading as a heading, not simply as a piece of body text.