Difference between revisions of "Courses/EPUB"

From Publication Station
 
<span style="background:#CCFF66;font-size:40px;font-weight:bold;">What is ePub?</span>


''In this workshop you will get to grips with ePub, the simple but rich and increasingly popular ebook format.''
* [[Courses/EPUB/day1]]
* [[Courses/EPUB/day2]]
* [[Courses/EPUB/day3]]


''You will learn about the ideas that underlie this publishing format, and will get your hands dirty by opening, tweaking, and putting back together existing ePubs.''


''As the cherry on top, you will experiment with small programs that turn ePubs into GIF animations, word maps, or remixed versions of themselves.''


=day 1=
==<span style="background:#CCFF66;">What is ePub?</span>==
== what is an EPUB? ==


== EPUB is: ==
A format for representing electronic text and images.  


Other formats that allow for the representation of electronic text and images &ndash; ''ebooks'' for short &ndash; are PDF, Mobi, AZW, Comic Book Archive, HTML, LaTeX, etc.


The EPUB format is developed by the [http://idpf.org/ International Digital Publishing Forum].
 
== EPUB characteristics ==
* a zip archive, renamed .epub instead of .zip
* based on web technologies: HTML, XML, CSS, JavaScript, and TTF and WOFF fonts
* currently has 2 versions: EPUB 2 and EPUB 3
** EPUB 3 introduced HTML5 (video, audio, canvas tags), elaborate CSS, and scripting (JavaScript)
** For the most part, EPUB 3 is backward compatible with EPUB 2
** Most e-readers are only prepared for EPUB 2; however, with some limitations, they can read EPUB 3.
** [http://readium.org/ Readium] reader is meant as "a reference system for rendering EPUB 3" 
 
* not page centered
* [[#reflowable format|reflowable]]
* fully [[#changing EPUBs | changeable]]
 
== reflowable format==
EPUBs represent a paradigm shift away from the page-centered representations of text.
 
The page is no longer the unit or canvas. The concept of the page is simply not present in EPUBs.
 
Instead the unit is the screen, whose size changes from device to device.
 
As EPUBs are meant to be read on screens, they need to adapt and '''reflow themselves to any screen resolution or orientation'''.
 
http://www.freesoftwaremagazine.com/files/nodes/3396/fig_zoo_of_screen_sizes.jpg Source:http://www.freesoftwaremagazine.com/
 
== e-reading devices ==
* E-ink (a type of Electronic paper): Kindle, Nook, Kobo.
* LCD: smartphones, tablets, desktop and laptop computers
[[File:readingHardware.png | Source: From Print to Ebooks: A Hybrid Publishing Toolkit for the Arts]]
 
Note: Kindle devices don't support EPUB; they read only Amazon's proprietary ebook formats, AZW and Mobi.
 
== e-reading software ==
* PC:
** [http://calibre-ebook.com/ Calibre]
** [http://readium.org/ Readium] - plugin for the Chrome/Chromium browser
** [http://www.epubread.com/en/ EPUBReader] - plugin for the Firefox browser
* iOS: iBooks
* Android: Aldiko, Moon Reader
 
== sources of EPUBs==
* [http://www.gutenberg.org/ Project Gutenberg]
* [https://archive.org/details/texts Internet Archive]
* [https://www.scribd.com Scribd]
* Pirate: The Pirate Bay, [http://gen.lib.rus.ec/ Library Genesis], [http://aaaaarg.org/ AAAAARG], [http://monoskop.org/log/ Monoskop], [http://it-ebooks.info/ IT Books]
 
== changing EPUBs==
 
EPUBs have one advantage over other ebook formats, and other digital media such as audio, video, or images.
 
EPUBs are '''fully changeable'''.
 
 
[[#EPUB anatomy|The files contained in an EPUB]] are, with the exception of images and fonts, plain text files in XML, XHTML, or CSS formats.
 
These files can be opened, edited and saved with any [http://en.wikipedia.org/wiki/Text_editor text editor] (which is different from a word processor).
 
Changes can be made to the:
* text content of the book
* the images used in the book
* the metadata
* the presentation layout: CSS (including fonts)
* Order of the book: the EPUB's spine
 
Once all changes are done, the source files can be bundled once more into a zip archive with the extension <code>.epub</code>.
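The unpack, edit, and repack cycle described above can be sketched in Python with the standard <code>zipfile</code> module. This is a minimal sketch, not a definitive tool: the function names <code>unpack_epub</code> and <code>pack_epub</code> are illustrative assumptions, and the sketch relies on the EPUB rule that the <code>mimetype</code> file is the first entry in the archive and is stored uncompressed.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, target_dir):
    """Extract every file of an EPUB into target_dir for editing."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(target_dir)


def pack_epub(source_dir, epub_path):
    """Bundle an unpacked EPUB folder back into a single .epub file."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype file goes in first and is stored without compression.
        archive.write(source_dir / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        # Every other file is added afterwards, compressed as usual.
        for path in sorted(source_dir.rglob("*")):
            relative = path.relative_to(source_dir).as_posix()
            if path.is_file() and relative != "mimetype":
                archive.write(path, relative,
                              compress_type=zipfile.ZIP_DEFLATED)
```

A typical session would call <code>unpack_epub("book.epub", "book_src")</code>, edit the files in a text editor, and then call <code>pack_epub("book_src", "book_edited.epub")</code>; both file names are placeholders.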
 
==changing ePubs in Calibre==
Calibre nowadays comes with an '''Edit book''' function that:
* exposes the ePub's source files
* '''allows the user to introduce changes to the ePub'''
* can also be used to create ePubs from scratch
 
The ease with which ebooks can be manipulated will probably fuel many future discussions on the nature of the book:
* is a version of a book still the same book?
* will there be any possibility to distinguish the ''original'' from its versions?
* can this feature be used to enrich and expand what the book is?
 
== EPUB anatomy ==
=== a zip archive ===
An EPUB file is a ZIP archive, with the filename extension '.epub' instead of '.zip'.
 
"[EPUB] is a publication format, and as such it specifies and documents a host of things that publications need to include—content documents, style sheets, images, media, scripts, fonts, and more" (Garrish, 2013)
 
It is a ''web-site in a box''.
 
[[File:zip_EPUB.png]]
 
== inside the epub archive ==
 
[[File:inside_EPUB.svg]]
 
=== META-INF folder ===
Typically contains only the container.xml file.
 
This file points reading systems to the package document, here content.opf.
 
container.xml source:
<source lang="xml">
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
</source>
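How a reading system might follow this pointer can be sketched in Python with the standard <code>xml.etree.ElementTree</code> module. The function name <code>find_package_path</code> is an illustrative assumption; the sample data is the container.xml shown above.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml, as declared on its root element.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = b"""<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_package_path(container_xml):
    """Return the path of the package document named in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


print(find_package_path(CONTAINER_XML))  # prints OEBPS/content.opf
```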
 
=== mimetype ===
A one-line file that indicates the type of application the .epub archive is: <pre>application/epub+zip</pre>
 
The mimetype file tells the reading system what kind of file it is reading: an application in epub+zip format. It must be the first file in the zip archive and must be stored uncompressed.
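A quick way to check an archive's mimetype entry can be sketched in Python. The function name <code>has_valid_mimetype</code> is an illustrative assumption, and the check is deliberately strict: it expects the entry to come first, to be stored uncompressed, and to contain exactly the string shown above.

```python
import io
import zipfile


def has_valid_mimetype(epub_file):
    """Check that mimetype is the first entry, uncompressed, with the right text."""
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            return False
        if entries[0].compress_type != zipfile.ZIP_STORED:
            return False
        return archive.read("mimetype") == b"application/epub+zip"


# Build a tiny in-memory archive to try the check on.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip",
                     compress_type=zipfile.ZIP_STORED)
sample.seek(0)
print(has_valid_mimetype(sample))  # prints True
```

The function accepts a path to an .epub file as well as a file-like object.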
 
== OEBPS folder ==
Container for all of the EPUB's contents.
 
This folder is not required, but it is commonly used in EPUB archives.
 
Sub folders can be created inside it: e.g. for the storage of images, or fonts.
 
=== content.opf ===
Often called the ''package document'', content.opf is the epicenter of every EPUB.
 
If an EPUB is a website in a box, content.opf is the file that organizes all the stuff in the box.
 
Structure followed by content.opf:
<source lang="xml">
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
  <metadata>
      ...
  </metadata>
  <manifest>
      ...
  </manifest>
  <spine>
      ...
  </spine>
  <guide>
    ...
  </guide>
</package>
</source>
 
The content.opf file is divided into different sections:
==== metadata ==== 
Contains all the metadata describing the publication in question, such as title, author, publisher, description, date, identifier (isbn number), language.
 
A rich set of metadata helps reading systems sort and display the EPUB.
 
In larger contexts, such as extensive ebook repositories, rich and accurate metadata is essential for the books to be discovered, read, and reused.
 
<source lang="xml">
<metadata>
<dc:identifier id="epub-id-1">urn:isbn:978-90-822345-4-1</dc:identifier>
<meta refines="#epub-id-1" property="identifier-type" scheme="onix:codelist5">01</meta>
<dc:title id="epub-title-1">Work Title</dc:title>
<meta refines="#epub-title-1" property="title-type">main</meta>
<dc:publisher>Great Publisher</dc:publisher>   
<dc:date id="epub-date">2014-12</dc:date>
<dc:language>en-US</dc:language>
 
<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>
 
<dc:contributor id="epub-contributor-1">Monty Cantsin</dc:contributor>
<meta refines="#epub-contributor-1" property="role" scheme="marc:relators">edt</meta>
 
<dc:contributor id="epub-contributor-5">Alan Smithee</dc:contributor>
<meta refines="#epub-contributor-5" property="role" scheme="marc:relators">dsr</meta>
 
<dc:rights>Creative Commons Attribution, NonCommercial, ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)</dc:rights>
 
<dc:subject>electronic publishing, e-publishing, ebooks</dc:subject>
 
<dc:description>
Electronic publishing has become an essential medium for the field of contemporary arts and design. While traditional models of art book publishing are becoming less and less viable for artists, writers, designers and publishers, periodicals like the <i>e-flux journal</i>, art and design blogs, and internet libraries such as <i>UbuWeb</i> are widely read and discussed.
 
Is electronic book technology really the way forward for these types of publications? Or does the answer lie in some hybrid form of publication, in which both print and electronic editions of the same basic content can be published in a parallel or complementary fashion?  And perhaps more importantly, what are the changes in workflow and design mentality that will need to be implemented in order to allow for such hybrid publications?
 
This Toolkit is meant for everyone working in art and design publishing. No specific expertise of digital technology, or indeed traditional publishing technology, is required. The Toolkit provides hands-on practical advice and tools, focusing on working solutions for low-budget, small-edition publishing.
 
Everything in the Hybrid Publishing Toolkit is based on real-world projects with art and design publishers. Editorial scenarios include art and design catalogues and periodicals, research publications, and artists'/designer's books.
</dc:description>
</metadata>
</source>
 
==== Dublin Core terms ====
The metadata uses [http://dublincore.org/documents/dcmi-terms/ Dublin Core terms]
 
'''Dublin Core terms''' constitute a metadata scheme that aims to provide a simple and limited vocabulary for describing web resources, hence its adoption in EPUB.
 
<pre>
<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>
</pre>
 
Inside the ePub, Dublin Core structures the metadata by using a DC term and giving it a value.
 
To produce more accurate metadata, the DC terms are refined with a <code><meta></code> element.
 
In the example above:
* the DC term is "creator", and it has the value Luther Blisset
* the <code><meta scheme="marc:relators"></code> specifies that this creator is an author: "aut"
 
* "creator" is one of the [http://dublincore.org/documents/2012/06/14/dcmi-terms/ Dublin Core terms]
* "aut" is one of the [http://www.loc.gov/marc/relators/relaterm.html MARC Relator terms] made available by the Library of Congress. In conjunction with Dublin Core terms, they provide a more detailed description.
 
The combination of the more generic DC terms, with the more specific MARC Relator terms can provide a very rich, detailed, and yet generic set of title level metadata.
 
Common DC terms used are: creator, title, identifier, language, description, subject, publisher
 
 
Note: In order to use the Dublin Core terms, their namespace must be declared, either in the package tag:
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
Or the metadata tag:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
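How a DC term and its refining <code>meta</code> fit together can be sketched in Python: the <code>refines</code> attribute points at the <code>id</code> of the element it describes. The function name <code>people_with_roles</code> is an illustrative assumption, and the sample is a trimmed version of the metadata example above.

```python
import xml.etree.ElementTree as ET

# Prefixes for the two namespaces declared on the package element.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

PACKAGE = b"""<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
  <metadata>
    <dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
    <meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>
    <dc:contributor id="epub-contributor-1">Monty Cantsin</dc:contributor>
    <meta refines="#epub-contributor-1" property="role" scheme="marc:relators">edt</meta>
  </metadata>
</package>"""


def people_with_roles(package_xml):
    """Pair each creator or contributor with its MARC relator code."""
    metadata = ET.fromstring(package_xml).find("opf:metadata", NS)
    # Map the id each <meta> refines to the role code it carries.
    roles = {}
    for meta in metadata.findall("opf:meta", NS):
        if meta.get("property") == "role" and meta.get("refines"):
            roles[meta.get("refines").lstrip("#")] = meta.text.strip()
    people = []
    for tag in ("dc:creator", "dc:contributor"):
        for element in metadata.findall(tag, NS):
            people.append((element.text.strip(), roles.get(element.get("id"))))
    return people


print(people_with_roles(PACKAGE))
# prints [('Luther Blisset', 'aut'), ('Monty Cantsin', 'edt')]
```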
 
==== manifest ==== 
All the resources (files) used in the EPUB are declared in the manifest.
 
If a resource is not declared in the manifest, reading systems will not treat it as part of the publication.
 
<source lang="xml">
<manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="stylesheet.css" id="style" media-type="text/css" />
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
    <item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
    <item href="UbuntuMono-B.ttf" id="UbuntuMono-B_ttf" media-type="application/x-font-truetype" />
</manifest>
</source>
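Writing manifest entries by hand is error-prone, so a small helper can generate them. The following Python sketch is an assumption-laden illustration: the <code>MEDIA_TYPES</code> table covers only a few file types, the function name <code>manifest_item</code> is hypothetical, and the id is derived from the file name in the same way as in the example above (ids that would start with a digit are not handled).

```python
import re
from pathlib import PurePosixPath
from xml.sax.saxutils import quoteattr

# A deliberately small lookup table; extend it for other resource types.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".ncx": "application/x-dtbncx+xml",
}


def manifest_item(href):
    """Build one manifest <item> line for a file path inside the EPUB."""
    path = PurePosixPath(href)
    media_type = MEDIA_TYPES[path.suffix.lower()]
    # Derive an id from the file name, e.g. file0.png becomes file0_png.
    item_id = re.sub(r"[^A-Za-z0-9_-]", "_", path.name)
    return (f"<item href={quoteattr(href)} id={quoteattr(item_id)} "
            f"media-type={quoteattr(media_type)} />")


print(manifest_item("media/file0.png"))
# prints <item href="media/file0.png" id="file0_png" media-type="image/png" />
```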
 
==== spine ====
Stipulates the order of the different EPUB content documents (xhtml or SVG files).
<source lang="xml">
<spine toc="ncx">
    <itemref idref="cover_xhtml" linear="yes" />
    <itemref idref="title_page_xhtml" linear="yes" />
    <itemref idref="nav" linear="no" />
    <itemref idref="ch001_xhtml" />
    <itemref idref="ch002_xhtml" />
</spine>
</source>
 
Note: 'linear="no"' marks a content document as auxiliary: it remains part of the ePub, but reading systems may skip it in the normal reading sequence.
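The relation between spine and manifest can be sketched in Python: each <code>itemref</code> in the spine names the <code>id</code> of a manifest <code>item</code>, whose <code>href</code> is the actual file. The function name <code>reading_order</code> and the trimmed sample package (metadata omitted for brevity) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# A trimmed package document: metadata is left out for brevity.
OPF = b"""<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
    <item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
    <item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
    <item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
  </manifest>
  <spine>
    <itemref idref="cover_xhtml" linear="yes" />
    <itemref idref="nav" linear="no" />
    <itemref idref="ch001_xhtml" />
    <itemref idref="ch002_xhtml" />
  </spine>
</package>"""


def reading_order(package_xml, include_non_linear=False):
    """List content document paths in the order given by the spine."""
    root = ET.fromstring(package_xml)
    # Manifest: id -> href lookup table.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", NS)}
    order = []
    for itemref in root.findall("opf:spine/opf:itemref", NS):
        # linear defaults to "yes" when the attribute is absent.
        if itemref.get("linear", "yes") == "no" and not include_non_linear:
            continue
        order.append(hrefs[itemref.get("idref")])
    return order


print(reading_order(OPF))
# prints ['cover.xhtml', 'ch001.xhtml', 'ch002.xhtml']
```

Reordering a book then amounts to reordering the <code>itemref</code> elements; the files themselves stay where they are.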
 
==== guide ====
The guide is deprecated in EPUB 3, however you might still come across it.
 
It often points to the [[#TOC|nav]] file.
 
=== Navigation - table of contents ===
The Navigation document is a hyperlinked table of contents that allows the reader to quickly reach the different sections of an ePub.
 
In EPUB 2 the navigation document was '''toc.ncx''', written in XML.
 
EPUB 3 uses a file, usually named '''nav.xhtml''', which is a slightly simpler document, based on HTML, with common tags such as <pre><ol>, <li>, <nav></pre>
 
In content.opf's manifest the navigation file is declared as:
<item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
 
 
For backward compatibility reasons (most ereaders are only prepared to read EPUB2) the two files might be present in the same ePub publication.
 
A nav.xhtml:
<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</title>
    <link rel="stylesheet" type="text/css" href="stylesheet.css" />
  </head>
  <body>
    <nav epub:type="toc">
      <h1 id="toc-title">From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</h1>
      <ol class="toc">
        <li id="toc-li-1">
          <a href="ch001.xhtml">Colophon</a>
        </li>
        <li id="toc-li-2">
          <a href="ch002.xhtml">1 Introduction</a>
          <ol class="toc">
            <li id="toc-li-3">
              <a href="ch003.xhtml">Industry promises vs. reality</a>
            </li>
            <li id="toc-li-4">
              <a href="ch004.xhtml">What this Toolkit provides</a>
            </li>
        </ol>
        </li>
        <li id="toc-li-8">
          <a href="ch008.xhtml">2 The basics</a>
          <ol class="toc">
            <li id="toc-li-9">
              <a href="ch009.xhtml">Layout and structure of a text</a>
            </li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>
</source>
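Because the navigation document is plain XHTML, its nested list can be read with a few lines of Python. This is a minimal sketch: the function names <code>toc_entries</code> and <code>_walk</code> are illustrative assumptions, and the sample is a shortened version of the nav.xhtml above.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

NAV = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc">
      <ol class="toc">
        <li><a href="ch001.xhtml">Colophon</a></li>
        <li><a href="ch002.xhtml">1 Introduction</a>
          <ol class="toc">
            <li><a href="ch003.xhtml">Industry promises vs. reality</a></li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>"""


def _walk(ol, depth):
    """Yield (depth, title, href) for each entry of a nested <ol>."""
    for li in ol.findall(XHTML + "li"):
        link = li.find(XHTML + "a")
        if link is not None:
            yield depth, link.text, link.get("href")
        child = li.find(XHTML + "ol")
        if child is not None:
            yield from _walk(child, depth + 1)


def toc_entries(nav_xml):
    """Return the table of contents as a flat list with nesting depths."""
    root = ET.fromstring(nav_xml)
    for nav in root.iter(XHTML + "nav"):
        # epub:type may hold several space-separated values.
        if "toc" in (nav.get(EPUB_TYPE) or "").split():
            top = nav.find(XHTML + "ol")
            return list(_walk(top, 0)) if top is not None else []
    return []


for depth, title, href in toc_entries(NAV):
    print("  " * depth + f"{title} -> {href}")
```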
 
=== content document .xhtml ===
Content documents are '''the place-holders for text and images in an ePub'''.
 
In EPUB 3, content documents are XHTML5 (HTML5 + XML), with the '''.xhtml''' file extension.
 
What you find in an ePub content document is essentially what you find behind a webpage: HTML code.
 
EPUB 3 allows the use of '''HTML5''', including elements such as <code><section>, <article>, <nav>, <header>, <footer>, <aside>, <video>, <audio></code> and Mathematical Markup Language (MathML). However, keep in mind that some of these tags won't work on many current e-readers, which are only prepared to render EPUB 2 ePubs.
 
 
As a rule of thumb, it is good to '''chunk content into several documents''', one for each section or subsection of a book.
E-readers are often slow to render content, so dividing it into smaller pieces makes their job easier.
 
 
 
==== XML requirements ====
XML requires that:
* Every element has an opening and a closing tag.
* Empty tags such as <br> in HTML are self-closing: <br/>
* Element and attribute names are case-sensitive; in XHTML they are always lowercase.
* Namespaces must be declared (e.g., for embedded MathML, SVG, EPUB elements and attributes, etc.).
* Ampersands (&) must be escaped as &amp;.
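These rules can be tested mechanically: an XML parser rejects any document that breaks them. A minimal Python sketch using the standard library follows; the function name <code>well_formed</code> is an illustrative assumption, and the check covers well-formedness only, not EPUB validity.

```python
import xml.etree.ElementTree as ET


def well_formed(markup):
    """Return (True, None) if markup parses as XML, else (False, reason)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return False, str(error)
    return True, None


# A self-closing <br/> and an escaped ampersand parse without complaint.
print(well_formed("<p>Fish &amp; chips<br/>to go</p>")[0])  # prints True
# An unclosed <br> breaks the opening/closing tag rule.
print(well_formed("<p>one<br>two</p>")[0])                  # prints False
# A bare ampersand is not allowed in XML.
print(well_formed("<p>Fish & chips</p>")[0])                # prints False
```

Running such a check on every content document after editing catches most of the mistakes that make e-readers refuse a file.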
 
Example content document.
<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8" />
    <meta content="pandoc" name="generator" />
    <title>Acknowledgments</title>
    <link href="stylesheet.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">
      <h1>Acknowledgments</h1>
      <p>In fall 2012, as part of my graduate studies, I began studying Instagram user self-portraits. The users whose images I discuss in this notebook may or may not have knowledge that I critiqued their selfies, as I did not interview users about their images. My study is assumptive, and my work is influenced by the research of Marshall McLuhan and Vilém Flusser. Thank you to the users who kindly gave me permission to republish their images.</p>
      <p>I would like to recognize my professors, Anne-Marie Oliver, Barry Sanders, Joan Handwerg, Marie-Pierre Hasne, and Elie Charpentier, for their guidance and wisdom.</p>
      <p>Langdon Herrick, thank you for your feedback and numerous hours of assistance.</p>
      <p>Evangelina Owens, thank you for your never-ending words of encouragement and advice, as well as your countless hours of support.</p>
      <p>I am very grateful to Miriam Rasch and the Institute of Network Culture’s staff for publishing my research. I greatly appreciate their hard work and efforts.</p>
      <p>This notebook is dedicated to my family, who provide only love and support.</p>
    </section>
  </body>
</html>
</source> 
 
As you can see, an XHTML content document is essentially a simple webpage. However, there are a few additions, such as the XML declaration and the namespace in the html element.
 
 
==== semantic inflection ====
These additions make possible the inclusion of [http://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-semantic-inflection '''semantic inflection'''], or in other words, attaching to an element information about its purpose or nature. Semantic inflections use the epub:type attribute to describe the content contained within that element. The terms that can be used in these inflections can be found at http://www.idpf.org/epub/vocab/structure/
 
In the previous example epub:type appears in <code><section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments"></code>. It informs us, and the reading system that the current section belongs to the book's frontmatter and constitutes the acknowledgments.
 
=== images ===
Images can either be a raster (jpg or png) or an SVG, however it is safer to opt for a raster.
 
They need to be included in the manifest of the ''package document'' (.opf):
 
<source lang="xml">
<manifest>
    <item href="media/file0.png" id="file0_png" media-type="image/png" />
    <item id="file1_jpg" href="media/file1.jpg" media-type="image/jpeg" />
    ...
</manifest>
</source>
 
=== cover ===
A cover is essentially an image, either a raster image (jpg or png) or an SVG; however, it is safer to opt for a raster.
 
Within the ''package document'' (.opf), the cover is declared within the manifest as an item with properties="cover-image", so that reading systems identify the image as the ePub's cover.
<item id="cover-img" href="cover.jpg" media-type="image/jpeg" properties="cover-image"/>
 
If you want the cover to also appear inline, when you start the book, make sure to include a ''content document'', just with the image:
<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <link rel="stylesheet" type="text/css" href="stylesheet.css" />
</head>
<body>
<div id="cover-image">
<img src="media/cover.png" alt="cover image" />
</div>
</body>
</html>
</source>
 
=== styles ===
If you want to use a CSS style sheet, make sure each ''content document'' links to the sheet.
<link rel="stylesheet" type="text/css" href="stylesheet.css" />
 
=== fonts ===
WOFF and OTF fonts can also be used, although few e-readers interpret them correctly.
 
To include custom fonts, they need to be declared in the manifest of the ''package document'' (.opf), as a resource used in the publication:
 
<source lang="xml">
<manifest>
    <item href="OpenSans-Regular.ttf" id="OpenSans-Regular_ttf" media-type="application/x-font-truetype" />
    <item href="OpenSans-LightItalic.ttf" id="OpenSans-LightItalic_ttf" media-type="application/x-font-truetype" />
    ...
</manifest>
</source>
 
 
 
==experiment==
Find interesting written or visual content, which you consider suitable for an ebook.
 
The content can be in any digital form - word document, HTML webpage, plain-text, etc.
 
Examples of interesting content recontextualization:
* Kenneth Goldsmith, Uncreative Writing: Transcribing Project Runway, a chat-room log transcribing the action of Project Runway;
* Report from the Desert, published by [http://greyscalepress.com/ Grey Scale Press]: a book made from an abandoned [http://reportdesert.blogspot.co.at/ blog]
* Silvio Lorusso and Sebastian Schmieg: [http://sebastianschmieg.com/56brokenkindlescreens/ 56 Broken Kindle Screens]
 
=Bibliography=
<div class="csl-entry">D. P. T. Collective. <i>From Print to Ebooks: A Hybrid Publishing Toolkit for the Arts</i>. Vol. 1. Institute of Network Cultures, 2014. http://networkcultures.org/blog/publication/from-print-to-ebooks-a-hybrid-publishing-toolkit-for-the-arts/.</div>
<div class="csl-entry">Garrish, Matt, and Markus Gylling. <i>EPUB 3 Best Practices</i>. Vol. 1. O’Reilly Media, Inc., 2013.</div>
<div class="csl-entry">McGuire, Hugh, and Brian O’Leary. <i>Book: A Futurist’s Manifesto</i>. Vol. 1. O’Reilly Media, 2012. http://book.pressbooks.com/.</div>
 
 
 
 
<div style="background:yellow">
=ePubs and designers=
''Is there any satisfaction for designers in the EPUB format?''


The design process of an EPUB is less like designing a paper book and more like developing a webpage.
==Software needed==
day 1:
* '''Calibre http://calibre-ebook.com/'''
* text editor:
** '''Gedit https://wiki.gnome.org/Apps/Gedit'''
** '''Sublime Text http://www.sublimetext.com/'''


* content must be '''reflowable''': it must adapt itself to the dimensions of the reading device.
* design is no longer specific to the physical characteristics of the container &ndash; the e-reader &ndash; but generic, in the sense that it has to adapt to several containers.
* when the rendering changes from container to container, the structuring of the text cannot rely on visual structuring. Structure has to be semantic: the document has to be semantically tagged so that the text's different elements &ndash; headings, footnotes, body text &ndash; are explicit. If that is the case, even if an e-reader changes the rendering of your (visual) design, the semantic tagging (or design) will persist. The e-reader will render a heading as a heading, not simply as a piece of body text.


</div>
[[Category:teaching]]

Latest revision as of 19:44, 5 March 2015


What is ePub?

In this workshop you will get to grips with ePub, the simple but rich and increasingly popular ebook format.

You will learn about the ideas that underlie this publishing format, and will get your hands dirty by opening, tweaking, and putting back together existing ePubs.

As the cherry on top, you will experiment with small programs that turn ePubs into GIF animations, word maps, or remixed versions of themselves.

Software needed

day 1: