Difference between revisions of "Courses/EPUB"
Revision as of 15:42, 14 January 2015
<slidy theme="aa" />
URLs
wiki http://publicationstation.wdka.hro.nl/wiki
A Calibre http://145.24.140.83:8080
Software needed
day 1:
- Calibre http://calibre-ebook.com/
- text editor:
- Gedit https://wiki.gnome.org/Apps/Gedit
- Sublime Text http://www.sublimetext.com/
What is ePub?
In this workshop you will get to grips with the simple but rich, and increasingly popular, ePub ebook format.
You will learn about the ideas that underlie this publishing format, and will get your hands dirty by opening, tweaking, and putting back together existing ePubs.
As the cherry on top of the cake you will experiment with small programs to turn the ePubs into gif animations, word maps, or remixed versions of themselves.
day 1
what is an EPUB?
EPUB is:
A format for representing electronic text and images.
Other formats that allow for the representation of electronic text and images – ebooks for short – are PDF, Mobi, AZW, Comic Book Archive, HTML, LaTeX, etc.
The EPUB format is developed by the International Digital Publishing Forum.
EPUB characteristics
- a zip archive, renamed .epub instead of .zip
- based on web technologies: HTML, XML, CSS, JavaScript, TTF and WOFF fonts
- currently has 2 versions: EPUB 2 and EPUB 3
- EPUB 3 introduced HTML5 (video, audio, canvas tags), elaborate CSS, scripting (JavaScript)
- For the most part, EPUB 3 is backward compatible with EPUB 2
- Most e-readers are only prepared for EPUB 2; however, with some limitations, they can read EPUB 3.
- Readium reader is meant as "a reference system for rendering EPUB 3"
- not page centered
- reflowable
- fully changeable
reflowable format
EPUBs represent a paradigm shift away from the page-centered representations of text.
The page is no longer the unit or canvas. The concept of the page is simply absent from EPUBs.
Instead the unit is the screen, whose size changes from device to device.
As EPUBs are meant to be read on screens, they need to adapt and reflow themselves to any screen resolution or orientation.
http://www.freesoftwaremagazine.com/files/nodes/3396/fig_zoo_of_screen_sizes.jpg Source:http://www.freesoftwaremagazine.com/
e-reading devices
- E-ink (a type of Electronic paper): Kindle, Nook, Kobo.
- LCD: Smart-phones, tablets, desktop and laptop computers
Note: Kindle devices don't support EPUB; they read only Amazon's proprietary ebook formats AZW or Mobi.
e-reading software
- PC:
- Calibre
- Readium - plugin for the Chrome/Chromium browser
- EPUBReader - plugin for the Firefox browser
- iOS: iBooks
- Android: Aldiko, Moon+ Reader
sources of EPUBs
- Project Gutenberg
- Internet Archive
- Scribd
- Pirate: The Pirate Bay, Library Genesis, AAAAARG, Monoskop, IT Books
changing EPUBs
EPUBs have one advantage over other ebook formats, and other digital media such as audio, video, or images.
EPUBs are fully changeable.
The files contained in an EPUB are, with the exception of images, plain text files in either XHTML or CSS formats.
These files can be opened, edited and saved by any text editor (different from a word-processor).
Changes can be made to the:
- text content of the book
- the images used in the book
- the metadata
- the presentation layout: CSS (including fonts)
- Order of the book: the EPUB's spine
Once all changes are done, the source files can be bundled once more into a zip archive with the extension .epub
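The re-bundling step can be sketched in Python with the standard zipfile module. This is a minimal sketch, not the only way to do it: the function name pack_epub and its arguments are made up for this example. It writes the mimetype file first and uncompressed, which is what the EPUB container format expects.

```python
import os
import zipfile


def pack_epub(source_dir, epub_path):
    """Bundle the unpacked source files in source_dir into epub_path."""
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype file goes in first and is stored without compression.
        archive.write(
            os.path.join(source_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        # Every other file is added with its path relative to source_dir.
        for folder, _subfolders, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    archive.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Save the new .epub outside the folder you are packing, otherwise the half-written archive ends up inside itself.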
changing ePubs in Calibre
Calibre nowadays comes with an Edit book function that:
- exposes the ePub's source files
- allows the user to introduce changes to the ePub
- can also be used to create ePubs from scratch
Such ease in manipulating ebooks will probably fuel many future discussions on the nature of the book.
- is a version of a book still the same book?
- will there be any possibility to distinguish the original from its versions?
- can this feature be used to enrich and expand what the book is?
EPUB anatomy
a zip archive
An EPUB file is a ZIP archive, with the filename extension '.epub' instead of '.zip'.
"[EPUB] is a publication format, and as such it specifies and documents a host of things that publications need to include—content documents, style sheets, images, media, scripts, fonts, and more" (Garrish, 2013)
It is a web-site in a box.
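Because an EPUB is a ZIP archive, any ZIP tool can look inside the box. A small Python sketch (the function name list_epub_files is invented for this example) that lists the files an .epub contains:

```python
import zipfile


def list_epub_files(epub_file):
    """Return the names of all files inside an .epub (a path or a file object)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()
```

Calling list_epub_files("book.epub"), where book.epub is any ePub on your disk, should show entries such as mimetype and META-INF/container.xml.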
inside the .epub archive
META-INF folder
Contains only the container.xml file
This file points the reading devices to the content.opf file
container.xml source:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
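A reading system follows this pointer before it does anything else. The lookup can be sketched in Python as follows; find_package_path is a name chosen for this example, and the input is the raw bytes of container.xml.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml, as declared in the source above.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml):
    """Return the full-path of the first rootfile declared in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        return None
    return rootfile.get("full-path")
```

For the container.xml shown above, the function returns the path of content.opf inside the OEBPS folder.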
mimetype
A one-line file that indicates the type of application the .epub is:
application/epub+zip
It informs the reading device that the file it is reading is an application in epub+zip format.
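A quick way to test whether a file really announces itself as an EPUB is to read this one-line file back. The sketch below is only a rough check under the assumption that mimetype should be the first entry of the archive; the function name has_epub_mimetype is invented here.

```python
import zipfile


def has_epub_mimetype(epub_file):
    """Check that the first archive entry is 'mimetype' with the EPUB media type."""
    with zipfile.ZipFile(epub_file) as archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            return False
        content = archive.read("mimetype").decode("utf-8", errors="replace")
        # strip() makes this check lenient about a trailing newline.
        return content.strip() == "application/epub+zip"
```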
OEBPS folder
Container for all of the EPUB's contents.
This folder is not required, but it is commonly used in EPUBs.
Sub-folders can be created inside it: e.g. for images, fonts or content documents.
content.opf
Often called the package document, content.opf is the epicenter of every EPUB.
If an EPUB is a website in a box, content.opf is the file that organizes all the stuff in the box.
Skeleton of a content.opf file:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
<metadata>
...
</metadata>
<manifest>
...
</manifest>
<spine>
...
</spine>
<guide>
...
</guide>
</package>
The content.opf file is divided into different sections:
metadata
Contains all the metadata describing the publication in question, such as title, author, publisher, description, date, identifier (ISBN), language.
A rich set of metadata helps reading systems sort and display the EPUB.
In a larger context, such as ebook repositories, rich and accurate metadata is essential for the books to be discovered, read, and reused.
<metadata>
<dc:identifier id="epub-id-1">urn:isbn:978-90-822345-4-1</dc:identifier>
<meta refines="#epub-id-1" property="identifier-type" scheme="onix:codelist5">01</meta>
<dc:title id="epub-title-1">Work Title</dc:title>
<meta refines="#epub-title-1" property="title-type">main</meta>
<dc:publisher>Great Publisher</dc:publisher>
<dc:date id="epub-date">2014-12</dc:date>
<dc:language>en-US</dc:language>
<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>
<dc:contributor id="epub-contributor-1">Monty Cantsin</dc:contributor>
<meta refines="#epub-contributor-1" property="role" scheme="marc:relators">edt</meta>
<dc:contributor id="epub-contributor-5">Alan Smithee</dc:contributor>
<meta refines="#epub-contributor-5" property="role" scheme="marc:relators">dsr</meta>
<dc:rights>Creative Commons Attribution, NonCommercial, ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)</dc:rights>
<dc:subject>electronic publishing, e-publishing, ebooks</dc:subject>
<dc:description>Electronic publishing has become an essential medium for the field of contemporary arts and design.</dc:description>
</metadata>
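To see how a program might read this section, here is a small Python sketch that collects the dc: elements of a content.opf into a dictionary. The function name read_dc_metadata is made up for this example, and it expects the whole content.opf (package tag included) as bytes.

```python
import xml.etree.ElementTree as ET

# Namespaces as declared in the <package> tag of content.opf.
DC = "{http://purl.org/dc/elements/1.1/}"
OPF = "{http://www.idpf.org/2007/opf}"


def read_dc_metadata(opf_bytes):
    """Map each Dublin Core term found in <metadata> to a list of its values."""
    root = ET.fromstring(opf_bytes)
    metadata = root.find(OPF + "metadata")
    result = {}
    if metadata is None:
        return result
    for element in metadata:
        if element.tag.startswith(DC):
            term = element.tag[len(DC):]
            result.setdefault(term, []).append((element.text or "").strip())
    return result
```

Each term maps to a list because terms such as contributor can appear more than once.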
Dublin Core terms (advanced)
The metadata uses Dublin Core terms.
Dublin Core terms constitute a metadata scheme that aims to provide a simple and limited vocabulary for describing web resources, hence its adoption in EPUB.
<dc:creator id="epub-creator-0">Luther Blisset</dc:creator>
<meta refines="#epub-creator-0" property="role" scheme="marc:relators">aut</meta>
Inside the ePub, Dublin Core structures the metadata by using a DC term and giving it a value.
To produce more accurate metadata, the DC terms are refined with a <meta> element.
In the example above:
- the DC term is "creator" and has the value Luther Blisset
- the <meta scheme="marc:relators"> element specifies that this creator is an author: "aut"
- "creator" is one of the Dublin Core terms
- "aut" is one of the MARC Relator terms made available by the Library of Congress. In conjunction with Dublin Core terms, they provide a more detailed description.
The combination of the more generic DC terms, with the more specific MARC Relator terms can provide a very rich, detailed, and yet generic set of title level metadata.
Common DC terms used are: creator, title, identifier, language, description, subject, publisher
Note: in order to use the Dublin Core terms, their namespace must be declared, either in the package tag:
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="epub-id-1" version="3.0">
Or the metadata tag:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
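The pairing of a DC term with the <meta> that refines it can be followed in code. The Python sketch below, with the invented function name creators_with_roles, matches each creator or contributor with the MARC relator code of the <meta> whose refines attribute points at its id.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
OPF = "{http://www.idpf.org/2007/opf}"


def creators_with_roles(opf_bytes):
    """Return (name, DC term, role code) for every creator and contributor."""
    root = ET.fromstring(opf_bytes)
    metadata = root.find(OPF + "metadata")
    # Collect the role refinements, keyed by the id they refine.
    roles = {}
    for meta in metadata.findall(OPF + "meta"):
        if meta.get("property") == "role" and meta.get("refines"):
            roles[meta.get("refines").lstrip("#")] = (meta.text or "").strip()
    people = []
    for term in ("creator", "contributor"):
        for element in metadata.findall(DC + term):
            name = (element.text or "").strip()
            people.append((name, term, roles.get(element.get("id"))))
    return people
```

A person without a refining <meta> gets None as role.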
manifest
All the resources (files) used in the EPUB are declared in the manifest.
If a resource is not in the manifest, it won't be present in the EPUB.
<manifest>
<item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
<item href="stylesheet.css" id="style" media-type="text/css" />
<item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
<item href="cover.xhtml" id="cover_xhtml" media-type="application/xhtml+xml" />
<item href="ch001.xhtml" id="ch001_xhtml" media-type="application/xhtml+xml" />
<item href="ch002.xhtml" id="ch002_xhtml" media-type="application/xhtml+xml" />
<item href="media/file0.png" id="file0_png" media-type="image/png" />
<item href="UbuntuMono-B.ttf" id="UbuntuMono-B_ttf" media-type="application/x-font-truetype" />
</manifest>
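One practical use of the manifest is spotting files that sit in the archive but were never declared. The following Python sketch does that under some simplifying assumptions: hrefs are taken literally (no percent-decoding), and the mimetype file, the META-INF folder and the package document itself are ignored. The function names are invented for this example.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def manifest_hrefs(opf_bytes):
    """Map each manifest item id to its href."""
    root = ET.fromstring(opf_bytes)
    return {item.get("id"): item.get("href") for item in root.iter(OPF + "item")}


def undeclared_files(archive_names, opf_path, opf_bytes):
    """List archive entries that no manifest item points to."""
    opf_dir = posixpath.dirname(opf_path)
    # Manifest hrefs are relative to the folder that holds content.opf.
    declared = {
        posixpath.normpath(posixpath.join(opf_dir, href))
        for href in manifest_hrefs(opf_bytes).values()
    }
    return sorted(
        name
        for name in archive_names
        if name not in declared
        and name not in ("mimetype", opf_path)
        and not name.startswith("META-INF/")
        and not name.endswith("/")
    )
```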
spine
Stipulates the order of the different EPUB content documents (xhtml or SVG files).
<spine toc="ncx">
<itemref idref="cover_xhtml" linear="yes" />
<itemref idref="title_page_xhtml" linear="yes" />
<itemref idref="nav" linear="no" />
<itemref idref="ch001_xhtml" />
<itemref idref="ch002_xhtml" />
</spine>
Note: 'linear="no"' marks that content document as auxiliary: it is left out of the default reading order, although links can still lead to it.
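The spine only lists ids; the file names come from the manifest. A Python sketch of how the reading order could be resolved (reading_order is a name chosen for this example, and itemrefs whose id is missing from the manifest are simply skipped):

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def reading_order(opf_bytes, include_non_linear=False):
    """Return the hrefs of the content documents in spine order."""
    root = ET.fromstring(opf_bytes)
    hrefs = {item.get("id"): item.get("href") for item in root.iter(OPF + "item")}
    order = []
    for itemref in root.iter(OPF + "itemref"):
        # linear defaults to "yes" when the attribute is absent.
        if itemref.get("linear", "yes") == "no" and not include_non_linear:
            continue
        href = hrefs.get(itemref.get("idref"))
        if href is not None:
            order.append(href)
    return order
```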
guide
The guide is deprecated in EPUB 3, but you might still come across it.
It often points to the nav file.
The navigation document is a hyperlinked table of contents that allows the reader to quickly reach the different sections of an ePub.
In EPUB 2 the navigation document was toc.ncx, written in XML.
EPUB 3 uses a file, usually named nav.xhtml, which is a slightly simpler document based on HTML, with common tags such as <ol>, <li>, <nav>.
In content.opf's manifest the navigation file is referred to as:
<item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav" />
For backward compatibility reasons (most ereaders are only prepared to read EPUB 2) the two files might be present in the same ePub publication.
A nav.xhtml:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<title>From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</title>
<link rel="stylesheet" type="text/css" href="stylesheet.css" />
</head>
<body>
<nav epub:type="toc">
<h1 id="toc-title">From Print to Ebooks: a Hybrid Publishing Toolkit for the Arts</h1>
<ol class="toc">
<li id="toc-li-1">
<a href="ch001.xhtml">Colophon</a>
</li>
<li id="toc-li-2">
<a href="ch002.xhtml">1 Introduction</a>
<ol class="toc">
<li id="toc-li-3">
<a href="ch003.xhtml">Industry promises vs. reality</a>
</li>
<li id="toc-li-4">
<a href="ch004.xhtml">What this Toolkit provides</a>
</li>
</ol>
</li>
<li id="toc-li-8">
<a href="ch008.xhtml">2 The basics</a>
<ol class="toc">
<li id="toc-li-9">
<a href="ch009.xhtml">Layout and structure of a text</a>
</li>
</ol>
</li>
</ol>
</nav>
</body>
</html>
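Since the table of contents is just nested <ol> and <li> elements, it is easy to walk. The Python sketch below (read_toc is an invented name) flattens a well-formed nav.xhtml into a list of (depth, title, href) entries.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def read_toc(nav_bytes):
    """Return (depth, title, href) for every entry of the toc nav."""
    root = ET.fromstring(nav_bytes)
    entries = []

    def walk(ol, depth):
        for li in ol.findall(XHTML + "li"):
            link = li.find(XHTML + "a")
            if link is not None:
                title = "".join(link.itertext()).strip()
                entries.append((depth, title, link.get("href")))
            # A nested <ol> inside an <li> holds the sub-sections.
            for nested in li.findall(XHTML + "ol"):
                walk(nested, depth + 1)

    for nav in root.iter(XHTML + "nav"):
        if "toc" in (nav.get(EPUB_TYPE) or "").split():
            for ol in nav.findall(XHTML + "ol"):
                walk(ol, 0)
    return entries
```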
content document .xhtml
Content documents are the place-holders for text and images in an ePub.
In EPUB 3, content documents are XHTML5 (HTML5 + XML), with the .xhtml file extension.
What you find in an ePub content document is essentially what you find behind a webpage: HTML code.
EPUB 3 allows the use of HTML5, including elements such as <section>, <article>, <nav>, <header>, <footer>, <aside>, <video>, <audio>, and Mathematical Markup Language (MathML). However, keep in mind that some of these tags won't work with many current ereaders, which are only prepared to render EPUB 2 ePubs.
As a rule of thumb, it is good to chunk content into several documents, one for each section or subsection of a book.
The reason for this is that ereaders are often slow to render content. Dividing the content into smaller pieces makes their job easier.
XML requirements
As content documents are HTML with added XML, they obey the stricter XML rules:
- Every element has an opening and closing tag.
- Empty tags such as <br> must be self-closing: <br/>
- Element and attribute names are case-sensitive and always lowercase.
- Namespaces must be declared (for embedded MathML, SVG, EPUB elements and attributes, etc.).
- Ampersands (&) must be escaped as &amp;.
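These rules can be tried out with any XML parser. The Python sketch below uses the standard library parser to tell whether a fragment is well-formed XML; it checks only well-formedness, not whether the tags are valid XHTML. The function name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>line<br/>break</p>"))      # self-closing empty tag
print(is_well_formed("<p>line<br>break</p>"))       # <br> is never closed
print(is_well_formed("<p>Fish &amp; chips</p>"))    # escaped ampersand
print(is_well_formed("<p>Fish & chips</p>"))        # bare ampersand
```

The first and third fragments follow the rules above; the second and fourth break them, so the parser rejects them.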
Example content document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<meta charset="utf-8" />
<meta content="pandoc" name="generator" />
<title>Acknowledgments</title>
<link href="stylesheet.css" rel="stylesheet" type="text/css" />
</head>
<body>
<section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">
<h1>Acknowledgments</h1>
<p>In fall 2012, as part of my graduate studies, I began studying Instagram user self-portraits. The users whose images I discuss in this notebook may or may not have knowledge that I critiqued their selfies, as I did not interview users about their images. My study is assumptive, and my work is influenced by the research of Marshall McLuhan and Vilém Flusser. Thank you to the users who kindly gave me permission to republish their images.</p>
</section>
</body>
</html>
semantic inflection
The epub namespace declared in the <html> tag makes semantic inflection possible: in other words, attaching to an element a meaning about its purpose or nature. Semantic inflections use the epub:type attribute to describe the content contained within that element. The terms that can be used in these inflections can be found at http://www.idpf.org/epub/vocab/structure/
In the previous example epub:type appears in <section class="level1" epub:type="acknowledgments frontmatter" id="acknowledgments">. It informs us, and the reading system, that the current section belongs to the book's frontmatter and constitutes the acknowledgments.
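A program can read these inflections just as a reading system would. Here is a short Python sketch, with the invented function name semantic_inflections, that lists every element carrying an epub:type attribute together with its terms.

```python
import xml.etree.ElementTree as ET

# epub:type lives in the namespace declared as xmlns:epub in the <html> tag.
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def semantic_inflections(xhtml_bytes):
    """Return (tag name, list of epub:type terms) for each inflected element."""
    root = ET.fromstring(xhtml_bytes)
    found = []
    for element in root.iter():
        value = element.get(EPUB_TYPE)
        if value:
            tag_name = element.tag.split("}")[-1]
            found.append((tag_name, value.split()))
    return found
```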
images
Images can be either raster (jpg or png) or SVG; however, it is safer to opt for a raster image.
They need to be included in the manifest of the package document (.opf):
<manifest>
<item href="media/file0.png" id="file0_png" media-type="image/png" />
<item id="file1_jpg" href="media/file1.jpg" media-type="image/jpeg" />
...
</manifest>
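Writing manifest entries by hand gets tedious when a book has many images. The Python sketch below builds an <item> element from a file path. It is only an illustration: the function name manifest_item, the id scheme (file name with dots turned into underscores, as in the examples above) and the small media-type table are assumptions of this sketch, and ids built this way are not guaranteed to be valid or unique.

```python
import posixpath
import xml.etree.ElementTree as ET

# Media types taken from the manifest examples in these notes.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".css": "text/css",
    ".xhtml": "application/xhtml+xml",
}


def manifest_item(href):
    """Build a manifest <item> element for the file at href."""
    name = posixpath.basename(href)
    extension = posixpath.splitext(name)[1].lower()
    return ET.Element("item", {
        "href": href,
        "id": name.replace(".", "_"),
        "media-type": MEDIA_TYPES[extension],
    })


# Print the element as it could be pasted into the manifest.
print(ET.tostring(manifest_item("media/file0.png"), encoding="unicode"))
```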
cover
A cover is essentially an image, either raster (jpg or png) or SVG; however, it is safer to opt for a raster image.
Within the package document (.opf), the cover is declared in the manifest as an item with properties="cover-image", so that reading systems identify that image as the cover of the ePub.
<item id="cover-img" href="cover.jpg" media-type="image/jpeg" properties="cover-image"/>
If you want the cover to also appear inline, when you start the book, make sure to include a content document, just with the image:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<title></title>
<link rel="stylesheet" type="text/css" href="stylesheet.css" />
</head>
<body>
<div id="cover-image">
<img src="media/cover.png" alt="cover image" />
</div>
</body>
</html>
style
If you want to use a CSS style-sheet, make sure each content document links to the sheet.
<link rel="stylesheet" type="text/css" href="stylesheet.css" />
fonts
WOFF and OTF fonts can also be used, although few ereaders will interpret them correctly.
To include custom fonts, declare them in the manifest of the package document (.opf), as resources used in the publication:
<manifest>
<item href="OpenSans-Regular.ttf" id="OpenSans-Regular_ttf" media-type="application/x-font-truetype" />
<item href="OpenSans-LightItalic.ttf" id="OpenSans-LightItalic_ttf" media-type="application/x-font-truetype" />
...
</manifest>
experiment
Find interesting written or visual content, which you consider suitable for an ebook.
The content can be in any digital form - word document, HTML webpage, plain-text, etc.
Examples of interesting content re-contextualization:
- Kenneth Goldsmith Uncreative Writing: Transcribing Project Runway chat-room log transcribing the action of Project Runway;
- Report from the Desert, published by Grey Scale Press: a book made from an abandoned blog
- Silvio Lorusso and Sebastian Schmieg: 56 Broken Kindle Screens
Bibliography
ePubs and designers
Is there anything for designers in the EPUB format?
Rather than being similar to designing a paper book, the design process of an EPUB more closely resembles the development of a webpage.
- content must be reflowable - must adapt itself to the dimensions of the reading device.
- design is no longer specific to the physical characteristics of the container – ereader –, but generic, in the sense that it has to adapt to several containers.
- when the rendering changes from container to container, the structuring of the text cannot rely on visual structuring. Structure has to be semantic: the document has to be semantically tagged so that the text's different elements – headings, footnotes, body text – are explicit. If that is the case, even if an e-reader changes the rendering of your (visual) design, the semantic tagging (or design) will persist. The ereader will render a heading as a heading, not simply as a piece of body text.