Difference between revisions of "Research/all-in-one publishing"
Arjensuijker (talk | contribs) |
Revision as of 14:47, 18 June 2015
Introduction
The core of our research is developing a workflow in which HTML and CSS are used as the source for a publication. This publication is then available for print, as an epub and as a responsive website. This approach was chosen because it could potentially improve the publication workflow for the following reasons:
- Both the design and the content can be updated at any time, fully independently of each other.
- Every update in content or design is automatically exported to every available publication medium.
- HTML and CSS are very widespread transparent file formats that have been used for decades without much change, and will remain this way for years to come. This makes them much more suitable for digital archiving than most proprietary formats.
- For largely the same reasons, the resulting publication is very suitable for (collaborative) reuse and redesign.
About the process
For this research, we chose to take an existing publication and explore how it could be rebuilt from scratch using our newly developed workflow. The original publication was designed only for print. Our goal was to replicate this design as closely as possible in our new print version, and to adapt the design to fit the other media as well.
The global process consisted of the following steps:
- The original Word documents with the contents of the publication were converted to HTML files using the Pandoc conversion software.
- These HTML files were 'cleaned up', deleting everything design-related and leaving only structural information.
- Additional HTML was added to improve semantic value and facilitate CSS styling.
- CSS stylesheets were created for every output format.
- The outputs were created using Pandoc (for epub), Prince (for pdf), Chrome (for responsive web) and Firefox (for direct browser print) and checked for consistency.
Of course the process wasn't as linear as it appears here. It is better described as iterative, looping through the steps continuously.
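As a rough illustration of how the conversion steps above could be scripted, here is a minimal Python sketch. It is not the script we used: the file names (`build`, `epub.css`, `print.css`, `book.epub`, `book.pdf`) and the helper names are assumptions made for the example. It relies only on Pandoc's `-o` and `--css` options and Prince's `-s` and `-o` options, and it assumes both tools are installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(docx: str, workdir: str = "build") -> dict:
    """Return the command line for each conversion step without running it.

    All output file names are hypothetical placeholders.
    """
    out = Path(workdir)
    html = (out / (Path(docx).stem + ".html")).as_posix()
    return {
        # Step 1: Word document to HTML with Pandoc.
        "word_to_html": ["pandoc", docx, "-o", html],
        # Step 5a: HTML to epub with Pandoc, using one merged stylesheet.
        "epub": ["pandoc", html, "--css", (out / "epub.css").as_posix(),
                 "-o", (out / "book.epub").as_posix()],
        # Step 5b: HTML to PDF with Prince and the print stylesheet.
        "pdf": ["prince", html, "-s", (out / "print.css").as_posix(),
                "-o", (out / "book.pdf").as_posix()],
    }


def run_all(docx: str, workdir: str = "build") -> None:
    """Run every step in order; stops at the first failing command."""
    Path(workdir).mkdir(exist_ok=True)
    for command in build_commands(docx, workdir).values():
        subprocess.run(command, check=True)
```

The cleaning and semantic markup steps happen by hand between the first and the later commands, which is one reason the process is iterative in practice rather than a single scripted run.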
Challenges
Because we were off the beaten track for the duration of this research, we encountered many challenges that didn't have readily available solutions. We will discuss the challenges for every step of our process:
From Word to HTML
At this stage, we encountered very few problems. Pandoc proved easy to install and worked flawlessly.
Cleaning up the HTML
In Word, there are basically two methods to style a document: using styles, or manually changing the font properties for every heading and paragraph. The first method results in HTML files with a clear structure, because the styles can be translated directly to HTML tags. The second method creates very messy HTML files that require a lot of cleaning, because the HTML is littered with unnecessary styling information. The documents that we used as the source were somewhere in between: most styling was done correctly, but a lot of manual cleaning was still needed. Additionally, most figures had to be extracted manually.
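Part of this cleanup can in principle be automated. The sketch below is an illustration only, not the procedure we followed (our cleaning was largely manual). It uses Python's standard `html.parser` module to drop presentational attributes and wrapper tags while keeping the structural tags. The sets `DROP_TAGS`, `DROP_ATTRS` and `VOID_TAGS` and the function `clean_html` are assumptions chosen for the example; comments and the doctype are silently discarded.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical choices: which tags and attributes count as "design-related".
DROP_TAGS = {"span", "font"}
DROP_ATTRS = {"style", "class", "align", "lang"}
# Tags without a closing tag; no end tag is written for them.
VOID_TAGS = {"br", "hr", "img"}


class Cleaner(HTMLParser):
    """Re-emit the HTML it is fed, minus styling tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            return  # drop the wrapper tag but keep its text content
        kept = ""
        for name, value in attrs:
            if name not in DROP_ATTRS:
                kept += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.parts.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS and tag not in VOID_TAGS:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape.
        self.parts.append(escape(data, quote=False))


def clean_html(source: str) -> str:
    cleaner = Cleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.parts)
```

For example, `clean_html('<p style="color:red"><span class="x">Hi</span> <b>there</b></p>')` returns `<p>Hi <b>there</b></p>`. A script like this cannot decide whether a bold paragraph was meant to be a heading, so manual review remains necessary.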
Adding semantic structure and classes
Before we started adding this information to the HTML, we walked through the publication. Together we decided what the correct semantic structure would be and added this to the file as annotations. We could then work in parallel, individually implementing the structure in the HTML files. An issue we encountered while defining the structure is that most semantic HTML elements are aimed at the web, so deciding on a suitable semantic structure for both print and web was often a matter of compromise. Because we knew what design we were aiming at, it was easier for us to decide which elements should have classes and which should not. If the design is not ready yet, this may be more of a challenge, and the structure will probably change during the course of the project.
Creating the CSS
This was by far the most challenging part of this project, so we will discuss the challenges separately for every output format.
Web CSS
For the website, the whole design of the book had to be reimagined. Pages do not exist, information doesn't have to be linear and many interactive possibilities open up. We decided to keep the website largely linear because this seemed to fit the content well. Some interactivity was added: the index was redesigned to become a navigation menu and the footnotes were made into tooltips. Because the font (Metric) was not licensed for online use, we had to find a font that was similar. This proved to be very difficult, so we had to compromise. We tried to avoid using JavaScript where possible, because it can decrease compatibility and increase the complexity of the code. The only exception we made was for the footnotes, because this was simply not possible in CSS. Numbering of figures, footnotes, pages and tables was also done in CSS. This turned out to work quite well, although browser support is somewhat limited.
Responsive web CSS
Instead of creating a separate mobile output, we made the regular website responsive to screen size. The challenges we encountered while doing this were not much different from the challenges that come with all responsive websites. Since these challenges are already well-documented, we will not discuss these further.
Epub CSS
Epub stylesheets proved to be very badly documented. Every device has its own interpretation of CSS, and almost all of them are quite limited. Advanced CSS such as numbering works on almost none of the available devices.
We used Pandoc to create the epub, and one limitation was that Pandoc can only add one stylesheet. We had divided the styles into a global stylesheet, a print stylesheet and a specific epub stylesheet, all of which should be used for the epub. To solve this, we used a command-line utility to merge these stylesheets and save them as one, so that we could pass the resulting CSS to Pandoc.
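The merge itself is a plain concatenation in a fixed order, so that later stylesheets override earlier ones. The following Python sketch shows the idea; it is not the utility we used, and the function names and the file names in the usage note are assumptions for the example.

```python
from pathlib import Path


def merge_css(named_sources) -> str:
    """Concatenate (name, css_text) pairs in order, labelling each part.

    Order matters: rules from later stylesheets win over earlier ones
    when their specificity is equal.
    """
    chunks = []
    for name, css_text in named_sources:
        chunks.append("/* --- " + name + " --- */\n" + css_text.strip() + "\n")
    return "\n".join(chunks)


def merge_css_files(sources, target) -> None:
    """Read the stylesheets in `sources` and write one merged file."""
    named = [(Path(s).name, Path(s).read_text(encoding="utf-8")) for s in sources]
    Path(target).write_text(merge_css(named), encoding="utf-8")
```

A call such as `merge_css_files(["global.css", "print.css", "epub.css"], "merged.css")` would produce the single file that is then passed to Pandoc. Note that this simple approach breaks stylesheets that use `@import`, because such rules are only valid at the top of a stylesheet.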
Print CSS
Because one of our team members had had good experiences with Chrome's print functionality in the past, this was the browser we used for testing. This turned out to be a mistake, because Chrome's printing functionality has deteriorated for some time now (possibly as a result of its change of rendering engine). Many hours were wasted trying to get the design to work in Chrome, after which we decided to focus on Firefox. This proved to be much easier, and most CSS worked correctly.
Prince CSS
Collaboration
We worked in parallel on the different outputs, which worked well but sometimes caused small hiccups. The most common problem was caused by the fact that we used multiple CSS stylesheets for every output. For example, when one person was working on the print stylesheet, this could accidentally break the PDF export.
Conclusion
Overall, this workflow proved to work well. The process takes some getting used to, but it already appears to be a viable alternative to traditional workflows. The biggest obstacle is probably that every member of the development team needs a fair bit of technical skill together with an eye for design. If such a team is available, this process can streamline the entire publication process.
Future directions
Because this process is based on HTML and CSS, it can be combined with several existing technologies and platforms. For example, in the future we plan to use a wiki as the source. Wikis allow for the collaborative creation of content, which can easily and automatically be converted to HTML. This HTML can then serve as the source file for the publication process that we just described.
Another possible future direction is research into the possibility of developing a WYSIWYG editor that strongly enforces a semantically and structurally correct layout. This would vastly simplify the first two steps of our process, and ideally render them obsolete.
A preview of the website can be found here: [1]
A preview of the epub can be found here: File:All-in-one-publishing-epub.zip