Difference between revisions of "Research/Web-to-print"

From Publication Station

Revision as of 18:23, 10 June 2015

This page is dedicated to research on web-to-print approaches and workflows. Deadline for research: June 26

Starting points

  • Goal: transform reflowable (markup-based) digital publications into fixed-layout PDF.
  • Case study: Beyond Social. How can web-to-print (or wiki-to-print) be applied to Beyond Social?
  • Requisites:
    • an online workflow that can run on a server
    • support for page numbers
    • gather all articles into one document
  • Advanced features:
    • impositions - how can impositions, instead of a simple stack of pages, be integrated into this workflow?
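
To make the imposition question more concrete, here is a minimal sketch of the page ordering for one common scheme, a saddle-stitched booklet with two pages per sheet side. The function name booklet_order is hypothetical, and the sketch only computes the order; rearranging the actual PDF pages would still need a separate tool.

```python
def booklet_order(page_count):
    """Return (left, right) page pairs, one pair per sheet side.

    Assumption: a saddle-stitched booklet, two pages per sheet side.
    Blank pages (None) pad the count up to a multiple of four.
    """
    padded = -(-page_count // 4) * 4  # round up to a multiple of 4
    pages = list(range(1, page_count + 1)) + [None] * (padded - page_count)
    order = []
    low, high = 0, padded - 1
    while low < high:
        # Front of the sheet: last remaining page left, first remaining page right.
        order.append((pages[high], pages[low]))
        # Back of the sheet: the next page left, the next-to-last page right.
        order.append((pages[low + 1], pages[high - 1]))
        low += 2
        high -= 2
    return order


if __name__ == "__main__":
    for side in booklet_order(8):
        print(side)
```

For an 8-page document this pairs pages 8 and 1 on the first sheet side, 2 and 7 on its back, then 6 and 3, then 4 and 5.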

Possible strategies - software

  • LaTeX - document preparation system focused on the creation of PDFs; uses its own markup, which is supported by Pandoc.
  • wkhtmltopdf - HTML to PDF converter based on WebKit.
  • WeasyPrint - Python visual rendering engine for HTML and CSS that can export to PDF
  • Mediawiki Collection Extension - the same system used by Wikipedia to create books in PDF and Epub formats.

assessment

For each strategy try to point out:

  • summary of the workflow
  • example prototype
  • advantages
  • disadvantages
  • how can it be integrated into teaching

Strategies

List your researched strategy below, with a bit of documentation that points others in the right direction if they want to try it.

LaTeX

LaTeX is a typesetting/document preparation language, focused on producing typographically correct page-based documents as PDF.

positive aspects

  • LaTeX is a markup language, in many ways similar to HTML or Markdown, and Pandoc offers good support for it, converting well from other markups.
  • Can produce quality PDFs with support for page numbers, hyphenation, bibliography, references and hyperlinks.

LaTeX sample:

\section{Tools}                      
We organized the work in two spaces: a {\bf wiki} and a {\bf website}. The \href{http://beyond-social.org/wiki/index.php/Main_Page}{wiki} was established as the editorial space, while the \href{http://beyond-social.org/}{website}
  • Can be set to produce more experimental and generative outputs. (See works by Lafkon studio for an idea)
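
The Pandoc conversion mentioned above could be scripted roughly as follows. This is a sketch under assumptions: the file names article.wiki and article.tex are placeholders, the source is assumed to be MediaWiki markup, and Pandoc must be installed and on the PATH.

```python
import subprocess


def pandoc_command(source, output, source_format="mediawiki"):
    """Build a Pandoc call that converts `source` into a standalone LaTeX file."""
    return [
        "pandoc",
        "--from", source_format,   # input markup, e.g. mediawiki, html, markdown
        "--to", "latex",           # output markup
        "--standalone",            # emit a complete document, not just a fragment
        "--output", output,
        source,
    ]


def convert(source, output, source_format="mediawiki"):
    """Run the conversion; raises CalledProcessError if Pandoc fails."""
    subprocess.run(pandoc_command(source, output, source_format), check=True)


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(pandoc_command("article.wiki", "article.tex")))
```

The resulting .tex file can then be edited by hand or compiled to PDF with a LaTeX engine.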

negative aspects

  • The PDFs produced look academic by default, although this can be changed
  • Use is outmoded and mostly restricted to academia
  • Styling is defined by packages imported into the document, an approach very different from, and incompatible with, CSS. Styling a LaTeX document:
\documentclass[10pt, a4paper]{book} % Document form: book, size: A4, font-size                                                                                                
\usepackage[hmargin=3.0cm, vmargin=2.0cm]{geometry} %document margins

sample output

Article.pdf

final remarks

Although LaTeX can produce very interesting results and can be easily integrated into the current workflow, which is centered around wikis, Pandoc, HTML and CSS, it is a difficult tool to work with, let alone to teach. It might confuse students and contradict our approach to setting up hybrid publishing workflows, which has been based on essential web languages (HTML and CSS) and simple tools (wikis and Pandoc). The advice is to leave LaTeX alone, although it might be an interesting avenue to explore for more experimental projects.


Weasyprint

Can be used as a Python library or as a standalone program. So far, the remarks below refer to its use as a standalone program.

positive aspects

  • Uses HTML and CSS to lay out the PDF, which means a smoother learning curve from web to print.
  • Supports features like page size, page numbers and custom typography, allowing the production of a PDF with a high level of control in terms of design.
  • Very simple, easy-to-understand syntax that does not require proficiency in the command line. Example:
    weasyprint http://beyond-social.org/wiki/index.php/Hybrid_Publishing beyondsocial.pdf -s style.css
    

General form: weasyprint [source html document] [pdf result] -s [css file]. The -s flag adds a CSS file for the print version, applied alongside the CSS rules used in the web version.
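
Since the workflow has to run on a server, the same command can be driven from a script. The sketch below writes a small print stylesheet and then calls the weasyprint program. The file name print.css, the A5 page size and the margin are assumptions chosen for illustration; the page-number rule uses the standard CSS paged-media margin box and page counter.

```python
import subprocess
from pathlib import Path

# Hypothetical print stylesheet: page size, margins and a page number
# in the bottom margin of every page.
PRINT_CSS = """\
@page {
    size: A5;
    margin: 2cm;
    @bottom-center { content: counter(page); }
}
"""


def weasyprint_command(source, output, stylesheet):
    """Build the command line: weasyprint [source] [pdf result] -s [css file]."""
    return ["weasyprint", source, output, "-s", stylesheet]


def render(source, output, stylesheet="print.css"):
    """Write the stylesheet to disk and run weasyprint (must be installed)."""
    Path(stylesheet).write_text(PRINT_CSS, encoding="utf-8")
    subprocess.run(weasyprint_command(source, output, stylesheet), check=True)


if __name__ == "__main__":
    render(
        "http://beyond-social.org/wiki/index.php/Hybrid_Publishing",
        "beyondsocial.pdf",
    )
```

Calling the program through subprocess keeps the script close to the command shown above; the Python library interface would be an alternative, but it has not been tried here.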

negative aspects

  • Can be difficult to install because of its dependencies. On Debian no issues were experienced. On Mac OS X, the installation has not yet succeeded.

sample output

Sample Beyondsocial.pdf

wkhtmltopdf

Wkhtmltopdf is an open source project very similar to weasyprint. Because it is so similar, we will mostly discuss the differences between the two.

positive aspects

  • wkhtmltopdf is based on the WebKit rendering engine, which eases bug tracking and improves support
  • wkhtmltopdf is very easy to install.
  • wkhtmltopdf can run JavaScript.

negative aspects

  • WebKit, and thus wkhtmltopdf, has not yet implemented as many advanced CSS printing features as weasyprint has.

sample output

"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" --print-media-type container.html wkhtmltopdf_book.pdf

Wkhtmltopdf book.pdf
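
The command above could be extended to cover two of the requisites, page numbers and gathering several articles into one document. This is an untested sketch under assumptions: the input file names are placeholders, wkhtmltopdf is assumed to be on the PATH, and the footer option generally requires a wkhtmltopdf build with patched Qt.

```python
import subprocess


def wkhtmltopdf_command(sources, output, executable="wkhtmltopdf"):
    """Build a wkhtmltopdf call that merges several HTML files into one PDF."""
    cmd = [
        executable,
        "--print-media-type",        # use the print media CSS rules, as in the command above
        "--footer-center", "[page]", # page number centered in the footer
    ]
    cmd += list(sources)             # each input becomes part of the same PDF
    cmd.append(output)               # the output file always comes last
    return cmd


def render(sources, output, executable="wkhtmltopdf"):
    """Run the conversion; raises CalledProcessError if wkhtmltopdf fails."""
    subprocess.run(wkhtmltopdf_command(sources, output, executable), check=True)


if __name__ == "__main__":
    # Placeholder article files, for illustration only.
    print(wkhtmltopdf_command(["article1.html", "article2.html"], "wkhtmltopdf_book.pdf"))
```

On Windows, the executable argument can be set to the full path used in the command above.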