Difference between revisions of "Research/Web-to-print"
Revision as of 11:25, 16 June 2015
This page is dedicated to research on web-to-print approaches and workflows. Deadline for research: June 26
Starting points
- Goal: transform reflowable (markup-based) digital publications into fixed-layout PDF.
- Case study: Beyond Social. How can web-to-print (or wiki-to-print) be applied to Beyond Social?
- Requirements:
  - an online workflow that can be run on a server
  - support for page numbers
  - gather all articles into one document
- Advanced features:
  - impositions: how can impositions, instead of a simple stack of pages, be integrated into this workflow?
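
To make the imposition question concrete, here is a minimal sketch of the page order for a saddle-stitched booklet, one common kind of imposition. It is not part of any of the tools researched below; the function name `booklet_imposition` and the choice of a two-pages-per-side booklet are assumptions made for illustration only.

```python
def booklet_imposition(page_count):
    """Return the sheets of a saddle-stitched booklet.

    Each sheet is a (front, back) pair, and each side is a
    (left page, right page) pair. The page count is padded up to a
    multiple of 4; padding pages are reported as None (blank).
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")

    # Every folded sheet carries 4 pages, so round up to a multiple of 4.
    total = -(-page_count // 4) * 4

    def label(number):
        # Pages beyond the real page count are blank.
        return number if number <= page_count else None

    sheets = []
    for i in range(total // 4):
        front = (label(total - 2 * i), label(2 * i + 1))
        back = (label(2 * i + 2), label(total - 2 * i - 1))
        sheets.append((front, back))
    return sheets


if __name__ == "__main__":
    for number, (front, back) in enumerate(booklet_imposition(8), start=1):
        print("sheet", number, "front", front, "back", back)
```

A page list computed this way would still have to be applied to the PDF by a separate tool; which tool fits this workflow is left open here.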
Possible strategies - software
- LaTeX - Document preparation system, focused on the creation of PDFs, uses its own markup, supported by Pandoc.
- wkhtmltopdf - HTML to PDF converter based on WebKit.
- WeasyPrint - Python, visual rendering engine for HTML and CSS that can export to PDF
- Mediawiki Collection Extension - the same system used by Wikipedia to create books in PDF and Epub formats.
assessment
For each strategy try to point out:
- summary of the workflow
- example prototype
- advantages
- disadvantages
- how can it be integrated into teaching
Strategies
List your researched strategy below, with a bit of documentation that points others in the right direction if they want to try it
LaTeX
LaTeX is a typesetting/document preparation language, focused on producing typographically correct page-based documents as PDF.
positive aspects
- LaTeX is a markup language, in many ways similar to HTML or Markdown, and Pandoc offers good support for it, converting well from other markups.
- Can produce quality PDFs, with support for page numbers, hyphenation, bibliography, references and hyperlinks
LaTeX sample:
\section{Tools}
We organized the work in two spaces: a {\bf wiki} and a {\bf website}. The \href{http://beyond-social.org/wiki/index.php/Main_Page}{wiki} was established as the editorial space, while the \href{http://beyond-social.org/}{website}
- Can be set to produce more experimental and generative outputs. (See works by Lafkon studio for an idea)
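
The Pandoc conversion mentioned above could be scripted on a server along the following lines. This is a minimal sketch, not a tested part of the workflow: the file names `article.wiki` and `article.tex` and the helper names are hypothetical, and it assumes the `pandoc` executable is installed and on the PATH.

```python
import subprocess


def pandoc_command(source, output, source_format="mediawiki"):
    """Build the Pandoc command line that converts wiki markup to LaTeX."""
    return [
        "pandoc",
        "--from", source_format,   # input markup
        "--to", "latex",           # output markup
        "--standalone",            # emit a complete document, including the preamble
        "--output", output,
        source,
    ]


def convert(source, output):
    """Run the conversion; raises CalledProcessError if Pandoc fails."""
    subprocess.run(pandoc_command(source, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    print(" ".join(pandoc_command("article.wiki", "article.tex")))
```

The resulting `.tex` file still has to be compiled to PDF with a LaTeX engine in a separate step.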
negative aspects
- The PDFs produced look academic by default, although this can be changed
- Use is outmoded and mostly restricted to academia
- Styling is defined by packages imported into the document, which is very different from, and incompatible with, CSS. Styling a LaTeX document:
\documentclass[10pt, a4paper]{book} % document class: book, paper size: A4, base font size: 10pt
\usepackage[hmargin=3.0cm, vmargin=2.0cm]{geometry} % document margins
sample output
final remarks
Although LaTeX can be set up to produce very interesting results and can easily be integrated into the current workflow, centered around wikis, Pandoc, HTML and CSS, it is a difficult tool to work with, let alone to teach. It might confuse students and contradict our approach to setting up hybrid publishing workflows, which has been based on essential web languages (HTML and CSS) and simple tools (wikis and Pandoc). The advice is to leave LaTeX alone, although it might be an interesting avenue to explore for more experimental projects.
WeasyPrint
WeasyPrint is a visual rendering engine for HTML and CSS that can export to PDF. It aims to support web standards for printing. WeasyPrint is free software made available under a BSD license. [1]
It can be used as a Python library or as a standalone program. So far, the remarks below refer only to its use as a standalone program.
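
For comparison, use as a Python library might look roughly like the sketch below. It has not been tried as part of this research; the function name `render_pdf` is hypothetical, and the sketch assumes WeasyPrint is installed and that its `HTML` and `CSS` classes and the `write_pdf` method behave as described in the WeasyPrint documentation.

```python
from pathlib import Path


def render_pdf(source_url, output_path, stylesheet_path=None):
    """Render a web page to PDF, optionally adding a print stylesheet."""
    # Imported here so that this module can be loaded even on machines
    # where WeasyPrint (and its dependencies) is not installed.
    from weasyprint import CSS, HTML

    stylesheets = None
    if stylesheet_path is not None:
        stylesheets = [CSS(filename=str(stylesheet_path))]

    HTML(url=source_url).write_pdf(str(output_path), stylesheets=stylesheets)
    return Path(output_path)
```

Called as `render_pdf("http://beyond-social.org/wiki/index.php/Hybrid_Publishing", "beyondsocial.pdf", "style.css")`, this is intended to do the same job as the standalone command, which makes it easier to embed in a server-side script.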
positive aspects
- Uses HTML and CSS to lay out the PDF, which means a smoother learning curve from web to print.
- Supports features like page size, page numbers and custom typography, allowing the production of a PDF with a high level of control over the design.
- Very simple and easy-to-understand syntax, which does not require proficiency with the command line.
Example:
weasyprint http://beyond-social.org/wiki/index.php/Hybrid_Publishing beyondsocial.pdf -s style.css
Example explained:
weasyprint source-html-document pdf-output -s css-file
where -s is the flag that includes a CSS file whose rules override the existing CSS rules used in the web version
negative aspects
- Can be difficult to install because of its dependencies. On Debian no issues were experienced. On Mac OS X, we are still trying to get the installation working.
sample output
style.css
html, body{
background-color: #e0e0e0 !important;
font-family: "AmericanTypewriter", serif !important; /* the font needs to be in your computer. this is not the final font, please choose a font of your choice */
color: #000 !important;
}
div#footer ul {
list-style-type: none !important;
}
@page{
size: 8.5in 8.5in;
background-color: white !important;
counter-increment: page;
font-family: "AmericanTypewriter", monospace !important;
color: #000 !important;
margin: 1cm;
font-size: 8pt;
}
h1{
string-set: doctitle content(); /* not tested - not sure it is working */
/* stores the text of this heading as "doctitle" - will be used later, at the bottom of the page */
}
img{
width: 100%;
page-break-inside: avoid;
}
#catlinks{display: none;}
div#footer{
background-color: black !important;
color: #fff !important;
/*border-radius: 2cm;*/
font-family: sans-serif !important;
font-size: .75em !important;
text-align: center;
padding-bottom: .2cm;
position: absolute;
bottom: 0;
width: 100%;
}
@page :left {
@bottom-right{
margin: 0;
/* font-family: inherit; */ /* does not work */
content: string(doctitle);
}
@bottom-left{
margin: 0;
content: counter(page);
}
}
@page :right {
@bottom-right{
margin: 0;
/* font-family: inherit; */ /* does not work */
content: counter(page);
}
@bottom-left{
margin: 0;
content: string(doctitle);
}
}
- [1] WeasyPrint documentation http://weasyprint.org/docs/
wkhtmltopdf
wkhtmltopdf is an open source project very similar to WeasyPrint, with a nearly identical workflow. Because the two are so similar, we will mostly discuss the differences between them.
positive aspects
- wkhtmltopdf is based on the WebKit rendering engine, which eases bug tracking and improves support
- wkhtmltopdf is very easy to install.
- wkhtmltopdf can run JavaScript.
negative aspects
- WebKit, and thus wkhtmltopdf, has not yet implemented as many advanced CSS printing features as WeasyPrint has.
sample output
"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" --print-media-type container.html wkhtmltopdf_book.pdf
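
On a server, the same call could be wrapped in a small script. The sketch below is only an illustration: the helper names and the article file names are hypothetical, and it assumes that the `wkhtmltopdf` executable is on the PATH. It relies on wkhtmltopdf accepting several input pages before the output file, which is one possible way to gather multiple articles into a single PDF.

```python
import subprocess


def wkhtmltopdf_command(pages, output, executable="wkhtmltopdf"):
    """Build a wkhtmltopdf command line for one or more input pages."""
    if not pages:
        raise ValueError("at least one input page is required")
    # --print-media-type applies the print CSS instead of the screen CSS.
    return [executable, "--print-media-type", *pages, output]


def make_pdf(pages, output):
    """Run wkhtmltopdf; raises CalledProcessError if the conversion fails."""
    subprocess.run(wkhtmltopdf_command(pages, output), check=True)


if __name__ == "__main__":
    # Hypothetical article files, shown only to illustrate the resulting command.
    print(" ".join(wkhtmltopdf_command(["article1.html", "article2.html"], "book.pdf")))
```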
Usage for teaching
Because it is free and open source, this tool appears very suitable for use in the hybrid publishing workflow that is currently being taught in several courses. Because it adheres to most HTML and CSS rules and can be used simply from the command line, students are not forced to learn yet another language.
built-in browser PDF prints
Many browsers nowadays have built-in PDF rendering engines. When an HTML page is created on the server, users and/or printers can simply press print in their browser and choose to export a PDF.
positive aspects
- Less work is needed on the server
- Users can easily customize the look of their PDF
negative aspects
- Publishers have no guarantee that users see the correct layout (Chrome, for example, does this very poorly)
- It requires more technical know-how from the user
sample output
Below is an example print from Chrome:
Usage for teaching
This works quite poorly, so we see no place for it in education. Recent versions of Chrome (post-WebKit) actually perform worse than before, so there is little hope for improvement in the future.