Pandoc

A universal document converter and its ecosystem

Yasin Raies

09.01.2019

Suppose you want to connect your old game console to your TV …

What is Pandoc


Pandoc is a free and open-source software document converter created by John MacFarlane.

Supported Formats:

In

commonmark, creole, docbook, docx, epub, fb2, gfm (GitHub-Flavored Markdown), haddock, html, jats, json, latex, markdown (Pandoc’s Markdown), markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, man, muse, native, odt, opml, org, rst, t2t, textile, tikiwiki, twiki, vimwiki

Out

asciidoc, beamer, commonmark, context, docbook or docbook4, docbook5, docx, dokuwiki, epub or epub3, epub2, fb2, gfm (GitHub-Flavored Markdown), haddock, html or html5 (HTML, i.e. HTML5/XHTML polyglot markup), html4, icml, jats, json, latex, man, markdown (Pandoc’s Markdown), markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opml, opendocument, org, plain, pptx, rst, rtf, texinfo, textile, slideous, slidy, dzslides, revealjs, s5, tei, zimwiki

Pandoc-flavoured Markdown

A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. - John Gruber

Pandoc-flavoured Markdown (PFM) expands on this by keeping multiple output formats in mind.


Metadata

---
title: Pandoc
subtitle: A universal document converter and its ecosystem
author: Yasin Raies
theme: white
center: true
width: 1280
height: 720
date: 09.01.2019
---

Inline

In                                Out
text                              text
*emphasis*                        emphasis
**strong**                        strong
~~strike~~                        strike
S~ub~ S^uper^                     Sub Super
$$e^{\pi i} + 1 = 0$$             e^{πi} + 1 = 0

In                                Out
"Quote"                           “Quote”
`*verb/code*`                     *verb/code*
[FMI](fmi-wuerzburg.de)           FMI
![Logo]([...].png)                Logo
word^[Some Note]                  word¹
[This is a *span*]{.smallcaps}    This is a span

Blocks


LineBlock

Ingredients:
    0.5 Lime
    5 tbsp Sugar
    350 ml Ginger Ale
    Crushed Ice
| **Ingredients**:
|     0.5 Lime
|     5 *tbsp* Sugar
|     350 *ml* Ginger Ale
|     Crushed Ice

Verbatim/Code

```java
while(true){
    doTalk();
}
```

Quotes

This is a blockquote

> This is a blockquote


Ordered/Bullet List

  1. one
    1. two
    2. three
  2. five
I) one
   3. two 
   7. three
I) five

- some
  + things
  + are
- weird

Definition List

Term 1
Definition 1
Term 2
Definition 2a
Definition 2b
Term 1
  ~ Definition 1

Term 2
  ~ Definition 2a
  ~ Definition 2b

Example List

  1. This is a numbered example.
  2. This example can be referenced.

See (2).

(@) This is a numbered example.
(@ex) This example can be referenced.

See (@ex).

Tables

Right  Left  Center  Default
   12  12      12    12

Fruit    Price  Advantages
Bananas  $1.34  • built-in wrapper
                • bright color

Right  Left  Default  Center
   12  12    12         12

  Right Left     Center   Default
------- ------ ---------- -------
     12 12        12          12

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |

Headers, Rules and Divs

This is a div.

### Headers, Rules and Divs {#headers-and-stuff}

::: {style="color:red"}

This is a div.
:::

Working with Pandoc


The Pipeline

Reader → Filters → Writer → Template

Reader/Writer
Convert between a given format and an AST²
Filter
Performs operations on the AST and modifies it
Template
Supplies the surrounding document in which the converted AST is embedded
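
The four stages can be pictured as plain function composition. The following Python sketch is a toy model under stated assumptions, not Pandoc's implementation: `read_words`, `shout`, `write_html` and `convert` are hypothetical stand-ins, and the AST is reduced to a list of (tag, text) pairs.

```python
def read_words(source):
    """Toy reader: turn whitespace-separated words into ("Str", word) nodes."""
    return [("Str", word) for word in source.split()]


def shout(ast):
    """Toy filter: upper-case the text of every Str node."""
    return [(tag, text.upper()) if tag == "Str" else (tag, text) for tag, text in ast]


def write_html(ast):
    """Toy writer: join all Str nodes into one HTML paragraph."""
    return "<p>" + " ".join(text for tag, text in ast if tag == "Str") + "</p>"


def convert(source, reader, filters, writer, template):
    """Reader -> filters (in order) -> writer -> template."""
    ast = reader(source)
    for apply_filter in filters:
        ast = apply_filter(ast)
    # The template only supplies the surroundings of the converted body.
    return template.replace("$body$", writer(ast))


result = convert("hello pandoc", read_words, [shout], write_html, "<body>$body$</body>")
print(result)  # <body><p>HELLO PANDOC</p></body>
```

Because every filter maps an AST to an AST, any number of them can be chained without the reader or writer knowing about it.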

The CLI

Pandoc has some parameters:

pandoc [OPTIONS] [FILES]
  -f FORMAT, -r FORMAT  --from=FORMAT, --read=FORMAT
  -t FORMAT, -w FORMAT  --to=FORMAT, --write=FORMAT
  -o FILE               --output=FILE
                        --data-dir=DIRECTORY
                        --base-header-level=NUMBER
                        --strip-empty-paragraphs
                        --indented-code-classes=STRING
  -F PROGRAM            --filter=PROGRAM
                        --lua-filter=SCRIPTPATH
  -p                    --preserve-tabs
                        --tab-stop=NUMBER
                        --track-changes=accept|reject|all
                        --file-scope
                        --extract-media=PATH
  -s                    --standalone
                        --template=FILE
  -M KEY[:VALUE]        --metadata=KEY[:VALUE]
  -V KEY[:VALUE]        --variable=KEY[:VALUE]
  -D FORMAT             --print-default-template=FORMAT
                        --print-default-data-file=FILE
                        --print-highlight-style=STYLE|FILE
                        --dpi=NUMBER
                        --eol=crlf|lf|native
                        --wrap=auto|none|preserve
                        --columns=NUMBER
                        --strip-comments
                        --toc, --table-of-contents
                        --toc-depth=NUMBER
                        --no-highlight
                        --highlight-style=STYLE|FILE
                        --syntax-definition=FILE
  -H FILE               --include-in-header=FILE
  -B FILE               --include-before-body=FILE
  -A FILE               --include-after-body=FILE
                        --resource-path=SEARCHPATH
                        --request-header=NAME:VALUE
                        --self-contained
                        --html-q-tags
                        --ascii
                        --reference-links
                        --reference-location=block|section|document
                        --atx-headers
                        --top-level-division=section|chapter|part
  -N                    --number-sections
                        --number-offset=NUMBERS
                        --listings
  -i                    --incremental
                        --slide-level=NUMBER
                        --section-divs
                        --default-image-extension=extension
                        --email-obfuscation=none|javascript|references
                        --id-prefix=STRING
  -T STRING             --title-prefix=STRING
  -c URL                --css=URL
                        --reference-doc=FILE
                        --epub-subdirectory=DIRNAME
                        --epub-cover-image=FILE
                        --epub-metadata=FILE
                        --epub-embed-font=FILE
                        --epub-chapter-level=NUMBER
                        --pdf-engine=PROGRAM
                        --pdf-engine-opt=STRING
                        --bibliography=FILE
                        --csl=FILE
                        --citation-abbreviations=FILE
                        --natbib
                        --biblatex
                        --mathml
                        --webtex[=URL]
                        --mathjax[=URL]
                        --katex[=URL]
                        --gladtex
                        --abbreviations=FILE
                        --trace
                        --dump-args
                        --ignore-args
                        --verbose
                        --quiet
                        --fail-if-warnings
                        --log=FILE
                        --bash-completion
                        --list-input-formats
                        --list-output-formats
                        --list-extensions[=FORMAT]
                        --list-highlight-languages
                        --list-highlight-styles
  -v                    --version
  -h                    --help

Compiling Markdown to HTML:

pandoc --standalone --from markdown --to html -o Out.html In.txt
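
When Pandoc is called from a script, it can help to assemble the argument list in code rather than as one string. A minimal sketch, assuming a hypothetical helper `pandoc_command`; it only uses options that appear in the list above and does not run Pandoc itself.

```python
import shlex


def pandoc_command(source, output, from_format="markdown", to_format="html", standalone=True):
    """Build the argument list for a single conversion (hypothetical helper)."""
    args = ["pandoc"]
    if standalone:
        args.append("--standalone")
    args += ["--from", from_format, "--to", to_format, "-o", output, source]
    return args


cmd = pandoc_command("In.txt", "Out.html")
print(shlex.join(cmd))  # pandoc --standalone --from markdown --to html -o Out.html In.txt
```

With Pandoc installed and on the PATH, the list could be handed to `subprocess.run(cmd, check=True)`.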

Compiling this talk:

echo recompiling HTML
pandoc -f markdown+emoji -s --toc --toc-depth=2 --css gitlab.css -o Vortrag.html Vortrag.md
echo recompiling Reveal
pandoc -f markdown+emoji -s -t revealjs -o Vortrag_reveal.html -V revealjs-url=reveal.js-3.7.0 --slide-level 2 Vortrag.md

Templates

Variables filled into templates are taken from metadata or command-line arguments.
They are referenced as $varname$.
To check whether a variable is set, wrap the text in $if(var)$ … $endif$.
Iteration is also possible:

$for(var)$$var$$sep$, $endfor$
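
To make the three constructs concrete, here is a toy renderer in Python. It is a sketch under stated assumptions, not Pandoc's template engine: the function `render` is hypothetical, it handles only `$var$`, `$if(var)$…$endif$` and `$for(var)$…$sep$…$endfor$`, and it does not support `$else$` or nesting a construct inside another of the same kind.

```python
import re


def render(template, context):
    """Expand $for$, then $if$, then plain $var$ references."""

    def expand_for(match):
        name, body = match.group(1), match.group(2)
        # Text after $sep$ is placed between consecutive items.
        body, _, separator = body.partition("$sep$")
        values = context.get(name, [])
        if not isinstance(values, list):
            values = [values]
        return separator.join(body.replace("$" + name + "$", str(v)) for v in values)

    def expand_if(match):
        name, body = match.group(1), match.group(2)
        return body if context.get(name) else ""

    template = re.sub(r"\$for\(([\w-]+)\)\$(.*?)\$endfor\$", expand_for, template, flags=re.S)
    template = re.sub(r"\$if\(([\w-]+)\)\$(.*?)\$endif\$", expand_if, template, flags=re.S)
    # Unset variables render as the empty string.
    return re.sub(r"\$([\w-]+)\$", lambda m: str(context.get(m.group(1), "")), template)


template = "$if(title)$<h1>$title$</h1>$endif$<p>$for(author)$$author$$sep$, $endfor$</p>"
print(render(template, {"title": "Pandoc", "author": ["Ann", "Bob"]}))
# <h1>Pandoc</h1><p>Ann, Bob</p>
```

The author names are placeholders; with an empty context the same template renders as `<p></p>`.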

HTML Template:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$lang$" xml:lang="$lang$"$if(dir)$ dir="$dir$"$endif$>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
$for(author-meta)$
  <meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
  <meta name="dcterms.date" content="$date-meta$" />
$endif$
$if(keywords)$
  <meta name="keywords" content="$for(keywords)$$keywords$$sep$, $endfor$" />
$endif$
  <title>$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$</title>
  <style type="text/css">
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
$if(quotes)$
      q { quotes: "“" "”" "‘" "’"; }
$endif$
  </style>
$if(highlighting-css)$
  <style type="text/css">
$highlighting-css$
  </style>
$endif$
$for(css)$
  <link rel="stylesheet" href="$css$" />
$endfor$
$if(math)$
  $math$
$endif$
$for(header-includes)$
  $header-includes$
$endfor$
</head>
<body>
$for(include-before)$
$include-before$
$endfor$
$if(title)$
<header>
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
$if(date)$
<p class="date">$date$</p>
$endif$
</header>
$endif$
$if(toc)$
<nav id="$idprefix$TOC">
$table-of-contents$
</nav>
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
</body>
</html>

Extensions

Readers and writers allow for fine-grained customization by use of extensions.

Enabling emoji (👍) in markdown: 

pandoc -f markdown+emoji in.md

Converting tables to multiline tables:

pandoc -f markdown -t markdown-grid_tables-simple_tables+multiline_tables-pipe_tables
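
A format specifier is just the base format followed by `+extension` and `-extension` suffixes. The sketch below builds such a string; `format_spec` is a hypothetical helper, and it assumes the order of the suffixes does not matter to Pandoc as long as no extension is both enabled and disabled.

```python
def format_spec(base, enable=(), disable=()):
    """Compose a reader/writer format string such as markdown+emoji."""
    enabled = "".join("+" + name for name in enable)
    disabled = "".join("-" + name for name in disable)
    return base + enabled + disabled


print(format_spec("markdown", enable=["emoji"]))
# markdown+emoji
print(format_spec("markdown",
                  enable=["multiline_tables"],
                  disable=["grid_tables", "simple_tables", "pipe_tables"]))
# markdown+multiline_tables-grid_tables-simple_tables-pipe_tables
```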

Filter

Filters allow for even more customisation and real extensibility: Pandoc pipes the native representation (the AST, serialised as JSON) through a program, which outputs the modified representation.


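-- Replace the contents of any code block that carries an
-- include="FILE" attribute with the contents of FILE.
import Text.Pandoc.JSON

doInclude :: Block -> IO Block
doInclude cb@(CodeBlock (id, classes, namevals) contents) =
  case lookup "include" namevals of
       -- attribute present: read the file, keep the block's attributes
       Just f     -> return . (CodeBlock (id, classes, namevals)) =<< readFile f
       -- no include attribute: leave the block untouched
       Nothing    -> return cb
doInclude x = return x  -- every other block passes through unchanged

main :: IO ()
main = toJSONFilter doInclude  -- JSON AST in on stdin, JSON AST out on stdout

Applied to a Markdown document containing:

~~~~ {include="README"}
this will be replaced by contents of README
~~~~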
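
Since a filter only has to read and write JSON, the same idea can be sketched in any language. The Python version below is an illustration, not part of the original talk: `walk`, `do_include` and `run_filter` are hypothetical names, the `pandoc-api-version` value is a placeholder, and the demo runs on an in-memory document with a temporary README instead of reading from stdin.

```python
import json
import os
import tempfile


def walk(node, action):
    """Rebuild the JSON tree, applying action to every JSON object in it."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        return action({key: walk(value, action) for key, value in node.items()})
    return node


def do_include(element):
    """Replace a CodeBlock's text with the file named in its include attribute."""
    if element.get("t") == "CodeBlock":
        attr, _old_text = element["c"]
        _identifier, _classes, keyvals = attr
        path = dict(keyvals).get("include")
        if path is not None:
            with open(path, encoding="utf-8") as handle:
                return {"t": "CodeBlock", "c": [attr, handle.read()]}
    return element


def run_filter(json_text):
    """JSON in, JSON out: what a filter does between stdin and stdout."""
    return json.dumps(walk(json.loads(json_text), do_include))


with tempfile.TemporaryDirectory() as folder:
    readme = os.path.join(folder, "README")
    with open(readme, "w", encoding="utf-8") as handle:
        handle.write("Hello from README\n")
    document = {
        "pandoc-api-version": [1, 17],  # placeholder value
        "meta": {},
        "blocks": [{"t": "CodeBlock", "c": [["", [], [["include", readme]]], "placeholder"]}],
    }
    filtered = json.loads(run_filter(json.dumps(document)))

print(filtered["blocks"][0]["c"][1], end="")  # Hello from README
```

As an actual filter, the script would call `run_filter` on `sys.stdin.read()`, write the result to `sys.stdout`, and be passed to Pandoc with `--filter`.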

data Pandoc = Pandoc Meta [Block]

data Block
    = Plain [Inline]        
    | Para [Inline]         
    | LineBlock [[Inline]]  
    | CodeBlock Attr String 
    | RawBlock Format String 
    | BlockQuote [Block]    
    | OrderedList ListAttributes [[Block]] 
    | BulletList [[Block]]  
    | DefinitionList [([Inline],[[Block]])]  
    | Header Int Attr [Inline] 
    | HorizontalRule        
    | Table [Inline] [Alignment] [Double] [TableCell] [[TableCell]]  
    | Div Attr [Block]      
    | Null                  
data Inline
    = Str String            
    | Emph [Inline]         
    | Strong [Inline]       
    | Strikeout [Inline]    
    | Superscript [Inline]  
    | Subscript [Inline]    
    | SmallCaps [Inline]    
    | Quoted QuoteType [Inline] 
    | Cite [Citation]  [Inline] 
    | Code Attr String      
    | Space                 
    | SoftBreak             
    | LineBreak             
    | Math MathType String  
    | RawInline Format String 
    | Link Attr [Inline] Target  
    | Image Attr [Inline] Target 
    | Note [Block]          
    | Span Attr [Inline]    
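
In the JSON form that filters receive, each constructor becomes an object with a tag `t` and, where it has arguments, contents `c`. The Python sketch below shows `Para [Str "Hello", Space, Emph [Str "world"]]` in that encoding and walks it to collect the plain text. `stringify` is a hypothetical helper that only looks at `Str`, `Space`, `SoftBreak` and `LineBreak`, so text stored elsewhere (for example in `Code` or `Math`) is ignored.

```python
def stringify(node):
    """Collect the plain text of an AST fragment in pandoc's JSON encoding."""
    if isinstance(node, list):
        return "".join(stringify(item) for item in node)
    if isinstance(node, dict):
        tag = node.get("t")
        if tag == "Str":
            return node["c"]
        if tag in ("Space", "SoftBreak"):
            return " "
        if tag == "LineBreak":
            return "\n"
        # Any other element: descend into its contents, if it has any.
        return stringify(node.get("c", []))
    return ""  # bare strings and numbers (attributes, URLs, levels) are skipped


paragraph = {
    "t": "Para",
    "c": [
        {"t": "Str", "c": "Hello"},
        {"t": "Space"},
        {"t": "Emph", "c": [{"t": "Str", "c": "world"}]},
    ],
}
print(stringify(paragraph))  # Hello world
```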

The Ecosystem

Slides

All available options work, but each requires individual, manual fixing.

Reveal.JS
Works best out of the box, but uses lots of JavaScript.
Impress.JS
Prezi-like, seems straightforward.
Slideous/S5/Slidy/DZSlides/PowerPoint
… exist? 🙈

Static Site Generators 🔗

Gitit
Git and Pandoc based wiki
Blogs/CVs
Easily DIY-able

Tooling 🔗

pandoc-citeproc
Citation preprocessor for various bibliography formats and citation styles
Decker
A build and deployment tool for pandoc
Panrun
Simple wrapper script that lets you put multiple compilation configurations into a YAML block
Pandocomatic/Panzer
Complex templated configurations for pandoc commands

Filters 🔗

Mermaid-Filters
Flowcharts, sequence and gantt diagrams
CSV2Table
Inserts external CSVs as genuine tables
R Pandoc
Automatically plots/compiles R code
ABC to Music 🔗
Converts ABC input into actual music notation
TikZ 🔗
Allows for raw TikZ input

Wrappers and Interfaces 🔗

Python, Ruby, Scala, JavaScript, Perl, Pascal, C, R

Contributed Templates 🔗

E-Book generation scripts, Journal, PhD theses, lecture notes, resume/CV, Bootstrap & HTML

Personal Verdict


Pandoc is neither meant to replace LaTeX (or Beamer), nor is it a painless solution to everything.

There are many annoyances:

In my opinion, Pandoc should be used for:

Slides are possible, and good for some people, if you settle on one format.


Demo!

Relevant websites:


  1. Some Note

  2. also known as the “native format”