exporting from org mode with pandoc

2023-12-07

motivation

the need for an emacs-independent way of exporting notes arose because i was using org version 9.7-pre, in which the export functionality is basically broken.

pandoc

because i didn't want to reinvent the wheel, i settled on pandoc: it has a huge community, and i trust it to keep parsing org-mode in the long run. pandoc's org export is missing some features, though, which is why i'm writing this.

a starting point would be the following command:

pandoc infile.org --standalone --output outfile.html

custom css

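if you have a css file you'd like to use, pass it with the --css argument. pandoc links to the stylesheet rather than embedding it, so style.css has to be reachable from wherever outfile.html ends up: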

pandoc infile.org\
       --standalone\
       --output outfile.html\
       --css style.css

custom html preamble/postamble

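or, more generally, if you have an html file whose contents you'd like included in the document head, use the --include-in-header argument (--include-before-body and --include-after-body work the same way for the start and end of the body, which covers the postamble case):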

pandoc infile.org\
       --standalone\
       --output outfile.html\
       --css style.css\
       --include-in-header=header.html

bibliography

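if you have a bibliography file you want pandoc to use for citations, pass it with --bibliography and add --citeproc so that pandoc formats the citations and the reference list itself. the --biblatex flag is only meant for latex output, so it has no effect when exporting to html: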

pandoc infile.org\
       --standalone\
       --include-in-header=header.html\
       --output outfile.html\
       --css style.css\
       --bibliography=mybibfile.bib --biblatex --citeproc
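since the command keeps growing, it can be handy to assemble it from a script instead. a minimal python sketch, assuming pandoc is on the PATH; pandoc_args and export are hypothetical helper names, the flags are the ones used in this post, and --biblatex is left out because it only matters for latex output:

```python
import subprocess


def pandoc_args(infile, outfile, css=None, header=None, bibliography=None,
                lua_filters=(), filters=()):
    """Build the pandoc argument list for an org -> html export (hypothetical helper)."""
    args = ["pandoc", infile, "--standalone", "--output", outfile]
    if css is not None:
        args += ["--css", css]
    if header is not None:
        args.append(f"--include-in-header={header}")
    if bibliography is not None:
        # --citeproc makes pandoc format citations and the reference list itself
        args += [f"--bibliography={bibliography}", "--citeproc"]
    # pandoc applies filters in command-line order: lua filters first, then json ones
    for lua_filter in lua_filters:
        args += ["--lua-filter", lua_filter]
    for json_filter in filters:
        args += ["--filter", json_filter]
    return args


def export(infile, outfile, **options):
    # check=True raises CalledProcessError if pandoc exits with an error
    subprocess.run(pandoc_args(infile, outfile, **options), check=True)
```

e.g. export("infile.org", "outfile.html", css="style.css", header="header.html", bibliography="mybibfile.bib") runs the same export as the command above, minus --biblatex.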

latex rendering

pandoc supports multiple ways of handling latex snippets. it can use mathjax via the --mathjax argument, but a better option for full latex support is to render them to svg with latex and dvisvgm (both need to be on the PATH). this is done by the following lua filter, built on pandoc's lua filter api, which is a modified version of https://github.com/oltolm/pandoc-latex-math:

--- source: https://github.com/oltolm/pandoc-latex-math

local system = require("pandoc.system")

function appendDepthToSVGFile(depth, svgPath)
    local f = io.open(svgPath, "a")
    f:write(string.format("<!-- depth=%spt -->\n", depth))
    f:close()
end

function NewLatexRender()
    return {
        preamble = [[
            \usepackage{amsmath}
            \usepackage{amsfonts}
            \usepackage{amssymb}
            \usepackage[T2A,T1]{fontenc}
            \usepackage{colordvi}
            \usepackage[active,tightpage]{preview}
        ]],
        latexClass = "article",
        fontEncoding = "utf8",
        fontSize = 12,
        bgcolor = "#FFFFFF",
        latexPath = "latex",
        dvisvgmPath = "dvisvgm"
    }
end

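-- stub: html-to-rgb color conversion is not implemented, so bgcolor
-- should stay at its default of #FFFFFF (no \background is emitted then)
function html2rgb(color)
    return ''
end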

function wrapFormula(lr, latexFormula)
    local bgcolor = lr.bgcolor ~= "#FFFFFF" and string.format("\\background{%s}\n", html2rgb(lr.bgcolor)) or ''
    local tex = string.format([[\documentclass[%dpt]{%s}
        \usepackage[%s]{inputenc}
        %s
        \begin{document}
        %s
        \begin{preview}
        %s
        \end{preview}
        \end{document}
        ]], lr.fontSize, lr.latexClass, lr.fontEncoding, lr.preamble, bgcolor, latexFormula)
    -- io.write(string.format("tex: [[%s]]\n", tex))
    return tex
end

function getDepth(out)
    local depth = string.match(out, "depth=(%d*%.?%d*)")
    return tonumber(depth)
end

function renderLatex(lr, latexFormula)
    local latexDocument = wrapFormula(lr, latexFormula)
    local currDir = system.get_working_directory()
    local svgFileName = pandoc.sha1(latexDocument) .. ".svg"
    local svgPath = currDir .. "/" .. svgFileName
    local f = io.open(svgPath, "r")
    if f ~= nil then
        local depth = getDepth(f:read("a"))
        f:close()
        --io.write(string.format("found SVG file=%s with depth=%spt\n", svgPath, depth))
        return depth, svgFileName
    end
    -- SVG file does not exist
    local depth = system.with_temporary_directory("latexmath", function(tmpDir)
        return system.with_working_directory(tmpDir, function()
            io.write(string.format("changed directory to (%s)\n", tmpDir))
            local tmpFile = io.open("latexmath.tex", "w")
            tmpFile:write(latexDocument)
            tmpFile:close()
            local out = command(lr, svgPath)
            local depth = getDepth(out)
            if depth == nil then
                io.write(string.format("%s: depth not found\n", svgPath))
                return nil
            end
            io.write(string.format("%s: depth=%spt\n", svgPath, depth))
            appendDepthToSVGFile(depth, svgPath)
            return depth
        end)
    end)
    return depth, svgFileName
end

function command(lr, svgPath)
    pandoc.pipe(lr.latexPath, {"--interaction=nonstopmode", "latexmath.tex"}, '')
    -- local out = pandoc.pipe(lr.dvisvgmPath, {"--no-fonts", "-o", svgPath, "latexmath.dvi"}, '')
    local f = io.popen(lr.dvisvgmPath .. " --no-fonts -o \"" .. svgPath .. "\" latexmath.dvi 2>&1")
    local out = f:read("a")
    f:close()
    -- io.write(string.format("out: [[%s]]\n", out))
    return out
end

function Math(elem)
    local latexFormula1
    local latexFormula = elem.text
    if elem.mathtype == "InlineMath" then
        latexFormula1 = string.format("\\(%s\\)", latexFormula)
    else
        -- DisplayMath
        latexFormula1 = string.format("\\[%s\\]", latexFormula)
    end
    local lr = NewLatexRender()
    local depth, svgFileName = renderLatex(lr, latexFormula1)
    local attr = {
        alt = latexFormula
    }
    if depth ~= nil then
        attr["style"] = string.format("vertical-align:-%spt", depth)
    end
    -- io.write(string.format("%s\n", dump(attr)))
    return pandoc.Image('', svgFileName, '', attr)
end

function dump(o)
    if type(o) == 'table' then
        local s = '{ '
        for k, v in pairs(o) do
            if type(k) ~= 'number' then
                k = '"' .. k .. '"'
            end
            s = s .. '[' .. k .. '] = ' .. dump(v) .. ','
        end
        return s .. '} '
    else
        return tostring(o)
    end
end

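assuming this script is saved as tex.lua, the command becomes the following. note that the filter writes the svg files into the directory pandoc is run from and references them by bare file name, so outfile.html should end up in that same directory: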

pandoc infile.org\
       --standalone\
       --include-in-header=header.html\
       --output outfile.html\
       --css style.css\
       --bibliography=mybibfile.bib --biblatex --citeproc\
       --lua-filter tex.lua
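the two less obvious parts of tex.lua are its cache and its baseline alignment: the svg file name is the sha1 of the complete latex document, so an unchanged formula is never recompiled, and the depth dvisvgm reports (how far the graphic reaches below the baseline) becomes a negative vertical-align on the image. a rough python transcription of just those pieces, for illustration only; the function names are hypothetical, and parse_depth mirrors the lua pattern in getDepth:

```python
import hashlib
import re

# same idea as the lua pattern "depth=(%d*%.?%d*)" in getDepth
DEPTH_RE = re.compile(r"depth=(\d*\.?\d*)")


def svg_file_name(latex_document: str) -> str:
    """Cache key: hex sha1 of the whole latex document plus .svg."""
    return hashlib.sha1(latex_document.encode("utf-8")).hexdigest() + ".svg"


def parse_depth(text: str):
    """Return the first depth=...pt value in text as a float, or None."""
    match = DEPTH_RE.search(text)
    if match is None or match.group(1) in ("", "."):
        return None
    return float(match.group(1))


def vertical_align_style(depth):
    """CSS that lowers the svg by its depth so it sits on the text baseline."""
    return None if depth is None else f"vertical-align:-{depth}pt"
```

because the depth is also appended to the cached svg as a comment, parse_depth works the same on a fresh dvisvgm run and on a cache hit.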

internal links

consider the following org mode snippet:

#+name: my-def
#+begin_definition
we define a set \(A\) to be any unordered collection of objects
#+end_definition

by [[my-def][this definition]], the object {1,2,3} is a set.

although pandoc, as expected, renders the definition block as a div with class definition, the #+name property doesn't make it into the html, and the link [[my-def][this definition]] isn't rendered properly either.

after some messing around, i found that pandoc does handle the attr_html property of org blocks properly, e.g.

#+attr_html: :id my-def
#+begin_definition
we define a set \(A\) to be any unordered collection of objects
#+end_definition

by [[my-def][this definition]], the object {1,2,3} is a set.

results in the definition div having the id my-def. but we wouldn't want to modify our org files just to make them compatible with pandoc, so instead we can do something "hacky" and modify the stream before piping it into pandoc:

sed 's/#+name:/#+attr_html: :id/' infile.org |\
    pandoc --from org\
           --to html5\
           --standalone\
           --include-in-header=header.html\
           --output outfile.html\
           --css style.css\
           --bibliography=mybibfile.bib --biblatex --citeproc\
           --lua-filter tex.lua
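the sed expression is case-sensitive, so a block labelled #+NAME: (org keywords are case-insensitive, and uppercase is common) would slip through. if that matters, a small python preprocessor can do the same rewrite case-insensitively while keeping indentation; this is a sketch, and the script name preprocess.py is hypothetical:

```python
import re
import sys

# "#+name:" at the start of a line (after optional indentation), in any case
NAME_KEYWORD = re.compile(r"^(\s*)#\+name:", re.IGNORECASE | re.MULTILINE)


def names_to_html_ids(org_text: str) -> str:
    """Rewrite #+name: keywords into #+attr_html: :id so pandoc emits ids."""
    return NAME_KEYWORD.sub(r"\1#+attr_html: :id", org_text)


if __name__ == "__main__":
    # read org text on stdin, write the rewritten text to stdout, like sed
    sys.stdout.write(names_to_html_ids(sys.stdin.read()))
```

it would take sed's place at the head of the pipeline: python3 preprocess.py < infile.org | pandoc --from org ...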

this way, the #+name property of a block becomes the id of its corresponding html element. this still doesn't fix the links, though: pandoc renders them as spans with class spurious-link instead of proper links. to fix this we can use the following lua filter:

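-- pandoc's org reader turns internal links it can't resolve, such as
-- [[my-def][this definition]], into a Span with class "spurious-link",
-- a "target" attribute, and the link text wrapped in an Emph;
-- turn those back into real links pointing at #target
function Span(span)
  if span.classes:includes('spurious-link') and span.attributes.target then
    local first = span.content[1]
    -- unwrap the Emph if present, otherwise keep the span's content as-is
    local content = (first and first.t == 'Emph') and first.content or span.content
    return pandoc.Link(content, '#' .. span.attributes.target)
  end
end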

assuming this code is placed in the file internal_links.lua, our shell command becomes:

sed 's/#+name:/#+attr_html: :id/' infile.org |\
    pandoc --from org\
           --to html5\
           --standalone\
           --include-in-header=header.html\
           --output outfile.html\
           --css style.css\
           --bibliography=mybibfile.bib --biblatex --citeproc\
           --lua-filter tex.lua\
           --lua-filter internal_links.lua
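to see what internal_links.lua is actually rewriting, it helps to look at pandoc's json ast (pandoc --from org --to json infile.org): an unresolved internal link comes out as a Span with class spurious-link, a target attribute, and the link text wrapped in an Emph. the same rewrite as a plain json filter in python, as a sketch that doesn't need panflute; the function name is hypothetical, and it assumes the json layout of recent pandoc versions, where a Link is [attr, inlines, [url, title]]:

```python
import json
import sys


def fix_spurious_links(node):
    """Recursively replace spurious-link Spans with Links to #target."""
    if isinstance(node, list):
        return [fix_spurious_links(item) for item in node]
    if not isinstance(node, dict):
        return node
    if node.get("t") == "Span":
        (_ident, classes, attributes), content = node["c"]
        if "spurious-link" in classes:
            target = dict(attributes).get("target", "")
            # unwrap the Emph that wraps the link text
            if len(content) == 1 and content[0].get("t") == "Emph":
                content = content[0]["c"]
            return {
                "t": "Link",
                "c": [["", [], []], fix_spurious_links(content), ["#" + target, ""]],
            }
    return {key: fix_spurious_links(value) for key, value in node.items()}


if __name__ == "__main__":
    # json filter protocol: ast on stdin, modified ast on stdout
    json.dump(fix_spurious_links(json.load(sys.stdin)), sys.stdout)
```

the lua version is still the better choice in practice, since lua filters run inside pandoc without a json round trip; this one only makes the ast shape explicit.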

org-roam links

pandoc on its own knows nothing about org-roam links, but org-roam stores everything it needs in an sqlite database, by default ~/.emacs.d/org-roam.db. this file is kept up to date automatically when org-roam-db-autosync-mode is enabled, so other programs can query org-roam's information without visiting the org files themselves.

based on this, i use the following filter, a modified version of https://www.amoradi.org/20210730173543.html:

#!/usr/bin/env python3

# source: https://www.amoradi.org/20210730173543.html

import os
import sqlite3
import sys
import urllib.parse

import panflute as pf

ORG_ROAM_DB_PATH = "~/.emacs.d/org-roam.db"

db = None

def sanitize_link(elem, doc):
    # only rewrite org-roam links of the form [[id:...][...]]
    if not isinstance(elem, pf.Link) or not elem.url.startswith("id:"):
        return None

    node_id = elem.url[len("id:"):]

    # org-roam stores strings as printed elisp strings (wrapped in double
    # quotes), so the quotes have to be part of the lookup key; a query
    # parameter avoids building the sql by hand
    cur = db.cursor()
    cur.execute("select id, file, title from nodes where id = ?;", (f'"{node_id}"',))
    data = cur.fetchone()
    if data is None:
        # unknown id: warn on stderr (stdout carries the ast) and keep the link
        print(f"roam_links.py: no org-roam node with id {node_id}", file=sys.stderr)
        return None

    # strip the quotes from the stored file path and link to the html file
    # with the same base name as the org file
    file_path = data[1][1:-1]
    file_name = urllib.parse.quote(os.path.splitext(os.path.basename(file_path))[0])

    elem.url = f"{file_name}.html"
    return elem

def main(doc=None):
    return pf.run_filter(sanitize_link, doc=doc)

if __name__ == "__main__":
    db = sqlite3.connect(os.path.expanduser(ORG_ROAM_DB_PATH))
    main()

assuming the panflute python package is installed globally, and that this snippet is placed in the file roam_links.py, our exporting shell command becomes:

sed 's/#+name:/#+attr_html: :id/' infile.org |\
    pandoc --from org\
           --to html5\
           --standalone\
           --include-in-header=header.html\
           --output outfile.html\
           --css style.css\
           --bibliography=mybibfile.bib --biblatex --citeproc\
           --lua-filter tex.lua\
           --lua-filter internal_links.lua\
           --filter roam_links.py
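to check what roam_links.py resolves an id to without running a whole export, the same lookup can be done with just the python standard library. a sketch, assuming the nodes table has id, file and title columns holding printed elisp strings (values wrapped in double quotes), which is what roam_links.py relies on; the helper names, the in-memory stand-in table and its sample row are all made up for the demo (the real org-roam schema has more columns):

```python
import os
import sqlite3
import urllib.parse


def find_node(conn, node_id):
    """Return (id, file, title) for an org-roam node with elisp quotes stripped, or None."""
    row = conn.execute(
        "select id, file, title from nodes where id = ?", (f'"{node_id}"',)
    ).fetchone()
    if row is None:
        return None
    # each value is stored as "..." including the quotes
    return tuple(value[1:-1] for value in row)


def html_name_for(file_path):
    """Map /path/to/note.org to note.html (url-quoted), like the roam filter."""
    stem = os.path.splitext(os.path.basename(file_path))[0]
    return urllib.parse.quote(stem) + ".html"


if __name__ == "__main__":
    # hypothetical stand-in for ~/.emacs.d/org-roam.db
    conn = sqlite3.connect(":memory:")
    conn.execute("create table nodes (id, file, title)")
    conn.execute(
        "insert into nodes values (?, ?, ?)",
        ('"abc-123"', '"/home/me/notes/set theory.org"', '"set theory"'),
    )
    node = find_node(conn, "abc-123")
    print(node, html_name_for(node[1]))
```

pointing sqlite3.connect at the real database file (after os.path.expanduser) would show the actual target for any id: link in a note.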