Rolling My Own Static Site Generator

1 Why Re-Invent The Wheel

I use Emacs Org mode for taking notes. Org mode has its own markup structure, richer than Markdown, and can export to a variety of formats, including XHTML. Emacs Org Mode is extremely powerful, with support for tables and spreadsheets, computer algebra, LaTeX equations, images, and embedded, runnable code snippets. My home directory is littered with Org files representing years of notes, and I am consolidating them into a single notes directory. I wanted to publish some of my notes as blog posts without much overhead. Furthermore, when Org mode exports a document, it includes metadata such as the publication time, the modification time, keywords, and description. I just needed something to create a blog index, landing page, and overall site look and navigation.

I initially started a blog on Blogger. Blogger made it difficult to paste HTML markup generated by Org Mode. I ended up with two versions of my notes and blog posts, and Blogger did not give me much control over the layout. I looked at existing static site generators, specifically Pelican. Pelican can take HTML as blog post inputs, but you do not have much control over the layout and indexing without investing a lot of work. Having trouble finding a static site generator that I liked, I decided to roll my own.

I have used XSLT in the past, and for this task, it really is the perfect tool. XSLT, Extensible Stylesheet Language Transformations, is a language designed for translating XML to XML, HTML, or plain text. With XSLT, I am able to pull metadata from Org's XHTML export and add navigation and other modifications to each XHTML file. You can check out the static site generator I built on my GitHub page.

2 Org File Header

All of the Org files used in the blog have the following header fields: TITLE, DATE, DESCRIPTION, KEYWORDS. For example, this file has the following header.

#+TITLE: Rolling My Own Static Site Generator
#+DATE: [2018-03-29 Thu]
#+DESCRIPTION: How I create my blog with Emacs Org Mode and the shell.
#+KEYWORDS: XSLT, shell, Org

My Emacs init file has the following Org configuration for exporting blog files to XHTML. Metadata is passed to the site generator by placing it in the org-html-preamble-format variable. (I added newlines in the format string for displaying it on the web.) The TITLE, DESCRIPTION, and KEYWORDS fields are stored in the output XHTML header.

(setq org-publish-project-alist
      '(("blog"
         :publishing-directory "/home/devin/blog/org-html/"
         :base-extension "org"
         :base-directory "/home/devin/blog/org-files/"
         :publishing-function org-html-publish-to-html
         :html-head-include-default-style nil
         :with-tags nil
         :html-html5-fancy t
         :html-container "article"
         :auto-sitemap nil
         :html-divs ((preamble "header" "preamble")
                     (content "main" "content")
                     (postamble "footer" "postamble"))
         :html-preamble t
         :html-postamble nil
         :html-doctype "xhtml-strict"
         :headline-levels 6
         :html-text-markup-alist ((bold . "<strong>%s</strong>")
                                 (code . "<code>%s</code>")
                                 (italic . "<em>%s</em>")
                                 (strike-through . "<del>%s</del>")
                                 (underline . "<u>%s</u>")
                                 (verbatim . "<code>%s</code>"))))
      org-html-preamble-format '(("en" "<p id='title'>%t</p>
          <p id='subtitle'>%s</p><p id='author'>%a</p><p id='email'>%e</p>
          <p id='date'>%d</p><p id='org-html-creator-string'>%c</p>
          <p id='export-time'>%T</p><p id='mod-time'>%C</p>")))
;; Run every export with `org-todo-keywords' bound to an empty list.
(defun org-disable-todo-keywords (orig-fun &rest args)
  (let ((org-todo-keywords '())) (apply orig-fun args)))
(advice-add 'org-export-as :around #'org-disable-todo-keywords)
(setq org-html-htmlize-output-type 'css
      org-html-with-latex 'dvisvgm)

An XSLT script gets each post's title, modification time, publication date, and keywords. This data is passed to an AWK script as CSV. The keywords in the XHTML files are already formatted as comma-separated lists. The AWK script creates an XML metadata file, which XSLT then uses to generate index pages and links between pages.
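
To illustrate the extraction step, here is a minimal sketch in Python, not the XSLT and AWK scripts the site actually uses. It assumes the export stores the description and keywords in <meta> elements in the head, and the date in the <p id='date'> paragraph from the preamble format above. The sample document and the extract_metadata name are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, heavily trimmed stand-in for one exported post.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Rolling My Own Static Site Generator</title>
<meta name="description" content="How I create my blog." />
<meta name="keywords" content="XSLT, shell, Org" />
</head>
<body>
<header id="preamble"><p id="date">2018-03-29 Thu</p></header>
</body>
</html>"""


def extract_metadata(xhtml_text):
    """Return the title, date, description, and keywords of one post."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    # Collect <meta name="..." content="..."/> pairs from the head.
    meta = {m.get("name"): m.get("content", "")
            for m in head.iter(XHTML + "meta") if m.get("name")}
    # Assumption: the date lives in the preamble paragraph with id="date".
    date = next((p.text or "" for p in root.iter(XHTML + "p")
                 if p.get("id") == "date"), "")
    return {
        "title": head.findtext(XHTML + "title", default=""),
        "date": date.strip(),
        "description": meta.get("description", ""),
        # Keywords are already a comma-separated list in the export.
        "keywords": [k.strip() for k in meta.get("keywords", "").split(",")
                     if k.strip()],
    }


post = extract_metadata(SAMPLE)
```

Because the keywords arrive as a single comma-separated string, splitting them is the only parsing the sketch needs before the values could be written out as rows or as XML.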

3 Blog Threads

Aside from the blog posts, the other user-generated file is threads.gxl. This is a Graph eXchange Language XML file, which both Graphviz and XSLT can process. This file must be edited to add a blog post to a thread. Graphviz is used to generate an SVG graph showing in what order blog posts should be read, since posts may be multi-part. The aforementioned AWK script also includes the thread information in the metadata XML file, specifically which blog posts immediately precede or succeed a particular post.
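
The predecessor and successor lookup can be sketched as follows. This is Python standing in for the AWK and XSLT the site uses, and the sample graph is hypothetical: it assumes node ids identify posts, edges point from a post to the one read next, and the file declares no default XML namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical threads file: node ids stand in for post names.
GXL = """<gxl>
  <graph id="threads" edgemode="directed">
    <node id="part-1"/>
    <node id="part-2"/>
    <node id="part-3"/>
    <edge from="part-1" to="part-2"/>
    <edge from="part-2" to="part-3"/>
  </graph>
</gxl>"""


def thread_neighbours(gxl_text):
    """Map each post id to the posts immediately before and after it."""
    root = ET.fromstring(gxl_text)
    before = defaultdict(list)
    after = defaultdict(list)
    for edge in root.iter("edge"):
        src, dst = edge.get("from"), edge.get("to")
        after[src].append(dst)    # dst is read right after src
        before[dst].append(src)   # src is read right before dst
    return {node.get("id"): {"before": before[node.get("id")],
                             "after": after[node.get("id")]}
            for node in root.iter("node")}


links = thread_neighbours(GXL)
```

Lists are used rather than single values because a graph, unlike a simple sequence, allows a post to have several predecessors or successors.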

3.1 GXL and Lisp Aside

On a side note, GXL could be used by Emacs Lisp to generate graphs. Emacs Lisp makes it easy to generate XML with libraries such as xmlgen. Emacs can then pass the generated XML to Graphviz, and then display the output image in a buffer. It is not so easy to create Graphviz Dot code because it does not have the same kind of tree structure that Lisp and XML share. It is also difficult to access Graphviz's shared object libraries from Emacs Lisp because Emacs requires that the library have a number of features for it to work. Accessing shared objects is detailed in the Emacs Lisp manual section Emacs Dynamic Modules.

4 Blog Design

4.1 Site Layout and Markup

I decided from the start that the site would use a vertical main navigation bar. Because modern monitors are widescreen and text is generally best limited to 80 characters per line, I believe it is better to put any fixed content, such as the site's navigation, in the vertical margins. I also wanted, where possible, to prefer HTML/CSS solutions over JavaScript solutions; the site should be as functional as possible when JavaScript is turned off or blocked, adhering to the principle of graceful degradation. The site's layout relies on CSS Grid and Flexbox. CSS variables are also used throughout the site.

4.2 Media Types

The site's desktop and mobile layouts are controlled using CSS media queries. This also means that mobile browsers cannot request the desktop site because the same site serves as both the mobile and desktop version. A problem that I ran into in controlling whether the mobile or desktop layout is displayed is that CSS cannot query the screen's physical dimensions, but it can query its resolution and DPI. Modern smartphones have display resolutions comparable to desktop monitors, but their DPIs are much higher. I use the resolution and DPI information to determine whether the mobile or desktop version should be displayed.

The print layout is also controlled in CSS. Navigation is hidden in the print media; only the main body is displayed.

4.3 Navigation Bar

The vertical navigation bar is built using CSS Grid. Content inside each cell is positioned using Flexbox. The blog menu toggle is implemented using the CSS checkbox hack. The checkbox is moved off screen and made transparent, which keeps the Blog button navigable with TAB. I preferred this method over JavaScript because the sub-menu is an essential feature that would otherwise not work if JavaScript were not available. Normally, sub-menus are controlled by CSS using the :hover pseudo-class and are drop-downs so that they do not hide the parent menu. Aesthetically, I wanted the sub-menu to replace the main menu. If I were to implement it in JavaScript, I would have to change where the sub-menu was rendered without JavaScript when hovering, and then use JavaScript to change the CSS. The mobile view would complicate this further.

The sub-menu for blog index navigation has one more element than the main menu. Corresponding items in the main menu and sub-menu share the same cell. The missing element in the main menu is filled with an empty <div>. The space between the last menu item and the Back to Top button is also filled with an empty <div>.

4.4 Blog Navigation

Usually, it is difficult to browse through posts made to a blog. They are usually cataloged chronologically and sometimes there is a keyword list. Rarely are blog posts grouped into series, like a playlist. I created four forms of navigation for the blog, ordered by the original publication date, the date the post was last updated, the categories or keywords listed, and the sequence in which related posts are meant to be read. The first three navigation pages are generated from a single XML file, metadata.xml, which is generated from metadata in each Org file and threads.gxl.
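
As a rough illustration of how a single set of metadata records can feed the date and keyword indexes, here is a small Python sketch (the site itself does this in XSLT). The records, field names, and newest-first ordering are assumptions for the example, not the actual layout of metadata.xml.

```python
from collections import defaultdict

# Hypothetical records; ISO dates sort correctly as plain strings.
POSTS = [
    {"file": "a.html", "published": "2018-03-29", "modified": "2018-06-01",
     "keywords": ["XSLT", "Org"]},
    {"file": "b.html", "published": "2018-04-15", "modified": "2018-04-20",
     "keywords": ["CSS"]},
    {"file": "c.html", "published": "2018-05-02", "modified": "2018-05-02",
     "keywords": ["Org", "CSS"]},
]


def newest_first(posts, field):
    """List post files ordered by the given date field, newest first."""
    ordered = sorted(posts, key=lambda p: p[field], reverse=True)
    return [p["file"] for p in ordered]


def by_keyword(posts):
    """Group post files under each keyword, with keywords sorted."""
    groups = defaultdict(list)
    for post in posts:
        for keyword in post["keywords"]:
            groups[keyword].append(post["file"])
    return dict(sorted(groups.items()))


by_published = newest_first(POSTS, "published")
by_modified = newest_first(POSTS, "modified")
categories = by_keyword(POSTS)
```

The same records produce all three views; only the sort key or the grouping changes.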

The threads.gxl page contains an inline SVG with embedded hyperlinks. The Safari browser currently does not support such links and many mobile browsers have unknown support, so I added a hidden list of hyperlinks to supplement it.

Content reordering in the Categories page is done using in-browser XSLT, radio buttons, and CSS selectors. A browser without XSLT will get the default ordering. The checkbox hack is used to add custom radio buttons.

4.5 Content

Most of the site's images are SVGs, and the site's logo is created using CSS. This makes the site's visual content easily scalable and the site's logo readable by screen readers and web crawlers. SVG image backgrounds and dimensions are removed when the site is built, using XSLT. The dimensions are kept on equations, whose colors are also changed. I played around with trying to use JavaScript and object tags to do the editing in the browser to see if the images would render in text browsers. The text browsers I tested appear to only sometimes render the SVGs regardless of whether size information is present. The object tag is also not well supported by screen readers, and browser inconsistencies might pop up. The object tag is required to allow JavaScript to modify linked SVGs.
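
The build-time SVG cleanup might look something like the sketch below. It is written in Python rather than the XSLT the site uses, and it guesses at details the post does not give: the background is assumed to be a direct child element with the hypothetical id "background", and the image is assumed to carry a viewBox so it still scales once width and height are gone.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix

# Hypothetical input image.
SAMPLE = ('<svg xmlns="http://www.w3.org/2000/svg" '
          'width="200pt" height="100pt" viewBox="0 0 200 100">'
          '<rect id="background" width="200" height="100" fill="white"/>'
          '<circle cx="50" cy="50" r="40"/></svg>')


def make_scalable(svg_text, background_id="background"):
    """Drop the fixed dimensions and the assumed background element."""
    root = ET.fromstring(svg_text)
    # Without width/height, the viewBox lets CSS decide the rendered size.
    root.attrib.pop("width", None)
    root.attrib.pop("height", None)
    # Assumption: the background is a direct child with a known id.
    for child in list(root):
        if child.get("id") == background_id:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


result = make_scalable(SAMPLE)
```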

I chose to use SVG to render LaTeX equations, with dvisvgm, rather than use the popular MathJax (Org Mode's default) JavaScript library, again for graceful degradation. I never use Org Mode's footnotes, so I repurpose them as equation image alt tags.

4.6 Colors

The site's color scheme is based around its background image, which I took in 2009 from Deep Creek, Alaska. It is a setting sun on the left and an active volcano, Mt. Redoubt, on the right. I used the apps Color Harmony, Paletton, and Color Calculator to figure out what colors would work together. I added shadowing to links such that they would pop but also look like they are being supported by clouds.

4.7 Browser Support

I am designing the blog to target popular, currently maintained browsers. Internet Explorer has not been updated since 2013, so I will not bother supporting it. I do not aim to use polyfills to make the experience consistent across browsers. What matters is that the site is usable and that the code is W3C compliant.

4.8 Future Enhancements

4.8.1 Dashes, Diacritics, and Ligatures

The site's main font family is Prof. Donald Knuth's CMU Serif. This family of fonts includes ligatures and diacritics. Browsers can replace character patterns with ligatures so long as the font has those letter-combination glyphs. Using pyftsubset, I include only Latin characters and the common ligatures in the font files that are sent to the browser. The program glyphhanger, which I learned about from the YouTube video Web Fonts Are Not Rocket Science, uses pyftsubset as a back-end and is what led me to it.

I created an XSLT script which replaces words with versions that have optional break points and diacritics. The words are still findable using normal text. Initially I attempted to add diacritics by replacing characters with their diacritic counterparts. The original glyph is hidden with CSS and replaced using ::before and ::after pseudo-elements. However, this only works if you can change the original glyph to the background color. Making the character transparent also did not work because the pseudo-elements cannot have a different transparency from the real element. Simply adding the diacritic marker in the ::after content also did not work because its position was inconsistent between browsers. My current thought is to create alternate glyph files where the regular Latin characters are replaced by their diacritic counterparts.

4.8.2 Comment Support

The JavaScript library ghpages-ghcomments allows GitHub Pages sites to support user comments. The comments are stored as GitHub Issues. This is an example page with comments, and these are the same comments in the Issues page. This library is designed for use with the Jekyll static site generator, so adding it to the site will be a bit more involved.

4.8.3 Selenium Web Driver Testing and Markup Validation

While there is not too much that can go wrong with a static site, I still want to check for broken links. I also want to do automated markup validation using the W3C Nu Markup validator.
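
A broken-link check for internal links could start from something like this Python sketch. It is only an outline of the idea, not a tool the site uses today: the page markup and the set of known pages are made up, external links are skipped rather than fetched, and markup validation is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def broken_internal_links(page_html, known_pages):
    """Return hrefs that point at site pages missing from known_pages."""
    parser = LinkCollector()
    parser.feed(page_html)
    broken = []
    for href in parser.links:
        if urlparse(href).scheme or href.startswith("//"):
            continue  # external link; not checked in this sketch
        path, _fragment = urldefrag(href)
        # An empty path is a same-page anchor such as "#top".
        if path and path not in known_pages:
            broken.append(href)
    return broken


# Hypothetical page and site contents.
PAGE = ('<p><a href="index.html">Home</a> '
        '<a href="missing.html#top">Gone</a> '
        '<a href="https://www.w3.org/">W3C</a> '
        '<a href="#top">Top</a></p>')
broken = broken_internal_links(PAGE, {"index.html", "about.html"})
```

Running this over every generated page, with known_pages built from the output directory listing, would flag links to posts that were renamed or never published.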