Automating Documentation with GitHub & Markdown – Part 1 – Setting Up


One of my biggest gripes when doing customer documentation is that invariably I have to start from scratch each time. This is mostly down to the way I work, being self-employed and my nomadic work lifestyle taking me from project to project, customer to customer. But each time I enter a company to augment their team, I always seem to have to start from Document1.docx.

Well, now I have had enough, and I need to make my life easier. Part of the value people get from employing me is my skills and knowledge. So perhaps I should create a baseline document template for all my design and artefact work as part of MVC Ltd’s stock digital assets. After all, I end up writing 80% of the same stuff over and over again.

Now, I could do this just by creating my own Word document and using OneDrive/SharePoint versioning to keep it somewhat up to date. But I’ve had constant pains in the past with document formatting in Word and with keeping the flow of the document on point for the customer. I wanted something that I could write easily without formatting issues, that was modular so I could pick and choose the sections to include in a particular document, and that I could maintain easily without having to read 200 pages to make sure references, citations etc. were all aligned.

On my journey to find a solution that works for me, I heard a lot in the community about how Markdown and static content websites are now “the thing”. I had an idea: what if I could use this type of method to control my documentation and template it so that I can fit it to almost every customer?

I have to admit, I didn’t know what Markdown was, and I was surprised by how easy it is. Want to learn? Use this website. Having used it for a month or so, I find it liberating, and I can hit my flow much more easily than in Word. The reason is that my fingers don’t need to leave the keyboard to select a heading or insert an image. It can all be done in Markdown using special characters. For instance, for a level 1 heading use # Your heading here; for level 2, use ## Your level 2 heading here, and so on. Want to italicise text? Use *italic text here*. There is much more.

Using Markdown allows me to create a consistent typography that is free from external influences, inherited styles, margins and so on.

So now that I am convinced Markdown should be my default authoring language, what do I need in order to turn it into a document?

Firstly, I needed an editor. Markdown can be written in Notepad or any plain text editor, but I wanted a more intelligent editor that can interpret the plain Markdown code and display it formatted, so that I have a visual idea of how the final document will look. After a few trials I settled on Typora. Best of all, at the time of writing it is free!

Example of Typora Interface

The next element I needed to consider was how to structure the documentation so that I could reuse elements to create a customer facing document that was relevant to their requirements.

Writing a single markdown document wouldn’t cut it because it would mean that I would need several similar documents to cover the core scenarios I meander between. So my style of documentation and structure needed to change.

I approached it in a modular way. I would write a section on each element of Microsoft Teams and split out things that could be add-ons to the core functionality, e.g. splitting Teams Meetings from Audio Conferencing and having a separate Audio Conferencing section. I would also have elements that cover Calling Plans, others that cover Direct Routing, and so on.

I began to create a structure where every document would have a set of base elements, and from that I could add on the sections I wanted. The basic rule I set out with was that I would not cross-reference sections, or mention them in any other section. E.g. I wouldn’t mention Audio Conferencing in the Teams Meetings section, or Direct Routing in the Calling Plans section. That way the final document would not be alluding to missing sections and would stay on point.

The next issue to solve was how to bundle these elements together to produce a final customer-facing version. I knew that documents created from this template approach would only cover about 80% of the final version, but even that could save me several weeks of writing.

After searching the internet for a bit, I found a program called Pandoc, a free, open-source command-line document reader and writer that takes Markdown input and converts it to a wide array of formats, including PDF, Word, HTML and more.

A sample of code to generate a document in pandoc:

pandoc mymarkdownfile.md -o mywordfile.docx 

So now I have a way of converting my source files to a Word document. Next I wanted a way to be able to use document variables. What if I want to customise the document to include the customer name, how many users, licenses or any other variable to make the document look more personalised to the customer, rather than a sterile, generic, boiler plated document?

Pandoc has a feature called filters. A filter is a small program, often a Python script, that Pandoc runs against the document while it is in its AST form (the abstract syntax tree, a middle space between reading the source file and writing the output). These scripts can do anything you can code. For me, I wanted to be able to replace placeholders in my source files with the values I defined.
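To make the filter idea a bit more concrete, here is a minimal sketch of a JSON filter that uses only the Python standard library. It is not how Pandoc-Mustache is implemented; the function name replace_in_ast and the hard-coded placeholder and value are purely my own illustration. It relies on Pandoc passing the AST to a filter as JSON on stdin and reading the modified AST back from stdout, with text words represented as {"t": "Str", "c": "..."} elements.

```python
import json
import sys


def replace_in_ast(node, placeholder, value):
    """Recursively replace `placeholder` inside every Str element of a pandoc JSON AST."""
    if isinstance(node, list):
        return [replace_in_ast(item, placeholder, value) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].replace(placeholder, value)}
        return {key: replace_in_ast(val, placeholder, value) for key, val in node.items()}
    return node  # strings, numbers and None are returned unchanged


def main():
    if sys.stdin.isatty():
        return  # run by hand with nothing piped in
    raw = sys.stdin.read()
    if not raw.strip():
        return  # empty input, nothing to transform
    document = json.loads(raw)
    # Hypothetical hard-coded replacement, for illustration only.
    document["blocks"] = replace_in_ast(document["blocks"], "{{customer}}", "MVC Ltd")
    sys.stdout.write(json.dumps(document))


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing, for example myfilter.py, a script like this should be usable with pandoc mymarkdownfile.md --filter ./myfilter.py -o mywordfile.docx, although I would treat it as a learning aid rather than something to rely on.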

Again, I searched the internet and found a free filter that Michael Stepner made called Pandoc-Mustache. This filter looked for placeholders within double-curly brackets {{variable here}}, hence the name mustache.

Example of a mustache variable
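As a rough illustration of what placeholder replacement means, the sketch below swaps {{name}} markers for values from a dictionary. It is a simplified stand-in written with a regular expression, not the real mustache engine: the render function and PLACEHOLDER pattern are my own names, and unlike real mustache it leaves unknown placeholders untouched instead of blanking them.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def render(text, variables):
    """Replace each {{name}} with variables[name]; unknown names are left as they are."""
    def substitute(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    sample = "This design is for {{customer}} and covers {{totalusers}} users."
    print(render(sample, {"customer": "MVC Ltd", "totalusers": "100"}))
```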

In order for Pandoc to be able to process this filter, the filter needs to know which variables to look for in the files and what to replace them with. This is done using a variable file, which must be saved as a YAML (.yaml) file.

Inside this YAML file you declare your variables and values e.g.

customer: MVC Ltd
tenantname: mvc.onmicrosoft.com
totalusers: '100'

Save the file as docmeta.yaml. In reality the name is arbitrary.
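If you would rather generate the variable file per customer than edit it by hand, a small helper along these lines could do it. This is only a sketch: to_yaml and write_variables are hypothetical names, it handles flat key and value pairs only, and it assumes key names without spaces. Every value is single-quoted, in the same way totalusers is quoted above.

```python
from pathlib import Path


def to_yaml(variables):
    """Render a flat dict as YAML lines of the form  key: 'value'."""
    lines = []
    for key, value in variables.items():
        # Inside a single-quoted YAML scalar, a quote is escaped by doubling it.
        quoted = str(value).replace("'", "''")
        lines.append(f"{key}: '{quoted}'")
    return "\n".join(lines) + "\n"


def write_variables(path, variables):
    """Write the variables to `path` (for example docmeta.yaml) as UTF-8 text."""
    Path(path).write_text(to_yaml(variables), encoding="utf-8")


if __name__ == "__main__":
    print(to_yaml({"customer": "MVC Ltd", "totalusers": 100}), end="")
```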

Now in my Markdown files, every document will begin with the same starting element, e.g. docintro.md, which will contain things like the executive summary, dependencies, requirements, purpose and solution overview. In this file I want to add some metadata, called YAML Front Matter. In here we add the path to the YAML file I created with the variables.

---
mustache: .\docmeta.yaml
---

Front Matter is declared by enclosing it between two lines of three dashes at the very beginning of the document.

Now that I have this, I can reference these variables in my documentation, and if I want to generate a new document, I change the variables in the YAML file, run Pandoc and voilà! Document created.

pandoc mymarkdownfile.md --filter pandoc-mustache -s -o mywordfile.docx
Example Extract from created Word document

So now I had a method. What if I wanted to add more than one Markdown file? Pandoc makes this really easy: you just have to list them in the order you want them to be processed.

pandoc docintro.md teamsclient.md teamsmeetings.md --filter pandoc-mustache -s -o mywordfile.docx

Remember to include the YAML front matter only in the first document in the input list. It is not needed in the others, and if you do put it in to cover yourself, beware that there is a processing bug that writes the front matter as literal text in your Word file. I found this quite frustrating.
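Since this one caught me out, a quick pre-flight check before running Pandoc can help. The sketch below is an assumption-laden helper of my own (has_front_matter and files_with_stray_front_matter are hypothetical names): it treats any file whose first line is --- and which has a later closing --- or ... line as carrying front matter, so a file that happens to open with a horizontal rule could be flagged by mistake.

```python
def has_front_matter(markdown_text):
    """True if the text opens with a '---' line that is later closed by '---' or '...'."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    return any(line.strip() in ("---", "...") for line in lines[1:])


def files_with_stray_front_matter(documents):
    """Given ordered (name, text) pairs, return the names after the first that carry front matter."""
    return [name for name, text in documents[1:] if has_front_matter(text)]


if __name__ == "__main__":
    inputs = [
        ("docintro.md", "---\nmustache: ./docmeta.yaml\n---\n# Introduction\n"),
        ("teamsmeetings.md", "---\nmustache: ./docmeta.yaml\n---\n# Meetings\n"),
    ]
    print(files_with_stray_front_matter(inputs))
```

In practice you would read each file from disk in the same order you pass them to Pandoc and fix anything the check reports.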

So now I have an end to end structure and process I can use to generate a document. But there are still things I need to consider. What if other people need to create a document from these source files? I don’t want them to have to email me to create one, nor do I want them to store their own versions of these files on their desktop.

Where you store these is entirely up to you. But bear in mind that with lots of elements the script you end up running will be quite big, and it depends on the document structure each design needs. I chose GitHub because it’s just easy!

One thing to be really aware of is inserting images. In Markdown you reference the image location instead of embedding the image in the Markdown file. Then, when processed by Pandoc, the image is fetched from its location and embedded into the Word file. This means that the path to the image must be accessible to others. It is another reason I chose GitHub.

To insert an image from a GitHub repository in Markdown, use this syntax:

![](https://github.com/path to image/image.png?raw=true)
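If you have many images, building these links by hand gets tedious. The helper below is a small sketch that assembles the Markdown for an image stored in a repository. The function name and the owner, repository, branch and file values are all hypothetical, and it assumes the usual github.com/owner/repo/blob/branch/path URL layout with ?raw=true appended.

```python
from urllib.parse import quote


def github_image_markdown(owner, repo, branch, path, alt_text=""):
    """Return Markdown image syntax pointing at a file in a GitHub repository."""
    # quote() keeps '/' as is and percent-encodes spaces and other unsafe characters.
    url = f"https://github.com/{owner}/{repo}/blob/{quote(branch)}/{quote(path)}?raw=true"
    return f"![{alt_text}]({url})"


if __name__ == "__main__":
    # Placeholder values only; substitute your own repository details.
    print(github_image_markdown("example-owner", "example-repo", "main", "images/overview.png"))
```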

Out of the box, Pandoc will use the normal document template from Word to format the output document. This will use the Normal style which is likely not to suit your corporate styling. So to override this we reference a document that has been stylised to meet your corporate branding.

If you haven’t got a reference document, create one from the default Pandoc template using this command:

pandoc -o custom-reference.docx --print-default-data-file reference.docx

Now modify custom-reference.docx with your styling and save it as reference.docx, the name the commands below use. Pandoc uses the following style names when converting formatting:

  • Normal
  • Body Text
  • First Paragraph
  • Compact
  • Title
  • Subtitle
  • Author
  • Date
  • Abstract
  • Bibliography
  • Heading 1
  • Heading 2
  • Heading 3
  • Heading 4
  • Heading 5
  • Heading 6
  • Heading 7
  • Heading 8
  • Heading 9
  • Block Text
  • Footnote Text
  • Definition Term
  • Definition
  • Caption
  • Table Caption
  • Table Normal
  • Image Caption
  • Figure
  • Captioned Figure
  • TOC Heading

Make sure you style each of them to suit your style. Add in any header and footer imagery or elements you want and save it.
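After restyling, it can be reassuring to confirm the reference document still contains the styles listed above. A .docx file is a zip archive with the style definitions in word/styles.xml, so a standard-library sketch like the following can list them. The function names are my own, and the comparison ignores case because Word tends to store built-in names such as heading 1 in lower case.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_names(docx_path):
    """Return the set of style names defined in a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    names = set()
    for style in root.iter(f"{{{W_NS}}}style"):
        name = style.find(f"{{{W_NS}}}name")
        if name is not None and name.get(f"{{{W_NS}}}val"):
            names.add(name.get(f"{{{W_NS}}}val"))
    return names


def missing_styles(docx_path, expected):
    """Return the expected style names that the document does not define (case-insensitive)."""
    present = {name.lower() for name in style_names(docx_path)}
    return [name for name in expected if name.lower() not in present]
```

For example, missing_styles("reference.docx", ["Heading 1", "Body Text", "Compact"]) would return an empty list when all three styles are present.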

Now when you run Pandoc, reference the template to generate a more corporate document

pandoc mymarkdownfile.md --filter pandoc-mustache --reference-doc reference.docx -s -o mywordfile.docx
Example of a styled Word Document from reference template.

So now I am finished with my local proof of concept. Onwards to putting this on GitHub and then generating documents from that source in the next post. However, before I go, let me go through how to install all the tooling you need.

If you want to add a table of contents at the beginning of the document, use this code

pandoc mymarkdownfile.md --filter pandoc-mustache --reference-doc reference.docx --toc -s -o mywordfile.docx
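As the command grows with more input files and options, it may be easier to assemble it in a small script. The sketch below is one possible way to do that, assuming pandoc and pandoc-mustache are installed and on your PATH. build_command and generate are hypothetical names of mine, and -s is the short form of Pandoc's standalone option.

```python
import subprocess


def build_command(sources, output, reference_doc=None, toc=False,
                  filters=("pandoc-mustache",)):
    """Assemble the pandoc argument list for an ordered list of Markdown files."""
    command = ["pandoc", *sources]
    for name in filters:
        command += ["--filter", name]
    if reference_doc:
        command += ["--reference-doc", reference_doc]
    if toc:
        command.append("--toc")
    command += ["-s", "-o", output]
    return command


def generate(sources, output, **options):
    """Run pandoc; raises subprocess.CalledProcessError if pandoc reports an error."""
    subprocess.run(build_command(sources, output, **options), check=True)


if __name__ == "__main__":
    # Only prints the command, so it is safe to run without pandoc installed.
    print(" ".join(build_command(["mymarkdownfile.md"], "mywordfile.docx",
                                 reference_doc="reference.docx", toc=True)))
```

Passing the arguments as a list rather than one long string avoids quoting problems with file names that contain spaces.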

Note that it is not possible to add a Cover Page to the template reference document as all content that isn’t in the header and footer section is ignored by the Pandoc writer processor. So this will have to be inserted post processing.

The same components work on Mac and Windows, but the Mac setup is a little different.

Windows

First you need to install Python for Windows. Please install version 3.7.4 from https://www.python.org/downloads/windows/

Once Python is installed, you need to install pip. Download PIP for Windows. (Recent Python installers already include pip, in which case you can skip this step.)

Open Command Prompt and type in python c:\path to\get-pip.py

Now head over to Pandoc.org and download Pandoc for Windows and follow the install instructions

Now we need to install some Python libraries for Pandoc-Mustache

Open Command Prompt and type in

pip install panflute
pip install pyyaml
pip install future
pip install pystache

Once installed we can install Pandoc-Mustache using this command

pip install pandoc-mustache

Do not use the -U switch!

Mac OSX

You can install Pandoc using brew or by downloading the binaries. Also install Python using brew and then follow the pip commands from the Windows section. Make sure you run these under sudo.

Now your machine is prepped with the tools you need to start creating your document.
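To double-check the setup on either platform, a short script can confirm the command-line tools are reachable. This is just a sketch with names of my own choosing (REQUIRED_TOOLS, find_tools), and it assumes the pandoc-mustache package puts a pandoc-mustache command on your PATH, which is what the --filter pandoc-mustache option relies on.

```python
import shutil

# Executables the workflow in this post expects to find on PATH.
REQUIRED_TOOLS = ("pandoc", "pandoc-mustache")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```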

In Part 2 I will show you how to use GitHub and protect your documentation from the public view.
