Proof of Concept – Migrating Notes RichText to Markdown – Part I

First of all: this is not a blog post about “how to move my data to another platform”. It is about combining several long-established technologies to move some of your structured data to a lossless format like Markdown.

But let’s start from the beginning (aka: the use case).

Most organizations have standardized content, like organizational rules or application documentation, stored in Notes databases. That is totally fine when you (only) have to deliver that content to IBM Notes clients. There are also several ways to display that content on the web, be it via old-school Notes web display, customized XPages or open-source applications like the OpenNTF Help application.

 

The question is: Why should I migrate some of this data? And why to Markdown?

It becomes a different story when you want (or have) to make that content available in environments where no IBM Domino server is running, or when you want to deliver it in a printable format like PDF. This is where Markdown comes into play.

Many people know Markdown as a “documentation syntax for developers”, mainly from GitHub. But there are many other use cases for it, such as flat-file CMSs, blogs (Mark does that), or documentation content like in this case.

I have become a big proponent of Markdown over the last few years. Its quite simple structure also allows non-technical people to write documentation in a consistent manner. It frees authors from the hassle of thinking about how to format text, like how many blank lines have to come before a heading or which font style to use for a highlighted comment. As Markdown is “only” structured text, one can easily create new target formats like static HTML pages, PDF, ePub, Word documents and also Notes documents from it. That makes it a perfect candidate for storing your content. Last but not least, Markdown files are just text files, so keeping them in source control is also possible, and recommended.

There are more pros (and some cons); I leave their evaluation up to you.

 

The goal and some preconditions

Here are the preconditions I’ve set for myself for this PoC.

  • Read Notes formatted RichText from an existing Notes database.
  • Make no modifications to the existing design of the database.
  • Don’t put anything on the IBM Domino server.
  • There should be no IBM Notes/Domino version dependency.
  • The conversion should run on a machine that has no IBM Notes/Domino technology installed.
  • Convert the RichText that was read to Markdown.

Let’s break them down into some details.

 

Read Notes formatted RichText from an existing Notes database

I make the assumption that most documentation is stored as RichText. If we were only talking about plain text, it wouldn’t be much fun. 😉

RichText may include some formatted text, like headlines, underlined or bold text, inline images and more.
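To make these elements a bit more tangible, here is a minimal sketch of what each of them might look like on the Markdown side. It is purely illustrative and not the conversion approach of this series: the `Run` class and the `run_to_markdown` function are hypothetical names invented for this example, and they have nothing to do with any Notes/Domino API.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical, simplified piece of rich text (not a Notes API type)."""
    text: str
    kind: str = "text"      # "text", "heading" or "image"
    bold: bool = False
    underline: bool = False
    level: int = 1          # heading level, only used for kind == "heading"
    src: str = ""           # image location, only used for kind == "image"


def run_to_markdown(run: Run) -> str:
    """Render one simplified rich text run as a Markdown fragment."""
    if run.kind == "heading":
        # Headlines become ATX headings: one '#' per level.
        return "#" * run.level + " " + run.text
    if run.kind == "image":
        # Inline images become image links; run.text serves as the alt text.
        return f"![{run.text}]({run.src})"
    out = run.text
    if run.bold:
        out = f"**{out}**"
    if run.underline:
        # Markdown has no underline syntax; inline HTML is one possible fallback.
        out = f"<u>{out}</u>"
    return out


if __name__ == "__main__":
    for run in (
        Run("Setup", kind="heading", level=2),
        Run("important", bold=True),
        Run("note", underline=True),
        Run("Diagram", kind="image", src="diagram.png"),
    ):
        print(run_to_markdown(run))
```

Note that underlined text already shows that not every RichText feature has a direct Markdown counterpart.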

 

Make no modifications to the existing design of the database

Everything should work out of the box. Modifying the design of an existing database is sometimes not an option, so I want to use functionality that is generally available.

 

Don’t put anything on the IBM Domino server

Depending on how you want to get the data, you may need to put something on the Domino server, like a plug-in (more on that later). I don’t want that; I’m trying to keep it simple.

 

There should be no IBM Notes/Domino version dependency

The conversion should be independent of the Notes/Domino version in use. That means the source content (i.e. the database) can also be hosted on an R7 or R6 server.

 

The conversion should run on a machine that has no Notes/Domino technology installed

This may sound strange, but I want to run the conversion process remotely. You’ll learn why later.

 

Convert the RichText that was read to Markdown

Last but not least, the important bit, the goal: convert the RichText that was read to Markdown. I think it’s clear that a conversion from one format to another is (almost) never free from manual follow-up checks. So I’m going with an 80/20 fidelity assumption: most of the content should convert cleanly, while the rest may need manual rework.

 

Having said that, I’ll proceed with the technology decisions in part two of this blog series.