Introduction
My website, gatlin.io, sources content from my Obsidian vault. In my opinion this is an exciting setup: I can create content in Obsidian as I normally do, adding a property `public: true` when I want to share a note with others. Then I run a script within my website repo that scans my vault and turns every `public: true` note into website content.
In this post, I will share my technical setup for anyone who finds it interesting or wants to do the same.
Preliminary Points
- The core workflow is in a public script I wrote called metamark, available on npm.
- My website is a Next.js website hosted on Vercel.
- On my local machine, my personal website repo and my Obsidian vault are sibling directories.
- My Obsidian vault has an essentially flat file structure. (See My PKM Workflow#PKM Notes Obsidian for more.)
Abstract
In my personal website repo I have metamark as a compile-time dependency. I use it to iterate through my vault files. Metamark parses the frontmatter of my notes for a `public` boolean. If `public: true`, it adds that note file to a "public files set". Next, it iterates through the "public files set", turning each file into structured data. The structured data includes HTML transformations (via unified.js tooling), table of contents extraction, and more. Finally, I write that payload to a large `contents.json` file within my website repo. `contents.json` is JSON-structured data, meaning you can run arbitrary code against it, allowing you to recombine the content in any way you see fit. `contents.json` is the "source of truth" for my Obsidian content, which is then read at compile time by Next.js (which means statically generated website pages) and finally served by Vercel.
Obsidian Frontmatter
To begin with, I tag my content using a typical approach of YAML frontmatter. Most notably, when a file is ready to be viewed, I mark it `public: true`. Here's an example of some of the more important metadata items.
---
public: true
tags: [note]
created: ...
updated: ...
---
The script I use parses this data with gray-matter. If `public` both exists and is set to `true`, then I add the file name to the public files set. This is all I do on the first pass.
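Here's a minimal sketch of what that first pass might look like, using gray-matter directly; the function and file-scanning details are my own simplification, not metamark's actual internals.

```ts
// A minimal sketch of the first pass (not metamark's actual internals).
// The vault is essentially flat, so a non-recursive scan is enough here.
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

function collectPublicFileNames(vaultDir: string): Set<string> {
  const publicFiles = new Set<string>();
  for (const entry of fs.readdirSync(vaultDir)) {
    if (!entry.endsWith(".md")) continue;
    const raw = fs.readFileSync(path.join(vaultDir, entry), "utf8");
    const { data } = matter(raw); // frontmatter as a plain object
    if (data.public === true) {
      publicFiles.add(path.basename(entry, ".md"));
    }
  }
  return publicFiles;
}
```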
The reason for a public files set
Obsidian is based on wiki links. One file will reference an arbitrary number of other files within Obsidian. How to handle these wiki links depends on whether those other files are public. If a linked file is not public, then the link should be removed and instead rendered as plaintext. Otherwise, it needs to undergo further processing to be turned into an HTML link.
The strategy of doing a first pass to identify which pages are public allows the second pass to target only public files for parsing, and to determine whether to transform each wiki link in those files into plain text or an HTML link.
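Roughly, the decision the second pass makes looks like this (the helper names below are hypothetical, not part of metamark):

```ts
// Hypothetical helpers, not metamark's API: how the second pass might decide
// what each wiki link becomes.
const toSlug = (name: string) => name.toLowerCase().replace(/\s+/g, "-");

function renderWikiLink(target: string, publicFiles: Set<string>): string {
  return publicFiles.has(target)
    ? `<a href="/content/${toSlug(target)}">${target}</a>` // public: real link
    : target; // private: degrade to plain text
}
```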
Parse public files
After I have collected the set of public files, I iterate through that set and parse each file into structured data. The primary type is `FileData`, which has the following properties.
export interface FileData {
fileName: string;
slug: string;
firstParagraphText: string;
frontmatter: Record<string, any>;
html: string;
toc: TocItem[];
}
Finally, I take my list of generated data and write it to a `contents.json` file. Next.js will then read that in, and it becomes a data source.
Building the contents with metamark
In my website repo, I have a file at `bin/buildContents.mjs` that has the following code:
#! /usr/bin/env node
import m from 'metamark'
// important! run this script only from the root of this repo
const data = m.obsidian.vault.process('../vault')
const jsonString = m.utility.jsonStringify(data)
m.utility.writeToFileSync('./contents.json', jsonString)
Running this script generates the top-level file `contents.json`. I check that file in to GitHub. It contains only the information I set to `public`, so there are no secrets in that file (so long as I didn't accidentally make a private file public!).
One interesting fact about this workflow is that only public content ends up in `contents.json`, meaning you could make your personal website repo public.
Another interesting idea I haven't explored is parsing even private files into `contents.json`, doing that only at compile time, leveraging private data to generate static pages, and then deleting `contents.json` post-build. This means you could still have a public repo while your built website still leveraged private data. (I don't currently have a use case for this, but if anyone using `metamark` is interested in it, open up an issue.)
Serving the content in Next.js
I will keep this section brief since I covered some of it in a previous blog post, Nextjs Markdown Blog Setup.
- I serve all my content - blogs, guides, and notes - through a single `/content/[slug]` route, which is inspired by the path structure of Wikipedia.
- I use `getStaticPaths` to read all the content slugs at compile time.
- Those slugs get passed via `getStaticProps`. I use the slug as a "soft key" to find the rest of the content. I then prune the content down into whatever is needed for that particular page. This is also done at compile time.
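Put together, the route looks roughly like the sketch below, assuming `contents.json` is an array shaped like `FileData` and that JSON imports are enabled; this is illustrative, not my exact production code.

```tsx
// A minimal sketch of pages/content/[slug].tsx. The relative import path,
// the array shape of contents.json, and the pruning here are assumptions.
import contents from "../../contents.json";

interface FileData {
  fileName: string;
  slug: string;
  firstParagraphText: string;
  frontmatter: Record<string, any>;
  html: string;
}

const files = contents as FileData[];

export async function getStaticPaths() {
  // One statically generated page per public note.
  return {
    paths: files.map((file) => ({ params: { slug: file.slug } })),
    fallback: false,
  };
}

export async function getStaticProps({ params }: { params: { slug: string } }) {
  // The slug is a "soft key" into the generated content; prune to what the page needs.
  const file = files.find((f) => f.slug === params.slug);
  if (!file) return { notFound: true };
  return { props: { html: file.html } };
}

export default function ContentPage({ html }: { html: string }) {
  return <article dangerouslySetInnerHTML={{ __html: html }} />;
}
```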
The Upsides
My content is decoupled from my codebase.
I was initially writing content within my website repo while also maintaining an Obsidian vault. Keeping them in sync was not feasible. I had WIP content in one, experimental content in the other, and content copied from each into the other. It was unpleasant. Having my content all in one place is a relief!
Notes are developed in Obsidian, website is developed in repo
I write all my notes, at any stage of development, in my Obsidian vault. I write all my code, at any stage of development, in my website repo.
I used to manage in-progress content and in-progress code changes all within my website repo. One does not write content the way one writes code. So, in brief, it was ghastly. Dirty working trees were the abysmal commons. But now I can just go to my website repo when I want to change the code on the website. Then, when my code is in a good (enough) spot, I can run my data-import script when I want to update content.
I also no longer have to manage works in progress in my repo. Everything can be a work in progress in Obsidian. When something is ready, I simply flip `public: false` to `public: true`. The next time I run my import script, it will regenerate `contents.json` and the file will be present.
Independent table of contents as data
In the past I have disliked using CSS-fu or other complicated strategies to manipulate a generated HTML ToC within a page. It would inject an `h2` at the top of the content somewhere, and I had to be a CSS wizard to get it looking the way I wanted across all the different screen sizes. I wanted more control than that solution offered.
Turning the ToC into an in-memory object allows me to manipulate the display of it however I want. It is now a pure-data input to an arbitrary function. The potential UX here is essentially unbounded, which is exciting to think about design-wise.
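For example, a sidebar ToC becomes an ordinary component over that data. The exact `TocItem` shape below is an assumption, since I only showed the type name above:

```tsx
// Illustrative only: the TocItem shape here is assumed.
interface TocItem {
  depth: number;
  text: string;
  slug: string;
}

// The ToC is now a pure-data input to an arbitrary rendering function.
function SidebarToc({ toc }: { toc: TocItem[] }) {
  return (
    <nav>
      {toc.map((item) => (
        <a
          key={item.slug}
          href={`#${item.slug}`}
          style={{ display: "block", paddingLeft: item.depth * 12 }}
        >
          {item.text}
        </a>
      ))}
    </nav>
  );
}
```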
Arbitrary access to metadata
This is essentially the same point as the ToC point above. I read in all the frontmatter for each note and turn it into JSON data. The potential UX here is unbounded. For example, I use it currently to display page aliases.
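As a small, hedged example of what that enables, here is how aliases might be displayed, assuming they are carried through in the frontmatter object:

```tsx
// Illustrative only: assumes a note's Obsidian `aliases` end up in its
// frontmatter object in contents.json.
function PageAliases({ frontmatter }: { frontmatter: Record<string, any> }) {
  const aliases: string[] = frontmatter.aliases ?? [];
  if (aliases.length === 0) return null;
  return <p>Also known as: {aliases.join(", ")}</p>;
}
```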
The Downsides
Realtime edits are not viewable in `dev` mode
In my old setup, where I edited content within my website repo, a simple refresh showed my content "as the user would experience it". This was a fast feedback loop.
Now, I have to run my import script, restart my dev env, and then reload the page.
My old "editor hat" workflow was reading it on the website to get a sense of the reading experience, and from that experience I would then make edits. I now lack that fast-feedback access to the "reader experience". I will have to restructure my "editor hat" workflow to accomodate.
One error I didn't catch for a while was when, in `metamark`, I accidentally disabled syntax highlighting for most languages. Since I was mostly reading my own content in Obsidian and not on my website, it took me a very long time to notice that syntax highlighting was not working.
All this said, the decoupling of content from code feels unquestionably worth it, in my opinion, even years later.
Wiki links are not "to spec"
Wiki links are not in the markdown specification. They can only be parsed by specialised tools, like Obsidian itself. This makes handling them non-trivial.
The target audience is ambiguous
The way you write content for a website can be different from the way you write content for your own Personal Knowledge Management system. Which voice you use and which audience you are thinking about has an effect on your writing. For example, I open this blog post talking about how "my website" sources content from my Obsidian vault. But, if you are not me, you are already reading this on "my website". So, I'm talking about "my website" like it's not what you are currently reading. I do this because I don't write the content on "my website". I could change the opening text to say "This website...", but that's odd to write when I'm writing it in Obsidian. This is an example of picking an audience and writing for them. It feels arbitrary yet necessary.
Other Obsidian features are also not "to spec"
For each current and future Obsidian syntactic feature, the processing pipeline would have to handle its incorporation or removal. (It would be interesting if Obsidian was open source so I could see how they do it with Obsidian Publish.) The core problem is that I will be forever chasing down Obsidian syntax and making sure my own pipeline can handle it appropriately. I fear my pipeline will balloon in complexity over time, or I won't bother with it and it won't keep up with Obsidian over time. This fear has not been realized yet in my years of using metamark
, but still, I feel it hangs over the project.
Probably the most important Obsidian feature that I have not solved for is the embed syntax: `![[...]]`. It might be as simple as an embedded iframe, but I worry that isn't a particularly pleasant reading experience.
Closing notes
I wouldn't be able to do this without Unified.js
The most important thing enabling all this functionality is the unified.js ecosystem, particularly remark. Its maintainers have chatted with me about my approach over time and helped me rethink some ideas. I found it difficult at first to get started with the ecosystem, but once I got used to things it was empowering to manipulate my own markdown abstract syntax trees. I encourage people to check it out.
A lot of the work I'm doing is building on top of the remark-wiki-links library, so big thank you to the maintainer of that lib for making my life easier.
Obsidian Publish
If you are using Obsidian and would say your use case is rather straightforward, e.g., "I just want some of my vault notes made public, I don't care how", then Obsidian Publish might be a good option for you.
If you desire custom handling of your content, then `metamark` might be a tool worth using!
Conclusion
I am going to continue working on metamark, remark-obsidian-link, and my content pipeline. I am hopeful that my work will lead to easy-to-use utilities that help others, especially since more and more people seem to be adopting "plaintext-forward" knowledge management tools.
Thanks for reading.