Exporting page URLs from Hugo

My previous blog has posts on it with a mix of URL styles, and various historic aliases to those pages. Because I am uncool and didn’t plan my URLs.

To preserve the addresses so links still work (and ultimately redirect here), I needed to generate a list of URLs. As the site was on Hugo, I could have done this by parsing the source markdown files, but there's an easier way: write a layout for the format you want and iterate over the pages. (I assume you can do the same thing with Jekyll, Zola and similar generators.)

I decided to use TOML format, and write the following layout as themes/theme_name/layouts/_default/everything.toml:

{{ range .Site.Pages }}
[[page]]
aliases = [
  "{{ .RelPermalink }}",
{{ if .Params.aliases }}  {{ range .Params.aliases }}
  "{{ . }}",
{{ end }} {{ end }}
]
{{ end }}

This iterates over every page on the site and emits a TOML [[page]] table listing its relative permalink plus any aliases from its front matter. You'll see the output in a moment.

I also needed two other things. The first was a page to trigger it, as content/everything.md:

---
layout: everything
title: Everything
---

Finally, I needed to teach Hugo about TOML as a media type and output format in the site config:

[mediaTypes]
[mediaTypes."text/toml"]
  suffixes = ["toml"]

[outputFormats]
[outputFormats.TOML]
  mediaType = "text/toml"
  isPlainText = true

[outputs]
page = ["HTML", "TOML"]

You can then hit localhost:1313/everything/index.toml and you get a [[page]] entry for every page, listing all of its URLs:

[[page]]
aliases = [
  "/posts/2022-03-27-tides-twice-a-day/",
  
  "/2022/03/27/tides-twice-a-day",

  "/2022/03/27/tides-twice-a-day.html",

  "/tides-twice-a-day",
 
]

So that works. The only address that didn't get included was the RSS feed; everything else (pages, posts, the landing page) was in the file.

I also tried adding title and date fields, just for sanity checking. With those, I had problems with unescaped quote marks in titles. I didn't need these fields, but ended up fixing them in my editor with search and replace, plus a few rounds with a TOML validator.