Exporting page URLs from Hugo
My previous blog has posts on it with a mix of URL styles, and various historic aliases to those pages. Because I am uncool and didn’t plan my URLs.
To preserve the addresses so links still work—and ultimately redirect here—I needed to generate a list of URLs. As the site was on Hugo, I could do this from the source markdown files, but there’s an easier way: write a layout for the format you want, and iterate the pages. (I assume you can do the same thing from Jekyll or Zola and similar.)
I decided to use TOML format, and wrote the following layout as themes/theme_name/layouts/_default/everything.toml:
{{ range .Site.Pages }}
[[page]]
aliases = [
"{{ .RelPermalink }}",
{{ if .Params.aliases }} {{ range .Params.aliases }}
"{{ . }}",
{{ end }} {{ end }}
]
{{ end }}
This does the work of iterating pages and generating TOML content. You’ll see the output in a moment.
I also needed two other things. A page to trigger it, as content/everything.md:
---
layout: everything
title: Everything
---
And finally I needed to teach Hugo to recognize TOML in the config:
[mediaTypes]
[mediaTypes."text/toml"]
suffixes = ["toml"]
[outputFormats]
[outputFormats.TOML]
mediaType = "text/toml"
isPlainText = true
[outputs]
page = ["HTML", "TOML"]
You can then hit localhost:1313/everything/index.toml and you get a file with a [[page]] entry for every page, each listing that page's URLs:
[[page]]
aliases = [
"/posts/2022-03-27-tides-twice-a-day/",
"/2022/03/27/tides-twice-a-day",
"/2022/03/27/tides-twice-a-day.html",
"/tides-twice-a-day",
]
So that works. The only content that didn’t get included was the RSS address. Everything else (pages, posts, landing page) was in the file.
I also tried adding a title and date field, just for sanity checking. With those, I had some problems with the escaping of quote marks in titles. I didn’t need these fields, but ended up fixing them in my editor with search and replace, plus a few rounds with a TOML validator.