Experiment with Buf and Starlark Docgen

This post documents some experimentation. It's not useful guidance for you to follow today!

I've spent a lot of time thinking about API docs for Bazel rules. I spent several weeks writing docs.aspect.build/rules, with a very thoughtful presentation of those APIs. The rules are written in the Starlark language, so generating docs for them requires two things:

  • an extractor that uses a Starlark interpreter to parse the code, following load statements into other files that contain bits of documentation

  • a renderer to format the documentation in a readable way, typically in Markdown or HTML

Everyone has historically used https://github.com/bazelbuild/stardoc for these tasks. I've found it problematic recently for a variety of reasons:

  • It expects bzl_library targets which come from a module that's no longer accepting PRs: https://github.com/orgs/bazelbuild/discussions/3

  • The Gazelle generator for bzl_library is a separate module in the bazel_skylib repo, which makes it a pain to install. It also requires building Go code from source in the user's build, even if they only changed a character in a docstring and have no Go code.

  • Stardoc itself requires building protoc from source, which demands a functional C++ compiler in the user's build. The maintainers recently rejected a fix for this, which frustrated me.

  • Stardoc's rendering uses a clone of Velocity which is an ancient Java thing I used 20 years ago. (Actually I preferred FreeMarker!)

  • Because it doesn't ship the renderer as a pre-built binary, Stardoc leaks a maven_install into the user's build. Then you get yelled at if you have other Maven dependencies, due to something in rules_jvm_external. I've had other problems with their bzlmod-ification and breaking changes.

Finally, and maybe I'm biased: Stardoc feels hard to contribute to because it's controlled by a Google team who seem very overcommitted on other things. It needs a lot of love - for example bullet lists appearing inside a markdown table never look right. I appreciate their work! And I'll probably send more PRs there.

But I'm curious how easily it can be replaced. Let's use it as an excuse to learn and experiment!

Extraction is a Bazel native rule

As of Bazel 7, the starlark_doc_extract rule (https://bazel.build/reference/be/general#starlark_doc_extract) is built into Bazel, and stardoc 0.6.0 started using that extractor. We don't need Stardoc at all to pull the docstrings out of our Bazel rules and macros. This rule wraps the ModuleInfoExtractor.

I wish the extractor were a standalone java_binary program so we could just visit our bzl_library targets with an Aspect! Then docgen wouldn't have to be spelled out in BUILD files at all! But that's a bigger project, so for this experiment I'll just use starlark_doc_extract rules in the BUILD files of the module.

# No load() statements required!
starlark_doc_extract(
  name = "defs.doc_extract",
  src = ":defs.bzl",
)

It produces a binary protocol buffer output. Ugh, does that mean we need rules_proto and the whole ugly process of figuring out Bazel's Protobuf story? Thankfully no, we can rely on the Buf Schema Registry to skip this whole step and immediately parse the result.

Buf Schema Registry

I think of it by analogy: Aspect is to Bazel as Buf is to Protocol Buffers. They're making an awesome end-user experience for a Google technology that has previously been really painful to use outside of Google's monorepo (google3).

In google3 there's a "global proto DB" which famously ran on Jeff Dean's desktop computer. (There was a build outage once when he was on vacation and his credentials expired.) Buf runs a registry that provides a similar "global database" of schema definitions. You can use a private instance for the schemas within your company. For this example, I need the public Bazel schema. Here it is for Bazel 7.2.1 (the latest version as of writing): https://buf.build/bazel/bazel/docs/7.2.1

We can search that site for the schema describing the starlark extractor output, and find that it's stardoc_output.ModuleInfo. We're already up and running and can work with the data! Here's what that looks like on the command line using buf convert:

alexeagle@aspect-build % bazel build example:defs.doc_extract
INFO: Analyzed target //example:_defs.doc_extract (5 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //example:defs.doc_extract up-to-date:
  bazel-bin/example/defs.doc_extract.binaryproto

alexeagle@aspect-build % buf convert buf.build/bazel/bazel --type=stardoc_output.ModuleInfo --from=bazel-bin/example/defs.doc_extract.binaryproto
{"ruleInfo":[{"ruleName":"my_rule","attribute":[{"name":"name","docString":"A unique name for this target.","type":"NAME","mandatory":true}],"originKey":{"name":"my_rule","file":"//example:defs.bzl"}}],"funcInfo":[{"functionName":"my_macro","parameter":[{"name":"name","mandatory":true},{"name":"kwargs"}],"originKey":{"name":"my_macro","file":"//example:defs.bzl"}}],"moduleDocstring":"These are some bazel rules.\n\nThe docstring is multiple lines.","file":"//example:defs.bzl"}%
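That one-line JSON blob is hard to read, so here is a rough TypeScript sketch of its shape. The interfaces cover only the fields visible in the output above, not the full stardoc_output.ModuleInfo schema, and listSymbols is a hypothetical helper I made up for illustration.

```typescript
// Partial shape of stardoc_output.ModuleInfo: only fields visible in the JSON above.
interface AttributeInfo { name: string; docString?: string; type?: string; mandatory?: boolean }
interface ParamInfo { name: string; mandatory?: boolean }
interface RuleInfo { ruleName: string; attribute?: AttributeInfo[] }
interface FuncInfo { functionName: string; parameter?: ParamInfo[] }
interface ModuleInfo {
  file?: string
  moduleDocstring?: string
  ruleInfo?: RuleInfo[]
  funcInfo?: FuncInfo[]
}

// Trimmed copy of the output above (originKey omitted for brevity).
const sample: ModuleInfo = {
  ruleInfo: [{
    ruleName: 'my_rule',
    attribute: [{ name: 'name', docString: 'A unique name for this target.', type: 'NAME', mandatory: true }],
  }],
  funcInfo: [{
    functionName: 'my_macro',
    parameter: [{ name: 'name', mandatory: true }, { name: 'kwargs' }],
  }],
  moduleDocstring: 'These are some bazel rules.\n\nThe docstring is multiple lines.',
  file: '//example:defs.bzl',
}

// Hypothetical helper: one signature-like line per documented symbol.
function listSymbols(mod: ModuleInfo): string[] {
  const rules = (mod.ruleInfo ?? []).map(
    (r) => `rule ${r.ruleName}(${(r.attribute ?? []).map((a) => a.name).join(', ')})`,
  )
  const funcs = (mod.funcInfo ?? []).map(
    (f) => `function ${f.functionName}(${(f.parameter ?? []).map((p) => p.name).join(', ')})`,
  )
  return [...rules, ...funcs]
}

console.log(listSymbols(sample).join('\n'))
```

Every field is marked optional because the JSON output leaves out fields that hold default values: the kwargs parameter, for instance, has no "mandatory" key at all.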

Writing a renderer

It's neat to see our docs as a JSON object. To see it as Markdown, we just need a template engine and a Markdown template. There are TONS of options for this. Today I'll pick Handlebars, which is lightweight and written by the famous and trusted Yehuda Katz.
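To make concrete what the template has to do, here is a dependency-free sketch that renders rules to Markdown with plain template literals instead of a template engine. It is not the Handlebars template from this experiment. The type names, the function names, and the table layout are all assumptions of mine, and the field names are taken from the JSON output shown earlier.

```typescript
// Minimal types for this sketch; field names follow the JSON output shown earlier.
interface Attr { name: string; docString?: string; type?: string; mandatory?: boolean }
interface Rule { ruleName: string; attribute?: Attr[] }
interface Mod { file?: string; moduleDocstring?: string; ruleInfo?: Rule[] }

// Render one rule as a heading followed by a table of its attributes.
function renderRule(rule: Rule): string {
  const rows = (rule.attribute ?? []).map(
    (a) => `| ${a.name} | ${a.docString ?? ''} | ${a.type ?? ''} | ${a.mandatory ? 'yes' : 'no'} |`,
  )
  return [
    `## ${rule.ruleName}`,
    '',
    '| Name | Description | Type | Mandatory |',
    '| --- | --- | --- | --- |',
    ...rows,
  ].join('\n')
}

// Render the module: file label as the title, then the docstring, then each rule.
function renderModule(mod: Mod): string {
  const sections = (mod.ruleInfo ?? []).map(renderRule)
  return [`# ${mod.file ?? ''}`, '', mod.moduleDocstring ?? '', '', ...sections].join('\n')
}

const markdown = renderModule({
  file: '//example:defs.bzl',
  moduleDocstring: 'These are some bazel rules.',
  ruleInfo: [{
    ruleName: 'my_rule',
    attribute: [{ name: 'name', docString: 'A unique name for this target.', type: 'NAME', mandatory: true }],
  }],
})
console.log(markdown)
```

A template engine earns its keep once the layout gets more involved than this, because the Markdown structure then lives in a template file rather than in string-joining code.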

The Buf schema registry can act as a language-specific package registry, so it's just as easy for me to wire this JavaScript library to read the binary protobuf data. I just need to install the bazel/bazel schema package, choosing the protobuf-es SDK. You can click around on https://buf.build/bazel/bazel/sdks/7.2.1 to see the options for your language. I quickly arrive at the incantation npm install @buf/bazel_bazel.bufbuild_es.

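From our docsite code behind docs.aspect.build I already had a TypeScript program that uses Handlebars to render Markdown. Here's a short sketch so you can see the protobuf SDK being used to unmarshal the data:

import fs from 'node:fs'
import { ModuleInfoSchema } from '@buf/bazel_bazel.bufbuild_es/src/main/java/com/google/devtools/build/skydoc/rendering/proto/stardoc_output_pb.js'
import { fromBinary } from '@bufbuild/protobuf'
import Handlebars from 'handlebars'

// Placeholder: point this at wherever the Markdown template lives.
const templatePath = '<path to module.tmpl.md>'

// The first command-line argument is the .binaryproto file written by starlark_doc_extract.
const doc = fromBinary(ModuleInfoSchema, fs.readFileSync(process.argv[2]))
// Handlebars.compile expects a string, so read the template as UTF-8 text.
const template = Handlebars.compile(fs.readFileSync(templatePath, 'utf8'))
console.log(template({ doc }))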

That's enough to create a program we can call to convert binary proto to markdown. To make it usable, we can expose a macro for use in our BUILD files:

load("@aspect_rules_js//js:defs.bzl", "js_run_binary")

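def starlark_doc(name, src, out = None, deps = [], **kwargs):
    """Extracts docs from a .bzl file and renders them as Markdown.

    Args:
        name: name of the target that produces the Markdown file
        src: the .bzl file to document
        out: output file name; defaults to <name>.md
        deps: bzl_library targets for the files that src loads
        **kwargs: forwarded to js_run_binary, for example visibility or tags
    """
    out = out or name + ".md"
    extract_target = "_{}.doc_extract".format(name)

    native.starlark_doc_extract(
        name = extract_target,
        src = src,
        deps = deps,
    )

    js_run_binary(
        name = name,
        srcs = [extract_target],
        tool = Label("//docgen/starlark:render"),
        args = ["$(rootpath {})".format(extract_target)],
        stdout = out,
        **kwargs
    )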

This just wires the extractor and renderer together for easy use. The result is then available for whatever we'd like to do with generated documentation.

Working code for this blog post is at https://github.com/alexeagle/rules_docgen - maybe this will turn into something usable in the future. For now, it was interesting to learn more about Stardoc and Buf!