DOCS-2868: Add llms.txt generation for LLM-friendly documentation #2556
ctauchen wants to merge 2 commits into tigera:main
Custom Docusaurus postBuild plugin that processes rendered HTML to generate hierarchical llms.txt and llms-full.txt files for each product (Calico OSS, Enterprise, Cloud). Files are committed to static/ for git-tracked change history and zero PR build overhead.

DOCS-2868

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ Deploy Preview for calico-docs-preview-next ready!
✅ Deploy Preview succeeded! Built without sensitive environment variables
Pull request overview
This PR adds an opt-in (GENERATE_LLMS=true) Docusaurus postBuild plugin to generate llms.txt and llms-full.txt files from rendered HTML, and wires up automation to keep the generated outputs committed under static/ so they’re served at well-known paths on docs.tigera.io.
Changes:
- Added a custom `docusaurus-plugin-llms-txt` to extract main doc content from built HTML and convert it back to Markdown for `llms.txt`/`llms-full.txt` outputs.
- Registered the plugin in `docusaurus.config.js` with curated top pages and per-product descriptions.
- Added `make generate-llms` and a scheduled GitHub Actions workflow to regenerate and commit updated `static/*/llms*.txt` outputs.
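To make the opt-in behavior concrete, here is a minimal sketch of a Docusaurus plugin whose `postBuild` hook does nothing unless `GENERATE_LLMS=true` is set. It is illustrative only: the function name, the `title` option, and the single-line output file are assumptions, and this summary does not say whether the PR gates generation inside the hook or when registering the plugin in `docusaurus.config.js`. The factory shape (a function of `(context, options)` returning an object with a `name` and lifecycle hooks such as `postBuild`, which receives `outDir`) follows the Docusaurus plugin API.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical opt-in plugin skeleton (not the PR's implementation).
// A Docusaurus plugin is a function of (context, options) that returns
// an object with a name and lifecycle hooks.
function llmsTxtPluginSketch(context, options = {}) {
  return {
    name: 'llms-txt-sketch', // illustrative name
    postBuild({ outDir }) {
      // Opt-in guard: without GENERATE_LLMS=true the hook exits immediately,
      // so ordinary builds do no extra work.
      if (process.env.GENERATE_LLMS !== 'true') {
        return false;
      }
      // Placeholder output; the real plugin extracts and converts rendered docs.
      const body = `# ${options.title ?? 'Documentation'}\n`;
      fs.mkdirSync(outDir, { recursive: true });
      fs.writeFileSync(path.join(outDir, 'llms.txt'), body);
      // Docusaurus ignores this return value; it only makes the sketch testable.
      return true;
    },
  };
}

module.exports = llmsTxtPluginSketch;
```

With a plugin shaped like this, `GENERATE_LLMS=true yarn build` would write the file while a plain `yarn build` would skip it.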
Reviewed changes
Copilot reviewed 13 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/plugins/docusaurus-plugin-llms-txt/index.js` | Main plugin entry; locates docs plugin instances, processes latest version docs, writes `llms.txt` + `llms-full.txt`. |
| `src/plugins/docusaurus-plugin-llms-txt/sidebar-utils.js` | Walks sidebars to produce ordered doc lists with section labels. |
| `src/plugins/docusaurus-plugin-llms-txt/extract.js` | Extracts core content from built HTML and expands tab content before conversion. |
| `src/plugins/docusaurus-plugin-llms-txt/convert.js` | Converts extracted HTML to Markdown via rehype/remark with custom handlers for admonitions, tabs, links, code. |
| `src/plugins/docusaurus-plugin-llms-txt/generate.js` | Formats per-product indexes and full concatenations, plus the root index. |
| `docusaurus.config.js` | Registers the plugin and configures descriptions/top pages/optional sections. |
| `Makefile` | Adds `generate-llms` and `generate-llms-commit` targets for local and automated regeneration. |
| `.github/workflows/generate-llms-txt.yml` | Weekly/manual workflow to regenerate and commit updated `static/` outputs. |
| `package.json` | Adds unified/rehype/remark-related dev dependencies used by the generator. |
| `yarn.lock` | Lockfile updates for the added dependencies. |
| `static/llms.txt` | Adds the generated root llms index file to be served at `/llms.txt`. |
| `static/calico/llms.txt` | Adds generated Calico OSS per-product index. |
| `static/calico-enterprise/llms.txt` | Adds generated Calico Enterprise per-product index. |
| `static/calico-cloud/llms.txt` | Adds generated Calico Cloud per-product index. |
```javascript
 */
async function processDoc(doc, sectionLabel, outDir, siteUrl) {
  // Map permalink to HTML file path
  const htmlPath = path.join(outDir, doc.permalink, 'index.html');
```
path.join(outDir, doc.permalink, 'index.html') will ignore outDir when doc.permalink starts with / (Node treats it as an absolute path). Since Docusaurus permalinks are typically absolute (e.g. /calico/latest/...), this can make the plugin look for HTML under the filesystem root and silently skip pages. Strip the leading slash (or use a safe join helper) when mapping permalinks to outDir paths.
Suggested change:
```diff
-  const htmlPath = path.join(outDir, doc.permalink, 'index.html');
+  const relativePermalink = doc.permalink.startsWith('/')
+    ? doc.permalink.slice(1)
+    : doc.permalink;
+  const htmlPath = path.join(outDir, relativePermalink, 'index.html');
```
🤖 This is actually a false positive: `path.join()` does not treat segments with a leading `/` as absolute paths (that's `path.resolve()`'s behavior). Verified:
```javascript
path.join('/tmp/build', '/calico/latest/foo', 'index.html')
// → '/tmp/build/calico/latest/foo/index.html'
```
The current code works correctly for all 1,033 pages. No change needed.
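The difference between the two functions is easy to check in isolation. This standalone Node.js snippet uses the same segments as the reply above, with `path.posix` chosen so the output does not depend on the host OS:

```javascript
const path = require('node:path');

// path.join concatenates all segments and then normalizes the result,
// so a leading slash on a later segment does not reset the path.
const joined = path.posix.join('/tmp/build', '/calico/latest/foo', 'index.html');

// path.resolve processes segments right to left and stops as soon as it has
// built an absolute path, which is the behavior the review comment
// attributed to path.join.
const resolved = path.posix.resolve('/tmp/build', '/calico/latest/foo', 'index.html');

console.log(joined);   // /tmp/build/calico/latest/foo/index.html
console.log(resolved); // /calico/latest/foo/index.html
```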
docusaurus.config.js (outdated)
```javascript
    '/calico/latest/networking/configuring/bgp',
    '/calico-cloud/get-started/system-requirements',
  ],
  optionalSections: ['release-notes'],
```
optionalSections: ['release-notes'] won’t match section labels like "Release notes" because isOptionalSection() does a simple includes() check without normalizing punctuation/whitespace. As a result, Release notes sections won’t be grouped under ## Optional as intended. Consider changing the pattern to release notes (space) or normalizing both strings (e.g., replace non-alphanumerics with -) before matching.
Suggested change:
```diff
-  optionalSections: ['release-notes'],
+  optionalSections: ['release notes'],
```
🤖 Good catch — the sidebar label is "Release notes" (space) but the config had 'release-notes' (hyphen), so the includes() check never matched. Fixed by changing to 'release notes'. Release notes now correctly appear under ## Optional in per-product llms.txt files.
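Copilot's alternative, normalizing both strings before matching, would make the check tolerant of either spelling. A minimal sketch, assuming the section label and patterns are plain strings; `normalizeLabel` and `isOptionalSectionNormalized` are hypothetical names, not functions from the PR (which fixed the config value instead):

```javascript
// Hypothetical helpers: lowercase, then collapse every run of
// non-alphanumeric characters into a single hyphen.
function normalizeLabel(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // spaces and punctuation become '-'
    .replace(/^-+|-+$/g, '');    // trim leading/trailing separators
}

// True if any normalized pattern appears inside the normalized label.
function isOptionalSectionNormalized(sectionLabel, optionalSections) {
  const label = normalizeLabel(sectionLabel);
  return optionalSections.some((pattern) => {
    const p = normalizeLabel(pattern);
    return p !== '' && label.includes(p); // ignore empty patterns
  });
}
```

With this approach `'release-notes'`, `'release notes'`, and `'Release Notes'` all match the "Release notes" label. The trade-off is that substring matching can over-match (a pattern such as `notes` would match any label containing that word), so configured patterns should stay specific.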
docusaurus.config.js (outdated)
```javascript
    '/calico/latest/network-policy/get-started/calico-policy/calico-network-policy',
    '/calico-enterprise/latest/network-policy/policy-tiers/tiered-policy',
    '/calico/latest/operations/ebpf/enabling-ebpf',
    '/calico-enterprise/latest/observability/',
```
topPages includes /calico-enterprise/latest/observability/, but the generated static/llms.txt Top Pages section doesn’t contain an Observability entry. That suggests the permalink isn’t resolving during generation (likely a trailing-slash mismatch) and the page is being dropped from the curated list. Normalize permalinks (e.g., consistently add/remove trailing /) when building docsByPermalink / resolving topPages, or update this entry to exactly match the doc’s permalink.
Suggested change:
```diff
-    '/calico-enterprise/latest/observability/',
+    '/calico-enterprise/latest/observability',
```
🤖 Good catch on the trailing-slash mismatch. Rather than picking one format, I added permalink normalization (strip trailing slashes) to both the map build and the lookup in index.js. This handles index pages (which Docusaurus gives trailing slashes) and regular pages consistently. The observability page now appears in Top Pages.
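A minimal sketch of that kind of normalization, assuming each doc object carries a `permalink` string; `normalizePermalink`, `buildDocsByPermalink`, and `resolveTopPages` are hypothetical names, not necessarily what `index.js` uses:

```javascript
// Strip trailing slashes so '/a/b/' and '/a/b' map to the same key.
function normalizePermalink(permalink) {
  const stripped = permalink.replace(/\/+$/, '');
  return stripped === '' ? '/' : stripped; // keep the site root as '/'
}

// Build the lookup map with normalized keys.
function buildDocsByPermalink(docs) {
  const map = new Map();
  for (const doc of docs) {
    map.set(normalizePermalink(doc.permalink), doc);
  }
  return map;
}

// Normalize the configured topPages the same way before looking them up.
// Unresolved entries are dropped, which is how a mismatched permalink
// silently disappeared from Top Pages before the fix.
function resolveTopPages(topPages, docsByPermalink) {
  return topPages
    .map((permalink) => docsByPermalink.get(normalizePermalink(permalink)))
    .filter(Boolean);
}
```

Normalizing on both sides means neither the config nor Docusaurus's choice of trailing slash has to be known in advance.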
```markdown
## Top Pages

- [Calico quickstart guide](https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart): Quickstart for Calico.
- [Quickstart for Calico Enterprise on Kubernetes](https://docs.tigera.io/calico-enterprise/latest/getting-started/install-on-clusters/kubernetes/quickstart): Install Calico Enterprise on a single-host Kubernetes cluster for testing or development.
- [What happens when you connect a cluster to Calico Cloud](https://docs.tigera.io/calico-cloud/get-started/connect-cluster): Get answers to your questions about connecting to Calico Cloud.
- [Determine best networking option](https://docs.tigera.io/calico/latest/networking/determine-best-networking): Learn about the different networking options Calico supports so you can choose the best option for your needs.
- [Get started with Calico network policy](https://docs.tigera.io/calico/latest/network-policy/get-started/calico-policy/calico-network-policy): Create your first Calico network policies. Shows the rich features using sample policies that extend native Kubernetes network policy.
- [Get started with policy tiers](https://docs.tigera.io/calico-enterprise/latest/network-policy/policy-tiers/tiered-policy): Understand how tiered policy works and supports microsegmentation.
- [Enabling the eBPF data plane](https://docs.tigera.io/calico/latest/operations/ebpf/enabling-ebpf): Step-by-step instructions for enabling the eBPF data plane.
- [Configure BGP peering](https://docs.tigera.io/calico/latest/networking/configuring/bgp): Configure BGP peering with full mesh, node-specific peering, ToR, and/or Calico route reflectors.
- [System requirements](https://docs.tigera.io/calico-cloud/get-started/system-requirements): Review cluster requirements to connect to Calico Cloud.
```
The Top Pages list is missing the Calico Enterprise Observability page (configured in docusaurus.config.js as a curated top page). Please regenerate static/llms.txt after fixing the permalink resolution so the published index matches the intended top pages.
🤖 Fixed — static/llms.txt has been regenerated. All 10 curated top pages now appear correctly, including the Calico Enterprise Observability entry.
```yaml
- name: Install dependencies
  run: yarn install --immutable
```
This workflow runs yarn install --immutable, but make generate-llms depends on init, which runs $(YARN) install again. That double-install adds unnecessary CI time and can introduce differences between the two installs. Consider removing the explicit Install dependencies step, or adding a make target that builds without re-running init when deps are already installed (or adjust init to skip install when node_modules/.yarn/install-state.gz is present).
Suggested change (remove these lines):
```diff
-- name: Install dependencies
-  run: yarn install --immutable
```
🤖 Agreed — removed the explicit Install dependencies step from the workflow. make generate-llms depends on the init target which handles yarn install, so the separate step was redundant.
ctauchen left a comment: Addressing Copilot review feedback
```javascript
    'calico-enterprise': 'Enterprise-grade networking, security, and observability for Kubernetes.',
    'calico-cloud': 'SaaS-based Kubernetes security and observability platform.',
  },
  topPages: [
```
This manual config section lets us choose, based on our own criteria, which pages we highlight.
```diff
@@ -0,0 +1,64 @@
+name: Generate llms.txt
```
@danudey PTAL. This looks good to me, but maybe there's something else worth doing?
- Fix optionalSections pattern: use 'release notes' (space) to match sidebar labels instead of 'release-notes' (hyphen)
- Add permalink normalization (strip trailing slashes) for robust top page lookup, fixing missing observability entry
- Remove redundant Install dependencies step from CI workflow since make generate-llms already handles it via the init target
- Regenerate all static/llms.txt files with fixes applied

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
What is llms.txt?
llms.txt is a proposed standard (similar in spirit to robots.txt) that helps AI tools and large language models understand and consume website content. Instead of forcing LLMs to crawl and parse HTML, a site publishes clean Markdown files at well-known paths:
- `llms.txt`: a lightweight index of page titles, URLs, and one-line descriptions, organized by topic. Think of it as a table of contents that an LLM can read in a single prompt.
- `llms-full.txt`: the full documentation content concatenated into a single Markdown file, ready for ingestion into an LLM context window.

These files make it much easier for AI coding assistants (Copilot, Cursor, Claude, etc.) and RAG pipelines to use our docs as grounding context.
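The index format is simple enough to generate with a few lines of code. The sketch below follows the entry shape visible in the generated files (`- [Title](URL): description`), opens with an H1 title and a blockquote summary as the llms.txt proposal describes, and collects optional sections under a single `## Optional` heading. The input shape and the name `formatLlmsIndex` are assumptions for illustration, not the plugin's actual `generate.js` API:

```javascript
// Hypothetical llms.txt formatter; the input shape is an assumption.
// sections: [{ label, optional?, docs: [{ title, permalink, description? }] }]
function formatLlmsIndex({ title, summary, siteUrl, sections }) {
  // H1 title followed by a blockquote summary.
  const lines = [`# ${title}`, '', `> ${summary}`];

  // One bullet per page: absolute URL plus an optional one-line description.
  const entry = (doc) => {
    const url = new URL(doc.permalink, siteUrl).href;
    return doc.description
      ? `- [${doc.title}](${url}): ${doc.description}`
      : `- [${doc.title}](${url})`;
  };

  // Regular sections keep their labels as H2 headings.
  for (const section of sections.filter((s) => !s.optional)) {
    lines.push('', `## ${section.label}`, '', ...section.docs.map(entry));
  }

  // Everything marked optional goes under a single "## Optional" heading.
  const optionalDocs = sections.filter((s) => s.optional).flatMap((s) => s.docs);
  if (optionalDocs.length > 0) {
    lines.push('', '## Optional', '', ...optionalDocs.map(entry));
  }

  return `${lines.join('\n')}\n`;
}
```

Putting optional material last lets a consumer with a small context budget stop reading at `## Optional` without losing the core pages.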
What this PR adds
A custom Docusaurus `postBuild` plugin (`src/plugins/docusaurus-plugin-llms-txt/`) that:
- Generates `llms.txt` and `llms-full.txt` for each product (Calico, Calico Enterprise, Calico Cloud)
- Generates a root `llms.txt`: a top-level index linking to all product docs with curated "top pages"
- Runs only when `GENERATE_LLMS=true`, with zero impact on normal PR builds

Generated files
- `static/llms.txt`
- `static/calico/llms.txt`
- `static/calico/llms-full.txt`
- `static/calico-enterprise/llms.txt`
- `static/calico-enterprise/llms-full.txt`
- `static/calico-cloud/llms.txt`
- `static/calico-cloud/llms-full.txt`

How it works
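The per-file summary assigns sidebar ordering to `sidebar-utils.js`. As an illustration of that one step only, the following recursive walk flattens a Docusaurus sidebar into ordered `{ id, sectionLabel }` records. The item shapes (`doc` items with an `id`, `category` items with `label`, `items`, and an optional `link` to an index doc, `link` items with an `href`) follow Docusaurus's sidebar item types; the function name, the choice to label nested docs with their top-level category, and the `'General'` fallback for uncategorized docs are assumptions:

```javascript
// Hypothetical sidebar walk: returns docs in sidebar order, each tagged
// with the label of the top-level category that contains it.
function collectSidebarDocs(items, sectionLabel = null, out = []) {
  for (const item of items) {
    if (item.type === 'category') {
      // Top-level categories define the section; nested ones inherit it.
      const label = sectionLabel ?? item.label;
      // A category may link to its own index doc; list that doc first.
      if (item.link && item.link.type === 'doc') {
        out.push({ id: item.link.id, sectionLabel: label });
      }
      collectSidebarDocs(item.items, label, out);
    } else if (item.type === 'doc') {
      // 'General' is an assumed fallback for docs outside any category.
      out.push({ id: item.id, sectionLabel: sectionLabel ?? 'General' });
    }
    // 'link', 'html', and other item types are not local docs and are skipped.
  }
  return out;
}
```

Preserving sidebar order means the generated index reads in the same sequence as the rendered navigation.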
Automation
A GitHub Actions workflow (`.github/workflows/generate-llms-txt.yml`) runs weekly (Monday 06:00 UTC) to regenerate the files and auto-commit any changes. It can also be triggered manually via `workflow_dispatch`.

Local usage
Test plan
- `GENERATE_LLMS=true yarn build` completes successfully
- Normal builds (`yarn build` without env var) are unaffected

🤖 Generated with Claude Code