DOCS-2868: Add llms.txt generation for LLM-friendly documentation #2556

Open
ctauchen wants to merge 2 commits into tigera:main from ctauchen:llms-txt

ctauchen (Collaborator) commented Mar 3, 2026

What is llms.txt?

llms.txt is a proposed standard (similar in spirit to robots.txt) that helps AI tools and large language models understand and consume website content. Instead of forcing LLMs to crawl and parse HTML, a site publishes clean Markdown files at well-known paths:

  • llms.txt — A lightweight index: page titles, URLs, and one-line descriptions organized by topic. Think of it as a table of contents that an LLM can read in a single prompt.
  • llms-full.txt — The full documentation content concatenated into a single Markdown file, ready for ingestion into an LLM context window.

These files make it dramatically easier for AI coding assistants (Copilot, Cursor, Claude, etc.) and RAG pipelines to use our docs as grounding context.

What this PR adds

A custom Docusaurus postBuild plugin (src/plugins/docusaurus-plugin-llms-txt/) that:

  1. Processes rendered HTML (not source MDX) — so all variables, includes, and tabs are fully resolved
  2. Generates per-product files — separate llms.txt and llms-full.txt for each product (Calico, Calico Enterprise, Calico Cloud)
  3. Generates a root llms.txt — a top-level index linking to all product docs with curated "top pages"
  4. Handles Docusaurus-specific elements — tabs are expanded inline, admonitions become blockquotes, code blocks preserve language tags, relative URLs are resolved to absolute
  5. Is gated behind GENERATE_LLMS=true — zero impact on normal PR builds
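
The opt-in gating in item 5 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `shouldGenerate` is a hypothetical helper, and the only Docusaurus API assumed is the `postBuild` lifecycle hook, which receives the build output directory as `outDir`.

```javascript
// Hypothetical sketch of an opt-in postBuild plugin (not the code in this PR).

// Only the exact string 'true' enables generation; anything else is a no-op.
function shouldGenerate(env = process.env) {
  return env.GENERATE_LLMS === 'true';
}

function llmsTxtPlugin(context, options = {}) {
  return {
    name: 'docusaurus-plugin-llms-txt',
    // postBuild runs after Docusaurus has written the static HTML.
    async postBuild({ outDir }) {
      if (!shouldGenerate()) {
        return; // normal builds do no extra work
      }
      // The real plugin would walk sidebars, convert HTML, and write files here.
      console.log(`[llms-txt] would generate files under ${outDir}`);
    },
  };
}

module.exports = llmsTxtPlugin;
```

Checking the variable inside the hook keeps the plugin registered in every build while making the expensive work conditional.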

Generated files

| File | Description | Preview |
|---|---|---|
| static/llms.txt | Root index linking to all products | llms.txt |
| static/calico/llms.txt | Calico Open Source page index | calico/llms.txt |
| static/calico/llms-full.txt | Calico Open Source full content (~2 MB) | calico/llms-full.txt |
| static/calico-enterprise/llms.txt | Calico Enterprise page index | calico-enterprise/llms.txt |
| static/calico-enterprise/llms-full.txt | Calico Enterprise full content (~3.5 MB) | calico-enterprise/llms-full.txt |
| static/calico-cloud/llms.txt | Calico Cloud page index | calico-cloud/llms.txt |
| static/calico-cloud/llms-full.txt | Calico Cloud full content (~2.5 MB) | calico-cloud/llms-full.txt |

How it works

yarn build (with GENERATE_LLMS=true)
  → Docusaurus builds HTML as normal
  → postBuild hook fires
  → Plugin walks sidebars to get ordered page list
  → For each page: extract content from HTML (cheerio) → convert to Markdown (unified/rehype-remark)
  → Assemble per-product llms.txt (index) and llms-full.txt (concatenated content)
  → Write to build output directory
  → Makefile target copies output to static/ for git tracking
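
The assembly step in the flow above might look something like the sketch below. It assumes each page has already been converted to Markdown and arrives in sidebar order; the function names and the page object shape (`section`, `title`, `url`, `description`, `markdown`) are hypothetical and are not taken from the plugin's `generate.js`.

```javascript
// Hypothetical sketch of assembling llms.txt and llms-full.txt from converted pages.

// Build the lightweight index: one heading per section, one bullet per page.
// Assumes pages are in sidebar order, so pages of one section are contiguous.
function buildIndex(product, pages) {
  const lines = [`# ${product.title}`, '', `> ${product.description}`, ''];
  let currentSection = null;
  for (const page of pages) {
    if (page.section !== currentSection) {
      if (currentSection !== null) lines.push('');
      lines.push(`## ${page.section}`, '');
      currentSection = page.section;
    }
    lines.push(`- [${page.title}](${page.url}): ${page.description}`);
  }
  return lines.join('\n') + '\n';
}

// Build the full file: every page's Markdown, separated by horizontal rules.
function buildFull(pages) {
  return pages
    .map((p) => `# ${p.title}\n\nSource: ${p.url}\n\n${p.markdown.trim()}\n`)
    .join('\n---\n\n');
}

const examplePages = [
  {
    section: 'Networking',
    title: 'Configure BGP peering',
    url: 'https://docs.tigera.io/calico/latest/networking/configuring/bgp',
    description: 'Configure BGP peering.',
    markdown: 'Example page body.',
  },
];
const exampleIndex = buildIndex(
  { title: 'Calico', description: 'Example description.' },
  examplePages
);
const exampleFull = buildFull(examplePages);
console.log(exampleIndex);
```

The bullet format mirrors the `- [title](url): description` entries visible in the generated Top Pages list later in this conversation.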

Automation

A GitHub Actions workflow (.github/workflows/generate-llms-txt.yml) runs weekly (Monday 06:00 UTC) to regenerate the files and auto-commit any changes. It can also be triggered manually via workflow_dispatch.

Local usage

make generate-llms          # Build + generate + copy to static/
make generate-llms-commit   # Same as above + git commit if changed

Test plan

  • GENERATE_LLMS=true yarn build completes successfully
  • All 1,033 docs processed without errors
  • Per-product llms.txt files contain correct section headings and page links
  • llms-full.txt files contain full Markdown content with resolved variables
  • Tabs expanded with labels (260 tab blocks, all balanced)
  • Admonitions converted to blockquote format
  • Code blocks preserve language tags
  • Relative URLs resolved to absolute
  • Normal builds (yarn build without env var) are unaffected
  • Verify files are accessible via Netlify deploy preview
  • Verify GitHub Actions workflow syntax is valid
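
For the "Relative URLs resolved to absolute" check, one plausible approach is Node's built-in WHATWG `URL` class. The sketch below is an assumption about how such resolution could work, not the plugin's actual link handler; `toAbsoluteUrl` is a hypothetical name.

```javascript
// Hypothetical sketch: resolve a link found on a page against that page's URL.
function toAbsoluteUrl(href, pagePermalink, siteUrl) {
  // First build the absolute URL of the page that contains the link.
  const pageUrl = new URL(pagePermalink, siteUrl);
  // Then resolve the link relative to the page; absolute hrefs pass through unchanged.
  return new URL(href, pageUrl).href;
}

const page = '/calico/latest/networking/determine-best-networking';
const site = 'https://docs.tigera.io';
console.log(toAbsoluteUrl('configuring/bgp', page, site));
console.log(toAbsoluteUrl('/calico-cloud/get-started/system-requirements', page, site));
```

Note that whether the page permalink ends in a slash changes how a relative `href` resolves, so the permalink passed in should match the URL the page is actually served from.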

🤖 Generated with Claude Code

Custom Docusaurus postBuild plugin that processes rendered HTML to
generate hierarchical llms.txt and llms-full.txt files for each product
(Calico OSS, Enterprise, Cloud). Files are committed to static/ for
git-tracked change history and zero PR build overhead.

DOCS-2868

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ctauchen ctauchen requested a review from a team as a code owner March 3, 2026 14:42
Copilot AI review requested due to automatic review settings March 3, 2026 14:42
netlify bot commented Mar 3, 2026

Deploy Preview for calico-docs-preview-next ready!

Name Link
🔨 Latest commit 0990119
🔍 Latest deploy log https://app.netlify.com/projects/calico-docs-preview-next/deploys/69a6fab81222860008b21eb8
😎 Deploy Preview https://deploy-preview-2556--calico-docs-preview-next.netlify.app
netlify bot commented Mar 3, 2026

Deploy Preview succeeded!

Built without sensitive environment variables

Name Link
🔨 Latest commit 0990119
🔍 Latest deploy log https://app.netlify.com/projects/tigera/deploys/69a6fab87002b300089d86be
😎 Deploy Preview https://deploy-preview-2556--tigera.netlify.app
Lighthouse
1 paths audited
Performance: 68 (🔴 down 5 from production)
Accessibility: 98 (no change from production)
Best Practices: 92 (no change from production)
SEO: 100 (no change from production)
PWA: -

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds an opt-in (GENERATE_LLMS=true) Docusaurus postBuild plugin to generate llms.txt and llms-full.txt files from rendered HTML, and wires up automation to keep the generated outputs committed under static/ so they’re served at well-known paths on docs.tigera.io.

Changes:

  • Added a custom docusaurus-plugin-llms-txt to extract main doc content from built HTML and convert it back to Markdown for llms.txt / llms-full.txt outputs.
  • Registered the plugin in docusaurus.config.js with curated top pages and per-product descriptions.
  • Added make generate-llms and a scheduled GitHub Actions workflow to regenerate and commit updated static/*/llms*.txt outputs.

Reviewed changes

Copilot reviewed 13 out of 17 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/plugins/docusaurus-plugin-llms-txt/index.js | Main plugin entry; locates docs plugin instances, processes latest version docs, writes llms.txt + llms-full.txt. |
| src/plugins/docusaurus-plugin-llms-txt/sidebar-utils.js | Walks sidebars to produce ordered doc lists with section labels. |
| src/plugins/docusaurus-plugin-llms-txt/extract.js | Extracts core content from built HTML and expands tab content before conversion. |
| src/plugins/docusaurus-plugin-llms-txt/convert.js | Converts extracted HTML to Markdown via rehype/remark with custom handlers for admonitions, tabs, links, code. |
| src/plugins/docusaurus-plugin-llms-txt/generate.js | Formats per-product indexes and full concatenations, plus the root index. |
| docusaurus.config.js | Registers the plugin and configures descriptions/top pages/optional sections. |
| Makefile | Adds generate-llms and generate-llms-commit targets for local and automated regeneration. |
| .github/workflows/generate-llms-txt.yml | Weekly/manual workflow to regenerate and commit updated static/ outputs. |
| package.json | Adds unified/rehype/remark-related dev dependencies used by the generator. |
| yarn.lock | Lockfile updates for the added dependencies. |
| static/llms.txt | Adds the generated root llms index file to be served at /llms.txt. |
| static/calico/llms.txt | Adds generated Calico OSS per-product index. |
| static/calico-enterprise/llms.txt | Adds generated Calico Enterprise per-product index. |
| static/calico-cloud/llms.txt | Adds generated Calico Cloud per-product index. |

 */
async function processDoc(doc, sectionLabel, outDir, siteUrl) {
  // Map permalink to HTML file path
  const htmlPath = path.join(outDir, doc.permalink, 'index.html');

Copilot AI Mar 3, 2026

path.join(outDir, doc.permalink, 'index.html') will ignore outDir when doc.permalink starts with / (Node treats it as an absolute path). Since Docusaurus permalinks are typically absolute (e.g. /calico/latest/...), this can make the plugin look for HTML under the filesystem root and silently skip pages. Strip the leading slash (or use a safe join helper) when mapping permalinks to outDir paths.

Suggested change
- const htmlPath = path.join(outDir, doc.permalink, 'index.html');
+ const relativePermalink = doc.permalink.startsWith('/')
+   ? doc.permalink.slice(1)
+   : doc.permalink;
+ const htmlPath = path.join(outDir, relativePermalink, 'index.html');

ctauchen (Collaborator, Author) replied:

🤖 This is actually a false positive — path.join() does not treat segments with a leading / as absolute paths (that's path.resolve()'s behavior). Verified:

path.join('/tmp/build', '/calico/latest/foo', 'index.html')
// → '/tmp/build/calico/latest/foo/index.html'

The current code works correctly for all 1,033 pages. No change needed.
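
The difference between the two functions can be checked directly. The snippet below uses `path.posix` so the results are the same on every platform; the example paths are the ones quoted in the reply above.

```javascript
// Compare path.join and path.resolve when a later segment starts with '/'.
const path = require('node:path');

// join concatenates all segments and normalizes the result,
// so the leading '/' on the second segment is just a separator.
const joined = path.posix.join('/tmp/build', '/calico/latest/foo', 'index.html');

// resolve works right to left and stops at the first absolute segment,
// so '/tmp/build' is discarded.
const resolved = path.posix.resolve('/tmp/build', '/calico/latest/foo', 'index.html');

console.log(joined);   // /tmp/build/calico/latest/foo/index.html
console.log(resolved); // /calico/latest/foo/index.html
```

The behavior Copilot described would only apply if the code used `path.resolve()`.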

'/calico/latest/networking/configuring/bgp',
'/calico-cloud/get-started/system-requirements',
],
optionalSections: ['release-notes'],

Copilot AI Mar 3, 2026

optionalSections: ['release-notes'] won’t match section labels like "Release notes" because isOptionalSection() does a simple includes() check without normalizing punctuation/whitespace. As a result, Release notes sections won’t be grouped under ## Optional as intended. Consider changing the pattern to release notes (space) or normalizing both strings (e.g., replace non-alphanumerics with -) before matching.

Suggested change
- optionalSections: ['release-notes'],
+ optionalSections: ['release notes'],

ctauchen (Collaborator, Author) replied:

🤖 Good catch — the sidebar label is "Release notes" (space) but the config had 'release-notes' (hyphen), so the includes() check never matched. Fixed by changing to 'release notes'. Release notes now correctly appear under ## Optional in per-product llms.txt files.
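
The PR fixed this by changing the config value. The alternative Copilot mentioned, normalizing both strings before matching, could look roughly like the sketch below. The `isOptionalSection` signature shown here is an assumption, since the plugin's actual function is not included in this conversation, and `normalizeLabel` is a hypothetical helper.

```javascript
// Hypothetical sketch: match optional-section patterns regardless of
// case, spaces, or hyphens.

// Lowercase, collapse every run of non-alphanumerics to '-', trim stray dashes.
function normalizeLabel(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

function isOptionalSection(sectionLabel, patterns) {
  const label = normalizeLabel(sectionLabel);
  return patterns.some((pattern) => {
    const key = normalizeLabel(pattern);
    // Skip empty patterns: ''.includes check would otherwise match everything.
    return key !== '' && label.includes(key);
  });
}

console.log(isOptionalSection('Release notes', ['release-notes']));
console.log(isOptionalSection('Networking', ['release-notes']));
```

With this approach both `'release-notes'` and `'release notes'` would match the sidebar label "Release notes", so the config would not need to mirror the label's punctuation.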

'/calico/latest/network-policy/get-started/calico-policy/calico-network-policy',
'/calico-enterprise/latest/network-policy/policy-tiers/tiered-policy',
'/calico/latest/operations/ebpf/enabling-ebpf',
'/calico-enterprise/latest/observability/',

Copilot AI Mar 3, 2026

topPages includes /calico-enterprise/latest/observability/, but the generated static/llms.txt Top Pages section doesn’t contain an Observability entry. That suggests the permalink isn’t resolving during generation (likely a trailing-slash mismatch) and the page is being dropped from the curated list. Normalize permalinks (e.g., consistently add/remove trailing /) when building docsByPermalink / resolving topPages, or update this entry to exactly match the doc’s permalink.

Suggested change
- '/calico-enterprise/latest/observability/',
+ '/calico-enterprise/latest/observability',

ctauchen (Collaborator, Author) replied:

🤖 Good catch on the trailing-slash mismatch. Rather than picking one format, I added permalink normalization (strip trailing slashes) to both the map build and the lookup in index.js. This handles index pages (which Docusaurus gives trailing slashes) and regular pages consistently. The observability page now appears in Top Pages.
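
A minimal sketch of the normalization described above, applied to both the map build and the lookup. The function names and the shape of the doc objects are assumptions for illustration; the actual change lives in the plugin's index.js, which is not shown in this conversation.

```javascript
// Hypothetical sketch: make permalink lookups insensitive to trailing slashes.

// Strip trailing slashes, but keep the site root as '/'.
function normalizePermalink(permalink) {
  const stripped = permalink.replace(/\/+$/, '');
  return stripped === '' ? '/' : stripped;
}

// Key the map by normalized permalink when it is built...
function buildDocsByPermalink(docs) {
  const map = new Map();
  for (const doc of docs) {
    map.set(normalizePermalink(doc.permalink), doc);
  }
  return map;
}

// ...and normalize again at lookup time, dropping entries that do not resolve.
function resolveTopPages(topPages, docsByPermalink) {
  return topPages
    .map((permalink) => docsByPermalink.get(normalizePermalink(permalink)))
    .filter(Boolean);
}

const docsByPermalink = buildDocsByPermalink([
  { permalink: '/calico-enterprise/latest/observability', title: 'Observability' },
]);
const resolvedTopPages = resolveTopPages(
  ['/calico-enterprise/latest/observability/'],
  docsByPermalink
);
console.log(resolvedTopPages.length);
```

Normalizing on both sides means neither the config entry nor the doc metadata has to agree on a slash convention.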

Comment on lines +5 to +16
## Top Pages

- [Calico quickstart guide](https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart): Quickstart for Calico.
- [Quickstart for Calico Enterprise on Kubernetes](https://docs.tigera.io/calico-enterprise/latest/getting-started/install-on-clusters/kubernetes/quickstart): Install Calico Enterprise on a single-host Kubernetes cluster for testing or development.
- [What happens when you connect a cluster to Calico Cloud](https://docs.tigera.io/calico-cloud/get-started/connect-cluster): Get answers to your questions about connecting to Calico Cloud.
- [Determine best networking option](https://docs.tigera.io/calico/latest/networking/determine-best-networking): Learn about the different networking options Calico supports so you can choose the best option for your needs.
- [Get started with Calico network policy](https://docs.tigera.io/calico/latest/network-policy/get-started/calico-policy/calico-network-policy): Create your first Calico network policies. Shows the rich features using sample policies that extend native Kubernetes network policy.
- [Get started with policy tiers](https://docs.tigera.io/calico-enterprise/latest/network-policy/policy-tiers/tiered-policy): Understand how tiered policy works and supports microsegmentation.
- [Enabling the eBPF data plane](https://docs.tigera.io/calico/latest/operations/ebpf/enabling-ebpf): Step-by-step instructions for enabling the eBPF data plane.
- [Configure BGP peering](https://docs.tigera.io/calico/latest/networking/configuring/bgp): Configure BGP peering with full mesh, node-specific peering, ToR, and/or Calico route reflectors.
- [System requirements](https://docs.tigera.io/calico-cloud/get-started/system-requirements): Review cluster requirements to connect to Calico Cloud.

Copilot AI Mar 3, 2026

The Top Pages list is missing the Calico Enterprise Observability page (configured in docusaurus.config.js as a curated top page). Please regenerate static/llms.txt after fixing the permalink resolution so the published index matches the intended top pages.

ctauchen (Collaborator, Author) replied:

🤖 Fixed — static/llms.txt has been regenerated. All 10 curated top pages now appear correctly, including the Calico Enterprise Observability entry.

Comment on lines +35 to +37
- name: Install dependencies
  run: yarn install --immutable

Copilot AI Mar 3, 2026

This workflow runs yarn install --immutable, but make generate-llms depends on init, which runs $(YARN) install again. That double-install adds unnecessary CI time and can introduce differences between the two installs. Consider removing the explicit Install dependencies step, or adding a make target that builds without re-running init when deps are already installed (or adjust init to skip install when node_modules/.yarn/install-state.gz is present).

Suggested change (remove these lines)
- name: Install dependencies
  run: yarn install --immutable

ctauchen (Collaborator, Author) replied:

🤖 Agreed — removed the explicit Install dependencies step from the workflow. make generate-llms depends on the init target which handles yarn install, so the separate step was redundant.

ctauchen (Collaborator, Author) left a comment

Addressing Copilot review feedback

'calico-enterprise': 'Enterprise-grade networking, security, and observability for Kubernetes.',
'calico-cloud': 'SaaS-based Kubernetes security and observability platform.',
},
topPages: [

ctauchen (Collaborator, Author) commented:

This manual config section lets us choose, based on our own criteria, which pages we highlight.

@@ -0,0 +1,64 @@
name: Generate llms.txt

ctauchen (Collaborator, Author) commented:

@danudey PTAL. This looks good to me, but maybe there's something else worth doing?

- Fix optionalSections pattern: use 'release notes' (space) to match
  sidebar labels instead of 'release-notes' (hyphen)
- Add permalink normalization (strip trailing slashes) for robust top
  page lookup, fixing missing observability entry
- Remove redundant Install dependencies step from CI workflow since
  make generate-llms already handles it via the init target
- Regenerate all static/llms.txt files with fixes applied

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ctauchen ctauchen changed the title Add llms.txt generation for LLM-friendly documentation DOCS-2868: Add llms.txt generation for LLM-friendly documentation Mar 3, 2026