SQL: Query topics #574

Open
kbatuigas wants to merge 13 commits into rp-sql from DOC-1990-document-feature-query-redpanda-topics

@kbatuigas kbatuigas commented May 4, 2026

Description

This pull request updates and expands the Redpanda SQL documentation to clarify table mapping, schema requirements, and streaming topic queries. It refines the CREATE TABLE reference, introduces new how-to guides for querying topics, and streamlines catalog documentation.

Documentation improvements for querying and mapping topics:

  • Added a new "Query streaming topics" how-to guide (query-streaming-topics.adoc) that walks users through mapping a Redpanda topic to a SQL table and running analytical queries directly on live data. This guide covers prerequisites, table creation, querying, and links to further resources.
  • Introduced a new index page for querying data (query-data/index.adoc) to provide an entry point for users learning to query Redpanda topics with SQL.

Enhancements and clarifications in the SQL reference:

  • Updated the CREATE TABLE documentation to clarify that schema_subject is required and that Redpanda SQL needs a schema to query a topic. Improved the explanation of struct_mapping_policy, especially regarding handling of nested and recursive types, and added documentation for the confluent_wire_protocol option. [1] [2]
  • Improved and updated SQL usage examples to demonstrate required options and multi-message Protobuf schema usage in table creation.

Catalog documentation simplification:

  • Replaced the detailed "Redpanda Catalogs" reference page with a stub, likely to be reworked or replaced by more focused documentation elsewhere.

Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 21 May

Page previews

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

netlify Bot commented May 4, 2026

Deploy Preview for rp-cloud ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | ff560eb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/rp-cloud/deploys/6a10a45758aa2a000843209c |
| 😎 Deploy Preview | https://deploy-preview-574--rp-cloud.netlify.app |

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cccfa626-a58d-49d6-a1ff-99d05ab41856

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@kbatuigas kbatuigas force-pushed the DOC-1990-document-feature-query-redpanda-topics branch 2 times, most recently from 4d7b551 to 20ad041 Compare May 11, 2026 20:00
@kbatuigas kbatuigas force-pushed the DOC-1990-document-feature-query-redpanda-topics branch from 20ad041 to ddefdad Compare May 14, 2026 02:01
kbatuigas and others added 7 commits May 18, 2026 20:28
Renames modules/sql/pages/query/ to modules/sql/pages/query-data/ and
renames the streaming-topic how-to from query-redpanda-topics.adoc to
query-streaming-topics.adoc to match the SQL GA IA. Retitles the page
"Query streaming topics" and reframes the description and learning
objectives around live streaming data; bridge-query and Iceberg content
stays out of this page (DOC-2006 owns the Iceberg-topics how-to).

Adds a pointer to the Iceberg topics how-to under the intro and lists
it under Next steps. Updates the enable-prereq xref to point to the
Enable Redpanda SQL page. Drops the CREATE REDPANDA CATALOG link from
Next steps to align with the v1 framing that users do not typically
create their own Redpanda catalog. Reframes the Query data index page
description for v1 Iceberg scope (live and historical data in Redpanda
topics; no external Iceberg lakehouse).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	modules/sql/pages/query-data/redpanda-catalogs.adoc
@kbatuigas kbatuigas force-pushed the DOC-1990-document-feature-query-redpanda-topics branch from 75ae890 to 48ead8c Compare May 19, 2026 03:30
@kbatuigas kbatuigas marked this pull request as ready for review May 19, 2026 23:39
@kbatuigas kbatuigas requested a review from a team as a code owner May 19, 2026 23:39
@Feediver1 Feediver1 left a comment

PR Review: SQL: Query topics (#574)

Files reviewed: 4 .adoc files (109 additions / 94 deletions)
Overall assessment: Solid documentation structure and content. Same integration-branch xref challenges as #571 — six unresolved cross-PR xrefs. One nav-linked stub page with no body. No What's New entry. A couple of em dashes that violate the style guide.

What this PR does

Expands Redpanda SQL query documentation on the rp-sql integration branch:

  • modules/sql/pages/query-data/index.adoc (new, 3 lines) — section index for "Query Data".
  • modules/sql/pages/query-data/query-streaming-topics.adoc (new, 80 lines) — how-to: map a topic to a SQL table and run analytical queries.
  • modules/sql/pages/query-data/redpanda-catalogs.adoc (1+ / 80−) — heavily reduced from a full reference to a 1-line stub.
  • modules/reference/pages/sql/sql-statements/create-table.adoc (25+ / 14−) — updated reference: schema_subject now required, expanded struct_mapping_policy (with cyclic-type guidance), new confluent_wire_protocol option, three full examples.

Jira ticket alignment

Ticket: DOC-1990 — "Document feature query Redpanda topics" (extracted from branch name).

Status: The PR delivers the planned query how-to and refreshes the CREATE TABLE reference. The stubbed redpanda-catalogs.adoc is mentioned in the PR description as "likely to be reworked" — worth confirming what the eventual replacement plan is before the integration branch lands.

Critical issues (must fix)

  1. Six broken xrefs to pages that aren't on rp-sql or in this branch:

    | File:line | xref target | Provided by |
    |---|---|---|
    | query-streaming-topics.adoc:10 | sql:query-data/query-iceberg-topics.adoc | PR #575 (still OPEN) |
    | query-streaming-topics.adoc:23 | sql:get-started/deploy-sql-cluster.adoc | PR #571 (still OPEN) |
    | query-streaming-topics.adoc:24 | sql:manage/manage-access.adoc | PR #580 (still OPEN) |
    | query-streaming-topics.adoc:25 | sql:get-started/sql-quickstart.adoc | PR #571 (still OPEN) |
    | query-streaming-topics.adoc:50 | sql:query-data/query-nested-fields.adoc | No known PR provides this; confirm it's planned, or remove the reference |
    | query-streaming-topics.adoc:77 | sql:query-data/query-iceberg-topics.adoc (Next steps) | PR #575 (still OPEN) |
    • Fix: Coordinate merge ordering — all sibling PRs need to land on rp-sql before rp-sql lands on main, otherwise the build will surface six target of xref not found errors. Specifically check on query-nested-fields.adoc — if no PR is in flight for it, the inline reference at line 50 should be removed for now.
  2. redpanda-catalogs.adoc is a 1-line stub but nav.adoc:355 links to it as "Redpanda Catalogs". Users clicking that nav entry hit an empty page. The PR description acknowledges this is intentional ("likely to be reworked"), but a nav-linked empty page is bad UX.

    • Fix: Either (a) put a 2–3 sentence placeholder with "Coming soon — see [other page]" pointer, (b) leave the original content until the replacement lands and gut it in a later PR, or (c) remove the line from nav.adoc:355 and re-add when the page has content.
  3. Missing What's New entry. Same gap as #571: the May 2026 section of whats-new-cloud.adoc has no entry for the Redpanda SQL query workflow. Since this is GA documentation, a coordinated What's New entry should cover both PRs (and the broader SQL GA story across #571 / #575 / #580).

    • Fix: Add a single "Redpanda SQL: General availability" entry under == May 2026 that covers the get-started + query + auth pages together, rather than fragmenting into per-PR entries.
  4. Em dashes in create-table.adoc (style guide says no em dashes):

    • Line 7: "CREATE TABLE in Redpanda SQL maps Redpanda topics to SQL tables it does not create standalone tables with user-defined schemas."

    • Line 56: "Cyclic types are not supported in COMPOUND mode use JSON for recursive schemas."

    • Fix: Replace both em dashes with either a period + new sentence, a colon, or restructure the clause. Example for line 56: "Cyclic types are not supported in COMPOUND mode. Use JSON for recursive schemas."
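
The xref check behind Critical #1 can be approximated before the real build runs. The following is a minimal sketch, not the docs build's actual validation: the function name, the sample page text, and the set of known targets are all hypothetical, and it only assumes the `xref:target[Label]` macro form quoted elsewhere in this thread.

```python
import re

# Matches xref macros of the form xref:sql:query-data/page.adoc[Label].
XREF_PATTERN = re.compile(r"xref:([^\[\s]+)\[")


def find_unresolved_xrefs(pages, known_targets):
    """Return (page, line_number, target) for each xref whose target is unknown.

    pages: dict mapping a page name to its AsciiDoc text.
    known_targets: set of xref targets (without #fragment) assumed to exist.
    """
    unresolved = []
    for page, text in pages.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in XREF_PATTERN.finditer(line):
                # Drop any #anchor so only the page target is compared.
                target = match.group(1).split("#", 1)[0]
                if target not in known_targets:
                    unresolved.append((page, line_number, target))
    return unresolved


# Hypothetical sample input; real pages and targets would come from the branch.
sample_pages = {
    "query-streaming-topics.adoc": (
        "See xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg-enabled Topics].\n"
        "See xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE].\n"
    ),
}
sample_known = {"reference:sql/sql-statements/create-table.adoc"}

unresolved = find_unresolved_xrefs(sample_pages, sample_known)
```

Run against the pages on the integration branch plus the pages each sibling PR adds, a check like this would show which references stay unresolved regardless of merge order.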

Suggestions (should consider)

  1. Page-title case mismatch on the index. query-data/index.adoc:1 has = Query data (sentence case), but nav.adoc:354 labels it as "Query Data" (title case). Per team convention, page titles use title case to match the nav label.

    • Current: = Query data
    • Suggested: = Query Data
  2. Stub page comment. The 1-line redpanda-catalogs.adoc uses // stub as the only body marker. If you keep the stub approach, consider a more user-facing placeholder (e.g., a NOTE block or an xref to the related how-to) so the rendered page isn't blank.

  3. Checks boxes in PR body are all empty. Tick the relevant one ("New feature" or "Content gap") for tracking.

Impact on other files

  • modules/ROOT/nav.adoc ✓ — new pages already in nav at lines 354–357, including the (still-missing) query-iceberg-topics.adoc entry at line 357 — consistent with the rp-sql integration plan.
  • modules/get-started/pages/whats-new-cloud.adoc ❌ — no SQL GA entry (Critical #3).
  • Cross-component xrefs verified:
    • xref:reference:sql/sql-statements/create-table.adoc
    • xref:reference:sql/index.adoc
    • xref:reference:sql/sql-data-types/row.adoc (in create-table.adoc:56) — exists in rp-sql ✓
    • xref:reference:sql/sql-statements/create-redpanda-catalog.adoc (in create-table.adoc:7) — exists in rp-sql ✓
    • xref:sql:connect-to-sql/index.adoc
    • All other xref:sql:* xrefs — listed as broken in Critical #1.
  • Sibling PR dependencies: #571 (deploy + quickstart), #575 (query-iceberg), #580 (manage-access). Plus the unknown source for query-nested-fields.adoc.

CodeRabbit findings worth considering

None. CodeRabbit's check passed with no review summary or actionable comments.

What works well

  • Clean module layout: index + how-to + reference, all in the right places.
  • Comprehensive prerequisites section lists exactly what a reader needs before they can succeed: SQL engine enabled, RBAC permission, psql connection, registered Schema Registry schema.
  • Real-world SQL examples beyond toy SELECT * — aggregation with GROUP BY, ORDER BY, WHERE filters, LIMIT.
  • CREATE TABLE reference is thorough: required/optional column in the options table, three full examples (basic, multi-message Protobuf, error handling) covering distinct use cases.
  • Frontmatter compliance: :page-topic-type: how-to for the how-to, :page-topic-type: reference for the reference, learning objectives observable and measurable, personas correctly scoped (app_developer, data_engineer — query-side audience, not platform admins).
  • Sentence case correct on every H2+ heading in the new content.
  • Source-block syntax is consistent with the rest of the SQL module (long-form [source,sql] — matches the convention used in get-started/*.adoc).
  • schema_subject is now correctly marked Required in the reference table, addressing the schema-required guidance that was unclear before.
  • Helpful guidance on cyclic types in struct_mapping_policy — clearly tells users to switch to JSON mode for recursive schemas.
  • confluent_wire_protocol option fully documented with defaults and when to use each value.
  • CI is fully green and Netlify preview links cover the two main new pages.

Final-pass review via /docs-team-standards:pr-review.

@Feediver1

@kbatuigas Ping me again after you get your SME approvals and I can do a more thorough review


Map a Redpanda topic to a SQL table to run analytical queries directly against live streaming data without building ETL pipelines. Redpanda SQL reads each record's fields from the topic's registered schema.

To extend queries past your Redpanda retention window by reading the Iceberg history of Iceberg-enabled topics, see xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg-enabled Topics].

Exactly, @kbatuigas! :)

|STRING
|No
|Schema Registry subject name to use for deserializing topic data.
|Yes

Sorry, I may have misled you: this is not required in code for the GA. If it is not provided, we default to TopicNameStrategy, so the subject is <topic>-value.
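
The fallback described in this comment can be sketched as follows. This is an illustration only: `resolve_schema_subject` is a hypothetical helper, not a Redpanda SQL function, and it models nothing beyond the stated rule that a missing `schema_subject` resolves to `<topic>-value`.

```python
def resolve_schema_subject(topic, schema_subject=None):
    """Return the explicit subject, or fall back to '<topic>-value'.

    Hypothetical helper mirroring the TopicNameStrategy default described
    in the review comment; not part of any Redpanda API.
    """
    if schema_subject:
        return schema_subject
    return f"{topic}-value"


default_subject = resolve_schema_subject("orders")
explicit_subject = resolve_schema_subject("orders", "orders-custom")
```

With a topic named `orders`, the sketch yields `orders-value` unless a subject is passed explicitly.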


I wonder where we should mention the existence of the redpanda and redpanda_raw structs. The first is the Iceberg equivalent, so all the partition, offset, and similar properties are there. The second is the DLQ equivalent, filled when the FILL_NULL error policy is set.


@JacekGalazka1 do those only exist for Iceberg topics? There is mention of them in this other doc specifically for querying Iceberg https://github.com/redpanda-data/cloud-docs/pull/575/changes#diff-3ab2a15f947f028cb3f75cdb5184029657557cac26b1c961cb27c72554ba3533R83 Is redpanda_raw populated only when FILL_NULL is set?


They are always added to each Kafka reader, so both pure Kafka and Iceberg-backed readers will have them.
redpanda_raw is populated only when FILL_NULL is set, and only for records that failed to decode. In all other cases it's NULL.
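
The population rule stated here reduces to a two-condition check. The sketch below is hypothetical and covers only that rule: the function name and parameters are invented for illustration, the raw payload stands in for the struct's contents (its fields are not described in this thread), and `None` stands in for SQL NULL.

```python
def redpanda_raw_value(payload, decode_failed, fill_null_policy):
    """Return the stand-in value of redpanda_raw for one record.

    Populated only when the FILL_NULL policy is set AND the record failed
    to decode; None (SQL NULL) in every other case.
    """
    if fill_null_policy and decode_failed:
        return payload
    return None


# Enumerate all four combinations for one hypothetical malformed payload.
bad_payload = b"\x00not-a-valid-record"
outcomes = {
    (failed, policy): redpanda_raw_value(bad_payload, failed, policy)
    for failed in (True, False)
    for policy in (True, False)
}
```

Only the combination of a decode failure with FILL_NULL set yields a non-NULL value; the other three combinations yield NULL.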


Perfect. Good to go

@kbatuigas kbatuigas requested a review from Feediver1 May 22, 2026 18:46
@Feediver1 Feediver1 mentioned this pull request May 22, 2026
4 tasks
@Feediver1

There is a most unfortunate wrap on the Required column head in the table here: https://deploy-preview-574--rp-cloud.netlify.app/redpanda-cloud/reference/sql/sql-statements/create-table/#options Any way you can fix this?

@Feediver1 Feediver1 left a comment

PR Review: SQL: Query topics (#574) — re-review

Files reviewed: 4 .adoc files (184 additions / 95 deletions — significantly more content than yesterday, mostly in create-table.adoc)
Overall assessment: Substantial improvements since my earlier review on 2026-05-21. Em-dashes removed, page-title casing fixed, stub page now clearly tagged for replacement, and a new "Auto-added columns" reference section documents the redpanda / redpanda_raw metadata columns per @JacekGalazka1's request. Two engineer SMEs (@mattschumpert, @JacekGalazka1) have APPROVED. Critical xref-to-sibling-PR issues are unchanged; What's New entry still missing.

What's changed since my earlier review

| Commit | Date | Change |
|---|---|---|
| 39cae297 | 2026-05-22 18:12 | "Address review comments": applies @JacekGalazka1's correction that schema_subject is not required (defaults to topic-name strategy). |
| b8d94b3b | 2026-05-22 18:32 | "Add info on redpanda and redpanda_raw structs": adds a full "Auto-added columns" H2 + two H3 sub-sections in create-table.adoc (~63 new lines), plus a short note + xref in query-streaming-topics.adoc. |
| ff560eb1 | 2026-05-22 18:45 | "Review pass": final polish across create-table.adoc and the stub redpanda-catalogs.adoc (which now has a clear TODO referencing the DOC-2049 / PR #573 future-home). |

Review state changes since yesterday:

  • @mattschumpert APPROVED (00:35 UTC today)
  • @JacekGalazka1 (Jacek, SME) APPROVED (14:29 UTC today)
  • mergeStateStatus: CLEAN (was BLOCKED)
  • reviewDecision: APPROVED

Jira ticket alignment

Ticket: DOC-1990 — "Document feature query Redpanda topics."

Status: Unchanged from earlier. ✅ Satisfies the ticket. New redpanda / redpanda_raw documentation enriches the page beyond the original ticket scope (in a good way).

Critical issues (must fix)

  1. Six broken xrefs (carried over from my earlier review — most still unresolved):

    | File:line | xref target | Provided by |
    |---|---|---|
    | query-streaming-topics.adoc:10, :85 | sql:query-data/query-iceberg-topics.adoc | PR #575 (still OPEN) |
    | query-streaming-topics.adoc:21 | sql:get-started/deploy-sql-cluster.adoc | PR #571 (still OPEN) |
    | query-streaming-topics.adoc:22 | sql:manage/manage-access.adoc | PR #580 (still OPEN) |
    | query-streaming-topics.adoc:23 | sql:get-started/sql-quickstart.adoc | PR #571 (still OPEN) |
    | query-streaming-topics.adoc:53 | sql:query-data/query-nested-fields.adoc | No known PR provides this; still no source identified |

    Five of six resolve once siblings #571 / #575 / #580 land. The sixth (query-nested-fields.adoc) remains orphaned and is the one item that warrants direct action — confirm the page is planned, or drop the inline xref.

  2. Missing What's New entry — still missing. Same recommendation as before: a single coordinated "Redpanda SQL: General availability" entry in modules/get-started/pages/whats-new-cloud.adoc covering #571 / #574 / #575 / #580 / #584. None of the five SQL GA PRs add it; whichever lands last should also land the What's New entry.

Suggestions (should consider)

None new. Yesterday's suggestions are all resolved:

  1. Em-dashes in create-table.adoc (was lines 7, 56) — removed.

  2. H1 case on index.adoc — now = Query Data (title case).

  3. Stub page comment. redpanda-catalogs.adoc now reads:

    = Redpanda Catalogs
    
    // TODO: Full content rewrite lives on the DOC-2049 branch (PR #573).
    // Replace this stub when DOC-2049 merges into rp-sql.

    Clear pointer to PR #573 instead of a bare // stub. Reader still hits an empty page if they click the nav entry — worth deciding whether to keep the nav entry until PR #573 merges, or temporarily remove it. Either is defensible.

Impact on other files

  • modules/ROOT/nav.adoc ✓ — entries unchanged from earlier review (still at lines 354–357).
  • modules/get-started/pages/whats-new-cloud.adoc ❌ — still no SQL GA entry (Critical #2).
  • Cross-references inside the diff: the new xref from query-streaming-topics.adoc:60 to xref:reference:sql/sql-statements/create-table.adoc#auto-added-columns[Auto-added columns] resolves — verified the [#auto-added-columns] anchor is correctly placed on the new H2 in create-table.adoc:76.
  • Cross-page consistency: the schema_subject is-required-or-not story is now consistent across create-table.adoc:32–35 (Required: No, defaults to topic-name strategy) and query-streaming-topics.adoc (no mention of explicitly setting it in the basic CREATE TABLE example). Matches what @JacekGalazka1 confirmed about the GA behavior.
  • Sibling PR dependencies (unchanged): #571 / #575 / #580 / #573 / #584 — same set as before.

CodeRabbit findings worth considering

None. CodeRabbit's check passed with no actionable findings on the current state.

Outstanding review activity — status

  • @JacekGalazka1 APPROVED (2026-05-22 14:29) — engineer SME, after his inline review on redpanda / redpanda_raw was answered by the b8d94b3b commit. His thread closed with "Perfect. Good to go".
  • @mattschumpert APPROVED (2026-05-22 00:35).
  • reviewDecision: APPROVED. No outstanding CHANGES_REQUESTED.

What works well

  • New "Auto-added columns" reference section in create-table.adoc is well-structured: parent H2 explains the contract (always present on every row; names are reserved), then === `redpanda` (always present, Kafka metadata struct) and === `redpanda_raw` (populated only on FILL_NULL deserialization failures) each get their own H3 with description and field tables. The dead-letter pattern explanation for `redpanda_raw` ("rows whose value fails schema deserialization remain queryable, with the malformed payload preserved for inspection or reprocessing") is a real conceptual win for users.
  • Schema-subject correction is accurate. Now matches what @JacekGalazka1 confirmed about the GA behavior (TopicNameStrategy default).
  • Cross-page consistency between query-streaming-topics.adoc and create-table.adoc on the metadata columns: short summary on the how-to page with an anchored xref to the full reference.
  • Stub page is now properly tagged. The TODO points at the specific PR (#573) and ticket (DOC-2049) that will replace the stub.
  • Style holds up: all H2+ headings in the diff are sentence case; H3 code-identifier headings (=== `redpanda`) are appropriately formatted; no em-dashes anywhere; source blocks all use the established `[source,sql]` convention.
  • Two engineer SME approvals plus full CI green.

Re-review via /docs-team-standards:pr-review.

@Feediver1 Feediver1 left a comment

Approving with the understanding that all the related tickets will resolve critical issues, and 597 adds What's New.
