
Tell users to update the CLI when the Genie API drifts or disappears #5570

Merged: simonfaltum merged 3 commits into main from simonfaltum/genie-defensive-errors on Jun 12, 2026.


@simonfaltum

Member

Why

The databricks experimental genie ask command is built on a public but undocumented backend route (/api/2.0/data-rooms/tools/onechat/responses). The route can move, be disabled, or change its wire format between Databricks releases without notice. When that happens, the user's only fix is to update the CLI, but nothing told them so:

  • A removed endpoint surfaced as a bare "No API found for 'POST /data-rooms/tools/onechat/responses'" error.
  • Protocol drift was already detected (unparsed-event warning, "stream ended without an answer" error), but the messages only suggested --raw.

Changes

Before: a vanished endpoint or changed protocol produced errors with no recovery path. Now: every drift-shaped failure tells the user to update the CLI, via a single shared advice string ("update the Databricks CLI to the latest version (run 'databricks version --check')").

  • PostStream detects a 404 with errors.Is(err, apierr.ErrNotFound). The route path is fixed and carries no resource IDs, so a 404 on it points at the endpoint itself being gone or disabled. The error keeps the SDK error chain (%w) and appends the advice. The 404 shape (ENDPOINT_NOT_FOUND, "No API found for ...") was verified against a live workspace gateway.
  • The "stream ended without an answer" error (text and JSON renderers, now a shared noAnswerError helper) and the unparsed-events warning include the same advice, alongside the existing --raw suggestion.

No NEXT_CHANGELOG entry since the command is experimental.

Test plan

  • Unit test for the endpoint-gone 404 using the wire shape observed on a live workspace, asserting the advice and that errors.Is(err, apierr.ErrNotFound) still matches through the wrap.
  • Extended renderer unit tests assert the advice in the no-answer error (text + JSON) and the unparsed-events warning.
  • New acceptance tests pin the exact user-facing output: ask-endpoint-gone (404 from the stub server) and ask-protocol-drift (syntactically valid stream with renamed item types, text + JSON modes, exit code 1).
  • ./task checks, ./task lint-q, ./task fmt-q all clean.
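The protocol-drift case exercised by ask-protocol-drift (a syntactically valid stream whose item types were renamed) can be pictured with a small sketch. The event struct, the "answer" item type, render, and the exact message wording are all hypothetical; the real wire format is undocumented and is not reproduced here. The sketch shows one renderer only, with the no-answer error built by a single helper so that a second (JSON) renderer could share the same wording.

```go
package main

import (
	"errors"
	"fmt"
)

// Advice string quoted from the PR description.
const updateAdvice = "update the Databricks CLI to the latest version (run 'databricks version --check')"

// event is a hypothetical, simplified stream item.
type event struct {
	Type string
	Text string
}

// noAnswerError is a shared helper so every renderer emits the same wording,
// keeping the --raw suggestion and adding the update advice.
func noAnswerError() error {
	return errors.New("stream ended without an answer; rerun with --raw to inspect the stream, or " + updateAdvice)
}

// render collects answer text from item types it recognizes and counts the
// rest as unparsed. It returns the answer, an optional warning, and an error.
func render(events []event) (string, string, error) {
	var answer string
	unparsed := 0
	for _, ev := range events {
		switch ev.Type {
		case "answer": // hypothetical item type name
			answer += ev.Text
		default:
			unparsed++
		}
	}
	var warning string
	if unparsed > 0 {
		warning = fmt.Sprintf("warning: %d event(s) could not be parsed; rerun with --raw to inspect them, or %s", unparsed, updateAdvice)
	}
	if answer == "" {
		return "", warning, noAnswerError()
	}
	return answer, warning, nil
}
```

With a renamed item type, both signals fire: the event is counted as unparsed, and because no answer text was collected the call fails with the shared error, which the CLI would surface with exit code 1.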

This pull request and its description were written by Isaac.

The experimental genie command calls an undocumented backend route that can
move, be disabled, or change shape between Databricks releases. A removed
endpoint now reports the situation and points at a CLI update instead of
leaking a bare 'No API found' error, and the existing protocol-drift
detection (unparsed events, streams that end without an answer) carries the
same advice.

Co-authored-by: Isaac
RESOURCE_DOES_NOT_EXIST also unwraps to apierr.ErrNotFound, and the request
carries a user-supplied warehouseId: a pre-stream 404 about a missing
warehouse must keep the backend's message instead of claiming the endpoint
moved. A removed route maps to plain ErrNotFound (ENDPOINT_NOT_FOUND has no
error-code mapping in the SDK), so excluding ErrResourceDoesNotExist keeps
the drift advice for route-gone and code-less 404s only.

Co-authored-by: Isaac
// plain ErrNotFound). A 404 RESOURCE_DOES_NOT_EXIST is excluded: it refers
// to something the request named (e.g. the warehouse) and must keep the
// backend's own message instead of blaming the endpoint.
if errors.Is(err, apierr.ErrNotFound) && !errors.Is(err, apierr.ErrResourceDoesNotExist) {

Contributor


Any other errors, like 500s when the shape of the body is wrong?

Member Author


Good question. I probed the live endpoint to find out:

  • Wrong-shaped body (missing input): returns 500 INTERNAL_ERROR with an empty message, which rendered as a literal blank "Error: ". Added handling in 0a2277f: 500s now carry the same hedged advice ("if this keeps happening, the request format may have changed..."), and the no-details case gets explicit wording since there is no server message to pass through.
  • Bad warehouseId with a valid body: returns 200 and the failure arrives in-stream, so it is handled by the SSE error path and never hits the transport-level branches.
  • 400s are left alone: the server message passes through verbatim and I could not produce one from shape drift.

Response-shape drift (items the CLI cannot parse, or a stream with no answer) is covered separately by the renderer checks, which also point at a CLI update now.

Probing the live endpoint with a wrong-shaped body returns 500
INTERNAL_ERROR with an empty message, which rendered as a blank 'Error: '.
Wrap 500s with the same hedged update advice; the no-details case gets its
own wording since there is no server message to pass through. A bad
warehouseId with a valid body returns 200 and fails in-stream (handled by
the SSE error path), so it cannot be confused with this case.

Co-authored-by: Isaac
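A stdlib-only sketch of the 500 handling described in this commit: a 500 with a server message keeps that message and appends hedged advice, a 500 with an empty message gets its own wording instead of a blank "Error: ", and other statuses (such as 400) pass through verbatim. apiError, adviceError, wrapServerError, and the exact sentences are hypothetical; the commit's own advice text is elided ("...") and is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// Advice string quoted from the PR description.
const updateAdvice = "update the Databricks CLI to the latest version (run 'databricks version --check')"

// apiError is a hypothetical, simplified API error.
type apiError struct {
	StatusCode int
	ErrorCode  string
	Message    string
}

func (e *apiError) Error() string { return e.Message }

// adviceError carries a rewritten message while keeping the original error
// reachable through Unwrap.
type adviceError struct {
	msg string
	err error
}

func (e *adviceError) Error() string { return e.msg }
func (e *adviceError) Unwrap() error { return e.err }

// wrapServerError (hypothetical) only touches 500s.
func wrapServerError(err *apiError) error {
	if err.StatusCode != 500 {
		return err // e.g. 400s: the server message passes through verbatim
	}
	hint := "if this keeps happening, the request format may have changed: " + updateAdvice
	if strings.TrimSpace(err.Message) == "" {
		// No server message to pass through, so describe the failure explicitly.
		return &adviceError{
			msg: fmt.Sprintf("the server returned %d %s with no details; %s", err.StatusCode, err.ErrorCode, hint),
			err: err,
		}
	}
	return &adviceError{msg: err.Message + " (" + hint + ")", err: err}
}
```

A small wrapper type is used instead of fmt.Errorf with %w so the empty-message case does not render a dangling separator after the empty inner message, while the original error stays in the chain.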
@simonfaltum enabled auto-merge June 12, 2026 13:37
@simonfaltum added this pull request to the merge queue Jun 12, 2026
@eng-dev-ecosystem-bot

Collaborator

Integration test report

Commit: 0a2277f

Run: 27414175875

| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 |  | 15 | 264 | 976 | 8:45 |
| 🟨 aws windows | 7 |  | 15 | 266 | 974 | 19:35 |
| 💚 aws-ucws linux |  | 7 | 15 | 360 | 890 | 16:31 |
| 💚 aws-ucws windows |  | 7 | 15 | 362 | 888 | 11:51 |
| 💚 azure linux |  | 1 | 17 | 267 | 974 | 7:15 |
| 💚 azure windows |  | 1 | 17 | 269 | 972 | 12:31 |
| 💚 azure-ucws linux |  | 1 | 17 | 365 | 886 | 8:43 |
| 💚 azure-ucws windows |  | 1 | 17 | 367 | 884 | 11:23 |
| 💚 gcp linux |  | 1 | 17 | 263 | 977 | 7:27 |
| 💚 gcp windows |  | 1 | 17 | 265 | 975 | 10:40 |

22 interesting tests: 15 SKIP, 7 KNOWN
| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 🟨 TestAccept | 🟨 K | 🟨 K | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R |  |  |  |  |  |  |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R |  |  |  |  |  |  |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨 K | 🟨 K | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨 K | 🟨 K | 💚 R | 💚 R |  |  |  |  |  |  |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨 K | 🟨 K | 💚 R | 💚 R |  |  |  |  |  |  |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/grants/select | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |

Top 34 slowest tests (at least 2 minutes):

| Duration | Env | Test name |
|---|---|---|
| 7:18 | azure windows | TestAccept |
| 5:13 | azure-ucws windows | TestAccept |
| 5:08 | gcp windows | TestAccept |
| 4:53 | aws-ucws windows | TestAccept |
| 4:26 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:25 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 4:19 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 4:07 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:03 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:57 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:35 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:29 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:23 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:21 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:19 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:18 | aws-ucws linux | TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:16 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:15 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:15 | aws-ucws linux | TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:12 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:06 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:04 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:58 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:56 | azure linux | TestAccept |
| 2:56 | gcp linux | TestAccept |
| 2:48 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:47 | azure-ucws linux | TestAccept |
| 2:44 | aws-ucws linux | TestAccept |
| 2:37 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:30 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:28 | aws-ucws linux | TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:25 | aws-ucws linux | TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:18 | aws-ucws linux | TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:02 | aws-ucws linux | TestFilerReadWrite/workspace_files_extensions |

Merged via the queue into main with commit d66b99e on Jun 12, 2026.
25 checks passed
@simonfaltum deleted the simonfaltum/genie-defensive-errors branch June 12, 2026 14:58

Labels

None yet

Projects

None yet


3 participants