From 8627e60c9214d9ab2096b8fcd2cf5262ddc7ce58 Mon Sep 17 00:00:00 2001 From: Jan Rose Date: Wed, 10 Jun 2026 15:41:19 +0200 Subject: [PATCH 1/3] Add Genie space examples The Databricks CLI supports the genie_space bundle resource since v1.3.0 (databricks/cli#5282, direct deployment engine only). This adds: - knowledge_base/genie_space: a minimal bundle that deploys a Genie space for the samples.nyctaxi.trips table - knowledge_base/app_with_genie_space: a Databricks app that answers questions through the Genie Conversation API, with the space ID injected via an app resource - knowledge_base/dashboard_nyc_taxi: a Genie space next to the existing AI/BI dashboard so users can also query the NYC taxi data in natural language Co-authored-by: Isaac --- .../app_with_genie_space/.gitignore | 1 + knowledge_base/app_with_genie_space/README.md | 52 +++++++++++++ .../app_with_genie_space/app/app.py | 56 ++++++++++++++ .../app_with_genie_space/app/app.yml | 11 +++ .../app_with_genie_space/app/requirements.txt | 2 + .../app/templates/index.html | 34 +++++++++ .../app_with_genie_space/databricks.yml | 24 ++++++ .../resources/genie_assistant.app.yml | 15 ++++ .../resources/nyc_taxi_genie.genie_space.yml | 12 +++ .../src/nyc_taxi_genie.geniespace.json | 56 ++++++++++++++ knowledge_base/dashboard_nyc_taxi/README.md | 17 ++++- .../dashboard_nyc_taxi/databricks.yml | 5 ++ .../resources/nyc_taxi_genie.genie_space.yml | 12 +++ .../src/nyc_taxi_genie.geniespace.json | 56 ++++++++++++++ knowledge_base/genie_space/.gitignore | 1 + knowledge_base/genie_space/README.md | 73 +++++++++++++++++++ knowledge_base/genie_space/databricks.yml | 24 ++++++ .../resources/nyc_taxi_genie.genie_space.yml | 27 +++++++ .../src/nyc_taxi_genie.geniespace.json | 56 ++++++++++++++ 19 files changed, 530 insertions(+), 4 deletions(-) create mode 100644 knowledge_base/app_with_genie_space/.gitignore create mode 100644 knowledge_base/app_with_genie_space/README.md create mode 100644 
knowledge_base/app_with_genie_space/app/app.py create mode 100644 knowledge_base/app_with_genie_space/app/app.yml create mode 100644 knowledge_base/app_with_genie_space/app/requirements.txt create mode 100644 knowledge_base/app_with_genie_space/app/templates/index.html create mode 100644 knowledge_base/app_with_genie_space/databricks.yml create mode 100644 knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml create mode 100644 knowledge_base/app_with_genie_space/resources/nyc_taxi_genie.genie_space.yml create mode 100644 knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json create mode 100644 knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml create mode 100644 knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json create mode 100644 knowledge_base/genie_space/.gitignore create mode 100644 knowledge_base/genie_space/README.md create mode 100644 knowledge_base/genie_space/databricks.yml create mode 100644 knowledge_base/genie_space/resources/nyc_taxi_genie.genie_space.yml create mode 100644 knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json diff --git a/knowledge_base/app_with_genie_space/.gitignore b/knowledge_base/app_with_genie_space/.gitignore new file mode 100644 index 00000000..15bcc6dd --- /dev/null +++ b/knowledge_base/app_with_genie_space/.gitignore @@ -0,0 +1 @@ +.databricks diff --git a/knowledge_base/app_with_genie_space/README.md b/knowledge_base/app_with_genie_space/README.md new file mode 100644 index 00000000..8d9ba79b --- /dev/null +++ b/knowledge_base/app_with_genie_space/README.md @@ -0,0 +1,52 @@ +# Databricks app using a Genie space + +This example demonstrates how to define a Databricks app that uses a Genie space in a Declarative Automation Bundle. 
+ +It deploys a Genie space for the `samples.nyctaxi.trips` table and a Flask app that lets users ask the space questions in natural language through the [Genie Conversation API](https://docs.databricks.com/genie/conversation-api.html). + +For more information about Databricks Apps, see the [documentation](https://docs.databricks.com/aws/en/dev-tools/databricks-apps). +For more information about Genie, see the [documentation](https://docs.databricks.com/genie/index.html). + +## Prerequisites + +* Databricks CLI v1.3.0 or above. +* Genie spaces can only be deployed with the [direct deployment engine](https://docs.databricks.com/dev-tools/bundles/direct) (`engine: direct`), which is the default for new deployments since CLI v1.3.0. + +## Usage + +1. Modify `databricks.yml`: + - Update the `host` field to your Databricks workspace URL + - Update the `warehouse` field to the name of your SQL warehouse + +2. Deploy the bundle: + ```sh + databricks bundle deploy + ``` + +3. Run the app: + ```sh + databricks bundle run genie_assistant + ``` + +4. Open the app in your browser: + ```sh + databricks bundle open genie_assistant + ``` + Alternatively, run `databricks bundle summary` to display its URL. + +## How it works + +* `resources/nyc_taxi_genie.genie_space.yml` defines the Genie space, with its data sources, instructions, and sample questions stored in `src/nyc_taxi_genie.geniespace.json`. +* `resources/genie_assistant.app.yml` declares the Genie space as an app resource. This grants the app's service principal `CAN_RUN` permission on the space: + ```yaml + resources: + - name: "genie-space" + genie_space: + name: "NYC Taxi Trip Analysis" + space_id: ${resources.genie_spaces.nyc_taxi_genie.space_id} + permission: CAN_RUN + ``` +* `app/app.yml` injects the space ID into the app as the `GENIE_SPACE_ID` environment variable using `valueFrom: "genie-space"`. 
+* `app/app.py` sends each question to the space with `w.genie.start_conversation_and_wait(...)` and renders the text answer or the generated SQL and its results. + +Note that the app queries Genie with its own service principal identity: in addition to the `CAN_RUN` permission on the space granted by the bundle, the service principal must be able to use the SQL warehouse and read the tables that back the space. If access to the `samples` catalog is restricted for service principals in your workspace, point the space at a table the app can read. diff --git a/knowledge_base/app_with_genie_space/app/app.py b/knowledge_base/app_with_genie_space/app/app.py new file mode 100644 index 00000000..b080848a --- /dev/null +++ b/knowledge_base/app_with_genie_space/app/app.py @@ -0,0 +1,56 @@ +import os + +from databricks.sdk import WorkspaceClient +from flask import Flask, render_template, request + +app = Flask(__name__) + +w = WorkspaceClient() + +# The space ID is injected by the "genie_space" resource declared in app.yml. +space_id = os.getenv("GENIE_SPACE_ID") + + +@app.route("/", methods=["GET", "POST"]) +def home(): + question = None + answer = None + sql = None + columns = [] + rows = [] + + if request.method == "POST": + question = request.form["question"] + + # Start a new conversation in the Genie space and wait for the answer. + # Use w.genie.create_message_and_wait(...) to ask follow-up questions + # in the same conversation. + message = w.genie.start_conversation_and_wait(space_id, question) + + for attachment in message.attachments or []: + # Genie answers either with plain text... + if attachment.text: + answer = attachment.text.content + + # ...or with a generated SQL query and its result set. 
+ if attachment.query: + answer = attachment.query.description + sql = attachment.query.query + result = w.genie.get_message_attachment_query_result( + space_id, + message.conversation_id, + message.id, + attachment.attachment_id, + ) + statement = result.statement_response + columns = [column.name for column in statement.manifest.schema.columns] + rows = statement.result.data_array or [] + + return render_template( + "index.html", + question=question, + answer=answer, + sql=sql, + columns=columns, + rows=rows, + ) diff --git a/knowledge_base/app_with_genie_space/app/app.yml b/knowledge_base/app_with_genie_space/app/app.yml new file mode 100644 index 00000000..7e267284 --- /dev/null +++ b/knowledge_base/app_with_genie_space/app/app.yml @@ -0,0 +1,11 @@ +command: + - flask + - --app + - app + - run + +env: + # The value is injected by the Databricks Apps runtime from the app resource + # named "genie-space" (see resources/genie_assistant.app.yml). + - name: GENIE_SPACE_ID + valueFrom: "genie-space" diff --git a/knowledge_base/app_with_genie_space/app/requirements.txt b/knowledge_base/app_with_genie_space/app/requirements.txt new file mode 100644 index 00000000..dcc9ae01 --- /dev/null +++ b/knowledge_base/app_with_genie_space/app/requirements.txt @@ -0,0 +1,2 @@ +databricks-sdk>=0.60.0 +flask diff --git a/knowledge_base/app_with_genie_space/app/templates/index.html b/knowledge_base/app_with_genie_space/app/templates/index.html new file mode 100644 index 00000000..ca74cabe --- /dev/null +++ b/knowledge_base/app_with_genie_space/app/templates/index.html @@ -0,0 +1,34 @@ + + + Genie space app managed by DABs + + +

Ask Genie about NYC taxi trips

+
+ + +
+ + {% if question %} +

{{ question }}

+ {% if answer %} +

{{ answer }}

+ {% endif %} + {% if sql %} +
{{ sql }}
+ {% endif %} + {% if columns %} + + + {% for column in columns %}{% endfor %} + + {% for row in rows %} + + {% for value in row %}{% endfor %} + + {% endfor %} +
{{ column }}
{{ value }}
+ {% endif %} + {% endif %} + + diff --git a/knowledge_base/app_with_genie_space/databricks.yml b/knowledge_base/app_with_genie_space/databricks.yml new file mode 100644 index 00000000..611e37e9 --- /dev/null +++ b/knowledge_base/app_with_genie_space/databricks.yml @@ -0,0 +1,24 @@ +bundle: + name: app_with_genie_space + + # Genie spaces can only be deployed with the direct deployment engine. + # The direct engine is the default for new deployments since Databricks CLI v1.3.0. + engine: direct + +include: + - resources/*.yml + +variables: + # The "warehouse_id" variable is used to reference the warehouse used by the Genie space. + warehouse_id: + lookup: + # Replace this with the name of your SQL warehouse. + warehouse: "Shared Unity Catalog Serverless" + +workspace: + host: https://myworkspace.databricks.com + +targets: + dev: + default: true + mode: development diff --git a/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml b/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml new file mode 100644 index 00000000..0d7d4624 --- /dev/null +++ b/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml @@ -0,0 +1,15 @@ +resources: + apps: + genie_assistant: + name: "genie-assistant" + description: "An app that answers questions using a Genie space" + source_code_path: ../app + + # The resources which this app has access to: + resources: + - name: "genie-space" + description: "The Genie space that the app sends questions to" + genie_space: + name: "NYC Taxi Trip Analysis" + space_id: ${resources.genie_spaces.nyc_taxi_genie.space_id} + permission: CAN_RUN diff --git a/knowledge_base/app_with_genie_space/resources/nyc_taxi_genie.genie_space.yml b/knowledge_base/app_with_genie_space/resources/nyc_taxi_genie.genie_space.yml new file mode 100644 index 00000000..b2873d86 --- /dev/null +++ b/knowledge_base/app_with_genie_space/resources/nyc_taxi_genie.genie_space.yml @@ -0,0 +1,12 @@ +resources: + genie_spaces: + 
nyc_taxi_genie: + title: "NYC Taxi Trip Analysis" + description: "Ask questions about NYC taxi trip data in natural language" + + # The serialized definition of the Genie space: its data sources, + # instructions, and sample questions. + file_path: ../src/nyc_taxi_genie.geniespace.json + + # The warehouse used to run the queries that Genie generates. + warehouse_id: ${var.warehouse_id} diff --git a/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json b/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json new file mode 100644 index 00000000..3f6d935a --- /dev/null +++ b/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json @@ -0,0 +1,56 @@ +{ + "version": 2, + "config": { + "sample_questions": [ + { + "id": "11111111111111111111111111111111", + "question": ["What is the average fare per trip?"] + }, + { + "id": "22222222222222222222222222222222", + "question": ["How many trips were longer than 10 miles?"] + } + ] + }, + "data_sources": { + "tables": [ + { + "identifier": "samples.nyctaxi.trips", + "column_configs": [ + { "column_name": "tpep_pickup_datetime" }, + { "column_name": "tpep_dropoff_datetime" }, + { "column_name": "trip_distance" }, + { "column_name": "fare_amount" }, + { "column_name": "pickup_zip" }, + { "column_name": "dropoff_zip" } + ] + } + ] + }, + "instructions": { + "text_instructions": [ + { + "id": "33333333333333333333333333333333", + "content": [ + "This Genie space answers questions about NYC taxi trips.\n", + "All data is in the samples.nyctaxi.trips table.\n", + "Fare amounts are in USD. When asked about revenue, use SUM(fare_amount)." 
+ ] + } + ], + "example_question_sqls": [ + { + "id": "44444444444444444444444444444444", + "question": ["What was the total revenue per pickup zip code?"], + "sql": [ + "SELECT\n", + " pickup_zip,\n", + " SUM(fare_amount) AS total_revenue\n", + "FROM samples.nyctaxi.trips\n", + "GROUP BY pickup_zip\n", + "ORDER BY total_revenue DESC" + ] + } + ] + } +} diff --git a/knowledge_base/dashboard_nyc_taxi/README.md b/knowledge_base/dashboard_nyc_taxi/README.md index 8e1708d2..ff7a425c 100644 --- a/knowledge_base/dashboard_nyc_taxi/README.md +++ b/knowledge_base/dashboard_nyc_taxi/README.md @@ -1,14 +1,15 @@ # Dashboard for NYC Taxi Trip Analysis -This example shows how to define a Declarative Automation Bundle with an AI/BI dashboard and a job that captures a snapshot of the dashboard and emails it to a subscriber. +This example shows how to define a Declarative Automation Bundle with an AI/BI dashboard, a job that captures a snapshot of the dashboard and emails it to a subscriber, and a Genie space to ask questions about the same data in natural language. It deploys the sample __NYC Taxi Trip Analysis__ dashboard to a Databricks workspace and configures a daily schedule to run the dashboard and send the snapshot in email to a specified email address. For more information about AI/BI dashboards, please refer to the [documentation](https://docs.databricks.com/dashboards/index.html). +For more information about Genie, please refer to the [documentation](https://docs.databricks.com/genie/index.html). ## Prerequisites -This example includes a dashboard snapshot task, which requires Databricks CLI v0.250.0 or above. Creating dashboards in bundles is supported in Databricks CLI v0.232.0 or above. +This example includes a Genie space, which requires Databricks CLI v1.3.0 or above. 
Genie spaces can only be deployed with the [direct deployment engine](https://docs.databricks.com/dev-tools/bundles/direct) (`engine: direct`), which is the default for new deployments since CLI v1.3.0. ## Usage @@ -25,6 +26,8 @@ This example includes a dashboard snapshot task, which requires Databricks CLI The AI/BI dashboard is created and the snapshot job is set to run daily at 8 AM, which captures a snapshot of the dashboard, and sends it in email to the specified subscriber. +The bundle also deploys the __NYC Taxi Trip Analysis Genie__ space defined in `resources/nyc_taxi_genie.genie_space.yml`, where users can ask questions about the same data in natural language. Its data sources, instructions, and sample questions are stored in `src/nyc_taxi_genie.geniespace.json`. + ### Visual modification You can use the Databricks UI to modify the dashboard, but any modifications made through the UI will not be applied to the bundle `.lvdash.json` file unless you explicitly update it. @@ -41,11 +44,17 @@ To continuously poll and retrieve the updated `.lvdash.json` file when it change databricks bundle generate dashboard --resource nyc_taxi_trip_analysis --force --watch ``` -Any remote modifications of a dashboard are noticed by the `deploy` command and require +The same workflow applies to the Genie space and its `.geniespace.json` file: + +```sh +databricks bundle generate genie-space --resource nyc_taxi_genie --force +``` + +Any remote modifications of a dashboard or Genie space are noticed by the `deploy` command and require you to acknowledge that remote changes can be overwritten by local changes. It is therefore recommended to run the `generate` command before running the `deploy` command. Otherwise, you may lose your remote changes. ### Manual modification -You can modify the `.lvdash.json` file directly and redeploy to observe your changes. +You can modify the `.lvdash.json` or `.geniespace.json` files directly and redeploy to observe your changes. 
diff --git a/knowledge_base/dashboard_nyc_taxi/databricks.yml b/knowledge_base/dashboard_nyc_taxi/databricks.yml index 27f4e520..dd1f84f5 100644 --- a/knowledge_base/dashboard_nyc_taxi/databricks.yml +++ b/knowledge_base/dashboard_nyc_taxi/databricks.yml @@ -1,6 +1,11 @@ bundle: name: dashboard_nyc_taxi + # The Genie space in this bundle can only be deployed with the direct deployment + # engine. The direct engine is the default for new deployments since Databricks + # CLI v1.3.0. + engine: direct + include: - resources/*.yml diff --git a/knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml b/knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml new file mode 100644 index 00000000..57818a43 --- /dev/null +++ b/knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml @@ -0,0 +1,12 @@ +resources: + genie_spaces: + nyc_taxi_genie: + title: "NYC Taxi Trip Analysis Genie" + description: "Ask questions about NYC taxi trip data in natural language" + + # The serialized definition of the Genie space: its data sources, + # instructions, and sample questions. + file_path: ../src/nyc_taxi_genie.geniespace.json + + # The warehouse used to run the queries that Genie generates. 
+ warehouse_id: ${var.warehouse_id} diff --git a/knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json b/knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json new file mode 100644 index 00000000..3f6d935a --- /dev/null +++ b/knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json @@ -0,0 +1,56 @@ +{ + "version": 2, + "config": { + "sample_questions": [ + { + "id": "11111111111111111111111111111111", + "question": ["What is the average fare per trip?"] + }, + { + "id": "22222222222222222222222222222222", + "question": ["How many trips were longer than 10 miles?"] + } + ] + }, + "data_sources": { + "tables": [ + { + "identifier": "samples.nyctaxi.trips", + "column_configs": [ + { "column_name": "tpep_pickup_datetime" }, + { "column_name": "tpep_dropoff_datetime" }, + { "column_name": "trip_distance" }, + { "column_name": "fare_amount" }, + { "column_name": "pickup_zip" }, + { "column_name": "dropoff_zip" } + ] + } + ] + }, + "instructions": { + "text_instructions": [ + { + "id": "33333333333333333333333333333333", + "content": [ + "This Genie space answers questions about NYC taxi trips.\n", + "All data is in the samples.nyctaxi.trips table.\n", + "Fare amounts are in USD. When asked about revenue, use SUM(fare_amount)." 
+ ] + } + ], + "example_question_sqls": [ + { + "id": "44444444444444444444444444444444", + "question": ["What was the total revenue per pickup zip code?"], + "sql": [ + "SELECT\n", + " pickup_zip,\n", + " SUM(fare_amount) AS total_revenue\n", + "FROM samples.nyctaxi.trips\n", + "GROUP BY pickup_zip\n", + "ORDER BY total_revenue DESC" + ] + } + ] + } +} diff --git a/knowledge_base/genie_space/.gitignore b/knowledge_base/genie_space/.gitignore new file mode 100644 index 00000000..15bcc6dd --- /dev/null +++ b/knowledge_base/genie_space/.gitignore @@ -0,0 +1 @@ +.databricks diff --git a/knowledge_base/genie_space/README.md b/knowledge_base/genie_space/README.md new file mode 100644 index 00000000..7d011eef --- /dev/null +++ b/knowledge_base/genie_space/README.md @@ -0,0 +1,73 @@ +# Genie space for NYC Taxi Trip Analysis + +This example shows how to define a Genie space using Declarative Automation Bundles. Genie spaces let business users ask questions about data in natural language. + +It deploys a Genie space that answers questions about the `samples.nyctaxi.trips` table. + +For more information about Genie, see the [Databricks documentation](https://docs.databricks.com/genie/index.html). + +## Prerequisites + +* Databricks CLI v1.3.0 or above. +* Genie spaces can only be deployed with the [direct deployment engine](https://docs.databricks.com/dev-tools/bundles/direct) (`engine: direct`), which is the default for new deployments since CLI v1.3.0. + +## Usage + +1. Modify `databricks.yml`: + - Update the `host` field to your Databricks workspace URL + - Update the `warehouse` field to the name of your SQL warehouse + +2. Deploy the Genie space: + ```sh + databricks bundle deploy + ``` + +3. Open the deployed Genie space in your browser: + ```sh + databricks bundle open + ``` + Alternatively, run `databricks bundle summary` to display its URL. 
+ +## Key configuration + +The Genie space configuration in `resources/nyc_taxi_genie.genie_space.yml` includes: + +- **title**: Display name of the Genie space +- **file_path**: Path to the `.geniespace.json` file that holds the serialized definition of the space: its data sources, instructions, and sample questions. Instead of `file_path`, the definition can also be inlined in YAML under `serialized_space`. +- **warehouse_id**: The SQL warehouse used to run the queries that Genie generates +- **permissions**: Who can use the space (`CAN_VIEW`, `CAN_RUN`, `CAN_EDIT`, `CAN_MANAGE`) + +## Importing an existing Genie space + +To bring a Genie space that was authored in the Databricks UI into a bundle, run: + +```sh +databricks bundle generate genie-space --existing-id +``` + +This writes the configuration to `resources/.genie_space.yml` and the space definition to `src/.geniespace.json`. + +## Visual modification + +You can use the Databricks UI to modify the deployed Genie space, but any modifications made through the UI will not be applied to the bundle `.geniespace.json` file unless you explicitly update it. + +To update the local bundle `.geniespace.json` file, run: + +```sh +databricks bundle generate genie-space --resource nyc_taxi_genie --force +``` + +To continuously poll and retrieve the updated `.geniespace.json` file when it changes, run: + +```sh +databricks bundle generate genie-space --resource nyc_taxi_genie --force --watch +``` + +Any remote modifications of a Genie space are noticed by the `deploy` command and require +you to acknowledge that remote changes can be overwritten by local changes. +It is therefore recommended to run the `generate` command before running the `deploy` command. +Otherwise, you may lose your remote changes. + +### Manual modification + +You can modify the `.geniespace.json` file directly and redeploy to observe your changes. 
diff --git a/knowledge_base/genie_space/databricks.yml b/knowledge_base/genie_space/databricks.yml new file mode 100644 index 00000000..d6b8d7e5 --- /dev/null +++ b/knowledge_base/genie_space/databricks.yml @@ -0,0 +1,24 @@ +bundle: + name: genie_space + + # Genie spaces can only be deployed with the direct deployment engine. + # The direct engine is the default for new deployments since Databricks CLI v1.3.0. + engine: direct + +include: + - resources/*.yml + +variables: + # The "warehouse_id" variable is used to reference the warehouse used by the Genie space. + warehouse_id: + lookup: + # Replace this with the name of your SQL warehouse. + warehouse: "Shared Unity Catalog Serverless" + +workspace: + host: https://myworkspace.databricks.com + +targets: + dev: + default: true + mode: development diff --git a/knowledge_base/genie_space/resources/nyc_taxi_genie.genie_space.yml b/knowledge_base/genie_space/resources/nyc_taxi_genie.genie_space.yml new file mode 100644 index 00000000..e1a10242 --- /dev/null +++ b/knowledge_base/genie_space/resources/nyc_taxi_genie.genie_space.yml @@ -0,0 +1,27 @@ +resources: + genie_spaces: + nyc_taxi_genie: + title: "NYC Taxi Trip Analysis" + description: "Ask questions about NYC taxi trip data in natural language" + + # The serialized definition of the Genie space: its data sources, + # instructions, and sample questions. + # + # Instead of referencing a file, the definition can also be inlined + # in YAML under the "serialized_space" key. Specify only one of the two. + file_path: ../src/nyc_taxi_genie.geniespace.json + + # The warehouse used to run the queries that Genie generates. + warehouse_id: ${var.warehouse_id} + + # The "parent_path" field can be configured to place the Genie space in a + # non-standard folder in the workspace. + # + # It defaults to "${workspace.resource_path}", which is located + # under the bundle deployment root. 
+ # + # parent_path: ${workspace.resource_path} + + permissions: + - level: CAN_RUN + group_name: users diff --git a/knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json b/knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json new file mode 100644 index 00000000..3f6d935a --- /dev/null +++ b/knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json @@ -0,0 +1,56 @@ +{ + "version": 2, + "config": { + "sample_questions": [ + { + "id": "11111111111111111111111111111111", + "question": ["What is the average fare per trip?"] + }, + { + "id": "22222222222222222222222222222222", + "question": ["How many trips were longer than 10 miles?"] + } + ] + }, + "data_sources": { + "tables": [ + { + "identifier": "samples.nyctaxi.trips", + "column_configs": [ + { "column_name": "tpep_pickup_datetime" }, + { "column_name": "tpep_dropoff_datetime" }, + { "column_name": "trip_distance" }, + { "column_name": "fare_amount" }, + { "column_name": "pickup_zip" }, + { "column_name": "dropoff_zip" } + ] + } + ] + }, + "instructions": { + "text_instructions": [ + { + "id": "33333333333333333333333333333333", + "content": [ + "This Genie space answers questions about NYC taxi trips.\n", + "All data is in the samples.nyctaxi.trips table.\n", + "Fare amounts are in USD. When asked about revenue, use SUM(fare_amount)." 
+ ] + } + ], + "example_question_sqls": [ + { + "id": "44444444444444444444444444444444", + "question": ["What was the total revenue per pickup zip code?"], + "sql": [ + "SELECT\n", + " pickup_zip,\n", + " SUM(fare_amount) AS total_revenue\n", + "FROM samples.nyctaxi.trips\n", + "GROUP BY pickup_zip\n", + "ORDER BY total_revenue DESC" + ] + } + ] + } +} From 42c198caad365806c5f25b050feb332254fc349a Mon Sep 17 00:00:00 2001 From: Jan Rose Date: Wed, 10 Jun 2026 16:31:37 +0200 Subject: [PATCH 2/3] Address review feedback and live-deploy fixes Review feedback from pietern: - Inline the app configuration in a config block on the app resource instead of a separate app.yml file - Revert the Genie space addition to dashboard_nyc_taxi; the standalone example covers it and the two examples sit side-by-side anyway - Rename knowledge_base/genie_space to knowledge_base/genie_space_nyc_taxi Fixes found by deploying to a live workspace: - Sort column_configs by column_name; the Genie API rejects unsorted configs with INVALID_PARAMETER_VALUE - Use value_from (snake_case) in the app config block; valueFrom is ignored there, unlike in an app.yml source file Co-authored-by: Isaac --- knowledge_base/app_with_genie_space/README.md | 2 +- .../app_with_genie_space/app/app.yml | 11 ---- .../resources/genie_assistant.app.yml | 9 +++ .../src/nyc_taxi_genie.geniespace.json | 8 +-- knowledge_base/dashboard_nyc_taxi/README.md | 17 ++---- .../dashboard_nyc_taxi/databricks.yml | 5 -- .../resources/nyc_taxi_genie.genie_space.yml | 12 ---- .../src/nyc_taxi_genie.geniespace.json | 56 ------------------- .../.gitignore | 0 .../README.md | 0 .../databricks.yml | 2 +- .../resources/nyc_taxi_genie.genie_space.yml | 0 .../src/nyc_taxi_genie.geniespace.json | 8 +-- 13 files changed, 23 insertions(+), 107 deletions(-) delete mode 100644 knowledge_base/app_with_genie_space/app/app.yml delete mode 100644 knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml delete mode 100644 
knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json rename knowledge_base/{genie_space => genie_space_nyc_taxi}/.gitignore (100%) rename knowledge_base/{genie_space => genie_space_nyc_taxi}/README.md (100%) rename knowledge_base/{genie_space => genie_space_nyc_taxi}/databricks.yml (95%) rename knowledge_base/{genie_space => genie_space_nyc_taxi}/resources/nyc_taxi_genie.genie_space.yml (100%) rename knowledge_base/{dashboard_nyc_taxi => genie_space_nyc_taxi}/src/nyc_taxi_genie.geniespace.json (94%) diff --git a/knowledge_base/app_with_genie_space/README.md b/knowledge_base/app_with_genie_space/README.md index 8d9ba79b..c2a752b3 100644 --- a/knowledge_base/app_with_genie_space/README.md +++ b/knowledge_base/app_with_genie_space/README.md @@ -46,7 +46,7 @@ For more information about Genie, see the [documentation](https://docs.databrick space_id: ${resources.genie_spaces.nyc_taxi_genie.space_id} permission: CAN_RUN ``` -* `app/app.yml` injects the space ID into the app as the `GENIE_SPACE_ID` environment variable using `valueFrom: "genie-space"`. +* The `config` block in `resources/genie_assistant.app.yml` injects the space ID into the app as the `GENIE_SPACE_ID` environment variable using `value_from: "genie-space"`. * `app/app.py` sends each question to the space with `w.genie.start_conversation_and_wait(...)` and renders the text answer or the generated SQL and its results. Note that the app queries Genie with its own service principal identity: in addition to the `CAN_RUN` permission on the space granted by the bundle, the service principal must be able to use the SQL warehouse and read the tables that back the space. If access to the `samples` catalog is restricted for service principals in your workspace, point the space at a table the app can read. 
diff --git a/knowledge_base/app_with_genie_space/app/app.yml b/knowledge_base/app_with_genie_space/app/app.yml
deleted file mode 100644
index 7e267284..00000000
--- a/knowledge_base/app_with_genie_space/app/app.yml
+++ /dev/null
@@ -1,11 +0,0 @@
-command:
-  - flask
-  - --app
-  - app
-  - run
-
-env:
-  # The value is injected by the Databricks Apps runtime from the app resource
-  # named "genie-space" (see resources/genie_assistant.app.yml).
-  - name: GENIE_SPACE_ID
-    valueFrom: "genie-space"
diff --git a/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml b/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml
index 0d7d4624..922edbdd 100644
--- a/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml
+++ b/knowledge_base/app_with_genie_space/resources/genie_assistant.app.yml
@@ -5,6 +5,15 @@ resources:
       description: "An app that answers questions using a Genie space"
       source_code_path: ../app
 
+      # The app configuration: the command to start the app and its environment.
+      config:
+        command: ["flask", "--app", "app", "run"]
+        env:
+          # The value is injected by the Databricks Apps runtime from the app
+          # resource named "genie-space" declared below.
+          - name: GENIE_SPACE_ID
+            value_from: "genie-space"
+
       # The resources which this app has access to:
       resources:
         - name: "genie-space"
diff --git a/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json b/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json
index 3f6d935a..dc526d86 100644
--- a/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json
+++ b/knowledge_base/app_with_genie_space/src/nyc_taxi_genie.geniespace.json
@@ -17,12 +17,12 @@
       {
         "identifier": "samples.nyctaxi.trips",
         "column_configs": [
-          { "column_name": "tpep_pickup_datetime" },
-          { "column_name": "tpep_dropoff_datetime" },
-          { "column_name": "trip_distance" },
+          { "column_name": "dropoff_zip" },
           { "column_name": "fare_amount" },
           { "column_name": "pickup_zip" },
-          { "column_name": "dropoff_zip" }
+          { "column_name": "tpep_dropoff_datetime" },
+          { "column_name": "tpep_pickup_datetime" },
+          { "column_name": "trip_distance" }
         ]
       }
     ]
diff --git a/knowledge_base/dashboard_nyc_taxi/README.md b/knowledge_base/dashboard_nyc_taxi/README.md
index ff7a425c..8e1708d2 100644
--- a/knowledge_base/dashboard_nyc_taxi/README.md
+++ b/knowledge_base/dashboard_nyc_taxi/README.md
@@ -1,15 +1,14 @@
 # Dashboard for NYC Taxi Trip Analysis
 
-This example shows how to define a Declarative Automation Bundle with an AI/BI dashboard, a job that captures a snapshot of the dashboard and emails it to a subscriber, and a Genie space to ask questions about the same data in natural language.
+This example shows how to define a Declarative Automation Bundle with an AI/BI dashboard and a job that captures a snapshot of the dashboard and emails it to a subscriber.
 
 It deploys the sample __NYC Taxi Trip Analysis__ dashboard to a Databricks workspace and configures a daily schedule to run the dashboard and send the snapshot in email to a specified email address.
 
 For more information about AI/BI dashboards, please refer to the [documentation](https://docs.databricks.com/dashboards/index.html).
-For more information about Genie, please refer to the [documentation](https://docs.databricks.com/genie/index.html).
 
 ## Prerequisites
 
-This example includes a Genie space, which requires Databricks CLI v1.3.0 or above. Genie spaces can only be deployed with the [direct deployment engine](https://docs.databricks.com/dev-tools/bundles/direct) (`engine: direct`), which is the default for new deployments since CLI v1.3.0.
+This example includes a dashboard snapshot task, which requires Databricks CLI v0.250.0 or above. Creating dashboards in bundles is supported in Databricks CLI v0.232.0 or above.
 
 ## Usage
 
@@ -26,8 +25,6 @@ This example includes a Genie space, which requires Databricks CLI v1.3.0 or abo
 
 The AI/BI dashboard is created and the snapshot job is set to run daily at 8 AM, which captures a snapshot of the dashboard, and sends it in email to the specified subscriber.
 
-The bundle also deploys the __NYC Taxi Trip Analysis Genie__ space defined in `resources/nyc_taxi_genie.genie_space.yml`, where users can ask questions about the same data in natural language. Its data sources, instructions, and sample questions are stored in `src/nyc_taxi_genie.geniespace.json`.
-
 ### Visual modification
 
 You can use the Databricks UI to modify the dashboard, but any modifications made through the UI will not be applied to the bundle `.lvdash.json` file unless you explicitly update it.
@@ -44,17 +41,11 @@ To continuously poll and retrieve the updated `.lvdash.json` file when it change
 databricks bundle generate dashboard --resource nyc_taxi_trip_analysis --force --watch
 ```
 
-The same workflow applies to the Genie space and its `.geniespace.json` file:
-
-```sh
-databricks bundle generate genie-space --resource nyc_taxi_genie --force
-```
-
-Any remote modifications of a dashboard or Genie space are noticed by the `deploy` command and require
+Any remote modifications of a dashboard are noticed by the `deploy` command and require
 you to acknowledge that remote changes can be overwritten by local changes.
 It is therefore recommended to run the `generate` command before running the `deploy` command.
 Otherwise, you may lose your remote changes.
 
 ### Manual modification
 
-You can modify the `.lvdash.json` or `.geniespace.json` files directly and redeploy to observe your changes.
+You can modify the `.lvdash.json` file directly and redeploy to observe your changes.
diff --git a/knowledge_base/dashboard_nyc_taxi/databricks.yml b/knowledge_base/dashboard_nyc_taxi/databricks.yml
index dd1f84f5..27f4e520 100644
--- a/knowledge_base/dashboard_nyc_taxi/databricks.yml
+++ b/knowledge_base/dashboard_nyc_taxi/databricks.yml
@@ -1,11 +1,6 @@
 bundle:
   name: dashboard_nyc_taxi
 
-  # The Genie space in this bundle can only be deployed with the direct deployment
-  # engine. The direct engine is the default for new deployments since Databricks
-  # CLI v1.3.0.
-  engine: direct
-
 include:
   - resources/*.yml
 
diff --git a/knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml b/knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml
deleted file mode 100644
index 57818a43..00000000
--- a/knowledge_base/dashboard_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml
+++ /dev/null
@@ -1,12 +0,0 @@
-resources:
-  genie_spaces:
-    nyc_taxi_genie:
-      title: "NYC Taxi Trip Analysis Genie"
-      description: "Ask questions about NYC taxi trip data in natural language"
-
-      # The serialized definition of the Genie space: its data sources,
-      # instructions, and sample questions.
-      file_path: ../src/nyc_taxi_genie.geniespace.json
-
-      # The warehouse used to run the queries that Genie generates.
-      warehouse_id: ${var.warehouse_id}
diff --git a/knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json b/knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json
deleted file mode 100644
index 3f6d935a..00000000
--- a/knowledge_base/genie_space/src/nyc_taxi_genie.geniespace.json
+++ /dev/null
@@ -1,56 +0,0 @@
-{
-  "version": 2,
-  "config": {
-    "sample_questions": [
-      {
-        "id": "11111111111111111111111111111111",
-        "question": ["What is the average fare per trip?"]
-      },
-      {
-        "id": "22222222222222222222222222222222",
-        "question": ["How many trips were longer than 10 miles?"]
-      }
-    ]
-  },
-  "data_sources": {
-    "tables": [
-      {
-        "identifier": "samples.nyctaxi.trips",
-        "column_configs": [
-          { "column_name": "tpep_pickup_datetime" },
-          { "column_name": "tpep_dropoff_datetime" },
-          { "column_name": "trip_distance" },
-          { "column_name": "fare_amount" },
-          { "column_name": "pickup_zip" },
-          { "column_name": "dropoff_zip" }
-        ]
-      }
-    ]
-  },
-  "instructions": {
-    "text_instructions": [
-      {
-        "id": "33333333333333333333333333333333",
-        "content": [
-          "This Genie space answers questions about NYC taxi trips.\n",
-          "All data is in the samples.nyctaxi.trips table.\n",
-          "Fare amounts are in USD. When asked about revenue, use SUM(fare_amount)."
-        ]
-      }
-    ],
-    "example_question_sqls": [
-      {
-        "id": "44444444444444444444444444444444",
-        "question": ["What was the total revenue per pickup zip code?"],
-        "sql": [
-          "SELECT\n",
-          " pickup_zip,\n",
-          " SUM(fare_amount) AS total_revenue\n",
-          "FROM samples.nyctaxi.trips\n",
-          "GROUP BY pickup_zip\n",
-          "ORDER BY total_revenue DESC"
-        ]
-      }
-    ]
-  }
-}
diff --git a/knowledge_base/genie_space/.gitignore b/knowledge_base/genie_space_nyc_taxi/.gitignore
similarity index 100%
rename from knowledge_base/genie_space/.gitignore
rename to knowledge_base/genie_space_nyc_taxi/.gitignore
diff --git a/knowledge_base/genie_space/README.md b/knowledge_base/genie_space_nyc_taxi/README.md
similarity index 100%
rename from knowledge_base/genie_space/README.md
rename to knowledge_base/genie_space_nyc_taxi/README.md
diff --git a/knowledge_base/genie_space/databricks.yml b/knowledge_base/genie_space_nyc_taxi/databricks.yml
similarity index 95%
rename from knowledge_base/genie_space/databricks.yml
rename to knowledge_base/genie_space_nyc_taxi/databricks.yml
index d6b8d7e5..a585eaea 100644
--- a/knowledge_base/genie_space/databricks.yml
+++ b/knowledge_base/genie_space_nyc_taxi/databricks.yml
@@ -1,5 +1,5 @@
 bundle:
-  name: genie_space
+  name: genie_space_nyc_taxi
 
   # Genie spaces can only be deployed with the direct deployment engine.
   # The direct engine is the default for new deployments since Databricks CLI v1.3.0.
diff --git a/knowledge_base/genie_space/resources/nyc_taxi_genie.genie_space.yml b/knowledge_base/genie_space_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml
similarity index 100%
rename from knowledge_base/genie_space/resources/nyc_taxi_genie.genie_space.yml
rename to knowledge_base/genie_space_nyc_taxi/resources/nyc_taxi_genie.genie_space.yml
diff --git a/knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json b/knowledge_base/genie_space_nyc_taxi/src/nyc_taxi_genie.geniespace.json
similarity index 94%
rename from knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json
rename to knowledge_base/genie_space_nyc_taxi/src/nyc_taxi_genie.geniespace.json
index 3f6d935a..dc526d86 100644
--- a/knowledge_base/dashboard_nyc_taxi/src/nyc_taxi_genie.geniespace.json
+++ b/knowledge_base/genie_space_nyc_taxi/src/nyc_taxi_genie.geniespace.json
@@ -17,12 +17,12 @@
       {
         "identifier": "samples.nyctaxi.trips",
         "column_configs": [
-          { "column_name": "tpep_pickup_datetime" },
-          { "column_name": "tpep_dropoff_datetime" },
-          { "column_name": "trip_distance" },
+          { "column_name": "dropoff_zip" },
           { "column_name": "fare_amount" },
           { "column_name": "pickup_zip" },
-          { "column_name": "dropoff_zip" }
+          { "column_name": "tpep_dropoff_datetime" },
+          { "column_name": "tpep_pickup_datetime" },
+          { "column_name": "trip_distance" }
         ]
       }
     ]

From dd8f18c1a8979d6e179fa41d6630321a1358f365 Mon Sep 17 00:00:00 2001
From: Jan Rose
Date: Wed, 10 Jun 2026 16:41:12 +0200
Subject: [PATCH 3/3] Add public doc link to nyctaxi dataset

---
 knowledge_base/app_with_genie_space/README.md | 2 +-
 knowledge_base/genie_space_nyc_taxi/README.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/knowledge_base/app_with_genie_space/README.md b/knowledge_base/app_with_genie_space/README.md
index c2a752b3..69065d21 100644
--- a/knowledge_base/app_with_genie_space/README.md
+++ b/knowledge_base/app_with_genie_space/README.md
@@ -2,7 +2,7 @@
 
 This example demonstrates how to define a Databricks app that uses a Genie space in a Declarative Automation Bundle.
 
-It deploys a Genie space for the `samples.nyctaxi.trips` table and a Flask app that lets users ask the space questions in natural language through the [Genie Conversation API](https://docs.databricks.com/genie/conversation-api.html).
+It deploys a Genie space for the `samples.nyctaxi.trips` [table](https://docs.databricks.com/aws/en/discover/databricks-datasets#nyctaxi) and a Flask app that lets users ask the space questions in natural language through the [Genie Conversation API](https://docs.databricks.com/genie/conversation-api.html).
 
 For more information about Databricks Apps, see the [documentation](https://docs.databricks.com/aws/en/dev-tools/databricks-apps).
 For more information about Genie, see the [documentation](https://docs.databricks.com/genie/index.html).
diff --git a/knowledge_base/genie_space_nyc_taxi/README.md b/knowledge_base/genie_space_nyc_taxi/README.md
index 7d011eef..a51e5b5a 100644
--- a/knowledge_base/genie_space_nyc_taxi/README.md
+++ b/knowledge_base/genie_space_nyc_taxi/README.md
@@ -2,7 +2,7 @@
 
 This example shows how to define a Genie space using Declarative Automation Bundles. Genie spaces let business users ask questions about data in natural language.
 
-It deploys a Genie space that answers questions about the `samples.nyctaxi.trips` table.
+It deploys a Genie space that answers questions about the `samples.nyctaxi.trips` [table](https://docs.databricks.com/aws/en/discover/databricks-datasets#nyctaxi).
 
 For more information about Genie, see the [Databricks documentation](https://docs.databricks.com/genie/index.html).
 