1 change: 1 addition & 0 deletions knowledge_base/app_with_genie_space/.gitignore
@@ -0,0 +1 @@
.databricks
52 changes: 52 additions & 0 deletions knowledge_base/app_with_genie_space/README.md
@@ -0,0 +1,52 @@
# Databricks app using a Genie space

This example demonstrates how to define a Databricks app that uses a Genie space in a Declarative Automation Bundle.

It deploys a Genie space for the `samples.nyctaxi.trips` [table](https://docs.databricks.com/aws/en/discover/databricks-datasets#nyctaxi) and a Flask app that lets users ask the space questions in natural language through the [Genie Conversation API](https://docs.databricks.com/genie/conversation-api.html).

For more information about Databricks Apps, see the [documentation](https://docs.databricks.com/aws/en/dev-tools/databricks-apps).
For more information about Genie, see the [documentation](https://docs.databricks.com/genie/index.html).

## Prerequisites

* Databricks CLI v1.3.0 or above.
* Genie spaces can only be deployed with the [direct deployment engine](https://docs.databricks.com/dev-tools/bundles/direct) (`engine: direct`), which is the default for new deployments since CLI v1.3.0.

## Usage

1. Modify `databricks.yml`:
   - Update the `host` field to your Databricks workspace URL
   - Update the `warehouse` field to the name of your SQL warehouse

2. Deploy the bundle:
   ```sh
   databricks bundle deploy
   ```

3. Run the app:
   ```sh
   databricks bundle run genie_assistant
   ```

4. Open the app in your browser:
   ```sh
   databricks bundle open genie_assistant
   ```
   Alternatively, run `databricks bundle summary` to display its URL.

## How it works

* `resources/nyc_taxi_genie.genie_space.yml` defines the Genie space, with its data sources, instructions, and sample questions stored in `src/nyc_taxi_genie.geniespace.json`.
* `resources/genie_assistant.app.yml` declares the Genie space as an app resource. This grants the app's service principal `CAN_RUN` permission on the space:
  ```yaml
  resources:
    - name: "genie-space"
      genie_space:
        name: "NYC Taxi Trip Analysis"
        space_id: ${resources.genie_spaces.nyc_taxi_genie.space_id}
        permission: CAN_RUN
  ```
* The `config` block in `resources/genie_assistant.app.yml` injects the space ID into the app as the `GENIE_SPACE_ID` environment variable using `value_from: "genie-space"`.
* `app/app.py` sends each question to the space with `w.genie.start_conversation_and_wait(...)` and renders the text answer or the generated SQL and its results.
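
`app/app.py` reads the injected value with `os.getenv("GENIE_SPACE_ID")`. That call returns `None` if the variable is missing, for example when the `config` block or the `genie-space` resource has been removed. In that case the problem only surfaces when the first question is sent. Below is a minimal fail-fast sketch. `require_env` is a hypothetical helper and is not part of this example:

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a descriptive error.

    `environ` defaults to the process environment; passing a mapping makes the
    helper easy to test. (Hypothetical helper, not part of the example app.)
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; check that the app's config block maps it "
            'from the "genie-space" resource with value_from.'
        )
    return value


# In app/app.py this could replace: space_id = os.getenv("GENIE_SPACE_ID")
# space_id = require_env("GENIE_SPACE_ID")
```

With this change the app fails at startup with a clear message, rather than passing `None` as the space ID to the Genie API.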

Note that the app queries Genie with its own service principal identity: in addition to the `CAN_RUN` permission on the space granted by the bundle, the service principal must be able to use the SQL warehouse and read the tables that back the space. If access to the `samples` catalog is restricted for service principals in your workspace, point the space at a table the app can read.
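
The app starts a new conversation for every question. A comment in `app/app.py` points to `w.genie.create_message_and_wait(...)` for follow-up questions within the same conversation. The sketch below shows that pattern under two assumptions: `genie` behaves like `WorkspaceClient().genie` from the Databricks SDK, and its methods take positional arguments in the order space ID, conversation ID, question. The `ask` helper and the `FakeGenie` stand-in are hypothetical. They let the control flow run without a workspace:

```python
from types import SimpleNamespace
from typing import Any, Optional


def ask(genie: Any, space_id: str, question: str, conversation_id: Optional[str] = None) -> Any:
    """Ask a question, continuing an existing conversation when an ID is given."""
    if conversation_id is None:
        # First question: open a new conversation in the space.
        return genie.start_conversation_and_wait(space_id, question)
    # Follow-up: add a message to the existing conversation so Genie can use
    # the earlier questions and answers as context.
    return genie.create_message_and_wait(space_id, conversation_id, question)


class FakeGenie:
    """Hypothetical offline stand-in that records calls instead of calling Databricks."""

    def __init__(self) -> None:
        self.calls = []

    def start_conversation_and_wait(self, space_id, content):
        self.calls.append(("start", space_id, content))
        return SimpleNamespace(conversation_id="conv-1", content=content)

    def create_message_and_wait(self, space_id, conversation_id, content):
        self.calls.append(("follow_up", space_id, conversation_id, content))
        return SimpleNamespace(conversation_id=conversation_id, content=content)


if __name__ == "__main__":
    genie = FakeGenie()
    first = ask(genie, "space-123", "What is the average fare per trip?")
    ask(genie, "space-123", "And per pickup zip code?", first.conversation_id)
    print(genie.calls)
```

In the Flask app, follow-ups would also require carrying `message.conversation_id` between requests, for example in a hidden form field or the session. That wiring is not part of this example.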
56 changes: 56 additions & 0 deletions knowledge_base/app_with_genie_space/app/app.py
@@ -0,0 +1,56 @@
import os

from databricks.sdk import WorkspaceClient
from flask import Flask, render_template, request

app = Flask(__name__)

w = WorkspaceClient()

# The space ID is injected from the "genie-space" app resource (see resources/genie_assistant.app.yml).
space_id = os.getenv("GENIE_SPACE_ID")


@app.route("/", methods=["GET", "POST"])
def home():
    question = None
    answer = None
    sql = None
    columns = []
    rows = []

    if request.method == "POST":
        question = request.form["question"]

        # Start a new conversation in the Genie space and wait for the answer.
        # Use w.genie.create_message_and_wait(...) to ask follow-up questions
        # in the same conversation.
        message = w.genie.start_conversation_and_wait(space_id, question)

        for attachment in message.attachments or []:
            # Genie answers either with plain text...
            if attachment.text:
                answer = attachment.text.content

            # ...or with a generated SQL query and its result set.
            if attachment.query:
                answer = attachment.query.description
                sql = attachment.query.query
                result = w.genie.get_message_attachment_query_result(
                    space_id,
                    message.conversation_id,
                    message.id,
                    attachment.attachment_id,
                )
                statement = result.statement_response
                columns = [column.name for column in statement.manifest.schema.columns]
                rows = statement.result.data_array or []

    return render_template(
        "index.html",
        question=question,
        answer=answer,
        sql=sql,
        columns=columns,
        rows=rows,
    )
2 changes: 2 additions & 0 deletions knowledge_base/app_with_genie_space/app/requirements.txt
@@ -0,0 +1,2 @@
databricks-sdk>=0.60.0
flask
34 changes: 34 additions & 0 deletions knowledge_base/app_with_genie_space/app/templates/index.html
@@ -0,0 +1,34 @@
<html>
  <head>
    <title>Genie space app managed by DABs</title>
  </head>
  <body>
    <h1>Ask Genie about NYC taxi trips</h1>
    <form method="post">
      <input type="text" name="question" placeholder="e.g. What is the average fare per trip?" size="60" required>
      <button type="submit">Ask</button>
    </form>

    {% if question %}
      <h2>{{ question }}</h2>
      {% if answer %}
        <p>{{ answer }}</p>
      {% endif %}
      {% if sql %}
        <pre>{{ sql }}</pre>
      {% endif %}
      {% if columns %}
        <table border="1" cellpadding="4">
          <tr>
            {% for column in columns %}<th>{{ column }}</th>{% endfor %}
          </tr>
          {% for row in rows %}
            <tr>
              {% for value in row %}<td>{{ value }}</td>{% endfor %}
            </tr>
          {% endfor %}
        </table>
      {% endif %}
    {% endif %}
  </body>
</html>
24 changes: 24 additions & 0 deletions knowledge_base/app_with_genie_space/databricks.yml
@@ -0,0 +1,24 @@
bundle:
  name: app_with_genie_space

  # Genie spaces can only be deployed with the direct deployment engine.
  # The direct engine is the default for new deployments since Databricks CLI v1.3.0.
  engine: direct

include:
  - resources/*.yml

variables:
  # The "warehouse_id" variable is used to reference the warehouse used by the Genie space.
  warehouse_id:
    lookup:
      # Replace this with the name of your SQL warehouse.
      warehouse: "Shared Unity Catalog Serverless"

workspace:
  host: https://myworkspace.databricks.com

targets:
  dev:
    default: true
    mode: development
@@ -0,0 +1,24 @@
resources:
  apps:
    genie_assistant:
      name: "genie-assistant"
      description: "An app that answers questions using a Genie space"
      source_code_path: ../app

      # The app configuration: the command to start the app and its environment.
      config:
        command: ["flask", "--app", "app", "run"]
        env:
          # The value is injected by the Databricks Apps runtime from the app
          # resource named "genie-space" declared below.
          - name: GENIE_SPACE_ID
            value_from: "genie-space"

      # The resources which this app has access to:
      resources:
        - name: "genie-space"
          description: "The Genie space that the app sends questions to"
          genie_space:
            name: "NYC Taxi Trip Analysis"
            space_id: ${resources.genie_spaces.nyc_taxi_genie.space_id}
            permission: CAN_RUN
@@ -0,0 +1,12 @@
resources:
  genie_spaces:
    nyc_taxi_genie:
      title: "NYC Taxi Trip Analysis"
      description: "Ask questions about NYC taxi trip data in natural language"

      # The serialized definition of the Genie space: its data sources,
      # instructions, and sample questions.
      file_path: ../src/nyc_taxi_genie.geniespace.json

      # The warehouse used to run the queries that Genie generates.
      warehouse_id: ${var.warehouse_id}
@@ -0,0 +1,56 @@
{
  "version": 2,
  "config": {
    "sample_questions": [
      {
        "id": "11111111111111111111111111111111",
        "question": ["What is the average fare per trip?"]
      },
      {
        "id": "22222222222222222222222222222222",
        "question": ["How many trips were longer than 10 miles?"]
      }
    ]
  },
  "data_sources": {
    "tables": [
      {
        "identifier": "samples.nyctaxi.trips",
        "column_configs": [
          { "column_name": "dropoff_zip" },
          { "column_name": "fare_amount" },
          { "column_name": "pickup_zip" },
          { "column_name": "tpep_dropoff_datetime" },
          { "column_name": "tpep_pickup_datetime" },
          { "column_name": "trip_distance" }
        ]
      }
    ]
  },
  "instructions": {
    "text_instructions": [
      {
        "id": "33333333333333333333333333333333",
        "content": [
          "This Genie space answers questions about NYC taxi trips.\n",
          "All data is in the samples.nyctaxi.trips table.\n",
          "Fare amounts are in USD. When asked about revenue, use SUM(fare_amount)."
        ]
      }
    ],
    "example_question_sqls": [
      {
        "id": "44444444444444444444444444444444",
        "question": ["What was the total revenue per pickup zip code?"],
        "sql": [
          "SELECT\n",
          " pickup_zip,\n",
          " SUM(fare_amount) AS total_revenue\n",
          "FROM samples.nyctaxi.trips\n",
          "GROUP BY pickup_zip\n",
          "ORDER BY total_revenue DESC"
        ]
      }
    ]
  }
}
1 change: 1 addition & 0 deletions knowledge_base/genie_space_nyc_taxi/.gitignore
@@ -0,0 +1 @@
.databricks
73 changes: 73 additions & 0 deletions knowledge_base/genie_space_nyc_taxi/README.md
@@ -0,0 +1,73 @@
# Genie space for NYC Taxi Trip Analysis

This example shows how to define a Genie space using Declarative Automation Bundles. Genie spaces let business users ask questions about data in natural language.

It deploys a Genie space that answers questions about the `samples.nyctaxi.trips` [table](https://docs.databricks.com/aws/en/discover/databricks-datasets#nyctaxi).

For more information about Genie, see the [Databricks documentation](https://docs.databricks.com/genie/index.html).

## Prerequisites

* Databricks CLI v1.3.0 or above.
* Genie spaces can only be deployed with the [direct deployment engine](https://docs.databricks.com/dev-tools/bundles/direct) (`engine: direct`), which is the default for new deployments since CLI v1.3.0.

## Usage

1. Modify `databricks.yml`:
   - Update the `host` field to your Databricks workspace URL
   - Update the `warehouse` field to the name of your SQL warehouse

2. Deploy the Genie space:
   ```sh
   databricks bundle deploy
   ```

3. Open the deployed Genie space in your browser:
   ```sh
   databricks bundle open
   ```
   Alternatively, run `databricks bundle summary` to display its URL.

## Key configuration

The Genie space configuration in `resources/nyc_taxi_genie.genie_space.yml` includes:

- **title**: Display name of the Genie space
- **file_path**: Path to the `.geniespace.json` file that holds the serialized definition of the space: its data sources, instructions, and sample questions. Instead of `file_path`, the definition can also be inlined in YAML under `serialized_space`.
- **warehouse_id**: The SQL warehouse used to run the queries that Genie generates
- **permissions**: Who can use the space (`CAN_VIEW`, `CAN_RUN`, `CAN_EDIT`, `CAN_MANAGE`)
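
The space definition is plain JSON, so you can inspect it with the Python standard library before deploying. The sketch below assumes the `version: 2` layout used in `nyc_taxi_genie.geniespace.json`, with `data_sources.tables[].identifier` and `config.sample_questions[].question`. It also assumes text values are stored as lists of string fragments, as the `content` and `sql` fields are. `summarize_space` is a hypothetical helper and is not part of the CLI:

```python
import json
import sys
from typing import Any, Dict, List


def summarize_space(definition: Dict[str, Any]) -> Dict[str, List[str]]:
    """List the tables and sample questions in a serialized Genie space."""
    tables = [
        table["identifier"]
        for table in definition.get("data_sources", {}).get("tables", [])
    ]
    # Text values are stored as lists of string fragments; join them back together.
    questions = [
        "".join(item["question"])
        for item in definition.get("config", {}).get("sample_questions", [])
    ]
    return {"tables": tables, "sample_questions": questions}


if __name__ == "__main__":
    # Usage: python summarize_space.py src/nyc_taxi_genie.geniespace.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            print(json.dumps(summarize_space(json.load(f)), indent=2))
```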

## Importing an existing Genie space

To bring a Genie space that was authored in the Databricks UI into a bundle, run:

```sh
databricks bundle generate genie-space --existing-id <space-id>
```

This writes the configuration to `resources/<key>.genie_space.yml` and the space definition to `src/<key>.geniespace.json`.

## Visual modification

You can use the Databricks UI to modify the deployed Genie space, but any modifications made through the UI will not be applied to the bundle `.geniespace.json` file unless you explicitly update it.

To update the local bundle `.geniespace.json` file, run:

```sh
databricks bundle generate genie-space --resource nyc_taxi_genie --force
```

To continuously poll and retrieve the updated `.geniespace.json` file when it changes, run:

```sh
databricks bundle generate genie-space --resource nyc_taxi_genie --force --watch
```

The `deploy` command detects remote modifications to a Genie space and asks you
to acknowledge that they can be overwritten by your local changes.
Run the `generate` command before `deploy` to pull those changes into the bundle first;
otherwise, you may lose them.

### Manual modification

You can modify the `.geniespace.json` file directly and redeploy to observe your changes.
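
For scripted edits, you can change the file with the standard library and then redeploy. The sketch below appends a sample question. It assumes that any unique 32-character lowercase hex string is a valid ID, matching the shape of the IDs in the example file. It also assumes that keeping entries ordered by ID, as the example file does, is harmless. `add_sample_question` is a hypothetical helper, and it rewrites the whole file with 2-space indentation:

```python
import json
import sys
import uuid
from pathlib import Path


def add_sample_question(path: Path, question: str) -> str:
    """Append a sample question to a .geniespace.json file and return its ID."""
    definition = json.loads(path.read_text(encoding="utf-8"))
    samples = definition.setdefault("config", {}).setdefault("sample_questions", [])

    # Assumption: a unique 32-character lowercase hex string is a valid ID,
    # matching the shape of the IDs in the example file.
    new_id = uuid.uuid4().hex
    samples.append({"id": new_id, "question": [question]})

    # Keep entries ordered by ID like the example file (ordering is an assumption).
    samples.sort(key=lambda item: item["id"])

    path.write_text(json.dumps(definition, indent=2) + "\n", encoding="utf-8")
    return new_id


if __name__ == "__main__":
    # Usage: python add_sample_question.py src/nyc_taxi_genie.geniespace.json "Which pickup zip has the most trips?"
    if len(sys.argv) == 3:
        print(add_sample_question(Path(sys.argv[1]), sys.argv[2]))
```

Afterwards, run `databricks bundle deploy` to apply the updated definition.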
24 changes: 24 additions & 0 deletions knowledge_base/genie_space_nyc_taxi/databricks.yml
@@ -0,0 +1,24 @@
bundle:
  name: genie_space_nyc_taxi

  # Genie spaces can only be deployed with the direct deployment engine.
  # The direct engine is the default for new deployments since Databricks CLI v1.3.0.
  engine: direct

include:
  - resources/*.yml

variables:
  # The "warehouse_id" variable is used to reference the warehouse used by the Genie space.
  warehouse_id:
    lookup:
      # Replace this with the name of your SQL warehouse.
      warehouse: "Shared Unity Catalog Serverless"

workspace:
  host: https://myworkspace.databricks.com

targets:
  dev:
    default: true
    mode: development
@@ -0,0 +1,27 @@
resources:
  genie_spaces:
    nyc_taxi_genie:
      title: "NYC Taxi Trip Analysis"
      description: "Ask questions about NYC taxi trip data in natural language"

      # The serialized definition of the Genie space: its data sources,
      # instructions, and sample questions.
      #
      # Instead of referencing a file, the definition can also be inlined
      # in YAML under the "serialized_space" key. Specify only one of the two.
      file_path: ../src/nyc_taxi_genie.geniespace.json

      # The warehouse used to run the queries that Genie generates.
      warehouse_id: ${var.warehouse_id}

      # The "parent_path" field can be configured to place the Genie space in a
      # non-standard folder in the workspace.
      #
      # It defaults to "${workspace.resource_path}", which is located
      # under the bundle deployment root.
      #
      # parent_path: ${workspace.resource_path}

      permissions:
        - level: CAN_RUN
          group_name: users