4 changes: 2 additions & 2 deletions contrib/job_with_ai_parse_document/README.md
@@ -63,7 +63,7 @@ Source Documents (UC Volume)

5. **Upload documents** to your source volume

-6. **Run job** from the Databricks UI (Workflows)
+6. **Run job** from the Databricks UI (Jobs & Pipelines)

## Configuration

@@ -172,7 +172,7 @@ The included notebook visualizes parsing results with interactive bounding boxes
## Resources

- [Declarative Automation Bundles](https://docs.databricks.com/dev-tools/bundles/)
-- [Databricks Workflows](https://docs.databricks.com/workflows/)
+- [Lakeflow Jobs](https://docs.databricks.com/aws/en/jobs/)
- [Structured Streaming](https://docs.databricks.com/structured-streaming/)
- [`ai_parse_document` Function](https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_parse_document)
- [`ai_query` Function](https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query)
@@ -21,7 +21,7 @@ The '{{.project_name}}' project was generated by using the default-scala templat
This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] {{.project_name}}_job` to your workspace.
-You can find that job by opening your workspace and clicking on **Workflows**.
+You can find that job by opening your workspace and clicking on **Jobs & Pipelines**.

4. Similarly, to deploy a production copy, type:
```
2 changes: 1 addition & 1 deletion knowledge_base/job_backfill_data/README.md
@@ -60,7 +60,7 @@ with this project. You can also use the CLI:
(Note: "dev" is the default target, so `--target` is optional.)

This deploys everything defined for this project, including the job
-`[dev yourname] sql_backfill_example`. You can find it under **Workflows** (or **Jobs & Pipelines**) in your workspace.
+`[dev yourname] sql_backfill_example`. You can find it under **Jobs & Pipelines** in your workspace.

3. To run the job with the default `run_date`:
```
2 changes: 1 addition & 1 deletion knowledge_base/pipeline_with_schema/README.md
@@ -1,6 +1,6 @@
# Pipeline with a dedicated Unity Catalog schema

-This example demonstrates how to define a Unity Catalog schema and a Delta Live Tables pipeline that uses it.
+This example demonstrates how to define a Unity Catalog schema and a [Lakeflow Spark Declarative Pipelines](https://docs.databricks.com/aws/en/dlt/) pipeline that uses it.

## Prerequisites

2 changes: 1 addition & 1 deletion knowledge_base/serverless_job/README.md
@@ -2,7 +2,7 @@

This Declarative Automation Bundles example demonstrates how to define a job that runs on serverless compute.

-For more information, please refer to the [documentation](https://docs.databricks.com/en/workflows/jobs/how-to/use-bundles-with-jobs.html#configure-a-job-that-uses-serverless-compute).
+For more information, please refer to the [documentation](https://docs.databricks.com/aws/en/dev-tools/bundles/jobs-tutorial).

## Prerequisites

20 changes: 12 additions & 8 deletions knowledge_base/vector_search_product_discovery/README.md
@@ -1,8 +1,9 @@
-# Vector Search: Semantic Product Discovery
+# AI Search: Semantic Product Discovery

A Declarative Automation Bundle demonstrating semantic product search using
-[Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html).
-It automates the full setup — the Unity Catalog schema, the Vector Search endpoint and
+[Databricks AI Search](https://docs.databricks.com/aws/en/ai-search/ai-search) (formerly
+Vector Search).
+It automates the full setup — the Unity Catalog schema, the AI Search endpoint and
index, and the jobs that load and query the catalog — so a single `databricks bundle deploy`
gives you a working semantic-search example to explore and adapt.

@@ -22,7 +23,7 @@ products in vector space.
```
data/products.json (synced to workspace by bundle deploy)
↓ embed descriptions → upsert_data()
-product_index (Direct Access Vector Search index)
+product_index (Direct Access AI Search index)
↓ embed query → similarity_search(query_vector=...)
ranked results
```
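The flow above ends with `similarity_search(query_vector=...)` returning ranked results. As a rough illustration of what that ranking step means, here is a brute-force sketch in plain Python. It does not use the Databricks SDK: `rank_by_similarity`, the toy three-dimensional vectors, and the choice of cosine similarity are all assumptions for illustration, and a managed AI Search endpoint uses approximate nearest-neighbor (ANN) search instead of scanning every vector.

```python
import math

# Hypothetical toy index: product_id -> embedding vector.
# Real embeddings from databricks-gte-large-en would have 1024 dimensions;
# three are used here only to keep the example readable.
TOY_INDEX = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.9, 0.0],
    "p3": [0.7, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vector, index, num_results=2):
    """Hypothetical helper: score every stored vector and return the top matches.

    This is an exhaustive scan; an ANN endpoint approximates the same ranking
    without comparing the query against every stored vector.
    """
    scored = [(pid, cosine_similarity(query_vector, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:num_results]


# The query vector would normally come from embedding the user's text.
results = rank_by_similarity([1.0, 0.0, 0.0], TOY_INDEX)
print([pid for pid, _ in results])  # → ['p1', 'p3']
```

At catalog scale the exhaustive scan grows linearly with the number of products, which is the cost the managed ANN endpoint is meant to avoid.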
@@ -36,7 +37,7 @@ ranked results
│ └── products.json # Product catalog — synced to the workspace on deploy
├── resources/
│ ├── schema.yml # Unity Catalog schema that namespaces the index
-│ ├── vector-search-endpoint.yml # Vector Search endpoint (managed ANN serving)
+│ ├── vector-search-endpoint.yml # AI Search endpoint (managed ANN serving)
│ ├── vector-search-index.yml # Direct Access index — schema defined inline
│ ├── setup-job.yml # Job: embed product descriptions and upsert them
│ └── query-job.yml # Job: embed a query and return ranked results
@@ -45,6 +46,9 @@ ranked results
└── 02_query_demo.py # Semantic search — runs as a job or interactively
```

+Bundle resource types are unchanged by the rename to AI Search: the endpoint and index
+are still declared as `vector_search_endpoints` and `vector_search_indexes`.

## Prerequisites

- Databricks workspace with Unity Catalog enabled
@@ -69,7 +73,7 @@ ranked results
you — and several people can deploy into the same workspace without colliding. Use
`databricks bundle deploy --target prod` for the shared production copy.

-> Vector Search endpoint creation takes a few minutes to reach ONLINE status.
+> AI Search endpoint creation takes a few minutes to reach ONLINE status.

4. Load the catalog by running the bundle. This embeds all product descriptions and upserts them into the index.
```bash
@@ -103,7 +107,7 @@ databricks bundle deploy \
|---|---|---|
| `catalog` | `main` | Existing Unity Catalog catalog |
| `schema` | `product_search` | Schema created by the bundle |
-| `endpoint_name` | `product-search-endpoint` | Vector Search endpoint name. Shared in prod; the `dev` target overrides it per user. |
+| `endpoint_name` | `product-search-endpoint` | AI Search endpoint name. Shared in prod; the `dev` target overrides it per user. |
| `embedding_model` | `databricks-gte-large-en` | Foundation model used for embeddings |
| `embedding_dimension` | `1024` | Vector dimension. Drives both the index and the embedding requests; immutable after the index is created. |

@@ -150,6 +154,6 @@ table and it keeps itself up to date. Replace `index_type: DIRECT_ACCESS` and

## Resources

-- [Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html)
+- [Databricks AI Search](https://docs.databricks.com/aws/en/ai-search/ai-search)
- [Declarative Automation Bundles](https://docs.databricks.com/dev-tools/bundles/)
- [Foundation Models — GTE Large](https://docs.databricks.com/en/machine-learning/foundation-models/supported-models.html)
@@ -12,7 +12,7 @@ variables:
description: Unity Catalog schema name for the product search use case
default: product_search
endpoint_name:
-description: Name of the Vector Search endpoint
+description: Name of the AI Search endpoint
default: product-search-endpoint
embedding_model:
description: Model serving endpoint used to embed product descriptions
@@ -24,7 +24,7 @@ resources:

tasks:
- task_key: upsert_products
-description: Load products from JSON, embed descriptions, and upsert into the Vector Search index
+description: Load products from JSON, embed descriptions, and upsert into the AI Search index
environment_key: serverless_env
notebook_task:
notebook_path: ../src/01_upsert_products.py
@@ -1,10 +1,10 @@
# Databricks notebook source
# MAGIC %md
-# MAGIC # Upsert Products into Vector Search Index
+# MAGIC # Upsert Products into AI Search Index
# MAGIC
# MAGIC Reads the product catalog from the JSON file deployed with the bundle,
-# MAGIC embeds each product description, then upserts all records into the Vector
-# MAGIC Search index. Re-running is safe — upsert is idempotent on `product_id`.
+# MAGIC embeds each product description, then upserts all records into the AI Search
+# MAGIC index. Re-running is safe — upsert is idempotent on `product_id`.

# COMMAND ----------

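The notebook header states that re-running is safe because the upsert is idempotent on `product_id`, and the README's variables table notes that `embedding_dimension` cannot change once the index exists. A minimal in-memory sketch of both properties follows. It only models the behavior: `upsert_products` is a hypothetical helper, the dict stands in for the Direct Access index, the sample records are made up, and nothing here calls the real AI Search client.

```python
# Stand-in for the bundle's `embedding_dimension` variable (1024 by default);
# a tiny value keeps the example short.
EMBEDDING_DIMENSION = 3


def upsert_products(index, records):
    """Hypothetical helper: insert new records or overwrite existing ones by product_id.

    Writing the same records twice leaves the index unchanged, which is what makes
    re-running the load job safe.
    """
    for record in records:
        vector = record["embedding"]
        if len(vector) != EMBEDDING_DIMENSION:
            # The index schema fixes the vector size, so reject mismatches up front.
            raise ValueError(
                f"{record['product_id']}: expected {EMBEDDING_DIMENSION} dims, got {len(vector)}"
            )
        index[record["product_id"]] = dict(record)
    return index


# Hypothetical sample records with toy embeddings.
products = [
    {"product_id": "p1", "description": "insulated travel mug", "embedding": [0.9, 0.1, 0.0]},
    {"product_id": "p2", "description": "four-season sleeping bag", "embedding": [0.1, 0.9, 0.0]},
]

index = {}
upsert_products(index, products)
first_run = {pid: dict(rec) for pid, rec in index.items()}
upsert_products(index, products)  # simulate re-running the job
print(len(index), index == first_run)  # → 2 True
```

Overwriting by key is what separates an upsert from an append: an append-only load would add another copy of every product on each run.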
@@ -2,7 +2,7 @@
# MAGIC %md
# MAGIC # Semantic Product Search Demo
# MAGIC
-# MAGIC Queries the Vector Search index to find products that match a natural-language
+# MAGIC Queries the AI Search index to find products that match a natural-language
# MAGIC description. Try queries that would fail keyword search — e.g. *"something to
# MAGIC keep my coffee hot all day"* or *"gear for sleeping outside in freezing weather"*.

10 changes: 5 additions & 5 deletions mlops_stacks/docs/ml-developer-guide.md
@@ -22,15 +22,15 @@ to use databricks CLI bundles to deploy ML code together with resource configs t

This will allow you to develop locally and use databricks CLI bundles to deploy to your dev workspace to test out code and config changes.

-### Develop on Databricks using Databricks Repos
+### Develop on Databricks using Databricks Git folders

#### Prerequisites
You'll need:
* Access to run commands on a cluster running Databricks Runtime ML version 11.0 or above in your dev Databricks workspace
-* To set up [Databricks Repos](https://learn.microsoft.com/azure/databricks/repos/index): see instructions below
+* To set up [Databricks Git folders](https://learn.microsoft.com/azure/databricks/repos/index): see instructions below

-#### Configuring Databricks Repos
-To use Repos, [set up git integration](https://learn.microsoft.com/azure/databricks/repos/repos-setup) in your dev workspace.
+#### Configuring Databricks Git folders
+To use Git folders, [set up git integration](https://learn.microsoft.com/azure/databricks/repos/repos-setup) in your dev workspace.

If the current project has already been pushed to a hosted Git repo, follow the
[UI workflow](https://learn.microsoft.com/azure/databricks/repos/git-operations-with-repos#add-a-repo-and-connect-remotely-later)
@@ -49,7 +49,7 @@ Otherwise, e.g. if iterating on ML code for a new project, follow the steps belo

#### Running code on Databricks
You can iterate on the sample ML code by running the provided `mlops_stacks/training/notebooks/Train.py` notebook on Databricks using
-[Repos](https://learn.microsoft.com/azure/databricks/repos/index).
+[Git folders](https://learn.microsoft.com/azure/databricks/repos/index).


## Next Steps
2 changes: 1 addition & 1 deletion mlops_stacks/mlops_stacks/resources/README.md
@@ -74,7 +74,7 @@ Alternatively, you can use the other approaches described in the [databricks CLI
### Validate and provision ML resource configurations
1. After installing the databricks CLI and creating the `DATABRICKS_TOKEN` env variable, change to the `mlops_stacks` directory.
2. Run `databricks bundle validate` to validate the Databricks resource configurations.
-3. Run `databricks bundle deploy` to provision the Databricks resource configurations to the dev workspace. The resource configurations and your ML code will be copied together to the dev workspace. The defined resources such as Databricks Workflows, MLflow Model and MLflow Experiment will be provisioned according to the config files under `mlops_stacks/resources`.
+3. Run `databricks bundle deploy` to provision the Databricks resource configurations to the dev workspace. The resource configurations and your ML code will be copied together to the dev workspace. The defined resources such as Lakeflow Jobs, MLflow Model and MLflow Experiment will be provisioned according to the config files under `mlops_stacks/resources`.
4. Go to the Databricks dev workspace, check the defined model, experiment and workflows status, and interact with the created workflows.

### Destroy ML resource configurations