From 42005406e7d20d6821b96f00670ef2bebcdaaf7c Mon Sep 17 00:00:00 2001
From: Jan Rose
Date: Fri, 12 Jun 2026 13:04:04 +0200
Subject: [PATCH] Update docs for renamed products (AI Search, Lakeflow, Git folders)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Follow-up to review feedback on #153: update renamed Databricks products
in hand-maintained docs, keeping directory/file names and bundle resource
types (vector_search_*) unchanged for link and config compatibility.

- Vector Search -> AI Search (vector_search_product_discovery example);
  notes that resource types keep the legacy vector_search_* names
- Delta Live Tables -> Lakeflow Spark Declarative Pipelines
  (pipeline_with_schema)
- Databricks Workflows -> Lakeflow Jobs; UI references -> Jobs & Pipelines
- Databricks Repos -> Git folders (mlops_stacks docs)
- Point links for moved docs pages at their current locations

Deliberately untouched:
- default_python, default_sql, dbt_sql, lakeflow_pipelines_*,
  default_minimal, pydabs: regenerated wholesale from databricks/cli
  templates by scripts/update_from_templates.sh; the CLI templates still
  emit "Databricks asset bundle" wording, so that rename belongs in
  databricks/cli followed by a template refresh here
- conftest.py dlt warning filters (must match the runtime warning text)
- SDK identifiers (w.lakeview.*) and YAML resource type keys

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---
 contrib/job_with_ai_parse_document/README.md | 4 ++--
 .../template/{{.project_name}}/README.md.tmpl | 2 +-
 knowledge_base/job_backfill_data/README.md | 2 +-
 knowledge_base/pipeline_with_schema/README.md | 2 +-
 knowledge_base/serverless_job/README.md | 2 +-
 .../vector_search_product_discovery/README.md | 20 +++++++++++--------
 .../databricks.yml | 2 +-
 .../resources/setup-job.yml | 2 +-
 .../src/01_upsert_products.py | 6 +++---
 .../src/02_query_demo.py | 2 +-
 mlops_stacks/docs/ml-developer-guide.md | 10 +++++-----
 mlops_stacks/mlops_stacks/resources/README.md | 2 +-
 12 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/contrib/job_with_ai_parse_document/README.md b/contrib/job_with_ai_parse_document/README.md
index d7e6ad2e..2bd3ba2d 100644
--- a/contrib/job_with_ai_parse_document/README.md
+++ b/contrib/job_with_ai_parse_document/README.md
@@ -63,7 +63,7 @@ Source Documents (UC Volume)
 
 5. **Upload documents** to your source volume
 
-6. **Run job** from the Databricks UI (Workflows)
+6. **Run job** from the Databricks UI (Jobs & Pipelines)
 
 ## Configuration
 
@@ -172,7 +172,7 @@ The included notebook visualizes parsing results with interactive bounding boxes
 ## Resources
 
 - [Declarative Automation Bundles](https://docs.databricks.com/dev-tools/bundles/)
-- [Databricks Workflows](https://docs.databricks.com/workflows/)
+- [Lakeflow Jobs](https://docs.databricks.com/aws/en/jobs/)
 - [Structured Streaming](https://docs.databricks.com/structured-streaming/)
 - [`ai_parse_document` Function](https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_parse_document)
 - [`ai_query` Function](https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query)
diff --git a/contrib/templates/default-scala/template/{{.project_name}}/README.md.tmpl b/contrib/templates/default-scala/template/{{.project_name}}/README.md.tmpl
index cc4be258..80115834 100644
--- a/contrib/templates/default-scala/template/{{.project_name}}/README.md.tmpl
+++ b/contrib/templates/default-scala/template/{{.project_name}}/README.md.tmpl
@@ -21,7 +21,7 @@ The '{{.project_name}}' project was generated by using the default-scala templat
 This deploys everything that's defined for this project.
 For example, the default template would deploy a job called
 `[dev yourname] {{.project_name}}_job` to your workspace.
-You can find that job by opening your workspace and clicking on **Workflows**.
+You can find that job by opening your workspace and clicking on **Jobs & Pipelines**.
 
 4. Similarly, to deploy a production copy, type:
 ```
diff --git a/knowledge_base/job_backfill_data/README.md b/knowledge_base/job_backfill_data/README.md
index 760a1ba4..1f755070 100644
--- a/knowledge_base/job_backfill_data/README.md
+++ b/knowledge_base/job_backfill_data/README.md
@@ -60,7 +60,7 @@ with this project. You can also use the CLI:
 (Note: "dev" is the default target, so `--target` is optional.)
 
 This deploys everything defined for this project, including the job
-`[dev yourname] sql_backfill_example`. You can find it under **Workflows** (or **Jobs & Pipelines**) in your workspace.
+`[dev yourname] sql_backfill_example`. You can find it under **Jobs & Pipelines** in your workspace.
 
 3. To run the job with the default `run_date`:
 ```
diff --git a/knowledge_base/pipeline_with_schema/README.md b/knowledge_base/pipeline_with_schema/README.md
index fa87aab4..d5d22df1 100644
--- a/knowledge_base/pipeline_with_schema/README.md
+++ b/knowledge_base/pipeline_with_schema/README.md
@@ -1,6 +1,6 @@
 # Pipeline with a dedicated Unity Catalog schema
 
-This example demonstrates how to define a Unity Catalog schema and a Delta Live Tables pipeline that uses it.
+This example demonstrates how to define a Unity Catalog schema and a [Lakeflow Spark Declarative Pipelines](https://docs.databricks.com/aws/en/dlt/) pipeline that uses it.
 
 ## Prerequisites
 
diff --git a/knowledge_base/serverless_job/README.md b/knowledge_base/serverless_job/README.md
index 318c9311..b96b26b9 100644
--- a/knowledge_base/serverless_job/README.md
+++ b/knowledge_base/serverless_job/README.md
@@ -2,7 +2,7 @@
 
 This Declarative Automation Bundles example demonstrates how to define a job that runs on serverless compute.
 
-For more information, please refer to the [documentation](https://docs.databricks.com/en/workflows/jobs/how-to/use-bundles-with-jobs.html#configure-a-job-that-uses-serverless-compute).
+For more information, please refer to the [documentation](https://docs.databricks.com/aws/en/dev-tools/bundles/jobs-tutorial).
 
 ## Prerequisites
 
diff --git a/knowledge_base/vector_search_product_discovery/README.md b/knowledge_base/vector_search_product_discovery/README.md
index 6c46e8d7..772db15a 100644
--- a/knowledge_base/vector_search_product_discovery/README.md
+++ b/knowledge_base/vector_search_product_discovery/README.md
@@ -1,8 +1,9 @@
-# Vector Search: Semantic Product Discovery
+# AI Search: Semantic Product Discovery
 
 A Declarative Automation Bundle demonstrating semantic product search using
-[Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html).
-It automates the full setup — the Unity Catalog schema, the Vector Search endpoint and
+[Databricks AI Search](https://docs.databricks.com/aws/en/ai-search/ai-search) (formerly
+Vector Search).
+It automates the full setup — the Unity Catalog schema, the AI Search endpoint and
 index, and the jobs that load and query the catalog — so a single
 `databricks bundle deploy` gives you a working semantic-search example to explore
 and adapt.
@@ -22,7 +23,7 @@ products in vector space.
 ```
 data/products.json (synced to workspace by bundle deploy)
 ↓ embed descriptions → upsert_data()
-product_index (Direct Access Vector Search index)
+product_index (Direct Access AI Search index)
 ↓ embed query → similarity_search(query_vector=...)
 ranked results
 ```
@@ -36,7 +37,7 @@ ranked results
 │ └── products.json # Product catalog — synced to the workspace on deploy
 ├── resources/
 │ ├── schema.yml # Unity Catalog schema that namespaces the index
-│ ├── vector-search-endpoint.yml # Vector Search endpoint (managed ANN serving)
+│ ├── vector-search-endpoint.yml # AI Search endpoint (managed ANN serving)
 │ ├── vector-search-index.yml # Direct Access index — schema defined inline
 │ ├── setup-job.yml # Job: embed product descriptions and upsert them
 │ └── query-job.yml # Job: embed a query and return ranked results
@@ -45,6 +46,9 @@ ranked results
 └── 02_query_demo.py # Semantic search — runs as a job or interactively
 ```
 
+Bundle resource types are unchanged by the rename to AI Search: the endpoint and index
+are still declared as `vector_search_endpoints` and `vector_search_indexes`.
+
 ## Prerequisites
 
 - Databricks workspace with Unity Catalog enabled
@@ -69,7 +73,7 @@ ranked results
 you — and several people can deploy into the same workspace without colliding.
 Use `databricks bundle deploy --target prod` for the shared production copy.
 
-> Vector Search endpoint creation takes a few minutes to reach ONLINE status.
+> AI Search endpoint creation takes a few minutes to reach ONLINE status.
 
 4. Load the catalog by running the bundle. This embeds all product descriptions and upserts them into the index.
 ```bash
@@ -103,7 +107,7 @@ databricks bundle deploy \
 |---|---|---|
 | `catalog` | `main` | Existing Unity Catalog catalog |
 | `schema` | `product_search` | Schema created by the bundle |
-| `endpoint_name` | `product-search-endpoint` | Vector Search endpoint name. Shared in prod; the `dev` target overrides it per user. |
+| `endpoint_name` | `product-search-endpoint` | AI Search endpoint name. Shared in prod; the `dev` target overrides it per user. |
 | `embedding_model` | `databricks-gte-large-en` | Foundation model used for embeddings |
 | `embedding_dimension` | `1024` | Vector dimension. Drives both the index and the embedding requests; immutable after the index is created. |
 
@@ -150,6 +154,6 @@ table and it keeps itself up to date. Replace `index_type: DIRECT_ACCESS` and
 
 ## Resources
 
-- [Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html)
+- [Databricks AI Search](https://docs.databricks.com/aws/en/ai-search/ai-search)
 - [Declarative Automation Bundles](https://docs.databricks.com/dev-tools/bundles/)
 - [Foundation Models — GTE Large](https://docs.databricks.com/en/machine-learning/foundation-models/supported-models.html)
diff --git a/knowledge_base/vector_search_product_discovery/databricks.yml b/knowledge_base/vector_search_product_discovery/databricks.yml
index 8a304671..73af3b8a 100644
--- a/knowledge_base/vector_search_product_discovery/databricks.yml
+++ b/knowledge_base/vector_search_product_discovery/databricks.yml
@@ -12,7 +12,7 @@ variables:
     description: Unity Catalog schema name for the product search use case
     default: product_search
   endpoint_name:
-    description: Name of the Vector Search endpoint
+    description: Name of the AI Search endpoint
     default: product-search-endpoint
   embedding_model:
     description: Model serving endpoint used to embed product descriptions
diff --git a/knowledge_base/vector_search_product_discovery/resources/setup-job.yml b/knowledge_base/vector_search_product_discovery/resources/setup-job.yml
index f2e28739..a117fae6 100644
--- a/knowledge_base/vector_search_product_discovery/resources/setup-job.yml
+++ b/knowledge_base/vector_search_product_discovery/resources/setup-job.yml
@@ -24,7 +24,7 @@ resources:
 
       tasks:
         - task_key: upsert_products
-          description: Load products from JSON, embed descriptions, and upsert into the Vector Search index
+          description: Load products from JSON, embed descriptions, and upsert into the AI Search index
           environment_key: serverless_env
           notebook_task:
             notebook_path: ../src/01_upsert_products.py
diff --git a/knowledge_base/vector_search_product_discovery/src/01_upsert_products.py b/knowledge_base/vector_search_product_discovery/src/01_upsert_products.py
index 0d16d556..0ab60c42 100644
--- a/knowledge_base/vector_search_product_discovery/src/01_upsert_products.py
+++ b/knowledge_base/vector_search_product_discovery/src/01_upsert_products.py
@@ -1,10 +1,10 @@
 # Databricks notebook source
 # MAGIC %md
-# MAGIC # Upsert Products into Vector Search Index
+# MAGIC # Upsert Products into AI Search Index
 # MAGIC
 # MAGIC Reads the product catalog from the JSON file deployed with the bundle,
-# MAGIC embeds each product description, then upserts all records into the Vector
-# MAGIC Search index. Re-running is safe — upsert is idempotent on `product_id`.
+# MAGIC embeds each product description, then upserts all records into the AI Search
+# MAGIC index. Re-running is safe — upsert is idempotent on `product_id`.
 
 # COMMAND ----------
 
diff --git a/knowledge_base/vector_search_product_discovery/src/02_query_demo.py b/knowledge_base/vector_search_product_discovery/src/02_query_demo.py
index f938e3a3..8d821b9d 100644
--- a/knowledge_base/vector_search_product_discovery/src/02_query_demo.py
+++ b/knowledge_base/vector_search_product_discovery/src/02_query_demo.py
@@ -2,7 +2,7 @@
 # MAGIC %md
 # MAGIC # Semantic Product Search Demo
 # MAGIC
-# MAGIC Queries the Vector Search index to find products that match a natural-language
+# MAGIC Queries the AI Search index to find products that match a natural-language
 # MAGIC description. Try queries that would fail keyword search — e.g. *"something to
 # MAGIC keep my coffee hot all day"* or *"gear for sleeping outside in freezing weather"*.
 
diff --git a/mlops_stacks/docs/ml-developer-guide.md b/mlops_stacks/docs/ml-developer-guide.md
index fe6afbbb..7b2ec8eb 100644
--- a/mlops_stacks/docs/ml-developer-guide.md
+++ b/mlops_stacks/docs/ml-developer-guide.md
@@ -22,15 +22,15 @@ to use databricks CLI bundles to deploy ML code together with resource configs t
 
 This will allow you to develop locally and use databricks CLI bundles to deploy to your dev workspace to test out code and config changes.
 
-### Develop on Databricks using Databricks Repos
+### Develop on Databricks using Databricks Git folders
 
 #### Prerequisites
 You'll need:
 * Access to run commands on a cluster running Databricks Runtime ML version 11.0 or above in your dev Databricks workspace
-* To set up [Databricks Repos](https://learn.microsoft.com/azure/databricks/repos/index): see instructions below
+* To set up [Databricks Git folders](https://learn.microsoft.com/azure/databricks/repos/index): see instructions below
 
-#### Configuring Databricks Repos
-To use Repos, [set up git integration](https://learn.microsoft.com/azure/databricks/repos/repos-setup) in your dev workspace.
+#### Configuring Databricks Git folders
+To use Git folders, [set up git integration](https://learn.microsoft.com/azure/databricks/repos/repos-setup) in your dev workspace.
 
 If the current project has already been pushed to a hosted Git repo, follow the
 [UI workflow](https://learn.microsoft.com/azure/databricks/repos/git-operations-with-repos#add-a-repo-and-connect-remotely-later)
@@ -49,7 +49,7 @@ Otherwise, e.g. if iterating on ML code for a new project, follow the steps belo
 
 #### Running code on Databricks
 You can iterate on the sample ML code by running the provided `mlops_stacks/training/notebooks/Train.py` notebook on Databricks using
-[Repos](https://learn.microsoft.com/azure/databricks/repos/index).
+[Git folders](https://learn.microsoft.com/azure/databricks/repos/index).
 
 ## Next Steps
 
diff --git a/mlops_stacks/mlops_stacks/resources/README.md b/mlops_stacks/mlops_stacks/resources/README.md
index 2618287f..03ffbd50 100644
--- a/mlops_stacks/mlops_stacks/resources/README.md
+++ b/mlops_stacks/mlops_stacks/resources/README.md
@@ -74,7 +74,7 @@ Alternatively, you can use the other approaches described in the [databricks CLI
 ### Validate and provision ML resource configurations
 1. After installing the databricks CLI and creating the `DATABRICKS_TOKEN` env variable, change to the `mlops_stacks` directory.
 2. Run `databricks bundle validate` to validate the Databricks resource configurations.
-3. Run `databricks bundle deploy` to provision the Databricks resource configurations to the dev workspace. The resource configurations and your ML code will be copied together to the dev workspace. The defined resources such as Databricks Workflows, MLflow Model and MLflow Experiment will be provisioned according to the config files under `mlops_stacks/resources`.
+3. Run `databricks bundle deploy` to provision the Databricks resource configurations to the dev workspace. The resource configurations and your ML code will be copied together to the dev workspace. The defined resources such as Lakeflow Jobs, MLflow Model and MLflow Experiment will be provisioned according to the config files under `mlops_stacks/resources`.
 4. Go to the Databricks dev workspace, check the defined model, experiment and workflows status, and interact with the created workflows.
 
 ### Destroy ML resource configurations