Add Vector Search semantic product discovery example by janniklasrose · Pull Request #153 · databricks/bundle-examples

janniklasrose · 2026-05-01T09:14:18Z

Summary

Adds a Declarative Automation Bundle under knowledge_base/vector_search_product_discovery/ that demonstrates semantic product search end-to-end with Databricks Vector Search:

vector_search_endpoints + vector_search_indexes declared as bundle resources; jobs reference them via ${resources.*.name} so dev-mode prefixing flows through automatically
A dev target (the default, mode: development) and a prod target. A plain bundle deploy is isolated per user (per-user endpoint name; dev-prefixed schema, jobs, and index), so several people can deploy into one workspace without colliding
Direct Access index, deployed with the direct deployment engine (engine: direct). Descriptions are embedded explicitly in 01_upsert_products.py, and the query notebook embeds the query before calling similarity_search, because Direct Access indexes don't auto-embed (that's a Delta Sync feature)
A single embedding_dimension variable feeds both the index spec and the notebooks (immutable after index creation, so one knob prevents a silent mismatch)
The query job calls dbutils.notebook.exit(...) so ranked results come back from databricks bundle run / jobs get-run-output
schema_json uses the flat {"col":"type"} form required by the API
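To make the Direct Access flow concrete, here is a minimal, self-contained Python sketch of what the two notebooks do: embed descriptions explicitly, upsert the vectors, embed the query, run a similarity search, and hand JSON back through the notebook exit value. Everything in it is a local stand-in. `embed` is a hypothetical concept-lexicon "embedding" (a real model learns these associations), `ToyDirectAccessIndex` only borrows the `upsert` / `similarity_search` method names and is not the Vector Search client API, and it scores with cosine similarity without claiming that is the service's metric. The dimension check shows why one shared `embedding_dimension` knob matters.

```python
import json
import math

# Toy "embedding" (hypothetical): each dimension is a hand-picked concept, so words
# with no surface overlap can still land close together. The real notebooks call the
# model named by the embedding_model parameter instead.
CONCEPTS = {
    "warmth": {"hot", "warm", "insulated", "thermal"},
    "drinkware": {"coffee", "tea", "water", "bottle", "mug"},
    "audio": {"headphones", "speaker", "music", "wireless"},
    "apparel": {"shirt", "cotton", "jacket", "socks"},
}
# Stand-in for the embedding_dimension variable: one value shared by index and queries.
EMBEDDING_DIMENSION = len(CONCEPTS)


def embed(text: str) -> list[float]:
    """Count concept hits per dimension, then L2-normalize (a zero vector stays zero)."""
    tokens = text.lower().split()
    vector = [float(sum(t in words for t in tokens)) for words in CONCEPTS.values()]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class ToyDirectAccessIndex:
    """In-memory stand-in for a Direct Access index (not the Vector Search client):
    it stores caller-supplied vectors and never embeds text on its own."""

    def __init__(self, embedding_dimension: int):
        self.embedding_dimension = embedding_dimension  # fixed once the index exists
        self.rows: dict[str, list[float]] = {}

    def _check(self, vector: list[float]) -> None:
        if len(vector) != self.embedding_dimension:
            raise ValueError(
                f"expected a {self.embedding_dimension}-d vector, got {len(vector)}-d"
            )

    def upsert(self, rows: list[dict]) -> None:
        for row in rows:
            self._check(row["embedding"])
            self.rows[row["id"]] = row["embedding"]  # an existing id is overwritten

    def similarity_search(
        self, query_vector: list[float], num_results: int
    ) -> list[tuple[str, float]]:
        self._check(query_vector)
        # Cosine similarity reduces to a dot product because non-zero vectors are unit length.
        scored = [
            (row_id, sum(a * b for a, b in zip(query_vector, vector)))
            for row_id, vector in self.rows.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:num_results]


# Hypothetical catalog; ids and descriptions are made up for illustration.
PRODUCTS = {
    "bottle-01": "insulated water bottle",
    "phones-01": "wireless headphones for music",
    "shirt-01": "cotton crew neck shirt",
}

# Setup job: embed descriptions explicitly, then upsert.
index = ToyDirectAccessIndex(EMBEDDING_DIMENSION)
index.upsert([{"id": pid, "embedding": embed(desc)} for pid, desc in PRODUCTS.items()])

# Query job: embed the query explicitly, search, and return JSON as the notebook exit value.
results = index.similarity_search(
    embed("something to keep my coffee hot all day"), num_results=2
)
payload = json.dumps([{"id": pid, "score": round(score, 4)} for pid, score in results])
try:
    dbutils.notebook.exit(payload)  # noqa: F821 (predefined only inside Databricks notebooks)
except NameError:
    print(payload)  # running locally, outside a notebook
```

The query shares no words with "insulated water bottle", yet that product ranks first because both map onto the same concept dimensions; with the real index, that association has to come from the embedding model.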

Requirements

Databricks CLI v1.1.0+, which ships vector_search_endpoints / vector_search_indexes as first-class DABs resources (the former dependency on databricks/cli#5123 has shipped).
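A small guard for the version requirement, as a hedged sketch: it assumes `databricks --version` prints text like `Databricks CLI v1.1.0` and compares version tuples against the v1.1.0 floor stated above. The helper names are hypothetical, and the CLI is only probed if one is on `PATH`.

```python
import re
import shutil
import subprocess

MIN_CLI_VERSION = (1, 1, 0)  # from this PR's Requirements section


def parse_cli_version(version_output: str) -> tuple[int, ...]:
    """Pull (major, minor, patch) out of text like 'Databricks CLI v1.1.0'.
    The exact `databricks --version` output format is an assumption."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return tuple(int(part) for part in match.groups())


def supports_vector_search_resources(version_output: str) -> bool:
    # Tuple comparison orders versions numerically, e.g. (0, 250, 0) < (1, 1, 0).
    return parse_cli_version(version_output) >= MIN_CLI_VERSION


if shutil.which("databricks"):  # only probe a real CLI if one is installed
    out = subprocess.run(
        ["databricks", "--version"], capture_output=True, text=True, check=True
    ).stdout
    print(out.strip(), "->", supports_vector_search_resources(out))
```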

Test plan

Verified with CLI v1.1.0 on a workspace (default dev target):

databricks bundle validate — dev and prod
databricks bundle deploy — endpoint ONLINE, index created
databricks bundle run product_discovery_setup — products embedded + upserted
databricks bundle run product_discovery_query --params "query=something to keep my coffee hot all day" — returns ranked results as JSON (e.g. the insulated water bottle surfaces with no keyword overlap)
databricks bundle destroy — clean teardown
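The same sequence can be scripted, for example in CI. This is a sketch rather than part of the PR: the helper names are hypothetical, `-t` (`--target`) is passed explicitly even though dev is the default, and `--auto-approve` is added to `destroy` so it doesn't stop for interactive confirmation. By default it is a dry run that only prints the commands.

```python
import shlex
import subprocess

QUERY = "something to keep my coffee hot all day"  # the query from the test plan


def plan_commands(target: str = "dev") -> list[list[str]]:
    """The test plan as argv lists; -t selects the bundle target explicitly."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
        ["databricks", "bundle", "run", "-t", target, "product_discovery_setup"],
        ["databricks", "bundle", "run", "-t", target, "product_discovery_query",
         "--params", f"query={QUERY}"],
        # --auto-approve skips the confirmation prompt so the run can be unattended.
        ["databricks", "bundle", "destroy", "-t", target, "--auto-approve"],
    ]


def run_plan(target: str = "dev", dry_run: bool = True) -> None:
    for argv in plan_commands(target):
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


run_plan()  # dry run: prints the commands without executing them
```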

This pull request and its description were written by Isaac.

Demonstrates a Direct Access Vector Search index and endpoint declared as bundle resources (vector_search_endpoints, vector_search_indexes), tested e2e against staging with the direct engine.

Key design decisions:
- Jobs use resource references (${resources.*.name}) for endpoint and index names so dev-mode prefixing flows through automatically
- schema_json uses flat {"col":"type"} format required by the API
- Notebooks embed descriptions/queries explicitly (Direct Access indexes don't auto-embed; that's a Delta Sync feature)
- engine: direct set in bundle config so no env var is needed

Co-authored-by: Isaac

pietern

Ran this example end-to-end on a dogfood workspace with the released CLI (v1.1.0): validate → deploy → run (setup + query) → destroy. The embed → upsert → similarity-search logic is correct — all three README example queries returned the documented top result, so the substance is solid. Also confirmed v1.1.0 recognizes vector_search_endpoints / vector_search_indexes, so the cli#5123 dependency has shipped (correctly struck through in the description).

Nice to see the index name reference ${resources.schemas.product_search_schema.name} rather than the raw ${var.schema} — that's the mode-prefix-safe form.
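Why the reference form matters can be shown with a toy model of dev-mode prefixing. The sketch assumes development mode creates the schema under a prefixed name such as `dev_jane_product_search`; the exact prefix shape is an assumption, and the catalog, schema, and index names are hypothetical. The resource reference resolves to the deployed name, while the raw variable still points at the unprefixed schema.

```python
# Hypothetical values throughout; the real dev-mode prefix is applied by the CLI.
CATALOG = "main"
SCHEMA_VAR = "product_search"  # what ${var.schema} would hold
INDEX = "products_index"
DEV_PREFIX = "dev_jane_"  # assumed shape of the development-mode schema prefix


def deployed_schema_name(schema_var: str, development: bool) -> str:
    """Name the schema resource is created under (toy model of dev-mode prefixing)."""
    return DEV_PREFIX + schema_var if development else schema_var


def index_full_name(catalog: str, schema: str, index: str) -> str:
    return f"{catalog}.{schema}.{index}"


# ${resources.schemas.product_search_schema.name} resolves to the deployed name.
dev_via_reference = index_full_name(CATALOG, deployed_schema_name(SCHEMA_VAR, True), INDEX)
prod_via_reference = index_full_name(CATALOG, deployed_schema_name(SCHEMA_VAR, False), INDEX)
# ${var.schema} is the raw value and never sees the prefix.
via_raw_var = index_full_name(CATALOG, SCHEMA_VAR, INDEX)

print("dev: ", dev_via_reference, "vs raw var", via_raw_var)
print("prod:", prod_via_reference, "vs raw var", via_raw_var)
```

In prod the two forms agree; in dev only the reference points at the schema that was actually created for this user.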

Remaining feedback is about per-deploy isolation and the CLI run experience, flagged inline. Nothing blocks the single-user happy path; it's mostly "what happens when a second person deploys this into the same workspace."

juliacrawf-db · 2026-06-02T23:28:28Z

+# Vector Search: Semantic Product Discovery
+
+A Declarative Automation Bundle demonstrating **semantic product search** using
+[Databricks Vector Search](https://docs.databricks.com/en/generative-ai/vector-search.html).


I have heard "Vector Search" is being renamed?

Let's keep it at Vector Search for consistency with the DABs resource names (vector_search_*). We can do a follow-up that updates this for all renamed resources in bundle-examples (I'm sure there are more).

Co-authored-by: Isaac

pietern · 2026-06-12T09:30:02Z

+              embedding_model: "{{job.parameters.embedding_model}}"
+              embedding_dimension: "{{job.parameters.embedding_dimension}}"
+              query: "{{job.parameters.query}}"
+              num_results: "{{job.parameters.num_results}}"


I think these are automatically pushed down from job parameters into the task.

If so, you don't need to specify any of them.

Follow-up to review feedback on #153: job-level `parameters` are automatically pushed down to notebook tasks, so the `base_parameters` blocks that mirrored them 1:1 via `{{job.parameters.x}}` were redundant.

Confirmed against the docs ([Parameterize jobs](https://docs.databricks.com/aws/en/jobs/parameters), [Access parameter values from a task](https://docs.databricks.com/aws/en/jobs/parameter-use)):

- Job parameters are pushed down to tasks that use key-value parameters; notebooks read them with `dbutils.widgets.get()`, which is exactly how `01_upsert_products.py` and `02_query_demo.py` already read them.
- When a task parameter and a job parameter share a name, the job parameter is fetched, so these `base_parameters` could never take effect anyway; removing them is behavior-preserving.

`databricks bundle validate` passes for both targets against a real workspace (dev and prod, host placeholder swapped locally for validation only).

This pull request and its description were generated with Claude Code.
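On the notebook side, reading the pushed-down parameters might look like the sketch below. It assumes the parameter names from the diff above; the `LOCAL_PARAMS` fallback, its placeholder values, and both helper functions are hypothetical, added only so the snippet runs outside a Databricks notebook. `effective_task_params` merely models the documented precedence rule with a dict merge.

```python
# Hypothetical local defaults so this runs outside Databricks; the keys mirror the
# job parameters in the diff above, the values are placeholders.
LOCAL_PARAMS = {
    "embedding_model": "my-embedding-endpoint",
    "embedding_dimension": "1024",
    "query": "something to keep my coffee hot all day",
    "num_results": "5",
}


def get_param(name: str) -> str:
    """Read a pushed-down job parameter; falls back to LOCAL_PARAMS off-cluster."""
    try:
        return dbutils.widgets.get(name)  # noqa: F821 (predefined in Databricks notebooks)
    except NameError:
        return LOCAL_PARAMS[name]


# Widget values arrive as strings, so numeric parameters need an explicit cast.
embedding_model = get_param("embedding_model")
embedding_dimension = int(get_param("embedding_dimension"))
num_results = int(get_param("num_results"))
query = get_param("query")


def effective_task_params(task_params: dict, job_params: dict) -> dict:
    """Toy model of the documented rule: on a name clash, the job parameter wins."""
    return {**task_params, **job_params}
```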

janniklasrose added 2 commits April 30, 2026 23:59

Apply ruff format to upsert/query notebooks

2e83d4c

Co-authored-by: Isaac

janniklasrose requested review from andrewnester, denik and pietern May 27, 2026 15:57

janniklasrose added 2 commits June 1, 2026 15:26

Use schema resource reference

8801820

Cleanup

3a06d9f

pietern reviewed Jun 2, 2026


Comment thread contrib/vector_search_product_discovery/resources/index.yml Outdated

Comment thread knowledge_base/vector_search_product_discovery/resources/setup-job.yml

janniklasrose added 3 commits June 2, 2026 14:55

Move files

fa94643

Format schema_json

24daec9

Strip .job.yml from .yml files

a9a8957

pietern reviewed Jun 2, 2026
