You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Tycoon's scaffolders (tycoon data analyze, tycoon data analyze --rill, tycoon semantics scaffold) emit deliberately conservative artifacts — generic select * from source staging models, dimension-only OSI fields, basic Rill metrics views with auto-classified measures. The "last mile" — turning those skeletons into good models, dashboards, and semantic definitions — is left entirely to the user.
Today the only nudge is a single ai_hint() call at the end of analyze (src/tycoon/commands/explore.py:233):
Tip: tycoon ask chat "improve the staging models for <source>"
That's a printed string, not a command. tycoon ask chat is wired for querying data, not editing project files. The Rill scaffolder has no equivalent hint at all.
Why this matters now
v0.1.5 already ships local-LLM integration (tycoon register llm) with LM Studio and Ollama as first-class providers, plus a recommended local model (Qwen 2.5 Coder 7B Instruct, ~4.7 GB).
v0.1.6 is shipping the OSI semantic-layer scaffolder (v0.1.6: scaffold OSI semantic-layer YAML from dbt marts #28) with the same conservative dial — datasets and dimensional fields, no metrics. Users will hand-author the metric SQL, and an LLM that knows the warehouse schema is a near-perfect assistant for that.
Privacy is a structural feature. Tycoon's whole pitch is local-first analytics. Sending warehouse schemas + model SQL to a hosted LLM would undercut that. With LM Studio / Ollama already wired up, the entire refinement loop can stay on the user's M-series laptop.
Proposal
A new tycoon ask refine namespace, symmetric with tycoon ask chat. Three subcommands matching the three scaffold surfaces:
tycoon ask refine model <name># refine a dbt model
tycoon ask refine dashboard <name># refine a Rill dashboard / metrics view
tycoon ask refine semantics # refine the OSI YAML (project-wide, single file)
Behavior
For each invocation:
Gather context — the current artifact YAML/SQL, plus the relevant warehouse schema (column names + types, sample rows for measure detection, dlt-internal-aware filtering same as the scaffolders).
Send to the configured local LLM via the existing ask.llm block in tycoon.yml — same provider, same model, same base_url already used by tycoon ask chat.
Show a diff — old vs. proposed, side by side or unified, in the terminal.
Apply on confirmation — write the refined file. Sentinel-protect: refinements remove the # @generated by tycoon header so the user owns the file from then on (or keep it and add a second # @refined by tycoon ask marker — bikeshed in design phase).
What "refine" means per surface
Surface
What the LLM does
dbt staging model
Add column-level documentation, suggest cleanups (try_cast, NULL handling, deduplication via QUALIFY), name nested-flattened columns sensibly (user__login → author_login)
dbt mart model
Propose mart definitions when none exist; refine grain, joins, common metrics for a given staging set
Rill dashboard / metrics view
Suggest meaningful measures (counts, sums, distinct counts) based on column types; propose time grains; rename labels for human readability
OSI YAML
Propose metrics (the part the Conservative dial intentionally leaves blank); validate syntactically against the bundled JSON Schema before showing the diff
Why local-only is the right starting point
Recommended model already covers it. Qwen 2.5 Coder 7B Instruct (the tycoon register llm recommendation) handles SQL refinement and YAML editing well within the 7B class. No need to escalate to 70B for this loop.
Schema + sample data shouldn't leave the laptop. Especially for the conference-talk demo audience, "my warehouse schema is being POSTed to OpenAI" is a non-starter. LM Studio / Ollama keeps the inference local; tycoon is the only orchestrator that needs to know the call happened.
Hosted providers stay supported.tycoon register llm openai etc. still work — the ask refine command just uses whatever provider is registered. But local is the default expectation and the marketing line.
Out of scope (this issue)
Multi-turn refinement / chat-style iteration. First version is one-shot: ask, diff, apply or reject. Iteration happens by re-running.
Cross-file refinement (e.g. "refine all marts that depend on stg_orders"). Single-artifact at a time for v1.
Auto-apply without diff. Always show the diff. No --yes flag in v1.
Hosted-only LLM gating. Don't refuse to run when the user is on a hosted provider — but the docs lead with the local pitch.
Acceptance criteria
tycoon ask refine model <name> produces a usable refined version of a generated staging model when the active LLM is LM Studio with Qwen 2.5 Coder 7B loaded
Same for tycoon ask refine dashboard <name> against a Rill metrics view + dashboard pair
Same for tycoon ask refine semantics against the OSI YAML
Diff is shown before any write; user confirms (Y/n) per file
Tests use a mocked LLM transport so the suite doesn't require a running runtime
docs/commands/ask.md documents the new namespace; the ai_hint() calls in explore.py and the OSI scaffolder are updated to suggest ask refine instead of ask chat where appropriate
CHANGELOG + docs/releases/v<X>.md entry under whichever cycle this lands in
Cross-references
#7 — original AI-agent integration (where register llm and ask chat live)
#28 — OSI scaffolder (the "refine semantics" surface)
#34 — register dbt --create (the immediate predecessor: scaffolds → "now make it good")
docs/commands/semantics.md Path B — "Nao consumes OSI" — a future where the LLM reads OSI metric definitions instead of raw schema. `ask refine semantics` is the write side of that same story
src/tycoon/commands/explore.py:233 — current `ai_hint()` placeholder
Problem
Tycoon's scaffolders (
tycoon data analyze,tycoon data analyze --rill,tycoon semantics scaffold) emit deliberately conservative artifacts — genericselect * from sourcestaging models, dimension-only OSI fields, basic Rill metrics views with auto-classified measures. The "last mile" — turning those skeletons into good models, dashboards, and semantic definitions — is left entirely to the user.Today the only nudge is a single
ai_hint()call at the end ofanalyze(src/tycoon/commands/explore.py:233):That's a printed string, not a command.
tycoon ask chatis wired for querying data, not editing project files. The Rill scaffolder has no equivalent hint at all.Why this matters now
tycoon register llm) with LM Studio and Ollama as first-class providers, plus a recommended local model (Qwen 2.5 Coder 7B Instruct, ~4.7 GB).register dbt --create(tycoon register dbt --create: bootstrap a new dbt project from the CLI #34) — completing the path from "no dbt" to "scaffolded dbt" with one command. The next gap is "scaffolded dbt → good dbt".Proposal
A new
tycoon ask refinenamespace, symmetric withtycoon ask chat. Three subcommands matching the three scaffold surfaces:Behavior
For each invocation:
ask.llmblock intycoon.yml— same provider, same model, samebase_urlalready used bytycoon ask chat.# @generated by tycoonheader so the user owns the file from then on (or keep it and add a second# @refined by tycoon askmarker — bikeshed in design phase).What "refine" means per surface
try_cast, NULL handling, deduplication via QUALIFY), name nested-flattened columns sensibly (user__login→author_login)Why local-only is the right starting point
tycoon register llmrecommendation) handles SQL refinement and YAML editing well within the 7B class. No need to escalate to 70B for this loop.tycoon register llm openaietc. still work — theask refinecommand just uses whatever provider is registered. But local is the default expectation and the marketing line.Out of scope (this issue)
--yesflag in v1.Acceptance criteria
tycoon ask refine model <name>produces a usable refined version of a generated staging model when the active LLM is LM Studio with Qwen 2.5 Coder 7B loadedtycoon ask refine dashboard <name>against a Rill metrics view + dashboard pairtycoon ask refine semanticsagainst the OSI YAMLask.llmisn't configured ("Run `tycoon register llm` first")tycoon ask chatalready runs (UX: one-command setup for MotherDuck + Nao + LM Studio (or any local OpenAI-compatible LLM) #7 §local-LLM-probe in v0.1.5)docs/commands/ask.mddocuments the new namespace; theai_hint()calls inexplore.pyand the OSI scaffolder are updated to suggestask refineinstead ofask chatwhere appropriatedocs/releases/v<X>.mdentry under whichever cycle this lands inCross-references
register llmandask chatlive)register dbt --create(the immediate predecessor: scaffolds → "now make it good")docs/commands/semantics.mdPath B — "Nao consumes OSI" — a future where the LLM reads OSI metric definitions instead of raw schema. `ask refine semantics` is the write side of that same storysrc/tycoon/commands/explore.py:233— current `ai_hint()` placeholder