DEV-1449: Fix nested-DAG multi-hop dotted dim from downstream stages by ZmeiGorynych · Pull Request #137 · MotleyAI/slayer

ZmeiGorynych · 2026-05-20T14:42:57Z

Summary

Fixes DEV-1449: when stage 1 projects a multi-hop dotted dim (e.g. customers.regions.name) and a downstream stage references the same path, SLayer used to emit broken SQL referencing a hybrid __ + . table alias that doesn't exist in scope.
Virtual stage models from _query_as_model now carry a source_model_origin breadcrumb (exclude=True, in-memory only). A shared resolve_via_stage_origin helper looks up the __-flattened column on the virtual model when the join-walk fails, using two lookup candidates (ancestor-stripped + full-flat) so chained 3+ stage DAGs work.
Four cross-stage paths gain the fallback: dimensions, time dimensions, cross-model measures (intercept before strict join walk → emit local re-aggregated measure), filters (regex rewrite of resolved_sql). Parameterized cross-model aggs fall through to the existing CTE path.

Repro

mcp__slayer__query_nested(queries=[
    {"name": "s1", "source_model": "orders",
     "dimensions": ["customers.regions.name"],
     "measures": [{"formula": "*:count"}]},
    {"source_model": "s1",
     "dimensions": ["customers.regions.name"],
     "measures": [{"formula": "*:count"}]},
])
# Before: `no such column: customers__regions.name`
# After:  outer SQL references `s1.customers__regions__name`, runs cleanly.

Test plan

28 new tests in tests/test_nested_dag_cross_stage_refs.py cover: reported case (2-hop), 3-hop, cross-stage time dims, cross-model measure re-aggregation (incl. row-count check), filter rewrite (SQL shape + row exclusion), order_by (SQL shape + actual sort), chained DAG Candidate A/B precedence, engineered collision, non-virtual regression, ModelExtension, stored query-backed model save/execute breadcrumb, time_shift over cross-stage time dim, serialization roundtrip drops breadcrumb (YAML + SQLite), *:count cross-model intercept, rename interaction (DEV-1443).
Full unit suite (2908 tests) passes; no regressions.
ruff check slayer/ tests/ clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Virtual-stage models carry an in-memory lineage breadcrumb at execution time (not persisted), enabling deterministic multi-hop dotted reference resolution and stage-aware fallbacks for dotted dimensions/time dims.
- Safe re-aggregation of cross-stage distributive measures (sum/min/max; count→sum) with alias and provenance preservation.
Bug Fixes
- Prevents hybrid/internal alias leakage; rewrites filters/HAVING to flat stage aliases and adds collision protection when measures canonicalize.
Tests
- Large end-to-end and unit suite covering multi-hop refs, aliasing, filter rewrites, aggregation semantics, and serialization roundtrips.

…eam stages When stage 1 projects a multi-hop dotted dim (e.g. `customers.regions.name`) and a downstream stage references the same dotted path, SLayer used to emit broken SQL referencing a hybrid `__` + `.` table alias that doesn't exist in scope (e.g. `customers__regions.name` against an `s1` subquery). Fix: virtual stage models from `_query_as_model` now carry a `source_model_origin` breadcrumb (Pydantic field with `exclude=True`, so it's in-memory only). A new shared `resolve_via_stage_origin` helper in `enrichment.py` consults that breadcrumb when the join-walk fails and looks up the `__`-flattened column name on the virtual model. Two lookup candidates — ancestor-stripped and full-flat — handle both depth-1 and chained nested-DAG cases. Four cross-stage code paths gain the fallback: dimensions, time dimensions, cross-model measures (intercept before the strict join walk; emits a local re-aggregated measure), and filters (rewrites the filter SQL via direct re.sub against `resolved_sql`). Parameterized cross-model aggs fall through to the existing CTE path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear · 2026-05-20T14:43:01Z

DEV-1449

coderabbitai · 2026-05-20T14:43:11Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds an in-memory SourceModelOrigin breadcrumb on virtual SlayerModel instances, a resolve_via_stage_origin helper, interception logic to re-aggregate distributive dotted cross-stage measures locally, fallbacks for dimension/time/filter resolution to use flattened stage columns, and a comprehensive test suite validating SQL shape and execution.

Changes

Virtual Stage Cross-Reference Resolution

Layer / File(s)	Summary
Virtual Stage Lineage Model `slayer/core/models.py`	`SourceModelOrigin` Pydantic model captures in-memory lineage chain with `name`, optional `data_source`, and recursive `parent` reference; `SlayerModel.source_model_origin` field added (transient, excluded from persistence).
Virtual Stage Lineage Population `slayer/engine/query_engine.py`	`_query_as_model` populates `source_model_origin` on constructed virtual `SlayerModel` with wrapped inner model's name, data_source, and chained lineage via parent field.
Resolution Core & Measure Interception `slayer/engine/enrichment.py`	Adds `Column` import, `resolve_via_stage_origin` helper, and interception that converts dotted cross-model distributive aggregations into local re-aggregations when inner-stage flattened columns exist; wired into expression resolver, transform flattening, and query-level measure enrichment.
Stage-Origin Resolver `slayer/engine/enrichment.py`	Implements candidate generation (ancestor-stripped first, then full-flat) from `source_model_origin` and returns the matching `Column` or `None`.
Dimension, Time, and Filter Fallbacks `slayer/engine/enrichment.py`	When join-walk resolution fails and `model.source_model_origin` exists, enrichment falls back to `resolve_via_stage_origin` to locate flat stage columns, rewrites SQL/aliasing, and treats the resolved column as local to the virtual stage for dimensions, time-dimensions, and strict dotted filters.
Test Fixtures and SQL Inspection `tests/test_nested_dag_cross_stage_refs.py`	Adds nested-DAG fixtures, YAML/SQLite-backed engines, sqlglot SQL-inspection helpers, and broad test coverage: multi-hop dotted dimensions/time-dims, time_shift, cross-stage measures, filters, ordering, rename/intercept semantics, breadcrumb lifecycle/serialization roundtrips, and negative/regression scenarios.

Sequence Diagram(s)

sequenceDiagram
  participant OuterEngine as Outer Query Engine
  participant Enr as Enrichment Resolver
  participant QueryAsModel as _query_as_model
  participant OriginResolver as resolve_via_stage_origin
  participant SQLGen as SQL Generator

  OuterEngine->>Enr: Resolve dotted ref (e.g., customers.regions.name)
  Enr->>Enr: Attempt join-walk resolution
  Enr->>QueryAsModel: Check model metadata (is virtual?)
  QueryAsModel-->>Enr: model.source_model_origin present
  Enr->>OriginResolver: resolve_via_stage_origin(model, parts)
  OriginResolver->>OriginResolver: Try ancestor-stripped and full flat candidates
  OriginResolver-->>Enr: Return matching Column (e.g., customers__regions__name)
  Enr->>SQLGen: Emit <virtual_stage_alias>.<flat_col> in SQL
  SQLGen-->>OuterEngine: SELECT s1.customers__regions__name

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

MotleyAI/slayer#9: Related extension of query-as-model/virtual stage machinery that this PR further augments with source lineage and resolution.
MotleyAI/slayer#134: Touches enrichment canonical↔user alias mapping and projection contracts used by the virtual-stage interception logic.
MotleyAI/slayer#97: Overlaps on _query_as_model entrypoint changes and named-query scoping relevant to virtual model wrapping.

Poem

🐰 I nibble breadcrumbs through each nested stage,
Flatten dotted paths from one little page,
No hybrid ghosts will haunt the SQL line,
Just scoped flat aliases—how they shine!
Hop hop—DEV-1449, carrot time!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: fixing nested-DAG multi-hop dotted dimension references from downstream stages, which is the core objective of this PR addressing DEV-1449.
Docstring Coverage	✅ Passed	Docstring coverage is 81.97% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch egor/dev-1449-nested-dag-multi-hop-dotted-dimension-referenced-from-a

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

tests/test_nested_dag_cross_stage_refs.py (2)

1135-1135: ⚡ Quick win

Move the local SQLiteStorage import to the module import block.

Keeping imports at top-level avoids hidden import points and aligns with the project import rule.

♻️ Proposed fix

@@
 from slayer.engine.enrichment import resolve_via_stage_origin
 from slayer.engine.query_engine import SlayerQueryEngine
+from slayer.storage.sqlite_storage import SQLiteStorage
 from slayer.storage.yaml_storage import YAMLStorage
@@
-        from slayer.storage.sqlite_storage import SQLiteStorage
         db_path = tmp_path / "slayer.db"
         storage = SQLiteStorage(db_path=str(db_path))

As per coding guidelines "Place imports at the top of Python files".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` at line 1135, Move the local
import "from slayer.storage.sqlite_storage import SQLiteStorage" out of the test
body and place it with the module-level imports at the top of
tests/test_nested_dag_cross_stage_refs.py; locate any test function or block
referencing SQLiteStorage and remove the inline import, then add the same import
to the file's import block so SQLiteStorage is imported once at module import
time in accordance with the project's import rules.

843-843: ⚡ Quick win

Use keyword arguments for engine.execute calls.

These calls pass more than one argument but use a positional first parameter. Switch to query=... for consistency with the repo rule.

♻️ Proposed fix

-            resp = await engine.execute(single, dry_run=True)
+            resp = await engine.execute(query=single, dry_run=True)
...
-            resp = await engine.execute(outer, dry_run=True)
+            resp = await engine.execute(query=outer, dry_run=True)

As per coding guidelines "Use keyword arguments for functions with more than 1 parameter in Python".

Also applies to: 1025-1025

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` at line 843, The call to
engine.execute currently passes the query as a positional argument (resp = await
engine.execute(single, dry_run=True)); update it to use keyword arguments (resp
= await engine.execute(query=single, dry_run=True)) so it follows the repo rule;
make the same change for any other occurrences of engine.execute that pass the
query positionally (e.g., where the first arg is a variable like single) to use
query=... and preserve the dry_run=True keyword.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/enrichment.py`:
- Around line 602-642: The fast-path in _try_intercept_cross_model_as_local must
be restricted to only semantics-preserving, re-aggregatable operations: check
ref.aggregation_name against an explicit safe set (e.g.
"sum","min","max","count") before proceeding and return None for all others
(avg, count_distinct, etc.); keep the existing logic using
model.get_column(flat_with_agg), _ensure_aggregated_measure, and
known_aliases[field_name], and do not attempt to re-aggregate parameterized refs
(ref.agg_args/agg_kwargs) — if you need to special-case the "*" count-to-sum
semantic, handle that explicitly by mapping the aggregation and flat_with_agg
appropriately, otherwise fall back to the CTE path.
- Around line 951-971: The interception branch that handles dot-qualified
spec.measure_name and returns early must perform the same rename/alias
bookkeeping as the normal local-aggregate path before continuing: after
finding/creating the local_alias (via _try_intercept_cross_model_as_local) and
updating the EnrichedMeasure metadata (measures / em.label / em.type), also
apply the canonical-to-user remap and projection/alias handling that the
standard local-aggregate branch does for qfield (respecting qfield.name
override, qfield.label, qfield.type), update any rename maps used by filters and
ORDER BY, and then call _mark_user_declared(local_alias) and append to
user_projection before continue; in short, reuse the same rename/alias
bookkeeping steps the normal branch uses so qfield.name properly remaps the
internal alias to the user-facing name.

---

Nitpick comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Line 1135: Move the local import "from slayer.storage.sqlite_storage import
SQLiteStorage" out of the test body and place it with the module-level imports
at the top of tests/test_nested_dag_cross_stage_refs.py; locate any test
function or block referencing SQLiteStorage and remove the inline import, then
add the same import to the file's import block so SQLiteStorage is imported once
at module import time in accordance with the project's import rules.
- Line 843: The call to engine.execute currently passes the query as a
positional argument (resp = await engine.execute(single, dry_run=True)); update
it to use keyword arguments (resp = await engine.execute(query=single,
dry_run=True)) so it follows the repo rule; make the same change for any other
occurrences of engine.execute that pass the query positionally (e.g., where the
first arg is a variable like single) to use query=... and preserve the
dry_run=True keyword.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77ef8bcd-edcb-4bc5-906f-b7db5105506a

📥 Commits

Reviewing files that changed from the base of the PR and between a8fb874 and ae15468.

📒 Files selected for processing (4)

slayer/core/models.py
slayer/engine/enrichment.py
slayer/engine/query_engine.py
tests/test_nested_dag_cross_stage_refs.py

Three fixes to `_try_intercept_cross_model_as_local` and its line-896 caller in `slayer/engine/enrichment.py`: 1. Semantics gate (CodeRabbit). Gate the intercept to aggregations that re-aggregate to an equivalent result. `sum`/`min`/`max` pass through unchanged; `count` re-maps to `sum` over the inner per-group count (using `count` on stage rows would just count groups, not rows); everything else (`avg`, `count_distinct`, median, ...) returns None and falls through to the existing cross-model CTE path. 2. Candidate A — ancestor-strip (Codex). Mirror `resolve_via_stage_origin`'s Candidate A/B order inside the intercept. The source-prefixed form (`orders.customers.revenue:sum` against an `s1` wrapping `orders`) lands on `customers__revenue_sum` because `_alias_to_short` stripped the immediate ancestor on the inner stage. Build the ancestor chain from `source_model_origin`, try the ancestor-stripped flat first, then full flat. 3. Rename bookkeeping at line-896 (CodeRabbit). The intercept's early `continue` skipped the standard local-aggregate rename machinery: updating the EnrichedMeasure's name/alias, writing `known_aliases[qfield.name]`, `canonical_to_user_name[canonical_name]`, `measure_canonical_key_to_alias` provenance, and surfacing the user alias in `user_projection`. Mirror it. Tests: 6 new cases — avg / count_distinct fall through to CTE path; sum is intercepted (regression guard); source-prefixed CMM resolves via Candidate A; renamed intercept surfaces as user alias; renamed intercept filter via colon-form remaps to the rename. Updated *:count test to assert outer SUM (not COUNT) per the new semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Around line 1414-1428: The test currently only asserts the internal canonical
alias isn't present but doesn't ensure a filter exists or that the rewritten
name is used; update the block that inspects outer_select (using where, having,
clause, clause_cols from clause.find_all(exp.Column)) to first assert that at
least one of where or having is not None (i.e., a filter exists), then assert
the forbidden internal name "customers__revenue_sum_sum" is absent, and finally
assert that the rewritten alias (e.g., column name "rev") appears in clause_cols
to confirm the rename was applied.
- Around line 1209-1220: The test currently only looks for exp.Column inside
each exp.Count (using count_call.find_all(exp.Column)) and therefore misses
COUNT(*) cases; update the COUNT guard so that for each count_call (in the
outer_select loop) you also check for exp.Star (e.g.,
count_call.find_all(exp.Star) or count_call.this is exp.Star) and treat those as
stage-row counts to reject, or otherwise inspect count_call.args to detect a
star/no-column COUNT; ensure the assert checks fail when a COUNT(*) would target
the stage (same scope as checking ("s1", "customers___count") in
outer_count_targets) by adding the star-detection to the logic that populates
outer_count_targets or by asserting directly on count_call when a star is found.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91b7543f-b32a-45ef-a592-c86774b2cae6

📥 Commits

Reviewing files that changed from the base of the PR and between ae15468 and 16b454d.

📒 Files selected for processing (2)

slayer/engine/enrichment.py
tests/test_nested_dag_cross_stage_refs.py

- Codex: add duplicate-canonical-qfield guard to the cross-stage intercept branch in `enrichment.py`. Without it, two qfields with the same cross-stage canonical aggregation but different `name`s would silently corrupt `user_projection`: the second call's rename mutates the first call's EnrichedMeasure alias and the projection still contains the pre-rename alias with no backing measure. Mirrors the guard the standard local-agg branch runs. New test: two intercepted qfields with different `name`s raise. - CodeRabbit: tighten the `*:count` cross-stage assertion so it also rejects `COUNT(*)` (and any other outer COUNT shape), not only `COUNT(<column>)`. The intercept now re-maps `count` to `sum`, so any outer COUNT is a regression — assert none exists at all. - CodeRabbit: tighten the renamed-CMM filter assertion to require at least one of WHERE/HAVING (so a silently-dropped filter can't pass the test). Dropped the "must reference `rev` literally" sub-check — standard SQL HAVING references the aggregation expression (`SUM(s1.customers__revenue_sum)`), not the projection alias, so pinning a literal `rev` ref would conflict with correct emission. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

tests/test_nested_dag_cross_stage_refs.py (1)

1472-1472: 💤 Low value

Consider accepting both SlayerError and ValueError for consistency.

For consistency with similar validation tests in this file (lines 919, 1252, 1272), consider accepting both exception types.

Minor consistency improvement

-            with pytest.raises(ValueError, match="canonicalises to the same"):
+            with pytest.raises((SlayerError, ValueError), match="canonicalises to the same"):
                 await engine.execute(query=[inner, outer], dry_run=True)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` at line 1472, Update the assertion
in the failing test that currently uses pytest.raises(ValueError,
match="canonicalises to the same") so it accepts either SlayerError or
ValueError for consistency with other validation tests; change the raises check
to allow both exception types (e.g., use pytest.raises((SlayerError,
ValueError), match="canonicalises to the same")) and ensure SlayerError is
imported or referenced where the test uses pytest.raises.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Line 1472: Update the assertion in the failing test that currently uses
pytest.raises(ValueError, match="canonicalises to the same") so it accepts
either SlayerError or ValueError for consistency with other validation tests;
change the raises check to allow both exception types (e.g., use
pytest.raises((SlayerError, ValueError), match="canonicalises to the same")) and
ensure SlayerError is imported or referenced where the test uses pytest.raises.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 28a865e2-982f-47e8-ae34-c86f1d9d30a0

📥 Commits

Reviewing files that changed from the base of the PR and between 16b454d and d0d4878.

📒 Files selected for processing (2)

slayer/engine/enrichment.py
tests/test_nested_dag_cross_stage_refs.py

- Codex: intercept now surfaces under the cross-model canonical alias (e.g. `s1.customers.revenue_sum`) instead of the doubled-sum internal form (`s1.customers__revenue_sum_sum`). Previously the intercept built the EnrichedMeasure against the flat stage column and `_ensure_aggregated_measure` produced an alias that didn't match the cross-model CTE path's shape — public result keys would change between paths, and colon-form filters / ORDER BY refs canonicalising to `customers.revenue_sum` would fail to find the intercepted measure in `known_aliases`. Always rename the intercepted measure to the cross-model canonical (or to the user-supplied `qfield.name`), and write both `target_name` and the canonical alias into `known_aliases`. - CodeRabbit nitpick: move SQLiteStorage import from inside test_sqlite_storage_roundtrip_drops_origin to the top of the file (project rule: imports at module top). - CodeRabbit nitpick: accept both (SlayerError, ValueError) in the duplicate-qfield test's pytest.raises, for consistency with other validation tests in this file. Updated the row-count test to assert the new surfaced key shape `s1.customers.revenue_sum` (matching what a regular cross-model query would produce). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/test_nested_dag_cross_stage_refs.py (2)
904-924: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

rev:sum is no longer a valid negative case.

The new alias-bookkeeping contract records both the canonical and target names, and CI confirms this path no longer raises. Keeping pytest.raises(...) here locks the suite to the pre-change behavior instead of the behavior this PR is introducing. Update this test to assert the resolved SQL/output shape instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` around lines 904 - 924, Update
test_renamed_cmm_via_rename_does_not_resolve to reflect the new
alias-bookkeeping behavior: remove the pytest.raises((SlayerError, ValueError))
expectation around await engine.execute(...) and instead run
engine.execute(query=[inner, outer], dry_run=True) and assert the
returned/dry-run result contains the expected resolved SQL or output shape
(e.g., that the outer resolves `rev:sum` to the proper cross-model measure
column from s1). Locate the test by function name and adjust assertions on the
engine.execute/dry_run result accordingly.
866-900: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Renamed-inner canonical alias path is still red in CI.

This test encodes the new canonical-alias contract, but CI shows engine.execute(..., dry_run=True) still raises ValueError: Model 's1' has no join to 'customers' here. The rename-aware intercept / known_aliases path this assertion depends on is not wired through yet, so this stack cannot merge cleanly as-is.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` around lines 866 - 900, The
dry-run path used by engine.execute (called in
test_renamed_cmm_colon_syntax_resolves_via_canonical_flat) does not propagate
the rename-aware intercept / known_aliases used for CROSS-MODEL measures, so the
resolver still errors with "Model 's1' has no join to 'customers'"; fix by
threading the rename-aware intercept and known_aliases into the resolution flow
that engine.execute(..., dry_run=True) uses (where aliases are resolved and
_alias_to_short is called) so that the canonical alias mapping
(customers__revenue_sum) is available during dry-run; locate the
resolution/join-lookup logic invoked by engine.execute and ensure it consults
the same intercept/known_aliases structure used in normal execution.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/enrichment.py`:
- Around line 1044-1078: When renaming an intercepted measure in
_ensure_aggregated_measure (the block that sets target_name/target_alias and
updates measures, known_aliases, measure_canonical_key_to_alias,
canonical_to_user_name), preserve the existing "rename-vs-canonical collision"
guard: before applying the rename (i.e. before looping measures and updating
aliases), check whether target_name (qfield.name or canonical_name) would
collide with any existing canonical alias owned by a different measure/canonical
key; if so, reject or surface the same error as the non-intercept path (prevent
deduping onto the intercepted alias). Implement this by reusing the same
collision check logic used elsewhere (compare target_name/target_alias against
measure_canonical_key_to_alias and known_aliases) and abort/raise the same
conflict error instead of proceeding with the rename.

---

Outside diff comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Around line 904-924: Update test_renamed_cmm_via_rename_does_not_resolve to
reflect the new alias-bookkeeping behavior: remove the
pytest.raises((SlayerError, ValueError)) expectation around await
engine.execute(...) and instead run engine.execute(query=[inner, outer],
dry_run=True) and assert the returned/dry-run result contains the expected
resolved SQL or output shape (e.g., that the outer resolves `rev:sum` to the
proper cross-model measure column from s1). Locate the test by function name and
adjust assertions on the engine.execute/dry_run result accordingly.
- Around line 866-900: The dry-run path used by engine.execute (called in
test_renamed_cmm_colon_syntax_resolves_via_canonical_flat) does not propagate
the rename-aware intercept / known_aliases used for CROSS-MODEL measures, so the
resolver still errors with "Model 's1' has no join to 'customers'"; fix by
threading the rename-aware intercept and known_aliases into the resolution flow
that engine.execute(..., dry_run=True) uses (where aliases are resolved and
_alias_to_short is called) so that the canonical alias mapping
(customers__revenue_sum) is available during dry-run; locate the
resolution/join-lookup logic invoked by engine.execute and ensure it consults
the same intercept/known_aliases structure used in normal execution.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: edf22757-89f2-4936-8b44-9f88b77a25c2

📥 Commits

Reviewing files that changed from the base of the PR and between d0d4878 and 24a5ace.

📒 Files selected for processing (2)

slayer/engine/enrichment.py
tests/test_nested_dag_cross_stage_refs.py

- python:S3776 (CRITICAL) x2: cognitive complexity in `_resolve_dimensions` (22→) and `_resolve_time_dimensions` (18→). The two functions had near-identical join-walk + stage-origin fallback logic in their dotted-ref branches. Extracted the shared flow into `_resolve_dotted_dim_with_stage_fallback`. Both call sites now delegate; complexity drops back below the 15 threshold and the fallback behaviour stays identical (cross-model CTE re-rooting still depends on the lenient fall-through when stage-origin misses). - python:S7503 (MINOR) x5: five resolver-direct unit tests in test_nested_dag_cross_stage_refs.py were marked `async` but call only the sync `resolve_via_stage_origin` helper — no awaits in the body. Dropped the `async` keyword on those 5 methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Codex (major): cross-stage intercept dup-guard keyed on raw `spec.measure_name`, so two refs that resolve to the same flat column via different Candidate A/B paths (e.g. `orders.customers.revenue:sum` ancestor-strips to `customers__revenue_sum`, and `customers.revenue:sum` full-flats to the same column) slipped past the guard. The second call's rename then mutated the first call's EnrichedMeasure alias and `user_projection` was left with an orphaned reference. Split the intercept helper into `_intercept_candidate_for_cross_model` (pure, returns the resolved `(flat_with_agg, outer_agg)` or None) + `_try_intercept_cross_model_as_local` (applies the candidate). Key the dup-guard on `("agg-intercept", flat_with_agg, outer_agg)` so equivalent refs collide. - CodeRabbit (major): the rename-vs-canonical collision guard from the standard local-aggregate branch was missing on the intercept branch. If an intercepted measure renamed to `customer_id_max` while another query measure canonicalises to `customer_id_max`, `_ensure_aggregated_measure`'s alias-keyed dedup would silently collapse the second aggregate onto the first. Added the cross- other-canonical check, mirroring the standard branch. Two new tests pin the guards: source-prefixed + unprefixed variants of the same cross-stage aggregate with different `name`s now raise; an intercepted rename colliding with another canonical raises. Round-4 CodeRabbit nitpicks (SQLiteStorage top-level import; accept both SlayerError+ValueError in duplicate-qfield test) were stale — already addressed in 24a5ace. Round-4 CI failure (`lint-and-test (3.11)`) was a flake at `Cannot install sqlglotc` (files.pythonhosted.org read timeout), unrelated to PR code; should self-resolve on the next push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Intercepted cross-model measures without an explicit `name` were setting `EnrichedMeasure.name` to the dotted cross-model canonical (`customers.revenue_sum`). The dotted form belongs on `em.alias` (public result key, filter/ORDER BY resolution) but NOT on `em.name` — `_query_as_model` uses `em.name` as the short `Column.name` of the wrapped virtual model, and `Column.name` validation rejects dots. So a 3-stage DAG where the middle stage is an intercepted unrenamed CMM would fail when `_query_as_model` tried to wrap the middle stage. Fix: only mutate `em.name` when the rename target is a simple identifier (user-supplied `qfield.name`). When the target is the cross-model canonical (dotted), keep `em.name` as the internal flat form `_ensure_aggregated_measure` produced. Test: 3-stage DAG with an intercepted unrenamed CMM as the middle stage. Without the fix, the wrap step raised a Column.name validation error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round 5's fix avoided the Column.name validation error but kept `em.name` as the doubled-sum internal canonical (`customers__revenue_sum_sum`). When that stage became the inner of a third stage, `_query_as_model` exposed the wrapped column under that doubled-sum name, so the third stage's intercept (which looks for the flat single-sum form `customers__revenue_sum` — what a single-stage cross-model query produces) MISSED and fell through to the cross-model CTE path, which virtual stages don't have. Fix: in the unrenamed case, derive `em.name` from the dotted cross-model canonical by replacing `.` with `__` — `customers.revenue_sum` → `customers__revenue_sum`. This matches `_alias_to_short`'s convention and what a single-stage produces, so the third stage's intercept finds the alias on `s2.columns`. Strengthened the 3-stage test to actually reference `customers.revenue:sum` again in stage 3 and assert the outer SELECT contains `SUM(s2.customers__revenue_sum)` — pinning that the third-stage intercept now resolves correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The filter stage-origin fallback was designed for DIMENSION refs cross-stage (e.g. `customers.regions.name = 'X'`). When the dotted ref's leaf was an aggregation canonical like `customers.revenue_sum` (from parse_filter rewriting `customers.revenue:sum > 100`), the fallback still fired and rewrote the filter to `WHERE s1.customers__revenue_sum > 100` — applying the predicate to the inner row-level values rather than HAVING-classifying it against the outer re-aggregated SUM. Fix: gate the fallback on the leaf NOT looking like an aggregation canonical (no `_<agg>` suffix matching any BUILTIN_AGGREGATIONS). Cross-model measure filters on intercepted CMMs are not auto- resolved (DEV-1445 territory); the standard strict-error fires instead of silently emitting a wrong WHERE. Test pins the gating: a filter `customers.revenue:sum > 100` on an outer query with an unrenamed intercepted CMM now raises (DEV-1445 limitation surface), rather than emitting incorrect SQL. The working rename path (covered by `test_renamed_intercept_filter_via_colon_form`) is unaffected — `canonical_to_user_name` remaps colon-form to the user alias before the filter resolver sees the dotted ref. Codex's parallel ORDER-BY observation (colon-form ORDER BY of intercepted unrenamed CMMs binds the inner stage column instead of the outer aggregate) is the same DEV-1445 territory — not auto- resolved; users must reference via the renamed alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ORDER BY by colon/canonical form on an unrenamed intercepted cross-model measure (e.g. `order=[{"column":"customers.revenue:sum"}]` or `customers.revenue_sum`) fell through to a non-existent `customers.revenue_sum` bare column because `generator._resolve_order_column`'s alias_lookup is built from `EnrichedMeasure.name` (which the intercept's auto-rename changed to the flat `customers__revenue_sum`), not from `known_aliases`. Fix: register the dotted canonical name in `field_name_aliases` so `_resolve_order_column`'s qualified-match branch (line 1903-1906 in generator.py) resolves `customers.revenue:sum` → projection alias `s1.customers.revenue_sum`. Test pins the resolution: ORDER BY by colon form on an outer intercepted CMM now references the projection alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…dag-multi-hop-dotted-dimension-referenced-from-a

DEV-1448 (merged into main while PR #137 was in review) propagates the user-supplied `name` on a cross-model measure through to the wrapped virtual stage's column. That reverses the assumption three of my tests were built around (pre-DEV-1448: rename doesn't propagate; the canonical flat alias stays on s1.columns). Updated to reflect the new reality: - `test_renamed_cmm_colon_syntax_no_longer_resolves_after_rename` (was `…_resolves_via_canonical_flat`): the rename now replaces `customers__revenue_sum` on s1.columns with `rev`, so the outer's colon-form `customers.revenue:sum` misses the intercept candidate and falls through to the cross-model CTE path which raises on a virtual stage (no joins). DEV-1445 territory: users renamed cross-model measures must reference them by the rename downstream. - `test_renamed_cmm_via_rename_resolves` (was `…_does_not_resolve`): the rename DOES propagate, so `rev:sum` on the outer resolves as a standard local-aggregate over s1.rev → SUM(s1.rev). - `test_intercepted_rename_colliding_with_other_canonical_raises`: the collision now triggers DEV-1448's downstream-short-name guard before my round-4 intercept-canonical-collision guard fires — both are correct outcomes; widened the regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex flagged two fragile heuristics with the same root cause: the cross-stage intercept and the filter fallback were using naming-suffix patterns as proxies for "is this an aggregation canonical?", but the suffix shape is unreliable. 1. Intercept candidate could re-aggregate dim columns. If an inner stage projects a real DIMENSION whose flat name happens to look like `<path>__<measure>_<agg>` (e.g. a user-defined `revenue_sum` dim, or a column auto-flattened to that shape), the intercept's `model.get_column(flat_with_agg)` lookup matched it and silently emitted `SUM(s1.customers__revenue_sum)` even though no aggregate was projected. Fix: add `SourceModelOrigin.agg_column_names: frozenset[str]`, populated by `_query_as_model` from the inner enriched query's measures / cross_model_measures / transforms / expressions (their downstream short names). Gate the intercept on `candidate in agg_names` — only aggregation projections qualify. 2. Filter fallback rejected valid cross-stage dim filters. The suffix-based gate blocked any dotted ref whose leaf ended with a builtin aggregation suffix, so a real dim filter like `customers.revenue_sum = 'X'` (where the dim is literally named that) skipped stage-origin resolution and raised in strict mode even though `s1.customers__revenue_sum` exists. Fix: use parser provenance (`filter_synthesized_aliases` — the set of names `parse_filter` synthesized from colon syntax) to distinguish a colon-syntax-synthesized aggregate alias from a user-typed literal dim ref. Already plumbed into `resolve_filter_columns` for the bare-name check at line 2695; reuse the same signal for the dotted-ref fallback gate. Test: pin that a dim column with an `_<agg>`-shaped name does NOT get intercepted (resolve_via_stage_origin still resolves it as a plain column, but the intercept's gating on `agg_column_names` keeps it from being silently re-summed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`test_intercept_skips_dim_columns_that_look_like_aggregations` calls only the sync `resolve_via_stage_origin` helper — no awaits — so the `async` keyword was unused. Sonar S7503 minor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`agg_column_names` (the intercept's "safe to re-aggregate" gate) included plain measures / transforms / expressions, but those have user-supplied or canonical-derived names that can coincidentally match a cross-model canonical-flat shape (e.g. `{"formula": "amount:sum", "name": "customers__revenue_sum"}`). A subsequent outer-stage `customers.revenue:sum` reference would silently bind to this unrelated column. Narrow the set to two reliable sources: * Auto-derived cross-model canonical-flats (`_alias_to_short(cm.alias)`). * Intercept-produced EnrichedMeasures, marked via a new `EnrichedMeasure.from_cross_model_intercept` flag set in `_try_intercept_cross_model_as_local` and at the qfield-site intercept call. The intercept-produced flag is what keeps 3-stage DAGs working: the middle stage's intercept emits a local EnrichedMeasure (not a CrossModelMeasure), and the third stage's intercept needs to recognise that column as a safe re-aggregation source. Test: a renamed local measure on the inner stage whose name coincidentally matches the cross-model canonical-flat shape (e.g. `{"formula":"amount:sum","name":"customers__revenue_sum"}`) no longer triggers the outer-stage intercept — it falls through to the cross-model CTE path, which raises (virtual stage has no joins), surfacing the mismatch instead of silently re-summing the wrong column. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The round-8 fix registered the dotted cross-model canonical in `field_name_aliases` (so ORDER BY can resolve `customers.revenue:sum` via `_resolve_order_column`'s qualified-match branch) only at the qfield-site intercept. The `_flatten_spec` / `_ensure_measure_from_spec` intercept branches (which run when the user references a cross-model measure ONLY in `order=` or in a transform/expression argument, without also declaring it as a `measures=` entry) didn't register the alias and ORDER BY fell through to a non-existent bare column. Move the registration into `_try_intercept_cross_model_as_local` itself so all three call sites get it. Test pins the order-only case (no qfield declaration): ORDER BY must not reference a bare `customers` table after the intercept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-20T18:35:06Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Comment thread slayer/engine/enrichment.py Outdated

Comment thread slayer/engine/enrichment.py

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Comment thread tests/test_nested_dag_cross_stage_refs.py Outdated

Comment thread tests/test_nested_dag_cross_stage_refs.py

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Comment thread slayer/engine/enrichment.py

ZmeiGorynych and others added 12 commits May 20, 2026 17:56

Merge remote-tracking branch 'origin/main' into egor/dev-1449-nested-…

8a15097

…dag-multi-hop-dotted-dimension-referenced-from-a

ZmeiGorynych merged commit b01a152 into main May 20, 2026
6 checks passed

Conversation

ZmeiGorynych commented May 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Repro

Test plan

Summary by CodeRabbit

Uh oh!

linear Bot commented May 20, 2026

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 20, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ZmeiGorynych commented May 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading