Skip to content

DEV-1449: Fix nested-DAG multi-hop dotted dim from downstream stages#137

Merged
ZmeiGorynych merged 16 commits into
mainfrom
egor/dev-1449-nested-dag-multi-hop-dotted-dimension-referenced-from-a
May 20, 2026
Merged

DEV-1449: Fix nested-DAG multi-hop dotted dim from downstream stages#137
ZmeiGorynych merged 16 commits into
mainfrom
egor/dev-1449-nested-dag-multi-hop-dotted-dimension-referenced-from-a

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 20, 2026

Summary

  • Fixes DEV-1449: when stage 1 projects a multi-hop dotted dim (e.g. customers.regions.name) and a downstream stage references the same path, SLayer used to emit broken SQL referencing a hybrid __ + . table alias that doesn't exist in scope.
  • Virtual stage models from _query_as_model now carry a source_model_origin breadcrumb (exclude=True, in-memory only). A shared resolve_via_stage_origin helper looks up the __-flattened column on the virtual model when the join-walk fails, using two lookup candidates (ancestor-stripped + full-flat) so chained 3+ stage DAGs work.
  • Four cross-stage paths gain the fallback: dimensions, time dimensions, cross-model measures (intercept before strict join walk → emit local re-aggregated measure), filters (regex rewrite of resolved_sql). Parameterized cross-model aggs fall through to the existing CTE path.

Repro

mcp__slayer__query_nested(queries=[
    {"name": "s1", "source_model": "orders",
     "dimensions": ["customers.regions.name"],
     "measures": [{"formula": "*:count"}]},
    {"source_model": "s1",
     "dimensions": ["customers.regions.name"],
     "measures": [{"formula": "*:count"}]},
])
# Before: `no such column: customers__regions.name`
# After:  outer SQL references `s1.customers__regions__name`, runs cleanly.

Test plan

  • 28 new tests in tests/test_nested_dag_cross_stage_refs.py cover: reported case (2-hop), 3-hop, cross-stage time dims, cross-model measure re-aggregation (incl. row-count check), filter rewrite (SQL shape + row exclusion), order_by (SQL shape + actual sort), chained DAG Candidate A/B precedence, engineered collision, non-virtual regression, ModelExtension, stored query-backed model save/execute breadcrumb, time_shift over cross-stage time dim, serialization roundtrip drops breadcrumb (YAML + SQLite), *:count cross-model intercept, rename interaction (DEV-1443).
  • Full unit suite (2908 tests) passes; no regressions.
  • ruff check slayer/ tests/ clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Virtual-stage models carry an in-memory lineage breadcrumb at execution time (not persisted), enabling deterministic multi-hop dotted reference resolution and stage-aware fallbacks for dotted dimensions/time dims.
    • Safe re-aggregation of cross-stage distributive measures (sum/min/max; count→sum) with alias and provenance preservation.
  • Bug Fixes

    • Prevents hybrid/internal alias leakage; rewrites filters/HAVING to flat stage aliases and adds collision protection when measures canonicalize.
  • Tests

    • Large end-to-end and unit suite covering multi-hop refs, aliasing, filter rewrites, aggregation semantics, and serialization roundtrips.

Review Change Stack

…eam stages

When stage 1 projects a multi-hop dotted dim (e.g. `customers.regions.name`)
and a downstream stage references the same dotted path, SLayer used to emit
broken SQL referencing a hybrid `__` + `.` table alias that doesn't exist
in scope (e.g. `customers__regions.name` against an `s1` subquery).

Fix: virtual stage models from `_query_as_model` now carry a
`source_model_origin` breadcrumb (Pydantic field with `exclude=True`, so
it's in-memory only). A new shared `resolve_via_stage_origin` helper in
`enrichment.py` consults that breadcrumb when the join-walk fails and
looks up the `__`-flattened column name on the virtual model. Two
lookup candidates — ancestor-stripped and full-flat — handle both
depth-1 and chained nested-DAG cases.

Four cross-stage code paths gain the fallback: dimensions, time
dimensions, cross-model measures (intercept before the strict join
walk; emits a local re-aggregated measure), and filters (rewrites the
filter SQL via direct re.sub against `resolved_sql`). Parameterized
cross-model aggs fall through to the existing CTE path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear Bot commented May 20, 2026

DEV-1449

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an in-memory SourceModelOrigin breadcrumb on virtual SlayerModel instances, a resolve_via_stage_origin helper, interception logic to re-aggregate distributive dotted cross-stage measures locally, fallbacks for dimension/time/filter resolution to use flattened stage columns, and a comprehensive test suite validating SQL shape and execution.

Changes

Virtual Stage Cross-Reference Resolution

Layer / File(s) Summary
Virtual Stage Lineage Model
slayer/core/models.py
SourceModelOrigin Pydantic model captures in-memory lineage chain with name, optional data_source, and recursive parent reference; SlayerModel.source_model_origin field added (transient, excluded from persistence).
Virtual Stage Lineage Population
slayer/engine/query_engine.py
_query_as_model populates source_model_origin on constructed virtual SlayerModel with wrapped inner model's name, data_source, and chained lineage via parent field.
Resolution Core & Measure Interception
slayer/engine/enrichment.py
Adds Column import, resolve_via_stage_origin helper, and interception that converts dotted cross-model distributive aggregations into local re-aggregations when inner-stage flattened columns exist; wired into expression resolver, transform flattening, and query-level measure enrichment.
Stage-Origin Resolver
slayer/engine/enrichment.py
Implements candidate generation (ancestor-stripped first, then full-flat) from source_model_origin and returns the matching Column or None.
Dimension, Time, and Filter Fallbacks
slayer/engine/enrichment.py
When join-walk resolution fails and model.source_model_origin exists, enrichment falls back to resolve_via_stage_origin to locate flat stage columns, rewrites SQL/aliasing, and treats the resolved column as local to the virtual stage for dimensions, time-dimensions, and strict dotted filters.
Test Fixtures and SQL Inspection
tests/test_nested_dag_cross_stage_refs.py
Adds nested-DAG fixtures, YAML/SQLite-backed engines, sqlglot SQL-inspection helpers, and broad test coverage: multi-hop dotted dimensions/time-dims, time_shift, cross-stage measures, filters, ordering, rename/intercept semantics, breadcrumb lifecycle/serialization roundtrips, and negative/regression scenarios.

Sequence Diagram(s)

sequenceDiagram
  participant OuterEngine as Outer Query Engine
  participant Enr as Enrichment Resolver
  participant QueryAsModel as _query_as_model
  participant OriginResolver as resolve_via_stage_origin
  participant SQLGen as SQL Generator

  OuterEngine->>Enr: Resolve dotted ref (e.g., customers.regions.name)
  Enr->>Enr: Attempt join-walk resolution
  Enr->>QueryAsModel: Check model metadata (is virtual?)
  QueryAsModel-->>Enr: model.source_model_origin present
  Enr->>OriginResolver: resolve_via_stage_origin(model, parts)
  OriginResolver->>OriginResolver: Try ancestor-stripped and full flat candidates
  OriginResolver-->>Enr: Return matching Column (e.g., customers__regions__name)
  Enr->>SQLGen: Emit <virtual_stage_alias>.<flat_col> in SQL
  SQLGen-->>OuterEngine: SELECT s1.customers__regions__name
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • MotleyAI/slayer#9: Related extension of query-as-model/virtual stage machinery that this PR further augments with source lineage and resolution.
  • MotleyAI/slayer#134: Touches enrichment canonical↔user alias mapping and projection contracts used by the virtual-stage interception logic.
  • MotleyAI/slayer#97: Overlaps on _query_as_model entrypoint changes and named-query scoping relevant to virtual model wrapping.

Poem

🐰 I nibble breadcrumbs through each nested stage,
Flatten dotted paths from one little page,
No hybrid ghosts will haunt the SQL line,
Just scoped flat aliases—how they shine!
Hop hop—DEV-1449, carrot time!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing nested-DAG multi-hop dotted dimension references from downstream stages, which is the core objective of this PR addressing DEV-1449.
Docstring Coverage ✅ Passed Docstring coverage is 81.97% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1449-nested-dag-multi-hop-dotted-dimension-referenced-from-a

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/test_nested_dag_cross_stage_refs.py (2)

1135-1135: ⚡ Quick win

Move the local SQLiteStorage import to the module import block.

Keeping imports at top-level avoids hidden import points and aligns with the project import rule.

♻️ Proposed fix
@@
 from slayer.engine.enrichment import resolve_via_stage_origin
 from slayer.engine.query_engine import SlayerQueryEngine
+from slayer.storage.sqlite_storage import SQLiteStorage
 from slayer.storage.yaml_storage import YAMLStorage
@@
-        from slayer.storage.sqlite_storage import SQLiteStorage
         db_path = tmp_path / "slayer.db"
         storage = SQLiteStorage(db_path=str(db_path))

As per coding guidelines "Place imports at the top of Python files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` at line 1135, Move the local
import "from slayer.storage.sqlite_storage import SQLiteStorage" out of the test
body and place it with the module-level imports at the top of
tests/test_nested_dag_cross_stage_refs.py; locate any test function or block
referencing SQLiteStorage and remove the inline import, then add the same import
to the file's import block so SQLiteStorage is imported once at module import
time in accordance with the project's import rules.

843-843: ⚡ Quick win

Use keyword arguments for engine.execute calls.

These calls pass more than one argument but use a positional first parameter. Switch to query=... for consistency with the repo rule.

♻️ Proposed fix
-            resp = await engine.execute(single, dry_run=True)
+            resp = await engine.execute(query=single, dry_run=True)
...
-            resp = await engine.execute(outer, dry_run=True)
+            resp = await engine.execute(query=outer, dry_run=True)

As per coding guidelines "Use keyword arguments for functions with more than 1 parameter in Python".

Also applies to: 1025-1025

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` at line 843, The call to
engine.execute currently passes the query as a positional argument (resp = await
engine.execute(single, dry_run=True)); update it to use keyword arguments (resp
= await engine.execute(query=single, dry_run=True)) so it follows the repo rule;
make the same change for any other occurrences of engine.execute that pass the
query positionally (e.g., where the first arg is a variable like single) to use
query=... and preserve the dry_run=True keyword.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/enrichment.py`:
- Around line 602-642: The fast-path in _try_intercept_cross_model_as_local must
be restricted to only semantics-preserving, re-aggregatable operations: check
ref.aggregation_name against an explicit safe set (e.g.
"sum","min","max","count") before proceeding and return None for all others
(avg, count_distinct, etc.); keep the existing logic using
model.get_column(flat_with_agg), _ensure_aggregated_measure, and
known_aliases[field_name], and do not attempt to re-aggregate parameterized refs
(ref.agg_args/agg_kwargs) — if you need to special-case the "*" count-to-sum
semantic, handle that explicitly by mapping the aggregation and flat_with_agg
appropriately, otherwise fall back to the CTE path.
- Around line 951-971: The interception branch that handles dot-qualified
spec.measure_name and returns early must perform the same rename/alias
bookkeeping as the normal local-aggregate path before continuing: after
finding/creating the local_alias (via _try_intercept_cross_model_as_local) and
updating the EnrichedMeasure metadata (measures / em.label / em.type), also
apply the canonical-to-user remap and projection/alias handling that the
standard local-aggregate branch does for qfield (respecting qfield.name
override, qfield.label, qfield.type), update any rename maps used by filters and
ORDER BY, and then call _mark_user_declared(local_alias) and append to
user_projection before continue; in short, reuse the same rename/alias
bookkeeping steps the normal branch uses so qfield.name properly remaps the
internal alias to the user-facing name.

---

Nitpick comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Line 1135: Move the local import "from slayer.storage.sqlite_storage import
SQLiteStorage" out of the test body and place it with the module-level imports
at the top of tests/test_nested_dag_cross_stage_refs.py; locate any test
function or block referencing SQLiteStorage and remove the inline import, then
add the same import to the file's import block so SQLiteStorage is imported once
at module import time in accordance with the project's import rules.
- Line 843: The call to engine.execute currently passes the query as a
positional argument (resp = await engine.execute(single, dry_run=True)); update
it to use keyword arguments (resp = await engine.execute(query=single,
dry_run=True)) so it follows the repo rule; make the same change for any other
occurrences of engine.execute that pass the query positionally (e.g., where the
first arg is a variable like single) to use query=... and preserve the
dry_run=True keyword.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77ef8bcd-edcb-4bc5-906f-b7db5105506a

📥 Commits

Reviewing files that changed from the base of the PR and between a8fb874 and ae15468.

📒 Files selected for processing (4)
  • slayer/core/models.py
  • slayer/engine/enrichment.py
  • slayer/engine/query_engine.py
  • tests/test_nested_dag_cross_stage_refs.py

Comment thread slayer/engine/enrichment.py Outdated
Comment thread slayer/engine/enrichment.py
Three fixes to `_try_intercept_cross_model_as_local` and its line-896
caller in `slayer/engine/enrichment.py`:

1. Semantics gate (CodeRabbit). Gate the intercept to aggregations that
   re-aggregate to an equivalent result. `sum`/`min`/`max` pass through
   unchanged; `count` re-maps to `sum` over the inner per-group count
   (using `count` on stage rows would just count groups, not rows);
   everything else (`avg`, `count_distinct`, median, ...) returns None
   and falls through to the existing cross-model CTE path.

2. Candidate A — ancestor-strip (Codex). Mirror
   `resolve_via_stage_origin`'s Candidate A/B order inside the intercept.
   The source-prefixed form (`orders.customers.revenue:sum` against an
   `s1` wrapping `orders`) lands on `customers__revenue_sum` because
   `_alias_to_short` stripped the immediate ancestor on the inner stage.
   Build the ancestor chain from `source_model_origin`, try the
   ancestor-stripped flat first, then full flat.

3. Rename bookkeeping at line-896 (CodeRabbit). The intercept's early
   `continue` skipped the standard local-aggregate rename machinery:
   updating the EnrichedMeasure's name/alias, writing
   `known_aliases[qfield.name]`, `canonical_to_user_name[canonical_name]`,
   `measure_canonical_key_to_alias` provenance, and surfacing the user
   alias in `user_projection`. Mirror it.

Tests: 6 new cases — avg / count_distinct fall through to CTE path;
sum is intercepted (regression guard); source-prefixed CMM resolves via
Candidate A; renamed intercept surfaces as user alias; renamed intercept
filter via colon-form remaps to the rename. Updated *:count test to
assert outer SUM (not COUNT) per the new semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Around line 1414-1428: The test currently only asserts the internal canonical
alias isn't present but doesn't ensure a filter exists or that the rewritten
name is used; update the block that inspects outer_select (using where, having,
clause, clause_cols from clause.find_all(exp.Column)) to first assert that at
least one of where or having is not None (i.e., a filter exists), then assert
the forbidden internal name "customers__revenue_sum_sum" is absent, and finally
assert that the rewritten alias (e.g., column name "rev") appears in clause_cols
to confirm the rename was applied.
- Around line 1209-1220: The test currently only looks for exp.Column inside
each exp.Count (using count_call.find_all(exp.Column)) and therefore misses
COUNT(*) cases; update the COUNT guard so that for each count_call (in the
outer_select loop) you also check for exp.Star (e.g.,
count_call.find_all(exp.Star) or count_call.this is exp.Star) and treat those as
stage-row counts to reject, or otherwise inspect count_call.args to detect a
star/no-column COUNT; ensure the assert checks fail when a COUNT(*) would target
the stage (same scope as checking ("s1", "customers___count") in
outer_count_targets) by adding the star-detection to the logic that populates
outer_count_targets or by asserting directly on count_call when a star is found.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91b7543f-b32a-45ef-a592-c86774b2cae6

📥 Commits

Reviewing files that changed from the base of the PR and between ae15468 and 16b454d.

📒 Files selected for processing (2)
  • slayer/engine/enrichment.py
  • tests/test_nested_dag_cross_stage_refs.py

Comment thread tests/test_nested_dag_cross_stage_refs.py Outdated
Comment thread tests/test_nested_dag_cross_stage_refs.py
- Codex: add duplicate-canonical-qfield guard to the cross-stage
  intercept branch in `enrichment.py`. Without it, two qfields with
  the same cross-stage canonical aggregation but different `name`s
  would silently corrupt `user_projection`: the second call's rename
  mutates the first call's EnrichedMeasure alias and the
  projection still contains the pre-rename alias with no backing
  measure. Mirrors the guard the standard local-agg branch runs.
  New test: two intercepted qfields with different `name`s raise.

- CodeRabbit: tighten the `*:count` cross-stage assertion so it also
  rejects `COUNT(*)` (and any other outer COUNT shape), not only
  `COUNT(<column>)`. The intercept now re-maps `count` to `sum`, so
  any outer COUNT is a regression — assert none exists at all.

- CodeRabbit: tighten the renamed-CMM filter assertion to require at
  least one of WHERE/HAVING (so a silently-dropped filter can't pass
  the test). Dropped the "must reference `rev` literally" sub-check
  — standard SQL HAVING references the aggregation expression
  (`SUM(s1.customers__revenue_sum)`), not the projection alias, so
  pinning a literal `rev` ref would conflict with correct emission.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_nested_dag_cross_stage_refs.py (1)

1472-1472: 💤 Low value

Consider accepting both SlayerError and ValueError for consistency.

For consistency with similar validation tests in this file (lines 919, 1252, 1272), consider accepting both exception types.

Minor consistency improvement
-            with pytest.raises(ValueError, match="canonicalises to the same"):
+            with pytest.raises((SlayerError, ValueError), match="canonicalises to the same"):
                 await engine.execute(query=[inner, outer], dry_run=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` at line 1472, Update the assertion
in the failing test that currently uses pytest.raises(ValueError,
match="canonicalises to the same") so it accepts either SlayerError or
ValueError for consistency with other validation tests; change the raises check
to allow both exception types (e.g., use pytest.raises((SlayerError,
ValueError), match="canonicalises to the same")) and ensure SlayerError is
imported or referenced where the test uses pytest.raises.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Line 1472: Update the assertion in the failing test that currently uses
pytest.raises(ValueError, match="canonicalises to the same") so it accepts
either SlayerError or ValueError for consistency with other validation tests;
change the raises check to allow both exception types (e.g., use
pytest.raises((SlayerError, ValueError), match="canonicalises to the same")) and
ensure SlayerError is imported or referenced where the test uses pytest.raises.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 28a865e2-982f-47e8-ae34-c86f1d9d30a0

📥 Commits

Reviewing files that changed from the base of the PR and between 16b454d and d0d4878.

📒 Files selected for processing (2)
  • slayer/engine/enrichment.py
  • tests/test_nested_dag_cross_stage_refs.py

- Codex: intercept now surfaces under the cross-model canonical alias
  (e.g. `s1.customers.revenue_sum`) instead of the doubled-sum
  internal form (`s1.customers__revenue_sum_sum`). Previously the
  intercept built the EnrichedMeasure against the flat stage column
  and `_ensure_aggregated_measure` produced an alias that didn't
  match the cross-model CTE path's shape — public result keys would
  change between paths, and colon-form filters / ORDER BY refs
  canonicalising to `customers.revenue_sum` would fail to find the
  intercepted measure in `known_aliases`. Always rename the
  intercepted measure to the cross-model canonical (or to the
  user-supplied `qfield.name`), and write both `target_name` and the
  canonical alias into `known_aliases`.

- CodeRabbit nitpick: move SQLiteStorage import from inside
  test_sqlite_storage_roundtrip_drops_origin to the top of the file
  (project rule: imports at module top).

- CodeRabbit nitpick: accept both (SlayerError, ValueError) in the
  duplicate-qfield test's pytest.raises, for consistency with other
  validation tests in this file.

Updated the row-count test to assert the new surfaced key shape
`s1.customers.revenue_sum` (matching what a regular cross-model
query would produce).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/test_nested_dag_cross_stage_refs.py (2)

904-924: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

rev:sum is no longer a valid negative case.

The new alias-bookkeeping contract records both the canonical and target names, and CI confirms this path no longer raises. Keeping pytest.raises(...) here locks the suite to the pre-change behavior instead of the behavior this PR is introducing. Update this test to assert the resolved SQL/output shape instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` around lines 904 - 924, Update
test_renamed_cmm_via_rename_does_not_resolve to reflect the new
alias-bookkeeping behavior: remove the pytest.raises((SlayerError, ValueError))
expectation around await engine.execute(...) and instead run
engine.execute(query=[inner, outer], dry_run=True) and assert the
returned/dry-run result contains the expected resolved SQL or output shape
(e.g., that the outer resolves `rev:sum` to the proper cross-model measure
column from s1). Locate the test by function name and adjust assertions on the
engine.execute/dry_run result accordingly.

866-900: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Renamed-inner canonical alias path is still red in CI.

This test encodes the new canonical-alias contract, but CI shows engine.execute(..., dry_run=True) still raises ValueError: Model 's1' has no join to 'customers' here. The rename-aware intercept / known_aliases path this assertion depends on is not wired through yet, so this stack cannot merge cleanly as-is.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_nested_dag_cross_stage_refs.py` around lines 866 - 900, The
dry-run path used by engine.execute (called in
test_renamed_cmm_colon_syntax_resolves_via_canonical_flat) does not propagate
the rename-aware intercept / known_aliases used for CROSS-MODEL measures, so the
resolver still errors with "Model 's1' has no join to 'customers'"; fix by
threading the rename-aware intercept and known_aliases into the resolution flow
that engine.execute(..., dry_run=True) uses (where aliases are resolved and
_alias_to_short is called) so that the canonical alias mapping
(customers__revenue_sum) is available during dry-run; locate the
resolution/join-lookup logic invoked by engine.execute and ensure it consults
the same intercept/known_aliases structure used in normal execution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/enrichment.py`:
- Around line 1044-1078: When renaming an intercepted measure in
_ensure_aggregated_measure (the block that sets target_name/target_alias and
updates measures, known_aliases, measure_canonical_key_to_alias,
canonical_to_user_name), preserve the existing "rename-vs-canonical collision"
guard: before applying the rename (i.e. before looping measures and updating
aliases), check whether target_name (qfield.name or canonical_name) would
collide with any existing canonical alias owned by a different measure/canonical
key; if so, reject or surface the same error as the non-intercept path (prevent
deduping onto the intercepted alias). Implement this by reusing the same
collision check logic used elsewhere (compare target_name/target_alias against
measure_canonical_key_to_alias and known_aliases) and abort/raise the same
conflict error instead of proceeding with the rename.

---

Outside diff comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Around line 904-924: Update test_renamed_cmm_via_rename_does_not_resolve to
reflect the new alias-bookkeeping behavior: remove the
pytest.raises((SlayerError, ValueError)) expectation around await
engine.execute(...) and instead run engine.execute(query=[inner, outer],
dry_run=True) and assert the returned/dry-run result contains the expected
resolved SQL or output shape (e.g., that the outer resolves `rev:sum` to the
proper cross-model measure column from s1). Locate the test by function name and
adjust assertions on the engine.execute/dry_run result accordingly.
- Around line 866-900: The dry-run path used by engine.execute (called in
test_renamed_cmm_colon_syntax_resolves_via_canonical_flat) does not propagate
the rename-aware intercept / known_aliases used for CROSS-MODEL measures, so the
resolver still errors with "Model 's1' has no join to 'customers'"; fix by
threading the rename-aware intercept and known_aliases into the resolution flow
that engine.execute(..., dry_run=True) uses (where aliases are resolved and
_alias_to_short is called) so that the canonical alias mapping
(customers__revenue_sum) is available during dry-run; locate the
resolution/join-lookup logic invoked by engine.execute and ensure it consults
the same intercept/known_aliases structure used in normal execution.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: edf22757-89f2-4936-8b44-9f88b77a25c2

📥 Commits

Reviewing files that changed from the base of the PR and between d0d4878 and 24a5ace.

📒 Files selected for processing (2)
  • slayer/engine/enrichment.py
  • tests/test_nested_dag_cross_stage_refs.py

Comment thread slayer/engine/enrichment.py
ZmeiGorynych and others added 12 commits May 20, 2026 17:56
- python:S3776 (CRITICAL) x2: cognitive complexity in
  `_resolve_dimensions` (22→) and `_resolve_time_dimensions` (18→).
  The two functions had near-identical join-walk + stage-origin
  fallback logic in their dotted-ref branches. Extracted the shared
  flow into `_resolve_dotted_dim_with_stage_fallback`. Both call sites
  now delegate; complexity drops back below the 15 threshold and the
  fallback behaviour stays identical (cross-model CTE re-rooting still
  depends on the lenient fall-through when stage-origin misses).

- python:S7503 (MINOR) x5: five resolver-direct unit tests in
  test_nested_dag_cross_stage_refs.py were marked `async` but call
  only the sync `resolve_via_stage_origin` helper — no awaits in
  the body. Dropped the `async` keyword on those 5 methods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Codex (major): cross-stage intercept dup-guard keyed on raw
  `spec.measure_name`, so two refs that resolve to the same flat
  column via different Candidate A/B paths (e.g.
  `orders.customers.revenue:sum` ancestor-strips to
  `customers__revenue_sum`, and `customers.revenue:sum` full-flats
  to the same column) slipped past the guard. The second call's
  rename then mutated the first call's EnrichedMeasure alias and
  `user_projection` was left with an orphaned reference.

  Split the intercept helper into `_intercept_candidate_for_cross_model`
  (pure, returns the resolved `(flat_with_agg, outer_agg)` or None)
  + `_try_intercept_cross_model_as_local` (applies the candidate).
  Key the dup-guard on `("agg-intercept", flat_with_agg, outer_agg)`
  so equivalent refs collide.

- CodeRabbit (major): the rename-vs-canonical collision guard from
  the standard local-aggregate branch was missing on the intercept
  branch. If an intercepted measure renamed to `customer_id_max`
  while another query measure canonicalises to `customer_id_max`,
  `_ensure_aggregated_measure`'s alias-keyed dedup would silently
  collapse the second aggregate onto the first. Added the cross-
  other-canonical check, mirroring the standard branch.

Two new tests pin the guards: source-prefixed + unprefixed variants
of the same cross-stage aggregate with different `name`s now raise;
an intercepted rename colliding with another canonical raises.

Round-4 CodeRabbit nitpicks (SQLiteStorage top-level import; accept
both SlayerError+ValueError in duplicate-qfield test) were stale —
already addressed in 24a5ace.

Round-4 CI failure (`lint-and-test (3.11)`) was a flake at
`Cannot install sqlglotc` (files.pythonhosted.org read timeout),
unrelated to PR code; should self-resolve on the next push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Intercepted cross-model measures without an explicit `name` were
setting `EnrichedMeasure.name` to the dotted cross-model canonical
(`customers.revenue_sum`). The dotted form belongs on `em.alias`
(public result key, filter/ORDER BY resolution) but NOT on `em.name`
— `_query_as_model` uses `em.name` as the short `Column.name` of
the wrapped virtual model, and `Column.name` validation rejects
dots. So a 3-stage DAG where the middle stage is an intercepted
unrenamed CMM would fail when `_query_as_model` tried to wrap the
middle stage.

Fix: only mutate `em.name` when the rename target is a simple
identifier (user-supplied `qfield.name`). When the target is the
cross-model canonical (dotted), keep `em.name` as the internal flat
form `_ensure_aggregated_measure` produced.

Test: 3-stage DAG with an intercepted unrenamed CMM as the middle
stage. Without the fix, the wrap step raised a Column.name
validation error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 5's fix avoided the Column.name validation error but kept
`em.name` as the doubled-sum internal canonical (`customers__revenue_sum_sum`).
When that stage became the inner of a third stage, `_query_as_model`
exposed the wrapped column under that doubled-sum name, so the
third stage's intercept (which looks for the flat single-sum form
`customers__revenue_sum` — what a single-stage cross-model query
produces) MISSED and fell through to the cross-model CTE path,
which virtual stages don't have.

Fix: in the unrenamed case, derive `em.name` from the dotted
cross-model canonical by replacing `.` with `__` —
`customers.revenue_sum` → `customers__revenue_sum`. This matches
`_alias_to_short`'s convention and what a single-stage produces,
so the third stage's intercept finds the alias on `s2.columns`.

Strengthened the 3-stage test to actually reference
`customers.revenue:sum` again in stage 3 and assert the outer
SELECT contains `SUM(s2.customers__revenue_sum)` — pinning that
the third-stage intercept now resolves correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The filter stage-origin fallback was designed for DIMENSION refs
cross-stage (e.g. `customers.regions.name = 'X'`). When the dotted
ref's leaf was an aggregation canonical like `customers.revenue_sum`
(from parse_filter rewriting `customers.revenue:sum > 100`), the
fallback still fired and rewrote the filter to
`WHERE s1.customers__revenue_sum > 100` — applying the predicate to
the inner row-level values rather than HAVING-classifying it
against the outer re-aggregated SUM.

Fix: gate the fallback on the leaf NOT looking like an aggregation
canonical (no `_<agg>` suffix matching any BUILTIN_AGGREGATIONS).
Cross-model measure filters on intercepted CMMs are not auto-
resolved (DEV-1445 territory); the standard strict-error fires
instead of silently emitting a wrong WHERE.

Test pins the gating: a filter `customers.revenue:sum > 100` on an
outer query with an unrenamed intercepted CMM now raises (DEV-1445
limitation surface), rather than emitting incorrect SQL. The
working rename path (covered by `test_renamed_intercept_filter_via_colon_form`)
is unaffected — `canonical_to_user_name` remaps colon-form to the
user alias before the filter resolver sees the dotted ref.

Codex's parallel ORDER-BY observation (colon-form ORDER BY of
intercepted unrenamed CMMs binds the inner stage column instead of
the outer aggregate) is the same DEV-1445 territory — not auto-
resolved; users must reference via the renamed alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ORDER BY by colon/canonical form on an unrenamed intercepted
cross-model measure (e.g. `order=[{"column":"customers.revenue:sum"}]`
or `customers.revenue_sum`) fell through to a non-existent
`customers.revenue_sum` bare column because
`generator._resolve_order_column`'s alias_lookup is built from
`EnrichedMeasure.name` (which the intercept's auto-rename changed
to the flat `customers__revenue_sum`), not from `known_aliases`.

Fix: register the dotted canonical name in `field_name_aliases` so
`_resolve_order_column`'s qualified-match branch
(line 1903-1906 in generator.py) resolves
`customers.revenue:sum` → projection alias `s1.customers.revenue_sum`.

Test pins the resolution: ORDER BY by colon form on an outer
intercepted CMM now references the projection alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dag-multi-hop-dotted-dimension-referenced-from-a
DEV-1448 (merged into main while PR #137 was in review) propagates
the user-supplied `name` on a cross-model measure through to the
wrapped virtual stage's column. That reverses the assumption three
of my tests were built around (pre-DEV-1448: rename doesn't
propagate; the canonical flat alias stays on s1.columns).

Updated to reflect the new reality:

- `test_renamed_cmm_colon_syntax_no_longer_resolves_after_rename`
  (was `…_resolves_via_canonical_flat`): the rename now replaces
  `customers__revenue_sum` on s1.columns with `rev`, so the outer's
  colon-form `customers.revenue:sum` misses the intercept candidate
  and falls through to the cross-model CTE path which raises on a
  virtual stage (no joins). DEV-1445 territory: users renamed
  cross-model measures must reference them by the rename downstream.

- `test_renamed_cmm_via_rename_resolves` (was `…_does_not_resolve`):
  the rename DOES propagate, so `rev:sum` on the outer resolves as
  a standard local-aggregate over s1.rev → SUM(s1.rev).

- `test_intercepted_rename_colliding_with_other_canonical_raises`:
  the collision now triggers DEV-1448's downstream-short-name
  guard before my round-4 intercept-canonical-collision guard
  fires — both are correct outcomes; widened the regex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex flagged two fragile heuristics with the same root cause:
the cross-stage intercept and the filter fallback were using
naming-suffix patterns as proxies for "is this an aggregation
canonical?", but the suffix shape is unreliable.

1. Intercept candidate could re-aggregate dim columns. If an
   inner stage projects a real DIMENSION whose flat name happens
   to look like `<path>__<measure>_<agg>` (e.g. a user-defined
   `revenue_sum` dim, or a column auto-flattened to that shape),
   the intercept's `model.get_column(flat_with_agg)` lookup
   matched it and silently emitted `SUM(s1.customers__revenue_sum)`
   even though no aggregate was projected.

   Fix: add `SourceModelOrigin.agg_column_names: frozenset[str]`,
   populated by `_query_as_model` from the inner enriched query's
   measures / cross_model_measures / transforms / expressions
   (their downstream short names). Gate the intercept on
   `candidate in agg_names` — only aggregation projections qualify.

2. Filter fallback rejected valid cross-stage dim filters. The
   suffix-based gate blocked any dotted ref whose leaf ended with
   a builtin aggregation suffix, so a real dim filter like
   `customers.revenue_sum = 'X'` (where the dim is literally named
   that) skipped stage-origin resolution and raised in strict mode
   even though `s1.customers__revenue_sum` exists.

   Fix: use parser provenance (`filter_synthesized_aliases` — the
   set of names `parse_filter` synthesized from colon syntax) to
   distinguish a colon-syntax-synthesized aggregate alias from a
   user-typed literal dim ref. Already plumbed into
   `resolve_filter_columns` for the bare-name check at line 2695;
   reuse the same signal for the dotted-ref fallback gate.

Test: pin that a dim column with an `_<agg>`-shaped name does NOT
get intercepted (resolve_via_stage_origin still resolves it as a
plain column, but the intercept's gating on `agg_column_names`
keeps it from being silently re-summed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`test_intercept_skips_dim_columns_that_look_like_aggregations`
calls only the sync `resolve_via_stage_origin` helper — no awaits
— so the `async` keyword was unused. Sonar S7503 minor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`agg_column_names` (the intercept's "safe to re-aggregate" gate)
included plain measures / transforms / expressions, but those have
user-supplied or canonical-derived names that can coincidentally
match a cross-model canonical-flat shape (e.g.
`{"formula": "amount:sum", "name": "customers__revenue_sum"}`).
A subsequent outer-stage `customers.revenue:sum` reference would
silently bind to this unrelated column.

Narrow the set to two reliable sources:
  * Auto-derived cross-model canonical-flats (`_alias_to_short(cm.alias)`).
  * Intercept-produced EnrichedMeasures, marked via a new
    `EnrichedMeasure.from_cross_model_intercept` flag set in
    `_try_intercept_cross_model_as_local` and at the qfield-site
    intercept call.

The intercept-produced flag is what keeps 3-stage DAGs working:
the middle stage's intercept emits a local EnrichedMeasure (not a
CrossModelMeasure), and the third stage's intercept needs to
recognise that column as a safe re-aggregation source.

Test: a renamed local measure on the inner stage whose name
coincidentally matches the cross-model canonical-flat shape (e.g.
`{"formula":"amount:sum","name":"customers__revenue_sum"}`) no
longer triggers the outer-stage intercept — it falls through to
the cross-model CTE path, which raises (virtual stage has no
joins), surfacing the mismatch instead of silently re-summing
the wrong column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The round-8 fix registered the dotted cross-model canonical in
`field_name_aliases` (so ORDER BY can resolve `customers.revenue:sum`
via `_resolve_order_column`'s qualified-match branch) only at the
qfield-site intercept. The `_flatten_spec` / `_ensure_measure_from_spec`
intercept branches (which run when the user references a cross-model
measure ONLY in `order=` or in a transform/expression argument,
without also declaring it as a `measures=` entry) didn't register
the alias and ORDER BY fell through to a non-existent bare column.

Move the registration into `_try_intercept_cross_model_as_local`
itself so all three call sites get it.

Test pins the order-only case (no qfield declaration): ORDER BY
must not reference a bare `customers` table after the intercept.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@ZmeiGorynych ZmeiGorynych merged commit b01a152 into main May 20, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant