Use AbstractPPL AD interface by yebai · Pull Request #1363 · TuringLang/DynamicPPL.jl

yebai · 2026-04-21T10:40:35Z

Migrates DynamicPPL's AD plumbing to the new AbstractPPL.prepare / value_and_gradient!! interface introduced in AbstractPPL 0.15, retiring DynamicPPL's direct dependency on DifferentiationInterface.

The LogDensityFunction constructor now hands AD preparation to AbstractPPL.prepare(adtype, logdensity_internal, x; context=..., check_dims=false) and LogDensityProblems.logdensity_and_gradient simply forwards to AbstractPPL.value_and_gradient!!. The internal _use_closure heuristic that picked between Base.Fix1 closures and DI.Constant arguments per backend is removed — that decision now lives in AbstractPPL. logdensity_at is renamed to logdensity_internal (with a const alias for compatibility), and LogDensityAt becomes a deprecation shim that returns an AbstractPPL.Evaluators.VectorEvaluator.

Replacing #1354. ~~Tests currently fail due to #1364~~

Notes: The primal and gradient times are identical to main except for tiny models, where this branch beats main for logdensity evaluation while gradient remains comparable.

Details

The apparent AD regression in the tiny linked row is mostly caused by a much faster
primal denominator, not by slower raw gradients.

Reported row:

main:    primal = 23.6 ns, Mooncake ratio = 7.16   -> raw grad ~= 169 ns
branch:  primal = 4.63 ns, Mooncake ratio = 33.43 -> raw grad ~= 155 ns

So Mooncake is slightly faster in raw time for this row. The ratio looks worse because
t(logdensity) became tiny.

Root cause

On Julia 1.11, old main's optimised primal still contains a call boundary:

invoke DynamicPPL.tilde_assume!!(...)

Current branch inlines through the same path, so the one-dimensional linked model reduces
to loading params[1] and evaluating two scalar Normal logpdfs.

Local Julia 1.11.9 checks:

old main e2d4b5da:
  primal ~= 9.4 ns

old main + monkey-patched @inline tilde_assume!!:
  primal ~= 3.2 ns

current branch:
  primal ~= 2.5 ns

Adding @inline only to LogDensityProblems.logdensity did not materially change old
main. The important boundary is the assume/init path.

- remove DifferentiationInterface from Project.toml deps and compat - replace DI gradient preparation/evaluation with AbstractPPL.prepare and AbstractPPL.value_and_gradient - reuse LogDensityAt and closures without local AbstractPPL.prepare piracy - guard the Mooncake precompile workload until AbstractPPLMooncakeExt is loaded - pin AbstractPPL to the evaluator-interface branch in [sources] for this environment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

# Conflicts: # Project.toml # src/logdensityfunction.jl

codecov · 2026-04-21T11:00:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.70%. Comparing base (d2052a1) to head (8d27685).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1363      +/-   ##
==========================================
- Coverage   82.30%   81.70%   -0.61%     
==========================================
  Files          50       50              
  Lines        3543     3536       -7     
==========================================
- Hits         2916     2889      -27     
- Misses        627      647      +20

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Fix #1364 TODO - [x] use `main` as base, as otherwise benchmarking would fail, need to change back to #1364 before merging. - ~~Julia 1.10’s Pkg does not reliably apply the root [sources] override during build/test, so min CI fails.~~ This change switches `LogDensityFunction` AD preparation to pass a structural `LogDensityAt` problem directly into `AbstractPPL.prepare(...)` instead of wrapping it in backend-specific anonymous closures. It also removes the old `_use_closure` machinery, since the structural problem object now provides the stable one-argument evaluator shape that AD backends need. The accompanying test coverage adds a focused `ReverseDiff` check to confirm that `AutoReverseDiff(; compile=true)` still retains a compiled tape and can be reused across repeated `logdensity_and_gradient` calls. Benchmarks ``` +----------------------+------------+--------------+--------+--------------+ | Backend | PR grad μs | main grad μs | Ratio | Summary | +----------------------+------------+--------------+--------+--------------+ | ForwardDiff | 0.322 | 0.322 | 1.001x | 0.1% slower | | Forward Mooncake | 0.900 | 2.601 | 0.346x | 2.89x faster | | Forward Enzyme | 0.789 | 1.051 | 0.751x | 24.9% faster | | ReverseDiff | 14.239 | 15.225 | 0.935x | 6.5% faster | | ReverseDiff compiled | 5.355 | 5.872 | 0.912x | 8.8% faster | | Reverse Mooncake | 0.892 | 0.928 | 0.961x | 3.9% faster | | Reverse Enzyme | 1.253 | 1.268 | 0.988x | 1.2% faster | +----------------------+------------+--------------+--------+--------------+ ``` --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-23T19:35:14Z

DynamicPPL.jl documentation for PR #1363 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1363/

Update LogDensityFunction gradient evaluation to match the AbstractPPL evaluator interface and make the floattypes environment resolve that source branch explicitly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Drops the base-vs-head comparison entirely. The benchmark workflow now runs once on the PR head, on a pinned `ubuntu-22.04` runner, and reports absolute log-density times plus gradient/log-density ratios in the posted comment. Output schema follows Mooncake's bench harness; readers compare against recent main-branch comments to spot regressions. Noise reduction in `run_ad`: per-sample incremental GC teardown and a full GC before each measurement keep accumulated garbage from triggering mid-sample collections that inflate individual samples. Adds a `benchmark_seconds` knob for tightening the median estimate. Also removes the synthetic reference timing that normalised eval times against a non-DPPL function. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Inline `Models.jl` and `DynamicPPLBenchmarks.jl` into `benchmarks.jl` and convert `benchmarks/Project.toml` from a package to a flat environment, mirroring Mooncake.jl's `bench/run_benchmarks.jl` layout. Also: take dim from `length(r.params)` (run_ad already constructed the LDF) so models are no longer evaluated twice on the success path; switch results to NamedTuples so `print_results` reads `r.name`/`r.dim`/...; extract `transform_strategy(islinked)` helper; drop unused imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # benchmarks/Project.toml # benchmarks/src/DynamicPPLBenchmarks.jl

Run a full cross-product of the 9 model configs × 4 AD backends × {linked, unlinked} = 72 rows, ordered model → linked → backend so each model's eight rows are adjacent for side-by-side inspection. `:reversediff_compiled` is excluded because compiled tapes are input-dependent and silently produce wrong gradients on parameter-dependent control flow (see CLAUDE.md). Per-row logging mirrors Mooncake's `bench/run_benchmarks.jl`: an `(i / N, name, (linked = …))` header, the backend on its own line, then `t(logdensity)` / `t(grad)` formatted with units. `model_dimension` is now defensive (returns `missing` on init failures) and the table formats `missing` dims as `err`, so combos that crash during dimension lookup still produce a well-formed row instead of derailing the run. Also: add a `setup` stage to `run_ad`'s Chairmarks pipeline that deep-copies `params` per sample, matching Mooncake's harness — setup runs before the timed window, so the copy is excluded from measurements. Widen `combos` to a typed `Tuple{...}[]` so it accepts models with non-default contexts (e.g. the `condition`-wrapped `loop_univariate`/`multivariate` rows). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…apsible full table LDA's discrete `Categorical` RVs make `linked = false` ill-defined for gradient-based AD, so all four backends previously errored on those rows. Skip them at combination time, leaving 68 rows. In markdown mode, emit a `### Gist: Smorgasbord` block with just that model's eight rows (Smorgasbord covers the broadest set of DPPL features, so it is the most informative single row band), then put the full 68-row table inside `<details><summary>` so it is collapsed by default in GitHub PR comments. Plain (non-markdown) output is unchanged. Drop the now-redundant `### Benchmark Results` heading from the workflow body since the script emits its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Required as a direct dep so the benchmarks project resolves cleanly without a manual `Pkg.resolve()` after `Pkg.instantiate()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@V3

The Benchmarking comment now reads top-down as: SHA in the heading, Smorgasbord gist as a level-3 section, explanatory paragraph, full table as a level-3 section, and computer info as a foldable <details> with an inline <b>-styled summary (avoids markdown inside <details>, which renders inconsistently). The two integration jobs added to CI.yml were still on setup-julia@v2; bumped to @V3 for consistency with the main job. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Splits Benchmarking.yml into three jobs (benchmark-pr, benchmark-main, post-comment). The PR-comment now shows PR-head numbers up top and main's numbers in a foldout below, with a column legend and labelled SHAs. If benchmark-main fails (e.g. transitionally before this branch lands on main, since main's bench script does not yet support markdown mode), post-comment still posts PR-head numbers and notes the main job result inline. Workflow hardening: concurrency group cancels superseded PR runs, 60min timeout per bench job, explicit pull-requests: write permission so fork PRs can post comments. Body assembly moved out of the YAML literal block scalar into a shell heredoc using `body-path:` (peter-evans pattern), avoiding env-var multi-line interpolation. Bench script no longer prints the obsolete "compare against main-branch PR run" paragraph -- main is now in the same comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # .github/workflows/Benchmarking.yml # benchmarks/Project.toml # benchmarks/benchmarks.jl # src/test_utils/ad.jl

The 68-row table dominates the PR comment vertically. Wrap it in a <details><summary> block so it is collapsed by default, leaving the Smorgasbord gist as the at-a-glance view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Long-form table (one row per (model, linked, backend)) becomes a pivoted table where each (Model, Dim, Linked) row spans all four AD backends as columns. 68 rows collapse to 17, the gist/full-table split goes away (single unified table), and the gist constant + foldout wrapper in benchmarks/benchmarks.jl are dropped. `t(logdensity)` does not depend on the AD backend, so the four primal samples per group are noise around a common value — take the minimum as the most stable estimate. Per-backend `err` cells render independently of the primal column. Workflow comment restructure to match: SHA goes inline in the H2 title, "PR head"/"Main" preamble lines drop, main-branch results live under their own H3 with a "Click to see" foldout (and a fallback H3 note if benchmark-main failed), Computer Information is its own H3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switch the `evaluator-interface` revision overrides in every Project.toml to `main` now that the evaluator API has landed there. Also drop redundant parens in a `@model` Type default in benchmarks/benchmarks.jl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the LogDensityAt struct with a deprecation shim that emits `Base.depwarn` and returns an `AbstractPPL.Evaluators.VectorEvaluator` wrapping a closure over `logdensity_internal`. The new path is the sanctioned one now that AD prep flows through `AbstractPPL.prepare`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the closure-over-state form with the new `context::Tuple` kwarg on `AbstractPPL.prepare`, which threads constants through to the problem function. Equivalent semantics, no closure construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pass `check_dims=false` to match the pre-deprecation `LogDensityAt`'s unchecked call behavior; callers that relied on it shouldn't suddenly hit a `DimensionMismatch`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move test/integration/{enzyme,reversediff}/ to test/ext/{DynamicPPLEnzymeCoreExt,DynamicPPLReverseDiffExt}/ and convert the MarginalLogDensities split (subdir env + sibling script) to a single test/ext/DynamicPPLMarginalLogDensitiesExt/{Project.toml,main.jl}. Each integration env now follows the same layout. CI.yml updated accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove the `AbstractPPL = {url = ..., rev = "main"}` overrides from all eight Project.toml files. Compat is already at 0.15, so the resolver picks up the registered release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fresh-buffer benchmark setup - In `DynamicPPL.TestUtils.AD.run_ad`, use `setup = deepcopy($params)` for both the primal and gradient `@be ` calls so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across samples. Matches Mooncake's bench harness; setup runs outside the timed window, so the copy isn't measured. Plus some minor tweaks. ~~This appears to have a noticeable impact on log-density evaluation for tiny models, first noticed in TuringLang#1363~~: e.g., `simple assume observe（linked = true)` logdensity improves from ~20 ns to ~4 ns. In practise, these tiny nanosecond overheads won't matter at all, so the improvement here is only for perfectionists. EDIT: likely `LogDensityAt` -> `logdensity_internal` reduced logdensity's overhead by 20 ns in TuringLang#1363 --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

shravanngoswamii

@yebai few suggestions and improvement comments.

shravanngoswamii · 2026-05-23T15:23:49Z

Mooncake and Enzyme are faster on this PR. So the 10× regression reported in comments are no longer valid.

main branch d2052a1:
Model                      dim    linked     primal    FwdDiff    RvsDiff    Mooncake    Enzyme
-----------------------------------------------------------------------------------------------
Simple assume observe*       1     false    3.09 ns      12.60    1542.87       40.05      8.81
Simple assume observe*       1      true    3.09 ns      12.88    1690.22       43.55      9.27

This PR 3bdeb74:
Model                      dim    linked     primal    FwdDiff    RvsDiff    Mooncake    Enzyme
-----------------------------------------------------------------------------------------------
Simple assume observe*       1     false    3.26 ns      12.76    2101.00       37.30      8.23
Simple assume observe*       1      true    3.02 ns      13.37    1832.40       38.87      8.85

My system:

julia> versioninfo()
Julia Version 1.12.6
Commit 15346901f00 (2026-04-09 19:20 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 12th Gen Intel(R) Core(TM) i5-12500H
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, alderlake)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 16 virtual cores)

julia>

Co-authored-by: Shravan Goswami <123811742+shravanngoswamii@users.noreply.github.com>

Drop redundant inner `using AbstractPPL: AbstractPPL` (already imported at the top of the file) and reuse `ldf.model` instead of constructing a second `tiny()` instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yebai · 2026-05-25T10:24:43Z

Mooncake and Enzyme are faster on this PR. So the 10× regression reported in comments are no longer valid.

To clarify, these metrics were never valid; they arose from a misunderstanding. For some models, log-density evaluations occur on the order of nanoseconds, making the gradient-to-primal ratios highly noisy across runs and machines. For these tiny models, the primal and gradient timings themselves are more reliable. More generally, these benchmarks are intended primarily to catch obvious allocation or type-stability regressions and are meant for DynamicPPL developers only.

Replace the alarmist warning with a gentler note clarifying that the benchmarks are aimed at DynamicPPL developers and primarily catch allocation or type-stability regressions. Move the `*`-row explanation out of the per-PR comment into the README's interpretation notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot assigned yebai Apr 21, 2026

Merge remote-tracking branch 'origin/main' into adproblems-interface

e1cac2f

# Conflicts: # Project.toml # src/logdensityfunction.jl

yebai force-pushed the adproblems-interface branch from 2340748 to e1cac2f Compare April 21, 2026 10:51

yebai mentioned this pull request Apr 21, 2026

Is _use_closure still necessary with native gradient API #1364

Closed

yebai and others added 2 commits April 23, 2026 19:28

Merge branch 'main' into adproblems-interface

3436cb7

yebai and others added 15 commits April 26, 2026 21:55

Merge branch 'main' into adproblems-interface

6e01f94

Split AD integration tests

a11e9cc

Fix benchmark dependency sources

00969e0

Format logdensityfunction test

ad29168

format

33f262c

Merge branch 'main' into adproblems-interface

c3b3016

Add DifferentiationInterface to benchmarks

187bd3d

Merge branch 'main' into adproblems-interface

0b597b9

Use AbstractPPL gradient bang API

aa3af76

Update LogDensityFunction gradient evaluation to match the AbstractPPL evaluator interface and make the floattypes environment resolve that source branch explicitly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Merge branch 'improve-benchmarks-1374' into adproblems-interface

a50b023

# Conflicts: # benchmarks/Project.toml # benchmarks/src/DynamicPPLBenchmarks.jl

Add DifferentiationInterface to benchmarks env

49b1837

Required as a direct dep so the benchmarks project resolves cleanly without a manual `Pkg.resolve()` after `Pkg.instantiate()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yebai force-pushed the adproblems-interface branch from 976a8eb to 49b1837 Compare May 4, 2026 22:52

yebai and others added 5 commits May 5, 2026 10:53

Merge remote-tracking branch 'origin/main' into adproblems-interface

bc79c56

# Conflicts: # .github/workflows/Benchmarking.yml # benchmarks/Project.toml # benchmarks/benchmarks.jl # src/test_utils/ad.jl

yebai and others added 8 commits May 19, 2026 12:32

Merge branch 'main' into adproblems-interface

8b418df

Merge branch 'main' into adproblems-interface

971f8a9

Skip dims check in LogDensityAt deprecation shim

54a0820

Pass `check_dims=false` to match the pre-deprecation `LogDensityAt`'s unchecked call behavior; callers that relied on it shouldn't suddenly hit a `DimensionMismatch`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop Bijectors source pins now that 0.16 is registered

02f6bc2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yebai marked this pull request as ready for review May 21, 2026 19:54

Point Enzyme CI workflow at relocated integration test

3bdeb74

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yebai requested a review from shravanngoswamii May 22, 2026 09:11

shravanngoswamii reviewed May 23, 2026

View reviewed changes

Comment thread src/logdensityfunction.jl

Comment thread src/logdensityfunction.jl Outdated

Comment thread src/logdensityfunction.jl Outdated

shravanngoswamii reviewed May 23, 2026

View reviewed changes

Comment thread src/logdensityfunction.jl Outdated

shravanngoswamii reviewed May 23, 2026

View reviewed changes

Comment thread test/logdensityfunction.jl

yebai and others added 3 commits May 25, 2026 10:54

Apply suggestions from code review

0f875a5

Co-authored-by: Shravan Goswami <123811742+shravanngoswamii@users.noreply.github.com>

Update test/logdensityfunction.jl

dfe0684

Co-authored-by: Shravan Goswami <123811742+shravanngoswamii@users.noreply.github.com>

Clean up LogDensityAt deprecation shim test

f485f5c

Drop redundant inner `using AbstractPPL: AbstractPPL` (already imported at the top of the file) and reuse `ldf.model` instead of constructing a second `tiny()` instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

shravanngoswamii approved these changes May 25, 2026

View reviewed changes

yebai and others added 2 commits May 25, 2026 11:31

Link transform-strategy TODO to issue and drop benchmark header

8d27685

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yebai added this pull request to the merge queue May 25, 2026

Merged via the queue into main with commit a115fa8 May 25, 2026
23 of 24 checks passed

yebai deleted the adproblems-interface branch May 25, 2026 11:40

shravanngoswamii mentioned this pull request May 25, 2026

Update HISTORY.md for 0.42.0 #1408

Open

yebai mentioned this pull request May 25, 2026

Use AbstractPPL evaluator interface TuringLang/AdvancedVI.jl#255

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use AbstractPPL AD interface#1363

Use AbstractPPL AD interface#1363
yebai merged 58 commits into
mainfrom
adproblems-interface

yebai commented Apr 21, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 21, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

shravanngoswamii left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shravanngoswamii commented May 23, 2026

Uh oh!

Uh oh!

yebai commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yebai commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Root cause

Uh oh!

codecov Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

shravanngoswamii left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shravanngoswamii commented May 23, 2026

Uh oh!

Uh oh!

yebai commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yebai commented Apr 21, 2026 •

edited

Loading

codecov Bot commented Apr 21, 2026 •

edited

Loading