Skip to content

Use AbstractPPL AD interface#1363

Merged
yebai merged 58 commits into
mainfrom
adproblems-interface
May 25, 2026
Merged

Use AbstractPPL AD interface#1363
yebai merged 58 commits into
mainfrom
adproblems-interface

Conversation

@yebai
Copy link
Copy Markdown
Member

@yebai yebai commented Apr 21, 2026

Migrates DynamicPPL's AD plumbing to the new AbstractPPL.prepare / value_and_gradient!! interface introduced in AbstractPPL 0.15, retiring DynamicPPL's direct dependency on DifferentiationInterface.

The LogDensityFunction constructor now hands AD preparation to AbstractPPL.prepare(adtype, logdensity_internal, x; context=..., check_dims=false) and LogDensityProblems.logdensity_and_gradient simply forwards to AbstractPPL.value_and_gradient!!. The internal _use_closure heuristic that picked between Base.Fix1 closures and DI.Constant arguments per backend is removed — that decision now lives in AbstractPPL. logdensity_at is renamed to logdensity_internal (with a const alias for compatibility), and LogDensityAt becomes a deprecation shim that returns an AbstractPPL.Evaluators.VectorEvaluator.

Replacing #1354. Tests currently fail due to #1364

Notes: The primal and gradient times are identical to main except for tiny models, where this branch beats main for logdensity evaluation while gradient remains comparable.

Details


The apparent AD regression in the tiny linked row is mostly caused by a much faster
primal denominator, not by slower raw gradients.

Reported row:

main:    primal = 23.6 ns, Mooncake ratio = 7.16   -> raw grad ~= 169 ns
branch:  primal = 4.63 ns, Mooncake ratio = 33.43 -> raw grad ~= 155 ns

So Mooncake is slightly faster in raw time for this row. The ratio looks worse because
t(logdensity) became tiny.

Root cause

On Julia 1.11, old main's optimised primal still contains a call boundary:

invoke DynamicPPL.tilde_assume!!(...)

Current branch inlines through the same path, so the one-dimensional linked model reduces
to loading params[1] and evaluating two scalar Normal logpdfs.

Local Julia 1.11.9 checks:

old main e2d4b5da:
  primal ~= 9.4 ns

old main + monkey-patched @inline tilde_assume!!:
  primal ~= 3.2 ns

current branch:
  primal ~= 2.5 ns

Adding @inline only to LogDensityProblems.logdensity did not materially change old
main. The important boundary is the assume/init path.

- remove DifferentiationInterface from Project.toml deps and compat
- replace DI gradient preparation/evaluation with AbstractPPL.prepare and AbstractPPL.value_and_gradient
- reuse LogDensityAt and closures without local AbstractPPL.prepare piracy
- guard the Mooncake precompile workload until AbstractPPLMooncakeExt is loaded
- pin AbstractPPL to the evaluator-interface branch in [sources] for this environment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts:
#	Project.toml
#	src/logdensityfunction.jl
@yebai yebai force-pushed the adproblems-interface branch from 2340748 to e1cac2f Compare April 21, 2026 10:51
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.70%. Comparing base (d2052a1) to head (8d27685).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1363      +/-   ##
==========================================
- Coverage   82.30%   81.70%   -0.61%     
==========================================
  Files          50       50              
  Lines        3543     3536       -7     
==========================================
- Hits         2916     2889      -27     
- Misses        627      647      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

yebai and others added 2 commits April 23, 2026 19:28
Fix #1364 

TODO 
- [x] use `main` as base, as otherwise benchmarking would fail, need to
change back to #1364 before merging.
- ~~Julia 1.10’s Pkg does not reliably apply the root [sources] override
during build/test, so min CI fails.~~

This change switches `LogDensityFunction` AD preparation to pass a
structural `LogDensityAt` problem directly into
`AbstractPPL.prepare(...)` instead of wrapping it in backend-specific
anonymous closures. It also removes the old `_use_closure` machinery,
since the structural problem object now provides the stable one-argument
evaluator shape that AD backends need.

The accompanying test coverage adds a focused `ReverseDiff` check to
confirm that `AutoReverseDiff(; compile=true)` still retains a compiled
tape and can be reused across repeated `logdensity_and_gradient` calls.

Benchmarks 


```
  +----------------------+------------+--------------+--------+--------------+
  | Backend              | PR grad μs | main grad μs |  Ratio | Summary      |
  +----------------------+------------+--------------+--------+--------------+
  | ForwardDiff          |      0.322 |        0.322 |  1.001x | 0.1% slower  |
  | Forward Mooncake     |      0.900 |        2.601 |  0.346x | 2.89x faster |
  | Forward Enzyme       |      0.789 |        1.051 |  0.751x | 24.9% faster |
  | ReverseDiff          |     14.239 |       15.225 |  0.935x | 6.5% faster  |
  | ReverseDiff compiled |      5.355 |        5.872 |  0.912x | 8.8% faster  |
  | Reverse Mooncake     |      0.892 |        0.928 |  0.961x | 3.9% faster  |
  | Reverse Enzyme       |      1.253 |        1.268 |  0.988x | 1.2% faster  |
  +----------------------+------------+--------------+--------+--------------+
```

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

DynamicPPL.jl documentation for PR #1363 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1363/

yebai and others added 15 commits April 26, 2026 21:55
Update LogDensityFunction gradient evaluation to match the AbstractPPL evaluator interface and make the floattypes environment resolve that source branch explicitly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the base-vs-head comparison entirely. The benchmark workflow now
runs once on the PR head, on a pinned `ubuntu-22.04` runner, and reports
absolute log-density times plus gradient/log-density ratios in the
posted comment. Output schema follows Mooncake's bench harness; readers
compare against recent main-branch comments to spot regressions.

Noise reduction in `run_ad`: per-sample incremental GC teardown and a
full GC before each measurement keep accumulated garbage from triggering
mid-sample collections that inflate individual samples. Adds a
`benchmark_seconds` knob for tightening the median estimate. Also
removes the synthetic reference timing that normalised eval times
against a non-DPPL function.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline `Models.jl` and `DynamicPPLBenchmarks.jl` into `benchmarks.jl` and
convert `benchmarks/Project.toml` from a package to a flat environment,
mirroring Mooncake.jl's `bench/run_benchmarks.jl` layout.

Also: take dim from `length(r.params)` (run_ad already constructed the LDF)
so models are no longer evaluated twice on the success path; switch results
to NamedTuples so `print_results` reads `r.name`/`r.dim`/...; extract
`transform_strategy(islinked)` helper; drop unused imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	benchmarks/Project.toml
#	benchmarks/src/DynamicPPLBenchmarks.jl
Run a full cross-product of the 9 model configs × 4 AD backends ×
{linked, unlinked} = 72 rows, ordered model → linked → backend so each
model's eight rows are adjacent for side-by-side inspection.
`:reversediff_compiled` is excluded because compiled tapes are
input-dependent and silently produce wrong gradients on
parameter-dependent control flow (see CLAUDE.md).

Per-row logging mirrors Mooncake's `bench/run_benchmarks.jl`: an
`(i / N, name, (linked = …))` header, the backend on its own line,
then `t(logdensity)` / `t(grad)` formatted with units. `model_dimension`
is now defensive (returns `missing` on init failures) and the table
formats `missing` dims as `err`, so combos that crash during dimension
lookup still produce a well-formed row instead of derailing the run.

Also: add a `setup` stage to `run_ad`'s Chairmarks pipeline that
deep-copies `params` per sample, matching Mooncake's harness — setup
runs before the timed window, so the copy is excluded from
measurements. Widen `combos` to a typed `Tuple{...}[]` so it accepts
models with non-default contexts (e.g. the `condition`-wrapped
`loop_univariate`/`multivariate` rows).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apsible full table

LDA's discrete `Categorical` RVs make `linked = false` ill-defined for
gradient-based AD, so all four backends previously errored on those
rows. Skip them at combination time, leaving 68 rows.

In markdown mode, emit a `### Gist: Smorgasbord` block with just that
model's eight rows (Smorgasbord covers the broadest set of DPPL
features, so it is the most informative single row band), then put the
full 68-row table inside `<details><summary>` so it is collapsed by
default in GitHub PR comments. Plain (non-markdown) output is
unchanged. Drop the now-redundant `### Benchmark Results` heading from
the workflow body since the script emits its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Required as a direct dep so the benchmarks project resolves cleanly
without a manual `Pkg.resolve()` after `Pkg.instantiate()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai force-pushed the adproblems-interface branch from 976a8eb to 49b1837 Compare May 4, 2026 22:52
yebai and others added 5 commits May 5, 2026 10:53
The Benchmarking comment now reads top-down as: SHA in the heading,
Smorgasbord gist as a level-3 section, explanatory paragraph, full
table as a level-3 section, and computer info as a foldable <details>
with an inline <b>-styled summary (avoids markdown inside <details>,
which renders inconsistently). The two integration jobs added to
CI.yml were still on setup-julia@v2; bumped to @V3 for consistency
with the main job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits Benchmarking.yml into three jobs (benchmark-pr, benchmark-main,
post-comment). The PR-comment now shows PR-head numbers up top and
main's numbers in a foldout below, with a column legend and labelled
SHAs. If benchmark-main fails (e.g. transitionally before this branch
lands on main, since main's bench script does not yet support markdown
mode), post-comment still posts PR-head numbers and notes the main job
result inline.

Workflow hardening: concurrency group cancels superseded PR runs,
60min timeout per bench job, explicit pull-requests: write permission
so fork PRs can post comments.

Body assembly moved out of the YAML literal block scalar into a shell
heredoc using `body-path:` (peter-evans pattern), avoiding env-var
multi-line interpolation. Bench script no longer prints the obsolete
"compare against main-branch PR run" paragraph -- main is now in the
same comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	.github/workflows/Benchmarking.yml
#	benchmarks/Project.toml
#	benchmarks/benchmarks.jl
#	src/test_utils/ad.jl
The 68-row table dominates the PR comment vertically. Wrap it in a
<details><summary> block so it is collapsed by default, leaving the
Smorgasbord gist as the at-a-glance view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Long-form table (one row per (model, linked, backend)) becomes a
pivoted table where each (Model, Dim, Linked) row spans all four AD
backends as columns. 68 rows collapse to 17, the gist/full-table split
goes away (single unified table), and the gist constant + foldout
wrapper in benchmarks/benchmarks.jl are dropped.

`t(logdensity)` does not depend on the AD backend, so the four primal
samples per group are noise around a common value — take the minimum
as the most stable estimate. Per-backend `err` cells render
independently of the primal column.

Workflow comment restructure to match: SHA goes inline in the H2
title, "PR head"/"Main" preamble lines drop, main-branch results live
under their own H3 with a "Click to see" foldout (and a fallback H3
note if benchmark-main failed), Computer Information is its own H3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yebai and others added 8 commits May 19, 2026 12:32
Switch the `evaluator-interface` revision overrides in every Project.toml
to `main` now that the evaluator API has landed there. Also drop redundant
parens in a `@model` Type default in benchmarks/benchmarks.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the LogDensityAt struct with a deprecation shim that emits
`Base.depwarn` and returns an `AbstractPPL.Evaluators.VectorEvaluator`
wrapping a closure over `logdensity_internal`. The new path is the
sanctioned one now that AD prep flows through `AbstractPPL.prepare`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the closure-over-state form with the new `context::Tuple` kwarg
on `AbstractPPL.prepare`, which threads constants through to the problem
function. Equivalent semantics, no closure construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass `check_dims=false` to match the pre-deprecation `LogDensityAt`'s
unchecked call behavior; callers that relied on it shouldn't suddenly
hit a `DimensionMismatch`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move test/integration/{enzyme,reversediff}/ to test/ext/{DynamicPPLEnzymeCoreExt,DynamicPPLReverseDiffExt}/ and convert the MarginalLogDensities split (subdir env + sibling script) to a single test/ext/DynamicPPLMarginalLogDensitiesExt/{Project.toml,main.jl}.

Each integration env now follows the same layout. CI.yml updated accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the `AbstractPPL = {url = ..., rev = "main"}` overrides from all
eight Project.toml files. Compat is already at 0.15, so the resolver
picks up the registered release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot pushed a commit to Stars1233/DynamicPPL.jl that referenced this pull request May 19, 2026
Fresh-buffer benchmark setup
- In `DynamicPPL.TestUtils.AD.run_ad`, use `setup = deepcopy($params)`
for both the primal and gradient `@be ` calls so each Chairmarks sample
starts from a fresh input buffer instead of reusing the same vector
across samples. Matches Mooncake's bench harness; setup runs outside the
timed window, so the copy isn't measured.

Plus some minor tweaks. 


~~This appears to have a noticeable impact on log-density evaluation for
tiny models, first noticed in TuringLang#1363~~: e.g., `simple assume
observe(linked = true)` logdensity improves from ~20 ns to ~4 ns. In
practise, these tiny nanosecond overheads won't matter at all, so the
improvement here is only for perfectionists.

EDIT: likely `LogDensityAt` -> `logdensity_internal` reduced
logdensity's overhead by 20 ns in TuringLang#1363

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai marked this pull request as ready for review May 21, 2026 19:54
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai requested a review from shravanngoswamii May 22, 2026 09:11
Copy link
Copy Markdown
Member

@shravanngoswamii shravanngoswamii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yebai few suggestions and improvement comments.

Comment thread src/logdensityfunction.jl
Comment thread src/logdensityfunction.jl Outdated
Comment thread src/logdensityfunction.jl Outdated
Comment thread src/logdensityfunction.jl Outdated
@shravanngoswamii
Copy link
Copy Markdown
Member

Mooncake and Enzyme are faster on this PR. So the 10× regression reported in comments are no longer valid.

main branch d2052a1:
Model                      dim    linked     primal    FwdDiff    RvsDiff    Mooncake    Enzyme
-----------------------------------------------------------------------------------------------
Simple assume observe*       1     false    3.09 ns      12.60    1542.87       40.05      8.81
Simple assume observe*       1      true    3.09 ns      12.88    1690.22       43.55      9.27

This PR 3bdeb74:
Model                      dim    linked     primal    FwdDiff    RvsDiff    Mooncake    Enzyme
-----------------------------------------------------------------------------------------------
Simple assume observe*       1     false    3.26 ns      12.76    2101.00       37.30      8.23
Simple assume observe*       1      true    3.02 ns      13.37    1832.40       38.87      8.85

My system:

julia> versioninfo()
Julia Version 1.12.6
Commit 15346901f00 (2026-04-09 19:20 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 12th Gen Intel(R) Core(TM) i5-12500H
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, alderlake)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 16 virtual cores)

julia> 

Comment thread test/logdensityfunction.jl
yebai and others added 3 commits May 25, 2026 10:54
Co-authored-by: Shravan Goswami <123811742+shravanngoswamii@users.noreply.github.com>
Co-authored-by: Shravan Goswami <123811742+shravanngoswamii@users.noreply.github.com>
Drop redundant inner `using AbstractPPL: AbstractPPL` (already imported
at the top of the file) and reuse `ldf.model` instead of constructing a
second `tiny()` instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai
Copy link
Copy Markdown
Member Author

yebai commented May 25, 2026

Mooncake and Enzyme are faster on this PR. So the 10× regression reported in comments are no longer valid.

To clarify, these metrics were never valid; they arose from a misunderstanding. For some models, log-density evaluations occur on the order of nanoseconds, making the gradient-to-primal ratios highly noisy across runs and machines. For these tiny models, the primal and gradient timings themselves are more reliable. More generally, these benchmarks are intended primarily to catch obvious allocation or type-stability regressions and are meant for DynamicPPL developers only.

yebai and others added 2 commits May 25, 2026 11:31
Replace the alarmist warning with a gentler note clarifying that the
benchmarks are aimed at DynamicPPL developers and primarily catch
allocation or type-stability regressions. Move the `*`-row explanation
out of the per-PR comment into the README's interpretation notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai added this pull request to the merge queue May 25, 2026
Merged via the queue into main with commit a115fa8 May 25, 2026
23 of 24 checks passed
@yebai yebai deleted the adproblems-interface branch May 25, 2026 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants