fix: resolve SparseDistribution crash on zero/NaN probability mass #83
Jonathangadeaharder wants to merge 1 commit
xlinbsd left a comment:
Reproduced on v0.3.7; the manual patch from PR #83 works.
Environment: M5 Pro 64 GB, macOS 26.5, Youssofal/Qwen3.6-27B-MTPLX-Optimized-Quality, --profile sustained, --reasoning off.
After a few minutes of agentic sessions the server crashes with ValueError: SparseDistribution probabilities must have positive mass, and the client gets Connection reset by peer (os error 54).
Applied the patch from PR #83 manually to site-packages/mtplx/sampling.py — two changes:
1. SparseDistribution.__post_init__ now falls back instead of raising:
    # before
    if not np.isfinite(total) or total <= 0:
        raise ValueError("SparseDistribution probabilities must have positive mass")
    # after (mirrors fast_sampling.py, which already has this fix)
    if not np.isfinite(total) or total <= 0:
        token_ids = token_ids[:1] if token_ids.size > 0 else np.array([0], dtype=np.int64)
        probs = np.array([1.0], dtype=np.float64)
        total = 1.0
2. residual_distribution: add a NaN guard to all 4 "if total <= 0:" checks:
    if total <= 0 or not np.isfinite(total):
Sessions are now stable. Note that fast_sampling.py already has the equivalent fallback at lines 98-100, 151-153 and 208-211; sampling.py was just missing it.
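For anyone wondering why the extra check in change 2 is needed: every ordered comparison against NaN evaluates to false, so a bare "total <= 0" lets a NaN total slip past the fallback. The sketch below illustrates this with the standard library only. The helper names are hypothetical and are not part of mtplx; math.isfinite stands in for np.isfinite, which applies the same rule to a scalar.

```python
import math


def needs_fallback_old(total: float) -> bool:
    # Original guard: only catches zero or negative mass.
    return total <= 0


def needs_fallback_new(total: float) -> bool:
    # Patched guard: also catches NaN and +/-inf.
    return total <= 0 or not math.isfinite(total)


if __name__ == "__main__":
    for total in (0.5, 0.0, -1.0, math.nan, math.inf):
        print(total, needs_fallback_old(total), needs_fallback_new(total))
```

With the old guard, NaN and inf totals are treated as valid mass; with the patched guard they take the fallback path, while a normal total such as 0.5 is unaffected.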
Please merge PR #83.
Resolves #82
Changes
- SparseDistribution constructor: Made SparseDistribution.__post_init__ fall back to a valid one-hot distribution on the first available token (greedy choice) if the sum of the probabilities is not finite or <= 0, rather than raising a crash-inducing ValueError.
- residual_distribution: Added not np.isfinite(total) checks to the fallback paths in residual_distribution (sparse, dense, and non-sparse branches) to prevent NaN values from bypassing target fallbacks.
- Tests: Added tests in tests/test_sampling.py verifying fallback recovery on zero/NaN probability mass, as well as a mock test for residual_distribution NaN robustness.

All tests pass successfully.
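The tests themselves are not shown in this description. As a rough sketch of the behaviour they are described as covering, the block below uses a hypothetical stand-in class, SparseDistributionSketch, which copies only the fallback logic quoted in this PR. It is not the real mtplx.sampling.SparseDistribution, whose other fields and normalization details are assumptions here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SparseDistributionSketch:
    """Hypothetical stand-in; NOT the real mtplx.sampling.SparseDistribution."""

    token_ids: np.ndarray
    probs: np.ndarray

    def __post_init__(self) -> None:
        token_ids = np.asarray(self.token_ids, dtype=np.int64)
        probs = np.asarray(self.probs, dtype=np.float64)
        total = float(probs.sum())
        if not np.isfinite(total) or total <= 0:
            # Degenerate mass: fall back to a one-hot distribution on the
            # first available token (token 0 if there are no tokens at all).
            token_ids = token_ids[:1] if token_ids.size > 0 else np.array([0], dtype=np.int64)
            probs = np.array([1.0], dtype=np.float64)
            total = 1.0
        self.token_ids = token_ids
        # Normalization is an assumption of this sketch.
        self.probs = probs / total


def check_fallbacks() -> None:
    # Zero mass falls back to the first token.
    zero = SparseDistributionSketch(np.array([7, 9]), np.array([0.0, 0.0]))
    assert zero.token_ids.tolist() == [7] and zero.probs.tolist() == [1.0]
    # NaN mass falls back the same way instead of raising.
    nan = SparseDistributionSketch(np.array([7, 9]), np.array([np.nan, 0.5]))
    assert nan.token_ids.tolist() == [7] and nan.probs.tolist() == [1.0]
    # A healthy distribution is left alone apart from normalization.
    ok = SparseDistributionSketch(np.array([1, 2]), np.array([1.0, 3.0]))
    assert ok.token_ids.tolist() == [1, 2] and ok.probs.tolist() == [0.25, 0.75]


if __name__ == "__main__":
    check_fallbacks()
```

The fallback trades correctness of the sampled distribution for availability: a degenerate input yields a deterministic token rather than a server crash.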