feat(processor): voice-activity-driven gate with adaptive afftdn and band measurement #130
Merged
Pass 1 now measures and exports voiced percentiles (p10), noise percentiles (p95), and their separation; speech gate threshold anchors to voiced p10 minus 6 dB margin, ensuring no voice attenuation.

Gate depth fixed 14 dB (transparent band midpoint), reducing to 8 dB on narrow gaps (separation < 12 dB) to prevent floor pumping; old aggression tiers and proportional depth removed. Release fixed 200 ms (hold folded in), attack 5 ms, ratio capped 2.0 (gentle 1.5 for wide LRA), gentle mode override deleted.

TUI status row renamed from "Soft gate" to "Gate depth". AGENTS.md and Pipeline.md updated for the new threshold/depth/release strategy and speech-aware metrics.

Acceptance criteria validated on 55-stem corpus: candidate sorting deterministic, de-esser engages on sibilance (band excess), gate window stable on silence and speech overlap.

Signed-off-by: Martin Wimpress <code@wimpress.io>
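The threshold, depth, and ratio rules in this commit message can be sketched as a single function. This is an illustration, not the repository's `tuneSpeechGate`: the struct, constant, and function names are hypothetical, separation is assumed to be voiced p10 minus noise p95, and the 15 LU boundary for "wide LRA" is taken from the PR summary.

```go
package main

import "fmt"

// GateParams is a hypothetical container for the gate settings named in the
// commit message; the real struct in the repository may differ.
type GateParams struct {
	ThresholdDB float64 // level the gate is anchored to
	DepthDB     float64 // maximum attenuation when the gate is closed
	Ratio       float64
	AttackMS    float64
	ReleaseMS   float64
}

const (
	voicedMarginDB     = 6.0  // margin below voiced p10
	fullDepthDB        = 14.0 // fixed depth
	narrowDepthDB      = 8.0  // reduced depth on narrow gaps
	narrowSeparationDB = 12.0 // separation below this counts as narrow
	ratioCap           = 2.0
	gentleRatio        = 1.5
	wideLRA            = 15.0 // LU; above this the gentler ratio applies
)

// tuneGate derives gate settings from Pass 1 measurements.
// Assumption: separation = voiced p10 - noise p95.
func tuneGate(voicedP10, noiseP95, lra float64) GateParams {
	p := GateParams{
		ThresholdDB: voicedP10 - voicedMarginDB,
		DepthDB:     fullDepthDB,
		Ratio:       ratioCap,
		AttackMS:    5,
		ReleaseMS:   200,
	}
	if voicedP10-noiseP95 < narrowSeparationDB {
		p.DepthDB = narrowDepthDB // shallower gate avoids floor pumping
	}
	if lra > wideLRA {
		p.Ratio = gentleRatio
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", tuneGate(-30, -50, 8))  // wide gap, normal LRA
	fmt.Printf("%+v\n", tuneGate(-30, -40, 18)) // narrow gap, wide LRA
}
```

Because depth, attack, and release are constants, the only per-file inputs are the two percentiles and the LRA, which keeps the tuning easy to audit.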
…sync gate filter comments
- Remove SpeechGateAggression field from AdaptiveDiagnostics (no longer computed)
- Remove its zero-initialisation in tuneSpeechGate
- Update buildSpeechGateFilter comments to reflect current design (LRA-based ratio, fixed attack/release/knee)
- Reference tuneSpeechGate directly instead of generic "adaptive.go"
Signed-off-by: Martin Wimpress <code@wimpress.io>
The noise-floor seed read unweighted RMS while the VAD split, floor, and
the protective margin run on K-weighted momentary LUFS, so a single 6 dB
margin spanned two scales and miscalibrated the noise floor by the
per-file spectral offset. The seed also lacked a floored-interval guard,
so voice-activated captures degenerated it to -120 dBFS.
- Seed now reads MomentaryLUFS, with a floored-interval guard that
excludes digital silence; fully-gated captures fall back to a
non-clamping sentinel so the Otsu split is placed freely.
- speechMinimumNoiseMarginDB 6.0 -> 2.0: the former 6 encoded the
RMS-to-LUFS offset and over-clamped the split once the seed moved
onto the LUFS axis.
Calibration fix: 53/55 corpus stems byte-identical, the noise floor
moves onto the honest axis, and the three voice-activated TT-202 stems
return to their baseline floor instead of the degenerate -120 seed. Two
low-separation stems re-split a shade gentler (lower threshold, never
into speech). Tests updated; build, test, lint green.
Signed-off-by: Martin Wimpress <code@wimpress.io>
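A minimal sketch of the floored-interval guard and the 2 dB margin, under stated assumptions: the seed statistic (here the quietest non-floored interval), the -120 floor value, and the helper names `noiseFloorSeed` and `clampSplit` are all hypothetical; only `speechMinimumNoiseMarginDB` and its value come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

const (
	digitalSilenceLUFS         = -120.0 // assumed floor for gated intervals
	speechMinimumNoiseMarginDB = 2.0    // value from the commit message
)

// noiseFloorSeed returns the quietest momentary-LUFS interval that is not
// digital silence. ok is false when every interval is floored (a fully
// gated capture), in which case the caller should not clamp the split.
func noiseFloorSeed(momentary []float64) (seed float64, ok bool) {
	seed = math.Inf(1)
	for _, v := range momentary {
		if math.IsNaN(v) || v <= digitalSilenceLUFS {
			continue // floored-interval guard
		}
		if v < seed {
			seed = v
		}
	}
	if math.IsInf(seed, 1) {
		return 0, false
	}
	return seed, true
}

// clampSplit keeps the speech/noise split at least the margin above the
// seed. Without a valid seed the split is returned untouched.
func clampSplit(split, seed float64, ok bool) float64 {
	if !ok {
		return split
	}
	return math.Max(split, seed+speechMinimumNoiseMarginDB)
}

func main() {
	seed, ok := noiseFloorSeed([]float64{-120, -62.5, -58, math.Inf(-1)})
	fmt.Println(seed, ok, clampSplit(-70, seed, ok))

	seed, ok = noiseFloorSeed([]float64{-120, -120})
	fmt.Println(seed, ok, clampSplit(-70, seed, ok))
}
```

Both the seed and the margin are on the LUFS axis here, so the margin no longer has to absorb an RMS-to-LUFS offset.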
… from measured floor
Noise reduction was fixed; add a tuneNoiseReduction step that adapts the
afftdn FFT stage per file from Pass 1 measurements:
- Disable afftdn when Noise.VoiceActivated is true. Platform-gated
captures have digital-silence gaps, so afftdn has no floor to lower
and track_noise warbles on true silence. Corpus measurement: afftdn
helps 11 stems, is neutral on 39, and harms only the voice-activated
captures, so it is dropped exactly there (anlmdn alone; the Denoise
row reads "NLM") and left on everywhere it helps.
- When enabled, set afftdn nf from the measured Noise.Floor (momentary
LUFS, re-clamped to afftdn's [-80, -20]) with track_noise off, so it
subtracts against the measured floor instead of self-tracking. A/B
validated: floor about 1 dB deeper on average, speech identical, no
added warble.
nr stays fixed at 12. New diagnostics: afftdn_enabled,
afftdn_noise_floor_db, afftdn_disable_reason. Report and docs updated;
build, test, lint green.
Signed-off-by: Martin Wimpress <code@wimpress.io>
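As a rough illustration of the two rules above, the sketch below builds an afftdn filter string from a measured floor. The function name `buildAfftdn` and the `"voice_activated"` reason string are hypothetical; `nr`, `nf`, and `tn` are afftdn's short option names for noise reduction, noise floor, and track_noise.

```go
package main

import (
	"fmt"
	"math"
)

const (
	afftdnMinFloorDB = -80.0 // lower bound of afftdn's nf range
	afftdnMaxFloorDB = -20.0 // upper bound of afftdn's nf range
	afftdnReduction  = 12    // nr stays fixed
)

// buildAfftdn returns the filter string, or an empty string plus a reason
// when afftdn is dropped for a voice-activated capture.
func buildAfftdn(voiceActivated bool, floorLUFS float64) (filter, disableReason string) {
	if voiceActivated {
		// Digital-silence gaps leave afftdn no floor to lower.
		return "", "voice_activated"
	}
	nf := math.Min(afftdnMaxFloorDB, math.Max(afftdnMinFloorDB, floorLUFS))
	// tn=0 keeps noise tracking off so afftdn subtracts against nf.
	return fmt.Sprintf("afftdn=nr=%d:nf=%.1f:tn=0", afftdnReduction, nf), ""
}

func main() {
	for _, floor := range []float64{-62.5, -95, -10} {
		f, _ := buildAfftdn(false, floor)
		fmt.Println(f)
	}
	f, reason := buildAfftdn(true, -62.5)
	fmt.Printf("%q %q\n", f, reason)
}
```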
When the elected room-tone region is trustworthy, afftdn now runs
nt=custom with a per-band noise profile measured from that room tone,
instead of assuming a flat white spectrum, so it subtracts noise
matching the actual room colour.
- measureNoiseBands: region-scoped 15-band RMS decode of the elected
room-tone region (afftdn band centres 80 Hz to 24 kHz), stored on
NoiseProfile. buildAfftdnBandNoise emits the relative bn shape
(clip(band - mean, +-24 dB)); the measured nf still carries the
level and nr=12 the depth.
- Non-finite bands (the 24 kHz band sits above the band-limit and
Nyquist) are excluded from the mean and emitted flat, never NaN;
BandsMeasured requires at least 10 of 15 finite bands.
- Conditional: custom only when not voice-activated, gate separation
>= 12 dB, and room-tone flatness >= 0.45; otherwise the white + nf
path. Voice-activated captures keep afftdn disabled.
A/B vs the white+nf path: 50 stems custom, 2 white fallback, 3
disabled; 36 improved, 14 unchanged, 0 regressed, no warble. The afftdn
HELPS stems improved (BF-08-stephen under-speech floor down ~7 dB).
New diagnostics afftdn_noise_type and afftdn_band_noise; report and docs
updated. build, test, lint green.
Signed-off-by: Martin Wimpress <code@wimpress.io>
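The relative band shape described above, clip(band - mean, +-24 dB) with non-finite bands excluded from the mean and emitted flat, might look like the sketch below. It is not the repository's `buildAfftdnBandNoise`: the function name, the `missing` marker, the one-decimal formatting, and the `|` separator are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

const (
	afftdnBands    = 15   // afftdn takes a 15-band custom profile
	minFiniteBands = 10   // below this the profile is not trusted
	bandClipDB     = 24.0 // clip range for the relative shape
)

// missing marks a band whose RMS decode produced no finite level,
// for example a band above the band-limit.
var missing = math.Inf(-1)

func finite(x float64) bool { return !math.IsNaN(x) && !math.IsInf(x, 0) }

// bandNoiseShape converts absolute per-band levels into a relative profile.
// ok is false when too few bands are finite.
func bandNoiseShape(bands [afftdnBands]float64) (profile string, ok bool) {
	sum, n := 0.0, 0
	for _, b := range bands {
		if finite(b) {
			sum += b
			n++
		}
	}
	if n < minFiniteBands {
		return "", false
	}
	mean := sum / float64(n)
	parts := make([]string, afftdnBands)
	for i, b := range bands {
		rel := 0.0 // non-finite bands are emitted flat, never NaN
		if finite(b) {
			rel = math.Max(-bandClipDB, math.Min(bandClipDB, b-mean))
		}
		parts[i] = strconv.FormatFloat(rel, 'f', 1, 64)
	}
	return strings.Join(parts, "|"), true
}

func main() {
	var bands [afftdnBands]float64
	bands[0] = -30 // one loud low band
	for i := 1; i < 14; i++ {
		bands[i] = -60
	}
	bands[14] = missing // top band has no finite level
	fmt.Println(bandNoiseShape(bands))
}
```

Only the shape goes into the profile; the absolute level stays with the measured nf, so the two settings do not double-count.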
…nded goroutines

Adds runBandMeasurements (shared semaphore sized runtime.NumCPU) to fan the 17 post-loop band decodes (2 speech via measureSpeechBands, 15 noise via measureNoiseBands) across cores while keeping concurrent decoders capped at the core count. Each band opens its own reader and writes only its own result slot, so no mutable state is shared and measured RMS values stay bit-identical to the serial path (scheduling only changed).

Caps Pass 1 decode-loop progress at 0.95 and reserves 0.95..1.0 for the band phase, with a bandProgressTracker (atomic counter, monotonic) emitting per-band ProgressUpdates so the progress bar advances smoothly from decode into bands. Early-return band functions drain their progress slots via drainBandProgress so the phase reaches 1.0 even when a band fails or the profile is empty.

Fixes the realtime-speed badge (⚡ ×N) in renderTimeline: Pass 1's capped progress under-reported decode throughput, so the badge showed false slowdowns during the fast band phase. speedFraction un-scales the capped progress back to true fraction (1.0 at the cap, clamped thereafter) so the badge reflects actual realtime speed across all passes.

Updates AGENTS.md architecture diagram and Pass 1 spec to document the bounded-goroutine discipline and progress mapping.

Signed-off-by: Martin Wimpress <code@wimpress.io>
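The bounded fan-out described in this commit follows a common Go pattern: a buffered channel used as a semaphore, with each task writing only its own slot. The sketch below shows that pattern in isolation; `bandTask` and `runBounded` are hypothetical names, and the real `runBandMeasurements` may be structured differently.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// bandTask stands in for one band decode; it returns the measured level.
type bandTask func() (float64, error)

// runBounded runs tasks with at most limit running concurrently. Task i
// writes only results[i] and errs[i], so no locking is needed. onDone, if
// set, is called from worker goroutines and must be safe for concurrent use.
func runBounded(tasks []bandTask, limit int, onDone func(done, total int)) ([]float64, []error) {
	if limit < 1 {
		limit = 1
	}
	results := make([]float64, len(tasks))
	errs := make([]error, len(tasks))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var done int64

	for i, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // acquire before spawning to cap live goroutines
		go func(i int, task bandTask) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i], errs[i] = task()
			n := atomic.AddInt64(&done, 1) // counter only ever increases
			if onDone != nil {
				onDone(int(n), len(tasks))
			}
		}(i, task)
	}
	wg.Wait()
	return results, errs
}

func main() {
	tasks := make([]bandTask, 17) // 2 speech + 15 noise bands
	for i := range tasks {
		level := -60.0 + float64(i)
		tasks[i] = func() (float64, error) { return level, nil }
	}
	results, _ := runBounded(tasks, runtime.NumCPU(), nil)
	fmt.Println(len(results), results[0], results[16])
}
```

Results are indexed by task, not by completion order, which is why scheduling changes cannot alter the measured values.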
Remove the optional room-tone-scan duration cap from the CLI, configuration, and analyser. This feature allowed limiting the noise-floor seed scan to an input prefix for faster processing on long files. With the default 0s (whole-file scan) as the only remaining behaviour, the cap adds no practical value.

Signed-off-by: Martin Wimpress <code@wimpress.io>
Contributor
1 issue found across 39 files
Confidence score: 5/5
- In internal/report/report_full.md.golden, the adaptation diagnostics currently hide the afftdn noise model, which could reduce observability when reviewing adaptation behavior after merge; update the golden/report output to include afftdn before merging to keep diagnostics complete.
Set AfftdnNoiseType in the test fixture to match production AdaptConfig output, resolving a blank cell in the Adaptation diagnostics table that conflicted with the Noise removal table's correct `w` value. Signed-off-by: Martin Wimpress <code@wimpress.io>
Contributor
0 issues found across 2 files (changes from recent commits).
Requires human review: Major refactoring of speech gate, noise reduction, and band measurement logic with 3694 lines changed. Core audio processing pipeline modified.
Summary
Redesigns the speech gate from aggression-tiered heuristics to a voiced-anchored
threshold model, unifies the noise-floor seed onto the K-weighted momentary LUFS
axis, and adds adaptive afftdn with custom noise profile generation from
measured room tone. Pass 1 now parallelises 17 post-loop band measurements
across CPU cores while keeping concurrent decoders capped at core count, and
removes the unused room-tone-scan-duration CLI flag.
Changes
Speech gate and noise-floor seed:
voiced p10 minus 6 dB margin (no voice attenuation); depth fixed 14 dB
(transparent band), reduces to 8 dB on narrow gaps (separation < 12 dB) to
prevent floor pumping; release fixed 200 ms (hold folded in), attack 5 ms,
ratio capped 2.0 (gentle 1.5 for wide LRA > 15 LU); remove old aggression
tiers and proportional depth
reducing speechMinimumNoiseMarginDB from 6.0 to 2.0 to match the new axis;
voice-activated captures fall back to sentinel instead of degenerating to
-120 dBFS
buildSpeechGateFilter comments to reflect current design
Adaptive afftdn and custom profiles:
have digital-silence gaps where afftdn warbles on true silence and offers no
floor benefit
[-80, -20]) with track_noise off, so it subtracts against measured floor
instead of self-tracking
RMS spectrum of elected room-tone region; buildAfftdnBandNoise emits
relative shape (clip(band - mean, +-24 dB)); conditional on not voice-
activated, separation >= 12 dB, and flatness >= 0.45
afftdn_disable_reason, afftdn_noise_type, afftdn_band_noise
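The three-way choice between disabled, white + nf, and custom profile can be written as one decision function. This is a sketch only: the `NoiseInputs` struct and function name are hypothetical, requiring `BandsMeasured` for the custom path is an assumption, and `"w"` / `"c"` are afftdn's short names for the white and custom noise types.

```go
package main

import "fmt"

const (
	minSeparationDB = 12.0 // gate separation needed for a custom profile
	minFlatness     = 0.45 // room-tone flatness needed for a custom profile
)

// NoiseInputs is a hypothetical view of the Pass 1 measurements involved.
type NoiseInputs struct {
	VoiceActivated bool
	SeparationDB   float64
	Flatness       float64
	BandsMeasured  bool // assumed prerequisite: enough finite bands
}

// afftdnNoiseType returns "" when afftdn is disabled, "c" for a custom
// band profile, and "w" for the white-spectrum fallback.
func afftdnNoiseType(in NoiseInputs) string {
	if in.VoiceActivated {
		return "" // anlmdn alone
	}
	if in.BandsMeasured && in.SeparationDB >= minSeparationDB && in.Flatness >= minFlatness {
		return "c"
	}
	return "w"
}

func main() {
	fmt.Printf("%q\n", afftdnNoiseType(NoiseInputs{SeparationDB: 18, Flatness: 0.6, BandsMeasured: true}))
	fmt.Printf("%q\n", afftdnNoiseType(NoiseInputs{SeparationDB: 9, Flatness: 0.6, BandsMeasured: true}))
	fmt.Printf("%q\n", afftdnNoiseType(NoiseInputs{VoiceActivated: true}))
}
```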
Band measurement parallelisation:
fan 17 post-loop band decodes (2 speech, 15 noise) across cores while
keeping concurrent decoders capped; measured RMS values bit-identical to
bandProgressTracker (atomic counter) emitting per-band updates
capped progress to true fraction so badge reflects actual realtime speed
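The progress mapping and its inverse are simple enough to show directly. In this sketch only `speedFraction` and the 0.95 cap come from the PR text; `decodeProgress`, `bandProgress`, and `clamp01` are hypothetical helpers, and the linear mapping of the band phase onto 0.95..1.0 is an assumption.

```go
package main

import "fmt"

// pass1DecodeCap is where the decode loop's progress tops out; the range
// above it is reserved for the band phase.
const pass1DecodeCap = 0.95

func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// decodeProgress maps the decode loop's true fraction onto 0..0.95.
func decodeProgress(fraction float64) float64 {
	return clamp01(fraction) * pass1DecodeCap
}

// bandProgress maps completed bands onto 0.95..1.0.
func bandProgress(done, total int) float64 {
	if total <= 0 {
		return 1
	}
	frac := clamp01(float64(done) / float64(total))
	return clamp01(pass1DecodeCap + (1-pass1DecodeCap)*frac)
}

// speedFraction un-scales capped progress back to the true decode
// fraction: 1.0 at the cap and clamped thereafter.
func speedFraction(progress float64) float64 {
	return clamp01(progress / pass1DecodeCap)
}

func main() {
	fmt.Println(decodeProgress(0.5), speedFraction(decodeProgress(0.5)))
	fmt.Println(bandProgress(0, 17), bandProgress(17, 17))
	fmt.Println(speedFraction(bandProgress(8, 17)))
}
```

A speed badge computed from the un-scaled fraction sees the full audio duration as decoded once progress reaches the cap, so the band phase no longer reads as a slowdown.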
Cleanup:
only; duration cap added no practical value)
Documentation and testing:
discipline and speech-aware metrics
gate and afftdn strategy
enable/disable, and custom profile generation
Corpus validation:
de-esser sibilance detection stable, gate window stable on silence and speech
overlap
disabled (voice-activated); 36 improved / 14 unchanged / 0 regressed, no
warble (e.g. BF-08-stephen under-speech floor down ~7 dB)
stems return to baseline floor instead of degenerate -120 dBFS seed
Testing
Build, test, lint pass. Validation corpus measurements documented in
gate-branch research and A/B sweep artefacts (uncommitted). Voice-activated
platform-gated captures (TT-202) restore to expected 0 dB floor baseline.