feat(processor): voice-activity-driven gate with adaptive afftdn and band measurement by flexiondotorg · Pull Request #130 · linuxmatters/jivetalking

flexiondotorg · 2026-06-17T13:00:19Z

Summary

Redesigns the speech gate from aggression-tiered heuristics to a voiced-anchored
threshold model, unifies the noise-floor seed onto the K-weighted momentary LUFS
axis, and adds adaptive afftdn with custom noise profile generation from
measured room tone. Pass 1 now parallelises 17 post-loop band measurements
across CPU cores while keeping concurrent decoders capped at core count. The
unused room-tone-scan-duration CLI flag is removed.

Changes

Speech gate and noise-floor seed:

Implement voice-activity-driven speech gate redesign: threshold anchors to
voiced p10 minus 6 dB margin (no voice attenuation); depth fixed 14 dB
(transparent band), reduces to 8 dB on narrow gaps (separation < 12 dB) to
prevent floor pumping; release fixed 200 ms (hold folded in), attack 5 ms,
ratio capped 2.0 (gentle 1.5 for wide LRA > 15 LU); remove old aggression
tiers and proportional depth
Unify VAD noise seed onto momentary LUFS axis with floored-interval guard,
reducing speechMinimumNoiseMarginDB from 6.0 to 2.0 to match the new axis;
voice-activated captures fall back to sentinel instead of degenerating to
-120 dBFS
Remove SpeechGateAggression diagnostic (no longer computed) and update
buildSpeechGateFilter comments to reflect current design
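
The gate rules above can be sketched as a small pure function. This is a minimal illustration, not the code in this PR: the function name sketchGateParams, the struct, and its field names are hypothetical, and inputs are assumed to be in dB / LU on the axes Pass 1 reports. Only the numeric constants come from the description above.

```go
package main

import "fmt"

// GateParams is a hypothetical container for the gate settings listed above.
type GateParams struct {
	ThresholdDB float64
	DepthDB     float64
	Ratio       float64
	AttackMS    float64
	ReleaseMS   float64
}

const (
	voicedMarginDB     = 6.0  // threshold sits this far below voiced p10
	depthDefaultDB     = 14.0 // fixed depth
	depthNarrowDB      = 8.0  // depth when separation is narrow
	narrowSeparationDB = 12.0 // "narrow gap" boundary
	ratioDefault       = 2.0  // ratio cap
	ratioGentle        = 1.5  // ratio for wide-LRA material
	wideLRALU          = 15.0 // "wide LRA" boundary
)

// sketchGateParams derives gate settings from three Pass 1 measurements.
func sketchGateParams(voicedP10DB, separationDB, lraLU float64) GateParams {
	p := GateParams{
		ThresholdDB: voicedP10DB - voicedMarginDB,
		DepthDB:     depthDefaultDB,
		Ratio:       ratioDefault,
		AttackMS:    5,
		ReleaseMS:   200,
	}
	if separationDB < narrowSeparationDB {
		p.DepthDB = depthNarrowDB // shallower gate to avoid floor pumping
	}
	if lraLU > wideLRALU {
		p.Ratio = ratioGentle
	}
	return p
}

func main() {
	// Narrow separation and wide LRA: both overrides apply.
	fmt.Printf("%+v\n", sketchGateParams(-38, 10, 17))
}
```

Because every value is either a constant or a direct function of one measurement, the result is deterministic for a given file, which is consistent with the corpus observations reported further down.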

Adaptive afftdn and custom profiles:

Disable afftdn when Noise.VoiceActivated is true; platform-gated captures
have digital-silence gaps where afftdn warbles on true silence and offers no
floor benefit
Seed afftdn nf from measured Noise.Floor (momentary LUFS, re-clamped to
[-80, -20]) with track_noise off, so it subtracts against measured floor
instead of self-tracking
Give afftdn a measured custom noise profile: measureNoiseBands reads 15-band
RMS spectrum of elected room-tone region; buildAfftdnBandNoise emits
relative shape (clip(band - mean, +-24 dB)); conditional on not voice-
activated, separation >= 12 dB, and flatness >= 0.45
Add new diagnostics: afftdn_enabled, afftdn_noise_floor_db,
afftdn_disable_reason, afftdn_noise_type, afftdn_band_noise
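
As a rough sketch of the decision described above (not the PR's tuneNoiseReduction itself), the afftdn plan could be chosen as follows. The type names, field names, and the disable-reason string are assumptions made for illustration; the thresholds and the [-80, -20] clamp are taken from the list above.

```go
package main

import (
	"fmt"
	"math"
)

// NoiseMeasure is a hypothetical view of the Pass 1 noise measurements.
type NoiseMeasure struct {
	VoiceActivated bool
	FloorLUFS      float64 // measured noise floor on the momentary LUFS axis
	SeparationDB   float64 // voiced-to-noise separation
	Flatness       float64 // room-tone spectral flatness
}

// AfftdnPlan is a hypothetical summary of how afftdn would be configured.
type AfftdnPlan struct {
	Enabled       bool
	NoiseType     string  // "custom" or "white"; empty when disabled
	NoiseFloorDB  float64 // value for afftdn nf, clamped to [-80, -20]
	DisableReason string
}

// planAfftdn applies the three rules: disable on voice-activated captures,
// seed nf from the measured floor, and use a custom profile only when the
// room tone looks trustworthy.
func planAfftdn(m NoiseMeasure) AfftdnPlan {
	if m.VoiceActivated {
		return AfftdnPlan{DisableReason: "voice-activated capture"}
	}
	plan := AfftdnPlan{
		Enabled:      true,
		NoiseType:    "white",
		NoiseFloorDB: math.Max(-80, math.Min(-20, m.FloorLUFS)),
	}
	if m.SeparationDB >= 12 && m.Flatness >= 0.45 {
		plan.NoiseType = "custom"
	}
	return plan
}

func main() {
	fmt.Printf("%+v\n", planAfftdn(NoiseMeasure{FloorLUFS: -95, SeparationDB: 15, Flatness: 0.5}))
	fmt.Printf("%+v\n", planAfftdn(NoiseMeasure{VoiceActivated: true}))
}
```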

Band measurement parallelisation:

Add runBandMeasurements with bounded goroutines (core-count semaphore) to
fan 17 post-loop band decodes (2 speech, 15 noise) across cores while
keeping concurrent decoders capped; measured RMS values bit-identical to
the serial path; bandProgressTracker (atomic counter) emits per-band updates
Fix realtime-speed badge (⚡ ×N) in renderTimeline: speedFraction un-scales
capped progress to true fraction so badge reflects actual realtime speed
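
The bounded fan-out can be illustrated with a buffered channel used as a semaphore. This is a generic sketch of the pattern under stated assumptions, not the PR's runBandMeasurements: runBounded and measureFn are hypothetical names, and the measurement callback stands in for a real band decode.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// measureFn stands in for one band decode (hypothetical signature).
type measureFn func(band int) float64

// runBounded runs one measurement per band with at most runtime.NumCPU()
// measurements in flight. Each goroutine writes only results[i], so no
// mutable state is shared between bands.
func runBounded(bands int, measure measureFn) []float64 {
	results := make([]float64, bands)
	sem := make(chan struct{}, runtime.NumCPU())
	var wg sync.WaitGroup
	for i := 0; i < bands; i++ {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before spawning
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = measure(i)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	// 17 bands, as in the description above; the callback is a placeholder.
	got := runBounded(17, func(band int) float64 { return float64(band) * 2 })
	fmt.Println(len(got), got[16])
}
```

Since only the scheduling differs from a serial loop, each slot holds exactly what a serial run would have produced, which is the property the bullet above relies on.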

Cleanup:

Remove --room-tone-scan-duration CLI flag and configuration (whole-file scan
only; duration cap added no practical value)

Documentation and testing:

Update AGENTS.md architecture diagram and Pass 1 spec for bounded-goroutine
discipline and speech-aware metrics
Update Pipeline.md and Levelator-Comparison-And-Gap-Analysis.md for new
gate and afftdn strategy
Expand adaptive_test.go to cover gate threshold/depth/ratio, afftdn
enable/disable, and custom profile generation
Add analyser_band_runner_test.go for bounded-goroutine band measurement
Add afftdn_custom_graph_test.go for custom profile filter spec validation
Update existing reporter and UI tests for new diagnostics

Corpus validation:

A/B tested gate redesign on 55-stem corpus: candidate sorting deterministic,
de-esser sibilance detection stable, gate window stable on silence and speech
overlap
A/B tested afftdn custom profile: 50 stems custom, 2 white fallback, 3
disabled (voice-activated); 36 improved / 14 unchanged / 0 regressed, no
warble (e.g. BF-08-stephen under-speech floor down ~7 dB)
Noise seed unification: 53/55 stems byte-identical, 3 voice-activated TT-202
stems return to baseline floor instead of degenerate -120 dBFS seed

Testing

Build, test, lint pass. Validation corpus measurements documented in
gate-branch research and A/B sweep artefacts (uncommitted). Voice-activated
platform-gated captures (TT-202) restore to expected 0 dB floor baseline.

Pass 1 now measures and exports voiced percentiles (p10), noise percentiles (p95), and their separation; speech gate threshold anchors to voiced p10 minus 6 dB margin, ensuring no voice attenuation.

Gate depth fixed 14 dB (transparent band midpoint), reducing to 8 dB on narrow gaps (separation < 12 dB) to prevent floor pumping; old aggression tiers and proportional depth removed. Release fixed 200 ms (hold folded in), attack 5 ms, ratio capped 2.0 (gentle 1.5 for wide LRA), gentle mode override deleted.

TUI status row renamed from "Soft gate" to "Gate depth". AGENTS.md and Pipeline.md updated for the new threshold/depth/release strategy and speech-aware metrics.

Acceptance criteria validated on 55-stem corpus: candidate sorting deterministic, de-esser engages on sibilance (band excess), gate window stable on silence and speech overlap.

Signed-off-by: Martin Wimpress <code@wimpress.io>

…sync gate filter comments

- Remove SpeechGateAggression field from AdaptiveDiagnostics (no longer computed)
- Remove its zero-initialisation in tuneSpeechGate
- Update buildSpeechGateFilter comments to reflect current design (LRA-based ratio, fixed attack/release/knee)
- Reference tuneSpeechGate directly instead of generic "adaptive.go"

Signed-off-by: Martin Wimpress <code@wimpress.io>

The noise-floor seed read unweighted RMS while the VAD split, floor, and the protective margin run on K-weighted momentary LUFS, so a single 6 dB margin spanned two scales and miscalibrated the noise floor by the per-file spectral offset. The seed also lacked a floored-interval guard, so voice-activated captures degenerated it to -120 dBFS.

- Seed now reads MomentaryLUFS, with a floored-interval guard that excludes digital silence; fully-gated captures fall back to a non-clamping sentinel so the Otsu split is placed freely.
- speechMinimumNoiseMarginDB 6.0 -> 2.0: the former 6 encoded the RMS-to-LUFS offset and over-clamped the split once the seed moved onto the LUFS axis.

Calibration fix: 53/55 corpus stems byte-identical, the noise floor moves onto the honest axis, and the three voice-activated TT-202 stems return to their baseline floor instead of the degenerate -120 seed. Two low-separation stems re-split a shade gentler (lower threshold, never into speech).

Tests updated; build, test, lint green.

Signed-off-by: Martin Wimpress <code@wimpress.io>

… from measured floor

Noise reduction was fixed; add a tuneNoiseReduction step that adapts the afftdn FFT stage per file from Pass 1 measurements:

- Disable afftdn when Noise.VoiceActivated is true. Platform-gated captures have digital-silence gaps, so afftdn has no floor to lower and track_noise warbles on true silence. Corpus measurement: afftdn helps 11 stems, is neutral on 39, and harms only the voice-activated captures, so it is dropped exactly there (anlmdn alone; the Denoise row reads "NLM") and left on everywhere it helps.
- When enabled, set afftdn nf from the measured Noise.Floor (momentary LUFS, re-clamped to afftdn's [-80, -20]) with track_noise off, so it subtracts against the measured floor instead of self-tracking. A/B validated: floor about 1 dB deeper on average, speech identical, no added warble. nr stays fixed at 12.

New diagnostics: afftdn_enabled, afftdn_noise_floor_db, afftdn_disable_reason. Report and docs updated; build, test, lint green.

Signed-off-by: Martin Wimpress <code@wimpress.io>

When the elected room-tone region is trustworthy, afftdn now runs nt=custom with a per-band noise profile measured from that room tone, instead of assuming a flat white spectrum, so it subtracts noise matching the actual room colour.

- measureNoiseBands: region-scoped 15-band RMS decode of the elected room-tone region (afftdn band centres 80 Hz to 24 kHz), stored on NoiseProfile. buildAfftdnBandNoise emits the relative bn shape (clip(band - mean, +-24 dB)); the measured nf still carries the level and nr=12 the depth.
- Non-finite bands (the 24 kHz band sits above the band-limit and Nyquist) are excluded from the mean and emitted flat, never NaN; BandsMeasured requires at least 10 of 15 finite bands.
- Conditional: custom only when not voice-activated, gate separation >= 12 dB, and room-tone flatness >= 0.45; otherwise the white + nf path. Voice-activated captures keep afftdn disabled.

A/B vs the white+nf path: 50 stems custom, 2 white fallback, 3 disabled; 36 improved, 14 unchanged, 0 regressed, no warble. The afftdn HELPS stems improved (BF-08-stephen under-speech floor down ~7 dB).

New diagnostics afftdn_noise_type and afftdn_band_noise; report and docs updated. build, test, lint green.

Signed-off-by: Martin Wimpress <code@wimpress.io>
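
The relative band shape described in this commit, clip(band - mean, +-24 dB) with non-finite bands left out of the mean, might look roughly like the sketch below. It is not the PR's buildAfftdnBandNoise: the function name is hypothetical, and emitting 0 dB for a non-finite band is an assumption about what "emitted flat" means.

```go
package main

import (
	"fmt"
	"math"
)

// exampleBands is illustrative input: three finite band levels in dB and one
// band that measured as -Inf (for example, a band above Nyquist).
var exampleBands = []float64{-60, -70, -80, math.Inf(-1)}

func finite(x float64) bool { return !math.IsNaN(x) && !math.IsInf(x, 0) }

// relativeBandNoise turns absolute per-band levels (dB) into a relative
// shape: each finite band minus the mean of the finite bands, clipped to
// +-24 dB. Non-finite bands are skipped in the mean and emitted as 0
// (assumed meaning of "flat"), so the output never contains NaN.
func relativeBandNoise(bandsDB []float64) []float64 {
	sum, n := 0.0, 0
	for _, b := range bandsDB {
		if finite(b) {
			sum += b
			n++
		}
	}
	out := make([]float64, len(bandsDB))
	if n == 0 {
		return out // nothing measurable: fully flat shape
	}
	mean := sum / float64(n)
	for i, b := range bandsDB {
		if !finite(b) {
			continue // leave this band at 0
		}
		out[i] = math.Max(-24, math.Min(24, b-mean))
	}
	return out
}

func main() {
	fmt.Println(relativeBandNoise(exampleBands))
}
```

Keeping the shape relative means the absolute level is carried by nf alone, which matches the split of responsibilities described in the first bullet of the commit.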

…nded goroutines

Adds runBandMeasurements (shared semaphore sized runtime.NumCPU) to fan the 17 post-loop band decodes (2 speech via measureSpeechBands, 15 noise via measureNoiseBands) across cores while keeping concurrent decoders capped at the core count. Each band opens its own reader and writes only its own result slot, so no mutable state is shared and measured RMS values stay bit-identical to the serial path (scheduling only changed).

Caps Pass 1 decode-loop progress at 0.95 and reserves 0.95..1.0 for the band phase, with a bandProgressTracker (atomic counter, monotonic) emitting per-band ProgressUpdates so the progress bar advances smoothly from decode into bands. Early-return band functions drain their progress slots via drainBandProgress so the phase reaches 1.0 even when a band fails or the profile is empty.

Fixes the realtime-speed badge (⚡ ×N) in renderTimeline: Pass 1's capped progress under-reported decode throughput, so the badge showed false slowdowns during the fast band phase. speedFraction un-scales the capped progress back to true fraction (1.0 at the cap, clamped thereafter) so the badge reflects actual realtime speed across all passes.

Updates AGENTS.md architecture diagram and Pass 1 spec to document the bounded-goroutine discipline and progress mapping.

Signed-off-by: Martin Wimpress <code@wimpress.io>
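
One way to picture the progress mapping in this commit is the sketch below. It assumes the decode phase is linearly scaled into 0..0.95 and that each completed band advances the remaining 0.05 by an equal share; the function names are hypothetical and only the 0.95 cap and the "1.0 at the cap, clamped thereafter" behaviour come from the commit message.

```go
package main

import "fmt"

// decodeCap is the share of the progress bar given to the decode loop.
const decodeCap = 0.95

func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// bandPhaseProgress maps completed bands onto the reserved 0.95..1.0 range
// (equal share per band is an assumption).
func bandPhaseProgress(done, total int) float64 {
	if total <= 0 {
		return 1
	}
	frac := clamp01(float64(done) / float64(total))
	return clamp01(decodeCap + (1-decodeCap)*frac)
}

// unscaleProgress recovers the true decode fraction from displayed progress:
// it reaches 1.0 at the cap and stays clamped there during the band phase.
func unscaleProgress(displayed float64) float64 {
	return clamp01(displayed / decodeCap)
}

// near compares floats with a small tolerance.
func near(a, b float64) bool {
	d := a - b
	if d < 0 {
		d = -d
	}
	return d < 1e-9
}

func main() {
	fmt.Println(unscaleProgress(0.475), unscaleProgress(0.95), unscaleProgress(0.99))
	fmt.Println(bandPhaseProgress(0, 17), bandPhaseProgress(17, 17))
}
```

A speed badge computed from the un-scaled fraction no longer treats the last 5% of the bar as decoded audio, which is the false-slowdown effect the commit describes.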

Remove the optional room-tone-scan duration cap from the CLI, configuration, and analyser. This feature allowed limiting the noise-floor seed scan to an input prefix for faster processing on long files. With the default 0s (whole-file scan) as the only remaining behaviour, the cap adds no practical value.

Signed-off-by: Martin Wimpress <code@wimpress.io>

cubic-dev-ai

1 issue found across 39 files

Confidence score: 5/5

In internal/report/report_full.md.golden, the adaptation diagnostics currently hide the afftdn noise model, which could reduce observability when reviewing adaptation behavior after merge; update the golden/report output to include afftdn before merging to keep diagnostics complete.


Set AfftdnNoiseType in the test fixture to match production AdaptConfig output, resolving a blank cell in the Adaptation diagnostics table that conflicted with the Noise removal table's correct `w` value. Signed-off-by: Martin Wimpress <code@wimpress.io>

cubic-dev-ai

0 issues found across 2 files (changes from recent commits).

Requires human review: Major refactoring of speech gate, noise reduction, and band measurement logic with 3694 lines changed. Core audio processing pipeline modified.

flexiondotorg added 7 commits June 16, 2026 18:07

cubic-dev-ai Bot reviewed Jun 17, 2026


flexiondotorg merged commit 2f7f4ed into main Jun 17, 2026
16 checks passed

flexiondotorg deleted the gate branch June 17, 2026 13:50

flexiondotorg merged 8 commits into main from gate
