
feat(processor): voice-activity-driven gate with adaptive afftdn and band measurement #130

Merged
flexiondotorg merged 8 commits into main from gate
Jun 17, 2026


@flexiondotorg

Contributor

Summary

Redesigns the speech gate from aggression-tiered heuristics to a voiced-anchored
threshold model, unifies the noise-floor seed onto the K-weighted momentary LUFS
axis, and adds adaptive afftdn with custom noise profile generation from
measured room tone. Pass 1 now parallelises 17 post-loop band measurements
across CPU cores while keeping concurrent decoders capped at core count, and
removes the unused room-tone-scan-duration CLI flag.

Changes

Speech gate and noise-floor seed:

  • Implement voice-activity-driven speech gate redesign: threshold anchors to
    voiced p10 minus 6 dB margin (no voice attenuation); depth fixed 14 dB
    (transparent band), reduces to 8 dB on narrow gaps (separation < 12 dB) to
    prevent floor pumping; release fixed 200 ms (hold folded in), attack 5 ms,
    ratio capped 2.0 (gentle 1.5 for wide LRA > 15 LU); remove old aggression
    tiers and proportional depth
  • Unify VAD noise seed onto momentary LUFS axis with floored-interval guard,
    reducing speechMinimumNoiseMarginDB from 6.0 to 2.0 to match the new axis;
    voice-activated captures fall back to sentinel instead of degenerating to
    -120 dBFS
  • Remove SpeechGateAggression diagnostic (no longer computed) and update
    buildSpeechGateFilter comments to reflect current design
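The gate rules above reduce to a handful of comparisons. The Go sketch below uses hypothetical names (`gateParams`, `tuneGate`) rather than the real `tuneSpeechGate` signature, and it assumes that "separation" means voiced p10 minus noise p95; the numeric constants are the ones listed in this PR.

```go
package main

import "fmt"

// gateParams is a hypothetical container for the speech-gate settings;
// the field names are illustrative, not the project's real struct.
type gateParams struct {
	ThresholdDB float64 // gate threshold on the momentary LUFS axis
	DepthDB     float64 // maximum attenuation applied below threshold
	Ratio       float64
	AttackMS    float64
	ReleaseMS   float64
}

// Values taken from the PR description.
const (
	voicedMarginDB     = 6.0  // threshold sits this far below voiced p10
	transparentDepthDB = 14.0 // default (transparent) gate depth
	narrowGapDepthDB   = 8.0  // shallower depth for narrow voice/noise gaps
	narrowGapDB        = 12.0 // separation below this counts as narrow
	wideLRA            = 15.0 // LU; above this the gentle ratio is used
)

// tuneGate anchors the threshold to the quietest voiced speech rather than
// the noise floor, so voice is never attenuated. Assumption: separationDB
// is voiced p10 minus noise p95.
func tuneGate(voicedP10, separationDB, lra float64) gateParams {
	p := gateParams{
		ThresholdDB: voicedP10 - voicedMarginDB,
		DepthDB:     transparentDepthDB,
		Ratio:       2.0, // cap
		AttackMS:    5,
		ReleaseMS:   200, // hold is folded into the release
	}
	if separationDB < narrowGapDB {
		// A shallower gate avoids audible floor pumping on narrow gaps.
		p.DepthDB = narrowGapDepthDB
	}
	if lra > wideLRA {
		p.Ratio = 1.5 // gentle ratio for wide loudness range
	}
	return p
}

func main() {
	// Quiet speech, narrow gap, wide LRA: shallow depth and gentle ratio.
	fmt.Printf("%+v\n", tuneGate(-38, 10, 16))
}
```

Both adaptive branches use strict comparisons, so a separation of exactly 12 dB keeps the 14 dB depth and an LRA of exactly 15 LU keeps the 2.0 ratio.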

Adaptive afftdn and custom profiles:

  • Disable afftdn when Noise.VoiceActivated is true; platform-gated captures
    have digital-silence gaps where afftdn warbles on true silence and offers no
    floor benefit
  • Seed afftdn nf from measured Noise.Floor (momentary LUFS, re-clamped to
    [-80, -20]) with track_noise off, so it subtracts against measured floor
    instead of self-tracking
  • Give afftdn a measured custom noise profile: measureNoiseBands reads 15-band
    RMS spectrum of elected room-tone region; buildAfftdnBandNoise emits
    relative shape (clip(band - mean, +-24 dB)); conditional on not voice-
    activated, separation >= 12 dB, and flatness >= 0.45
  • Add new diagnostics: afftdn_enabled, afftdn_noise_floor_db,
    afftdn_disable_reason, afftdn_noise_type, afftdn_band_noise

Band measurement parallelisation:

  • Add runBandMeasurements with bounded goroutines (core-count semaphore) to
    fan 17 post-loop band decodes (2 speech, 15 noise) across cores while
    keeping concurrent decoders capped; measured RMS values stay bit-identical
    to the serial path, and a bandProgressTracker (atomic counter) emits
    per-band progress updates
  • Fix realtime-speed badge (⚡ ×N) in renderTimeline: speedFraction un-scales
    capped progress to true fraction so badge reflects actual realtime speed

Cleanup:

  • Remove --room-tone-scan-duration CLI flag and configuration (whole-file scan
    only; duration cap added no practical value)

Documentation and testing:

  • Update AGENTS.md architecture diagram and Pass 1 spec for bounded-goroutine
    discipline and speech-aware metrics
  • Update Pipeline.md and Levelator-Comparison-And-Gap-Analysis.md for new
    gate and afftdn strategy
  • Expand adaptive_test.go to cover gate threshold/depth/ratio, afftdn
    enable/disable, and custom profile generation
  • Add analyser_band_runner_test.go for bounded-goroutine band measurement
  • Add afftdn_custom_graph_test.go for custom profile filter spec validation
  • Update existing reporter and UI tests for new diagnostics

Corpus validation:

  • A/B tested gate redesign on 55-stem corpus: candidate sorting deterministic,
    de-esser sibilance detection stable, gate window stable on silence and speech
    overlap
  • A/B tested afftdn custom profile: 50 stems custom, 2 white fallback, 3
    disabled (voice-activated); 36 improved / 14 unchanged / 0 regressed
    (e.g. BF-08-stephen under-speech floor down ~7 dB), no warble
  • Noise seed unification: 53/55 stems byte-identical, 3 voice-activated TT-202
    stems return to baseline floor instead of degenerate -120 dBFS seed

Testing

Build, test, lint pass. Validation corpus measurements documented in
gate-branch research and A/B sweep artefacts (uncommitted). Voice-activated
platform-gated captures (TT-202) restore to expected 0 dB floor baseline.

  Pass 1 now measures and exports voiced percentiles (p10), noise percentiles
  (p95), and their separation; speech gate threshold anchors to voiced p10
  minus 6 dB margin, ensuring no voice attenuation. Gate depth fixed 14 dB
  (transparent band midpoint), reducing to 8 dB on narrow gaps (separation
  < 12 dB) to prevent floor pumping; old aggression tiers and proportional
  depth removed. Release fixed 200 ms (hold folded in), attack 5 ms, ratio
  capped 2.0 (gentle 1.5 for wide LRA), gentle mode override deleted. TUI
  status row renamed from "Soft gate" to "Gate depth". AGENTS.md and
  Pipeline.md updated for the new threshold/depth/release strategy and
  speech-aware metrics. Acceptance criteria validated on 55-stem corpus:
  candidate sorting deterministic, de-esser engages on sibilance (band
  excess), gate window stable on silence and speech overlap.

Signed-off-by: Martin Wimpress <code@wimpress.io>
…sync gate filter comments

  - Remove SpeechGateAggression field from AdaptiveDiagnostics (no longer computed)
  - Remove its zero-initialisation in tuneSpeechGate
  - Update buildSpeechGateFilter comments to reflect current design (LRA-based ratio, fixed attack/release/knee)
  - Reference tuneSpeechGate directly instead of generic "adaptive.go"

Signed-off-by: Martin Wimpress <code@wimpress.io>
  The noise-floor seed read unweighted RMS while the VAD split, floor, and
  the protective margin run on K-weighted momentary LUFS, so a single 6 dB
  margin spanned two scales and miscalibrated the noise floor by the
  per-file spectral offset. The seed also lacked a floored-interval guard,
  so voice-activated captures degenerated it to -120 dBFS.

  - Seed now reads MomentaryLUFS, with a floored-interval guard that
    excludes digital silence; fully-gated captures fall back to a
    non-clamping sentinel so the Otsu split is placed freely.
  - speechMinimumNoiseMarginDB 6.0 -> 2.0: the former 6 encoded the
    RMS-to-LUFS offset and over-clamped the split once the seed moved
    onto the LUFS axis.

  Calibration fix: 53/55 corpus stems byte-identical, the noise floor
  moves onto the honest axis, and the three voice-activated TT-202 stems
  return to their baseline floor instead of the degenerate -120 seed. Two
  low-separation stems re-split a shade gentler (lower threshold, never
  into speech). Tests updated; build, test, lint green.

Signed-off-by: Martin Wimpress <code@wimpress.io>
… from measured floor

  Noise reduction settings were fixed; add a tuneNoiseReduction step that
  adapts the afftdn FFT stage per file from Pass 1 measurements:

  - Disable afftdn when Noise.VoiceActivated is true. Platform-gated
    captures have digital-silence gaps, so afftdn has no floor to lower
    and track_noise warbles on true silence. Corpus measurement: afftdn
    helps 11 stems, is neutral on 39, and harms only the voice-activated
    captures, so it is dropped exactly there (anlmdn alone; the Denoise
    row reads "NLM") and left on everywhere it helps.
  - When enabled, set afftdn nf from the measured Noise.Floor (momentary
    LUFS, re-clamped to afftdn's [-80, -20]) with track_noise off, so it
    subtracts against the measured floor instead of self-tracking. A/B
    validated: floor about 1 dB deeper on average, speech identical, no
    added warble.

  nr stays fixed at 12. New diagnostics: afftdn_enabled,
  afftdn_noise_floor_db, afftdn_disable_reason. Report and docs updated;
  build, test, lint green.

Signed-off-by: Martin Wimpress <code@wimpress.io>
  When the elected room-tone region is trustworthy, afftdn now runs
  nt=custom with a per-band noise profile measured from that room tone,
  instead of assuming a flat white spectrum, so it subtracts noise
  matching the actual room colour.

  - measureNoiseBands: region-scoped 15-band RMS decode of the elected
    room-tone region (afftdn band centres 80 Hz to 24 kHz), stored on
    NoiseProfile. buildAfftdnBandNoise emits the relative bn shape
    (clip(band - mean, +-24 dB)); the measured nf still carries the
    level and nr=12 the depth.
  - Non-finite bands (the 24 kHz band sits above the band-limit and
    Nyquist) are excluded from the mean and emitted flat, never NaN;
    BandsMeasured requires at least 10 of 15 finite bands.
  - Conditional: custom only when not voice-activated, gate separation
    >= 12 dB, and room-tone flatness >= 0.45; otherwise the white + nf
    path. Voice-activated captures keep afftdn disabled.

  A/B vs the white+nf path: 50 stems custom, 2 white fallback, 3
  disabled; 36 improved, 14 unchanged, 0 regressed, no warble. The afftdn
  HELPS stems improved (BF-08-stephen under-speech floor down ~7 dB).
  New diagnostics afftdn_noise_type and afftdn_band_noise; report and docs
  updated; build, test, lint green.

Signed-off-by: Martin Wimpress <code@wimpress.io>
…nded goroutines

  Adds runBandMeasurements (shared semaphore sized to runtime.NumCPU()) to fan
  the 17 post-loop band decodes (2 speech via measureSpeechBands, 15 noise via
  measureNoiseBands) across cores while keeping concurrent decoders capped at
  the core count. Each band opens its own reader and writes only its own result
  slot, so no mutable state is shared and measured RMS values stay bit-identical
  to the serial path (only the scheduling changed).
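A minimal sketch of the bounded fan-out: a buffered channel acts as the core-count semaphore, and each goroutine writes only its own slot, so the result order matches a serial loop. `runBands` and `bandMeasure` are hypothetical names; the real runBandMeasurements also wires in progress reporting.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// bandMeasure is a hypothetical per-band job: it opens its own reader,
// decodes one band and returns its RMS. Nothing mutable is shared.
type bandMeasure func() (float64, error)

// runBands starts one goroutine per band, but the semaphore limits how
// many decoders run at once to runtime.NumCPU().
func runBands(jobs []bandMeasure) ([]float64, []error) {
	rms := make([]float64, len(jobs))
	errs := make([]error, len(jobs))
	sem := make(chan struct{}, runtime.NumCPU())
	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		go func(i int, job bandMeasure) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a decoder slot
			defer func() { <-sem }() // release it when this band finishes
			rms[i], errs[i] = job()  // only this goroutine touches slot i
		}(i, job)
	}
	wg.Wait()
	return rms, errs
}

func main() {
	jobs := make([]bandMeasure, 17) // 2 speech + 15 noise bands
	for i := range jobs {
		v := float64(-i)
		jobs[i] = func() (float64, error) { return v, nil }
	}
	jobs[16] = func() (float64, error) { return 0, errors.New("band decode failed") }
	rms, errs := runBands(jobs)
	fmt.Println(rms[3], errs[16])
}
```

A failed band leaves its own error slot set without affecting the others, which is why no locking is needed around the result slices.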

  Caps Pass 1 decode-loop progress at 0.95 and reserves 0.95..1.0 for the band
  phase, with a bandProgressTracker (atomic counter, monotonic) emitting per-band
  ProgressUpdates so the progress bar advances smoothly from decode into bands.
  Early-return band functions drain their progress slots via drainBandProgress so
  the phase reaches 1.0 even when a band fails or the profile is empty.

  Fixes the realtime-speed badge (⚡ ×N) in renderTimeline: Pass 1's capped
  progress under-reported decode throughput, so the badge showed false slowdowns
  during the fast band phase. speedFraction un-scales the capped progress back to
  true fraction (1.0 at the cap, clamped thereafter) so the badge reflects actual
  realtime speed across all passes.
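A sketch of the un-scaling, assuming hypothetical `speedFraction(progress, capAt)` and `realtimeSpeed` signatures, and assuming the badge is computed as processed media seconds over elapsed wall-clock seconds:

```go
package main

import "fmt"

const decodeCap = 0.95 // Pass 1 decode-loop progress cap

// speedFraction converts capped progress back to the true decode fraction:
// 1.0 at the cap, clamped once progress enters the band phase.
func speedFraction(progress, capAt float64) float64 {
	if capAt <= 0 || capAt >= 1 {
		return progress // uncapped pass: progress already is the fraction
	}
	f := progress / capAt
	if f > 1 {
		f = 1
	}
	return f
}

// realtimeSpeed is the ×N badge value: media seconds per wall-clock second.
func realtimeSpeed(fraction, durationSec, elapsedSec float64) float64 {
	if elapsedSec <= 0 {
		return 0
	}
	return fraction * durationSec / elapsedSec
}

func main() {
	// 60 s into a one-hour file, Pass 1 reports 0.475 (half the decode loop).
	f := speedFraction(0.475, decodeCap)
	fmt.Printf("⚡ ×%.0f\n", realtimeSpeed(f, 3600, 60))
}
```

Without un-scaling, the same 0.475 gives 0.475 × 3600 / 60 = 28.5×, a false slowdown; un-scaled it reads about 30×.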

  Updates AGENTS.md architecture diagram and Pass 1 spec to document the
  bounded-goroutine discipline and progress mapping.

Signed-off-by: Martin Wimpress <code@wimpress.io>
  Remove the optional room-tone-scan duration cap from the CLI, configuration,
  and analyser. This feature allowed limiting the noise-floor seed scan to an
  input prefix for faster processing on long files. With the default 0s (whole-
  file scan) as the only remaining behaviour, the cap adds no practical value.

Signed-off-by: Martin Wimpress <code@wimpress.io>

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


1 issue found across 39 files

Confidence score: 5/5

  • In internal/report/report_full.md.golden, the adaptation diagnostics currently hide the afftdn noise model, which could reduce observability when reviewing adaptation behavior after merge; update the golden/report output to include afftdn before merging to keep diagnostics complete.



Comment thread: internal/report/report_full.md.golden (outdated)
Set AfftdnNoiseType in the test fixture to match production AdaptConfig
output, resolving a blank cell in the Adaptation diagnostics table that
conflicted with the Noise removal table's correct `w` value.

Signed-off-by: Martin Wimpress <code@wimpress.io>

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


0 issues found across 2 files (changes from recent commits).

Requires human review: Major refactoring of speech gate, noise reduction, and band measurement logic with 3694 lines changed. Core audio processing pipeline modified.


@flexiondotorg flexiondotorg merged commit 2f7f4ed into main Jun 17, 2026
16 checks passed
@flexiondotorg flexiondotorg deleted the gate branch June 17, 2026 13:50

Labels

None yet

Projects

None yet


1 participant