From 869850a7cbb86ec0b5e62a1e797ea3d8d7622815 Mon Sep 17 00:00:00 2001
From: Jamie Kirkpatrick
Date: Sat, 18 Apr 2026 11:03:10 +0100
Subject: [PATCH] feat(livekit): auto-flag stereo on audio tracks with num_channels == 2

Without setting TF_STEREO in AddTrackRequest.audio_features (or the
deprecated stereo bool), the LiveKit server negotiates the published
audio track as mono. libwebrtc's Opus encoder then downmixes any
asymmetric stereo content to mono-duplicated-both-channels, so the
receiver sees identical L and R on every frame regardless of what the
publisher pushed via capture_frame.

The JS SDK sets these same flags based on
``MediaStreamTrack.getSettings().channelCount`` or explicit
``opts.forceStereo`` (see livekit/client-sdk-js LocalParticipant.ts
``isStereo`` handling). The Rust SDK doesn't expose an equivalent
option and doesn't infer from the audio source's declared
``num_channels``.

This patch closes that gap: if the track's underlying source declares
``num_channels == 2``, flag the track as stereo on the AddTrackRequest.

``RtcAudioSource::num_channels()`` is not a public accessor (generated
via ``enum_dispatch!``), so we match the ``Native`` variant directly
and keep a wildcard arm for the ``#[non_exhaustive]`` enum.

Verified end-to-end against a Chrome client via
``MediaStreamTrackProcessor``: before the patch, L == R on every frame
with all our content on R; after, L stays at codec floor (-100+ dBFS)
while R carries the TTS speech envelope, matching what the publisher
pushed.

Discussion context: timeline-protocol-v6 stereo marker channel
(silent-L / TTS-on-R for sample-aligned marker tones) kept showing
identical L and R at the client despite SDP advertising stereo=1 and
AudioSource being constructed with num_channels=2.
Every workaround we tried on the Python side (APM options,
``SOURCE_SCREENSHARE_AUDIO``, ``max_bitrate`` pinning, native 48 kHz
source, disabling libwebrtc voice processing) failed because the
server-side SDP negotiation already locked the track to mono before our
audio ever reached the encoder.
---
 .../src/room/participant/local_participant.rs | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/livekit/src/room/participant/local_participant.rs b/livekit/src/room/participant/local_participant.rs
index d37015dac..c8a638acd 100644
--- a/livekit/src/room/participant/local_participant.rs
+++ b/livekit/src/room/participant/local_participant.rs
@@ -316,6 +316,26 @@ impl LocalParticipant {
             req.audio_features.push(proto::AudioTrackFeature::TfPreconnectBuffer as i32);
         }
 
+        // Auto-flag stereo on audio tracks whose underlying source declares
+        // num_channels == 2. Without this, the server treats the track as
+        // mono and libwebrtc's Opus path downmixes asymmetric stereo to
+        // mono-duplicated-both-channels (identical content on L and R).
+        // This matches the JS client SDK's forceStereo+audioFeatures behaviour.
+        // RtcAudioSource's num_channels() accessor is private (generated via
+        // enum_dispatch!), so we match the variant directly.
+        if let LocalTrack::Audio(audio_track) = &track {
+            use libwebrtc::audio_source::RtcAudioSource;
+            let is_stereo = match audio_track.rtc_source() {
+                RtcAudioSource::Native(native) => native.num_channels() == 2,
+                #[allow(unreachable_patterns)]
+                _ => false,
+            };
+            if is_stereo {
+                req.audio_features.push(proto::AudioTrackFeature::TfStereo as i32);
+                req.stereo = true;
+            }
+        }
+
         let mut encodings = Vec::default();
         match &track {
             LocalTrack::Video(video_track) => {