feat(livekit): auto-flag stereo on audio tracks with num_channels == 2#1023

Open
jkp wants to merge 1 commit into livekit:main from supertest-ai:stereo-autodetect-upstream

Conversation

@jkp jkp commented Apr 18, 2026

Problem

When an application publishes a 2-channel NativeAudioSource via the Rust SDK and pushes asymmetric stereo AudioFrames (e.g. silence on L, speech on R), the client-side receive path sees identical L and R content on every frame — as if the track were mono-duplicated. SDP advertises stereo=1 thanks to the standard Opus fmtp, and MediaStreamTrack.getSettings().channelCount returns 2 on the receiver, but the actual decoded content is mono.

Root cause: the server locks the track to mono-encoded Opus at negotiation time, and libwebrtc's Opus encoder downmixes asymmetric input accordingly, before any content reaches the wire. The server makes this decision based on AddTrackRequest.audio_features (specifically TF_STEREO) and the deprecated AddTrackRequest.stereo bool. The JS client SDK (LocalParticipant.ts) sets both flags when opts.forceStereo === true or when MediaStreamTrack.getSettings().channelCount === 2. The Rust SDK never sets either — it doesn't read num_channels off the source, and TrackPublishOptions has no force_stereo / audio_preset field.

Fix

If the track being published has a NativeAudioSource with num_channels == 2, push TfStereo into audio_features and set the deprecated stereo bool. Both go on the existing AddTrackRequest already built in publish_track, right next to the analogous TfPreconnectBuffer handling.

RtcAudioSource::num_channels() is private (generated via enum_dispatch! in libwebrtc), so we match the Native variant directly to reach the public NativeAudioSource::num_channels(). A wildcard arm covers the #[non_exhaustive] enum.
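The shape of the change can be sketched with simplified stand-in types (the real patch edits `publish_track` against the SDK's generated protobuf types; the field and variant names below mirror the description above, not the exact generated code):

```rust
// Stand-in for the SDK's NativeAudioSource, which exposes a public
// num_channels() accessor.
struct NativeAudioSource {
    num_channels: u32,
}

impl NativeAudioSource {
    fn num_channels(&self) -> u32 {
        self.num_channels
    }
}

// RtcAudioSource::num_channels() is generated privately via enum_dispatch!,
// so we match the Native variant directly to reach the public accessor.
#[non_exhaustive]
enum RtcAudioSource {
    Native(NativeAudioSource),
}

// Stand-in for the protobuf AddTrackRequest fields the patch touches.
#[derive(Default)]
struct AddTrackRequest {
    audio_features: Vec<i32>,
    stereo: bool, // deprecated, still set for robustness across server versions
}

const TF_STEREO: i32 = 2; // stand-in for proto::AudioTrackFeature::TfStereo

fn flag_stereo_if_needed(req: &mut AddTrackRequest, source: &RtcAudioSource) {
    #[allow(unreachable_patterns)]
    let channels = match source {
        RtcAudioSource::Native(native) => native.num_channels(),
        // wildcard arm covers the #[non_exhaustive] enum: unknown variants
        // keep the server's default (mono) negotiation
        _ => 1,
    };
    if channels == 2 {
        req.audio_features.push(TF_STEREO);
        req.stereo = true;
    }
}
```

This mirrors the JS SDK's behavior of setting both the feature flag and the deprecated bool, so the hint survives on both old and new servers.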

Verified

Built liblivekit_ffi for aarch64-apple-darwin with this patch, dropped it into a Python SDK install, ran a LiveKit session with an agent publishing 2-channel 24 kHz audio (silence on L, TTS speech on R), and captured the decoded track at the Chrome client via MediaStreamTrackProcessor:

frame 5:  ch=2 L=-Infinity dBFS R=-12.8 dBFS  |diff|=Infinity
frame 10: ch=2 L=-Infinity dBFS R=-90.5 dBFS  |diff|=Infinity
frame 15: ch=2 L=-Infinity dBFS R=-21.8 dBFS  |diff|=Infinity
...

L reads at codec floor on every frame; R tracks the speech envelope. Without the patch, L == R exactly on every frame with content magnitude matching whatever we pushed on R.
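For reference, the per-channel readings above come down to an RMS-to-dBFS computation over each channel of the interleaved decoded frame. A minimal illustrative helper (not part of the patch; assumes channel 0 = L, channel 1 = R):

```rust
/// RMS level of one channel of an interleaved stereo f32 buffer, in dBFS.
/// An all-zero channel yields f64::NEG_INFINITY, matching the
/// "-Infinity dBFS" readings on the silent left channel.
fn channel_dbfs(interleaved: &[f32], channel: usize) -> f64 {
    let mut sum_sq = 0.0f64;
    let mut n = 0usize;
    // Samples for one channel sit at every second index, offset by channel.
    for &s in interleaved.iter().skip(channel).step_by(2) {
        sum_sq += (s as f64) * (s as f64);
        n += 1;
    }
    let rms = (sum_sq / n as f64).sqrt();
    20.0 * rms.log10()
}
```

A mono-downmixed track shows the two channel readings tracking each other exactly; with the patch, L stays at negative infinity while R moves with the speech.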

Context

This came out of building a stereo marker channel on top of the speech track (silent L → marker tones placed sample-aligned with speech boundaries on L, speech PCM on R) for turn-boundary synchronization. All publisher-side workarounds we tried (disabling APM via AudioSourceOptions, labelling as SOURCE_SCREENSHARE_AUDIO, pinning max_bitrate, pushing native 48 kHz frames to bypass the resampler) failed because the server's mono negotiation precedes everything else. The JS SDK's explicit stereo hint on AddTrackRequest is the only thing that prevents it.

Happy to split the stereo bool (deprecated) and the TfStereo feature into separate conditionals if you prefer — currently sending both for robustness across server versions.

Without setting TF_STEREO in AddTrackRequest.audio_features (or the
deprecated stereo bool), the LiveKit server negotiates the published
audio track as mono. libwebrtc's Opus encoder then downmixes any
asymmetric stereo content to mono-duplicated-both-channels, so the
receiver sees identical L and R on every frame regardless of what the
publisher pushed via capture_frame.

The JS SDK sets these same flags based on ``MediaStreamTrack.getSettings().channelCount``
or explicit ``opts.forceStereo`` (see livekit/client-sdk-js
LocalParticipant.ts ``isStereo`` handling). The Rust SDK doesn't
expose an equivalent option and doesn't infer from the audio source's
declared ``num_channels``.

This patch closes that gap: if the track's underlying source declares
``num_channels == 2``, flag the track as stereo on the AddTrackRequest.
``RtcAudioSource::num_channels()`` is not a public accessor (generated
via ``enum_dispatch!``), so we match the ``Native`` variant directly
and keep a wildcard arm for the ``#[non_exhaustive]`` enum.

Verified end-to-end against a Chrome client via
``MediaStreamTrackProcessor``: before the patch, L == R on every frame
with all our content on R; after, L stays at codec floor (-100+ dBFS)
while R carries the TTS speech envelope, matching what the publisher
pushed.

Discussion context: timeline-protocol-v6 stereo marker channel
(silent-L / TTS-on-R for sample-aligned marker tones) kept showing
identical L and R at the client despite SDP advertising stereo=1 and
AudioSource being constructed with num_channels=2. Every workaround
we tried on the Python side (APM options, ``SOURCE_SCREENSHARE_AUDIO``,
``max_bitrate`` pinning, native 48 kHz source, disabling libwebrtc
voice processing) failed because the server-side SDP negotiation
already locked the track to mono before our audio ever reached the
encoder.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Jamie Kirkpatrick does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
