feat(livekit): auto-flag stereo on audio tracks with num_channels == 2 #1023
Open
jkp wants to merge 1 commit into livekit:main from
Without setting TF_STEREO in AddTrackRequest.audio_features (or the deprecated stereo bool), the LiveKit server negotiates the published audio track as mono. libwebrtc's Opus encoder then downmixes any asymmetric stereo content to mono-duplicated-both-channels, so the receiver sees identical L and R on every frame regardless of what the publisher pushed via capture_frame.

The JS SDK sets these same flags based on ``MediaStreamTrack.getSettings().channelCount`` or explicit ``opts.forceStereo`` (see livekit/client-sdk-js LocalParticipant.ts ``isStereo`` handling). The Rust SDK doesn't expose an equivalent option and doesn't infer from the audio source's declared ``num_channels``.

This patch closes that gap: if the track's underlying source declares ``num_channels == 2``, flag the track as stereo on the AddTrackRequest.

``RtcAudioSource::num_channels()`` is not a public accessor (generated via ``enum_dispatch!``), so we match the ``Native`` variant directly and keep a wildcard arm for the ``#[non_exhaustive]`` enum.

Verified end-to-end against a Chrome client via ``MediaStreamTrackProcessor``: before the patch, L == R on every frame with all our content on R; after, L stays at codec floor (-100+ dBFS) while R carries the TTS speech envelope, matching what the publisher pushed.

Discussion context: timeline-protocol-v6 stereo marker channel (silent-L / TTS-on-R for sample-aligned marker tones) kept showing identical L and R at the client despite SDP advertising stereo=1 and AudioSource being constructed with num_channels=2. Every workaround we tried on the Python side (APM options, ``SOURCE_SCREENSHARE_AUDIO``, ``max_bitrate`` pinning, native 48 kHz source, disabling libwebrtc voice processing) failed because the server-side SDP negotiation already locked the track to mono before our audio ever reached the encoder.
Jamie Kirkpatrick seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Problem

When an application publishes a 2-channel ``NativeAudioSource`` via the Rust SDK and pushes asymmetric stereo ``AudioFrame``s (e.g. silence on L, speech on R), the client-side receive path sees identical L and R content on every frame, as if the track were mono-duplicated. SDP advertises ``stereo=1`` thanks to the standard Opus fmtp, and ``MediaStreamTrack.getSettings().channelCount`` returns 2 on the receiver, but the actual decoded content is mono.

Root cause: the server locks the track to mono-encoded Opus at negotiation time, and libwebrtc's Opus encoder downmixes asymmetric input accordingly, before any content reaches the wire. The server makes this decision based on ``AddTrackRequest.audio_features`` (specifically ``TF_STEREO``) and the deprecated ``AddTrackRequest.stereo`` bool. The JS client SDK (``LocalParticipant.ts``) sets both flags when ``opts.forceStereo === true`` or when ``MediaStreamTrack.getSettings().channelCount === 2``. The Rust SDK never sets either: it doesn't read ``num_channels`` off the source, and ``TrackPublishOptions`` has no ``force_stereo`` / ``audio_preset`` field.

Fix
If the track being published has a ``NativeAudioSource`` with ``num_channels == 2``, push ``TfStereo`` into ``audio_features`` and set the deprecated ``stereo`` bool. Both go on the existing ``AddTrackRequest`` already built in ``publish_track``, right next to the analogous ``TfPreconnectBuffer`` handling.

``RtcAudioSource::num_channels()`` is private (generated via ``enum_dispatch!`` in ``libwebrtc``), so we match the ``Native`` variant directly to reach the public ``NativeAudioSource::num_channels()``. A wildcard arm covers the ``#[non_exhaustive]`` enum.

Verified
Built ``liblivekit_ffi`` for ``aarch64-apple-darwin`` with this patch, dropped it into a Python SDK install, ran a LiveKit session with an agent publishing 2-channel 24 kHz audio (silence on L, TTS speech on R), and captured the decoded track at the Chrome client via ``MediaStreamTrackProcessor``:

L reads at codec floor on every frame; R tracks the speech envelope. Without the patch, L == R exactly on every frame with content magnitude matching whatever we pushed on R.
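For anyone who wants to repeat the L/R comparison outside the browser, the check can be sketched in Rust as below. This is not the tooling used for the verification above (that capture ran in Chrome); it assumes the decoded audio is available as interleaved 16-bit stereo samples, and the function names ``channel_levels_dbfs`` and ``is_mono_duplicated`` are hypothetical.

```rust
/// Per-channel RMS level in dBFS for interleaved 16-bit stereo samples
/// (L, R, L, R, ...). Returns (left_dbfs, right_dbfs).
/// An all-zero channel (or empty input) yields negative infinity.
fn channel_levels_dbfs(interleaved: &[i16]) -> (f64, f64) {
    let mut sum_sq = [0.0f64; 2];
    let mut frames = 0usize;
    for pair in interleaved.chunks_exact(2) {
        for (ch, &sample) in pair.iter().enumerate() {
            // Normalise to [-1.0, 1.0) so that full scale maps to 0 dBFS.
            let x = f64::from(sample) / 32768.0;
            sum_sq[ch] += x * x;
        }
        frames += 1;
    }
    let to_db = |ss: f64| {
        if frames == 0 {
            return f64::NEG_INFINITY;
        }
        // 10 * log10(mean square) equals 20 * log10(RMS).
        10.0 * (ss / frames as f64).log10()
    };
    (to_db(sum_sq[0]), to_db(sum_sq[1]))
}

/// True when every L sample equals its R sample, which is the symptom
/// described for the unpatched build.
fn is_mono_duplicated(interleaved: &[i16]) -> bool {
    interleaved.chunks_exact(2).all(|pair| pair[0] == pair[1])
}
```

On a correctly negotiated stereo track carrying silence on L and speech on R, ``is_mono_duplicated`` should return false and the left level should sit far below the right one. A real decoded signal will show a low codec floor on L rather than exact digital silence.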
Context
This came out of building a stereo marker channel on top of the speech track (silent L → marker tones placed sample-aligned with speech boundaries on L, speech PCM on R) for turn-boundary synchronization. All publisher-side workarounds we tried (disabling APM via ``AudioSourceOptions``, labelling as ``SOURCE_SCREENSHARE_AUDIO``, pinning ``max_bitrate``, pushing native 48 kHz frames to bypass the resampler) failed because the server's mono negotiation precedes everything else. The JS SDK's explicit stereo hint on ``AddTrackRequest`` is the only thing that prevents it.

Happy to split the ``stereo`` bool (deprecated) and the ``TfStereo`` feature into separate conditionals if you prefer; currently sending both for robustness across server versions.
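For reviewers skimming without the diff open, the shape of the change described under Fix is roughly the following. This is a self-contained sketch, not the patch itself: every type here is a local stand-in for the SDK and protocol types named above, the ``Other`` variant and the ``flag_stereo`` / ``source_is_stereo`` helpers are hypothetical, and the real ``audio_features`` field may use a different element type.

```rust
// Stand-ins for the types named in this PR. These are NOT the real definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioTrackFeature {
    TfStereo,
    TfPreconnectBuffer,
}

#[derive(Debug, Default)]
struct AddTrackRequest {
    audio_features: Vec<AudioTrackFeature>,
    /// Deprecated on the real message; still set here for older servers.
    stereo: bool,
}

struct NativeAudioSource {
    num_channels: u32,
}

impl NativeAudioSource {
    fn num_channels(&self) -> u32 {
        self.num_channels
    }
}

#[non_exhaustive]
enum RtcAudioSource {
    Native(NativeAudioSource),
    /// Hypothetical variant so the wildcard arm below is reachable in this sketch.
    Other,
}

/// Match the `Native` variant directly, since the enum-level accessor is
/// described as private. Anything else is treated as not stereo.
fn source_is_stereo(source: &RtcAudioSource) -> bool {
    match source {
        RtcAudioSource::Native(native) => native.num_channels() == 2,
        _ => false,
    }
}

/// Set both the feature flag and the deprecated bool when the source is stereo.
fn flag_stereo(req: &mut AddTrackRequest, source: &RtcAudioSource) {
    if source_is_stereo(source) {
        req.audio_features.push(AudioTrackFeature::TfStereo);
        req.stereo = true;
    }
}
```

Splitting the two assignments into separate conditionals, as offered above, would only change the body of ``flag_stereo``; the variant match stays the same.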