Skip to content

Get build/tests working for M114#49

Open
kevindhanna wants to merge 14 commits intoWonderInventions:developfrom
kevindhanna:develop
Open

Get build/tests working for M114#49
kevindhanna wants to merge 14 commits intoWonderInventions:developfrom
kevindhanna:develop

Conversation

@kevindhanna
Copy link
Copy Markdown

@kevindhanna kevindhanna commented Mar 20, 2026

Hi there 👋

I got the build and I think all the required changes for M114 working, and got the test suite to run reliably

FYI Claude Code was used when making these changes, hopefully that's ok (though I see from your develop commits maybe you're using it as well)

Threading & lifecycle

  • Separate Network Thread: M114 moved operations to the network thread, and PeerConnection::Close() does sequential BlockingCalls to nework then worker. Sharing the thread caused contention, so changed to match Chrome's architecture
  • DataChannel close even ordering: PeerConnection::Close() calls PrepareForShutdown() which deactivates the SafeTask safety flag, silently dropping OnStateChange callbacks. Close events are now dispatched before the `Close()
  • SIngle observer registration: SctpDataChannel uses an ObserverAdapter that relays callbacks via SafeTask. There was a race between RegisterObserver calls (DataChannelObserver and RTCDataChannel) that resulted in messages being silently dropped when sent on dc.onopen

Build

  • Link SimulcastEncoderAdapter: library was never linked, causing a segfault
  • macOS 26 SDK compatibility: GC_AVAILABLE_BUT_DEPRECATED has een removed since 26 Tahoe, so we patch a fallback
  • nix.gni fallback: use some default values when outside nix

Tests

  • Remove simple-peer dependency: It's unmaintained for 4 years
  • Fix flakey audio sink tests: count events instead of using a fixed runtime
  • msid deduplication test: align with with M114 behaviour

@kevindhanna kevindhanna changed the title Final M114 Get build/tests working for M114 Mar 20, 2026
@kevindhanna kevindhanna force-pushed the develop branch 5 times, most recently from 2b04f68 to 7d785f2 Compare March 20, 2026 13:53
…tion

Since M111 (commit e04c3970994b), RtpSenderBase::set_stream_ids()
deduplicates stream IDs before they reach SDP serialization. This
aligns with the W3C WebRTC spec: "For each stream in streams, add
stream.id to [[AssociatedMediaStreamIds]] if it's not already there."

The test creates stream2 and stream3 with the same id "testStreamId",
so M114 correctly deduplicates: 2 unique stream IDs × 2 tracks = 4
msid lines, not 6.

See: https://webrtc-review.googlesource.com/c/src/+/288701
See: webrtc bug 14769
Copy link
Copy Markdown

@duvallj duvallj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!! I too have been using Claude Code, and you seem to have used it responsibly since the code changes are overall of good quality. I do have some feedback I'd like to be addressed before this can be merged however:

@duvallj
Copy link
Copy Markdown

duvallj commented Mar 23, 2026

Running npm run test on my machine gives the following result:

#
# Fatal error in: ../../download/src/pc/sctp_transport.cc, line 105
# last system error: 0
# Check failed: (owner_thread_)->IsCurrent()
# zsh: abort      npm run test

I think this has something to do with the new thread changes?

@kevindhanna
Copy link
Copy Markdown
Author

kevindhanna commented Mar 24, 2026

You're running on Windows? I've run them a bunch on a couple of Macs no problem, but I can try a Windows machine to see if I can replicate (albeit only Windows 10...).

That machine is also x64 vs my Macs ARM, so that might make a difference

It seems it's a debug build assertion, DEBUG=1 npm run build then running the tests I get a segfault.

yeah, looks like SctpTransport needs to be updated now it's owned by the network thread

https://github.com/kevindhanna/node-webrtc/blob/develop/src/interfaces/rtc_sctp_transport.cc#L42

we can see the debug assertion:
https://webrtc.googlesource.com/src/+/729f79c176c8b4c3a8c1b60f39df5fb43323b27b/pc/peer_connection.cc#1817

I'll update it

@kevindhanna kevindhanna marked this pull request as draft March 25, 2026 09:01
@kevindhanna kevindhanna force-pushed the develop branch 2 times, most recently from a8da607 to b8d56c8 Compare March 25, 2026 17:21
Dispatch close events proactively during RTCPeerConnection.close(),
before calling the underlying PeerConnection::Close().

This matters because PeerConnection::Close() calls
PrepareForShutdown() (data_channel_controller.cc:173) which deactivates
the SafeTask safety flag, silently cancelling any pending OnStateChange
callbacks from the network thread. By dispatching close events first,
they reach JS regardless of the C++ shutdown sequence.
The macOS 26 (Tahoe) SDK removed the CG_AVAILABLE_BUT_DEPRECATED macro
from CoreGraphics. This macro was used in M114's
modules/desktop_capture/mac/screen_capturer_mac.mm to provide correct
deprecation annotations for CGDisplayStream wrapper functions (a
workaround for incorrect annotations in the 13.3 SDK, see
https://crbug.com/1431897).

The patch adds a fallback #define that maps CG_AVAILABLE_BUT_DEPRECATED
to API_DEPRECATED when the macro is not provided by the SDK, preserving
the deprecation annotations on older SDKs while allowing compilation on
macOS 26+. This only affects macOS builds — the file is not compiled on
Linux or Windows.

Also adds a general-purpose patch script (scripts/patch-webrtc.sh)
and PATCH_COMMAND to the libwebrtc ExternalProject in CMakeLists.txt,
so patches are applied automatically and idempotently during the build.
…le-peer

Replace the simple-peer dependency with direct RTCPeerConnection usage.
simple-peer is unmaintained (last release 2021) and was only used in
these two test files.
…based

The timing approach was flakey due to event loop timing jitter.
BuiltinVideoEncoderFactory::CreateVideoEncoder() constructs a
SimulcastEncoderAdapter (builtin_video_encoder_factory.cc:45), but
librtc_simulcast_encoder_adapter.a was never linked into the final
binary. The constructor symbol resolved to 0x0, causing a segfault
on the EncoderQueue when sending video frames through a negotiated
peer connection.

Found via `nm -g wrtc.node | grep "U "` which showed a single
undefined webrtc symbol: SimulcastEncoderAdapterC1.
…loss

In M114, SctpDataChannel uses an ObserverAdapter that relays callbacks
through the signaling thread via SafeTask. When RegisterObserver was
called twice (first from DataChannelObserver's constructor, then from
RTCDataChannel's constructor), the adapter's SetDelegate call could
race with in-flight message delivery tasks on the signaling thread,
causing messages sent immediately after dc.onopen to be silently lost.

The fix removes the initial RegisterObserver from DataChannelObserver,
so registration only happens once in RTCDataChannel's constructor. The
SCTP layer queues any messages that arrive before an observer is
registered, and DeliverQueuedReceivedData() delivers them when
RegisterObserver is called.

Also dispatches the "open" state change event if the channel has
already transitioned to kOpen before the RTCDataChannel wrapper is
constructed, ensuring JS onopen handlers fire.
Previously, the same thread was passed for both network_thread and
worker_thread to CreatePeerConnectionFactory. In M114, many operations
moved to the network thread, and PeerConnection::Close() does
sequential BlockingCalls to network then worker threads
(pc/peer_connection.cc:1909-1929). Sharing a single thread caused
cleanup contention that limited sequential connection throughput.

This matches Chrome's architecture (pc/connection_context.cc:85-98)
where network, worker, and signaling threads are all separate. The
network thread gets a socket server for ICE port management, matching
the CreateWithSocketServer() pattern Chrome uses.
CMakeLists.txt previously required a nix.gni file (generated by the Nix
shell hook) containing platform-specific GN args. Builds outside Nix
failed with "file STRINGS file nix.gni cannot be read". Now falls back
to sensible defaults (is_clang=true, use_lld=false,
clang_use_chrome_plugins=false) when the file doesn't exist.

Also add a default UknownError path for ErrorFactory::DOMExceptionNameToString
WebRTC's SctpTransport is constructed on the network thread and asserts
owner_thread_->IsCurrent() on most methods (see the RTC_DCHECK_RUN_ON
in the WebRTC source at pc/sctp_transport.cc lines 59, 66, 105). The
node-webrtc code was dispatching to WorkerThread instead, which is the
wrong thread. Expose NetworkThread so callers can dispatch correctly.
WebRTC's SctpTransport, DtlsTransport, and IceTransport are all
constructed on the network thread and assert owner_thread_->IsCurrent()
(or creator_thread_->IsCurrent()) on methods like dtls_transport(),
RegisterObserver(), UnregisterObserver(), and internal(). See the
RTC_DCHECK_RUN_ON assertions in the WebRTC source:

  - pc/sctp_transport.cc lines 59, 66, 105
  - pc/dtls_transport.cc lines 55, 61
  - pc/ice_transport.cc line 27

The node-webrtc code was dispatching these calls to WorkerThread, which
is the wrong thread and causes a crash in debug builds:

  Fatal error in: pc/sctp_transport.cc, line 105
  Check failed: (owner_thread_)->IsCurrent()

Additionally, PeerConnection::GetSctpTransport() requires the network
thread (pc/peer_connection.cc line 1817).

Change all call sites to use NetworkThread(). For Stop() methods that
may be called from network-thread callbacks (e.g. OnStateChange), use
IsCurrent() to avoid a redundant BlockingCall.
Move libwebrtc after the video codec factory libraries so the GNU
linker sees their unresolved symbols before processing libwebrtc.
Fixes undefined symbol _ZN6webrtc18SupportedVP9CodecsEb on Linux
(macOS is unaffected due to its two-pass linker).
These are needed inside the Nix FHS environment for depot_tools
bootstrapping (curl) and CMake ExternalProject downloads (git).
@kevindhanna
Copy link
Copy Markdown
Author

kevindhanna commented Mar 27, 2026

@duvallj do you ever experience the tests hanging after completion when running the debug build? I've tried on Linux and Mac and all pass, but 9 out of 10 times the process doesn't exit cleanly afterward.

I tried checking out v0.10.0 (e1189c plus the Linux compatibility changes above) and running the debug build and it exited cleanly. I added the DataChannel close changes and peer connection factory network thread changes and it still exited fine, so I'm not sure what's causing it. Node doesn't seem to have any open handles...

Release builds work a-ok

@kevindhanna kevindhanna marked this pull request as ready for review March 27, 2026 14:32
In debug builds, a Node.js-internal referenced async handle keeps the
event loop alive after all tests complete. This is not caused by the
native addon — all addon handles are verified stopped via per-object
create/stop tracking. This does not occur in release builds.

Use tape's onFinish callback to exit with the appropriate code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants