Harden C API and RPC boundaries against fault-injected failures#259
Merged
Conversation
added 15 commits
June 28, 2026 11:08
The rDSN public API (DSN_API functions) is a C ABI, but the implementations are C++ and freely call code that can throw (heap allocation, STL/blob/string, serialization, engine calls). An exception unwinding across the C ABI is undefined behavior. Under fault injection (libfiu malloc faults), those throws become reachable on paths that previously "never failed", so an escaping std::bad_alloc can crash the process in a way the C caller cannot observe or handle. Phase 1 - no-throw boundary: - Add src/core/src/c_api_guard.h: DSN_C_GUARD_BEGIN / DSN_C_GUARD_END / DSN_C_GUARD_END_VOID. The guard wraps a throw-capable C entry point in try/catch and translates any escaping exception into that API's documented failure value, always logging the diagnostic (function name plus what()) so a translated failure is never silent. Two catch clauses: std::exception (for what()) and catch(...) to also stop non-std exceptions. The value variant ends with a trailing return (fail_ret) so -Wreturn-type stays quiet under -Wall -Werror. - Apply guards to 68 functions across 8 files (service_api_c.cpp, rpc_message.cpp, main.cpp, app_manager.cpp, address.cpp, command_manager.cpp, perf_counters.cpp, global_checkers.cpp). - fail_ret conventions: pointer -> nullptr, bool -> false, dsn_error_t -> ERR_UNKNOWN, count APIs -> -1, *_from_string / config getters -> the caller-supplied default. - Deliberately not guarded: logging.cpp (catch handler calls derror, so guarding the logger risks recursion), dsn_run / dsn_run_config (already self-catch), and startup-only type registration (no runtime sentinel). Phase 2 - abort -> error returns where a channel already exists: - message_ex::write_next / write_commit / read_commit changed from void to bool; the "pending write/read not committed" and "no buffer under read" sequencing dasserts become derror + return false. read_next had its leading dassert converted the same way. The dsn_msg_* C wrappers propagate the bool. write_append (no callers) keeps void + dassert. - task::wait() self-wait (a guaranteed deadlock) becomes derror + return false; wait() now returns bool. dsn_task_wait_timeout (bool) propagates it; dsn_task_wait (void) keeps its dassert since it has no error channel. Boundary hardening for the message read/write out-params: - write_next / read_next validate the ptr and size out-parameters for null before use. - write_next validates *ptr after tls_trans_mem_next (re-committing 0 bytes to keep the next/commit pairing consistent) before doing pointer arithmetic on it. - read_next validates the produced *ptr (null for an empty blob, where _rw_offset is 0) after assigning it and before mutating read state, so it never returns a null *ptr as a successful read. Internal invariants with no caller-recoverable cause (e.g. the write buffer-count post-condition) are intentionally kept as dassert. Built on Ubuntu 16.04 with -Wall -Werror: Build succeed.
write_next/write_commit and read_next/read_commit were converted from process-aborting dassert to recoverable false returns when called out of order. Add TEST(core, message_ex_rw_pairing) to lock in that contract: - write_commit with nothing pending, and a double write_commit, return false - write_next with an uncommitted write pending returns false and zeroes the ptr/size out-params - read_commit with nothing under read, and a double read_commit, return false - read_next with an uncommitted read pending returns false and zeroes the ptr/size out-params Built on Ubuntu 16.04 with -Wall -Werror: Build succeed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rDSN.dist.serviceandrDSN.tools.hpchardening.Validation
rDSN.dist.service82ec2cdrDSN.tools.hpc758c7b5