diff --git a/docs/dev-wiki/README.md b/docs/dev-wiki/README.md new file mode 100644 index 000000000..536991e8a --- /dev/null +++ b/docs/dev-wiki/README.md @@ -0,0 +1,44 @@ +# SIL Kit Developer Wiki + +This wiki is the internal developer-oriented knowledge base for working on the SIL Kit repository. +It focuses on repository structure, engineering workflows, implementation details, and the practical +knowledge needed to make changes safely. + +## What This Wiki Is For + +- helping developers get productive in the repository quickly +- documenting architecture and subsystem boundaries +- capturing build, test, debugging, and release workflows +- recording project-specific conventions that are easy to miss when reading code alone + +## Suggested Reading Path + +1. Read [Repository Layout](./repository-layout.md) to understand the tree and major modules. +2. Read [Build and Test](./build-and-test.md) to get a local development workflow. +3. Read [Core Architecture](./core-architecture.md) and [Networking and Transport](./networking-and-transport.md). +4. Use [Utilities and Processes](./utilities-and-processes.md), [Release and Versioning](./release-and-versioning.md), and [Debugging and Common Failures](./debugging-and-common-failures.md) as needed. + +## Current Pages + +- [Repository Layout](./repository-layout.md) +- [Build and Test](./build-and-test.md) +- [Core Architecture](./core-architecture.md) +- [Networking and Transport](./networking-and-transport.md) +- [Utilities and Processes](./utilities-and-processes.md) +- [Release and Versioning](./release-and-versioning.md) +- [Debugging and Common Failures](./debugging-and-common-failures.md) + +## Proposed Wiki Topics + +Further topic pages are intended to be added here over time as repository knowledge evolves. + +## Scope + +This wiki should prefer practical repository knowledge over end-user documentation.
+It should stand on its own as a developer and AI-facing knowledge base rather than acting as a navigation +layer into the separate docs set. + +## Status + +This is the initial front page. Additional pages can be added incrementally as repository knowledge +is collected and validated. diff --git a/docs/dev-wiki/build-and-test.md b/docs/dev-wiki/build-and-test.md new file mode 100644 index 000000000..ca821b0d6 --- /dev/null +++ b/docs/dev-wiki/build-and-test.md @@ -0,0 +1,237 @@ +# Build and Test + +This page summarizes the practical build and test workflows for local development in this repository. + +## Prerequisites + +At minimum, local development requires: + +- Git +- CMake +- a supported C++ toolchain such as Visual Studio, GCC, or Clang +- initialized submodules + +Initialize submodules after cloning: + +```powershell +git submodule update --init --recursive +``` + +Documentation builds additionally require: + +- Python 3 +- dependencies from `SilKit/ci/docker/docs_requirements.txt` +- `pipenv` +- Doxygen + +## Preferred Local Flow: CMake Presets + +The repository already defines presets in `CMakePresets.json`. +For day-to-day development, presets are the easiest way to stay aligned with the intended build layout. 
+ +Useful configure presets include: + +- `debug` +- `release` +- `relwithdebinfo` +- `distrib` +- `x86-debug` +- `vs141-x64-debug` + +The presets use these default directory conventions: + +- build tree: `_build/` +- install tree: `_install/` + +Typical local debug build: + +```powershell +cmake --preset debug +cmake --build --preset debug +``` + +Typical release build: + +```powershell +cmake --preset release +cmake --build --preset release +``` + +## Important CMake Options + +The root `CMakeLists.txt` defines the main switches developers usually care about: + +- `SILKIT_BUILD_TESTS`: build unit, integration, and functional tests +- `SILKIT_BUILD_UTILITIES`: build utilities such as registry, monitor, and system controller +- `SILKIT_BUILD_DEMOS`: build demo applications +- `SILKIT_BUILD_DOCS`: build Sphinx and Doxygen documentation +- `SILKIT_INSTALL_SOURCE`: install and package the source tree +- `SILKIT_WARNINGS_AS_ERRORS`: treat compiler warnings as errors +- `SILKIT_BUILD_DASHBOARD`: build the dashboard client code +- `SILKIT_BUILD_STATIC`: build SIL Kit as a static library instead of shared + +Most local development can start from the `debug` preset and only override options when needed. + +## Manual Configure Flow + +If you do not want to use presets, a manual configure still works. + +Example: + +```powershell +cmake -S . -B _build/manual-debug -G Ninja -DCMAKE_BUILD_TYPE=Debug +cmake --build _build/manual-debug +``` + +To enable docs explicitly: + +```powershell +cmake -S . 
-B _build/docs -G Ninja -DCMAKE_BUILD_TYPE=Debug -DSILKIT_BUILD_DOCS=ON +cmake --build _build/docs --target Doxygen +``` + +## Build Outputs + +A few output conventions are helpful to know: + +- build trees usually live under `_build/` +- presets install into `_install/` +- test executables are configured to run from the build output directory +- demo binaries are emitted into the common build output directory when built in-tree + +The library project sets a debug postfix of `d`, so debug binaries can sit alongside release binaries. + +## Running Tests + +Testing is enabled globally in the root build with `enable_testing()`. +The library creates several aggregate GoogleTest executables, and CTest registers suites through helper macros in `SilKit/cmake/SilKitTest.cmake`. + +If you configured with a preset that enables tests, run all tests with: + +```powershell +ctest --preset debug --output-on-failure +``` + +Or from a specific build directory: + +```powershell +ctest --test-dir _build/debug --output-on-failure +``` + +Useful variants: + +```powershell +ctest --test-dir _build/debug -R Lin --output-on-failure +ctest --test-dir _build/debug -R PubSub --output-on-failure +ctest --test-dir _build/debug -N +``` + +What these do: + +- `-R ` filters tests by CTest test name +- `-N` lists tests without running them +- `--output-on-failure` prints failing test output immediately + +## How Tests Are Structured + +The main test executables are: + +- `SilKitUnitTests` +- `SilKitIntegrationTests` +- `SilKitInternalIntegrationTests` +- `SilKitFunctionalTests` +- `SilKitInternalFunctionalTests` + +Tests are added through `add_silkit_test_to_executable(...)`. +That helper usually creates one CTest entry per test suite, using a GoogleTest filter. 
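The mapping from suites to CTest entries, and how a name filter then selects among them, can be sketched in plain C++. This is an illustration under assumptions, not repository code: the real registration is done by the CMake helper `add_silkit_test_to_executable(...)`, and `CTestEntry`, `MakeSuiteEntries`, `FilterByRegex`, and the suite names used with them are hypothetical. `--gtest_filter=Suite.*` is GoogleTest's standard filter syntax, and `std::regex_search` approximates CTest's `-R` option, which matches a regular expression anywhere in the test name.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical model of a CTest entry: one per GoogleTest suite.
struct CTestEntry
{
    std::string name;       // CTest test name (here: the suite name)
    std::string executable; // aggregate GoogleTest binary hosting the suite
    std::string filterArg;  // argument restricting the run to this suite
};

// Create one entry per suite, each running the shared binary with a suite filter.
std::vector<CTestEntry> MakeSuiteEntries(const std::string& executable, const std::vector<std::string>& suites)
{
    std::vector<CTestEntry> entries;
    for (const auto& suite : suites)
    {
        entries.push_back({suite, executable, "--gtest_filter=" + suite + ".*"});
    }
    return entries;
}

// Approximate `ctest -R <regex>`: keep entries whose name contains a regex match.
std::vector<CTestEntry> FilterByRegex(const std::vector<CTestEntry>& entries, const std::string& pattern)
{
    const std::regex re{pattern};
    std::vector<CTestEntry> selected;
    for (const auto& entry : entries)
    {
        if (std::regex_search(entry.name, re))
        {
            selected.push_back(entry);
        }
    }
    return selected;
}
```

In this model a pattern such as `Lin` selects every suite whose name contains it, regardless of which aggregate binary hosts that suite.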
+ +Practical implications: + +- CTest test names usually map to suites rather than binary names +- adding a new integration or functional test often means adding a source file in `SilKit/IntegrationTests/` and registering it in the corresponding `CMakeLists.txt` +- if a test seems missing from CTest, check whether it was wired through the helper macro + +## Running A Narrow Slice During Development + +For fast iteration, prefer a narrow configure and a filtered test run. + +Examples: + +- build one preset once, then rebuild incrementally with `cmake --build --preset debug` +- run only matching suites with `ctest --test-dir _build/debug -R --output-on-failure` +- list available tests first with `ctest --test-dir _build/debug -N` + +This is usually faster and more predictable than repeatedly creating fresh build directories. + +## Building Utilities and Demos + +Utilities are included when `SILKIT_BUILD_UTILITIES=ON`. +That includes: + +- `SilKitRegistry` +- `SilKitSystemController` +- `SilKitMonitor` + +Demos are included when `SILKIT_BUILD_DEMOS=ON`. +The `Demos` project contains communication demos, API demos, and benchmark tooling. + +For many feature changes, building both tests and utilities is more useful than building demos. +For changes affecting example flows or protocol behavior, demos may also be worth rebuilding. + +## Building Documentation + +The docs build is optional and uses both Doxygen and Sphinx. + +Install dependencies: + +```powershell +pip3 install -r SilKit/ci/docker/docs_requirements.txt +pip3 install pipenv +``` + +Configure and build docs: + +```powershell +cmake -S . -B _build/docs -G Ninja -DSILKIT_BUILD_DOCS=ON +cmake --build _build/docs --target Doxygen +``` + +The docs target named `Doxygen` actually drives both Doxygen extraction and the Sphinx HTML build. + +## Packaging + +Packaging is handled through CPack. 
+ +Typical packaging command after configuration: + +```powershell +cmake --build _build/distrib --target package +``` + +The `distrib` preset is the closest existing preset to a packaging-oriented build. + +## Practical Recommendations + +For everyday code changes: + +1. Use `cmake --preset debug` once. +2. Rebuild incrementally with `cmake --build --preset debug`. +3. Run only the relevant CTest slice while iterating. +4. Run a broader `ctest --preset debug --output-on-failure` before finishing. + +For documentation changes: + +1. Enable `SILKIT_BUILD_DOCS`. +2. Build the `Doxygen` target. +3. Check the generated Sphinx output and warnings. + +For release-like validation: + +1. Use the `distrib` preset. +2. Build and test from that preset. +3. Run the `package` target if needed. + +## Related Pages + +- [Repository Layout](./repository-layout.md) +- [Developer Wiki Front Page](./README.md) diff --git a/docs/dev-wiki/core-architecture.md b/docs/dev-wiki/core-architecture.md new file mode 100644 index 000000000..82a8cf944 --- /dev/null +++ b/docs/dev-wiki/core-architecture.md @@ -0,0 +1,284 @@ +# Core Architecture + +This page connects the public SIL Kit architecture model with the actual repository structure and runtime entry points. +It is meant for developers changing implementation code, not for end-user API onboarding. 
+ +## Architecture In One View + +At a high level, SIL Kit is built around these ideas: + +- applications create participants through the public API +- participants expose communication and orchestration services +- participants discover each other through a registry +- communication between participants is peer-to-peer once connections are established +- lifecycle and virtual time coordination are distributed across participants rather than centralized in one simulation process + +SIL Kit is designed around these guiding principles: + +- local view for participants +- distributed simulation +- stable API and ABI behavior +- reconfigurability via configuration files + +## Conceptual Building Blocks + +The public conceptual model is made up of the following runtime components: + +- `Registry`: discovery and initial connection brokering +- `Participant`: the local runtime node embedded into an application or utility +- `Services`: the APIs attached to a participant, such as CAN, Ethernet, PubSub, RPC, logging, lifecycle, and time sync +- `System Controller`: defines required participants and influences system-wide lifecycle behavior +- `System Monitor`: observes participant and system state transitions + +## API Shape And The Hourglass + +Versioning documentation describes the public API shape as an hourglass pattern: + +- the `C` API is the versioned compatibility anchor +- the `C++` API in `silkit/...` is a header-only wrapper around the `C` API + +This is important when making changes: + +- changes in public `C` API contracts have the strongest compatibility impact +- changes in the header-only `C++` API can still be user-visible even when the implementation lives deeper in the repository +- internal implementation code should be kept conceptually below the API boundary, even when the C++ API makes the public surface feel direct + +## Main Runtime Flow + +The most important runtime path is: + +1. User code creates or loads a participant configuration. +2. 
User code creates a participant. +3. Internal code validates and sanitizes the configuration. +4. A concrete participant implementation is instantiated with a transport backend. +5. The participant joins the SIL Kit simulation and discovers peers through the registry. +6. Services exchange messages over established connections. + +In the current codebase, the main entry points are: + +- `SilKit/source/CreateParticipantImpl.cpp` +- `SilKit/source/core/participant/CreateParticipantInternal.cpp` +- `SilKit/source/core/participant/CreateParticipantT.hpp` + +The flow is currently straightforward: + +- `CreateParticipantImpl(...)` calls `Core::CreateParticipantInternal(...)` +- `CreateParticipantInternal(...)` instantiates `Participant` +- `CreateParticipantT(...)` runs configuration validation and constructs the participant object +- the created participant then joins the distributed simulation + +This means that participant creation is one of the best entry points when you need to understand how configuration, transport, and service setup come together. + +## Layer Mapping To Repository Code + +The conceptual architecture maps onto the repository roughly like this. + +### Public Surface + +- `SilKit/include/`: public headers and API contracts +- `SilKit/source/capi/`: C API implementation layer + +This is where API and ABI-sensitive work usually begins. 
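The hourglass shape described above can be illustrated with a tiny self-contained example: a stable `extern "C"` surface with an opaque handle and integer return codes, plus a header-only C++ class that adds RAII and exceptions by calling only those C functions. Every `Demo*` name and the `demo::Counter` class are hypothetical; they mimic the layering only, not SIL Kit's actual C or C++ signatures.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// ---- C layer (hypothetical): opaque handle, plain types, return codes ----
extern "C" {
typedef struct DemoCounter DemoCounter; // opaque to C callers
typedef int DemoReturnCode;
enum
{
    DEMO_OK = 0,
    DEMO_BAD_PARAMETER = 1
};

DemoReturnCode DemoCounter_Create(DemoCounter** out);
DemoReturnCode DemoCounter_Destroy(DemoCounter* counter);
DemoReturnCode DemoCounter_Add(DemoCounter* counter, long delta);
DemoReturnCode DemoCounter_Get(const DemoCounter* counter, long* out);
}

// Implementation side of the boundary (would live inside the shared library).
struct DemoCounter
{
    long value = 0;
};

extern "C" DemoReturnCode DemoCounter_Create(DemoCounter** out)
{
    if (out == nullptr) return DEMO_BAD_PARAMETER;
    *out = new DemoCounter{};
    return DEMO_OK;
}

extern "C" DemoReturnCode DemoCounter_Destroy(DemoCounter* counter)
{
    delete counter; // deleting nullptr is a no-op
    return DEMO_OK;
}

extern "C" DemoReturnCode DemoCounter_Add(DemoCounter* counter, long delta)
{
    if (counter == nullptr) return DEMO_BAD_PARAMETER;
    counter->value += delta;
    return DEMO_OK;
}

extern "C" DemoReturnCode DemoCounter_Get(const DemoCounter* counter, long* out)
{
    if (counter == nullptr || out == nullptr) return DEMO_BAD_PARAMETER;
    *out = counter->value;
    return DEMO_OK;
}

// ---- Header-only C++ layer: RAII and exceptions, calls only the C functions ----
namespace demo {
class Counter
{
public:
    Counter() { Check(DemoCounter_Create(&_handle)); }
    ~Counter() { DemoCounter_Destroy(_handle); }
    Counter(const Counter&) = delete;
    Counter& operator=(const Counter&) = delete;

    void Add(long delta) { Check(DemoCounter_Add(_handle, delta)); }

    long Value() const
    {
        long value = 0;
        Check(DemoCounter_Get(_handle, &value));
        return value;
    }

private:
    // Translate C return codes into C++ exceptions.
    static void Check(DemoReturnCode rc)
    {
        if (rc != DEMO_OK) throw std::runtime_error("C API call failed with code " + std::to_string(rc));
    }

    DemoCounter* _handle = nullptr;
};
} // namespace demo
```

Only the C functions form the binary contract in this model. Changing the wrapper affects users when they recompile, while changing a C signature or the meaning of a return code also breaks binaries that were already built, which is why C API contract changes carry the strongest compatibility impact.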
+ +### Participant Construction And Runtime Shell + +- `SilKit/source/core/participant/` + +This area is responsible for: + +- participant creation +- participant configuration validation and sanitization +- participant object construction +- wiring the participant to the chosen transport backend + +Important files include: + +- `Participant.hpp` +- `Participant.cpp` +- `ValidateAndSanitizeConfig.*` +- `CreateParticipantInternal.*` +- `CreateParticipantT.*` + +### Core Runtime Infrastructure + +- `SilKit/source/core/internal/` +- `SilKit/source/core/service/` +- `SilKit/source/core/requests/` +- `SilKit/source/core/vasio/` + +These areas cover the non-user-facing runtime machinery. + +Broadly: + +- `core/service/` handles service discovery and related serialization +- `core/requests/` handles request/reply style internals +- `core/vasio/` contains the transport implementation and registry runtime based on the VAsio backend +- `core/internal/` contains deeper internals shared by the participant runtime + +### Services + +- `SilKit/source/services/can/` +- `SilKit/source/services/ethernet/` +- `SilKit/source/services/flexray/` +- `SilKit/source/services/lin/` +- `SilKit/source/services/pubsub/` +- `SilKit/source/services/rpc/` +- `SilKit/source/services/logging/` +- `SilKit/source/services/metrics/` +- `SilKit/source/services/orchestration/` + +These directories implement the participant-visible services. +If a change affects semantics at the controller or service level, this is often the first place to inspect. + +### Experimental And Extension Areas + +- `SilKit/source/experimental/` +- `SilKit/source/extensions/` +- `SilKit/source/dashboard/` + +These areas layer additional or less stable capabilities on top of the core runtime. + +## Discovery And Transport + +The registry is mandatory for bringing up a SIL Kit system. +Its conceptual role is discovery and initial connection brokering. 
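The registry's discovery role can be pictured with a toy in-memory model: each participant announces its acceptor URIs, and the registry answers with the participants that announced before it, which the newcomer would then try to connect to. `KnownParticipant` and `ToyRegistry` are hypothetical; the model ignores handshakes, versions, duplicate names, and failures, and it is not how `Core::VAsioRegistry` is written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One announced participant: its name and the acceptor URIs it can be reached on.
struct KnownParticipant
{
    std::string name;
    std::vector<std::string> acceptorUris;
};

// Toy registry: remembers announcements and brokers initial connections.
class ToyRegistry
{
public:
    // Record the newcomer and return everyone who announced earlier.
    std::vector<KnownParticipant> Announce(const KnownParticipant& newcomer)
    {
        std::vector<KnownParticipant> known = _participants;
        _participants.push_back(newcomer);
        return known;
    }

    std::size_t Size() const { return _participants.size(); }

private:
    std::vector<KnownParticipant> _participants; // in announcement order
};
```

Because each newcomer connects to those already present, every pair of participants is covered once in this model, and the registry is not needed on the data path afterwards unless proxy fallback is used.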
+ +In the implementation: + +- registry creation currently goes through `SilKit/source/CreateSilKitRegistryImpl.cpp` +- the concrete registry type is `Core::VAsioRegistry` +- the transport backend lives in `SilKit/source/core/vasio/` + +A key detail of the runtime model is: + +- the registry brokers the initial connections between participants +- participants then communicate through peer-to-peer connections +- if direct transport is not available, the registry can also be used as a proxy path + +When working on connectivity issues, relevant implementation files usually live under: + +- `SilKit/source/core/vasio/VAsioConnection.*` +- `SilKit/source/core/vasio/VAsioRegistry.*` +- `SilKit/source/core/vasio/ConnectPeer.*` +- `SilKit/source/core/vasio/ConnectKnownParticipants.*` +- `SilKit/source/core/vasio/RemoteConnectionManager.*` + +## Service Discovery + +A major architectural theme in SIL Kit is that participants typically do not need hardcoded knowledge of peer instances. +Communication is resolved through service descriptions, matching network names, and topic names. + +For implementation work, the central discovery-related area is: + +- `SilKit/source/core/service/` + +Important responsibilities there include: + +- service discovery event handling +- discovery filtering and matching +- serialization of service metadata + +This layer is the bridge between transport-level connectivity and service-level communication semantics. + +## Orchestration And Time + +The orchestration subsystem is one of the most important architectural slices because it controls how participants behave as a coherent simulation.
+ +Its implementation is in: + +- `SilKit/source/services/orchestration/` + +That directory contains the core orchestration pieces: + +- lifecycle management and lifecycle state handling +- `LifecycleService` +- `TimeSyncService` +- `SystemController` +- `SystemMonitor` +- system state tracking and supporting serialization/time helpers + +Conceptually: + +- `LifecycleService` controls a participant's local lifecycle behavior +- `TimeSyncService` adds virtual time synchronization for coordinated simulation steps +- `SystemMonitor` observes participant and system-wide state transitions +- `SystemController` defines required participants and can influence system-wide orchestration + +The lifecycle state machine and system state rules matter here mainly as implementation boundaries and code ownership hints. + +## Communication Services + +The communication-service architecture follows a repeated pattern: + +- a participant creates a specific service or controller +- the service announces itself through discovery +- matching peers exchange messages using common service metadata such as network or topic names +- transport and serialization layers carry the actual messages + +This pattern is reused across: + +- bus-oriented services such as CAN, Ethernet, LIN, and FlexRay +- application-level services such as PubSub and RPC + +If a feature spans multiple protocol families, it often means the real architectural concern is below the individual service directories, in discovery, transport, or orchestration. + +## Configuration As An Architectural Boundary + +Configuration is not just convenience glue in SIL Kit. It is part of the architecture. 
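To illustrate why configuration sits on the construction path, the sketch below resolves and validates participant settings in one function before any runtime object would exist. It rests on stated assumptions: the types and `SanitizeForConstruction` are hypothetical, the rule that a value set in the configuration file wins over a caller argument is modeled on a check suggested in Debugging and Common Failures rather than confirmed from the code, and `silkit://localhost:8500` is assumed as the fallback registry URI when neither source provides one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified values read from a configuration file (empty = not set).
struct RawParticipantConfig
{
    std::string participantName;
    std::string registryUri;
};

struct SanitizedConfig
{
    std::string participantName;
    std::string registryUri;
};

// Assumed fallback, used only when neither the file nor the caller provides a URI.
const std::string kDefaultRegistryUri = "silkit://localhost:8500";

// Resolve and validate all values before any runtime object would be constructed.
SanitizedConfig SanitizeForConstruction(const RawParticipantConfig& fromFile, const std::string& nameArg,
                                        const std::string& uriArg)
{
    SanitizedConfig result;
    // Assumption: a value set in the configuration file wins over the caller's argument.
    result.participantName = !fromFile.participantName.empty() ? fromFile.participantName : nameArg;
    result.registryUri = !fromFile.registryUri.empty() ? fromFile.registryUri
                         : !uriArg.empty()             ? uriArg
                                                       : kDefaultRegistryUri;

    if (result.participantName.empty())
    {
        throw std::invalid_argument("participant name must not be empty");
    }
    return result;
}
```

Running without a configuration file corresponds to passing an empty `RawParticipantConfig`.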
+ +Why it matters: + +- participants are expected to run without a config file, but they should allow user-provided configuration +- runtime behavior such as logging, middleware setup, and service wiring can be changed without recompilation +- participant construction validates and sanitizes configuration before the runtime is created + +Relevant implementation areas include: + +- `SilKit/source/config/` +- `SilKit/source/core/participant/ValidateAndSanitizeConfig.*` + +When a change appears to be purely local but has configuration impact, check both the config model and the participant-construction path. + +## Where To Start For Common Change Types + +If you need to change a public API: + +- start in `SilKit/include/` +- inspect `SilKit/source/capi/` +- review compatibility implications in version-related code and changelog metadata + +If you need to change participant bring-up: + +- start in `SilKit/source/CreateParticipantImpl.cpp` +- then inspect `SilKit/source/core/participant/` + +If you need to change connectivity or registry behavior: + +- start in `SilKit/source/core/vasio/` +- inspect transport and utility implementation paths nearby + +If you need to change lifecycle, monitor, or virtual time behavior: + +- start in `SilKit/source/services/orchestration/` +- inspect adjacent orchestration code paths and tests + +If you need to change protocol-specific messaging behavior: + +- start in the relevant `SilKit/source/services//` directory +- then inspect discovery and transport code if the problem spans multiple services + +## Architecture Boundaries To Preserve + +When changing code, these boundaries are especially worth preserving: + +- public API contracts should remain cleanly separated from runtime internals +- service semantics should not accidentally leak transport-specific assumptions upward +- protocol-specific fixes should not duplicate logic that belongs in shared discovery or orchestration layers +- configuration validation should happen before 
runtime behavior depends on it +- lifecycle and time-sync behavior should remain consistent with the documented state model + +## Related Pages + +- [Repository Layout](./repository-layout.md) +- [Build and Test](./build-and-test.md) +- [Developer Wiki Front Page](./README.md) diff --git a/docs/dev-wiki/debugging-and-common-failures.md b/docs/dev-wiki/debugging-and-common-failures.md new file mode 100644 index 000000000..01034234c --- /dev/null +++ b/docs/dev-wiki/debugging-and-common-failures.md @@ -0,0 +1,397 @@ +# Debugging and Common Failures + +This page summarizes the failure modes that are most likely to matter during day-to-day development in the SIL Kit repository. + +It is organized around the layers where problems usually surface: + +- configuration loading +- participant startup +- registry connectivity +- participant-to-participant connectivity +- lifecycle and orchestration +- version and interoperability mismatches +- local build and test workflow + +## General Approach + +When debugging SIL Kit issues, it usually helps to separate the problem into one of four buckets first: + +1. build/configuration problem before the process starts +2. participant startup problem while joining the simulation +3. transport/discovery problem after connecting to the registry +4. lifecycle or service-level problem after communication has started + +Many confusing symptoms come from misclassifying a transport problem as a service problem, or a lifecycle problem as a transport problem. + +## First Questions To Ask + +Before diving into code, answer these quickly: + +1. Did the participant process start and load its configuration successfully? +2. Did it connect to the registry? +3. Did it finish connecting to all known participants? +4. Is the issue only visible in coordinated mode or only with time synchronization enabled? +5. Is the issue reproducible in a narrow local setup with one registry and two participants? 
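The first three questions map directly onto the four buckets from the general approach, which can be written down as a small decision function. `FirstAnswers`, `Bucket`, and `Classify` are hypothetical helpers for illustration; the last two questions (coordinated mode and a narrow reproduction) help within a bucket and are left out of this sketch.

```cpp
#include <cassert>

// Answers to the first questions, gathered from logs before reading code.
struct FirstAnswers
{
    bool configLoaded = false;
    bool registryConnected = false;
    bool allKnownParticipantsConnected = false;
};

// The four buckets from the general approach.
enum class Bucket
{
    BuildOrConfiguration,    // before the process really starts
    ParticipantStartup,      // while joining the simulation
    TransportOrDiscovery,    // after connecting to the registry
    LifecycleOrServiceLevel, // after communication has started
};

// Walk the questions in order; the first "no" decides the bucket.
Bucket Classify(const FirstAnswers& answers)
{
    if (!answers.configLoaded) return Bucket::BuildOrConfiguration;
    if (!answers.registryConnected) return Bucket::ParticipantStartup;
    if (!answers.allKnownParticipantsConnected) return Bucket::TransportOrDiscovery;
    return Bucket::LifecycleOrServiceLevel;
}
```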
+ +Those five questions usually cut the search space down faster than reading long logs linearly. + +## Configuration Failures + +### Typical Symptoms + +- `Error: Failed to load configuration ...` +- `Unknown schema version '...' found in participant configuration!` +- `Unknown schema version '...' found in registry configuration!` +- startup falls back to defaults unexpectedly + +### Common Causes + +- malformed YAML or JSON +- wrong schema version in the configuration file +- confusion between command-line values and configured values +- unexpected includes or include search paths in participant configuration handling +- empty or mismatched participant name assumptions + +### Relevant Code Paths + +- `SilKit/source/config/ParticipantConfigurationFromXImpl.cpp` +- `SilKit/source/core/participant/ValidateAndSanitizeConfig.cpp` +- `Utilities/SilKitRegistry/config/RegistryConfiguration.cpp` + +### Useful Mental Model + +There are two different stages here: + +- deserialization and schema validation +- sanitization and default/override resolution + +If the process fails before creating a participant, the problem is usually in deserialization. +If it starts but behaves differently than expected, the problem is often in sanitization or override precedence. + +### Practical Checks + +- confirm the participant name is not empty +- confirm the registry URI being used is the one you intended +- confirm whether the config file overrides the command-line arguments you passed +- for registry config, confirm the schema version matches the parser expectation + +## Participant Startup Failures + +### Typical Symptoms + +- `JoinSimulation: no acceptors available` +- `Something went wrong: ...` very early in startup +- no service callbacks ever fire + +### What It Usually Means + +The process reached participant construction but failed before or during join. 
+ +The most important startup path is: + +- participant configuration is sanitized +- acceptors are opened +- the registry connection is attempted +- the registry handshake completes +- known participants are contacted + +If startup dies before the registry handshake, look at participant construction and transport setup rather than service logic. + +### Relevant Code Paths + +- `SilKit/source/CreateParticipantImpl.cpp` +- `SilKit/source/core/participant/CreateParticipantInternal.cpp` +- `SilKit/source/core/participant/CreateParticipantT.hpp` +- `SilKit/source/core/vasio/VAsioConnection.cpp` + +### Common Causes + +- no valid local acceptor endpoints could be opened +- invalid local-domain socket assumptions +- network binding problems on the machine +- conflicting participant names leading to handshake failure later in startup + +## Registry Connectivity Failures + +### Typical Symptoms + +- `Failed to connect to SIL Kit Registry` +- repeated messages about connection attempts +- startup never reaches peer connection setup + +### What It Usually Means + +The participant cannot establish the first required connection in the system. +This is a registry reachability problem, not a service problem. 
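Before anything else in this category, it is worth checking that the registry URI the participant uses points somewhere it can actually reach. The helpers below are hypothetical and assume the `silkit://host:port` URI form; the loopback check reflects the fact that `localhost` or `127.x.x.x` only reach a registry on the same machine, container, or VM.

```cpp
#include <cassert>
#include <string>

struct RegistryEndpoint
{
    std::string host;
    int port = 0;
    bool valid = false;
};

// Parse "silkit://host:port" (assumed form); returns valid == false otherwise.
RegistryEndpoint ParseRegistryUri(const std::string& uri)
{
    const std::string scheme = "silkit://";
    RegistryEndpoint ep;
    if (uri.compare(0, scheme.size(), scheme) != 0) return ep;

    const std::string rest = uri.substr(scheme.size());
    const auto colon = rest.rfind(':'); // last colon, so "[::1]:8500" also works
    if (colon == std::string::npos || colon == 0 || colon + 1 == rest.size()) return ep;

    const std::string portText = rest.substr(colon + 1);
    if (portText.size() > 5 || portText.find_first_not_of("0123456789") != std::string::npos) return ep;

    ep.host = rest.substr(0, colon);
    ep.port = std::stoi(portText);
    ep.valid = ep.port > 0 && ep.port <= 65535;
    return ep;
}

// Loopback hosts only reach a registry on the same machine, container, or VM.
bool IsLoopbackHost(const std::string& host)
{
    return host == "localhost" || host == "::1" || host == "[::1]" || host.rfind("127.", 0) == 0;
}
```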
+ +### Relevant Code Paths + +- `SilKit/source/core/vasio/VAsioConnection.cpp` +- `Utilities/SilKitRegistry/Registry.cpp` +- `Utilities/SilKitRegistry/config/RegistryConfiguration.cpp` + +### Common Causes + +- wrong connect URI or listen URI +- hostname not resolvable from the participant host +- registry only reachable through local domain sockets in the current environment +- firewall, container, VM, WSL, NAT, or host-interface mismatch +- registry bound to an interface that remote participants cannot reach + +### Practical Checks + +- verify which registry URI the participant is actually using after sanitization +- verify which URI the registry is actually listening on at runtime +- check whether `localhost` is meaningful in the current topology +- check whether domain sockets are helping or hurting in the current environment + +### Key Diagnostic Distinction + +If the participant never reports a successful registry connection, do not debug controller, PubSub, CAN, or lifecycle behavior yet. +Those layers are downstream of registry reachability. + +## Participant-To-Participant Connectivity Failures + +### Typical Symptoms + +- registry connection succeeds, but startup still fails +- `Failed to connect to known participants: ...` +- `Timeout while waiting for replies from known participants: ...` +- `Timeout during connection setup. The participant was able to connect to the registry, but not to all participants.` +- proxy fallback warnings appear + +### What It Usually Means + +Discovery worked, but the direct peer graph could not be completed. + +This is one of the most common real-world categories of failures. 
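Whether an advertised endpoint is usable depends on where the connecting peer runs. The filter below illustrates that reasoning for a single audience; it assumes `local://` for domain-socket endpoints and `tcp://host:port` otherwise, its rules are simplified, and `Audience` and `UsableEndpoints` are hypothetical rather than the logic in `TransformAcceptorUris.*`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Where the connecting (audience) peer runs relative to the advertising peer.
enum class Audience
{
    SameHost,
    RemoteHost,
};

bool StartsWith(const std::string& s, const std::string& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Keep only endpoints the audience could plausibly connect to (illustrative rules).
std::vector<std::string> UsableEndpoints(const std::vector<std::string>& advertised, Audience audience)
{
    std::vector<std::string> usable;
    for (const auto& uri : advertised)
    {
        const bool isDomainSocket = StartsWith(uri, "local://");
        const bool isLoopbackTcp = StartsWith(uri, "tcp://127.") || StartsWith(uri, "tcp://localhost:");
        const bool isCatchAll = StartsWith(uri, "tcp://0.0.0.0:");

        if (audience == Audience::RemoteHost && (isDomainSocket || isLoopbackTcp))
        {
            continue; // a peer on another host cannot reach these
        }
        if (isCatchAll)
        {
            continue; // treated here as a bind address only, not a connect address
        }
        usable.push_back(uri);
    }
    return usable;
}
```

An empty result for a remote audience is the situation in which a direct connection cannot succeed and the runtime has to move on to the remote connect request or proxy fallback.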
+ +### Relevant Code Paths + +- `SilKit/source/core/vasio/ConnectKnownParticipants.cpp` +- `SilKit/source/core/vasio/ConnectPeer.*` +- `SilKit/source/core/vasio/RemoteConnectionManager.cpp` +- `SilKit/source/core/vasio/TransformAcceptorUris.cpp` +- `SilKit/source/core/vasio/VAsioConnection.cpp` + +### Common Causes + +- advertised endpoints are not reachable from the other host +- loopback or catch-all addresses are wrong for the topology +- domain sockets are advertised in a situation where only TCP is usable +- remote connect request capability is not enabled or not supported by the peer +- proxy fallback capability is disabled locally or remotely + +### How To Think About It + +For each discovered participant, the runtime tries: + +1. direct connect +2. remote connect request +3. registry proxy fallback + +If all three fail, the join fails. + +So the debugging questions are: + +1. Was the direct endpoint actually reachable? +2. Was reverse connection even possible by capability? +3. Was proxy fallback allowed by both sides? + +### Proxy Fallback Warnings + +If the logs say direct connection failed and the registry is being used as a proxy, the system may still function. + +That means: + +- the problem is real +- the setup is degraded rather than fully broken +- latency and overhead will usually be worse + +Treat this as a topology/debugging issue, not as a normal steady-state success case. + +## Duplicate Participant Name And Handshake Failures + +### Typical Symptoms + +- `Timeout during connection handshake with the SIL Kit Registry` +- message suggesting that a participant with the same name may already be connected +- failed participant announcement reply diagnostics + +### What It Usually Means + +Two participants are trying to join the same registry with the same participant name, or there is a version-mixed handshake edge case around that situation. 
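Because duplicate names surface as handshake timeouts, a cheap pre-launch check of the planned participant names can save time. `DuplicateNames` is a hypothetical helper, not part of SIL Kit:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return each participant name that appears more than once, in sorted order.
std::vector<std::string> DuplicateNames(const std::vector<std::string>& names)
{
    std::set<std::string> seen;
    std::set<std::string> duplicates;
    for (const auto& name : names)
    {
        // insert(...).second is false when the name was already present
        if (!seen.insert(name).second)
        {
            duplicates.insert(name);
        }
    }
    return std::vector<std::string>(duplicates.begin(), duplicates.end());
}
```

Such a check only sees the names you planned; a leftover process from an earlier run that is still connected will not appear in the list.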
+ +### Relevant Code Paths + +- `SilKit/source/core/vasio/VAsioConnection.cpp` +- registry-side participant announcement handling + +### Practical Checks + +- confirm every participant name is unique in the running system +- check whether an older crashed process is still connected +- if mixed versions are involved, do not assume the error text is perfectly modern or perfectly specific + +Duplicate-name failures are often misread as generic transport issues because they surface during handshake timeouts. + +## Lifecycle And Orchestration Failures + +### Typical Symptoms + +- coordinated participants never start running +- system controller waits forever for required participants +- simulation enters error state unexpectedly +- stop and abort behavior feels inconsistent +- messages like `This participant is in OperationMode::Coordinated, but is not among the participants that are reported to the system controller as "required".` +- `Required participant names are already set.` +- `Tried to instantiate ... multiple times` + +### What It Usually Means + +Transport may already be healthy. +The problem is now in orchestration policy, participant mode, or service-instantiation semantics. 
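One common cause in this category, a coordinated participant that is not in the required set, can be caught by comparing the intended setup against the required names before starting. `Mode`, `PlannedParticipant`, and `CoordinatedButNotRequired` are hypothetical types that only mirror the autonomous/coordinated distinction used on this page.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Mode
{
    Autonomous,
    Coordinated,
};

struct PlannedParticipant
{
    std::string name;
    Mode mode;
};

// Names of coordinated participants that are missing from the required set.
std::vector<std::string> CoordinatedButNotRequired(const std::vector<PlannedParticipant>& planned,
                                                   const std::set<std::string>& required)
{
    std::vector<std::string> missing;
    for (const auto& participant : planned)
    {
        if (participant.mode == Mode::Coordinated && required.count(participant.name) == 0)
        {
            missing.push_back(participant.name);
        }
    }
    return missing;
}
```

Autonomous participants are skipped on purpose: in this model only coordinated participants depend on being listed as required.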
+ +### Relevant Code Paths + +- `SilKit/source/services/orchestration/` +- `Utilities/SilKitSystemController/SystemController.cpp` +- `Utilities/SilKitMonitor/PassiveSystemMonitor.cpp` + +### Common Causes + +- coordinated participants are not listed as required participants +- more than one actor tries to define workflow configuration +- lifecycle, time sync, or system monitor are instantiated multiple times on the same participant +- shutdown is being attempted from a system state where stop is not sufficient and abort is required + +### Useful Distinctions + +- transport problem: cannot connect to peers +- orchestration problem: can connect, but never reaches the expected lifecycle state + +If the registry and peer graph are healthy, and the system still does not run, move into lifecycle debugging quickly. + +### Practical Checks + +- confirm whether the participant is autonomous or coordinated +- confirm whether coordinated participants are in the required set +- confirm that workflow configuration is only being set once +- confirm that singleton-style services are not being instantiated multiple times + +## Time Synchronization Confusion + +### Typical Symptoms + +- simulation appears frozen in coordinated mode +- simulation step handler never fires +- callbacks fire in an order different from what was expected + +### What It Usually Means + +The participant may have a lifecycle but has not reached the state where synchronized time can advance. + +Common causes include: + +- coordinated participants still waiting for missing required peers +- time sync service not created or not configured as expected +- confusion between communication-ready, ready-to-run, and running phases + +This is usually not a transport problem if peer connectivity is already complete. 
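A deliberately simplified, centralized model shows why a coordinated simulation can look frozen: here virtual time only advances once every required participant is running. SIL Kit's `TimeSyncService` coordinates time in a distributed way, so `ToySimulationClock` is hypothetical and captures only this gating condition.

```cpp
#include <cassert>
#include <chrono>
#include <set>
#include <string>
#include <utility>

// Toy gate: virtual time steps only when all required participants are running.
class ToySimulationClock
{
public:
    ToySimulationClock(std::set<std::string> required, std::chrono::nanoseconds stepSize)
        : _required(std::move(required))
        , _stepSize(stepSize)
    {
    }

    void MarkRunning(const std::string& name) { _running.insert(name); }

    // Try one step; returns true if a simulation step handler would have fired.
    bool TryStep()
    {
        for (const auto& name : _required)
        {
            if (_running.count(name) == 0)
            {
                return false; // still waiting for a required peer: looks "frozen"
            }
        }
        _now += _stepSize;
        return true;
    }

    std::chrono::nanoseconds Now() const { return _now; }

private:
    std::set<std::string> _required;
    std::set<std::string> _running;
    std::chrono::nanoseconds _stepSize;
    std::chrono::nanoseconds _now{0};
};
```

If `TryStep()` keeps returning `false` in this model, the cause lies on the lifecycle side (a missing or not-yet-running required participant), not in the transport.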
+ +## Interoperability And Version Mismatch Problems + +### Typical Symptoms + +- `Network incompatibility between this version range ...` +- participant announcement reply version errors +- misleading handshake diagnostics in mixed-version environments + +### What It Usually Means + +The system crossed a compatibility boundary in API-independent runtime behavior, usually at the protocol or handshake level. + +### Relevant Code Paths + +- `SilKit/source/core/vasio/VAsioConnection.cpp` +- version and registry message handling nearby + +### Common Causes + +- genuinely incompatible protocol behavior +- mixed old/new registry and participant combinations +- duplicate participant names in older-version combinations producing less clear diagnostics + +### Practical Checks + +- confirm whether all participants and utilities are from a compatible version line +- confirm whether the error is a true protocol mismatch or a duplicate-name edge case +- if the failure mentions handshake protocol version, inspect both participant and registry versions together + +## Build And Test Failures During Development + +### Typical Symptoms + +- build config succeeds but tests are missing +- `ctest` runs fewer suites than expected +- a utility or demo binary is missing from the build tree +- docs target is unavailable + +### Common Causes + +- wrong preset or manual configuration flags +- `SILKIT_BUILD_TESTS`, `SILKIT_BUILD_UTILITIES`, `SILKIT_BUILD_DEMOS`, or `SILKIT_BUILD_DOCS` not enabled as expected +- filtering `ctest` too aggressively +- expecting demos or docs in a build configuration that does not include them + +### Practical Checks + +- inspect the chosen preset in `CMakePresets.json` +- inspect the root `CMakeLists.txt` options +- use `ctest -N` to confirm whether tests were registered +- confirm the build tree you are invoking matches the configure preset you think you used + +## Fast Debugging Workflow + +When you do not know where the problem is, this sequence is usually 
efficient: + +1. Confirm configuration loaded successfully. +2. Confirm participant startup reached registry connection. +3. Confirm registry connection succeeded. +4. Confirm all known participant handshakes completed. +5. Confirm the issue is transport, orchestration, or service-level. +6. Only then move into protocol-specific debugging. + +That order avoids spending time in the wrong layer. + +## Useful Files By Failure Type + +For configuration problems: + +- `SilKit/source/config/ParticipantConfigurationFromXImpl.cpp` +- `SilKit/source/core/participant/ValidateAndSanitizeConfig.cpp` + +For startup and connectivity problems: + +- `SilKit/source/core/vasio/VAsioConnection.cpp` +- `SilKit/source/core/vasio/ConnectKnownParticipants.cpp` +- `SilKit/source/core/vasio/RemoteConnectionManager.cpp` +- `SilKit/source/core/vasio/TransformAcceptorUris.cpp` + +For lifecycle and shutdown behavior: + +- `SilKit/source/services/orchestration/` +- `Utilities/SilKitSystemController/SystemController.cpp` + +For monitor-side observation: + +- `Utilities/SilKitMonitor/PassiveSystemMonitor.cpp` + +## Related Pages + +- [Build and Test](./build-and-test.md) +- [Core Architecture](./core-architecture.md) +- [Networking and Transport](./networking-and-transport.md) +- [Utilities and Processes](./utilities-and-processes.md) +- [Release and Versioning](./release-and-versioning.md) +- [Developer Wiki Front Page](./README.md) diff --git a/docs/dev-wiki/networking-and-transport.md b/docs/dev-wiki/networking-and-transport.md new file mode 100644 index 000000000..9967e319e --- /dev/null +++ b/docs/dev-wiki/networking-and-transport.md @@ -0,0 +1,373 @@ +# Networking and Transport + +This page explains how participant discovery, connection setup, and transport fallback work in the current SIL Kit implementation. +It is focused on the runtime internals behind connectivity, not on user-level setup alone. + +## Big Picture + +The transport model follows this sequence: + +1. 
A participant opens local acceptors for incoming connections. +2. The participant connects to the registry. +3. The participant announces its reachable endpoints to the registry. +4. The registry informs participants about each other. +5. Participants try to establish direct peer-to-peer connections. +6. If direct connection fails, SIL Kit may try a remote-connect request. +7. If that still fails, SIL Kit may fall back to routing messages through the registry as a proxy. + +Conceptually, the registry is always required for discovery. +In the normal case, it is not part of the steady-state data path between participants. + +## Core Transport Implementation + +The concrete transport backend in this repository is VAsio. + +The most important implementation area is: + +- `SilKit/source/core/vasio/` + +Key files include: + +- `VAsioConnection.*`: participant-side transport orchestration +- `VAsioRegistry.*`: registry-side transport runtime +- `VAsioPeer.*`: a direct peer connection +- `VAsioProxyPeer.*`: a proxy-backed peer routed through the registry +- `ConnectPeer.*`: low-level connect attempts to a specific peer +- `ConnectKnownParticipants.*`: higher-level coordination of connecting to all discovered peers +- `RemoteConnectionManager.*`: handles remote connection attempts initiated via the registry +- `TransformAcceptorUris.*`: rewrites advertised endpoints for a specific audience participant + +If you are debugging connectivity, this directory is usually more important than any single service implementation directory. + +## What The Registry Actually Does + +The registry has two distinct roles: + +- mandatory discovery and initial connection brokering +- optional proxy fallback when direct participant-to-participant communication is not possible + +That distinction matters. 
+ +Under normal operation: + +- the registry is the first thing a participant connects to +- the registry shares information about already connected participants +- the participants then attempt to connect directly to each other + +This matches the intended runtime model: + +- the registry is a central process for discovery +- communication between participants is peer-to-peer +- if direct transport is unavailable, the registry can be used as a proxy + +The best mental model is: discovery first, direct transport preferred, proxy only as fallback. + +## Participant Join Sequence In Code + +The main participant-side setup lives in `VAsioConnection::JoinSimulation(...)`. + +In the current implementation, it performs these steps in order: + +1. determine the simulation name from the connect URI +2. open participant acceptors +3. connect to the registry and start the IO worker +4. wait for the registry handshake to complete +5. connect to all known participants +6. wait for all participant handshakes to complete + +That flow is a good entry point for debugging startup failures because it separates: + +- local listen setup failures +- registry connection failures +- known-participant handshake failures + +## Acceptor Endpoints + +Participants must be reachable by other participants. +To do that, they open acceptors before connecting to the registry. 
+ +By default, `VAsioConnection` prepares acceptors like this: + +- TCP IPv4 catch-all: `tcp://0.0.0.0:0` +- TCP IPv6 catch-all: `tcp://[::]:0` +- local domain socket path derived from temp-directory state and participant identity + +Important details: + +- `:0` means the operating system chooses a free port +- local domain sockets are enabled only when `EnableDomainSockets` is true +- if all acceptors fail, participant join fails immediately + +Relevant code paths: + +- `PrepareAcceptorEndpointUris(...)` +- `OpenParticipantAcceptors(...)` +- `OpenTcpAcceptors(...)` +- `OpenLocalAcceptors(...)` + +## Middleware Configuration Knobs + +The most relevant participant transport settings are in the middleware configuration section: + +- `RegistryUri` +- `ConnectAttempts` +- `ConnectTimeoutSeconds` +- `EnableDomainSockets` +- `RegistryAsFallbackProxy` +- `AcceptorUris` +- `TcpNoDelay` +- `TcpQuickAck` +- `TcpSendBufferSize` +- `TcpReceiveBufferSize` + +These matter in different ways: + +- `RegistryUri` controls how participants find the registry +- `ConnectAttempts` and `ConnectTimeoutSeconds` shape connection and handshake behavior +- `EnableDomainSockets` changes whether local-domain paths are used at all +- `RegistryAsFallbackProxy` controls whether proxy fallback capability is advertised +- `AcceptorUris` lets developers replace default ephemeral listen endpoints with explicit ones +- the TCP settings tune socket behavior rather than discovery logic + +The most important "special case" setting is `AcceptorUris`, because it can make peer-to-peer connectivity deterministic across firewalls, containers, VMs, or routed networks. + +## Why Endpoint Transformation Exists + +Participants may advertise endpoints that are not directly useful to every other participant. 
+Typical examples: + +- catch-all addresses like `0.0.0.0` +- loopback addresses only useful to local peers +- local-domain socket paths meaningful only on the same host + +`TransformAcceptorUris(...)` exists to adapt the advertised participant endpoints to a specific audience participant. + +What it does at a high level: + +- inspects the source connection path and audience connection path +- rewrites catch-all TCP acceptors into concrete addresses where appropriate +- preserves local-domain endpoints when they make sense +- orders resulting endpoints by preference + +The current ordering policy prefers: + +- local-domain endpoints first +- then loopback or non-local TCP depending on whether the audience is local + +This is an important file when debugging "the registry told me to connect somewhere unusable" type issues. + +## Registry Connection Step + +Before a participant can talk to peers, it must connect to the registry. + +The participant builds a synthetic `VAsioPeerInfo` for the registry using: + +- a local-domain endpoint if domain sockets are enabled +- the configured `silkit://host:port` connect URI converted to a TCP endpoint + +It then tries to connect using `ConnectPeer`. + +If registry connection fails, the current implementation logs: + +- that the registry could not be reached +- which URIs were attempted +- hints about domain sockets, host resolution, and configuration + +The participant then fails fast with a transport-level error. + +## Discovery Of Known Participants + +Once connected to the registry, the participant announces itself. +The registry handshake gives the participant the set of already known peers. + +The next stage is handled by `ConnectKnownParticipants`. 
+ +Its responsibilities are: + +- store the known participant set +- create a peer state tracker for each discovered participant +- start direct connection attempts to all of them +- track progress until either all replies are received or at least one peer fails completely + +The participant-side promises in `VAsioConnection` then translate this into user-visible success or failure of simulation join. + +## Direct Connect, Remote Connect, Proxy Fallback + +For each known participant, the current connection order is: + +1. direct connect +2. remote connect request +3. registry proxy fallback + +### Direct Connect + +`ConnectKnownParticipants::Peer::StartConnecting()` starts a direct connect with `ConnectPeer`. +On success: + +- a `VAsioPeer` is created +- the peer is registered with the connection +- the handshake proceeds while waiting for the participant announcement reply + +This is the preferred and expected path. + +### Remote Connect Request + +If direct connect fails, SIL Kit may request that the remote participant connects back instead. + +This only works when: + +- local configuration enables the capability +- the remote participant also advertises support for it + +In code, this is the `RequestParticipantConnection` capability path. + +If supported: + +- a `RemoteParticipantConnectRequest` is sent through the registry +- a timer is started while waiting for the remote side to act +- a successful remote connection becomes a regular `VAsioPeer` + +This path matters in asymmetric networking setups where one side can initiate but the other side cannot be reached directly from the first side. + +### Registry Proxy Fallback + +If direct connect and remote connect are not available or do not succeed, SIL Kit may fall back to proxy messaging through the registry. + +This only works when: + +- proxy fallback is enabled in local configuration +- the remote participant also advertises proxy capability + +In code, this is the `ProxyMessage` capability path. 
+ +If proxy fallback is possible: + +- a `VAsioProxyPeer` is created +- the registry remains in the message path for that peer relation +- handshake completion can still proceed, but the communication path is now slower and less direct + +This is why the runtime emits warnings when direct connect fails and the registry must proxy messages. + +## Proxy Message Behavior + +Proxy messages are handled explicitly in `VAsioConnection::ReceiveProxyMessage(...)`. + +Important behavior: + +- proxy messages are ignored if the feature is disabled by configuration +- when acting as proxy, SIL Kit forwards the payload to the target peer +- proxy source-to-destination mappings are tracked +- on disconnect, empty proxy messages can be emitted so destinations learn that the proxied source disappeared + +This disconnect propagation is easy to miss and matters when debugging cleanup behavior for proxied peers. + +## Capabilities Drive Fallback Decisions + +Connection fallback is not based only on local preference. It also depends on advertised capabilities. + +Current capability decisions include: + +- `ProxyMessage` if `registryAsFallbackProxy` is enabled +- `RequestParticipantConnection` if remote participant connection is enabled in configuration + +That means many "why didn't it use proxy" or "why didn't it request reverse connection" questions are really capability-advertising questions. + +Start in these places: + +- `MakeCapabilitiesFromConfiguration(...)` +- `TryRemoteConnectRequest(...)` +- `TryProxyConnect(...)` + +## Failure Modes To Recognize + +There are three common classes of failures. 
+ +### Registry Reachability Failure + +Symptoms: + +- participant cannot connect to the registry at all +- logs mention failed attempts to connect to the registry + +Typical causes: + +- wrong `RegistryUri` +- hostname not resolvable from the participant host +- registry bound only to local sockets or wrong interface +- firewall, NAT, VM, Docker, or WSL boundary issues + +### Participant Reachability Failure + +Symptoms: + +- registry connection succeeds +- timeout occurs while waiting for known participants +- logs list specific unreachable participants and their advertised endpoints + +Typical causes: + +- advertised ports not reachable from peer hosts +- domain sockets advertised where only TCP would work +- loopback or catch-all addressing mismatched to the actual network topology + +### Proxy Fallback Warning + +Symptoms: + +- direct connect warning appears +- communication still works, but via registry proxy + +Typical causes: + +- direct peer connectivity blocked by topology or firewall +- remote-connect path unavailable or unsupported +- proxy capability still enabled, allowing degraded operation + +This is a degraded but not necessarily fatal mode. + +## Practical Debugging Order + +When diagnosing transport issues, this order is usually effective: + +1. Verify the participant can reach the registry. +2. Verify the registry is listening on the intended address family and interface. +3. Inspect advertised acceptor URIs for the failing participant. +4. Check whether domain sockets, loopback, or catch-all addresses make sense for the topology. +5. Check whether direct connect, remote connect, or proxy capability should be available. +6. Only after that inspect protocol-specific service code. + +In practice, many “PubSub is broken” or “CAN does not work” issues are actually transport reachability problems below the service layer. 
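Step 5 of this order asks which connection path should be available. The answer follows the per-peer sequence described earlier: direct connect, then a remote-connect request, then the registry proxy. The sketch below models that decision as a pure function. `Capabilities`, `PeerPath`, and `ChoosePeerPath` are hypothetical names, not types from `SilKit/source/core/vasio/`. The only rules encoded are the ones stated on this page: each fallback requires local enablement *and* the remote participant's advertised capability.

```cpp
#include <cassert>

// Hypothetical model of the per-peer connection order; not SIL Kit types.
struct Capabilities
{
    bool requestParticipantConnection = false; // RequestParticipantConnection capability
    bool proxyMessage = false;                 // ProxyMessage capability
};

enum class PeerPath
{
    Direct,
    RemoteConnect,
    RegistryProxy,
    Failed
};

// local:  what this participant enables in its own configuration.
// remote: what the other participant advertised.
// directOk / remoteConnectOk: outcome of the respective connection attempt.
PeerPath ChoosePeerPath(const Capabilities& local, const Capabilities& remote, bool directOk, bool remoteConnectOk)
{
    if (directOk)
    {
        return PeerPath::Direct; // preferred and expected path
    }
    // Each fallback is only usable when both sides support it.
    if (local.requestParticipantConnection && remote.requestParticipantConnection && remoteConnectOk)
    {
        return PeerPath::RemoteConnect;
    }
    if (local.proxyMessage && remote.proxyMessage)
    {
        return PeerPath::RegistryProxy; // degraded mode: expect a warning in the log
    }
    return PeerPath::Failed;
}
```

In this model, a `Failed` result corresponds to the participant reachability failure described above. `RegistryProxy` corresponds to the degraded proxy fallback warning.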
+ +## Where To Start For Specific Changes + +If you need to change participant startup transport behavior: + +- start in `VAsioConnection::JoinSimulation(...)` + +If you need to change how peers are connected: + +- start in `ConnectKnownParticipants.*` +- then inspect `ConnectPeer.*` + +If you need to change remote-connect fallback: + +- start in `RemoteConnectionManager.*` +- then inspect `TryRemoteConnectRequest(...)` + +If you need to change proxy behavior: + +- inspect `TryProxyConnect(...)` +- inspect `VAsioProxyPeer.*` +- inspect `ReceiveProxyMessage(...)` + +If you need to change endpoint rewriting or address selection: + +- inspect `TransformAcceptorUris.*` +- inspect acceptor preparation in `VAsioConnection.*` + +## Related Pages + +- [Core Architecture](./core-architecture.md) +- [Repository Layout](./repository-layout.md) +- [Build and Test](./build-and-test.md) +- [Developer Wiki Front Page](./README.md) diff --git a/docs/dev-wiki/release-and-versioning.md b/docs/dev-wiki/release-and-versioning.md new file mode 100644 index 000000000..fb8123eea --- /dev/null +++ b/docs/dev-wiki/release-and-versioning.md @@ -0,0 +1,256 @@ +# Release and Versioning + +This page documents the versioning policy, compatibility boundaries, changelog structure, and packaging facts that are relevant to developers working in this repository. + +It intentionally does not describe release execution workflow. + +## Scope + +This page focuses on: + +- how SIL Kit versions are classified +- what compatibility promises developers must preserve +- where version information lives in the repository +- how release notes are structured in the tree +- what packaging metadata and package components exist + +This page does not cover: + +- release ownership +- branch or tag procedure +- publishing steps +- approval or handoff workflow + +## Semantic Versioning + +SIL Kit uses semantic versioning with the structure `MAJOR.MINOR.PATCH`.
+ +The intended meaning is: + +- `MAJOR`: incompatible public API changes +- `MINOR`: compatible public API changes, such as additions or deprecations +- `PATCH`: compatible bug fixes + +When classifying a change, the important question is not how large the implementation diff is. +The important question is whether the externally visible compatibility contract changes. + +## Compatibility Commitments + +There are a few compatibility boundaries that should be treated as hard constraints. + +### Non-Experimental Functionality + +Non-experimental functionality should not be removed without prior deprecation. + +That means: + +- removal is not a routine cleanup action +- deprecation must come first +- version impact must be considered before the removal happens + +### ABI Stability + +Exported symbol ABI compatibility must be preserved. + +Practically, this means developers should avoid incompatible changes such as: + +- changing the signature of an exported function in an incompatible way +- modifying exported structures used in function signatures in an incompatible way + +Even small-looking ABI mistakes can create very difficult runtime failures for downstream users. + +### Network Compatibility + +Network protocol compatibility is also part of the contract. + +If the protocol changes incompatibly, older participants must still be able to detect the incompatibility and refuse the connection safely. + +For developers, this means: + +- transport or handshake changes are release-relevant changes +- compatibility must be thought about at the protocol boundary, not only at the API boundary + +## API Surface Categories + +Not all API surfaces in the repository have the same compatibility status. + +### `C` API + +The `C` API is the main compatibility anchor. + +If a change affects the public API contract in a versioning-sensitive way, the `C` API is the first place to assess impact. 
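The hourglass relationship between the `C` API and the header-only `C++` API can be illustrated with a toy example. All names here (`Demo_Counter`, `Demo_Counter_Create`, `Demo_Counter_Add`, `Counter`) are hypothetical and not part of SIL Kit. The sketch only shows how an opaque handle plus plain C functions keeps the exported ABI narrow, while the C++ convenience layer is compiled into the user's own binary.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// ---- Library side: a narrow C ABI (normally compiled into the shared library) ----
extern "C" {

struct Demo_Counter // opaque to callers; only the library relies on this layout
{
    int64_t value;
};

// Plain return codes instead of exceptions keep the boundary ABI-friendly.
int32_t Demo_Counter_Create(Demo_Counter** outCounter)
{
    if (outCounter == nullptr)
        return 1;
    *outCounter = new Demo_Counter{0};
    return 0;
}

int32_t Demo_Counter_Add(Demo_Counter* counter, int64_t delta, int64_t* outValue)
{
    if (counter == nullptr || outValue == nullptr)
        return 1;
    counter->value += delta;
    *outValue = counter->value;
    return 0;
}

void Demo_Counter_Destroy(Demo_Counter* counter)
{
    delete counter;
}

} // extern "C"

// ---- Header-only side: a C++ wrapper compiled into the user's binary ----
class Counter
{
public:
    Counter()
    {
        if (Demo_Counter_Create(&_handle) != 0)
            throw std::runtime_error{"Demo_Counter_Create failed"};
    }
    ~Counter() { Demo_Counter_Destroy(_handle); }
    Counter(const Counter&) = delete;
    Counter& operator=(const Counter&) = delete;

    // Translates C return codes back into C++ exceptions.
    int64_t Add(int64_t delta)
    {
        int64_t value = 0;
        if (Demo_Counter_Add(_handle, delta, &value) != 0)
            throw std::runtime_error{"Demo_Counter_Add failed"};
        return value;
    }

private:
    Demo_Counter* _handle = nullptr;
};
```

Callers only ever hold a `Demo_Counter*`, so the library can change the struct's internal layout freely. What must stay stable are the C function signatures and the layout of any structures exposed through them.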
+ +### Header-Only `C++` API + +The `C++` API in the public headers is a wrapper around the `C` API and follows an hourglass-style design. + +It should be treated as public and user-visible, but it does not carry the same stated inter-version compatibility guarantees as the `C` API. + +That does not make it safe to change casually. +It means developers should distinguish between: + +- strict versioning commitments +- broader user-facing behavior and source-compatibility expectations + +### Experimental API + +Experimental API is outside the normal compatibility promise. + +Important caveats: + +- experimental functions or types may be removed without a major version bump +- experimental signatures should still not be changed in a way that silently breaks ABI + +This means experimental code has more freedom than the stable API, but not unlimited freedom. + +## Where Version Information Lives + +The main version metadata is configured in: + +- `SilKit/cmake/SilKitVersion.cmake` + +The important variables there are: + +- `SILKIT_VERSION_MAJOR` +- `SILKIT_VERSION_MINOR` +- `SILKIT_VERSION_PATCH` +- `SILKIT_BUILD_NUMBER` +- `SILKIT_VERSION_SUFFIX` + +In practice: + +- major, minor, and patch define the visible version line +- build number is separate metadata used by builds and packaging +- suffix support exists for annotated version strings + +Developers touching version-sensitive behavior should inspect this file before assuming how version strings are assembled. + +## Changelog Layout + +Release notes are maintained as markdown files under: + +- `docs/changelog/versions/` + +The important files and conventions are: + +- `latest.md`: notes for the current unreleased line +- one `.md` file per released version: the notes for that past release +- `template.md`: structure template for a version note + +The heading convention supports unreleased notes by using `UNRELEASED` in the heading instead of a date.
+ +From a repository-maintenance perspective, this means: + +- unreleased changes should accumulate in `latest.md` +- released notes should follow the established markdown structure +- changelog shape is part of the repo contract even though it is not runtime code + +## Packaging Facts + +Packaging is configured in the root `CMakeLists.txt` through CPack. + +### Package Metadata + +The configured package metadata includes: + +- package name +- package version +- package vendor +- package contact +- generated package file name assembled from version and platform metadata + +The resulting package naming uses inputs such as: + +- version +- platform +- architecture +- compiler +- build type information + +### Package Components + +The configured components include: + +- `bin`: binaries +- `dev`: headers and development artifacts +- `utils`: utility tools +- `docs`: documentation, when enabled +- `source`: source package content, when enabled + +Not all components are always present. +Some depend on build configuration. + +### Build Options That Affect Packaging + +Two options matter especially for package contents: + +- `SILKIT_BUILD_DOCS` +- `SILKIT_INSTALL_SOURCE` + +In practical terms: + +- enabling docs allows documentation artifacts to be packaged +- enabling source install allows source-tree packaging content to be included + +### Packaging-Oriented Preset + +`CMakePresets.json` contains a `distrib` preset that is the most packaging-oriented preset currently defined in the repository. + +It is useful for understanding which options are considered important for a release-like build configuration, even when no release procedure is being described here. + +## Change Impact Guide + +When deciding whether a change is patch-, minor-, or major-relevant, classify it by externally visible impact. 
+ +### Usually Patch-Level + +Changes like these are usually patch-level: + +- fixing implementation bugs without changing public API contracts +- performance or robustness improvements that preserve compatibility +- internal refactoring that does not change public behavior in an incompatible way + +### Usually Minor-Level + +Changes like these are usually minor-level: + +- adding new compatible public functionality +- deprecating existing public functionality while keeping it available +- extending behavior in a backward-compatible way + +### Usually Major-Level + +Changes like these are usually major-level: + +- removing stable public functionality +- changing stable public API behavior incompatibly +- breaking compatibility expectations for downstream users in a way they must actively adapt to + +### Experimental Changes + +Experimental API changes should be classified separately from the stable API promise. + +Even when a major bump is not required for experimental removals, developers should still evaluate: + +- whether the change is user-visible +- whether the change breaks ABI silently +- whether transport or runtime compatibility is affected + +## Developer Heuristics + +Before merging a change with versioning implications, it is usually worth asking: + +1. Does this affect the stable public API surface? +2. Does this affect exported ABI? +3. Does this affect participant interoperability or transport handshake behavior? +4. Does this require changelog visibility? +5. Does this change what should appear in packaged outputs? + +If the answer to any of these is yes, the change belongs in versioning-aware review rather than being treated as a normal internal refactor. 
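The patch, minor, and major heuristics above can be expressed as a small classification helper. This assumes each change has already been categorized by hand. The `Change` and `Bump` enums are hypothetical. Experimental API changes are deliberately left out because, as noted above, they fall outside the stable versioning promise.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical categories mirroring the heuristics above; not repository code.
enum class Change
{
    InternalRefactor,   // no externally visible effect
    BugFix,             // compatible fix of existing behavior
    AddStableApi,       // new, backward-compatible public functionality
    DeprecateStableApi, // marked deprecated but still available
    RemoveStableApi,    // removal of stable public functionality
    BreakStableApi      // incompatible change to stable API behavior
};

// Ordered so that a larger value means a bigger version bump.
enum class Bump
{
    None,
    Patch,
    Minor,
    Major
};

Bump Classify(Change change)
{
    switch (change)
    {
    case Change::InternalRefactor:
    case Change::BugFix:
        return Bump::Patch;
    case Change::AddStableApi:
    case Change::DeprecateStableApi:
        return Bump::Minor;
    case Change::RemoveStableApi:
    case Change::BreakStableApi:
        return Bump::Major;
    }
    return Bump::Major; // unreachable for valid input; err on the safe side
}

// The release line needs the largest bump required by any single change.
Bump RequiredBump(const std::vector<Change>& changes)
{
    Bump result = Bump::None;
    for (Change change : changes)
    {
        result = std::max(result, Classify(change));
    }
    return result;
}
```

Taking the maximum across all changes reflects that a single breaking change determines the bump for the whole release.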
+ +## Related Pages + +- [Build and Test](./build-and-test.md) +- [Core Architecture](./core-architecture.md) +- [Utilities and Processes](./utilities-and-processes.md) +- [Developer Wiki Front Page](./README.md) diff --git a/docs/dev-wiki/repository-layout.md b/docs/dev-wiki/repository-layout.md new file mode 100644 index 000000000..4af0572d6 --- /dev/null +++ b/docs/dev-wiki/repository-layout.md @@ -0,0 +1,159 @@ +# Repository Layout + +This page explains how the SIL Kit repository is organized and where to look when making changes. +It is intended as a practical map for developers, not as an exhaustive description of every target. + +## Top-Level Overview + +The main top-level directories are: + +- `SilKit/`: the library itself, including public headers, implementation, and tests +- `Utilities/`: standalone tools such as the registry, monitor, and system controller +- `Demos/`: example applications and demo programs +- `docs/`: Sphinx and Doxygen documentation sources +- `ThirdParty/`: vendored dependencies managed by the build +- `cmake/`: shared top-level CMake helper modules and packaging files +- `.github/`: CI workflows and GitHub project metadata + +Common generated directories that should usually be ignored during development: + +- `_build/`, `_build*/`: local build trees +- `_install/`, `install*/`: install trees + +## Entry Points + +The main build entry point is the repository root `CMakeLists.txt`. + +That file: + +- declares the top-level build options such as `SILKIT_BUILD_TESTS` and `SILKIT_BUILD_DOCS` +- enables testing globally with `enable_testing()` +- includes shared CMake helpers from `cmake/` and `SilKit/cmake/` +- adds the main subprojects: `SilKit/`, `Utilities/`, `Demos/`, and optionally `docs/` + +`CMakePresets.json` provides the default local and CI-oriented configure, build, and test presets. + +## The `SilKit/` Directory + +`SilKit/` contains the actual product library and most of the code developers will touch. 
+ +Important subdirectories: + +- `SilKit/include/`: public headers installed for consumers of the library +- `SilKit/source/`: implementation code and most internal modules +- `SilKit/IntegrationTests/`: integration and functional tests +- `SilKit/cmake/`: library-specific CMake modules, toolchains, and test helpers +- `SilKit/ci/`: CI scripts and packaging helpers + +### `SilKit/source/` + +The main implementation is split into several areas: + +- `util/`: shared helper code +- `config/`: configuration loading and related support code +- `tracing/`: tracing support +- `core/`: core runtime internals, participant internals, request/reply, and transport-related code +- `services/`: protocol and service implementations such as CAN, Ethernet, LIN, FlexRay, PubSub, RPC, logging, metrics, and orchestration +- `extensions/`: extension-related implementation +- `capi/`: C API implementation layer +- `experimental/`: experimental features, including network simulator internals +- `dashboard/`: dashboard-related client and service code + +The source tree builds up many object libraries which are then linked into the final `SilKit` library. +That means changes in a small submodule often affect the final shared library without introducing a new standalone binary. + +### Public vs. Internal Code + +As a rule of thumb: + +- start in `SilKit/include/` when you need to understand the public API shape +- start in `SilKit/source/` when you need to change behavior or internals +- check `SilKit/source/capi/` when a change must preserve or extend C API behavior + +The repository also distinguishes between public and internal test coverage through separate test executables. + +## Tests + +Tests are centered under `SilKit/IntegrationTests/` and are wired up through `SilKit/cmake/SilKitTest.cmake`. 
+ +There are several aggregate test executables: + +- `SilKitUnitTests` +- `SilKitIntegrationTests` +- `SilKitInternalIntegrationTests` +- `SilKitFunctionalTests` +- `SilKitInternalFunctionalTests` + +CTest entries are registered per test suite, not just per executable. +The helper macro extracts suite names and creates one CTest test per suite, passing a `--gtest_filter` pattern that selects only that suite (the suite name followed by `.*`). + +Practical consequence: + +- `ctest` output is usually more granular than the number of binaries alone suggests +- if you add a new `*Test_*.cpp` file through the helper macro, its test suite will usually appear as a separate CTest entry + +## The `Utilities/` Directory + +`Utilities/` contains standalone tools built on top of the library: + +- `SilKitRegistry/` +- `SilKitSystemController/` +- `SilKitMonitor/` + +These are included only when `SILKIT_BUILD_UTILITIES=ON`. +If you are changing startup flows, participant discovery, orchestration, or developer tooling, this directory is often relevant. + +## The `Demos/` Directory + +`Demos/` contains example applications and sample integrations. + +The demos serve two purposes: + +- examples for users of the library +- a source of integration-style validation for common flows + +Notable areas include: + +- `Demos/communication/`: protocol-oriented demos such as CAN, Ethernet, FlexRay, LIN, PubSub, and RPC +- `Demos/api/`: API-oriented examples +- `Demos/tools/Benchmark/`: benchmarking-related demo tooling + +Some integration tests in `SilKit/IntegrationTests/` are based on demo applications, so changes in demos can affect test behavior. + +## The `docs/` Directory + +`docs/` contains the documentation sources used by the Sphinx and Doxygen build. + +Key areas: + +- `docs/dev-wiki/`: this developer wiki +- `docs/_static/` and `docs/_templates/`: documentation assets and templates + +The repository's `docs/` tree contains the separately maintained documentation set, while `docs/dev-wiki/` +acts as the developer-focused knowledge base for repository work.
+ +The documentation build is enabled only when `SILKIT_BUILD_DOCS=ON`. + +## The `ThirdParty/` Directory + +`ThirdParty/` holds vendored dependencies used by the build. + +Treat this directory carefully: + +- avoid editing vendored code unless there is a concrete reason +- prefer changes in SIL Kit integration code over patching third-party sources +- if a change must touch `ThirdParty/`, document the reason clearly + +## How To Navigate Changes + +When working on a change, this starting point usually works well: + +1. Find the public API or executable entry point involved. +2. Trace from the relevant `CMakeLists.txt` into the concrete source directory. +3. Identify the owning implementation area under `SilKit/source/`, `Utilities/`, or `Demos/`. +4. Check nearby tests and existing docs before editing. + +## Related Pages + +- [Build and Test](./build-and-test.md) +- [Developer Wiki Front Page](./README.md) diff --git a/docs/dev-wiki/utilities-and-processes.md b/docs/dev-wiki/utilities-and-processes.md new file mode 100644 index 000000000..79f8faa66 --- /dev/null +++ b/docs/dev-wiki/utilities-and-processes.md @@ -0,0 +1,325 @@ +# Utilities and Processes + +This page explains the built-in SIL Kit utility executables, what role each one plays in a running system, and how their implementations relate back to the library runtime. + +## Overview + +The repository ships three main utility processes under `Utilities/`: + +- `sil-kit-registry` +- `sil-kit-system-controller` +- `sil-kit-monitor` + +They are all built when `SILKIT_BUILD_UTILITIES=ON`. + +At a high level: + +- the registry is required for discovery +- the system controller is used for coordinated simulation startup and system-wide control +- the monitor is optional and used to observe participant and system state + +One important distinction is that the registry is mandatory, while the other two utilities are reference applications built on top of the library. 
+ +## Build Structure + +The top-level utility aggregation is simple: + +- `Utilities/CMakeLists.txt` adds `SilKitMonitor` +- `Utilities/CMakeLists.txt` adds `SilKitRegistry` +- `Utilities/CMakeLists.txt` adds `SilKitSystemController` + +The three executables are not all built the same way. + +### `sil-kit-registry` + +The registry executable links mostly against internal/static library pieces: + +- `I_SilKit` +- `S_SilKitImpl` +- `O_SilKitRegistry_Config` +- `O_SilKit_Dashboard` + +This reflects that the registry is not just a regular participant application. It hosts the concrete registry runtime directly. + +### `sil-kit-system-controller` + +The system controller links as an application on top of the main library: + +- `SilKit` +- `I_SilKit` +- `O_SilKit_Util_SignalHandler` + +This utility behaves much more like a normal SIL Kit participant. + +### `sil-kit-monitor` + +The monitor is also a participant-style executable and links similarly: + +- `SilKit` +- `I_SilKit` +- `O_SilKit_Util_SignalHandler` + +This is a useful rule of thumb: + +- registry: infrastructure process with deeper internal linkage +- controller and monitor: participant-based tools built on public library behavior + +## `sil-kit-registry` + +### Purpose + +The registry is the mandatory process used for participant discovery. +Participants must be able to connect to it in order to join a SIL Kit system. + +Conceptually, the registry: + +- listens for participant connections +- establishes the initial discovery graph +- provides already-known participant information to newly joining participants +- may also act as a proxy fallback for participant-to-participant communication when direct connectivity is not possible + +The normal steady-state goal is still peer-to-peer participant communication. 
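The registry's discovery role can be sketched as an in-memory data flow. `DiscoveryRegistry` and `KnownParticipant` are hypothetical and much simpler than the real `VAsioRegistry`. Rejecting a duplicate participant name is an assumption of this sketch. The model shows only the shape of the exchange: a joining participant receives the already-known participants and their advertised acceptor URIs, then connects to them directly rather than through the registry.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, in-memory model of the registry's discovery role.
// It is not the VAsioRegistry implementation and performs no networking.
struct KnownParticipant
{
    std::string name;
    std::vector<std::string> acceptorUris; // endpoints the participant advertised
};

class DiscoveryRegistry
{
public:
    // Registers a participant and returns everyone who joined before it.
    // Returns std::nullopt if the name is taken (an assumption of this sketch).
    std::optional<std::vector<KnownParticipant>> Join(const std::string& name, std::vector<std::string> acceptorUris)
    {
        if (_participants.count(name) != 0)
        {
            return std::nullopt;
        }

        std::vector<KnownParticipant> alreadyKnown;
        for (const auto& [knownName, uris] : _participants)
        {
            alreadyKnown.push_back(KnownParticipant{knownName, uris});
        }

        _participants.emplace(name, std::move(acceptorUris));
        return alreadyKnown;
    }

    // Removes a participant so that later joiners no longer learn about it.
    void Leave(const std::string& name)
    {
        _participants.erase(name);
    }

private:
    std::map<std::string, std::vector<std::string>> _participants; // name -> advertised URIs
};
```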
+
+### Source Location
+
+- `Utilities/SilKitRegistry/`
+
+Important files:
+
+- `Registry.cpp`
+- `Registry.hpp`
+- `config/RegistryConfiguration.*`
+- `WindowsServiceMain.*`
+- `ExampleRegistryConfiguration.yaml`
+
+### Implementation Shape
+
+`Registry.cpp` handles:
+
+- command-line parsing
+- optional registry configuration file loading
+- logging setup
+- optional dashboard hookup
+- configuration sanitization
+- creation of the concrete registry object
+- startup via `StartListening(...)`
+- optional generated configuration-file output
+- signal-driven shutdown handling
+
+The concrete runtime object is created through the internal registry creation path described in [Core Architecture](./core-architecture.md).
+
+### Configuration Behavior
+
+The registry accepts both command-line parameters and a registry configuration file.
+The code explicitly supports overriding command-line defaults from the configuration file.
+
+Important behaviors worth remembering:
+
+- the effective listen URI may be changed by sanitization and startup
+- if port `0` is requested, the port actually chosen at runtime can be written back into a generated config file
+- the registry can be started with dashboard integration
+- when running as a Windows service, the registry disables domain sockets in its configuration
+
+This utility is the right place to look when the question is "why is the registry actually listening here?" rather than "why can't participants connect?"
+
+### Windows Service Support
+
+The registry has platform-specific support for running as a Windows service.
+
+That support lives in:
+
+- `WindowsServiceMain.cpp`
+- `WindowsServiceMain.hpp`
+
+This is specific to the registry utility, not a general property of all utilities.
+
+## `sil-kit-system-controller`
+
+### Purpose
+
+The system controller is the process that defines which participants are required for a coordinated simulation and can issue system-wide control actions such as stop or abort.
+
+Conceptually, it exists to answer these questions:
+
+- which participants are required before the coordinated simulation can start?
+- when should the workflow configuration be published?
+- how should stop or abort be triggered from a central control point?
+
+### Source Location
+
+- `Utilities/SilKitSystemController/`
+
+Primary file:
+
+- `SystemController.cpp`
+
+### Implementation Shape
+
+The implementation is a participant-based application.
+
+In broad strokes, it does the following:
+
+1. creates a SIL Kit participant
+2. creates an experimental system controller service from that participant
+3. creates a system monitor
+4. observes participant connect and disconnect events
+5. waits until all required participants are connected
+6. publishes the workflow configuration, including the required participant set
+7. starts a coordinated lifecycle for the controller participant itself
+8. handles stop or abort logic on user signal or system-state changes
+
+The core controller class in the file wires together:
+
+- `CreateSystemController(...)`
+- `CreateSystemMonitor()`
+- `CreateLifecycleService(...)`
+
+This means the executable is a good example of how orchestration services compose in a real participant.
+
+### Why It Uses The Monitor
+
+The system controller is not implemented as a blind command sender.
+It also observes the system through `ISystemMonitor` callbacks.
+
+That is how it:
+
+- tracks which required participants are already connected
+- logs which required participants are still missing
+- reacts to stopping or error system states
+
+This design is useful to remember when changing orchestration behavior: system control and system observation are intentionally coupled in this utility.
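The required-participant bookkeeping that the controller derives from its monitor callbacks can be sketched as a small tracker. This is an illustrative model, not code from `SystemController.cpp`: `RequiredParticipantTracker` and its methods are hypothetical names, and in the real utility the inputs would come from the participant connect and disconnect handlers of `ISystemMonitor`. The first time `OnConnected` returns `true` corresponds to step 5 of the implementation outline, after which the workflow configuration would be published.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch, not code from SystemController.cpp: tracks which required
// participants are connected, as the controller does via its monitor callbacks.
class RequiredParticipantTracker
{
public:
    explicit RequiredParticipantTracker(const std::vector<std::string>& required)
        : _required(required.begin(), required.end())
    {
        // A coordinated start without any required participant is not modeled here.
        assert(!_required.empty());
    }

    // Feed from a "participant connected" callback. Returns true once every
    // required participant is connected, i.e. when the workflow configuration
    // could be published. Non-required participants are ignored.
    bool OnConnected(const std::string& name)
    {
        if (_required.count(name) != 0)
        {
            _connected.insert(name);
        }
        return AllConnected();
    }

    // Feed from a "participant disconnected" callback.
    void OnDisconnected(const std::string& name)
    {
        _connected.erase(name);
    }

    bool AllConnected() const
    {
        return _connected.size() == _required.size();
    }

    // Required participants that are still missing, e.g. for logging.
    std::vector<std::string> Remaining() const
    {
        std::vector<std::string> remaining;
        for (const auto& name : _required)
        {
            if (_connected.count(name) == 0)
            {
                remaining.push_back(name);
            }
        }
        return remaining;
    }

private:
    std::set<std::string> _required;
    std::set<std::string> _connected;
};
```

In this model, participants outside the required set, such as a monitor, are ignored, so they can come and go without affecting whether the start condition is met.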
+
+### Stop And Abort Semantics
+
+The controller handles shutdown pragmatically:
+
+- if the system is running or paused, it issues `Stop(...)`
+- if the system is already aborting, it only logs that situation
+- otherwise it attempts `AbortSimulation()`
+
+It also retries while waiting for final shutdown and escalates from stop to abort if needed.
+
+This is implementation logic in the utility, not the only possible orchestration policy.
+When questions about product behavior arise, keep these two apart:
+
+- library-level orchestration semantics
+- policy choices made by this specific reference executable
+
+## `sil-kit-monitor`
+
+### Purpose
+
+The monitor is an observer utility for participant and system state.
+It is optional and can be started or restarted independently.
+
+Conceptually, it is the simplest of the three tools:
+
+- connect to the registry
+- observe who is present
+- observe participant state transitions
+- observe overall system state transitions
+- optionally run with lifecycle and time-sync behavior of its own
+
+### Source Location
+
+- `Utilities/SilKitMonitor/`
+
+Primary file:
+
+- `PassiveSystemMonitor.cpp`
+
+### Implementation Shape
+
+Like the system controller, the monitor is a participant-style executable.
+
+Its base behavior is:
+
+1. load or synthesize a participant configuration
+2. create a participant
+3. create a system monitor from that participant
+4. register logging callbacks for connection, participant status, and system state changes
+5. optionally create a lifecycle service and time sync service, depending on the command-line options
+6. wait for signal-based shutdown
+
+The code supports three levels of participation:
+
+- purely observational, without owning a lifecycle
+- joining with an autonomous or coordinated lifecycle
+- additionally participating in virtual time when `--sync` is selected
+
+That makes it both a monitoring tool and a compact example program for orchestration APIs.
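The monitor's levels of participation can be summarized as a mapping from options to the services a monitor-like participant creates. The sketch below is hypothetical: `MonitorOptions`, `LifecycleMode`, and `ServicesToCreate` are illustrative names rather than the utility's actual types or flags (only `--sync` is mentioned in the text), and services are represented by name strings instead of real SIL Kit objects. It does rely on one property of the SIL Kit API: a time sync service is created from a lifecycle service, so virtual-time participation implies having a lifecycle. Rejecting sync without a lifecycle is a choice of this sketch, not necessarily what `PassiveSystemMonitor.cpp` does.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option model for the monitor's modes. The names are illustrative
// and are not the utility's real command-line flags or types.
enum class LifecycleMode
{
    None,        // purely observational, no lifecycle of its own
    Autonomous,  // joins with an autonomous lifecycle
    Coordinated, // joins with a coordinated lifecycle
};

struct MonitorOptions
{
    LifecycleMode lifecycle = LifecycleMode::None;
    bool sync = false; // stands in for the --sync mode described above
};

// Returns, in creation order, the services a monitor-like participant would
// create for the given options. Services are represented by name only.
std::vector<std::string> ServicesToCreate(const MonitorOptions& options)
{
    std::vector<std::string> services{"SystemMonitor"}; // created in every mode

    if (options.lifecycle != LifecycleMode::None)
    {
        services.push_back("LifecycleService");
    }

    if (options.sync)
    {
        // In SIL Kit, the time sync service is created from a lifecycle service,
        // so this sketch rejects virtual time without a lifecycle. The real
        // utility may resolve this combination differently.
        if (options.lifecycle == LifecycleMode::None)
        {
            throw std::invalid_argument{"virtual time requires a lifecycle"};
        }
        services.push_back("TimeSyncService");
    }

    return services;
}
```

Keeping the system monitor unconditional reflects the monitor's main purpose of observation; the lifecycle and time sync services are optional layers on top of it.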
+
+### Why It Is Described As Passive
+
+The documentation calls the monitor a passive participant because its main purpose is observation rather than control.
+In practice, the code still creates a real participant and can optionally run its own lifecycle.
+
+So "passive" here should be read as operational intent, not as "not a participant".
+
+## How The Three Utilities Relate
+
+These utilities form a useful mental stack:
+
+- `sil-kit-registry`: transport/discovery infrastructure
+- `sil-kit-system-controller`: orchestration policy and system-wide control
+- `sil-kit-monitor`: orchestration/system-state observation
+
+Put differently:
+
+- registry answers "how do participants find each other?"
+- system controller answers "when is the coordinated system allowed to run and how is it stopped?"
+- monitor answers "what is the system doing right now?"
+
+Only the registry is mandatory for basic operation.
+The other two are convenience and reference implementations built on top of the same underlying services available to any SIL Kit participant.
+
+## Where Logic Lives: Utility vs Library
+
+When working on these tools, it helps to distinguish utility logic from reusable library behavior.
+
+Mostly library behavior:
+
+- registry transport runtime
+- participant creation
+- lifecycle semantics
+- system controller service semantics
+- system monitor service semantics
+
+Mostly utility behavior:
+
+- command-line parsing
+- logging and console output choices
+- exact stop/abort policy used by the controller executable
+- generated config-file handling in the registry executable
+- how the monitor formats and emits observations
+
+If you are fixing a bug, ask first whether it belongs in:
+
+- the reusable runtime or service implementation, or
+- only the executable's policy and presentation layer
+
+## When To Read Which Utility
+
+Read `SilKitRegistry/` when:
+
+- participants cannot discover each other
+- listen URI or generated configuration behavior looks wrong
+- dashboard hookup or Windows service behavior is involved
+
+Read `SilKitSystemController/` when:
+
+- coordinated startup does not begin as expected
+- required participant handling is wrong
+- stop or abort interactions are surprising
+
+Read `SilKitMonitor/` when:
+
+- participant or system state observation is unclear
+- you need a minimal example of monitor callbacks or optional lifecycle usage
+
+## Related Pages
+
+- [Core Architecture](./core-architecture.md)
+- [Networking and Transport](./networking-and-transport.md)
+- [Repository Layout](./repository-layout.md)
+- [Developer Wiki Front Page](./README.md)