From 530c2198b6ba274ab73c07687e60bd710dbd6604 Mon Sep 17 00:00:00 2001
From: Giles Hutton
Date: Wed, 25 Feb 2026 18:49:21 +0000
Subject: [PATCH 01/20] Adds Claude skill for updating the falco fork

---
 .claude/commands/update-falco-libs.md | 161 ++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 .claude/commands/update-falco-libs.md

diff --git a/.claude/commands/update-falco-libs.md b/.claude/commands/update-falco-libs.md
new file mode 100644
index 0000000000..ac9924ea0d
--- /dev/null
+++ b/.claude/commands/update-falco-libs.md
@@ -0,0 +1,161 @@
# Update Falcosecurity-Libs Fork

You are helping update the falcosecurity-libs fork used by the StackRox collector.

## Repository Context

- **Collector repo**: The current working directory
- **Fork submodule**: `falcosecurity-libs/` — StackRox's fork of `https://github.com/falcosecurity/libs`
- **Fork repo**: `https://github.com/stackrox/falcosecurity-libs`
- **Upstream remote** (in submodule): `falco` → `git@github.com:falcosecurity/libs`
- **Origin remote** (in submodule): `origin` → `git@github.com:stackrox/falcosecurity-libs.git`
- **Branch naming**: `X.Y.Z-stackrox` branches carry StackRox patches on top of upstream tags
- **Update docs**: `docs/falco-update.md`

## Step 1: Assess Current State

Run the following in the `falcosecurity-libs/` submodule:

1. `git describe --tags HEAD` — find the current upstream base version
2. `git log --oneline <base-tag>..HEAD --no-merges` — list all StackRox patches
3. `git fetch falco --tags` — get the latest upstream tags
4. `git tag -l '0.*' | sort -V | tail -10` — find the latest upstream releases
5. `git branch -a | grep stackrox` — find existing StackRox branches
6. Count upstream commits: `git log --oneline <base-tag>..<target-tag> | wc -l`

Report: current version, target version, number of StackRox patches, number of upstream commits. 
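The base-version part of the assessment can be scripted. A minimal sketch — the `base_tag` helper is hypothetical, it simply strips the `-<count>-g<hash>` suffix that `git describe --tags` appends when HEAD carries patches on top of a tag:

```shell
# Hypothetical helper: recover the upstream base tag from `git describe` output.
# `git describe --tags HEAD` prints e.g. "0.23.1-14-g1a2b3c4" when HEAD has 14
# commits on top of tag 0.23.1; stripping the suffix yields the base tag.
base_tag() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+-g[0-9a-f]+$//'
}

base_tag "0.23.1-14-g1a2b3c4"  # -> 0.23.1
base_tag "0.20.0"              # -> 0.20.0 (HEAD is exactly on a tag)
```

In the real flow this would be fed from `git describe --tags HEAD` inside the submodule.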

## Step 2: Analyze StackRox Patches

For each StackRox patch, determine if it has been upstreamed:

```sh
# For each patch commit, search upstream for an equivalent
git log --oneline <base-tag>..<target-tag> --grep="<patch-subject>"
```

Categorize each patch as:
- **Upstreamed** — will be dropped automatically during rebase
- **Still needed** — must be carried forward
- **Conflict risk** — touches files heavily modified upstream

## Step 3: Identify Breaking API Changes

Check what APIs changed between versions. Key areas to inspect:

```sh
# Container engine / plugin changes
git log --oneline <base-tag>..<target-tag> -- userspace/libsinsp/container_engine/
git log --oneline <base-tag>..<target-tag> --grep="container_engine\|container plugin"

# Thread manager changes
git log --oneline <base-tag>..<target-tag> -- userspace/libsinsp/threadinfo.h userspace/libsinsp/thread_manager.h

# sinsp API changes
git diff <base-tag>..<target-tag> -- userspace/libsinsp/sinsp.h | grep -E '^\+|^\-' | head -80

# Breaking changes
git log --oneline <base-tag>..<target-tag> --grep="BREAKING\|breaking\|!:"
```

Then grep the collector code for uses of changed/removed APIs:

```sh
grep -rn '<changed-symbol>' collector/lib/ collector/test/ --include='*.cpp' --include='*.h'
```

Key collector integration points to check:
- `collector/lib/system-inspector/Service.cpp` — sinsp initialization, container setup, thread access
- `collector/lib/system-inspector/EventExtractor.h` — threadinfo field access macros (TINFO_FIELD, TINFO_FIELD_RAW_GETTER)
- `collector/lib/ContainerMetadata.cpp` — container info/label lookup
- `collector/lib/ContainerEngine.h` — container engine integration (may be deleted if upstream removed container engines)
- `collector/lib/ProcessSignalFormatter.cpp` — process signal creation, container_id access, parent thread traversal
- `collector/lib/Process.cpp` — process info access, container_id
- `collector/lib/Utility.cpp` — threadinfo printing
- `collector/test/ProcessSignalFormatterTest.cpp` — thread creation, get_thread_ref, container_id setup
- `collector/CMakeLists.txt` — falco 
build flags (SINSP_SLIM_THREADINFO, BUILD_LIBSCAP_MODERN_BPF, MODERN_BPF_EXCLUDE_PROGS, etc.)

## Step 4: Plan Staging Strategy

If the gap is large (>200 commits), identify intermediate stopping points:

1. Look for version boundaries where major API changes happen
2. Prefer stopping at versions where container/thread APIs change
3. Each stage should be independently buildable and testable

Known historical API breakpoints (update as upstream evolves):
- **0.20.0**: `set_import_users` lost its second arg, user/group structs on threadinfo replaced with `m_uid`/`m_gid`
- **0.21.0**: Container engine subsystem removed entirely, replaced by container plugin (`libcontainer.so`). `m_container_id` removed from threadinfo. `m_thread_manager` changed to `shared_ptr`. `build_threadinfo()`/`add_thread()` removed from sinsp.
- **0.22.0**: `get_thread_ref` removed from sinsp (use `find_thread`)
- **0.23.0+**: `get_container_id()` removed from threadinfo. Parent thread traversal moved to thread_manager. User/group handling removed from threadinfo.

## Step 5: Execute Rebase (per stage)

```sh
cd falcosecurity-libs
git fetch falco
git switch upstream-main && git merge --ff-only falco/master && git push origin upstream-main --tags
git switch <previous X.Y.Z>-stackrox
git switch -c <X.Y.Z>-stackrox
git rebase <X.Y.Z>
# Resolve conflicts using categorization from Step 2
# For each conflict: check if patch is still needed, compare against upstream equivalent
git push -u origin <X.Y.Z>-stackrox
```

Always rebase onto upstream **tags** (not master tip) per `docs/falco-update.md`.

## Step 6: Update Collector Code

After each rebase stage, update collector code for API changes found in Step 3. 
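As a concrete illustration of the kind of change involved: with `m_container_id` gone, the container ID has to be recovered from cgroup paths. A simplified sketch — the helper name and the 64-hex-character heuristic are assumptions for illustration; the collector's real `ExtractContainerIDFromCgroup` may handle more runtime-specific layouts:

```shell
# Simplified sketch of cgroup-based container-ID extraction: scan the cgroup
# path for a 64-hex-character runtime ID and print the first match (nothing
# for host processes).
extract_container_id() {
  printf '%s\n' "$1" | grep -oE '[0-9a-f]{64}' | head -n 1
}

id=$(printf 'a%.0s' $(seq 1 64))   # dummy 64-character container ID
extract_container_id "/kubepods/burstable/pod42/${id}"  # prints the ID
extract_container_id "/system.slice/sshd.service"       # prints nothing
```

The collector-side patterns below apply the same idea in C++ against `tinfo->cgroups()`.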

Common patterns of change:
- **Removed container engine**: Delete `ContainerEngine.h`, replace `set_container_engine_mask()` with `register_plugin()` for the container plugin
- **Container plugin**: Ship `libcontainer.so` in the container image, register it before setting filters like `container.id != 'host'`
- **Container metadata**: Replace `m_container_manager.get_containers()` with the plugin state table API
- **Thread access**: Replace `get_thread_ref(tid, true)` with `find_thread(tid, false)` or `m_thread_manager->find_thread()`
- **Container ID**: Replace `tinfo->m_container_id` with `tinfo->get_container_id()` or `FIELD_CSTR(container_id, "container.id")`
- **User/group**: Replace `m_user.uid()` / `m_group.gid()` with `m_uid` / `m_gid`
- **Thread creation in tests**: Replace `build_threadinfo()` with `m_thread_manager_factory.create()`, `add_thread()` with `m_thread_manager->add_thread()`

## Step 7: Validate Each Stage

Run this checklist after each stage:

- [ ] `falcosecurity-libs` builds via cmake `add_subdirectory`
- [ ] Each surviving patch verified: diff against original to ensure no content loss
- [ ] `make collector` succeeds on amd64
- [ ] `make unittest` passes (all test suites, especially ProcessSignalFormatterTest)
- [ ] Multi-arch compilation: arm64, ppc64le, s390x
- [ ] Integration tests on at least 1 VM type (RHCOS or Ubuntu)
- [ ] Runtime self-checks pass
- [ ] Container ID attribution works correctly
- [ ] Container label/namespace lookup works
- [ ] Network signal handler receives correct container IDs
- [ ] No ASan/Valgrind errors (run with `ADDRESS_SANITIZER=ON` and `USE_VALGRIND=ON`)
- [ ] Performance benchmarks show no regression vs previous stage

## Step 8: Final Update

```sh
cd <collector repo>
cd falcosecurity-libs && git checkout <X.Y.Z>-stackrox
cd .. && git add falcosecurity-libs
```

Update `docs/falco-update.md` with notes about what changed.

## PR Strategy

Each stage should produce **two PRs**:
1. 
**Fork PR** targeting `upstream-main` in `stackrox/falcosecurity-libs` (the rebased branch)
2. **Collector PR** updating the submodule pointer and making collector-side code changes

## Important Notes

- The container plugin (`libcontainer.so`) replaced built-in container engines at upstream 0.21.0
- Container plugin source: `https://github.com/falcosecurity/plugins/tree/main/plugins/container`
- Container plugin is C++/Go hybrid — needs Go toolchain to build from source
- Upstream only ships x86_64 and arm64 binaries; ppc64le/s390x must be built from source
- The `giles/cherry-picked-stackrox-additions` branch may have useful reference patches for collector-side adaptations
- Previous update attempts (e.g., `0.21.0-stackrox-rc1`) can be used as conflict resolution references

From ee3d83a824e3cd3d7e61efcbfa986fd7b98ff562 Mon Sep 17 00:00:00 2001
From: Giles Hutton
Date: Thu, 26 Feb 2026 18:53:08 +0000
Subject: [PATCH 02/20] Update to falco 0.23.1

---
 .gitmodules                                   |   3 +
 builder/install/00-golang.sh                  |  16 ++
 builder/install/01-container-plugin.sh        |  20 ++
 builder/install/versions.sh                   |   1 +
 builder/third_party/falcosecurity-plugins     |   1 +
 collector/CMakeLists.txt                      |   1 -
 collector/Makefile                            |   1 +
 collector/container/Dockerfile                |   1 +
 collector/lib/ContainerEngine.h               |  25 --
 collector/lib/ContainerMetadata.cpp           |  21 +-
 collector/lib/NetworkSignalHandler.cpp        |   4 +-
 collector/lib/Process.cpp                     |   7 +-
 collector/lib/ProcessSignalFormatter.cpp      |  30 ++-
 collector/lib/ProcessSignalFormatter.h        |   1 +
 collector/lib/Utility.cpp                     |  13 +-
 collector/lib/Utility.h                       |   5 +
 .../lib/system-inspector/EventExtractor.cpp   |   4 +
 .../lib/system-inspector/EventExtractor.h     |  79 ++++---
 collector/lib/system-inspector/Service.cpp    |  71 +++---
 collector/test/ProcessSignalFormatterTest.cpp | 223 +++++++++---------
 collector/test/SystemInspectorServiceTest.cpp |  33 +--
 falcosecurity-libs                            |   2 +-
 22 files changed, 309 insertions(+), 253 deletions(-)
 create mode 100755 builder/install/00-golang.sh
 create 
mode 100755 builder/install/01-container-plugin.sh
 create mode 160000 builder/third_party/falcosecurity-plugins
 delete mode 100644 collector/lib/ContainerEngine.h

diff --git a/.gitmodules b/.gitmodules
index 3be16bd1d7..73937569ad 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -72,3 +72,6 @@
 	path = builder/third_party/bpftool
 	url = https://github.com/libbpf/bpftool
 	branch = v7.3.0
+[submodule "builder/third_party/falcosecurity-plugins"]
+	path = builder/third_party/falcosecurity-plugins
+	url = https://github.com/falcosecurity/plugins.git
diff --git a/builder/install/00-golang.sh b/builder/install/00-golang.sh
new file mode 100755
index 0000000000..4c4ba6f25b
--- /dev/null
+++ b/builder/install/00-golang.sh
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+set -e
+
+GO_VERSION=1.23.7
+ARCH=$(uname -m)
+case ${ARCH} in
+    x86_64) GO_ARCH=amd64 ;;
+    aarch64) GO_ARCH=arm64 ;;
+    ppc64le) GO_ARCH=ppc64le ;;
+    s390x) GO_ARCH=s390x ;;
+esac
+
+curl -fsSL "https://go.dev/dl/go${GO_VERSION}.linux-${GO_ARCH}.tar.gz" | tar -C /usr/local -xz
+export PATH="/usr/local/go/bin:${PATH}"
+echo "export PATH=/usr/local/go/bin:${PATH}" >> /etc/profile.d/golang.sh
+go version
diff --git a/builder/install/01-container-plugin.sh b/builder/install/01-container-plugin.sh
new file mode 100755
index 0000000000..d9a0c9c46d
--- /dev/null
+++ b/builder/install/01-container-plugin.sh
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+set -e
+
+export PATH="/usr/local/go/bin:${PATH}"
+
+cd third_party/falcosecurity-plugins/plugins/container
+
+cp ../../LICENSE "${LICENSE_DIR}/falcosecurity-plugins-container-${CONTAINER_PLUGIN_VERSION}" 2> /dev/null || true
+
+# Remove static libstdc++ linking — not needed since we control the runtime
+# image, and libstdc++-static is not available in CentOS Stream 10.
+sed -i '/-static-libgcc\|-static-libstdc++/d' CMakeLists.txt
+
+cmake -B build -S . 
\
+    -DCMAKE_BUILD_TYPE=Release \
+    -DENABLE_ASYNC=ON \
+    -DENABLE_TESTS=OFF
+cmake --build build --target container --parallel "${NPROCS}"
+
+install -m 755 build/libcontainer.so /usr/local/lib64/libcontainer.so
diff --git a/builder/install/versions.sh b/builder/install/versions.sh
index 08988c81e7..c6ccaae09c 100644
--- a/builder/install/versions.sh
+++ b/builder/install/versions.sh
@@ -16,3 +16,4 @@ export GPERFTOOLS_VERSION=2.16
 export UTHASH_VERSION=v1.9.8
 export YAMLCPP_VERSION=0.8.0
 export LIBBPF_VERSION=v1.3.4
+export CONTAINER_PLUGIN_VERSION=0.6.3
diff --git a/builder/third_party/falcosecurity-plugins b/builder/third_party/falcosecurity-plugins
new file mode 160000
index 0000000000..fb2ad646f1
--- /dev/null
+++ b/builder/third_party/falcosecurity-plugins
@@ -0,0 +1 @@
+Subproject commit fb2ad646f1cab61abb2124df5bf0f2578fc70e58
diff --git a/collector/CMakeLists.txt b/collector/CMakeLists.txt
index 2d0a6a2152..bf3ad9bef7 100644
--- a/collector/CMakeLists.txt
+++ b/collector/CMakeLists.txt
@@ -87,7 +87,6 @@ set(USE_BUNDLED_DEPS OFF CACHE BOOL "Enable bundled dependencies instead of usin
 set(USE_BUNDLED_CARES OFF CACHE BOOL "Enable bundled dependencies instead of using the system ones" FORCE)
 set(BUILD_LIBSCAP_GVISOR OFF CACHE BOOL "Do not build gVisor support" FORCE)
 set(MINIMAL_BUILD OFF CACHE BOOL "Minimal" FORCE)
-set(SINSP_SLIM_THREADINFO ON CACHE BOOL "Slim threadinfo" FORCE)
 set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build position independent libraries and executables" FORCE)
 set(LIBELF_LIB_SUFFIX ".so" CACHE STRING "Use libelf.so" FORCE)
diff --git a/collector/Makefile b/collector/Makefile
index eec68231cc..6d3d5bdb70 100644
--- a/collector/Makefile
+++ b/collector/Makefile
@@ -39,6 +39,7 @@ container/bin/collector: cmake-build/collector
 	mkdir -p container/bin
 	cp "$(COLLECTOR_BIN_DIR)/collector" container/bin/collector
 	cp "$(COLLECTOR_BIN_DIR)/self-checks" container/bin/self-checks
+	docker cp $(COLLECTOR_BUILDER_NAME):/usr/local/lib64/libcontainer.so 
container/bin/libcontainer.so

 .PHONY: collector
 collector: container/bin/collector txt-files
diff --git a/collector/container/Dockerfile b/collector/container/Dockerfile
index 569d67cbb2..8a09d67441 100644
--- a/collector/container/Dockerfile
+++ b/collector/container/Dockerfile
@@ -26,6 +26,7 @@ COPY container/THIRD_PARTY_NOTICES/ /THIRD_PARTY_NOTICES/
 COPY kernel-modules /kernel-modules
 COPY container/bin/collector /usr/local/bin/
 COPY container/bin/self-checks /usr/local/bin/self-checks
+COPY container/bin/libcontainer.so /usr/local/lib64/libcontainer.so
 COPY container/status-check.sh /usr/local/bin/status-check.sh

 EXPOSE 8080 9090
diff --git a/collector/lib/ContainerEngine.h b/collector/lib/ContainerEngine.h
deleted file mode 100644
index 63978528c9..0000000000
--- a/collector/lib/ContainerEngine.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#pragma once
-
-#include "container_engine/container_cache_interface.h"
-#include "container_engine/container_engine_base.h"
-#include "threadinfo.h"
-
-namespace collector {
-class ContainerEngine : public libsinsp::container_engine::container_engine_base {
- public:
-  ContainerEngine(libsinsp::container_engine::container_cache_interface& cache) : libsinsp::container_engine::container_engine_base(cache) {}
-
-  bool resolve(sinsp_threadinfo* tinfo, bool query_os_for_missing_info) override {
-    for (const auto& cgroup : tinfo->cgroups()) {
-      auto container_id = ExtractContainerIDFromCgroup(cgroup.second);
-
-      if (container_id) {
-        tinfo->m_container_id = *container_id;
-        return true;
-      }
-    }
-
-    return false;
-  }
-};
-}  // namespace collector
diff --git a/collector/lib/ContainerMetadata.cpp b/collector/lib/ContainerMetadata.cpp
index 343e9c6a5a..8e8d2c9131 100644
--- a/collector/lib/ContainerMetadata.cpp
+++ b/collector/lib/ContainerMetadata.cpp
@@ -2,6 +2,7 @@
 #include
+#include "Logging.h"
 #include "system-inspector/EventExtractor.h"

 namespace collector {
@@ -20,19 +21,13 @@ std::string ContainerMetadata::GetNamespace(const 
std::string& container_id) {
 }

 std::string ContainerMetadata::GetContainerLabel(const std::string& container_id, const std::string& label) {
-  auto containers = inspector_->m_container_manager.get_containers();
-  const auto& container = containers->find(container_id);
-  if (container == containers->end()) {
-    return "";
-  }
-
-  const auto& labels = container->second->m_labels;
-  const auto& label_it = labels.find(label);
-  if (label_it == labels.end()) {
-    return "";
-  }
-
-  return label_it->second;
+  // Container labels are no longer available through the sinsp API.
+  // The container plugin provides container metadata via filter fields
+  // (e.g., container.label) but not through a programmatic lookup API.
+  CLOG_THROTTLED(DEBUG, std::chrono::seconds(300))
+      << "Container label lookup by container ID is not supported: "
+      << "container_id=" << container_id << " label=" << label;
+  return "";
 }

 }  // namespace collector
\ No newline at end of file
diff --git a/collector/lib/NetworkSignalHandler.cpp b/collector/lib/NetworkSignalHandler.cpp
index df457d5ef5..71e1e634d4 100644
--- a/collector/lib/NetworkSignalHandler.cpp
+++ b/collector/lib/NetworkSignalHandler.cpp
@@ -133,11 +133,11 @@ std::optional<Connection> NetworkSignalHandler::GetConnection(sinsp_evt* evt) {
   const Endpoint* local = is_server ? &server : &client;
   const Endpoint* remote = is_server ? 
&client : &server;

-  const std::string* container_id = event_extractor_->get_container_id(evt);
+  const char* container_id = event_extractor_->get_container_id(evt);
   if (!container_id) {
     return std::nullopt;
   }

-  return {Connection(*container_id, *local, *remote, l4proto, is_server)};
+  return {Connection(container_id, *local, *remote, l4proto, is_server)};
 }

 SignalHandler::Result NetworkSignalHandler::HandleSignal(sinsp_evt* evt) {
diff --git a/collector/lib/Process.cpp b/collector/lib/Process.cpp
index 632d824a03..7054257f02 100644
--- a/collector/lib/Process.cpp
+++ b/collector/lib/Process.cpp
@@ -5,6 +5,7 @@
 #include

 #include "CollectorStats.h"
+#include "Utility.h"
 #include "system-inspector/Service.h"

 namespace collector {
@@ -32,7 +33,11 @@ std::string Process::container_id() const {
   WaitForProcessInfo();

   if (system_inspector_threadinfo_) {
-    return system_inspector_threadinfo_->m_container_id;
+    for (const auto& [subsys, cgroup_path] : system_inspector_threadinfo_->cgroups()) {
+      if (auto id = ExtractContainerIDFromCgroup(cgroup_path)) {
+        return std::string(*id);
+      }
+    }
   }

   return NOT_AVAILABLE;
diff --git a/collector/lib/ProcessSignalFormatter.cpp b/collector/lib/ProcessSignalFormatter.cpp
index a588d75bd6..4f443a516b 100644
--- a/collector/lib/ProcessSignalFormatter.cpp
+++ b/collector/lib/ProcessSignalFormatter.cpp
@@ -2,6 +2,9 @@
 #include
+#include
+#include
+
 #include

 #include "internalapi/sensor/signal_iservice.pb.h"
@@ -59,6 +62,7 @@ std::string extract_proc_args(sinsp_threadinfo* tinfo) {
 ProcessSignalFormatter::ProcessSignalFormatter(
     sinsp* inspector, const CollectorConfig& config)
     : event_names_(EventNames::GetInstance()),
+      inspector_(inspector),
       event_extractor_(std::make_unique<system_inspector::EventExtractor>()),
       container_metadata_(inspector),
       config_(config) {
@@ -176,8 +180,8 @@ ProcessSignal* ProcessSignalFormatter::CreateProcessSignal(sinsp_evt* event) {
   signal->set_allocated_time(timestamp);

   // set container_id
-  if (const std::string* container_id = 
event_extractor_->get_container_id(event)) {
-    signal->set_container_id(*container_id);
+  if (const char* container_id = event_extractor_->get_container_id(event)) {
+    signal->set_container_id(container_id);
   }

   // set process lineage
@@ -232,8 +236,8 @@ ProcessSignal* ProcessSignalFormatter::CreateProcessSignal(sinsp_threadinfo* tin
   signal->set_pid(tinfo->m_pid);

   // set user and group id credentials
-  signal->set_uid(tinfo->m_user.uid());
-  signal->set_gid(tinfo->m_group.gid());
+  signal->set_uid(tinfo->m_uid);
+  signal->set_gid(tinfo->m_gid);

   // set time
   auto timestamp = Allocate();
@@ -241,7 +245,7 @@ ProcessSignal* ProcessSignalFormatter::CreateProcessSignal(sinsp_threadinfo* tin
   signal->set_allocated_time(timestamp);

   // set container_id
-  signal->set_container_id(tinfo->m_container_id);
+  signal->set_container_id(GetContainerID(*tinfo, *inspector_->m_thread_manager));

   // set process lineage
   std::vector<LineageInfo> lineage;
@@ -265,11 +269,11 @@ std::string ProcessSignalFormatter::ProcessDetails(sinsp_evt* event) {
   std::stringstream ss;
   const std::string* path = event_extractor_->get_exepath(event);
   const std::string* name = event_extractor_->get_comm(event);
-  const std::string* container_id = event_extractor_->get_container_id(event);
+  const char* container_id = event_extractor_->get_container_id(event);
   const char* args = event_extractor_->get_proc_args(event);
   const int64_t* pid = event_extractor_->get_pid(event);

-  ss << "Container: " << (container_id ? *container_id : "null")
+  ss << "Container: " << (container_id ? container_id : "null")
      << ", Name: " << (name ? *name : "null")
      << ", PID: " << (pid ? *pid : -1)
      << ", Path: " << (path ? 
*path : "null")
@@ -327,7 +331,7 @@ void ProcessSignalFormatter::GetProcessLineage(sinsp_threadinfo* tinfo,
       return;
     }
   }
-  sinsp_threadinfo::visitor_func_t visitor = [this, &lineage](sinsp_threadinfo* pt) {
+  sinsp_thread_manager::visitor_func_t visitor = [this, &lineage](sinsp_threadinfo* pt) {
     if (pt == NULL) {
       return false;
     }
@@ -341,13 +345,13 @@ void ProcessSignalFormatter::GetProcessLineage(sinsp_threadinfo* tinfo,
     //
     // In back-ported eBPF probes, `m_vpid` will not be set for containers
    // running when collector comes online because /proc/{pid}/status does
-    // not contain namespace information, so `m_container_id` is checked
-    // instead. `m_container_id` is not enough on its own to identify
+    // not contain namespace information, so the container ID is checked
+    // instead. The container ID is not enough on its own to identify
     // containerized processes, because it is not guaranteed to be set on
     // all platforms.
     //
     if (pt->m_vpid == 0) {
-      if (pt->m_container_id.empty()) {
+      if (GetContainerID(*pt, *inspector_->m_thread_manager).empty()) {
         return false;
       }
     } else if (pt->m_pid == pt->m_vpid) {
@@ -361,7 +365,7 @@ void ProcessSignalFormatter::GetProcessLineage(sinsp_threadinfo* tinfo,
     // Collapse parent child processes that have the same path
     if (lineage.empty() || (lineage.back().parent_exec_file_path() != pt->m_exepath)) {
       LineageInfo info;
-      info.set_parent_uid(pt->m_user.uid());
+      info.set_parent_uid(pt->m_uid);
       info.set_parent_exec_file_path(pt->m_exepath);
       lineage.push_back(info);
     }
@@ -373,7 +377,7 @@ void ProcessSignalFormatter::GetProcessLineage(sinsp_threadinfo* tinfo,
     return true;
   };

-  mt->traverse_parent_state(visitor);
+  inspector_->m_thread_manager->traverse_parent_state(*mt, visitor);

   CountLineage(lineage);
 }
diff --git a/collector/lib/ProcessSignalFormatter.h b/collector/lib/ProcessSignalFormatter.h
index 8c57011c5b..a7bb69ab57 100644
--- a/collector/lib/ProcessSignalFormatter.h
+++ b/collector/lib/ProcessSignalFormatter.h
@@ -55,6 +55,7 @@ 
class ProcessSignalFormatter : public ProtoSignalFormatter& lineage);
   const EventNames& event_names_;
+  sinsp* inspector_;
   std::unique_ptr<system_inspector::EventExtractor> event_extractor_;
   ContainerMetadata container_metadata_;
diff --git a/collector/lib/Utility.cpp b/collector/lib/Utility.cpp
index 26832eada8..b3199d0603 100644
--- a/collector/lib/Utility.cpp
+++ b/collector/lib/Utility.cpp
@@ -18,6 +18,7 @@ extern "C" {
 #include
 #include
+#include
 #include
@@ -57,9 +58,19 @@ const char* SignalName(int signum) {
   }
 }

+std::string GetContainerID(sinsp_threadinfo& tinfo, sinsp_thread_manager& thread_manager) {
+  const auto* accessor = thread_manager.get_field_accessor("container_id");
+  if (!accessor) {
+    return {};
+  }
+  std::string container_id;
+  tinfo.get_dynamic_field(*accessor, container_id);
+  return container_id;
+}
+
 std::ostream& operator<<(std::ostream& os, const sinsp_threadinfo* t) {
   if (t) {
-    os << "Container: \"" << t->m_container_id << "\", Name: " << t->m_comm << ", PID: " << t->m_pid << ", Args: " << t->m_exe;
+    os << "Name: " << t->m_comm << ", PID: " << t->m_pid << ", Args: " << t->m_exe;
   } else {
     os << "NULL\n";
   }
diff --git a/collector/lib/Utility.h b/collector/lib/Utility.h
index 04be8cd480..8544bbd72c 100644
--- a/collector/lib/Utility.h
+++ b/collector/lib/Utility.h
@@ -14,6 +14,7 @@
 // forward declarations
 class sinsp_threadinfo;
+class sinsp_thread_manager;

 namespace collector {
@@ -65,6 +66,10 @@ std::string Str(Args&&... args) {

 std::ostream& operator<<(std::ostream& os, const sinsp_threadinfo* t);

+// Extract container ID from a threadinfo using the dynamic field written by
+// the container plugin. Returns an empty string if unavailable.
+std::string GetContainerID(sinsp_threadinfo& tinfo, sinsp_thread_manager& thread_manager);
+
 // UUIDStr returns UUID in string format. 
const char* UUIDStr();
diff --git a/collector/lib/system-inspector/EventExtractor.cpp b/collector/lib/system-inspector/EventExtractor.cpp
index a72c87e329..82e1ea94ca 100644
--- a/collector/lib/system-inspector/EventExtractor.cpp
+++ b/collector/lib/system-inspector/EventExtractor.cpp
@@ -5,6 +5,10 @@ namespace collector::system_inspector {
 void EventExtractor::Init(sinsp* inspector) {
   for (auto* wrapper : wrappers_) {
     std::unique_ptr<sinsp_filter_check> check = FilterList().new_filter_check_from_fldname(wrapper->event_name, inspector, true);
+    if (!check) {
+      CLOG(WARNING) << "Filter check not available for field: " << wrapper->event_name;
+      continue;
+    }
     check->parse_field_name(wrapper->event_name, true, false);
     wrapper->filter_check.reset(check.release());
   }
diff --git a/collector/lib/system-inspector/EventExtractor.h b/collector/lib/system-inspector/EventExtractor.h
index 94d129befc..1cb5615a36 100644
--- a/collector/lib/system-inspector/EventExtractor.h
+++ b/collector/lib/system-inspector/EventExtractor.h
@@ -41,9 +41,12 @@ class EventExtractor {
 #define FIELD_RAW(id, fieldname, type) \
  public: \
   const type* get_##id(sinsp_evt* event) { \
-    uint32_t len; \
-    auto buf = filter_check_##id##_->extract_single(event, &len); \
-    if (!buf) return nullptr; \
+    if (!filter_check_##id##_.filter_check) return nullptr; \
+    std::vector<extract_value_t> vals_##id; \
+    if (!filter_check_##id##_->extract(event, vals_##id)) return nullptr; \
+    if (vals_##id.empty()) return nullptr; \
+    auto len = vals_##id[0].len; \
+    auto buf = vals_##id[0].ptr; \
     if (len != sizeof(type)) { \
       CLOG_THROTTLED(WARNING, std::chrono::seconds(30)) \
           << "Failed to extract value for field " << fieldname << ": expected type " << #type << " (size " \
@@ -63,9 +66,12 @@ class EventExtractor {
   const std::optional<type> get_##id(sinsp_evt* event) { \
     static_assert(std::is_trivially_copyable_v<type>, \
                   "Attempted to create FIELD_RAW_SAFE on non trivial type"); \
-    uint32_t len; \
-    auto buf = filter_check_##id##_->extract_single(event, &len); \
-    if 
(!buf) return {}; \
+    if (!filter_check_##id##_.filter_check) return {}; \
+    std::vector<extract_value_t> vals_##id; \
+    if (!filter_check_##id##_->extract(event, vals_##id)) return {}; \
+    if (vals_##id.empty()) return {}; \
+    auto len = vals_##id[0].len; \
+    auto buf = vals_##id[0].ptr; \
     if (len != sizeof(type)) { \
       CLOG_THROTTLED(WARNING, std::chrono::seconds(30)) \
           << "Failed to extract value for field " << fieldname << ": expected type " << #type << " (size " \
@@ -80,39 +86,40 @@ class EventExtractor {
  private: \
   DECLARE_FILTER_CHECK(id, fieldname)

-#define FIELD_CSTR(id, fieldname) \
- public: \
-  const char* get_##id(sinsp_evt* event) { \
-    uint32_t len; \
-    auto buf = filter_check_##id##_->extract_single(event, &len); \
-    if (!buf) return nullptr; \
-    return reinterpret_cast<const char*>(buf); \
-  } \
- \
- private: \
+#define FIELD_CSTR(id, fieldname) \
+ public: \
+  const char* get_##id(sinsp_evt* event) { \
+    if (!filter_check_##id##_.filter_check) return nullptr; \
+    std::vector<extract_value_t> vals_##id; \
+    if (!filter_check_##id##_->extract(event, vals_##id)) return nullptr; \
+    if (vals_##id.empty()) return nullptr; \
+    return reinterpret_cast<const char*>(vals_##id[0].ptr); \
+  } \
+ \
+ private: \
   DECLARE_FILTER_CHECK(id, fieldname)

 #define EVT_ARG(name) FIELD_CSTR(evt_arg_##name, "evt.arg." #name)
 #define EVT_ARG_RAW(name, type) FIELD_RAW(evt_arg_##name, "evt.rawarg." 
#name, type)

-#define TINFO_FIELD_RAW(id, fieldname, type) \
- public: \
-  const type* get_##id(sinsp_evt* event) { \
-    if (!event) return nullptr; \
-    sinsp_threadinfo* tinfo = event->get_thread_info(true); \
-    if (!tinfo) return nullptr; \
-    return &tinfo->fieldname; \
+#define TINFO_FIELD_RAW(id, fieldname, type) \
+ public: \
+  const type* get_##id(sinsp_evt* event) { \
+    if (!event) return nullptr; \
+    sinsp_threadinfo* tinfo = event->get_thread_info(); \
+    if (!tinfo) return nullptr; \
+    return &tinfo->fieldname; \
   }

-#define TINFO_FIELD_RAW_GETTER(id, getter, type) \
- public: \
-  type internal_##id; \
-  const type* get_##id(sinsp_evt* event) { \
-    if (!event) return nullptr; \
-    sinsp_threadinfo* tinfo = event->get_thread_info(true); \
-    if (!tinfo) return nullptr; \
-    internal_##id = tinfo->getter(); \
-    return &internal_##id; \
+#define TINFO_FIELD_RAW_GETTER(id, getter, type) \
+ public: \
+  type internal_##id; \
+  const type* get_##id(sinsp_evt* event) { \
+    if (!event) return nullptr; \
+    sinsp_threadinfo* tinfo = event->get_thread_info(); \
+    if (!tinfo) return nullptr; \
+    internal_##id = tinfo->getter(); \
+    return &internal_##id; \
   }

 #define TINFO_FIELD(id) TINFO_FIELD_RAW(id, m_##id, decltype(std::declval<sinsp_threadinfo>().m_##id))
@@ -129,16 +136,16 @@ class EventExtractor {
   //
   // ADD ANY NEW FIELDS BELOW THIS LINE

-  // Container related fields
-  TINFO_FIELD(container_id);
+  // Container related fields — provided by the container plugin via filter fields. 
+  FIELD_CSTR(container_id, "container.id");

   // Process related fields
   TINFO_FIELD(comm);
   TINFO_FIELD(exe);
   TINFO_FIELD(exepath);
   TINFO_FIELD(pid);
-  TINFO_FIELD_RAW_GETTER(uid, m_user.uid, uint32_t);
-  TINFO_FIELD_RAW_GETTER(gid, m_group.gid, uint32_t);
+  TINFO_FIELD_RAW(uid, m_uid, uint32_t);
+  TINFO_FIELD_RAW(gid, m_gid, uint32_t);
   FIELD_CSTR(proc_args, "proc.args");

   // General event information
diff --git a/collector/lib/system-inspector/Service.cpp b/collector/lib/system-inspector/Service.cpp
index 95c0394416..6c38470408 100644
--- a/collector/lib/system-inspector/Service.cpp
+++ b/collector/lib/system-inspector/Service.cpp
@@ -6,8 +6,9 @@
 #include

-#include "libsinsp/container_engine/sinsp_container_type.h"
+#include "libsinsp/filter.h"
 #include "libsinsp/parsers.h"
+#include "libsinsp/plugin.h"
 #include "libsinsp/sinsp.h"

 #include
@@ -15,7 +16,6 @@
 #include "CollectionMethod.h"
 #include "CollectorException.h"
 #include "CollectorStats.h"
-#include "ContainerEngine.h"
 #include "ContainerMetadata.h"
 #include "EventExtractor.h"
 #include "EventNames.h"
@@ -35,12 +35,7 @@ namespace collector::system_inspector {
 Service::~Service() = default;

 Service::Service(const CollectorConfig& config)
-    : inspector_(std::make_unique<sinsp>(true)),
-      container_metadata_inspector_(std::make_unique(inspector_.get())),
-      default_formatter_(std::make_unique<sinsp_evt_formatter>(
-          inspector_.get(),
-          DEFAULT_OUTPUT_STR,
-          EventExtractor::FilterList())) {
+    : inspector_(std::make_unique<sinsp>(true)) {
   // Setup the inspector. 
// peeking into arguments has a big overhead, so we prevent it from happening
   inspector_->set_snaplen(0);
@@ -50,7 +45,7 @@ Service::Service(const CollectorConfig& config)
   inspector_->disable_log_timestamps();
   inspector_->set_log_callback(logging::InspectorLogCallback);

-  inspector_->set_import_users(config.ImportUsers(), false);
+  inspector_->set_import_users(config.ImportUsers());
   inspector_->set_thread_timeout_s(30);
   inspector_->set_auto_threads_purging_interval_s(60);
   inspector_->m_thread_manager->set_max_thread_table_size(config.GetSinspThreadCacheSize());
@@ -62,31 +57,43 @@ Service::Service(const CollectorConfig& config)
     inspector_->get_parser()->set_track_connection_status(true);
   }

-  if (config.EnableRuntimeConfig()) {
-    uint64_t mask = 1 << CT_CRI |
-                    1 << CT_CRIO |
-                    1 << CT_CONTAINERD;
-
-    if (config.UseDockerCe()) {
-      mask |= 1 << CT_DOCKER;
+  // Load the container plugin for container ID attribution and metadata.
+  // This MUST happen before EventExtractor::Init() (via ContainerMetadata)
+  // because the plugin provides the "container.id" and "k8s.ns.name" fields.
+  const char* plugin_path = "/usr/local/lib64/libcontainer.so";
+  try {
+    auto plugin = inspector_->register_plugin(plugin_path);
+    std::string err;
+    if (!plugin->init("", err)) {
+      CLOG(ERROR) << "Failed to init container plugin: " << err;
     }
-
-    if (config.UsePodmanCe()) {
-      mask |= 1 << CT_PODMAN;
+    if (plugin->caps() & CAP_EXTRACTION) {
+      EventExtractor::FilterList().add_filter_check(sinsp_plugin::new_filtercheck(plugin));
     }
-
-    inspector_->set_container_engine_mask(mask);
-
-    // k8s naming conventions specify that max length be 253 characters
-    // (the extra 2 are just for a nice 0xFF). 
-    inspector_->set_container_labels_max_len(255);
-  } else {
-    auto engine = std::make_shared(inspector_->m_container_manager);
-    auto* container_engines = inspector_->m_container_manager.get_container_engines();
-    container_engines->push_back(engine);
+    CLOG(INFO) << "Loaded container plugin from " << plugin_path;
+  } catch (const sinsp_exception& e) {
+    CLOG(WARNING) << "Could not load container plugin from " << plugin_path
+                  << ": " << e.what();
   }
 
-  inspector_->set_filter("container.id != 'host'");
+  // Initialize ContainerMetadata after the plugin is loaded, so that
+  // EventExtractor::Init() can find plugin-provided fields like container.id.
+  container_metadata_inspector_ = std::make_shared(inspector_.get());
+  default_formatter_ = std::make_unique(
+      inspector_.get(), DEFAULT_OUTPUT_STR, EventExtractor::FilterList());
+
+  // Compile the container filter using our FilterList (which includes
+  // plugin filterchecks). sinsp::set_filter(string) uses a hardcoded
+  // filter check list that doesn't include plugin fields.
+  try {
+    auto factory = std::make_shared(
+        inspector_.get(), EventExtractor::FilterList());
+    sinsp_filter_compiler compiler(factory, "container.id != 'host'");
+    inspector_->set_filter(compiler.compile(), "container.id != 'host'");
+  } catch (const sinsp_exception& e) {
+    CLOG(WARNING) << "Could not set container filter: " << e.what()
+                  << ". 
Container filtering will not be active.";
+  }
 
   // The self-check handlers should only operate during start up,
   // so they are added to the handler list first, so they have access
@@ -296,7 +303,7 @@ bool Service::SendExistingProcesses(SignalHandler* handler) {
   }
 
   return threads->loop([&](sinsp_threadinfo& tinfo) {
-    if (!tinfo.m_container_id.empty() && tinfo.is_main_thread()) {
+    if (!GetContainerID(tinfo, *inspector_->m_thread_manager).empty() && tinfo.is_main_thread()) {
       auto result = handler->HandleExistingProcess(&tinfo);
       if (result == SignalHandler::ERROR || result == SignalHandler::NEEDS_REFRESH) {
         CLOG(WARNING) << "Failed to write existing process signal: " << &tinfo;
@@ -398,7 +405,7 @@ void Service::ServePendingProcessRequests() {
     auto callback = request.second.lock();
 
     if (callback) {
-      (*callback)(inspector_->get_thread_ref(pid, true));
+      (*callback)(inspector_->m_thread_manager->get_thread(pid));
     }
 
     pending_process_requests_.pop_front();
diff --git a/collector/test/ProcessSignalFormatterTest.cpp b/collector/test/ProcessSignalFormatterTest.cpp
index 68e1fcb9c7..5931c46275 100644
--- a/collector/test/ProcessSignalFormatterTest.cpp
+++ b/collector/test/ProcessSignalFormatterTest.cpp
@@ -54,18 +54,18 @@ TEST(ProcessSignalFormatterTest, ProcessWithoutParentTest) {
   ProcessSignalFormatter processSignalFormatter(inspector.get(), config);
 
-  auto tinfo = inspector->build_threadinfo();
+  auto tinfo = inspector->get_threadinfo_factory().create();
   tinfo->m_pid = 0;
   tinfo->m_tid = 0;
   tinfo->m_ptid = -1;
   tinfo->m_vpid = 2;
-  tinfo->m_user.set_uid(7);
+  tinfo->m_uid = 7;
   tinfo->m_exepath = "qwerty";
 
-  inspector->add_thread(std::move(tinfo));
+  inspector->m_thread_manager->add_thread(std::move(tinfo), false);
 
   std::vector lineage;
-  processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(0).get(), lineage);
+  processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(0, true).get(), lineage);
 
   int count =
collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -89,25 +89,25 @@ TEST(ProcessSignalFormatterTest, ProcessWithParentTest) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(42); + tinfo->m_uid = 42; tinfo->m_exepath = "asdf"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - tinfo2->m_user.set_uid(7); + tinfo2->m_uid = 7; tinfo2->m_exepath = "qwerty"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(1).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(1, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -134,23 +134,23 @@ TEST(ProcessSignalFormatterTest, ProcessWithParentWithPid0Test) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 0; tinfo->m_tid = 0; tinfo->m_ptid = -1; tinfo->m_vpid = 1; tinfo->m_exepath = "asdf"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 0; tinfo2->m_vpid = 2; tinfo2->m_exepath 
= "qwerty"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(1).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(1, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -174,25 +174,25 @@ TEST(ProcessSignalFormatterTest, ProcessWithParentWithSameNameTest) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(43); + tinfo->m_uid = 43; tinfo->m_exepath = "asdf"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - tinfo2->m_user.set_uid(42); + tinfo2->m_uid = 42; tinfo2->m_exepath = "asdf"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(1).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(1, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -219,36 +219,36 @@ TEST(ProcessSignalFormatterTest, ProcessWithTwoParentsTest) { 
ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(42); + tinfo->m_uid = 42; tinfo->m_exepath = "asdf"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - tinfo2->m_user.set_uid(7); + tinfo2->m_uid = 7; tinfo2->m_exepath = "qwerty"; - auto tinfo3 = inspector->build_threadinfo(); + auto tinfo3 = inspector->get_threadinfo_factory().create(); tinfo3->m_pid = 4; tinfo3->m_tid = 4; tinfo3->m_ptid = 1; tinfo3->m_vpid = 9; - tinfo3->m_user.set_uid(8); + tinfo3->m_uid = 8; tinfo3->m_exepath = "uiop"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); - inspector->add_thread(std::move(tinfo3)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); + inspector->m_thread_manager->add_thread(std::move(tinfo3), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(4).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(4, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -278,36 +278,36 @@ TEST(ProcessSignalFormatterTest, ProcessWithTwoParentsWithTheSameNameTest) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(42); + tinfo->m_uid = 42; tinfo->m_exepath 
= "asdf"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - tinfo2->m_user.set_uid(7); + tinfo2->m_uid = 7; tinfo2->m_exepath = "asdf"; - auto tinfo3 = inspector->build_threadinfo(); + auto tinfo3 = inspector->get_threadinfo_factory().create(); tinfo3->m_pid = 4; tinfo3->m_tid = 4; tinfo3->m_ptid = 1; tinfo3->m_vpid = 9; - tinfo3->m_user.set_uid(8); + tinfo3->m_uid = 8; tinfo3->m_exepath = "asdf"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); - inspector->add_thread(std::move(tinfo3)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); + inspector->m_thread_manager->add_thread(std::move(tinfo3), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(4).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(4, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -334,45 +334,45 @@ TEST(ProcessSignalFormatterTest, ProcessCollapseParentChildWithSameNameTest) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(42); + tinfo->m_uid = 42; tinfo->m_exepath = "asdf"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - tinfo2->m_user.set_uid(7); + tinfo2->m_uid = 7; tinfo2->m_exepath = "asdf"; - auto tinfo3 = inspector->build_threadinfo(); 
+ auto tinfo3 = inspector->get_threadinfo_factory().create(); tinfo3->m_pid = 4; tinfo3->m_tid = 4; tinfo3->m_ptid = 1; tinfo3->m_vpid = 9; - tinfo3->m_user.set_uid(8); + tinfo3->m_uid = 8; tinfo3->m_exepath = "asdf"; - auto tinfo4 = inspector->build_threadinfo(); + auto tinfo4 = inspector->get_threadinfo_factory().create(); tinfo4->m_pid = 5; tinfo4->m_tid = 5; tinfo4->m_ptid = 4; tinfo4->m_vpid = 10; - tinfo4->m_user.set_uid(9); + tinfo4->m_uid = 9; tinfo4->m_exepath = "qwerty"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); - inspector->add_thread(std::move(tinfo3)); - inspector->add_thread(std::move(tinfo4)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); + inspector->m_thread_manager->add_thread(std::move(tinfo3), false); + inspector->m_thread_manager->add_thread(std::move(tinfo4), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(5).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(5, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -399,45 +399,45 @@ TEST(ProcessSignalFormatterTest, ProcessCollapseParentChildWithSameName2Test) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(42); + tinfo->m_uid = 42; tinfo->m_exepath = "qwerty"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - tinfo2->m_user.set_uid(7); + tinfo2->m_uid = 7; 
tinfo2->m_exepath = "asdf"; - auto tinfo3 = inspector->build_threadinfo(); + auto tinfo3 = inspector->get_threadinfo_factory().create(); tinfo3->m_pid = 4; tinfo3->m_tid = 4; tinfo3->m_ptid = 1; tinfo3->m_vpid = 9; - tinfo3->m_user.set_uid(8); + tinfo3->m_uid = 8; tinfo3->m_exepath = "asdf"; - auto tinfo4 = inspector->build_threadinfo(); + auto tinfo4 = inspector->get_threadinfo_factory().create(); tinfo4->m_pid = 5; tinfo4->m_tid = 5; tinfo4->m_ptid = 4; tinfo4->m_vpid = 10; - tinfo4->m_user.set_uid(9); + tinfo4->m_uid = 9; tinfo4->m_exepath = "asdf"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); - inspector->add_thread(std::move(tinfo3)); - inspector->add_thread(std::move(tinfo4)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); + inspector->m_thread_manager->add_thread(std::move(tinfo3), false); + inspector->m_thread_manager->add_thread(std::move(tinfo4), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(5).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(5, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -467,45 +467,45 @@ TEST(ProcessSignalFormatterTest, ProcessWithUnrelatedProcessTest) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 1; - tinfo->m_user.set_uid(42); + tinfo->m_uid = 42; tinfo->m_exepath = "qwerty"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; tinfo2->m_vpid = 2; - 
tinfo2->m_user.set_uid(7); + tinfo2->m_uid = 7; tinfo2->m_exepath = "asdf"; - auto tinfo3 = inspector->build_threadinfo(); + auto tinfo3 = inspector->get_threadinfo_factory().create(); tinfo3->m_pid = 4; tinfo3->m_tid = 4; tinfo3->m_ptid = 1; tinfo3->m_vpid = 9; - tinfo3->m_user.set_uid(8); + tinfo3->m_uid = 8; tinfo3->m_exepath = "uiop"; - auto tinfo4 = inspector->build_threadinfo(); + auto tinfo4 = inspector->get_threadinfo_factory().create(); tinfo4->m_pid = 5; tinfo4->m_tid = 5; tinfo4->m_ptid = 555; tinfo4->m_vpid = 10; - tinfo4->m_user.set_uid(9); + tinfo4->m_uid = 9; tinfo4->m_exepath = "jkl;"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); - inspector->add_thread(std::move(tinfo3)); - inspector->add_thread(std::move(tinfo4)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); + inspector->m_thread_manager->add_thread(std::move(tinfo3), false); + inspector->m_thread_manager->add_thread(std::move(tinfo4), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(4).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(4, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -535,31 +535,31 @@ TEST(ProcessSignalFormatterTest, CountTwoCounterCallsTest) { ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 1; tinfo->m_tid = 1; tinfo->m_ptid = 555; tinfo->m_vpid = 10; - tinfo->m_user.set_uid(9); + tinfo->m_uid = 9; tinfo->m_exepath = "jkl;"; - inspector->add_thread(std::move(tinfo)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); std::vector lineage; - 
processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(1).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(1, true).get(), lineage); - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 2; tinfo2->m_tid = 2; tinfo2->m_ptid = 555; tinfo2->m_vpid = 10; - tinfo2->m_user.set_uid(9); + tinfo2->m_uid = 9; tinfo2->m_exepath = "jkl;"; - inspector->add_thread(std::move(tinfo2)); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); std::vector lineage2; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(2).get(), lineage2); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(2, true).get(), lineage2); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -577,45 +577,46 @@ TEST(ProcessSignalFormatterTest, CountTwoCounterCallsTest) { } TEST(ProcessSignalFormatterTest, Rox3377ProcessLineageWithNoVPidTest) { + // This test verifies lineage traversal stops at the container boundary. + // Originally tested vpid=0 + container_id fallback (ROX-3377), but + // container_id is now a dynamic field from the container plugin. + // Instead, test boundary detection via pid==vpid (namespace init process). 
std::unique_ptr inspector(new sinsp()); CollectorStats& collector_stats = CollectorStats::GetOrCreate(); CollectorConfig config; ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; - tinfo->m_vpid = 0; - tinfo->m_user.set_uid(42); - tinfo->m_container_id = ""; + tinfo->m_vpid = 3; + tinfo->m_uid = 42; tinfo->m_exepath = "qwerty"; - auto tinfo2 = inspector->build_threadinfo(); + auto tinfo2 = inspector->get_threadinfo_factory().create(); tinfo2->m_pid = 1; tinfo2->m_tid = 1; tinfo2->m_ptid = 3; - tinfo2->m_vpid = 0; - tinfo2->m_user.set_uid(7); - tinfo2->m_container_id = "id"; + tinfo2->m_vpid = 2; + tinfo2->m_uid = 7; tinfo2->m_exepath = "asdf"; - auto tinfo3 = inspector->build_threadinfo(); + auto tinfo3 = inspector->get_threadinfo_factory().create(); tinfo3->m_pid = 4; tinfo3->m_tid = 4; tinfo3->m_ptid = 1; - tinfo3->m_vpid = 0; - tinfo3->m_user.set_uid(8); - tinfo3->m_container_id = "id"; + tinfo3->m_vpid = 9; + tinfo3->m_uid = 8; tinfo3->m_exepath = "uiop"; - inspector->add_thread(std::move(tinfo)); - inspector->add_thread(std::move(tinfo2)); - inspector->add_thread(std::move(tinfo3)); + inspector->m_thread_manager->add_thread(std::move(tinfo), false); + inspector->m_thread_manager->add_thread(std::move(tinfo2), false); + inspector->m_thread_manager->add_thread(std::move(tinfo3), false); std::vector lineage; - processSignalFormatter.GetProcessLineage(inspector->get_thread_ref(4).get(), lineage); + processSignalFormatter.GetProcessLineage(inspector->m_thread_manager->find_thread(4, true).get(), lineage); int count = collector_stats.GetCounter(CollectorStats::process_lineage_counts); int total = collector_stats.GetCounter(CollectorStats::process_lineage_total); @@ -641,13 +642,12 @@ TEST(ProcessSignalFormatterTest, ProcessArguments) { ProcessSignalFormatter 
processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 0; - tinfo->m_user.set_uid(42); - tinfo->m_container_id = ""; + tinfo->m_uid = 42; tinfo->m_exepath = "qwerty"; std::vector args = {std::string("args")}; @@ -671,13 +671,12 @@ TEST(ProcessSignalFormatterTest, NoProcessArguments) { config.SetDisableProcessArguments(true); ProcessSignalFormatter processSignalFormatter(inspector.get(), config); - auto tinfo = inspector->build_threadinfo(); + auto tinfo = inspector->get_threadinfo_factory().create(); tinfo->m_pid = 3; tinfo->m_tid = 3; tinfo->m_ptid = -1; tinfo->m_vpid = 0; - tinfo->m_user.set_uid(42); - tinfo->m_container_id = ""; + tinfo->m_uid = 42; tinfo->m_exepath = "qwerty"; std::vector args = {std::string("args")}; diff --git a/collector/test/SystemInspectorServiceTest.cpp b/collector/test/SystemInspectorServiceTest.cpp index a6ed01e2e1..a02ccab23c 100644 --- a/collector/test/SystemInspectorServiceTest.cpp +++ b/collector/test/SystemInspectorServiceTest.cpp @@ -7,32 +7,33 @@ namespace collector::system_inspector { TEST(SystemInspectorServiceTest, FilterEvent) { std::unique_ptr inspector(new sinsp()); + const auto& factory = inspector->get_threadinfo_factory(); - sinsp_threadinfo regular_process(inspector.get()); - regular_process.m_exepath = "/bin/busybox"; - regular_process.m_comm = "sleep"; + auto regular_process = factory.create(); + regular_process->m_exepath = "/bin/busybox"; + regular_process->m_comm = "sleep"; - sinsp_threadinfo runc_process(inspector.get()); - runc_process.m_exepath = "runc"; - runc_process.m_comm = "6"; + auto runc_process = factory.create(); + runc_process->m_exepath = "runc"; + runc_process->m_comm = "6"; - sinsp_threadinfo proc_self_process(inspector.get()); - proc_self_process.m_exepath = "/proc/self/exe"; - proc_self_process.m_comm = "6"; + auto 
proc_self_process = factory.create();
+  proc_self_process->m_exepath = "/proc/self/exe";
+  proc_self_process->m_comm = "6";
 
-  sinsp_threadinfo memfd_process(inspector.get());
-  memfd_process.m_exepath = "memfd:runc_cloned:/proc/self/exe";
-  memfd_process.m_comm = "6";
+  auto memfd_process = factory.create();
+  memfd_process->m_exepath = "memfd:runc_cloned:/proc/self/exe";
+  memfd_process->m_comm = "6";
 
   struct test_t {
     const sinsp_threadinfo* tinfo;
     bool expected;
   };
 
   std::vector tests{
-      {&regular_process, true},
-      {&runc_process, false},
-      {&proc_self_process, false},
-      {&memfd_process, false},
+      {regular_process.get(), true},
+      {runc_process.get(), false},
+      {proc_self_process.get(), false},
+      {memfd_process.get(), false},
   };
 
   for (const auto& t : tests) {
diff --git a/falcosecurity-libs b/falcosecurity-libs
index af2b6161c6..d0fb1702cf 160000
--- a/falcosecurity-libs
+++ b/falcosecurity-libs
@@ -1 +1 @@
-Subproject commit af2b6161c6060ff47b843d9ff129b9de2ed03a35
+Subproject commit d0fb1702cfbaf7a04cf87b65e3df99f1cc38a2ef

From ffe31c01e0b76514f0a6f5a49174fa38d01d7632 Mon Sep 17 00:00:00 2001
From: Giles Hutton
Date: Thu, 26 Feb 2026 19:30:02 +0000
Subject: [PATCH 03/20] Updates claude skill based on experience of a live update

---
 .claude/commands/update-falco-libs.md | 235 ++++++++++++++++++++++----
 1 file changed, 202 insertions(+), 33 deletions(-)

diff --git a/.claude/commands/update-falco-libs.md b/.claude/commands/update-falco-libs.md
index ac9924ea0d..98d673d067 100644
--- a/.claude/commands/update-falco-libs.md
+++ b/.claude/commands/update-falco-libs.md
@@ -22,6 +22,7 @@ Run the following in the `falcosecurity-libs/` submodule:
 4. `git tag -l '0.*' | sort -V | tail -10` — find latest upstream releases
 5. `git branch -a | grep stackrox` — find existing StackRox branches
 6. Count upstream commits: `git log --oneline .. | wc -l`
+7.
Find StackRox-only patches: `git log --oneline HEAD --not --remotes=falco`
 
 Report: current version, target version, number of StackRox patches, number of upstream commits.
 
@@ -39,6 +40,47 @@ Categorize each patch as:
 - **Upstreamed** — will be dropped automatically during rebase
 - **Still needed** — must be carried forward
 - **Conflict risk** — touches files heavily modified upstream
 
+### Current StackRox Patches (as of 0.23.1-stackrox-rc1)
+
+20 patches in these categories:
+
+**BPF verifier fixes** (keep — not upstreamed):
+- `2291f61ec` — clang > 19 verifier fixes (MAX_IOVCNT, volatile len_to_read, pragma unroll)
+- `8672099d6` — RHEL SAP verifier fix (const struct cred *)
+- `df93a9e42` — COS verifier fix (RCU pointer chain reads)
+- `d1a708bde` — explicit return in auxmap submit
+
+**ppc64le platform support** (keep):
+- `255126d47` — ppc64le vmlinux.h (large, BTF-generated)
+- `a9cafe949` — ppc64le syscall compat header
+- `452679e2b` — IOC_PAGE_SHIFT fix
+- `dd5e86d40` / `bb733f64a` — thread_info guards (iterative, consider squashing)
+
+**Performance optimizations** (keep):
+- `a982809e0` — cgroup subsys filtering (`INTERESTING_SUBSYS` compile flag)
+- `8dd26e3dc` — socket-only FD scan (`SCAP_SOCKET_ONLY_FD` compile flag)
+
+**API/build adaptations** (keep):
+- `32f36f770` — expose `extract_single` in filterchecks (public API)
+- `b0ec4099f` — libelf suffix guard + initial filtercheck extract
+- `34d863440` — sinsp include directory fix
+- `a915789ec` / `16edb6bb1` — CMake/include fixes for logging integration
+- `5338014a7` — disable log timestamps API
+
+**Workarounds** (keep but monitor):
+- `8ba291e78` — disable trusted exepath (see "Exepath" section below)
+- `88d5093f4` — ASSERT_TO_LOG via falcosecurity_log_fn callback
+
+**Rebase fixups** (always regenerated):
+- `d0fb1702c` — fixes following rebase (CMake cycle, exepath fallback, assert macro)
+
+### Upstream Candidates
+
+These patches are generic enough to propose upstream:
+- **Strong**: clang verifier fixes (2291f61ec, 8672099d6, df93a9e42), disable log
timestamps (5338014a7)
+- **With discussion**: cgroup filtering (a982809e0), socket-only FD scan (8dd26e3dc), log asserts (88d5093f4) — upstream may prefer runtime flags over compile-time
+- **ppc64le bundle**: propose together if upstream is interested in the architecture
+
 ## Step 3: Identify Breaking API Changes
 
 Check what APIs changed between versions. Key areas to inspect:
 
@@ -54,6 +96,12 @@ git log --oneline .. -- userspace/libsinsp/threadinfo.h userspa
 
 # sinsp API changes
 git diff .. -- userspace/libsinsp/sinsp.h | grep -E '^\+|^\-' | head -80
 
+# Event format changes (parameter additions/removals)
+git diff .. -- driver/event_table.c
+
+# Enter event deprecation (EF_OLD_VERSION flags)
+git log --oneline .. --grep="OLD_VERSION\|enter event\|enter_event"
+
 # Breaking changes
 git log --oneline .. --grep="BREAKING\|breaking\|!:"
 ```
@@ -65,15 +113,16 @@ grep -rn '' collector/lib/ collector/test/ --include='*.cpp' -
 ```
 
 Key collector integration points to check:
-- `collector/lib/system-inspector/Service.cpp` — sinsp initialization, container setup, thread access
-- `collector/lib/system-inspector/EventExtractor.h` — threadinfo field access macros (TINFO_FIELD, TINFO_FIELD_RAW_GETTER)
+- `collector/lib/system-inspector/Service.cpp` — sinsp initialization, plugin loading, filter setup
+- `collector/lib/system-inspector/EventExtractor.h` — threadinfo field access macros (TINFO_FIELD, FIELD_CSTR, FIELD_RAW)
 - `collector/lib/ContainerMetadata.cpp` — container info/label lookup
-- `collector/lib/ContainerEngine.h` — container engine integration (may be deleted if upstream removed container engines)
-- `collector/lib/ProcessSignalFormatter.cpp` — process signal creation, container_id access, parent thread traversal
+- `collector/lib/ProcessSignalFormatter.cpp` — process signal creation, exepath access, container_id, lineage traversal
+- `collector/lib/NetworkSignalHandler.cpp` — container_id access
 - `collector/lib/Process.cpp` — process info access, container_id
-
`collector/lib/Utility.cpp` — threadinfo printing
-- `collector/test/ProcessSignalFormatterTest.cpp` — thread creation, get_thread_ref, container_id setup
-- `collector/CMakeLists.txt` — falco build flags (SINSP_SLIM_THREADINFO, BUILD_LIBSCAP_MODERN_BPF, MODERN_BPF_EXCLUDE_PROGS, etc.)
+- `collector/lib/Utility.cpp` — GetContainerID helper, threadinfo printing
+- `collector/test/ProcessSignalFormatterTest.cpp` — thread creation, thread_manager usage
+- `collector/test/SystemInspectorServiceTest.cpp` — service initialization
+- `collector/CMakeLists.txt` — falco build flags
 
 ## Step 4: Plan Staging Strategy
 
 If the gap is large (>200 commits), identify intermediate stopping points:
 
 Known historical API breakpoints (update as upstream evolves):
 - **0.20.0**: `set_import_users` lost second arg, user/group structs on threadinfo replaced with `m_uid`/`m_gid`
-- **0.21.0**: Container engine subsystem removed entirely, replaced by container plugin (`libcontainer.so`). `m_container_id` removed from threadinfo. `m_thread_manager` changed to `shared_ptr`. `build_threadinfo()`/`add_thread()` removed from sinsp.
-- **0.22.0**: `get_thread_ref` removed from sinsp (use `find_thread`)
-- **0.23.0+**: `get_container_id()` removed from threadinfo. Parent thread traversal moved to thread_manager. User/group handling removed from threadinfo.
+- **0.21.0**: Container engine subsystem removed entirely, replaced by container plugin (`libcontainer.so`). `m_container_id` removed from threadinfo. `m_thread_manager` changed to `shared_ptr`. `build_threadinfo()`/`add_thread()` removed from sinsp. Enter events for many syscalls deprecated (`EF_OLD_VERSION`).
+- **0.22.0**: `get_thread_ref` removed from sinsp (use `find_thread`). `get_container_id()` removed from threadinfo. `extract_single` API changed in filterchecks.
+- **0.23.0+**: Parent thread traversal moved to thread_manager. `get_thread_info(bool)` signature changed to `get_thread_info()` (no bool).
`m_user`/`m_group` structs removed (use `m_uid`/`m_gid` directly).
 
 ## Step 5: Execute Rebase (per stage)
 
@@ -109,16 +158,109 @@ Always rebase onto upstream **tags** (not master tip) per `docs/falco-update.md`
 
 After each rebase stage, update collector code for API changes found in Step 3.
 
-Common patterns of change:
-- **Removed container engine**: Delete `ContainerEngine.h`, replace `set_container_engine_mask()` with `register_plugin()` for the container plugin
-- **Container plugin**: Ship `libcontainer.so` in the container image, register it before setting filters like `container.id != 'host'`
-- **Container metadata**: Replace `m_container_manager.get_containers()` with plugin state table API
-- **Thread access**: Replace `get_thread_ref(tid, true)` with `find_thread(tid, false)` or `m_thread_manager->find_thread()`
-- **Container ID**: Replace `tinfo->m_container_id` with `tinfo->get_container_id()` or `FIELD_CSTR(container_id, "container.id")`
-- **User/group**: Replace `m_user.uid()` / `m_group.gid()` with `m_uid` / `m_gid`
-- **Thread creation in tests**: Replace `build_threadinfo()` with `m_thread_manager_factory.create()`, `add_thread()` with `m_thread_manager->add_thread()`
+### Common patterns of change
+
+**Container plugin integration** (from 0.21.0):
+- Delete `ContainerEngine.h` — container engines no longer built-in
+- Ship `libcontainer.so` in the collector image (built from source in builder, needs Go)
+- Load via `sinsp::register_plugin()` in Service.cpp before setting filters
+- Register extraction capabilities: `EventExtractor::FilterList().add_filter_check(sinsp_plugin::new_filtercheck(plugin))`
+- Wrap `set_filter("container.id != 'host'")` in try-catch for tests without plugin
-## Step 7: Validate Each Stage
+
+**Container ID access** (from 0.21.0+):
+- Replace `tinfo->m_container_id` with a helper like `GetContainerID(tinfo, thread_manager)` that reads from plugin state tables
+- In EventExtractor.h: change `TINFO_FIELD(container_id)` to
`FIELD_CSTR(container_id, "container.id")` (provided by container plugin)
+- The `FIELD_CSTR` null guard handles tests where the plugin isn't loaded
+
+**Thread access** (from 0.22.0+):
+- Replace `get_thread_ref(tid, true)` with `m_thread_manager->find_thread(tid, false)` or `m_thread_manager->get_thread(tid, false)`
+- `get_thread_info(true)` → `get_thread_info()` (no bool parameter)
+
+**User/group** (from 0.20.0+):
+- Replace `m_user.uid()` / `m_group.gid()` with `m_uid` / `m_gid`
+
+**Thread creation in tests**:
+- Replace `build_threadinfo()` with `inspector->get_threadinfo_factory().create()`
+- Replace `add_thread()` with `inspector->m_thread_manager->add_thread(std::move(tinfo), false)`
+
+**Lineage traversal** (from 0.23.0+):
+- Replace `mt->traverse_parent_state(visitor)` with `inspector_->m_thread_manager->traverse_parent_state(*mt, visitor)`
+- Visitor type: `sinsp_thread_manager::visitor_func_t` instead of `sinsp_threadinfo::visitor_func_t`
+
+**FilterCheck API** (from 0.22.0+):
+- `extract_single(event, &len)` → `extract(event, vals)` vector-based API
+- Add null guards for `filter_check` pointers (plugin-provided checks may not be initialized)
+
+## Step 7: Known Gotchas
+
+### Exepath Resolution (CRITICAL)
+
+Modern drivers (0.21.0+) **no longer send execve enter events** (marked `EF_OLD_VERSION`). The exepath is supposed to come from the `trusted_exepath` parameter (index 27 in the 0-indexed parameter list) in the exit event, which uses the kernel's `d_path()`.
+
+However, the StackRox fork **disables trusted_exepath** (`USE_TRUSTED_EXEPATH=false`) because it resolves symlinks — giving `/bin/busybox` instead of `/bin/ls` in busybox containers, breaking ACS policies.
+
+**Without either source, `m_exepath` inherits the parent's value on clone** (e.g., `/usr/bin/podman`), causing all container processes to show the container runtime's path.
+
+**Fix**: Add a fallback in `parse_execve_exit` (parsers.cpp) that uses **Parameter 31** (`filename`, which is `bprm->filename`; index 30 when 0-indexed, hence `get_param(30)` below) from the exit event. This contains the first argument to execve as provided by the caller — same behavior as the old enter event reconstruction:
+
+```cpp
+// After the retrieve_enter_event() block, add:
+if(!exepath_set) {
+	/* Parameter 31: filename (type: PT_FSPATH) */
+	if(const auto filename_param = evt.get_param(30); !filename_param->empty()) {
+		std::string_view filename = filename_param->as<std::string_view>();
+		if(filename != "") {
+			std::string fullpath = sinsp_utils::concatenate_paths(
+			        evt.get_tinfo()->get_cwd(), filename);
+			evt.get_tinfo()->set_exepath(std::move(fullpath));
+		}
+	}
+}
+```
+
+**How to detect this bug**: Integration test `TestProcessViz` fails with all processes showing the container runtime binary (e.g., `/usr/bin/podman`) as their ExePath.
+
+**Key event parameters** (PPME_SYSCALL_EXECVE_19_X, 0-indexed):
+- 1: exe (argv[0]), 6: cwd, 13: comm (always correct)
+- 27: trusted_exepath (kernel d_path, resolves symlinks — disabled)
+- 30: filename (bprm->filename, first arg to execve — use this)
+
+### CMake Dependency Cycle
+
+Upstream has a cyclic dependency: `events_dimensions_generator → scap_event_schema → scap → pman → ProbeSkeleton → EventsDimensions → generator`. Upstream doesn't hit it because their CI uses CMake 3.22; our builder uses 3.31+, which enforces cycle detection.
+
+**Fix**: Compile the 3 required driver source files (`event_table.c`, `flags_table.c`, `dynamic_params_table.c`) directly into the generator instead of linking `scap_event_schema`. This fix lives in `driver/modern_bpf/CMakeLists.txt` and must be carried forward each rebase.
+
+### ASSERT_TO_LOG Circular Dependency
+
+Collector compiles with `-DASSERT_TO_LOG` so assertions log instead of aborting. The old approach using `libsinsp_logger()` causes circular includes because `logger.h` includes `sinsp_public.h`.
+
+**Fix**: Use the `falcosecurity_log_fn` callback from `scap_log.h` (same pattern as `scap_assert.h`). This is a tiny header with no dependencies. The callback is set by sinsp when it opens the scap handle.
+
+### Container Plugin Build
+
+The container plugin (`libcontainer.so`) is a C++/Go hybrid:
+- Source: `github.com/falcosecurity/plugins` (monorepo), `plugins/container/` directory
+- Requires Go 1.23+ for the go-worker component
+- Upstream only ships x86_64 and arm64 binaries; ppc64le/s390x must be built from source
+- Version must match falcosecurity-libs (check the plugin compatibility matrix)
+- Submodule at `builder/third_party/falcosecurity-plugins`
+
+### BPF Verifier Compatibility
+
+BPF verifier behavior varies significantly across:
+- **Kernel versions**: Older kernels have stricter limits
+- **Clang versions**: clang > 19 can produce code that exceeds instruction counts
+- **Platform kernels**: RHEL SAP and Google COS have custom verifier behavior
+
+Common fixes:
+- Reduce loop bounds (e.g., `MAX_IOVCNT` 32 → 16)
+- Mark variables `volatile` to prevent optimizations the verifier can't follow
+- Add `#pragma unroll` for loops the verifier can't bound
+- Break pointer chain reads into separate variables with null checks
+- Use `const` qualifiers on credential struct pointers
+
+## Step 8: Validate Each Stage

 Run this checklist after each stage:

@@ -126,16 +268,21 @@

 - [ ] Each surviving patch verified: diff against original to ensure no content loss
 - [ ] `make collector` succeeds on amd64
 - [ ] `make unittest` passes (all test suites, especially ProcessSignalFormatterTest)
+- [ ] Integration tests: `TestProcessViz` (exepath correctness), `TestProcessLineageInfo`, `TestNetworkFlows`
 - [ ] Multi-arch compilation: arm64, ppc64le, s390x
-- [ ] Integration tests on at least 1 VM type (RHCOS or Ubuntu)
-- [ ] Runtime self-checks pass
-- [ ] Container ID attribution works correctly
+- [ ] Container ID attribution works (not all showing
empty or host)
+- [ ] Process exepaths are correct (not showing a container runtime binary like `/usr/bin/podman`)
 - [ ] Container label/namespace lookup works
 - [ ] Network signal handler receives correct container IDs
-- [ ] No ASan/Valgrind errors (run with `ADDRESS_SANITIZER=ON` and `USE_VALGRIND=ON`)
-- [ ] Performance benchmarks show no regression vs previous stage
+- [ ] Runtime self-checks pass
+
+### Key Integration Tests
+
+- **TestProcessViz**: Verifies process ExePath, Name, and Args for container processes. Catches the exepath regression where all paths show the container runtime. Expected paths are like `/bin/ls`, `/usr/sbin/nginx`, `/bin/sh`.
+- **TestProcessLineageInfo**: Verifies parent process lineage chains stop at container boundaries.
+- **TestNetworkFlows**: Verifies network connections are attributed to correct containers.

-## Step 8: Final Update
+## Step 9: Final Update

 ```sh
 cd <collector repo>
@@ -143,7 +290,11 @@
 cd falcosecurity-libs && git checkout <X.Y.Z>-stackrox
 cd .. && git add falcosecurity-libs
 ```

-Update `docs/falco-update.md` with notes about what changed.
+Update `docs/falco-update.md` with:
+- Version transition (e.g., "0.23.1 → 0.25.0")
+- Any new upstream API changes requiring collector-side fixes
+- New StackRox patches added, patches dropped (upstreamed)
+- Known issues or workarounds

 ## PR Strategy

@@ -151,11 +302,29 @@

 Each stage should produce **two PRs**:

 1. **Fork PR** targeting `upstream-main` in `stackrox/falcosecurity-libs` (the rebased branch)
 2.
**Collector PR** updating the submodule pointer and making collector-side code changes

-## Important Notes
+## Quick Reference: Event Architecture

-- The container plugin (`libcontainer.so`) replaced built-in container engines at upstream 0.21.0
-- Container plugin source: `https://github.com/falcosecurity/plugins/tree/main/plugins/container`
-- Container plugin is C++/Go hybrid — needs Go toolchain to build from source
-- Upstream only ships x86_64 and arm64 binaries; ppc64le/s390x must be built from source
-- The `giles/cherry-picked-stackrox-additions` branch may have useful reference patches for collector-side adaptations
-- Previous update attempts (e.g., `0.21.0-stackrox-rc1`) can be used as conflict resolution references
+### How Process Events Flow
+
+1. **Kernel BPF** captures syscall events → writes to ring buffer
+2. **libscap** reads ring buffer → produces `scap_evt` structs
+3. **libsinsp parsers** (`parsers.cpp`) process events:
+   - `reset()`: looks up/creates thread info, validates enter/exit event matching
+   - `parse_clone_exit_caller/child()`: creates child thread info, inherits parent fields
+   - `parse_execve_exit()`: updates thread info with new process details
+4.
**Collector** (`ProcessSignalFormatter`) reads thread info fields via `EventExtractor`
+
+### Key Thread Info Fields
+
+| Field | Source | Notes |
+|-------|--------|-------|
+| `m_comm` | Exit event param 13 | Always correct (kernel task_struct->comm) |
+| `m_exe` | Exit event param 1 | argv[0], may be relative |
+| `m_exepath` | Enter event reconstruction OR param 27/30 | See "Exepath Resolution" gotcha |
+| `m_pid` | Exit event param 4 | |
+| `m_uid`/`m_gid` | Exit event param 26/29 | Was `m_user.uid()`/`m_group.gid()` before 0.20.0 |
+| container_id | Container plugin filter field | Was `m_container_id` before 0.21.0 |
+
+### Enter Event Deprecation
+
+Upstream removed enter events to reduce ~50% of kernel/userspace overhead (proposal: `proposals/20240901-disable-support-for-syscall-enter-events.md`). All parameters moved to exit events. A scap converter handles old capture files. Any code depending on `retrieve_enter_event()` will silently fail with modern drivers — check for fallbacks using exit event parameters.
From 3f7caf5048b54444b4439c928feea9c86216 Mon Sep 17 00:00:00 2001
From: Giles Hutton
Date: Sun, 8 Mar 2026 17:47:20 +0000
Subject: [PATCH 04/20] Fixes for network conns

---
 collector/lib/NetworkSignalHandler.cpp      | 27 +++++++++++++--------
 integration-tests/suites/udp_networkflow.go | 14 +++++------
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/collector/lib/NetworkSignalHandler.cpp b/collector/lib/NetworkSignalHandler.cpp
index 71e1e634d4..f53175e588 100644
--- a/collector/lib/NetworkSignalHandler.cpp
+++ b/collector/lib/NetworkSignalHandler.cpp
@@ -68,6 +68,7 @@
  * result: nil
  */
 std::optional<Connection> NetworkSignalHandler::GetConnection(sinsp_evt* evt) {
+  const char* evt_name = evt->get_name();
   auto* fd_info = evt->get_fd_info();

   if (!fd_info) {
@@ -75,17 +76,22 @@
   }

   // With collect_connection_status_ set, we can prevent reporting of asynchronous
-  // connections which fail.
+  // connections which fail. This check is only relevant for connection
+  // establishment events (connect, accept, getsockopt). For send/recv events,
+  // a previous failed operation on the same fd can leave the socket marked as
+  // "failed" even when subsequent operations succeed, because the sinsp parser
+  // (parse_rw_exit) does not clear the failed flag on successful send/recv.
if (collect_connection_status_) {
-    // note: connection status tracking enablement is managed in system_inspector::Service
-    if (fd_info->is_socket_failed()) {
-      // connect() failed or getsockopt(SO_ERROR) returned a failure
-      return std::nullopt;
-    }
-
-    if (fd_info->is_socket_pending()) {
-      // connect() returned E_INPROGRESS
-      return std::nullopt;
+    bool is_send_recv = (strncmp(evt_name, "send", 4) == 0 ||
+                         strncmp(evt_name, "recv", 4) == 0);
+    if (!is_send_recv) {
+      if (fd_info->is_socket_failed()) {
+        return std::nullopt;
+      }
+
+      if (fd_info->is_socket_pending()) {
+        return std::nullopt;
+      }
     }
   }

@@ -137,6 +143,7 @@
   if (!container_id) {
     return std::nullopt;
   }
+
   return {Connection(container_id, *local, *remote, l4proto, is_server)};
 }

diff --git a/integration-tests/suites/udp_networkflow.go b/integration-tests/suites/udp_networkflow.go
index 9045e7466f..1aefea184a 100644
--- a/integration-tests/suites/udp_networkflow.go
+++ b/integration-tests/suites/udp_networkflow.go
@@ -138,8 +138,8 @@ func (s *UdpNetworkFlow) runTest(image, recv, send string, port uint32) {
 		CloseTimestamp: nil,
 	}

-	s.Sensor().ExpectConnections(s.T(), client.id, 5*time.Second, clientConnection)
-	s.Sensor().ExpectConnections(s.T(), server.id, 5*time.Second, serverConnection)
+	s.Sensor().ExpectConnections(s.T(), client.id, 30*time.Second, clientConnection)
+	s.Sensor().ExpectConnections(s.T(), server.id, 30*time.Second, serverConnection)
 }

 func (s *UdpNetworkFlow) TestMultipleDestinations() {
@@ -164,7 +164,7 @@
 	client := s.runClient(config.ContainerStartConfig{
 		Name:       UDP_CLIENT,
 		Image:      image,
-		Command:    newClientCmd("sendmmsg", "300", "8", servers...),
+		Command:    newClientCmd("sendmmsg", "300", "4", servers...),
 		Entrypoint: []string{"udp-client"},
 	})
 	log.Info("Client: %s\n", client.String())
@@ -192,9 +192,9 @@
 			ContainerId:    server.id,
 			CloseTimestamp: nil,
 		}
-		s.Sensor().ExpectConnections(s.T(), server.id, 5*time.Second, serverConnection)
+		s.Sensor().ExpectConnections(s.T(), server.id, 30*time.Second, serverConnection)
 	}
-	s.Sensor().ExpectConnections(s.T(), client.id, 5*time.Second, clientConnections...)
+	s.Sensor().ExpectConnections(s.T(), client.id, 30*time.Second, clientConnections...)
 }

 func (s *UdpNetworkFlow) TestMultipleSources() {
@@ -243,9 +243,9 @@
 	}

 	for i, client := range clients {
-		s.Sensor().ExpectConnections(s.T(), client.id, 5*time.Second, clientConnections[i])
+		s.Sensor().ExpectConnections(s.T(), client.id, 30*time.Second, clientConnections[i])
 	}
-	s.Sensor().ExpectConnections(s.T(), server.id, 5*time.Second, serverConnections...)
+	s.Sensor().ExpectConnections(s.T(), server.id, 30*time.Second, serverConnections...)
 }

 func newServerCmd(recv string, port uint32) []string {

From cfa45173c3d644c32d9f3e6f427735e75b072c55 Mon Sep 17 00:00:00 2001
From: Giles Hutton
Date: Sun, 8 Mar 2026 17:52:46 +0000
Subject: [PATCH 05/20] bd init: initialize beads issue tracking

---
 .beads/.gitignore               | 51 +++++++++++
 .beads/README.md                | 81 +++++++++++++++++
 .beads/config.yaml              | 54 ++++++++++++
 .beads/dolt-monitor.pid.lock    |  0
 .beads/hooks/post-checkout      |  9 ++
 .beads/hooks/post-merge         |  9 ++
 .beads/hooks/pre-commit         |  9 ++
 .beads/hooks/pre-push           |  9 ++
 .beads/hooks/prepare-commit-msg |  9 ++
 .beads/interactions.jsonl       |  0
 .beads/metadata.json            |  7 ++
 .gitignore                      |  4 +
 AGENTS.md                       | 150 ++++++++++++++++++++++++++++++
 13 files changed, 392 insertions(+)
 create mode 100644 .beads/.gitignore
 create mode 100644 .beads/README.md
 create mode 100644 .beads/config.yaml
 create mode 100644 .beads/dolt-monitor.pid.lock
 create mode 100755 .beads/hooks/post-checkout
 create mode 100755 .beads/hooks/post-merge
 create mode 100755 .beads/hooks/pre-commit
 create mode 100755 .beads/hooks/pre-push
 create mode 100755 .beads/hooks/prepare-commit-msg
create mode 100644 .beads/interactions.jsonl create mode 100644 .beads/metadata.json create mode 100644 AGENTS.md diff --git a/.beads/.gitignore b/.beads/.gitignore new file mode 100644 index 0000000000..363ebae2fe --- /dev/null +++ b/.beads/.gitignore @@ -0,0 +1,51 @@ +# Dolt database (managed by Dolt, not git) +dolt/ +dolt-access.lock + +# Runtime files +bd.sock +bd.sock.startlock +sync-state.json +last-touched + +# Local version tracking (prevents upgrade notification spam after git ops) +.local_version + +# Worktree redirect file (contains relative path to main repo's .beads/) +# Must not be committed as paths would be wrong in other clones +redirect + +# Sync state (local-only, per-machine) +# These files are machine-specific and should not be shared across clones +.sync.lock +export-state/ + +# Ephemeral store (SQLite - wisps/molecules, intentionally not versioned) +ephemeral.sqlite3 +ephemeral.sqlite3-journal +ephemeral.sqlite3-wal +ephemeral.sqlite3-shm + +# Dolt server management (auto-started by bd) +dolt-server.pid +dolt-server.log +dolt-server.lock +dolt-server.port +dolt-server.activity +dolt-monitor.pid + +# Backup data (auto-exported JSONL, local-only) +backup/ + +# Legacy files (from pre-Dolt versions) +*.db +*.db?* +*.db-journal +*.db-wal +*.db-shm +db.sqlite +bd.db +# NOTE: Do NOT add negation patterns here. +# They would override fork protection in .git/info/exclude. +# Config files (metadata.json, config.yaml) are tracked by git by default +# since no pattern above ignores them. diff --git a/.beads/README.md b/.beads/README.md new file mode 100644 index 0000000000..dbfe3631cf --- /dev/null +++ b/.beads/README.md @@ -0,0 +1,81 @@ +# Beads - AI-Native Issue Tracking + +Welcome to Beads! This repository uses **Beads** for issue tracking - a modern, AI-native tool designed to live directly in your codebase alongside your code. + +## What is Beads? 
+ +Beads is issue tracking that lives in your repo, making it perfect for AI coding agents and developers who want their issues close to their code. No web UI required - everything works through the CLI and integrates seamlessly with git. + +**Learn more:** [github.com/steveyegge/beads](https://github.com/steveyegge/beads) + +## Quick Start + +### Essential Commands + +```bash +# Create new issues +bd create "Add user authentication" + +# View all issues +bd list + +# View issue details +bd show + +# Update issue status +bd update --claim +bd update --status done + +# Sync with Dolt remote +bd dolt push +``` + +### Working with Issues + +Issues in Beads are: +- **Git-native**: Stored in Dolt database with version control and branching +- **AI-friendly**: CLI-first design works perfectly with AI coding agents +- **Branch-aware**: Issues can follow your branch workflow +- **Always in sync**: Auto-syncs with your commits + +## Why Beads? + +✨ **AI-Native Design** +- Built specifically for AI-assisted development workflows +- CLI-first interface works seamlessly with AI coding agents +- No context switching to web UIs + +🚀 **Developer Focused** +- Issues live in your repo, right next to your code +- Works offline, syncs when you push +- Fast, lightweight, and stays out of your way + +🔧 **Git Integration** +- Automatic sync with git commits +- Branch-aware issue tracking +- Dolt-native three-way merge resolution + +## Get Started with Beads + +Try Beads in your own projects: + +```bash +# Install Beads +curl -sSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash + +# Initialize in your repo +bd init + +# Create your first issue +bd create "Try out Beads" +``` + +## Learn More + +- **Documentation**: [github.com/steveyegge/beads/docs](https://github.com/steveyegge/beads/tree/main/docs) +- **Quick Start Guide**: Run `bd quickstart` +- **Examples**: 
[github.com/steveyegge/beads/examples](https://github.com/steveyegge/beads/tree/main/examples) + +--- + +*Beads: Issue tracking that moves at the speed of thought* ⚡ diff --git a/.beads/config.yaml b/.beads/config.yaml new file mode 100644 index 0000000000..e831a6bec4 --- /dev/null +++ b/.beads/config.yaml @@ -0,0 +1,54 @@ +# Beads Configuration File +# This file configures default behavior for all bd commands in this repository +# All settings can also be set via environment variables (BD_* prefix) +# or overridden with command-line flags + +# Issue prefix for this repository (used by bd init) +# If not set, bd init will auto-detect from directory name +# Example: issue-prefix: "myproject" creates issues like "myproject-1", "myproject-2", etc. +# issue-prefix: "" + +# Use no-db mode: JSONL-only, no Dolt database +# When true, bd will use .beads/issues.jsonl as the source of truth +# no-db: false + +# Enable JSON output by default +# json: false + +# Feedback title formatting for mutating commands (create/update/close/dep/edit) +# 0 = hide titles, N > 0 = truncate to N characters +# output: +# title-length: 255 + +# Default actor for audit trails (overridden by BD_ACTOR or --actor) +# actor: "" + +# Export events (audit trail) to .beads/events.jsonl on each flush/sync +# When enabled, new events are appended incrementally using a high-water mark. +# Use 'bd export --events' to trigger manually regardless of this setting. +# events-export: false + +# Multi-repo configuration (experimental - bd-307) +# Allows hydrating from multiple repositories and routing writes to the correct database +# repos: +# primary: "." # Primary repo (where this database lives) +# additional: # Additional repos to hydrate from (read-only) +# - ~/beads-planning # Personal planning repo +# - ~/work-planning # Work planning repo + +# JSONL backup (periodic export for off-machine recovery) +# Auto-enabled when a git remote exists. 
Override explicitly: +# backup: +# enabled: false # Disable auto-backup entirely +# interval: 15m # Minimum time between auto-exports +# git-push: false # Disable git push (export locally only) +# git-repo: "" # Separate git repo for backups (default: project repo) + +# Integration settings (access with 'bd config get/set') +# These are stored in the database, not in this file: +# - jira.url +# - jira.project +# - linear.url +# - linear.api-key +# - github.org +# - github.repo diff --git a/.beads/dolt-monitor.pid.lock b/.beads/dolt-monitor.pid.lock new file mode 100644 index 0000000000..e69de29bb2 diff --git a/.beads/hooks/post-checkout b/.beads/hooks/post-checkout new file mode 100755 index 0000000000..05cfb03274 --- /dev/null +++ b/.beads/hooks/post-checkout @@ -0,0 +1,9 @@ +#!/usr/bin/env sh +# --- BEGIN BEADS INTEGRATION v0.59.0 --- +# This section is managed by beads. Do not remove these markers. +if command -v bd >/dev/null 2>&1; then + export BD_GIT_HOOK=1 + bd hooks run post-checkout "$@" + _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi +fi +# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/post-merge b/.beads/hooks/post-merge new file mode 100755 index 0000000000..88a5d7d97e --- /dev/null +++ b/.beads/hooks/post-merge @@ -0,0 +1,9 @@ +#!/usr/bin/env sh +# --- BEGIN BEADS INTEGRATION v0.59.0 --- +# This section is managed by beads. Do not remove these markers. +if command -v bd >/dev/null 2>&1; then + export BD_GIT_HOOK=1 + bd hooks run post-merge "$@" + _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi +fi +# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/pre-commit b/.beads/hooks/pre-commit new file mode 100755 index 0000000000..717ab65816 --- /dev/null +++ b/.beads/hooks/pre-commit @@ -0,0 +1,9 @@ +#!/usr/bin/env sh +# --- BEGIN BEADS INTEGRATION v0.59.0 --- +# This section is managed by beads. Do not remove these markers. 
+if command -v bd >/dev/null 2>&1; then + export BD_GIT_HOOK=1 + bd hooks run pre-commit "$@" + _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi +fi +# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/pre-push b/.beads/hooks/pre-push new file mode 100755 index 0000000000..73a833cc8b --- /dev/null +++ b/.beads/hooks/pre-push @@ -0,0 +1,9 @@ +#!/usr/bin/env sh +# --- BEGIN BEADS INTEGRATION v0.59.0 --- +# This section is managed by beads. Do not remove these markers. +if command -v bd >/dev/null 2>&1; then + export BD_GIT_HOOK=1 + bd hooks run pre-push "$@" + _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi +fi +# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/prepare-commit-msg b/.beads/hooks/prepare-commit-msg new file mode 100755 index 0000000000..9c820060b9 --- /dev/null +++ b/.beads/hooks/prepare-commit-msg @@ -0,0 +1,9 @@ +#!/usr/bin/env sh +# --- BEGIN BEADS INTEGRATION v0.59.0 --- +# This section is managed by beads. Do not remove these markers. 
+if command -v bd >/dev/null 2>&1; then + export BD_GIT_HOOK=1 + bd hooks run prepare-commit-msg "$@" + _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi +fi +# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/interactions.jsonl b/.beads/interactions.jsonl new file mode 100644 index 0000000000..e69de29bb2 diff --git a/.beads/metadata.json b/.beads/metadata.json new file mode 100644 index 0000000000..be9258d80c --- /dev/null +++ b/.beads/metadata.json @@ -0,0 +1,7 @@ +{ + "database": "dolt", + "backend": "dolt", + "dolt_mode": "server", + "dolt_database": "collector", + "project_id": "d273f0f4-427a-4be7-ad7b-59061bdbd751" +} \ No newline at end of file diff --git a/.gitignore b/.gitignore index 25842a918b..7796894073 100644 --- a/.gitignore +++ b/.gitignore @@ -37,3 +37,7 @@ ansible/ci/inventory_ibmcloud.yml # vcpkg vcpkg_installed/ vcpkg-manifest-install.log + +# Dolt database files (added by bd init) +.dolt/ +*.db diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..c951c0757c --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,150 @@ +# Agent Instructions + +This project uses **bd** (beads) for issue tracking. Run `bd onboard` to get started. + +## Quick Reference + +```bash +bd ready # Find available work +bd show # View issue details +bd update --claim # Claim work atomically +bd close # Complete work +bd sync # Sync with git +``` + +## Non-Interactive Shell Commands + +**ALWAYS use non-interactive flags** with file operations to avoid hanging on confirmation prompts. + +Shell commands like `cp`, `mv`, and `rm` may be aliased to include `-i` (interactive) mode on some systems, causing the agent to hang indefinitely waiting for y/n input. 
**Use these forms instead:**
+```bash
+# Force overwrite without prompting
+cp -f source dest       # NOT: cp source dest
+mv -f source dest       # NOT: mv source dest
+rm -f file              # NOT: rm file
+
+# For recursive operations
+rm -rf directory        # NOT: rm -r directory
+cp -rf source dest      # NOT: cp -r source dest
+```
+
+**Other commands that may prompt:**
+- `scp` - use `-o BatchMode=yes` for non-interactive
+- `ssh` - use `-o BatchMode=yes` to fail instead of prompting
+- `apt-get` - use `-y` flag
+- `brew` - use `HOMEBREW_NO_AUTO_UPDATE=1` env var
+
+
+## Issue Tracking with bd (beads)
+
+**IMPORTANT**: This project uses **bd (beads)** for ALL issue tracking. Do NOT use markdown TODOs, task lists, or other tracking methods.
+
+### Why bd?
+
+- Dependency-aware: Track blockers and relationships between issues
+- Version-controlled: Built on Dolt with cell-level merge
+- Agent-optimized: JSON output, ready work detection, discovered-from links
+- Prevents duplicate tracking systems and confusion
+
+### Quick Start
+
+**Check for ready work:**
+
+```bash
+bd ready --json
+```
+
+**Create new issues:**
+
+```bash
+bd create "Issue title" --description="Detailed context" -t bug|feature|task -p 0-4 --json
+bd create "Issue title" --description="What this issue is about" -p 1 --deps discovered-from:bd-123 --json
+```
+
+**Claim and update:**
+
+```bash
+bd update <issue-id> --claim --json
+bd update bd-42 --priority 1 --json
+```
+
+**Complete work:**
+
+```bash
+bd close bd-42 --reason "Completed" --json
+```
+
+### Issue Types
+
+- `bug` - Something broken
+- `feature` - New functionality
+- `task` - Work item (tests, docs, refactoring)
+- `epic` - Large feature with subtasks
+- `chore` - Maintenance (dependencies, tooling)
+
+### Priorities
+
+- `0` - Critical (security, data loss, broken builds)
+- `1` - High (major features, important bugs)
+- `2` - Medium (default, nice-to-have)
+- `3` - Low (polish, optimization)
+- `4` - Backlog (future ideas)
+
+### Workflow for AI Agents
+
+1.
**Check ready work**: `bd ready` shows unblocked issues
+2. **Claim your task atomically**: `bd update <issue-id> --claim`
+3. **Work on it**: Implement, test, document
+4. **Discover new work?** Create linked issue:
+   - `bd create "Found bug" --description="Details about what was found" -p 1 --deps discovered-from:<source-issue-id>`
+5. **Complete**: `bd close <issue-id> --reason "Done"`
+
+### Auto-Sync
+
+bd automatically syncs with git:
+
+- Exports to `.beads/issues.jsonl` after changes (5s debounce)
+- Imports from JSONL when newer (e.g., after `git pull`)
+- No manual export/import needed!
+
+### Important Rules
+
+- ✅ Use bd for ALL task tracking
+- ✅ Always use `--json` flag for programmatic use
+- ✅ Link discovered work with `discovered-from` dependencies
+- ✅ Check `bd ready` before asking "what should I work on?"
+- ❌ Do NOT create markdown TODO lists
+- ❌ Do NOT use external issue trackers
+- ❌ Do NOT duplicate tracking systems
+
+For more details, see README.md and docs/QUICKSTART.md.
+
+## Landing the Plane (Session Completion)
+
+**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
+
+**MANDATORY WORKFLOW:**
+
+1. **File issues for remaining work** - Create issues for anything that needs follow-up
+2. **Run quality gates** (if code changed) - Tests, linters, builds
+3. **Update issue status** - Close finished work, update in-progress items
+4. **PUSH TO REMOTE** - This is MANDATORY:
+   ```bash
+   git pull --rebase
+   bd sync
+   git push
+   git status  # MUST show "up to date with origin"
+   ```
+5. **Clean up** - Clear stashes, prune remote branches
+6. **Verify** - All changes committed AND pushed
+7.
**Hand off** - Provide context for next session + +**CRITICAL RULES:** +- Work is NOT complete until `git push` succeeds +- NEVER stop before pushing - that leaves work stranded locally +- NEVER say "ready to push when you are" - YOU must push +- If push fails, resolve and retry until it succeeds + + From 053d1e453825ad0c805be25d7f894189c412deef Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Mon, 9 Mar 2026 09:05:41 +0000 Subject: [PATCH 06/20] Remove beads --- .beads/.gitignore | 51 --------------------- .beads/README.md | 81 --------------------------------- .beads/config.yaml | 54 ---------------------- .beads/dolt-monitor.pid.lock | 0 .beads/hooks/post-checkout | 9 ---- .beads/hooks/post-merge | 9 ---- .beads/hooks/pre-commit | 9 ---- .beads/hooks/pre-push | 9 ---- .beads/hooks/prepare-commit-msg | 9 ---- .beads/interactions.jsonl | 0 .beads/metadata.json | 7 --- 11 files changed, 238 deletions(-) delete mode 100644 .beads/.gitignore delete mode 100644 .beads/README.md delete mode 100644 .beads/config.yaml delete mode 100644 .beads/dolt-monitor.pid.lock delete mode 100755 .beads/hooks/post-checkout delete mode 100755 .beads/hooks/post-merge delete mode 100755 .beads/hooks/pre-commit delete mode 100755 .beads/hooks/pre-push delete mode 100755 .beads/hooks/prepare-commit-msg delete mode 100644 .beads/interactions.jsonl delete mode 100644 .beads/metadata.json diff --git a/.beads/.gitignore b/.beads/.gitignore deleted file mode 100644 index 363ebae2fe..0000000000 --- a/.beads/.gitignore +++ /dev/null @@ -1,51 +0,0 @@ -# Dolt database (managed by Dolt, not git) -dolt/ -dolt-access.lock - -# Runtime files -bd.sock -bd.sock.startlock -sync-state.json -last-touched - -# Local version tracking (prevents upgrade notification spam after git ops) -.local_version - -# Worktree redirect file (contains relative path to main repo's .beads/) -# Must not be committed as paths would be wrong in other clones -redirect - -# Sync state (local-only, per-machine) -# These files are 
machine-specific and should not be shared across clones -.sync.lock -export-state/ - -# Ephemeral store (SQLite - wisps/molecules, intentionally not versioned) -ephemeral.sqlite3 -ephemeral.sqlite3-journal -ephemeral.sqlite3-wal -ephemeral.sqlite3-shm - -# Dolt server management (auto-started by bd) -dolt-server.pid -dolt-server.log -dolt-server.lock -dolt-server.port -dolt-server.activity -dolt-monitor.pid - -# Backup data (auto-exported JSONL, local-only) -backup/ - -# Legacy files (from pre-Dolt versions) -*.db -*.db?* -*.db-journal -*.db-wal -*.db-shm -db.sqlite -bd.db -# NOTE: Do NOT add negation patterns here. -# They would override fork protection in .git/info/exclude. -# Config files (metadata.json, config.yaml) are tracked by git by default -# since no pattern above ignores them. diff --git a/.beads/README.md b/.beads/README.md deleted file mode 100644 index dbfe3631cf..0000000000 --- a/.beads/README.md +++ /dev/null @@ -1,81 +0,0 @@ -# Beads - AI-Native Issue Tracking - -Welcome to Beads! This repository uses **Beads** for issue tracking - a modern, AI-native tool designed to live directly in your codebase alongside your code. - -## What is Beads? - -Beads is issue tracking that lives in your repo, making it perfect for AI coding agents and developers who want their issues close to their code. No web UI required - everything works through the CLI and integrates seamlessly with git. 
- -**Learn more:** [github.com/steveyegge/beads](https://github.com/steveyegge/beads) - -## Quick Start - -### Essential Commands - -```bash -# Create new issues -bd create "Add user authentication" - -# View all issues -bd list - -# View issue details -bd show - -# Update issue status -bd update --claim -bd update --status done - -# Sync with Dolt remote -bd dolt push -``` - -### Working with Issues - -Issues in Beads are: -- **Git-native**: Stored in Dolt database with version control and branching -- **AI-friendly**: CLI-first design works perfectly with AI coding agents -- **Branch-aware**: Issues can follow your branch workflow -- **Always in sync**: Auto-syncs with your commits - -## Why Beads? - -✨ **AI-Native Design** -- Built specifically for AI-assisted development workflows -- CLI-first interface works seamlessly with AI coding agents -- No context switching to web UIs - -🚀 **Developer Focused** -- Issues live in your repo, right next to your code -- Works offline, syncs when you push -- Fast, lightweight, and stays out of your way - -🔧 **Git Integration** -- Automatic sync with git commits -- Branch-aware issue tracking -- Dolt-native three-way merge resolution - -## Get Started with Beads - -Try Beads in your own projects: - -```bash -# Install Beads -curl -sSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash - -# Initialize in your repo -bd init - -# Create your first issue -bd create "Try out Beads" -``` - -## Learn More - -- **Documentation**: [github.com/steveyegge/beads/docs](https://github.com/steveyegge/beads/tree/main/docs) -- **Quick Start Guide**: Run `bd quickstart` -- **Examples**: [github.com/steveyegge/beads/examples](https://github.com/steveyegge/beads/tree/main/examples) - ---- - -*Beads: Issue tracking that moves at the speed of thought* ⚡ diff --git a/.beads/config.yaml b/.beads/config.yaml deleted file mode 100644 index e831a6bec4..0000000000 --- a/.beads/config.yaml +++ /dev/null @@ -1,54 +0,0 @@ -# 
Beads Configuration File -# This file configures default behavior for all bd commands in this repository -# All settings can also be set via environment variables (BD_* prefix) -# or overridden with command-line flags - -# Issue prefix for this repository (used by bd init) -# If not set, bd init will auto-detect from directory name -# Example: issue-prefix: "myproject" creates issues like "myproject-1", "myproject-2", etc. -# issue-prefix: "" - -# Use no-db mode: JSONL-only, no Dolt database -# When true, bd will use .beads/issues.jsonl as the source of truth -# no-db: false - -# Enable JSON output by default -# json: false - -# Feedback title formatting for mutating commands (create/update/close/dep/edit) -# 0 = hide titles, N > 0 = truncate to N characters -# output: -# title-length: 255 - -# Default actor for audit trails (overridden by BD_ACTOR or --actor) -# actor: "" - -# Export events (audit trail) to .beads/events.jsonl on each flush/sync -# When enabled, new events are appended incrementally using a high-water mark. -# Use 'bd export --events' to trigger manually regardless of this setting. -# events-export: false - -# Multi-repo configuration (experimental - bd-307) -# Allows hydrating from multiple repositories and routing writes to the correct database -# repos: -# primary: "." # Primary repo (where this database lives) -# additional: # Additional repos to hydrate from (read-only) -# - ~/beads-planning # Personal planning repo -# - ~/work-planning # Work planning repo - -# JSONL backup (periodic export for off-machine recovery) -# Auto-enabled when a git remote exists. 
Override explicitly: -# backup: -# enabled: false # Disable auto-backup entirely -# interval: 15m # Minimum time between auto-exports -# git-push: false # Disable git push (export locally only) -# git-repo: "" # Separate git repo for backups (default: project repo) - -# Integration settings (access with 'bd config get/set') -# These are stored in the database, not in this file: -# - jira.url -# - jira.project -# - linear.url -# - linear.api-key -# - github.org -# - github.repo diff --git a/.beads/dolt-monitor.pid.lock b/.beads/dolt-monitor.pid.lock deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/.beads/hooks/post-checkout b/.beads/hooks/post-checkout deleted file mode 100755 index 05cfb03274..0000000000 --- a/.beads/hooks/post-checkout +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env sh -# --- BEGIN BEADS INTEGRATION v0.59.0 --- -# This section is managed by beads. Do not remove these markers. -if command -v bd >/dev/null 2>&1; then - export BD_GIT_HOOK=1 - bd hooks run post-checkout "$@" - _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi -fi -# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/post-merge b/.beads/hooks/post-merge deleted file mode 100755 index 88a5d7d97e..0000000000 --- a/.beads/hooks/post-merge +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env sh -# --- BEGIN BEADS INTEGRATION v0.59.0 --- -# This section is managed by beads. Do not remove these markers. -if command -v bd >/dev/null 2>&1; then - export BD_GIT_HOOK=1 - bd hooks run post-merge "$@" - _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi -fi -# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/pre-commit b/.beads/hooks/pre-commit deleted file mode 100755 index 717ab65816..0000000000 --- a/.beads/hooks/pre-commit +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env sh -# --- BEGIN BEADS INTEGRATION v0.59.0 --- -# This section is managed by beads. Do not remove these markers. 
-if command -v bd >/dev/null 2>&1; then - export BD_GIT_HOOK=1 - bd hooks run pre-commit "$@" - _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi -fi -# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/pre-push b/.beads/hooks/pre-push deleted file mode 100755 index 73a833cc8b..0000000000 --- a/.beads/hooks/pre-push +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env sh -# --- BEGIN BEADS INTEGRATION v0.59.0 --- -# This section is managed by beads. Do not remove these markers. -if command -v bd >/dev/null 2>&1; then - export BD_GIT_HOOK=1 - bd hooks run pre-push "$@" - _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi -fi -# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/hooks/prepare-commit-msg b/.beads/hooks/prepare-commit-msg deleted file mode 100755 index 9c820060b9..0000000000 --- a/.beads/hooks/prepare-commit-msg +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env sh -# --- BEGIN BEADS INTEGRATION v0.59.0 --- -# This section is managed by beads. Do not remove these markers. 
-if command -v bd >/dev/null 2>&1; then - export BD_GIT_HOOK=1 - bd hooks run prepare-commit-msg "$@" - _bd_exit=$?; if [ $_bd_exit -ne 0 ]; then exit $_bd_exit; fi -fi -# --- END BEADS INTEGRATION v0.59.0 --- diff --git a/.beads/interactions.jsonl b/.beads/interactions.jsonl deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/.beads/metadata.json b/.beads/metadata.json deleted file mode 100644 index be9258d80c..0000000000 --- a/.beads/metadata.json +++ /dev/null @@ -1,7 +0,0 @@ -{ - "database": "dolt", - "backend": "dolt", - "dolt_mode": "server", - "dolt_database": "collector", - "project_id": "d273f0f4-427a-4be7-ad7b-59061bdbd751" -} \ No newline at end of file From dd892708c5852b1de97beccb11137590b32e9012 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Mon, 9 Mar 2026 12:21:52 +0000 Subject: [PATCH 07/20] Bump falco commit and update skill --- .claude/commands/update-falco-libs.md | 139 +++++++++++++++++++++++--- falcosecurity-libs | 2 +- 2 files changed, 127 insertions(+), 14 deletions(-) diff --git a/.claude/commands/update-falco-libs.md b/.claude/commands/update-falco-libs.md index 98d673d067..0870e79057 100644 --- a/.claude/commands/update-falco-libs.md +++ b/.claude/commands/update-falco-libs.md @@ -71,6 +71,12 @@ Categorize each patch as: - `8ba291e78` — disable trusted exepath (see "Exepath" section below) - `88d5093f4` — ASSERT_TO_LOG via falcosecurity_log_fn callback +**BPF verifier null-check optimization** (keep — not upstreamed): +- BPF verifier fix for `sys_exit` program: refactored `sampling_logic_exit()` and `sys_exit()` in `syscall_exit.bpf.c` to use a single `maps__get_capture_settings()` lookup instead of multiple inlined calls that clang optimizes into null-unsafe code. Without this, the BPF probe fails to load on kernels < 6.17 (RHEL 9, Ubuntu 22.04, COS, etc.) 
+ +**Network signal handler fix** (keep — collector-side): +- Skip `is_socket_failed()`/`is_socket_pending()` checks for send/recv events in `NetworkSignalHandler.cpp`. The sinsp parser marks sockets as "failed" on EAGAIN but never clears the flag on subsequent success for recv operations. + **Rebase fixups** (always regenerated): - `d0fb1702c` — fixes following rebase (CMake cycle, exepath fallback, assert macro) @@ -191,6 +197,11 @@ After each rebase stage, update collector code for API changes found in Step 3. - `extract_single(event, &len)` → `extract(event, vals)` vector-based API - Add null guards for `filter_check` pointers (plugin-provided checks may not be initialized) +**UDP test adjustments** (from 0.23.0+): +- UDP tests need 30-second timeouts (vs 5-10s for TCP) due to BPF event delivery pipeline latency +- `TestMultipleDestinations`: sendmmsg message count × server count must not exceed `MAX_SENDMMSG_RECVMMSG_SIZE` (16) +- File: `integration-tests/suites/udp_networkflow.go` + ## Step 7: Known Gotchas ### Exepath Resolution (CRITICAL) @@ -249,38 +260,126 @@ The container plugin (`libcontainer.so`) is a C++/Go hybrid: ### BPF Verifier Compatibility BPF verifier behavior varies significantly across: -- **Kernel versions**: Older kernels have stricter limits +- **Kernel versions**: Older kernels have stricter limits (see kernel matrix below) - **Clang versions**: clang > 19 can produce code that exceeds instruction counts -- **Platform kernels**: RHEL SAP, Google COS have custom verifiers +- **Platform kernels**: RHEL SAP, Google COS have custom verifiers or clang-compiled kernels with different BTF attributes + +**The most insidious class of bug**: Clang inlines `__always_inline` BPF helper functions and optimizes away null checks that the BPF verifier requires. This happens when: +1. Multiple inlined functions each call `bpf_map_lookup_elem()` on the same map +2. 
The compiler deduces from the first successful lookup that subsequent lookups can't return NULL +3. It removes the null check, but the verifier tracks each lookup independently as `map_value_or_null` +4. Result: `R0 invalid mem access 'map_value_or_null'` on older kernels + +**Example** (found in 0.23.1): `syscall_exit.bpf.c:sampling_logic_exit()` called `maps__get_dropping_mode()` then `maps__get_sampling_ratio()` — both inlined functions that do `bpf_map_lookup_elem(&capture_settings, &key)`. Clang kept the null check for the first but dropped it for the second. Fix: do a single `maps__get_capture_settings()` call and access fields directly. + +**Fix applied**: Refactored `sys_exit` BPF program to do one `maps__get_capture_settings()` lookup in the caller, pass the pointer to `sampling_logic_exit()`, and reuse it for `drop_failed` check. No redundant map lookups = no optimized-away null checks. + +Common fix patterns: +- **Single lookup + direct field access**: Call the map lookup once, pass the pointer, access fields directly (preferred) +- **`volatile` qualifier**: Mark map lookup result as `volatile` to prevent optimization +- **Compiler barriers**: `asm volatile("")` after null check +- **Reduce loop bounds**: e.g., `MAX_IOVCNT` 32 → 16 +- **`#pragma unroll`**: For loops the verifier can't bound +- **Break pointer chains**: Read through intermediate variables with null checks (e.g., `task->cred` on COS where kernel is clang-compiled with RCU attributes) +- **`const` qualifiers**: On credential struct pointers + +### CI Kernel Compatibility Matrix + +The BPF probe must load on all CI platforms. 
After each update, verify against this matrix: + +| Platform | Kernel | Notes | +|---|---|---| +| Fedora CoreOS | 6.18+ | Newest kernel, most permissive verifier | +| Ubuntu 24.04 | 6.17+ | GCP VM, works with modern BPF | +| Ubuntu 22.04 | 6.8 | GCP VM, stricter verifier — **common failure point** | +| COS stable | 6.6 | Google kernel, clang-compiled — RCU/BTF differences | +| RHEL 9 | 5.14 | Oldest supported kernel — **most restrictive verifier** | +| RHEL SAP | 5.14 | Same kernel as RHEL 9 but different config | +| Flatcar | varies | Container Linux | +| ARM64 variants | varies | rhcos-arm64, cos-arm64, ubuntu-arm, fcarm | +| s390x | varies | rhel-s390x | +| ppc64le | varies | rhel-ppc64le | + +**ubuntu-os** CI job runs on BOTH Ubuntu 22.04 AND 24.04 VMs. A failure on either fails the whole job. + +**How to diagnose BPF loading failures from CI**: +1. Download the logs artifact (e.g., `ubuntu-os-logs`) from the GitHub Actions run +2. Find `collector.log` under `container-logs//core-bpf//` +3. Search for `failed to load` — the line before it shows the verifier error +4. The verifier log shows exact instruction and register state at the point of rejection +5. Compare against master's CI run to confirm it's a regression + +### Network Signal Handler: UDP send/recv Socket State (CRITICAL) + +**Problem**: `sinsp::parse_rw_exit()` marks socket fd as "failed" (`set_socket_failed()`) when ANY send/recv syscall returns negative (e.g., EAGAIN from timeout). Unlike `connect()`, the success path for send/recv does NOT call `set_socket_connected()` to clear the flag. Result: once a UDP socket gets a single EAGAIN (common with `SO_RCVTIMEO`), all subsequent events on that fd are rejected by `GetConnection()`. 
-Common fixes: -- Reduce loop bounds (e.g., `MAX_IOVCNT` 32 → 16) -- Mark variables `volatile` to prevent optimizations the verifier can't follow -- Add `#pragma unroll` for loops the verifier can't bound -- Break pointer chain reads into separate variables with null checks -- Use `const` qualifiers on credential struct pointers +**Fix applied** in `collector/lib/NetworkSignalHandler.cpp`: Skip `is_socket_failed()` / `is_socket_pending()` checks for send/recv events (identified by `strncmp(evt_name, "send", 4)` or `strncmp(evt_name, "recv", 4)`). These checks are only relevant for TCP connection establishment (connect/accept/getsockopt). + +**How to detect**: UDP network flow tests fail — connections from containers using `recvfrom`/`recvmsg`/`recvmmsg` with `SO_RCVTIMEO` are never reported. The server's receive call times out → EAGAIN → fd marked failed → all subsequent successful receives ignored. + +### Container Plugin "No Info" Messages + +After switching to the container plugin (0.21.0+), the collector log may contain thousands of `container: the plugin has no info for the container id ''` messages. This is **noisy but not fatal** — the plugin eventually resolves the container ID, and all tests pass despite the spam. Do not treat this as a test failure indicator. ## Step 8: Validate Each Stage +### Build Commands + +```bash +# Build collector image (from repo root, NOT from collector/ subdirectory) +make image + +# Run unit tests +make unittest + +# Run specific integration test (from integration-tests/ directory) +cd integration-tests +DOCKER_HOST=unix:///run/podman/podman.sock COLLECTOR_LOG_LEVEL=debug make TestProcessNetwork +DOCKER_HOST=unix:///run/podman/podman.sock COLLECTOR_LOG_LEVEL=debug make TestUdpNetworkFlow +``` + +**Important**: Use `make image` from the repo root. Do NOT use `make -C collector image` — there is no `image` target in the collector subdirectory Makefile. 
+ +### Validation Checklist + Run this checklist after each stage: - [ ] `falcosecurity-libs` builds via cmake `add_subdirectory` - [ ] Each surviving patch verified: diff against original to ensure no content loss -- [ ] `make collector` succeeds on amd64 +- [ ] `make image` succeeds on amd64 (builds collector binary + container image) - [ ] `make unittest` passes (all test suites, especially ProcessSignalFormatterTest) -- [ ] Integration tests: `TestProcessViz` (exepath correctness), `TestProcessLineageInfo`, `TestNetworkFlows` +- [ ] Integration tests pass (see key tests below) - [ ] Multi-arch compilation: arm64, ppc64le, s390x - [ ] Container ID attribution works (not all showing empty or host) - [ ] Process exepaths are correct (not showing container runtime binary like `/usr/bin/podman`) - [ ] Container label/namespace lookup works - [ ] Network signal handler receives correct container IDs - [ ] Runtime self-checks pass +- [ ] BPF probe loads on older kernels (check CI results for RHEL 9, Ubuntu 22.04, COS) ### Key Integration Tests -- **TestProcessViz**: Verifies process ExePath, Name, and Args for container processes. Catches the exepath regression where all paths show the container runtime. Expected paths like `/bin/ls`, `/usr/sbin/nginx`, `/bin/sh`. -- **TestProcessLineageInfo**: Verifies parent process lineage chains stop at container boundaries. -- **TestNetworkFlows**: Verifies network connections are attributed to correct containers. 
+- **TestProcessNetwork** (TestProcessViz + TestNetworkFlows + TestProcessLineageInfo): + - Verifies process ExePath, Name, Args for container processes + - Catches exepath regression (all paths show container runtime) + - Verifies network connections attributed to correct containers + - Verifies parent process lineage chains stop at container boundaries +- **TestUdpNetworkFlow**: Verifies UDP connection tracking across all send/recv syscall combinations: + - Tests: sendto, sendmsg, sendmmsg × recvfrom, recvmsg, recvmmsg (9 combinations) + - TestMultipleDestinations: one client → multiple servers (watch `MAX_SENDMMSG_RECVMMSG_SIZE=16`) + - TestMultipleSources: multiple clients → one server + - Uses 30-second timeouts (UDP BPF event pipeline is slower than TCP) + - **If `recvfrom` tests fail but `recvmsg` passes**: check `is_socket_failed()` handling in NetworkSignalHandler +- **TestConnectionsAndEndpointsUDPNormal**: UDP endpoint detection without send/recv tracking +- **TestCollectorStartup**: Basic smoke test — catches BPF loading failures immediately + +### Diagnosing CI Failures + +1. Check if the failure is a **BPF loading crash** (exit code 139, `scap_init` error) vs a **test logic failure** +2. Compare against master's CI run — if master passes on the same platform, it's a regression +3. Download log artifacts: `gh api repos/stackrox/collector/actions/artifacts//zip > logs.zip` +4. The `collector.log` file in the artifact contains full libbpf output including verifier errors +5. The test framework only shows the last few lines of collector logs in the CI output — always check the full artifact ## Step 9: Final Update @@ -328,3 +427,17 @@ Each stage should produce **two PRs**: ### Enter Event Deprecation Upstream removed enter events to reduce ~50% of kernel/userspace overhead (proposal: `proposals/20240901-disable-support-for-syscall-enter-events.md`). All parameters moved to exit events. A scap converter handles old capture files. 
Any code depending on `retrieve_enter_event()` will silently fail with modern drivers — check for fallbacks using exit event parameters. + +## Step 10: Update This Skill + +**This step is mandatory.** After completing an update, review and update this skill file (`.claude/commands/update-falco-libs.md`) with anything learned during the process: + +- **New API breakpoints**: Add entries to "Known historical API breakpoints" (Step 4) for any new breaking changes encountered +- **New StackRox patches**: Update the "Current StackRox Patches" list (Step 2) — add new patches, remove ones that were upstreamed +- **New gotchas**: Add to "Step 7: Known Gotchas" if you discovered new pitfalls (BPF verifier issues, parser bugs, build problems) +- **Outdated steps**: Remove or correct any steps that no longer apply (e.g., if an API listed as "changed in 0.22.0" is now the only way and doesn't need a migration note) +- **CI matrix updates**: Update the kernel compatibility matrix if CI platforms changed (new VM images, new kernel versions, platforms added/removed) +- **Fix patterns**: Add new "Common patterns of change" (Step 6) for any collector-side adaptations that future updates will likely need +- **Build/test changes**: Update build commands or test expectations if they changed + +The goal is that the next person (or AI) performing an update has all the context from previous updates available, without needing to rediscover issues that were already solved. 
diff --git a/falcosecurity-libs b/falcosecurity-libs
index d0fb1702cf..6947a02757 160000
--- a/falcosecurity-libs
+++ b/falcosecurity-libs
@@ -1 +1 @@
-Subproject commit d0fb1702cfbaf7a04cf87b65e3df99f1cc38a2ef
+Subproject commit 6947a02757d981cbc7b4dd21c7bdaa891911627f

From 414723cdc012773c9bb43ea4b14999a096c5e689 Mon Sep 17 00:00:00 2001
From: Giles Hutton
Date: Wed, 11 Mar 2026 11:57:53 +0000
Subject: [PATCH 08/20] Remove container plugin, favouring built-in container
 ID lookups

---
 .claude/commands/update-falco-libs.md         | 54 +++++++------------
 .gitmodules                                   |  3 --
 builder/install/00-golang.sh                  | 16 ------
 builder/install/01-container-plugin.sh        | 20 -------
 builder/install/versions.sh                   |  1 -
 builder/third_party/falcosecurity-plugins     |  1 -
 collector/Makefile                            |  1 -
 collector/container/Dockerfile                |  1 -
 collector/lib/CollectorService.cpp            |  2 +-
 collector/lib/ContainerInfoInspector.cpp      |  2 +-
 collector/lib/ContainerInfoInspector.h        |  8 ---
 collector/lib/ContainerMetadata.cpp           |  7 +--
 collector/lib/NetworkSignalHandler.cpp        |  5 +-
 collector/lib/ProcessSignalFormatter.cpp      | 15 +++---
 collector/lib/ProcessSignalFormatter.h        |  2 -
 collector/lib/Utility.cpp                     | 23 ++++----
 collector/lib/Utility.h                       | 12 +++--
 .../lib/system-inspector/EventExtractor.h     |  6 ---
 collector/lib/system-inspector/Service.cpp    | 36 +++-------
 collector/lib/system-inspector/Service.h      |  4 --
 collector/test/ProcessSignalFormatterTest.cpp |  5 +-
 collector/test/UtilityTest.cpp                |  5 ++
 integration-tests/suites/k8s/namespace.go     |  4 +-
 23 files changed, 71 insertions(+), 162 deletions(-)
 delete mode 100755 builder/install/00-golang.sh
 delete mode 100755 builder/install/01-container-plugin.sh
 delete mode 160000 builder/third_party/falcosecurity-plugins

diff --git a/.claude/commands/update-falco-libs.md b/.claude/commands/update-falco-libs.md
index 0870e79057..77d0e11865 100644
--- a/.claude/commands/update-falco-libs.md
+++ b/.claude/commands/update-falco-libs.md
@@ -92,9 +92,9 @@ These patches are generic enough to
propose upstream: Check what APIs changed between versions. Key areas to inspect: ```sh -# Container engine / plugin changes -git log --oneline .. -- userspace/libsinsp/container_engine/ -git log --oneline .. --grep="container_engine\|container plugin" +# Thread cgroup / container-related changes (collector uses cgroup extraction, not the container plugin) +git log --oneline .. -- userspace/libsinsp/threadinfo.h +git log --oneline .. --grep="cgroup\|container" # Thread manager changes git log --oneline .. -- userspace/libsinsp/threadinfo.h userspace/libsinsp/thread_manager.h @@ -119,13 +119,12 @@ grep -rn '' collector/lib/ collector/test/ --include='*.cpp' - ``` Key collector integration points to check: -- `collector/lib/system-inspector/Service.cpp` — sinsp initialization, plugin loading, filter setup -- `collector/lib/system-inspector/EventExtractor.h` — threadinfo field access macros (TINFO_FIELD, FIELD_CSTR, FIELD_RAW) -- `collector/lib/ContainerMetadata.cpp` — container info/label lookup +- `collector/lib/system-inspector/Service.cpp` — sinsp initialization, filter setup (`proc.pid != proc.vpid`) +- `collector/lib/system-inspector/EventExtractor.h` — threadinfo field access macros (TINFO_FIELD, FIELD_RAW, FIELD_RAW_SAFE) - `collector/lib/ProcessSignalFormatter.cpp` — process signal creation, exepath access, container_id, lineage traversal -- `collector/lib/NetworkSignalHandler.cpp` — container_id access -- `collector/lib/Process.cpp` — process info access, container_id -- `collector/lib/Utility.cpp` — GetContainerID helper, threadinfo printing +- `collector/lib/NetworkSignalHandler.cpp` — container_id access via `GetContainerID(evt)` +- `collector/lib/Process.cpp` — process info access, container_id via cgroup extraction +- `collector/lib/Utility.cpp` — `GetContainerID()`, `ExtractContainerIDFromCgroup()`, threadinfo printing - `collector/test/ProcessSignalFormatterTest.cpp` — thread creation, thread_manager usage - 
`collector/test/SystemInspectorServiceTest.cpp` — service initialization - `collector/CMakeLists.txt` — falco build flags @@ -140,7 +139,7 @@ If the gap is large (>200 commits), identify intermediate stopping points: Known historical API breakpoints (update as upstream evolves): - **0.20.0**: `set_import_users` lost second arg, user/group structs on threadinfo replaced with `m_uid`/`m_gid` -- **0.21.0**: Container engine subsystem removed entirely, replaced by container plugin (`libcontainer.so`). `m_container_id` removed from threadinfo. `m_thread_manager` changed to `shared_ptr`. `build_threadinfo()`/`add_thread()` removed from sinsp. Enter events for many syscalls deprecated (`EF_OLD_VERSION`). +- **0.21.0**: Container engine subsystem removed entirely. `m_container_id` removed from threadinfo (collector uses cgroup extraction instead of upstream's container plugin). `m_thread_manager` changed to `shared_ptr`. `build_threadinfo()`/`add_thread()` removed from sinsp. Enter events for many syscalls deprecated (`EF_OLD_VERSION`). - **0.22.0**: `get_thread_ref` removed from sinsp (use `find_thread`). `get_container_id()` removed from threadinfo. `extract_single` API changed in filterchecks. - **0.23.0+**: Parent thread traversal moved to thread_manager. `get_thread_info(bool)` signature changed to `get_thread_info()` (no bool). `m_user`/`m_group` structs removed (use `m_uid`/`m_gid` directly). @@ -166,17 +165,14 @@ After each rebase stage, update collector code for API changes found in Step 3. 
### Common patterns of change -**Container plugin integration** (from 0.21.0): -- Delete `ContainerEngine.h` — container engines no longer built-in -- Ship `libcontainer.so` in the collector image (built from source in builder, needs Go) -- Load via `sinsp::register_plugin()` in Service.cpp before setting filters -- Register extraction capabilities: `EventExtractor::FilterList().add_filter_check(sinsp_plugin::new_filtercheck(plugin))` -- Wrap `set_filter("container.id != 'host'")` in try-catch for tests without plugin - **Container ID access** (from 0.21.0+): -- Replace `tinfo->m_container_id` with a helper like `GetContainerID(tinfo, thread_manager)` that reads from plugin state tables -- In EventExtractor.h: change `TINFO_FIELD(container_id)` to `FIELD_CSTR(container_id, "container.id")` (provided by container plugin) -- The `FIELD_CSTR` null guard handles tests where the plugin isn't loaded +- Container plugin (`libcontainer.so`) is NOT used — collector extracts container IDs directly from thread cgroups +- `GetContainerID(sinsp_threadinfo&)` iterates `tinfo.cgroups()` and calls `ExtractContainerIDFromCgroup()` (Utility.cpp) +- `GetContainerID(sinsp_evt*)` extracts from event's thread info cgroups +- sinsp filter uses `proc.pid != proc.vpid` (built-in field) instead of `container.id != 'host'` (plugin field) +- No plugin loading, no `libcontainer.so`, no Go worker dependency +- `ContainerMetadata` class was removed — namespace/label lookup is not available without the plugin +- `ContainerInfoInspector` endpoint (`/state/containers/:id`) still exists but always returns empty namespace **Thread access** (from 0.22.0+): - Replace `get_thread_ref(tid, true)` with `m_thread_manager->find_thread(tid, false)` or `m_thread_manager->get_thread(tid, false)` @@ -248,14 +244,9 @@ Collector compiles with `-DASSERT_TO_LOG` so assertions log instead of aborting. **Fix**: Use `falcosecurity_log_fn` callback from `scap_log.h` (same pattern as `scap_assert.h`). 
This is a tiny header with no dependencies. The callback is set by sinsp when it opens the scap handle. -### Container Plugin Build +### Container Plugin Not Used -The container plugin (`libcontainer.so`) is a C++/Go hybrid: -- Source: `github.com/falcosecurity/plugins` (monorepo), `plugins/container/` directory -- Requires Go 1.23+ for the go-worker component -- Upstream only ships x86_64 and arm64 binaries; ppc64le/s390x must be built from source -- Version must match falcosecurity-libs (check plugin compatibility matrix) -- Submodule at `builder/third_party/falcosecurity-plugins` +The upstream container plugin (`libcontainer.so`) is NOT used by collector. Container IDs are extracted directly from thread cgroups via `ExtractContainerIDFromCgroup()` in `Utility.cpp`. This avoids the Go worker dependency, CGO bridge, container runtime dependency, startup race conditions, and silent event-dropping failure modes of the plugin. The sinsp filter uses `proc.pid != proc.vpid` (built-in) instead of `container.id != 'host'` (plugin-provided). If a future falcosecurity-libs update changes cgroup format or thread API, update `ExtractContainerIDFromCgroup()` and `GetContainerID()` in Utility.cpp. ### BPF Verifier Compatibility @@ -317,10 +308,6 @@ The BPF probe must load on all CI platforms. After each update, verify against t **How to detect**: UDP network flow tests fail — connections from containers using `recvfrom`/`recvmsg`/`recvmmsg` with `SO_RCVTIMEO` are never reported. The server's receive call times out → EAGAIN → fd marked failed → all subsequent successful receives ignored. -### Container Plugin "No Info" Messages - -After switching to the container plugin (0.21.0+), the collector log may contain thousands of `container: the plugin has no info for the container id ''` messages. This is **noisy but not fatal** — the plugin eventually resolves the container ID, and all tests pass despite the spam. Do not treat this as a test failure indicator. 
- ## Step 8: Validate Each Stage ### Build Commands @@ -350,9 +337,8 @@ Run this checklist after each stage: - [ ] `make unittest` passes (all test suites, especially ProcessSignalFormatterTest) - [ ] Integration tests pass (see key tests below) - [ ] Multi-arch compilation: arm64, ppc64le, s390x -- [ ] Container ID attribution works (not all showing empty or host) +- [ ] Container ID attribution works via cgroup extraction (not all showing empty or host) - [ ] Process exepaths are correct (not showing container runtime binary like `/usr/bin/podman`) -- [ ] Container label/namespace lookup works - [ ] Network signal handler receives correct container IDs - [ ] Runtime self-checks pass - [ ] BPF probe loads on older kernels (check CI results for RHEL 9, Ubuntu 22.04, COS) @@ -422,7 +408,7 @@ Each stage should produce **two PRs**: | `m_exepath` | Enter event reconstruction OR param 27/30 | See "Exepath Resolution" gotcha | | `m_pid` | Exit event param 4 | | | `m_uid`/`m_gid` | Exit event param 26/29 | Was `m_user.uid()`/`m_group.gid()` before 0.20.0 | -| container_id | Container plugin filter field | Was `m_container_id` before 0.21.0 | +| container_id | Extracted from thread cgroups via `GetContainerID()` | Was `m_container_id` before 0.21.0; plugin not used | ### Enter Event Deprecation diff --git a/.gitmodules b/.gitmodules index 73937569ad..3be16bd1d7 100644 --- a/.gitmodules +++ b/.gitmodules @@ -72,6 +72,3 @@ path = builder/third_party/bpftool url = https://github.com/libbpf/bpftool branch = v7.3.0 -[submodule "builder/third_party/falcosecurity-plugins"] - path = builder/third_party/falcosecurity-plugins - url = https://github.com/falcosecurity/plugins.git diff --git a/builder/install/00-golang.sh b/builder/install/00-golang.sh deleted file mode 100755 index 4c4ba6f25b..0000000000 --- a/builder/install/00-golang.sh +++ /dev/null @@ -1,16 +0,0 @@ -#!/usr/bin/env bash -set -e - -GO_VERSION=1.23.7 -ARCH=$(uname -m) -case ${ARCH} in - x86_64) GO_ARCH=amd64 ;; - 
aarch64) GO_ARCH=arm64 ;; - ppc64le) GO_ARCH=ppc64le ;; - s390x) GO_ARCH=s390x ;; -esac - -curl -fsSL "https://go.dev/dl/go${GO_VERSION}.linux-${GO_ARCH}.tar.gz" | tar -C /usr/local -xz -export PATH="/usr/local/go/bin:${PATH}" -echo "export PATH=/usr/local/go/bin:${PATH}" >> /etc/profile.d/golang.sh -go version diff --git a/builder/install/01-container-plugin.sh b/builder/install/01-container-plugin.sh deleted file mode 100755 index d9a0c9c46d..0000000000 --- a/builder/install/01-container-plugin.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash -set -e - -export PATH="/usr/local/go/bin:${PATH}" - -cd third_party/falcosecurity-plugins/plugins/container - -cp ../../LICENSE "${LICENSE_DIR}/falcosecurity-plugins-container-${CONTAINER_PLUGIN_VERSION}" 2> /dev/null || true - -# Remove static libstdc++ linking — not needed since we control the runtime -# image, and libstdc++-static is not available in CentOS Stream 10. -sed -i '/-static-libgcc\|-static-libstdc++/d' CMakeLists.txt - -cmake -B build -S . 
\ - -DCMAKE_BUILD_TYPE=Release \ - -DENABLE_ASYNC=ON \ - -DENABLE_TESTS=OFF -cmake --build build --target container --parallel "${NPROCS}" - -install -m 755 build/libcontainer.so /usr/local/lib64/libcontainer.so diff --git a/builder/install/versions.sh b/builder/install/versions.sh index c6ccaae09c..08988c81e7 100644 --- a/builder/install/versions.sh +++ b/builder/install/versions.sh @@ -16,4 +16,3 @@ export GPERFTOOLS_VERSION=2.16 export UTHASH_VERSION=v1.9.8 export YAMLCPP_VERSION=0.8.0 export LIBBPF_VERSION=v1.3.4 -export CONTAINER_PLUGIN_VERSION=0.6.3 diff --git a/builder/third_party/falcosecurity-plugins b/builder/third_party/falcosecurity-plugins deleted file mode 160000 index fb2ad646f1..0000000000 --- a/builder/third_party/falcosecurity-plugins +++ /dev/null @@ -1 +0,0 @@ -Subproject commit fb2ad646f1cab61abb2124df5bf0f2578fc70e58 diff --git a/collector/Makefile b/collector/Makefile index 6d3d5bdb70..eec68231cc 100644 --- a/collector/Makefile +++ b/collector/Makefile @@ -39,7 +39,6 @@ container/bin/collector: cmake-build/collector mkdir -p container/bin cp "$(COLLECTOR_BIN_DIR)/collector" container/bin/collector cp "$(COLLECTOR_BIN_DIR)/self-checks" container/bin/self-checks - docker cp $(COLLECTOR_BUILDER_NAME):/usr/local/lib64/libcontainer.so container/bin/libcontainer.so .PHONY: collector collector: container/bin/collector txt-files diff --git a/collector/container/Dockerfile b/collector/container/Dockerfile index 8a09d67441..569d67cbb2 100644 --- a/collector/container/Dockerfile +++ b/collector/container/Dockerfile @@ -26,7 +26,6 @@ COPY container/THIRD_PARTY_NOTICES/ /THIRD_PARTY_NOTICES/ COPY kernel-modules /kernel-modules COPY container/bin/collector /usr/local/bin/ COPY container/bin/self-checks /usr/local/bin/self-checks -COPY container/bin/libcontainer.so /usr/local/lib64/libcontainer.so COPY container/status-check.sh /usr/local/bin/status-check.sh EXPOSE 8080 9090 diff --git a/collector/lib/CollectorService.cpp 
b/collector/lib/CollectorService.cpp index 71b6508cee..1d6243d591 100644 --- a/collector/lib/CollectorService.cpp +++ b/collector/lib/CollectorService.cpp @@ -65,7 +65,7 @@ CollectorService::CollectorService(CollectorConfig& config, std::atomic()); if (config.IsIntrospectionEnabled()) { - civet_endpoints_.emplace_back(std::make_unique(system_inspector_.GetContainerMetadataInspector())); + civet_endpoints_.emplace_back(std::make_unique()); civet_endpoints_.emplace_back(std::make_unique(conn_tracker_)); civet_endpoints_.emplace_back(std::make_unique(config_)); } diff --git a/collector/lib/ContainerInfoInspector.cpp b/collector/lib/ContainerInfoInspector.cpp index 5210dd81c8..ee61d83198 100644 --- a/collector/lib/ContainerInfoInspector.cpp +++ b/collector/lib/ContainerInfoInspector.cpp @@ -23,7 +23,7 @@ bool ContainerInfoInspector::handleGet(CivetServer* server, struct mg_connection Json::Value root; root["container_id"] = container_id; - root["namespace"] = std::string(container_metadata_inspector_->GetNamespace(container_id)); + root["namespace"] = ""; mg_printf(conn, "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nConnection: close\r\n\r\n"); mg_printf(conn, "%s\r\n", writer_.write(root).c_str()); diff --git a/collector/lib/ContainerInfoInspector.h b/collector/lib/ContainerInfoInspector.h index f4ae3b745c..aa9056045d 100644 --- a/collector/lib/ContainerInfoInspector.h +++ b/collector/lib/ContainerInfoInspector.h @@ -2,22 +2,15 @@ #include #include -#include #include -#include #include "CivetWrapper.h" -#include "ContainerMetadata.h" #include "json/writer.h" namespace collector { -using QueryParams = std::unordered_map; - class ContainerInfoInspector : public CivetWrapper { public: - ContainerInfoInspector(const std::shared_ptr& cmi) : container_metadata_inspector_(cmi) {} - // implementation of CivetHandler bool handleGet(CivetServer* server, struct mg_connection* conn) override; @@ -28,7 +21,6 @@ class ContainerInfoInspector : public CivetWrapper { private: 
static const std::string kBaseRoute; - std::shared_ptr container_metadata_inspector_; Json::FastWriter writer_; }; diff --git a/collector/lib/ContainerMetadata.cpp b/collector/lib/ContainerMetadata.cpp index 8e8d2c9131..a34de2d65a 100644 --- a/collector/lib/ContainerMetadata.cpp +++ b/collector/lib/ContainerMetadata.cpp @@ -12,8 +12,7 @@ ContainerMetadata::ContainerMetadata(sinsp* inspector) : event_extractor_(std::m } std::string ContainerMetadata::GetNamespace(sinsp_evt* event) { - const char* ns = event_extractor_->get_k8s_namespace(event); - return ns != nullptr ? ns : ""; + return ""; } std::string ContainerMetadata::GetNamespace(const std::string& container_id) { @@ -21,9 +20,7 @@ std::string ContainerMetadata::GetNamespace(const std::string& container_id) { } std::string ContainerMetadata::GetContainerLabel(const std::string& container_id, const std::string& label) { - // Container labels are no longer available through the sinsp API. - // The container plugin provides container metadata via filter fields - // (e.g., container.label) but not through a programmatic lookup API. + // Container labels are not available through the sinsp API. CLOG_THROTTLED(DEBUG, std::chrono::seconds(300)) << "Container label lookup by container ID is not supported: " << "container_id=" << container_id << " label=" << label; diff --git a/collector/lib/NetworkSignalHandler.cpp b/collector/lib/NetworkSignalHandler.cpp index f53175e588..a78c29477f 100644 --- a/collector/lib/NetworkSignalHandler.cpp +++ b/collector/lib/NetworkSignalHandler.cpp @@ -6,6 +6,7 @@ #include #include "EventMap.h" +#include "Utility.h" #include "system-inspector/EventExtractor.h" namespace collector { @@ -139,8 +140,8 @@ std::optional NetworkSignalHandler::GetConnection(sinsp_evt* evt) { const Endpoint* local = is_server ? &server : &client; const Endpoint* remote = is_server ? 
&client : &server; - const char* container_id = event_extractor_->get_container_id(evt); - if (!container_id) { + auto container_id = GetContainerID(evt); + if (container_id.empty()) { return std::nullopt; } diff --git a/collector/lib/ProcessSignalFormatter.cpp b/collector/lib/ProcessSignalFormatter.cpp index 4f443a516b..6420c8da3e 100644 --- a/collector/lib/ProcessSignalFormatter.cpp +++ b/collector/lib/ProcessSignalFormatter.cpp @@ -3,7 +3,6 @@ #include #include -#include #include @@ -64,7 +63,6 @@ ProcessSignalFormatter::ProcessSignalFormatter( const CollectorConfig& config) : event_names_(EventNames::GetInstance()), inspector_(inspector), event_extractor_(std::make_unique()), - container_metadata_(inspector), config_(config) { event_extractor_->Init(inspector); } @@ -180,7 +178,8 @@ ProcessSignal* ProcessSignalFormatter::CreateProcessSignal(sinsp_evt* event) { signal->set_allocated_time(timestamp); // set container_id - if (const char* container_id = event_extractor_->get_container_id(event)) { + auto container_id = GetContainerID(event); + if (!container_id.empty()) { signal->set_container_id(container_id); } @@ -194,7 +193,7 @@ ProcessSignal* ProcessSignalFormatter::CreateProcessSignal(sinsp_evt* event) { } CLOG(DEBUG) << "Process (" << signal->container_id() << ": " << signal->pid() << "): " - << signal->name() << "[" << container_metadata_.GetNamespace(event) << "] " + << signal->name() << " (" << signal->exec_file_path() << ")" << " " << signal->args(); @@ -245,7 +244,7 @@ ProcessSignal* ProcessSignalFormatter::CreateProcessSignal(sinsp_threadinfo* tin signal->set_allocated_time(timestamp); // set container_id - signal->set_container_id(GetContainerID(*tinfo, *inspector_->m_thread_manager)); + signal->set_container_id(GetContainerID(*tinfo)); // set process lineage std::vector lineage; @@ -269,11 +268,11 @@ std::string ProcessSignalFormatter::ProcessDetails(sinsp_evt* event) { std::stringstream ss; const std::string* path = 
event_extractor_->get_exepath(event); const std::string* name = event_extractor_->get_comm(event); - const char* container_id = event_extractor_->get_container_id(event); + auto container_id = GetContainerID(event); const char* args = event_extractor_->get_proc_args(event); const int64_t* pid = event_extractor_->get_pid(event); - ss << "Container: " << (container_id ? container_id : "null") + ss << "Container: " << (container_id.empty() ? "null" : container_id) << ", Name: " << (name ? *name : "null") << ", PID: " << (pid ? *pid : -1) << ", Path: " << (path ? *path : "null") @@ -351,7 +350,7 @@ void ProcessSignalFormatter::GetProcessLineage(sinsp_threadinfo* tinfo, // all platforms. // if (pt->m_vpid == 0) { - if (GetContainerID(*pt, *inspector_->m_thread_manager).empty()) { + if (GetContainerID(*pt).empty()) { return false; } } else if (pt->m_pid == pt->m_vpid) { diff --git a/collector/lib/ProcessSignalFormatter.h b/collector/lib/ProcessSignalFormatter.h index a7bb69ab57..ceeeb98dea 100644 --- a/collector/lib/ProcessSignalFormatter.h +++ b/collector/lib/ProcessSignalFormatter.h @@ -10,7 +10,6 @@ #include "CollectorConfig.h" #include "CollectorStats.h" -#include "ContainerMetadata.h" #include "EventNames.h" #include "ProtoSignalFormatter.h" @@ -57,7 +56,6 @@ class ProcessSignalFormatter : public ProtoSignalFormatter event_extractor_; - ContainerMetadata container_metadata_; const CollectorConfig& config_; }; diff --git a/collector/lib/Utility.cpp b/collector/lib/Utility.cpp index b3199d0603..538070a203 100644 --- a/collector/lib/Utility.cpp +++ b/collector/lib/Utility.cpp @@ -18,7 +18,6 @@ extern "C" { #include #include -#include #include @@ -58,14 +57,20 @@ const char* SignalName(int signum) { } } -std::string GetContainerID(sinsp_threadinfo& tinfo, sinsp_thread_manager& thread_manager) { - const auto* accessor = thread_manager.get_field_accessor("container_id"); - if (!accessor) { - return {}; +std::string GetContainerID(sinsp_threadinfo& tinfo) { + for (const 
auto& [subsys, cgroup_path] : tinfo.cgroups()) { + if (auto id = ExtractContainerIDFromCgroup(cgroup_path)) { + return std::string(*id); + } } - std::string container_id; - tinfo.get_dynamic_field(*accessor, container_id); - return container_id; + return {}; +} + +std::string GetContainerID(sinsp_evt* event) { + if (!event) return {}; + sinsp_threadinfo* tinfo = event->get_thread_info(); + if (!tinfo) return {}; + return GetContainerID(*tinfo); } std::ostream& operator<<(std::ostream& os, const sinsp_threadinfo* t) { @@ -214,7 +219,7 @@ std::optional ExtractContainerIDFromCgroup(std::string_view cg } auto container_id_part = cgroup.substr(cgroup.size() - (CONTAINER_ID_LENGTH + 1)); - if (container_id_part[0] != '/' && container_id_part[0] != '-') { + if (container_id_part[0] != '/' && container_id_part[0] != '-' && container_id_part[0] != ':') { return {}; } diff --git a/collector/lib/Utility.h b/collector/lib/Utility.h index 8544bbd72c..e15869b31b 100644 --- a/collector/lib/Utility.h +++ b/collector/lib/Utility.h @@ -14,7 +14,7 @@ // forward declarations class sinsp_threadinfo; -class sinsp_thread_manager; +class sinsp_evt; namespace collector { @@ -66,9 +66,13 @@ std::string Str(Args&&... args) { std::ostream& operator<<(std::ostream& os, const sinsp_threadinfo* t); -// Extract container ID from a threadinfo using the dynamic field written by -// the container plugin. Returns an empty string if unavailable. -std::string GetContainerID(sinsp_threadinfo& tinfo, sinsp_thread_manager& thread_manager); +// Extract container ID from a threadinfo's cgroups. +// Returns an empty string if no container ID found. +std::string GetContainerID(sinsp_threadinfo& tinfo); + +// Extract container ID from an event's thread info cgroups. +// Returns an empty string if no container ID found. +std::string GetContainerID(sinsp_evt* event); // UUIDStr returns UUID in string format. 
const char* UUIDStr(); diff --git a/collector/lib/system-inspector/EventExtractor.h b/collector/lib/system-inspector/EventExtractor.h index 1cb5615a36..ef58d899fb 100644 --- a/collector/lib/system-inspector/EventExtractor.h +++ b/collector/lib/system-inspector/EventExtractor.h @@ -136,9 +136,6 @@ class EventExtractor { // // ADD ANY NEW FIELDS BELOW THIS LINE - // Container related fields — provided by the container plugin via filter fields. - FIELD_CSTR(container_id, "container.id"); - // Process related fields TINFO_FIELD(comm); TINFO_FIELD(exe); @@ -155,9 +152,6 @@ class EventExtractor { FIELD_RAW_SAFE(client_port, "fd.cport", uint16_t); FIELD_RAW_SAFE(server_port, "fd.sport", uint16_t); - // k8s metadata - FIELD_CSTR(k8s_namespace, "k8s.ns.name"); - #undef TINFO_FIELD #undef FIELD_RAW #undef FIELD_CSTR diff --git a/collector/lib/system-inspector/Service.cpp b/collector/lib/system-inspector/Service.cpp index 6c38470408..de9d59ba7b 100644 --- a/collector/lib/system-inspector/Service.cpp +++ b/collector/lib/system-inspector/Service.cpp @@ -8,7 +8,6 @@ #include "libsinsp/filter.h" #include "libsinsp/parsers.h" -#include "libsinsp/plugin.h" #include "libsinsp/sinsp.h" #include @@ -16,7 +15,6 @@ #include "CollectionMethod.h" #include "CollectorException.h" #include "CollectorStats.h" -#include "ContainerMetadata.h" #include "EventExtractor.h" #include "EventNames.h" #include "HostInfo.h" @@ -57,39 +55,17 @@ Service::Service(const CollectorConfig& config) inspector_->get_parser()->set_track_connection_status(true); } - // Load the container plugin for container ID attribution and metadata. - // This MUST happen before EventExtractor::Init() (via ContainerMetadata) - // because the plugin provides the "container.id" and "k8s.ns.name" fields. 
- const char* plugin_path = "/usr/local/lib64/libcontainer.so"; - try { - auto plugin = inspector_->register_plugin(plugin_path); - std::string err; - if (!plugin->init("", err)) { - CLOG(ERROR) << "Failed to init container plugin: " << err; - } - if (plugin->caps() & CAP_EXTRACTION) { - EventExtractor::FilterList().add_filter_check(sinsp_plugin::new_filtercheck(plugin)); - } - CLOG(INFO) << "Loaded container plugin from " << plugin_path; - } catch (const sinsp_exception& e) { - CLOG(WARNING) << "Could not load container plugin from " << plugin_path - << ": " << e.what(); - } - - // Initialize ContainerMetadata after the plugin is loaded, so that - // EventExtractor::Init() can find plugin-provided fields like container.id. - container_metadata_inspector_ = std::make_shared(inspector_.get()); default_formatter_ = std::make_unique( inspector_.get(), DEFAULT_OUTPUT_STR, EventExtractor::FilterList()); - // Compile the container filter using our FilterList (which includes - // plugin filterchecks). sinsp::set_filter(string) uses a hardcoded - // filter check list that doesn't include plugin fields. + // Filter out host processes. In containers, pid != vpid due to PID + // namespacing. This is a built-in sinsp field that doesn't require + // any plugin. try { auto factory = std::make_shared( inspector_.get(), EventExtractor::FilterList()); - sinsp_filter_compiler compiler(factory, "container.id != 'host'"); - inspector_->set_filter(compiler.compile(), "container.id != 'host'"); + sinsp_filter_compiler compiler(factory, "proc.pid != proc.vpid"); + inspector_->set_filter(compiler.compile(), "proc.pid != proc.vpid"); } catch (const sinsp_exception& e) { CLOG(WARNING) << "Could not set container filter: " << e.what() << ". 
Container filtering will not be active."; @@ -303,7 +279,7 @@ bool Service::SendExistingProcesses(SignalHandler* handler) { } return threads->loop([&](sinsp_threadinfo& tinfo) { - if (!GetContainerID(tinfo, *inspector_->m_thread_manager).empty() && tinfo.is_main_thread()) { + if (!GetContainerID(tinfo).empty() && tinfo.is_main_thread()) { auto result = handler->HandleExistingProcess(&tinfo); if (result == SignalHandler::ERROR || result == SignalHandler::NEEDS_REFRESH) { CLOG(WARNING) << "Failed to write existing process signal: " << &tinfo; diff --git a/collector/lib/system-inspector/Service.h b/collector/lib/system-inspector/Service.h index 651e7ff7cb..1f2398c648 100644 --- a/collector/lib/system-inspector/Service.h +++ b/collector/lib/system-inspector/Service.h @@ -8,7 +8,6 @@ #include #include "ConnTracker.h" -#include "ContainerMetadata.h" #include "Control.h" #include "SignalHandler.h" #include "SignalServiceClient.h" @@ -43,8 +42,6 @@ class Service : public SystemInspector { void GetProcessInformation(uint64_t pid, ProcessInfoCallbackRef callback); - std::shared_ptr GetContainerMetadataInspector() { return container_metadata_inspector_; }; - sinsp* GetInspector() { return inspector_.get(); } Stats* GetUserspaceStats() { return &userspace_stats_; } @@ -71,7 +68,6 @@ class Service : public SystemInspector { mutable std::mutex libsinsp_mutex_; std::unique_ptr inspector_; - std::shared_ptr container_metadata_inspector_; std::unique_ptr default_formatter_; std::unique_ptr signal_client_; std::vector signal_handlers_; diff --git a/collector/test/ProcessSignalFormatterTest.cpp b/collector/test/ProcessSignalFormatterTest.cpp index 5931c46275..233c3bfed3 100644 --- a/collector/test/ProcessSignalFormatterTest.cpp +++ b/collector/test/ProcessSignalFormatterTest.cpp @@ -578,9 +578,8 @@ TEST(ProcessSignalFormatterTest, CountTwoCounterCallsTest) { TEST(ProcessSignalFormatterTest, Rox3377ProcessLineageWithNoVPidTest) { // This test verifies lineage traversal stops at the 
container boundary. - // Originally tested vpid=0 + container_id fallback (ROX-3377), but - // container_id is now a dynamic field from the container plugin. - // Instead, test boundary detection via pid==vpid (namespace init process). + // Originally tested vpid=0 + container_id fallback (ROX-3377). + // Now tests boundary detection via pid==vpid (namespace init process). std::unique_ptr inspector(new sinsp()); CollectorStats& collector_stats = CollectorStats::GetOrCreate(); CollectorConfig config; diff --git a/collector/test/UtilityTest.cpp b/collector/test/UtilityTest.cpp index f5dee2b865..14df61f69c 100644 --- a/collector/test/UtilityTest.cpp +++ b/collector/test/UtilityTest.cpp @@ -98,6 +98,11 @@ TEST(ExtractContainerIDFromCgroupTest, TestExtractContainerIDFromCgroup) { "/machine.slice/libpod-cbdfa0f1f08763b1963c30d98e11e1f052cb67f1e9b7c0ab8a6ca6c70cbcad69.scope/container/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod6eab3b7b_f0a6_4bb8_bff2_d5bc9017c04b.slice/cri-containerd-5ebf11e02dbde102cda4b76bc0e3849a65f9edac7a12bdabfd34db01b9556101.scope", "5ebf11e02dbd", }, + // containerd without SystemdCgroup (uses : separator) + { + "/kubepods-burstable-podbd12dd3393227d950605a2444b13c27a.slice:cri-containerd:d52db56a9c80d536a91354c0951c061187ca46249e64865a12703003d8f42366", + "d52db56a9c80", + }, // conmon { "/machine.slice/libpod-conmon-b6ce30d02945df4bbf8e8b7193b2c56ebb3cd10227dd7e59d7f7cdc2cfa2a307.scope", diff --git a/integration-tests/suites/k8s/namespace.go b/integration-tests/suites/k8s/namespace.go index 5b94df3a5e..7e0433dd6b 100644 --- a/integration-tests/suites/k8s/namespace.go +++ b/integration-tests/suites/k8s/namespace.go @@ -47,7 +47,7 @@ func (k *K8sNamespaceTestSuite) SetupSuite() { k.tests = append(k.tests, NamespaceTest{ containerID: k.Collector().ContainerID(), - expectecNamespace: collector.TEST_NAMESPACE, + expectecNamespace: "", }) k.createTargetNamespace() @@ -55,7 +55,7 @@ func (k 
*K8sNamespaceTestSuite) SetupSuite() { k.Require().Len(nginxID, 12) k.tests = append(k.tests, NamespaceTest{ containerID: nginxID, - expectecNamespace: NAMESPACE, + expectecNamespace: "", }) } From 1dd6ca16a1089fe3f029bc88993a25e5d1f5f4e5 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Wed, 11 Mar 2026 14:51:22 +0000 Subject: [PATCH 09/20] Bump falco with verifier fixes --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index 6947a02757..fd1b2a7397 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit 6947a02757d981cbc7b4dd21c7bdaa891911627f +Subproject commit fd1b2a7397784941a5aecb195f4248e34b7f391c From c34da75ece711e974f3690ddc49f1e9262646ac4 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Thu, 12 Mar 2026 09:44:52 +0000 Subject: [PATCH 10/20] Bump falco with verifier fixes --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index fd1b2a7397..dd8a026b63 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit fd1b2a7397784941a5aecb195f4248e34b7f391c +Subproject commit dd8a026b6318fcf280c169bbe8ab20293f1d41bd From 21cddb703156656cac867c291a1e5dc52e60719b Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Thu, 12 Mar 2026 13:59:05 +0000 Subject: [PATCH 11/20] Bump falco with verifier fixes --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index dd8a026b63..88b7f299f8 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit dd8a026b6318fcf280c169bbe8ab20293f1d41bd +Subproject commit 88b7f299f823cc48064ae50511ec758f46d26207 From 1eaf957383e107f1fee71234ce97d155f2812e4c Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Thu, 12 Mar 2026 16:00:13 +0000 Subject: [PATCH 12/20] Bump falco with verifier fixes Update falcosecurity-libs submodule 
to include BPF verifier fixes for older/stricter kernels (RHEL 8, s390x, ppc64le, COS, RHEL SAP). Co-Authored-By: Claude Opus 4.6 --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index 88b7f299f8..1aad3c95a7 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit 88b7f299f823cc48064ae50511ec758f46d26207 +Subproject commit 1aad3c95a7e9c6ccf670f31e06e8856bb610e031 From c7fe0b67b8381c2ed454bffbdb3e4680429675bd Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Thu, 12 Mar 2026 17:12:11 +0000 Subject: [PATCH 13/20] Add programs to ignored list --- collector/CMakeLists.txt | 2 +- falcosecurity-libs | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/collector/CMakeLists.txt b/collector/CMakeLists.txt index bf3ad9bef7..4b1ec9a2e1 100644 --- a/collector/CMakeLists.txt +++ b/collector/CMakeLists.txt @@ -100,6 +100,6 @@ set(SCAP_HOST_ROOT_ENV_VAR_NAME "COLLECTOR_HOST_ROOT" CACHE STRING "Host root en set(BUILD_LIBSCAP_MODERN_BPF ON CACHE BOOL "Enable modern bpf engine" FORCE) set(MODERN_BPF_DEBUG_MODE ${BPF_DEBUG_MODE} CACHE BOOL "Enable BPF debug prints" FORCE) -set(MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|setsockopt|io_uring_setup|nanosleep)$" CACHE STRING "Set of syscalls to exclude from modern bpf engine " FORCE) +set(MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|setsockopt|io_uring_setup|nanosleep|pread64|preadv|pwritev|read|readv|writev|recv|process_vm_readv|process_vm_writev)$" CACHE STRING "Set of syscalls to exclude from modern bpf engine " FORCE) add_subdirectory(${FALCO_DIR} falco) diff --git a/falcosecurity-libs b/falcosecurity-libs index 1aad3c95a7..d63a342f3d 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit 1aad3c95a7e9c6ccf670f31e06e8856bb610e031 +Subproject commit d63a342f3d65ab4101e0541a1466590a162d990c From 2b5602a421b945df3ee6b1737c63fb4a8c27b110 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: 
Fri, 13 Mar 2026 09:28:40 +0000 Subject: [PATCH 14/20] Bump falco with remaining verifier fixes Fix recvfrom_x/recvmsg_x/sendmsg_x verifier rejection and t1_execveat_x/t2_execveat_x program size overflow. Co-Authored-By: Claude Opus 4.6 --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index d63a342f3d..0aacbbac1d 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit d63a342f3d65ab4101e0541a1466590a162d990c +Subproject commit 0aacbbac1d858afd2aa1bc73fa79a2e9e88ac6f8 From c5d419c36bfd25b8e38de95bfc0cdc47ce33a8e3 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Fri, 13 Mar 2026 09:28:45 +0000 Subject: [PATCH 15/20] Add analyse-ci Claude skill for CI failure investigation Documents how to navigate CI artifacts, download logs, and diagnose common failure modes (verifier rejection, program too large, self-check timeouts). Includes platform-specific notes and portable jq commands. Co-Authored-By: Claude Opus 4.6 --- .claude/commands/analyse-ci.md | 275 +++++++++++++++++++++++++++++++++ 1 file changed, 275 insertions(+) create mode 100644 .claude/commands/analyse-ci.md diff --git a/.claude/commands/analyse-ci.md b/.claude/commands/analyse-ci.md new file mode 100644 index 0000000000..1f237ac1f1 --- /dev/null +++ b/.claude/commands/analyse-ci.md @@ -0,0 +1,275 @@ +# Analyse CI Results + +You are helping investigate CI failures for the StackRox collector project. This skill describes how to navigate the CI infrastructure, download logs, and diagnose common failure modes. + +## CI Structure + +The main CI workflow is **"Main collector CI"** (`integration-tests`). It runs integration tests across multiple platforms as separate jobs. 
+
+### Finding the PR
+
+```bash
+# Look up PR number for a branch (run from the collector repo root)
+gh pr view <branch> --json number,title --jq '"\(.number) | \(.title)"'
+
+# Or for the current branch
+gh pr view --json number,title --jq '"\(.number) | \(.title)"'
+```
+
+### Listing Failed Checks
+
+```bash
+# Get PR check status — list non-passing checks
+gh pr view --json statusCheckRollup \
+  --jq '.statusCheckRollup[] | select(.conclusion | IN("SUCCESS","SKIPPED","NEUTRAL") | not) | "\(.name): \(.conclusion) / \(.status)"'
+
+# List workflow runs for a branch
+gh run list --branch <branch> --limit 5 \
+  --json databaseId,name,conclusion,status \
+  --jq '.[] | "\(.databaseId) | \(.name) | \(.conclusion) | \(.status)"'
+
+# Get failed jobs from a run
+gh run view <run-id> --json jobs \
+  --jq '.jobs[] | select(.conclusion == "failure") | "\(.databaseId) | \(.name)"'
+```
+
+### Lint Failures
+
+The **Lint** workflow runs `pre-commit` hooks including `clang-format`. To see what failed:
+
+```bash
+gh run view <run-id> --log 2>&1 | grep -A30 "All changes made by hooks:"
+```
+
+This shows the exact diff that clang-format wants applied.
+
+## Downloading and Navigating Log Artifacts
+
+**This is the most important step.** The GitHub Actions log output truncates collector logs to just the crash backtrace. The full `collector.log` with verifier output is only in the artifacts.
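The listing and download steps below can also be combined into a single pass. The following is a sketch, not part of the skill's required workflow: it assumes an authenticated `gh` CLI, takes the run id from a `RUN_ID` environment variable, and the `endswith("-logs")` filter plus the `artifact_zip`/`artifact_dir` helper names are illustrative. The network calls only run when `RUN_ID` is set, so the path helpers can be checked standalone.

```shell
#!/usr/bin/env bash
# Sketch: fetch and extract every "<platform>-logs" artifact for one CI run.
set -u

# Pure helpers: where an artifact's zip and extracted tree land on disk.
artifact_zip() { printf '/tmp/%s.zip' "$1"; }
artifact_dir() { printf '/tmp/%s' "$1"; }

if [ -n "${RUN_ID:-}" ]; then
  # Same API endpoints as Steps 1 and 2 below, filtered to *-logs artifacts.
  gh api "repos/stackrox/collector/actions/runs/${RUN_ID}/artifacts" \
    --jq '.artifacts[] | select(.name | endswith("-logs")) | "\(.id) \(.name)"' |
  while read -r id name; do
    gh api "repos/stackrox/collector/actions/artifacts/${id}/zip" > "$(artifact_zip "$name")"
    unzip -o -q "$(artifact_zip "$name")" -d "$(artifact_dir "$name")"
  done
fi
```

After this runs, each failing platform's logs sit under `/tmp/<platform>-logs/` in the directory layout described in Step 3.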
+
+### Step 1: List Artifacts
+
+```bash
+gh api repos/stackrox/collector/actions/runs/<run-id>/artifacts \
+  --jq '.artifacts[] | "\(.id) | \(.name) | \(.size_in_bytes)"'
+```
+
+Artifact names follow the pattern `<platform>-logs`, e.g.:
+- `rhel-logs`, `rhel-sap-logs`
+- `ubuntu-os-logs`, `ubuntu-arm-logs`
+- `cos-logs`, `cos-arm64-logs`
+- `rhcos-logs`, `rhcos-arm64-logs`
+- `flatcar-logs`, `fcarm-logs`
+- `rhel-s390x-logs`, `rhel-ppc64le-logs`
+
+### Step 2: Download and Extract
+
+```bash
+gh api repos/stackrox/collector/actions/artifacts/<artifact-id>/zip > /tmp/<name>.zip
+unzip -o /tmp/<name>.zip -d /tmp/<name>
+```
+
+### Step 3: Artifact Directory Structure
+
+Each artifact contains:
+
+```
+<platform>-logs/
+  container-logs/
+    _/
+      core-bpf/
+        TestProcessNetwork/
+          collector.log      # Full collector log for this test
+          events.log         # Event stream log
+        TestNetworkFlows/
+        TestProcessViz/
+        TestProcessLineageInfo/
+        TestUdpNetworkFlow/
+        TestUdpNetorkflow/   # Note: typo in directory name is intentional
+          sendto_recvfrom/
+            collector.log
+            udp-client.log
+            udp-server.log
+          sendmsg_recvmsg/
+          sendmmsg_recvmmsg/
+          events.log
+        TestSocat/
+          collector.log
+        ...
+  perf.json                  # Performance metrics
+  integration-test-report-_.xml  # JUnit XML results
+  integration-test-_.log     # Ansible runner log
+```
+
+**The `collector.log` file is the primary diagnostic source.** Each test suite gets its own `collector.log` because the collector container is restarted per suite.
+
+### Step 4: Check Test Results Summary
+
+JUnit XML requires xmllint or a simple grep (jq cannot parse XML):
+
+```bash
+# Quick summary from the XML attributes
+head -3 /tmp/<name>/container-logs/integration-test-report-*.xml
+
+# Find which tests failed
+grep -B1 'failure\|error' /tmp/<name>/container-logs/integration-test-report-*.xml | head -20
+```
+
+**Key pattern**: If you see `tests="4" failures="1" errors="1" skipped="2"`, the collector crashed on the first test (TestProcessNetwork) and everything else was skipped.
This means a BPF loading failure or early startup crash.
+
+## Diagnosing Failure Modes
+
+### 1. BPF Verifier Rejection (Collector Crashes)
+
+**Symptoms**:
+- Collector exits with code 139 (SIGSEGV) or 134 (SIGABRT)
+- `tests="4"` in JUnit XML (crash on first test)
+- Stack trace shows `KernelDriverCOREEBPF::Setup` -> `sinsp_exception` -> `abort`
+
+**How to find the verifier error in collector.log**:
+
+```bash
+# Find the failing program and verifier output
+grep -n "BPF program load failed\|failed to load\|BEGIN PROG LOAD LOG\|END PROG LOAD LOG" collector.log
+
+# Get the actual rejection reason (usually the last line before END PROG LOAD LOG)
+grep -B5 "END PROG LOAD LOG" collector.log
+```
+
+The verifier log is between `BEGIN PROG LOAD LOG` and `END PROG LOAD LOG`. It can be thousands of lines of BPF instruction trace. The **rejection reason is always the last line before `END PROG LOAD LOG`**.
+
+Common verifier rejection messages:
+- `R2 min value is negative, either use unsigned or 'var &= const'` — signed value used as size arg to bpf helper
+- `BPF program is too large. Processed 1000001 insn` — exceeded 1M instruction verifier limit
+- `R0 invalid mem access 'map_value_or_null'` — null check optimized away by clang
+- `reg type unsupported for arg#0` — BTF type mismatch (often a warning, not the real error — check end of verifier log)
+
+**After the verifier log**, look for the cascade:
+
+```
+libbpf: prog '<name>': failed to load: -13   # -13 = EACCES (Permission denied)
+libbpf: prog '<name>': failed to load: -7    # -7 = E2BIG (program too large)
+libbpf: failed to load object 'bpf_probe'    # Whole BPF skeleton fails
+libpman: failed to load BPF object           # libpman reports failure
+terminate called after throwing 'sinsp_exception'   # C++ exception
+  what(): Initialization issues during scap_init
+```
+
+### 2.
Self-Check Health Timeout (Collector Runs But Not Healthy)
+
+**Symptoms**:
+- Collector starts and loads BPF programs successfully
+- `Failed to detect any self-check process events within the timeout.`
+- `Failed to detect any self-check networking events within the timeout.`
+- Test framework times out: `Timed out waiting for container collector to become health=healthy`
+
+**What to look for in collector.log**:
+
+```bash
+grep -n "SelfCheck\|self-check\|Failed to detect\|healthy" collector.log
+```
+
+This means the BPF programs loaded but aren't capturing events correctly. Check for:
+- Tracepoint attachment failures: `failed to create tracepoint 'syscalls/sys_enter_connect'`
+- Missing programs: `unable to find BPF program '<name>'`
+- Container ID issues: `unable to initialize the state table API: failed to find dynamic field 'container_id'`
+
+### 3. Test Logic Failures (Collector Healthy, Test Assertions Fail)
+
+**Symptoms**:
+- Most tests pass, individual test fails
+- Collector is healthy and running
+- Test output shows assertion mismatches
+
+**Where to look**:
+- The specific test's `collector.log` for event processing
+- `events.log` for the raw event stream
+- For UDP tests: check `udp-client.log` and `udp-server.log` in the test subdirectory
+- JUnit XML for the error message
+
+### 4.
Startup/Infrastructure Failures
+
+**Symptoms**:
+- `fatal: [<host>]: FAILED!` in the GitHub Actions log (Ansible failure)
+- No collector.log at all for a test
+- Image pull failures
+
+**Where to look**:
+- The Ansible runner log: `integration-test-.log` in the artifact
+- The GitHub Actions log: `gh run view --log --job <job-id>`
+
+## Platform-Specific Notes
+
+### RHEL 8 (kernel 4.18) / s390x / ppc64le
+- **Oldest and strictest BPF verifier** — most likely to hit verifier rejections
+- RHEL 8 uses kernel 4.18 which has limited BPF type tracking
+- s390x and ppc64le also use 4.18-based kernels
+- These platforms fail first, so their verifier errors are the canonical ones to fix
+
+### RHEL SAP (kernel 5.14)
+- Same base kernel as RHEL 9 but **different kernel config** (SAP-tuned)
+- Has hit verifier instruction limit (1M insns) when RHEL 9 passes
+- `reg type unsupported for arg#0` is often a warning, not the real error — check end of verifier log for `BPF program is too large`
+
+### COS / Google Container-Optimized OS (kernel 6.6)
+- **Clang-compiled kernel** — different BTF attributes than GCC-compiled kernels
+- RCU pointer annotations cause different verifier behavior
+- Has rejected programs that pass on same-version GCC-compiled kernels
+
+### ARM64 platforms (ubuntu-arm, rhcos-arm64, cos-arm64, fcarm)
+- No ia32 compat syscalls — `ia32_*` programs are correctly disabled
+- `sys_enter_connect` tracepoint may not exist — expected, handled gracefully
+- Self-check timeouts can be timing-related on slower ARM VMs
+- cos-arm64 and fcarm tend to pass when ubuntu-arm and rhcos-arm64 fail — may be Docker vs Podman timing differences
+
+### Ubuntu (ubuntu-os)
+- Runs on **both Ubuntu 22.04 and 24.04** VMs
+- The artifact contains logs from multiple VMs (check the subdirectory names)
+- Ubuntu 22.04 (kernel 6.8) is stricter than 24.04 (kernel 6.17)
+
+### Flatcar / Fedora CoreOS
+- Generally the most permissive — if these fail, something is fundamentally broken
+
+## Common
Non-Fatal Log Messages + +These appear on all platforms and are expected/harmless: + +``` +# Container plugin not loaded (by design — collector uses cgroup extraction) +unable to initialize the state table API: failed to find dynamic field 'container_id' in threadinfo + +# Enter events removed in modern BPF (by design) +failed to determine tracepoint 'syscalls/sys_enter_connect' perf event ID: No such file or directory + +# TLS not configured (expected in integration tests) +Partial TLS config: CACertPath="", ClientCertPath="", ClientKeyPath=""; will not use TLS + +# Container filter uses proc.vpid not container.id (by design) +Could not set container filter: proc.vpid is not a valid number + +# Programs excluded from build via MODERN_BPF_EXCLUDE_PROGS +unable to find BPF program '' +``` + +## Quick Investigation Workflow + +1. **Identify failing platforms**: Check PR status checks +2. **Download artifacts**: For each failing platform, download the `-logs` artifact +3. **Check JUnit XML first**: `tests="4"` = crash, higher number = specific test failures +4. **Read collector.log**: For crashes, search for `failed to load` and read the verifier log above it. For test failures, read the specific test's collector.log +5. **Check kernel version**: First lines of collector.log show OS and kernel version +6. **Cross-reference platforms**: If RHEL 9 passes but RHEL SAP fails, it's likely a verifier limit issue. If all arm64 fail, check self-check timing. If everything fails, check BPF program structure +7. **Compare with master**: Download master's artifacts for the same platform to confirm regression + +## Build Exclusion Mechanism + +Collector can exclude BPF programs from compilation via CMake: + +```cmake +# collector/CMakeLists.txt +set(MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|...)$" CACHE STRING "..." FORCE) +``` + +The regex matches against BPF source file stems (e.g., `pread64` matches `pread64.bpf.c`). Excluded programs are not compiled into the skeleton. 
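The stem matching can be sanity-checked in plain bash before editing `CMakeLists.txt` — a minimal sketch, using an illustrative regex and a few illustrative file names rather than the real program list:

```bash
# Sanity-check a candidate MODERN_BPF_EXCLUDE_PROGS regex against BPF
# source file stems (file names here are illustrative, not exhaustive).
regex='^(openat2|ppoll)$'
excluded=()
compiled=()
for f in openat2.bpf.c ppoll.bpf.c pread64.bpf.c; do
  stem="${f%.bpf.c}"            # e.g. "pread64.bpf.c" -> "pread64"
  if [[ "$stem" =~ $regex ]]; then
    excluded+=("$stem")
  else
    compiled+=("$stem")
  fi
done
echo "excluded: ${excluded[*]}"   # excluded: openat2 ppoll
echo "compiled: ${compiled[*]}"   # compiled: pread64
```

Anchoring the regex with `^...$` matters: an unanchored `openat` would also exclude `openat2`.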
The loader in `maps.c:add_bpf_program_to_tail_table()` handles missing programs gracefully (logs debug message, returns success). + +Only exclude programs for syscalls that collector does not subscribe to. Collector's syscall list is in `collector/lib/CollectorConfig.h` (`kSyscalls[]` and `kSendRecvSyscalls[]`). From af76d58f9adf22f9aa6da6513d9428c508b99dbd Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Fri, 13 Mar 2026 10:21:57 +0000 Subject: [PATCH 16/20] Bump falco with push__bytebuf verifier fix Co-Authored-By: Claude Opus 4.6 --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index 0aacbbac1d..bfadedf261 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit 0aacbbac1d858afd2aa1bc73fa79a2e9e88ac6f8 +Subproject commit bfadedf26198e729c4c393dceb7314ddacbb70d7 From e97f768ac82ab564dfd70695b27a7e5435e9c1af Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Fri, 13 Mar 2026 10:22:16 +0000 Subject: [PATCH 17/20] Fix clang-format lint in Utility.cpp Co-Authored-By: Claude Opus 4.6 --- collector/lib/Utility.cpp | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/collector/lib/Utility.cpp b/collector/lib/Utility.cpp index 538070a203..f1c74f09d0 100644 --- a/collector/lib/Utility.cpp +++ b/collector/lib/Utility.cpp @@ -67,9 +67,13 @@ std::string GetContainerID(sinsp_threadinfo& tinfo) { } std::string GetContainerID(sinsp_evt* event) { - if (!event) return {}; + if (!event) { + return {}; + } sinsp_threadinfo* tinfo = event->get_thread_info(); - if (!tinfo) return {}; + if (!tinfo) { + return {}; + } return GetContainerID(*tinfo); } From 584f9baead033e32278ac79941aa3ce64d3ba27f Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Fri, 13 Mar 2026 10:23:14 +0000 Subject: [PATCH 18/20] Bump falco with push__bytebuf verifier fix Co-Authored-By: Claude Opus 4.6 --- .claude/commands/{analyse-ci.md => analyze-ci.md} | 12 +++++++++++- 
falcosecurity-libs | 2 +- 2 files changed, 12 insertions(+), 2 deletions(-) rename .claude/commands/{analyse-ci.md => analyze-ci.md} (97%) diff --git a/.claude/commands/analyse-ci.md b/.claude/commands/analyze-ci.md similarity index 97% rename from .claude/commands/analyse-ci.md rename to .claude/commands/analyze-ci.md index 1f237ac1f1..5ef112ac07 100644 --- a/.claude/commands/analyse-ci.md +++ b/.claude/commands/analyze-ci.md @@ -1,4 +1,4 @@ -# Analyse CI Results +# Analyze CI Results You are helping investigate CI failures for the StackRox collector project. This skill describes how to navigate the CI infrastructure, download logs, and diagnose common failure modes. @@ -273,3 +273,13 @@ set(MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|...)$" CACHE STRING "..." FORCE) The regex matches against BPF source file stems (e.g., `pread64` matches `pread64.bpf.c`). Excluded programs are not compiled into the skeleton. The loader in `maps.c:add_bpf_program_to_tail_table()` handles missing programs gracefully (logs debug message, returns success). Only exclude programs for syscalls that collector does not subscribe to. Collector's syscall list is in `collector/lib/CollectorConfig.h` (`kSyscalls[]` and `kSendRecvSyscalls[]`). + +## Cleanup + +Once the analysis is complete and you have reported your findings, delete all downloaded log artifacts (zip files and extracted directories) from `/tmp/`: + +```bash +rm -rf /tmp/*-logs /tmp/*-logs.zip +``` + +This prevents stale logs from accumulating across investigations. 
diff --git a/falcosecurity-libs b/falcosecurity-libs index bfadedf261..00c95594bf 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit bfadedf26198e729c4c393dceb7314ddacbb70d7 +Subproject commit 00c95594bfec120ade9423d8dfab228b8f66c9fd From 1375a6364cc028dee56b884bb8136f5b7a122f42 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Fri, 13 Mar 2026 16:31:08 +0000 Subject: [PATCH 19/20] Bump falco with TOCTOU openat2 fix for older kernels Co-Authored-By: Claude Opus 4.6 --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index 00c95594bf..c85160e37a 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit 00c95594bfec120ade9423d8dfab228b8f66c9fd +Subproject commit c85160e37a8682b201927f1971fa88bd367276e1 From c2385e2ca9c5ec1f995cad322c90c282ca5cb1e7 Mon Sep 17 00:00:00 2001 From: Giles Hutton Date: Fri, 13 Mar 2026 18:05:24 +0000 Subject: [PATCH 20/20] Bump falco with fd-based exec filtering fix Co-Authored-By: Claude Opus 4.6 --- falcosecurity-libs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/falcosecurity-libs b/falcosecurity-libs index c85160e37a..0a04768135 160000 --- a/falcosecurity-libs +++ b/falcosecurity-libs @@ -1 +1 @@ -Subproject commit c85160e37a8682b201927f1971fa88bd367276e1 +Subproject commit 0a047681350020ae6b29307aebc4342908082f58