-
Notifications
You must be signed in to change notification settings - Fork 52
enable indefinite duration telemetry and flamegraph collection on remote targets #595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request enables indefinite duration telemetry collection on remote targets by improving signal handling and process management. The key changes include:
- Implementing robust signal handling to properly propagate SIGINT/SIGTERM to child processes, particularly for parallel script execution via
parallel_master.sh - Simplifying the command execution API by introducing separate
RunCommandandRunCommandExmethods, removing unused parameters from numerous call sites - Changing the default telemetry duration from 30 seconds to 0 (indefinite) and relaxing validation requirements for remote target collection
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/target/target.go | Splits RunCommand interface into simple and extended versions, adds newProcessGroup parameter |
| internal/target/remote_target.go | Implements new RunCommand and RunCommandEx methods, updates all call sites to use simplified API |
| internal/target/local_target.go | Implements new RunCommand and RunCommandEx methods with process group support |
| internal/target/helpers.go | Adds newProcessGroup parameter to command execution helpers, implements process group isolation via Setpgid |
| internal/script/script.go | Updates script execution to use RunCommandEx with new process group for parallel_master.sh, adds PID file creation |
| internal/common/targets.go | Updates all RunCommand call sites to use simplified API without unused parameters |
| internal/common/common.go | Adds configureSignalHandler function for robust signal propagation and cleanup, integrates with reporting workflow |
| cmd/telemetry/telemetry.go | Changes default duration from 30 to 0 seconds, removes validation requiring positive duration for remote targets |
| cmd/metrics/process.go | Updates RunCommand call sites to use simplified API |
| cmd/metrics/nmi_watchdog.go | Updates RunCommand call sites to use simplified API |
| cmd/metrics/metadata.go | Updates RunCommand call sites to use simplified API |
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
…ut duration Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
This pull request introduces several improvements and refactorings to signal handling, duration flag behavior, and command execution throughout the codebase. The most significant changes are the addition of a robust signal handler to ensure proper cleanup of child processes (especially when running in parallel or on remote targets), and updates to how duration flags are handled for indefinite runs. Additionally, the interface for running commands on targets has been simplified.
Signal Handling and Process Cleanup:
Added a new
configureSignalHandlerfunction ininternal/common/common.goto catch SIGINT and SIGTERM, send signals to child processes and remote collection scripts, and wait for cleanup before exiting. This ensures proper resource management, especially when running scripts in parallel or remotely. [1] [2]Integrated the signal handler into the main data collection flow, ensuring it is set up when collecting from targets.
Duration Flag and Indefinite Collection:
Changed the default value of the
durationflag in bothflamegraphandtelemetrycommands from 30 to 0, allowing for indefinite collection by default. Updated help text to clarify that a duration of 0 means the collection will run until interrupted. [1] [2] [3]Updated validation logic to allow a duration of 0 (indefinite), and removed the restriction that prevented indefinite duration on remote targets for telemetry. [1] [2]
Enhanced the collection status messaging to indicate when Ctrl+C can be used to stop an indefinite collection. [1] [2]
Command Execution Interface Simplification:
RunCommandthroughout the codebase by removing unused parameters, simplifying the interface for running commands on targets. This affects several files, includingcmd/metrics/metadata.go,cmd/metrics/nmi_watchdog.go,cmd/metrics/process.go, andinternal/common/targets.go. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]Code Organization and Imports:
context,os/exec,time) required for new signal handling and process management functionality. [1] [2]Other Minor Refactoring:
internal/common/common.gofor improved readability.These changes collectively improve the robustness of signal handling, make it easier to run indefinite collections, and simplify the code for executing commands on targets.