Add a MySQL daemon plugin that serves Prometheus text exposition format metrics via an embedded HTTP endpoint. It uses srv_session + command_service to execute SHOW GLOBAL STATUS internally (no network, no auth).

Features:
- Disabled by default (--prometheus-exporter-enabled=ON to activate)
- Configurable port (default 9104) and bind address
- Exports ~400 global status variables as mysql_global_status_* metrics
- Gauge/untyped type classification for known variables
- Plugin self-monitoring: requests_total, errors_total, scrape_duration
- Clean shutdown via poll() + atomic flag

Includes an MTR test suite (basic + metrics_endpoint) and a design spec/plan for v2 enhancements.
Extract the inline SHOW GLOBAL STATUS logic from collect_metrics() into separate reusable functions: collect_name_value_query() as a generic helper for 2-column queries, and collect_global_status() as the specific collector. MetricsCollectorCtx now takes a configurable prefix and type function, enabling additional collectors to be added cleanly.
Export all numeric MySQL server configuration variables as Prometheus gauge metrics with the mysql_global_variables_ prefix. Non-numeric values (strings, paths, ON/OFF) are automatically skipped by the existing strtod check.
Add collector that exports InnoDB metrics from information_schema.INNODB_METRICS with proper Prometheus type mapping (InnoDB 'counter' -> Prometheus counter, everything else -> gauge). Uses dedicated 4-column callbacks since INNODB_METRICS returns NAME, SUBSYSTEM, TYPE, COUNT rather than simple name/value pairs.
Add column-name-aware parsing for SHOW REPLICA STATUS that exports 6 replication metrics as gauges: seconds_behind_source, io_running, sql_running, relay_log_space, exec_source_log_pos, and read_source_log_pos. Boolean fields (IO/SQL running) emit 1 for "Yes", 0 otherwise. Gracefully emits nothing on non-replica servers.
Adds a binlog collector that runs SHOW BINARY LOGS and emits two synthetic gauge metrics: mysql_binlog_file_count (number of binary log files) and mysql_binlog_size_bytes_total (sum of all file sizes). If binary logging is disabled the query errors silently and no metrics are emitted.
- Update basic.test with inline documentation and clearer echo messages - Update metrics_endpoint.test to verify all 5 collector prefixes (global status, global variables, InnoDB metrics, binlog) - Add format_validation.test with Perl-based Prometheus exposition format validation (TYPE lines, metric names, value structure) - Record all result files
- Full README with ASCII architecture diagram, configuration table, metric namespace reference, usage examples, type classification, and limitations - Pointer doc in Docs/ directory
Note: Reviews paused. This branch is under active development, so CodeRabbit has automatically paused this review to avoid an influx of comments from new commits. This behavior is configurable.
📝 Walkthrough

Adds a new Linux-only Prometheus Exporter daemon plugin that serves an HTTP /metrics endpoint, collects MySQL metrics via internal srv_sessions, exposes Prometheus text-format metrics, registers server variables/status counters, and includes build files, docs, and a comprehensive MTR test suite.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Listener as "Listener Thread (HTTP)"
    participant Session as "Internal srv_session"
    participant Collector as "Metric Collector/Formatter"
    Client->>Listener: TCP connect + GET /metrics
    Listener->>Listener: parse HTTP request
    alt path == /metrics
        Listener->>Session: init srv_session & set security user
        Session->>Session: run queries (SHOW GLOBAL STATUS, SHOW VARIABLES, INNODB_METRICS, SHOW REPLICA STATUS, SHOW BINARY LOGS)
        Session-->>Collector: return rows
        Collector->>Collector: classify types, convert values/labels, aggregate
        Collector-->>Listener: Prometheus text payload
        Listener->>Listener: update Prometheus_exporter_* counters/timers
        Listener->>Client: HTTP 200 + metrics payload
    else
        Listener->>Client: HTTP 404
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning).
Remove spec and plan docs from docs/superpowers/ -- these were development-time artifacts. User-facing documentation lives in plugin/prometheus_exporter/README.md.
Pull request overview
Adds an embedded Prometheus /metrics HTTP exporter as a MySQL/VillageSQL daemon plugin, plus an accompanying MTR test suite and documentation, so Prometheus can scrape server metrics without an external sidecar.
Changes:
- Introduces a `prometheus_exporter` daemon plugin implementing an in-process HTTP server and 5 SQL-based collectors (status, variables, InnoDB metrics, replica status, binlogs).
- Adds a new `prometheus_exporter` MTR suite with per-collector endpoint checks and a Perl-based exposition-format validator.
- Adds plugin build wiring and end-user documentation (README + Docs pointer) plus internal design/spec docs.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| plugin/prometheus_exporter/prometheus_exporter.cc | Implements the embedded HTTP server and SQL-driven metric collection/export. |
| plugin/prometheus_exporter/CMakeLists.txt | Adds build rules to produce the prometheus_exporter plugin module. |
| plugin/prometheus_exporter/README.md | Documents architecture, configuration, metric namespaces, and limitations. |
| Docs/prometheus_exporter.md | Provides a Docs/ entry pointing to the plugin README. |
| mysql-test/include/plugin.defs | Registers the plugin for MTR so it can be loaded in test runs. |
| mysql-test/suite/prometheus_exporter/t/basic.test | Verifies install/uninstall lifecycle and sysvar/statusvar presence. |
| mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test | Verifies /metrics endpoint behavior, multiple prefixes, and 404 handling. |
| mysql-test/suite/prometheus_exporter/t/global_variables.test | Validates SHOW GLOBAL VARIABLES metrics export. |
| mysql-test/suite/prometheus_exporter/t/innodb_metrics.test | Validates information_schema.INNODB_METRICS metrics export and typing. |
| mysql-test/suite/prometheus_exporter/t/replica_status.test | Ensures replica metrics are absent on non-replica servers. |
| mysql-test/suite/prometheus_exporter/t/binlog.test | Validates binlog file count/size metrics export. |
| mysql-test/suite/prometheus_exporter/t/format_validation.test | Perl-based structural validation of Prometheus text exposition output. |
| mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt | Test server options (enable plugin, bind/port) for endpoint test. |
| mysql-test/suite/prometheus_exporter/t/global_variables-master.opt | Test server options for global variables collector test. |
| mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt | Test server options for InnoDB metrics collector test. |
| mysql-test/suite/prometheus_exporter/t/replica_status-master.opt | Test server options for replica-status collector test. |
| mysql-test/suite/prometheus_exporter/t/binlog-master.opt | Test server options for binlog collector test. |
| mysql-test/suite/prometheus_exporter/t/format_validation-master.opt | Test server options for format-validation test. |
| mysql-test/suite/prometheus_exporter/r/basic.result | Recorded output for basic lifecycle test. |
| mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result | Recorded output for endpoint/all-collector smoke test. |
| mysql-test/suite/prometheus_exporter/r/global_variables.result | Recorded output for global variables collector test. |
| mysql-test/suite/prometheus_exporter/r/innodb_metrics.result | Recorded output for InnoDB metrics collector test. |
| mysql-test/suite/prometheus_exporter/r/replica_status.result | Recorded output for replica-status absence test. |
| mysql-test/suite/prometheus_exporter/r/binlog.result | Recorded output for binlog collector test. |
| mysql-test/suite/prometheus_exporter/r/format_validation.result | Recorded output for exposition-format validation test. |
| docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md | Internal design spec for the v2 exporter. |
| docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md | Internal step-by-step implementation plan and commands. |
```text
│ ┌──────────────���───────────────────────────────┐ │
│ │ prometheus_exporter plugin │ │
│ │ �� │
│ │ ┌──────────────┐ ┌────────────────────┐ │ │
```
The ASCII architecture diagram contains corrupted/invalid UTF-8 characters (e.g., "���", "��"), which makes the README render poorly and is likely an encoding artifact. Please replace those characters with plain ASCII/Unicode box-drawing characters and ensure the file is saved as valid UTF-8.
```cpp
// Switch to root security context
MYSQL_SECURITY_CONTEXT sc;
thd_get_security_context(srv_session_info_get_thd(session), &sc);
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
```
The exporter unconditionally switches the internal session to the "root" security context and ignores return codes from thd_get_security_context/security_context_lookup. This effectively runs the collectors with full privileges and can fail silently depending on account configuration. Consider using a dedicated low-privilege account/context (or the admin session service if appropriate), and handle errors explicitly (e.g., fail the scrape / increment errors_total when context switch fails).
Suggested change:

```diff
-  // Switch to root security context
-  MYSQL_SECURITY_CONTEXT sc;
-  thd_get_security_context(srv_session_info_get_thd(session), &sc);
-  security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
+  // Run collectors using the default internal session context. Do not
+  // unconditionally elevate this session to the root account.
```
```cpp
    "endpoint. Default 0.0.0.0.",
    nullptr, nullptr, "0.0.0.0");
```
Defaulting the bind address to 0.0.0.0 while also having no authentication/TLS means enabling the plugin can unintentionally expose internal server metrics on all interfaces. Consider changing the default to 127.0.0.1 (or requiring an explicit bind address when enabled) and logging a warning when binding to a non-loopback address.
Suggested change:

```diff
-    "endpoint. Default 0.0.0.0.",
-    nullptr, nullptr, "0.0.0.0");
+    "endpoint. Default 127.0.0.1.",
+    nullptr, nullptr, "127.0.0.1");
```
```cpp
static void prom_handle_error(void *ctx, uint, const char *, const char *) {
  auto *mc = static_cast<MetricsCollectorCtx *>(ctx);
  mc->error = true;
```
errors_total is never incremented: prom_handle_error()/innodb_handle_error()/replica_handle_error()/binlog_handle_error() only set per-collector flags, but those flags are never checked and g_ctx->errors_total is only read for SHOW STATUS. Please wire collector/scrape failures to increment errors_total (and/or surface an error comment in the scrape output) so the status variable reflects actual failures.
Suggested change:

```diff
-  mc->error = true;
+  mc->error = true;
+  if (g_ctx != nullptr) {
+    ++g_ctx->errors_total;
+  }
```
```cpp
  // Try to parse as a number; skip non-numeric values (ON/OFF etc.)
  char *end = nullptr;
  double val = strtod(mc->current_value.c_str(), &end);
  if (end == mc->current_value.c_str()) return 0;  // not numeric
  (void)val;  // we use the string representation directly
```
Numeric parsing accepts partially-numeric strings (e.g., "1abc"): strtod() is only checked for end==start, but the original (possibly invalid) string is emitted as the Prometheus value. This can generate invalid exposition output. Please require that the full string parses as a number (optionally allowing surrounding whitespace) before emitting.
```cpp
      // Check if numeric
      char *end = nullptr;
      strtod(val.c_str(), &end);
      if (end == val.c_str()) continue;  // not numeric, skip
      value_str = val;
    }
```
Replica metric numeric validation has the same partial-parse issue as prom_end_row: it only checks end==start after strtod(), so values like "123abc" would be emitted and break Prometheus parsing. Please require full numeric consumption (optionally allowing trailing whitespace) before emitting.
```cpp
    char *end = nullptr;
    long long sz = strtoll(bc->current_size.c_str(), &end, 10);
    if (end != bc->current_size.c_str()) {
      bc->total_size += sz;
```
Binlog size parsing uses strtoll() but only checks end!=start; partially-numeric strings (e.g., "123abc") would still be counted. Please validate that the entire field parses as an integer (optionally allowing whitespace) before adding it to total_size.
Suggested change:

```diff
-    char *end = nullptr;
-    long long sz = strtoll(bc->current_size.c_str(), &end, 10);
-    if (end != bc->current_size.c_str()) {
-      bc->total_size += sz;
+    const char *start = bc->current_size.c_str();
+    char *end = nullptr;
+    long long sz = strtoll(start, &end, 10);
+    if (end != start) {
+      while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r' ||
+             *end == '\f' || *end == '\v') {
+        ++end;
+      }
+      if (*end == '\0') {
+        bc->total_size += sz;
+      }
```
```cpp
    if (strncmp(buf, "GET /metrics", 12) == 0) {
      ctx->requests_total.fetch_add(1, std::memory_order_relaxed);
```
HTTP routing check is too permissive: strncmp("GET /metrics", 12) will also match paths like /metricsXYZ and doesn't ensure the first line is a valid request line. Consider parsing just the request line and matching the exact path "/metrics" (and method GET) before returning 200.
```text
--source include/not_windows.inc

--echo # On a non-replica server, no mysql_replica_ metrics should appear
--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
```
This command prints two lines when there are no matches: grep -c already outputs 0, but exits with status 1 which triggers || echo "0". Use || true (or similar) to keep the exit status non-fatal without duplicating output.
Suggested change:

```diff
---exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
+--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true
```
```cpp
    int client_fd = accept(ctx->listen_fd, nullptr, nullptr);
    if (client_fd < 0) continue;

    // Read HTTP request
    char buf[4096];
    ssize_t n = read(client_fd, buf, sizeof(buf) - 1);
    if (n <= 0) {
      close(client_fd);
      continue;
    }
```
The listener performs a blocking read() on the accepted client socket. Because the HTTP server is single-threaded, a client that connects and then stalls (or sends very slowly) can block the exporter indefinitely and prevent Prometheus scrapes (and potentially interfere with plugin shutdown). Consider setting a receive timeout (SO_RCVTIMEO), making the client socket non-blocking + polling for readability, or otherwise enforcing a small read deadline before parsing the request line.
Actionable comments posted: 9
🧹 Nitpick comments (12)
mysql-test/suite/prometheus_exporter/t/replica_status.test (2)
1-1: Consider documenting why Windows is excluded. Adding a brief comment explaining why `not_windows.inc` is needed (e.g., curl dependency, or Windows socket behavior differences) would help future maintainers understand the limitation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/replica_status.test` at line 1, Add a short inline comment above the include line referencing include/not_windows.inc that explains why Windows is excluded (for example: reliance on curl/Unix sockets, POSIX-only behavior in the Prometheus exporter tests, or known Windows networking incompatibilities); update the test file's top-of-file section (where the include "include/not_windows.inc" appears) to state the specific reason and any relevant bug/issue tracker IDs so future maintainers understand the limitation.
4-4: The `grep -c || echo "0"` pattern produces two lines of output. When `grep -c` finds no matches, it outputs "0" (the count) but exits with status 1. The `|| echo "0"` then also outputs "0", resulting in two "0" lines. This matches the result file, but the logic is confusing and redundant. Consider using `grep -c ... || true` to suppress the exit status while keeping the count output, or redirect grep's output and handle it differently.

Proposed fix for cleaner output:

```diff
 --echo # On a non-replica server, no mysql_replica_ metrics should appear
---exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
+--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true
```

And update the result file to expect a single "0":

```diff
 # On a non-replica server, no mysql_replica_ metrics should appear
 0
-0
 # But other metrics should still be present
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/replica_status.test` at line 4, The test currently uses the shell line '--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"', which prints "0" twice when no matches are found; replace that pattern with '--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true' so grep's count is preserved and non-zero exit is suppressed, and update the test result expectation to a single "0" output (remove the duplicate line) to keep output and assertions consistent.

plugin/prometheus_exporter/prometheus_exporter.cc (5)
403-410: Unused `has_row` field in `ReplicaStatusCtx`. The `has_row` field is set to `true` in `replica_start_row` (line 432) but never read. If not needed for future use, consider removing it for clarity.

♻️ Suggested cleanup:

```diff
 struct ReplicaStatusCtx {
   std::string *output;
   std::vector<std::string> col_names;
   std::vector<std::string> col_values;
   int col_index;
-  bool has_row;
   bool error;
 };
```

And remove the assignment in `replica_start_row`:

```diff
 static int replica_start_row(void *ctx) {
   auto *rc = static_cast<ReplicaStatusCtx *>(ctx);
   rc->col_values.clear();
   rc->col_values.resize(rc->col_names.size());
   rc->col_index = 0;
-  rc->has_row = true;
   return 0;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 403 - 410, The struct ReplicaStatusCtx contains an unused field has_row; remove the has_row member from ReplicaStatusCtx and also remove the assignment to it inside replica_start_row so there are no leftover writes to a deleted field; update any references (none expected) and rebuild to ensure no use remains of ReplicaStatusCtx::has_row.
851-863: Thread attribute not destroyed after use. `my_thread_attr_init(&attr)` allocates resources that should be released with `my_thread_attr_destroy(&attr)` after the thread is created, regardless of success or failure.

♻️ Suggested fix:

```diff
   my_thread_attr_t attr;
   my_thread_attr_init(&attr);
   my_thread_attr_setdetachstate(&attr, MY_THREAD_CREATE_JOINABLE);
   if (my_thread_create(&g_ctx->listener_thread, &attr,
                        prometheus_listener_thread, g_ctx) != 0) {
     LogPluginErr(ERROR_LEVEL, ER_LOG_PRINTF_MSG,
                  "Prometheus exporter: failed to create listener thread");
+    my_thread_attr_destroy(&attr);
     close(g_ctx->listen_fd);
     delete g_ctx;
     g_ctx = nullptr;
     return 1;
   }
+  my_thread_attr_destroy(&attr);
   LogPluginErr(INFORMATION_LEVEL, ER_LOG_PRINTF_MSG,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 851 - 863, The thread attribute initialized with my_thread_attr_init(&attr) is not destroyed; call my_thread_attr_destroy(&attr) after attempting to create the thread to free resources whether my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread, g_ctx) succeeds or fails; add my_thread_attr_destroy(&attr) before the error return path (after cleaning up g_ctx/listen_fd) and also after successful creation (before proceeding) so attr is always destroyed.
119-157: Consider using a hash set for gauge variable lookup. `is_gauge()` performs a linear search through ~30 items on every metric emission. While acceptable for current scale, a `std::unordered_set` would provide O(1) lookup and better scalability if the list grows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 119 - 157, Replace the linear-scan array lookup in is_gauge with an O(1) hash lookup: create a static std::unordered_set<std::string> (or std::string_view with storage) containing all gauge names (populate it once from gauge_variables or via an initializer list), ensure case-insensitive matching by storing lowercased names and lowercasing the input name before lookup (or use a custom case-insensitive hash/equality), and make initialization thread-safe by using a function-local static or std::call_once; then change is_gauge to query that unordered_set instead of iterating through gauge_variables.
52-55: Global service pointers lack internal linkage. `log_bi` and `log_bs` are declared without `static`, giving them external linkage. This could cause symbol collisions if another plugin uses the same names. These should be `static` like `reg_srv`.

♻️ Suggested fix:

```diff
 static SERVICE_TYPE(registry) *reg_srv = nullptr;
-SERVICE_TYPE(log_builtins) *log_bi = nullptr;
-SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;
+static SERVICE_TYPE(log_builtins) *log_bi = nullptr;
+static SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 52 - 55, The global service pointers log_bi and log_bs currently have external linkage; make them have internal linkage like reg_srv by declaring them static (i.e., add the static storage specifier to the SERVICE_TYPE(log_builtins) *log_bi and SERVICE_TYPE(log_builtins_string) *log_bs declarations) so they won’t collide with symbols from other plugins; keep SERVICE_TYPE(registry) *reg_srv unchanged.
703-707: Missing return value check on `security_context_lookup`. The return value of `security_context_lookup` is not checked. While failures are rare (the root user should always exist), defensive coding would log or handle the error case.

🛡️ Suggested defensive check:

```diff
   // Switch to root security context
   MYSQL_SECURITY_CONTEXT sc;
   thd_get_security_context(srv_session_info_get_thd(session), &sc);
-  security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
+  if (security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") != 0) {
+    srv_session_close(session);
+    srv_session_deinit_thread();
+    return "# Failed to set security context\n";
+  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 703 - 707, The call to security_context_lookup is unchecked; capture its return value and handle errors defensively: call thd_get_security_context(...) into MYSQL_SECURITY_CONTEXT sc as before, then if security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") returns non-zero, log the failure (using the plugin/mysql logging facility available in this module) with context (e.g., include the user/host) and abort the root switch path, returning an error code or cleaning up/exiting the function instead of continuing; ensure any resources are left consistent before returning.

docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md (3)
1093-1098: Numeric validation regex doesn't cover Prometheus special values. The regex `/^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/` doesn't match Prometheus special values like `+Inf`, `-Inf`, or `NaN`. While the plugin currently doesn't emit these values, you may want to extend the regex for future-proofing:

```perl
unless ($value =~ /^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/
        || $value =~ /^[+-]?Inf$/
        || $value eq 'NaN') {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 1093 - 1098, The numeric validation currently rejects Prometheus special values; update the check around the $value validation (the regex that tests $value before emitting the FORMAT ERROR for $name) to also accept `+Inf`, `-Inf`, and `NaN` — e.g., extend the condition so it passes if the original numeric regex matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and keep the existing error message and $errors++ path unchanged.
556-577: Minor inconsistency: `emplace_back` vs `push_back` in plan vs implementation. The plan uses `rc->col_names.emplace_back(field->col_name)` while the actual implementation at `prometheus_exporter.cc:423` uses `push_back`. Both are functionally equivalent here, but the plan should match the implementation for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 556 - 577, The plan and implementation are inconsistent: the plan's replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the actual implementation (prometheus_exporter.cc: replica_field_metadata) uses push_back; update the plan to match the implementation by replacing emplace_back with rc->col_names.push_back(field->col_name) (reference symbols: ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are consistent.
675-676: Outdated note about `#include <vector>`. The note mentions adding `#include <vector>` to the includes, but the actual implementation already includes it (line 46 of `prometheus_exporter.cc`). This note can be removed since it's already satisfied.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 675 - 676, Remove the outdated note that says "Note: add `#include <vector>`" from the markdown (the note in the plans doc), because the implementation already includes <vector> in prometheus_exporter.cc; simply delete that line so the doc no longer suggests adding the header.

docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md (2)
96-108: Add language specifier to fenced code block. The code block showing internal structure is missing a language specifier. Use `text` or `plaintext` to satisfy markdown linting and improve readability.

📝 Suggested fix:

````diff
-```
+```text
 1. Includes, logging refs, system vars, context struct (existing)
 2. Prometheus formatting helpers + gauge classification (existing, expanded)
````

As per coding guidelines from static analysis hints: [warning] 96-96: Fenced code blocks should have a language specified (MD040, fenced-code-language).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around lines 96 - 108, The fenced code block that lists the internal structure (the block starting with "1. Includes, logging refs, system vars, context struct" and containing the numbered list) is missing a language specifier; update that triple-backtick fence to include a language token such as text or plaintext (e.g., change ``` to ```text) so the markdown linter rule MD040 is satisfied and readability is improved.
165-202: Add language specifier to ASCII art code block. The ASCII architecture diagram is missing a language specifier. Use `text` for consistency with standard markdown practices.

📝 Suggested fix:

````diff
-```
+```text
 ┌─────────────────────────────────────────────────────┐
 │ VillageSQL Server │
````

As per coding guidelines from static analysis hints: [warning] 165-165: Fenced code blocks should have a language specified (MD040, fenced-code-language).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around lines 165 - 202, The fenced ASCII-art diagram (the triple-backtick code block that begins the big VillageSQL Server diagram) lacks a language specifier; update the opening fence to include the language "text" (i.e., change ``` to ```text) so the diagram is treated as plain text and satisfies MD040/fenced-code-language rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mysql-test/suite/prometheus_exporter/r/innodb_metrics.result`:
- Around line 5-6: The test's comment and metric TYPE disagree: the comment on
the check mentions a "counter" but the metric declaration lists
mysql_innodb_metrics_buffer_pool_reads as a "gauge"; update one to match the
other. Either change the comment to say "gauge" (so it documents the current
TYPE) or change the TYPE line to "counter" to assert the metric is a counter;
ensure you update the metric identifier mysql_innodb_metrics_buffer_pool_reads
accordingly so the test and comment are consistent.
In `@mysql-test/suite/prometheus_exporter/r/replica_status.result`:
- Around line 2-3: The result file replica_status.result contains two "0" lines
but the test replica_status.test only runs one grep -c "mysql_replica_" || echo
"0" command, so remove the extra "0" line from replica_status.result (leaving a
single "0" output) to match the single grep invocation; alternatively if the
test should produce two outputs, add the missing second grep -c "mysql_replica_"
(or equivalent) to replica_status.test so the number of grep outputs matches the
number of result lines.
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Around line 33-42: The code sets $expect_metric_name on a valid "# TYPE" line
but can overwrite it with a subsequent TYPE or leave it set at EOF without
reporting an error; update the TYPE-handling logic in the loop (the block that
matches /^# TYPE / and the similar block at lines 73-79) to first check if
$expect_metric_name is defined and, if so, emit a "missing sample for TYPE"
FORMAT ERROR and increment $errors before assigning a new value, and also add a
final check after the main input loop to emit the same error if
$expect_metric_name remains set at EOF so TYPE lines that never receive a
matching metric sample are flagged.
- Around line 59-66: The test currently skips numeric validation for any metric
whose name matches /^mysql_global_variables_/ which lets non-numeric exports
pass; to fix, remove the "$name =~ /^mysql_global_variables_/" exception in the
value-checking unless block so only the intended "_time$" exception remains
(i.e., keep the check for $name =~ /_time$/ but delete the global variables
whitelist), ensuring $value is enforced to match the numeric regex for
mysql_global_variables_* metrics; update the unless that inspects $value and
$name accordingly.
In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test`:
- Around line 7-10: The test currently only checks the presence of the metric
families; tighten the assertions to require exact Prometheus types by changing
the grep patterns to match the full type lines for the two metrics: require "^#
TYPE mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).
In `@mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test`:
- Around line 4-8: The test description in metrics_endpoint.test claims "all 5
collector prefixes" but the assertions do not check replica metrics; either add
an assertion that the Prometheus output contains the mysql_replica_ prefix
(e.g., verify presence of a sample metric starting with "mysql_replica_") and
ensure that assertion runs only in a replica-capable test setup, or change the
test description to accurately state the number of collectors currently asserted
(e.g., remove "5" or list the actual prefixes asserted). Update the file's top
comment and any related assertions that mention collectors so they remain
consistent with the new check or the narrowed description.
In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 100-101: The errors_total atomic is declared and exported but
never incremented; update the code to increment errors_total wherever request or
collection failures occur (for example in prom_handle_error, in the HTTP handler
when collect_metrics() fails, and inside session operation failure paths), or
remove the variable if you prefer not to expose it. Locate the symbols
errors_total and requests_total to find existing counters, then add
errors_total.fetch_add(1, std::memory_order_relaxed) (or equivalent) in
prom_handle_error and any branch that returns an error from collect_metrics() or
session operations so the metric accurately reflects failures. Ensure increments
happen before sending error responses so the exposed status variable stays
consistent.
- Around line 774-784: The read() on the accepted socket (client_fd from accept
in the listener loop) has no receive timeout and can block the single-threaded
exporter; before calling read() in the accept loop (where client_fd is
obtained), set a receive timeout on client_fd via setsockopt(client_fd,
SOL_SOCKET, SO_RCVTIMEO, ...) or alternatively poll() the client_fd with a short
timeout and only call read() when it's ready; update the code around accept/read
(symbols: accept, client_fd, read) to close the socket and continue on timeout
or error.
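The receive-timeout fix above can be sketched as follows (POSIX sockets assumed; the helper name `set_recv_timeout` is hypothetical, not the plugin's actual code):

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Give an accepted client socket a receive timeout so a stalled client
// cannot block the single-threaded accept loop. Once the timeout expires,
// read()/recv() fails with EAGAIN/EWOULDBLOCK and the listener can close
// the fd and continue accepting.
static int set_recv_timeout(int client_fd, long seconds) {
  struct timeval tv;
  std::memset(&tv, 0, sizeof(tv));
  tv.tv_sec = seconds;
  return setsockopt(client_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

The poll()-before-read alternative mentioned in the comment trades one syscall per request for finer-grained control; SO_RCVTIMEO is the smaller diff.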
In `@plugin/prometheus_exporter/README.md`:
- Around line 15-52: The README's ASCII architecture diagram (the
triple-backtick block containing "VillageSQL Server" and "prometheus_exporter
plugin") contains corrupted glyphs (`���`); fix it by replacing those
placeholders with the correct box-drawing characters (e.g., ─ ┌ ┐ └ ┘ │ ├ ┤ ┬ ┴)
or plain ASCII equivalents so the boxes/lines render correctly, and ensure the
file is saved with UTF-8 encoding; update all occurrences of `���` inside that
diagram block to the appropriate characters to restore the intended boxes and
connectors.
---
Nitpick comments:
In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md`:
- Around line 1093-1098: The numeric validation currently rejects Prometheus
special values; update the check around the $value validation (the regex that
tests $value before emitting the FORMAT ERROR for $name) to also accept `+Inf`,
`-Inf`, and `NaN` — e.g., extend the condition so it passes if the original
numeric regex matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and
keep the existing error message and $errors++ path unchanged.
- Around line 556-577: The plan and implementation are inconsistent: the plan's
replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the
actual implementation (prometheus_exporter.cc: replica_field_metadata) uses
push_back; update the plan to match the implementation by replacing emplace_back
with rc->col_names.push_back(field->col_name) (reference symbols:
ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are
consistent.
- Around line 675-676: Remove the outdated note that says "Note: add `#include
<vector>`" from the markdown (the note in the plans doc), because the
implementation already includes <vector> in prometheus_exporter.cc; simply
delete that line so the doc no longer suggests adding the header.
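The widened numeric check from the first item in this list can be sketched as below (in C++ for illustration; the test itself is Perl, and `is_valid_prom_value` is a hypothetical name):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// A Prometheus sample value is valid if it is one of the spec's special
// tokens (+Inf, -Inf, NaN) or parses fully as a float. strtod happens to
// accept inf/nan spellings too, but the explicit token check mirrors the
// regex extension the review describes.
static bool is_valid_prom_value(const std::string &v) {
  if (v == "NaN" || v == "+Inf" || v == "-Inf") return true;
  if (v.empty()) return false;
  char *end = nullptr;
  std::strtod(v.c_str(), &end);
  return end == v.c_str() + v.size();  // require full-string consumption
}
```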
In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md`:
- Around line 96-108: The fenced code block that lists the internal structure
(the block starting with "1. Includes, logging refs, system vars, context
struct" and containing the numbered list) is missing a language specifier;
update that triple-backtick fence to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the markdown linter rule MD040 is
satisfied and readability is improved.
- Around line 165-202: The fenced ASCII-art diagram (the triple-backtick code
block that begins the big VillageSQL Server diagram) lacks a language specifier;
update the opening fence to include the language "text" (i.e., change ``` to
```text) so the diagram is treated as plain text and satisfies
MD040/fenced-code-language rules.
In `@mysql-test/suite/prometheus_exporter/t/replica_status.test`:
- Line 1: Add a short inline comment above the include line referencing
include/not_windows.inc that explains why Windows is excluded (for example:
reliance on curl/Unix sockets, POSIX-only behavior in the Prometheus exporter
tests, or known Windows networking incompatibilities); update the test file's
top-of-file section (where the include "include/not_windows.inc" appears) to
state the specific reason and any relevant bug/issue tracker IDs so future
maintainers understand the limitation.
- Line 4: The test currently uses the shell line '--exec curl -s
http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"', which
prints "0" twice when no matches are found; replace that pattern with '--exec
curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true' so
grep's count is preserved and non-zero exit is suppressed, and update the test
result expectation to a single "0" output (remove the duplicate line) to keep
output and assertions consistent.
In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 403-410: The struct ReplicaStatusCtx contains an unused field
has_row; remove the has_row member from ReplicaStatusCtx and also remove the
assignment to it inside replica_start_row so there are no leftover writes to a
deleted field; update any references (none expected) and rebuild to ensure no
use remains of ReplicaStatusCtx::has_row.
- Around line 851-863: The thread attribute initialized with
my_thread_attr_init(&attr) is not destroyed; call my_thread_attr_destroy(&attr)
after attempting to create the thread to free resources whether
my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread,
g_ctx) succeeds or fails; add my_thread_attr_destroy(&attr) before the error
return path (after cleaning up g_ctx/listen_fd) and also after successful
creation (before proceeding) so attr is always destroyed.
- Around line 119-157: Replace the linear-scan array lookup in is_gauge with an
O(1) hash lookup: create a static std::unordered_set<std::string> (or
std::string_view with storage) containing all gauge names (populate it once from
gauge_variables or via an initializer list), ensure case-insensitive matching by
storing lowercased names and lowercasing the input name before lookup (or use a
custom case-insensitive hash/equality), and make initialization thread-safe by
using a function-local static or std::call_once; then change is_gauge to query
that unordered_set instead of iterating through gauge_variables.
- Around line 52-55: The global service pointers log_bi and log_bs currently
have external linkage; make them have internal linkage like reg_srv by declaring
them static (i.e., add the static storage specifier to the
SERVICE_TYPE(log_builtins) *log_bi and SERVICE_TYPE(log_builtins_string) *log_bs
declarations) so they won’t collide with symbols from other plugins; keep
SERVICE_TYPE(registry) *reg_srv unchanged.
- Around line 703-707: The call to security_context_lookup is unchecked; capture
its return value and handle errors defensively: call
thd_get_security_context(...) into MYSQL_SECURITY_CONTEXT sc as before, then if
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") returns
non-zero, log the failure (using the plugin/mysql logging facility available in
this module) with context (e.g., include the user/host) and abort the root
switch path—return an error code or clean up/exit the function instead of
continuing; ensure any resources are left consistent before returning.
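The `is_gauge` rework suggested earlier in this list might look like the sketch below (the gauge names shown are placeholders; the real set would be populated from `gauge_variables`):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// O(1) lookup instead of a linear scan. The function-local static makes
// initialization thread-safe in C++11 and later; names are stored lowercase
// and the input is lowercased before lookup for case-insensitive matching.
static bool is_gauge(const std::string &name) {
  static const std::unordered_set<std::string> gauges = {
      "threads_connected", "threads_running", "open_tables",
      // ... populated once from gauge_variables in the real plugin ...
  };
  std::string lower(name);
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return gauges.count(lower) != 0;
}
```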

ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5dfbf72a-b1ba-4ab5-821a-cb858172c6d2
📒 Files selected for processing (27)
- Docs/prometheus_exporter.md
- docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md
- docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md
- mysql-test/include/plugin.defs
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/r/binlog.result
- mysql-test/suite/prometheus_exporter/r/format_validation.result
- mysql-test/suite/prometheus_exporter/r/global_variables.result
- mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
- mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result
- mysql-test/suite/prometheus_exporter/r/replica_status.result
- mysql-test/suite/prometheus_exporter/t/basic.test
- mysql-test/suite/prometheus_exporter/t/binlog-master.opt
- mysql-test/suite/prometheus_exporter/t/binlog.test
- mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
- mysql-test/suite/prometheus_exporter/t/format_validation.test
- mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
- mysql-test/suite/prometheus_exporter/t/global_variables.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
- mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
- mysql-test/suite/prometheus_exporter/t/replica_status-master.opt
- mysql-test/suite/prometheus_exporter/t/replica_status.test
- plugin/prometheus_exporter/CMakeLists.txt
- plugin/prometheus_exporter/README.md
- plugin/prometheus_exporter/prometheus_exporter.cc
```
--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1

--echo # Verify a known gauge type metric exists (buffer_pool_size is 'value' type in InnoDB)
--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1
```
Assert exact Prometheus types for the two canonical InnoDB metrics.
These patterns currently only confirm metric family presence; they don’t enforce counter for buffer_pool_reads or gauge for buffer_pool_size.
🔧 Suggested tightening
```diff
---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads counter$" | head -1
---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size gauge$" | head -1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test` around lines 7 -
10, The test currently only checks the presence of the metric families; tighten
the assertions to require exact Prometheus types by changing the grep patterns
to match the full type lines for the two metrics: require "^# TYPE
mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).
…tness
- Default bind address changed from 0.0.0.0 to 127.0.0.1 (no auth endpoint)
- Increment errors_total in all four error handlers
- Strict HTTP path matching (reject /metricsXYZ)
- Add SO_RCVTIMEO on client sockets to prevent slow-client hangs
- Full strtod/strtoll consumption checks to reject partial numeric parses
- Check return values of security context functions
- Fix README corrupted box-drawing characters
- Test fixes: grep -c || true, orphaned TYPE detection, comment accuracy
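The error-counter policy in these fixes can be sketched as follows (names mirror the review's symbols `errors_total`/`prom_handle_error`; the surrounding HTTP handler code is assumed):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// File-scope atomic counter, incremented exactly once per failure path,
// *before* the error response is written, so the exported status variable
// never lags behind what the client observed.
static std::atomic<std::uint64_t> errors_total{0};

static void prom_handle_error(/* int client_fd, const char *msg */) {
  errors_total.fetch_add(1, std::memory_order_relaxed);
  // ... write the HTTP 500 response on client_fd here ...
}

static std::uint64_t errors_total_snapshot() {
  return errors_total.load(std::memory_order_relaxed);
}
```

Relaxed ordering suffices here because the counter is a monotonic statistic with no synchronization role.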
🧹 Nitpick comments (1)
plugin/prometheus_exporter/README.md (1)
15-15: Consider adding a language identifier to the fenced code block. While not critical for ASCII-art diagrams, adding a language identifier (e.g., `text`) would satisfy the markdownlint rule and is a minor documentation best practice.
📝 Proposed fix: change the opening fence of the diagram block from ``` to ```text.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/README.md` at line 15, Add a language identifier to the fenced code block in the README so markdownlint is satisfied; locate the ASCII-art fenced block delimited by ``` and change the opening fence to include a language token such as text (e.g., replace the opening ``` with ```text) so the block becomes a labeled code fence while keeping its contents unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugin/prometheus_exporter/README.md`:
- Line 15: Add a language identifier to the fenced code block in the README so
markdownlint is satisfied; locate the ASCII-art fenced block delimited by ```
and change the opening fence to include a language token such as text (e.g.,
replace the opening ``` with ```text) so the block becomes a labeled code fence
while keeping its contents unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bd21e74f-8e15-4551-a7a8-bc072326f157
📒 Files selected for processing (9)
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
- mysql-test/suite/prometheus_exporter/r/replica_status.result
- mysql-test/suite/prometheus_exporter/t/format_validation.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
- mysql-test/suite/prometheus_exporter/t/replica_status.test
- plugin/prometheus_exporter/README.md
- plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (7)
- mysql-test/suite/prometheus_exporter/r/replica_status.result
- mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/t/replica_status.test
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
- plugin/prometheus_exporter/prometheus_exporter.cc
🚧 Files skipped from review as they are similar to previous changes (1)
- mysql-test/suite/prometheus_exporter/t/format_validation.test
- Move atomic counters out of PrometheusContext to file-scope globals, eliminating the use-after-free race between SHOW STATUS callbacks and plugin deinit. Use memcpy instead of reinterpret_cast to avoid alignment issues in SHOW_FUNC callbacks.
- Use eventfd as the wakeup mechanism instead of closing listen_fd from another thread (fixes the fd-recycle hazard). close() now happens after join(), so the listener never sees closed fds.
- Use memory_order_acquire/release on the shutdown_requested flag.
- Add deinit_logging_service_for_plugin to all init failure paths to avoid leaking the logging service registration.
- NULL/empty check for bind_address to prevent a crash on inet_pton(NULL).

- Add SO_SNDTIMEO (5s) on accepted client sockets alongside the existing SO_RCVTIMEO. Prevents the listener thread from blocking forever when a client completes the TCP handshake but never reads.
- Switch write_full from write() to send() with MSG_NOSIGNAL to avoid SIGPIPE on half-closed connections, and retry on EINTR.
- Loop reading the HTTP request until the \r\n\r\n delimiter or the buffer is full, instead of a single read() that may return a partial request across TCP segments. Handles EINTR correctly.
- Length check before indexing buf[12] in the path matcher.

- Move srv_session_init_thread/deinit_thread to the listener thread lifetime instead of per-scrape. Per the MySQL service contract, these must be called once per physical thread, not per request.
- Add a prometheus_exporter_security_user sysvar to make the internal user configurable (was hardcoded to "root"). The default remains "root" for backward compatibility. Operators can switch to a least-privilege account (e.g. one granted PROCESS, REPLICATION CLIENT, and SELECT on information_schema) by setting this variable at startup. Note that mysql.session@localhost does NOT have PROCESS by default, so it cannot read information_schema.INNODB_METRICS without an additional grant.
- Improve the error message when security context setup fails.

- Binlog collector: fully validate numeric parsing (reject "123abc"); check for negative values and overflow before accumulating.
- Replica collector: require full-string numeric consumption in the strtod check, matching the fix previously applied to global_status.
- Log a WARNING at plugin init when bound to a non-loopback address, alerting operators to the security implications of exposing the unauthenticated /metrics endpoint to the network.
- Mark the unused parameter in global_variables_type with [[maybe_unused]].

- Remove section separator comments per the CLAUDE.md guideline.
- Remove the g_ctx/plugin->data redundancy, keeping only one source of truth for the plugin context pointer.

- Add scrape_counter.test: captures requests_total before and after a fixed number of curl scrapes and asserts the exact delta. Also verifies scrape_duration_microseconds > 0 and errors_total == 0. Catches regressions where status counters silently fail to update.
- Remove the format_validation carve-out that exempted global_variables and _time metrics from numeric validation. The plugin's prom_end_row already filters non-numeric values, so the validator should enforce the invariant strictly. Extend the regex to accept NaN/+Inf/-Inf per the Prometheus spec.
Review Fixes Applied

I've addressed the majority of the findings from the review above across 6 commits. 9 tests passing (up from 8 — added a new test).

Commits (most recent last)

What was fixed (by review item number)

Critical / High (8/8):
Medium (6/7):
Code Quality (3/6):
Test Coverage (2/7):

Not fixed (deferred to follow-ups)

The following items from the review are real but were judged out of scope for this PR:

Test run
Actionable comments posted: 1
🧹 Nitpick comments (2)
mysql-test/suite/prometheus_exporter/t/format_validation.test (1)
55-60: Tighten validation: reject metric samples without a preceding `# TYPE`.
Currently, a sample line is accepted when no expected TYPE is pending. For this plugin's contract, that weakens regression detection.
🔧 Proposed patch
```diff
-  if (defined $expect_metric_name && $name ne $expect_metric_name) {
-    print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n";
-    $errors++;
-  }
+  if (!defined $expect_metric_name) {
+    print "FORMAT ERROR: metric '$name' missing preceding TYPE line\n";
+    $errors++;
+  } elsif ($name ne $expect_metric_name) {
+    print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n";
+    $errors++;
+  }
   $expect_metric_name = undef;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/format_validation.test` around lines 55 - 60, The test currently allows a sample line when $expect_metric_name is undef; change the check in the sample-handling block so that if $expect_metric_name is defined you still compare $name to $expect_metric_name, but if it is not defined you treat this as a format error (increment $errors and print a clear message like "FORMAT ERROR: sample without preceding TYPE for '<name>'"). Update the block that references $expect_metric_name, $name, and $errors to implement this rejection and then clear $expect_metric_name as before.

mysql-test/suite/prometheus_exporter/t/scrape_counter.test (1)
16-18: Add an explicit timeout to prevent test hangs, but reconsider the fail-fast flag.
`curl -s` correctly returns success here since the test validates correctness through the `errors_total` counter assertion (lines 33–35). However, curl with no deadline can hang indefinitely. Adding `--max-time 5` is prudent.
Avoid the `-f` flag: it makes curl exit non-zero on HTTP 4xx/5xx, causing `--exec` to fail and abort the test before the verification checks (lines 24–35) run. The current design intentionally lets curl succeed and then validates via metrics — which is the correct approach for this regression test.
```diff
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/scrape_counter.test` around lines 16 - 18, The three test invocations that run curl (the lines with "--exec curl -s http://127.0.0.1:19110/metrics > /dev/null") should include an explicit timeout to avoid hanging: add "--max-time 5" to each curl invocation; do NOT add "-f" or any fail-fast flag since the test intentionally lets curl succeed and then verifies correctness via the errors_total counter in the subsequent assertions (lines around the errors_total check).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Line 19: The curl backtick invocation assigning to $output can hang; change
the command to include timeout and fail flags (e.g. add --max-time 5 --fail
--silent) and then check the shell exit status ($?) immediately after the
backtick call, failing the test if non-zero (die or ok/fail) so the test is
fail-fast on fetch errors; locate the assignment to $output (my $output = `curl
-s http://127.0.0.1:19109/metrics`), replace the command string with one that
includes --max-time and --fail, and add a conditional checking $? to handle and
report the error before proceeding.
---
Nitpick comments:
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Around line 55-60: The test currently allows a sample line when
$expect_metric_name is undef; change the check in the sample-handling block so
that if $expect_metric_name is defined you still compare $name to
$expect_metric_name, but if it is not defined you treat this as a format error
(increment $errors and print a clear message like "FORMAT ERROR: sample without
preceding TYPE for '<name>'"). Update the block that references
$expect_metric_name, $name, and $errors to implement this rejection and then
clear $expect_metric_name as before.
In `@mysql-test/suite/prometheus_exporter/t/scrape_counter.test`:
- Around line 16-18: The three test invocations that run curl (the lines with
"--exec curl -s http://127.0.0.1:19110/metrics > /dev/null") should include an
explicit timeout to avoid hanging: add "--max-time 5" to each curl invocation;
do NOT add "-f" or any fail-fast flag since the test intentionally lets curl
succeed and then verifies correctness via the errors_total counter in the
subsequent assertions (lines around the errors_total check).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0ca05812-c49f-4d12-9a5d-4794d009e036
📒 Files selected for processing (6)
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/r/scrape_counter.result
- mysql-test/suite/prometheus_exporter/t/format_validation.test
- mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
- mysql-test/suite/prometheus_exporter/t/scrape_counter.test
- plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (3)
- mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/r/scrape_counter.result
🚧 Files skipped from review as they are similar to previous changes (1)
- plugin/prometheus_exporter/prometheus_exporter.cc
```perl
use strict;
use warnings;

my $output = `curl -s http://127.0.0.1:19109/metrics`;
```
Add timeout and fail-fast handling to the curl fetch.
At Line 19, curl -s can block indefinitely and make this test flaky under listener stalls. Add --max-time, --fail, and explicit exit-status handling.
🔧 Proposed patch
```diff
-my $output = `curl -s http://127.0.0.1:19109/metrics`;
+my $output = `curl -sS --fail --max-time 5 http://127.0.0.1:19109/metrics`;
+if ($? != 0) {
+  print "ERROR: failed to fetch /metrics\n";
+  exit 1;
+}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-my $output = `curl -s http://127.0.0.1:19109/metrics`;
+my $output = `curl -sS --fail --max-time 5 http://127.0.0.1:19109/metrics`;
+if ($? != 0) {
+  print "ERROR: failed to fetch /metrics\n";
+  exit 1;
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test` at line 19,
The curl backtick invocation assigning to $output can hang; change the command
to include timeout and fail flags (e.g. add --max-time 5 --fail --silent) and
then check the shell exit status ($?) immediately after the backtick call,
failing the test if non-zero (die or ok/fail) so the test is fail-fast on fetch
errors; locate the assignment to $output (my $output = `curl -s
http://127.0.0.1:19109/metrics`), replace the command string with one that
includes --max-time and --fail, and add a conditional checking $? to handle and
report the error before proceeding.
- Add channel labels to replica metrics for multi-source replication
  - Named channels emit samples with channel="..." label
  - Default unnamed channel remains unlabeled
  - Seconds_Behind_Source=NULL exported as NaN
- Improve error handling and HTTP status codes
  - Return HTTP 500 on scrape failures instead of false-success 200
  - Increment Prometheus_exporter_errors_total on failures
  - Report specific error messages in the response body
- Add new test coverage
  - replica_channels: multi-source replication with two channels
  - replica_null_lag: verify NaN for stopped replicas
  - scrape_error: verify HTTP 500 on security context failure
- General hardening
  - Use loose- prefixed variables in test configs
  - Add have_prometheus_exporter_plugin.inc include
  - Validate Prometheus format in format_validation test
  - Check for duplicate TYPE lines
  - Restrict build to Linux only (IF(LINUX) in CMakeLists.txt)
  - Update documentation noting future dynamic variable support

Add comprehensive Doxygen-style comments to plugin source:
- File-level documentation describing architecture, metric sources, security notes, and platform requirements
- Documentation for plugin configuration variables:
  - prom_enabled: HTTP endpoint enable flag
  - prom_port: TCP port with valid range 1024-65535
  - prom_bind_address: IPv4 binding requirement (numeric only)
  - prom_security_user: internal session account for metric queries
- Documentation for plugin operational metrics:
  - g_requests_total: total scrapes served
  - g_errors_total: total scrape errors
  - g_last_scrape_duration_us: last scrape duration in microseconds

- Fix double-counting: make_scrape_error() no longer increments g_errors_total since collector failures already increment via command_failed() callbacks. Early failures (server unavailable, session open, security context) now increment directly.
- Make binlog collection non-fatal: if SHOW BINARY LOGS fails (e.g. when log_bin=OFF), skip binlog metrics but still return other collected data with HTTP 200 instead of failing the entire scrape.
Actionable comments posted: 1
🧹 Nitpick comments (3)
mysql-test/suite/prometheus_exporter/t/replica_channels.test (1)
28-31: `|| true` on these grep calls can mask real regressions.
Piping `|| true` onto these counting greps means that if the grep utility itself fails (e.g., missing file, shell/env issue) the test will silently print nothing instead of failing. For the counting lines, `grep -c` already returns success whenever it finds ≥1 match and the count is captured in the diff against the `.result` file — so the `|| true` only helps in the "zero matches" case, which is exactly the case the test should flag. Consider dropping `|| true` so a stalled/failing scrape is caught here rather than hidden behind an unexpected "0" in the result file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/replica_channels.test` around lines 28 - 31, Remove the trailing "|| true" from the grep commands in the replica_channels.test snippet (the lines invoking grep -c '^# TYPE mysql_replica_read_source_log_pos gauge$' and the grep -c checks for mysql_replica_read_source_log_pos with channel names) so that grep failures (missing file or env errors) cause the test to fail instead of being masked; leave the rest of the grep -c usage unchanged so zero-match semantics are still reported by the test harness.

plugin/prometheus_exporter/prometheus_exporter.cc (2)
1003-1021: `read_http_request` can return on a partial request line.
The early-exit at line 1017 (`strstr(buf, "\r\n") != nullptr && total >= 13`) stops as soon as any `\r\n` is seen and 13 bytes have been read, which is less than the shortest valid HTTP/1.1 request (`"GET / HTTP/1.1\r\n"` is 16 bytes including CRLF). More importantly, the method/URI prefix check at line 1077 relies on having read at least through `"GET /metrics"` (13 chars including the trailing space/`\r`), but a pathological peer could send `"GET /met\r\n..."` in one segment: `total` would be ≥13 only after more bytes arrive, so this is OK in practice, but the shortcut is fragile and method-specific.
Recommend simplifying to: loop until `strstr(buf, "\r\n\r\n")` is found, `recv` returns ≤0, or `max_len-1` is filled. The 5s `SO_RCVTIMEO` already bounds the wait, and reading the full headers removes a class of parser edge cases if the dispatch logic ever grows (HEAD, query strings, trailing paths, etc.).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 1003 - 1021, The read_http_request function currently returns early when any "\r\n" is seen plus a minimal byte count, which can produce partial request lines; change read_http_request to stop only when one of these conditions occurs: recv returns <= 0 (client closed or error), the buffer is full (total >= max_len-1), or the full header terminator "\r\n\r\n" is found; remove the early-exit check that looks for a single "\r\n" and the length >= 13 heuristic so the loop always reads complete request headers before returning.
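A sketch of the simplified read loop described above (POSIX sockets assumed; an illustration of the technique, not the plugin's exact code):

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Keep reading until the full header terminator arrives, the peer closes,
// or the buffer fills. An SO_RCVTIMEO set on fd bounds the total wait.
static bool read_http_request(int fd, char *buf, size_t max_len) {
  size_t total = 0;
  while (total < max_len - 1) {
    ssize_t n = recv(fd, buf + total, max_len - 1 - total, 0);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return false;                  // timeout expiry or hard error
    }
    if (n == 0) break;  // peer closed before finishing the request
    total += static_cast<size_t>(n);
    buf[total] = '\0';
    if (std::strstr(buf, "\r\n\r\n") != nullptr) return true;  // full headers
  }
  buf[total] = '\0';
  return false;  // buffer full or closed without a complete request
}
```

With the whole header block in hand, the dispatch code can match methods and paths without worrying about segment boundaries.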
575-583: `std::to_string(double)` produces lossy fixed 6-decimal output.

`std::to_string(double)` formats with `%f` and 6 digits after the decimal point, so e.g. `1.5` becomes `"1.500000"` and very small/large values lose precision or print as `0.000000`. For the numeric replica fields currently collected, `SHOW REPLICA STATUS` primarily returns integers delivered through `replica_get_integer`/`replica_get_longlong`, so this rarely bites today — but if any double-valued column is ever added (or the server delivers a numeric as double), the scrape will silently truncate.

Consider using `snprintf` with `%g`/`%.17g` or `std::to_chars` to preserve precision and avoid trailing zeros:

```cpp
char tmp[64];
int n = snprintf(tmp, sizeof(tmp), "%.17g", value);
rc->col_values[rc->col_index].assign(tmp, n > 0 ? n : 0);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 575 - 583, In replica_get_double, avoid lossy std::to_string(double) output; format the double with a precision-preserving routine (e.g. snprintf(tmp, sizeof tmp, "%.17g", value) or std::to_chars) and assign the resulting string to rc->col_values[rc->col_index] (handling snprintf return <= 0) instead of std::to_string; keep the existing rc->col_is_null[rc->col_index] = false and rc->col_index++ logic unchanged and use the function name replica_get_double to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 949-950: The empty if ( !collect_binlog(session, output) ) { } is
swallowing errors but still lets collect_binlog/command_failed increment
g_errors_total; modify binlog error handling so the "no binary logging" case is
treated as non-fatal: update binlog_handle_error (or add a flag returned by
collect_binlog) to detect ER_NO_BINARY_LOGGING and avoid incrementing
g_errors_total (or pass a silence flag into command_failed to skip accounting
for optional collectors), ensure collect_binlog returns a clear
success/expected-failure value, and replace the empty if body with a short
comment explaining the intentional swallow to avoid future confusion (reference:
collect_binlog, binlog_handle_error, command_failed, g_errors_total,
ER_NO_BINARY_LOGGING).
📒 Files selected for processing (30)
- Docs/prometheus_exporter.md
- mysql-test/suite/prometheus_exporter/inc/have_prometheus_exporter_plugin.inc
- mysql-test/suite/prometheus_exporter/r/replica_channels.result
- mysql-test/suite/prometheus_exporter/r/replica_null_lag.result
- mysql-test/suite/prometheus_exporter/r/scrape_error.result
- mysql-test/suite/prometheus_exporter/t/binlog-master.opt
- mysql-test/suite/prometheus_exporter/t/binlog.test
- mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
- mysql-test/suite/prometheus_exporter/t/format_validation.test
- mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
- mysql-test/suite/prometheus_exporter/t/global_variables.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
- mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
- mysql-test/suite/prometheus_exporter/t/replica_channels-slave.opt
- mysql-test/suite/prometheus_exporter/t/replica_channels.cnf
- mysql-test/suite/prometheus_exporter/t/replica_channels.test
- mysql-test/suite/prometheus_exporter/t/replica_null_lag-slave.opt
- mysql-test/suite/prometheus_exporter/t/replica_null_lag.cnf
- mysql-test/suite/prometheus_exporter/t/replica_null_lag.test
- mysql-test/suite/prometheus_exporter/t/replica_status-master.opt
- mysql-test/suite/prometheus_exporter/t/replica_status.test
- mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
- mysql-test/suite/prometheus_exporter/t/scrape_counter.test
- mysql-test/suite/prometheus_exporter/t/scrape_error-master.opt
- mysql-test/suite/prometheus_exporter/t/scrape_error.test
- plugin/prometheus_exporter/CMakeLists.txt
- plugin/prometheus_exporter/README.md
- plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (22)
- mysql-test/suite/prometheus_exporter/t/replica_null_lag.cnf
- mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
- mysql-test/suite/prometheus_exporter/inc/have_prometheus_exporter_plugin.inc
- mysql-test/suite/prometheus_exporter/t/binlog-master.opt
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
- mysql-test/suite/prometheus_exporter/r/replica_channels.result
- mysql-test/suite/prometheus_exporter/t/scrape_error-master.opt
- mysql-test/suite/prometheus_exporter/t/replica_status.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
- mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
- mysql-test/suite/prometheus_exporter/r/scrape_error.result
- Docs/prometheus_exporter.md
- plugin/prometheus_exporter/CMakeLists.txt
- mysql-test/suite/prometheus_exporter/t/global_variables.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
- mysql-test/suite/prometheus_exporter/t/scrape_counter.test
- mysql-test/suite/prometheus_exporter/t/replica_channels.cnf
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
- mysql-test/suite/prometheus_exporter/t/replica_channels-slave.opt
- mysql-test/suite/prometheus_exporter/t/replica_null_lag-slave.opt
- mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
- mysql-test/suite/prometheus_exporter/t/binlog.test
🚧 Files skipped from review as they are similar to previous changes (1)
- mysql-test/suite/prometheus_exporter/t/replica_status-master.opt
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes applied successfully. Fixed 1 file(s) based on 1 unresolved review comment. Files modified:
Commit: The changes have been pushed to the
Time taken:
Actionable comments posted: 1
🧹 Nitpick comments (3)
plugin/prometheus_exporter/prometheus_exporter.cc (3)
575-583: `std::to_string(double)` uses fixed 6-decimal precision.

`std::to_string` formats doubles with `"%f"` (6 decimals, no exponent), which both loses precision for large/small values and is technically locale-sensitive. For the replica columns actually consumed here (Seconds_Behind_Source, positions, log space), values effectively arrive as integers via `replica_get_integer`/`replica_get_longlong`, so this path is rarely hit in practice — but if it is, output like `1234567890.000000` or locale-formatted numbers could reach Prometheus.

Prefer `snprintf` with `"%.17g"` (or `std::to_chars` if available) to get a round-trippable, locale-independent representation.

♻️ Suggested change
```diff
 static int replica_get_double(void *ctx, double value, uint32) {
   auto *rc = static_cast<ReplicaStatusCtx *>(ctx);
   if (rc->col_index < static_cast<int>(rc->col_values.size())) {
     rc->col_is_null[rc->col_index] = false;
-    rc->col_values[rc->col_index] = std::to_string(value);
+    char tmp[64];
+    snprintf(tmp, sizeof(tmp), "%.17g", value);
+    rc->col_values[rc->col_index] = tmp;
   }
   rc->col_index++;
   return 0;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 575 - 583, In replica_get_double replace the locale-sensitive std::to_string(value) with a round‑trippable, locale‑independent conversion (e.g. use snprintf(buffer, sizeof buffer, "%.17g", value) or std::to_chars if available) when writing into rc->col_values[rc->col_index]; keep the existing null/index handling with rc->col_is_null and rc->col_index but format the double using "%.17g" (or to_chars) to avoid 6-decimal fixed output and locale issues.
476-503: Error-counting pathway is correct but subtle — consider a comment.

`command_failed` deliberately only increments `g_errors_total` when `command_fail && !callback_error`, because every `*_handle_error` callback already increments it. This invariant is non-obvious (especially with `binlog_handle_error` skipping `ER_NO_BINARY_LOGGING`, which then flows here as `command_fail=true, callback_error=true` and avoids a spurious count). A one-line comment on `command_failed` documenting "handlers own the increment; we only cover the fail-without-callback case" would prevent future regressions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 476 - 503, Add a one-line explanatory comment above the function command_failed stating that error-counting is intentionally split: per-command callbacks (e.g., any *_handle_error like binlog_handle_error) are responsible for incrementing g_errors_total, so command_failed only increments the global counter in the specific case of command_fail && !callback_error to avoid double-counting (and to allow cases like binlog_handle_error which sets callback_error=true to suppress the increment). This documents the invariant and prevents future regressions.
1011-1029: `read_http_request` uses a magic-number early break.

Line 1025's `total >= 13` appears to target a minimal "GET /metrics\r\n" line; since the `\r\n\r\n` check one line above only matters for full headers, this heuristic allows breaking out on any request line followed by a bare `\r\n`. It's not incorrect (headers are optional for HTTP/1.0-ish clients and you only dispatch on the request line), just fragile if the dispatch logic changes later to care about headers. A brief comment explaining the intent, or requiring `\r\n\r\n` outright with a short overall timeout, would be clearer.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 1011 - 1029, The early-break heuristic in read_http_request (the check using total >= 13) is a fragile magic-number expecting a minimal request line; remove that heuristic and require the full header terminator ("\r\n\r\n") to consider the request complete, and implement an overall read timeout instead of relying on that size check (use recv timeout via setsockopt(SO_RCVTIMEO) or select/poll before recv to enforce a global timeout); update or remove the comment around the strstr checks and the total >= 13 condition accordingly so only strstr(buf, "\r\n\r\n") controls completion and timeouts are handled by socket-level timeout logic.
📒 Files selected for processing (1)
plugin/prometheus_exporter/prometheus_exporter.cc
```cpp
static void *prometheus_listener_thread(void *arg) {
  auto *ctx = static_cast<PrometheusContext *>(arg);

  // Initialize srv_session thread-local state once for this physical thread.
  // Per the MySQL session service contract, this must be called once per
  // thread that will use the session service, not once per request.
  if (srv_session_init_thread(ctx->plugin_ref) != 0) {
    LogPluginErrMsg(ERROR_LEVEL, ER_LOG_PRINTF_MSG,
                    "Prometheus exporter: failed to init session thread");
    return nullptr;
  }
```
Silent failure mode when srv_session_init_thread fails.
If srv_session_init_thread() fails at thread start, the listener exits immediately and logs an error, but prometheus_exporter_init() has already returned 0 and the listen socket is bound. The kernel queues incoming SYNs but nothing ever calls accept(), so Prometheus scrapes will hang until client-side timeout with no clear diagnostic on the MySQL side (and Prometheus_exporter_requests_total stays at 0).
Consider coordinating the thread-start result back to init (e.g., via a promise/condition variable) and failing plugin init in that case, or at least closing listen_fd before returning so clients get an immediate connection refused.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 1031 - 1041,
The listener thread (prometheus_listener_thread) currently returns on
srv_session_init_thread failure leaving the server socket (listen_fd stored on
PrometheusContext) open while prometheus_exporter_init already succeeded; modify
the startup sequence so the thread signals its initialization result back to
prometheus_exporter_init (e.g., add a condition variable/flag or promise on
PrometheusContext that the thread sets) and have prometheus_exporter_init wait
for that result and fail/clean up if the thread failed, or at minimum ensure
prometheus_listener_thread closes ctx->listen_fd before returning on error;
update related logic in prometheus_exporter_init and PrometheusContext to
support the coordination and cleanup.
Summary

- New MySQL daemon plugin (`plugin/prometheus_exporter/`)
- Serves a `/metrics` endpoint in Prometheus text exposition format via a built-in HTTP server
- Uses `srv_session` + `command_service` to execute SQL queries internally (no network, no auth overhead)
- Disabled by default; enabled with `--prometheus-exporter-enabled=ON`

Data Sources (5 collectors)

| Metric prefix | Source |
| --- | --- |
| `mysql_global_status_*` | `SHOW GLOBAL STATUS` |
| `mysql_global_variables_*` | `SHOW GLOBAL VARIABLES` |
| `mysql_innodb_metrics_*` | `information_schema.INNODB_METRICS` |
| `mysql_replica_*` | `SHOW REPLICA STATUS` |
| `mysql_binlog_*` | `SHOW BINARY LOGS` |

Configuration

Then:
`curl http://127.0.0.1:9104/metrics`

Design Decisions

- Queries run over an internal `srv_session`, opened once per Prometheus scrape
- If a collector's query is not applicable (e.g., `SHOW REPLICA STATUS` on a non-replica), that section is silently skipped

Test Plan

- MTR suite (`./mysql-test/mysql-test-run.pl --suite=prometheus_exporter --parallel=auto`)
  - `basic` -- plugin install/uninstall, system variables, status variables
  - `metrics_endpoint` -- HTTP endpoint, all 5 collector prefixes, 404 handling
  - `global_variables` -- SHOW GLOBAL VARIABLES collector output
  - `innodb_metrics` -- INNODB_METRICS collector with counter/gauge types
  - `replica_status` -- graceful absence on non-replica server
  - `binlog` -- binary log file count and size metrics
  - `format_validation` -- Perl-based structural validation of Prometheus exposition format
- Manual: `curl http://localhost:9104/metrics` returns valid Prometheus output

Summary by CodeRabbit

New Features

Documentation

Tests