
Embedded Prometheus metrics exporter plugin #1

Open
renecannao wants to merge 19 commits into main from prometheus

Conversation


@renecannao renecannao commented Apr 5, 2026

Summary

  • Add an embedded Prometheus metrics exporter as a MySQL daemon plugin (plugin/prometheus_exporter/)
  • Serves /metrics endpoint in Prometheus text exposition format via a built-in HTTP server
  • Uses srv_session + command_service to execute SQL queries internally (no network, no auth overhead)
  • Disabled by default; enabled via --prometheus-exporter-enabled=ON
  • Zero changes to existing MySQL/VillageSQL code outside the plugin directory and test suite

Data Sources (5 collectors)

Prefix                     Source                              Metrics
mysql_global_status_*      SHOW GLOBAL STATUS                  ~400 server status counters
mysql_global_variables_*   SHOW GLOBAL VARIABLES               Numeric config values
mysql_innodb_metrics_*     information_schema.INNODB_METRICS   ~200 InnoDB internals
mysql_replica_*            SHOW REPLICA STATUS                 Replication lag, thread state
mysql_binlog_*             SHOW BINARY LOGS                    File count, total size

Configuration

[mysqld]
plugin-load=prometheus_exporter=prometheus_exporter.so
prometheus-exporter-enabled=ON
prometheus-exporter-port=9104
prometheus-exporter-bind-address=127.0.0.1

Then: curl http://127.0.0.1:9104/metrics
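For illustration, a scrape of the endpoint configured above might return a fragment like the following (metric names follow the prefixes in the table; the values shown here are made up, not real output):

```
# TYPE mysql_global_status_threads_connected untyped
mysql_global_status_threads_connected 3
# TYPE mysql_global_variables_max_connections gauge
mysql_global_variables_max_connections 151
# TYPE mysql_binlog_file_count gauge
mysql_binlog_file_count 2
```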

Design Decisions

  • SQL queries over direct struct access: Uses stable SQL interfaces rather than internal server structs, making the plugin resilient to MySQL version changes during rebases
  • Single session per scrape: All collectors share one srv_session, opened once per Prometheus scrape
  • Graceful failure per collector: If a query fails (e.g., SHOW REPLICA STATUS on a non-replica), that section is silently skipped

Test Plan

  • 7 MTR tests pass (./mysql-test/mysql-test-run.pl --suite=prometheus_exporter --parallel=auto)
    • basic -- plugin install/uninstall, system variables, status variables
    • metrics_endpoint -- HTTP endpoint, all 5 collector prefixes, 404 handling
    • global_variables -- SHOW GLOBAL VARIABLES collector output
    • innodb_metrics -- INNODB_METRICS collector with counter/gauge types
    • replica_status -- graceful absence on non-replica server
    • binlog -- binary log file count and size metrics
    • format_validation -- Perl-based structural validation of Prometheus exposition format
  • Manual: curl http://localhost:9104/metrics returns valid Prometheus output
  • Manual: Prometheus can scrape the endpoint and display metrics

Summary by CodeRabbit

  • New Features

    • Prometheus exporter plugin: serves MySQL metrics at /metrics; configurable enablement, bind address, port, and security user; exposes exporter self‑metrics (requests, errors, scrape duration); Linux-only build.
  • Documentation

    • Added comprehensive docs and quick‑start examples for installing, configuring, and scraping the exporter (including replication labeling and numeric IPv4 bind-address requirement).
  • Tests

    • Added extensive test suite validating endpoint behavior, exposition format, metric families (status, variables, InnoDB, replication, binlog), scrape counters, and error handling.

Add a MySQL daemon plugin that serves Prometheus text exposition format
metrics via an embedded HTTP endpoint. Uses srv_session + command_service
to execute SHOW GLOBAL STATUS internally (no network, no auth).

Features:
- Disabled by default (--prometheus-exporter-enabled=ON to activate)
- Configurable port (default 9104) and bind address
- Exports ~400 global status variables as mysql_global_status_* metrics
- Gauge/untyped type classification for known variables
- Plugin self-monitoring: requests_total, errors_total, scrape_duration
- Clean shutdown via poll() + atomic flag

Includes MTR test suite (basic + metrics_endpoint) and design spec/plan
for v2 enhancements.
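The "clean shutdown via poll() + atomic flag" feature mentioned above can be sketched as follows; this is a minimal illustration of the pattern, with names invented here, and the socket setup and request handling assumed to live elsewhere:

```cpp
#include <atomic>
#include <poll.h>

// Stop flag set by plugin deinit; acquire/release ordering pairs the
// flag write with whatever state deinit tears down.
static std::atomic<bool> g_stop{false};

static void listener_loop(int listen_fd) {
  pollfd pfd{listen_fd, POLLIN, 0};
  while (!g_stop.load(std::memory_order_acquire)) {
    // Wake up every 500 ms so a deinit that sets g_stop is noticed
    // promptly even when no client ever connects.
    int ready = poll(&pfd, 1, 500);
    if (ready <= 0) continue;  // timeout or EINTR: re-check the flag
    // accept() + handle one HTTP request here ...
  }
}
```

The timeout keeps shutdown latency bounded without busy-waiting; a self-pipe or eventfd would avoid even that small delay, at the cost of extra plumbing.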
Extract the inline SHOW GLOBAL STATUS logic from collect_metrics() into
separate reusable functions: collect_name_value_query() as a generic
helper for 2-column queries, and collect_global_status() as the specific
collector. MetricsCollectorCtx now takes a configurable prefix and type
function, enabling additional collectors to be added cleanly.
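The refactor described above can be sketched roughly as follows; the field and function names here are inferred from the commit message, not copied from the plugin source:

```cpp
#include <cctype>
#include <functional>
#include <string>

// Sketch of a collector context with a configurable prefix and type
// function, reusable by any 2-column (name, value) query.
struct MetricsCollectorCtx {
  std::string *output;  // Prometheus text accumulated here
  std::string prefix;   // e.g. "mysql_global_status_"
  std::function<const char *(const std::string &)> type_fn;  // "gauge"/"counter"/"untyped"
  std::string current_name, current_value;
  bool error = false;
};

// Generic end-of-row handler: lowercase the name, prepend the prefix,
// and emit a TYPE line followed by the sample line.
static void emit_name_value(MetricsCollectorCtx *mc) {
  std::string metric = mc->prefix;
  for (char c : mc->current_name)
    metric += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  *mc->output += "# TYPE " + metric + " " + mc->type_fn(mc->current_name) + "\n";
  *mc->output += metric + " " + mc->current_value + "\n";
}
```

Each concrete collector would then just supply its own prefix and type function, which matches the extension mechanism the commit describes.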
Export all numeric MySQL server configuration variables as Prometheus
gauge metrics with the mysql_global_variables_ prefix. Non-numeric
values (strings, paths, ON/OFF) are automatically skipped by the
existing strtod check.
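A bare strtod end-pointer check accepts partially numeric strings such as "1abc" (the reviewers flag this elsewhere in this PR); a stricter full-string variant, shown here as a sketch rather than the plugin's actual code, would look like:

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Returns true only if the entire string parses as a number, optionally
// followed by whitespace. Unlike checking end != start alone, this
// rejects "1abc" while still accepting "123", "1.5e3", and " 42 ".
static bool is_fully_numeric(const std::string &s) {
  const char *start = s.c_str();
  char *end = nullptr;
  std::strtod(start, &end);
  if (end == start) return false;  // no digits consumed at all
  while (*end != '\0') {           // allow only trailing whitespace
    if (!std::isspace(static_cast<unsigned char>(*end))) return false;
    ++end;
  }
  return true;
}
```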
Add collector that exports InnoDB metrics from
information_schema.INNODB_METRICS with proper Prometheus type mapping
(InnoDB 'counter' -> Prometheus counter, everything else -> gauge).
Uses dedicated 4-column callbacks since INNODB_METRICS returns
NAME, SUBSYSTEM, TYPE, COUNT rather than simple name/value pairs.

Add column-name-aware parsing for SHOW REPLICA STATUS that exports
6 replication metrics as gauges: seconds_behind_source, io_running,
sql_running, relay_log_space, exec_source_log_pos, and
read_source_log_pos. Boolean fields (IO/SQL running) emit 1 for
"Yes", 0 otherwise. Gracefully emits nothing on non-replica servers.
Adds a binlog collector that runs SHOW BINARY LOGS and emits two
synthetic gauge metrics: mysql_binlog_file_count (number of binary
log files) and mysql_binlog_size_bytes_total (sum of all file sizes).
If binary logging is disabled the query errors silently and no
metrics are emitted.

- Update basic.test with inline documentation and clearer echo messages
- Update metrics_endpoint.test to verify all 5 collector prefixes
  (global status, global variables, InnoDB metrics, binlog)
- Add format_validation.test with Perl-based Prometheus exposition
  format validation (TYPE lines, metric names, value structure)
- Record all result files

- Full README with ASCII architecture diagram, configuration table,
  metric namespace reference, usage examples, type classification,
  and limitations
- Pointer doc in Docs/ directory
Copilot AI review requested due to automatic review settings April 5, 2026 05:59

coderabbitai Bot commented Apr 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a new Linux-only Prometheus Exporter daemon plugin that serves an HTTP /metrics endpoint, collects MySQL metrics via internal srv_sessions, exposes Prometheus text-format metrics, registers server variables/status counters, and includes build files, docs, and a comprehensive MTR test suite.

Changes

  • Documentation (Docs/prometheus_exporter.md, plugin/prometheus_exporter/README.md): New overview and detailed README describing plugin behavior, configuration, metric namespaces, usage, and operational limitations.
  • Plugin Source & Build (plugin/prometheus_exporter/CMakeLists.txt, plugin/prometheus_exporter/prometheus_exporter.cc): New Linux-guarded CMake target and daemon plugin implementation: listener thread, HTTP /metrics handling, internal srv_session metric collection across SHOWs/INNODB_METRICS/SHOW REPLICA STATUS/SHOW BINARY LOGS, formatting, lifecycle, security-user handling, and exported SHOW variables / system vars.
  • Test Harness Registration (mysql-test/include/plugin.defs): Added plugin entry for test harness integration and environment gating.
  • MTR Tests & Options (mysql-test/suite/prometheus_exporter/t/*.test, mysql-test/suite/prometheus_exporter/t/*-master.opt, mysql-test/suite/prometheus_exporter/t/*-slave.opt, mysql-test/suite/prometheus_exporter/t/*.cnf): New test cases and option files exercising install/uninstall, /metrics endpoint, format validation, global vars, InnoDB, binlog, replica scenarios (multi-channel), scrape counters, and error handling; includes test runtime config files.
  • Test Results / Fixtures / Includes (mysql-test/suite/prometheus_exporter/r/*.result, mysql-test/suite/prometheus_exporter/inc/have_prometheus_exporter_plugin.inc): New expected-result fixtures validating metric presence/types/values, scrape behavior, replica labeling and NaN handling, scrape error cases, and helper include to gate tests when plugin not available.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Listener as "Listener Thread (HTTP)"
    participant Session as "Internal srv_session"
    participant Collector as "Metric Collector/Formatter"

    Client->>Listener: TCP connect + GET /metrics
    Listener->>Listener: parse HTTP request
    alt path == /metrics
        Listener->>Session: init srv_session & set security user
        Session->>Session: run queries (SHOW GLOBAL STATUS, SHOW VARIABLES, INNODB_METRICS, SHOW REPLICA STATUS, SHOW BINARY LOGS)
        Session-->>Collector: return rows
        Collector->>Collector: classify types, convert values/labels, aggregate
        Collector-->>Listener: Prometheus text payload
        Listener->>Listener: update Prometheus_exporter_* counters/timers
        Listener->>Client: HTTP 200 + metrics payload
    else
        Listener->>Client: HTTP 404
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped a thread and opened a port,

Gathered SHOWs and numbers of sort.
Labels and gauges, counters in tow,
Scrapes counted thrice — a tiny show.
Metrics served warm, ready to go.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The PR title 'Embedded Prometheus metrics exporter plugin' directly and clearly describes the main change: adding a new Prometheus exporter plugin with embedded HTTP metrics endpoint.



Remove spec and plan docs from docs/superpowers/ -- these were
development-time artifacts. User-facing documentation lives in
plugin/prometheus_exporter/README.md.

Copilot AI left a comment


Pull request overview

Adds an embedded Prometheus /metrics HTTP exporter as a MySQL/VillageSQL daemon plugin, plus an accompanying MTR test suite and documentation, so Prometheus can scrape server metrics without an external sidecar.

Changes:

  • Introduces prometheus_exporter daemon plugin implementing an in-process HTTP server and 5 SQL-based collectors (status, variables, InnoDB metrics, replica status, binlogs).
  • Adds a new prometheus_exporter MTR suite with per-collector endpoint checks and a Perl-based exposition-format validator.
  • Adds plugin build wiring and end-user documentation (README + Docs pointer) plus internal design/spec docs.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.

Summary per file:
plugin/prometheus_exporter/prometheus_exporter.cc Implements the embedded HTTP server and SQL-driven metric collection/export.
plugin/prometheus_exporter/CMakeLists.txt Adds build rules to produce the prometheus_exporter plugin module.
plugin/prometheus_exporter/README.md Documents architecture, configuration, metric namespaces, and limitations.
Docs/prometheus_exporter.md Provides a Docs/ entry pointing to the plugin README.
mysql-test/include/plugin.defs Registers the plugin for MTR so it can be loaded in test runs.
mysql-test/suite/prometheus_exporter/t/basic.test Verifies install/uninstall lifecycle and sysvar/statusvar presence.
mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test Verifies /metrics endpoint behavior, multiple prefixes, and 404 handling.
mysql-test/suite/prometheus_exporter/t/global_variables.test Validates SHOW GLOBAL VARIABLES metrics export.
mysql-test/suite/prometheus_exporter/t/innodb_metrics.test Validates information_schema.INNODB_METRICS metrics export and typing.
mysql-test/suite/prometheus_exporter/t/replica_status.test Ensures replica metrics are absent on non-replica servers.
mysql-test/suite/prometheus_exporter/t/binlog.test Validates binlog file count/size metrics export.
mysql-test/suite/prometheus_exporter/t/format_validation.test Perl-based structural validation of Prometheus text exposition output.
mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt Test server options (enable plugin, bind/port) for endpoint test.
mysql-test/suite/prometheus_exporter/t/global_variables-master.opt Test server options for global variables collector test.
mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt Test server options for InnoDB metrics collector test.
mysql-test/suite/prometheus_exporter/t/replica_status-master.opt Test server options for replica-status collector test.
mysql-test/suite/prometheus_exporter/t/binlog-master.opt Test server options for binlog collector test.
mysql-test/suite/prometheus_exporter/t/format_validation-master.opt Test server options for format-validation test.
mysql-test/suite/prometheus_exporter/r/basic.result Recorded output for basic lifecycle test.
mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result Recorded output for endpoint/all-collector smoke test.
mysql-test/suite/prometheus_exporter/r/global_variables.result Recorded output for global variables collector test.
mysql-test/suite/prometheus_exporter/r/innodb_metrics.result Recorded output for InnoDB metrics collector test.
mysql-test/suite/prometheus_exporter/r/replica_status.result Recorded output for replica-status absence test.
mysql-test/suite/prometheus_exporter/r/binlog.result Recorded output for binlog collector test.
mysql-test/suite/prometheus_exporter/r/format_validation.result Recorded output for exposition-format validation test.
docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md Internal design spec for the v2 exporter.
docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md Internal step-by-step implementation plan and commands.


Comment thread plugin/prometheus_exporter/README.md Outdated
Comment on lines +19 to +22
│ ┌──────────────���───────────────────────────────┐ │
│ │ prometheus_exporter plugin │ │
│ │ �� │
│ │ ┌──────────────┐ ┌────────────────────┐ │ │

Copilot AI Apr 5, 2026


The ASCII architecture diagram contains corrupted/invalid UTF-8 characters (e.g., "���", "��"), which makes the README render poorly and is likely an encoding artifact. Please replace those characters with plain ASCII/Unicode box-drawing characters and ensure the file is saved as valid UTF-8.

Copilot uses AI. Check for mistakes.
Comment on lines +703 to +707
// Switch to root security context
MYSQL_SECURITY_CONTEXT sc;
thd_get_security_context(srv_session_info_get_thd(session), &sc);
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");


Copilot AI Apr 5, 2026


The exporter unconditionally switches the internal session to the "root" security context and ignores return codes from thd_get_security_context/security_context_lookup. This effectively runs the collectors with full privileges and can fail silently depending on account configuration. Consider using a dedicated low-privilege account/context (or the admin session service if appropriate), and handle errors explicitly (e.g., fail the scrape / increment errors_total when context switch fails).

Suggested change
-  // Switch to root security context
-  MYSQL_SECURITY_CONTEXT sc;
-  thd_get_security_context(srv_session_info_get_thd(session), &sc);
-  security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
+  // Run collectors using the default internal session context. Do not
+  // unconditionally elevate this session to the root account.

Comment on lines +80 to +81
"endpoint. Default 0.0.0.0.",
nullptr, nullptr, "0.0.0.0");

Copilot AI Apr 5, 2026


Defaulting the bind address to 0.0.0.0 while also having no authentication/TLS means enabling the plugin can unintentionally expose internal server metrics on all interfaces. Consider changing the default to 127.0.0.1 (or requiring an explicit bind address when enabled) and logging a warning when binding to a non-loopback address.

Suggested change
-             "endpoint. Default 0.0.0.0.",
-             nullptr, nullptr, "0.0.0.0");
+             "endpoint. Default 127.0.0.1.",
+             nullptr, nullptr, "127.0.0.1");


static void prom_handle_error(void *ctx, uint, const char *, const char *) {
auto *mc = static_cast<MetricsCollectorCtx *>(ctx);
mc->error = true;

Copilot AI Apr 5, 2026


errors_total is never incremented: prom_handle_error()/innodb_handle_error()/replica_handle_error()/binlog_handle_error() only set per-collector flags, but those flags are never checked and g_ctx->errors_total is only read for SHOW STATUS. Please wire collector/scrape failures to increment errors_total (and/or surface an error comment in the scrape output) so the status variable reflects actual failures.

Suggested change
-  mc->error = true;
+  mc->error = true;
+  if (g_ctx != nullptr) {
+    ++g_ctx->errors_total;
+  }

Comment on lines +204 to +208
// Try to parse as a number; skip non-numeric values (ON/OFF etc.)
char *end = nullptr;
double val = strtod(mc->current_value.c_str(), &end);
if (end == mc->current_value.c_str()) return 0; // not numeric
(void)val; // we use the string representation directly

Copilot AI Apr 5, 2026


Numeric parsing accepts partially-numeric strings (e.g., "1abc"): strtod() is only checked for end==start, but the original (possibly invalid) string is emitted as the Prometheus value. This can generate invalid exposition output. Please require that the full string parses as a number (optionally allowing surrounding whitespace) before emitting.

Comment on lines +482 to +487
// Check if numeric
char *end = nullptr;
strtod(val.c_str(), &end);
if (end == val.c_str()) continue; // not numeric, skip
value_str = val;
}

Copilot AI Apr 5, 2026


Replica metric numeric validation has the same partial-parse issue as prom_end_row: it only checks end==start after strtod(), so values like "123abc" would be emitted and break Prometheus parsing. Please require full numeric consumption (optionally allowing trailing whitespace) before emitting.

Comment on lines +570 to +573
char *end = nullptr;
long long sz = strtoll(bc->current_size.c_str(), &end, 10);
if (end != bc->current_size.c_str()) {
bc->total_size += sz;

Copilot AI Apr 5, 2026


Binlog size parsing uses strtoll() but only checks end!=start; partially-numeric strings (e.g., "123abc") would still be counted. Please validate that the entire field parses as an integer (optionally allowing whitespace) before adding it to total_size.

Suggested change
-    char *end = nullptr;
-    long long sz = strtoll(bc->current_size.c_str(), &end, 10);
-    if (end != bc->current_size.c_str()) {
-      bc->total_size += sz;
+    const char *start = bc->current_size.c_str();
+    char *end = nullptr;
+    long long sz = strtoll(start, &end, 10);
+    if (end != start) {
+      while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r' ||
+             *end == '\f' || *end == '\v') {
+        ++end;
+      }
+      if (*end == '\0') {
+        bc->total_size += sz;
+      }

Comment on lines +786 to +787
if (strncmp(buf, "GET /metrics", 12) == 0) {
ctx->requests_total.fetch_add(1, std::memory_order_relaxed);

Copilot AI Apr 5, 2026


HTTP routing check is too permissive: strncmp("GET /metrics", 12) will also match paths like /metricsXYZ and doesn't ensure the first line is a valid request line. Consider parsing just the request line and matching the exact path "/metrics" (and method GET) before returning 200.
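One way to implement the stricter routing suggested here, sketched under the assumption that buf holds the raw NUL-terminated request bytes (this helper name is invented, not from the plugin):

```cpp
#include <cstring>
#include <string>

// Parse only the HTTP request line (METHOD SP PATH SP VERSION) and require
// an exact method + path match, so "GET /metricsXYZ" is rejected while
// "GET /metrics HTTP/1.1" is accepted.
static bool is_metrics_request(const char *buf) {
  const char *eol = std::strpbrk(buf, "\r\n");
  std::string line = eol ? std::string(buf, eol - buf) : std::string(buf);
  size_t sp1 = line.find(' ');
  if (sp1 == std::string::npos) return false;
  size_t sp2 = line.find(' ', sp1 + 1);
  std::string method = line.substr(0, sp1);
  std::string path = (sp2 == std::string::npos)
                         ? line.substr(sp1 + 1)
                         : line.substr(sp1 + 1, sp2 - sp1 - 1);
  return method == "GET" && path == "/metrics";
}
```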

--source include/not_windows.inc

--echo # On a non-replica server, no mysql_replica_ metrics should appear
--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"

Copilot AI Apr 5, 2026


This command prints two lines when there are no matches: grep -c already outputs 0, but exits with status 1 which triggers || echo "0". Use || true (or similar) to keep the exit status non-fatal without duplicating output.

Suggested change
---exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
+--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true

Comment on lines +774 to +783
int client_fd = accept(ctx->listen_fd, nullptr, nullptr);
if (client_fd < 0) continue;

// Read HTTP request
char buf[4096];
ssize_t n = read(client_fd, buf, sizeof(buf) - 1);
if (n <= 0) {
close(client_fd);
continue;
}

Copilot AI Apr 5, 2026


The listener performs a blocking read() on the accepted client socket. Because the HTTP server is single-threaded, a client that connects and then stalls (or sends very slowly) can block the exporter indefinitely and prevent Prometheus scrapes (and potentially interfere with plugin shutdown). Consider setting a receive timeout (SO_RCVTIMEO), making the client socket non-blocking + polling for readability, or otherwise enforcing a small read deadline before parsing the request line.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (12)
mysql-test/suite/prometheus_exporter/t/replica_status.test (2)

1-1: Consider documenting why Windows is excluded.

Adding a brief comment explaining why not_windows.inc is needed (e.g., curl dependency, or Windows socket behavior differences) would help future maintainers understand the limitation.


4-4: The grep -c || echo "0" pattern produces two lines of output.

When grep -c finds no matches, it outputs "0" (the count) but exits with status 1. The || echo "0" then also outputs "0", resulting in two "0" lines. This matches the result file, but the logic is confusing and redundant.

Consider using grep -c ... || true to suppress the exit status while keeping the count output, or redirect grep's output and handle it differently.

Proposed fix for cleaner output
 --echo # On a non-replica server, no mysql_replica_ metrics should appear
---exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
+--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true

And update the result file to expect a single "0":

 # On a non-replica server, no mysql_replica_ metrics should appear
 0
-0
 # But other metrics should still be present
plugin/prometheus_exporter/prometheus_exporter.cc (5)

403-410: Unused has_row field in ReplicaStatusCtx.

The has_row field is set to true in replica_start_row (line 432) but never read. If not needed for future use, consider removing it for clarity.

♻️ Suggested cleanup
 struct ReplicaStatusCtx {
   std::string *output;
   std::vector<std::string> col_names;
   std::vector<std::string> col_values;
   int col_index;
-  bool has_row;
   bool error;
 };

And remove the assignment in replica_start_row:

 static int replica_start_row(void *ctx) {
   auto *rc = static_cast<ReplicaStatusCtx *>(ctx);
   rc->col_values.clear();
   rc->col_values.resize(rc->col_names.size());
   rc->col_index = 0;
-  rc->has_row = true;
   return 0;
 }

851-863: Thread attribute not destroyed after use.

my_thread_attr_init(&attr) allocates resources that should be released with my_thread_attr_destroy(&attr) after the thread is created, regardless of success or failure.

♻️ Suggested fix
   my_thread_attr_t attr;
   my_thread_attr_init(&attr);
   my_thread_attr_setdetachstate(&attr, MY_THREAD_CREATE_JOINABLE);

   if (my_thread_create(&g_ctx->listener_thread, &attr,
                        prometheus_listener_thread, g_ctx) != 0) {
     LogPluginErr(ERROR_LEVEL, ER_LOG_PRINTF_MSG,
                  "Prometheus exporter: failed to create listener thread");
+    my_thread_attr_destroy(&attr);
     close(g_ctx->listen_fd);
     delete g_ctx;
     g_ctx = nullptr;
     return 1;
   }
+  my_thread_attr_destroy(&attr);

   LogPluginErr(INFORMATION_LEVEL, ER_LOG_PRINTF_MSG,

119-157: Consider using a hash set for gauge variable lookup.

is_gauge() performs a linear search through ~30 items on every metric emission. While acceptable for current scale, a std::unordered_set would provide O(1) lookup and better scalability if the list grows.


52-55: Global service pointers lack internal linkage.

log_bi and log_bs are declared without static, giving them external linkage. This could cause symbol collisions if another plugin uses the same names. These should be static like reg_srv.

♻️ Suggested fix
 static SERVICE_TYPE(registry) *reg_srv = nullptr;
-SERVICE_TYPE(log_builtins) *log_bi = nullptr;
-SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;
+static SERVICE_TYPE(log_builtins) *log_bi = nullptr;
+static SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;

703-707: Missing return value check on security_context_lookup.

The return value of security_context_lookup is not checked. While failures are rare (root user should always exist), defensive coding would log or handle the error case.

🛡️ Suggested defensive check
   // Switch to root security context
   MYSQL_SECURITY_CONTEXT sc;
   thd_get_security_context(srv_session_info_get_thd(session), &sc);
-  security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
+  if (security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") != 0) {
+    srv_session_close(session);
+    srv_session_deinit_thread();
+    return "# Failed to set security context\n";
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 703 - 707,
The call to security_context_lookup is unchecked; capture its return value and
handle errors defensively: call thd_get_security_context(...) into
MYSQL_SECURITY_CONTEXT sc as before, then if security_context_lookup(sc, "root",
"localhost", "127.0.0.1", "") returns non-zero, log the failure (using the
plugin/mysql logging facility available in this module) with context (e.g.,
include the user/host) and abort the root switch path—return an error code or
clean up/exit the function instead of continuing; ensure any resources are left
consistent before returning.
docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md (3)

1093-1098: Numeric validation regex doesn't cover Prometheus special values.

The regex /^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/ doesn't match Prometheus special values like +Inf, -Inf, or NaN. While the plugin currently doesn't emit these values, you may want to extend the regex for future-proofing:

unless ($value =~ /^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/ ||
        $value =~ /^[+-]?Inf$/ ||
        $value eq 'NaN') {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines
1093 - 1098, The numeric validation currently rejects Prometheus special values;
update the check around the $value validation (the regex that tests $value
before emitting the FORMAT ERROR for $name) to also accept `+Inf`, `-Inf`, and
`NaN` — e.g., extend the condition so it passes if the original numeric regex
matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and keep the existing
error message and $errors++ path unchanged.

556-577: Minor inconsistency: emplace_back vs push_back in plan vs implementation.

The plan uses rc->col_names.emplace_back(field->col_name) while the actual implementation at prometheus_exporter.cc:423 uses push_back. Both are functionally equivalent here, but the plan should match the implementation for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 556
- 577, The plan and implementation are inconsistent: the plan's
replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the
actual implementation (prometheus_exporter.cc: replica_field_metadata) uses
push_back; update the plan to match the implementation by replacing emplace_back
with rc->col_names.push_back(field->col_name) (reference symbols:
ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are
consistent.

675-676: Outdated note about #include <vector>.

The note mentions adding #include <vector> to the includes, but the actual implementation already includes it (line 46 of prometheus_exporter.cc). This note can be removed since it's already satisfied.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 675
- 676, Remove the outdated note that says "Note: add `#include <vector>`" from
the markdown (the note in the plans doc), because the implementation already
includes <vector> in prometheus_exporter.cc; simply delete that line so the doc
no longer suggests adding the header.
docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md (2)

96-108: Add language specifier to fenced code block.

The code block showing internal structure is missing a language specifier. Use text or plaintext to satisfy markdown linting and improve readability.

📝 Suggested fix
-```
+```text
 1. Includes, logging refs, system vars, context struct        (existing)
 2. Prometheus formatting helpers + gauge classification        (existing, expanded)

As per coding guidelines from static analysis hints: [warning] 96-96: Fenced code blocks should have a language specified (MD040, fenced-code-language).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around
lines 96 - 108, The fenced code block that lists the internal structure (the
block starting with "1. Includes, logging refs, system vars, context struct" and
containing the numbered list) is missing a language specifier; update that
triple-backtick fence to include a language token such as text or plaintext
(e.g., change ``` to ```text) so the markdown linter rule MD040 is satisfied and
readability is improved.

165-202: Add language specifier to ASCII art code block.

The ASCII architecture diagram is missing a language specifier. Use text for consistency with standard markdown practices.

📝 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────┐
 │                  VillageSQL Server                   │

As per coding guidelines from static analysis hints: [warning] 165-165: Fenced code blocks should have a language specified (MD040, fenced-code-language).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around
lines 165 - 202, The fenced ASCII-art diagram (the triple-backtick code block
that begins the big VillageSQL Server diagram) lacks a language specifier;
update the opening fence to include the language "text" (i.e., change ``` to
```text) so the diagram is treated as plain text and satisfies
MD040/fenced-code-language rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mysql-test/suite/prometheus_exporter/r/innodb_metrics.result`:
- Around line 5-6: The test's comment and metric TYPE disagree: the comment on
the check mentions a "counter" but the metric declaration lists
mysql_innodb_metrics_buffer_pool_reads as a "gauge"; update one to match the
other. Either change the comment to say "gauge" (so it documents the current
TYPE) or change the TYPE line to "counter" to assert the metric is a counter;
ensure you update the metric identifier mysql_innodb_metrics_buffer_pool_reads
accordingly so the test and comment are consistent.

In `@mysql-test/suite/prometheus_exporter/r/replica_status.result`:
- Around line 2-3: The result file replica_status.result contains two "0" lines
but the test replica_status.test only runs one grep -c "mysql_replica_" || echo
"0" command, so remove the extra "0" line from replica_status.result (leaving a
single "0" output) to match the single grep invocation; alternatively if the
test should produce two outputs, add the missing second grep -c "mysql_replica_"
(or equivalent) to replica_status.test so the number of grep outputs matches the
number of result lines.

In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Around line 33-42: The code sets $expect_metric_name on a valid "# TYPE" line
but can overwrite it with a subsequent TYPE or leave it set at EOF without
reporting an error; update the TYPE-handling logic in the loop (the block that
matches /^# TYPE / and the similar block at lines 73-79) to first check if
$expect_metric_name is defined and, if so, emit a "missing sample for TYPE"
FORMAT ERROR and increment $errors before assigning a new value, and also add a
final check after the main input loop to emit the same error if
$expect_metric_name remains set at EOF so TYPE lines that never receive a
matching metric sample are flagged.
- Around line 59-66: The test currently skips numeric validation for any metric
whose name matches /^mysql_global_variables_/ which lets non-numeric exports
pass; to fix, remove the "$name =~ /^mysql_global_variables_/" exception in the
value-checking unless block so only the intended "_time$" exception remains
(i.e., keep the check for $name =~ /_time$/ but delete the global variables
whitelist), ensuring $value is enforced to match the numeric regex for
mysql_global_variables_* metrics; update the unless that inspects $value and
$name accordingly.

In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test`:
- Around line 7-10: The test currently only checks the presence of the metric
families; tighten the assertions to require exact Prometheus types by changing
the grep patterns to match the full type lines for the two metrics: require "^#
TYPE mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).

In `@mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test`:
- Around line 4-8: The test description in metrics_endpoint.test claims "all 5
collector prefixes" but the assertions do not check replica metrics; either add
an assertion that the Prometheus output contains the mysql_replica_ prefix
(e.g., verify presence of a sample metric starting with "mysql_replica_") and
ensure that assertion runs only in a replica-capable test setup, or change the
test description to accurately state the number of collectors currently asserted
(e.g., remove "5" or list the actual prefixes asserted). Update the file's top
comment and any related assertions that mention collectors so they remain
consistent with the new check or the narrowed description.

In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 100-101: The errors_total atomic is declared and exported but
never incremented; update the code to increment errors_total wherever request or
collection failures occur (for example in prom_handle_error, in the HTTP handler
when collect_metrics() fails, and inside session operation failure paths), or
remove the variable if you prefer not to expose it. Locate the symbols
errors_total and requests_total to find existing counters, then add
errors_total.fetch_add(1, std::memory_order_relaxed) (or equivalent) in
prom_handle_error and any branch that returns an error from collect_metrics() or
session operations so the metric accurately reflects failures. Ensure increments
happen before sending error responses so the exposed status variable stays
consistent.
- Around line 774-784: The read() on the accepted socket (client_fd from accept
in the listener loop) has no receive timeout and can block the single-threaded
exporter; before calling read() in the accept loop (where client_fd is
obtained), set a receive timeout on client_fd via setsockopt(client_fd,
SOL_SOCKET, SO_RCVTIMEO, ...) or alternatively poll() the client_fd with a short
timeout and only call read() when it's ready; update the code around accept/read
(symbols: accept, client_fd, read) to close the socket and continue on timeout
or error.

In `@plugin/prometheus_exporter/README.md`:
- Around line 15-52: The README's ASCII architecture diagram (the
triple-backtick block containing "VillageSQL Server" and "prometheus_exporter
plugin") contains corrupted glyphs (`���`); fix it by replacing those
placeholders with the correct box-drawing characters (e.g., ─ ┌ ┐ └ ┘ │ ├ ┤ ┬ ┴)
or plain ASCII equivalents so the boxes/lines render correctly, and ensure the
file is saved with UTF-8 encoding; update all occurrences of `���` inside that
diagram block to the appropriate characters to restore the intended boxes and
connectors.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md`:
- Around line 1093-1098: The numeric validation currently rejects Prometheus
special values; update the check around the $value validation (the regex that
tests $value before emitting the FORMAT ERROR for $name) to also accept `+Inf`,
`-Inf`, and `NaN` — e.g., extend the condition so it passes if the original
numeric regex matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and
keep the existing error message and $errors++ path unchanged.
- Around line 556-577: The plan and implementation are inconsistent: the plan's
replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the
actual implementation (prometheus_exporter.cc: replica_field_metadata) uses
push_back; update the plan to match the implementation by replacing emplace_back
with rc->col_names.push_back(field->col_name) (reference symbols:
ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are
consistent.
- Around line 675-676: Remove the outdated note that says "Note: add `#include
<vector>`" from the markdown (the note in the plans doc), because the
implementation already includes <vector> in prometheus_exporter.cc; simply
delete that line so the doc no longer suggests adding the header.

In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md`:
- Around line 96-108: The fenced code block that lists the internal structure
(the block starting with "1. Includes, logging refs, system vars, context
struct" and containing the numbered list) is missing a language specifier;
update that triple-backtick fence to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the markdown linter rule MD040 is
satisfied and readability is improved.
- Around line 165-202: The fenced ASCII-art diagram (the triple-backtick code
block that begins the big VillageSQL Server diagram) lacks a language specifier;
update the opening fence to include the language "text" (i.e., change ``` to
```text) so the diagram is treated as plain text and satisfies
MD040/fenced-code-language rules.

In `@mysql-test/suite/prometheus_exporter/t/replica_status.test`:
- Line 1: Add a short inline comment above the include line referencing
include/not_windows.inc that explains why Windows is excluded (for example:
reliance on curl/Unix sockets, POSIX-only behavior in the Prometheus exporter
tests, or known Windows networking incompatibilities); update the test file's
top-of-file section (where the include "include/not_windows.inc" appears) to
state the specific reason and any relevant bug/issue tracker IDs so future
maintainers understand the limitation.
- Line 4: The test currently uses the shell line '--exec curl -s
http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"', which
prints "0" twice when no matches are found; replace that pattern with '--exec
curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true' so
grep's count is preserved and non-zero exit is suppressed, and update the test
result expectation to a single "0" output (remove the duplicate line) to keep
output and assertions consistent.

In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 403-410: The struct ReplicaStatusCtx contains an unused field
has_row; remove the has_row member from ReplicaStatusCtx and also remove the
assignment to it inside replica_start_row so there are no leftover writes to a
deleted field; update any references (none expected) and rebuild to ensure no
use remains of ReplicaStatusCtx::has_row.
- Around line 851-863: The thread attribute initialized with
my_thread_attr_init(&attr) is not destroyed; call my_thread_attr_destroy(&attr)
after attempting to create the thread to free resources whether
my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread,
g_ctx) succeeds or fails; add my_thread_attr_destroy(&attr) before the error
return path (after cleaning up g_ctx/listen_fd) and also after successful
creation (before proceeding) so attr is always destroyed.
- Around line 119-157: Replace the linear-scan array lookup in is_gauge with an
O(1) hash lookup: create a static std::unordered_set<std::string> (or
std::string_view with storage) containing all gauge names (populate it once from
gauge_variables or via an initializer list), ensure case-insensitive matching by
storing lowercased names and lowercasing the input name before lookup (or use a
custom case-insensitive hash/equality), and make initialization thread-safe by
using a function-local static or std::call_once; then change is_gauge to query
that unordered_set instead of iterating through gauge_variables.
- Around line 52-55: The global service pointers log_bi and log_bs currently
have external linkage; make them have internal linkage like reg_srv by declaring
them static (i.e., add the static storage specifier to the
SERVICE_TYPE(log_builtins) *log_bi and SERVICE_TYPE(log_builtins_string) *log_bs
declarations) so they won’t collide with symbols from other plugins; keep
SERVICE_TYPE(registry) *reg_srv unchanged.
- Around line 703-707: The call to security_context_lookup is unchecked; capture
its return value and handle errors defensively: call
thd_get_security_context(...) into MYSQL_SECURITY_CONTEXT sc as before, then if
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") returns
non-zero, log the failure (using the plugin/mysql logging facility available in
this module) with context (e.g., include the user/host) and abort the root
switch path—return an error code or clean up/exit the function instead of
continuing; ensure any resources are left consistent before returning.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5dfbf72a-b1ba-4ab5-821a-cb858172c6d2

📥 Commits

Reviewing files that changed from the base of the PR and between 957d549 and ba7b2c0.

📒 Files selected for processing (27)
  • Docs/prometheus_exporter.md
  • docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md
  • docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md
  • mysql-test/include/plugin.defs
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/binlog.result
  • mysql-test/suite/prometheus_exporter/r/format_validation.result
  • mysql-test/suite/prometheus_exporter/r/global_variables.result
  • mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
  • mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result
  • mysql-test/suite/prometheus_exporter/r/replica_status.result
  • mysql-test/suite/prometheus_exporter/t/basic.test
  • mysql-test/suite/prometheus_exporter/t/binlog-master.opt
  • mysql-test/suite/prometheus_exporter/t/binlog.test
  • mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
  • mysql-test/suite/prometheus_exporter/t/global_variables.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/replica_status-master.opt
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • plugin/prometheus_exporter/CMakeLists.txt
  • plugin/prometheus_exporter/README.md
  • plugin/prometheus_exporter/prometheus_exporter.cc

Comment thread mysql-test/suite/prometheus_exporter/r/innodb_metrics.result Outdated
Comment thread mysql-test/suite/prometheus_exporter/r/replica_status.result Outdated
Comment thread mysql-test/suite/prometheus_exporter/t/format_validation.test
Comment thread mysql-test/suite/prometheus_exporter/t/format_validation.test Outdated
Comment on lines +7 to +10
--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1

--echo # Verify a known gauge type metric exists (buffer_pool_size is 'value' type in InnoDB)
--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1

⚠️ Potential issue | 🟡 Minor

Assert exact Prometheus types for the two canonical InnoDB metrics.

These patterns currently only confirm metric family presence; they don’t enforce counter for buffer_pool_reads or gauge for buffer_pool_size.

🔧 Suggested tightening
---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads counter$" | head -1

---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size gauge$" | head -1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test` around lines 7 -
10, The test currently only checks the presence of the metric families; tighten
the assertions to require exact Prometheus types by changing the grep patterns
to match the full type lines for the two metrics: require "^# TYPE
mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).

Comment thread mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
Comment thread plugin/prometheus_exporter/prometheus_exporter.cc Outdated
Comment thread plugin/prometheus_exporter/prometheus_exporter.cc Outdated
Comment thread plugin/prometheus_exporter/README.md
…tness

- Default bind address changed from 0.0.0.0 to 127.0.0.1 (no auth endpoint)
- Increment errors_total in all four error handlers
- Strict HTTP path matching (reject /metricsXYZ)
- Add SO_RCVTIMEO on client sockets to prevent slow-client hangs
- Full strtod/strtoll consumption checks to reject partial numeric parses
- Check return values of security context functions
- Fix README corrupted box-drawing characters
- Test fixes: grep -c || true, orphaned TYPE detection, comment accuracy

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
plugin/prometheus_exporter/README.md (1)

15-15: Consider adding a language identifier to the fenced code block.

While not critical for ASCII art diagrams, adding a language identifier (e.g., text) would satisfy the markdownlint rule and is a minor documentation best practice.

📝 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────┐
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/README.md` at line 15, Add a language identifier
to the fenced code block in the README so markdownlint is satisfied; locate the
ASCII-art fenced block delimited by ``` and change the opening fence to include
a language token such as text (e.g., replace the opening ``` with ```text) so
the block becomes a labeled code fence while keeping its contents unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@plugin/prometheus_exporter/README.md`:
- Line 15: Add a language identifier to the fenced code block in the README so
markdownlint is satisfied; locate the ASCII-art fenced block delimited by ```
and change the opening fence to include a language token such as text (e.g.,
replace the opening ``` with ```text) so the block becomes a labeled code fence
while keeping its contents unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bd21e74f-8e15-4551-a7a8-bc072326f157

📥 Commits

Reviewing files that changed from the base of the PR and between ba7b2c0 and f709eb0.

📒 Files selected for processing (9)
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
  • mysql-test/suite/prometheus_exporter/r/replica_status.result
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • plugin/prometheus_exporter/README.md
  • plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (7)
  • mysql-test/suite/prometheus_exporter/r/replica_status.result
  • mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • plugin/prometheus_exporter/prometheus_exporter.cc
🚧 Files skipped from review as they are similar to previous changes (1)
  • mysql-test/suite/prometheus_exporter/t/format_validation.test

- Move atomic counters out of PrometheusContext to file-scope globals,
  eliminating the use-after-free race between SHOW STATUS callbacks
  and plugin deinit. Use memcpy instead of reinterpret_cast to avoid
  alignment issues in SHOW_FUNC callbacks.
- Use eventfd as wakeup mechanism instead of closing listen_fd
  from another thread (fixes fd-recycle hazard). close() now happens
  after join(), so the listener never sees closed fds.
- Use memory_order_acquire/release on shutdown_requested flag.
- Add deinit_logging_service_for_plugin to all init failure paths
  to avoid leaking the logging service registration.
- NULL/empty check for bind_address to prevent crash on inet_pton(NULL).
- Add SO_SNDTIMEO (5s) on accepted client sockets alongside existing
  SO_RCVTIMEO. Prevents listener thread from blocking forever when
  a client completes TCP handshake but never reads.
- Switch write_full from write() to send() with MSG_NOSIGNAL to
  avoid SIGPIPE on half-closed connections, and retry on EINTR.
- Loop reading HTTP request until \r\n\r\n delimiter or buffer full,
  instead of a single read() that may return a partial request
  across TCP segments. Handles EINTR correctly.
- Length check before indexing buf[12] in path matcher.
- Move srv_session_init_thread/deinit_thread to listener thread lifetime
  instead of per-scrape. Per MySQL service contract, these must be
  called once per physical thread, not per request.

- Add prometheus_exporter_security_user sysvar to make the internal
  user configurable (was hardcoded to "root"). Default remains "root"
  for backward compatibility. Operators can switch to a least-privilege
  account (e.g. one granted PROCESS, REPLICATION CLIENT, and SELECT on
  information_schema) by setting this variable at startup. Note that
  mysql.session@localhost does NOT have PROCESS by default, so cannot
  read information_schema.INNODB_METRICS without an additional grant.

- Improve error message when security context setup fails.
- Binlog collector: fully validate numeric parsing (reject "123abc"),
  check for negative values and overflow before accumulating.
- Replica collector: require full-string numeric consumption in strtod
  check, matching the fix previously applied to global_status.
- Log a WARNING at plugin init when bound to a non-loopback address,
  alerting operators to the security implications of exposing the
  unauthenticated /metrics endpoint to the network.
- Mark unused parameter in global_variables_type with [[maybe_unused]].
- Remove section separator comments per CLAUDE.md guideline
- Remove g_ctx/plugin->data redundancy, keeping only one source of
  truth for the plugin context pointer
- Add scrape_counter.test: captures requests_total before and after
  a fixed number of curl scrapes, asserts exact delta. Also verifies
  scrape_duration_microseconds > 0 and errors_total == 0. Catches
  regressions where status counters silently fail to update.
- Remove format_validation carve-out that exempted global_variables
  and _time metrics from numeric validation. The plugin's prom_end_row
  already filters non-numeric values, so the validator should enforce
  the invariant strictly. Extend regex to accept NaN/+Inf/-Inf per
  Prometheus spec.
@renecannao
Author

Review Fixes Applied

I've addressed the majority of the findings from the review above across 6 commits. 9 tests passing (up from 8 — added a new scrape_counter test).

Commits (most recent last)

SHA What
8efd77f2d2a Critical safety: use-after-free race, unsafe shutdown, NULL bind_address, logging leak
f4041d3aed6 HTTP robustness: SO_SNDTIMEO, read loop for TCP segmentation, EINTR handling, MSG_NOSIGNAL
dccbd944818 srv_session lifecycle fix + configurable security user via new sysvar
9c388765b45 Medium items: binlog numeric validation, replica strict parse, non-loopback warning, unused param
d96ee346f5b Code quality: removed section separators (CLAUDE.md), removed g_ctx/plugin->data redundancy
d92a1fe775e Tests: new scrape_counter delta test, removed format_validation carve-out

What was fixed (by review item number)

Critical / High (8/8):

Medium (6/7):

Code Quality (3/6):

Test Coverage (2/7):

Not fixed (deferred to follow-ups)

The following items from the review are real but were judged out of scope for this PR:

Test run

prometheus_exporter.basic              pass
prometheus_exporter.binlog             pass
prometheus_exporter.format_validation  pass
prometheus_exporter.global_variables   pass
prometheus_exporter.innodb_metrics     pass
prometheus_exporter.metrics_endpoint   pass
prometheus_exporter.replica_status     pass
prometheus_exporter.scrape_counter     pass  (new)
shutdown_report                        pass
Completed: All 9 tests were successful.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
mysql-test/suite/prometheus_exporter/t/format_validation.test (1)

55-60: Tighten validation: reject metric samples without a preceding # TYPE.

Currently, a sample line is accepted when no expected TYPE is pending. For this plugin’s contract, that weakens regression detection.

🔧 Proposed patch
-    if (defined $expect_metric_name && $name ne $expect_metric_name) {
-      print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n";
-      $errors++;
-    }
+    if (!defined $expect_metric_name) {
+      print "FORMAT ERROR: metric '$name' missing preceding TYPE line\n";
+      $errors++;
+    } elsif ($name ne $expect_metric_name) {
+      print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n";
+      $errors++;
+    }
     $expect_metric_name = undef;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mysql-test/suite/prometheus_exporter/t/format_validation.test` around lines
55 - 60, The test currently allows a sample line when $expect_metric_name is
undef; change the check in the sample-handling block so that if
$expect_metric_name is defined you still compare $name to $expect_metric_name,
but if it is not defined you treat this as a format error (increment $errors and
print a clear message like "FORMAT ERROR: sample without preceding TYPE for
'<name>'"). Update the block that references $expect_metric_name, $name, and
$errors to implement this rejection and then clear $expect_metric_name as
before.
mysql-test/suite/prometheus_exporter/t/scrape_counter.test (1)

16-18: Add explicit timeout to prevent test hangs, but reconsider fail-fast flag.

curl -s correctly returns success here since the test validates correctness through the errors_total counter assertion (lines 33–35). However, curl with no deadline can hang indefinitely, so adding --max-time 5 is prudent.

Avoid the -f flag: it makes curl exit non-zero on HTTP 4xx/5xx, causing --exec to fail and abort the test before the verification checks (lines 24–35) run. The current design intentionally lets curl succeed and then validates via metrics, which is the correct approach for this regression test.

---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mysql-test/suite/prometheus_exporter/t/scrape_counter.test` around lines 16 -
18, The three test invocations that run curl (the lines with "--exec curl -s
http://127.0.0.1:19110/metrics > /dev/null") should include an explicit timeout
to avoid hanging: add "--max-time 5" to each curl invocation; do NOT add "-f" or
any fail-fast flag since the test intentionally lets curl succeed and then
verifies correctness via the errors_total counter in the subsequent assertions
(lines around the errors_total check).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ca05812-c49f-4d12-9a5d-4794d009e036

📥 Commits

Reviewing files that changed from the base of the PR and between f709eb0 and d92a1fe.

📒 Files selected for processing (6)
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/scrape_counter.result
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
  • mysql-test/suite/prometheus_exporter/t/scrape_counter.test
  • plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (3)
  • mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/scrape_counter.result
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugin/prometheus_exporter/prometheus_exporter.cc

use strict;
use warnings;

my $output = `curl -s http://127.0.0.1:19109/metrics`;
⚠️ Potential issue | 🟠 Major

Add timeout and fail-fast handling to the curl fetch.

At Line 19, curl -s can block indefinitely and make this test flaky under listener stalls. Add --max-time, --fail, and explicit exit-status handling.

🔧 Proposed patch
-my $output = `curl -s http://127.0.0.1:19109/metrics`;
+my $output = `curl -sS --fail --max-time 5 http://127.0.0.1:19109/metrics`;
+if ($? != 0) {
+  print "ERROR: failed to fetch /metrics\n";
+  exit 1;
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mysql-test/suite/prometheus_exporter/t/format_validation.test` at line 19,
The curl backtick invocation assigning to $output can hang; change the command
to include timeout and fail flags (e.g. add --max-time 5 --fail --silent) and
then check the shell exit status ($?) immediately after the backtick call,
failing the test if non-zero (die or ok/fail) so the test is fail-fast on fetch
errors; locate the assignment to $output (my $output = `curl -s
http://127.0.0.1:19109/metrics`), replace the command string with one that
includes --max-time and --fail, and add a conditional checking $? to handle and
report the error before proceeding.

- Add channel labels to replica metrics for multi-source replication
  - Named channels emit samples with channel="..." label
  - Default unnamed channel remains unlabeled
  - Seconds_Behind_Source=NULL exported as NaN

- Improve error handling and HTTP status codes
  - Return HTTP 500 on scrape failures instead of false-success 200
  - Increment Prometheus_exporter_errors_total on failures
  - Report specific error messages in response body

- Add new test coverage
  - replica_channels: multi-source replication with two channels
  - replica_null_lag: verify NaN for stopped replicas
  - scrape_error: verify HTTP 500 on security context failure

- General hardening
  - Use loose- prefixed variables in test configs
  - Add have_prometheus_exporter_plugin.inc include
  - Validate Prometheus format in format_validation test
  - Check for duplicate TYPE lines

- Restrict build to Linux only (IF(LINUX) in CMakeLists.txt)
- Update documentation noting future dynamic variable support

Add comprehensive Doxygen-style comments to plugin source:
- File-level documentation describing architecture, metric sources,
  security notes, and platform requirements
- Documentation for plugin configuration variables:
  - prom_enabled: HTTP endpoint enable flag
  - prom_port: TCP port with valid range 1024-65535
  - prom_bind_address: IPv4 binding requirement (numeric only)
  - prom_security_user: internal session account for metric queries
- Documentation for plugin operational metrics:
  - g_requests_total: total scrapes served
  - g_errors_total: total scrape errors
  - g_last_scrape_duration_us: last scrape duration in microseconds
- Fix double-counting: make_scrape_error() no longer increments
  g_errors_total since collector failures already increment via
  command_failed() callbacks. Early failures (server unavailable,
  session open, security context) now increment directly.

- Make binlog collection non-fatal: if SHOW BINARY LOGS fails (e.g.
  when log_bin=OFF), skip binlog metrics but still return other
  collected data with HTTP 200 instead of failing entire scrape.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
mysql-test/suite/prometheus_exporter/t/replica_channels.test (1)

28-31: || true on these grep calls can mask real regressions.

Piping || true to these counting greps means that if the grep utility itself fails (e.g., missing file, shell/env issue) the test will silently print nothing instead of failing. For the counting lines, grep -c already returns success whenever it finds ≥1 match and the count is captured in the diff against the .result file — so the || true only helps in the "zero matches" case, which is exactly the case the test should flag. Consider dropping || true so a stalled/failing scrape is caught here rather than hidden behind an unexpected "0" in the result file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mysql-test/suite/prometheus_exporter/t/replica_channels.test` around lines 28
- 31, Remove the trailing "|| true" from the grep commands in the
replica_channels.test snippet (the lines invoking grep -c '^# TYPE
mysql_replica_read_source_log_pos gauge$' and the grep -c checks for
mysql_replica_read_source_log_pos with channel names) so that grep failures
(missing file or env errors) cause the test to fail instead of being masked;
leave the rest of the grep -c usage unchanged so zero-match semantics are still
reported by the test harness.
plugin/prometheus_exporter/prometheus_exporter.cc (2)

1003-1021: read_http_request can return on a partial request line.

The early-exit at line 1017 (strstr(buf, "\r\n") != nullptr && total >= 13) stops as soon as any \r\n is seen and 13 bytes have been read, which is less than the shortest valid HTTP/1.1 request ("GET / HTTP/1.1\r\n" is 16 bytes including CRLF). More importantly, the method/URI prefix check at line 1077 relies on having read at least through "GET /metrics" (13 chars including the trailing space/\r), but a pathological peer could send "GET /met\r\n..." in one segment: total would be ≥13 only after more bytes arrive, so this is OK in practice, but the shortcut is fragile and method-specific.

Recommend simplifying to: loop until strstr(buf, "\r\n\r\n") is found, recv returns ≤0, or max_len-1 is filled. The 5s SO_RCVTIMEO already bounds the wait, and reading the full headers removes a class of parser edge cases if the dispatch logic ever grows (HEAD, query strings, trailing paths, etc.).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 1003 - 1021,
The read_http_request function currently returns early when any "\r\n" is seen
plus a minimal byte count, which can produce partial request lines; change
read_http_request to stop only when one of these conditions occurs: recv returns
<= 0 (client closed or error), the buffer is full (total >= max_len-1), or the
full header terminator "\r\n\r\n" is found; remove the early-exit check that
looks for a single "\r\n" and the length >= 13 heuristic so the loop always
reads complete request headers before returning.

575-583: std::to_string(double) produces locale-independent but lossy 6-decimal output.

std::to_string(double) formats with %f and 6 digits after the decimal point, so e.g. 1.5 becomes "1.500000" and very small/large values lose precision or print as 0.000000. For the numeric replica fields currently collected, SHOW REPLICA STATUS primarily returns integers delivered through replica_get_integer/replica_get_longlong, so this rarely bites today — but if any double-valued column is ever added (or the server delivers a numeric as double), the scrape will silently truncate.

Consider using snprintf with %g/%.17g or std::to_chars<double> to preserve precision and avoid trailing zeros:

char tmp[64];
int n = snprintf(tmp, sizeof(tmp), "%.17g", value);
rc->col_values[rc->col_index].assign(tmp, n > 0 ? n : 0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 575 - 583, In
replica_get_double, avoid lossy std::to_string(double) output; format the double
with a precision-preserving routine (e.g. snprintf(tmp, sizeof tmp, "%.17g",
value) or std::to_chars) and assign the resulting string to
rc->col_values[rc->col_index] (handling snprintf return <= 0) instead of
std::to_string; keep the existing rc->col_is_null[rc->col_index] = false and
rc->col_index++ logic unchanged and use the function name replica_get_double to
locate the change.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4bf0374a-60ec-43cb-b5df-5715cf434c95

📥 Commits

Reviewing files that changed from the base of the PR and between d92a1fe and 30322bd.

📒 Files selected for processing (30)
  • Docs/prometheus_exporter.md
  • mysql-test/suite/prometheus_exporter/inc/have_prometheus_exporter_plugin.inc
  • mysql-test/suite/prometheus_exporter/r/replica_channels.result
  • mysql-test/suite/prometheus_exporter/r/replica_null_lag.result
  • mysql-test/suite/prometheus_exporter/r/scrape_error.result
  • mysql-test/suite/prometheus_exporter/t/binlog-master.opt
  • mysql-test/suite/prometheus_exporter/t/binlog.test
  • mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
  • mysql-test/suite/prometheus_exporter/t/global_variables.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/replica_channels-slave.opt
  • mysql-test/suite/prometheus_exporter/t/replica_channels.cnf
  • mysql-test/suite/prometheus_exporter/t/replica_channels.test
  • mysql-test/suite/prometheus_exporter/t/replica_null_lag-slave.opt
  • mysql-test/suite/prometheus_exporter/t/replica_null_lag.cnf
  • mysql-test/suite/prometheus_exporter/t/replica_null_lag.test
  • mysql-test/suite/prometheus_exporter/t/replica_status-master.opt
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
  • mysql-test/suite/prometheus_exporter/t/scrape_counter.test
  • mysql-test/suite/prometheus_exporter/t/scrape_error-master.opt
  • mysql-test/suite/prometheus_exporter/t/scrape_error.test
  • plugin/prometheus_exporter/CMakeLists.txt
  • plugin/prometheus_exporter/README.md
  • plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (22)
  • mysql-test/suite/prometheus_exporter/t/replica_null_lag.cnf
  • mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
  • mysql-test/suite/prometheus_exporter/inc/have_prometheus_exporter_plugin.inc
  • mysql-test/suite/prometheus_exporter/t/binlog-master.opt
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
  • mysql-test/suite/prometheus_exporter/r/replica_channels.result
  • mysql-test/suite/prometheus_exporter/t/scrape_error-master.opt
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
  • mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
  • mysql-test/suite/prometheus_exporter/r/scrape_error.result
  • Docs/prometheus_exporter.md
  • plugin/prometheus_exporter/CMakeLists.txt
  • mysql-test/suite/prometheus_exporter/t/global_variables.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • mysql-test/suite/prometheus_exporter/t/scrape_counter.test
  • mysql-test/suite/prometheus_exporter/t/replica_channels.cnf
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/replica_channels-slave.opt
  • mysql-test/suite/prometheus_exporter/t/replica_null_lag-slave.opt
  • mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
  • mysql-test/suite/prometheus_exporter/t/binlog.test
🚧 Files skipped from review as they are similar to previous changes (1)
  • mysql-test/suite/prometheus_exporter/t/replica_status-master.opt

Comment thread plugin/prometheus_exporter/prometheus_exporter.cc

coderabbitai Bot commented Apr 19, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 1 unresolved review comment.

Files modified:

  • plugin/prometheus_exporter/prometheus_exporter.cc

Commit: d1352f9fe7991a0a972b2726b30290364ac4118e

The changes have been pushed to the prometheus branch.

Time taken: 5m 12s

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
plugin/prometheus_exporter/prometheus_exporter.cc (3)

575-583: std::to_string(double) uses fixed 6-decimal precision.

std::to_string formats doubles with "%f" (6 decimals, no exponent), which both loses precision for large/small values and is technically locale-sensitive. For the replica columns actually consumed here (Seconds_Behind_Source, positions, log space), values effectively arrive as integers via replica_get_integer/replica_get_longlong, so this path is rarely hit in practice — but if it is, output like 1234567890.000000 or locale-formatted numbers could reach Prometheus.

Prefer snprintf with "%.17g" (or std::to_chars if available) to get a round-trippable, locale-independent representation.

♻️ Suggested change
 static int replica_get_double(void *ctx, double value, uint32) {
   auto *rc = static_cast<ReplicaStatusCtx *>(ctx);
   if (rc->col_index < static_cast<int>(rc->col_values.size())) {
     rc->col_is_null[rc->col_index] = false;
-    rc->col_values[rc->col_index] = std::to_string(value);
+    char tmp[64];
+    snprintf(tmp, sizeof(tmp), "%.17g", value);
+    rc->col_values[rc->col_index] = tmp;
   }
   rc->col_index++;
   return 0;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 575 - 583, In
replica_get_double replace the locale-sensitive std::to_string(value) with a
round‑trippable, locale‑independent conversion (e.g. use snprintf(buffer, sizeof
buffer, "%.17g", value) or std::to_chars if available) when writing into
rc->col_values[rc->col_index]; keep the existing null/ index handling with
rc->col_is_null and rc->col_index but format the double using "%.17g" (or
to_chars) to avoid 6-decimal fixed output and locale issues.

476-503: Error-counting pathway is correct but subtle — consider a comment.

command_failed deliberately only increments g_errors_total when command_fail && !callback_error, because every *_handle_error callback already increments it. This invariant is non-obvious (especially with binlog_handle_error skipping ER_NO_BINARY_LOGGING, which then flows here as command_fail=true, callback_error=true and avoids a spurious count). A one-line comment on command_failed documenting "handlers own the increment; we only cover the fail-without-callback case" would prevent future regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 476 - 503,
Add a one-line explanatory comment above the function command_failed stating
that error-counting is intentionally split: per-command callbacks (e.g., any
*_handle_error like binlog_handle_error) are responsible for incrementing
g_errors_total, so command_failed only increments the global counter in the
specific case of command_fail && !callback_error to avoid double-counting (and
to allow cases like binlog_handle_error which sets callback_error=true to
suppress the increment). This documents the invariant and prevents future
regressions.

1011-1029: read_http_request uses a magic-number early break.

Line 1025's total >= 13 appears to target a minimal "GET /metrics\r\n" request line, and the \r\n\r\n check one line above only matters once full headers arrive; together they allow breaking out on any request line followed by a bare \r\n. It's not incorrect (headers are optional for HTTP/1.0-ish clients and you only dispatch on the request line), just fragile if the dispatch logic later changes to care about headers. A brief comment explaining the intent, or requiring \r\n\r\n outright with a short overall timeout, would be clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 1011 - 1029,
The early-break heuristic in read_http_request (the check using total >= 13) is
a fragile magic-number expecting a minimal request line; remove that heuristic
and require the full header terminator ("\r\n\r\n") to consider the request
complete, and implement an overall read timeout instead of relying on that size
check (use recv timeout via setsockopt(SO_RCVTIMEO) or select/poll before recv
to enforce a global timeout); update or remove the comment around the strstr
checks and the total >= 13 condition accordingly so only strstr(buf, "\r\n\r\n")
controls completion and timeouts are handled by socket-level timeout logic.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f5999e60-94c6-48bb-b931-139719a5bdf2

📥 Commits

Reviewing files that changed from the base of the PR and between 30322bd and d1352f9.

📒 Files selected for processing (1)
  • plugin/prometheus_exporter/prometheus_exporter.cc

Comment on lines +1031 to +1041
static void *prometheus_listener_thread(void *arg) {
  auto *ctx = static_cast<PrometheusContext *>(arg);

  // Initialize srv_session thread-local state once for this physical thread.
  // Per the MySQL session service contract, this must be called once per
  // thread that will use the session service, not once per request.
  if (srv_session_init_thread(ctx->plugin_ref) != 0) {
    LogPluginErrMsg(ERROR_LEVEL, ER_LOG_PRINTF_MSG,
                    "Prometheus exporter: failed to init session thread");
    return nullptr;
  }

⚠️ Potential issue | 🟡 Minor

Silent failure mode when srv_session_init_thread fails.

If srv_session_init_thread() fails at thread start, the listener exits immediately and logs an error, but prometheus_exporter_init() has already returned 0 and the listen socket is bound. The kernel queues incoming SYNs but nothing ever calls accept(), so Prometheus scrapes will hang until client-side timeout with no clear diagnostic on the MySQL side (and Prometheus_exporter_requests_total stays at 0).

Consider coordinating the thread-start result back to init (e.g., via a promise/condition variable) and failing plugin init in that case, or at least closing listen_fd before returning so clients get an immediate connection refused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 1031 - 1041,
The listener thread (prometheus_listener_thread) currently returns on
srv_session_init_thread failure leaving the server socket (listen_fd stored on
PrometheusContext) open while prometheus_exporter_init already succeeded; modify
the startup sequence so the thread signals its initialization result back to
prometheus_exporter_init (e.g., add a condition variable/flag or promise on
PrometheusContext that the thread sets) and have prometheus_exporter_init wait
for that result and fail/clean up if the thread failed, or at minimum ensure
prometheus_listener_thread closes ctx->listen_fd before returning on error;
update related logic in prometheus_exporter_init and PrometheusContext to
support the coordination and cleanup.
