[BlackboxBenchmarking] Add daily cron job to aggregate fuzzer stats #5265

Open
dylanjew wants to merge 10 commits into dylanj/builtin-index from dylanj/aggregate-stats

@dylanjew (Collaborator) commented May 4, 2026

Adds a cron job that aggregates fuzzer stats into a daily BigQuery table, fuzzer_stats.daily_stats.

Context

We will use this to benchmark our blackbox fuzzers. Previously, we couldn't easily join the fuzzing hours from BigQuery with the bugs filed by ClusterFuzz in our dashboards. We need a separate aggregated table because the fuzzer_stats JobRun tables live in a separate dataset per fuzzer, and we can't simply query across all of those datasets in BigQuery or Plx.

The cron job defaults to yesterday's stats, so it can run after the stats are loaded into BigQuery. It also takes a date flag so we can backfill individual days as necessary.
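The date handling described above could look roughly like the sketch below. It is illustrative only and not the code in this PR: the function name `parse_target_date`, the flag spelling `--date`, the YYYY-MM-DD format, and the use of UTC for "today" are all assumptions.

```python
import argparse
import datetime


def parse_target_date(argv, today=None):
    """Return the day to aggregate: --date if given, else the day before `today`.

    Hypothetical helper; the flag name and date format are assumptions.
    """
    parser = argparse.ArgumentParser(
        description="Aggregate fuzzer stats for one day.")
    parser.add_argument(
        "--date",
        type=datetime.date.fromisoformat,  # expects YYYY-MM-DD
        default=None,
        help="Day to aggregate (defaults to yesterday).",
    )
    args = parser.parse_args(argv)
    if args.date is not None:
        # Explicit date: used for backfilling a specific day.
        return args.date
    # No flag: fall back to yesterday (UTC is an assumption here).
    today = today or datetime.datetime.now(datetime.timezone.utc).date()
    return today - datetime.timedelta(days=1)
```

Passing `today` explicitly keeps the default branch testable without depending on the wall clock.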

Idempotency

Each insert for a date uses WRITE_TRUNCATE against that date's partition, which overwrites all of the rows for that date. If the job runs multiple times for the same day, it replaces the previous rows for that date rather than adding more.

This simplifies the edge cases where the job fails or runs multiple times: as long as the last run for a date succeeds, the data will be correct. Each run simply pulls in the latest data from the fuzzers' JobRun tables.
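To make the overwrite semantics concrete, here is a small in-memory sketch. It does not call BigQuery and is not the code in this PR: `FakePartitionedTable` and `partition_destination` are hypothetical names. The `table$YYYYMMDD` form is BigQuery's partition decorator syntax, and a real job would pair such a destination with a WRITE_TRUNCATE write disposition.

```python
import datetime


def partition_destination(table_id, day):
    """Build a partition-decorated table id, e.g. 'p.d.t$20260503'."""
    return f"{table_id}${day:%Y%m%d}"


class FakePartitionedTable:
    """In-memory stand-in for a date-partitioned table (not a BigQuery client)."""

    def __init__(self):
        self.partitions = {}

    def write_truncate(self, destination, rows):
        # WRITE_TRUNCATE semantics: replace the targeted partition wholesale.
        self.partitions[destination] = list(rows)

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())


table = FakePartitionedTable()
day = datetime.date(2026, 5, 3)
dest = partition_destination("your-project.fuzzer_stats.daily_stats", day)

# Running the "job" twice for the same day leaves one copy of the rows.
rows = [{"fuzzer_name": "fuzzer_a"}, {"fuzzer_name": "fuzzer_b"}]
table.write_truncate(dest, rows)
table.write_truncate(dest, rows)
```

After both writes the table still holds two rows, and a write for a different day lands in its own partition without touching this one.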

Example query:

SELECT
  fuzzer_name,
  SUM(fuzzing_duration) AS fuzzing_duration,
  SUM(testcases_executed) AS testcases_executed
FROM `your-project.fuzzer_stats.daily_stats`
GROUP BY fuzzer_name
ORDER BY fuzzing_duration DESC
LIMIT 1000;

This PR only adds the logic for the job. The remaining work is to set up the cron job configuration. crbug.com/501066151

Related PRs:

These migrate the BigQuery and Datastore schemas to support the new fields:
#5264
#5263

Testing

Ran this against the dev data and verified that the fuzzer stats BigQuery table is populated.
Logs from dev: https://paste.googleplex.com/4884361662038016

After the job inserted the aggregated rows into BigQuery, I was able to compare the aggregated testcase stats and fuzzing hours between fuzzers for a given date range.

@dylanjew dylanjew requested a review from a team as a code owner May 4, 2026 14:37
@dylanjew dylanjew changed the title Add daily cron job to aggregate fuzzer stats [BlackboxBenchmarking] Add daily cron job to aggregate fuzzer stats May 4, 2026
@dylanjew dylanjew force-pushed the dylanj/aggregate-stats branch from ffc1050 to 21c24de Compare May 4, 2026 14:52
@dylanjew dylanjew requested a review from aakallam May 4, 2026 14:55