Skip to content

Reduce benchmark diff noise by requiring a minimum absolute delta for regressions#4423

Merged
timotheeguerin merged 5 commits into
mainfrom
copilot/investigate-benchmark-diff-issue
May 14, 2026
Merged

Reduce benchmark diff noise by requiring a minimum absolute delta for regressions#4423
timotheeguerin merged 5 commits into
mainfrom
copilot/investigate-benchmark-diff-issue

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 13, 2026

Benchmark PR comments were over-reporting regressions for changes that should be performance-neutral, mostly due to large % swings on tiny sub-millisecond metrics. This updates benchmark comparison to suppress those false positives while preserving meaningful regression detection.

  • Regression significance logic

    • Added isNotableMetricChange in packages/benchmark/src/compare.ts.
    • A metric is now considered notable only when both conditions hold:
      • abs(percentChange) >= threshold (existing behavior)
      • abs(changeMs) >= 1ms (new guard)
    • hasNotableChanges now uses the same gate, aligning failure signaling with comment output.
  • PR comment and summary filtering

    • Updated packages/benchmark/src/format-comment.ts to apply the same significance rule for:
      • top-level regression count/table
      • change indicators in markdown and console summaries
    • Result: noisy tiny metrics (e.g., 0.05ms -> 0.06ms) no longer inflate regression counts.
  • Targeted benchmark tests

    • Added packages/benchmark/test/compare.test.ts to cover:
      • high-% but tiny absolute changes are excluded
      • custom minChangeMs behavior
      • regression summary excludes metrics below the minimum absolute threshold
export function isNotableMetricChange(
  metric: MetricComparison,
  threshold = 5,
  minChangeMs = 1,
): boolean {
  return Math.abs(metric.percentChange) >= threshold && Math.abs(metric.change) >= minChangeMs;
}

Copilot AI and others added 2 commits May 13, 2026 21:58
Agent-Logs-Url: https://github.com/Azure/typespec-azure/sessions/3251ab40-9ac4-4400-a7e0-ff7bc5285c8d

Co-authored-by: timotheeguerin <1031227+timotheeguerin@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/typespec-azure/sessions/3251ab40-9ac4-4400-a7e0-ff7bc5285c8d

Co-authored-by: timotheeguerin <1031227+timotheeguerin@users.noreply.github.com>
Copilot AI changed the title [WIP] Investigate benchmark diff reporting inconsistencies Reduce benchmark diff noise by requiring a minimum absolute delta for regressions May 13, 2026
Copilot AI requested a review from timotheeguerin May 13, 2026 22:02
@timotheeguerin
Copy link
Copy Markdown
Member

@copilot run formatter and fix spell check

Agent-Logs-Url: https://github.com/Azure/typespec-azure/sessions/b6602ba8-92f9-40bd-9a97-079110c078fd

Co-authored-by: timotheeguerin <1031227+timotheeguerin@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 13, 2026

@copilot run formatter and fix spell check

Done in 8316068. I ran formatting on the benchmark files touched by this PR, fixed the spell-check failure in compare.test.ts, and re-ran the targeted benchmark tests plus cspell/prettier checks.

@timotheeguerin timotheeguerin marked this pull request as ready for review May 14, 2026 18:15
@github-actions
Copy link
Copy Markdown
Contributor

⚡ Benchmark Results

⚠️ 1 metric(s) regressed above the +5% threshold:

Metric Baseline Current Change
 ↳ emit/@typespec/openapi3/write 🟢 13.6ms 🟢 15.0ms +10.1% 🔴
Full details – comparing d833788 vs baseline 1425f8e
Metric Baseline Current Change
total 🔴 780.3ms 🔴 791.3ms +1.4%
loader 🟢 154.4ms 🟢 153.1ms -0.8%
resolver 🟢 16.4ms 🟢 16.9ms +2.9%
checker 🟢 189.1ms 🟢 189.7ms +0.3%
validation 🟢 40.7ms 🟢 42.6ms +4.6%
 ↳ validation/@azure-tools/typespec-azure-core 🟢 6.1ms 🟢 6.3ms +3.3%
 ↳ validation/@typespec/http 🟢 5.6ms 🟢 6.2ms +9.7%
 ↳ validation/@typespec/rest 🟢 0.5ms 🟢 0.6ms +2.1%
 ↳ validation/@typespec/versioning 🔴 26.3ms 🔴 27.4ms +4.1%
 ↳ validation/compiler 🟢 1.4ms 🟢 1.5ms +9.2%
linter 🟢 130.3ms 🟢 133.8ms +2.6%
 ↳ linter/@azure-tools/typespec-azure-core/auth-required 🟢 0.0ms 🟢 0.0ms +2.9%
 ↳ linter/@azure-tools/typespec-azure-core/bad-record-type 🟢 0.2ms 🟢 0.2ms +0.4%
 ↳ linter/@azure-tools/typespec-azure-core/byos 🟢 5.5ms 🟢 5.9ms +6.5%
 ↳ linter/@azure-tools/typespec-azure-core/casing-style 🟢 0.5ms 🟢 0.6ms +3.6%
 ↳ linter/@azure-tools/typespec-azure-core/composition-over-inheritance 🟢 0.1ms 🟢 0.1ms +5.5%
 ↳ linter/@azure-tools/typespec-azure-core/documentation-required 🟢 0.7ms 🟢 0.8ms +11.7%
 ↳ linter/@azure-tools/typespec-azure-core/friendly-name 🟢 0.7ms 🟢 0.6ms -17.2%
 ↳ linter/@azure-tools/typespec-azure-core/key-visibility-required 🟢 0.1ms 🟢 0.1ms +9.1%
 ↳ linter/@azure-tools/typespec-azure-core/known-encoding 🟢 0.2ms 🟢 0.2ms +3.0%
 ↳ linter/@azure-tools/typespec-azure-core/long-running-polling-operation-required 🟢 0.3ms 🟢 0.3ms +6.4%
 ↳ linter/@azure-tools/typespec-azure-core/no-case-mismatch 🟢 0.2ms 🟢 0.2ms +9.9%
 ↳ linter/@azure-tools/typespec-azure-core/no-closed-literal-union 🟢 0.2ms 🟢 0.2ms +9.9%
 ↳ linter/@azure-tools/typespec-azure-core/no-enum 🟢 0.0ms 🟢 0.0ms +3.8%
 ↳ linter/@azure-tools/typespec-azure-core/no-error-status-codes 🟢 0.1ms 🟢 0.1ms -4.0%
 ↳ linter/@azure-tools/typespec-azure-core/no-explicit-routes-resource-ops 🟢 0.1ms 🟢 0.1ms +7.6%
 ↳ linter/@azure-tools/typespec-azure-core/no-format 🟢 0.4ms 🟢 0.4ms +3.9%
 ↳ linter/@azure-tools/typespec-azure-core/no-generic-numeric 🟢 0.3ms 🟢 0.3ms +13.1%
 ↳ linter/@azure-tools/typespec-azure-core/no-header-explode 🟡 19.6ms 🟡 19.9ms +1.1%
 ↳ linter/@azure-tools/typespec-azure-core/no-legacy-usage 🟢 1.0ms 🟢 1.0ms +0.9%
 ↳ linter/@azure-tools/typespec-azure-core/no-multiple-discriminator 🟢 0.1ms 🟢 0.1ms +22.9%
 ↳ linter/@azure-tools/typespec-azure-core/no-nullable 🟢 0.2ms 🟢 0.2ms +15.5%
 ↳ linter/@azure-tools/typespec-azure-core/no-offsetdatetime 🟢 1.1ms 🟢 1.2ms +3.4%
 ↳ linter/@azure-tools/typespec-azure-core/no-openapi 🟢 1.6ms 🟢 1.7ms +3.6%
 ↳ linter/@azure-tools/typespec-azure-core/no-private-usage 🟢 1.6ms 🟢 1.7ms +2.5%
 ↳ linter/@azure-tools/typespec-azure-core/no-query-explode 🟡 19.6ms 🟡 19.8ms +0.7%
 ↳ linter/@azure-tools/typespec-azure-core/no-response-body 🔴 22.7ms 🔴 23.1ms +2.1%
 ↳ linter/@azure-tools/typespec-azure-core/no-rest-library-interfaces 🟢 0.0ms 🟢 0.0ms +1.7%
 ↳ linter/@azure-tools/typespec-azure-core/no-route-parameter-name-mismatch 🟢 5.2ms 🟢 5.2ms -0.8%
 ↳ linter/@azure-tools/typespec-azure-core/no-rpc-path-params 🟢 0.2ms 🟢 0.2ms +10.0%
 ↳ linter/@azure-tools/typespec-azure-core/no-string-discriminator 🟢 0.0ms 🟢 0.0ms +12.3%
 ↳ linter/@azure-tools/typespec-azure-core/no-unknown 🟢 0.1ms 🟢 0.1ms +7.2%
 ↳ linter/@azure-tools/typespec-azure-core/no-unnamed-union 🟢 0.3ms 🟢 0.3ms +9.9%
 ↳ linter/@azure-tools/typespec-azure-core/operation-missing-api-version 🟢 0.1ms 🟢 0.2ms +20.7%
 ↳ linter/@azure-tools/typespec-azure-core/request-body-problem 🟢 0.2ms 🟢 0.2ms +30.1%
 ↳ linter/@azure-tools/typespec-azure-core/require-versioned 🟢 0.0ms 🟢 0.0ms +17.0%
 ↳ linter/@azure-tools/typespec-azure-core/response-schema-problem 🔴 21.7ms 🔴 22.1ms +2.0%
 ↳ linter/@azure-tools/typespec-azure-core/rpc-operation-request-body 🟢 0.3ms 🟢 0.3ms -1.9%
 ↳ linter/@azure-tools/typespec-azure-core/spread-discriminated-model 🟢 0.2ms 🟢 0.2ms +4.9%
 ↳ linter/@azure-tools/typespec-azure-core/use-standard-names 🟢 5.1ms 🟢 5.3ms +4.9%
 ↳ linter/@azure-tools/typespec-azure-core/use-standard-operations 🟢 0.1ms 🟢 0.1ms -1.3%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-common-types-version 🟢 4.2ms 🟢 4.3ms +2.3%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-custom-resource-no-key 🟢 0.1ms 🟢 0.1ms +2.7%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-custom-resource-usage-discourage 🟢 0.1ms 🟢 0.1ms +3.5%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-delete-operation-response-codes 🟢 5.5ms 🟢 5.6ms +3.0%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-no-path-casing-conflicts 🟢 4.7ms 🟢 4.6ms -2.0%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-no-record 🟢 0.3ms 🟢 0.3ms +7.0%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-post-operation-response-codes 🟢 0.5ms 🟢 0.5ms +12.8%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-put-operation-response-codes 🟢 0.0ms 🟢 0.0ms +2.3%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-action-no-segment 🟢 0.2ms 🟢 0.2ms +5.8%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-duplicate-property 🟢 0.1ms 🟢 0.1ms +1.7%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-interface-requires-decorator 🟢 0.0ms 🟢 0.0ms +19.7%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-invalid-action-verb 🟢 0.1ms 🟢 0.1ms +11.2%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-invalid-envelope-property 🟢 0.1ms 🟢 0.1ms +6.1%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-invalid-version-format 🟢 0.0ms 🟢 0.0ms +0.5%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-key-invalid-chars 🟢 0.2ms 🟢 0.2ms +2.9%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-name-pattern 🟢 0.0ms 🟢 0.0ms +3.2%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-operation 🟢 0.1ms 🟢 0.1ms +1.4%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-operation-response 🟢 4.7ms 🟢 4.7ms +1.1%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-patch 🟢 0.3ms 🟢 0.3ms +6.5%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-path-segment-invalid-chars 🟢 0.2ms 🟢 0.2ms +2.5%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/arm-resource-provisioning-state 🟢 0.1ms 🟢 0.1ms +4.0%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/beyond-nesting-levels 🟢 0.1ms 🟢 0.1ms +3.9%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/empty-updateable-properties 🟢 0.1ms 🟢 0.1ms +1.6%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/improper-subscription-list-operation 🟢 0.0ms 🟢 0.0ms +21.4%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/lro-location-header 🟡 14.0ms 🟡 14.9ms +6.5%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/missing-operations-endpoint 🟢 0.0ms 🟢 0.0ms -1.5%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/missing-x-ms-identifiers 🟢 0.3ms 🟢 0.3ms +2.3%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/no-empty-model 🟢 0.1ms 🟢 0.1ms +7.2%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/no-resource-delete-operation 🟢 0.2ms 🟢 0.2ms +5.4%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/no-response-body 🔴 22.1ms 🔴 22.9ms +3.7%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/patch-envelope 🟢 0.1ms 🟢 0.1ms +8.0%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/resource-name 🟢 0.1ms 🟢 0.1ms -0.1%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/secret-prop 🟢 2.1ms 🟢 2.2ms +6.9%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/unsupported-type 🟢 0.4ms 🟢 0.4ms -0.2%
 ↳ linter/@azure-tools/typespec-azure-resource-manager/version-progression 🟢 0.0ms 🟢 0.0ms -1.9%
 ↳ linter/@azure-tools/typespec-client-generator-core/property-name-conflict 🟢 1.0ms 🟢 1.0ms +2.9%
 ↳ linter/@azure-tools/typespec-client-generator-core/require-client-suffix 🟢 0.2ms 🟢 0.2ms +6.6%
emit 🟡 245.6ms 🟡 249.5ms +1.6%
 ↳ emit/@azure-tools/typespec-autorest 🟢 154.1ms 🟢 156.4ms +1.5%
 ↳ emit/@typespec/openapi3 🟢 135.4ms 🟢 138.2ms +2.1%
 ↳ emit/@typespec/openapi3/compute 🟢 121.6ms 🟢 123.2ms +1.3%
 ↳ emit/@typespec/openapi3/write 🟢 13.6ms 🟢 15.0ms +10.1% 🔴

Averaged across 3 specs (azure-arm-resource-manager, azure-core-dataplane, azure-full).
Threshold: changes > ±5% are highlighted.
🟢 Fast · 🟡 Moderate (stages >200ms, rules >10ms) · 🔴 Slow (stages >400ms, rules >20ms)

@timotheeguerin timotheeguerin added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit f97cb0c May 14, 2026
22 checks passed
@timotheeguerin timotheeguerin deleted the copilot/investigate-benchmark-diff-issue branch May 14, 2026 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmark diff still reporting a lot of diff

3 participants