
Mask apt-daily timers in SPEC CPU profiles to prevent VC restarts on Ubuntu 24.04 #694

Merged
AlexWFMS merged 2 commits into main from users/alexwill/mask-apt-daily-ubuntu24
Apr 23, 2026

AlexWFMS commented Apr 23, 2026

Summary

Prepends a MaskAptDailyTimers ExecuteCommand dependency to the four SPEC CPU profiles (FPRATE, FPSPEED, INTRATE, INTSPEED) that masks apt-daily.timer, apt-daily-upgrade.timer, and their services on Linux (x64 + arm64).

Problem

On Ubuntu 24.04, packed SPEC CPU runs show Virtual Client (VC) being killed mid-run between 06:00 and 07:00 UTC, which desynchronizes the packed workload results. The same experiment configuration on Ubuntu 22.04 does not reproduce the problem.

Root cause chain:

  1. Ubuntu 24.04 installs the needrestart package by default (22.04 does not). Its default config, $nrconf{restart} = 'a', automatically restarts services whose libraries were updated.
  2. apt-daily-upgrade.timer fires at 06:00 UTC with RandomizedDelaySec=60min.
  3. When it installs any package touching a library VC depends on (.NET, libc, openssl), needrestart kills VC via SIGTERM/SIGKILL, and systemd restarts it, producing a second Platform.Initialize event.
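Because the trigger is time-of-day based, a run's exposure can be estimated from its wall-clock interval alone. A minimal Python sketch, assuming the timer fires once a day somewhere in [06:00, 07:00) UTC as described above; `overlaps_apt_upgrade_window` is a hypothetical helper, not part of Virtual Client, and it only reports whether a run overlaps the window, not whether an upgrade will actually touch a library VC depends on.

```python
from datetime import datetime, timedelta, timezone

# Assumed window: OnCalendar 06:00 UTC plus up to 60 minutes of RandomizedDelaySec.
WINDOW_START_HOUR = 6
WINDOW_LENGTH = timedelta(minutes=60)


def overlaps_apt_upgrade_window(start: datetime, duration: timedelta) -> bool:
    """Return True if [start, start + duration) intersects any daily
    [06:00, 07:00) UTC window. `start` should be timezone-aware."""
    start = start.astimezone(timezone.utc)
    end = start + duration
    # Begin with the window on the day before `start` so a run that starts
    # inside a window is caught, then walk forward one day at a time.
    window = (start - timedelta(days=1)).replace(
        hour=WINDOW_START_HOUR, minute=0, second=0, microsecond=0
    )
    while window < end:
        if start < window + WINDOW_LENGTH:
            return True
        window += timedelta(days=1)
    return False


if __name__ == "__main__":
    t0 = datetime(2026, 4, 22, 1, 0, tzinfo=timezone.utc)
    print(overlaps_apt_upgrade_window(t0, timedelta(hours=4)))  # 01:00-05:00 -> False
    print(overlaps_apt_upgrade_window(t0, timedelta(hours=6)))  # 01:00-07:00 -> True
```

Since consecutive windows are 23 hours apart (07:00 to 06:00 the next day), any run longer than 23 hours necessarily overlaps one of them.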

Evidence

Queried telemetry for ~4,700 Ubuntu 24.04 VMs in the production WorkloadDiagnostics/JunoStaging clusters:

  • 29% of Ubuntu 24.04 VMs show a mid-run VC restart.
  • 0% on Ubuntu 22.04 (jammy), Windows, or focal in the same window.
  • 80% of restarts occur at hour 6 UTC (3,835 of 4,768).
  • The minute-of-hour distribution is uniform over 0–59, the signature of RandomizedDelaySec=60min.
  • 94% of hour-6 restarts occurred more than 2h after the first Platform.Initialize (killed mid-run, not crash-at-start).
  • No VC error logs precede the restarts, consistent with an external SIGKILL.
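The hour-of-day concentration and minute-of-hour uniformity checks above can be recomputed from a list of restart timestamps. A minimal Python sketch, assuming the restarts are available as timezone-aware `datetime` values; the Pearson chi-square statistic is one reasonable uniformity measure chosen for illustration, not necessarily what the cluster query used, and `restart_signature` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone


def restart_signature(restarts: list[datetime], hour: int = 6) -> tuple[float, float]:
    """Return (share of restarts in the given UTC hour,
    chi-square statistic of their minute-of-hour counts against uniform 0-59)."""
    if not restarts:
        return 0.0, 0.0
    utc = [r.astimezone(timezone.utc) for r in restarts]
    in_hour = [r for r in utc if r.hour == hour]
    share = len(in_hour) / len(utc)
    expected = len(in_hour) / 60
    if expected == 0:
        return share, 0.0
    # 60 minute bins, 59 degrees of freedom: a uniform spread gives a value
    # near 59 on average; a spike at one minute gives a much larger value.
    counts = Counter(r.minute for r in in_hour)
    chi2 = sum((counts.get(m, 0) - expected) ** 2 / expected for m in range(60))
    return share, chi2
```

For the figures above, 3,835 / 4,768 ≈ 0.80.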

Change

{
    "Type": "ExecuteCommand",
    "Parameters": {
        "Scenario": "MaskAptDailyTimers",
        "SupportedPlatforms": "linux-x64,linux-arm64",
        "Command": "bash -c 'systemctl mask apt-daily.timer apt-daily-upgrade.timer apt-daily.service apt-daily-upgrade.service; systemctl stop apt-daily.timer apt-daily-upgrade.timer apt-daily.service apt-daily-upgrade.service; exit 0'"
    }
}
  • The SupportedPlatforms filter skips Windows.
  • Wrapped in bash -c '... ; exit 0' so the step always exits 0: it is safe to rerun and non-fatal on non-Ubuntu Linux distros (CentOS, SUSE, etc.) where the units do not exist.
  • Placed as the first dependency so the timers are neutralized before the long SPEC runtime window.
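Applying the change to each profile amounts to inserting one object at the head of its dependency list. A minimal Python sketch, assuming each profile keeps its dependencies in a top-level `Dependencies` array; `prepend_mask_step` and `update_profile_file` are hypothetical helpers, not repository tooling, so adjust to the real profile layout.

```python
import copy
import json

# Mirrors the ExecuteCommand dependency shown above.
MASK_STEP = {
    "Type": "ExecuteCommand",
    "Parameters": {
        "Scenario": "MaskAptDailyTimers",
        "SupportedPlatforms": "linux-x64,linux-arm64",
        "Command": (
            "bash -c 'systemctl mask apt-daily.timer apt-daily-upgrade.timer "
            "apt-daily.service apt-daily-upgrade.service; "
            "systemctl stop apt-daily.timer apt-daily-upgrade.timer "
            "apt-daily.service apt-daily-upgrade.service; exit 0'"
        ),
    },
}


def prepend_mask_step(profile: dict) -> dict:
    """Return a copy of the profile with MASK_STEP as its first dependency.
    Leaves the profile unchanged if a step with the same Scenario exists."""
    updated = copy.deepcopy(profile)
    deps = updated.setdefault("Dependencies", [])
    scenario = MASK_STEP["Parameters"]["Scenario"]
    if any(d.get("Parameters", {}).get("Scenario") == scenario for d in deps):
        return updated
    deps.insert(0, copy.deepcopy(MASK_STEP))
    return updated


def update_profile_file(path: str) -> None:
    """Rewrite a profile JSON file in place (re-serializes with 4-space indent)."""
    with open(path, encoding="utf-8") as f:
        profile = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prepend_mask_step(profile), f, indent=4)
        f.write("\n")
```

The duplicate check keeps the helper idempotent, so rerunning it over the four profiles does not stack copies of the step.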

Smoke test (Ubuntu 24.04.4 LTS Azure VM, Standard_D4s_v6)

Ran the exact Command string via az vm run-command:

  • Baseline: needrestart 3.6-7ubuntu4.5 installed; both timers enabled/active; apt-daily-upgrade.timer next-run scheduled at 06:22:56 UTC (matches the observed restart window).
  • After: systemctl is-enabled reports masked for all four units; list-timers shows no next-run.
  • Idempotent: a second invocation also exited 0.
  • stderr contained only the expected Created symlink /etc/systemd/system/... -> /dev/null lines.
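The "After" check can be scripted rather than read by eye. A minimal Python sketch, assuming it runs on the VM itself (for example through the same az vm run-command path) and that `systemctl is-enabled` prints one state per unit on stdout in argument order; the helper names are hypothetical.

```python
import subprocess

UNITS = [
    "apt-daily.timer",
    "apt-daily-upgrade.timer",
    "apt-daily.service",
    "apt-daily-upgrade.service",
]


def parse_is_enabled(stdout: str, units: list[str]) -> dict[str, str]:
    """Map each unit to the state systemctl printed for it
    (assumes one non-empty stdout line per unit, in order)."""
    states = [line.strip() for line in stdout.splitlines() if line.strip()]
    return dict(zip(units, states))


def all_masked(stdout: str, units: list[str] = UNITS) -> bool:
    """True only if every unit reported a state and every state is 'masked'."""
    states = parse_is_enabled(stdout, units)
    return len(states) == len(units) and all(s == "masked" for s in states.values())


def check_host() -> bool:
    """Query systemd on this host. is-enabled exits non-zero for masked
    units, so the return code is deliberately not treated as an error."""
    result = subprocess.run(
        ["systemctl", "is-enabled", *UNITS],
        capture_output=True,
        text=True,
        check=False,
    )
    return all_masked(result.stdout)
```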

Not changed

  • PERF-SPECJBB.json / PERF-SPECJVM.json / PERF-GPU-SPECVIEW.json / POWER-SPEC*.json: left unchanged to keep this PR scoped to the four SPEC CPU 2017 profiles where the issue was reported. These can be extended later if desired.
  • The needrestart config itself: masking the timers is a narrower fix than disabling needrestart globally.

Alex Williams-Ferreira and others added 2 commits April 22, 2026 17:30
…Ubuntu 24.04

On Ubuntu 24.04, apt-daily-upgrade.timer fires between 06:00-07:00 UTC
and triggers needrestart (default installed on 24.04, not 22.04) which
auto-restarts services whose libraries were updated. This has been
observed to SIGKILL VirtualClient mid-run, desynchronizing packed
SPEC CPU experiments and invalidating results.

Analysis of ~4,700 Ubuntu 24.04 SYSAUTO VMs showed a ~29% restart rate,
with 80% concentrated at hour 6 UTC and minute-of-hour distribution
uniform 0-59 (signature of RandomizedDelaySec=60min on the timer).
Ubuntu 22.04, Windows, and focal VMs showed 0% restarts in the same
window.

Masking both apt-daily.timer and apt-daily-upgrade.timer (plus their
services) as a dependency step at profile startup removes the trigger.
Filtered to linux-x64,linux-arm64 via SupportedPlatforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AlexWFMS force-pushed the users/alexwill/mask-apt-daily-ubuntu24 branch from 41b7a3c to 3d0e7e0 April 23, 2026 00:32
@AlexWFMS AlexWFMS enabled auto-merge (squash) April 23, 2026 00:33
@AlexWFMS AlexWFMS merged commit 3b0174b into main Apr 23, 2026
5 checks passed
@AlexWFMS AlexWFMS deleted the users/alexwill/mask-apt-daily-ubuntu24 branch April 23, 2026 17:25
AlexWFMS pushed a commit that referenced this pull request Apr 24, 2026
Problem
-------
On Ubuntu 24.04+, the default installation of `needrestart` combined with the
`apt-daily-upgrade.timer` (fires daily 06:00-07:00 UTC with
RandomizedDelaySec=60min) automatically restarts any service whose shared
libraries are updated by unattended upgrades. For long-running Virtual
Client workloads this manifests as VC being SIGKILL'd mid-run. This was
observed on ~29% of VMs in CRC SYSAUTO experiments, concentrated at hour 6
UTC (uniform 0-59 minute distribution, matching the apt timer signature).

Web / distro research confirmed this behavior is specific to Ubuntu 24.04+:
- Ubuntu 24.04+: needrestart installed AND auto-restart-on-unattended-upgrade
  is the default (this is the regression).
- Ubuntu 22.04/22.10/23.04: needrestart installed but list-only in
  non-interactive mode; not known to cause the issue in the field.
- Debian 11/12: needrestart not installed by default.
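The matrix above reduces to one gate: act only on Ubuntu with major version 24 or later, which the Fix section derives from PRETTY_NAME. A minimal Python analogue, assuming the /etc/os-release PRETTY_NAME format (e.g. "Ubuntu 24.04.4 LTS"); it is not the C# TryGetUbuntuMajorVersion code, and `try_get_ubuntu_major_version`, `is_affected`, and `read_pretty_name` are hypothetical names.

```python
import re
from typing import Optional

# Matches e.g. "Ubuntu 24.04.4 LTS" or "Ubuntu 22.04 LTS", case-insensitively.
_UBUNTU_RE = re.compile(r"\bubuntu\s+(\d+)(?:\.\d+)*", re.IGNORECASE)


def try_get_ubuntu_major_version(pretty_name: Optional[str]) -> Optional[int]:
    """Return the Ubuntu major version from a PRETTY_NAME value, or None
    if the value is empty or does not describe Ubuntu."""
    if not pretty_name:
        return None
    match = _UBUNTU_RE.search(pretty_name.strip().strip('"'))
    return int(match.group(1)) if match else None


def is_affected(pretty_name: Optional[str]) -> bool:
    """True for Ubuntu >= 24; False for older Ubuntu, Debian, other
    distros, and anything that cannot be parsed."""
    major = try_get_ubuntu_major_version(pretty_name)
    return major is not None and major >= 24


def read_pretty_name(path: str = "/etc/os-release") -> Optional[str]:
    """Read PRETTY_NAME from an os-release file; None if missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.startswith("PRETTY_NAME="):
                    return line.split("=", 1)[1].strip().strip('"')
    except OSError:
        return None
    return None
```

Returning None instead of raising keeps the gate fail-closed: an unrecognized OS simply skips the mitigation.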

Fix
---
Add a best-effort startup hook in ExecuteProfileCommand that:
- runs exactly once per VC invocation, immediately after Platform.Initialize
- is a no-op on non-Unix platforms
- parses the Ubuntu major version out of PRETTY_NAME and only runs on >=24
- masks + stops the four apt-daily units via `bash -c "..."` (double-quoted
  because .NET Process argument tokenization follows Windows
  CommandLineToArgvW rules - single quotes do not group)
- swallows any exception so VC startup is never blocked by this mitigation
- logs telemetry (DisabledLinuxAutoUpdates / DisableLinuxAutoUpdatesFailed)
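The quoting point above can be illustrated with a simplified splitter that follows the core CommandLineToArgvW rule: whitespace separates arguments and only double quotes group them, so single quotes are ordinary characters. A minimal Python sketch that deliberately omits the backslash-escaping rules of the real algorithm; `split_windows_style` is a hypothetical name, not a .NET API.

```python
def split_windows_style(command_line: str) -> list[str]:
    """Simplified CommandLineToArgvW-style split: whitespace separates
    arguments, double quotes toggle grouping, single quotes are literal.
    Backslash escapes are intentionally not handled."""
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    has_token = False  # lets a bare "" still produce an empty argument
    for ch in command_line:
        if ch == '"':
            in_quotes = not in_quotes
            has_token = True
        elif ch in " \t" and not in_quotes:
            if has_token:
                args.append("".join(current))
                current, has_token = [], False
        else:
            current.append(ch)
            has_token = True
    if has_token:
        args.append("".join(current))
    return args


if __name__ == "__main__":
    single = "bash -c 'systemctl mask apt-daily.timer; exit 0'"
    double = 'bash -c "systemctl mask apt-daily.timer; exit 0"'
    print(split_windows_style(single))
    # ['bash', '-c', "'systemctl", 'mask', 'apt-daily.timer;', 'exit', "0'"]
    print(split_windows_style(double))
    # ['bash', '-c', 'systemctl mask apt-daily.timer; exit 0']
```

With single quotes, bash receives the unterminated fragment 'systemctl as its -c script and fails with a syntax error; with double quotes it receives the whole command as one argument.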

Because the mitigation now runs unconditionally for every profile on the
affected OS, the per-profile MaskAptDailyTimers step added to the SPEC CPU
and FIO profiles in PR #694 is redundant and has been removed.

Bumped VERSION to 3.1.3.

Tests
-----
Added parameterized coverage for TryGetUbuntuMajorVersion (9 cases, all
passing). ExecuteProfileCommandTests: 24/24 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>