fix: add Apple M5 family and graceful fallback for unknown Apple silicon #449

Open
sam-shridhar1950f wants to merge 1 commit into GradientHQ:main from sam-shridhar1950f:fix/apple-silicon-m5-and-graceful-fallback

sam-shridhar1950f commented May 20, 2026

Closes #439

What

AppleSiliconHardwareInfo.detect() raised RuntimeError on any Apple chip missing from _APPLE_PEAK_FP16, so parallax join crashed on M5 machines:

RuntimeError: Unknown Apple silicon chip 'M5 Max' detected. Please add it to the _APPLE_PEAK_FP16 dictionary.

This PR does two things:

1. Add M5 / M5 Pro / M5 Max to the table

Values are GPU-ALU-only FP16 peaks, consistent with the existing M1-M4 rows, which are computed as GPU cores × per-core rate (e.g. M4 uses 0.852 TFLOPS/core across all variants):

| Chip   | GPU cores | FP16 (TFLOPS) |
|--------|-----------|---------------|
| M5     | 10        | 8.08          |
| M5 Pro | 20        | 16.16         |
| M5 Max | 40        | 32.32         |

Derivation: max-bin GPU core counts (confirmed via Apple newsroom / Wikipedia) × the M5's estimated per-core ALU rate of 0.808 TFLOPS/core. That rate comes from the base M5's ~8.08 TFLOPS theoretical FP16 ALU ceiling (roofline analysis: 1578 MHz × 128 ALUs/core × 2 FLOPs per FMA × 2 for double-rate FP16 ≈ 0.808 TFLOPS per core, across 10 cores). This is flat to slightly below M4 (0.852/core, about 5% lower) because M5's gains live in its new per-core Neural Accelerators, not in the ALUs.
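A small sketch that reproduces the arithmetic above. The clock, ALU count, and double-rate FP16 factor come from the derivation; counting one fused multiply-add as two FLOPs is an assumed (standard) convention, and chip_fp16_tflops is a hypothetical helper, not code from the PR.

```python
CLOCK_HZ = 1578e6    # M5 GPU clock used in the roofline analysis
ALUS_PER_CORE = 128  # ALUs per GPU core
FLOPS_PER_FMA = 2    # assumed convention: one fused multiply-add = 2 FLOPs
FP16_RATE = 2        # double-rate FP16 relative to FP32

# Theoretical per-core FP16 ALU peak in TFLOPS (about 0.808)
PER_CORE_TFLOPS = CLOCK_HZ * ALUS_PER_CORE * FLOPS_PER_FMA * FP16_RATE / 1e12


def chip_fp16_tflops(gpu_cores: int) -> float:
    """GPU-ALU-only FP16 peak for a chip, rounded to two decimals like the table."""
    return round(gpu_cores * PER_CORE_TFLOPS, 2)


for name, cores in [("M5", 10), ("M5 Pro", 20), ("M5 Max", 40)]:
    print(name, cores, chip_fp16_tflops(cores))
```

Running it prints 8.08, 16.16, and 32.32 for the three chips, matching the table.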

⚠️ One decision for maintainers

These are GPU-ALU-only figures, matching how M1-M4 were sourced. However, M5 introduced per-core Neural Accelerators, which MLX uses for matmul during inference. With those engaged, effective FP16 throughput is roughly 2× these values (~70 TFLOPS for M5 Max). Since tflops_fp16 feeds the inference roofline estimate in scheduling/node.py, you may prefer the with-accelerator figures for the M5 rows. I went with ALU-only for consistency with the existing column, but I'm happy to switch to ~70 / ~33 / ~16 TFLOPS (M5 Max / M5 Pro / M5), or to add an MLX-aware adjustment, if you'd rather the numbers reflect real M5 inference throughput.
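To illustrate when this choice matters, here is a generic roofline sketch. It assumes the estimate in scheduling/node.py has the usual min(compute peak, bandwidth × arithmetic intensity) shape; that is an assumption, not a copy of that code. The bandwidth value is a hypothetical placeholder, not a quoted M5 Max spec.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_per_s: float, flops_per_byte: float) -> float:
    """Roofline: throughput is capped by the compute peak or by bandwidth x arithmetic intensity."""
    # TB/s x FLOP/byte = TFLOP/s, so both arguments of min() share the same unit.
    return min(peak_tflops, bandwidth_tb_per_s * flops_per_byte)


BANDWIDTH_TB_PER_S = 0.5  # hypothetical placeholder bandwidth

for label, peak in [("ALU-only", 32.32), ("with Neural Accelerators", 70.0)]:
    low = attainable_tflops(peak, BANDWIDTH_TB_PER_S, 1.0)     # e.g. single-stream decode
    high = attainable_tflops(peak, BANDWIDTH_TB_PER_S, 500.0)  # e.g. long-prompt prefill
    print(f"{label}: low-intensity {low:.2f}, high-intensity {high:.2f} TFLOPS")
```

In the memory-bound regime both peaks give the same estimate (0.50 here), so the ALU-only vs. with-accelerator decision only changes compute-bound estimates such as large prefills.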

2. Stop hard-crashing on unknown chips

detect() now estimates FP16 from GPU core count (via system_profiler) at the latest known per-core rate, or a base-chip default if the core count can't be read, and logs a warning instead of raising. This mirrors the NVIDIA path, which already falls back conservatively for unrecognized GPUs (_match_gpu_specs). The Apple path was the lone hard-crasher, so the next unreleased generation degrades gracefully rather than taking down parallax join.
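A minimal sketch of that fallback order (table lookup, then core-count estimate, then default), with a warning instead of a raise. The function name peak_fp16_tflops, the constants, and the choice of the base M5 value as the "base-chip default" are hypothetical and are not taken from the PR's diff. The table is an illustrative subset of _APPLE_PEAK_FP16.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Illustrative subset; the real _APPLE_PEAK_FP16 also holds the M1-M4 rows.
APPLE_PEAK_FP16 = {"M5": 8.08, "M5 Pro": 16.16, "M5 Max": 32.32}
LATEST_PER_CORE_TFLOPS = 0.808   # newest known per-core ALU rate (M5)
BASE_CHIP_DEFAULT_TFLOPS = 8.08  # assumed conservative default: base M5


def peak_fp16_tflops(chip: str, gpu_cores: Optional[int]) -> float:
    """Return peak FP16 TFLOPS, warning (never raising) for unknown chips."""
    if chip in APPLE_PEAK_FP16:
        return APPLE_PEAK_FP16[chip]
    if gpu_cores:  # None or 0 means the core count could not be read
        estimate = gpu_cores * LATEST_PER_CORE_TFLOPS
        logger.warning(
            "Unknown Apple silicon chip %r; estimating %.2f TFLOPS FP16 from %d GPU cores",
            chip, estimate, gpu_cores,
        )
        return estimate
    logger.warning(
        "Unknown Apple silicon chip %r and GPU core count unavailable; assuming %.2f TFLOPS FP16",
        chip, BASE_CHIP_DEFAULT_TFLOPS,
    )
    return BASE_CHIP_DEFAULT_TFLOPS
```

Keeping the core count as a parameter separates the estimate logic from the system_profiler call, so each branch can be tested with plain values.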

Tests

tests/test_server_info.py covers the table entries and both fallback branches (regression for #439):

  • known chip → tabulated value
  • unknown chip + readable core count → estimated from cores
  • unknown chip + unreadable core count → conservative default
  • unknown chip → does not raise

All tests pass locally (subprocess and psutil are mocked, so there is no hardware dependency). The change is black-, isort-, and ruff-clean.
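For reference, one way to exercise the core-count read without Apple hardware is to patch subprocess.run with unittest.mock. The helper read_gpu_core_count and the assumed "Total Number of Cores: N" line in system_profiler SPDisplaysDataType output are illustrative, not the PR's actual code or test fixtures.

```python
import subprocess
import unittest
from typing import Optional
from unittest import mock


def read_gpu_core_count() -> Optional[int]:
    """Hypothetical helper: GPU core count from system_profiler, or None if unavailable."""
    try:
        out = subprocess.run(
            ["system_profiler", "SPDisplaysDataType"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None  # not macOS, binary missing, non-zero exit, or timeout
    for line in out.splitlines():
        # Assumed line format on Apple silicon: "Total Number of Cores: 40"
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Total Number of Cores":
            try:
                return int(value.split()[0])
            except (IndexError, ValueError):
                return None
    return None


class ReadGpuCoreCountTest(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_readable_core_count(self, run):
        run.return_value = subprocess.CompletedProcess(
            args=[], returncode=0, stdout="      Total Number of Cores: 40\n"
        )
        self.assertEqual(read_gpu_core_count(), 40)

    @mock.patch("subprocess.run", side_effect=FileNotFoundError("system_profiler"))
    def test_unreadable_core_count(self, run):
        self.assertIsNone(read_gpu_core_count())
```

This runs with python -m unittest on any OS because system_profiler is never actually invoked.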

AppleSiliconHardwareInfo.detect() raised RuntimeError on any Apple chip
missing from _APPLE_PEAK_FP16, so `parallax join` crashed on M5 machines:

    RuntimeError: Unknown Apple silicon chip 'M5 Max' detected.

Two changes:

1. Add M5 / M5 Pro / M5 Max entries. Values are GPU-ALU-only FP16,
   consistent with the M1-M4 rows: max-bin GPU cores (10/20/40) times the
   M5's estimated per-core ALU rate (0.808 TFLOPS/core, derived from the
   base M5's ~8.08 TFLOPS theoretical FP16 ALU ceiling). This is flat to
   slightly below M4 because M5's gains live in its new per-core Neural
   Accelerators, not the ALUs.

2. Stop hard-crashing on unknown chips. detect() now estimates FP16 from
   GPU core count (via system_profiler) at the latest known per-core rate,
   or a base-chip default if the core count can't be read, and logs a
   warning. This mirrors the NVIDIA path, which already falls back
   conservatively for unrecognized GPUs, so a future M-series generation
   degrades gracefully instead of taking down `parallax join`.

Adds tests/test_server_info.py covering the table entries and both
fallback branches (regression for the crash).

Closes GradientHQ#439

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sam-shridhar1950f requested a review from a team May 20, 2026 07:29

Development

Successfully merging this pull request may close these issues.

[Bug]: RuntimeError: Unknown Apple silicon chip 'M5 Max' detected. Please add it to the _APPLE_PEAK_FP16 dictionary.
