macOS / Apple M5: all models fail to load with 500 error due to Metal backend crash (bfloat/half mismatch in ggml_metal_library_init) #643

@diy-nerd

Description

What happened?
As of today, Ollama fails to load any model at all on my Mac. Initially I suspected a corrupted model state after a memory-exhaustion crash, because earlier I had been experimenting with a model and pushed RAM usage close to the limit.

After that incident, I had to reboot the machine. Since then, every model fails to load, including very small ones.

At first I thought it might be similar to this issue:

ollama/ollama#15410
However, after checking manifests/blobs and then doing a full reinstall plus deleting ~/.ollama, the problem still persists.

The error is always:

```
Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details
```
This turned out not to be a RAM/resource issue. The server log shows the runner crashing during Metal library initialization.

Environment

- Ollama version: 0.20.5
- OS: macOS
- Chip / GPU: Apple M5
- RAM: 32 GB
- Available memory at load time: ~16 GB free
- Install method: reinstalled Ollama app, also deleted ~/.ollama
Relevant log excerpt shows:

```text
GPU name: Apple M5
GPU family: MTLGPUFamilyApple10
GPU family: MTLGPUFamilyCommon3
GPU family: MTLGPUFamilyMetal4
```
Steps to reproduce

1. Start Ollama normally on macOS (Apple M5)
2. Pull and run any model, for example:

   ```bash
   ollama run nemotron-3-nano:4b
   ```

   or even a smaller one.

3. Model downloads successfully
4. Load fails with:

   ```
   Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details
   ```
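For completeness, the same load failure should also be reproducible through the HTTP API rather than the CLI (a sketch, assuming the default port 11434 and the example model above):

```shell
# Sketch: trigger the same model load through the REST API.
# Assumes the default ollama port (11434); any pulled model shows the same 500.
curl -i http://127.0.0.1:11434/api/generate \
  -d '{"model": "nemotron-3-nano:4b", "prompt": "hello"}'
```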
What I already tried

- Rebooted the Mac
- Reinstalled Ollama
- Deleted the entire ~/.ollama directory
- Pulled a fresh model again
- Confirmed RAM is available
- Problem affects all models, not just one specific model

So this does not appear to be corrupted model files or simple memory pressure.

Relevant logs
The important part seems to be a Metal shader/library compilation failure, followed by a runner abort:

```text
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
...
static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>'
"Input types must match cooperative tensor types"
...
static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>'
"Input types must match cooperative tensor types"
...
ggml_metal_init: error: failed to initialize the Metal library
ggml_backend_metal_device_init: error: failed to allocate context
ggml-backend.cpp:258: GGML_ASSERT(backend) failed
SIGABRT: abort
...
time=2026-04-10T21:27:35.557+02:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"
time=2026-04-10T21:27:35.557+02:00 level=ERROR source=server.go:1219 msg="do load request" error="Post \"http://127.0.0.1:50255/load\": EOF"
time=2026-04-10T21:27:35.557+02:00 level=INFO source=sched.go:511 msg="Load failed" model=/Users/david/.ollama/models/blobs/sha256-... error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
```
There are also repeated lines like:

```text
ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected
ggml_metal_init: will try to compile it on the fly
```
and then compilation fails.
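To isolate these lines from the rest of the server log, a filter along these lines works (the log path is the documented default location for the macOS app; adjust if yours differs):

```shell
# Pull only the Metal backend failure lines out of the ollama server log.
# ~/.ollama/logs/server.log is the default server log location on macOS.
grep -E 'ggml_metal|static_assert|GGML_ASSERT' ~/.ollama/logs/server.log
```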

Why I think this is a Metal / Apple M5 compatibility bug
This does not look like a generic OOM/resource problem because:

- plenty of RAM is free
- reinstall + deleting ~/.ollama does not help
- all models fail the same way
- the crash happens specifically during ggml_metal_library_init
- the log points to a type mismatch in Metal tensor ops (bfloat vs half)
- this machine uses a very new GPU family: Apple M5 / MTLGPUFamilyApple10
So it looks like the Ollama / ggml Metal backend may currently be incompatible with this Apple GPU / Metal toolchain combination.
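A further check that would separate the Metal backend from model/memory issues: force a CPU-only load with the documented num_gpu option (a sketch, assuming the default port; I have not verified whether the runner still initializes the Metal backend on this path):

```shell
# If this loads while normal requests return 500, the failure is isolated
# to Metal backend init. "num_gpu": 0 offloads zero layers to the GPU
# (CPU-only inference).
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "nemotron-3-nano:4b", "prompt": "hello", "options": {"num_gpu": 0}}'
```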

Expected behavior
Models should load normally, or Ollama should gracefully fall back instead of crashing the runner.

Actual behavior
Any model load request fails with HTTP 500 because the runner crashes during Metal backend initialization.

Metadata

Labels: bug