151 commits
e489a5c
server: support OAI /v1/audio/transcriptions API (#21863)
ngxson Apr 14, 2026
6a6780a
vulkan: Support GGML_TYPE_NVFP4 (#21455)
jeffbolznv Apr 14, 2026
56666fa
common: skip reasoning budget sampler when no budget is requested (#2…
berkidem Apr 14, 2026
5a23695
ggml-webgpu: Update register tiling matmul to use f32 accumulation (#…
reeselevine Apr 14, 2026
acc37a4
cmake: fix CMP0194 warning on Windows with MSVC (#21630)
texasich Apr 14, 2026
2e05f06
ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (#21559)
richarddd Apr 14, 2026
be76dd0
vendor : update BoringSSL to 0.20260413.0 (#21881)
angt Apr 14, 2026
aa0f189
metal : add XIELU unary op (#20802)
seyoungjeong Apr 14, 2026
f4b5bf2
ci : re-enable mac workflows (#21894)
ggerganov Apr 14, 2026
1f30ac0
vulkan: Programmatically add RoundingModeRTE to all shaders when the …
jeffbolznv Apr 14, 2026
707c0b7
mtmd: add mtmd_image_tokens_get_decoder_pos() API (#21851)
ngxson Apr 14, 2026
c0de6ed
metal : fix FA support logic (#21898)
ggerganov Apr 14, 2026
fae3a28
ggml : remove ggml-ext.h (#21869)
ngxson Apr 14, 2026
5d14e5d
hexagon: optimization for HMX mat_mul (#21554)
njsyw1997 Apr 14, 2026
e39eba2
read n_ctx back after making llama_context (#21939)
smashedpumpkin Apr 15, 2026
e1a9a6d
autoparser: support case of JSON_NATIVE with per-call markers (test c…
pwilkin Apr 15, 2026
8dc530b
ci: disable test-backend-ops on Vulkan llvmpipe run and restore defau…
0cc4m Apr 15, 2026
80d8770
docs: more extensive RoPE documentation [no ci] (#21953)
ngxson Apr 15, 2026
adb541a
rpc : add native RDMA transport for RPC backend (RoCEv2) (#20590)
dvv101111 Apr 15, 2026
014dca4
CUDA: manage NCCL communicators in context (#21891)
JohannesGaessler Apr 15, 2026
a620695
CUDA: require explicit opt-in for P2P access (#21910)
JohannesGaessler Apr 15, 2026
20d3bc2
ggml-webgpu: Fix dequantization helpers to not pass in pointers (#21872)
reeselevine Apr 15, 2026
7e72b38
cuda: Q1_0 initial backend (#21629)
khosravipasha Apr 15, 2026
b3d7587
vulkan: optimize im2col (#21713)
0cc4m Apr 15, 2026
408225b
server: use random media marker (#21962)
ngxson Apr 15, 2026
b1be68e
[SYCL] Fix Q8_0 reorder: garbage on 2nd prompt + crash on full VRAM (…
PMZFX Apr 16, 2026
8612ed1
ci : Use ggml-org/ccache-action on RISC-V as well (#21632)
luhenry Apr 16, 2026
82677a6
ggml-webgpu: compute pass batching and removing profiling overhead (#…
reeselevine Apr 16, 2026
90fb96a
devops : added spirv-headers to nix (#21965)
yuannan Apr 16, 2026
5637536
ggml : implemented simd_gemm kernel for riscv vector extension (#20627)
rehan-10xengineer Apr 16, 2026
1e796eb
ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot …
rehan-10xengineer Apr 16, 2026
ae2d348
metal: Implement ROLL op (#21946)
kushagharahi Apr 16, 2026
3f7c29d
ggml: add graph_reused (#21764)
am17an Apr 16, 2026
03b3d07
Convert: Fix NemotronH Config Parsing (#21664)
anavp-nvidia Apr 16, 2026
b572d1e
codeowners: add team member comments (#21714)
0cc4m Apr 16, 2026
f772f6e
model : support NVFP4 tensors for Gemma4 (#21971)
CISC Apr 16, 2026
9db77a0
model : refactor QKV into common build_qkv and create_tensor_qkv help…
JoursBleu Apr 16, 2026
4adac43
server: tests: fetch random media marker via /apply-template (#21962)…
ServeurpersoCom Apr 16, 2026
e45dbde
opencl: add q5_K gemm and gemv kernels for Adreno (#21595)
shaofeiqi Apr 16, 2026
4fbdabd
model: using single llm_build per arch (#21970)
ngxson Apr 16, 2026
85dde8d
hexagon: optimize HMX matmul operations (#21071)
chraac Apr 16, 2026
089dd41
cmake: use glob to collect src/models sources (#22005)
ngxson Apr 16, 2026
30dce2c
cli : use get_media_marker (#22017)
CISC Apr 16, 2026
5e6c0e1
opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for A…
lhez Apr 17, 2026
fcc7508
model : Gemma4 model type detection (#22027)
EZForever Apr 17, 2026
6990e2f
libs : rename libcommon -> libllama-common (#21936)
ggerganov Apr 17, 2026
268d61e
mtmd: add missing struct tag (#22023)
65a Apr 17, 2026
a279d0f
ci : add android arm64 build and release (#21647)
ykhrustalev Apr 17, 2026
b94050e
CUDA: use LRU based eviction for cuda graphs (#21611)
am17an Apr 17, 2026
45cac7c
ggml-webgpu: fix compiler warnings and refactor FlashAttention encodi…
reeselevine Apr 17, 2026
fd1c0ec
llama: fit ctx size for CPU only (#21568)
JohannesGaessler Apr 18, 2026
89a5474
convert : fix (ignore for now) typings errors (#22002)
CISC Apr 18, 2026
83d58e0
ci : free disk space for rocm release (#22012)
CISC Apr 18, 2026
59accc8
ggml-backend-meta: add multi-segment read support in get_tensor (#22063)
ssam18 Apr 18, 2026
23b8cc4
android : libcommon -> libllama-common (#22076)
CISC Apr 18, 2026
4f02d47
model : refactor bias tensor variable names (#22079)
CISC Apr 18, 2026
9e5647a
server: Expose `media_tag` on /props endpoint. (#22028)
cetarthoriphros Apr 18, 2026
91fef95
rpc : refactor the RPC transport (#21998)
rgerganov Apr 19, 2026
455d8e4
server : speculative checkpointing (#19493)
srogmann Apr 19, 2026
09b4efa
cmake: remove CMP0194 policy to restore MSVC builds (#21934)
texasich Apr 19, 2026
8685e7b
convert : support sentence-transformer 5.4 config files (#22087)
Bing-su Apr 19, 2026
037bfe3
ci : install spirv-headers for vulkan-cross (#22109)
CISC Apr 19, 2026
bcdcc10
ggml : reduce CPU overhead in meta backend (#22041)
gaugarg-nv Apr 19, 2026
1912407
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change…
ngxson Apr 19, 2026
471540a
HIP: Remove unnecessary NCCL_CHECK (#21914)
IMbackK Apr 19, 2026
d5b780a
common/autoparser : allow space after tool call (#22073)
aldehir Apr 19, 2026
4eac5b4
CUDA: refactor mma data loading for AMD (#22051)
JohannesGaessler Apr 19, 2026
e365e65
vendor : update cpp-httplib to 0.42.0 (#21781)
cabelo Apr 19, 2026
9d49acb
server: rename --clear-idle to --cache-idle-slots (#21741)
yychyo Apr 20, 2026
788fcbc
[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)
PMZFX Apr 20, 2026
de71b5f
server : refactor "use checkpoint" logic (#22114)
ggerganov Apr 20, 2026
81df3f7
fix: GLM-DSA crash in llama-tokenize when using vocab_only (#22102)
ssam18 Apr 20, 2026
a678916
mtmd: refactor mtmd_decode_use_mrope (#22161)
ngxson Apr 20, 2026
a6cc43c
ggml-webgpu: updated matrix-vector multiplication (#21738)
neha-ha Apr 20, 2026
7f251fd
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (#21636)
pl752 Apr 20, 2026
fb19f94
TP: fix 0-sized tensor slices, AllReduce fallback (#21808)
JohannesGaessler Apr 20, 2026
fd6ae4c
Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)
gaugarg-nv Apr 20, 2026
cf8b0db
server : remove /api endpoints (#22165)
ggerganov Apr 20, 2026
86f8daa
mtmd: correct get_n_pos / get_decoder_pos (#22175)
ngxson Apr 20, 2026
9789512
ggml-cuda: flush legacy pool on OOM and retry (#22155)
leonardHONG Apr 20, 2026
ff6b106
server : fix hardcoded proxy connection timeout in router mode (#1876…
xris99 Apr 21, 2026
cfe9838
fit-params : refactor + add option to output estimated memory per dev…
ggerganov Apr 21, 2026
041fe83
ggml : bump version to 0.10.0 (ggml/1463)
ggerganov Apr 21, 2026
4889afb
sync : ggml
ggerganov Apr 21, 2026
cd03ec7
llama-ext : fix exports (#22202)
ggerganov Apr 21, 2026
9998d88
mtmd: correct mtmd_decode_use_mrope() (#22188)
ngxson Apr 21, 2026
82209ef
vulkan: Support F16 OP_FILL (#22177)
jeffbolznv Apr 21, 2026
7fc1c4e
metal : workaround macOS GPU interactivity watchdog (#22216)
ggerganov Apr 21, 2026
606fa42
vendor : update cpp-httplib to 0.43.1 (#22143)
cabelo Apr 21, 2026
52f1096
openvino: driver setup, CI split, thread safety, and NPU optimization…
wine99 Apr 21, 2026
84652b8
arg : add --spec-default (#22223)
ggerganov Apr 21, 2026
98d2d28
mtmd: Add support for Reka Edge 2603 (#21616)
kwajiehao Apr 21, 2026
72d693e
spec : reset i_last when low acceptance streak occurs (#22168)
treo Apr 21, 2026
2248799
hexagon: fix missing v79 entry in libggml-htp.inf (#22194)
mengshengwu Apr 21, 2026
5a4cd67
Hexagon: DAIG op (#22195)
shreyajn Apr 21, 2026
04fe84b
server: allow cancel loading model (#21814)
ngxson Apr 21, 2026
2799d93
ggml-webgpu: reset CPU/GPU profiling time when freeing context (#22050)
yomaytk Apr 21, 2026
0dedb9e
hexagon: add support for FILL op (#22198)
aparmp-quic Apr 21, 2026
ca7f7b7
ggml-webgpu(shader): support conv2d kernels. (#21964)
Constannnnnt Apr 22, 2026
134d6e5
common/chat, server: refactor, move all conversion functions to commo…
pwilkin Apr 22, 2026
750579f
common: Refactoring sampler parameters (#20429) (#22233)
ezturner Apr 22, 2026
7bfe60f
mtmd, llama : Update HunyuanVL vision-language model support (#22037)
ManaEstras Apr 22, 2026
17f6245
server: ignore reasoning content from transcription api (#21905)
ngxson Apr 22, 2026
82d3f4d
mtmd: also support LLAMA_ROPE_TYPE_NONE (#22242)
ngxson Apr 22, 2026
225088e
sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (#2…
qnixsynapse Apr 22, 2026
bcb5eeb
speculative-simple : add checkpoint support (#22227)
ggerganov Apr 22, 2026
8bccdbb
chat: fix parallel_tool_calls default setting based on model capabili…
pwilkin Apr 22, 2026
6da7168
ggml-webgpu: Add fused RMS_NORM + MUL (#21983)
yomaytk Apr 22, 2026
0d0764d
[WebGPU] Implement async tensor api and event api (#22099)
nikhilJain17 Apr 22, 2026
6217b49
HIP: flip GGML_HIP_GRAPHS to default on (#22254)
IMbackK Apr 23, 2026
86db42e
CUDA: fuse relu + sqr (#22249)
anavp-nvidia Apr 23, 2026
b76429a
ggml-webgpu: add support for im2col (#22259)
Constannnnnt Apr 23, 2026
60b68a6
sycl : fused MoE mul_mat_vec_q for TG (#21920)
abotsis Apr 23, 2026
5eaee65
convert : Handle ModelOpt produced mixed precision model during conve…
ynankani Apr 23, 2026
4ead6fd
[SYCL] Update oneapi 2025.3.3, Separate SYCL build, release Ubuntu 24…
NeoZhangJianyu Apr 23, 2026
96c1db2
ggml-base: use MATH_LIBRARY variable instead of hardcoded 'm' (#22239)
ggerganov Apr 23, 2026
930e021
gitignore: add AGENTS.local.md (#22246)
ggerganov Apr 23, 2026
8635e22
metal : fix event synchronization (#22260)
ggerganov Apr 23, 2026
550d684
server: Enable transcriptions API for LFM2-Audio (#22000)
tdakhran Apr 23, 2026
0dd7f91
cli : cleanup auto-completion code (#21745)
matthiasstraka Apr 23, 2026
9012c50
model-conversion : fix mmproj output file name [no ci] (#22274)
danbev Apr 23, 2026
0949beb
fix build number for sycl release (#22283)
CISC Apr 23, 2026
c807c6e
server: (anthropic API) fix prefix caching (#21793)
kvc0 Apr 23, 2026
12568ca
vendor : update LibreSSL to 4.3.1 (#22285)
angt Apr 23, 2026
c78fb90
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21…
SongTonyLi Apr 23, 2026
185cbff
server : convert_anthropic_to_oai: also copy chat_template_kwargs (#2…
Soreepeong Apr 23, 2026
187a456
Enable testing on Snapdragon devices (#21051)
shreyajn Apr 23, 2026
5d2b52d
hexagon: add support for basic and extended Op profiling (#22269)
max-krasnyansky Apr 23, 2026
fa0b8a7
cli: Remove redundant local sampling variables (#20429) (#22264)
ezturner Apr 23, 2026
e5f070a
fix(shader): handle the buffer aliasing for rms fuse (#22266)
Constannnnnt Apr 23, 2026
8bc492e
hexagon: add SOLVE_TRI op (#21974)
mengshengwu Apr 24, 2026
793d0a7
server: rename debug tags to match --cache-idle-slots naming (#22292)
yychyo Apr 24, 2026
ffdd983
server : fix swa-full logic (#22288)
ggerganov Apr 24, 2026
017f090
jinja : remove unused header (#22310)
ggerganov Apr 24, 2026
e583f3b
ggml : minor coding style (#22308)
ggerganov Apr 24, 2026
dc80c52
common : fix jinja warnings with clang 21 (#22313)
angt Apr 24, 2026
15fa3c4
metal : print GPU description (#22318)
ggerganov Apr 24, 2026
f65bc34
hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (#22306)
mengshengwu Apr 24, 2026
13d36cf
ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix…
ArberSephirotheca Apr 24, 2026
a702f39
CI Snapdragon: Switch ubuntu-latest to ubuntu-slim runner (#22303)
shreyajn Apr 24, 2026
361fe72
Hexagon: Bump HMX Frequency to Max Corner (#22334)
trivikram-reddy1 Apr 24, 2026
0adede8
parser: fix structured output bug (#22302)
pwilkin Apr 24, 2026
dd2914d
ggml-webgpu: support for SSM_SCAN and disable set_rows error checking…
reeselevine Apr 25, 2026
eddd7a1
[SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291)
arthw Apr 25, 2026
8ea8fee
gitignore : add .pi + personal SYSTEM.md (#22316)
ggerganov Apr 25, 2026
9d34231
llama-quant : default ftype param `Q5_1` --> `Q8_0` (#20828)
ddh0 Apr 25, 2026
d164904
metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962)
Developer-Ecosystem-Engineering Apr 25, 2026
9725a31
CUDA: reduce MMQ stream-k overhead (#22298)
JohannesGaessler Apr 25, 2026
98dc141
spec : fix vocab compat checks (#22358)
ggerganov Apr 25, 2026
dcad77c
chat: fix handling of space in reasoning markers (#22353)
pwilkin Apr 25, 2026
4be2164
feat:upgrade
lochjin Apr 26, 2026
2 changes: 1 addition & 1 deletion .devops/intel.Dockerfile
@@ -1,4 +1,4 @@
ARG ONEAPI_VERSION=2025.3.2-0-devel-ubuntu24.04
ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04

## Build Image

2 changes: 2 additions & 0 deletions .devops/nix/package.nix
@@ -18,6 +18,7 @@
vulkan-loader,
openssl,
shaderc,
spirv-headers,
useBlas ?
builtins.all (x: !x) [
useCuda
@@ -145,6 +146,7 @@ effectiveStdenv.mkDerivation (finalAttrs: {
ninja
pkg-config
git
spirv-headers
]
++ optionals useCuda [
cudaPackages.cuda_nvcc
50 changes: 48 additions & 2 deletions .devops/openvino.Dockerfile
@@ -2,7 +2,19 @@ ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

# Optional proxy build arguments - empty by default
# Intel GPU driver versions. https://github.com/intel/compute-runtime/releases
ARG IGC_VERSION=v2.30.1
ARG IGC_VERSION_FULL=2_2.30.1+20950
ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
ARG IGDGMM_VERSION=22.9.0

# Intel NPU driver versions. https://github.com/intel/linux-npu-driver/releases
ARG NPU_DRIVER_VERSION=v1.32.0
ARG NPU_DRIVER_FULL=v1.32.0.20260402-23905121947
ARG LIBZE1_VERSION=1.27.0-1~24.04~ppa2

# Optional proxy build arguments
ARG http_proxy=
ARG https_proxy=

@@ -78,13 +90,47 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
&& apt-get install -y libgomp1 libtbb12 curl \
&& apt-get install -y libgomp1 libtbb12 curl wget ocl-icd-libopencl1 \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

# Install GPU drivers
ARG IGC_VERSION
ARG IGC_VERSION_FULL
ARG COMPUTE_RUNTIME_VERSION
ARG COMPUTE_RUNTIME_VERSION_FULL
ARG IGDGMM_VERSION
RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& dpkg --install *.deb \
&& rm -rf /tmp/neo/

# Install NPU drivers
ARG NPU_DRIVER_VERSION
ARG NPU_DRIVER_FULL
ARG LIBZE1_VERSION
RUN mkdir /tmp/npu/ && cd /tmp/npu/ \
&& wget https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
&& tar -xf linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
&& dpkg --install *.deb \
&& rm -rf /tmp/npu/

RUN cd /tmp \
&& wget https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb \
&& dpkg --install libze1_${LIBZE1_VERSION}_amd64.deb \
&& rm libze1_${LIBZE1_VERSION}_amd64.deb

COPY --from=build /app/lib/ /app/

### Full (all binaries)
2 changes: 1 addition & 1 deletion .devops/vulkan.Dockerfile
@@ -7,7 +7,7 @@ RUN apt update && apt install -y git build-essential cmake wget xz-utils

# Install SSL and Vulkan SDK dependencies
RUN apt install -y libssl-dev curl \
libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libvulkan-dev glslc
libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libvulkan-dev glslc spirv-headers

# Build it
WORKDIR /app
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -6,7 +6,7 @@

<!-- You can provide more details and link related discussions here. Delete this section if not applicable -->

# Requirements
## Requirements

<!-- IMPORTANT: Please do NOT delete this section, otherwise your PR may be rejected -->

116 changes: 116 additions & 0 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -0,0 +1,116 @@
name: CI (snapdragon)

on:
workflow_dispatch:
push:
branches:
- master
paths:
- '.github/workflows/build-and-test-snapdragon.yml'
- 'ggml/include/ggml-hexagon.h'
- 'ggml/src/ggml-hexagon/**'
- 'docs/backend/snapdragon/**'
- 'scripts/snapdragon/**'
- 'CMakePresets.json'

pull_request:
types: [opened, synchronize, reopened]
paths:
- '.github/workflows/build-and-test-snapdragon.yml'
- 'ggml/include/ggml-hexagon.h'
- 'ggml/src/ggml-hexagon/**'
- 'docs/backend/snapdragon/**'
- 'scripts/snapdragon/**'
- 'CMakePresets.json'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

jobs:
android-ndk-snapdragon:
runs-on: ubuntu-latest
container:
image: 'ghcr.io/snapdragon-toolchain/arm64-android:v0.3'
defaults:
run:
shell: bash

steps:
- name: Clone
uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: false

- name: Build Llama.CPP for Snapdragon Android
id: build_llama_cpp_snapdragon_android
run: |
cp docs/backend/snapdragon/CMakeUserPresets.json .
cmake --preset arm64-android-snapdragon-release -B build
cmake --build build
cmake --install build --prefix pkg-snapdragon/llama.cpp

- name: Upload Llama.CPP Snapdragon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_snapdragon_android.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-snapdragon/llama.cpp

test-snapdragon-qdc:
name: Test on QDC Android Device (${{ matrix.device }})
needs: [android-ndk-snapdragon]
runs-on: ubuntu-slim
strategy:
fail-fast: false
matrix:
device: [SM8750, SM8650, SM8850]

steps:
- name: Checkout
uses: actions/checkout@v6

- name: Download build artifact
uses: actions/download-artifact@v7
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-snapdragon/llama.cpp

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.x'
cache: pip

- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y curl unzip

- name: Install QDC SDK wheel
run: |
curl -fSL -o qdc_sdk.zip https://softwarecenter.qualcomm.com/api/download/software/tools/Qualcomm_Device_Cloud_SDK/All/0.2.3/qualcomm_device_cloud_sdk-0.2.3.zip
unzip qdc_sdk.zip -d qdc_sdk
pip install qdc_sdk/qualcomm_device_cloud_sdk-0.2.3-py3-none-any.whl

- name: Check QDC API key
id: check_secret
env:
QDC_API_KEY: ${{ secrets.QDC_API_KEY }}
run: echo "has-qdc-key=${{ env.QDC_API_KEY != '' }}" >> "$GITHUB_OUTPUT"

- name: Run QDC tests (${{ matrix.device }})
if: steps.check_secret.outputs.has-qdc-key == 'true'
run: |
python scripts/snapdragon/qdc/run_qdc_jobs.py \
--test all \
--pkg-dir pkg-snapdragon/llama.cpp \
--model-url "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_0.gguf" \
--device ${{ matrix.device }}
env:
QDC_API_KEY: ${{ secrets.QDC_API_KEY }}

- name: Cleanup
if: always()
run: rm -rf pkg-snapdragon qdc_sdk qdc_sdk.zip
51 changes: 19 additions & 32 deletions .github/workflows/build-android.yml
@@ -1,26 +1,24 @@
name: CI (android)

on:
workflow_dispatch: # allows manual triggering
workflow_dispatch:
push:
branches:
- master
paths: [
'.github/workflows/build-android.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp'
]
paths:
- '.github/workflows/build-android.yml'
- '**/CMakeLists.txt'
- '**/.cmake'
- '**/*.h'
- '**/*.hpp'
- '**/*.c'
- '**/*.cpp'

pull_request:
types: [opened, synchronize, reopened]
paths: [
'.github/workflows/build-android.yml',
'examples/llama.android/**'
]
paths:
- '.github/workflows/build-android.yml'
- 'examples/llama.android/**'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
@@ -51,7 +49,7 @@ jobs:
distribution: zulu

- name: Setup Android SDK
uses: android-actions/setup-android@9fc6c4e9069bf8d3d10b2204b1fb8f6ef7065407 # v3
uses: android-actions/setup-android@40fd30fb8d7440372e1316f5d1809ec01dcd3699 # v4.0.1
with:
log-accepted-android-sdk-licenses: false

@@ -67,35 +65,24 @@
defaults:
run:
shell: bash
strategy:
matrix:
include:
- build: 'arm64-cpu'
defines: '-D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D GGML_OPENMP=OFF'
- build: 'arm64-snapdragon'
defines: '--preset arm64-android-snapdragon-release'

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: false

- name: Build Llama.CPP for Hexagon Android
id: build_llama_cpp_hexagon_android
- name: Build
id: ndk_build
run: |
if [[ "${{ matrix.build }}" == "arm64-snapdragon" ]]; then
cp docs/backend/snapdragon/CMakeUserPresets.json .
fi
cmake ${{ matrix.defines }} -B build
cmake -D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D GGML_OPENMP=OFF -B build
cmake --build build
cmake --install build --prefix pkg-adb/llama.cpp

- name: Upload Llama.CPP Hexagon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_hexagon_android.outcome == 'success' }}
- name: Upload Android Build Artifact
if: ${{ always() && steps.ndk_build.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-${{ matrix.build }}
name: llama-cpp-android-arm64-cpu
path: pkg-adb/llama.cpp
1 change: 1 addition & 0 deletions .github/workflows/build-cross.yml
@@ -246,6 +246,7 @@ jobs:
apt-get install -y --no-install-recommends \
build-essential \
glslc \
spirv-headers \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu \
libvulkan-dev:loong64