This PR implements the previously stubbed state management methods in the _internals.py module and updates the corresponding API calls in llama.py to use the correct underlying C++ function names. #2134

Open
bsides230 wants to merge 6 commits into abetlen:main from bsides230:kv-caching-issue

Conversation

bsides230 commented Mar 5, 2026

Builds on previous PR https://github.com/abetlen/llama-cpp-python/pull/2133/

Key Changes

- Implemented `copy_state_data()` method in `Llama._ctx` that wraps `llama_cpp.llama_state_get_data()`
- Implemented `set_state_data()` method in `Llama._ctx` that wraps `llama_cpp.llama_state_set_data()`
- Updated `save_state()` to call `llama_state_get_data()` instead of the deprecated `llama_copy_state_data()` and to pass the `state_size` parameter
- Updated `load_state()` to call `llama_state_set_data()` instead of `llama_set_state_data()` and to pass the `state_size` parameter
- Corrected the call in `save_state()` from `llama_get_state_size()` to `llama_state_get_size()` for consistency
Implementation Details
The changes align the Python wrapper with the underlying C++ API by using the newer llama_state_* function naming convention. The size parameter is now explicitly passed to both copy_state_data() and set_state_data(), as required by the updated C++ interface.
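The call pattern described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual diff: the injected `get_size`/`get_data`/`set_data` callables stand in for `llama_cpp.llama_state_get_size`, `llama_state_get_data`, and `llama_state_set_data` (which in upstream llama.cpp take the context, a `uint8_t*` buffer, and an explicit size, and return a byte count). It shows how the explicit size parameter flows through `save_state()`/`load_state()`:

```python
import ctypes

class StateMixin:
    """Sketch of the state-management wrappers, with the ctypes bindings
    injected as callables so the flow can be shown without a loaded model."""

    def __init__(self, ctx, get_size, get_data, set_data):
        self.ctx = ctx            # would be a llama_context pointer
        self._get_size = get_size # stand-in for llama_state_get_size
        self._get_data = get_data # stand-in for llama_state_get_data
        self._set_data = set_data # stand-in for llama_state_set_data

    def save_state(self) -> bytes:
        # Ask the context for the serialized state size, allocate a
        # buffer of that size, then copy the state out. The size is
        # passed explicitly, as the newer C API requires.
        size = self._get_size(self.ctx)
        buf = (ctypes.c_uint8 * size)()
        written = self._get_data(self.ctx, buf, size)
        return bytes(buf[:written])

    def load_state(self, data: bytes) -> None:
        # Copy the serialized bytes into a ctypes buffer and hand it,
        # with its explicit size, to the set-data binding.
        size = len(data)
        buf = (ctypes.c_uint8 * size).from_buffer_copy(data)
        self._set_data(self.ctx, buf, size)
```

The injection here is only for illustration; the actual wrapper calls the `llama_cpp` module functions directly.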

codavidgarcia and others added 6 commits March 3, 2026 17:31
Updates the llama.cpp submodule to da348c9df, which includes support for the Qwen 3.5 model architecture (hybrid SSM + attention).

Changes to Python bindings:

1. llama_cpp.py: Sync llama_context_params struct with upstream C API
   - flash_attn (bool) → flash_attn_type (enum llama_flash_attn_type)
   - Add samplers (void*) and n_samplers (size_t) fields
   - Add LLAMA_FLASH_ATTN_TYPE_* enum constants

2. llama.py: Update flash_attn parameter handling
   - Map flash_attn=True/False to flash_attn_type=1/0

3. _ctypes_extensions.py: Graceful handling of deprecated symbols
   - ctypes_function decorator returns stub instead of crashing
     when a symbol is not found in the shared library
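The graceful-degradation idea in item 3 can be sketched roughly like this. The decorator name and signature are simplified assumptions for illustration, not the actual `_ctypes_extensions.py` code: the point is that a missing symbol produces a stub that raises only when called, rather than an import-time crash.

```python
import ctypes

def ctypes_function(lib, name, argtypes, restype):
    """Bind `name` from `lib`; if the symbol is absent, return a stub
    that raises NotImplementedError when invoked instead of failing at
    decoration (i.e., import) time."""
    def decorator(pyfunc):
        try:
            cfunc = getattr(lib, name)  # AttributeError if symbol missing
        except AttributeError:
            def stub(*args, **kwargs):
                raise NotImplementedError(
                    f"{name} is not available in this build of the library"
                )
            return stub
        cfunc.argtypes = argtypes
        cfunc.restype = restype
        def wrapper(*args):
            return cfunc(*args)
        return wrapper
    return decorator
```

This lets bindings generated against a newer llama.cpp still load against an older shared library; only code paths that actually touch the removed or renamed symbol fail.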

Tested with Qwen3.5-0.8B-Q4_K_M.gguf on Apple Silicon (M1 Pro):
- Cold start: ~4s (vs ~40s with mlx-vlm)
- Inference: ~0.6s per chat completion
- Model loads and runs correctly on Metal GPU