This PR implements the previously stubbed state management methods in the _internals.py module and updates the corresponding API calls in llama.py to use the correct underlying C++ function names. (#2134)
Open
bsides230 wants to merge 6 commits into abetlen:main
Conversation
Updates the llama.cpp submodule to da348c9df, which includes support for the Qwen 3.5 model architecture (hybrid SSM + attention).
Changes to Python bindings:
1. llama_cpp.py: Sync llama_context_params struct with upstream C API
- flash_attn (bool) → flash_attn_type (enum llama_flash_attn_type)
- Add samplers (void*) and n_samplers (size_t) fields
- Add LLAMA_FLASH_ATTN_TYPE_* enum constants
2. llama.py: Update flash_attn parameter handling
- Map flash_attn=True/False to flash_attn_type=1/0
3. _ctypes_extensions.py: Graceful handling of deprecated symbols
- ctypes_function decorator returns a stub instead of crashing when a symbol is not found in the shared library
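The graceful-degradation behavior in item 3 can be sketched roughly as below. The decorator name matches the PR description, but the exact signature and error type here are illustrative assumptions, not the project's actual `_ctypes_extensions.py` code; the stand-in "library" object replaces a real `ctypes.CDLL` so the sketch runs anywhere.

```python
import ctypes
from types import SimpleNamespace

def ctypes_function(lib, name, argtypes, restype):
    """Bind symbol `name` from `lib`; if it is missing (e.g. removed
    upstream), return a stub that fails at call time instead of
    crashing the whole module at import time."""
    try:
        fn = getattr(lib, name)
    except AttributeError:
        def stub(*args, **kwargs):
            raise NotImplementedError(
                f"{name} is not available in the loaded shared library"
            )
        return stub
    fn.argtypes = argtypes
    fn.restype = restype
    return fn

# Demo with a stand-in "library": one symbol present, one missing.
def _get_size(ctx):
    return 1024

lib = SimpleNamespace(llama_state_get_size=_get_size)

get_size = ctypes_function(
    lib, "llama_state_get_size", [ctypes.c_void_p], ctypes.c_size_t
)
removed = ctypes_function(lib, "llama_copy_state_data", [], None)

print(get_size(None))  # the bound function still works: 1024
try:
    removed(None, None)  # the missing symbol fails only when called
except NotImplementedError as exc:
    print(exc)
```

The point of the pattern is that module import (and binding every symbol eagerly) no longer fails when one deprecated symbol has been dropped from the shared library; only code paths that actually call the removed function see an error.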
Tested with Qwen3.5-0.8B-Q4_K_M.gguf on Apple Silicon (M1 Pro):
- Cold start: ~4s (vs ~40s with mlx-vlm)
- Inference: ~0.6s per chat completion
- Model loads and runs correctly on Metal GPU
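The bool-to-enum mapping in item 2 can be sketched as follows. The numeric values follow the description above (True → 1, False → 0); the constant names are assumed to mirror the `LLAMA_FLASH_ATTN_TYPE_*` enum added in item 1 — check llama_cpp.py for the exact names.

```python
# Assumed constant names mirroring the new enum in llama_cpp.py;
# values follow the PR description (True -> 1, False -> 0).
LLAMA_FLASH_ATTN_TYPE_DISABLED = 0
LLAMA_FLASH_ATTN_TYPE_ENABLED = 1

def flash_attn_to_type(flash_attn: bool) -> int:
    """Map the legacy boolean flash_attn flag onto the new enum field."""
    return (
        LLAMA_FLASH_ATTN_TYPE_ENABLED
        if flash_attn
        else LLAMA_FLASH_ATTN_TYPE_DISABLED
    )

print(flash_attn_to_type(True), flash_attn_to_type(False))  # 1 0
```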
Builds on previous PR https://github.com/abetlen/llama-cpp-python/pull/2133/
Key Changes
Implemented copy_state_data() method in Llama._ctx that wraps llama_cpp.llama_state_get_data()
Implemented set_state_data() method in Llama._ctx that wraps llama_cpp.llama_state_set_data()
Updated save_state() method to call llama_state_get_data() instead of the deprecated llama_copy_state_data() and pass the state_size parameter
Updated load_state() method to call llama_state_set_data() instead of llama_set_state_data() and pass the state_size parameter
Corrected function call in save_state() from llama_get_state_size() to llama_state_get_size() for consistency
Implementation Details
The changes align the Python wrapper with the underlying C++ API by using the newer llama_state_* function naming convention. The size parameter is now explicitly passed to both copy_state_data() and set_state_data(), as required by the updated C++ interface.