feat: add streaming tool use to llama-cpp-python #71
Merged
Upstream PR to bring these improvements to llama-cpp-python: abetlen/llama-cpp-python#1884
Changes:

1. ✨ Add a `rag` and `async_rag` implementation that opens the door to Agentic RAG and user-defined tools.
2. ✨ Update the `query` parameter description to require that it is a single-faceted question (i.e., a non-compound question) to encourage parallel function calling for compound questions.
3. ✅ Add tests for `rag` and the improved `chatml-function-calling` chat handler.

Changes to llama-cpp-python's `chatml-function-calling` chat handler:

1. General:
   a. ✨ If no system message is supplied, add an empty system message to hold the tool metadata.
   b. ✨ Add function descriptions to the system message so that tool use is better informed (fixes chatml-function-calling not adding tool description to the prompt, abetlen/llama-cpp-python#1869).
   c. ✨ Replace `print` statements relating to JSON grammars with `RuntimeWarning` warnings.
   d. ✅ Add tests with fairly broad coverage of the different scenarios.
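The first two general changes above (an empty system message as a holder for tool metadata, plus per-function descriptions) can be sketched roughly as follows. This is an illustrative sketch, not the handler's actual code; the function name `inject_tool_metadata` and the exact prompt wording are assumptions.

```python
import json


def inject_tool_metadata(messages: list[dict], tools: list[dict]) -> list[dict]:
    """Sketch: ensure a system message exists, then append tool metadata to it.

    If the caller supplied no system message, prepend an empty one so there is
    a place to hold the tool metadata. Then append each function's name,
    description, and parameter schema so tool use is better informed.
    """
    messages = list(messages)
    if not any(m["role"] == "system" for m in messages):
        # No system message supplied: add an empty one to hold the metadata.
        messages.insert(0, {"role": "system", "content": ""})
    lines = ["", "You have access to the following functions:"]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"\nfunctions.{fn['name']}:")
        if fn.get("description"):
            # Include the function description, which was previously omitted.
            lines.append(fn["description"])
        lines.append(json.dumps(fn.get("parameters", {})))
    system = next(m for m in messages if m["role"] == "system")
    system["content"] += "\n".join(lines)
    return messages
```

With this, a tools list in the OpenAI function-calling format ends up summarized (name, description, JSON schema) inside the system message even when the caller only sent a user message.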
2. Case "Tool choice by user":
   a. ✨ Add support for more than one function call by making this a special case of "Automatic tool choice" with a single tool (subsumes Support parallel function calls with tool_choice, abetlen/llama-cpp-python#1503).
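The special-casing above can be sketched as a small normalization step: when the user pins a specific tool, filter the tool list down to that tool and fall through to the automatic code path, which already supports multiple calls. The function name and the `"auto"` marker are illustrative assumptions, not the handler's real identifiers.

```python
def normalize_tool_choice(tools: list[dict], tool_choice):
    """Sketch: reduce "tool choice by user" to "automatic tool choice".

    A dict-valued tool_choice (e.g. {"type": "function", "function":
    {"name": "f"}}) is handled by restricting the tool list to the chosen
    function, so the automatic path — which already supports more than one
    call — takes over.
    """
    if isinstance(tool_choice, dict):
        name = tool_choice["function"]["name"]
        chosen = [t for t in tools if t["function"]["name"] == name]
        if not chosen:
            raise ValueError(f"tool_choice names unknown function {name!r}")
        tools = chosen
        tool_choice = "auto"  # Illustrative: hand off to the automatic path.
    return tools, tool_choice
```

The design win is that parallel calls to the user-chosen tool come for free, instead of needing a dedicated single-call branch.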
3. Case "Automatic tool choice -> respond with a message":
   a. ✨ Use the user-defined `stop` and `max_tokens`.
   b. 🐛 Replace the incorrect use of the follow-up grammar with the user-defined grammar.
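Respecting the user-defined `stop` while the handler still needs its own control-flow stop tokens amounts to merging the two, roughly as sketched below (an assumption about the mechanics; `merge_stop` is not a real llama-cpp-python function):

```python
def merge_stop(user_stop, handler_stop: list[str]) -> list[str]:
    """Sketch: combine user-defined stop sequences with the handler's own.

    `user_stop` may be None, a single string, or a list of strings, matching
    the OpenAI-style `stop` parameter; the handler's internal stop tokens are
    kept in addition rather than silently replacing the user's.
    """
    if user_stop is None:
        user_stop = []
    elif isinstance(user_stop, str):
        user_stop = [user_stop]
    # Preserve order and drop duplicates.
    return list(dict.fromkeys([*user_stop, *handler_stop]))
```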
4. Case "Automatic tool choice -> one or more function calls":
   a. ✨ Add support for streaming the function calls (fixes Feature request: add support for streaming tool use, abetlen/llama-cpp-python#1883).
   b. ✨ Make tool calling more robust by giving the LLM an explicit way to terminate the tool calls by wrapping them in a `<function_calls></function_calls>` block.
   c. 🐛 Add the missing ":" stop token used to decide whether to continue with another tool call, the absence of which prevented parallel function calling (fixes chatml-function-calling chat format fails to generate multi calls to the same tool, abetlen/llama-cpp-python#1756).
   d. ✨ Set `temperature=0` when deciding whether to continue with another tool call, similar to the initial decision on whether to call a tool.
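With streaming tool use in place, a client consumes the tool calls as OpenAI-style chat-completion chunks whose `delta.tool_calls` entries carry partial `function.name` / `function.arguments` fragments. A minimal sketch of accumulating those deltas into complete tool calls (the chunks in the test are hand-written for illustration, not captured model output):

```python
def accumulate_tool_calls(chunks) -> list[dict]:
    """Sketch: merge streamed chat-completion chunks into complete tool calls.

    Each delta entry carries an `index` identifying which tool call it extends;
    name and argument fragments for that index are concatenated in order.
    """
    tool_calls: dict[int, dict] = {}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for tc in delta.get("tool_calls") or []:
            acc = tool_calls.setdefault(
                tc["index"], {"id": "", "name": "", "arguments": ""}
            )
            if tc.get("id"):
                acc["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                acc["name"] += fn["name"]
            if fn.get("arguments"):
                acc["arguments"] += fn["arguments"]
    # Return the calls in index order, each with its fully assembled arguments.
    return [tool_calls[i] for i in sorted(tool_calls)]
```

Once the stream ends, each accumulated `arguments` string can be parsed as JSON and dispatched to the corresponding user-defined tool.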