feat: add streaming tool use to llama-cpp-python #71
Merged
Upstream PR to bring these improvements to llama-cpp-python: abetlen/llama-cpp-python#1884
Changes:

1. ✨ Add a `rag` and `async_rag` implementation that opens the door to Agentic RAG and user-defined tools.
2. ✨ Update the `query` parameter description to require that it is a single-faceted question (i.e., a non-compound question) to encourage parallel function calling for compound questions.
3. ✅ Add tests for `rag` and the improved `chatml-function-calling` chat handler.

Changes to llama-cpp-python's `chatml-function-calling` chat handler:

1. General:
   a. ✨ If no system message is supplied, add an empty system message to hold the tool metadata.
   b. ✨ Add function descriptions to the system message so that tool use is better informed (fixes chatml-function-calling not adding tool description to the prompt, abetlen/llama-cpp-python#1869).
   c. ✨ Replace `print` statements relating to JSON grammars with `RuntimeWarning` warnings.
   d. ✅ Add tests with fairly broad coverage of the different scenarios.
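The first two general changes above (an empty system message as a holder for tool metadata, plus per-function descriptions) can be sketched roughly as follows. This is an illustrative sketch, not the handler's actual code; the function name `inject_tool_metadata` and the exact prompt wording are assumptions.

```python
import json


def inject_tool_metadata(messages: list[dict], tools: list[dict]) -> list[dict]:
    """Sketch: ensure a system message exists, then append tool metadata to it.

    If the caller supplied no system message, prepend an empty one so there is
    a place to hold the tool metadata. Then append each function's name,
    description, and parameter schema so tool use is better informed.
    """
    messages = list(messages)
    if not any(m["role"] == "system" for m in messages):
        # No system message supplied: add an empty one to hold the metadata.
        messages.insert(0, {"role": "system", "content": ""})
    lines = ["", "You have access to the following functions:"]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"\nfunctions.{fn['name']}:")
        if fn.get("description"):
            # Include the function description, which was previously omitted.
            lines.append(fn["description"])
        lines.append(json.dumps(fn.get("parameters", {})))
    system = next(m for m in messages if m["role"] == "system")
    system["content"] += "\n".join(lines)
    return messages
```

With this, a tools list in the OpenAI function-calling format ends up summarized (name, description, JSON schema) inside the system message even when the caller only sent a user message.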
2. Case "Tool choice by user":
   a. ✨ Add support for more than one function call by making this a special case of "Automatic tool choice" with a single tool (subsumes Support parallel function calls with tool_choice, abetlen/llama-cpp-python#1503).
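The special-casing above can be sketched as a small normalization step: when the user pins a specific tool, filter the tool list down to that tool and fall through to the automatic code path, which already supports multiple calls. The function name and the `"auto"` marker are illustrative assumptions, not the handler's real identifiers.

```python
def normalize_tool_choice(tools: list[dict], tool_choice):
    """Sketch: reduce "tool choice by user" to "automatic tool choice".

    A dict-valued tool_choice (e.g. {"type": "function", "function":
    {"name": "f"}}) is handled by restricting the tool list to the chosen
    function, so the automatic path — which already supports more than one
    call — takes over.
    """
    if isinstance(tool_choice, dict):
        name = tool_choice["function"]["name"]
        chosen = [t for t in tools if t["function"]["name"] == name]
        if not chosen:
            raise ValueError(f"tool_choice names unknown function {name!r}")
        tools = chosen
        tool_choice = "auto"  # Illustrative: hand off to the automatic path.
    return tools, tool_choice
```

The design win is that parallel calls to the user-chosen tool come for free, instead of needing a dedicated single-call branch.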
3. Case "Automatic tool choice -> respond with a message":
   a. ✨ Use the user-defined `stop` and `max_tokens`.
   b. 🐛 Replace the incorrect use of the follow-up grammar with the user-defined grammar.
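Respecting the user-defined `stop` while the handler still needs its own control-flow stop tokens amounts to merging the two, roughly as sketched below (an assumption about the mechanics; `merge_stop` is not a real llama-cpp-python function):

```python
def merge_stop(user_stop, handler_stop: list[str]) -> list[str]:
    """Sketch: combine user-defined stop sequences with the handler's own.

    `user_stop` may be None, a single string, or a list of strings, matching
    the OpenAI-style `stop` parameter; the handler's internal stop tokens are
    kept in addition rather than silently replacing the user's.
    """
    if user_stop is None:
        user_stop = []
    elif isinstance(user_stop, str):
        user_stop = [user_stop]
    # Preserve order and drop duplicates.
    return list(dict.fromkeys([*user_stop, *handler_stop]))
```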
4. Case "Automatic tool choice -> one or more function calls":
   a. ✨ Add support for streaming the function calls (fixes Feature request: add support for streaming tool use, abetlen/llama-cpp-python#1883).
   b. ✨ Make tool calling more robust by giving the LLM an explicit way to terminate the tool calls by wrapping them in a `<function_calls></function_calls>` block.
   c. 🐛 Add the missing ":" stop token used to decide whether to continue with another tool call, the absence of which prevented parallel function calling (fixes chatml-function-calling chat format fails to generate multi calls to the same tool, abetlen/llama-cpp-python#1756).
   d. ✨ Set `temperature=0` when deciding whether to continue with another tool call, similar to the initial decision on whether to call a tool.
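With streaming tool use in place, a client consumes the tool calls as OpenAI-style chat-completion chunks whose `delta.tool_calls` entries carry partial `function.name` / `function.arguments` fragments. A minimal sketch of accumulating those deltas into complete tool calls (the chunks in the test are hand-written for illustration, not captured model output):

```python
def accumulate_tool_calls(chunks) -> list[dict]:
    """Sketch: merge streamed chat-completion chunks into complete tool calls.

    Each delta entry carries an `index` identifying which tool call it extends;
    name and argument fragments for that index are concatenated in order.
    """
    tool_calls: dict[int, dict] = {}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for tc in delta.get("tool_calls") or []:
            acc = tool_calls.setdefault(
                tc["index"], {"id": "", "name": "", "arguments": ""}
            )
            if tc.get("id"):
                acc["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                acc["name"] += fn["name"]
            if fn.get("arguments"):
                acc["arguments"] += fn["arguments"]
    # Return the calls in index order, each with its fully assembled arguments.
    return [tool_calls[i] for i in sorted(tool_calls)]
```

Once the stream ends, each accumulated `arguments` string can be parsed as JSON and dispatched to the corresponding user-defined tool.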