Feature/49 tool calling pipeline #50
Merged
michalharakal merged 9 commits into develop on Apr 11, 2026
Conversation
Phase 1 of the unified pipeline plan. Tool calling no longer requires GGUFTokenizer — any Tokenizer implementation works.

- Extend Tokenizer interface with eosTokenId, bosTokenId, vocabSize
- Add ChatSession in llm-agent that bundles runtime + tokenizer + metadata
- Refactor ToolCallingDemo and AgentCli to accept Tokenizer, not GGUFTokenizer
- Remove GGUFTokenizer cast from kllama Main.kt chat/agent/demo dispatch
- Fix JavaAgentLoop instanceof hack with tokenizer.eosTokenId
- Update all Tokenizer implementations (GGUF, HF BPE, Tekken, BERT)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
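The extended contract might look like the following sketch. Only `encode`/`decode` plus the three new properties reflect the commit message; `CharTokenizer` and `isStopToken` are illustrative toys, not project code:

```kotlin
// Sketch of the extended Tokenizer contract. The three new properties let
// callers stop generation without knowing the concrete tokenizer class.
interface Tokenizer {
    fun encode(text: String): List<Int>
    fun decode(tokens: List<Int>): String
    val eosTokenId: Int
    val bosTokenId: Int
    val vocabSize: Int
}

// Toy implementation for illustration: one token per character.
class CharTokenizer : Tokenizer {
    override val bosTokenId = 0
    override val eosTokenId = 1
    override val vocabSize = 128
    override fun encode(text: String) = text.map { it.code }
    override fun decode(tokens: List<Int>) =
        tokens.filter { it > 1 }.map { it.toChar() }.joinToString("")
}

// A runner-agnostic stop check replaces the old GGUFTokenizer instanceof hack.
fun isStopToken(tokenizer: Tokenizer, token: Int): Boolean =
    token == tokenizer.eosTokenId
```

With this in place, agent loops can test `isStopToken(tokenizer, next)` against any implementation instead of downcasting to GGUFTokenizer.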
Phase 3 of the unified pipeline plan. Tokenization is now a standalone pipeline stage in llm-core, independent of any specific runner.

- Move GGUFTokenizer from kllama to llm-core/tokenizer package
- Add typealias in kllama for backwards compatibility
- Create TokenizerFactory with fromGGUF(), fromTokenizerJson(), fromHuggingFace()
- Add skainet-io-gguf and kotlinx-io-core dependencies to llm-core

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
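The factory shape might look like this sketch. Only the three method names come from the commit; the dispatch logic, `StubTokenizer`, and the eos values are stand-ins for the real GGUF/JSON parsing:

```kotlin
// Illustrative TokenizerFactory sketch. The real factory parses GGUF
// metadata (tokenizer.ggml.*) or a HuggingFace tokenizer.json; here the
// dispatch is simulated on the file name only.
interface Tokenizer {
    fun encode(text: String): List<Int>
    val eosTokenId: Int
}

class StubTokenizer(override val eosTokenId: Int) : Tokenizer {
    override fun encode(text: String) = text.map { it.code }
}

object TokenizerFactory {
    fun fromGGUF(path: String): Tokenizer {
        require(path.endsWith(".gguf")) { "expected a .gguf file: $path" }
        // Real code: read the tokenizer metadata keys from the GGUF header.
        return StubTokenizer(eosTokenId = 2)
    }

    fun fromTokenizerJson(path: String): Tokenizer {
        require(path.endsWith("tokenizer.json")) { "expected tokenizer.json: $path" }
        // Real code: parse the HuggingFace fast-tokenizer JSON.
        return StubTokenizer(eosTokenId = 0)
    }

    fun fromHuggingFace(repoId: String): Tokenizer {
        // Real code: fetch tokenizer.json from the hub, then delegate.
        return fromTokenizerJson("$repoId/tokenizer.json")
    }
}
```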
…detection

Phase 2 of the unified pipeline plan. Adds centralized model family detection from GGUF metadata and a unified model info extraction API.

- Add ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL) with capabilities (supportsToolCalling, chatTemplateFamily)
- Add ModelRegistry.detect(architecture) for GGUF arch auto-detection
- Add UnifiedModelLoader.peek(source) to extract GGUFModelInfo without loading weights (architecture, family, dimensions)

DSL network definitions already exist for all major architectures except Gemma3n. CLI migration to OptimizedLLMRuntime is future work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
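The detection idea can be sketched as a mapping from the GGUF `general.architecture` string to a family with capability flags. The enum constants come from the commit message; the architecture strings, the `supportsToolCalling` values, and the `UNKNOWN` fallback are assumptions for illustration:

```kotlin
// Sketch of family detection from a GGUF architecture string.
// Capability values here are illustrative, not the project's actual table.
enum class ModelFamily(val supportsToolCalling: Boolean) {
    LLAMA(true), QWEN(true), GEMMA(false),
    APERTUS(false), BERT(false), VOXTRAL(false),
    UNKNOWN(false)
}

object ModelRegistry {
    fun detect(architecture: String): ModelFamily = when (architecture.lowercase()) {
        "llama" -> ModelFamily.LLAMA
        "qwen2", "qwen3" -> ModelFamily.QWEN
        "gemma", "gemma2", "gemma3" -> ModelFamily.GEMMA
        "bert" -> ModelFamily.BERT
        else -> ModelFamily.UNKNOWN
    }
}
```

A loader can then call `ModelRegistry.detect(info.architecture)` on the metadata returned by a header-only peek, before deciding whether to load weights at all.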
Phase 4 of the unified pipeline plan. New skainet-cli module that auto-detects model architecture from GGUF metadata and supports all modes (generate, chat, agent, demo) for any LLaMA-compatible model.

- New llm-apps/skainet-cli module with single entry point
- Auto-detects architecture via UnifiedModelLoader.peek()
- Supports --chat, --agent, --demo with tool calling for all models
- Registered as 'skainet' runner in smoke test script
- Existing per-model CLIs preserved (no breaking changes)

Usage:
  skainet -m model.gguf "prompt"   # auto-detect and generate
  skainet -m model.gguf --chat     # interactive chat
  skainet -m model.gguf --demo     # tool calling demo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tool calling test phase to smoke-test.sh that runs --demo with a prompt for models with toolCalling config. Add Qwen3-8B-Q4 to smoke test config.

- Add ToolCallingDemo.runSingleShot() for non-interactive tool calling
- Wire --demo with positional prompt to single-shot mode in kllama and skainet CLIs
- Add tool calling section to smoke-test.sh with [Tool Call] detection
- Add skainet runner to smoke-test.sh runner_args
- Increase kllama-cli memory to -Xmx42g -XX:MaxDirectMemorySize=64g
- Add Qwen3-8B-Q4 and toolCalling config to smoke-models.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
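The `[Tool Call]` check amounts to a substring scan over the single-shot demo output. As a Kotlin sketch (the helper name is hypothetical; the actual smoke test does this in shell):

```kotlin
// Hypothetical equivalent of the smoke test's pass condition: the
// non-interactive --demo run is considered successful if any output
// line contains the "[Tool Call]" marker emitted by the demo.
fun containsToolCall(demoOutput: String): Boolean =
    demoOutput.lineSequence().any { "[Tool Call]" in it }
```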
AsciiDoc documentation in Antora site format with four Divio categories: Tutorials: - Getting started with the skainet CLI - Tool calling with any model via ChatSession - Running smoke tests How-to Guides: - Add a new model architecture (DSL vs hand-coded) - Add a compute backend - Add a custom tool - Use the unified CLI Reference: - Architecture overview and module structure - Inference pipeline stages - Tokenizer API and TokenizerFactory - ChatSession API - Model Registry and UnifiedModelLoader - CLI reference (skainet + model-specific CLIs) Explanation: - Pipeline design decisions (why stages are separated) - DSL networks vs hand-coded runtimes (trade-offs) - Tokenizer internals (SentencePiece, BPE, WordPiece) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docs.yml workflow: builds on push to main/develop, deploys to GitHub Pages from develop branch
- Uses dockerized Antora 3.1 with asciidoctor-kroki for Mermaid diagrams
- Add antora-playbook.yml with Kroki server integration
- Convert ASCII diagrams to Mermaid in pipeline, architecture, tool-calling, and pipeline-design pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
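A minimal playbook wiring in the Kroki extension might look like the fragment below; the site title and content source values are placeholders, only the asciidoctor-kroki extension itself is named in the commit:

```yaml
# Illustrative antora-playbook.yml fragment (values are placeholders).
site:
  title: SKaiNET Docs
content:
  sources:
    - url: .
      start_path: docs
asciidoc:
  extensions:
    - asciidoctor-kroki
  attributes:
    kroki-server-url: https://kroki.io
```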
Implements the unified inference pipeline for SKaiNET Transformers, resolving #49 and building on the tool calling foundation from #46.

## Summary

This branch decouples tool calling from the kllama runner, creates a unified model pipeline with architecture auto-detection, and adds comprehensive Antora documentation.

### Phase 1: Decouple Tool Calling
- Enhance Tokenizer interface with eosTokenId, bosTokenId, vocabSize
- Create ChatSession abstraction in llm-agent (any runner gets tool calling for free)
- Refactor ToolCallingDemo and AgentCli to accept Tokenizer, not GGUFTokenizer
- Fix JavaAgentLoop instanceof hack

### Phase 2: Model Registry
- Add ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL)
- Add ModelRegistry.detect() for GGUF architecture auto-detection
- Add UnifiedModelLoader.peek() to extract model info without loading

### Phase 3: Tokenization Pipeline
- Move GGUFTokenizer from kllama to llm-core (all runners can use it)
- Create TokenizerFactory with fromGGUF(), fromTokenizerJson(), fromHuggingFace()

### Phase 4: Unified CLI
- New skainet-cli module: single entry point for all GGUF models
- Auto-detects architecture, supports --chat/--agent/--demo modes

### Smoke Tests
- Add tool calling test phase with [Tool Call] detection
- Add ToolCallingDemo.runSingleShot() for non-interactive testing
- Add Qwen3-8B-Q4 to smoke test config

### Documentation (Antora + Divio)
- 19 AsciiDoc pages: tutorials, how-to, reference, explanation
- Mermaid diagrams via Kroki for pipeline, architecture, agent loop
- GitHub Actions workflow for docs build and GitHub Pages deployment

Refs: #46

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
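The ChatSession abstraction from Phase 1 can be sketched as a class that bundles a runtime, a tokenizer, and model metadata, so any runner gets the same generation loop. Everything except the class name is illustrative here; `Runtime`, `CharTokenizer`, and `scriptedRuntime` are toy stand-ins, not project APIs:

```kotlin
// Sketch of the ChatSession idea: runtime + tokenizer + metadata in one
// object, with a generation loop that stops on the tokenizer's EOS id.
interface Tokenizer {
    fun encode(text: String): List<Int>
    fun decode(tokens: List<Int>): String
    val eosTokenId: Int
}

fun interface Runtime {
    // Returns the next token id given the context so far.
    fun nextToken(context: List<Int>): Int
}

class ChatSession(
    private val runtime: Runtime,
    private val tokenizer: Tokenizer,
    val modelName: String,
) {
    fun generate(prompt: String, maxTokens: Int = 32): String {
        val context = tokenizer.encode(prompt).toMutableList()
        val output = mutableListOf<Int>()
        repeat(maxTokens) {
            val next = runtime.nextToken(context)
            if (next == tokenizer.eosTokenId) return tokenizer.decode(output)
            context += next
            output += next
        }
        return tokenizer.decode(output)
    }
}

// Toy pieces for demonstration: char-level tokenizer and a runtime that
// replays a fixed reply followed by EOS, ignoring the context.
class CharTokenizer : Tokenizer {
    override val eosTokenId = -1
    override fun encode(text: String) = text.map { it.code }
    override fun decode(tokens: List<Int>) = tokens.map { it.toChar() }.joinToString("")
}

fun scriptedRuntime(reply: String, eosTokenId: Int): Runtime {
    val script = (reply.map { it.code } + eosTokenId).iterator()
    return Runtime { script.next() }
}
```

Because the loop only touches the `Tokenizer` and `Runtime` interfaces, swapping in a GGUF-backed tokenizer or a different runner requires no changes to the session itself, which is what lets tool calling work across runners.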
Custom image based on node:20-alpine with:

- Antora 3.1 site generator
- asciidoctor-kroki for diagram blocks
- @mermaid-js/mermaid-cli with Chromium for local SVG rendering
- No external Kroki server dependency

The GitHub Actions workflow builds the image from docs/.docker/Dockerfile, then uses it to generate the site. Mermaid diagrams are rendered locally inside the container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
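A Dockerfile along these lines would match that description; exact package pins, paths, and environment variables are assumptions, only the base image and the four installed components come from the commit:

```dockerfile
# Illustrative sketch of docs/.docker/Dockerfile (versions/paths assumed).
FROM node:20-alpine

# Chromium for mermaid-cli's local SVG rendering
RUN apk add --no-cache chromium

# Antora 3.1, the Kroki asciidoctor extension, and mermaid-cli
RUN npm install -g @antora/cli@3.1 @antora/site-generator@3.1 \
    asciidoctor-kroki @mermaid-js/mermaid-cli

# Point puppeteer (used by mermaid-cli) at the system Chromium
# instead of downloading its own browser
ENV PUPPETEER_SKIP_DOWNLOAD=1 \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

ENTRYPOINT ["antora"]
```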
feat: unified model pipeline with decoupled tool calling (#49)