
Feature/49 tool calling pipeline (#50)

Merged
michalharakal merged 9 commits into develop from feature/49-tool-calling-pipeline on Apr 11, 2026
Conversation

michalharakal (Contributor) commented Apr 11, 2026

feat: unified model pipeline with decoupled tool calling (#49)
Implements the unified inference pipeline for SKaiNET Transformers,
resolving #49 and building on the tool calling foundation from #46.

Summary

This branch decouples tool calling from the kllama runner, creates a
unified model pipeline with architecture auto-detection, and adds
comprehensive Antora documentation.

Phase 1: Decouple Tool Calling

  • Enhance Tokenizer interface with eosTokenId, bosTokenId, vocabSize
  • Create ChatSession abstraction in llm-agent (any runner gets tool
    calling for free)
  • Refactor ToolCallingDemo and AgentCli to accept Tokenizer, not
    GGUFTokenizer
  • Fix JavaAgentLoop instanceof hack
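The enhanced `Tokenizer` interface from Phase 1 might look like the following minimal sketch. The property names come from the bullet list above; `CharTokenizer` is a hypothetical toy implementation added purely for illustration:

```kotlin
// Sketch of the enhanced Tokenizer interface described above.
// Exposing eosTokenId/bosTokenId/vocabSize lets callers (e.g. an agent
// loop) stop generation without casting to a concrete GGUFTokenizer.
interface Tokenizer {
    val eosTokenId: Int
    val bosTokenId: Int
    val vocabSize: Int
    fun encode(text: String): List<Int>
    fun decode(ids: List<Int>): String
}

// Hypothetical toy implementation: one token per character.
class CharTokenizer : Tokenizer {
    override val bosTokenId = 0
    override val eosTokenId = 1
    override val vocabSize = 258 // 256 byte values + BOS + EOS
    override fun encode(text: String) = text.map { it.code + 2 }
    override fun decode(ids: List<Int>) =
        ids.filter { it >= 2 }.map { (it - 2).toChar() }.joinToString("")
}
```

Any class satisfying this contract — GGUF, HF BPE, Tekken, or BERT WordPiece — can then drive the agent loop.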

Phase 2: Model Registry

  • Add ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL)
  • Add ModelRegistry.detect() for GGUF architecture auto-detection
  • Add UnifiedModelLoader.peek() to extract model info without loading
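Detection maps the GGUF `general.architecture` metadata string to a `ModelFamily`. A minimal sketch follows; the enum values come from the bullet list above, but the concrete architecture strings and the capability flags are illustrative placeholders, not the real registry:

```kotlin
// Sketch of ModelFamily detection from a GGUF architecture string.
// supportsToolCalling values here are placeholders for illustration.
enum class ModelFamily(val supportsToolCalling: Boolean) {
    LLAMA(true), QWEN(true), GEMMA(false),
    APERTUS(false), BERT(false), VOXTRAL(false), UNKNOWN(false)
}

object ModelRegistry {
    // Assumed architecture prefixes ("llama", "qwen3", ...); the real
    // registry reads the exact string from GGUF metadata.
    fun detect(architecture: String): ModelFamily = when {
        architecture.startsWith("llama") -> ModelFamily.LLAMA
        architecture.startsWith("qwen") -> ModelFamily.QWEN
        architecture.startsWith("gemma") -> ModelFamily.GEMMA
        architecture.startsWith("bert") -> ModelFamily.BERT
        else -> ModelFamily.UNKNOWN
    }
}
```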

Phase 3: Tokenization Pipeline

  • Move GGUFTokenizer from kllama to llm-core (all runners can use it)
  • Create TokenizerFactory with fromGGUF(), fromTokenizerJson(),
    fromHuggingFace()
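The factory's three entry points can be sketched as a dispatch on the tokenizer source. The method names come from the bullet above; the file-name-based dispatch and the string return type are assumptions made so the sketch stays self-contained:

```kotlin
// Sketch of a TokenizerFactory front-end that picks a loader by source.
// In the real module each branch would parse the file and build a
// Tokenizer; here we only report which loader would be chosen.
object TokenizerFactory {
    fun loaderFor(path: String): String = when {
        path.endsWith(".gguf") -> "fromGGUF"                   // vocab embedded in GGUF metadata
        path.endsWith("tokenizer.json") -> "fromTokenizerJson" // HF fast-tokenizer JSON
        else -> "fromHuggingFace"                              // resolve via a model repo id
    }
}
```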

Phase 4: Unified CLI

  • New skainet-cli module: single entry point for all GGUF models
  • Auto-detects architecture, supports --chat/--agent/--demo modes

Smoke Tests

  • Add tool calling test phase with [Tool Call] detection
  • Add ToolCallingDemo.runSingleShot() for non-interactive testing
  • Add Qwen3-8B-Q4 to smoke test config

Documentation (Antora + Divio)

  • 19 AsciiDoc pages: tutorials, how-to, reference, explanation
  • Mermaid diagrams via Kroki for pipeline, architecture, agent loop
  • GitHub Actions workflow for docs build and GitHub Pages deployment

michalharakal and others added 9 commits on April 11, 2026 at 10:25
Phase 1 of the unified pipeline plan. Tool calling no longer requires
GGUFTokenizer — any Tokenizer implementation works.

- Extend Tokenizer interface with eosTokenId, bosTokenId, vocabSize
- Add ChatSession in llm-agent that bundles runtime + tokenizer + metadata
- Refactor ToolCallingDemo and AgentCli to accept Tokenizer, not GGUFTokenizer
- Remove GGUFTokenizer cast from kllama Main.kt chat/agent/demo dispatch
- Fix JavaAgentLoop instanceof hack with tokenizer.eosTokenId
- Update all Tokenizer implementations (GGUF, HF BPE, Tekken, BERT)
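The ChatSession idea — bundle any token-level runner with any `Tokenizer` so tool calling needs no runner-specific casts — can be sketched as below. The `runner` function type, the stop condition, and the minimal `Tokenizer` subset are illustrative assumptions, not the llm-agent API:

```kotlin
// Minimal Tokenizer subset, just enough for the sketch.
interface Tokenizer {
    val eosTokenId: Int
    fun encode(text: String): List<Int>
    fun decode(ids: List<Int>): String
}

// Sketch of ChatSession: pair a generic next-token runner with a
// tokenizer. Generation stops at eosTokenId via the interface, which is
// what replaces the old instanceof/cast on GGUFTokenizer.
class ChatSession(
    val tokenizer: Tokenizer,
    val runner: (List<Int>) -> Int, // next-token function over token ids
) {
    fun generate(prompt: String, maxTokens: Int = 64): String {
        val ids = tokenizer.encode(prompt).toMutableList()
        val out = mutableListOf<Int>()
        repeat(maxTokens) {
            val next = runner(ids)
            if (next == tokenizer.eosTokenId) return tokenizer.decode(out)
            ids += next
            out += next
        }
        return tokenizer.decode(out)
    }
}
```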

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 of the unified pipeline plan. Tokenization is now a standalone
pipeline stage in llm-core, independent of any specific runner.

- Move GGUFTokenizer from kllama to llm-core/tokenizer package
- Add typealias in kllama for backwards compatibility
- Create TokenizerFactory with fromGGUF(), fromTokenizerJson(), fromHuggingFace()
- Add skainet-io-gguf and kotlinx-io-core dependencies to llm-core

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…detection

Phase 2 of the unified pipeline plan. Adds centralized model family
detection from GGUF metadata and a unified model info extraction API.

- Add ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL)
  with capabilities (supportsToolCalling, chatTemplateFamily)
- Add ModelRegistry.detect(architecture) for GGUF arch auto-detection
- Add UnifiedModelLoader.peek(source) to extract GGUFModelInfo without
  loading weights (architecture, family, dimensions)

DSL network definitions already exist for all major architectures
except Gemma3n. CLI migration to OptimizedLLMRuntime is future work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4 of the unified pipeline plan. New skainet-cli module that
auto-detects model architecture from GGUF metadata and supports all
modes (generate, chat, agent, demo) for any LLaMA-compatible model.

- New llm-apps/skainet-cli module with single entry point
- Auto-detects architecture via UnifiedModelLoader.peek()
- Supports --chat, --agent, --demo with tool calling for all models
- Registered as 'skainet' runner in smoke test script
- Existing per-model CLIs preserved (no breaking changes)

Usage:
  skainet -m model.gguf "prompt"        # auto-detect and generate
  skainet -m model.gguf --chat          # interactive chat
  skainet -m model.gguf --demo          # tool calling demo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tool calling test phase to smoke-test.sh that runs --demo with a
prompt for models with toolCalling config. Add Qwen3-8B-Q4 to smoke
test config.

- Add ToolCallingDemo.runSingleShot() for non-interactive tool calling
- Wire --demo with positional prompt to single-shot mode in kllama and skainet CLIs
- Add tool calling section to smoke-test.sh with [Tool Call] detection
- Add skainet runner to smoke-test.sh runner_args
- Increase kllama-cli memory to -Xmx42g -XX:MaxDirectMemorySize=64g
- Add Qwen3-8B-Q4 and toolCalling config to smoke-models.json
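The `--demo` wiring described above — a positional prompt triggers one non-interactive `runSingleShot()` turn, no prompt starts the interactive loop — can be sketched as an argument-dispatch helper. `DemoMode` and `demoModeFor` are hypothetical names for illustration; only the flag names come from the commit:

```kotlin
// Sketch of the --demo dispatch: positional prompt -> single-shot,
// otherwise interactive. Assumes an arg list like
// ["-m", "model.gguf", "--demo", "What is 2+2?"].
enum class DemoMode { SINGLE_SHOT, INTERACTIVE }

fun demoModeFor(args: List<String>): DemoMode {
    val positional = args
        .filterNot { it.startsWith("-") }
        .drop(1) // drop the model path that follows -m in the real CLI
    return if ("--demo" in args && positional.isNotEmpty()) DemoMode.SINGLE_SHOT
    else DemoMode.INTERACTIVE
}
```

Single-shot mode is what lets smoke-test.sh run the demo headlessly and grep the output for the `[Tool Call]` marker.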

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AsciiDoc documentation in Antora site format with four Divio categories:

Tutorials:
- Getting started with the skainet CLI
- Tool calling with any model via ChatSession
- Running smoke tests

How-to Guides:
- Add a new model architecture (DSL vs hand-coded)
- Add a compute backend
- Add a custom tool
- Use the unified CLI

Reference:
- Architecture overview and module structure
- Inference pipeline stages
- Tokenizer API and TokenizerFactory
- ChatSession API
- Model Registry and UnifiedModelLoader
- CLI reference (skainet + model-specific CLIs)

Explanation:
- Pipeline design decisions (why stages are separated)
- DSL networks vs hand-coded runtimes (trade-offs)
- Tokenizer internals (SentencePiece, BPE, WordPiece)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docs.yml workflow: builds on push to main/develop, deploys to
  GitHub Pages from develop branch
- Uses dockerized Antora 3.1 with asciidoctor-kroki for Mermaid diagrams
- Add antora-playbook.yml with Kroki server integration
- Convert ASCII diagrams to Mermaid in pipeline, architecture,
  tool-calling, and pipeline-design pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the unified inference pipeline for SKaiNET Transformers,
resolving #49 and building on the tool calling foundation from #46.

## Summary

This branch decouples tool calling from the kllama runner, creates a
unified model pipeline with architecture auto-detection, and adds
comprehensive Antora documentation.

### Phase 1: Decouple Tool Calling
- Enhance Tokenizer interface with eosTokenId, bosTokenId, vocabSize
- Create ChatSession abstraction in llm-agent (any runner gets tool
  calling for free)
- Refactor ToolCallingDemo and AgentCli to accept Tokenizer, not
  GGUFTokenizer
- Fix JavaAgentLoop instanceof hack

### Phase 2: Model Registry
- Add ModelFamily enum (LLAMA, QWEN, GEMMA, APERTUS, BERT, VOXTRAL)
- Add ModelRegistry.detect() for GGUF architecture auto-detection
- Add UnifiedModelLoader.peek() to extract model info without loading

### Phase 3: Tokenization Pipeline
- Move GGUFTokenizer from kllama to llm-core (all runners can use it)
- Create TokenizerFactory with fromGGUF(), fromTokenizerJson(),
  fromHuggingFace()

### Phase 4: Unified CLI
- New skainet-cli module: single entry point for all GGUF models
- Auto-detects architecture, supports --chat/--agent/--demo modes

### Smoke Tests
- Add tool calling test phase with [Tool Call] detection
- Add ToolCallingDemo.runSingleShot() for non-interactive testing
- Add Qwen3-8B-Q4 to smoke test config

### Documentation (Antora + Divio)
- 19 AsciiDoc pages: tutorials, how-to, reference, explanation
- Mermaid diagrams via Kroki for pipeline, architecture, agent loop
- GitHub Actions workflow for docs build and GitHub Pages deployment

Refs: #46

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Custom image based on node:20-alpine with:
- Antora 3.1 site generator
- asciidoctor-kroki for diagram blocks
- @mermaid-js/mermaid-cli with Chromium for local SVG rendering
- No external Kroki server dependency

The GitHub Actions workflow builds the image from docs/.docker/Dockerfile
then uses it to generate the site. Mermaid diagrams are rendered locally
inside the container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
michalharakal merged commit f9110fb into develop on Apr 11, 2026 (1 of 4 checks passed)
michalharakal deleted the feature/49-tool-calling-pipeline branch on April 11, 2026 at 10:58