Skip to content

Releases: mudler/LocalAI

v4.5.6

Choose a tag to compare

@mudler mudler released this 30 Jun 15:59
02b007a

What's Changed

👒 Dependencies

Other Changes

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #10560
  • fix(distributed): missing agent NATS permission by @ALameLlama in #10549
  • feat(distributed): SyncedMap component + migrate finetune/quant/agent-tasks to cross-replica state by @localai-bot in #10542
  • chore(fish-speech): drop the darwin/metal build target by @localai-bot in #10561
  • fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size by @localai-bot in #10563
  • ci(vibevoice): skip the ASR transcription e2e on release tag builds by @localai-bot in #10567
  • fix(gallery): match mmproj/model quant as a whole token so F16 no longer selects BF16 (#10559) by @localai-bot in #10564
  • fix(distributed): return empty backend list for agent nodes instead of failing backend.list (#10545) by @localai-bot in #10565
  • feat(distributed): add LOCALAI_DISTRIBUTED_SHARED_MODELS to skip staging on shared volumes (#10556) by @localai-bot in #10566
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 9956436c925a367daeab097598b1ea1f32d3503f by @localai-bot in #10533
  • fix(openresponses): bound resume-stream buffer and enforce response ownership by @localai-bot in #10569
  • chore: ⬆️ Update ggml-org/whisper.cpp to 0ae02cdb2c7317b50991367c165736ce42ed96ac by @localai-bot in #10532
  • chore: ⬆️ Update CrispStrobe/CrispASR to 6514c9da00b03a2f0f1b49a43fae4f3a01a41844 by @localai-bot in #10535
  • chore: ⬆️ Update ggml-org/llama.cpp to 0ed235ea2c17a19fc8238668653946721ed136fd by @localai-bot in #10536
  • fix(ik-llama): port multimodal path to mtmd API and bump to f96eaddb (#10534) by @localai-bot in #10568
  • feat(backends): add voice-detect + face-detect ggml backends (replace Python insightface/speaker-recognition) by @localai-bot in #10441
  • fix(kokoro): add explicit click dep so spacy CLI works on intel build by @localai-bot in #10572
  • fix(launcher): robust binary download/upgrade (resume, rate-limit, UX) by @localai-bot in #10575
  • fix(distributed): missing agent NATS permissions by @ALameLlama in #10571
  • fix(fish-speech): allow invalid_reference_casting so tokenizers builds on darwin by @localai-bot in #10573
  • fix(oci): retry layer downloads on transient network errors by @localai-bot in #10579
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #10585
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to c1790754d31bec0731ed5fddc9d5b9ff22ee19cd by @localai-bot in #10584
  • chore: ⬆️ Update CrispStrobe/CrispASR to 6b50f76e59700665358a1aabf5295597fa318e06 by @localai-bot in #10583
  • chore: ⬆️ Update ggml-org/llama.cpp to dbdaece23de9ac63f2e7ca9e6bfcdc4fc156a3fa by @localai-bot in #10582
  • chore: ⬆️ Update mudler/voice-detect.cpp to 3d510772357538c5182808ac7de2278b84824e24 by @localai-bot in #10581
  • chore: ⬆️ Update mudler/face-detect.cpp to 06914b077d52f90d5421299138e7be6bdd06b5e8 by @localai-bot in #10580
  • chore: ⬆️ Update vllm-metal (darwin) to v0.3.0.dev20260628073537 by @localai-bot in #10562
  • chore(recon): re-pin voice/face-detect to squashed release commits (+ graph-cache fix) by @localai-bot in #10591
  • fix(sglang): parse tool_call function arguments before applying the chat template by @pos-ei-don in #10558
  • feat(realtime): Semantic VAD EOU token by @richiejp in #10444
  • fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716) by @Dennisadira in #10448
  • fix(import): derive model name from selected GGUF for repo-root URIs by @Dennisadira in #10589
  • fix(functions): avoid quadratic-time debug logging in CleanupLLMResult / ParseFunctionCall by @pos-ei-don in #10592
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 3b6c9ca97cfcda8e68e719e6670d06379fcbe943 by @localai-bot in #10594
  • chore: ⬆️ Update ggml-org/llama.cpp to 6f4f53f2b7da54fcdbbecaaa734337c337ad6176 by @localai-bot in #10595
  • chore: ⬆️ Update localai-org/privacy-filter.cpp to 595f59630c69d361b5196f2aba2c71c873d0c13c by @localai-bot in #10596
  • chore: ⬆️ Update CrispStrobe/CrispASR to 3b93758f9725d400eca82976f895e4cec3f31260 by @localai-bot in #10597
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to f74a6fb87b315b2c3154166e075360e15021a61d by @localai-bot in #10598
  • fix(import): strip file:// scheme from model path for local imports by @Dennisadira in #10599
  • fix(tests): align openresponses test model name with GGUF-derived naming (#10589) by @localai-bot in #10609
  • fix(macos): staple the notarization ticket to the .app, not just the dmg by @localai-bot in #10606
  • fix(watchdog): persist UI-saved Check Interval across restarts (#10601) by @localai-bot in #10605
  • feat(config): default swa_full:true for sliding-window-attention models by @localai-bot in #10611

New Contributors

Full Changelog: v4.5.5...v4.5.6

v4.5.5

Choose a tag to compare

@mudler mudler released this 27 Jun 12:42
d11b202

What's Changed

Other Changes

  • fix(backends): repair release CI build/test breaks (kokoros, fish-speech, llama-cpp-quantization, sglang) by @localai-bot in #10547
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10544
  • fix(backends): whisper darwin run.sh loads whichever fallback lib exists (.so/.dylib) by @localai-bot in #10553

Full Changelog: v4.5.4...v4.5.5

v4.5.4

Choose a tag to compare

@mudler mudler released this 27 Jun 00:06
14b29eb

What's Changed

Other Changes

  • fix(backends): derive darwin RUN_BINARY from the exec line only by @localai-bot in #10541

Full Changelog: v4.5.3...v4.5.4

v4.5.3

Choose a tag to compare

@mudler mudler released this 26 Jun 23:45
f0d0bff

What's Changed

Other Changes

  • feat(macos): sign and notarize the DMG, app, and server binary by @localai-bot in #10510
  • fix(backends): set rpath on the piper darwin binary so it can load its bundled libs by @localai-bot in #10525
  • fix(backends): darwin packaging for silero-vad (last Linux-only Go backend) by @localai-bot in #10528
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10526
  • fix(nodes): show a node's existing labels on the detail view by @localai-bot in #10529
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #10531
  • chore: ⬆️ Update mudler/parakeet.cpp to f469a57270a1cc4554acb15febf60e56619673b9 by @localai-bot in #10530
  • fix(gpu-libs): bundle transitive deps of GPU runtime libs (#10537) by @localai-bot in #10539
  • fix(distributed): broadcast admin model-config changes across replicas by @localai-bot in #10540
  • fix(llama-cpp): stop reinterpreting plain-string message content as JSON (#10524) by @localai-bot in #10538

Full Changelog: v4.5.2...v4.5.3

v4.5.2

Choose a tag to compare

@mudler mudler released this 26 Jun 09:20
6afe127

What's Changed

Other Changes

  • fix(backends): make the opus backend build and package on macOS/Darwin by @localai-bot in #10523

Full Changelog: v4.5.1...v4.5.2

v4.5.1

Choose a tag to compare

@mudler mudler released this 26 Jun 08:39
f58dcef

What's Changed

Other Changes

  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10472
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 7ccf1d209588962b96eacca325b37e9b3e8faf5e by @localai-bot in #10456
  • chore: ⬆️ Update CrispStrobe/CrispASR to 96b2a6ee31d30389fed8a7ef1a54239b75231ddc by @localai-bot in #10465
  • chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 by @localai-bot in #10467
  • chore: ⬆️ Update ggml-org/whisper.cpp to 43d78af5be58f41d6ffbc227d608f104577741ea by @localai-bot in #10466
  • chore: ⬆️ Update mudler/parakeet.cpp to 89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a by @localai-bot in #10468
  • fix(agents): URL-decode collection/agent name path params (#10443) by @localai-bot in #10471
  • fix(distributed): track in-flight for SoundDetection requests by @localai-bot in #10475
  • refactor(distributed): make in-flight tracking coverage a compile-time contract by @localai-bot in #10476
  • fix(pii): load default detectors at startup + add LOCALAI_PII_DEFAULT_DETECTORS by @richiejp in #10474
  • i18n(id): update and complete Indonesian translations by @dedyf5 in #10480
  • fix(realtime): resolve model aliases for pipeline sub-models by @localai-bot in #10484
  • fix(backends): darwin/metal support for supertonic by @localai-bot in #10488
  • feat(backends): add darwin/metal build for liquid-audio by @localai-bot in #10486
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #10495
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #10491
  • feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) by @localai-bot in #10496
  • fix(config): per-device VRAM headroom for Blackwell defaults (#10485) by @localai-bot in #10494
  • feat(ui): data-driven hardware model recommendations + gallery surfacing by @localai-bot in #10500
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to d5507e33ae7ee2b7b41475f08044d3bde3b839ee by @localai-bot in #10498
  • chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to 0f37401bebe9b20c0160a888e592108fc1d17607 by @localai-bot in #10492
  • fix(backends): darwin/metal support across purego Go backends by @localai-bot in #10481
  • feat(backends): add darwin/metal (MPS) build for trl by @localai-bot in #10487
  • feat(llama-cpp): cpu_moe/n_cpu_moe options + generic upstream-flag passthrough by @localai-bot in #10490
  • chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 9dbe7ea26a01b30fccb117ae5e86807c1dc23d42 by @localai-bot in #10499
  • fix: correct scheme/host on self-referential URLs behind an HTTPS reverse proxy (#10482) by @localai-bot in #10504
  • chore: ⬆️ Update ggml-org/llama.cpp to 8be759e6f70d629638a7eb70db3824cbdcea370b by @localai-bot in #10501
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 8caa3f908ae6d4a4bef531e73b9a969f266a3d1f by @localai-bot in #10493
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10505
  • feat(vllm): macOS/Metal support via vllm-metal (MLX) by @localai-bot in #10489
  • feat: single-build ggml CPU_ALL_VARIANTS for llama-cpp + turboquant (x86/arm64/apple) by @localai-bot in #10497
  • fix(config): gate parallel-slot default on per-device VRAM too (#10485) by @localai-bot in #10507
  • fix(auth): make advisory locks dialect-aware and harden SQLite DSN by @localai-bot in #10509
  • feat(backends): darwin/Metal builds for vision C++/ggml backends (depth-anything, locate-anything, rfdetr-cpp, sam3-cpp) by @localai-bot in #10511
  • feat(backends): darwin build for the localvqe backend (acoustic echo cancellation) by @localai-bot in #10512
  • docs(backends): make OS coverage explicit + require darwin support for new backends by @localai-bot in #10516
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to b84902d2ad27c34f989f23947200c4b91b1568fd by @localai-bot in #10515
  • chore: bump localrecall for postgres per-connection timeouts by @localai-bot in #10517
  • chore: pin localrecall to tagged v0.6.3 by @localai-bot in #10518
  • fix(backends): quote $CURDIR in run.sh (fixes backends in paths with spaces) by @localai-bot in #10519
  • chore: ⬆️ Update CrispStrobe/CrispASR to 8f1218141b792b8868861c1af17ba1e361b05dc0 by @localai-bot in #10502
  • chore: ⬆️ Update ggml-org/llama.cpp to 9d5d882d8cd0f0a9283d87ed5e6fe3ee0d925fb1 by @localai-bot in #10514
  • feat(backends): darwin/Metal build for the privacy-filter backend by @localai-bot in #10513
  • feat(backends): make PreferDevelopmentBackends install the development image as primary by @localai-bot in #10520
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10521
  • fix(backends): ship the package/ dir for darwin go backend images by @localai-bot in #10522

Full Changelog: v4.5.0...v4.5.1

v4.5.0

Choose a tag to compare

@localai-bot localai-bot released this 23 Jun 21:31
deb430f

🎉 LocalAI 4.5.0 Release! 🚀




LocalAI 4.5.0 is out!

This release widens what LocalAI can perceive, sharpens the realtime voice API, and makes multi-user serving fast with zero configuration. Four new backends land, the React UI redesign ships in full, and distributed mode gets a robustness pass.

Highlights:

  • 👁️ See depth - new depth-anything backend (Depth Anything 3): monocular metric depth + camera pose, with a typed Depth RPC and POST /v1/depth.
  • 🔊 Hear events - new ced backend tags 527 AudioSet sound classes (baby cry, glass breaking, alarms) over REST and a VAD-decoupled realtime stream.
  • 🗣️ Speak on-device - new supertonic ONNX TTS backend: multilingual, espeak-free, fast cold start.
  • 🛡️ Filter PII with NER - new privacy-filter.cpp engine adds named-entity token classification alongside a regex secret detector.
  • 🎙️ Smarter realtime - sessions become speaker-aware (identity surfaced to the client and the LLM) and stay cheap on long calls through summarize-then-drop compaction.
  • ⚙️ Concurrent by default - prefix caching, Blackwell-tuned batch sizes, and VRAM-scaled concurrency turn continuous batching on without any config.
  • 🖼️ A redesigned UI - the UX overhaul lands end to end, while we keep improving user experience release after release.

Plus model aliases, word-level ASR timestamps, self-contained Vulkan backends, ds4 SSD streaming for 128 GB-class models, hardened distributed staging, and a broad set of fixes.

ui-home
The redesigned Home: console with a built-in assistant and chat.


📌 TL;DR

Area Summary
👁️ Depth perception New depth-anything C++/ggml backend (Depth Anything 3) - metric depth + camera pose, typed Depth RPC + POST /v1/depth, 8 GGUFs. Plus Depth Anything V2 gallery models.
🔊 Sound-event tagging New ced backend (CED AudioSet tagger, 527 classes) - POST /v1/audio/classification + VAD-decoupled realtime sound detection.
🗣️ On-device TTS New supertonic ONNX backend - multilingual, no espeak/G2P, 10 voices, fast cold start (CPU).
🛡️ PII gets a NER tier New privacy-filter.cpp backend - encoder/NER token classification scanning whole conversations, alongside a restricted-regex secret detector; NER-centric PII editor in the UI.
🎙️ Smarter realtime Speaker-aware conversations (identity → client and LLM), conversation compaction (summarize-then-drop), and OpenAI item.delete / item.truncate / input_audio_buffer.clear.
⚙️ Multi-user serving by default Prefix caching on by default, Blackwell batch (2048), VRAM-scaled n_parallel (continuous batching on out of the box) - concurrent throughput with no KV blow-up.
🔀 Model aliases Redirect/rename a model name to another configured model, swappable live, no client reconfig.
⏱️ Word-level ASR timestamps NeMo + CrispASR word timestamps, plumbed through the gRPC transcription path.
🖼️ The UI, redesigned A calmer, sharper interface lands end to end: new design language, shell/nav, ops/admin data-viz, sortable/mobile tables, unsaved-changes guards, restructured Cluster Nodes.
🛰️ Distributed staging hardened Cold-load staging detached from the request context (large models actually finish), staging progress broadcast across replicas, resumable downloader.

🚀 New Features & Major Enhancements

👁️ Depth Perception: depth-anything

A new native Go gRPC backend (#10352) dlopens depth-anything.cpp (a ggml port of Depth Anything 3) via purego - no Python at inference - for monocular metric depth + camera pose estimation on CPU. Depth has no native OpenAI endpoint, so the model is exposed three ways:

  • A typed Depth gRPC RPC + POST /v1/depth that returns the full output surface (depth map, stats, camera extrinsics 3×4 / intrinsics 3×3).
  • GenerateImage(src, dst) writes a min-max-normalized grayscale depth PNG.
  • Predict returns the depth + pose JSON blob.

Eight Depth Anything 3 GGUFs ship at mudler/depth-anything.cpp-gguf (base/small/large/giant + a monocular mono-large, q4_k/q8_0/f16/f32), with per-CPU-variant self-contained .so builds and the full hardware matrix (cpu, cuda12/13, intel-sycl, vulkan, l4t-arm64). This cycle also adds Depth Anything V2 gallery models (#10413, native version bump) and metric-large + nested metric entries (#10363).

🔗 PRs: #10352, #10413, #10363.


🔊 Sound-Event Classification: ced

A new backend (#10425) backed by ced.cpp - a C++/ggml port of CED (Xiaomi), a 527-class AudioSet tagger (baby cry, footsteps, glass breaking, alarms, dog bark...) with full PyTorch parity (f32 e2e 1.7e-7) and Apache-2.0 weights. CPU perf: f16 is ~1.55× faster than the PyTorch reference (~100× realtime), q8_0 uses 6.5× less memory.

  • REST: POST /v1/audio/classification (fully capability-registered: swagger, /api/instructions, auth feature, React capabilities.js, docs).
  • Realtime: opt-in pipeline.sound_detection emits conversation.item.sound_detection events, decoupled from VAD (a sound-only session runs with turn_detection: none, activating on sounds not speech), with client-driven or server-side windowing.
  • Gallery: 8 entries (ced-{base,tiny,mini,small}-{f16,q8}, 6 MB → 86 MB) at mudler/ced-gguf.

🔗 PR: #10425.


🗣️ On-Device TTS: supertonic

A new native Go gRPC TTS backend (#10342) runs Supertone's supertonic-3 flow-matching model (4 ONNX graphs) via ONNX Runtime - no Python, no espeak-ng / G2P (text preprocessing is NFKD + a Unicode-codepoint→token-id lookup). Upstream's MIT Go pipeline is vendored at a pinned commit and driven from a LocalAI gRPC server, mirroring sherpa-onnx's ONNX-runtime bundling - small image, fast cold start. Ships a supertonic-3 gallery entry (4 ONNX + 10 voice styles F1-F5/M1-M5, SHA256-pinned), with voice / language request mapping and steps/speed/silence knobs. CPU-only in this release; CUDA wiring is scaffolded for a follow-up.

🔗 PR: #10342.


🛡️ PII Filtering Gets a NER Tier: privacy-filter.cpp

PII filtering moves off the patched llama.cpp TokenClassify path onto a new standalone GGML backend, privacy-filter.cpp (#10360), serving OpenAI Privacy Filter NER token-classification models (CPU/CUDA/Vulkan). The filter is reworked to be NER-centric - an encoder/NER detection tier scans whole conversations as a single document - alongside a bounded restricted-regex secret-matching detector tier. Detections are labelled by source (ner vs pattern) with backend trace / confidence / debug observability, analyze/redact exposed as a synchronous API, and request filtering extended to completions, embeddings, edits and Ollama. The React UI gains a NER-centric PII editor, detector-models table, and middleware default-policy controls; the gallery gets a privacy-filter-multilingual token-classify model + an /import-model auto-detect importer. A post-merge pass (#10401) added live NER e2e coverage and review fixes.

🔗 PRs: #10360, #10401.


🎙️ Realtime Voice: Speaker-Aware and Self-Compacting

Speaker-aware conversations (#10424). The realtime voice-recognition gate now surfaces the recognized speaker to the client (a new conversation.item.speaker event - a non-breaking LocalAI extension) and feeds identity to the LLM for personalized replies (per-message OpenAI name field and/or a The current speaker is <Name>. system note). New pipeline.voice_recognition keys decouple surfacing from authorization: enforce: false resolves and surfaces a speaker without ever dropping a turn, while the gate still fails closed when enforcing. Multi-speaker histories stay correctly attributed (each user item carries its own speaker).

Conversation compaction - summarize-then-drop (#10446). Long realtime sessions used to either feed the whole growing buffer to the LLM (expensive on CPU as it grows) or silently forget old turns. Now the server can fold aged-out turns into a rolling summary instead, via an async, post-turn snapshot → summarize → commit compactor that never holds the conversation lock across the summarizer call and never evicts items without a summary replacing them. Plus the OpenAI-parity history events that were missing: conversation.item.delete, conversation.item.truncate, input_audio_buffer.clear.

pipeline:
  max_history_items: 6          # live window - recent turns kept verbatim
  compaction:
    enabled: true
    trigger_items: 12           # high-water mark; summarize overflow back down
    summary_model: ""           # optional small/cheap CPU model; default = pipeline LLM
    max_summary_tokens: 512

Also: configurable pipeline.max_history_items (#10331) and a WebRTC data-channel max-message-size raise + keep-alive fix (#10407).

🔗 PRs: #10424, #10446, #10331, #10407.


⚙️ Multi-User Serving, On by Default

Two related, config-only (no kernel) changes make concurrent serving fast without any tuning. Both only fill values the user left unset - explicit config always wins.

**Hardware-tune...

Read more

v4.4.3

Choose a tag to compare

@mudler mudler released this 13 Jun 23:27
4d3d54d

What's Changed

Other Changes

  • chore: ⬆️ Update CrispStrobe/CrispASR to d745bda4386ae0f9d1d2f23fff8ec95d76428221 by @localai-bot in #10260
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #10259
  • chore: ⬆️ Update antirez/ds4 to d881f2a05e8ff6bec001315a36b794b4aa310173 by @localai-bot in #10262
  • chore: ⬆️ Update mudler/parakeet.cpp to 9db92be63179a27201d3b88d5d40c545b2ac48ae by @localai-bot in #10263
  • feat(react-ui): add Indonesian language support by @dedyf5 in #10266
  • chore: ⬆️ Update ggml-org/llama.cpp to 4c6595503fe45d5a39f88d194e270f64c7424677 by @localai-bot in #10261
  • feat(backend): locate-anything-cpp (open-vocabulary object detection via ggml) by @localai-bot in #10264
  • fix(router): production-ready request router + auto-size batch for embedding/rerank by @richiejp in #10104
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10270
  • feat(parakeet-cpp): enable GGML_CUDA_GRAPHS in the cublas build by @localai-bot in #10273
  • fix(darwin): publish sherpa-onnx and speaker-recognition images for darwin/arm64 by @localai-bot in #10275
  • fix(crispasr): write piper TTS WAV at the model's native sample rate by @localai-bot in #10277
  • feat(crispasr): bundle espeak-ng and add piper TTS voices to the gallery by @localai-bot in #10283
  • chore: ⬆️ Update mudler/parakeet.cpp to b8012f11e5269126eddb7f4fd02f891a2ccc29b0 by @localai-bot in #10281
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #10279
  • fix(mlx): route vision-language models to the mlx-vlm backend by @localai-bot in #10274
  • fix(darwin): fix vibevoice-cpp build linkage + fail-safe go backend packaging by @localai-bot in #10276
  • fix(agents): emit chat event timestamps in milliseconds (#9867) by @aniruddh909 in #10243
  • fix(realtime): keep transcription model on a language-only session.update by @localai-bot in #10295
  • chore: ⬆️ Update mudler/locate-anything.cpp to 92c1682da792c1e8a5dec91acc2be4b02c742ded by @localai-bot in #10282
  • fix(config): backend-gate the top_k=40 sampler default (#6632) by @localai-bot in #10285
  • feat(gallery): add 60 piper TTS voices across 42 languages (Phase 2) by @localai-bot in #10296
  • fix(deps): bump cogito to fix MCP image-result panic (#10101) by @localai-bot in #10294
  • fix(neutts): pin torchaudio to match torch (fixes undefined symbol) (#9798) by @localai-bot in #10292
  • fix(gallery): make opus a meta backend for platform auto-selection (#9813) by @localai-bot in #10291
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10298
  • fix(gallery): correct meta-backend definitions for platform auto-selection by @localai-bot in #10299
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10302
  • ci(darwin): build the ds4 backend for darwin/arm64 (metal) by @localai-bot in #10303
  • fix(react-ui): stop Talk pipeline overflow and center collapsed-rail avatar by @localai-bot in #10305
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #10304
  • fix(react-ui): make agent chat timestamps format-agnostic (#9867) by @localai-bot in #10290
  • model: fix case-insensitive suffix matching and skip .bak files in ListFilesInModelPath by @pos-ei-don in #10306
  • fix(xsysinfo): container-aware total RAM detection (cgroup/lxcfs) (#8059) by @localai-bot in #10288
  • feat(distributed): declarative per-model scheduling via env/args by @localai-bot in #10308
  • feat(sherpa-onnx): add Kokoro TTS + multilingual Piper voices by @localai-bot in #10309
  • feat(omnivoice-cpp): add OmniVoice TTS backend (file + streaming, voice cloning + voice design) by @localai-bot in #10310
  • feat(i18n): add Korean (ko) translation by @moduvoice in #10312
  • feat(qwen3-tts-cpp): migrate to ServeurpersoCom/qwentts.cpp (streaming, speakers, voice design) by @localai-bot in #10316
  • feat(realtime): gate realtime pipeline voice models behind voice recognition by @localai-bot in #10319
  • chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.23.0 by @localai-bot in #10314
  • test(e2e): live-server voice-recognition gate test by @localai-bot in #10324

New Contributors

Full Changelog: v4.4.2...v4.4.3

v4.4.2

Choose a tag to compare

@mudler mudler released this 11 Jun 22:22
58cdc05

What's Changed

Other Changes

  • chore: ⬆️ Update ggml-org/llama.cpp to ac4cddeb0dbd778f650bf568f6f08344a06abe3a by @localai-bot in #10239
  • chore: ⬆️ Update CrispStrobe/CrispASR to 4b27392ffd0991a857594652cbb8b57e585bcd7b by @localai-bot in #10241
  • fix(vllm): parse tool_call function arguments before applying the chat template by @pos-ei-don in #10256
  • fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packages by @pos-ei-don in #10257

Full Changelog: v4.4.1...v4.4.2

v4.4.1

Choose a tag to compare

@mudler mudler released this 11 Jun 16:33
f618636

What's Changed

Other Changes

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #10245
  • chore: ⬆️ Update antirez/ds4 to 8384adf0f9fa0f3bb342dd925372de778b95b263 by @localai-bot in #10242
  • fix(vllm): restore compatibility with vLLM >= 0.22 (get_tokenizer moved to vllm.tokenizers) by @pos-ei-don in #10252
  • feat(realtime): stream the LLM / TTS / transcription pipeline stages by @localai-bot in #10176
  • docs: fix broken relref to realtime page by @localai-bot in #10255

New Contributors

Full Changelog: v4.4.0...v4.4.1