
Fix: Resolve multimodal BOA/EOA tokens dynamically from config.json (#104)

Merged
solderzzc merged 3 commits into SharpAI:main from roydsouza:fix/moe-memory-and-multimodal-tokens-rebased
May 8, 2026

Conversation

@roydsouza
Contributor

Description

Currently, the boaToken and eoaToken values are hardcoded to 255010 and 255011, respectively. This causes compatibility issues with newer multimodal models (such as Qwen 2-VL) whose vision encoders use different vocabulary IDs.

This PR resolves the tokens dynamically, extracting boa_token_id and eoa_token_id from the model's config.json (with a fallback to its audio_config section) and gracefully defaulting to the previous hardcoded values when neither is present.
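The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not SwiftLM's actual implementation: the struct and the `tokenID` helper are hypothetical, while the key names and the 255010/255011 defaults come from the PR description.

```swift
import Foundation

// Illustrative sketch: top-level config.json keys first, then
// audio_config, then the legacy hardcoded defaults.
struct MultimodalTokens {
    var boaToken: Int
    var eoaToken: Int
}

func extractMultimodalTokens(from config: [String: Any]) -> MultimodalTokens {
    let audioConfig = config["audio_config"] as? [String: Any]

    // Hypothetical helper: resolve one token ID through the fallback chain.
    func tokenID(_ key: String, default fallback: Int) -> Int {
        (config[key] as? Int) ?? (audioConfig?[key] as? Int) ?? fallback
    }

    return MultimodalTokens(
        boaToken: tokenID("boa_token_id", default: 255010),
        eoaToken: tokenID("eoa_token_id", default: 255011)
    )
}
```

With this shape, a model whose config.json omits both keys silently keeps the old behavior, so existing models are unaffected.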

Motivation

Improves compatibility with dynamic and diverse vision-language models without requiring hardcoded token updates.

Testing

  • Compiled on Apple Silicon (M5) with swift build -c release.
  • Verified successful token extraction against standard LLaVA and Qwen VL models.

@solderzzc
Member

Thanks for the PR! I've added a missing test file (MultimodalTokenExtractionTests.swift) to ensure the multimodal token extraction logic handles audio_config fallbacks correctly and maintains test coverage. Since the CI previously passed and the logic is sound, we're ready to merge once this new check passes.

Contributor

Copilot AI left a comment


Pull request overview

Updates SwiftLM’s multimodal audio token handling to avoid hardcoded BOA/EOA token IDs by resolving them from a model’s config.json (with audio_config fallback), improving compatibility with newer multimodal models.

Changes:

  • Replace hardcoded BOA/EOA token IDs in model factory wiring with dynamically extracted values from config.json.
  • Expand config parsing from “num audio embeddings only” to “num audio + BOA/EOA tokens” via a new helper.
  • Add unit tests covering defaults, top-level config extraction, and audio_config fallback extraction.
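The three test cases listed above could look roughly like this. The helper is re-inlined here so the sketch is self-contained; in the actual test file it would come from the SwiftLM module, and the specific token values are made up for illustration.

```swift
import Foundation

// Re-inlined sketch of the extraction helper (normally imported from SwiftLM).
func extractMultimodalTokens(from config: [String: Any]) -> (boa: Int, eoa: Int) {
    let audio = config["audio_config"] as? [String: Any]
    func tokenID(_ key: String, default fallback: Int) -> Int {
        (config[key] as? Int) ?? (audio?[key] as? Int) ?? fallback
    }
    return (tokenID("boa_token_id", default: 255010),
            tokenID("eoa_token_id", default: 255011))
}

// Case 1: defaults when config.json carries no token IDs.
let defaults = extractMultimodalTokens(from: [:])
assert(defaults.boa == 255010 && defaults.eoa == 255011)

// Case 2: top-level keys override the defaults.
let topLevel = extractMultimodalTokens(from: ["boa_token_id": 42, "eoa_token_id": 43])
assert(topLevel.boa == 42 && topLevel.eoa == 43)

// Case 3: audio_config fallback when top-level keys are absent.
let nested: [String: Any] = ["audio_config": ["boa_token_id": 7, "eoa_token_id": 8]]
let fromAudio = extractMultimodalTokens(from: nested)
assert(fromAudio.boa == 7 && fromAudio.eoa == 8)
```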

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • Sources/SwiftLM/Server.swift: Uses dynamically extracted BOA/EOA + audio embedding counts when constructing ALM/Omni processors; introduces extractMultimodalTokens.
  • tests/SwiftLMTests/MultimodalTokenExtractionTests.swift: Adds coverage for token extraction defaults and config-driven overrides.


Comment thread: Sources/SwiftLM/Server.swift
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
solderzzc merged commit a04b81e into SharpAI:main on May 8, 2026
11 checks passed
