feat: implement multi-credential priority call design by fengluodb · Pull Request #2468 · volcengine/OpenViking

fengluodb · 2026-06-05T11:11:26Z

Description

为 OpenViking 客户端（ov.conf）增加多模型凭证 + 按序自动 fallback能力：同一个 VLM / Embedding 模型可配置一个扁平有序的凭证列表（数组顺序即调用优先级，每条带稳定 id），调用失败时按错误类型自动 fallback
到下一条凭证，并在高优先级凭证冷却后自动 failback。

Related Issue

Type of Change

[✓] New feature (non-breaking change that adds functionality)
[✓] Bug fix (non-breaking change that fixes an issue)
[✓] Refactoring (no functional changes)

Changes Made

核心特性

• 新增 OrderedCredentialSwitcher（N 级有序凭证切换状态机，含逐级 failback），MultiCredentialVLM 与 FailoverEmbedder 包装器；配置层 VLMConfig.credentials / EmbeddingModelConfig.credentials，旧 backup
字段自动归一化兼容。
• 鉴权方式从旧顶层配置切换为 credentials 时，模型/维度未变可无损复用已有向量（allow_metadata_override，默认 False，dimension 变化仍强制 rebuild）。

错误分类与切换策略

• 拆分错误类：PERMANENT(400 参数错误) / 新增 AUTH(401/403/欠费) / 新增 CONTENT_SAFETY / INPUT_TOO_LARGE / QUOTA_EXCEEDED / TRANSIENT / UNKNOWN。
• 切换策略：请求级错误（400 / 输入过大 / 内容安全）fail-fast 并 re-raise 原始异常；凭证级 AUTH 与 QUOTA_EXCEEDED 前移；TRANSIENT 用尽底层重试后前移；UNKNOWN 保守前移。
• circuit breaker 对 AUTH 同样立即熔断。

评审整改

• get_active_index() 改为纯读、无副作用；新增 maybe_failback() 专司 failback（切换时打 info 日志）。
• 移除全局 total_max_retries，凭证耗尽仅由 idx >= n 判定（修复 >10 凭证提前失败）。
• EmbeddingCredential 移除 6 个模型行为字段（修复 or 合并吞掉 0/False）。
• FailoverEmbedder.get_dimension 抛错替代魔法值 2048。
• token usage 改用公开访问器；修复 embedder 共享单例 tracker 导致的重复累加。

Testing

[✓] I have added tests that prove my fix is effective or that my feature works
[✓] New and existing unit tests pass locally with my changes
[✓] I have tested this on the following platforms:
[✓] macOS

测试：test_ordered_credential_switcher / test_vlm_failover / test_failover_embedder_permanent_advance / test_embedding_credential_model_override / test_model_retry / test_circuit_breaker，共 127 passed。

Checklist

[✓] My code follows the project's coding style
[✓] I have performed a self-review of my code
[✓] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation（ov.conf 文档/示例待补）
[✓] My changes generate no new warnings
[ ] Any dependent changes have been merged and published

Additional Notes

• 设计前提：同一模型的所有凭证指向同一逻辑模型（endpoint 可不同）。因此请求级错误换凭证无意义 → fail-fast；凭证级错误（key 失效/欠费）才前移。
• 每条凭证的瞬时重试由其底层实例的 max_retries 负责，无全局重试上限。
• 向后兼容：旧 backup、旧顶层鉴权配置均自动归一化为 credentials，存量配置零改动。

github-actions · 2026-06-05T11:13:24Z

PR Reviewer Guide 🔍

(Review updated until commit `159b88c`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Add multi-credential support for embedders Relevant files: openviking_cli/utils/config/embedding_config.py openviking/models/embedder/base.py tests/unit/test_failover_embedder_permanent_advance.py tests/unit/test_embedding_credential_model_override.py Sub-PR theme: Add multi-credential support for VLMs Relevant files: openviking_cli/utils/config/vlm_config.py openviking/models/vlm/base.py openviking/models/vlm/init.py tests/unit/test_vlm_failover.py Sub-PR theme: Implement OrderedCredentialSwitcher and error classification updates Relevant files: openviking/utils/model_retry.py tests/unit/test_ordered_credential_switcher.py
⚡ Recommended focus areas for review Potential regression in legacy backup config handling When migrating legacy backup config to credentials, the code clears `self.backup` before checking if there are explicit credentials. This could lead to loss of backup config if explicit credentials are added but migration logic runs first. Verify that the migration order preserves explicit credentials. def _normalize_credentials(self): """Normalize credentials configuration: 1. Migrate legacy backup config to credentials 2. Migrate top-level credentials to credentials list 3. Assign default IDs to credentials without IDs 4. Apply top-level defaults to credentials """ migrated_credentials = [] # Step 1: Migrate legacy backup config if self.backup is not None and self.backup._has_any_config(): # Primary credential from top-level primary_cred = VLMCredential( id="legacy-primary", provider=self.provider, model=self.model, api_key=self.api_key, api_base=self.api_base, api_version=self.api_version, forward_api_key=self.forward_api_key, extra_headers=self.extra_headers, extra_request_body=self.extra_request_body, stream=self.stream, ) migrated_credentials.append(primary_cred) # Backup credential backup_cred = VLMCredential( id="legacy-backup", provider=self.backup.provider, model=self.backup.model, api_key=self.backup.api_key, api_base=self.backup.api_base, api_version=self.backup.api_version, forward_api_key=self.backup.forward_api_key, extra_headers=self.backup.extra_headers, extra_request_body=self.backup.extra_request_body, stream=self.backup.stream, ) migrated_credentials.append(backup_cred) # Clear backup to avoid double processing self.backup = None # Step 2: If no credentials from backup migration and no explicit credentials, # create credentials from top-level config if not migrated_credentials and not self.credentials: if self._has_legacy_provider_config(): migrated_credentials.append( VLMCredential( id="default", provider=self.provider, model=self.model, api_key=self.api_key, api_base=self.api_base, api_version=self.api_version, forward_api_key=self.forward_api_key, extra_headers=self.extra_headers, extra_request_body=self.extra_request_body, stream=self.stream, ) ) # Step 3: Merge migrated credentials with existing credentials if migrated_credentials: # Only use migrated if no explicit credentials exist if not self.credentials: self.credentials = migrated_credentials # Step 4: Assign default IDs and apply top-level defaults Inconsistent token usage handling between FailoverEmbedder and MultiCredentialVLM FailoverEmbedder returns only the first embedder's token usage (claiming shared singleton), while MultiCredentialVLM merges all instances' trackers. Verify that embedder token trackers are actually shared to avoid undercounting. def get_token_usage(self) -> Dict[str, Any]: """Get combined token usage across credential instances. Embedders share a process-wide singleton token tracker (see ``_get_token_tracker``), so every wrapped embedder reports the same usage. Return a single instance's usage directly; merging would double-count. """ if not self._embedders: return {} return self._embedders[0].get_token_usage()

github-actions · 2026-06-05T11:15:19Z

PR Code Suggestions ✨

Latest suggestions up to 159b88c
Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix logger usage in MultiCredentialVLM to use get_logger The `MultiCredentialVLM` class uses `logging.getLogger(name)` directly, which is inconsistent with the rest of the codebase that uses `get_logger` from `openviking_cli.utils`. Additionally, the `logging` module is not imported, which would cause a runtime error. Let's fix this by using `get_logger` instead. openviking/models/vlm/base.py [12-622] from openviking.utils.model_retry import ( OrderedCredentialSwitcher, PrimaryBackupSwitcher, classify_api_error, ) from openviking_cli.utils import get_logger ... - self._logger = logging.getLogger(__name__) + self._logger = get_logger(__name__) Suggestion importance[1-10]: 6 __ Why: Identifies a missing import for `logging` and inconsistent logger usage. The fix uses the already imported `get_logger` from `openviking_cli.utils`, aligning with the rest of the codebase and resolving a potential runtime error.	Low

Previous suggestions

Suggestions up to commit 2cefa7a

Category	Suggestion	Impact
General	Add logger and centralize AllCredentialsFailedError 1) Add loguru logger initialization to this file (used by OrderedCredentialSwitcher). 2) Move AllCredentialsFailedError here to avoid duplicate definitions in embedder and vlm base files. openviking/utils/model_retry.py [340] -class OrderedCredentialSwitcher: +import threading\nimport time\nfrom typing import Any, List, Optional\n\nfrom openviking_cli.utils import get_logger\n\nlogger = get_logger(__name__)\n\n\nclass AllCredentialsFailedError(Exception):\n """Raised when all credentials in the chain have failed."""\n\n def __init__(self, errors: list[tuple[str, str, Exception, int]]):\n """Initialize the error with a list of credential failures.\n\n Args:\n errors: List of tuples containing (credential_id, error_class, exception, attempts)\n """\n self.errors = errors\n message = "All credentials failed:\\n" + "\\n".join(\n f" - {cred_id}: {error_class} - {exc}" for cred_id, error_class, exc, attempts in errors\n )\n super().__init__(message)\n\n\nclass OrderedCredentialSwitcher: Suggestion importance[1-10]: 6 __ Why: Centralizes `AllCredentialsFailedError` to avoid duplicate definitions in embedder and VLM base files, and adds proper logger initialization, improving code maintainability and consistency.	Low
	Use loguru logger for consistent logging Replace standard library logging with loguru logger (used throughout the codebase) for consistent structured logging. openviking/models/vlm/base.py [638] - self._logger = logging.getLogger(__name__) + from openviking_cli.utils import get_logger\n self._logger = get_logger(__name__) Suggestion importance[1-10]: 5 __ Why: Replaces standard library logging with `get_logger` (used elsewhere in the codebase) for consistent logging, a minor but useful maintainability improvement.	Low
	Import centralized exception class Remove duplicate AllCredentialsFailedError and import it from openviking.utils.model_retry instead. openviking/models/embedder/base.py [649-662] -class AllCredentialsFailedError(Exception):\n """Raised when all credentials in the chain have failed."""\n\n def __init__(self, errors: list[tuple[str, str, Exception, int]]):\n """Initialize the error with a list of credential failures.\n\n Args:\n errors: List of tuples containing (credential_id, error_class, exception, attempts)\n """\n self.errors = errors\n message = "All credentials failed:\\n" + "\\n".join(\n f" - {cred_id}: {error_class} - {exc}" for cred_id, error_class, exc, attempts in errors\n )\n super().__init__(message) +from openviking.utils.model_retry import AllCredentialsFailedError Suggestion importance[1-10]: 5 __ Why: Removes duplicate exception definition and supports importing from a centralized location, improving code consistency, though corresponding changes in other files (e.g., `vlm/base.py`, import files) would also be needed for full implementation.	Low

Add OrderedCredentialSwitcher for N-credential failover, MultiCredentialVLM and FailoverEmbedder for credential switching, support automatic migration from legacy backup config

…itcher

Separate the monolithic PERMANENT error class so the switcher reacts to the actual root cause: - 400 (request-level parameter error) -> PERMANENT, fail-fast: same request fails on every credential of the same model. - 401/403/unauthorized/accountoverdue -> new AUTH class: credential-level, advances to the next credential in multi-credential mode. - new CONTENT_SAFETY class (moderation rejections) and INPUT_TOO_LARGE -> fail-fast: switching credentials cannot help. classify_api_error now checks CONTENT_SAFETY before PERMANENT so a moderation message containing "400" is not misclassified. On fail-fast the failover wrappers re-raise the original exception (preserving type/info) instead of wrapping it, so callers can react (e.g. truncate on input_too_large). AllCredentialsFailedError is reserved for chain exhaustion. The legacy PrimaryBackupSwitcher also switches on AUTH to keep existing backup behavior.

get_active_index() previously mutated state: when a failback threshold was met it would decrement the active index. Because observability properties (active_credential_index / active_credential_id) call it, merely reading the current credential for logging or metrics could accidentally advance the failback state machine. Split the concern: get_active_index() is now a pure read, and a new public maybe_failback() performs the one-step failback (and logs an info line when the active credential index changes). The request loops in MultiCredentialVLM and FailoverEmbedder call maybe_failback() at the top of each attempt, so the failback behavior is unchanged while pure reads no longer have side effects.

The failover loops exited on `idx >= n OR total_attempts >= total_max_retries` (default 10). With more than 10 credentials, or when failback churn inflated the attempt count, this could raise AllCredentialsFailedError before every credential had actually been tried, leaving lower-priority credentials unused. Remove total_max_retries entirely from MultiCredentialVLM and FailoverEmbedder: credential exhaustion is now decided solely by reaching the end of the chain (idx >= n), and per-credential retries remain the responsibility of each underlying instance via its own max_retries. The aggregated error tuple now records the failing credential index instead of the attempt counter. Adds a regression test covering more than 10 credentials all being tried.

encoding_format, model_path, cache_dir, enable_fusion, res_level and max_video_frames describe how a model runs, not which credential is used. All credentials of a single embedding model share the same model, so these belong on the parent EmbeddingModelConfig, not on each credential. Keeping them on the credential forced a `cred.X or config.X` merge in _create_failover_embedder, which silently dropped explicit falsy values (enable_fusion=False, res_level=0, max_video_frames=0) and fell back to the parent value. Remove these six fields from EmbeddingCredential and read them directly from the parent config when building per-credential embedders. id/provider/model/ api_key/api_base/api_version/ak/sk/region/host/extra_headers remain credential-level.

FailoverEmbedder.get_dimension() returned a hardcoded 2048 when the first embedder had no get_dimension(). That path is reached only when wrapping sparse embedders, which have no fixed dense dimension; returning a fabricated 2048 silently feeds a wrong dimension to callers (e.g. schema creation). Delegate to the first embedder and raise AttributeError when it has no get_dimension(), surfacing the misuse instead of hiding it.

Two issues in the cross-instance token usage merge: 1. Encapsulation: FailoverVLM / MultiCredentialVLM / FailoverEmbedder reached into other instances' private _token_tracker. Add a public token_tracker accessor on VLMBase and use it in the VLM mergers. 2. Double counting in FailoverEmbedder: embedders share a process-wide singleton token tracker (_get_token_tracker), so all wrapped embedders point at the same object. Merging N identical trackers inflated usage N-fold. Return a single instance's usage directly instead of merging.

Splitting 401/403/unauthorized/accountoverdue out of PERMANENT into the new AUTH class (commit b477b9f) regressed the circuit breaker: it only tripped immediately on PERMANENT/QUOTA_EXCEEDED, so auth errors no longer opened the breaker right away. For a single embedding instance an auth failure (key invalid / no permission / overdue) is persistent and retrying is pointless, so the breaker should still trip immediately. Add ERROR_CLASS_AUTH to the immediate-trip set and update the classification tests to assert the new AUTH class (403 still trips the breaker).

Without setting _last_switch_time, maybe_failback() retreats to idx 0 immediately because the default 0 timestamp is always older than the 600s timeout, so the unavailable last credential never actually gets exercised.

github-actions · 2026-06-10T08:54:20Z

Persistent review updated to latest commit 159b88c

qin-ctx

本次 review 发现 3 个 blocking 问题，主要是新 credentials 归一化没有完整保留旧 providers 语义，以及 embedding credential 的 provider/model/auth fallback 在校验、维度、运行时构造之间不一致。

Non-blocking:

PR 已注明 ov.conf 文档/示例待补；这是用户可见新配置格式，建议补 credentials 示例和旧 backup/providers 迁移说明。
当前 commit 中有多条 fix/format，建议合并前 squash 或改成可追踪的 conventional commits.

qin-ctx

本次 review 仍发现 1 个 blocking 问题：credentials 模式跳过了 embedding provider-specific 校验，会放行 litellm 未设置 dimension 等无效配置，导致 schema 维度和运行时 embedder 行为不一致。

Non-blocking:

CI lint 当前失败：ruff format --check 提示 openviking/storage/collection_schemas.py 和 openviking_cli/utils/config/embedding_config.py 需要格式化。
PR 已注明 ov.conf 文档/示例待补；这是用户可见的新配置格式，建议补 credentials 示例和旧 backup/providers 迁移说明。
当前 commit 中有多条 fix/format，建议合并前 squash 或改成可追踪的 conventional commits。

github-project-automation Bot added this to OpenViking project Jun 5, 2026

github-project-automation Bot moved this to Backlog in OpenViking project Jun 5, 2026

fengluodb marked this pull request as draft June 5, 2026 11:11

fengluodb force-pushed the feature/multi-credential-failover branch 2 times, most recently from 64ad373 to 44449b7 Compare June 9, 2026 11:21

fengluodb added 15 commits June 10, 2026 15:22

feat: implement multi-credential priority call design

624ab99

Add OrderedCredentialSwitcher for N-credential failover, MultiCredentialVLM and FailoverEmbedder for credential switching, support automatic migration from legacy backup config

refactor: deduplicate AllCredentialsFailedError and fix logging in sw…

0be2b9e

…itcher

chore: fix code formatting for lint compliance

b815e32

chore: fix remaining code formatting

739677e

fix: update FailoverEmbedder for multimodal API compatibility

f26f7c4

fix

70c5775

fix

fbce294

fix

a218db0

fengluodb force-pushed the feature/multi-credential-failover branch from 743c3c6 to a218db0 Compare June 10, 2026 07:22

fengluodb added 2 commits June 10, 2026 15:41

test: bump _last_switch_time when forcing active_idx in ring tests

dba81eb

Without setting _last_switch_time, maybe_failback() retreats to idx 0 immediately because the default 0 timestamp is always older than the 600s timeout, so the unavailable last credential never actually gets exercised.

format

159b88c

fengluodb force-pushed the feature/multi-credential-failover branch from 8cdead0 to 159b88c Compare June 10, 2026 08:22

fengluodb marked this pull request as ready for review June 10, 2026 08:52

qin-ctx requested changes Jun 10, 2026

View reviewed changes

Comment thread openviking_cli/utils/config/vlm_config.py Outdated

Comment thread openviking_cli/utils/config/embedding_config.py Outdated

Comment thread openviking_cli/utils/config/embedding_config.py

fix

f0e3aa9

qin-ctx requested changes Jun 11, 2026

View reviewed changes

Comment thread openviking_cli/utils/config/embedding_config.py

fengluodb added 2 commits June 11, 2026 20:28

fix: add dimension valid

10cebbf

format

57c1299

fengluodb requested a review from qin-ctx June 12, 2026 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement multi-credential priority call design#2468

feat: implement multi-credential priority call design#2468
fengluodb wants to merge 20 commits into
mainfrom
feature/multi-credential-failover

fengluodb commented Jun 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

qin-ctx left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

qin-ctx left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fengluodb commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Type of Change

Changes Made

Testing

Checklist

Additional Notes

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 159b88c)

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

qin-ctx left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

qin-ctx left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fengluodb commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading

(Review updated until commit `159b88c`)

github-actions Bot commented Jun 5, 2026 •

edited

Loading