Skip to content

feat: implement multi-credential priority call design#2468

Open
fengluodb wants to merge 20 commits into
mainfrom
feature/multi-credential-failover
Open

feat: implement multi-credential priority call design#2468
fengluodb wants to merge 20 commits into
mainfrom
feature/multi-credential-failover

Conversation

@fengluodb

@fengluodb fengluodb commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Description

为 OpenViking 客户端(ov.conf)增加多模型凭证 + 按序自动 fallback能力:同一个 VLM / Embedding 模型可配置一个扁平有序的凭证列表(数组顺序即调用优先级,每条带稳定 id),调用失败时按错误类型自动 fallback
到下一条凭证,并在高优先级凭证冷却后自动 failback。

Related Issue

Type of Change

[✓] New feature (non-breaking change that adds functionality)
[✓] Bug fix (non-breaking change that fixes an issue)
[✓] Refactoring (no functional changes)

Changes Made

核心特性

• 新增 OrderedCredentialSwitcher(N 级有序凭证切换状态机,含逐级 failback),MultiCredentialVLM 与 FailoverEmbedder 包装器;配置层 VLMConfig.credentials / EmbeddingModelConfig.credentials,旧 backup
字段自动归一化兼容。
• 鉴权方式从旧顶层配置切换为 credentials 时,模型/维度未变可无损复用已有向量(allow_metadata_override,默认 False,dimension 变化仍强制 rebuild)。

错误分类与切换策略

• 拆分错误类:PERMANENT(400 参数错误) / 新增 AUTH(401/403/欠费) / 新增 CONTENT_SAFETY / INPUT_TOO_LARGE / QUOTA_EXCEEDED / TRANSIENT / UNKNOWN。
• 切换策略:请求级错误(400 / 输入过大 / 内容安全)fail-fast 并 re-raise 原始异常;凭证级 AUTH 与 QUOTA_EXCEEDED 前移;TRANSIENT 用尽底层重试后前移;UNKNOWN 保守前移。
• circuit breaker 对 AUTH 同样立即熔断。

评审整改

• get_active_index() 改为纯读、无副作用;新增 maybe_failback() 专司 failback(切换时打 info 日志)。
• 移除全局 total_max_retries,凭证耗尽仅由 idx >= n 判定(修复 >10 凭证提前失败)。
• EmbeddingCredential 移除 6 个模型行为字段(修复 or 合并吞掉 0/False)。
• FailoverEmbedder.get_dimension 抛错替代魔法值 2048。
• token usage 改用公开访问器;修复 embedder 共享单例 tracker 导致的重复累加。

Testing

[✓] I have added tests that prove my fix is effective or that my feature works
[✓] New and existing unit tests pass locally with my changes
[✓] I have tested this on the following platforms:
[✓] macOS

测试:test_ordered_credential_switcher / test_vlm_failover / test_failover_embedder_permanent_advance / test_embedding_credential_model_override / test_model_retry / test_circuit_breaker,共 127 passed。

Checklist

[✓] My code follows the project's coding style
[✓] I have performed a self-review of my code
[✓] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation(ov.conf 文档/示例待补)
[✓] My changes generate no new warnings
[ ] Any dependent changes have been merged and published

Additional Notes

• 设计前提:同一模型的所有凭证指向同一逻辑模型(endpoint 可不同)。因此请求级错误换凭证无意义 → fail-fast;凭证级错误(key 失效/欠费)才前移。
• 每条凭证的瞬时重试由其底层实例的 max_retries 负责,无全局重试上限。
• 向后兼容:旧 backup、旧顶层鉴权配置均自动归一化为 credentials,存量配置零改动。

@fengluodb fengluodb marked this pull request as draft June 5, 2026 11:11
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown

PR Reviewer Guide 🔍

(Review updated until commit 159b88c)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add multi-credential support for embedders

Relevant files:

  • openviking_cli/utils/config/embedding_config.py
  • openviking/models/embedder/base.py
  • tests/unit/test_failover_embedder_permanent_advance.py
  • tests/unit/test_embedding_credential_model_override.py

Sub-PR theme: Add multi-credential support for VLMs

Relevant files:

  • openviking_cli/utils/config/vlm_config.py
  • openviking/models/vlm/base.py
  • openviking/models/vlm/init.py
  • tests/unit/test_vlm_failover.py

Sub-PR theme: Implement OrderedCredentialSwitcher and error classification updates

Relevant files:

  • openviking/utils/model_retry.py
  • tests/unit/test_ordered_credential_switcher.py

⚡ Recommended focus areas for review

Potential regression in legacy backup config handling

When migrating legacy backup config to credentials, the code clears self.backup before checking if there are explicit credentials. This could lead to loss of backup config if explicit credentials are added but migration logic runs first. Verify that the migration order preserves explicit credentials.

def _normalize_credentials(self):
    """Normalize credentials configuration:
    1. Migrate legacy backup config to credentials
    2. Migrate top-level credentials to credentials list
    3. Assign default IDs to credentials without IDs
    4. Apply top-level defaults to credentials
    """
    migrated_credentials = []

    # Step 1: Migrate legacy backup config
    if self.backup is not None and self.backup._has_any_config():
        # Primary credential from top-level
        primary_cred = VLMCredential(
            id="legacy-primary",
            provider=self.provider,
            model=self.model,
            api_key=self.api_key,
            api_base=self.api_base,
            api_version=self.api_version,
            forward_api_key=self.forward_api_key,
            extra_headers=self.extra_headers,
            extra_request_body=self.extra_request_body,
            stream=self.stream,
        )
        migrated_credentials.append(primary_cred)

        # Backup credential
        backup_cred = VLMCredential(
            id="legacy-backup",
            provider=self.backup.provider,
            model=self.backup.model,
            api_key=self.backup.api_key,
            api_base=self.backup.api_base,
            api_version=self.backup.api_version,
            forward_api_key=self.backup.forward_api_key,
            extra_headers=self.backup.extra_headers,
            extra_request_body=self.backup.extra_request_body,
            stream=self.backup.stream,
        )
        migrated_credentials.append(backup_cred)

        # Clear backup to avoid double processing
        self.backup = None

    # Step 2: If no credentials from backup migration and no explicit credentials,
    # create credentials from top-level config
    if not migrated_credentials and not self.credentials:
        if self._has_legacy_provider_config():
            migrated_credentials.append(
                VLMCredential(
                    id="default",
                    provider=self.provider,
                    model=self.model,
                    api_key=self.api_key,
                    api_base=self.api_base,
                    api_version=self.api_version,
                    forward_api_key=self.forward_api_key,
                    extra_headers=self.extra_headers,
                    extra_request_body=self.extra_request_body,
                    stream=self.stream,
                )
            )

    # Step 3: Merge migrated credentials with existing credentials
    if migrated_credentials:
        # Only use migrated if no explicit credentials exist
        if not self.credentials:
            self.credentials = migrated_credentials

    # Step 4: Assign default IDs and apply top-level defaults
Inconsistent token usage handling between FailoverEmbedder and MultiCredentialVLM

FailoverEmbedder returns only the first embedder's token usage (claiming shared singleton), while MultiCredentialVLM merges all instances' trackers. Verify that embedder token trackers are actually shared to avoid undercounting.

def get_token_usage(self) -> Dict[str, Any]:
    """Get combined token usage across credential instances.

    Embedders share a process-wide singleton token tracker (see
    ``_get_token_tracker``), so every wrapped embedder reports the same
    usage. Return a single instance's usage directly; merging would
    double-count.
    """
    if not self._embedders:
        return {}
    return self._embedders[0].get_token_usage()

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown

PR Code Suggestions ✨

Latest suggestions up to 159b88c
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix logger usage in MultiCredentialVLM to use get_logger

The MultiCredentialVLM class uses logging.getLogger(name) directly, which is
inconsistent with the rest of the codebase that uses get_logger from
openviking_cli.utils. Additionally, the logging module is not imported, which would
cause a runtime error. Let's fix this by using get_logger instead.

openviking/models/vlm/base.py [12-622]

 from openviking.utils.model_retry import (
     OrderedCredentialSwitcher,
     PrimaryBackupSwitcher,
     classify_api_error,
 )
 from openviking_cli.utils import get_logger
 ...
-    self._logger = logging.getLogger(__name__)
+    self._logger = get_logger(__name__)
Suggestion importance[1-10]: 6

__

Why: Identifies a missing import for logging and inconsistent logger usage. The fix uses the already imported get_logger from openviking_cli.utils, aligning with the rest of the codebase and resolving a potential runtime error.

Low

Previous suggestions

Suggestions up to commit 2cefa7a
CategorySuggestion                                                                                                                                    Impact
General
Add logger and centralize AllCredentialsFailedError

1) Add loguru logger initialization to this file (used by
OrderedCredentialSwitcher). 2) Move AllCredentialsFailedError here to avoid
duplicate definitions in embedder and vlm base files.

openviking/utils/model_retry.py [340]

-class OrderedCredentialSwitcher:
+import threading\nimport time\nfrom typing import Any, List, Optional\n\nfrom openviking_cli.utils import get_logger\n\nlogger = get_logger(__name__)\n\n\nclass AllCredentialsFailedError(Exception):\n    """Raised when all credentials in the chain have failed."""\n\n    def __init__(self, errors: list[tuple[str, str, Exception, int]]):\n        """Initialize the error with a list of credential failures.\n\n        Args:\n            errors: List of tuples containing (credential_id, error_class, exception, attempts)\n        """\n        self.errors = errors\n        message = "All credentials failed:\\n" + "\\n".join(\n            f"  - {cred_id}: {error_class} - {exc}" for cred_id, error_class, exc, attempts in errors\n        )\n        super().__init__(message)\n\n\nclass OrderedCredentialSwitcher:
Suggestion importance[1-10]: 6

__

Why: Centralizes AllCredentialsFailedError to avoid duplicate definitions in embedder and VLM base files, and adds proper logger initialization, improving code maintainability and consistency.

Low
Use loguru logger for consistent logging

Replace standard library logging with loguru logger (used throughout the codebase)
for consistent structured logging.

openviking/models/vlm/base.py [638]

-        self._logger = logging.getLogger(__name__)
+        from openviking_cli.utils import get_logger\n        self._logger = get_logger(__name__)
Suggestion importance[1-10]: 5

__

Why: Replaces standard library logging with get_logger (used elsewhere in the codebase) for consistent logging, a minor but useful maintainability improvement.

Low
Import centralized exception class

Remove duplicate AllCredentialsFailedError and import it from
openviking.utils.model_retry instead.

openviking/models/embedder/base.py [649-662]

-class AllCredentialsFailedError(Exception):\n    """Raised when all credentials in the chain have failed."""\n\n    def __init__(self, errors: list[tuple[str, str, Exception, int]]):\n        """Initialize the error with a list of credential failures.\n\n        Args:\n            errors: List of tuples containing (credential_id, error_class, exception, attempts)\n        """\n        self.errors = errors\n        message = "All credentials failed:\\n" + "\\n".join(\n            f"  - {cred_id}: {error_class} - {exc}" for cred_id, error_class, exc, attempts in errors\n        )\n        super().__init__(message)
+from openviking.utils.model_retry import AllCredentialsFailedError
Suggestion importance[1-10]: 5

__

Why: Removes duplicate exception definition and supports importing from a centralized location, improving code consistency, though corresponding changes in other files (e.g., vlm/base.py, import files) would also be needed for full implementation.

Low

@fengluodb fengluodb force-pushed the feature/multi-credential-failover branch 2 times, most recently from 64ad373 to 44449b7 Compare June 9, 2026 11:21
fengluodb added 15 commits June 10, 2026 15:22
Add OrderedCredentialSwitcher for N-credential failover, MultiCredentialVLM and FailoverEmbedder for credential switching, support automatic migration from legacy backup config
Separate the monolithic PERMANENT error class so the switcher reacts to
the actual root cause:

- 400 (request-level parameter error) -> PERMANENT, fail-fast: same
  request fails on every credential of the same model.
- 401/403/unauthorized/accountoverdue -> new AUTH class: credential-level,
  advances to the next credential in multi-credential mode.
- new CONTENT_SAFETY class (moderation rejections) and INPUT_TOO_LARGE ->
  fail-fast: switching credentials cannot help.

classify_api_error now checks CONTENT_SAFETY before PERMANENT so a
moderation message containing "400" is not misclassified. On fail-fast the
failover wrappers re-raise the original exception (preserving type/info)
instead of wrapping it, so callers can react (e.g. truncate on
input_too_large). AllCredentialsFailedError is reserved for chain
exhaustion. The legacy PrimaryBackupSwitcher also switches on AUTH to keep
existing backup behavior.
get_active_index() previously mutated state: when a failback threshold was
met it would decrement the active index. Because observability properties
(active_credential_index / active_credential_id) call it, merely reading the
current credential for logging or metrics could accidentally advance the
failback state machine.

Split the concern: get_active_index() is now a pure read, and a new public
maybe_failback() performs the one-step failback (and logs an info line when
the active credential index changes). The request loops in MultiCredentialVLM
and FailoverEmbedder call maybe_failback() at the top of each attempt, so the
failback behavior is unchanged while pure reads no longer have side effects.
The failover loops exited on `idx >= n OR total_attempts >= total_max_retries`
(default 10). With more than 10 credentials, or when failback churn inflated
the attempt count, this could raise AllCredentialsFailedError before every
credential had actually been tried, leaving lower-priority credentials unused.

Remove total_max_retries entirely from MultiCredentialVLM and FailoverEmbedder:
credential exhaustion is now decided solely by reaching the end of the chain
(idx >= n), and per-credential retries remain the responsibility of each
underlying instance via its own max_retries. The aggregated error tuple now
records the failing credential index instead of the attempt counter.

Adds a regression test covering more than 10 credentials all being tried.
encoding_format, model_path, cache_dir, enable_fusion, res_level and
max_video_frames describe how a model runs, not which credential is used.
All credentials of a single embedding model share the same model, so these
belong on the parent EmbeddingModelConfig, not on each credential.

Keeping them on the credential forced a `cred.X or config.X` merge in
_create_failover_embedder, which silently dropped explicit falsy values
(enable_fusion=False, res_level=0, max_video_frames=0) and fell back to the
parent value.

Remove these six fields from EmbeddingCredential and read them directly from
the parent config when building per-credential embedders. id/provider/model/
api_key/api_base/api_version/ak/sk/region/host/extra_headers remain
credential-level.
FailoverEmbedder.get_dimension() returned a hardcoded 2048 when the first
embedder had no get_dimension(). That path is reached only when wrapping
sparse embedders, which have no fixed dense dimension; returning a fabricated
2048 silently feeds a wrong dimension to callers (e.g. schema creation).

Delegate to the first embedder and raise AttributeError when it has no
get_dimension(), surfacing the misuse instead of hiding it.
Two issues in the cross-instance token usage merge:

1. Encapsulation: FailoverVLM / MultiCredentialVLM / FailoverEmbedder reached
   into other instances' private _token_tracker. Add a public token_tracker
   accessor on VLMBase and use it in the VLM mergers.

2. Double counting in FailoverEmbedder: embedders share a process-wide
   singleton token tracker (_get_token_tracker), so all wrapped embedders point
   at the same object. Merging N identical trackers inflated usage N-fold.
   Return a single instance's usage directly instead of merging.
Splitting 401/403/unauthorized/accountoverdue out of PERMANENT into the new
AUTH class (commit b477b9f) regressed the circuit breaker: it only tripped
immediately on PERMANENT/QUOTA_EXCEEDED, so auth errors no longer opened the
breaker right away.

For a single embedding instance an auth failure (key invalid / no permission /
overdue) is persistent and retrying is pointless, so the breaker should still
trip immediately. Add ERROR_CLASS_AUTH to the immediate-trip set and update the
classification tests to assert the new AUTH class (403 still trips the breaker).
@fengluodb fengluodb force-pushed the feature/multi-credential-failover branch from 743c3c6 to a218db0 Compare June 10, 2026 07:22
Without setting _last_switch_time, maybe_failback() retreats to idx 0
immediately because the default 0 timestamp is always older than the
600s timeout, so the unavailable last credential never actually gets
exercised.
@fengluodb fengluodb force-pushed the feature/multi-credential-failover branch from 8cdead0 to 159b88c Compare June 10, 2026 08:22
@fengluodb fengluodb marked this pull request as ready for review June 10, 2026 08:52
@github-actions

Copy link
Copy Markdown

Persistent review updated to latest commit 159b88c

@qin-ctx qin-ctx left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

本次 review 发现 3 个 blocking 问题,主要是新 credentials 归一化没有完整保留旧 providers 语义,以及 embedding credential 的 provider/model/auth fallback 在校验、维度、运行时构造之间不一致。

Non-blocking:

  • PR 已注明 ov.conf 文档/示例待补;这是用户可见新配置格式,建议补 credentials 示例和旧 backup/providers 迁移说明。
  • 当前 commit 中有多条 fix/format,建议合并前 squash 或改成可追踪的 conventional commits.

Comment thread openviking_cli/utils/config/vlm_config.py Outdated
Comment thread openviking_cli/utils/config/embedding_config.py Outdated
Comment thread openviking_cli/utils/config/embedding_config.py

@qin-ctx qin-ctx left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

本次 review 仍发现 1 个 blocking 问题:credentials 模式跳过了 embedding provider-specific 校验,会放行 litellm 未设置 dimension 等无效配置,导致 schema 维度和运行时 embedder 行为不一致。

Non-blocking:

  • CI lint 当前失败:ruff format --check 提示 openviking/storage/collection_schemas.py 和 openviking_cli/utils/config/embedding_config.py 需要格式化。
  • PR 已注明 ov.conf 文档/示例待补;这是用户可见的新配置格式,建议补 credentials 示例和旧 backup/providers 迁移说明。
  • 当前 commit 中有多条 fix/format,建议合并前 squash 或改成可追踪的 conventional commits。

Comment thread openviking_cli/utils/config/embedding_config.py
@fengluodb fengluodb requested a review from qin-ctx June 12, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants